Page 1: SA-Application Text Retrieval Software Expert 5.0 …publib.boulder.ibm.com/tividd/td/ASE/ASE50trg/en_US/PDF/...create dialog box forms, menus, toolbars, and string tables for any

Copyright © 1997 by Software Artistry, Inc.

All rights reserved

Text Retrieval Guide

SA-Application Software Expert 5.0

All rights to this publication are reserved. No part of this manual may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose other than the purchaser’s personal use without the written permission of Software Artistry, Inc.

SA-Application Software Expert, SA-Expert Advisor, SA-Expert Web, and SA-Expert Mail Agent are trademarks of Software Artistry, Inc.

IBM, OS/2 and DB2/2 are registered trademarks of International Business Machines Corp.

Microsoft, Windows, Windows NT, and Windows 95 are registered trademarks of Microsoft Corp.

Oracle, Oracle 7, SQL*DBA, SQL*Net, and SQL*Plus are trademarks of Oracle Corp.

SQLBase, SQLTalk, and SQLRouter are trademarks of Gupta Technologies, Inc.

SYBASE, Transact-SQL, and DB-Library are trademarks of Sybase, Inc.

Informix, Informix ESQL/C, and Informix SE are trademarks of Informix Software, Inc.

Fulcrum is a registered trademark, and Ful/Text, Intuitive Searching, SearchBuilder, SearchServer, and SearchSQL are trademarks of Fulcrum Technologies Inc. All other names used herein are trademarks of their respective owners.

Any other products mentioned in this document are trademarks of their respective companies.

Table of Contents

About This Guide
    Purpose of This Book
    Audience Considerations
    Overview of the Contents
    Document Conventions
    The SA-ASE 5.0 Documentation Set

Chapter 1 Indexing External Documents
    Introduction
    Using Indices in External Documents

Chapter 2 Indexing Database Text Fields
    Introduction
    Indexing Text Fields

Chapter 3 Creating Indexes With SABuild
    Introduction
    Using Indexes With Build Scripts

Appendix A Document Retrieval Function Reference
    SQLDBTextIndexCreate
    SQLDBTextIndexDeleteSyntax
    SQLDBTextIndexUpdate
    SQLDBTextIndexUpdateAll

Appendix B Messages and Error Codes
    Error Messages and SQLSTATES
    SearchServer Utility Messages

Appendix C Data Preparation and Administration
    Introduction
    The Administration Tools
    Using External Text
    Maintaining the Data
    Altering the Table
    Providing Support Files for Searching
    Verifying the Table
    GLOSSARY
    Utility Program Summary
    Text Readers
    Table Management Files
    Control Characters and Control Sequences

Appendix D SearchSQL
    SearchSQL Overview
    The Search Process
    SearchSQL Language Elements
    SearchSQL Statements
    System Information Tables
    Character Sets
    Character Classes

Index

About This Guide

This chapter covers the following topics:

• Purpose of This Book

• Audience Considerations

• Overview of the Contents

• Document Conventions

• The SA-ASE 5.0 Documentation Set

Purpose of This Book

What this book contains

The purpose of the Text Retrieval Guide is to help you learn how to use the Fulcrum SearchServer to quickly retrieve the text you need.

SearchServer is a complete document retrieval engine that is used to create indexes for large text-based files as well as large text fields in a database. The indexes can be queried using a SQL-like syntax in order to locate critical information rapidly.

Querying indexes of database text fields, rather than the fields themselves, significantly decreases the time required to accomplish query tasks that might otherwise be performed using SQL’s Like predicate.

What you will be able to do

This book serves as a reference guide when you:

•Install the SearchServer software

•Create indices using external documents

•Index database text fields

•Create indices using SA-Build scripts

Audience Considerations

Who should read this book

This book is intended for users who configure, customize, or create applications at your site, including:

•SA-Script programmers

•network administrators

•system administrators

What you should know before beginning

Before you use this book, you should be familiar with:

• The SA-ASE development tools

• Customization practices at your site, and how your function relates to the customization process

• Operating systems used at your site

Overview of the Contents

What is in this book

The SA-ASE 5.0 Text Retrieval Guide contains the following chapters. Primary chapter topics are noted as well.

• Chapter 1, “Installing SearchServer,” explains how to install SearchServer on both a server and client workstation.

• Chapter 2, “Indexing External Documents,” explains how SearchServer works and how to use SearchServer to index external documents.

• Chapter 3, “Indexing Database Text Fields,” explains how to use SearchServer to index database text fields.

• Chapter 4, “Creating Indexes with SABuild,” explains how to create indexes with SABuild scripts.

• Appendix A, “Document Retrieval Function Reference,” lists the SA-Script references for SearchServer.

How to use this book

This book is intended as a reference to assist you in using SearchServer. You can refer to any part of the book without reading preceding chapters.

Document Conventions

Introduction

There are several conventions used throughout this book to identify different text use or to point out particularly important information.

Text use

• Italic text is used in procedures to indicate text that you must substitute in the course of a procedure.

Type the file name: SAIDEV

Note: UNIX file names do not necessarily have extensions.

• Italic text is also used for new terms.

client/server architecture

• Bold text is used for variable and data source names when they appear in text references.

SQLFetch

• Bold text also denotes text that you must type, either at a command prompt or in a text box.

Type login

• Monospace fonts are used for code examples.

Function Add Data : INTEGER IS

Icons used in this book

Icons appear occasionally as a means of emphasizing a particularly important point.

Notes use the following format.

Note: This is an example of a note.

Cautions occur when user actions might adversely affect data integrity.

Caution: This is an example of a caution.

Product names

The following table shows the names by which Software Artistry products are referred to in the documentation set.

Product Name                      Referred to as
SA-EXPERTISE™ for CRM             EXPERTISE
SA-EXPERTISE™ for ESM             EXPERTISE
SA-Application Software Expert    ASE
SA-Expert Administrator           Expert Administrator
SA-Expert Advisor                 Expert Advisor
SA-Expert Evolution               Expert Evolution
SA-Expert Foundation Manager      Expert Foundation Manager
SA-Expert Mail Agent              Expert Mail Agent
SA-Expert Quality                 Expert Quality
SA-Expert Support                 Expert Support
SA-ExpertView                     ExpertView
SA-Expert Web                     Expert Web

The SA-ASE 5.0 Documentation Set

Other books in the set

The following books, in addition to the SA-ASE 5.0 Text Retrieval Guide, comprise the total SA-ASE document set:

• SA-ASE 5.0 Tools and Utilities Guide - A book that describes the SA-ASE Integrated Development Environment and the various tools contained in it. The book provides detailed information and procedures to help you learn to use the tools in your environment.

• SA-ASE 5.0 SA-Script Language Reference - A book that provides command references for SA-Script and for the APIs used with SA-Script. The command references include syntax, return codes, and system constants.

• SA-ASE 5.0 SA-Script Programming Guide - A book that provides information about SA-ASE and the SA-Script programming language.

• SA-ASE 5.0 Legacy APIs Guide - A book that contains information about the EHLLAPI and CPIC APIs in SA-ASE.

• SA-ASE 5.0 Interface Designer Guide - A reference book that describes the Interface Designer. This new tool can be used to create dialog box forms, menus, toolbars, and string tables for any GUI applications that you build.

• SA-ASE 5.0 Tutorials - A book that contains tutorial exercises for the primary functions found in SA-ASE.

Chapter 1 Indexing External Documents

This chapter covers the following topics:

• Introduction

• Using Indices in External Documents

Introduction

Overview

In this chapter you will learn how to use SearchServer to create indexes that can be used to retrieve data from external documents.

Efficient document retrieval depends on an index to locate a specific document quickly and easily, usually by specifying items of information as keywords for a query.

Specifically, you will learn how to:

• Define a SearchServer index

• Store supplementary information in an index

• Use a schema

• Insert data in an index

• Build a SearchServer index

• Query a SearchServer index

Note: Unless your organization has purchased a license to index external documents, you cannot use the external document indexing function in SearchServer.

Using Indices in External Documents

Overview

This section discusses the essential information needed to create and manipulate text indexes for external documents via SA-Script. External documents are documents that are stored in files on a computer, as opposed to documents or text that might be stored in a database or other media.

Defining a SearchServer index

Because SearchServer behaves like a database in many ways, it may be helpful to think of it in that way.

• SearchServer indexes are created on the SearchServer database using the Create Schema and Create Table commands. These commands can be applied in an EABuild script, or via SQLExecuteImmediate in an SA-Script knowledgebase.

• Create Table uses the standard SQL syntax to create an index. For example, if you want to create an index of all the non-confidential memorandums that are distributed in a company, an initial index definition might look like this:

Create Table MemoNDX (Memo APVARCHAR)

This table has only one (implicit) column of a special type, which is called APVARCHAR. This type indicates that the contents of the column consist of the contents of external documents that are associated with each row. Because only one external document can be associated with each row of the index’s table, a table definition can have only one column of type APVARCHAR.

To avoid having the same data in two places, the APVARCHAR column acts as a proxy. Data is retrieved from a file only when it is specifically requested by the user (either to build the index, or via an explicit call to SQLFetch).
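One way to picture the proxy behavior is the short Python sketch below. It is only a model, not SearchServer or SA-Script code: the `ProxyRow` class, its `memo` property, and the `demo` helper are hypothetical names introduced for illustration. The row stores just a file name, and the file is opened only when the text is requested.

```python
import os
import tempfile


class ProxyRow:
    """Toy model of an index row whose text column is a proxy for a file."""

    def __init__(self, file_name):
        self.file_name = file_name  # plays the role of FT_SFNAME
        self.reads = 0              # counts how often the file is opened

    @property
    def memo(self):
        # The text is not stored in the row; it is read on request.
        self.reads += 1
        with open(self.file_name, "r", encoding="ascii") as handle:
            return handle.read()


def demo():
    """Create a temporary document and show that it is read only on access."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "dentplan.txt")
        with open(path, "w", encoding="ascii") as handle:
            handle.write("New Dental Plan")
        row = ProxyRow(path)
        reads_before = row.reads   # nothing has been read yet
        text = row.memo            # explicit request triggers the read
        return reads_before, text, row.reads
```

Calling `demo()` reads the temporary file exactly once, at the moment the `memo` value is requested.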

Note: All SearchServer index names must be eight characters or less if your operating system’s file system does not support file names with more than eight characters.

Reserved columns

When you create a SearchServer index, an additional twenty reserved columns are created as columns of the table. These columns are part of all SearchServer index tables and are primarily used by SearchServer to store information about the contents of an index. During normal use of SearchServer, you should only have to use a few of the reserved columns.

The most commonly used reserved columns are listed below.

• FT_CID is the Key column of an index table. It stores a unique integer for every row in the table.

• FT_FLIST contains a coded list of text filters that SearchServer uses to retrieve the text from the document associated with the row. Text filters are described in detail later in this chapter.

• FT_SFNAME contains the complete file name of the external document.

• FT_TEXT is the initial name of the APVARCHAR column. If you do not define an APVARCHAR column in your index table definition, you can still access the indexed file via this column name.

Note: To avoid conflicts with the reserved columns, it is recommended that column names do not start with “FT.”
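The two naming rules given so far (index names of at most eight characters on file systems with short file names, and no column names that begin with FT) can be checked before a Create Table command is issued. The Python sketch below is an illustration only; `check_names` and its `short_file_names` parameter are hypothetical and are not part of SearchServer or SA-Script.

```python
def check_names(index_name, column_names, short_file_names=True):
    """Return a list of warnings for a proposed index definition.

    short_file_names=True models a file system limited to
    eight-character file names (an assumption of this sketch).
    """
    warnings = []
    if short_file_names and len(index_name) > 8:
        warnings.append("index name '%s' is longer than 8 characters" % index_name)
    for name in column_names:
        # Reserved columns start with FT, so user columns should not.
        if name.upper().startswith("FT"):
            warnings.append("column '%s' starts with FT" % name)
    return warnings
```

For the MemoNDX definition used in this chapter, the check returns an empty list, since the index name has seven characters and no column begins with FT.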

Storing supplementary information in an index

You may want to store supplementary information about indexed data in each row of the index table as shown in the following example.

Create Table MemoNDX
  (DistDate DATE,
   Author VARCHAR(40),
   Subject VARCHAR(30),
   Memo APVARCHAR)

This data is stored in the index. Users can employ these columns to supplement any queries they make against the actual document.

Using a schema

To add specific text retrieval functionality to the index table, SearchServer uses the concept of a schema. A schema defines a view of a table; that is, it defines visible attributes, attribute domains, and other information. Preceding the Create Table clause with a Create Schema clause allows you to include further definitions and configurations. These definitions and configurations alter SearchServer’s behavior with regard to the table. An example follows:

Create Schema MemoNDX
Create Table MemoNDX
  (DistDate DATE,
   Author VARCHAR(40),
   Subject VARCHAR(30),
   Memo APVARCHAR)
IMMEDIATE

Note that the schema has the same name as its corresponding table. This is not a requirement, but is strongly recommended to avoid confusion.

The IMMEDIATE key word alters the behavior of the index so that in this example, new data is indexed immediately after it is inserted.

By default, an index is not updated until you issue an explicit Validate Index command.

Note: Even if you have no need for the extra functionality provided by using a schema, Fulcrum recommends that you precede all Create Table commands with a Create Schema command.

There are a variety of other key words and definitions that you can add to a Create Table statement if you precede the statement with a Create Schema statement.

Refer to the SearchServer documentation to learn more about these options.

Inserting data into a SearchServer index

After an index has been created, the system administrator can insert data into the index using SQLInsert:

Function AddData : INTEGER IS
VARIABLES
  DistDate : Date;
  Author : String;
  Subject : String;
  FT_SFNAME : String;
  FT_FLIST : String;
ACTIONS
  DistDate := $TODAY;
  Author := 'Ed';
  Subject := 'New Dental Plan';
  FT_SFNAME := 'C:\Docs\Memos\DentPlan.doc';
  FT_FLIST := 'ww:s';
  SQLInsert('MemoNDX', DistDate, Author, Subject, FT_SFNAME, FT_FLIST);
END;

Notice the use of the two reserved columns FT_SFNAME and FT_FLIST:

• FT_SFNAME defines the name of the file that is to be indexed and associated with this row in the index table.

• FT_FLIST contains the colon (:) separated list of text readers (or text filters) that are used to retrieve the file for SearchServer.

The text values s and ww each identify a specific filter. The s filter is the standard filter for opening and reading a file as a series of ASCII characters. The ww filter is designed to extract text from Microsoft Word files.

Filter order

The order in which filters are listed is important.

• SearchServer activates the filters by starting with the rightmost one in the list and moving left. Thus, the last filter (s) in the list is the first one activated by SearchServer.

• The first filter is responsible for opening the file. It then passes the text it finds in that file on to the filter that precedes it.

• The preceding filter (ww) reads the text of the file and performs translations on it before passing the data to the filter (if any) that precedes it in the list.

• When the first filter in the list receives the text, it passes its output directly to SearchServer. SearchServer then performs the actual indexing.
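The activation order described above can be sketched in Python. This is a toy model, not SearchServer code: `run_filter_chain`, `s_filter`, and `ww_filter` are hypothetical stand-ins, and the pretend markup handled by `ww_filter` only mimics a translation step.

```python
def run_filter_chain(flist, file_name, filters):
    """Apply the filters named in flist from right to left.

    filters maps a filter name to a Python function; these stand-in
    functions are hypothetical and only mimic the order of activation.
    """
    # An empty list falls back to the s filter, which the guide gives
    # as the default when no filters are specified.
    names = flist.split(":") if flist else ["s"]
    data = file_name          # the rightmost filter receives the file name
    order = []
    for name in reversed(names):
        data = filters[name](data)
        order.append(name)
    return order, data        # data is what would be handed to the indexer


def s_filter(file_name):
    # Stand-in for the s filter: "opens" the file and returns its raw text.
    fake_files = {"DentPlan.doc": "<doc>New Dental Plan</doc>"}
    return fake_files[file_name]


def ww_filter(raw_text):
    # Stand-in for the ww filter: strips the pretend markup.
    return raw_text.replace("<doc>", "").replace("</doc>", "")
```

With the value 'ww:s', the model runs `s_filter` first and `ww_filter` second, so the output of the leftmost filter is what reaches the indexing step.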

Filter Options

FT_FLIST can have as many filters listed as necessary to translate a file for SearchServer.

If no filters are specified, SearchServer assumes that you only need the default s filter to open the file. The s filter passes a file’s contents directly to the indexing engine (that is, no translation is necessary).

Many text filters allow you to alter their behavior by including parameters as you do on a command line. For example, if you want the Microsoft Word text filter to index any hidden text it finds in a Word document, you add the parameter /h=i to the definition.

With this parameter, the FT_FLIST assignment in the AddData example becomes:

FT_FLIST := 'ww/h=i:s';

To find out more about the available filters and their options, refer to the SearchServer Data Preparation and Administration guide.
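To make the structure of an FT_FLIST value concrete, the Python sketch below splits a value into filter names and parameters. It is an illustration under stated assumptions: `parse_flist` is a hypothetical helper, and it assumes that every parameter is introduced by a slash and has the form key=value, which is all that the single example 'ww/h=i:s' shows. Other parameter forms may exist and are not handled.

```python
def parse_flist(flist):
    """Split an FT_FLIST value into (filter name, parameters) pairs.

    Assumes every parameter is introduced by "/" and has the form
    key=value, as in the single example 'ww/h=i:s'.
    """
    parsed = []
    for entry in flist.split(":"):
        # The first piece is the filter name; the rest are parameters.
        name, *params = entry.split("/")
        options = dict(param.split("=", 1) for param in params)
        parsed.append((name, options))
    return parsed
```

The pairs are returned in the order in which they are written, so the last pair is the filter that SearchServer activates first.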

In the AddData example, you may notice that the Memo column does not seem to be filled in.

Memo is actually a special column that represents the text of the external document that is indexed. By placing a filename into the reserved FT_SFNAME column, you “fill” the Memo column.

Building a SearchServer index

Once you define a SearchServer index with the Create Schema/Create Table commands and insert rows into the index, you are ready for SearchServer to index the data in the external documents.

The command for performing this action is Validate Index. Here is an example of how it might be used:

SQLExecuteImmediate('Validate Index MemoNDX Validate Table');

The Validate Table command added to the end of the string tells SearchServer to make sure the data in the index is up to date relative to the external documents.

If Validate Table is omitted, SearchServer only updates the index with documents that were inserted since the last update. SearchServer does not check old documents for changes.
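The difference between the two forms of the command can be summarized with a small Python helper that builds the command string. The helper is hypothetical (`validate_index_command` and its `check_existing` parameter are names chosen for this sketch); only the command text itself comes from this guide.

```python
def validate_index_command(index_name, check_existing=False):
    """Build the command string that would be passed to SQLExecuteImmediate.

    check_existing=True appends Validate Table, so documents that are
    already indexed are rechecked against the external files as well.
    """
    command = "Validate Index " + index_name
    if check_existing:
        command += " Validate Table"
    return command
```

Leaving `check_existing` at its default produces the shorter command, which picks up only documents inserted since the last update.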

Caution: Depending on the amount of data you index, building or rebuilding an index can take a considerable amount of time. This is especially critical in multi-user environments because users cannot access data while it is being indexed.

Querying a SearchServer index

You can query indexes with the SQLSelect and SQLSelectInto commands. However, SearchServer allows a number of additional predicates to be added to a Select query.

A few example queries follow:

• Select memos regarding hardware repair:

Select * from MemoNDX Where Memo contains 'Hardware' AND 'Repair'

• Select all memos regarding document retrieval, doc retrieval, or text retrieval:

Select * from MemoNDX Where Memo contains 'document', 'doc', 'text' WITHIN 5 CHARACTERS OF 'retrieval'

• A query similar to the preceding one can also be invoked using a wildcard:

Select * from MemoNDX Where Memo contains 'doc*', 'text' WITHIN 5 CHARACTERS OF 'retrieval'

• You can do an ‘intuitive’ search using a series of words like this:

Select * from MemoNDX Where Memo IS_ABOUT 'Where is new software stored'

Note: SearchServer indexes are case insensitive.
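Queries of the contains form follow a regular pattern, which the Python sketch below assembles from its parts. It is an illustration only: `contains_query` is a hypothetical helper, it reproduces just the comma-separated alternatives and the WITHIN ... CHARACTERS OF clause shown in the examples, and it rejects terms that contain quotation marks because this guide does not describe how they are escaped.

```python
def contains_query(table, column, terms, near=None, distance=5):
    """Build a Select statement in the style of the contains examples.

    terms are alternatives (comma-separated); near, if given, adds a
    WITHIN ... CHARACTERS OF clause. Quote escaping is not handled.
    """
    for word in list(terms) + ([near] if near else []):
        if "'" in word:
            raise ValueError("terms containing quotes are not supported here")
    alternatives = ", ".join("'%s'" % word for word in terms)
    query = "Select * from %s Where %s contains %s" % (table, column, alternatives)
    if near:
        query += " WITHIN %d CHARACTERS OF '%s'" % (distance, near)
    return query
```

The resulting string could then be passed to SQLSelect in the same way as the hand-written queries.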

Deleting a SearchServer index

As with a database table, a SearchServer index is removed with a Drop Table command:

SQLExecuteImmediate('Drop Table MemoNDX');

All the supplemental data and index information associated with MemoNDX are deleted by this call. The external documents themselves are not affected.

Other SearchServer index activities

There are other SQL functions that operate with SearchServer indexes. For a complete list, refer to the section entitled “Using SQL with SearchServer” in the chapter “Indexing Database Text Fields.”

Chapter 2 Indexing Database Text Fields

This chapter covers the following topics:

• Introduction

• Indexing Text Fields

Introduction

Overview

A text retrieval index is a database used to access pre-existing data files or documents. The index does not affect the original format in which data is stored.

In this chapter you will learn how to:

• Use SQL with SearchServer

• Create an index

• Query an index

• Update an index

Using SQL with SearchServer

In SA-Script, SearchServer appears as an additional data source that can be manipulated using the same SQL commands that are used with database data sources. Indexes appear as “tables” within the data source. Each document (or database field) that is indexed occurs as a row in the index table.

The following commands can be used effectively with a SearchServer data source:

• SQLCloseAll

• SQLCloseCursor

• SQLCommand

• SQLDelete

• SQLDeleteCurrent

• SQLExecuteImmediate

• SQLFetch

• SQLFormat

• SQLInsert

• SQLSelect

• SQLUpdate

• SQLUpdateCurrent

• SQLGetOption

• SQLSetOption

Indexing Text Fields

Text readers

The text reader is a filter that extracts data from a file or from a database and sends it to the SearchServer engine. For more information about text filters, refer to the section entitled “Inserting data into a SearchServer index” in the chapter “Indexing External Documents.”

Increased SA-Script text retrieval functionality

To simplify the use of FT_FLIST (which has special requirements), three functions have been added to SA-Script.

• SQLDBTextIndexCreate

• SQLDBTextIndexDelete

• SQLDBTextIndexUpdate

SQLDBTextIndexUpdate accepts one of two options that control how a database text index is updated.

• SAI_NDX_UPDATE

• SAI_NDX_REBUILD

Note: These functions must be called while you are attached to the current database. You cannot be attached to the SearchServer data source when you call these functions.

Each of these functions is discussed in greater detail in Appendix A.

Creating a database text index

Database text indexes are created with the SQLDBTextIndexCreate command in SA-Script.

The following example creates an index on the Expert Advisor solutions table and displays the resulting success or error code from the call.

KNOWLEDGEBASE MakeNDX;
USES TEXTRET;
ROUTINES
  PROCEDURE TestMain;
PRIVATE
ROUTINES
PROCEDURE TestMain IS
VARIABLES
  Col : IndexFieldRec;
  ColList : LIST OF IndexFieldRec;
  Lines : List of String;
  nRC : Integer;
  whdl : Window;
ACTIONS
  SQLCommand('connect ADVISOR');
  Col.TableName := 'solutions';
  Col.FieldName := 'solution_id';
  Col.Flags := BitOr (SAI_DBTRNDX_VALUE, SAI_DBTRNDX_KEY);
  ListPush(ColList, Col);
  Col.TableName := 'dbo.solutions';
  Col.FieldName := 'description';
  Col.Flags := SAI_DBTRNDX_LONGCHAR;
  ListPush(ColList, Col);
  nRC := SQLDBTextIndexCreate('solndx', ColList);
  ListInsert(Lines, nRC, $BEFORE);
  WinCreateScrollWindow($Desktop, whdl, $NullHandler, 5,5,50,15, 'Create Index', $SystemMonospaced, 10, $WinDefaultStyle);
  WinWriteLN(whdl, Lines);
  WinWait(whdl);
END;

Deleting a database text index

The SQLDBTextIndexDelete command deletes a database text index:

KNOWLEDGEBASE DelNDX;
USES TEXTRET;
ROUTINES
  PROCEDURE TestMain;
PRIVATE
ROUTINES
PROCEDURE TestMain IS
VARIABLES
  Lines : List of String;
  nRC : Integer;
  whdl : Window;
ACTIONS
  nRC := SQLCommand('connect ADVISOR');
  IF (nRC <> SQL_SUCCESS) THEN
    EXIT;
  END;
  nRC := SQLDBTextIndexDelete('solndx');
  ListInsert(Lines, nRC, $BEFORE);
  WinCreateScrollWindow($Desktop, whdl, $NullHandler, 5,5,50,15, 'Delete Index', $SystemMonospaced, 10, $WinDefaultStyle);
  WinWriteLN(whdl, Lines);
  WinWait(whdl);
END;

Updating database text indexes

Updating text indexes does not occur automatically. Thus, you must update the indexes on a regular basis in order to keep the indexes current.

SQLDBTextIndexUpdate accepts one of two options that control how the update is performed.

• SAI_NDX_UPDATE is the default option and updates every known database text index. If the index already exists, it is updated to account for changes in the data. If the index does not exist yet (that is, it is merely defined), it is created at this point.

• SAI_NDX_REBUILD always builds an index, whether it exists or not.
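The behavior of the two options can be restated as a small decision function. The Python sketch below is a model of the description above, not of the product's implementation: `plan_update` is a hypothetical name, and the string values assigned to the two constants are placeholders, since the real constants are defined by SA-Script.

```python
SAI_NDX_UPDATE = "update"     # placeholder values; the real constants
SAI_NDX_REBUILD = "rebuild"   # are defined by SA-Script, not here


def plan_update(index_exists, mode=SAI_NDX_UPDATE):
    """Return the action the text describes for each option."""
    if mode == SAI_NDX_REBUILD:
        return "build"                      # built whether it exists or not
    if mode == SAI_NDX_UPDATE:
        # An existing index is refreshed; a merely defined one is created.
        return "update" if index_exists else "build"
    raise ValueError("unknown mode: %r" % (mode,))
```

The only case in which existing index data is kept and refreshed is the default option applied to an index that already exists.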

The following is an example of their use:

KNOWLEDGEBASE updatsol;
USES TEXTRET;
ROUTINES
  PROCEDURE TestMain;
PRIVATE
ROUTINES
  PROCEDURE TestMain IS
  VARIABLES
    Lines : List of String;
    nRC : Integer;
    whdl : Window;
  ACTIONS
    nRC := SQLCommand('Connect ADVISOR');
    IF (nRC <> 1) THEN
      EXIT;
    END;
    ListInsert(Lines, nRC, $AFTER);
    nRC := SQLDBTextIndexUpdate('solndx', SAI_NDX_REBUILD);
    ListInsert(Lines, nRC, $AFTER);
    WinCreateScrollWindow($Desktop, whdl, $NullHandler, 5,5,50,15, 'Update Index', $SystemMonospaced, 10, $WinDefaultStyle);
    WinWriteLN(whdl, Lines);
    WinWait(whdl);
  END;

Querying a database text index

The goal of querying a database text index is to locate fields in the actual database rapidly.

To support this, the SQLSelect command recognizes a special escape sequence, $TextSearch. This sequence lets a query against a SearchServer index be included in a query to the database.

The contents of a $TextSearch sequence must be a valid WHERE clause for a SearchServer data source.

Review the following examples.

• Select all the solutions that relate to hardware maintenance.

Select * from solutions where $TextSearch (contains 'hardware' & 'maintenance')

• Select all the active solutions that relate to hardware maintenance.

Select * from solutions where $TextSearch (contains 'hardware' & 'maintenance') AND Active = 1

• Select all the active solutions where ‘hardware’ appears near ‘applegate’ or the solution_id > 3000.

Select * from solutions where $TextSearch (contains 'hardware' within 50 characters of 'applegate') AND Active = 1 OR solution_id > 3000

Querying multiple indexes

If multiple SearchServer indexes are associated with a single table in a database, then you must specify the index name within the $TextSearch clause. The index name should precede the query and be followed by a semicolon (;).

Select * from solutions where $TextSearch (solNDX; description contains 'hardware' within 50 characters of 'applegate') AND Active = 1 OR solution_id > 3000

Important: Only one $TextSearch clause can appear within a SQLSelect command.
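The two rules above (an optional index name followed by a semicolon, and at most one $TextSearch clause per statement) can be sketched in a few lines. The sketch is in Python purely for illustration; it is not SA-Script, and the helper names `text_search_clause` and `check_single_text_search` are hypothetical. It only assembles and checks the statement text and assumes the clause name is matched without regard to case.

```python
def text_search_clause(condition, index_name=None):
    """Return a $TextSearch clause.

    When an index name is given, it precedes the query and is
    followed by a semicolon, as described above.
    """
    if index_name:
        return f"$TextSearch ({index_name}; {condition})"
    return f"$TextSearch ({condition})"


def check_single_text_search(statement):
    """Raise ValueError if the statement holds more than one $TextSearch clause."""
    count = statement.lower().count("$textsearch")
    if count > 1:
        raise ValueError(f"found {count} $TextSearch clauses; only one is allowed")
    return statement


clause = text_search_clause("description contains 'hardware' & 'maintenance'", "solNDX")
query = check_single_text_search(f"Select * from solutions where {clause} AND Active = 1")
print(query)
```

The resulting string is what would be passed as the statement argument of SQLSelect.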

Example query

Following is a code example of an index query.

KNOWLEDGEBASE querysol;
ROUTINES
  PROCEDURE TestMain;
PRIVATE
ROUTINES
  PROCEDURE TestMain IS
  VARIABLES
    Lines : List of String;
    nRC : Integer;
    whdl : Window;
    cursor : SQLCURSOR;
    System : String;
  ACTIONS
    nRC := SQLCommand('CONNECT ADVISOR');
    IF (nRC <> 1) THEN
      EXIT;
    END;
    ListInsert(Lines, nRC, $BEFORE);
    nRC := SQLSelect(cursor, 'SELECT SYSTEM FROM SOLUTIONS WHERE $TextSearch(SOLNDX;DESCRIPTION CONTAINS ''PROBLEM'')');
    ListInsert(Lines, nRC, $BEFORE);
    nRC := SQLFetch(cursor, System);
    WHILE (nRC = 1) DO
      ListInsert(Lines, System, $BEFORE);
      nRC := SQLFetch(cursor, System);
    END;
    ListInsert(Lines, nRC, $BEFORE);
    SQLCloseCursor(cursor);
    WinCreateScrollWindow($Desktop, whdl, $NullHandler, 5,5,50,15, 'Query Index', $SystemMonospaced, 10, BitOr($WinDefaultStyle, $WinVScroll));
    WinWriteLN(whdl, Lines);
    WinWait(whdl);
  END;

3 Creating Indexes With SABuild

This chapter covers the following topics:

• Introduction

• Using Indexes With Build Scripts

Introduction

Overview

This chapter describes how to create indexes with SABuild scripts.

SABuild has document retrieval support built into its system so that text retrieval indexes are created when other database tables and indexes are created.

Using Indexes With Build Scripts

Creating indexes during a build

Additions have been made to the syntax of build scripts so that SearchServer database indexes can be created during a build. The format for building a database text index is shown here:

#TEXTINDEXCREATE (<Index Name>,{<Table Name>, <Column Name>, <Type/Key Information>},...);

Caution: Despite the formatting of SABuild commands in this manual, the entire command with all its arguments must appear on one line in an SABuild script.

Build index example

Following is an example of an SABuild command that creates a database text index:

#TEXTINDEXCREATE (SOL_DESC, {SOLUTIONS, DESCRIPTION, LONGCHAR}, {SOLUTIONS, SOLUTION_ID, KEYVALUE});

The example creates a new database text index called SOL_DESC, which is an index of the Description and Solution_Id fields of the solutions table. Each indexed field is identified by the table name, column name, and type information about the column.

These three values are comma-separated and placed within curly braces ({...}). Each set of three values is also separated by a comma.
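Because the whole command must sit on one line, it can help to generate it rather than type it. The following Python sketch is illustrative only (SABuild itself does not use Python, and the function name `text_index_create` is hypothetical); it joins the index name and the column groups into a single-line command in the format described above.

```python
def text_index_create(index_name, columns):
    """Format a #TEXTINDEXCREATE command on a single line.

    columns is a sequence of (table, column, type) tuples. Each tuple
    becomes one comma-separated group inside curly braces.
    """
    groups = ", ".join("{" + ", ".join(col) + "}" for col in columns)
    return f"#TEXTINDEXCREATE ({index_name}, {groups});"


line = text_index_create(
    "SOL_DESC",
    [
        ("SOLUTIONS", "DESCRIPTION", "LONGCHAR"),
        ("SOLUTIONS", "SOLUTION_ID", "KEYVALUE"),
    ],
)
print(line)
```

With these inputs the function reproduces the SOL_DESC example shown earlier in this section.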

Rules for creating database text indexes

Following are the basic rules that must be followed when you create a SearchServer index.

• All SearchServer index names have a maximum of eight characters.

• Table names should be left unqualified or marked with $QUAL.

• There are five valid entries for the type information that makes up the third value in a column identifier:

• LONGCHAR identifies columns that contain text to be indexed. There must always be at least one field of this type in any database text index.

Example: varchar and char fields

• VALUE identifies columns that contain a numeric entry. These columns are stored with the index, but are not made part of it.

Example: integer and long fields

• LITERAL identifies a non-numeric column that is stored with the index, but not made part of it.

Example: date, time, and char fields.

• KEYVALUE fields fit the criteria for VALUE but also are a primary key for the table.

• KEYLITERAL fields fit the criteria for LITERAL but also are a primary key for the table.

• All primary keys for a table must also be declared as a part of the index. Each key should be given the type identifier KEYVALUE or KEYLITERAL.
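The rules above lend themselves to a quick pre-flight check before a build script is run. The following Python sketch is an illustration under stated assumptions, not part of SABuild: the function name `check_index_definition` is hypothetical, the check cannot know a table's real primary keys (it only verifies that at least one key column is declared), and it does not examine table-name qualification.

```python
VALID_TYPES = {"LONGCHAR", "VALUE", "LITERAL", "KEYVALUE", "KEYLITERAL"}


def check_index_definition(index_name, columns):
    """Return a list of rule violations for a text index definition.

    columns is a sequence of (table, column, type) tuples.
    """
    problems = []
    if len(index_name) > 8:
        problems.append("index name is longer than eight characters")
    types = [col_type.upper() for _, _, col_type in columns]
    for col_type in types:
        if col_type not in VALID_TYPES:
            problems.append(f"unknown type {col_type}")
    if "LONGCHAR" not in types:
        problems.append("no LONGCHAR column")
    if not any(t in ("KEYVALUE", "KEYLITERAL") for t in types):
        problems.append("no KEYVALUE or KEYLITERAL column")
    return problems


print(check_index_definition(
    "SOL_DESC",
    [("SOLUTIONS", "DESCRIPTION", "LONGCHAR"),
     ("SOLUTIONS", "SOLUTION_ID", "KEYVALUE")],
))
```

An empty list means that none of the checked rules is violated.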

Deleting and updating text indexes

SABuild scripts can also be used to delete and update database text indexes. Remember that an index cannot be deleted or updated unless it exists.

The syntax for these two operations parallels SA-Script Build functions and is shown below.

#TEXTINDEXDELETE (<Index Name>);
#TEXTINDEXUPDATE (<Index Name>);

A Document Retrieval Function Reference

This chapter covers the following topics:

• SQLDBTextIndexCreate

• SQLDBTextIndexDelete

• SQLDBTextIndexUpdate

• SQLDBTextIndexUpdateAll

SQLDBTextIndexCreate

Syntax

FUNCTION SQLDBTextIndexCreate(IndexName : STRING, Fields : LIST OF IndexFieldRec) : INTEGER;

Argument Notes

The IndexName must be eight characters or fewer.

The IndexFieldRec type is declared as follows:

IndexFieldRec IS RECORD
  TableName : STRING;  -- Name of the table to be indexed
  FieldName : STRING;  -- Name of the field to be indexed
  Flags : INTEGER;     -- Combination of KEY, VALUE, LITERAL, and LONGCHAR
END;

This record is defined in TEXTRET.KB.

The definitions for each of the flags in IndexFieldRec are as follows:

• SAI_DBTRNDX_KEY indicates that the column is a key field in the database table. At least one IndexFieldRec passed to SQLDBTextIndexCreate must have this flag set.

• SAI_DBTRNDX_VALUE indicates that the column contains a numerical value.

• SAI_DBTRNDX_LITERAL indicates that the column contains a literal value (usually a string).

• SAI_DBTRNDX_LONGCHAR indicates that the column contains a character string (usually a long text field). At least one IndexFieldRec passed to SQLDBTextIndexCreate must have this flag set.

Important: Database text indexes must contain every primary key field in the table that is being indexed as well as at least one field of type LONGCHAR.
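The flags are combined with a bitwise OR (BitOr in SA-Script), so the two "at least one" requirements can be checked with a bitwise AND. The following Python sketch shows the idea only. The bit values are placeholders chosen for the example: the real SAI_DBTRNDX_* constants are defined in TEXTRET.KB and may differ. The class and function names are hypothetical.

```python
from enum import IntFlag


class IndexFlag(IntFlag):
    # Placeholder bit values for illustration; the real SAI_DBTRNDX_*
    # constants are defined in TEXTRET.KB and may differ.
    KEY = 1
    VALUE = 2
    LITERAL = 4
    LONGCHAR = 8


def has_required_flags(fields):
    """Return True if some field has KEY set and some field has LONGCHAR set.

    fields is a list of (table, field, flags) tuples.
    """
    has_key = any(flags & IndexFlag.KEY for _, _, flags in fields)
    has_text = any(flags & IndexFlag.LONGCHAR for _, _, flags in fields)
    return has_key and has_text


fields = [
    ("solutions", "solution_id", IndexFlag.KEY | IndexFlag.VALUE),
    ("solutions", "description", IndexFlag.LONGCHAR),
]
print(has_required_flags(fields))
```

The check does not verify that every primary key of the table is present, because that information lives in the database, not in the field list.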

Notes

Based on the data it receives, SQLDBTextIndexCreate creates an index with the following characteristics:

• All fields are indexed and are able to be referenced as part of the text column of the index table.

• Each indexed field is in a separate SearchServer Zone within the text column. The name of the Zone matches the name of the column in the database table. (To find out more about Zones, refer to the SearchSQL Reference guide and the Data Preparation and Administration guide.)

• All Key fields are stored in their own column in the index table.

For example:

VARIABLES
  NewIndex : LIST OF IndexFieldRec;
  Entry : IndexFieldRec;
ACTIONS
  Entry.TableName := 'EQ_Defects';
  Entry.FieldName := 'Defect_Title';
  Entry.Flags := SAI_DBTRNDX_LITERAL;
  ListInsert(NewIndex, Entry);
  Entry.TableName := 'EQ_Defects';
  Entry.FieldName := 'Defect_Desc_Text';
  Entry.Flags := SAI_DBTRNDX_LONGCHAR;
  ListInsert(NewIndex, Entry);
  Entry.TableName := 'EQ_Defects';
  Entry.FieldName := 'Defect_Steps_Text';
  Entry.Flags := SAI_DBTRNDX_LONGCHAR;
  ListInsert(NewIndex, Entry);
  Entry.TableName := 'EQ_Defects';
  Entry.FieldName := 'Defect_Title';
  Entry.Flags := BitOr(SAI_DBTRNDX_KEY, SAI_DBTRNDX_VALUE);
  ListInsert(NewIndex, Entry);
  SQLDBTextIndexCreate('DefNDX', NewIndex);
END;

SQLDBTextIndexDelete

Syntax

FUNCTION SQLDBTextIndexDelete(IndexName : STRING) : INTEGER;

Argument notes

IndexName specifies the name of the database text index to be deleted.

Notes

SQLDBTextIndexDelete finds the specified index and drops it. For example:

SQLDBTextIndexDelete('DefNDX');

Return Codes   Description
1              Success
(other)        Error Code

SQLDBTextIndexUpdate

Syntax

FUNCTION SQLDBTextIndexUpdate(TableName : STRING, Method : INTEGER) : INTEGER;

Argument Notes

TableName specifies the name of the index that will be updated.

Method specifies whether the index will be updated or rebuilt from scratch. This variable can have one of two values:

• SAI_NDX_UPDATE is the default. If the index already exists, it will be updated to account for changes in the data. If the index does not exist yet (that is, it is merely defined), it will be created at this point.

• SAI_NDX_REBUILD builds the index from scratch whether it exists or not.

Notes

SQLDBTextIndexUpdate finds the specified index and performs the action specified in Method. For example:

SQLDBTextIndexUpdate('DefNDX', SAI_NDX_REBUILD);

Return Codes   Description
1              Success
(other)        Error Code

SQLDBTextIndexUpdateAll

Syntax

FUNCTION SQLDBTextIndexUpdateAll(Method : INTEGER) : INTEGER;

Argument notes

Method specifies whether indexes are updated or rebuilt from scratch. This variable can have one of two values:

• SAI_NDX_UPDATE is the default. If indexes already exist, they are updated to account for changes in the data. If indexes do not exist yet (that is, are merely defined), they are created at this point.

• SAI_NDX_REBUILD builds indexes from scratch whether they exist or not.

Notes

SQLDBTextIndexUpdateAll updates every database text index that exists on the current data source using the method defined by Method.

Example:

SQLDBTextIndexUpdateAll(SAI_NDX_REBUILD);

Return Codes   Description
1              Success
(other)        Error Code

B Messages and Error Codes

This appendix is reprinted with permission from Fulcrum Technologies, Inc., and contains the SearchServer 3.5 Messages and Error Codes Manual.

Preface

This preface provides:

• a description of the intended audience

• a synopsis of each chapter

• a summary of the text conventions used in this manual

• an abstract of the other documents in the SearchServer documentation set

About this Manual

This manual is intended for system administrators and developers and describes the error messages and SQLSTATES for the SearchServer software. The information in this manual is organized into two chapters:

Chapter 1, "Error Messages and SQLSTATES," provides a complete description of all error messages and SQLSTATES.

Chapter 2, "SearchServer Utility Messages," describes the messages that can be produced by the SearchServer utility programs.

Text Conventions

This manual uses the following conventions:

Convention: What it is Used For

Case Sensitivity: Filenames and directory names are shown in UPPERCASE letters; however, they can be entered in lowercase letters if this is a requirement in your environment.

Initial Capitals: Initial capitals are used in Windows application program names. For example:

SearchDoc

UPPERCASE Letters: Uppercase letters are used to represent statement names, keywords, table names, column identifiers, environment variables, mnemonic symbols, data types, filenames, and directory names. For example:

SELECT, ALL, STDOCS, FT_CID, FTNPATH, SQL_SUCCESS_WITH_INFO, SMALLINT, FULTEXT.MSG, BIN

bold: Bold letters are used to represent utility program names and function names. For example:

ftmload, SQLColAttributes

[ ]: Square brackets ([ ]) indicate that the elements of the syntax between them are optional. In the following example, the WHERE clause is optional.

DELETE FROM <table name> [WHERE <search condition>]

< >: Angle brackets (< >) represent an element of the syntax you must substitute with a specific value. In the following example, you would supply the name of a schema.

CREATE SCHEMA [REPLACE] <schema name>

{ }: Curly braces ({ }) represent groups of elements in the syntax. For example:

CREATE TABLE <table name> (<column definition>[{, <column definition>}...])

|: An OR bar ( | ) indicates a mutually exclusive entry. You can enter one of the options shown on either side of the bar, but not both. For example:

<column name> {<data type> | <domain name>}

...: An ellipsis (...) indicates that an element of the syntax can be repeated. For example:

zone list ::= <zone number> [{,<zone number>}...]

Related SearchServer Documentation

SearchServer includes a comprehensive documentation set that provides the information you'll need to use SearchServer. If you also purchased a Fulcrum SearchServer Software Developer's Kit (SDK) or a Fulcrum SearchBuilder product, your documentation set will include manuals written for your particular development environment (for example, a SearchBuilder for Visual Basic Developer's Guide).

What's New in SearchServer 3.0: Describes what's new and changed in SearchServer 3.0 and tells you where to look for more information. It provides a description of enhancements to the SearchSQL language statements as well as a description of the enhancements to the SearchServer API functions.

Introduction to SearchServer: Provides a high-level introduction to the capabilities of SearchServer. It introduces the SearchServer concepts and describes the process required to embed text-retrieval in an application.

SearchServer Getting Started (platform specific): Provides installation and configuration instructions and all platform-specific information (including limitations).

SearchSQL Reference: Provides the complete definition (syntax and semantics) of the SearchSQL language. It also contains complete information about searching tables and system information tables.

SearchServer Messages and Error Codes: Provides quick and easy reference to the messages and error codes returned by SearchServer and the SearchServer utility programs.

SearchServer Database Integration: Describes how to use database text readers that Fulcrum provides and explains how you can modify the template code. It also provides a guide to application-level integration of text and structured data.

Fulcrum Customer Services

We've got some of the most knowledgeable experts in text-retrieval: experts in application design and development, database integration and systems engineering. Fulcrum offers you a wide range of choices to help you leverage the value of SearchServer: by analyzing your requirements and helping to design the application, by transferring knowledge to your developers through Fulcrum courses and seminars, by supporting Fulcrum products through our customer support team, and by offering expert consulting.

Customer Support

If you have a question about SearchServer, first look in the printed version of the documentation, or consult the electronic version of the documentation (using SearchDoc) or online help. You can also find late-breaking updates and technical information about SearchServer by double-clicking the Readme icon in the Fulcrum program group or folder.

If you cannot find the answer, contact Fulcrum's Customer Support Team. Our technical support staff use Fulcrum's own text-retrieval software for fast and responsive phone support. Every support person has instant access to all of Fulcrum's support tools, including a history of known problems, on-line design notes and product documentation, technical bulletins, and product source code.

Fulcrum allows you to choose the method of contact that best meets your needs, ranging from calling us directly to sending a request electronically.

Calling Directly

Fulcrum provides telephone support to registered licensees of SearchBuilder and SearchServer Software Developer's Kits (SDKs) who have up-to-date support agreements. For technical support, call:

• 1-800-209-4357 (for support within North America)

• 1-613-238-7068 (for support within Ottawa and outside of North America)

Electronic Services

When sending your request electronically, use the electronic version of the Case Report Form (CASE.TXT) and send it to:

[email protected]

Fax Services

When sending your request by fax, use the Case Report Form located at the back of this manual and dial:

• 1-613-238-7695

Product Training and Consultation

To quickly bring your developers up to speed on SearchServer, we offer hands-on, interactive education courses, featuring real-world examples. You can also benefit from Fulcrum's expertise through workshops on specialized application areas such as database integration and text reader creation. Courses are available at Fulcrum training locations, or on-site at your offices to help maximize the use of Fulcrum tools within your own environment. Our lab at corporate headquarters in Ottawa, Canada, is also available for your development team, complete with application experts as required.

Consulting Services

Fulcrum's professional services consultants have been designing and creating powerful integrated text-retrieval solutions for years. They can help guide you to success at each stage of the development process:

• Evaluation and prototyping

• Requirements analysis and design

• Code review and walkthroughs

• Building application components such as text readers (filters), high-level APIs, user interfaces and system administration utilities

• Integrating Fulcrum products with other technologies (database, imaging)

• Benchmarking

Chapter 1: Error Messages and SQLSTATES

This chapter describes the standard error messages and SQLSTATES returned by the SearchServer Application Program Interface (API), SearchSQL, and the SearchServer environment.

Overview

This chapter provides a numerically arranged listing of the possible error messages and SQLSTATES for the SearchServer API, SearchSQL, and the SearchServer environment. An explanation is provided for each error message, together with a possible solution where applicable. Depending on the SearchServer API function or SearchSQL statement that caused the error, more than one cause or solution may be supplied.

Message Prefixes

Message prefixes that appear before each error message are displayed in all SearchServer applications. For each error message that is generated, the vendor name and product name are enclosed in square brackets [ ]. For example:

[Fulcrum][SearchServer]Syntax Error in SQL dynamic statement

Depending on the type of driver you’re using, the vendor name and product name may be different.
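An application that logs or displays these messages may want to separate the prefixes from the message text. The following Python sketch is one way to do that, offered as an illustration only: the function name `split_message` is hypothetical, and it assumes exactly two bracketed prefixes at the start of the message, as in the example above. It does not assume particular vendor or product names, since these depend on the driver.

```python
import re

# Two bracketed prefixes followed by the message text.
PREFIX_RE = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\](.*)$")


def split_message(text):
    """Return (vendor, product, message).

    vendor and product are None when the text carries no prefixes.
    """
    match = PREFIX_RE.match(text)
    if match is None:
        return None, None, text
    vendor, product, message = match.groups()
    return vendor, product, message.strip()


print(split_message("[Fulcrum][SearchServer]Syntax Error in SQL dynamic statement"))
```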

List of Error Messages and SQLSTATES

01004 Data truncated

This error message is returned by the following API calls:

SQLColAttributes, SQLDescribeCol: The column name returned was truncated because the output buffer was too small. Increase the size of the output buffer.

SQLExtendedFetch, SQLFetch: The data returned for one or more columns was truncated because one or more bound output buffers were too small. Increase the size of the output buffer.

SQLGetCursorName: The cursor name returned was truncated because the output buffer was too small. Increase the size of the output buffer.

SQLGetData: Not all the data from the specified column could be retrieved by a single call to SQLGetData. Re-execute SQLGetData to retrieve more of the data until SQL_SUCCESS is returned, or increase the size of the output buffer and use this new size in a call to SQLGetData.

SQLGetInfo: The information was truncated because the output buffer was too small. Increase the size of the output buffer.
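The SQLGetData case above describes a loop: keep calling until SQL_SUCCESS is returned, collecting one buffer of data per call. The following Python sketch simulates that loop. It is an illustration, not driver code: `FakeColumn` and `read_all` are hypothetical stand-ins, and only the return-code values (0 for SQL_SUCCESS, 1 for SQL_SUCCESS_WITH_INFO) are taken from the ODBC headers.

```python
SQL_SUCCESS = 0
SQL_SUCCESS_WITH_INFO = 1  # accompanies SQLSTATE 01004 when data is truncated


class FakeColumn:
    """Stand-in for a column read through SQLGetData; not a real driver binding."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def get_data(self, buffer_size):
        """Return (return_code, chunk), handing out at most buffer_size characters."""
        chunk = self._data[self._pos:self._pos + buffer_size]
        self._pos += len(chunk)
        more = self._pos < len(self._data)
        return (SQL_SUCCESS_WITH_INFO if more else SQL_SUCCESS), chunk


def read_all(column, buffer_size):
    """Call get_data repeatedly until SQL_SUCCESS is returned."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    pieces = []
    while True:
        rc, chunk = column.get_data(buffer_size)
        pieces.append(chunk)
        if rc == SQL_SUCCESS:
            return "".join(pieces)


print(read_all(FakeColumn("hardware maintenance"), 8))
```

The alternative named above, enlarging the output buffer, corresponds to calling `read_all` with a larger `buffer_size`, which reduces the number of calls.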

01S00 Invalid connection string attribute

An invalid attribute keyword was specified in the SQLDriverConnect connection string. SearchServer ignored the invalid keyword and was able to connect.

01S01 Error in row

An error occurred while fetching one or more rows of a rowset using SQLExtendedFetch. Review the elements of the RGFRowsStatus array to determine which rows were successfully retrieved. This error is also reported by SQLSetPos when the SQL_REFRESH option fails to refresh the entire rowset. This function call returns SQL_SUCCESS_WITH_INFO.

01S02 Option value changed

SearchServer does not support the value specified for the SQLSetConnectOption or SQLSetStmtOption option, so it substituted the closest available equivalent. This function call returns SQL_SUCCESS_WITH_INFO.

02000 No data found

The statement was a SELECT statement, but no rows were matched.

07008 Wrong number of bound columns. Fix in SQLBindCol

When retrieving data from the SEARCH_TERMS table, all columns in the select list must be bound. Use SQLBindCol to bind any columns that haven’t been bound, and repeat the call to SQLFetch.

08001 Unable to connect to data source

The Driver Manager was unable to connect to the data source.

08002 Connection in use

The connection handle specified in a call to SQLConnect or SQLDriverConnect had already been used to establish a connection with a data source, and the connection was still open. Close the open connection and try again, or use another connection handle.

08003 Connection not open

A connection has not been opened for the specified connection handle. Open a connection using the connection handle.

08004 Establishment of connection rejected

The data source name is invalid. Enter a valid data source name.

22000 Data exception

You’ve entered a character string in a column defined with a numeric data type, or the character string entered is too long. Verify the data type and length of the column.

22003 Numeric value out of range

This SQLSTATE was returned for one of the following reasons:

• You’ve inserted or retrieved a numeric value that caused the number to be truncated.

• You’ve entered a value that exceeds the maximum value for the column’s data type. For instance, if the data type for the column is SMALLINT, the value must be between -32,768 and 32,767. If the data type for the column is INTEGER, the value must not exceed 1,073,741,823.

• You’ve specified a buffer length that is too small to receive data from a numeric column.
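One way to avoid SQLSTATE 22003 is to check a value against the column limits before sending it. The following Python sketch uses only the limits quoted in the 22003 entry; the function name `fits_column` is hypothetical, and because only an upper limit is stated for INTEGER, no lower limit is checked for that type.

```python
# Limits as quoted in the 22003 entry; they describe SearchServer
# columns, not Python or ODBC types in general.
SMALLINT_RANGE = (-32768, 32767)
INTEGER_MAX = 1073741823


def fits_column(value, data_type):
    """Return True if value is within the limit quoted for data_type."""
    if data_type == "SMALLINT":
        low, high = SMALLINT_RANGE
        return low <= value <= high
    if data_type == "INTEGER":
        # Only an upper limit is stated in the 22003 entry.
        return value <= INTEGER_MAX
    raise ValueError(f"no limit listed for {data_type}")


print(fits_column(40000, "SMALLINT"))
print(fits_column(40000, "INTEGER"))
```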

24000 Invalid cursor state

This SQLSTATE was returned for one of the following reasons:

• Retrieval was attempted, and the hStmt was in an executed state, but no result set was associated with the statement.

• SQLGetData or SQLSetColPosition was called, and a cursor was open on the hStmt, but neither SQLFetch nor SQLExtendedFetch had been called.

• SQLSetPos was called, and a cursor was open on the hStmt, but SQLExtendedFetch had not been called, or the cursor was positioned before the result set, or after the end of the result set.

• The SQLSetPos option argument was set to SQL_REFRESH. The cursor was positioned before the result set, or after the end of the result set.

• The SearchSQL statement associated with the statement handle has not finished.

• The statement handle could not be reused because there was an active SELECT statement.

• You’ve specified a back reference predicate that references a SELECT statement that either doesn’t have a WHERE clause, or hasn’t successfully completed its operation. Verify that the return status of the SELECT statement being referenced is correct.

• A positioned UPDATE or DELETE statement was referenced to a working table in which the cursor was not positioned on any row. You must position the cursor and lock the row by calling the SQLExtendedFetch or SQLSetPos API functions first. You can position the cursor by calling SQLFetch, but this restricts you to strict sequential access to the rows in the working table.

34000 Invalid cursor name

The cursor referenced by the SearchSQL statement was not open. Reference a valid cursor name opened by a statement handle. The SQLSetCursorName API function returns this SQLSTATE if the specified cursor name was in the wrong format.

37000 Syntax error in SQL dynamic statement near <syntax error location>

A syntax error has occurred while executing a SearchSQL statement. This message indicates the location of the syntax error in the statement. The last word of the message is either a word, clause, or predicate taken from the statement. If the indicator is a variable name, the error is inside that clause or predicate in the statement. Otherwise, the error is located immediately after the quoted text. This error is also reported for certain cases of mismatched data types (for example, specifying a character literal in a WHERE clause where a date literal is expected).

3C000 Duplicate cursor name

The SQLSetCursorName API function specified a cursor name that already exists for another statement.

42000 Access violation

Access to a table, view, external file, or directory has been denied. Check the file and directory permissions for the access path to the data source. The file path can be obtained from the FULSEARCH data source parameter. Update access is always denied if you’re attempting to update a view.

70100 Operation aborted

SearchServer was unable to process the cancel request. SQLCancel cancels a function that has been called asynchronously and is not yet complete (that is, has returned SQL_STILL_EXECUTING). When called after a function has completed, SQLCancel is equivalent to SQLFreeStmt (with the SQL_CLOSE option). This SQLSTATE is returned only if the cancel request cannot be serviced.

80013 Memory management failure

SearchServer internal structures have become corrupted. Contact the Fulcrum Customer Support Team. It may be necessary for you to supply tracer scripts of your application and copies of your tables.

80900 Not currently supported

This SQLSTATE is returned if you’ve attempted to use a feature not currently supported by SearchServer.

80901 Invalid use of system information table

You’ve specified a system table in a statement other than the SELECT statement, or in an incorrect SELECT statement format. See Chapter 5, “System Information Tables,” in Fulcrum SearchServer SearchSQL Reference.

80902 Error accessing system information table

The operating system has detected an error while trying to access a system information table. Contact the Fulcrum Customer Support Team.

80904 Internal software error native error code - Please contact Fulcrum Customer Support Team An error in the SearchServer software has occurred. The native error code identifies where the error occurred. Frequently, the native error code consists of four sets of numbers (separated by periods).

If you’ve installed the SearchServer Customization Tools, you can find the value of the third set of numbers by looking in the FTERROR.H file in the INCLUDE directory. To determine the meaning of the other numbers, contact the Fulcrum Customer Support Team.

Internal software errors can result from a lack of system resources or memory. If you’re using Microsoft Windows, you can check the level of resources being used by choosing About from the Program Manager Help menu.

To free up memory, close any active programs that are not required, and then retry the operation. In some cases, it might be necessary to restart your application. If the problem persists, contact the Fulcrum Customer Support Team.
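
As a rough illustration of the native error code layout described for SQLSTATE 80904, the following Python sketch splits a code into its period-separated sets and returns the third one, which is the value you would look up in FTERROR.H. The function name and the sample code 12.34.567.8 are made up for this example. Because the guide says only that the code "frequently" has four sets, the function returns None for anything else.

```python
from typing import Optional


def third_set(native_code: str) -> Optional[int]:
    """Return the third period-separated number of a native error code.

    Returns None when the code is not made up of exactly four numeric
    sets, because that layout is common but not guaranteed.
    """
    parts = native_code.strip().split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        return None
    return int(parts[2])


# Hypothetical native error code, for illustration only.
print(third_set("12.34.567.8"))
```

The meaning of the other three sets is not documented here, so the sketch deliberately ignores them.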

80906 Too many tables referenced The number of tables referenced in the FROM clause of the SELECT statement exceeds the maximum limit. This limit is environment specific. For more information about environment specifications, see Fulcrum SearchServer Getting Started for your platform.

80907 Error writing temporary file The operating system has detected an error while trying to write to a temporary file. The location of the temporary files is specified by FULTEMP and the SWK entry in the configuration file. Be sure that these directories are valid and writable.

80908 MAX_SEARCH_ROWS reached The SELECT statement you entered has returned more rows than you specified as the maximum. If the SELECT statement included an ORDER BY clause, the most relevant rows are returned.

80910 System configuration file is inaccessible or corrupt The SearchServer file fultext.ftc is not available because the file is missing or corrupted, or you have insufficient permissions to access it.

80911 Invalid number of values in insert list The INSERT statement contained too many or too few insert values for the column list. Verify the number of values and columns, and re-execute the statement.

80912 Error removing files The operating system has detected an error while trying to remove files. This error occurs as a result of a DROP TABLE SearchSQL statement.

80913 Table is protected The DROP TABLE or VALIDATE INDEX statement you’re executing has failed because a previously executed VALIDATE INDEX statement may have failed, or the table was protected using the PROTECT TABLE SearchSQL statement. Use the UNPROTECT TABLE statement to unprotect the table, or re-execute a VALIDATE INDEX statement that specifies the UNPROTECT parameter.

80914 Invalid use of view You’ve tried to specify a SearchServer view in a CREATE SCHEMA REPLACE, DELETE, DROP TABLE, INSERT, UPDATE, VALIDATE INDEX, ALTER TABLE, or PROTECT/UNPROTECT TABLE SearchSQL statement. Component tables must be updated individually.

80915 Invalid statement parameter When executing a CREATE TEXT_VECTOR statement or a SELECT statement having an is_about predicate in a WHERE clause, the statement either contained an empty character string literal, or the file specified could not be accessed. Verify the character string literal (or the filename and permissions of the file), and re-execute the statement.

When executing a CREATE SCHEMA statement, one of the table parameters is invalid. The filenames specified in the table parameters must not exceed 260 characters. Verify the length of the filenames, and re-execute the statement.
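
If you build CREATE SCHEMA statements in a program, you can check the 260-character filename limit before sending the statement. The following Python sketch is one possible way to do that; the function name, the dictionary layout, and the sample paths are assumptions made for this example and are not part of SearchServer.

```python
MAX_FILENAME_LENGTH = 260  # limit stated for SQLSTATE 80915


def overlong_filenames(table_parameters):
    """Return the names of table parameters whose filename is too long.

    table_parameters maps a parameter name (for example STOPFILE) to the
    filename you intend to pass in the CREATE SCHEMA statement.
    """
    return [
        name
        for name, filename in table_parameters.items()
        if len(filename) > MAX_FILENAME_LENGTH
    ]


# Hypothetical parameter values, for illustration only.
parameters = {"STOPFILE": "C:\\DATA\\" + "x" * 300, "BASEPATH": "C:\\DOCS"}
print(overlong_filenames(parameters))
```

The check covers only the length rule; it does not verify that the files exist or are accessible.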

80916 File locking error The table has been locked for exclusive use by another application or utility (for example, ftcin). Wait until the other application has completed its operation, and re-execute the current statement.

80917 Open file limit reached The maximum number of open files has been reached. This number is specific to your computer’s environment. Close files in your own application, drop statements, or contact your System Administrator.

80919 External document locked When executing an INSERT statement, the external file characteristics weren’t available (for example, last modified date). It is possible that another user has the file locked in a different application. Verify the permissions of the file and its directory, and re-execute the statement.

80920 External document not found You’ve specified an external file that could not be found. Verify the filename, path, and permissions of the file, and re-execute the statement.

80921 Invalid date value You’ve entered a character string in a column defined with a DATE data type, or the value you entered isn’t a valid date. Verify the value, and re-execute the statement.

80922 Out of disk space Contact your System Administrator.

80923 Search too general The WHERE clause in your SELECT statement contained or has generated too many search terms. This condition is usually the result of a wildcard in the search term. Re-execute the statement with a more specific WHERE clause.

80924 Text vector too large The character string literal in your CREATE TEXT_VECTOR statement or is_about predicate has exceeded memory capacity. Move the character string literal to an operating system file, and re-execute the statement using the FILE option.

80925 Invalid server name This SQLSTATE was returned for one of the following reasons:

•You’ve entered a SearchSQL statement specifying an invalid server name in the table qualifier syntax. Verify the server name, or exclude or modify the table qualifier syntax.

•An unknown server name was specified when searching the TABLE_QUALIFIER column of a system information table. This SQLSTATE is not reported if any tables from valid servers were reported.

80926 Column isn’t VALUE indexed You’ve specified a column that isn’t VALUE indexed in a SELECT statement with a comparison predicate. Verify the index mode of the column, or use a like predicate in the statement.

80927 Column isn’t indexed You specified a column defined with a NONE index mode in a WHERE clause.

80928 Thesaurus function not supported for value or literal indexed columns You’ve attempted to use the THESAURUS function for value or literal indexed columns. The THESAURUS function can be used only for columns and zones defined with a NORMAL index mode.

80929 Invalid FT_CID value You’ve attempted to retrieve data from a row that doesn’t exist in the table. Make sure the SELECT statement specifies a valid FT_CID value in the WHERE clause. In the case of a CREATE TEXT_VECTOR statement, make sure the row list contains only valid FT_CID values.

80930 Back reference created with different table list The list of tables searched by a SELECT statement containing a back reference predicate isn’t identical to, and in the same order as, the list of tables searched by the SELECT statement that the back reference predicate refers to. Verify the list of tables in both SELECT statements, and re-execute the statement.

80931 Invalid zone name You’ve referenced a zone in the statement that doesn’t exist. Verify the name of the zone through the ZONES system table, and re-execute the statement.

80932 Invalid zone number You’ve specified a CREATE SCHEMA statement with a zone number in the CREATE DOMAIN clause, or a field number in a CREATE TABLE clause, that was outside the valid range of 33 to 64,010. The number also wasn’t one of the specialized field numbers associated with reserved columns. Verify the field and zone numbers, and re-execute the statement.
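
The range rule for SQLSTATE 80932 can be checked before a schema is submitted. The following Python sketch assumes that the range 33 to 64,010 is inclusive at both ends, which the text implies but does not state outright. The function name is made up for this example, and the specialized field numbers of reserved columns are not listed in this guide, so the sketch does not account for them.

```python
MIN_NUMBER = 33      # lower end of the valid range (assumed inclusive)
MAX_NUMBER = 64010   # upper end of the valid range (assumed inclusive)


def in_valid_range(number):
    """Return True if a zone or field number lies within 33 to 64,010."""
    return MIN_NUMBER <= number <= MAX_NUMBER


# Sample values, for illustration only.
for candidate in (32, 33, 64010, 64011):
    print(candidate, in_valid_range(candidate))
```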

80933 Invalid use of reserved columns This SQLSTATE was returned for one of the following reasons:

•You’ve inserted a new row with values for FT_DATE, and one or both of FT_SFNAME or FT_FLIST.

•You’ve updated an existing row with a single UPDATE statement in which values are specified for FT_DATE, and one or both of FT_SFNAME or FT_FLIST.

•You’ve updated the value of FT_DATE in an existing row that already contains values for one or both of FT_SFNAME or FT_FLIST.

•You’ve inserted a value in one of the following reserved columns: FT_CID, FT_DFLAG, FT_MTIME, FT_TEXT, FT_TIMESTAMP, FT_ROW_TYPE, FT_ROW_STATE, FT_TEXT_STATUS, FT_FORMAT, or FT_ORIGINAL_SIZE. SearchServer inserts values into these reserved columns automatically.

•You’ve specified the FT_CID reserved column in a between predicate.

•You’ve attempted to change the indexing mode, data type, length, field number, or the name of a reserved column. For more information about which reserved columns can be changed, see Chapter 3, “Structuring the Data,” in Fulcrum SearchServer Data Preparation and Administration.
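
Some of the reasons listed for SQLSTATE 80933 can be detected from the column list of an INSERT statement alone. The following Python sketch checks two of them: values supplied for automatically populated reserved columns, and FT_DATE combined with FT_SFNAME or FT_FLIST. The function name and the sample column names are assumptions for this example, as is the choice to compare column names without regard to case.

```python
# Reserved columns that SearchServer populates automatically (from the list above).
AUTO_POPULATED = {
    "FT_CID", "FT_DFLAG", "FT_MTIME", "FT_TEXT", "FT_TIMESTAMP",
    "FT_ROW_TYPE", "FT_ROW_STATE", "FT_TEXT_STATUS", "FT_FORMAT",
    "FT_ORIGINAL_SIZE",
}


def reserved_column_problems(columns):
    """Return a list of problems found in an INSERT column list."""
    # Assumption: column names are compared case-insensitively.
    names = {column.upper() for column in columns}
    problems = []
    automatic = sorted(names & AUTO_POPULATED)
    if automatic:
        problems.append(
            "values supplied for automatically populated columns: "
            + ", ".join(automatic)
        )
    if "FT_DATE" in names and names & {"FT_SFNAME", "FT_FLIST"}:
        problems.append("FT_DATE combined with FT_SFNAME or FT_FLIST")
    return problems


# Hypothetical column list, for illustration only.
print(reserved_column_problems(["TITLE", "FT_DATE", "FT_SFNAME"]))
```

The remaining reasons (for example, updating FT_DATE in a row that already has an external file name) depend on data already in the table and cannot be checked this way.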

80934 Invalid domain name You’ve referenced a domain in a statement that doesn’t exist. Verify the name of the domain, and re-execute the statement.

80935 Statement too complex You’ve executed a statement that contains too many options, parameters, or clauses.

80937 Table already exists The table named in the CREATE SCHEMA or CREATE TABLE statement or clause already exists. If you want to replace the schema for this table, use the REPLACE option of the CREATE SCHEMA statement.

80938 Unsupported server You’ve connected to a remote SearchServer running an older version of the software which does not support the retrieval mode in use. Contact your System Administrator.

80939 No indexed rows The SELECT statement could not be executed because the table was empty or had never been indexed. SQLSTATE 02000 is also returned.

80940 Buffer number is invalid The relative or absolute positioning value passed to SQLSetColPosition (deprecated function) was out of range. Specify a position within the column text.

80941 End of data You’ve requested to position to the next row using SQLPosition, but the cursor is already positioned on the last row (or the working table is empty). The cursor position was unchanged.

80942 At first row You’ve requested to position to the previous row using SQLPosition, but the cursor is already positioned on or before the first row (or the working table is empty). The cursor position was unchanged.

80943 Row number is invalid The row specified by the (deprecated) absolute or relative SQLPosition positioning option was out of range of the working table. The cursor position was unchanged.

80944 A required search term was not found in the index The SELECT statement has a WHERE clause with a query term that doesn’t exist in the table index. SQLSTATE 02000 is also returned.

80945 Record locking not supported in this NFS environment The file system where the table files reside doesn’t provide the services necessary to implement row locking. Transfer the files to a file system where record locking is supported, or change the mode of the table to NOLOCKING.

80946 Table file is missing This SQLSTATE was returned for one of the following reasons:

•The table referenced doesn’t have a complete set of table management files. The missing files can be recovered from system backups.

•The indexing request couldn’t locate a necessary file (for example, the stopfile). If this SQLSTATE is encountered as a result of a VALIDATE INDEX statement execution, the indexing log for the table being indexed may contain additional information.

For more information about table management files, see Fulcrum SearchServer Data Preparation and Administration.

80947 Indexing rules have changed, use ABANDON with VALIDATE INDEX Since the table was last indexed, one of the following changes may have affected the indexing rules:

•The file containing the list of stop words for this table has been updated.

•The configuration file has been changed to add OPT:a.

This invalidates all current indexing information. The table must be completely reindexed.

80948 Zone number used in multiple columns A single zone number has been declared as part of two or more columns. The zone numbers used in each column must be distinct.

80950 Duplicate zone names You’ve used the same name for more than one zone in the same schema.

80951 Duplicate column names You’ve used the same name for two columns in the same schema.

80952 Duplicate domain names You’ve used the same name for two domains in the same schema.

80953 Invalid text vector name The text vector name used in the SELECT statement isn’t the name of a valid text vector. You must have previously created this text vector using the CREATE TEXT_VECTOR statement.

80954 Column name and zone name identical You’ve used the same name for a column and a zone in the same table.

80955 Zone defined but not used The schema included a zone definition that was never used in a domain. Remove the unnecessary zone definition from the CREATE SCHEMA statement.

80956 Domain defined but not used The schema included a domain definition that was never used in a column. Remove the unnecessary domain definition from the CREATE SCHEMA statement.

80957 Zone used in more than one domain definition The schema included two or more domain definitions that referred to the same zone. Make sure that each zone name appears in only one domain definition.
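
Several of the schema rules above (SQLSTATEs 80954 through 80957) concern how zones, domains, and columns refer to one another, so they can be checked on a simple in-memory description of a schema before the CREATE SCHEMA statement is written. The following Python sketch is one way to do that. The data layout (a list of zone names, a dictionary of domains, a dictionary of columns) and all sample names are assumptions made for this example; SearchServer itself works from the CREATE SCHEMA statement.

```python
def check_schema(zones, domains, columns):
    """Return (SQLSTATE, name) pairs for schema rule violations.

    zones   -- list of zone names defined in the schema
    domains -- dict mapping a domain name to the list of zone names it uses
    columns -- dict mapping a column name to the domain name it uses
    """
    findings = []
    for zone in zones:
        # Count the domain definitions that refer to this zone.
        users = sum(1 for zone_list in domains.values() if zone in zone_list)
        if users == 0:
            findings.append(("80955", zone))  # zone defined but not used
        elif users > 1:
            findings.append(("80957", zone))  # zone used in several domains
    used_domains = set(columns.values())
    for domain in domains:
        if domain not in used_domains:
            findings.append(("80956", domain))  # domain defined but not used
    for column in columns:
        if column in zones:
            findings.append(("80954", column))  # column and zone share a name
    return findings


# Hypothetical schema description, for illustration only.
zones = ["title", "body", "notes"]
domains = {"d_text": ["title", "body"], "d_extra": ["body"], "d_unused": []}
columns = {"DOC": "d_text", "EXTRA": "d_extra"}
print(check_schema(zones, domains, columns))
```

Duplicate zone, column, or domain names (SQLSTATEs 80950 through 80952) are not covered, because dictionary keys cannot represent duplicates.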

80958 Domain used in more than one column definition The schema included two or more column definitions that referred to the same domain definition, and the domain definition contains a list of zone names. You can use the same domain in several columns if the domain is used to specify an index mode.

80959 Length of VARCHAR column must be no greater than 32767 The schema included a column or domain definition that specified a VARCHAR column longer than 32,767 characters. Decrease the length of the column to 32,767 characters or less.
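
A program that generates column definitions can guard against SQLSTATE 80959 with a one-line check. The following Python sketch shows the idea; the function name and sample lengths are made up for this example, and only the upper limit from the message is checked.

```python
MAX_VARCHAR_LENGTH = 32767  # limit stated for SQLSTATE 80959


def varchar_length_ok(length):
    """Return True if a VARCHAR length does not exceed 32,767."""
    return length <= MAX_VARCHAR_LENGTH


# Sample lengths, for illustration only.
print(varchar_length_ok(255), varchar_length_ok(40000))
```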

80960 Incorrect definition for renaming of reserved column The schema included a column that renamed a reserved column, and the definition of the renamed column didn’t exactly match the original definition of the reserved column. The definition must be exactly the same, even to the specification of the index mode. Use a domain for some of the reserved columns.

80962 An incomplete set of table files exists An attempt to create a table failed because some of the table management files already existed. These files must be removed before the table can be created. Use the “delete” command provided by your native operating system to remove all files with a .CFG extension.

80963 Syntax error in table type string The SQLTables table type contained a syntax error. The table type string is a comma-separated list of table types. Each table type in the list can be enclosed in single quotation marks. For example, (‘table’, ‘view’) or (table, view), where the parentheses, for the purposes of this example, mark the start and end of the table type list string.
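
To avoid hand-typing the table type string described for SQLSTATE 80963, you can assemble it from a list. The following Python sketch builds either the quoted or the unquoted form. The function name is made up for this example, and the space after each comma simply mirrors the example in this guide; whether SearchServer requires or ignores that space is not stated here.

```python
def table_type_string(table_types, quoted=True):
    """Build a comma-separated table type list for SQLTables.

    Each type is optionally enclosed in single quotation marks. The
    result does not include any surrounding parentheses.
    """
    if quoted:
        return ", ".join("'" + table_type + "'" for table_type in table_types)
    return ", ".join(table_types)


print(table_type_string(["table", "view"]))
print(table_type_string(["table", "view"], quoted=False))
```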

80964 Some working tables may have been invalidated A DROP TABLE statement was initiated or executed while another statement was active on the affected table. The working table for the other statement has been invalidated.

80965 Text vector name already in use You’ve used a text vector name that already refers to a valid text vector. Choose another text vector name, or close the statement handle for the previous CREATE TEXT_VECTOR statement.

80966 Two zones have identical zone number lists You’ve defined two zones in the same schema that include the identical set of zone numbers.

80968 No match codes are present A request to position to the next or prior match code within a column using the deprecated SQLSetColPosition failed. The match codes did not exist, or no match codes are present in the indicated direction.

80969 The column is bound You’ve used the SQLSetColPosition or SQLGetData (for deprecated retrieval modes) API function on a bound column. This SQLSTATE will not be returned to applications using SearchServer’s default retrieval mode.

80972 The data has changed since the index used to perform this search One or more columns in this row have been altered since the last indexing operation. SearchServer has retrieved the modified data, but match codes (if any) won’t be inserted.

80974 Text reader list cannot be set independently of the external text filename You cannot insert or update the text reader list column unless a value for the external text filename is provided at the same time.

80975 The document text reader list value is invalid You’ve attempted to insert or update a row, create a text vector, or retrieve the external document of an existing row. This action failed because a document text reader in the text reader list wasn’t found in the dynamic library table (or it may have been found, but its open function wasn’t present). For more information, see utility messages 407 and 408 in Chapter 2 “SearchServer Utility Messages.”

80976 Cannot assign data for directory and library file rows These special rows are used as data for the indexing operation, and cannot be retrieved by a search. This error is also returned if you associate a value for FT_DATE with a library or directory row. Any values assigned to columns other than FT_SFNAME and FT_FLIST have no effect and cannot be retrieved.

80977 Duplicate column name in insert or update You’ve listed the same column more than once in the list of columns in an INSERT or UPDATE statement.

80979 Cannot initialize SearchServer environment The SearchServer environment could not be initialized. The number of SearchServer applications that can run at the same time depends on the amount of available memory, and there is not enough memory to run another SearchServer application at this time.

80980 This row has been deleted since the search was performed This SQLSTATE was returned for one of the following reasons:

•On retrieval, no data was returned because the row was deleted.

•On UPDATE or DELETE, no UPDATE or DELETE took place because the row had already been deleted.

80984 Row locked This SQLSTATE was returned for one of the following reasons:

•You’ve attempted to fetch data from a row in a table that another user was positioned on in such a way that it hinders simultaneous access. Simultaneous access is not permitted if you or the other user specified a FOR UPDATE clause in the SELECT statement. It is also not permitted if the other user has locked the row by an explicit SQLSetPos request (or has executed a positioned UPDATE statement on this row) and has not yet moved off the row or closed the working table.

•You’ve attempted to execute a positioned UPDATE or DELETE statement that uses a cursor from a SELECT statement in which the FOR UPDATE clause wasn’t specified, and another user is positioned on the row.

80985 Access to this row would compromise stability of another fetch You’ve attempted to fetch data from a row in a working table while already positioned on a row in another working table derived from the same table. Another working table was created by a SELECT statement that specified the FOR UPDATE clause. The requested data could not be retrieved without making the earlier (but still current) row susceptible to change by another user. Cancel the other SELECT statement, or position its cursor off the working table before retrying the operation.

80986 This row has changed since it was last retrieved You’ve tried to execute a positioned DELETE or UPDATE statement on a row that has been modified either by another user, or by another statement in your application after it was fetched. Either create the working table with a SELECT statement that specifies the FOR UPDATE clause, or don’t execute any other SearchSQL statements or refer to any other working table between fetching a row and updating or deleting it.

This error is also reported by SQLGetData if “read consistency” has been violated. This occurs when the information being returned has not been cached and the row had been changed since the program retrieved other columns from the same row.

80988 External file has been deleted The external document has been deleted since the last indexing operation was performed on this table.

80989 Value indexed columns cannot be used in contains predicates You’ve specified a column defined with a VALUE index mode in a contains or like predicate. Use a comparison predicate instead.

80990 Lock lost on current row Returned in conjunction with SQLSTATE 80984, this error indicates that the attempted positioned UPDATE or DELETE statement was prevented because another application gained exclusive control over the row.

80992 Syntax error in stop word file The stop word file wasn’t in the correct format. Check the file’s contents. A duplicate word may exist in the file.

80995 Zones cannot be used with exact numeric or date columns You’ve attempted to assign a zone to a column whose type is SMALLINT, INTEGER, or DATE.

80996 Invalid index mode for exact numeric or date column You’ve defined an exact numeric or date column and attempted to specify an index mode of NORMAL or LITERAL. The index mode for these columns can only be VALUE or NONE.
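
The index mode rule for SQLSTATE 80996 can be expressed as a small lookup. The following Python sketch treats SMALLINT, INTEGER, and DATE (the types named for SQLSTATE 80995) as the exact numeric and date types, which is an assumption; the function name is also made up for this example. For any other data type the function returns True only because this section states no rule for it, not because every mode is known to be valid.

```python
# Types named in this section as exact numeric or date types (assumed complete).
EXACT_NUMERIC_OR_DATE = {"SMALLINT", "INTEGER", "DATE"}
# Index modes permitted for those types, per SQLSTATE 80996.
ALLOWED_MODES = {"VALUE", "NONE"}


def index_mode_allowed(data_type, index_mode):
    """Return False if the index mode would trigger SQLSTATE 80996."""
    if data_type.upper() in EXACT_NUMERIC_OR_DATE:
        return index_mode.upper() in ALLOWED_MODES
    return True  # no rule stated in this section for other data types


# Sample combinations, for illustration only.
print(index_mode_allowed("DATE", "NORMAL"))
print(index_mode_allowed("INTEGER", "VALUE"))
```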

80997 Table currently unsearchable: indexing in progress The table’s index was discarded by another process (executing a VALIDATE INDEX statement that specified the ABANDON parameter) before the search was completed. The search can be retried after the VALIDATE INDEX statement has terminated.

80998 Cannot specify table parameters for existing table You’ve specified table parameters in a CREATE SCHEMA statement when either the REPLACE option is included, or the table specified already exists as a Ful/Text collection.

80999 Unable to access message file (state # number) The SearchServer message file FULTEXT.FTC could not be found. The FULTEXT.FTC file is normally found in the FULTEXT subdirectory of the SearchServer installation directory. The directory path to this file is specified by the FULSEARCH parameter. For more information about how to set this parameter, see Fulcrum SearchServer Getting Started for your platform.

809A0 Duplicate items in select list The select list in the SELECT statement references the same column or function more than once. This error is reported at retrieval time, and only when a deprecated retrieval mode is in effect. Applications using the current version of SearchServer will not receive this error.

809A1 The column attribute cannot be changed until the cursor is re-positioned This error is always jointly returned with SQLSTATE S1010, and is associated with deprecated functionality. A column’s retrieval attributes cannot be changed while the cursor is on a row.

809A2 Table locked for update by this process, search cannot be performed This SQLSTATE is returned when a SELECT statement with no WHERE clause is executed on a table that has a locked row (for example, a row locked by executing a SELECT statement with a FOR UPDATE clause on another statement handle). The previous SELECT statement must be closed before the new search can be performed.

809A4 Syntax error in table configuration file There is a syntax error in the .CFG file associated with the table being referenced. This can happen only if the file has been manually edited (for example, when creating a view). Correct the syntax error and re-execute the statement.

809A5 Limit on number of tables in a view exceeded The .CFG file for this view lists too many component tables. For more information about table management files, see Fulcrum SearchServer Data Preparation and Administration.

809A6 Concurrent execution of two VALIDATE INDEX statements not allowed A single application cannot use the indexing engine concurrently. Only one VALIDATE INDEX statement can be executed at a time.

809A8 Specified stop file is invalid or inaccessible The stop file specified in the STOPFILE table parameter of the CREATE SCHEMA statement doesn’t exist or could not be opened.

809A9 INDEXDIR is invalid or inaccessible The directory specified in the INDEXDIR table parameter of the CREATE SCHEMA statement does not exist or could not be accessed.

809B0 WORKDIR is invalid or inaccessible The directory specified in the WORKDIR table parameter of the CREATE SCHEMA statement does not exist or could not be accessed.

809B1 BASEPATH is invalid or inaccessible The directory specified in the BASEPATH table parameter of the CREATE SCHEMA statement does not exist or could not be accessed.

809B2 Buffer length must be at least 1 greater than blocksize The buffer length specified in an SQLGetData API function call must be at least one greater than the blocksize. The blocksize for a specific statement handle is obtained using the SQLGetColAttribute API function call with the SQL_COLUMN_BLOCKSIZE attribute. The SERVER_INFO information table returns the blocksize used by all statement handles of a connection. Reduce the blocksize, or increase the buffer size. This error applies to deprecated functionality.
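
The buffer rule for SQLSTATE 809B2 is simple arithmetic, sketched here in Python. The function names and the sample blocksize of 4096 are made up for this example; in a real application the blocksize would come from the statement handle or the SERVER_INFO table as described above.

```python
def min_buffer_length(blocksize):
    """Return the smallest buffer length accepted for a given blocksize."""
    return blocksize + 1


def buffer_large_enough(buffer_length, blocksize):
    """Return True if the buffer is at least one greater than the blocksize."""
    return buffer_length >= min_buffer_length(blocksize)


# Hypothetical blocksize, for illustration only.
print(min_buffer_length(4096))
print(buffer_large_enough(4096, 4096))
```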

809B3 Statement too complex for available memory The statement could not be parsed due to a lack of available system memory. Free up some memory, and call the function again.

809B4 Locked rows encountered during sort, placed at end of working table One or more rows were locked and could not be accessed for sorting. These rows are placed at the end of the working table. The remaining rows are sorted as specified by the ORDER BY clause.

809B5 Back reference not allowed with system information table You cannot use a back reference predicate when searching one of the system information tables. Remove this predicate from your query.

809B6 Cannot create table: target directory does not exist or is inaccessible SearchServer could not create a table because the directory path identified by the FULCREATE parameter is invalid. This directory must already exist and be writable by your account. For more information about how to set this parameter, see Fulcrum SearchServer Getting Started for your platform.

809B7 Client/server operation not enabled The FTNPATH data source parameter doesn’t specify any network connectors. To ensure that FTNPATH is set appropriately, verify the correct node name and port number with your system administrator.

809B8 Error occurred during indexing. See log file An error occurred during indexing and was recorded in the log file. Indexing errors may occur when a document can’t be accessed or its format is unsupported (for example, Microsoft Word for Windows Fast Save format).

For an explanation and recovery procedure of message numbers found in the log file, see Chapter 2, “SearchServer Utility Messages”. An index log file’s contents can be viewed through the FTT_INDEXLOG column of the TABLES system information table.

809B9 Cannot locate Fulcrum indexing engine A VALIDATE INDEX statement failed because the indexing engine file, FTWXSnnn.EXE (where nnn is the file version number), could not be located. This error can only occur in the Microsoft Windows 16-bit environment. To find this file, look in the directory where the SearchServer DLLs reside (in the BIN subdirectory of the SearchServer installation directory). For more information, see Fulcrum SearchServer Getting Started for your platform.

809C0 An error has occurred: possibly due to a bad thesaurus/variant file You’ve specified a thesaurus or variant file that is corrupted or could not be interpreted correctly. Please check the syntax of your file.

809C1 Error in line line number of dynamic library table. Dynamic library not found An attempt to insert or update a row, or an attempt to retrieve the external document of an existing row failed because a document text reader in the document text reader list could not be dynamically bound. To find out which document text reader could not be bound, get a readable copy of the dynamic library table. This table is extracted from FULTEXT.FTC using ftmunld in the following syntax:

ftmunld image.dlt -m fultext.ftc -o fultext.eft

The readable copy of the dynamic library table is stored in IMAGE.DLT. Go to the line in IMAGE.DLT that gives the error message. The third field of this line contains the name of the document text reader dynamic load library file. The next fields contain function entry points of the document text reader dynamic load library.

Also check the following conditions:

1. Is the dynamic load library filename spelled properly?

2. Does the specified dynamic load library file exist?

3. Has the dynamic load library file been compiled and linked with all the compiler and linker switches required to make a dynamically loadable library? (See the sample make files in the EXAMPLES directory for samples of compiler and linker switches.)

4. Does the dynamic load library file reside in a directory where the running SearchServer application will find it?

809C2 Error in line line number of dynamic library table. Entry point not found An attempt to insert or update a row, or an attempt to retrieve the external document of an existing row failed because a document text reader function entry point could not be found in its dynamic load library. To find out which document text reader list contains the entry point that could not be found, obtain a readable copy of the dynamic library table. This table is extracted from FULTEXT.FTC using ftmunld in the following syntax:

ftmunld image.dlt -m fultext.ftc -o fultext.eft

The readable copy of the dynamic library table is stored in IMAGE.DLT. Go to the line in IMAGE.DLT that gives the error message. The third field of this line contains the name of the document text reader dynamic load library file. The next fields contain function entry points of the document text reader dynamic load library.

Also check the following conditions:

1. Is the dynamic load library filename spelled properly?

2. Do the named functions reside in the dynamic load library?

3. Does an obsolete dynamic load library exist with the same name as the one specified, but without all of the named functions?

4. Does the specified library reside in a directory that would be found instead of the intended directory?
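The file-location questions in the two checklists above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, and the list of directories is supplied by the caller in the order you believe the operating system searches them. It does not reproduce the actual loader search rules of any platform, and it does not inspect entry points.

```python
import os

def find_library_candidates(library_name, search_dirs):
    """Return every copy of library_name found in search_dirs, in search order.

    search_dirs is an assumption supplied by the caller; the real search
    order depends on the platform.
    """
    matches = []
    for directory in search_dirs:
        candidate = os.path.join(directory, library_name)
        if os.path.isfile(candidate):
            matches.append(candidate)
    return matches

def diagnose(library_name, search_dirs):
    """Classify the lookup result as 'not found', 'ok', or 'shadowed'."""
    matches = find_library_candidates(library_name, search_dirs)
    if not matches:
        return "not found"   # wrong name or missing file
    if len(matches) > 1:
        return "shadowed"    # an earlier copy may hide the intended one
    return "ok"
```

A result of "shadowed" suggests comparing the first match against the library you intended to load, since an obsolete copy earlier in the search order would be bound first.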

809C3 Invalid comparison predicate specified for searching the FT_CID column When searching the FT_CID column, this SQLSTATE is returned if the <, >, <= or >= comparison predicates are specified. Only the = and <> predicates can be used for this column.

809C4 Invalid SELECT statement for SEARCH_TERMS table A syntax error has been found in a SELECT statement referencing the SEARCH_TERMS system table. It is a valid SELECT statement, but does not meet the additional restrictions that apply when searching this table. The following three possibilities may be causing the problem:

•incorrect SELECT list

•use of a union clause

•use of an unsupported where clause

For more information about the SEARCH_TERMS system table, see Fulcrum SearchServer SearchSQL Reference.

809C6 Search not supported A Terms Ordered algorithm for an Intuitive Search cannot be performed on an IMMEDIATE table that has an empty periodic index.

809D0 Working table has too many rows to calculate RELEVANCE() values This SQLSTATE is reported when SearchServer is unable to sort on RELEVANCE() because the search produced too many rows (approximately 5000). When ordering by other columns in the Microsoft Windows 16-bit environment, the maximum number of rows is 16,384.

809D1 This operation is not supported for a search of the SEARCH_TERMS table You’ve attempted to retrieve a multiple-row rowset from the SEARCH_TERMS table, which only supports single-row retrieval.

809D2 Working table contains too many rows to be sorted, left in unsorted order An attempt was made in the 16-bit Microsoft Windows environment to sort a working table that contains more than the maximum 16,384 rows. The working table is left in unsorted order, and SQLExecDirect returns SQL_SUCCESS_WITH_INFO.

809D3 UPDATE may not be used to modify columns listed in the ORDER BY clause A positioned UPDATE statement cannot modify a column that is included in an ORDER BY clause of the corresponding SELECT statement.

809D4 Specified character set translation table could not be found The translation table specified in the SET CHARACTER_SET statement could not be located. If this is a custom translation table, be sure that you have installed the table correctly.

809D5 This option cannot be used with a TERM CONTAINS clause You’ve attempted to use the SQL_FETCH_SEEK_GE option of the SQLPosition function used for positioning within the SEARCH_TERMS table, but the original SELECT statement specified a TERM CONTAINS clause. This positioning function is available only when a TERM CONTAINS clause was not specified in the SELECT statement.

809D7 ORDER BY clause may only refer to names derived from the SELECT list A sort criterion in the ORDER BY clause contains a name that does not appear in the corresponding SELECT list. The ORDER BY clause can contain only the names of columns that appear in the SELECT list. If a column or function is assigned a name using an AS clause, the ORDER BY clause may only refer to the name that appears after the AS keyword. A function must be assigned a name on an AS clause if that function is to be used in an ORDER BY clause.
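The naming rule for 809D7 can be illustrated with a small Python check. It is a sketch, not SearchServer's parser: the select list is represented as hypothetical (expression, alias) pairs, an expression containing a parenthesis is treated as a function, and names are compared case-insensitively, which is an assumption.

```python
def derived_names(select_list):
    """Names an ORDER BY clause may use, per the 809D7 description.

    select_list: list of (expression, alias_or_None) pairs (hypothetical shape).
    """
    names = []
    for expression, alias in select_list:
        if alias is not None:
            names.append(alias.upper())        # only the name after AS is usable
        elif "(" not in expression:
            names.append(expression.upper())   # plain column keeps its own name
        # a function without an AS name contributes no usable name
    return names

def invalid_sort_keys(select_list, order_by):
    """Return the ORDER BY names that are not derived from the SELECT list."""
    allowed = set(derived_names(select_list))
    return [name for name in order_by if name.upper() not in allowed]
```

For a select list of TITLE, AUTHOR AS WRITER, and an unnamed RELEVANCE(), the check accepts TITLE and WRITER and rejects AUTHOR and RELEVANCE().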

809D8 FT_CID and APVARCHAR columns may not be used for sorting The ORDER BY clause refers to one or more columns of a type that cannot be used for sorting. Columns that are not permitted in the ORDER BY clause are the external text column (APVARCHAR), and the FT_CID column.

809D9 Indexing capabilities limited due to older version of a remote ftserver The remote SearchServer is an old version that does not fully support all the capabilities of the client indexing software. Upgrade the remote SearchServer software to the current version. To retrieve documents in SearchServer 3.0, the server must be upgraded to SearchServer 3.0.

809E1 Duplicate names derived from SELECT list As a result of names assigned by one or more AS clauses, the select list would produce a working table with more than one column of the same name. For instance, the following SELECT statements would produce SQLSTATE 809E1:

SELECT NAME, SUBJECT AS NAME

SELECT AUTHOR AS NAME, SUBJECT AS NAME

This SQLSTATE is reported only when using deprecated retrieval modes.
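The duplicate-name condition behind 809E1 can be sketched as follows. The (expression, alias) representation and the case-insensitive comparison are assumptions made for illustration; they are not taken from the SearchServer documentation.

```python
from collections import Counter

def duplicate_derived_names(select_list):
    """Return the names that would appear more than once in the working table.

    select_list: list of (expression, alias_or_None) pairs (hypothetical shape).
    """
    names = [(alias or expression).upper() for expression, alias in select_list]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

Both select lists shown above yield NAME as a duplicate, while a list without clashing AS names yields nothing.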

809E2 Invalid data structure type An invalid structure tag was passed to SQLSetStmtOption with the SQL_DS_TYPE option. This option code is used internally by SearchServer and should not be used directly by the application. If you receive this error code and are not referring to SQL_DS_TYPE, contact the Fulcrum Customer Support Team.

809E3 Invalid data structure size An internal data structure of an incorrect size was detected by SQLSetStmtOption or SQLSetConnectOption. This indicates misuse of the private SQL_DS_TYPE option or an internal error. See SQLSTATE 809E2.

809E4 SELECT on SEARCH_TERMS table not supported by this function This message can be returned by the API functions SQLGetCursorName, SQLSetCursorName, SQLSetStmtOption, SQLGetStmtOption, SQLColAttributes, SQLRowCount, SQLGetData, SQLSetColPosition, SQLSetPos, or SQLDescribeCol. It is returned whenever one of these functions is passed a statement handle for a statement that was a SELECT on SEARCH_TERMS.

809E5 IMMEDIATE index was created with older version of SearchServer The format of the immediate index files changed after SearchServer Version 1.2/1.3. All IMMEDIATE tables must be reindexed with the older version before being used by more recent versions of SearchServer.

809E6 Wrong ORDER BY clause for SELECT on SEARCH_TERMS table When searching the SEARCH_TERMS table, you must use specific sort specifications. For more information about valid query syntax, see Fulcrum SearchServer SearchSQL Reference.

809E7 WHERE clause includes no valid columns in specified table When searching the SEARCH_TERMS table, the where clause contained a predicate specifying COLUMN_NAMEs. No columns in the table specified by TABLE_NAME satisfy those in COLUMN_NAME.

809E8 Invalid table specification for search of SEARCH_TERMS table The specification of TABLE_NAME (and optionally TABLE_QUALIFIER) in the where clause of a SELECT on the SEARCH_TERMS table either doesn’t match an existing table, or matches multiple existing tables.

809E9 Invalid option for a search of the SEARCH_TERMS table Only the next and previous row positioning options are permitted when positioning to rows of a SEARCH_TERMS result list.

809F0 TERM clause is invalid for search of SEARCH_TERMS table Either the TERM pattern didn’t end with a percent sign (%), or there was more than one pattern included in the clause.

809F1 Unable to use specified collation sequence SearchServer was unable to access the specified collation sequence. This SQLSTATE may be returned with another SQLSTATE that provides additional information about why the collation sequence could not be accessed.

809F2 Invalid comparison predicate for normal or literal indexed column Only the = and <> comparison predicate operators are supported for NORMAL and LITERAL indexed columns. This SQLSTATE is returned when one of the comparison predicate operators <, >, <=, or >= is used to search a NORMAL or LITERAL indexed column.

809F5 Cannot obtain exclusive lock on table CREATE SCHEMA REPLACE or ALTER TABLE was unable to acquire the necessary exclusive lock on the table because another user is performing a similar action. Re-execute the statement once the table is available.
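The advice to re-execute the statement once the table is available could be automated with a simple retry loop. The sketch below is generic Python, not a SearchServer or ODBC API: SqlStateError is a hypothetical exception type standing in for however your client library reports SQLSTATE values, and the attempt count and delay are arbitrary illustration values.

```python
import time

class SqlStateError(Exception):
    """Hypothetical exception carrying a five-character SQLSTATE."""
    def __init__(self, sqlstate, message=""):
        super().__init__(f"{sqlstate}: {message}")
        self.sqlstate = sqlstate

def run_with_retry(execute, retry_states=("809F5",), attempts=5,
                   delay_seconds=1.0, sleep=time.sleep):
    """Call execute() and retry while it fails with a retryable SQLSTATE."""
    for attempt in range(1, attempts + 1):
        try:
            return execute()
        except SqlStateError as error:
            # Give up on other errors, or once the attempts are used up.
            if error.sqlstate not in retry_states or attempt == attempts:
                raise
            sleep(delay_seconds)
```

Passing the statement as a callable keeps the loop independent of any particular database interface; the sleep parameter exists only so that the delay can be replaced in tests.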

809F7 Column isn’t LITERAL or NORMAL indexed A query that was expecting a LITERAL or NORMAL indexed column either didn’t find any indexed terms, or the column was VALUE indexed.

809F8 Unable to update immediate index, use VALIDATE INDEX to make row searchable SearchServer was unable to update the immediate index files associated with a table due to another process writing to the immediate index files at the same time. This error can occur during a positioned INSERT or UPDATE operation. When this error occurs, the row was successfully changed, but will not be searchable until a VALIDATE INDEX is executed on the table. This SQLSTATE applies only to 16-bit Microsoft Windows and Windows 95 environments.

809F9 Missing value in SET clause The SET clause list was missing for the INSERT or UPDATE statement.

809FA Cannot update or delete row FT_CID = cid Returned jointly with SQLSTATE 80972, cid indicates the value of the row that has changed since indexing.

809FB Text vector does not contain useful search criteria In an Intuitive Search, the maximum document percentage was set too low for the search criteria to return any relevant information. Additionally, the search criteria terms appeared more frequently than the maximum percentage specified. Increase the percentage frequency, or increase the scope of the target text to include additional words to obtain a relevant search with the same criteria.

809FC Table is corrupt A row could not be accessed because the internal data structures stored in the table files were corrupt. Recreate the table.

809FD FULCREATE directory not included in FULSEARCH path The table specified in the CREATE SCHEMA or CREATE TABLE statement was not created because the directory specified in the FULCREATE parameter is not included in the FULSEARCH parameter setting. Correct FULCREATE and/or FULSEARCH, and re-execute the CREATE statement. For more information about how to set the FULCREATE and FULSEARCH data source parameters, see Fulcrum SearchServer Getting Started for your platform.
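Before re-executing the CREATE statement, the FULCREATE and FULSEARCH relationship can be checked with a short script. This Python sketch assumes that you have already split the FULSEARCH setting into a list of directories (the separator is not stated here), and the function names are hypothetical.

```python
import os

def _normalize(path):
    """Normalize case and separators so equivalent paths compare equal."""
    return os.path.normcase(os.path.normpath(path))

def fulcreate_in_fulsearch(fulcreate_dir, fulsearch_dirs):
    """Return True if the FULCREATE directory is one of the FULSEARCH directories."""
    target = _normalize(fulcreate_dir)
    return any(_normalize(directory) == target for directory in fulsearch_dirs)
```

The comparison is purely textual: it does not resolve symbolic links or check that the directories exist.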

809FE Invalid comparison predicate An invalid comparison predicate was executed as part of a SELECT statement.

809G0 Cannot have more than one CREATE TABLE clause in a CREATE SCHEMA statement A CREATE SCHEMA statement was parsed in which there was more than one CREATE TABLE clause. Rephrase the CREATE SCHEMA statement using one CREATE TABLE clause.

809G1 Invalid qualifier in DBQ The DBQ= attribute in the connection string passed to SQLDriverConnect specified an invalid qualifier that is neither a valid remote SearchServer name (nodename), nor an absolute directory path.

809G2 Invalid qualifier The string passed with the SQL_CURRENT_QUALIFIER option was not a valid qualifier. The string was neither a remote SearchServer name (nodename), nor an absolute directory path.

809G3 External document not compatible with document text reader list The external document was not in a format recognizable by the specified document text reader list. The inserted row has been added to the table, but the external document cannot be searched or retrieved. Update this row to specify the correct external document filename and the text reader list.

809G7 Memory Allocation Failure—May be too many concurrent applications running SearchServer was unable to allocate memory for temporary internal data structures. In some environments, this may be a symptom of resource exhaustion by multiple concurrent SearchServer applications. Close some applications before continuing.

809G8 A network operation has failed. Check the server A network I/O failure has occurred. The connection may have been broken. Verify that the server is still running and accessible, and restart the server if necessary.

809G9 Table catalog file is corrupt You attempted to perform an operation on a file that is not a SearchServer table. The Management Data files may be invalid. You should recreate the table. For more information, see Appendix C, “Table Management Files,” in Fulcrum SearchServer Data Preparation and Administration.

809GA FULTEMP or FULCREATE for data source (data source name) is invalid or inaccessible The FULTEMP directory value defined by the data source is invalid. Use the ODBC Administrator to look up the FULTEMP directory value for the data source name, and verify that the directory exists and that files can be created.

809GB Index files for table (table name) are corrupt One or both of the dictionary and reference index files are corrupt. Run a VALIDATE INDEX statement that specifies the ABANDON parameter to rebuild the index files.

809GC An error occurred when attempting to read the text file This SQLSTATE was returned for one of the following reasons:

•You unsuccessfully tried to read an external text file.

•You tried to read a corrupted or invalid library file.

•A system I/O error occurred.

809GD Text reader error: text reader specific message An error has occurred during indexing that was detected by the text reader. If you’re using the ODBC custom text reader, check the .PRX parameter file for errors. You should also check the index log file for any additional text reader error messages.

809H0 Invalid driver statement handle specified for odometer association The driver statement handle being associated with another statement handle by SQLSetStmtOption using SQL_SS_ASSOC_ODOMETER must be a valid statement handle within the same connection.

809H1 Cannot associate odometer to the same statement handle The statement driver handle passed as the option value in SQLSetStmtOption using SQL_SS_ASSOC_ODOMETER (or, in Driver Manager environments, the corresponding driver handle) is identical to the handle on which the option is being set.

809H2 Stmt handle cannot be used with SQL_SS_ODOMETER and SQL_SS_ASSOC_ODOMETER This SQLSTATE was returned for one of the following reasons:

•The statement handle used with SQL_SS_ASSOC_ODOMETER of SQLSetStmtOption already had an odometer associated by the previous (deprecated) mechanism.

•SQLSetStmtOption with SQL_SS_ODOMETER specifies a handle with an existing association using SQL_SS_ASSOC_ODOMETER. SQL_SS_ODOMETER as an option to SQLSetStmtOption is deprecated.

809H3 Row capacity exceeded for non-SearchServer tables An INSERT or UPDATE statement overflowed the total amount of data and/or the number of columns. If FRAGMENTED is TRUE, each separate line of column text counts as if it were a separate column.

809H4 There is only one column; it cannot be dropped The column specified in the ALTER TABLE DROP statement is the only column in the table. You must specify at least one column for the table.

809H5 The table has attributes which aren’t supported on this platform This SQLSTATE occurs only when the table resides on a shared file system (for example, PC-NFS). The table is not directly accessible by software running on the current client platform. Either run your software on the platform where the table is located, or access it via a remote SearchServer.

809HA The table is currently being reindexed. Cannot retrieve search terms You cannot search the SEARCH_TERMS table for terms in a table that is in the process of being indexed. Re-execute the statement when indexing is completed.

809HB The index will be temporarily out of date While an immediate table is being periodically indexed, its indexes can’t be instantly updated when rows are inserted, updated, or deleted. Once the VALIDATE INDEX operation is completed, the indexes will be updated.

809HC Full pathname is longer than the allowed maximum pathsize The FULLNAME() function cannot represent the complete path and filename of the external file because it would exceed the maximum length of a path on this system.

809HD Insufficient memory for requested search memory size The memory size requested by SET SEARCH_MEMORY_SIZE or by the SQL_SS_SEARCH_MEMORY_SIZE option of SQLSetConnectOption cannot be supported. The search memory size cannot be changed.

809I1 The table is currently being reindexed. Statement request not performed An error was detected when performing a search or a searched update on an immediate table because another application was performing a periodic index of the same table. The search or searched update was not performed.

809I2 Invalid use of the FULLNAME() function This error is returned if the FULLNAME() function is used in a WHERE clause, or in any statement other than SELECT.

809I3 Invalid use of the ORIGINAL() function The ORIGINAL() function can’t be used in a WHERE or ORDER BY clause.

809I4 Invalid use of the CUSTOM_VIEWER() function The CUSTOM_VIEWER() function cannot be specified in a WHERE or ORDER BY clause from SQLFetch, SQLGetData, SQLExtendedFetch, and OPEN.

809I5 Shared cursor incompatible with old retrieval mode This error is returned if the cursor is already being used by another statement handle, or if it’s returned in the old (deprecated) retrieval mode.

809I9 TR chain does not support requested mode of operation The requested mode of operation (custom viewer, original) is not supported by one of the text readers specified in the text reader chain. Verify that the requested mode of operation is supported by that text reader.

809J0 Only one linguistic processing filter list allowed per statement This error is returned if you’ve specified a different linguistic processing filter list for the same statement. Only one name can be specified for each statement, either through the THESAURUS function, or in the is_about predicate of the WHERE clause.

809J1 Invalid linguistic processing filter list The linguistic processing filter list specified for use in the statement is invalid. Verify the name of the linguistic processing filter list and re-execute the statement.

809J2 Unable to free statement handle associated with connection SearchServer was not able to free one of the statement handles associated with the current connection. This error is returned when an error occurs during the SQLDisconnect function call.

809J3 Page positioning does not support ORIGINAL() or CUSTOM_VIEWER() Page positioning is not supported for the ORIGINAL() and CUSTOM_VIEWER() interpretation of the external text column. This error is returned if the SQL_PAGE option is used with SQLSetColPosition on the ORIGINAL() or CUSTOM_VIEWER() interpretation of the external text column.

809J4 SQLSetColPosition option supports SQL_FETCH_ABSOLUTE for CUSTOM_VIEWER() Only the SQL_FETCH_ABSOLUTE option of SQLSetColPosition is supported for CUSTOM_VIEWER() interpretation of the external text column.

809J5 MAX_EXEC_TIME reached. Working table returned unsorted The maximum execution time for a statement expired before SearchServer could finish sorting the result set. As a result, the working table is returned unsorted. The maximum execution time for a statement is set through the SET MAX_EXEC_TIME statement. This error may be returned in conjunction with SQLSTATE S1T00.

809J6 Unable to retrieve text reader specific error message; out of memory SearchServer was unable to retrieve the error message generated by a text reader due to insufficient memory. Close any unnecessary applications to free up memory and try again.

809J7 Back reference contains no context information The back reference predicate does not contain any context information. It cannot be used to search for context information. Either create another back reference predicate that contains the desired context information, or remove this predicate from your query and re-execute the statement.

809J8 Option specified to the linguistic processing filter is invalid This SQLSTATE was returned for one of the following reasons:

•You specified an option that attempted to apply an invalid linguistic processing filter: /option/option.

•You cannot use an option specified to the linguistic processing filter.

For more information about the linguistic processing filter, see Fulcrum SearchServer SearchSQL Reference.

809K0 Linguistic processing database name invalid or not found on the server The linguistic processing filter name was not available on the machine where the table resides. Verify that your client/server environments are able to access the linguistic packages required for the tables that will use them. For more information about the linguistic processing filter, see Fulcrum SearchServer SearchSQL Reference.

809K1 Linguistic processing database files not found on FULSEARCH path The *.DAT files required for the linguistic processing filter were not found in the path specified by FULSEARCH. You must make available the appropriate *.DAT files for each language you specify. For more information about the linguistic processing filter, see Fulcrum SearchServer SearchSQL Reference.

809K2 Error on accessing linguistic processing database files The linguistic processing database or cache files could not be opened or initialized.

809K3 20_STREAM FORMAT_TEXT mode must be set before first retrieval request is returned This error occurs if you make a request to a SearchServer 2.0 server from SearchServer 3.0 before executing the 20_STREAM FORMAT_TEXT statement. You must execute the SearchSQL statement SET FORMAT_TEXT ‘20_STREAM’ (for example, in ExecSQL) before making any requests in order to access SearchServer 2.0 servers from SearchServer 3.0.

809K4 Cannot change FORMAT_TEXT from 20_STREAM in the same connection This error occurs if you try to set FORMAT_TEXT to another value after setting it to ‘20_STREAM’ in the same session. You must execute a SQLDisconnect API statement before you change the FORMAT_TEXT value again.

809K5 Table definition files created with an older version of SearchServer This error occurs if you’ve tried to execute an INSERT, DELETE, or UPDATE statement when using a SearchServer 3.5 data source or application that is reading a SearchServer 2.0 table. Although SearchServer 3.5 can read SearchServer 2.0 tables, it can’t edit them. This restriction is a result of the changes to the internal format of SearchServer table management files for the SearchServer 3.5 release.

809K8 One or more rows/documents were not indexed. See Log file An error occurred during indexing and was recorded in the log file. An indexing error may occur when a document can’t be accessed or the format is unsupported.

809L0 Document text reader failed due to an exception This error occurs when you try to open a document that was not indexed due to a document text reader failure during indexing. SearchServer marked the row as indexed and continued indexing, but because the text reader failed the document cannot be read. See the log file for more information.

809L1 Document text reader failed due to a timeout This error occurs when you try to open a document that was not indexed due to a document text reader timeout during indexing. SearchServer marked the row as indexed and continued indexing, but the text reader timed out, so the document cannot be read. See the log file for more information.

809L2 Statement not permitted in read-only connection This error occurs when you try to execute any of the following statements through a read-only connection: CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE SCHEMA, VALIDATE INDEX, DELETE, INSERT, UPDATE.
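An application that may run over a read-only connection can screen statements before sending them. The following Python sketch matches only the leading keywords listed for 809L2; it is a simplified check under that assumption, not a SearchSQL parser, and the names are hypothetical.

```python
# Leading keywords of the statements listed in the 809L2 description.
WRITE_STATEMENT_PREFIXES = (
    ("CREATE", "TABLE"), ("ALTER", "TABLE"), ("DROP", "TABLE"),
    ("CREATE", "SCHEMA"), ("VALIDATE", "INDEX"),
    ("DELETE",), ("INSERT",), ("UPDATE",),
)

def rejected_in_read_only(statement):
    """Return True if the statement starts with one of the listed keywords."""
    words = tuple(word.upper() for word in statement.split())
    return any(words[:len(prefix)] == prefix
               for prefix in WRITE_STATEMENT_PREFIXES)
```

A statement that passes this check can still fail for other reasons; the check only avoids the round trip for statements that are certain to be rejected.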

809L3 WHERE clause contained only stopwords The query failed to return any documents because it didn’t contain any usable search terms.

809L4 ORDER BY sort abandoned - Collation Sequence not found The collation sequence couldn’t be found to perform the sort specified. Check the collation sequence name for errors. If the name specified is correct, check its availability locally or on the servers referenced by the search.

809L5 ORDER BY sort done on Client - Collation sequence not found on one or more servers The sort operation specified by the ORDER BY clause in a SELECT statement was performed on the client because one or more of the servers’ utilities doesn’t support the collation sequence.

809VB SearchServer Visual Basic API internal error This SQLSTATE was returned for one of the following reasons:

•An attempt to insert or retrieve a row failed because the text reader list specified a text reader in a dynamic library that could not be located. This SQLSTATE appears jointly with 80975.

•The Visual Basic glue layer encountered an internal error. This is the result of memory corruption or a low memory condition. Check the available system resources, terminate any active Microsoft Windows programs that are not needed, and restart your application. If the problem persists, contact the Fulcrum Customer Support Team.

I0001 Number of input fields does not match the header Either not enough field values were created in a particular input record, or too many were generated. The record is ignored. There must be a one-to-one correspondence of the field names in the header with the individual field values in the data rows.
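The one-to-one correspondence between header names and field values can be checked before an import is attempted. The sketch below uses Python's standard csv module and assumes a delimited text file whose first record is the header; the actual import file format is not described in this entry, so treat the layout and the function name as assumptions.

```python
import csv
import io

def records_with_wrong_field_count(text, delimiter=","):
    """Return (header_field_count, [(record_number, field_count), ...]).

    Only data records whose field count differs from the header are listed.
    Record numbers start at 1 for the first record after the header.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    problems = []
    for record_number, fields in enumerate(reader, start=1):
        if len(fields) != len(header):
            problems.append((record_number, len(fields)))
    return len(header), problems
```

Each reported record is one that would be ignored with I0001, so the list shows which rows of the input file to repair.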

I0002 Field value was too big The maximum length of a CHAR or VAR_CHAR column in SearchServer is 32,767 Kbytes. Input fields larger than this will not be imported. Check the field values in your input file.

I0003 Mismatched quotes in input file This error occurred because mismatched quotation marks were found in the input file. A record with mismatched quotation marks cannot be imported. Use a matching set of quotation marks and try again to import the record. If you’re using the Import wizard in SearchAdmin, see the online help for more information. If you’re not using SearchAdmin, see Appendix A, “Utility Program Summary,” in Fulcrum SearchServer Data Preparation and Administration.

I0004 Bad delimiter specified The character you selected to be used as a field delimiter is invalid. Specify a different field delimiter character, not the quotation character or the quotation escape character. If you’re using the Import wizard in SearchAdmin, see the online help for more information. If you’re not using SearchAdmin, see Appendix A, “Utility Program Summary,” in Fulcrum SearchServer Data Preparation and Administration.
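The delimiter rule can be expressed as a small pre-flight check. In this Python sketch the default quotation and escape characters are placeholders chosen for illustration, and the single-character requirement is an assumption drawn from the wording "the character you selected"; substitute the values your import actually uses.

```python
def check_delimiter(delimiter, quote_char='"', escape_char="\\"):
    """Return None if the delimiter is usable, otherwise a short reason.

    quote_char and escape_char defaults are illustrative assumptions.
    """
    if len(delimiter) != 1:
        return "delimiter must be a single character"
    if delimiter == quote_char:
        return "delimiter is the same as the quotation character"
    if delimiter == escape_char:
        return "delimiter is the same as the quotation escape character"
    return None
```

Returning a reason string rather than raising an exception makes it easy to show the message next to the field where the delimiter was entered.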

I0005 Syntax error in input file An unescaped quotation mark was found in the middle of a field value. This error results in an invalid control sequence. If you’re using the Import wizard in SearchAdmin, see the online help for more information. If you’re not using SearchAdmin, see Appendix A, “Utility Program Summary,” in Fulcrum SearchServer Data Preparation and Administration.

I0006 Unexpected end of file The end of the input file was found at an unexpected position. This error usually occurs in the middle of a control sequence. Verify your input file, or re-create it.

IM001 Driver does not support this function The Driver Manager detected that the SearchServer driver did not support the API function call for the specified connection or statement handles.

IM002 Data source name not found and no default driver specified The Driver Manager detected that the data source name specified in the SQLConnect or SQLDriverConnect API function call was not found in the ODBC.INI file. Use an existing data source name, or add a new data source name. For more information about how to add data source names, see Fulcrum SearchServer Getting Started for your platform.

IM003 Specified driver could not be loaded The SearchServer driver listed in the data source specification ODBC.INI file (or specified by the SQLDriverConnect API function DRIVER keyword) wasn’t found or could not be loaded by the Driver Manager. Verify that the file path associated with DRIVER is valid.

SearchServer requires a file pathname to FTETnnnW.DLL (where nnn is the file version number) for a Microsoft Windows 16-bit SearchServer driver. For a pathname on the Microsoft Windows NT 32-bit environment, the file is FTETnnnN.DLL.

IM004 Driver’s SQLAllocEnv failed During a call to SQLDriverConnect or SQLConnect, the Driver Manager called the SearchServer driver SQLAllocEnv API function, and SearchServer returned an error to the Driver Manager.

IM005 Driver’s SQLAllocConnect failed During a call to SQLDriverConnect or SQLConnect, the Driver Manager called the SearchServer driver SQLAllocConnect API function, and SearchServer returned an error to the Driver Manager.

IM006 Driver’s SQLSetConnectOption failed During a call to SQLDriverConnect or SQLConnect, the Driver Manager called the SearchServer driver SQLSetConnectOption API function, and SearchServer returned an error to the Driver Manager. SQLConnect or SQLDriverConnect returns SQL_SUCCESS_WITH_INFO.


IM008 Dialog failed The Driver Manager failed to display the data sources dialog.

IM010 Data source name too long The data source name was longer than SQL_MAX_DSN_LENGTH characters. The name you specify must be within SQL_MAX_DSN_LENGTH characters.

IM011 Driver name is too long The SQLDriverConnect driver name (DRIVER) found in the connection string was greater than the maximum length of 255. Use a shorter driver name.
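The two length limits above can be checked before a connection string is passed to SQLDriverConnect. The following Python sketch is a simplified illustration, not the Driver Manager's parser. It assumes a plain KEY=value;KEY=value connection string with no braces or embedded semicolons. The value 32 for SQL_MAX_DSN_LENGTH is also an assumption, so confirm it in the header files for your platform. The function names are hypothetical.

```python
SQL_MAX_DSN_LENGTH = 32        # assumed value; confirm in your platform's headers
MAX_DRIVER_NAME_LENGTH = 255   # limit quoted in the IM011 description


def parse_connection_string(conn_str):
    """Split 'KEY=value;KEY=value' into a dict with upper-cased keys."""
    pairs = {}
    for part in conn_str.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            pairs[key.strip().upper()] = value.strip()
    return pairs


def check_connection_string(conn_str):
    """Return the SQLSTATEs that the lengths in conn_str would be expected to raise."""
    attrs = parse_connection_string(conn_str)
    states = []
    if len(attrs.get("DSN", "")) > SQL_MAX_DSN_LENGTH:
        states.append("IM010")
    if len(attrs.get("DRIVER", "")) > MAX_DRIVER_NAME_LENGTH:
        states.append("IM011")
    return states
```

An empty list means that neither limit is exceeded. It does not mean that the data source or driver exists.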

S1C00 Driver not capable The driver or data source does not support the requested operation.

S1001 Memory allocation failure The Driver Manager or driver (for example, SearchServer) was unable to allocate the memory required to execute or complete the function.

S1002 Invalid column number This SQLSTATE was returned for one of the following reasons:

•The statement wasn’t a SELECT statement.

•The column number was less than one or greater than the number of columns in the select list.

•The column number was specified as SQL_TEXT_COLUMN, but the select list doesn’t include the external text column.

S1003 Program type out of range An invalid conversion type identifier was specified in a call to the API functions SQLBindCol or SQLGetData. For more information about valid conversion type identifiers, see Fulcrum SearchServer C API Reference.

S1004 SQL data type out of range The dtype value you specified in a call to SQLGetTypeInfo was invalid. Try again, ensuring that the dtype value you specify is valid.


S1008 Operation canceled Asynchronous processing was enabled for the statement handle. After the API function was first called (but before it had completely executed), the SQLCancel function was called for the statement handle. The original API function returns this SQLSTATE using the same statement handle to verify the cancel request.

S1009 Invalid argument value One of the parameters in a call to an API function had an illegal value. The possible causes include:

•You’ve specified a NULL pointer for a parameter that must return a result (for example, the handle being allocated or the buffer being retrieved).

• You’ve specified the length of a required result buffer as zero or negative.

•You’ve passed an incorrect value to a parameter that is constrained to be one of a particular set. For example, in a call to the SQLCancel API function, the flag parameter can only be SQL_RESUME or SQL_CANCEL.

•The length of the userid, password, or server name was negative or exceeded SQL_USERID_LENGTH, or you attempted to specify a userid with the local or default server.

S1010 Function sequence error This SQLSTATE is returned when an API function could not be called at this time with the specified environment, connection, or statement handles. You must first call one or more other API functions. The possible causes include:

•There is no server associated with this connection handle. You called the SQLAllocStmt API function before the SQLConnect API function. You can also get this SQLSTATE if you call the SQLConnect API function when there is already a server associated with the connection.

• Statement execution hasn’t been initiated. API functions that deal with columns or retrieve data all require that you first execute a SELECT statement using that statement handle or an API function that constructs a working table. For example, you can’t bind, describe, fetch, or get data from a particular statement handle until you have executed a SELECT statement.


•The state of statement execution is otherwise inappropriate for this API function. After executing a statement that successfully returned a working table, you can’t execute a second statement using the same statement handle. To use the same statement again, call SQLFreeStmt with the SQL_CLOSE option.

•You can’t initiate execution of one SELECT or VALIDATE INDEX statement until the previous one has been completed or canceled.

•You can’t retrieve data or position the cursor if the associated statement didn’t create a working table.

•The state of information retrieval is inappropriate for this API function. For example, you can’t position within a row or call the SQLGetData API function until you’ve positioned to a row.

•You can’t free a connection while it is still in use, or free the environment while a connection handle exists.

•Asynchronous execution of some API function has not completed on this statement handle. You’ve attempted to call a different function with the same statement handle or its associated connection handle.
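Several of the ordering rules listed for S1010 can be pictured as a small state machine. The following Python sketch is a toy model under stated assumptions: it does not call any driver, the class and method names are hypothetical, and it covers only the connect, execute, fetch, get-data, and close ordering described above.

```python
class FunctionSequenceError(Exception):
    """Stands in for SQLSTATE S1010 in this toy model."""
    sqlstate = "S1010"


class ToyStatement:
    """Tracks only the call-ordering rules; it does not talk to any driver."""

    def __init__(self, connected):
        if not connected:
            # Models allocating a statement before connecting to a server.
            raise FunctionSequenceError("no server associated with the connection")
        self.has_working_table = False
        self.positioned = False

    def execute_select(self):
        if self.has_working_table:
            # A second statement on the same handle needs a close first.
            raise FunctionSequenceError("close the statement before re-executing")
        self.has_working_table = True
        self.positioned = False

    def fetch(self):
        if not self.has_working_table:
            raise FunctionSequenceError("no working table to fetch from")
        self.positioned = True

    def get_data(self):
        if not self.positioned:
            raise FunctionSequenceError("not positioned on a row")
        return "column data"

    def close(self):
        # Models SQLFreeStmt with the SQL_CLOSE option.
        self.has_working_table = False
        self.positioned = False
```

In this model, the sequence execute_select, fetch, get_data, close, execute_select succeeds. Skipping any step raises the error that stands in for S1010.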

S1012 Invalid transaction operation code specified An invalid transaction operation value was specified for SQLTransact. The valid values are SQL_COMMIT and SQL_ROLLBACK. SearchServer does not support SQL_ROLLBACK; a call that specifies it is rejected with SQLSTATE S1C00 rather than S1012.

S1015 No cursor name available You called SQLGetCursorName on a statement handle that had never been assigned a cursor name. Cursor names are assigned explicitly by calling SQLSetCursorName or automatically when a SELECT is executed on the statement handle.

S1090 Invalid string or buffer length The buffer length specified in an API function call was invalid. Normally, this is caused by the value being set to less than 0.

S1091 Descriptor type out of range The value specified for the descriptor type of the SQLColAttributes API function was invalid. For more information about valid descriptor types, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1092 Option type out of range The value specified for the option parameter of SQLFreeStmt, SQLGetConnectOption, SQLGetStmtOption, SQLSetConnectOption, SQLColAttributes, or SQLSetStmtOption was not in an acceptable value range. For more information about valid option parameter values, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1095 Function type out of range An invalid function identifier was specified in a call to the API function SQLGetFunctions.

S1096 Information type out of range The information type value specified in the SQLGetInfo API function call was not valid. For more information about valid information types, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1097 Column type out of range An invalid column type identifier was specified in a call to the API function SQLSpecialColumns. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1098 Scope type out of range An invalid scope type identifier was specified in a call to the API function SQLSpecialColumns. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1099 Nullable type out of range An invalid nullable type identifier was specified in a call to the API function SQLSpecialColumns. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.


S1100 Uniqueness option type out of range An invalid uniqueness type identifier was specified in a call to the API function SQLStatistics. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1101 Accuracy option type out of range An invalid accuracy type identifier was specified in a call to the API function SQLStatistics. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1103 Direction option out of range An invalid direction type identifier was specified in a call to the API function SQLDataSources or SQLDrivers. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1106 Fetch type out of range An invalid fetch type identifier was specified in a call to the API function SQLExtendedFetch. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1107 Row value out of range This error message was returned by the following API calls:

•SQLExtendedFetch: The value specified with the SQL_CURSOR_TYPE statement option was SQL_CURSOR_KEYSET_DRIVEN, but the value specified with the SQL_KEYSET_SIZE statement option was greater than zero and less than the value specified with the SQL_ROWSET_SIZE statement option.

• SQLSetPos: The value specified for the rownum argument was greater than the number of rows in the rowset.

• SQLSetScrollOptions: The value specified for the number of rows for which keys will be buffered was less than one, but not equal to a valid identifier. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.


•The value specified for the number of rows for which keys will be buffered was greater than zero, but less than the number of rows in a rowset. The value specified for the number of rows in a rowset was zero.
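The SQLExtendedFetch and SQLSetPos conditions listed for S1107 reduce to simple range comparisons. The following Python sketch restates them as two hypothetical helper functions. It is an illustration of the rules as described above, not driver code, and the parameter names are assumptions.

```python
def extended_fetch_row_value_ok(keyset_driven, keyset_size, rowset_size):
    """Return False when the keyset/rowset combination would be expected to raise S1107."""
    # Described rule: a keyset-driven cursor whose keyset size is greater
    # than zero but smaller than the rowset size is out of range.
    if keyset_driven and 0 < keyset_size < rowset_size:
        return False
    return True


def set_pos_row_value_ok(rownum, rows_in_rowset):
    """Return False when rownum is greater than the number of rows in the rowset."""
    return rownum <= rows_in_rowset
```

For example, a keyset size of 5 with a rowset size of 10 fails the first check. A keyset size of 0, or one that is at least as large as the rowset size, passes.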

S1108 Concurrency option out of range An invalid concurrency type identifier was specified in a call to the API function SQLSetScrollOptions. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1109 Invalid cursor position An invalid cursor position was specified in the option argument SQL_REFRESH. The value in the rowstat array (for the row specified by the rownum argument) was SQL_ROW_DELETED or SQL_ROW_ERROR.

S1110 Invalid driver completion An invalid driver completion type identifier was specified in a call to the API function SQLDriverConnect. For more information about valid identifiers, see the Fulcrum SearchServer C API Reference, or the Fulcrum SearchBuilder for Visual Basic Reference.

S1T00 Timeout expired The timeout period expired before SearchServer returned the result set. The timeout period is set through the SQLSetStmtOption SQL_QUERY_TIMEOUT option, or through the SET MAX_EXEC_TIME statement. If the SQL_SS_CAPABLE and SQL_SS_KEEPRESULT bits are set through SQLSetStmtOption, SQL_SUCCESS_WITH_INFO is returned, which makes it possible to retain partial result lists. If these bits are not set, SQL_ERROR is returned.

SGS00 Invalid table name The table you’re referencing in the statement doesn’t exist. Verify the name of the table through the TABLES system information table, and re-execute the statement.


SIS00 Invalid index The table wasn’t indexed or a VALIDATE INDEX statement for the table has failed. Execute a VALIDATE INDEX statement. Check the indexing log to determine why indexing failed. For more information about indexing, see Fulcrum SearchServer Data Preparation and Administration.

SJS00 Invalid column name The column you’re referencing in the statement doesn’t exist. Verify the name of the column through the COLUMNS system information table, and re-execute the statement.
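An application that reports SQLSTATEs to its users can pair each code with the remedy given in this appendix. The following Python sketch shows one possible lookup. The table contains only a few of the entries above, the wording is condensed from their descriptions, and the names REMEDIES and remedy_for are hypothetical.

```python
# A partial table: remedies condensed from the entries in this appendix.
REMEDIES = {
    "IM002": "Use an existing data source name, or add a new one.",
    "IM003": "Verify that the file path associated with DRIVER is valid.",
    "IM011": "Use a shorter driver name.",
    "SGS00": "Verify the table name in the TABLES system information table.",
    "SIS00": "Execute a VALIDATE INDEX statement and check the indexing log.",
    "SJS00": "Verify the column name in the COLUMNS system information table.",
}

DEFAULT_REMEDY = "No remedy listed here; see the entry for this SQLSTATE."


def remedy_for(sqlstate):
    """Return the suggested action for a SQLSTATE, ignoring case and padding."""
    return REMEDIES.get(sqlstate.strip().upper(), DEFAULT_REMEDY)
```

Codes that are not in the table fall through to a generic message, so the lookup never fails on an unexpected SQLSTATE.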


Chapter 2:

SearchServer Utility Messages

This chapter provides a list of the messages that can be produced by the SearchServer utility programs.

Overview

SearchServer utility messages are contained in compiled form in the system configuration file (FULTEXT.FTC). Most of the utility programs read messages from this file. If a utility can’t find the system configuration file, the following message is displayed:

353: Can’t access file “fultext.ftc” message...

In utility messages, the terms “collection” and “cname” are equivalent to the SearchServer term “table”. The term “filter” is equivalent to either the SearchServer term “text reader” or “network connector”, depending on the context of the message description.

Message Syntax

The messages that can be displayed by utilities or written to the indexing log file of a table by SearchServer are listed in numerical order. Most of the messages are prefixed with a numeric message identifier, but there are cases where the message number doesn’t appear in the text of the message. In this case, the message is logically grouped with the associated numbered message.

The messages displayed in FULTEXT.FTC fall into the following four categories:

Category What it Does

access message Usually indicates that a directory or file can’t be accessed in the manner required to complete the procedure specified.

usage message Indicates the correct syntax for the procedure.

information message Provides information or reports an error condition.

other Messages that don’t fall into any of the above categories.

Table 2-1 Categories of the Utility Messages

The category of messages that can be displayed can usually be determined by their syntax.

Access Message Syntax

The syntax of an access message is:

utility: access_message [message]

where utility is the name of the utility program sending the message, access_message is the phrase that describes the type of access problem, and message indicates one of the following:

• a supplementary message could appear (UNIX only)

• the message “unknown cause” is displayed

• the name of the file being accessed is displayed

Some of these messages are written to the index log file, and some are displayed at the terminal. Access problems can be caused either by a lack of temporary working space, or by file system permission denial.

Usage Message Syntax

If a usage message is displayed, the syntax of the command you entered is incorrect. The message indicates the correct syntax as follows:

utility: USAGE: utility tablename [option_list]

where utility is the name of the utility program sending the message, and tablename [option_list] represents a table name, followed by the parameters available for the utility program. If you aren’t sure of the command syntax of a utility program, you can obtain this usage message for most utilities by entering just the utility name.


Information Message Syntax

Information messages are either standard index log file messages, or they are for information only. The latter are either written to the index log file, or they are displayed at the terminal with the following syntax:

utility: message

where utility is the name of the utility program emitting the message, and message represents the information message supplied by the utility. The message can indicate that an error has occurred, in which case additional debugging information could also be given.

Other Message Syntax

The syntax and meaning of these messages will vary. They don’t fall into any of the previous categories.
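The message syntaxes described in this section can be recognized with a short parser. The following Python sketch is a minimal illustration, assuming one message per line with an optional numeric identifier followed by an optional utility name. The function name and the returned field names are hypothetical, and the sketch identifies only usage messages by their USAGE: keyword. It does not try to tell access messages from information messages.

```python
import re

# Optional "number:" prefix, optional "utility:" prefix, then the message text.
LINE_RE = re.compile(
    r"^(?:(?P<number>\d+)\s*:\s*)?"
    r"(?:(?P<utility>[A-Za-z]\w*)\s*:\s*)?"
    r"(?P<text>.*)$"
)


def parse_message(line):
    """Split a message line into number, utility name, text, and a usage flag."""
    match = LINE_RE.match(line.strip())
    number = match.group("number")
    text = match.group("text")
    return {
        "number": int(number) if number is not None else None,
        "utility": match.group("utility"),
        "text": text,
        "is_usage": text.startswith("USAGE:"),
    }
```

A message without a utility prefix, such as message 86, is returned with the utility set to None and the whole remainder as its text.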

List of Messages

The following conventions are used to represent placeholders for the names and numbers that are displayed in the messages:

Table 2-2 Message Conventions

Placeholder Meaning

number any decimal number not representing a cid or vcc

cid FT_CID column value

cname table name

vcc text character count (for Fulcrum use only)

filename a filename or pathname

message a portion of a message which is dynamically provided by the utility according to the context

term a string of characters comprising a search term
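The placeholders in Table 2-2 can be used to turn a message template into a pattern for searching an index log file. The following Python sketch is one possible approach and rests on several assumptions: every word in a template that matches a placeholder name is treated as a placeholder, cid and vcc values are decimal digits, and filenames and terms contain no spaces. The function name is hypothetical. Because a placeholder can appear more than once in a template, each capture group is numbered (filename1, filename2, and so on).

```python
import re

# Assumed value shapes for each placeholder in Table 2-2.
PLACEHOLDER_PATTERNS = {
    "number": r"-?\d+",
    "cid": r"\d+",
    "vcc": r"\d+",
    "cname": r"\S+",
    "filename": r"\S+",
    "term": r"\S+",
    "message": r".+",
}


def template_to_regex(template):
    """Compile a message template into a regex with numbered named groups."""
    parts = []
    seen = {}
    # Splitting on (\w+) keeps both the words and the text between them.
    for token in re.split(r"(\w+)", template):
        if token in PLACEHOLDER_PATTERNS:
            seen[token] = seen.get(token, 0) + 1
            name = f"{token}{seen[token]}"
            parts.append(f"(?P<{name}>{PLACEHOLDER_PATTERNS[token]})")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))
```

For example, the template for message 2 yields a pattern whose filename1 and cid1 groups hold the document name and catalog id from a matching log line.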


In some cases, the messages reference an internal error code. When you report a problem to the Fulcrum Customer Support Team, inform support personnel of the entire error message, and any internal error codes.

2: SearchServer: removed from catalog filename (catalog id cid)

A row has been deleted by a VALIDATE INDEX statement because the external document could not be opened. This message is written to the log file of the table.

8: starting indexing at date and time -- version identification

Indexing started at the indicated date and time. The VALIDATE INDEX statement writes this message to the log file of the table at the start of indexing. The version identification applied to the Fulcrum Ful/Text software is used by SearchServer.

9: ending indexing at date and time Indexing was completed at the indicated date and time. The VALIDATE INDEX statement writes this message to the log file of the table when indexing is completed.

13: utility: failed to seek in a file: filename The utility could not be repositioned to a specific location in the indicated file. This can be caused by an internal program error or a custom text reader problem.

14: utility: failed to read from a file: filename The utility encountered a failure while reading from the indicated file. This rare occurrence could be the result of a hardware error or a custom text reader problem.

15: utility: can’t create filename The utility could not create a new file with the indicated name. Check that the filename is valid for your operating system, and that you have write permission for the directory in which the file is to be created.

16: utility: can’t open filename The utility could not open the indicated file. This error is often the result of a typographical error, so be sure to check the filename.

17: utility: can’t read filename


The utility encountered a failure while reading from the indicated file. This rare occurrence could be the result of a hardware error, or a filter internal I/O error. Previous indexing data for this document is not affected.

18: utility: can’t write filename The utility could not open the indicated file in write mode. Check the filename and ensure that write permission exists for the file.

19: utility: insufficient memory The utility could not obtain enough memory to carry out its work. Try closing some applications or terminating some processes, and retry the command.

21: utility: function message set fterrno to number An error has occurred for which there is no specific message. This can be caused by a syntax error in the configuration file. Contact the Fulcrum Customer Support Team.

22: utility: failed to open a file for output: filename The utility could not open the indicated file in write mode. Check the filename and ensure that write permission exists for the file.

23: utility: failed to open a file for input: filename The utility could not open the indicated file in read mode. Check the filename and ensure that read permission exists for the file.

24: utility: failed to write to a file: filename The utility encountered a failure while writing to the indicated file. The possible causes include running out of file space, or a hardware error.
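Many of the file errors above (messages 15 through 24) come down to a missing file, a missing permission, or a lack of disk space, and these conditions can be checked before a utility is run. The following Python sketch uses only standard library calls. The function name, its parameters, and the wording of the returned problems are hypothetical, and the free-space threshold is supplied by the caller because this guide does not state how much space a utility needs.

```python
import os
import shutil


def preflight(path, need_write=False, min_free_bytes=0):
    """Return a list of problems found for path; an empty list means none were found."""
    problems = []
    directory = os.path.dirname(os.path.abspath(path))
    if need_write:
        if os.path.exists(path):
            if not os.access(path, os.W_OK):
                problems.append("no write permission for file")
        elif not os.access(directory, os.W_OK):
            # Creating a new file needs write permission on its directory.
            problems.append("no write permission for directory")
        if os.path.isdir(directory) and shutil.disk_usage(directory).free < min_free_bytes:
            problems.append("not enough free space")
    else:
        if not os.path.exists(path):
            problems.append("file not found")
        elif not os.access(path, os.R_OK):
            problems.append("no read permission for file")
    return problems
```

A check of this kind cannot detect hardware errors or text reader problems, which the descriptions above also list as possible causes.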

25: utility: can’t get required values from configuration file

The configuration file was damaged. You might have damaged it while editing. Correct any invalid edits.

27: SearchServer: stop file specification has been modified; use ftindex -a


A VALIDATE INDEX statement was invoked for a table whose stop file specification (STP=, STF=, or OPT: entry in the configuration file, or the stop file itself) has been modified since the last indexing operation. The message is written to the index log file. Run a VALIDATE INDEX statement that specifies the ABANDON parameter to re-index that table with the new stop words.

28: SearchServer: collection cname full pathname too long to process directory filename

Too many nested directories. Check the contents of the directories, and organize the directories and/or files for shorter pathnames. The maximum number of characters permitted in the path is environment dependent.
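Before indexing, a directory tree can be scanned for pathnames that are likely to be too long. The following Python sketch is a minimal illustration. Because the maximum path length is environment dependent, the limit is a parameter that the caller supplies, and the function name is hypothetical.

```python
import os


def paths_longer_than(root, max_length):
    """Yield every file or directory path under root longer than max_length characters."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) > max_length:
                yield full
```

The paths that are returned are candidates for reorganizing into shallower directories or shorter names.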

30: SearchServer: index files corrupted: rebuild index files using ftindex -a

The index files could not be read. Run a VALIDATE INDEX statement that specifies the ABANDON parameter to rebuild the index files.

31: utility: catalog record corrupted: cid The row named by cid (catalog identifier) has been damaged. Try to recover good catalog data to a text file using ftcout. You might have to reorganize the table from the text file using ftcin with the -n and -p parameters. Alternatively, rebuild the table or restore it from a system backup.

32: SearchServer: expected number wordlist records but found number An internal program error has occurred during indexing. Contact the Fulcrum Customer Support Team.

33: ftcin: catalog field data must be supplied An attempt was made to undelete a deleted row without supplying any new column data. Select a different catalog id, or add some column data.

37: utility: not enough space in file system for message The utility could not complete the indicated operation due to insufficient disk space. Make more space available and retry the operation.

38: ftmunld: access to filename is required


An attempt to access the default minifile (FULTEXT.FTC) failed. Re-execute the ftmunld utility by specifying a full file path for the minifile with the -m option.

39: ftcin: filename not cataloged: message The ftcin utility could not add the specified file to the table for one of the following reasons:

•permission denied

•file not found

•already in catalog

•conflicts with catalog id cid

•invalid text reader

Correct the text-catalog file and try ftcin again.

48: ftmunld: no such file: filename The minifile specified with the -m option could not be accessed. Verify that the minifile exists and can be accessed. Normally, the minifile used is called FULTEXT.FTC. Re-execute ftmunld by specifying the full file path for the minifile.

51: ftlock: collection cname currently being indexed An attempt was made to unlock the table, and the utility program detects that the table is being indexed. Wait for indexing to complete.

Note: Terminating indexing by using operating system services (for example, kill in UNIX) can render the table unsearchable until it has been completely re-indexed (using a VALIDATE INDEX statement that specifies the ABANDON parameter). Whenever possible, use application-provided methods to terminate indexing normally.

53: utility: error in stop file filename This error was probably caused by a syntax error in the stop file. Check the stop file.

55: SearchServer (function): words out of sequence -- message

An internal program error has occurred. Contact the Fulcrum Customer Support Team.


56: SearchServer (function): CID cid after cid; term term An internal program error has occurred. Contact the Fulcrum Customer Support Team.

57: SearchServer (function): VCC vcc after vcc; CID: cid term: term

An internal program error has occurred. Contact the Fulcrum Customer Support Team.

58: SearchServer: failed to move filename to filename The process by which .DUP and .RUP replace the index files .DCT and .REF has failed. Provide sufficient disk space for the operation.

60: ftcin: date field at input line number ignored for catalog id cid

This is a warning from the ftcin utility.

66: SearchServer: can’t open filename filter; catalog record cid deleted

The specified file has been deleted since indexing started, or the file format is incompatible with the text reader list specified. If the file still exists, correct its format and execute another INSERT statement to add it to the table again.

67: SearchServer: can’t open filename filter; catalog record cid ignored

An operating system file or directory that was renamed or deleted during program execution could not be accessed. Recreate the specified file or directory, or delete this row of the table and re-execute the VALIDATE INDEX statement.

68: SearchServer: can’t access filename filter; catalog record cid ignored

The file specified could not be read for indexing. Change the permissions on the file or delete this row, and re-execute the VALIDATE INDEX statement.

69: SearchServer: can’t get dictionary’s status to build index

An internal program error has occurred. Contact the Fulcrum Customer Support Team.

70: ftserver: no network filters defined


The ftserver utility could not proceed because the network connectors have not been defined. Instructions for configuring distributed operations are provided in Fulcrum SearchServer Getting Started for your platform.

71: utility: illegal option message The specified option wasn’t valid for this utility.

72: utility: unknown conversion character message An internal program error has occurred. Contact the Fulcrum Customer Support Team.

73: utility: can’t find configuration file for cname SearchServer could not find the configuration file for the specified table. Check the table name for misspelling and try again. Otherwise, verify that the table exists and that it has been created correctly. For more information about the FULSEARCH data source, see Fulcrum SearchServer Getting Started for your platform.

74: SearchServer: concordance network failed: doc cid fterrno number

An internal program error has occurred. Contact the Fulcrum Customer Support Team.

77: SearchServer: word too long in text; rest of document cid not indexed

The maximum word size has been exceeded in the document with the specified cid. The document must be modified before it can be fully indexed. You can either split the word, or insert control sequences to disable indexing. For more information about the maximum word size, see Fulcrum SearchServer Getting Started for your platform.

78: SearchServer: word too long in catalog; rest of catalog record cid not indexed

The maximum word size has been exceeded in the catalog data. Use the ftcin and ftcout utilities to edit the data in the specified row. You can either split the word, or insert control sequences to disable indexing. For more information about the maximum word size, see Fulcrum SearchServer Getting Started for your platform.
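Overlong words such as those reported by messages 77 and 78 can be located in a document or in exported catalog data before indexing. The following Python sketch is illustrative only. It assumes that a word is any run of characters that are not white space, which might differ from how SearchServer delimits words, and it takes the maximum word size as a parameter because the value is documented elsewhere. The function name is hypothetical.

```python
import re


def find_long_words(text, max_word_size):
    """Return (offset, word) pairs for every word longer than max_word_size."""
    return [
        (match.start(), match.group())
        for match in re.finditer(r"\S+", text)
        if len(match.group()) > max_word_size
    ]
```

Each offset is the character position at which the overlong word begins, which helps to locate the place where the word must be split.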

79: SearchServer: error reinitializing cname.dyx fterrno number


During the merge phase of indexing an immediate table, an error occurred initializing the immediate index. See the INCLUDE file (FTERROR.H) for a description of the fterrno values. Recreate the table using the ftcout and ftcin utilities to restore the catalog data, and report the circumstances to the Fulcrum Customer Support Team.

80: SearchServer: buffer too large (number) The space requested by the BUFFER parameter of the VALIDATE INDEX statement could not be allocated. Try a smaller buffer size.

82: SearchServer: warning failed to unlink filename errno number

A previously allocated temporary file could not be deleted (unlinked). Ensure that a temporary file hasn’t been left on the disk. To find this temporary file, examine the current working directory and the directories pointed to by FULTEMP.

85: SearchServer: seq err number offset: number current: term cid vcc previous: term cid vcc

An internal program error has occurred. Contact the Fulcrum Customer Support Team.

86: invalid filter list

An attempt was made to open a file using an invalid text reader list. Check the text reader name specified. It is also possible that the file and text reader are mismatched (for example, a WordPerfect document and a Microsoft Word text reader).

87: SearchServer: bad format number at offset numbers term: term CID: cid VCC: vcc

Immediate index was corrupted. Run a VALIDATE INDEX statement that specifies the ABANDON parameter, and contact the Fulcrum Customer Support Team.

88: SearchServer: bad format number at offset numbers term: term CID: cid

Immediate index was corrupted, bad FT_CID (cid), or indexing information encountered. Run a VALIDATE INDEX statement that specifies the ABANDON parameter, and contact the Fulcrum Customer Support Team.

89: utility: CID cid busy

One of the utilities (ftcout, ftcin) encountered a busy catalog record during processing. This means that the table doesn’t have the NOLOCKING table parameter, and another program was reading that record. The utility program should be re-run for that record.

90: ft: invalid or incomplete termcap entry for terminal type “type”

The srchdoc termcap file was corrupted. Correct the entry for your terminal, or use a different terminal type. ft was renamed to srchdoc for SearchServer use.

92: ftlock: high-integrity mode is not supported

There is no support for catalog locking/high-integrity mode on this machine. This error is reported if the -h flag is selected in such an environment. Run ftlock without the -h flag.

104: utility: exclusive write access to existing catalog and index files is required

An attempt was made to remove or recreate a table for which some of the files have the read-only attribute set. Use the attrib (in MS-DOS) or chmod (in UNIX) command to set the “write” permission on all table files.

106: ftlin: usage: ftlin library-name [-i in-files] [-s [-o text-catalog]]

Usage message for the ftlin utility.

108: ftmload: ‘name’ is not an editable object

An attempt was made to modify a non-variable part of the SearchServer message file.

109: SearchServer: read and write access to filename is required

The directory for the index files could not be accessed.

111: utility: configuration file for collection cname too big for buffer

The configuration file for the table was too long. Edit the configuration file to remove unnecessary statements, and/or shorten explicit paths.

112: utility: syntax error in configuration file or stop file for collection cname

The configuration file or stop file was damaged. You might have damaged it while editing to enable a feature of SearchServer. Correct any invalid edits.

113: utility: catalog or index files not defined for collection cname

These are messages related to errors in the configuration file or stop file. If the file has been edited, check it for errors and correct them.

116: ftlout: usage: ftlout library-name

Usage message for the ftlout utility.

117: ftlock: usage: ftlock cname... ([-l] [-u] [-h | -c] [-r]) | [-n]

Usage message for the ftlock utility.

120: ftlock: collection cname already locked

The table was already locked. This is an information message.

122: ftlock: collection cname status = unlocked

The cname table can be indexed or dropped.

123: ftlock: collection cname status = locked

The cname table has been locked for one of the following reasons:

•The table has been locked to prevent indexing or being dropped.

•The table is currently being indexed.

•The last indexing operation on this table failed.

124: ft: can’t find message file filename

The standard srchdoc message file FULTEXT.MSG could not be found. Use srchdoc only for viewing the SearchServer documentation (STDOCS). ft was renamed to srchdoc for SearchServer use.

126: utility: syntax error in configuration file %s

A syntax error has been found in the configuration file. The error was probably introduced while editing the file. Check the file for syntax errors, and make the necessary corrections.

127: utility: can’t access configuration file for collection cname: message

The access permissions are set incorrectly on the configuration file for the table. Correct the permissions using the attrib (in MS-DOS) or chmod (in UNIX) command.

128: utility: can’t initialize configuration file filename

The configuration file has been found, but it can’t be read.

129: ft: no message file available for collection cname

srchdoc could not find the STDOCS.MSG file. In distributed environments, the message file is expected to reside on the client node. ft was renamed to srchdoc for SearchServer use.

130: utility: can’t open message file filename

A message file was found, but could not be opened. Check the access permission on the message file, and modify it so that SearchDoc can open the file.

131: ft: error number at line number in message file filename

An invalid change was made to the message file. Restore the original message file, and use srchdoc only for viewing the SearchServer documentation (STDOCS). ft was renamed to srchdoc for SearchServer use.

132: utility: can’t access file filename for collection cname: message

The file or table named either could not be found, or could not be accessed because permission has been denied, or the file isn’t readable. Check file permissions, and make sure the table has been created.

133: ft: error number in template number in message file filename

An invalid change was made to the message file. Restore the original message file, and use srchdoc only for viewing the SearchServer documentation (STDOCS). ft was renamed to srchdoc for SearchServer use.

136: utility: collection cname locked: requested action not performed

The specified table is locked and the procedure invoked wasn’t performed. Unlock the table using the UNPROTECT statement and run the procedure.

140: utility: write access to catalog and index files required for collection cname

Write access for table files was denied. The permission might have been changed, or the files might have been created by a different user. Correct the permissions using the attrib (in MS-DOS) or chmod (in UNIX) command, or correct the ownership using the chown command (in UNIX). You can also remove and re-create the table.
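Before you remove or re-create a table, it can help to see which of its files are missing or not writable. The following is a minimal sketch only, not part of SearchServer: the function name check_table_files is hypothetical, and you must supply the list of table file paths yourself, because the full set of files that make up a table is not listed here.

```python
import os


def check_table_files(paths):
    """Split the given paths into missing files and files without write access.

    Hypothetical helper: 'paths' is whatever list of catalog and index
    files you believe belongs to the table.
    """
    missing = [p for p in paths if not os.path.exists(p)]
    # os.access reports whether the current user may write to the file.
    read_only = [
        p for p in paths
        if os.path.exists(p) and not os.access(p, os.W_OK)
    ]
    return missing, read_only
```

Any path returned in the second list is a candidate for the attrib, chmod, or chown correction described above.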

141: SearchServer: can’t find catalog for collection cname

Either the catalog file has been deleted, or there are errors in the configuration file. Delete the remaining files and start again, or correct the configuration file.

143: utility: file size limit exceeded while writing message

Your operating system imposed a quota on the file size, or disk usage was exceeded. Delete any unnecessary files, or ask your System Administrator to increase your quota.

144: SearchServer: no documents require indexing

The VALIDATEINDEX statement found no rows that required indexing. This is an information message only.

146: ftcout: usage: ftcout cname [-r startcid:endcid] [-l] [-x] [-t text catalog] [-e]

Usage message for the ftcout utility.

147: utility: inconsistent or corrupt catalog for collection cname

The catalog file was damaged. Remove and re-create the table. If the cause of the damage is known (for example, your system crashes), restore the system from your backup copies. Otherwise, contact the Fulcrum Customer Support Team.

149: ftcin: syntax error in input at line number

The text-catalog contained syntax errors. Edit the text-catalog.

153: utility: catalog record overflow

There was too much catalog information. Remove some of the column data to be cataloged.

154: utility: attempt to exceed maximum catalog size for collection cname

An attempt was made to access a row that doesn’t exist, or an attempt was made to write a new record beyond the permissible range. The ftcout utility reports this error if requested to output a row that is beyond the end of the table. You can obtain the maximum catalog identification number for a table by running ftlock with the -r parameter.

The ftcin utility reports this error if the catalog identification number (FT_CID) in the text catalog exceeds the maximum allowable limit (16,777,215).
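The two conditions described for message 154 can be checked before you run ftcout or ftcin. The sketch below is illustrative only: check_cid is a hypothetical helper, and the highest CID in use is a value you would read from the ftlock -r report yourself. The constant is the limit quoted above, which equals 2**24 - 1.

```python
MAX_FT_CID = 16_777_215  # maximum allowable FT_CID quoted for message 154


def check_cid(cid, highest_cid_in_use=None):
    """Return None if the CID looks acceptable, else a short reason string.

    Hypothetical helper: 'highest_cid_in_use' is optional and comes from
    the "highest CID in use" entry of the ftlock -r report.
    """
    if cid > MAX_FT_CID:
        return "exceeds maximum FT_CID"
    if highest_cid_in_use is not None and cid > highest_cid_in_use:
        return "beyond highest CID in use"
    return None
```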

155: utility: terminated by user

The specified utility program has been terminated by an interrupt or quit signal from the keyboard.

156: size of catalog files: number

This information message is written to the log file when indexing is completed and any rows have been added, deleted, or re-indexed.

157: size of index files: number

This information message is written to the log file when indexing is completed and any rows have been added, deleted, or re-indexed.

159: utility: missing argument for option message

The specified option letter requires that an argument follow it, but none was given. Re-run the utility with the correct argument list.

165: SearchServer: CID:cid FTCDYNDX not set

An inconsistency was found in the immediate index during the merge phase of indexing. Rebuild the table using the ftcout and ftcin utilities. Report the problem to the Fulcrum Customer Support Team.

166: SearchServer: word length exceeded maximum value

The dictionary file is corrupt or a table containing a long word was indexed on a system with the standard maximum word length (for example, 250 in UNIX) and then re-indexed on another system with a smaller maximum permissible word length (80 under MS-DOS). Break the long word into two words, or move the table back to a system with the standard maximum word length and re-index the table.

167: ftlock: warning: collection cname is not searchable

The table isn’t searchable because it has never been indexed, has no rows, or the index files are missing or damaged. Try running a VALIDATE INDEX statement that specifies the ABANDON parameter.

169: ftcin: indexing in progress for filename (catalog id cid)

The ftcin utility has detected a row with the status FTCBUSY, meaning that the row is currently being indexed. If the row has no external document, the filename displayed is two double quotation marks (““).

170: SearchServer: collection cname was created with newer version of Ful/Text

An incompatible version of SearchServer was used with a table. If possible, use the version of software that was used to create the table.

171: ftcin: filename not cataloged: ftcsdup() found busy records

A file was found that was suspected of being new to the table (not previously cataloged), but its filename could not be checked against at least one busy row. This means that the table is in high-integrity mode and another program was reading that record. Therefore, no action was taken to catalog that file. Re-run the ftcin utility.

172: ftlock: collection: cname -or- ftlock: meta-collection: cname

The ftlock utility displays this message at the head of the report produced by the -r option. The term “collection” refers to your table, and “meta-collection” refers to a SearchServer view. The report can contain any of the following messages:

Note: Entries separated by slashes are alternatives—only one entry will occur.

•location: filename

•index type: immediate (index block size number)

•index type: periodic

•mode: high/low integrity

•status: locked/unlocked

•size: small/medium/large

•term order: word/zone

•zone size: number (bytes)

•bin-size: number (bytes)

•entry-bins: number

•max fields: number

•FTCULMT: on

•indexed by: release

•highest CID in use: cid

•directory: filename

•dictionary statistics: absent or incomplete

•collection contains a DTD entry

•number of components: number
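If you capture the ftlock -r report in a file, the entries listed above can be collected into a lookup table. This is a rough sketch under a stated assumption: each entry is taken to appear on its own line as "name: value", which the list suggests but this guide does not specify. The function name parse_ftlock_report is hypothetical.

```python
def parse_ftlock_report(report_text):
    """Split a captured report into name/value fields and free-form notes.

    Assumption: one entry per line, with the first colon separating the
    entry name from its value. Lines without a colon (for example,
    "collection contains a DTD entry") are kept as notes.
    """
    fields = {}
    notes = []
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        name, separator, value = line.partition(":")
        if separator:
            fields[name.strip()] = value.strip()
        else:
            notes.append(line)
    return fields, notes
```

Because only the first colon is used as the separator, a value that itself contains a colon (such as an MS-DOS drive letter in a location entry) is kept intact.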

194: ft: can’t find “termcap” file

The termcap file could not be found. Ensure that the termcap file exists in one of the directories specified in the FULSEARCH parameter. For more information about the FULSEARCH parameter, see Fulcrum SearchServer Getting Started for your platform. ft was renamed to srchdoc for SearchServer use.

195: ft: can’t open termcap file “filename”

An internal error occurred, or the file could not be opened or read. Verify that the file is readable; if it is readable and the error persists, contact the Fulcrum Customer Support Team. ft was renamed to srchdoc for SearchServer use.

196: ft: no entry in “termcap” file for terminal type “message”

The SRCHDOC user interface could not find an entry for the given terminal type in either the SearchServer termcap file, or the system termcap file. Create an entry for your terminal in the SearchServer termcap file, or use a different terminal type. ft was renamed to srchdoc for SearchServer use.

203: SearchServer: skipping filename (catalog id cid)

The VALIDATEINDEX statement could not process the designated row at this time. The row is left in the table and will be checked again by the next VALIDATEINDEX statement.

204: number documents

This message reports on the number of rows that were indexed by the VALIDATE INDEX statement currently executed.

205: number records were deleted

This message reports on the number of rows that were deleted by the VALIDATEINDEX statement currently executed.

206: number records were skipped

This message reports on the number of rows that could not be indexed, but were left in the table by the VALIDATEINDEX statement currently executed. These rows will be indexed when the next VALIDATEINDEX statement is executed.

207: number words processed

This message reports on the number of words (indexable terms) contained in the rows indexed by the VALIDATEINDEX statement currently executed.

208: SearchServer: insufficient memory to index collection (number)

While indexing under MS-DOS, SearchServer discovered that there wasn’t enough physical memory on the machine to continue processing. The number in parentheses represents the amount of memory required to complete this operation.

209: SearchServer: warning: insufficient memory causing poor performance (number)

Under MS-DOS, indexing would run faster if there were more physical memory. The number in parentheses represents the amount of memory recommended for this operation.

210: fthmake: variant rules file too large, not all substitutions read

There were too many entries in the variant rules file. Reduce the number and/or complexity of this file.

211: ftcin usage: ftcin cname [-n] [-p] [-q] [-m|-c] [-v] [-x] [-t text-catalog] [-i inpath] [-o outpath] [-b bufsize]

Usage message for the ftcin utility.

223: SearchServer: failed to update filename as obsolete

An unsuccessful attempt has been made to write to the dictionary. Once users have closed the dictionary, another attempt will be made to delete it as part of the index update process. Contact the Fulcrum Customer Support Team.

234: fthtest: usage: fthtest: thesaurus-file [term...] [-q case_norm] -or- fthtest: term1 [-h thesaurus-file] [-l variant-file] [-c cname] [-t outfile] [-q case_norm]

Usage message for the fthtest utility.

235: SearchServer: indexing performance degraded; users have collection open for write

Under MS-DOS, the indexing software attempted to open the table for exclusive use, but failed. The table was then opened without exclusive use, which can cause poor performance. To obtain optimum indexing performance, prevent access to the table (for example, using the ftcin utility) when indexing is to be performed.

236: term: term

A message from the fthtest utility indicating the term from the command line that is being looked for in the thesaurus.

237: enter term:

A message from the fthtest utility prompting the user to enter a term (or cancel to exit).

238: synonym:

A heading from the fthtest utility followed by a list of synonyms for the specified term.

239: (synonym empty)

A message from the fthtest utility indicating that the thesaurus contains a synonym rule for the given term that has an empty synonym list.

240: suffix:

A heading from the fthtest utility followed by a list of words with alternative suffixes derived from the specified term.

241: (suffix empty)

A message from the fthtest utility indicating that the thesaurus contains a suffix rule that matches the given term, but which generates no alternative words.

242: “term” not found

A message from the fthtest utility indicating that the term entered isn’t found, and isn’t matched by any suffix rule in the named thesaurus.

245: fthmake: words at end of file filename

There was a syntax error in the thesaurus source file (such as a missing semicolon). Correct the thesaurus source file.

246: fthmake: need right terminator in filename

There was a syntax error in the thesaurus source file (such as a missing semicolon). Correct the thesaurus source file syntax.

247: fthmake: suffix root is in twice in filename

The same suffix rule appeared more than once in the thesaurus source file being used. Remove the redundant suffix rule from the thesaurus source file.

248: fthmake: program failure

An internal program error has occurred. Contact the Fulcrum Customer Support Team.

249: fthtest: term1 [-h thesaurus-file] [-l variant-file] [-c cname] [-t outfile] [-q case_norm]

Second line of usage message for the fthtest utility.

260: recommended: make more disk space available and retry indexing

The indexing software writes this message to the log file when indexing fails due to a lack of disk space.

261: ftlock: index files unavailable; use ftindex -a cname

The dictionary (.DCT), reference (.REF), and/or differential (.DYX) files are missing. Rerun a VALIDATE INDEX statement that specifies the ABANDON parameter.

262: recommended: make more disk space available; rename .dup,.rup to .dct,.ref

The indexing software writes this message to the log file as a suggestion to overcome space problems. It might be appropriate to move some of the files to another file system.

263: internal software error -- please contact Customer Support

An unexpected condition occurred during indexing or executing a utility. Please contact the Fulcrum Customer Support Team.

266: SearchServer: insufficient disk space for CID:cid size:number (bytes)

SearchServer has determined that the current document can’t be indexed due to space limitations. Make more disk space available, or move the index or temporary files for indexing to another location and then retry indexing.

267: SearchServer: insufficient disk space to continue indexing

SearchServer has determined that there is insufficient disk space for indexing prior to processing any documents. Make more disk space available, or move the index or temporary files for indexing to another location and then retry indexing.

278: syntax error in stop file at line number: got message, message

The ftlock utility displays this message on the screen and SearchServer writes it to a log file when it encounters a syntax error in the stop file while accessing a table. The first “message” indicates the stop file where the error was detected, and the second is a description of what was expected at that point. Edit the stop file to correct the error indicated by the message.

280: utility: index files uninitialized; use ftindex -y

The stop file hasn’t been compiled into the dictionary for an immediate table. Execute a VALIDATEINDEX statement that specifies the ABANDON parameter on the table.

286: SearchServer: syntax error in library expansion stream of filename

The library expansion text reader didn’t supply data in the expected syntax. This message indicates a problem with the library text reader.

287: SearchServer: missing filter list; document not catalogued

The expansion library text reader didn’t supply a text reader list. This message indicates a problem with the library text reader.

288: SearchServer: invalid field id number; field data ignored

The library text reader has attempted to write data to an invalid column. This message indicates a problem with the library text reader.

290: stop file contains too many stop words

The number of stop words is limited to 1024. Reduce the number of stop words in the stop file.

291: stop file is too large

The size of the stop file exceeded 10,000 characters. Reduce the size of the stop file.

292: duplicate stop word (term)

The stop word term appeared more than once in the stop file. Remove duplicate entries for the term.

293: too many duplicate stop words

More than ten stop words were duplicated. Remove all duplicate terms from the stop word file.
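The limits behind messages 290 through 293 can be checked on a plain word list before you edit the stop file. The sketch below is not a stop file parser: the actual stop file syntax is not described in this appendix, so the input is assumed to be stop words separated by white space, and duplicates are compared exactly as typed. The function name lint_stop_words is hypothetical.

```python
from collections import Counter

MAX_STOP_WORDS = 1024          # limit from message 290
MAX_STOP_FILE_CHARS = 10_000   # limit from message 291
MAX_DUPLICATED_WORDS = 10      # message 293: "more than ten"


def lint_stop_words(stop_text):
    """Return a list of problems found in a whitespace-separated word list."""
    words = stop_text.split()  # assumption: words are separated by white space
    problems = []
    if len(stop_text) > MAX_STOP_FILE_CHARS:
        problems.append("stop file is too large")
    if len(words) > MAX_STOP_WORDS:
        problems.append("too many stop words")
    # A word counts as duplicated if it appears more than once.
    duplicated = sorted(w for w, count in Counter(words).items() if count > 1)
    for word in duplicated:
        problems.append(f"duplicate stop word ({word})")
    if len(duplicated) > MAX_DUPLICATED_WORDS:
        problems.append("too many duplicate stop words")
    return problems
```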

294: utility: no support for high-integrity collections on NFS

Some versions of Sun NFS protocol or transparent remote file access don’t support record locking for remote files. The message indicates that the attempted operation isn’t supported (for example, immediate indexing of a high-integrity table). Try to get the operating system upgraded to a more recent version, or use distributed operations.

295: ftlock: low-integrity not permitted for immediate collections

The ftlock utility was asked to change an immediate table to low-integrity mode. This operation isn’t permitted.

296: SearchServer: ulimit value too small to continue indexing

The value of ulimit(1), the process file size limit, was less than 768 kilobytes. Increase the ulimit value and re-execute the VALIDATEINDEX statement.
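On a UNIX system you can compare the current file size limit with the 768 kilobyte threshold before indexing. This is a sketch under two assumptions: a kilobyte is taken to be 1024 bytes, and the limit SearchServer inspects is the same one the operating system reports as RLIMIT_FSIZE. The function names are hypothetical, and the standard resource module used here is available on UNIX only.

```python
import resource

MIN_FILE_SIZE_BYTES = 768 * 1024  # 768 kilobytes, from message 296


def file_size_limit_ok(soft_limit):
    """True if the limit (in bytes) is unlimited or at least 768 KB."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= MIN_FILE_SIZE_BYTES


def current_limit_ok():
    """Check the soft file size limit of the current process."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_FSIZE)
    return file_size_limit_ok(soft_limit)
```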

298: fthmake: variant rules file does not conform to required syntax

The variant rules file wasn’t constructed properly. Edit the variant rules file.

301: utility: network software not present for node/filter

The required communications software wasn’t loaded.

302: utility: server not responding @node/filter

No ftserver process was assigned to this network connector at the remote node.

303: utility: unrecognized node node

The network software didn’t recognize the node name.

304: utility: undefined network filter filter

The network connector identification string was incorrect.

305: utility: network error number @node/filter

A physical or logical disruption in network communications has occurred (for example, a broken cable or a data overrun).

306: utility: incompatible server software release @node/filter

The remote ftserver utility wasn’t compatible with the version of Ful/Text used by SearchServer.

307: ftlock: write access to filename required to search collection cname

File access permissions are incorrect on the designated file. Use the chmod command to grant write access to the file.

309: SearchServer: context map failure failure number catalog record cid not updated

The indexing software has detected either:

•Too many document markers. The maximum row size has been exceeded, therefore the catalog should be rebuilt with an increased binsize and/or entry-bins value. You may also need to reduce the number of markers placed in the text of the document.

•An inconsistency in the number of values provided between one marker and another. Rewrite the text reader that inserts document markers into the text so that consistent values are provided between markers.

313: ftserver: usage: ftserver [-w] [-q] [-l log [-r]] [-f nfid] [-p FULPATH] [-n | -u username] [-t FULTEMP] [-x PATH] [-c curdir] [-m FULCREATE] [-z FULSEARCH] -or- ftserver -a argsfile

Usage message for the ftserver program.

320: utility: converting catalog to new format

Produced by the ftcin utility and the VALIDATE INDEX statement when catalog conversion is required. This doesn’t occur for tables created by SearchServer.

321: ftmload: usage: ftmload sourcefile [-m minifile] [-f filter_list] [-v] [-e] [-o object_name]

Usage message for the ftmload utility.

322: utility: source filename: “filename”

This information message identifies the input file when the -v option is used by ftmload or ftmunld.

323: utility: target filename: “filename”

This information message identifies the output file when the -v option is used by ftmload or ftmunld.

324: utility: object name: “objectname”

This information message identifies the message object loaded when the -v option is used by ftmload or ftmunld.

325: ftmload: message #number is too long (max number characters)

The number of characters indicated in the ftmess utility message was too long to be compiled into FULTEXT.FTC. Reduce the number of characters in the message to the indicated length.

326: ftmunld: inconsistent or corrupt file filename

The minifile filename is corrupt or has an invalid “index object.” Restore FULTEXT.FTC from your backup files or the SearchServer distribution disks. If necessary, re-install any custom text readers.

327: utility: too many collections

Too many tables were specified on the command line.

337: SearchServer: indexing state number

An internal software error occurred during indexing. Contact the Fulcrum Customer Support Team.

338: utility: not supported for meta-collections

Any activity that writes to the catalog isn’t supported for views, or an attempt was made to access an invalid (nested) view.

342: ftcin: -i and -o parameters are not supported for remote collections

The ftcin utility reports this message when the -i and -o parameters (applicable to local tables only) are specified on the command line, but cname refers to a remote table.

343: fthmake: word too long, unable to compile thesaurus term

The thesaurus source file could not be processed because it contains a word that is longer than 250 characters (or 80 characters under MS-DOS). Edit the thesaurus file to truncate the long word(s), and re-run the fthmake utility.
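To locate the offending words before you re-run fthmake, you can scan the source text for words longer than the platform limit. The sketch below rests on an assumption: a word is treated as any run of characters without white space, which may not match how SearchServer itself delimits words. The function name find_long_words is hypothetical; the limits are the ones quoted above.

```python
import re

# Maximum word lengths quoted in this appendix.
MAX_WORD_LENGTH = {"unix": 250, "msdos": 80}


def find_long_words(text, platform="unix"):
    """Return (line_number, word) pairs for words over the platform limit."""
    limit = MAX_WORD_LENGTH[platform]
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Assumption: a "word" is a run of non-whitespace characters.
        for word in re.findall(r"\S+", line):
            if len(word) > limit:
                hits.append((line_number, word))
    return hits
```

The same scan can be applied to a document or to catalog data when messages 77, 78, or 166 are reported.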

351: ftlock: collection cname: invalid meta-collection component cname

The specified view could not be examined because one of its component tables could not be found. Make the specified component table accessible, or remove the applicable COL= entry from the view’s configuration file.

352: Unable to access message number number

The FULTEXT.FTC message file doesn’t contain the specified message. Restore the FULTEXT.FTC file from the distribution disks or backup, or contact the Fulcrum Customer Support Team.

353: Can’t access file “filename” message number number

The utility message file wasn’t accessible. Ensure that the FULTEXT.FTC file directory is specified in the FULSEARCH parameter. For more information about how to set this parameter, see Fulcrum SearchServer Getting Started for your platform.

354: No text for message number number

The utility message file contains no text for the specified message. Restore the FULTEXT.FTC file from the distribution disks or backup, or contact the Fulcrum Customer Support Team.

355: SearchServer: WARNING: disk-space checking is disabled because of UNC files

In the MS-DOS environment, disk space checking keys on the names of the table support files (except the .CFG, .STP, .MSG files), and on the directory names that can be specified in the WORKDIR and SW2 parameters. When these entities are named using LAN Manager UNC syntax, normal disk space checking can’t be performed, and message 355 is written to the log file.
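To find out which of your configured file and directory names would trigger message 355, you can test each name for UNC syntax. This is a minimal sketch: it treats any name that begins with two backslashes as a UNC name, and the function names are hypothetical. How SearchServer itself recognizes UNC names is not described here.

```python
def is_unc_path(path):
    # A UNC name has the form \\server\share\..., so it starts with
    # two backslash characters.
    return path.startswith("\\\\")


def unc_paths(paths):
    """Return only the names that use UNC syntax."""
    return [p for p in paths if is_unc_path(p)]
```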

360: Duplicate concordance term ignored: term CID cid position/vcc number

The indexing engine detected two instances of the same term with identical text character count information. Contact the Fulcrum Customer Support Team.

361: fthmake: usage: fthmake srcfile objfile [logfile |-][-f filter-list] [-l rules] [-q case_norm]

Usage message for the fthmake utility.

365: ftcin: it is not possible to write a DTD entry to a remote collection, collection contains a DTD entry

The ftcin utility cannot be used to load table data in a remote table (client/server). Load the table by running ftcin locally to the target table.

367: ftcout: error in DTD entry in collection cname

The schema for table cname was corrupt. Execute a CREATESCHEMA statement that specifies the REPLACE option. If this fails, remove and re-create the table.

370: ftlin>

The ftlin utility is prompting for the list of filenames to be written to the library file.

371: ftserver: unable to set priority to priority number, use 256, 512, 1024 or 768

This utility message occurs if:

•A priority could not be obtained. The priority can be obtained only in OS/2 operating system environments. Remove the -s option from the ftserver command.

•An invalid priority number was specified in the ftserver -s option. Re-execute ftserver using 256, 512, 1024, or 768.

•A valid priority was given, but an attempt to change to the priority failed.

388: SearchServer: catalog record overflow, old data left as indexed: CID cid

Column data generated by text readers was not written to the table. Generally, this will occur because the marker data in the document was too large for the table catalog record. Change the text reader so that it produces less column data for each row, or re-create the table with a larger maxbins value. Contact the Fulcrum Customer Support Team.

Note: Indexing will not fail, and old indexes will not be affected.

390: utility: FULTEMP is not a writable directory

Verify access to the directory specified in the FULTEMP parameter. For more information about the FULTEMP parameter, see Fulcrum SearchServer Getting Started for your platform.

391: ftserver: FULCREATE is not a writable directory

Verify access to the directory specified in the FULCREATE parameter. For more information about the FULCREATE parameter, see Fulcrum SearchServer Getting Started for your platform.

392: ftmunld: Unable to unload object ftmess

ftmunld could not load ftmess and etmess objects in a non-development environment.

401: ftcin: -t input filename is mandatory on Windows

In the Microsoft Windows environment, the catalog data must be supplied to ftcin using the -t option. Re-execute ftcin using the -t option.

403: utility: Unrecognized case normalization specification: case specification

The fthmake or fthtest utilities were supplied with an invalid case normalization specification using the -q option. Re-execute either utility by using one of the following case specifications:

• DEFAULT

• ASIAN

• NONE

• EUROPA3

404: utility: Unrecognized case normalization code: code

This utility message occurs if no table was specified and no case normalization specification was supplied using the -q option. An attempt was made to set the case normalization specification using a default case code, and this attempt failed. Re-execute fthmake or fthtest by specifying a table, or using the -q option.

407: ftserver: Failed to bind DLL specified on line line number of dynamic library table

A document text reader in the document text reader list could not be dynamically bound. To find out which document text reader could not be bound, obtain a readable copy of the dynamic library table. This table is extracted from FULTEXT.FTC using ftmunld in the following syntax:

ftmunld image.dlt -o fultext.eft

The readable copy of the dynamic library table is stored in IMAGE.DLT. Position to the line in IMAGE.DLT given in the error message. The third field of this line contains the name of the document text reader dynamic load library file. You should also verify the following:

1. Is the dynamic load library filename spelled properly?

2. Does the specified dynamic load library file exist?

3. Has the dynamic load library file been compiled and linked with all the required compiler and linker switches needed to make a dynamic loadable library?

4. Does the dynamic load library file reside in a directory where it will be found by running the SearchServer application?

408: ftserver: Can’t find function named on line line number of dynamic library table

A document text reader function entry point could not be found in its dynamic load library. To find out which document text reader entry point could not be found, obtain a readable copy of the dynamic library table. This table is extracted from FULTEXT.FTC using ftmunld in the following syntax:

ftmunld image.dlt -o fultext.eft

The readable copy of the dynamic library table is stored in IMAGE.DLT. Position to the line in IMAGE.DLT given in the error message. The third field of this line contains the name of the document text reader dynamic load library file. The next fields contain function entry points of the document text reader dynamic load library.

The following conditions should also be verified:

1. Are all function name fields spelled properly?

2. Do the named functions reside in the dynamic load library?

3. Does an obsolete dynamic load library with the same name as the one specified exist that does not contain all of the named functions?

4. Does the specified library reside in a directory that would be found instead of the intended directory?

409: ftserver: Syntax error on line line number of dynamic library table

A syntax error was detected in the dynamic library table. To view this error, obtain a readable copy of the dynamic library table. This table is extracted from FULTEXT.FTC using ftmunld in the following syntax:

ftmunld image.dlt -o fultext.eft

The readable copy of the dynamic library table is stored in IMAGE.DLT. Position to the line in IMAGE.DLT given in the error message. This line contains the syntax error.

The following conditions should also be verified:

1. Is the first field valid? The correct choices are: “FIDF” (for a text reader), “FITT” (for a character translation table), or “FINF” (for a network connector).

2. Are any of the fields on the line too long? The identifier field (second field) should not exceed 30 characters. The dynamic library name field should not exceed 255 characters (80 under Microsoft Windows 16-bit environments). Function name fields should not exceed 30 characters.

3. All characters on the line should be printable characters.

4. Non-whitespace characters should not appear after the last field on the line.

5. The end of the file should not come before the end of the line. Each line should be terminated normally.
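Checks 1 through 3 above can be applied mechanically to a line of IMAGE.DLT once it has been split into fields. The Python sketch below takes the fields as a list because this guide does not state how fields are separated. The function name, the requirement of at least three fields, and the sample values in the usage note are assumptions made for illustration.

```python
# Record types named in check 1 of message 409.
VALID_TYPES = {"FIDF", "FITT", "FINF"}

MAX_IDENTIFIER = 30     # second field
MAX_FUNCTION_NAME = 30  # each field after the library name


def check_dlt_fields(fields, windows_16bit=False):
    """Return a list of problems found in one already-split table line.

    Assumed field order: type, identifier, library name, function names.
    """
    if len(fields) < 3:
        return ["fewer than three fields (type, identifier, library name)"]
    problems = []
    record_type, identifier, library = fields[0], fields[1], fields[2]
    if record_type not in VALID_TYPES:
        problems.append(f"first field {record_type!r} is not FIDF, FITT, or FINF")
    if len(identifier) > MAX_IDENTIFIER:
        problems.append("identifier exceeds 30 characters")
    # The library name limit is 255 characters, or 80 under 16-bit Windows.
    limit = 80 if windows_16bit else 255
    if len(library) > limit:
        problems.append(f"library name exceeds {limit} characters")
    for name in fields[3:]:
        if len(name) > MAX_FUNCTION_NAME:
            problems.append(f"function name {name!r} exceeds 30 characters")
    if not all(field.isprintable() for field in fields):
        problems.append("line contains non-printable characters")
    return problems
```

For example, `check_dlt_fields(["FIDF", "myreader", "MYREADER.DLL", "open_fn"])` returns an empty list; the identifier, library, and function names in that call are made up. Checks 4 and 5 concern the raw line and file rather than the individual fields, so they are outside this sketch.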

410: ftserver: maximum list of network filters expected: count chars, got: count chars

The length of the -f option value is greater than the maximum allowable length. Reduce the length of the text reader list specified in the -f option to less than or equal to the “expected count” displayed in the message text.

426: fthmake: thesaurus file name is too long. Do not exceed number characters

The filename for the compiled thesaurus object file is too long. This is the filename without any extension or pathname attached. For information about the maximum filename length, see Appendix A, “SearchServer Specifications,” in Fulcrum SearchServer Getting Started for your platform.

427: utility: OPT entry has been modified; use ftindex -a

The OPT entry in the table’s configuration file has been modified. Re-index by executing VALIDATE INDEX tablename ABANDON.

435: SearchServer: open failed due to timeout

The document couldn’t be opened in the allotted time. The allotted time is 20% of the timeout base-time set using SQLSetConnectOption. This message is written to the log file of the table.

436: SearchServer: open failed due to exception exception_code at address

While trying to open a document an exception code was encountered. The exception code was caused by a catastrophic text reader error detected by the firewall mechanism during indexing. This message is written to the log file of the table.

437: SearchServer: open failed due to watchdog system error: fterrno

While trying to open a document the watchdog system detected error fterrno. The watchdog system protects against indefinite loops caused by event exceptions. This message is written to the log file of the table.

438: SearchServer: document not completely indexed due to timeout

The document couldn’t be indexed in the allotted time. The allotted time is 100% of the timeout base-time set using SQLSetConnectOption. This message is written to the log file of the table.

439: SearchServer: document not completely indexed due to exception exception_code at address

While trying to index the document an exception code was encountered. The exception code was caused by a catastrophic text reader error detected by the firewall mechanism during indexing. For text readers that perform document format interpretation, such as ftmf, this error probably indicates a problem interpreting the document format. This message is written to the log file of the table.

440: SearchServer: document not completely indexed due to watchdog system error: fterrno

While trying to index a document the watchdog system detected error fterrno. The watchdog system protects against indefinite loops caused by text readers. This message is written to the log file of the table.

441: SearchServer: can’t parse filename text_reader (catalog record cid)

This message is written to the log file of the table if indexing fails while a particular document is being indexed. The purpose of this message is to identify what document text reader and cid were involved in the failure. Look for the next message in the log file for more information about the nature of the failure.

442: SearchServer: close failed due to timeout

The document couldn’t be closed in the allotted time. The allotted time is 20% of the timeout base-time set using SQLSetConnectOption. For text readers that perform document format interpretation, such as ftmf, this error probably indicates a problem interpreting the document format. This error could also occur due to network problems if the document is located remotely. This message is written to the log file of the table.

443: SearchServer: close failed due to exception exception_code at address

While trying to close a document an exception code was generated by a text reader. For text readers that perform document format interpretation, such as ftmf, this error probably indicates a problem interpreting the document format. This message is written to the log file of the table.

444: SearchServer: close failed due to watchdog system error: fterrno

While trying to close a document the watchdog system detected error fterrno. The watchdog system protects against indefinite loops caused by text readers. This message is written to the log file of the table.

445: SearchServer: close failed due to insufficient memory

The mechanism used to detect timeouts and exceptions requires a small amount of memory. However, if this memory is unavailable during an attempt to close a file that has been opened for indexing, such errors cannot be detected. Accordingly, the close attempt is abandoned and this message is written to the log file of the table.

446: SearchServer: can’t close filename text_reader; (catalog record cid)

The mechanism used to detect timeouts and exceptions requires a small amount of memory. However, if this memory is unavailable during an attempt to close a file that has been opened for indexing, such errors cannot be detected. Accordingly, the close attempt is abandoned and this message is written to the log file of the table.

447: SearchServer: can’t get date and size filename text_reader; (catalog record cid)

This message is written to the log file of the table to identify a catalog record with an external file whose date and size attributes can’t be retrieved. Look at the next message in the log file for more information about the cause of this failure.

448: SearchServer: failed to get date and size due to timeout

SearchServer couldn’t obtain the document’s date and size in the allotted time. The allotted time is 20% of the timeout base-time set using SQLSetConnectOption. This error could also occur due to network problems if the document is located remotely. This message is written to the log file of the table.

449: SearchServer: failed to get date and size due to exception exception_code at address

While trying to obtain the date and size for a document an exception was generated by a text reader. For text readers that perform document format interpretation, such as ftmf, this error probably indicates a problem interpreting the document format. This message is written to the log file of the table.

450: SearchServer: failed to get date and size due to watchdog system error: fterrno

While trying to obtain a document’s date and size the watchdog system detected error fterrno. The watchdog system protects against indefinite loops caused by text readers. This message is written to the log file of the table.

451: SearchServer: expansion failed for filename (CID cid) due to exception exception_code at address

While trying to expand filename an exception was generated by a text reader. This message is written to the log file of the table.

452: SearchServer: expansion failed for filename (CID cid) due to timeout

Dictionary expansion couldn’t be completed in the allotted time. The allotted time is 100% of the timeout base-time set using the SQLSetConnectOption. This message is written to the log file of the table.
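The timeout messages in this appendix quote the allotted time as a percentage of one timeout base-time: 20% for opening a document (435), closing it (442), and getting its date and size (448), and 100% for indexing (438) and dictionary expansion (452). The Python sketch below collects those percentages in one table. The operation keys and the function name are invented for illustration, and the base-time is treated as a plain number in whatever unit it was set with.

```python
# Percentage of the timeout base-time allotted to each operation, as quoted
# in messages 435, 438, 442, 448, and 452. The keys are illustrative labels.
ALLOTTED_PERCENT = {
    "open": 20,
    "index": 100,
    "close": 20,
    "date_and_size": 20,
    "expansion": 100,
}


def allotted_time(operation, base_time):
    """Return the time allotted to an operation for a given base-time."""
    return base_time * ALLOTTED_PERCENT[operation] / 100
```

With a base-time of 60, for instance, `allotted_time("open", 60)` gives 12.0 and `allotted_time("index", 60)` gives 60.0. A document that opens slowly can therefore fail with message 435 well before the indexing limit behind message 438 is reached.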

This appendix is reprinted with permission from Fulcrum Technologies, Inc., and contains the SearchServer 3.5 Data Preparation and Administration Manual.

Appendix C: Data Preparation and Administration

Preface

This preface provides:

• a description of the intended audience

• a synopsis of each chapter

• a summary of the text conventions

• abstracts of the other documents in the SearchServer documentation set

About this Manual

This manual explains how to prepare and administer SearchServer tables using the administration tools. It is intended for database administrators and application developers who are responsible for implementing and maintaining an application that incorporates SearchServer text-retrieval technology.

This manual provides specifications for SearchServer data organization, and instructions for assigning names and definitions in a SearchServer schema. It also provides information about data manipulation and data maintenance. The following is a brief description of what you can find in this manual:

Chapter 1, “Introduction,” provides an overview of the steps you'll need to follow to prepare and maintain the data for an application.

Chapter 2, “The Administration Tools,” introduces the administration tools and describes the data preparation tasks you can perform with SearchServer.

Chapter 3, “Structuring the Data,” guides you through the steps that are required to create a table and prepare the data for search and retrieval.

Chapter 4, “Using External Text,” describes how to choose text readers and prepare external table data.

Chapter 5, “Maintaining the Data,” describes the data manipulation functions. In this chapter, you'll learn how to delete, insert, and update rows, and index your table.

Chapter 6, “Altering the Table,” explains how to modify and move a table and associated files.

Chapter 7, “Providing Support Files for Searching,” provides information about how to modify the default support files that are provided with SearchServer.

Chapter 8, “Verifying the Table,” explains how to verify the table by performing some ad hoc searching on the data.

Appendix A, “Utility Program Summary,” provides a summary of the command line syntax and command parameters for invoking the utility programs.

Appendix B, “Text Readers,” describes how text readers translate character data from one format to another.

Appendix C, “Table Management Files,” provides information about the table management files and the data storage requirements for them.

Appendix D, “Control Characters and Control Sequences,” describes control characters and control sequences and how to use them with your external table data.

Text Conventions

This manual uses the following conventions:

Convention What it is Used For

Case Sensitivity

Filenames and directory names are shown in UPPERCASE letters; however, they can be entered in lowercase letters if this is a requirement in your environment.

Initial Capitals

Initial capitals are used in Windows application program names. For example:

SearchDoc

UPPERCASE Letters

Uppercase letters are used to represent statement names, keywords, table names, column identifiers, environment variables, mnemonic symbols, data types, filenames, and directory names. For example:

SELECT, ALL, STDOCS, FT_CID, FTNPATH, SQL_SUCCESS_WITH_INFO, SMALLINT, FULTEXT.MSG, BIN

bold

Bold letters are used to represent utility program names and function names. For example:

ftmload, SQLColAttributes

[ ]

Square brackets ([ ]) indicate that the elements of the syntax between them are optional. In the following example, the WHERE clause is optional.

DELETE FROM <table name>

[WHERE <search condition>]

< >

Angle brackets (< >) represent an element of the syntax you must substitute with a specific value. In the following example, you would supply the name of a schema.

CREATE SCHEMA [REPLACE] <schema name>

{ }

Curly braces ({ }) represent groups of elements in the syntax. For example:

CREATE TABLE <table name> (<column definition>[{, <column definition>}...])

|

An OR bar ( | ) indicates a mutually exclusive entry. You can enter one of the options shown on either side of the bar, but not both. For example:

<column name> {<data type> | <domain name>}

...

An ellipsis (...) indicates that an element of the syntax can be repeated. For example:

zone list ::= <zone number> [{,<zone number>}...]

Related SearchServer Documentation

SearchServer includes a comprehensive documentation set that provides the information you'll need to use SearchServer. If you also purchased a Fulcrum SearchServer Software Developer's Kit (SDK) or a Fulcrum SearchBuilder product, your documentation set will include manuals written for your particular development environment (for example, a SearchBuilder for Visual Basic Developer's Guide).

What's New in SearchServer 3.0

Describes what's new and changed in SearchServer 3.0 and tells you where to look for more information. It provides a description of enhancements to the SearchSQL language statements as well as a description of the enhancements to the SearchServer API functions.

Introduction to SearchServer

Provides a high-level introduction to the capabilities of SearchServer. It introduces the SearchServer concepts and describes the process required to embed text-retrieval in an application.

SearchServer Getting Started (platform specific)

Provides installation and configuration instructions and all platform-specific information (including limitations).

SearchSQL Reference

Provides the complete definition (syntax and semantics) of the SearchSQL language. It also contains complete information about searching tables and system information tables.

SearchServer Messages and Error Codes

Provides quick and easy reference to the messages and error codes returned by SearchServer and the SearchServer utility programs.

SearchServer Database Integration

Describes how to use database text readers that Fulcrum provides and explains how you can modify the template code. It also provides a guide to application-level integration of text and structured data.

Fulcrum Customer Services

We've got some of the most knowledgeable experts in text-retrieval: experts in application design and development, database integration, and systems engineering. Fulcrum offers you a wide range of choices to help you leverage the value of SearchServer: by analyzing your requirements and helping to design the application, by transferring knowledge to your developers through Fulcrum courses and seminars, by supporting Fulcrum products through our customer support team, and by offering expert consulting.

Customer Support

If you have a question about SearchServer, first look in the printed version of the documentation, or consult the electronic version of the documentation (using SearchDoc) or online help. You can also find late-breaking updates and technical information about SearchServer by double-clicking the Readme icon in the Fulcrum program group or folder.

If you cannot find the answer, contact Fulcrum's Customer Support Team. Our technical support staff use Fulcrum's own text-retrieval software for fast and responsive phone support. Every support person has instant access to all of Fulcrum's support tools, including a history of known problems, on-line design notes and product documentation, technical bulletins, and product source code.

Fulcrum allows you to choose the method of contact that best meets your needs, ranging from calling us directly to sending a request electronically.

Calling Directly

Fulcrum provides telephone support to registered licensees of SearchBuilder and SearchServer Software Developer's Kits (SDKs) who have up-to-date support agreements. For technical support, call:

• 1-800-209-4357 (for support within North America)

• 1-613-238-7068 (for support within Ottawa and outside of North America)

Electronic Services

When sending your request electronically, use the electronic version of the Case Report Form (CASE.TXT) and send it to:

[email protected]

Fax Services

When sending your request by fax, use the Case Report Form located at the back of this manual and dial:

• 1-613-238-7695

Product Training and Consultation

To quickly bring your developers up to speed on SearchServer, we offer hands-on, interactive education courses, featuring real-world examples. You can also benefit from Fulcrum's expertise through workshops on specialized application areas such as database integration and text reader creation. Courses are available at Fulcrum training locations, or on-site at your offices to help maximize the use of Fulcrum tools within your own environment. Our lab at corporate headquarters in Ottawa, Canada, is also available for your development team, complete with application experts as required.

Consulting Services

Fulcrum's professional services consultants have been designing and creating powerful integrated text-retrieval solutions for years. They can help guide you to success at each stage of the development process:

• Evaluation and prototyping

• Requirements analysis and design

• Code review and walkthroughs

• Building application components such as text readers (filters), high-level APIs, user interfaces and system administration utilities

• Integrating Fulcrum products with other technologies (database, imaging)

• Benchmarking

Chapter 1:

Introduction

This chapter describes the steps you'll need to follow to prepare and maintain the data for a SearchServer application. In it, you'll read about:

• the SearchServer SearchSQL language

• the basic steps required to create and populate a table

Introducing SearchServer

Any information retrieval system has one objective—to help users search through large volumes of information to retrieve the specific material that they want to view or extract for further processing. To help you address this challenge, Fulcrum developed SearchServer, a powerful full-text search and retrieval engine.

Fulcrum SearchServer provides a full complement of text-retrieval capabilities to meet the information retrieval needs of your user community. It includes the utility programs and the run-time software required for executing text-retrieval applications built with the Fulcrum SearchServer Software Developer's Kit (SDK) or one of the Fulcrum SearchBuilder products.

About the SearchSQL Language

SearchServer's searching, indexing, and table management functionality is accessed through SearchSQL. SearchSQL is a language that offers you the complete set of the data definition, manipulation, and searching statements you'll need to create and administer a text-retrieval application. Once you've learned SearchSQL, you can:

• specify the organization of data by creating a schema

• create a table to contain the data

• insert, update, and delete rows in a table

• search a table (or multiple tables) by specifying the columns you want to retrieve and the search criteria

Note: In this manual, you'll read mostly about the statements that relate to data preparation and administration; namely, the data definition and data manipulation statements. Information related to searching is covered in the Fulcrum SearchServer SearchSQL Reference.

SearchSQL statements can be divided into three categories: Data Definition Language (DDL), Data Manipulation Language (DML), and the Search and Retrieval Language (SRL).

Data Definition Language (DDL) Statements

SearchSQL includes the Data Definition Language (DDL) statements shown below. The DDL statements can be directly executed by an application program to permit schema and table definition and administration.

ALTER TABLE Add or delete one column in an existing schema.

CREATE SCHEMA Define a new schema and table, or replace the schema of an existing table.

CREATE TABLE Define a table.

DROP TABLE Delete an existing table.

PROTECT TABLE Protect a table from indexing, schema alteration, and removal.

UNPROTECT TABLE Enable indexing, schema alteration, and removal.

VALIDATE INDEX Update the index for a table.

Data Manipulation Language (DML) Statements

SearchSQL contains the following Data Manipulation Language (DML) statements:

INSERT Insert new rows into a table.

UPDATE Update selected column values in table rows.

DELETE Delete rows from a table.

Search and Retrieval Language (SRL) Statements

SearchSQL contains the following Search and Retrieval Language (SRL) statements:

SELECT Search one or more tables and specify the columns and rows to be returned and the sort order of the resulting rows.

CREATE TEXT_VECTOR Prepare text to be used as the source for Intuitive Searching.

SET Set options for subsequently executed statements for the duration of a connection to a specific data source.

Steps in Preparing and Maintaining the Data

This section describes the process of building tables and maintaining the data using SearchSQL statements. Although the following steps imply that you've already completed the design of your application and its data model, you might find the information useful during the design phase.

Figure 1-1 shows the steps involved in this process.

Figure 1-1 Data Preparation and Maintenance Process

Structuring the Data

The first step is to decide how to structure the data into tables, rows, and columns to maximize the efficiency of the search engine while maintaining your flexibility in designing and retrieving the documents that make up your table. This step involves designing the schema that will be associated with your data. In SearchServer, a schema is the data dictionary that belongs to one table. It is the logical description of the data in that table. It defines the table, its columns, column attributes, zones, and domains.

Creating the Table

The next step is to create a table and its schema using either the CREATE TABLE or CREATE SCHEMA statement. A table is a two-dimensional grid of columns and rows. The rows and columns represent how the data is organized for update and retrieval. A row is a sequence of related values in a table, and contains one value for each column. A column contains values of the same kind and data type.

When a table is first created, it doesn't contain any rows of data. The schema defines all the columns, zones, and domains associated with the table. The table also has an associated set of optional table parameters that describe the data administration characteristics of the table.

Defining the Columns

When creating a table, you must define the table's columns. When you define the columns, you are defining the actual structure of the data and the table. Each column has a set of column attributes. They are:

• column name

• data type

• field number

• index mode

The column name is simply an identifier that names the column. The data type defines the type of data (text, date, or numeric) that will be inserted into the column.

The field number assists SearchServer in locating the data when searching and indexing. The index mode determines the type of searching that can be performed on the column. You use the CREATE TABLE statement, or the CREATE TABLE clause of the CREATE SCHEMA statement, to define the columns and their column attributes.
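To make the four column attributes concrete, the following Python sketch models a column definition and checks the rule that a row contains one value for each column. It is a conceptual illustration only: the class, the function, and every sample value are invented here, and none of it is SearchSQL syntax or a SearchServer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDefinition:
    """The four column attributes described in this section."""
    name: str          # identifier that names the column
    data_type: str     # kind of data: text, date, or numeric
    field_number: int  # helps SearchServer locate the data
    index_mode: str    # determines the type of searching allowed


def missing_or_extra(row, columns):
    """Compare a row (dict of column name -> value) with the column list.

    Returns two sorted lists: column names with no value in the row, and
    names in the row that no column defines.
    """
    expected = {column.name for column in columns}
    present = set(row)
    return sorted(expected - present), sorted(present - expected)
```

For example, with two made-up columns named TITLE and PUBLISHED, a row that supplies TITLE and AUTHOR is reported as missing PUBLISHED and carrying the undefined name AUTHOR. The field numbers and index modes you pass in are placeholders; the valid values are defined by the SearchSQL documentation, not by this sketch.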

Creating the Zones and Domains

Optionally, you can build zones and domains into your schema. Zones give the table more flexibility by allowing you to structure the data within a text column. For searching purposes, the data in the zone becomes an independent subsection of the text.

You create zone definitions using the CREATE ZONE clause of the CREATE SCHEMA statement. Each zone has a unique name and associated zone numbers. These numbers relate to the zone markers (control sequences that delimit instances of the zone) in the text.

Domains are used to group zones in a column. When a domain is used to associate a list of zones with a column, a search on that column is effectively a search on all its constituent zones. When searching, a match in any of the column's zones constitutes a match in the column itself. To create a domain, you use the CREATE DOMAIN clause of the CREATE SCHEMA statement.

Domains have other purposes that you'll read about later in Chapter 3, "Structuring the Data." You can use a domain to specify an index mode for a column, or to define an alternative name for a data type.
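The fragment below is a cut-down sketch of how zones, a domain, and a column fit together. The zone, domain, and column definitions are taken from the SUPPORT example schema presented later in this chapter; the name SUPPORT_ZONED is hypothetical, and whether such an abbreviated schema is accepted as written is an assumption. Treat it as an illustration of the clause order rather than a complete schema.

```sql
CREATE SCHEMA SUPPORT_ZONED               -- hypothetical schema name
CREATE ZONE OPERATOR(211)                 -- zone delimited by markers numbered 211
CREATE ZONE DESCRIPTION(32)
CREATE DOMAIN LOG_DMN                     -- domain that groups the two zones
    (OPERATOR, DESCRIPTION)
    AS APVARCHAR
CREATE TABLE SUPPORT_ZONED
(
    TEXT_LOG  LOG_DMN  32,                -- a search on TEXT_LOG covers both zones
    SUBJECT   VARCHAR(80)
);
```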

Inserting, Deleting, and Updating the Data

Data management is an essential function in any system. SearchSQL has three statements to help you manage your data:

• INSERT

• UPDATE

• DELETE

The data that you're inserting or updating might need some preparation to conform to your schema or table. For example, your documents might need to be processed through a text reader if you want to allow zoning in your table.

Before you can search a table, you must insert some data into the table (a table doesn't contain any rows of data when it's first created). To do this, you use the INSERT statement. You'll need to specify the name of the table into which you're inserting the new data, and the names of the columns and the associated value for each named column. This step is also used to associate any external documents with the table.

For example, assume that you're building a table called SUPPORT using the rows and columns shown below (in addition to others):

          CREATOR  STATUS  SUBJECT
1st row   Polly    CLOSED  How to find words in sequence
2nd row   Peter    NULL    Is there support for Ful/Text library files?

You can create the first row using the following statement:

INSERT INTO SUPPORT (CREATOR, STATUS, SUBJECT) VALUES ('Polly', 'CLOSED', 'How to find words in sequence')

For applications where the data in the system is dynamic, you can change the data in a row of an existing table by using the UPDATE statement. This statement allows you to specify the selected rows in the table, the columns that are affected, and the new data for those columns.

When you need to delete data from your table, use the DELETE statement. This statement lets you specify the table and the search condition for selecting the rows to be deleted.
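The three data management statements can be sketched together against the SUPPORT example. The INSERT follows the form shown above; the UPDATE and DELETE forms follow the standard SQL shape, which is assumed here to match SearchSQL, and the equality conditions are purely illustrative.

```sql
-- Add the second row of the example table; STATUS is not named,
-- so it is left unset (NULL).
INSERT INTO SUPPORT (CREATOR, SUBJECT)
    VALUES ('Peter', 'Is there support for Ful/Text library files?');

-- Change the data in selected rows: mark Peter's problems as closed.
UPDATE SUPPORT
    SET STATUS = 'CLOSED'
    WHERE CREATOR = 'Peter';

-- Delete the rows selected by a search condition.
DELETE FROM SUPPORT
    WHERE STATUS = 'CLOSED';
```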

Indexing the Table

New or modified data must be indexed to make it searchable. Indexing keeps the table's full-text index current with the new data. By default, a table is created as an IMMEDIATE table so that new or updated data is immediately searchable. SearchServer does this by indexing the new data automatically during the execution of each data management statement.

However, if you specified the PERIODIC table parameter in the CREATE TABLE clause, or executed a SET IMMEDIATE statement that specified the 'FALSE' parameter before the table was created, you must execute a VALIDATE INDEX statement to index the new and updated rows of the table.

In the case of an IMMEDIATE table, the VALIDATE INDEX statement reorganizes the indexes to optimize search performance. There are several optional parameters you can specify when executing this statement.

Note: Unless your application demands immediate searchability of modified data, you should create tables using the PERIODIC table parameter. This minimizes the index size, table loading and update time, and search time. You can mix both IMMEDIATE and PERIODIC tables in the same application.

The indexing engine determines what rows to index based on what has changed since the last indexing operation. It determines what search terms to index through the index modes of the columns specified when the table was first created. It also determines how to index the data through the data types chosen for the columns.
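The PERIODIC workflow described above might look like the sketch below. The table name CALL_NOTES is hypothetical, and placing PERIODIC after the column definitions is an assumption modeled on where STOPFILE and BASEPATH appear in the SUPPORT example schema. The optional VALIDATE INDEX parameters are omitted; see Fulcrum SearchServer SearchSQL Reference for the exact syntax.

```sql
-- Hypothetical PERIODIC table: new rows are not searchable until indexed.
CREATE TABLE CALL_NOTES
(
    CREATOR  CHAR(8),
    SUBJECT  VARCHAR(80)
)
PERIODIC;                         -- table parameter (placement assumed)

INSERT INTO CALL_NOTES (CREATOR, SUBJECT)
    VALUES ('Marie', 'How many connection handles can be opened concurrently?');

-- Index the new and updated rows so that they become searchable.
VALIDATE INDEX CALL_NOTES;
```

With an IMMEDIATE table the INSERT alone would make the row searchable, and the VALIDATE INDEX step would serve only to reorganize the indexes.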

Other Administration Tasks

After you have created the table, inserted data, and indexed the table, you might need to alter its structure. SearchServer allows you to change the schema definition by adding or deleting a column, or by changing a column attribute (such as its name or index mode). You can also delete the table or move it to a new location. SearchServer provides a set of utility programs that you can use to load a document library, a thesaurus, and character variant rules.

A Word About the Examples Used

The examples used throughout this manual are based on a customer support system. The table, named SUPPORT, contains all of the elements needed to exercise many of SearchServer's features. It contains columns and zones for searching, domains that group zones, and domains that change the default index mode of a column.

The names of some of the reserved columns have been changed to improve readability and to allow them to be used in SELECT statements that use the asterisk (*) option. (The asterisk (*) option selects all user-defined columns.)

You can create the SUPPORT table on your system using the ExecSQL utility and scripts included with SearchServer. The working tables presented in this manual contain data derived from the SUPPORT table.

Retrieved values that are longer than 50 characters are truncated, as in the SUBJECT column of the following working table:

PROBLEM_NUMBER  CREATOR  SUBJECT
92011301        Polly    What networking software is required to connect to
92012701        Marie    How many connection handles can be opened concurre
92013001        Polly    How do special characters in a query affect a sear

Note: Where the schema and examples contain filenames, the UNIX syntax has been used, but they will work on any system that SearchServer supports.

The following is the complete CREATE SCHEMA statement for the example SUPPORT table:

CREATE SCHEMA SUPPORT
-- Customer Support table schema

--------
-- Zones
--------
-- Zones created by "t" filter from external document containing log
-- text.
-- Zone markers with the number below are inserted into log text by
-- editor.
-- There is one SUPPORT row for each incoming customer support call
-- that begins a new subject for that customer.
-- The external text log for each contains entries of the following
-- form:

CREATE ZONE DATE_AND_TIME(201,202)          -- date and time of this entry
CREATE ZONE OPERATOR(211)                   -- who created this entry
CREATE ZONE CONTACT(212)                    -- customer contact for this entry
                                            -- (if any)
CREATE ZONE ENTRY_HEADING(201,202,211,212)  -- all headings
CREATE ZONE DESCRIPTION(32)                 -- record of conversation and action

----------
-- Domains
----------
CREATE DOMAIN LOG_DMN                       -- used to tie zones to log file
    (DATE_AND_TIME, OPERATOR, CONTACT, DESCRIPTION, ENTRY_HEADING)
    AS APVARCHAR
CREATE DOMAIN PRIORITY_DMN                  -- used to search by priority >n
    VALUE AS INTEGER                        -- single digit
CREATE DOMAIN PHONE_DMN                     -- not searched
    NONE AS VARCHAR(15)
CREATE DOMAIN TLF_DMN                       -- not searched
    NONE AS VARCHAR(260)
CREATE DOMAIN VERSION_DMN                   -- searches include punctuation
    LITERAL AS VARCHAR(10)

CREATE TABLE SUPPORT
( -- begin column definitions
------------------
-- Reserved Columns
------------------
-- FT_FLIST -- filter is always 't'
TEXT_LOG_FILE    TLF_DMN       3,   -- FT_SFNAME: Ext. doc. filename
                                    -- path
TEXT_LOG         LOG_DMN       32,  -- FT_TEXT: External document
LAST_MODIFIED    DATE          31,  -- FT_DATE: Date TEXT_LOG was last
                                    -- edited
LAST_MODIFIER    VARCHAR(260)  97,  -- FT_OWNER: Who last edited
                                    -- TEXT_LOG
DATE_CLOSED      DATE          30,  -- FT_DATE4: Date problem was closed

------------------
-- Support Columns
------------------
PROBLEM_NUMBER   CHAR(8) ,          -- Same as ext. doc. name
CREATOR          CHAR(8) ,          -- Who created this row
PRIORITY         PRIORITY_DMN ,     -- Priority of the problem
STATUS           CHAR(8) ,          -- Status of the problem

-- Customer Information
COMPANY          VARCHAR(30) ,      -- Company Name
PRIME_CONTACT    VARCHAR(30) ,      -- Primary customer contact
PHONE_NUMBER     PHONE_DMN ,        -- Customer phone number
ENVIRONMENT      VARCHAR(20) ,      -- Customer machine/OS
PRODUCT_VERSION  VERSION_DMN ,      -- Product/Version/ in use
SUBJECT          VARCHAR(80)        -- Subject of the problem
) -- End of column definitions

-------------------
-- Table Parameters
-------------------
STOPFILE 'support.stp'              -- Minimal stopword set
BASEPATH '../supdocs'               -- External files are in this
                                    -- subdirectory
; -- End of schema

Chapter 2:

The Administration Tools

This chapter introduces the administration tools that are included with SearchServer, including:

• the ExecSQL administration utility

• the SearchServer Administrator

• the ODBC Administrator

• the utility programs

• the system information tables

It also describes the data preparation tasks you can perform in SearchServer with these tools, and tells you where to find additional information in this and other SearchServer manuals.

About ExecSQL

ExecSQL (or execsql in non-Windows environments) is an application program that you can use to execute SearchSQL statements and run scripts. You can administer, modify, and maintain tables using this tool. In fact, most of your administration work can be done through ExecSQL.

For example, you can design a script that will create a table, insert data into the table, then index the table. ExecSQL can also be useful to verify operations performed against a table by a custom application. For more information about creating scripts for ExecSQL, see Chapter 5, "Maintaining the Data."

Many of the tasks described in this manual can be executed through ExecSQL. For a complete description of how to use ExecSQL, see Fulcrum SearchServer Getting Started.
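As one illustration of the verification use mentioned above, a short script might simply read back the rows a custom application was expected to write. This is a sketch only: the equality condition and the semicolon terminator are assumptions, and how the script is passed to ExecSQL is covered in Fulcrum SearchServer Getting Started rather than here.

```sql
-- Hypothetical verification script for the SUPPORT example table:
-- list the rows that an application should have created for Polly.
SELECT CREATOR, STATUS, SUBJECT
    FROM SUPPORT
    WHERE CREATOR = 'Polly';
```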

SearchServer is delivered with a default data source to allow you to use STDOCS and the SUPPORT tables, and to create new tables. However, if you want to create your own data sources, use the ODBC Administrator.

The SearchServer Administrator

The SearchServer Administrator is a graphical PC-based client/server utility that you can use to create, modify, and manage local and remote SearchServer tables. You can easily create and drop tables, index a table, import and export data into and out of a table, insert and remove documents from a table, and edit rows in a table.

The SearchServer Administrator is intended to be used by experienced System Administrators who are familiar with SearchServer terminology and concepts. For more information about SearchServer tables, see Fulcrum SearchServer Getting Started for your platform.

For step-by-step instructions on how to use the SearchServer Administrator to perform specific tasks, choose "Using the SearchServer Administrator" in the online help.

The ODBC Administrator

The ODBC Administrator is a Windows control panel device that administers ODBC data sources. You can use the ODBC Administrator to add, change, and delete data sources from your system.

A data source is a named set of tables that is bound to a data access product such as SearchServer. For remote servers, a data source includes the platform where the server is running and the network used to access it. The data source associates the SearchServer ODBC driver with the data you want to access.

SearchServer comes with the SearchServer_3.0 data source. With this data source, you can access the tables that are created during the installation procedure. However, if you plan to create your own tables, you should create them with your own data sources for ease of administration. For complete details, refer to Fulcrum SearchServer Getting Started for your platform.

Note: In a 16-bit Windows environment, a data source specifies values for four parameters that are also found in the FULCRUM.INI file and the DOS environment variables. The parameters must match in all locations if the data source and utility programs described in the following section are to be used together.

If your environment doesn't support ODBC, the parameters specified in the data source are stored as environment variables.

The Utility Programs

SearchServer provides a set of utility programs that you can use to prepare external documents, a thesaurus, and character variant rules. You can execute the utility programs directly, or through command files (shell scripts). The utility programs also allow you to:

• load a document library

• perform diagnostics

• prepare a thesaurus facility

• prepare a character variant facility

• load a custom text reader

• save and restore table data

• import table data

Table 2-1 provides a summary of the utility programs. See Appendix A, "Utility Program Summary," for a complete description and summary of the command line parameters for each utility.

Table 2-1 Utility Programs

Name      What it Does
ftcin     loads table data saved by ftcout or created by ftlin into an existing table
ftcout    saves table data for use by ftcin
fthmake   compiles a thesaurus source file
fthtest   tests a compiled thesaurus and/or character variant rules
ftidrck   determines if a table's index files are valid
ftimport  loads table data exported by other applications
ftlin     creates a document library file
ftlock    enables or cancels row locking
ftlout    unloads a document library file
ftmload   loads the dynamic library table with a custom module
ftmunld   displays the dynamic library table
ftpr      verifies that the printed format of the text conforms to your specifications

For a complete description of how to use these utilities in a Microsoft Windows environment, see the section "Utility Programs in a Microsoft 16-Bit Windows Environment," in Appendix A, "Utility Program Summary."

Using Utilities in 16-bit Windows Environments

The ftcin, ftcout, and ftlock utilities are Windows applications. As such, they refer to the FULCRUM.INI parameter file for the operating system parameters that are specific to your system and data sources. You'll want to modify the parameters in this file to match the data source that you're using with your SearchServer session.

For more information about the FULCRUM.INI parameter file, see Appendix A, "Utility Program Summary."

Using Utilities in 32-bit Windows Environments

In 32-bit Windows environments, all the utilities are command-line applications that use the environment variables described in Fulcrum SearchServer Getting Started for your platform.

Verifying and Manipulating Table Data

This section identifies the utility programs you can use to import data into your table or export data from your table to an external text file. In it, you'll find references to the other sections of this manual that describe the utilities in more detail.

Saving and Restoring Table Files

The ftcout utility program saves table data. It reads the table data and verifies its consistency as it unloads the data. At the same time, ftcout creates a restorable version of the table data in flat text file format. You can use the ftcin utility program to recreate the table by reading and processing this file.

Note: These programs do not export or import data from external sources.

For specific information about checking and exporting table data using ftcout and importing it using ftcin, see Chapter 5, "Maintaining the Data."

Importing Table Data from an External Source

The ftimport utility allows you to quickly load a SearchServer table with data derived from an external source, such as another database. For example, you can load profile information currently stored in a relational database. This utility is useful if you have your own document management facilities and don't need to make use of SearchServer for automatic expansion of directory entries.

The ftimport utility is available in 32-bit environments. In the 16-bit Windows environment, the import facility is provided by the SearchServer Administrator.

Creating a Document Library File

SearchServer can read multiple external documents stored in a single operating system file (a document library). This can simplify data administration and reduce operating system overhead during indexing and retrieval operations.

SearchServer is distributed with two utilities named ftlin and ftlout that load and unload document library files, respectively.

Chapter 4, "Using External Text," describes how to load and unload document library files. Chapter 5, "Maintaining the Data," describes how to load library file data into a table using ftcin.

Creating a Character Variant File

Character variant rules can specify sets of characters that should be considered equivalent when searching, usually because they indicate alternative spellings. For example, in English you might want to consider "te" and "ght" as equivalent (as in "lite" and "light").

Character variant rules can also specify suffixes that should be ignored while searching. For example, in English you might want to ignore the plural suffix (s) and the possessive suffix ('s).

The character variant facility supports character string equivalence rules that affect the operation of the WHERE clause of a SELECT statement. The name of the character variant rules file is specified through the SET CHARACTER_VARIANT statement.
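A session that uses character variant rules might be sketched as follows. The rules-file name 'myvariants' is hypothetical, the form of the value passed to SET CHARACTER_VARIANT is an assumption, and the CONTAINS predicate is used here on the assumption that it is the SearchSQL text-search operator; check Fulcrum SearchServer SearchSQL Reference before relying on any of these details.

```sql
-- 'myvariants' is a hypothetical rules-file name; the exact form of the
-- value is an assumption.
SET CHARACTER_VARIANT 'myvariants';

-- With a rule that treats "te" and "ght" as equivalent, a search for
-- "light" would be expected to match rows containing "lite" as well.
SELECT PROBLEM_NUMBER, SUBJECT
    FROM SUPPORT
    WHERE SUBJECT CONTAINS 'light';
```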

You can test your own custom variant rules file (alone or in conjunction with a thesaurus file) using the fthtest utility program. For information about creating and testing a character variant rules file, see the section "Creating a Character Variant Rules File" in Chapter 7, "Providing Support Files for Searching."

Creating a SearchServer Standard Thesaurus File

A SearchServer standard thesaurus file contains synonym rules and suffix rules. Suffix rules direct the search engine to search for the specified search term, and to include any plural or possessive forms of the specified search term, plus any other alternatives that can be derived from the search term.

The thesaurus file allows for more specific and complex suffix rules than the character variant file (such as words ending in "y" that are made plural by "ies"). Synonym rules allow the search engine to include a list of words and phrases that are different from the search term, but typically have the same meaning.

The thesaurus facility supports the THESAURUS function in the WHERE clause of a SELECT statement. The name of the thesaurus file is specified through the SET THESAURUS_NAME statement or optionally as a THESAURUS function parameter.

When a thesaurus file is specified, the THESAURUS function expands the specified word into a list of alternative, equivalent words and phrases. The exact expansion is determined by applying rules from the thesaurus file.
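The sketch below shows roughly how a thesaurus search could be issued. It is an assumption-laden illustration: the thesaurus name 'mythes' is hypothetical, and the placement of THESAURUS inside a CONTAINS predicate is a guess at the syntax, not something this chapter confirms. Fulcrum SearchServer SearchSQL Reference has the authoritative form.

```sql
-- 'mythes' is a hypothetical compiled thesaurus name.
SET THESAURUS_NAME 'mythes';

-- THESAURUS is expected to expand 'network' into the alternative words
-- and phrases given by the thesaurus rules (syntax assumed).
SELECT PROBLEM_NUMBER, SUBJECT
    FROM SUPPORT
    WHERE SUBJECT CONTAINS THESAURUS 'network';
```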

You can compile and test your own custom source thesaurus file (alone or in conjunction with character variant rules) through the fthmake and fthtest utility programs. fthmake compiles the thesaurus source file (and optionally incorporates character variant rules for a thesaurus lookup). fthtest is an interactive test facility that allows you to verify that the compiled thesaurus generates the desired alternative forms for a given search term.

For information about creating, testing and compiling a thesaurus file, see the section, "Creating a Customized Thesaurus" in Chapter 7, "Providing Support Files for Searching."

Integrating a Custom Text Reader

SearchServer is shipped with a number of text readers to support common external document formats. If none of these text readers is suitable for your application, you can develop a custom text reader using the SearchServer Customization Tools.

Your custom text reader must produce a text stream that is compatible with the Fulcrum Technologies Internal Character Set (FTICS). It must also comply with the Fulcrum Technologies Document Format (FTDF), which specifies syntax rules for embedded control sequences. For more information, see the section, "Embedding Control Sequences," in Chapter 4, "Using External Text."

You can also create a custom text reader that performs directory expansion. The definition for your custom text reader can be loaded into the dynamic library table in the system configuration file through the ftmload utility program. For information about adding a custom text reader to SearchServer, see the section, "Installing Text Readers," in Chapter 4, "Using External Text."

Modifying or Displaying the Dynamic Library Table

In addition to integrating custom text readers, SearchServer can be extended in other ways to operate in specific application environments. For example, you can integrate custom character set translation tables and collation functions created using the SearchServer Customization Tools. To integrate these custom extensions, you must modify the dynamic library table using the ftmload utility program.

To display the contents of the dynamic library table, you use the ftmunld utility program. You can also use this utility to maintain an up-to-date readable version of what is in the dynamic library table. For more information about customization, see the Fulcrum SearchServer Customization Guide.

Managing the Index

This section identifies the SearchServer indexing diagnostic tools. You'll also find references to the other sections of this manual for more detailed diagnostic procedures.

Checking the Index Log

You can check how successful an indexing operation on a particular PERIODIC table was by running ExecSQL. Simply execute a SELECT statement on the FTT_INDEXLOG column of the TABLES system table and retrieve the data for the row corresponding to that table. This column provides the actual error report for the indexing operation.

This index log records the messages that are generated by the indexing engine, as well as the sizes of the catalog, dictionary, and reference. The index log can also contain messages from previous indexing operations. The index log can be truncated using the VALIDATE INDEX statement with the REWIND parameter.

For more information about the TABLES table, see the section called "TABLES System Table" later in this chapter. For information about the VALIDATE INDEX statement, refer to Fulcrum SearchServer SearchSQL Reference.

Note: If the index log gets too large, you might not be able to view the entire contents of the FTT_INDEXLOG column. In this case, you can view the .LOG file that is located (by default) in the FULTEXT directory.
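The index log check described in this section could be sketched as the query below. FTT_INDEXLOG and TABLES come from the text above; TABLE_NAME is an assumed name for the column that identifies each table's row, so confirm it against the TABLES description in Fulcrum SearchServer SearchSQL Reference.

```sql
-- Retrieve the index log for the SUPPORT table.
-- TABLE_NAME is an assumed column name, used here for illustration.
SELECT FTT_INDEXLOG
    FROM TABLES
    WHERE TABLE_NAME = 'SUPPORT';
```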

Checking the Index

To verify the internal consistency of the index associated with a table (for example, just before backing up the table files), use the ftidrck utility program. This utility checks the index as it was at the time of the last VALIDATE INDEX operation. In the case of an IMMEDIATE table, this means that index information for new data since the last VALIDATE INDEX isn't checked.

For more information about checking the contents of the index, see the section, "Recovering from Indexing Failure," in Chapter 5, "Maintaining the Data."

The System Information Tables

SearchServer maintains a set of five system information tables that describe the current data source in terms of the server (SearchServer) attributes and the structure and content of tables that comprise the data source. The system information tables are called:

• SERVER_INFO

• TABLES

• COLUMNS

• ZONES

• SEARCH_TERMS

SearchServer updates and indexes these tables automatically. You can issue a SELECT statement on the system information tables at any time to retrieve information about the current SearchServer data source.

A complete description of the system information tables and how to use them is provided in Fulcrum SearchServer SearchSQL Reference.

SERVER_INFO System Table

SearchServer maintains one SERVER_INFO system table for each connection to a data source. This system table describes the server attributes associated with the connection. A given data source includes all of the tables described in the other system information tables. The server attributes affect how table data is recognized and handled by SearchServer.

TABLES System Table

The TABLES system table contains a list of the names of all tables and views visible to the data source. For each table, it also contains a record of the parameters that were used to create the table, and a record of the table's indexing status. Table parameters are specified when the table is created through a CREATE TABLE clause or statement.

COLUMNS System Table

The COLUMNS system table lists the columns in a data source and their attributes. Issuing a SELECT statement on the COLUMNS table reveals which table a column belongs to, and whether a column is defined to hold character data, numeric data, or dates. It also reveals how SearchServer indexes that data for search purposes. This information is recorded when you define columns in the CREATE TABLE clause or statement.

Note: A table always contains a number of pre-defined (reserved) columns that include information from the table management files. However, these columns aren't represented in the COLUMNS system table unless specifically named in a CREATE TABLE clause or statement.

ZONES System Table

The ZONES system table lists the zones in a data source. SearchServer uses zones to search subsets of text in a character column. To enable rapid, flexible searching capability, you can delimit a zone of text anywhere in a column using embedded control sequences or a text reader.

Once you've defined the text zones and the associated columns in a CREATE SCHEMA statement, a search on the ZONES system table reveals the table name, column name, zone name, and other information about the zone.

SEARCH_TERMS System Table

The SEARCH_TERMS system table contains a list of the searchable words in the data source. A search on this table returns the zone number, search term, and occurrence statistics for each search term in a given table.
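Because the system information tables are queried with ordinary SELECT statements, a quick look at the structure of the current data source might be sketched as follows. The asterisk is used because the column names of these tables are not listed in this chapter; whether it returns every column of a system table is an assumption.

```sql
-- List the columns defined in the data source and their attributes.
SELECT * FROM COLUMNS;

-- List the zones defined in the data source.
SELECT * FROM ZONES;
```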

Chapter 3:

Structuring the Data

This chapter describes the steps you'll follow to structure your data and create a table. You'll learn what's involved in:

• designing the data structure

• creating the table

• setting table parameters

• working with multiple tables

Introduction

Structuring your data means organizing it in SearchServer so as to maximize the efficiency of the text-retrieval engine while still allowing maximum flexibility to the application designer and effectiveness for the users. This chapter explains how to structure your data and build your table in a way that makes the best use of SearchServer's powerful search and retrieval technology.

It describes the elements of the data structure, as well as the syntax and semantic rules you'll need to create a schema and define a table. It explains how to build a table using all the functionality SearchServer offers with the CREATE SCHEMA statement, as well as how to build a table quickly using mainly the default options with the CREATE TABLE statement.

Whichever method you choose, when you're finished, your SearchServer table will be ready to populate. This chapter also explains how you can group multiple tables into a logical view that can be searched as a single entity.

Designing the Data Structure

Data in a table usually has some logical relationship, such as the relationship between the material in a set of manuals, or all the resumés maintained by a human resources department. How you organize the data affects how easily your users are able to create queries, how fast SearchServer can process them, and how long it will take to index a table.

It's also important to know whether the data will be updated frequently, and whether the users of the table will need immediate access to modified data. Other important factors you'll need to consider when designing your table are:

• the size of the document(s) you will be searching

• what information will be stored in what columns

• whether the columns should be partitioned into zones

The data structure on which your SearchServer application is based consists of schemas, tables, rows, columns, zones, and domains. A table consists of rows and columns, which represent how the data is organized for update and retrieval. The schema, or data dictionary, is the logical description of the data that you want to include in the table.

A row represents one text object, or document, and its attributes. Each row in your table contains one value for each column. A column represents an individually retrievable attribute of a text object (for example, the title of a memo). When you define columns, you are defining the actual structure of the data and your table.

You can further subdivide columns into zones, which are regions of text (such as the name of a chapter or a section heading) that are distinguished from the rest of the text for searching purposes. A column that is subdivided into zones is called a segmented column. A column with no zones is called a simple (or unsegmented) column.

A domain is a user-defined data type that is built on a pre-defined data type. It is used to associate zones with a column, or to override the default index mode for a column. It can also be used to override the default data length.

What Happens During Table Creation?

When you create a table, it doesn't contain any rows of data; however, it does have all the column definitions that you specified in the CREATE TABLE clause or statement. In addition, the table is always created as an IMMEDIATE table (that is, with the IMMEDIATE table parameter) so that new or updated data is immediately searchable.

If you use the CREATE TABLE clause to build the table, it also has all of the zone and domain definitions that you specified in the CREATE SCHEMA statement.

All tables have a number of reserved columns that are created by default. These reserved columns define control information and text for the search and index engines. For a complete description of the reserved columns, see "Using the Reserved Columns," later in this chapter.

Among these columns is the FT_TEXT external column. This column provides access to the external text that makes up the body of a document. The external text isn't actually stored by SearchServer. It is stored in its native format in an operating system file or some other unit of physical storage. For more information about this column, see the section, "Reserved Column Data," later in this chapter.

For any table, the maximum length of a row is dependent on the number of columns, the sum of the maximum lengths of all character column values, and the sum of precisions of all numeric fields. The length of a numeric column value is the maximum number of bytes returned to the application.

Defining a Table with the CREATE SCHEMA Statement

The first step in building a SearchServer application is to define the characteristics of the table, most notably the name and characteristics of each column in the table. The schema, or data dictionary, is the logical description of the data that you want to include in the table.

The CREATE SCHEMA statement allows you to define the table, its columns, column attributes, zones, and domains. The syntax for the CREATE SCHEMA statement is:

CREATE SCHEMA [REPLACE] <schema name>
[{<CREATE ZONE clause> | <CREATE DOMAIN clause>}...]
<CREATE TABLE clause>

Using the CREATE TABLE clause of the CREATE SCHEMA statement gives you access to all of the features and functionality required to build a SearchServer table. The column definition in this clause contains the name of the column, its data type or domain name, and optionally, its field number. You can also change the default values for a number of table parameters. This functionality allows you to customize your table.

If you don't need all the functionality provided by the CREATE SCHEMA statement, you can use the CREATE TABLE statement, described later in this chapter. The CREATE TABLE statement creates a table based largely on the defaults.

When a complete CREATE SCHEMA statement is executed successfully, the following entities associated with the schema are created:

• the zones that will be incorporated into columns

• the table

• the columns belonging to the table

• the column attributes
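
The clause order in the CREATE SCHEMA syntax can be sketched with a small Python helper. This is only an illustration: the function name build_create_schema is hypothetical, the clauses are taken from examples used in this chapter, and joining them with newlines is an assumption about acceptable whitespace rather than a documented requirement.

```python
def build_create_schema(schema_name, table_clause,
                        zone_and_domain_clauses=(), replace=False):
    """Assemble a CREATE SCHEMA statement in the order the syntax gives:
    schema name, optional CREATE ZONE / CREATE DOMAIN clauses, then the
    single CREATE TABLE clause."""
    head = "CREATE SCHEMA REPLACE" if replace else "CREATE SCHEMA"
    parts = [f"{head} {schema_name}"]
    parts.extend(zone_and_domain_clauses)  # zero or more, in any mix
    parts.append(table_clause)             # exactly one, always last
    return "\n".join(parts)                # assumption: newlines are fine


statement = build_create_schema(
    "SUPPORT",
    "CREATE TABLE SUPPORT (COMPANY VARCHAR(30))",
    zone_and_domain_clauses=[
        "CREATE ZONE CONTACT (212) NORMAL",
        "CREATE DOMAIN VERSION_DMN LITERAL AS VARCHAR(10)",
    ],
)
print(statement)
```

The helper does not validate the clauses themselves; it only shows that zone and domain clauses sit between the schema name and the one CREATE TABLE clause.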

Steps You Should Consider When Creating a Table

To create your table using the CREATE SCHEMA statement, you'll need to do the following:

• Assign a name to the schema. Typically, the name you use is the same name you assign to the table when you use the CREATE TABLE clause. You can only have one table per schema. However, you can use separate instances of the same schema definition to create multiple tables.

• Assign a unique, non-reserved name to your table.

• Assign unique, non-reserved names to the columns you plan to add to the table.

• Determine the data type and index mode, and an optional unique field number that can be associated with each column.

• Assign non-reserved zone names; one for every portion of a segmented column you want to search separately from other portions of the column.

• Provide a zone number for each zone name, and decide which index mode the zone will use.

• Assign a unique domain name for each segmented column or column that has a non-default index mode.

• Decide the index mode you want to use for the associated domain when you want to override the default index mode of a data type.

• List the zone names you'll group into the associated domain when you want to associate one or more zones with a segmented column.

You should also have on hand information about the structure of the file system, the names of the directories that contain table management and support files, and other information needed to set the table parameters.

Naming the Schema

Each CREATE SCHEMA statement defines a single table. The relevant part of the syntax is:

CREATE SCHEMA [REPLACE] <schema name>

where <schema name> specifies the name of the new schema. Because the schema name is not referenced anywhere else, you can use the same name for the schema and table you're creating. You can use this statement to create a new schema or overwrite an existing one using the REPLACE option.

Note: Using the REPLACE option to overwrite an existing schema will usually invalidate the index for the table. After replacing a schema, always execute a VALIDATE INDEX statement that specifies the ABANDON parameter on the table.

The following example shows how to create a schema named SUPPORT:

CREATE SCHEMA SUPPORT

Naming the Table

Typically, the name you use is the same name you assigned to the schema. The name you choose for the table must be a valid SearchSQL identifier and must be different from any other table name. (You can use the TABLES system table to find out if the name is already in use.)

For example, for the SUPPORT table, the first part of the CREATE TABLE clause or statement would look like this:

CREATE TABLE SUPPORT

The maximum length for table names is 10 characters (8 characters for table names in 16-bit Microsoft Windows environments).
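
As a quick pre-flight check, an application could test a proposed table name against these length limits before issuing the statement. The sketch below is hypothetical Python (the function name and flag are illustrative); it checks length only and does not attempt to verify the other rules for a valid SearchSQL identifier.

```python
def table_name_length_ok(name, sixteen_bit_windows=False):
    """Return True if the name fits the length limit stated above:
    10 characters, or 8 in 16-bit Microsoft Windows environments."""
    limit = 8 if sixteen_bit_windows else 10
    return 0 < len(name) <= limit


print(table_name_length_ok("SUPPORT"))                               # 7 characters
print(table_name_length_ok("SUPPORTLOG", sixteen_bit_windows=True))  # 10 characters
```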

Defining Simple Columns

Next, you define the columns and their attributes. The syntax of a column definition is considered part of the table definition. For a CREATE TABLE clause, the syntax looks like this:

<column name> {<data type> | <domain name>} [<field number>]

To define a column named COMPANY in the SUPPORT table, the CREATE TABLE clause or statement would look like this:

CREATE TABLE SUPPORT (COMPANY VARCHAR(30))

This example would create a column called COMPANY in a table called SUPPORT. It also supplies a data type VARCHAR with a maximum length of 30 characters to this column. Using the VARCHAR data type, in turn, implies the use of the NORMAL index mode.

When you create a column, you define it in the following order:

1. Name the column.

2. Choose a data type for the column, or choose a domain that will be used to define the range of values for the column. The data type can be CHAR, VARCHAR, APVARCHAR, INTEGER, SMALLINT, or DATE.

3. For a character type column, specify the data length if you want to override the default length.

4. Optionally, specify a field number. Field numbers are necessary only when renaming a reserved column or to control the default zone number of a segmented column.
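
The four steps above map onto the column definition syntax in a fixed order. The following Python sketch renders a definition from its parts; the ColumnDef class is hypothetical, the length limits are the ones given under "Assigning a Data Length" in this chapter, and the TEXT_LOG line is an inferred illustration (a domain name plus field number 32), not a definition quoted from the SUPPORT schema.

```python
from dataclasses import dataclass
from typing import Optional

# Per "Assigning a Data Length": a length may be given in the column
# definition only for CHAR or VARCHAR, up to a maximum of 32,767.
LENGTH_TYPES = {"CHAR", "VARCHAR"}
MAX_LENGTH = 32767


@dataclass
class ColumnDef:
    name: str                           # step 1: column name
    type_or_domain: str                 # step 2: data type or domain name
    length: Optional[int] = None        # step 3: optional data length
    field_number: Optional[int] = None  # step 4: optional field number

    def render(self) -> str:
        text = f"{self.name} {self.type_or_domain}"
        if self.length is not None:
            if self.type_or_domain not in LENGTH_TYPES:
                raise ValueError("length applies only to CHAR or VARCHAR")
            if not 1 <= self.length <= MAX_LENGTH:
                raise ValueError("length out of range")
            text += f"({self.length})"
        if self.field_number is not None:
            text += f" {self.field_number}"
        return text


print(ColumnDef("COMPANY", "VARCHAR", length=30).render())
print(ColumnDef("TEXT_LOG", "LOG_DMN", field_number=32).render())
```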

Naming the Column

Use column names based on the data model that describes your application. For example, the SUPPORT table contains information about the customer support group's different clients, such as the name of each company, the primary contact person, and the phone number.

In order for an application to be able to retrieve this data from the client list, you'll need to create columns with names such as COMPANY, PRIME_CONTACT, and PHONE_NUMBER. When you want to retrieve information about a particular topic, you can reference these columns in your SELECT statement.

Choosing the Data Type

The data type you choose will be determined by the characteristics of the data in each column. There are three main data types:

Data Type                SearchSQL Syntax
character string type    CHARACTER [VARYING] [(<length>)]
                         CHAR [VARYING] [(<length>)]
                         VARCHAR [(<length>)]
                         APVARCHAR [(<length>)]
exact numeric type       INTEGER
                         INT
                         SMALLINT
date type                DATE

Table 3-1 SearchSQL Data Types

About Index Modes

Every data type has a default index mode that determines the type of indexing and searching for the column. Once you've executed a CREATE SCHEMA statement successfully, you'll have a table with all the columns described. After you use the INSERT statement, the inserted rows will contain data values for all or some of the columns. When the inserted rows are indexed, the indexing engine determines what to index and how to index it through the index mode for each column. If you've used the CREATE SCHEMA statement, the indexing engine also determines what to index through the index modes that are included in the zone definitions (or domain definition).

You can use the NORMAL index mode to search any text column. Specify this index mode for character columns (and zones) in which you will want to match ordinary words or phrases.

The LITERAL index mode is useful for terms such as alphanumeric data that would normally be indexed separately by SearchServer. The LITERAL index mode can be used with any predicate that performs text searching.

The VALUE index mode allows efficient searching for numeric values using the in, between, and comparison predicates. (Note that search terms that are VALUE indexed are not delimited by match codes in the returned data.) Searchable columns of type DATE, INTEGER, and SMALLINT must be defined with the VALUE index mode.

You cannot search on a column or zone that was defined with the NONE index mode. The valid index modes and the type of index that they create are:

NORMAL   The index created for the column (or zone) contains one entry for each word in the column. (The definition of a word is provided in Fulcrum SearchServer SearchSQL Reference.)

LITERAL  The index created for the column (or zone) contains one entry for each sequence of characters delimited by whitespace. (The definition of a literal is provided in Fulcrum SearchServer SearchSQL Reference.)

VALUE    The index created for the column (or zone) contains one entry for each date or numeric value in the column or zone.

NONE     No index is created. A column (or zone) defined with this index mode cannot be referenced in the WHERE clause and is labeled as not searchable.

Unless specified otherwise, the index mode of a column is the default index mode for its data type.

Data Type                Default Index Mode
character string type    NORMAL
exact numeric type       VALUE
date type                VALUE

Table 3-2 Index Modes

Assigning a Data Length

The data type you assign to a column has a default data length and a maximum data length. These lengths define the width of the column you're creating. You can assign the data length in the column definition of the CREATE TABLE clause only for the CHAR or VARCHAR data types. The default length for CHAR and VARCHAR is 1, and the maximum length for these data types is 32,767.

Using Domains to Define New Data Types

You can use a domain to define a new data type. The AS portion of the CREATE DOMAIN clause requires you to assign a pre-defined data type to the domain.

In the following example, the PRIORITY_DMN domain assigns the exact numeric type INTEGER and VALUE index mode. This domain has not changed any of the default settings for the data type, but has provided the application with an alias for the INTEGER data type that could be used to define any columns that contain the same type of data.

CREATE DOMAIN PRIORITY_DMN VALUE AS INTEGER

Using a Domain to Override the Data Length

The data type that you assign to a domain has a default length. You can override the default length of a character string data type by specifying a new length when you define the data type. For instance, the following example creates the VERSION_DMN domain that assigns the VARCHAR data type with a data length of 10 characters:

CREATE DOMAIN VERSION_DMN LITERAL AS VARCHAR(10)

If all you want is a way of defining several columns with the same character string data type and length, you can define a domain with the default index mode (NORMAL), and the required data type and length.

Using a Domain to Override an Index Mode

You can change the index mode of a column by specifying a domain name instead of a data type in the column definition. You can't change the index mode if you built the table by using the CREATE TABLE statement because such tables do not contain domains.

The data type on which the domain is built has a default index mode. You can override this default by specifying a different index mode when defining the domain. For instance, the default index mode for the VARCHAR data type is NORMAL. The following example changes the index mode for the domain to LITERAL:

CREATE DOMAIN VERSION_DMN LITERAL AS VARCHAR(10)

Using Zones to Define Segmented Columns

One of the options you have when creating the schema using a CREATE SCHEMA statement is to build zones into the columns. Associating zones with a column is yet another use for a domain.

Zones give the table more flexibility by allowing you to structure the data within a text column. The data in the zone becomes an independent subsection of data for searching purposes. This enhances the precision of queries and reduces search time.

You can name the zone in the WHERE clause of a SELECT statement, which allows the zone to be searched separately. The CREATE ZONE clause specifies an index mode for the zone, which can be different from the index mode associated with the column. However, you can update or insert a zone only by inserting or updating the entire column that contains it.

Note: Only text columns can be segmented. Zoning can't be used with date and numeric columns.

How Do Zones Work?

A column can contain one or more zones. If you don't specify a zone in the schema, the column consists of a single default zone whose zone number is the same as the field number of the column. This is a simple column, as described earlier in this chapter. If you've segmented the column into zones, any data that you don't include in a specified zone remains part of the default zone.

Zones are used for searching purposes. For example, the SUPPORT table contains a column named TEXT_LOG that points to an external document file. This file contains information such as the date and time of a call, the name of the operator who answered the call, the contact person, and the description of the problem.

If the column doesn't contain zones, a search on that column returns all instances of your search term in that column for all the rows in the table. However, in a segmented column, the zones can be searched separately—thereby narrowing the search request without having to create extra columns.

For example, searching the TEXT_LOG column for every instance of Peter as the operator would be impossible if the column wasn't segmented into zones. The search result would return all instances of Peter in the file. However, if the column was segmented into zones, as illustrated below, you could search for the term Peter when it appeared as the operator.

Figure 3-1 A Column Segmented into Zones

Zones are identified in the data by zone control sequences. If these control sequences aren't already embedded in the data, they must be inserted by the application that inserts or updates the data, or by a text reader. Therefore, you must be sure that these control sequences assign the appropriate zone number to the appropriate data segment. You do this by ensuring that the text reader (or application parameters) conforms to the data model you designed.

Creating a Zone

In SearchServer, all the data in a column is either in the default zone or in a zone defined by a zone control sequence. An instance of a zone is created by placing zone control sequences in the data either through a text reader or by embedding the control sequence into the actual data. The CREATE ZONE clause of the CREATE SCHEMA statement defines the set of zone instances that are to be searched together under a common name. Zones are associated with a column through a domain.

The syntax for the CREATE ZONE clause is shown below:

CREATE ZONE <zone name> (<zone list>) [<index mode>]
zone list ::= <zone number> [{,<zone number>}...]

The zone must have a valid name that is unique in the table, and it can't be the same as the name of a column or a domain. For example, in the SUPPORT table, the CREATE ZONE clause for the CONTACT zone would be:

CREATE ZONE CONTACT (212) NORMAL

The zone numbers in the zone list relate directly to the zone control sequences that introduce instances of a zone (as described in the following section). You can specify the index mode to override the default index mode of the associated column.

To create a zone, you must understand the relationship between the zones defined and zone numbers assigned to those zones. For a complete description about how the zone control sequence is used to define zones in the external text or column data, see Appendix D, "Control Characters and Control Sequences."

For example, the following text represents the document format for the TEXT_LOG column associated with the SUPPORT table:

DATE:\E[201s92-01-27\E[s
TIME:\E[202s9:30 am\E[s
OPERATOR:\E[211sMarie\E[s
CONTACT:\E[212sDave Chisholm\E[s
DESCRIPTION:\E[32sDave called to ask how many connection handles ....

In this example, the text is segmented into five separate zones: 201, 202, 211, 212, and 32 (which is also the default zone number for this column). Each zone instance consists of the control sequence introducer (\E[), the zone number (for example, 201), the zone number terminator character (s), and the data (for example, 92-01-27), and ends with the control sequence introducer followed by the zone terminator character (\E[s).

Note: The control sequence introducer (\E[ ) is a notational device that represents the two-byte control character whose hexadecimal representation is 1B5B.
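
To make the layout of a zone instance concrete, here is a minimal Python sketch that pulls zone numbers and their data out of a string. It is not part of SearchServer; the function name zone_instances is hypothetical, and the sample reuses only the first four zones of the TEXT_LOG document above, with \E written as the escape character (hexadecimal 1B).

```python
import re

# \E[ <zone number> s <data> \E[ s, where \E[ is the bytes 1B 5B.
ZONE_INSTANCE = re.compile(r"\x1b\[(\d+)s(.*?)\x1b\[s", re.DOTALL)


def zone_instances(text):
    """Return (zone number, data) pairs for each zone instance in text."""
    return [(int(number), data) for number, data in ZONE_INSTANCE.findall(text)]


sample = (
    "DATE:\x1b[201s92-01-27\x1b[s "
    "TIME:\x1b[202s9:30 am\x1b[s "
    "OPERATOR:\x1b[211sMarie\x1b[s "
    "CONTACT:\x1b[212sDave Chisholm\x1b[s"
)

for number, data in zone_instances(sample):
    print(number, data)
```

Text outside any zone instance (such as the labels DATE: and TIME:) is ignored by this sketch; in SearchServer such text remains part of the column's default zone.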

On the basis of these criteria, you would have to include the following CREATE ZONE clauses in your CREATE SCHEMA statement:

CREATE ZONE DATE_AND_TIME (201, 202)
CREATE ZONE OPERATOR (211)
CREATE ZONE CONTACT (212)
CREATE ZONE DESCRIPTION (32)

When you reference the zone name in the WHERE clause of a SELECT statement, the SearchServer search engine references the appropriate CREATE ZONE clause and searches for data associated with the zone numbers listed in the CREATE ZONE clause.

When you reference a segmented column in the WHERE clause, you are implicitly searching all of the zones in the column. For a given row, a match in one or more zones of a column constitutes a match in the column. When your SELECT statement contains a select list that names a segmented column, the data retrieved from the working table for that column contains the zone control sequences.

Assigning Zone Numbers

If you're mapping zones to a properly designed data model that already has segmented the data and assigned zone numbers, the zone numbers you assign in this clause should match the ones assigned in the model.

The zone number must be a number between 128 and 64,010 (inclusive) that has not already been assigned to another zone or user-defined column. In all cases, each zone and zone number must occur in only one column. In other words, the list of zone numbers (including the default zone number) associated with a column can't overlap the list for another column.

For example, the number 32 is only used with a zone that is part of a column that renames the FT_TEXT reserved column. For a complete description of the reserved columns, see the section, "Using the Reserved Columns," later in this chapter.
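
These numbering rules can be checked before a schema is submitted. The Python sketch below is a hypothetical helper, not a SearchServer facility: it takes a mapping from column name to the zone numbers used in that column, and an optional set of exempt numbers for reserved values such as 32 on a column that renames FT_TEXT. The column names in the second call are invented for illustration.

```python
MIN_ZONE, MAX_ZONE = 128, 64010


def check_zone_numbers(columns, exempt=frozenset()):
    """Return a list of problems: numbers outside 128-64,010 (unless
    exempt) and numbers that appear in more than one column."""
    problems = []
    owner = {}  # zone number -> first column seen using it
    for column, numbers in columns.items():
        for number in numbers:
            if number not in exempt and not MIN_ZONE <= number <= MAX_ZONE:
                problems.append(f"{column}: {number} is out of range")
            if number in owner and owner[number] != column:
                problems.append(
                    f"{column}: {number} is already used by {owner[number]}"
                )
            owner.setdefault(number, column)
    return problems


print(check_zone_numbers({"TEXT_LOG": [201, 202, 211, 212, 32]}, exempt={32}))
print(check_zone_numbers({"NOTES": [201], "SUMMARY": [201, 100]}))
```

Repeating a number within the same column is allowed here, since one zone number may be listed in more than one named zone of that column.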

Assigning Index Modes to Zones

You can also assign an index mode to a zone. You can assign it in the CREATE ZONE clause or you can let the index mode default to the default index mode of that column's data type. For instance, the following example assigns the NONE index mode to the OPERATOR zone:

CREATE ZONE OPERATOR (211) NONE

For more information about the index modes assigned to data types, see the section, "About Index Modes," earlier in this chapter.

The index mode determines how the indexing engine indexes the data you insert in this zone. This specifies how the data is divided into words (the smallest searchable units of text) for searching purposes. For a complete description of word recognition at search time, see Fulcrum SearchServer SearchSQL Reference.

If you decide that the index mode for the zone should differ from the index mode defined by the data type of the column, you can assign an individual index mode to a zone. If you do this, you must embed index mode delimiters in your column data. These delimiters enable and disable the index mode at the beginning and end of the zone. For a list of index mode delimiters see the section, "How Index Modes Affect Indexing," in Chapter 5, "Maintaining the Data."

When you change the index mode of the data in a zone, you must also change the zone definition. This zone and index mode must be explicitly stated in the schema.

Indexing Modes and Searching

When you execute a SELECT statement, the search engine refers to the schema to determine how the data being searched was indexed. The index mode provides this information. If the index mode specified in the data of a zone doesn't match the index mode in the schema of the zone, then SearchServer won't construct the proper search.

For example, assume that the following string occurs in the PRODUCT_VERSION column of the SUPPORT table, but is indexed in NORMAL index mode because of a spurious embedded control sequence:

'1.1A'

In this case, the two strings '1.1' and 'A' are indexed as separate words. However, the SUPPORT schema defines the PRODUCT_VERSION column with LITERAL index mode, so a search for '1.1A' in this column is a search for the complete string, which will not be found.
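
The mismatch can be reproduced with a toy model in Python. The LITERAL rule (split on whitespace) follows the description in this chapter; the NORMAL rule below is only an assumed stand-in chosen to reproduce this one example, because the real definition of a word is given in Fulcrum SearchServer SearchSQL Reference. Both function names are hypothetical.

```python
import re


def normal_terms(text):
    # Assumption: a toy word rule (dotted digit runs or letter runs),
    # NOT SearchServer's actual definition of a word.
    return re.findall(r"\d+(?:\.\d+)*|[A-Za-z]+", text)


def literal_terms(text):
    # One entry per sequence of characters delimited by whitespace.
    return text.split()


indexed = set(normal_terms("1.1A"))  # how the data was actually indexed
query = literal_terms("1.1A")        # what the schema says to search for
found = all(term in indexed for term in query)

print(sorted(indexed), query, found)
```

Because the index holds the separate entries while the search looks for the single literal term, the lookup fails even though the text is present in the column.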

As a result, you must ensure that any changes that are made to the zone control codes and the index mode delimiters are reflected in the zone descriptions in the CREATE SCHEMA statement. If you're using a previously designed text reader, you should be familiar with the zone control codes and index mode delimiters that it uses.

When you search the column, it is searched using the index mode of the column, not the index mode of the individual zones in the column. For this reason, it is recommended that you don't define a zone with an index mode that is different from that of the column unless the index mode of the zone is NONE.

Grouping Zones

SearchServer gives you the flexibility of grouping many smaller zones into one zone. To do this, you create a zone and assign more than one zone number to it. For example:

CREATE ZONE ENTRY_HEADING (201,202,211,212)

This zone includes all the heading information for each conversation with the customer, but excludes the description. A search on the ENTRY_HEADING zone would then search all the data in the zones named DATE_AND_TIME, OPERATOR, and CONTACT. This allows you to formulate a query, without naming all the individual zones.
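
The effect of naming a zone, or a group of zones, in a search can be modeled in a few lines of Python. This is a simplified sketch rather than the search engine's behavior: the ZONES mapping mirrors the CREATE ZONE clauses in this chapter, the function zone_contains is hypothetical, the sample row is invented, and a case-insensitive substring test stands in for real word matching.

```python
import re

# Zone name -> zone numbers, as in the CREATE ZONE clauses above.
ZONES = {
    "DATE_AND_TIME": {201, 202},
    "OPERATOR": {211},
    "CONTACT": {212},
    "ENTRY_HEADING": {201, 202, 211, 212},
}
ZONE_INSTANCE = re.compile(r"\x1b\[(\d+)s(.*?)\x1b\[s", re.DOTALL)


def zone_contains(text, zone_name, term):
    """True if term occurs in any zone instance whose number belongs
    to the named zone."""
    numbers = ZONES[zone_name]
    return any(
        int(number) in numbers and term.lower() in data.lower()
        for number, data in ZONE_INSTANCE.findall(text)
    )


# Hypothetical row: Peter is the contact, not the operator.
row = "OPERATOR:\x1b[211sMarie\x1b[s CONTACT:\x1b[212sPeter\x1b[s"

print(zone_contains(row, "OPERATOR", "Peter"))
print(zone_contains(row, "CONTACT", "Peter"))
print(zone_contains(row, "ENTRY_HEADING", "Peter"))
```

Searching the grouped zone matches wherever any member zone matches, which is why one query on ENTRY_HEADING can replace separate queries on each heading zone.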

Using a Domain to Group Zones

Zones are assigned to columns using the CREATE DOMAIN clause of the CREATE SCHEMA statement. The zones are grouped together in the domain definition and then referenced in the column definition of the CREATE TABLE clause. When you assign a domain to a column, all the zones in the domain are assigned to the column.

SearchServer determines the characteristics of these zones by looking at the rest of the schema. For instance, the following example creates a domain named LOG_DMN that contains four zones and assigns the data type APVARCHAR to the zones:

CREATE DOMAIN LOG_DMN (DATE_AND_TIME, OPERATOR, CONTACT, DESCRIPTION) AS APVARCHAR

Choosing Table Parameters

Table parameters describe the data administration characteristics of the table. They can be specified only in the CREATE TABLE clause when creating a new table. You can't specify these table parameters when using the REPLACE option of the CREATE SCHEMA statement.

If you do not explicitly set these parameters, they are set to their default values specified in the SERVER_INFO system table. (For a complete description of the SERVER_INFO system table, see Fulcrum SearchServer SearchSQL Reference.)

The table parameters are:

• BASEPATH <base path>

• IMMEDIATE or PERIODIC

• INDEXDIR <index directory>

• NOLOCKING or ROWLOCKING

• NORMALIZATION <normalization type>

• STOPFILE <stop filename>

• WILDCARD_OPT <wildcard optimization method>

• WORKDIR <work directory>

The BASEPATH, INDEXDIR, STOPFILE, and WORKDIR table parameters are used to specify the locations of various files related to the table. For each of these table parameters, a default location is defined. In most cases, you'll need to know about the configuration of directories in your operating environment to set these table parameters appropriately. In the case of a table created on a server node, the directory and filename parameters are interpreted in the context of the server's file system.

Note: Under all Microsoft Windows environments, if the BASEPATH, INDEXDIR and WORKDIR table parameters are specified as a root directory, use the notation "C:\" rather than "C:" (for example) to have SearchServer decode the request correctly. Always specify a directory explicitly instead of (for example) using "C:" to imply the current directory.

BASEPATH <base path>

The BASEPATH parameter tells SearchServer where the external document files reside. When you supply a value for this parameter, any value that is supplied for the FT_SFNAME reserved column that is a relative pathname is assumed to be relative to the pathname specified in the BASEPATH parameter.

For example, in the CREATE SCHEMA statement for the SUPPORT table, the BASEPATH parameter is specified as:

BASEPATH ../supdocs

If you specify EXTERNAL.DOC as the value for the FT_SFNAME column (TEXT_LOG_FILE in the SUPPORT table), SearchServer assumes that the path is relative to the pathname specified in the BASEPATH parameter, and prepends the pathname ../supdocs to the filename external.doc.

If the BASEPATH is a relative pathname, as in this example, it is assumed to be relative to (that is, the BASEPATH is prefixed by the pathname of) the default location for the table management files. (The default location is environment dependent.) This allows you to create a table that is readily portable to a different location in the file system or to a different machine.
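
The two levels of relative resolution described above can be sketched in Python. This is an illustration only: the function name is hypothetical, /fulcrum/tables is an invented stand-in for the environment-dependent default location of the table management files, and the final normalization step is added here for readability rather than taken from SearchServer's documented behavior.

```python
import posixpath


def resolve_document_path(table_dir, basepath, sfname):
    """Resolve an FT_SFNAME value against BASEPATH, and a relative
    BASEPATH against the table management file location."""
    if posixpath.isabs(sfname):
        return sfname  # absolute values are used as given
    if not posixpath.isabs(basepath):
        basepath = posixpath.join(table_dir, basepath)
    # normpath collapses "..", which is a presentation choice of this sketch
    return posixpath.normpath(posixpath.join(basepath, sfname))


print(resolve_document_path("/fulcrum/tables", "../supdocs", "external.doc"))
```

Because only the table location is absolute, moving the table directory and the document directory together keeps every stored FT_SFNAME value valid.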

IMMEDIATE or PERIODIC

The IMMEDIATE parameter specifies an IMMEDIATE table. The data that is entered into the table is immediately searchable. This means that the data can be obtained using a SELECT statement as soon as the INSERT or UPDATE statement that entered the data is executed. Also, a DELETE statement causes the row data and associated index information to be deleted immediately, so that a subsequent SELECT statement can't match the deleted row.

These two parameters are mutually exclusive.

INDEXDIR <index directory>

The INDEXDIR parameter specifies the pathname to a directory containing SearchServer table management files (other than the ta-ble's configuration file). If you don't specify this parameter, the table management files are created in the same directory as the configura-tion file, which is the directory named by FULCREATE.

NOLOCKING | ROWLOCKING

These two parameters are mutually exclusive.

The NOLOCKING parameter tells SearchServer not to perform row locking on the table files when they are accessed. In a multi-user environment, disabling locking ensures that SearchServer is never denied access to a row in a table because of another user's access. However, the integrity of concurrent data updates is not guaranteed.

Row locking guarantees the consistency of retrieved data by using operating system locking calls on the rows to prevent multiple applications from performing simultaneous updates. It is especially useful if a user interface is operating in a multi-user environment, where concurrent searches and updates are being performed.

If one update is being done and another is initiated, the second is given a message to indicate that the row is busy. Row locking also lets your application block other retrieval requests during an update.

NORMALIZATION <normalization type>

The NORMALIZATION parameter specifies the character class settings and case normalization options appropriate for the corresponding internal character set. The relationship between the case normalization options and the internal character sets is shown in Table 3-3.

Case Normalization Option    Internal Character Set
DEFAULT                      FTCS94
EUROPA3                      EFTCS94
ARABIC                       AFTCS94

Table 3-3 Case Normalization Options

SearchServer also recognizes 'ASIAN' to restrict case normalization to the 26 English characters and 'NONE' to perform no case normalization. These two options are not associated with an internal character set, but are provided for custom character sets for other languages. For more information about character sets, see Fulcrum SearchServer SearchSQL Reference.

STOPFILE <stop filename>

The STOPFILE parameter specifies the name of the file that contains the words that won't be indexed (stop words). If you specify an empty filename by entering two single quotation marks (''), no stop file is associated with the table and all the words in columns defined with NORMAL index mode are indexed. Because the index files are then larger, indexing and searching require more time.

If a stop file is specified, it isn't necessary for it to exist when the table is first created. However, it must exist when you execute a VALIDATE INDEX statement for the table. Otherwise, the indexing operation fails. If the stop file changes, the associated table must be re-indexed using the ABANDON parameter.

A stop file that contains common words as stop words is provided with SearchServer. This file is called FULTEXT.STP.

WILDCARD_OPT <wildcard optimization method>

The WILDCARD_OPT parameter specifies the type of wildcard optimization to be enabled for the table. There are three wildcard optimization methods:

• MINIMIZE_SEARCH_TIME

• MINIMIZE_INDEX_OVERHEAD

• NONE

If this table parameter is omitted, no wildcard optimization is performed (the default is NONE). However, the default value can be changed for the duration of your connection by executing a SET WILDCARD_OPT statement.

For more information about using wildcards in your search, see Fulcrum SearchServer SearchSQL Reference.

WORKDIR <work directory>

The WORKDIR parameter specifies the pathname to a work directory containing SearchServer temporary files during indexing. It is important that the work directory is placed in a location that provides enough disk space for the indexing activities that SearchServer must perform. For more information about space requirements, see Chapter 5, "Maintaining the Data."

Optimization for Table Validation

Table validation must examine all the physical data associated with a table and each row in the table in order to accurately synchronize the table data with the external documents. You should have enough memory available to optimize performance for table validation.

The amount of memory needed to give the best performance is dependent on the size of the table and the number of documents it contains. For example, a medium-size table with 10,000 documents needs approximately 450K of memory to gain the best performance on table validation. If this table grew to 100,000 documents, it would need approximately 900K. Table validation can run with less memory, but will take longer to complete.

Table 3-4 indicates the minimum memory requirements for a table and the memory requirements for full optimization of the table validation process:

Number of Entries    Minimum Memory    Maximum Memory
10,000               64K               448K
100,000              64K               896K
250,000              64K               1,664K

Table 3-4 Memory Requirements

For best performance during table validation, you should not use the SearchServer defaults when creating a table. These defaults cause the table to be created with high integrity and immediate indexing. Instead, specify the NOLOCKING and PERIODIC table parameters in the CREATE TABLE clause, or execute SET NOLOCKING and SET PERIODIC statements after the table has been created but before indexing.

High integrity is disabled by using the NOLOCKING table parameter or the SET NOLOCKING statement. Immediate indexing is disabled permanently by using the PERIODIC table parameter or temporarily by using the SET PERIODIC statement.

IMMEDIATE tables update the index each time the table changes. High integrity tables impose locks on the data they read, reading rows one at a time instead of reading a group of rows.
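As a planning aid, the figures in Table 3-4 can be turned into a rough estimate for table sizes that fall between the listed rows. The sketch below is an assumption-laden illustration: the guide gives only three data points, and the linear interpolation between them, as well as the function name `validation_memory_kb`, are choices made for this example rather than anything SearchServer documents.

```python
# Rows copied from Table 3-4: (entries, minimum memory in K, memory for full optimization in K).
MEMORY_TABLE = [
    (10_000, 64, 448),
    (100_000, 64, 896),
    (250_000, 64, 1664),
]


def validation_memory_kb(entries):
    """Return (minimum_kb, full_optimization_kb) for a table with `entries` rows.

    Values between tabulated rows are linearly interpolated (an assumption).
    Sizes outside the tabulated range are rejected rather than extrapolated.
    """
    if not MEMORY_TABLE[0][0] <= entries <= MEMORY_TABLE[-1][0]:
        raise ValueError("entry count is outside the range covered by Table 3-4")
    for (n0, min0, max0), (n1, _, max1) in zip(MEMORY_TABLE, MEMORY_TABLE[1:]):
        if n0 <= entries <= n1:
            fraction = (entries - n0) / (n1 - n0)
            return min0, max0 + fraction * (max1 - max0)


print(validation_memory_kb(100_000))  # the tabulated row: 64K minimum, 896K for full optimization
```

Treat the interpolated figure as a starting point only; the tabulated rows are the only values the guide actually states.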


Tables used in the following environments should be created with low integrity and periodic indexing:

• The application doesn't control access to the data store. Documents are added, updated, and removed from the data store without recording changes in the table. A batch table validation and indexing process needs to be run periodically to update the table management files.

• It is possible that documents in the data store continue to change while table validation is executing. In this case, the table files might never totally catch up to the current state of the data store. Changes already in place when table validation is started are found, but some changes made during table validation are not caught until the next run.

• The application never needs to update table data directly. All updates are performed by table validation and text readers.

• An administrator might add or delete an entire container or file.

If your table is similar to any of the preceding examples, but users are allowed to update row data and are not adding or deleting rows, the table should be created with high integrity and periodic indexing.

How INDEXDIR and WORKDIR Relate to the Server Attributes

SearchServer always places the configuration file in the directory specified in FULCREATE. If you don't specify INDEXDIR, the table management files are created in the directory specified in FULCREATE. If you do specify INDEXDIR, the table management files are created in that directory rather than in FULCREATE. SearchServer must have write access to this directory.

SearchServer attempts to locate table management files and support files by consulting the FULSEARCH server attribute. To retrieve data from tables that you create, FULSEARCH must include the directories specified by FULCREATE and INDEXDIR.

FULTEMP is a server attribute that defines the default location for the temporary (intermediate) sort files SearchServer creates during indexing. You can override the default for a particular table by specifying the WORKDIR table parameter in the CREATE TABLE clause. SearchServer must have write access to this directory.

See Fulcrum SearchServer Getting Started for more information about the server attributes SearchServer uses to create and locate tables, and to find temporary work space. The server attributes are initialized from data source parameters when the connection is made to a data source.
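The way the table parameters override the server attributes can be summarized with a small Python model. It is a sketch for reasoning about a configuration, not part of SearchServer: the function names are hypothetical, and FULSEARCH is modeled simply as a list of directory names, which is an assumption about its form.

```python
def table_file_locations(fulcreate, fultemp, indexdir=None, workdir=None):
    """Return the directory used for each kind of file, following the rules in the text."""
    return {
        # The configuration file always goes to FULCREATE.
        "configuration_file": fulcreate,
        # INDEXDIR, when given, replaces FULCREATE for the table management files.
        "table_management_files": indexdir if indexdir is not None else fulcreate,
        # WORKDIR, when given, replaces FULTEMP for the temporary sort files.
        "temporary_sort_files": workdir if workdir is not None else fultemp,
    }


def table_is_retrievable(fulsearch, fulcreate, indexdir=None):
    """True if FULSEARCH (modeled as a list of directories) covers FULCREATE and INDEXDIR."""
    needed = {fulcreate}
    if indexdir is not None:
        needed.add(indexdir)
    return needed.issubset(fulsearch)


locations = table_file_locations("/fulcrum/cfg", "/tmp/fultemp", indexdir="/fulcrum/idx")
print(locations["table_management_files"])  # prints /fulcrum/idx
```

A check like `table_is_retrievable` is a convenient reminder that adding INDEXDIR to a table also means adding that directory to the search list.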

Using the Reserved Columns


Every table contains a number of reserved columns. These columns and their definitions are fixed and only the names of the columns and their index modes (in some cases) can be changed. SearchServer uses these columns and the values that are supplied to them to provide control information to the search and index engines and to provide status information to you.

When you execute a CREATE SCHEMA or CREATE TABLE statement successfully, the following reserved columns will be in place (although you may have renamed some of them):

NAME              DATA TYPE          FIELD #  INDEX MODE  DESCRIPTION              RENAME
FT_CID            INTEGER            9        VALUE       catalog id               NO      NO
FT_DATE           DATE               31       VALUE       document date            YES     NO
FT_DFLAG          SMALLINT           8        NONE        sub-document             NO      NO
FT_DNAME          VARCHAR (260)      110      NORMAL      document name            YES     YES
FT_FLIST          VARCHAR (10243)    6        NONE        text reader list         YES     YES
FT_FORMAT         INTEGER            102      VALUE       document format          YES     NO
FT_KEYWORDS       VARCHAR (255)      104      NORMAL      document keywords        YES     YES
FT_MTIME          INTEGER            5        NONE        last-modified date       NO      NO
FT_ORIGINAL_SIZE  INTEGER            103      VALUE       document disk file size  YES     NO
FT_OWNER          VARCHAR (260)      97       NORMAL*     owner                    YES     YES
FT_ROW_STATE      VARCHAR (32)       12       NONE        index state of row       NO      NO
FT_ROW_TYPE       VARCHAR (32)       11       NONE        type of row              NO      NO
FT_SFNAME         VARCHAR (260)      3        NONE        external text filename   YES     YES
FT_SUBJECT        VARCHAR (255)      107      NORMAL*     subject of document      YES     YES
FT_TEXT           APVARCHAR          32       NORMAL*     external text            YES     NO
FT_TEXT_STATUS    VARCHAR (32)       13       NONE        document status          NO      NO
FT_TIMESTAMP      INTEGER            26       NONE        last update time         YES     NO

* The index mode for only these reserved columns can be changed if the column is renamed.

Table 3-5 Reserved Columns and their Attributes

Reserved Column Data

FT_CID

The value in this column is a numeric identifier that is assigned to each row in a table. Each value is unique within the table, and isn't reissued if the row is deleted.

Neither the application nor the text reader can assign a value in this reserved column.

Note: FT_CID values that SearchServer assigns might not be in numerical order.

FT_DATE

This column records the primary date associated with the document. If the row has an external document file (that is, if the value of the FT_SFNAME column isn't NULL), SearchServer automatically supplies the last-modified date of the external document file as the value. This column is updated, if the date has changed, each time the table is indexed. If the row has no external document file, the value is NULL but is considered to be zero for search purposes. Neither the application nor the text reader can assign a value in this reserved column.

FT_DFLAG

If there is an external document associated with this row (that is, the values for FT_SFNAME and FT_FLIST are not NULL), the value in the FT_DFLAG column is 1. Otherwise, the value is 0.

Neither the application nor the text reader can assign a value in this reserved column.

FT_DNAME, FT_OWNER

SearchServer supplies values for these columns for each row that is automatically generated during indexing as a result of directory expansion. However, the application can insert values for these columns in rows without an external text column.

In addition, FT_OWNER can be set by a text reader based on the information in the document. For a complete description about assigning values, see the section, "Assigning Values to the Reserved Columns," in Chapter 5, "Maintaining the Data."

These columns have the VARCHAR data type with a maximum length of 260.

FT_FLIST, FT_SFNAME


The FT_FLIST reserved column specifies the list of text readers and options used to generate the external text stream. The FT_SFNAME reserved column identifies the associated operating system file (if any). Together they uniquely identify the external text for the row.

If the value in FT_FLIST is null, the standard (s) text reader is assumed. FT_SFNAME can be null. However, if both reserved columns are null, then this row has no external text, and the FT_TEXT reserved column must also be null.

If FT_SFNAME is not a fully qualified pathname, it is assumed to be relative to the BASEPATH table parameter in the CREATE TABLE clause. The equivalent full pathname is reported by the FULLNAME() function. This value is meaningful only on the system where the external document file or directory named by the FT_SFNAME reserved column resides. In a client/server configuration, this is typically not the same system where the application process is executing.

These columns are set automatically by SearchServer during container expansion. For a complete description of container expansion, see the section called "Container Rows" in Chapter 5, "Maintaining the Data."

These columns have the VARCHAR data type and are not indexed or searchable. The maximum length for FT_FLIST is 10243. The maximum length for FT_SFNAME is 260.
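The NULL-handling rules for FT_FLIST and FT_SFNAME can be expressed as a short decision function. The Python below is a sketch only: `describe_external_text` is a hypothetical helper, Python's `None` stands in for NULL, and the text reader list is treated as an opaque string because its syntax is not covered in this section.

```python
def describe_external_text(ft_flist, ft_sfname):
    """Return (text_reader_list, filename) for a row, or None if it has no external text."""
    # Both columns NULL: the row has no external text, so FT_TEXT must be NULL too.
    if ft_flist is None and ft_sfname is None:
        return None
    # A NULL FT_FLIST means the standard (s) text reader is assumed.
    readers = ft_flist if ft_flist is not None else "s"
    # FT_SFNAME may legitimately be NULL when a text reader list is supplied.
    return readers, ft_sfname


print(describe_external_text(None, "external.doc"))  # prints ('s', 'external.doc')
print(describe_external_text(None, None))            # prints None
```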

FT_FORMAT

This column identifies the document format information provided by the text reader when the text reader is used to retrieve the external document. The value returned in this column is a number that represents the word processor to be used when viewing the external document. The possible values are documented in the FTFI.H and SCCFI.H files.

If the value supplied by the text reader is NULL, the text reader has not supplied any document format information and the document format is unknown.

FT_KEYWORDS

The values in this column indicate the keywords of the document in the row. This column can be set by the text reader based on information in the document or by the application.

The maximum length of this column is 255.

FT_MTIME

This value represents the last-modified date of the external document and is updated, if necessary, each time the table is indexed for each row that contains a value in the external text column. Neither the text reader nor the application can assign a value for this reserved column.

FT_ORIGINAL_SIZE

The value in this column indicates the original size of the external document. The file size is provided by format translation text readers (for example, ftmf and nti) for documents that can be uniquely identified. Neither SearchServer nor the application can set this column. The value is set when the text reader is used to retrieve the external document for indexing. This applies only to DATA rows and is not used for DIRECTORY rows.

This reserved column is useful when downloading a file from a server to a client. You can check the value to determine the size of the external document, which gives you an indication of how long a file transfer would take to complete.

If the value supplied by the text reader is NULL, the text reader has not supplied any document size information and the document size is unknown.

FT_ROW_STATE

The value in this column indicates the indexing state of a row. The possible values are:

• NOT_YET_INDEXED

• INDEXED

• UPDATED_SINCE_LAST_INDEXING

• CANNOT_BE_INDEXED

• NULL

SearchServer sets these values automatically when executing an INSERT, UPDATE, or DELETE statement.

For a complete description of the FT_ROW_STATE values, see the section, "Assigning Values to the Reserved Columns," in Chapter 5, "Maintaining the Data."

FT_ROW_TYPE

The value in this column indicates the type of row depending on the type of document referenced in FT_SFNAME (DATA or DIRECTORY). SearchServer automatically sets these values when executing an INSERT, UPDATE, or DELETE statement.

For a complete description of the FT_ROW_TYPE values, see the section, "Assigning Values to the Reserved Columns," in Chapter 5, "Maintaining the Data."

FT_SUBJECT


The values in this column indicate the subject of the document in the row. This column can be set by the text reader based on information in the document or by the application.

The maximum length of this column is 260.

FT_TEXT

This column references the data comprising the external document. Typically, the FT_SFNAME column identifies the name of the file that contains this data, and the FT_FLIST column identifies the text reader list required to access the external document. The FT_TEXT column can't be specified in an INSERT or UPDATE statement. If both FT_SFNAME and FT_FLIST are NULL, then there is no external document text, and FT_TEXT is NULL.

You can retrieve data from the entire external document, but SearchServer only searches for words and supplies context information within the first 16 million non-blank characters. The maximum length of FT_TEXT is 2 gigabytes.

FT_TEXT_STATUS

The value in this column indicates the status of the document in the row. The possible values are:

• OK

• MISSING

• UNREADABLE

• UPDATED

• NULL

SearchServer automatically sets these values when executing an INSERT, UPDATE, or DELETE statement.

For a complete description of the FT_TEXT_STATUS values, see the section, "Assigning Values to the Reserved Columns," in Chapter 5, "Maintaining the Data."

FT_TIMESTAMP

The values in this column indicate the timestamp of the data in the row. This reserved column is used for optimistic concurrency in the SELECT statement and searched UPDATE and DELETE statements to ensure that the row to be updated or deleted has not been changed since the last SELECT statement was performed. Optimistic concurrency provides an alternative data integrity approach to row locking. On platforms that don't support read-only locks (such as Windows), optimistic concurrency allows a user to edit data for an extended period, while other users can concurrently view information contained in the row.
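The general idea of optimistic concurrency can be shown with a small in-memory model. This Python sketch does not use any SearchServer interface: `InMemoryTable`, `StaleRowError`, and the integer counter that stands in for a real timestamp are all inventions for the example. It only illustrates the pattern of remembering the timestamp seen at SELECT time and refusing an update if the row has changed since.

```python
class StaleRowError(Exception):
    """Raised when a row was changed after the caller last selected it."""


class InMemoryTable:
    def __init__(self):
        self._rows = {}   # row id -> (timestamp, data)
        self._clock = 0   # simple counter standing in for a real update time

    def _tick(self):
        self._clock += 1
        return self._clock

    def insert(self, row_id, data):
        self._rows[row_id] = (self._tick(), data)

    def select(self, row_id):
        """Return (timestamp, data); the caller keeps the timestamp for a later update."""
        return self._rows[row_id]

    def update(self, row_id, data, seen_timestamp):
        """Apply the update only if the row is unchanged since it was selected."""
        current_timestamp, _ = self._rows[row_id]
        if current_timestamp != seen_timestamp:
            raise StaleRowError(row_id)
        self._rows[row_id] = (self._tick(), data)


table = InMemoryTable()
table.insert(1, "draft")
stamp_a, _ = table.select(1)   # user A reads the row
stamp_b, _ = table.select(1)   # user B reads the same row; no lock is held
table.update(1, "edited by A", stamp_a)
try:
    table.update(1, "edited by B", stamp_b)
except StaleRowError:
    print("row changed since the last SELECT; re-read and retry")
```

No lock is held while either user edits, which is why the approach suits long editing sessions; the cost is that the losing writer must re-read the row and try again.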


Assigning New Names to Reserved Columns

When using the asterisk option (*) to select all columns in a SELECT statement, reserved columns that were not explicitly named in the CREATE SCHEMA statement are not automatically selected. If you want to include such a column in your result list, you must explicitly include it in your SELECT list.

Note: The following reserved columns can't be renamed: FT_MTIME, FT_DFLAG, FT_CID, FT_ROW_TYPE, FT_ROW_STATE, and FT_TEXT_STATUS.

To assign a new name to a reserved column:

1. Determine the field number and data type of the reserved column.

2. Specify the field number and data type of the reserved column in a column definition of a CREATE TABLE clause or statement.

3. Execute the CREATE SCHEMA statement that contains this clause or execute the CREATE TABLE statement.

Once you've changed the name of a reserved column, the original name can no longer be used in a SearchSQL statement. Only the new name can be used.

Specifying the field number is optional when SearchServer can determine the field number from the column's data type. This is the case for the data types APVARCHAR (field number 32, column FT_TEXT) and DATE (field numbers 27 through 31). For example, if you specify the APVARCHAR data type for a column, SearchServer automatically assumes you are renaming the FT_TEXT reserved column and assigns field number 32. The same applies to the DATE data type.
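The first step of the procedure, determining the field number and data type, amounts to a lookup in Table 3-5. The Python sketch below encodes that lookup for the columns the table marks as renamable. The helper name `rename_requirements` and the dictionary layout are assumptions for illustration; the field numbers and data types are copied from Table 3-5 as printed.

```python
# (field number, data type) for reserved columns that Table 3-5 marks as renamable.
RENAMABLE = {
    "FT_DATE": (31, "DATE"),
    "FT_DNAME": (110, "VARCHAR (260)"),
    "FT_FLIST": (6, "VARCHAR (10243)"),
    "FT_FORMAT": (102, "INTEGER"),
    "FT_KEYWORDS": (104, "VARCHAR (255)"),
    "FT_ORIGINAL_SIZE": (103, "INTEGER"),
    "FT_OWNER": (97, "VARCHAR (260)"),
    "FT_SFNAME": (3, "VARCHAR (260)"),
    "FT_SUBJECT": (107, "VARCHAR (255)"),
    "FT_TEXT": (32, "APVARCHAR"),
    "FT_TIMESTAMP": (26, "INTEGER"),
}

# Reserved columns that the guide says can't be renamed.
NOT_RENAMABLE = {
    "FT_MTIME", "FT_DFLAG", "FT_CID",
    "FT_ROW_TYPE", "FT_ROW_STATE", "FT_TEXT_STATUS",
}


def rename_requirements(column):
    """Return what a column definition needs in order to rename `column`."""
    name = column.upper()
    if name in NOT_RENAMABLE:
        raise ValueError(f"{name} can't be renamed")
    field_number, data_type = RENAMABLE[name]
    return {
        "field_number": field_number,
        "data_type": data_type,
        # Per the text, the field number can be left out for APVARCHAR and DATE.
        "field_number_optional": data_type in ("APVARCHAR", "DATE"),
    }


print(rename_requirements("FT_TEXT"))
```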

Defining a Table with the CREATE TABLE Statement

If you don't need all the functionality of the CREATE SCHEMA statement, you can use the CREATE TABLE statement as a shortcut to building a table. However, you should be aware that this statement uses mainly the default values, and doesn't contain all of the same functionality as the CREATE SCHEMA statement.

The CREATE TABLE statement syntax looks like this:

CREATE TABLE <table name> (<column definition>[{, <column definition>}...])

Because the CREATE TABLE statement can't use domains, the syntax for a simple column definition looks like this:

<column name> <data type> [NOT NULL <field number>]

The CREATE TABLE statement has the following limitations:

• A table created with this statement can't contain domains or zones.

• Because the statement can't contain domains, you can't use it to define a new data type, to change the index mode, or to rename the reserved columns.

• You can't specify a field number for the column; SearchServer automatically assigns the field numbers.

• You can't rename reserved columns.

• You can't specify table parameters when using the CREATE TABLE statement.

Defining a View to Multiple Tables

Multiple tables can be grouped into a logical view that can be searched as one entity. You should group tables with similar schemas and data. In particular, the field numbers associated with each column should map to similar logical data in each table included in the view.

For example, the data representing the company name in each table should be stored in a column with the same field number for all tables in the view. If this isn't done, your SearchSQL query results will be difficult to understand.

You can create the view file any time, but the view can't be searched until after the component tables have been created, populated with at least one row, and indexed. Each component table must have its set of table management files available.

Note: The view file and all component tables must reside on the same node, and the view file must be in one of the directories SearchServer searches for table management files. Views are read-only. They can only be searched (using the SELECT statement).

Creating a View File

You can create a view at any time. To create a view to multiple tables, use a standard text editor to create a file containing one entry for each table you want to include in the view. This file is known as a view file.

The name of the view file should be viewname.CFG, where viewname is the name of the view. The view file contains a list of the tables that are to be part of the view. SearchServer treats the tables as if they are one large table for search purposes. The total number of tables in a view must not exceed an environment-dependent limit. This limit can be found in Fulcrum SearchServer Getting Started.

To search a view, specify the name of the view in the table name portion (FROM clause) of the SELECT statement.

A view file can have COLn:table_name entries and COLn=path entries. Both are used to specify the names of the tables to be included in the view, and their sequence numbers (n).

A COLn:table_name entry specifies that SearchServer must determine the location of a component table. The colon (:) separator character means that SearchServer should search for the associated table in the ordered list of directories SearchServer usually searches to find table management files.

If the component tables are to be visible in the user's environment, both individually and as part of the view, the COL parameter should be specified with a colon.

A COLn=path entry explicitly specifies where a component table can be found. The equals symbol (=) separator character indicates that SearchServer must search for the component table by following the specified path. In this case, you can specify the full or a partial path leading to the table. An unqualified or partly qualified table name is interpreted as being relative to the directory that holds the view file, while a fully qualified name is interpreted exactly as entered.

If the component tables are not to be visible individually in the user's environment, they should be created in (or moved to) directories that are not searched when SearchServer attempts to locate a table. Fulcrum SearchServer Getting Started explains how SearchServer searches for tables. In this case, the path to a component table must be fully or partly qualified and the separator character must be an equals symbol.
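The difference between the two separator characters can be modeled as follows. This Python sketch is illustrative only: `locate_component` is a hypothetical function, POSIX-style pathnames are assumed, and the `existing` set stands in for checking the file system, so nothing here reflects how SearchServer itself is implemented.

```python
import posixpath


def locate_component(separator, target, view_dir, search_dirs, existing):
    """Return the pathname of a component table, or None if a ':' entry isn't found.

    separator   ':' or '=' as written in the view file entry
    target      the table name or path that follows the separator
    view_dir    directory holding the view file
    search_dirs ordered list of directories searched for table management files
    existing    set of table pathnames that exist (stand-in for the file system)
    """
    if separator == "=":
        # Fully qualified names are used exactly as entered; anything else is
        # taken relative to the directory that holds the view file.
        if posixpath.isabs(target):
            return posixpath.normpath(target)
        return posixpath.normpath(posixpath.join(view_dir, target))
    # ':' entries are looked up in the ordered search list; the first hit wins.
    for directory in search_dirs:
        candidate = posixpath.join(directory, target)
        if candidate in existing:
            return candidate
    return None


print(locate_component("=", "../dir1/table_3", "/views/main", [], set()))
# prints /views/dir1/table_3
```

Keeping component tables outside `search_dirs` and referring to them with `=` entries is what hides them from individual use while still making them reachable through the view.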

The general format of COLn:table_name entries is:

COL1:<table_1>
COL2:<table_2>
COL3:<table_3>
.
.
.

where COL1:, COL2:, and COL3: are COL parameters with their associated sequence number (n), and <table_1>, <table_2>, and <table_3> refer to the names of three tables.

COLn=path entries can exist along with COLn:table_name entries in the same view file. The general format of COLn=path entries is:

COL1=<path1>
COL2=<path2>
COL3=<path3>
.
.
.

The digits that are part of the COL parameter in both cases refer to the sequence of component tables. The component specifications must be numbered sequentially, starting with the number 1. A number can't begin with zero (such as 01), nor can any numbers be skipped (the sequence 1, 2, 4, 5, 6 isn't permitted).

For example, your view file could have the following entries:

COL1:table_x
COL2:table_y
COL3=../dir1/table_3
COL4=dir2/table_4
COL5:table_z
COL6=../dir3/table_6
.
.
.
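Because a view file is plain text with strict numbering rules, it can be useful to check one before SearchServer reads it. The following Python sketch validates the entry format and the numbering rules stated above. It rests on several assumptions that the guide does not confirm: that each entry sits on its own line, that entries appear in ascending order in the file, and that table names and paths contain no spaces. The function name `parse_view_file` is hypothetical.

```python
import re

# COL, a sequence number with no leading zero, ':' or '=', then the table name or path.
ENTRY = re.compile(r"^COL([1-9][0-9]*)([:=])(\S+)$")


def parse_view_file(text):
    """Return a list of (number, separator, target) tuples, or raise ValueError."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"not a valid COL entry: {line!r}")
        number = int(match.group(1))
        # Numbers must start at 1 and increase without gaps.
        if number != len(entries) + 1:
            raise ValueError(f"expected COL{len(entries) + 1}, found COL{number}")
        entries.append((number, match.group(2), match.group(3)))
    return entries


sample = """COL1:table_x
COL2:table_y
COL3=../dir1/table_3
"""
for entry in parse_view_file(sample):
    print(entry)
```

An entry such as `COL01:table_x` is rejected by the pattern, and a file that jumps from COL2 to COL4 is rejected by the sequence check, matching the two numbering rules in the text.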

CAUTION: The FT_CID values of a view aren't persistent across changes to the view file. However, as long as there are no additions, deletions or changes to the order of component tables included in the view file, you can depend on FT_CID values not to change.

View Restrictions

• One view can't be specified as a component of another view.

• A view provides read-only access to the underlying component (base) tables. SELECT operations are supported, but schema definitions, indexing, data manipulation, and removal are not. If required, these operations can be performed on the individual component tables.

• The schemas of component tables must be compatible. The schema for the view is the schema of the first component table.

• All component tables of a view must reside on the same node.

You must ensure that the component tables and their combined sizes can be accommodated by the view. The maximum CID value allowed in a component table varies according to the number of view components.

Number of Component Tables    Maximum CID Value
1 to 126                      16,777,215
127 to 198                    8,388,607

Table 3-6 Maximum CID Values

For a complete description of how to verify the maximum CID value in use for each view, see the Fulcrum SearchServer or Fulcrum SearchBuilder Developer's Guide for your environment.
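The limits in Table 3-6 can be checked mechanically when planning a view. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the limits are copied from the table, and component counts outside the tabulated ranges are rejected because the guide lists no value for them.

```python
def max_cid_for_view(component_count):
    """Return the maximum CID value allowed per component table (from Table 3-6)."""
    if 1 <= component_count <= 126:
        return 16_777_215
    if 127 <= component_count <= 198:
        return 8_388_607
    raise ValueError("Table 3-6 lists no limit for this number of component tables")


def view_can_hold(highest_cids):
    """True if every component's highest CID fits the limit for a view of this size.

    highest_cids holds one value per component table: the largest FT_CID in use.
    """
    limit = max_cid_for_view(len(highest_cids))
    return all(cid <= limit for cid in highest_cids)


print(view_can_hold([120_000, 16_000_000]))   # two components, both under 16,777,215
print(view_can_hold([9_000_000] * 127))       # 127 components, limit drops to 8,388,607
```

The second call shows the practical consequence of the table: adding a 127th component halves the limit, so tables that fit a smaller view may no longer fit.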


Chapter 4:

Using External Text

This chapter shows you how to make external documents searchable by using text readers to include all kinds of external documents in your tables. You'll read about:

• how text readers access your external text

• the different classes of text readers

• specifying text reader lists

• document library text readers and how to create document library files

• control sequences and how to use them

• installing a text reader

What's Involved in Accessing External Documents

Many text-retrieval applications need to search for and retrieve text stored in different kinds of documents and file formats. Applications need to be able to access these documents dynamically, and without having to make alterations to the text.

To meet this need, SearchServer provides text readers that allow you to access documents stored in any format. Using a text reader, you can retrieve an external text document in its original format instead of the SearchServer translated format, called Fulcrum Technologies Document Format (FTDF), usually used for retrieving documents. This is useful for viewing the document with a non-Fulcrum product. If you create your documents in FTDF using the Fulcrum Technologies Internal Character Set (FTICS), no text reader is necessary for translation.

Text readers interpret text that is stored in formats that would otherwise be indecipherable to SearchServer. For example, an external document written in Microsoft Word can be translated on demand through a document text reader into the format SearchServer recognizes. A text reader also allows you to leave the documents in their revisable form and to access any changes made to them dynamically.

The text readers that Fulcrum supplies may be all you will require. However, if you need to write your own text reader, SearchServer provides a specification in the form of a text reader API. For more information on developing custom text readers, see the Fulcrum SearchServer Text Reader Developer's Guide.

If you're planning on making external documents part of a table, you'll need to decide if you want to keep the data in library files, whether control and/or formatting codes are required, and how to provide for the most efficient searching capability. This chapter guides you through these optional activities.

Data Model for External Documents

A table reflects the information contained in its table management files. Typically, it also reflects the data contained in one or more external documents. An external document might be a memo, a report, a letter, an electronic mail message, or almost any other self-contained text file. Any kind of external document can be associated with a table.

External documents are usually stored in operating system files, but can also be stored in some other repository, such as a database. Information about the external documents, such as their locations and modification times, is maintained in the internal columns of a table.

Figure 4-1 External Documents are Associated with a Table

External text can be searched through the external text column of the table, which is named FT_TEXT by default. The data for this column is obtained from the external text object and delivered to SearchServer by a list of text readers specified in the FT_FLIST reserved column. A reference to the name of the associated operating system file (if any) is supplied through the FT_SFNAME column.

In addition, certain reserved internal columns give the date (FT_DATE) and timestamp (FT_TIMESTAMP) corresponding to the last modification of the external text object, the type of row it is represented by (FT_ROW_TYPE), and its indexing status (FT_ROW_STATE and FT_TEXT_STATUS).

Text Reader Classes

SearchServer uses a generalized text reader architecture to access external text objects. This architecture groups text readers into three classes:

• storage access text readers

• format translation text readers

• FTDF parsing text readers

These three classes are described in the following sections. Figure 4-2 shows how these text readers are used:

Figure 4-2 Text Reader Architecture

Storage Access Text Readers

The lowest level text reader is responsible for retrieving the original form of a text object from external storage, and delivering the original as a byte stream. This is called storage access. There are three types of storage access text readers:

• direct access

• storage transformation

• expansion


Direct Access

Direct access text readers interface directly with the external storage mechanism. SearchServer provides two direct access text readers: the standard text reader (s) and the Open Database Connectivity (ODBC) text reader (ftodbc).

Storage Transformation

Document authoring applications typically save the resulting text object as a single operating system file. This file can be converted subsequently to another form for storage. For example, it can be compressed and stored in a library with other text objects. To retrieve the original form, these storage transformations must be reversed.

Storage transformation text readers transform one form of storage stream to another. An example is the Fulcrum library (l) text reader, which extracts a text object from a Fulcrum document library file.

Expansion

Most external storage mechanisms provide a form of container that holds multiple text objects: a file system directory or a document library file, for example. SearchServer provides a feature called row expansion, which automatically populates a table with individual rows for each member in a container. For a complete description of row expansion, see Chapter 5, "Maintaining the Data."

Expansion text readers emit a data stream that lists the contents of the container. The Fulcrum library text reader (l) supports this feature.

Format Translation Text Readers

Once the original form of the text object has been retrieved from external storage, it must be translated to FTDF by a format translation text reader (unless it was created in FTDF). This translation is for indexing or viewing applications that don't deal with the original form.

Examples of format translation text readers include the Fulcrum Technologies Multi-Format Text Reader (ftmf), and the native character set translation text reader (nti).

FTDF Parsing Text Readers

There are a number of advantages to creating zones in text columns. However, if documents are to be retained in their original revisable form, this zoning information can't be stored directly in the document original. In this case zoning must be derived dynamically.

This is done using the FTDF parsing text reader. Once the document has been converted to FTDF by a format translation text reader, the resulting FTDF stream is parsed by a text reader that looks for particular patterns in the document, and adds control sequences that specify the zoning. SearchServer includes a C Source text reader (r) that places strings and comments in separate zones from other code.

The FTDF parsing text reader can also:

• load data into other columns for result list display

• decide whether or not to index hidden (non-displayable) text

An example of this is where the documents are journal articles. A parsing text reader could load the document author, title, date, and journal name into separate columns, and specify a separate zone for the abstract. Hyperlink information in the document could be encoded as hidden text, the contents and locations stored in special columns for application use, and indexing of the contents suppressed.

Specifying the Text Reader List

Figure 4-2 shows the data being retrieved from external storage by a storage access text reader, optionally being transformed by one or more additional storage transform text readers, converted to FTDF by a document format translation text reader, and then optionally being processed by one or more FTDF parsing text readers.

This is accomplished by specifying a chain of text readers called a text reader list. This list is stored in the FT_FLIST reserved column for each row with external text. SearchServer reads the list from right to left, starting with the lowest text reader in the chain. Each text reader in the list is separated by a colon (:). If any FTDF parsing text readers are used, they must appear at the left end of the list.

Unless documents are stored as FTDF using one of the FTICSs, you need to specify a format translation text reader. For most document formats, you can use the Multi-Format Text Reader (ftmf). However, if your documents are stored as flat text, the native translation text reader (nti) can perform the same task more efficiently.

Note: There should never be more than one format translation text reader in a text reader list. Most format translation text readers require a storage access text reader, and SearchServer will issue an error if this requirement is not satisfied.

To the right of the format translation text reader you list the storage access text readers. If the rightmost text reader, which is the first one read, is not a direct access text reader, the s text reader is assumed.

The following simple text reader list uses no parsing, can handle most document formats, and assumes that the document is stored as a file (whose name is specified in the reserved FT_SFNAME column). The s text reader is implicit:

ftmf

The following example would be used for a C source file stored as flat text in a DOS file:

r:ntid:s

Each text reader in the list can be followed by options that control its operation. For details on the capabilities and options for individual text readers, see Appendix B, "Text Readers."

Note: Text reader lists are case sensitive. By convention, text reader identifiers and options are lowercase.
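The right-to-left reading order and the implicit s text reader can be illustrated with a short Python sketch. It is not part of SearchServer: the function names are hypothetical, the identifiers are those listed in Table 4-1, and it assumes that options are written directly after the identifier (as in ntid), which this guide implies but does not spell out.

```python
# Identifiers of the text readers listed in Table 4-1. None of them is a
# prefix of another, so the first match below is unambiguous.
KNOWN_READERS = ("dir", "ftmf", "l", "nti", "ftodbc", "r", "s", "t")

# Direct access text readers named in this chapter.
DIRECT_ACCESS = {"s", "ftodbc"}


def split_element(element):
    """Split one list element into (identifier, options).

    Assumption: options follow the identifier with no separator.
    """
    for ident in KNOWN_READERS:
        if element.startswith(ident):
            return ident, element[len(ident):]
    raise ValueError(f"unknown text reader: {element!r}")


def processing_order(flist):
    """Return (identifier, options) pairs in the order the readers run."""
    chain = [split_element(e) for e in flist.split(":")]
    chain.reverse()  # the list is read from right to left
    if chain[0][0] not in DIRECT_ACCESS:
        chain.insert(0, ("s", ""))  # the standard text reader is assumed
    return chain


if __name__ == "__main__":
    print(processing_order("r:ntid:s"))
    print(processing_order("ftmf"))
```

For the list r:ntid:s the sketch yields the s reader first, then nti with the option d, then r. For the list ftmf it places the implicit s reader ahead of ftmf.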

Expansion requires a more complex form called an expansion text reader list. This form is described in Chapter 5, "Maintaining the Data," in the section called "Row Expansion."

The text readers included with SearchServer are listed in Table 4-1. Note that some text readers cover more than one class.

| Identifier | Name | Class | What it Does |
|---|---|---|---|
| dir | Directory | Expansion | The directory text reader expands an operating system directory. Used implicitly by SearchServer when FT_SFNAME refers to a directory. |
| ftmf | Multi-format | Format Translation | The Multi-Format Text Reader converts a wide variety of document formats into FTDF. |
| l | Library | Storage Transform + Expansion | The library text reader provides access to the individual documents stored in Fulcrum document libraries. |
| nti | Translation | Format Translation | The translation text reader translates an external character set into one of the FTICS. |
| ftodbc | ODBC | Direct Access + Storage Transform + Format Translation | The ODBC text reader retrieves data from an ODBC data source, and constructs a synthetic document from the retrieved data. For details, see Fulcrum SearchServer Database Integration. |
| r | C Source Code | FTDF Parsing | The C source code text reader partitions ASCII C-source text into three zones: strings, comments, and other text. |
| s | Standard | Direct Access | The standard text reader reads an operating system file. SearchServer uses this text reader if no direct access text reader is specified in FT_FLIST. |
| t | Test | Format Translation | The test text reader processes text and control sequences expressed in the external character set and translates them into FTDF, using one of the FTICS. (Not recommended for use in tables.) |

Table 4-1 Text Readers Supplied with SearchServer

Document Library Text Readers

A document library file is a single operating system file that contains a number of logical external documents. These logical documents are called library documents.

Just as external documents are accessed through the text reader, library documents are accessed through the document library text reader. This allows the individual indexing of library documents as if they were single operating system files. A single SearchServer table can include more than one document library file as well as other types of external documents.

To take advantage of this feature, you must prepare a document library file. SearchServer is distributed with the ftlin and ftlout utility programs, which you can use to build and unload document library files, respectively. For a complete description of these utility programs, see Appendix A, "Utility Program Summary."

Creating a Document Library File

You can create a document library file using ftlin. To create a document library file for use with row expansion, issue the following command:

ftlin <library_filename> -i<filelist>

ftlin creates a document library file called <library_filename> which comprises the text files specified in the input file called <filelist>. The <filelist> argument must refer to a file that contains a list of the names of files to be loaded into the library file. Each filename must be on a separate line.

For example, if you've created an input file called LIBFILES containing a list of document filenames and you want to create a document library file called EXAMPLE.LIB, enter the command:

ftlin example.lib -ilibfiles

Document library files can be expanded and retrieved using the document library text reader. The procedure for using row expansion is described in Chapter 5, "Maintaining the Data." If your application will not use the document library text reader, you'll need to create a library data file at the same time that you create the document library file.

Creating a Static Document Library Data File

To create a static document library file for access by the standard (s) text reader, you must create a document library file and a library data file using ftlin, then load the library data file into the table using the ftcin utility program.

Note: The procedure for loading a library data file into a table is discussed in the section, "Loading a Library Data File" in Chapter 5, "Maintaining the Data."

When invoked with the -s and -o parameters, the ftlin utility program creates a library data file in addition to creating the document library file. To create a document library file and generate a library data file at the same time, issue the command:

ftlin <library_filename> -i<filelist> -s -o<libdata_filename>

This creates a library file called <library_filename> that comprises the files specified in the input file called <filelist>. The <filelist> must not refer to a directory. It must refer to an ASCII file which contains a list of filenames.

The library data file is named through the <libdata_filename> argument. This is the name of the file loaded into the table using ftcin. After the library data has been loaded into the table, the external document text from the library file is accessed through the standard text reader.

For example, if you create an input file called LIBFILES and you want to create a document library file called EXAMPLE.LIB and a library data file called CONTENTS, enter the following command:

ftlin example.lib -ilibfiles -s -ocontents

If you omit the -o parameter and include only the -s parameter, the library data output is written to the standard output stream.
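The two ftlin invocations described above can be assembled from a script. The following Python sketch writes the input file list (one filename per line) and builds the argument list. It does not run ftlin, the function names are hypothetical, and it uses only the -i, -s, and -o parameters shown in this section. Rejecting -o without -s is a choice made for the sketch, not documented ftlin behavior.

```python
from pathlib import Path


def write_filelist(listfile, filenames):
    """Write the <filelist> input file: one filename per line."""
    Path(listfile).write_text("".join(f"{name}\n" for name in filenames))


def ftlin_args(library, listfile, static=False, libdata=None):
    """Build the ftlin command line as a list of arguments."""
    if Path(listfile).is_dir():
        raise ValueError("<filelist> must refer to a file, not a directory")
    if libdata is not None and not static:
        # Sketch-level restriction: this section only shows -o used with -s.
        raise ValueError("-o is only used together with -s in this sketch")
    args = ["ftlin", library, f"-i{listfile}"]
    if static:
        args.append("-s")  # also produce library data
        if libdata is not None:
            args.append(f"-o{libdata}")  # otherwise it goes to standard output
    return args


if __name__ == "__main__":
    print(" ".join(ftlin_args("example.lib", "libfiles")))
    print(" ".join(ftlin_args("example.lib", "libfiles", static=True, libdata="contents")))
```

The resulting list can be passed to a process launcher such as subprocess.run once ftlin is available on the search path.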

Unloading a Document Library File

Use ftlout to unload document library files created by ftlin. To create a separate file for each logical member of the library file, issue the command:

ftlout <library_filename>

A separate external document is created to hold each library document, and the character string stored in the header of each document library file member is used for the filenames.

Producing Your Own Document Library File

If you prefer, you can create a custom program to produce your own document library file. That program must produce a file containing a series of library documents preceded by the header format described in Table 4-2. Document library files that have this header format can be processed with the standard text reader or with the document library text reader.

The information in Table 4-2 is relevant only if you're not using the ftlin utility program to create the header of a library file.
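As an illustration of what such a custom program has to emit, the following Python sketch writes and reads members using the header layout of Table 4-2: a two-character marker, an 11-character octal length, an optional name, and a terminating zero byte, followed by the document itself. The function names are hypothetical, and the sketch assumes that the marker "A7" and the name are plain ASCII bytes; check this against your character set before relying on it.

```python
MARKER = b"A7"  # header marker from Table 4-2 (assumed to be ASCII here)


def pack_member(name, body):
    """Return one library member: header followed by the document bytes."""
    length = format(len(body), "011o").encode("ascii")  # 11 octal digits
    return MARKER + length + name.encode("ascii") + b"\0" + body


def unpack_members(data):
    """Split library file contents into (name, body) pairs."""
    members = []
    pos = 0
    while pos < len(data):
        if data[pos:pos + 2] != MARKER:
            raise ValueError(f"no header marker at offset {pos}")
        length = int(data[pos + 2:pos + 13], 8)  # octal length of the body
        name_end = data.index(b"\0", pos + 13)   # name runs to the zero byte
        name = data[pos + 13:name_end].decode("ascii")
        start = name_end + 1
        members.append((name, data[start:start + length]))
        pos = start + length
    return members


if __name__ == "__main__":
    library = pack_member("FILE1", b"ABCDEF") + pack_member("FILE2", b"THIS IS ALL.")
    print(len(library))
    print(unpack_members(library))
```

The two members built in the usage lines correspond to the FILE1 and FILE2 documents used in the worked example for Table 4-2.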

| Header Field | What it Means |
|---|---|
| char[2] | a character string that identifies the beginning of a document header ("A7" in the FTCS94) |
| char[11] | a string of ASCII characters (0 through 7 only) that represent an octal number which is the length (in bytes) of the document that follows, not including the length of the header |
| char[] | an optional variable-length character string, usually the name of the operating system file that was used to generate the library file record |
| char[1] | a terminating zero byte (\0) |

Table 4-2 Document Library File Header Format

The following example assumes two small documents. The first document, labeled FILE1, is six characters long and contains the string 'ABCDEF'. The second document, labeled FILE2, is 12 characters long (14 in octal) and contains the string 'THIS IS ALL.'.

The table shows the contents of the document library file that would be built by ftlin from these two documents. The numbers on the left represent file offsets, and sp denotes a space character.

0:  A  7  0  0  0  0  0  0  0  0
10: 0  0  6  F  I  L  E  1  \0 A
20: B  C  D  E  F  A  7  0  0  0
30: 0  0  0  0  0  0  1  4  F  I
40: L  E  2  \0 T  H  I  S  sp I
50: S  sp A  L  L  .

Embedding Control Sequences

To influence the way text is indexed or displayed, you can embed control sequences in the text. These optional control sequences are encoded as directives to the indexing engine, and they can also be used to encode directives that your application is intended to support. Your proprietary control sequences are passed on to the application; SearchServer does not interpret them.

Control sequences let you:

• Change the indexing mode and/or zone number

• Specify display and format attributes

Typically, a text reader dynamically inserts the required control sequences into the text stream, leaving the external text in its native format. Alternatively, you can manually embed control sequences in the text of any column or external document associated with your table.

All SearchServer control sequences must precede the text to which they apply. Some examples that demonstrate this requirement are provided at the end of this chapter. Control sequences aren't returned to the application unless you execute a SET SHOW_SGR statement that specifies 'TRUE' or set the appropriate statement option.

Manual Method

To add sequences manually to the text of external documents, use your native text editor or word processor to enter the changes. The control sequence introducer, \E[, can be embedded in the external text to introduce a control sequence. When you're finished, save the file in its native format. Typically, you would use this method to generate test data for your application.

Note: (\E[) is a notational device that represents the two-byte control sequence introducer whose hexadecimal representation is 0x1B5B.

Afterward, you can use the translation text reader (nti) to read the external document, and convert it to FTDF. If your text editor can't generate the escape code, you can manually embed the \E[ notation in your file. In this case, use the ftpr utility with the test text reader that accepts the \E notation in the external character set and translates it into esc (0x1B). For more information about test text reader translations, see Appendix B, "Text Readers."

Batch Text Method

You can create a batch text processing program to embed control sequences in the text of external documents. The program scans the external text, looking for specific delimiters, headings, or other structural clues to determine where to place the appropriate control sequences. It outputs external documents in FTDF or native format. You can use the standard, test, or translation text reader to process these documents, as appropriate.

Batch text processing is suitable for a text-retrieval application for static documents, such as an electronic publishing application. In this situation, it makes sense to translate documents from their original, revisable format to FTDF as part of the publishing process. This can reduce the amount of storage required to accommodate the documents, and can simplify the packaging of your application when the source documents have a variety of formats.

Table 4-3 summarizes the SearchServer FTDF control sequences. For each sequence, this table indicates its format, and whether the sequence acts as a word separator. The syntax is case-sensitive.

| Name | Syntax | Term Separator |
|---|---|---|
| Select Zone | \E[parameter1;parameter2;...;parameterNs | Yes |
| Select Index/Display Mode | \E[parameter1;parameter2;...;parameterNv | Yes |
| Insert Character Separation | \E[parameter1;parameter2;...;parameterN+v | Yes |
| Object Tag | \E[object_ID;ref_length;label_length;{ | Yes |
| Select Graphic Rendition | \E[parameter1;parameter2;...;parameterNm | No |
| Register Font | \E id;length{font\tfacename | No |
| Associate Font with SGR | \E[SGR font number;font_id | No |
| Set Left Margin | \E[parameter$@ | Yes |
| Set Right Margin | \E[parameter$A | Yes |
| Set Page Number | \E[value*v | Yes |
| Current Line Indent | \E[parameter$F | Yes |
| Justify | \E[parameter1;parameter2;...;parameterN F | Yes |
| Set Tab | \E[tab_stop1;tab_stop2;...;tab_stopN N | Yes |
| Center | \E[selector#y | Yes |
| Hard Space | \E[$H | Yes |
| Soft Hyphen | \E[$a | No |
| Hard Hyphen | \E[$I | Yes |
| Set Positioning Unit Mode | \E[11h | No |
| Graphic Size | \E[size C | No |

Table 4-3 SearchServer Control Sequences

For a complete description of each control sequence, see Appendix D, "Control Characters and Control Sequences."

Note 1: \t denotes a tab character.

Note 2: Display control sequences are not seen by viewers that display the document original.
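A batch program of the kind described under "Batch Text Method" needs two small building blocks: turning the \E notation into the real escape byte, and assembling a control sequence from its parameters. The Python sketch below shows both under stated assumptions. The helper names are hypothetical, and the parameter values used in the usage lines are placeholders, because the meaning of each parameter is defined in Appendix D rather than here.

```python
ESC = "\x1b"  # the esc code (0x1B) that the \E notation stands for


def expand_notation(text):
    """Replace each two-character \\E notation with the esc code."""
    return text.replace("\\E", ESC)


def control_sequence(final, *params):
    """Build esc[ + parameters joined by ';' + the final characters.

    'final' is the trailing part of the syntax column in Table 4-3,
    for example "s" for Select Zone or "$H" for Hard Space.
    """
    return ESC + "[" + ";".join(str(p) for p in params) + final


def tag(sequence, text):
    """Place a control sequence before the text it applies to."""
    return sequence + text


if __name__ == "__main__":
    zoned = tag(control_sequence("s", 1), "Abstract text")  # 1 is a placeholder
    print(repr(zoned))
    print(repr(expand_notation(r"\E[1sAbstract text")))
```

The tag helper reflects the rule that a control sequence must precede the text to which it applies.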

Installing Text Readers

A text reader can be integrated with SearchServer only after it has been compiled and linked into a shared object library called the dynamic-link library (DLL).

Before doing the installation, you'll need to know the text reader identifier and the names of the text reader's entry points. You'll also need to know the name and location of the text reader shared library file. If you're installing a custom text reader, this information is supplied by the text reader developer.

Updating the Dynamic Text Reader Table

The first step in integrating a text reader with SearchServer involves creating a text reader record. This record has six fields of information similar to the following example:

FIDF ftmf ftdmf11w.dll ftfmfOpen ftfmfinfo ftfmfidentify

This example is a portion of FULTEXT.EFT. It shows the entry for the Multi-Format Text Reader. The fields are separated by space characters, but horizontal tab characters could also be used.

Information in FULTEXT.EFT must be represented in ASCII. Identifiers should be restricted to alphanumeric characters and the maximum length is 18. The format of a text reader record is described in the following sections.

First and Second Fields

The first field must contain the characters FIDF, which indicates that the entry refers to a dynamically-linked text reader. Enter the identifier of the text reader in the second field. The example shows the text reader identifier ftmf.

Third Field

In the third field, enter the unqualified or fully qualified filename of your text reader shared library.

A fully qualified value might be C:\FULCRUM\LIB\MYLIB.DLL. When you specify an unqualified filename, such as MYLIB.DLL, SearchServer searches for the library in a platform-dependent order (see Fulcrum SearchServer Getting Started for details). For example, in a Windows environment, SearchServer searches for an unqualified filename in your SearchServer-installed BIN directory.

Fourth, Fifth, and Sixth Fields

The remaining fields are for the names of the text reader's Open, Info, and Identify functions. In the example, the Open function of the Multi-Format Text Reader (specified by ftmf in the second field) is ftfmfOpen, its Info function is ftfmfinfo, and its Identify function is ftfmfidentify.

You can add a comment anywhere in the file as long as it is preceded by a number sign (#). A comment is terminated by the end of a line.
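Before loading a record, it can help to check that it follows the six-field layout described above. The following Python sketch is one way to do that and is not a Fulcrum tool. It assumes that filenames contain no spaces, applies the alphanumeric and 18-character limits to the text reader identifier only, and treats a lone hyphen in the sixth field as "no Identify function", an interpretation based on the ftmload example in this chapter.

```python
def parse_reader_records(text):
    """Parse text reader records in the FULTEXT.EFT source format."""
    records = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # a comment runs to end of line
        if not line:
            continue
        fields = line.split()  # fields are separated by spaces or tabs
        if len(fields) != 6:
            raise ValueError(f"expected six fields: {raw!r}")
        kind, ident, library, open_fn, info_fn, identify_fn = fields
        if kind != "FIDF":
            raise ValueError(f"first field must be FIDF: {raw!r}")
        if not (ident.isascii() and ident.isalnum()) or len(ident) > 18:
            raise ValueError(f"bad text reader identifier: {ident!r}")
        records.append({
            "identifier": ident,
            "library": library,
            "open": open_fn,
            "info": info_fn,
            # Assumption: "-" marks an absent Identify function.
            "identify": None if identify_fn == "-" else identify_fn,
        })
    return records


if __name__ == "__main__":
    sample = "FIDF cx mylib mytropen mytrinfo -  # custom text reader\n"
    print(parse_reader_records(sample))
```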

Loading the Dynamic Text Reader Table

The text reader you define can't be used until the text reader record has been loaded into the SearchServer dynamic library table (FULTEXT.EFT). To do this, invoke the ftmload utility program as follows:

echo FIDF cx mylib mytropen mytrinfo - > tmpfile
ftmload tmpfile -e -o fultext.eft

This loads the new entry for the text reader into the system configuration file. SearchServer recognizes the newly-installed text reader after you restart your SearchServer application.

You should keep a record of modifications to the dynamic library table by running ftmunld. For example:

ftmunld myfilters.eft -o fultext.eft

Chapter 5: Maintaining the Data

This chapter explains how to perform data management in SearchServer tables. You'll learn how to:

• insert new data into your table

• assign values to reserved columns

• prepare for indexing

• index data

• modify the data in your table

• back up and restore table data

Introduction

In the two previous chapters, you've learned how to create a table and prepare external data. This chapter discusses four operations you can perform on a table that will change the data it contains. It also provides numerous examples of how to perform these operations, which you can follow using the SUPPORT table.

Note: If you execute the examples in this chapter on the SUPPORT table, you will change the results for the examples in the subsequent chapters. If you want to execute these examples (and still receive the same results as those given in subsequent chapters) you should re-create the SUPPORT table after completing this chapter.

Inserting Data into the Table

The INSERT statement allows you to enter data into the table one row at a time. The maximum number of rows in a table (or view) is 16,777,215.

When you use the INSERT statement, you're creating a new row and giving values to each of the columns that you name in the INSERT statement. To insert data into a new row of a table, you'll need to:

• enter the name of the table

• identify the column(s)

• identify the values for each column in the list

The syntax of the INSERT statement looks like this:

INSERT INTO <table name> [(<column name>[{, <column name>}...])] VALUES (<insert value>[{, <insert value>}...])

The table name identifies the table into which you want to insert data. This name must be the valid name of an existing table.

The column name list specifies the columns of the table to which you want to add the data. You then list the corresponding values for each of the columns in the list.

If a column list isn't specified, the value list must contain a value for each application column in the same order as in the schema. You can assign values to some reserved columns when inserting a row. For more information, see the section, "Assigning Values to the Reserved Columns" later in this chapter.

What Happens During an Insert?

The table is empty when it is first created. When you execute an INSERT statement for that table, the data is inserted into a new row in the specified columns. Columns for which no value has been assigned are initialized to null values.

SearchServer assigns the first value to the first column named and the second value to the second column named, and continues through the value list matching each value to its corresponding column name. You must be sure that the number, order, and data types of separate values match the number, order, and data types of columns named in the column list.

For example, if your INSERT statement looks like this:

INSERT INTO SUPPORT (COMPANY, PRIME_CONTACT) VALUES ('DIXIE CORP.', 'Dave Chisholm')

SearchServer would insert a new row containing the character strings DIXIE CORP. and Dave Chisholm in the SUPPORT table as values for the columns COMPANY and PRIME_CONTACT respectively. The remaining columns in that row have no values assigned and would therefore contain null values.
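The positional matching of values to columns can be modeled in a few lines of Python. This is only a conceptual sketch of the behavior described above, not SearchServer code. The schema shown is a hypothetical subset of SUPPORT made up of the four columns named in this chapter, and None stands in for a null value.

```python
# Hypothetical subset of the SUPPORT schema, for illustration only.
SCHEMA = ["COMPANY", "PRIME_CONTACT", "CREATOR", "STATUS"]


def build_row(schema, columns, values):
    """Pair each value with its column; unnamed columns stay null (None)."""
    if len(columns) != len(values):
        raise ValueError("number of values must match number of columns")
    unknown = [c for c in columns if c not in schema]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    row = dict.fromkeys(schema)  # every column starts out as null
    row.update(zip(columns, values))  # first value to first column, and so on
    return row


if __name__ == "__main__":
    print(build_row(SCHEMA, ["COMPANY", "PRIME_CONTACT"],
                    ["DIXIE CORP.", "Dave Chisholm"]))
```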

What Information is Defined?

An insert value can be either a literal or a null value. Literals are of three different types:

• character

• numeric

• date

You can specify a null value with the NULL keyword, which indicates the absence of a value. You can't search for a null value with a SELECT statement, although you can retrieve a null value from a working table through the SearchServer API.

When inserting rows into the table, you must also keep in mind the data types that were assigned to the columns that are named. If the data type of a column is numeric and you attempt to insert a character value, SearchServer returns an error. Also, the data length characteristic of a character column prevents the insertion of data whose length exceeds the maximum size allowed.

If the data length exceeds the column's length, an error is returned. If the data length is less than the size defined for a fixed-length character column (for example, CHAR(10)), the value is padded on the right with spaces.

In the SUPPORT table, for example, the data type of the CREATOR and STATUS columns is CHAR and the size is 8. Therefore, the following INSERT statement causes an error:

INSERT INTO SUPPORT (CREATOR, STATUS) VALUES ('ELIZABETH', 'KEEP OPEN UNTIL 920214')

Both of the values exceed the maximum length specified with the data type of the column.
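A loader program can apply the same length rule before it issues an INSERT statement. The Python sketch below mirrors the rule for a fixed-length character column: reject a value that is too long, otherwise pad it on the right with spaces. The function name is hypothetical, and the sketch counts characters with len, which matches the column size only when each character occupies one position.

```python
def fit_fixed_char(value, size):
    """Return value as it would be stored in a CHAR(size) column."""
    if len(value) > size:
        raise ValueError(
            f"{value!r} has {len(value)} characters; the column allows {size}")
    return value.ljust(size)  # pad on the right with spaces


if __name__ == "__main__":
    print(repr(fit_fixed_char("OPEN", 8)))
    for candidate in ("ELIZABETH", "KEEP OPEN UNTIL 920214"):
        try:
            fit_fixed_char(candidate, 8)
        except ValueError as err:
            print("rejected:", err)
```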

All character data in the INSERT statement must be in the character set indicated by the current setting of the CHARACTER_SET attribute of the SERVER_INFO system table.

The length of any one column (except the external text reserved column) can't exceed 32,767 characters. The length of any SearchSQL statement is limited only by memory availability in all platforms except 16-bit Windows where the limit is 32K.

A Word About Scripts and Control Sequences

You can use a script to insert the data into the table. It can contain any number of SearchSQL statements.

Control sequences can be embedded in the data that is specified for character columns in an INSERT statement. However, you must not embed control sequences in the data values for the following reserved columns:

• FT_SFNAME

• FT_FLIST

Use the control sequence introducer (esc[) to embed control sequences in text to be inserted by a script. Assuming that your text editor allows an esc (0x1B) code in the output (script) file, you can create the script that way. If it doesn't, you can create the script manually through the ExecSQL administration utility, which translates \E to esc.

Alternatively, you can create the script through a custom loader program. The program scans the source text, looking for specific delimiters, headings, or other structural clues to determine where to place any required control sequences. Its output would be a number of INSERT statements represented in the external character set.

Accessing External Documents with FT_SFNAME and FT_FLIST

Most tables have an external document associated with each row. You give SearchServer access to each by setting the FT_SFNAME and FT_FLIST columns. FT_SFNAME contains the path to the associated operating system file, while FT_FLIST contains the corresponding text reader list.

The text reader list is a specification for the ordered list of text readers, along with their options and parameters, that is to be used to deliver the text data stream from external storage.

You can use the INSERT statement, as described earlier, for this purpose. In the following example, the file support.doc is to be read with the ftmf text reader, which gets its input from the s text reader:

INSERT INTO SEARCHME(FT_SFNAME, FT_FLIST) VALUES ('support.doc', 'ftmf:s')

If FT_FLIST does not end with a direct access text reader (one that knows how to access external storage), SearchServer uses the standard s text reader for this purpose. As a result, the value of FT_FLIST in the above example can be simplified to 'ftmf'.

Note: If your documents are not stored in files (for example, in a database), FT_SFNAME cannot be used. How FT_SFNAME is used is determined by the storage text reader specified in FT_FLIST.

Containers and Row Expansion Text Reader Lists

SearchServer contains a powerful document container management capability called row expansion.

If your documents reside in a container, such as a directory or document library, you can populate the table with the container contents simply by inserting a row for the container, and then executing a VALIDATE INDEX statement with the VALIDATE TABLE parameter. If the container contents later change (that is, documents are added or removed), subsequent table validations will cause the changes to be reflected in the table.

Note: Since table validation can occur only as part of periodic indexing, row expansion does not happen immediately on insertion of a container row. Row expansion is most effective for tables in which indexing is periodic, rather than immediate.

SearchServer considers a container row to be one with either of the following:

• FT_SFNAME names a directory.

• FT_FLIST contains an expansion text reader list.

SearchServer sets the FT_ROW_TYPE column for container rows to 'DIRECTORY'. Otherwise, it sets it to 'DATA'.

CAUTION: Row expansion checks for existing rows by looking at the contents of FT_SFNAME and FT_FLIST. If a match is found, row expansion will assume that it is responsible for that row, whether it was actually created by row expansion or not. The result of inserting rows that overlap container contents, or that duplicate external documents, is unspecified.

If FT_SFNAME names a directory, automatic recognition of external document formats is possible, using the reserved text reader list identify. For more information about identify, see the section called "Auto-Recognition (identify)" later in this chapter.

Expansion text reader lists can be used, for example, with Fulcrum document libraries.

How to Use Directories

If FT_SFNAME names a directory, then SearchServer expands the entire directory tree. FT_FLIST for each document (DATA) row is modeled on the FT_FLIST value for the container. For example, the following will cause SEARCHME table validation to insert and maintain a row for every file (and directory) in the searchme directory tree, using the ftmf text reader:

INSERT INTO SEARCHME(FT_SFNAME, FT_FLIST) VALUES ('searchme', 'ftmf')
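To picture the rows that such a container row leads to, the following sketch walks a directory tree and lists one row per directory and per file. It is an illustration only, not SearchServer's implementation. The "dir!ftmf" form is the one this guide gives for the searchme container row in this example; applying the same form to subdirectory rows, and marking them DIRECTORY, is an assumption.

```python
import os


def expand_directory(basepath, container, model_flist):
    """Yield (FT_SFNAME, FT_FLIST, FT_ROW_TYPE) for a container directory
    and everything beneath it, with paths relative to basepath."""
    root = os.path.join(basepath, container)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        rel_dir = os.path.relpath(dirpath, basepath)
        # Assumption: subdirectories get the same "dir!<model>" form that
        # the guide shows for the top-level container row.
        yield (rel_dir, "dir!" + model_flist, "DIRECTORY")
        for name in sorted(filenames):
            yield (os.path.join(rel_dir, name), model_flist, "DATA")
```

Calling list(expand_directory(basepath, 'searchme', 'ftmf')) returns the container row first, followed by its files and then each subdirectory in turn.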

Directory expansion fills in several reserved columns. For data rows only, directory expansion initializes additional reserved columns. The columns involved are:

FT_FLIST If the contents of FT_FLIST do not contain a "!", table validation converts FT_FLIST to a form containing a "!", suitable for directory row expansion. FT_FLIST for each data row will consist of the portion following the "!".

In the above example, FT_FLIST for the row searchme will be "dir!ftmf", while that for each resulting document row will be just "ftmf".

FT_SFNAME The path of the file or directory for this row, relative to BASEPATH for the table.

FT_MTIME The last-modified time stamp of the file or directory.

FT_DATE The date corresponding to FT_MTIME.

FT_DNAME The file (or directory) name, relative to its container (i.e., with path components stripped).

FT_OWNER The operating system’s name for the owner of the file.

Some text readers, such as ftmf, subsequently override the values of FT_DNAME and FT_OWNER.

Auto-Recognition (identify)

SearchServer supports a feature called auto-recognition. This feature is used for directory expansion only.

If you use 'identify' as the text reader list in a directory row, SearchServer looks for a text reader with an Identify function that claims to recognize the format of the document stored in the file for that row. If one is found, SearchServer puts its name in FT_FLIST for that row; otherwise, the nti text reader is used. The ftmf text reader recognizes most common desktop document formats.

For example, the following will cause SEARCHME table validation to insert and maintain a row for all files and directories in the searchme directory, using the ftmf text reader for desktop documents, and nti for flat text documents:

INSERT INTO SEARCHME(FT_SFNAME, FT_FLIST) VALUES ('searchme', 'identify')

Note: The text reader list must consist of 'identify' by itself, with no other text readers or options. As a result, only default character translation will be in effect.

How to Use Expansion Text Reader Lists and Libraries

An expansion text reader list is a special form of text reader list that SearchServer uses to drive row expansion. It consists of two parts, separated by a "!" character. The first part specifies a text reader list that begins with an expansion text reader (i.e., one that knows how to enumerate container contents). The second part, called a model text reader list, is the text reader list that is to be used for the rows that result from the expansion. A model text reader list typically also contains a "@" character as a placeholder for a key that is used to locate the container member.

For example, the following will cause SEARCHME table validation to insert and maintain a row for each member of the Fulcrum document library file library.dlf, using the ftmf text reader:

INSERT INTO SEARCHME(FT_SFNAME, FT_FLIST) VALUES ('library.dlf', 'l/x:s!ftmf:l/r/@:s')

For a complete description of the l text reader, see Appendix B, "Text Readers."
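The two-part structure of an expansion text reader list can be illustrated with a short sketch. It only manipulates the strings as described in this section. The function names and the key value used below are hypothetical, and substituting the key for "@" is shown as an illustration of the placeholder, not as SearchServer's actual code.

```python
def split_expansion_list(flist: str) -> tuple[str, str]:
    """Split 'expansion!model' into its two parts at the first '!'."""
    expansion, sep, model = flist.partition("!")
    if not sep:
        raise ValueError("not an expansion text reader list: no '!' found")
    return expansion, model


def member_flist(model: str, key: str) -> str:
    """Fill the '@' placeholder in a model text reader list with the key
    that locates one container member."""
    return model.replace("@", key)
```

With the list from the example above, split_expansion_list('l/x:s!ftmf:l/r/@:s') separates the expansion part 'l/x:s' from the model 'ftmf:l/r/@:s', and member_flist fills in the key for one library member.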

How to Use Library Data Files to Populate a Table

An alternative to using row expansion via table validation to populate a table is to use a library data file. When a Fulcrum document library is created with the ftlin utility, a corresponding library data file is created. This file can be used with the ftcin utility to insert corresponding rows into the table.

For example, the following will cause a row to be inserted in the table SEARCHME for each member of the Fulcrum document library file described by the library data file contents:

ftcin searchme -q -x -t contents

This method has some advantages: it does not require a container row, and retrieval is more efficient. However, the ftlin utility assumes the use of the standard s text reader. As a result, unless the library data file is manually edited, other text readers cannot be used. This means that the document library members must already have been translated to FTDF (possibly by a utility such as ftpr). This scenario avoids document translation at run-time, and does not use table validation (which maintains currency). It is therefore most useful in static (usually read-only) applications.

Note: The table data is generated by a separate utility and then loaded once with another utility. Therefore, it is not useful for libraries that change. If a library changes, the table must be rebuilt.

For a complete description of ftcin, see Appendix A, "Utility Program Summary."

Assigning Values to the Reserved Columns

Reserved columns are created automatically when you create a table. They are accessed internally by SearchServer and are used to provide critical information to your application. For a complete list of the reserved columns and their definitions, see the section, "Using the Reserved Columns," in Chapter 3, "Structuring the Data."

When you assign new names to the reserved columns, they are then referenced by those new names when used in a SearchSQL statement. Values are assigned in the same way as for application-defined columns.

You can assign values using the INSERT and UPDATE statements to the following reserved columns:

FT_DNAME       FT_OWNER
FT_DATE        FT_SFNAME
FT_FLIST       FT_SUBJECT
FT_KEYWORDS

The following reserved columns can't have values assigned by the INSERT and UPDATE statements but provide important information to your application. Some of them, such as FT_DFLAG and FT_MTIME, provide document status information and are assigned values automatically when an INSERT or UPDATE statement is executed, or in the case of FT_TEXT_STATUS, when a retrieval is performed.

FT_CID             FT_ROW_STATE
FT_DATE            FT_ROW_TYPE
FT_DFLAG           FT_TEXT
FT_FORMAT          FT_TEXT_STATUS
FT_MTIME           FT_TIMESTAMP
FT_ORIGINAL_SIZE

The FT_TEXT reserved column has the external document text as its value. This value is derived from the file specified by the FT_SFNAME reserved column.

The following section describes how the INSERT, DELETE, and UPDATE statements interact with the following reserved columns:

FT_DATE        FT_ROW_STATE
FT_DFLAG       FT_ROW_TYPE
FT_DNAME       FT_SFNAME
FT_FLIST       FT_TEXT_STATUS
FT_OWNER       FT_TIMESTAMP

FT_DATE

SearchServer automatically assigns the date value in the following manner:

• If the table doesn't have an external document, the date value assigned is NULL (but considered zero for search purposes).

• If there is an external document, then SearchServer assigns the date of the most recent modification of the document file.

FT_DFLAG

This is a flag that is set for the application by SearchServer. The value of this reserved column can be either 0 or 1. SearchServer sets the value of FT_DFLAG to 1 for any row that has an external document.

When you execute an INSERT or UPDATE statement, this flag is set automatically in the following manner:

• If the data includes a value for FT_SFNAME, the flag value is set to 1.

• If the FT_SFNAME column is not referenced in an UPDATE statement, the previous setting of this flag is unchanged.

• If the FT_SFNAME column is not referenced in an INSERT statement, or if an UPDATE explicitly sets it to NULL, the flag is set to 0.
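The three rules above can be read as a small decision function. The sketch below is only a restatement of those rules in code. The function name and the NOT_REFERENCED marker are hypothetical, and treating an INSERT that explicitly supplies NULL the same way as an UPDATE to NULL is an assumption.

```python
NOT_REFERENCED = object()  # FT_SFNAME does not appear in the statement


def next_dflag(statement, sfname=NOT_REFERENCED, previous=0):
    """Return the FT_DFLAG value after an INSERT or UPDATE.

    statement: "INSERT" or "UPDATE"
    sfname:    NOT_REFERENCED, None (explicit NULL), or a file name
    previous:  the flag value before an UPDATE
    """
    if statement not in ("INSERT", "UPDATE"):
        raise ValueError("statement must be INSERT or UPDATE")
    if sfname is NOT_REFERENCED:
        # UPDATE leaves the flag alone; INSERT starts at 0.
        return previous if statement == "UPDATE" else 0
    if sfname is None:
        # Explicit NULL clears the flag (assumed for INSERT as well).
        return 0
    return 1  # a value was supplied for FT_SFNAME
```

For example, next_dflag("UPDATE", previous=1) leaves the flag at 1, while next_dflag("UPDATE", None, previous=1) clears it.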

FT_DNAME and FT_OWNER

If the row isn't a container, the application can provide the values for these two columns by executing an INSERT or UPDATE statement. If a row is inserted using directory expansion, these columns are assigned as follows during table validation:

FT_DNAME document name (the filename corresponding to FT_SFNAME minus any preceding directory path)

FT_OWNER name of the owner of the file (if it can be determined)

Text readers can also set these columns. You can also update the values for these columns in a row inserted by row expansion.

FT_FLIST

This reserved column supplies the text reader list that is used to interpret the file that is specified in the FT_SFNAME column. This text reader list names the text reader(s) required to read the document and convert it dynamically for use with SearchServer. (The document file itself remains unaltered.)

For example, assume that the file 92011101 has been written in Microsoft Word. SearchServer must convert this document internally to FTICS so that it can perform searches. You must assign a text reader list (in this case, ftmf) to the row so that the appropriate text reader is used. When the search results are displayed to the user, they are displayed in their original form.

The INSERT statement might look like this:

INSERT INTO SUPPORT (TEXT_LOG_FILE, DATE_CLOSED, PROBLEM_NUMBER, CREATOR,
PRIORITY, STATUS, COMPANY, PRIME_CONTACT, PHONE_NUMBER,
ENVIRONMENT, PRODUCT_VERSION, SUBJECT, FT_FLIST)
VALUES ('92011101', DATE'1992-01-12', '92011101', 'Peter ', 0,
'CLOSED ', 'GILFORD SYSTEMS', 'Jessica Trew', '416 728-1983',
'Windows', '1.1A', 'When can I free a statement handle',
't/i=0:ftmf:s')

The text reader list in this reserved column can be used for indexing, SearchServer retrieval, and retrieval in its original format. When retrieving in original format, SearchServer removes any text readers that do not recognize original format mode.

You can't assign a value to the FT_FLIST reserved column unless you assign a value to FT_SFNAME at the same time. However, if this column is not assigned a value and you have assigned a value to FT_SFNAME, SearchServer automatically assigns the name of its standard text reader (s).

If you use a library or directory specification as the value for the FT_SFNAME column, you can supply values only for the FT_FLIST and FT_SFNAME columns.

If you specified a library or directory name for FT_SFNAME, any text reader list value assigned to the FT_FLIST column is applied (directly or with modification) to all document files in the library or directory and subdirectories.

FT_ROW_STATE

This reserved column contains a value that indicates the condition of the index for this row. SearchServer automatically assigns a value when an INSERT, VALIDATE INDEX, or UPDATE statement is executed. There are four possible values for this column, in addition to null:

• NOT_YET_INDEXED
• INDEXED
• UPDATED_SINCE_LAST_INDEXING
• CANNOT_BE_INDEXED
• null

Note: The null value indicates a container row.

For a PERIODIC table, the value of FT_ROW_STATE is set to 'UPDATED_SINCE_LAST_INDEXING' after an UPDATE statement is executed. It is then eligible to be indexed again when the next VALIDATE INDEX statement is executed.

For an IMMEDIATE table, the UPDATE statement causes indexing to occur immediately. Therefore, the value of FT_ROW_STATE after an UPDATE statement is either:

• 'INDEXED' if indexing succeeds

• 'CANNOT_BE_INDEXED' if indexing does not complete due to a catastrophic failure or an indefinite loop

• 'NOT_YET_INDEXED' if indexing failed for some other reason
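The effect of an UPDATE on FT_ROW_STATE can be summarized as a lookup. The following sketch restates the rules above. The function name and the outcome labels ("success", "catastrophic", "other_failure") are hypothetical names chosen for this illustration.

```python
def row_state_after_update(indexing_mode: str, outcome: str = "success") -> str:
    """Return FT_ROW_STATE for a data row after an UPDATE statement.

    indexing_mode: "PERIODIC" or "IMMEDIATE"
    outcome:       result of the immediate indexing attempt; ignored
                   for PERIODIC tables, which defer indexing
    """
    if indexing_mode == "PERIODIC":
        return "UPDATED_SINCE_LAST_INDEXING"
    if indexing_mode == "IMMEDIATE":
        states = {
            "success": "INDEXED",
            "catastrophic": "CANNOT_BE_INDEXED",  # failure or indefinite loop
            "other_failure": "NOT_YET_INDEXED",
        }
        return states[outcome]
    raise ValueError("indexing_mode must be PERIODIC or IMMEDIATE")
```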

FT_ROW_TYPE

This reserved column is assigned automatically by SearchServer to have one of the following values:

• DATA for all rows except containers

• DIRECTORY for container rows

FT_SFNAME

If you're inserting or updating a row that is to have an associated external document, you must supply a value for this column. Typically, you would assign the name of the file that you want to add to the table. The data inside this file (after processing by a text reader) becomes the value for the FT_TEXT column for this row. If you don't include a full pathname, SearchServer assumes that the location of the file is relative to the one specified by the BASEPATH table parameter of the CREATE TABLE clause.
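The BASEPATH rule can be pictured with a short path calculation. This is a sketch of the rule as stated, using POSIX-style paths only. The function name and the directory names in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_sfname(basepath: str, sfname: str) -> str:
    """Return the file location implied by FT_SFNAME: a full pathname is
    used as given, anything else is taken relative to BASEPATH."""
    path = PurePosixPath(sfname)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(basepath) / path)
```

With a BASEPATH of /data/support, the value '92011301' resolves to /data/support/92011301, while a full pathname such as /archive/92011301 is left unchanged.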

You can supply the name of a file or container as the value of this column. However, if you specify a container, you're using the row expansion feature and these conditions apply:

• The data in the row naming the library or directory is not indexed and therefore can't be searched or retrieved by the SELECT statement, except by explicit searches of the FT_ROW_TYPE column. This is the only way to position a cursor on a library or directory row in order to delete it. As the row is otherwise not retrievable, you aren't allowed to insert data into any column other than FT_SFNAME and FT_FLIST when FT_SFNAME specifies a directory or library.

• The next time table validation is performed, SearchServer enters the documents that it finds in the directory or library specified as new searchable rows in the table. If the named directory contains any subdirectories, rows are also added for the files within them, and so on.

For example, in the SUPPORT table the external text column has been renamed TEXT_LOG, and FT_SFNAME has been renamed to TEXT_LOG_FILE. There is one document for each problem report. To insert this information, you must assign the name of the problem report file to the TEXT_LOG_FILE column, as in:

INSERT INTO SUPPORT (PROBLEM_NUMBER, PRIME_CONTACT, COMPANY, CREATOR, TEXT_LOG_FILE)
VALUES ('92011301', 'MONTAG HORTZ', 'OREO SOFTWARE', 'POLLY', '92011301')

In this case, the value for external document file (TEXT_LOG_FILE) has been specified as the name of the document file, 92011301. The information in the 92011301 file becomes the value for the external text column (TEXT_LOG).

If a value isn't assigned to this column, then the row has no external document file and the value of the FT_TEXT column is NULL.

SearchServer automatically associates external documents with a table, as long as the FT_SFNAME and FT_FLIST reserved columns of the table are assigned values prior to indexing.

FT_TEXT_STATUS

SearchServer automatically assigns a value to this reserved column that indicates the status of the external document (that is, the FT_TEXT column of this row). It can be used for all types of rows.

The possible values for this column are:

• OK indicates that the external document is readable.

• MISSING indicates that the external document can't be found. This can happen if the document is on a different machine and the connection between the two machines is down, or if the document has been deleted.

• UNREADABLE indicates that the user doesn't have read permission for the external document.

• UPDATED indicates that the external document has been updated since the last VALIDATE INDEX statement was executed. If the value in this column is UPDATED, the value in the FT_ROW_STATE column is UPDATED_SINCE_LAST_INDEXING.

• NULL indicates that there is no external document associated with the row.

FT_TIMESTAMP

SearchServer automatically assigns the FT_TIMESTAMP value according to when the row was last updated.

Preparing for Indexing

Before you index a table, make sure that:

• Any required operating system file permissions have been set. The user who indexes a table must have read and write access to the table management files, and read access to the stop file (if any). That user must also have write access to the directories containing index files and temporary files.

• The FULTEMP server attribute (if applicable) has been set appropriately. SearchServer must have write access to the location specified by this server attribute. Information about setting FULTEMP is provided in Fulcrum SearchServer Getting Started.

• If the WORKDIR directory was specified in the CREATE TABLE clause, it exists and has sufficient space. In addition, SearchServer must have write access to the directory.

• The correct amount of disk space has been made available for SearchServer operations.

• The parsing rules are defined. For example, European numeric format must be enabled if your documents contain European-style numbers.

• If you need to do any of the following, look at the contents of the table's configuration file and modify the entries using a standard text editor.

– change the BASEPATH
– index accented characters
– use a stop file
– name a second directory for temporary files during indexing

Depending on the operating system environment, SearchServer automatically ensures that certain files are not indexed. Under UNIX, they include any file named core, filenames ending with .o or .a, filenames beginning with a period (.), and any executable binary files. Under Microsoft Windows, filenames ending with .LIB, .OBJ, .EXE, or .COM are not indexed.
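If you want to predict which files in a directory will be skipped, the name-based rules above can be sketched as a simple test. This is an approximation for planning purposes only: the function name is hypothetical, the check for executable binary files under UNIX is not reproduced because it depends on file contents, and matching Windows extensions without regard to case is an assumption.

```python
def is_skipped(filename: str, platform: str) -> bool:
    """Return True if the file name alone matches one of the patterns
    that this guide lists as not indexed on the given platform."""
    if platform == "unix":
        return (
            filename == "core"
            or filename.startswith(".")
            or filename.endswith((".o", ".a"))
        )
    if platform == "windows":
        # Assumption: extension matching ignores case.
        return filename.upper().endswith((".LIB", ".OBJ", ".EXE", ".COM"))
    raise ValueError("platform must be 'unix' or 'windows'")
```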

For more information about the disk space required by index files at the end of the indexing process, see Appendix C, "Table Management Files."

Note: In 16-bit Windows environments, indexing uses FULSEARCH as set in the FULCRUM.INI file. You must ensure that FULSEARCH in FULCRUM.INI and in the data source parameter (in the SERVER_INFO system table) are both set appropriately. For more information, see Fulcrum SearchServer Getting Started and Fulcrum SearchServer SearchSQL Reference.

Space Requirements

The space required for the dictionary and reference file is usually less than 50 percent of the size of the text that comprises a table. An explicit indexing operation updates the index by creating a second updated copy of the index information.

Therefore, when the indexing engine is run, two sets of index files exist until the final sort file is merged with the existing dictionary and reference file. Until one updated set of index files emerges, temporary space requirements for indexing can be 2 to 4 times the ultimate size of the index files.
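As a rough planning aid, the two rules of thumb in this section (index files usually under 50 percent of the text size, temporary space of 2 to 4 times the final index size) can be turned into a quick calculation. The function below is a sketch of that arithmetic only. Its name is hypothetical, and it uses the 50 percent figure as an upper bound.

```python
def indexing_space_estimate(text_bytes: int) -> dict:
    """Estimate index and temporary space from the size of the text.

    index_max: upper bound on dictionary plus reference file size
    temp_low, temp_high: range of temporary space during indexing
    """
    index_max = text_bytes // 2  # usually less than 50 percent of the text
    return {
        "index_max": index_max,
        "temp_low": 2 * index_max,
        "temp_high": 4 * index_max,
    }
```

For a table whose text totals 100,000,000 bytes, this gives an index of at most about 50,000,000 bytes and temporary space between 100,000,000 and 200,000,000 bytes while indexing runs.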

Note: The space occupied by the differential index in an immediate table is not released by an indexing process. However, it becomes available for use for new differential index data.

Defining the Parsing Rules

The parsing of text into the terms that are indexed is done based on certain rules. Each character in a character set is associated with a character class. The character class has a parsing rule associated with it. The combination of all character classes defines the parsing rules for a character set. The default rules associated with the internal character set (FTICS) that you are using are explained in the Fulcrum SearchSQL Reference.

Enabling European Numeric Format

SearchServer can be modified to recognize the standard European format for numbers, where a period (.) is used to separate thousands, and a comma (,) represents a decimal point (for example, 1.010,03). You can enable the European numeric format by adding the following two lines to the beginning of the stop file:

DJS=",."
STOPFILE=

Remember that any change to the stop file requires that you completely re-index all associated tables.
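To make the European numeric format concrete, the following sketch converts a number written in that style to a decimal value. It illustrates the notation only and says nothing about how SearchServer parses or indexes numbers; the function name is hypothetical.

```python
from decimal import Decimal


def parse_european_number(text: str) -> Decimal:
    """Interpret '.' as the thousands separator and ',' as the decimal
    point, as in the example 1.010,03."""
    return Decimal(text.replace(".", "").replace(",", "."))
```

Applied to the example above, parse_european_number("1.010,03") yields the value one thousand ten and three hundredths.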

Enabling Accent Indexing

Accented characters (for example, à, É, or ö) are indexed without the accent (as A, E, or O) by default. If the configuration file contains an OPT:a entry, they are indexed with their accents. Even with accent indexing enabled, case normalization of alphabetic characters is still performed if it was selected when the table was created.
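The default behavior, indexing a character without its accent, can be pictured with a Unicode-based sketch. This is a stand-in for illustration: SearchServer works in its own internal character set (FTICS), so the normalization used here is an assumption and not the product's method. Case normalization is left out.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with accents removed, by decomposing each character
    and dropping the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this function, the characters à, É, and ö reduce to a, E, and o.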

The table must be completely re-indexed (execute a VALIDATE INDEX statement with the ABANDON parameter) after an OPT:a entry is added, changed, or deleted. It is recommended that you add it before the table is indexed for the first time.

Note: Inserting the OPT:a option for an immediate table is not sufficient to enable accent indexing. You should also index the immediate table using the ABANDON parameter before any other indexing is done.

Customized Parsing Rules

Customized parsing rules can be provided for your installation by redefining the set of characters assigned to each character class. For complete instructions on how to perform this task, see Fulcrum SearchServer SearchSQL Reference and the Fulcrum SearchServer Customization Guide.

Modifying the Stop Word List

The stop file identifies the search terms that are not to be indexed. A word that isn't indexed can't be searched. Stop words in a search are treated as if they match every row.

When you create the schema, you can specify the name of a stop file (by default no stop file is named) in the CREATE TABLE clause of the CREATE SCHEMA statement. If you used the CREATE TABLE statement, the stop file for your table is defined in the STOPFILE server attribute in the SERVER_INFO system table.

The FULTEXT.STP stop file supplied with SearchServer contains the words that occur so often in typical English writing that they provide no search value to the search engine or the user (words such as "and" and "the" fall into this category). Adding the words that occur most frequently can improve the performance of the indexing and search engines. The stop file can also contain a character class definition that modifies the rules SearchServer uses for recognizing numeric punctuation.

You can create a new stop file using a text editor or word processor. Preparing a custom stop file should be performed before any indexing operation. This avoids having to re-index the data later.

What's in the Stop File?

The stop file typically contains alphabetic words, but you can add other types of stop words (subject to the word definition rules described in Fulcrum SearchServer SearchSQL Reference).

It is important to note that a word should not be considered for inclusion in a stop file unless it is known to be of no search value in all contexts. For example, the word "a" is not included in the FULTEXT.STP stop file because the letter "a" could be an important designator in some contexts (such as "Appendix A").

The SearchServer index files require low storage overhead. In addition, the query refinement done by Intuitive Searching chooses the most valuable words for the search. Therefore, making common words into stop words is often unnecessary.

The stop words in the stop file FULTEXT.STP are:

after    between  into   that   upon
also     but      of     the    when
an       by       or     there  where
and      for      other  these  whether
as       from     out    this   which
at       however  since  those  with
be       if       such   to     within
because  in       than   under  without
before

Adding and Deleting Stop Words

A stop file can contain a maximum of 1024 stop words totaling not more than 10,000 characters. The stop file can only contain unique entries.

CAUTION: After modifying the stop file, you must completely re-index all of the tables associated with that particular stop file. To re-index a table, issue the VALIDATE INDEX statement with the ABANDON parameter.

The stop file syntax supports multiple stop words per line and conforms to the following syntax rules:

<stopfile> ::= <stopword-list>

<stopword-list> ::= <stopword-line> [ <newline> <stopword-line> ]...

<stopword-line> ::= [ <stopword> [ <stopword-separator> <stopword> ]...] [<comment>]

<stopword> ::= any sequence of characters, excluding the space character, the number sign, and the equality symbol

<stopword-separator> ::= space character | horizontal tab character

<comment> ::= "#" <comment-text>

<comment-text> ::= any sequence of characters, excluding <newline>

<newline> ::= (an optional carriage return character followed by) a line feed character

The case normalization of alphabetic characters is performed automatically, so it doesn't matter whether the words you add are in uppercase or lowercase letters.

Accented characters must not be used in stop words unless accent indexing has been enabled in the configuration files of all of the tables associated with the stop file. For more information about how SearchServer handles accented characters, see the section, "Enabling Accent Indexing," earlier in this chapter.

Selecting a Text Reader to Read Your Stop File

The stop file is read by SearchServer through a series of text readers just like an external document, and the resulting stream must be in FTDF. If your stop file only contains ASCII characters or is already in FTDF, the default text reader, s, is sufficient. Otherwise, you must add an STF entry to the configuration file of each table to which the stop file is applied.

The optional STF: entry in the configuration file of a table lets you specify the list of text readers needed to read the stop file. You must add an STF: entry if the stop file associated with the table isn't already represented in ASCII or the FTICS. The STF: entry must represent a sequence of text readers that translates the stop file to an FTDF text stream. For example, if your stop file is in a Word for Windows file, the entry would be:

STF:ftmf:s

If your stop file is created in the ISO Latin-2 character set, the entry would be:

STF:nti/ti=ISO_LATIN2:s

When you add, change, or delete the STF: entry after indexing, you must abandon the current index and completely re-index your table.

For more information about editing the configuration file, see Appendix C, "Table Management Files."
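Before re-indexing with an edited stop file, you may want to check it against the rules given in this section: the stop file syntax, automatic case normalization, unique entries, and the limits of 1024 words and 10,000 characters. The following checker is a minimal sketch of those rules and not a SearchServer utility. It assumes the file has already been decoded to plain text, and it treats any entry containing an equality symbol (such as the DJS and STOPFILE lines) as a setting to be skipped, which is an assumption.

```python
MAX_WORDS = 1024      # maximum number of stop words
MAX_CHARS = 10_000    # maximum total characters across all stop words


def parse_stop_file(text: str) -> list[str]:
    """Return the stop words in text, lowercased, in file order.
    Raises ValueError on duplicate entries or exceeded limits."""
    words = []
    seen = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]      # drop a trailing comment
        for token in line.split():        # separators: spaces and tabs
            if "=" in token:
                continue  # assumption: a setting, not a stop word
            word = token.lower()          # case does not matter
            if word in seen:
                raise ValueError(f"duplicate stop word: {word}")
            seen.add(word)
            words.append(word)
    if len(words) > MAX_WORDS:
        raise ValueError("more than 1024 stop words")
    if sum(len(w) for w in words) > MAX_CHARS:
        raise ValueError("stop words total more than 10,000 characters")
    return words
```

A file containing the line "and the Of # common words" yields the three words and, the, and of; a file that lists "the" twice, in any mix of case, is rejected.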

Optimizations that Affect Indexing

There are a number of methods you can choose to use to optimize performance. Use the optimization best suited for your application:

• increasing the sort buffer and temporary file size

• trading off index overhead against search time for wildcards

• naming a second directory for sorting and indexing

Optimizing During the Index Build

You can optimize indexing performance by increasing the sort buffer and temporary file size. You do this by setting the BUFFER and TEMP_FILE_SIZE parameters in the VALIDATE INDEX statement.

The indexing engine reads the column data (including the external text) one row at a time, and creates a temporary buffer containing the words that require indexing. The relative locations of these words, and the identifiers of the rows with which they are associated, are also stored in the buffer.

The BUFFER parameter to the VALIDATE INDEX statement can be used to control the size of this buffer. If memory is available, a larger buffer decreases indexing time.

By default, BUFFER is set to 206,000 bytes on most platforms, and 16,384 bytes (maximum) in 16-bit Microsoft Windows environments. Decreasing the BUFFER size reduces indexing speed and increasing it (within practical limits) can improve indexing speed.

TEMP_FILE_SIZE is 8,388,608 bytes by default in all environments, and there is no imposed maximum. TEMP_FILE_SIZE limits how much text can be indexed before the contents of the temporary sort file are merged with the table's index files. If the indexing process is not complete when a merge occurs, a new indexing cycle begins automatically. For example:

VALIDATE INDEX SUPPORT BUFFER 12000 TEMP_FILE_SIZE 4096000
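If you generate indexing statements from a program, the statement above can be assembled from its parts. The following Python sketch is illustrative only: the function name build_validate_index is hypothetical, and it covers only the BUFFER and TEMP_FILE_SIZE parameters described in this section. It produces statement text and does not execute anything against SearchServer.

```python
def build_validate_index(table, buffer=None, temp_file_size=None):
    """Return VALIDATE INDEX statement text with optional sizing parameters.

    Only BUFFER and TEMP_FILE_SIZE are handled; both are byte counts.
    """
    parts = ["VALIDATE INDEX", table]
    for name, value in (("BUFFER", buffer), ("TEMP_FILE_SIZE", temp_file_size)):
        if value is None:
            continue  # parameter omitted, so the server default applies
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
        parts.extend([name, str(value)])
    return " ".join(parts)


# Reproduces the example statement shown above.
print(build_validate_index("SUPPORT", buffer=12000, temp_file_size=4096000))
```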

If the buffer isn't large enough to contain all of the data, the contents of each filled buffer are sorted and written to a separate segment of the intermediate sort file. These segments are repeatedly merged until all of the words indexed during the current pass have been stored in a final sort file, contained in the directory named by the INDEXDIR parameter in the CREATE TABLE clause. SearchServer's disk space requirements are discussed in Appendix C, "Table Management Files."


If file space is available, a larger BUFFER parameter decreases indexing time by reducing the number of merge passes on the index data. The operations of reading text and sorting and merging word lists lead up to the creation of the final sort file.

The first intermediate sort file is created in the directory you name in the WORKDIR parameter of the CREATE TABLE clause. You can specify the maximum size of this temporary sort file through the TEMP_FILE_SIZE parameter of the VALIDATE INDEX statement. The value you specify can limit the extent of the indexing cycle; a smaller temporary sort file means that more cycles can be required to complete indexing.

The space requirements (in excess of the current index file size) you can expect are shown in Table 5-1:

Table 5-1 Space Requirements

Space    Maximum Space Requirement (bytes)
Peak     2 x (TEMP_FILE_SIZE)
End      TEMP_FILE_SIZE

The size of the final sort file (tablename.SRT) depends on the amount of column data in the table, the number of unique words in the table, and the number of words in the stop file. Some typical sizes are shown in Table 5-2.

Table 5-2 Size of Final Sort File (an example)

Table Size (bytes)    Final Sort File Size (bytes)
100 KB                50 KB
4 MB                  1 MB

The indexing engine merges the final sort file with index information from the existing index files, and creates the temporary files that contain the updated index information. The total size of the updated files will not exceed the total size of the old index files added to the size of the final sort file.

The updated index files exist concurrently along with the old ones at this point in the indexing process. The old files are removed, and the updated ones become the new dictionary and reference file.

If the indexing terminates before all the rows are indexed, because of the limit imposed by the TEMP_FILE_SIZE parameter, indexing is repeated until all of the index information is merged into the new index files. Whenever the first intermediate sort file reaches the size specified in TEMP_FILE_SIZE, the indexing engine moves on to the next phase.
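The figures in Table 5-1 can be turned into a quick pre-flight check before you choose a TEMP_FILE_SIZE value. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, and the only facts used are the peak requirement of 2 x TEMP_FILE_SIZE, the end requirement of TEMP_FILE_SIZE, and the default of 8,388,608 bytes. It does not account for the size of the existing index files or the final sort file.

```python
DEFAULT_TEMP_FILE_SIZE = 8_388_608  # default stated in this guide, in bytes


def extra_space_needed(temp_file_size=DEFAULT_TEMP_FILE_SIZE):
    """Return (peak, end) extra bytes, in excess of the current index size."""
    if temp_file_size <= 0:
        raise ValueError("temp_file_size must be positive")
    return 2 * temp_file_size, temp_file_size


def largest_temp_file_size(free_bytes):
    """Largest TEMP_FILE_SIZE whose peak requirement fits in free_bytes."""
    return free_bytes // 2


peak, end = extra_space_needed(4_096_000)
print(peak, end)
```

A smaller TEMP_FILE_SIZE lowers the peak requirement, at the cost of more indexing cycles.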


The data in the rows that have already been indexed is reflected in the new index files, and the row status is updated so that it can be searched by another application. The indexing engine then returns to index the data in the remaining rows.

If a copy operation fails during indexing due to exhausted file space or a hardware error, the temporary updated index files are removed automatically. If the copy operation fails for some other reason, the temporary files are not removed. For information about how to remove them yourself, see "Recovering from Indexing Failure," later in this chapter.

Note: While increasing TEMP_FILE_SIZE can reduce indexing time significantly on large tables, it also increases the amount of indexing work that has to be repeated if indexing fails for any reason. Keeping TEMP_FILE_SIZE small permits a failed indexing operation to resume close to where it left off.

SearchServer performs disk space checking on a per row basis. This checking attempts to predict if there is enough disk space to complete the current indexing pass. If more disk space is required, SearchServer updates the index for the rows it has processed before beginning the next row. This can increase the number of cycles required to complete indexing.

Wildcard Optimization

Wildcard optimization changes can be performed using the WILDCARD_OPT validate index parameter. However, in almost all cases you must discard and completely rebuild the index using the ABANDON validate index parameter.

You can build your index with one or more of the three wildcard optimizations:

MINIMIZE_INDEX_OVERHEAD This method minimizes indexing time and space. Performance for some prefix and infix wildcard searches is reduced as compared to the MINIMIZE_SEARCH_TIME method. This option offers search performance nearly as good as MINIMIZE_SEARCH_TIME (except for complex searches on CD-ROM) with little more indexing time and storage overhead than NONE.

MINIMIZE_SEARCH_TIME This method maximizes search performance. Indexing time is increased and the space required for the index is doubled. If space permits, this method is preferred for tables located on slower mass-storage devices, such as CD-ROMs.

NONE No wildcard optimization is enabled for the table. Performance for prefix and infix wildcard searches is substantially reduced. If this parameter is omitted, NONE is assumed unless a SET WILDCARD_OPT statement is used.

Note: To change the wildcard option on a table, you must use a VALIDATE INDEX statement that specifies the ABANDON parameter.

For a complete description of the WILDCARD_OPT parameter, see Fulcrum SearchServer SearchSQL Reference.

Optimizing With the WORKDIR and SW2 Directories

Naming a second directory for indexing improves performance considerably. You can name a second directory for indexing by adding an SW2= entry to the configuration file, as described in the section "Optimizing with the Temporary Files" later in this chapter.

Depending on the size of the index information, the two directories are written to alternately until all segments have been merged into the final sort file. No more than two sort files (one in each of the two directories) can exist simultaneously during indexing.

If neither the WORKDIR nor SW2 directory is specified, the SearchServer FULTEMP environment variable is used to find a location for temporary storage. SearchServer must have write access to these directories. For more information about FULTEMP, refer to Fulcrum SearchServer Getting Started.

When you create a table using the CREATE TABLE clause with the WORKDIR table parameter, an SWK= entry is added to the configuration file automatically. You can add an SW2= entry directly into the configuration file to make a second directory available for intermediate sort files during indexing. Where you locate these directories can directly affect performance.


When using both the SWK= and SW2= entries, their values should name directories on different drives to obtain the best performance. Keeping SWK and SW2 on different drives from the table management files further improves indexing performance. You can control the location of the table management files with the INDEXDIR table parameter or the FULCREATE server parameter. SearchServer must have write access to these directories.

Careful specification of these locations offers performance benefits during the merging of temporary sort files in that disk seeking time is shortened, or even eliminated. When the directories are placed on different drives, SearchServer can read sequentially from one intermediate sort file and write to the other. For example, under UNIX, the entry:

SW2=/usr2/tmp

would optimize performance if usr2 is the mount point for a second disk, and if WORKDIR names a directory on another disk. For more information about editing the configuration file, see Appendix C, "Table Management Files."
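Before relying on an SW2= entry, you might want to confirm that the two work directories really are on different file systems. The following Python sketch is one way to check this on UNIX-like systems; the function name is hypothetical, and it compares the device identifier reported by the operating system. Two partitions of the same physical disk also report different devices, so a True result is necessary but not sufficient for the seek-time benefit described above.

```python
import os


def on_different_devices(path_a, path_b):
    """Return True when two existing paths report different device IDs."""
    return os.stat(path_a).st_dev != os.stat(path_b).st_dev


# A directory compared with itself is always on the same device.
print(on_different_devices(".", "."))
```

For example, you could call on_different_devices with the directory named by WORKDIR and the directory named in the SW2= entry.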

Creating the Index

The rows you insert, update, or delete in the table aren't immediately indexed unless the IMMEDIATE table parameter was specified or implied when you created the table. Rows inserted or updated in a table created without the IMMEDIATE table parameter can't be searched until a VALIDATE INDEX statement is successfully executed.

In the case of an IMMEDIATE table, you can search the new data without executing a VALIDATE INDEX statement. However, to optimize search speed and index file size, it is recommended that you execute a VALIDATE INDEX statement periodically to reorganize the index.

If you use the row expansion feature, the files in the container row aren't represented in the table until table validation is performed with the VALIDATE TABLE parameter of the VALIDATE INDEX statement. This is true even if the IMMEDIATE table parameter was specified when the table was created.

CAUTION: When there are indexing failures during a VALIDATE INDEX operation, the FTT_INDEXLOG column of the TABLES system table provides the actual error report. When indexing failures occur, a portion of the indexing might have already completed. Any data retrievals from that table can be inconsistent, because some rows might remain unindexed.

However, once the problem (such as lack of disk space) has been identified and corrected, it is always safe to restart indexing. In this case, use the UNPROTECT parameter of the VALIDATE INDEX statement. For more information, see the section, "Recovering from Indexing Failure," later in this chapter.

What Happens During Table Validation

Table validation is a phase that can occur as part of the VALIDATE INDEX process. Its purpose is to synchronize table data with the state of any external documents. Row expansion is performed during table validation. You don't need to request table validation if your application involves no container rows. In this case, your application takes care of synchronizing table data with the external documents.

During table validation, SearchServer determines the indexing status of the rows in a table and updates the status accordingly. New rows for any new documents in the directory or library are inserted into the table. Additionally, any rows referring to documents that no longer exist in the directory or library are deleted from the table. No table validation, row expansion, or index update occurs unless the VALIDATE INDEX statement was issued with the VALIDATE TABLE parameter.

The indexing status of a row is updated during table validation. It is reflected in the FT_ROW_STATE column of each row. FT_ROW_STATE can have a value of NOT_YET_INDEXED, INDEXED, UPDATED_SINCE_LAST_INDEXING, CANNOT_BE_INDEXED, or no value (for DIRECTORY row types), subject to the following explanation.

During table validation, SearchServer validates all rows associated with external documents and updates the indexing status in the following way:

• Any row associated with an external document (including container rows) that has been modified since the last VALIDATE INDEX operation is marked to be indexed (UPDATED_SINCE_LAST_INDEXING is the value in the FT_ROW_STATE column).

• Any row associated with an external document (including container rows) that can't be found is deleted. Until indexing is performed, this condition is reflected with a value of MISSING in the FT_TEXT_STATUS column.

• A row can be a DATA row or a DIRECTORY row, depending on the type of external reference in the FT_SFNAME column. SearchServer records the type of row in the FT_ROW_TYPE column.

Table Validation Considerations

Table validation is designed to be used with applications that do not update the table directly. In particular, the following rules must be followed to protect the integrity of your table data during table validation:

• You must not have a row locked. The row might be missed during table validation.

• You must not change either the FT_FLIST or FT_SFNAME reserved columns in a row. This can cause a duplicate row or the row might not be detected at all after table validation has completed.

• You must not add a row. In this case, the row might be deleted during table validation.

• You must not delete a row. The row might still be included during table validation.

• You must not change column data within a row. In this case, the new column data might revert back to its previous value.

If any one of these scenarios occurs, retrieving data after the table validation has completed could produce unpredictable results.

Periodic Versus Immediate Indexing

The VALIDATE INDEX statement causes an explicit indexing operation on a table whether the table is PERIODIC or IMMEDIATE. This statement provides the parameters for specifying how a table's index files are to be updated. It also allows you to assign a temporary buffer and sort file space for the indexing operation.

The general syntax of the VALIDATE INDEX statement is:

VALIDATE INDEX <table name> [<validate index parameter>...]

For example:

VALIDATE INDEX SUPPORT

The complete definition and description of the VALIDATE INDEX statement is located in Fulcrum SearchServer SearchSQL Reference.

If a container row is inserted into a table, that row is not expanded until the VALIDATE TABLE parameter is specified in a VALIDATE INDEX statement. This applies to both periodic and immediate tables.

You can execute a VALIDATE INDEX statement on a table through the ExecSQL administration utility. You can create an indexing script containing several VALIDATE INDEX statements, perhaps one statement for every table in your data source. Then, you can automate the indexing process by executing that script through ExecSQL at predetermined intervals.
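An indexing script of this kind can be generated rather than written by hand. The following Python sketch is illustrative only: the function name indexing_script and the table name ORDERS are hypothetical, and the output is simply one VALIDATE INDEX statement per line. The statement terminator and script format that ExecSQL expects, and the way ExecSQL is invoked, are not covered in this section, so adjust the output to match your installation.

```python
def indexing_script(tables):
    """Return script text with one VALIDATE INDEX statement per table."""
    lines = []
    for table in tables:
        name = table.strip()
        if not name:
            raise ValueError("table names must not be empty")
        lines.append(f"VALIDATE INDEX {name}")
    return "\n".join(lines) + "\n"


# SUPPORT is the table used in this guide; ORDERS is a made-up second table.
print(indexing_script(["SUPPORT", "ORDERS"]), end="")
```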


Periodic Indexing

For a PERIODIC table, changes to a row in a table are not reflected in the table's index until a VALIDATE INDEX statement is executed on the table. If you update or insert a row, or make any changes to an external document, the new information can't be searched until the table is indexed. Similarly, deleted rows remain searchable until the index is updated, although any attempt to retrieve a deleted row will fail.

When you issue a SELECT statement, SearchServer examines the index information recorded during the last indexing operation. When table data is modified and the table is not indexed directly afterwards, the data that SearchServer retrieves might not correspond to the current index information. To keep the index information as up-to-date as possible, we recommend that you index a table after any sequence of operations that add, update, or delete table data.

Explicit Indexing of Immediate Tables

In the case of an IMMEDIATE table, SearchServer updates the index information each time you insert, update or delete a row. The new information can be searched immediately and deleted rows won't be found. However, there are several reasons for doing periodic indexing even on an immediate table.

First, SearchServer does not keep the index information current for any changes to external documents unless the row associated with the external document is updated too. This synchronization is the application's responsibility.

The second reason is that the differential index information associated with new or changed text in an immediate table can consume up to 200 percent of the space for the text itself. This is compared to less than 50 percent of the space for the equivalent periodic index.

An IMMEDIATE table benefits from the explicit indexing operation caused by a VALIDATE INDEX statement. The performance of subsequent search and INSERT and/or UPDATE operations is improved. An explicit indexing operation reorganizes the index information to optimize subsequent operations and index file size.

You can perform explicit indexing, searching or updating concurrently. However, a complete re-indexing (VALIDATE INDEX statement with the ABANDON parameter) or table validation (VALIDATE INDEX statement with the VALIDATE TABLE parameter) can only be performed when no other table activity is occurring.

The Text Reader During Indexing


SearchServer uses text readers to read external document text when your application requests table indexing (for example, by executing a VALIDATE INDEX statement). Text readers provide a way for SearchServer to process a single type of data stream (FTDF) for indexing purposes; one that is independent of the platform you are using and the way the text is stored.

When a row that references an external document is about to be indexed, SearchServer obtains the following information from the row data:

• the name of the file that references the external text (from FT_SFNAME)

• the names of the text readers (from FT_FLIST) that will be used to read the external text and produce an FTDF stream

In some cases, the file named by FT_SFNAME contains the external document, alone or together with other library documents. In other cases, the filename is used by the text reader in combination with parameters in the text reader list as an indirect reference to external text.

What Happens During Indexing

During indexing two things happen: SearchServer interprets any FTDF control sequences encountered in the text stream, and it reads and sorts the words in the source text.

The internal column data stored in the table management files and any external documents associated with the table are indexed at this time. This means that the indexing engine accesses all column data. It uses text readers to access any external documents referenced from within the internal columns.

Figures 5-1 and 5-2 show the indexing process, first using a periodic index and next using an immediate index:


Figure 5-1 Periodic Indexing Process

Figure 5-2 Immediate Indexing Process


For each row in the table, the indexing engine reads and indexes the external text first, then the other column data. It repeats this sequence for all rows that require indexing.

During indexing, the column data is read and the word recognition rules are applied to extract words. The text is segmented by any zone definitions, the words are sorted and stored, and their locations are recorded. When indexing is complete, the indexing status of all rows (including those associated with operating system files, directories, and document library files) is updated.

The indexing state of a data row is recorded in the FT_ROW_STATE reserved column. The NOT_YET_INDEXED value indicates that the row has never been indexed since it was first inserted into the table. The INDEXED value indicates that a VALIDATE INDEX statement has been executed on the data row since the inserted row was last changed.

The UPDATED_SINCE_LAST_INDEXING value indicates that the data has changed since the last VALIDATE INDEX statement was executed for that table. The null value indicates that the row is a directory or library row.
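The state descriptions above can be read as a small state machine for a data row in a periodic table. The following Python sketch is inferred from the text alone, not from the state figures, so treat it as an approximation. The event names insert, update, and validate_index are hypothetical labels, a successful indexing run is assumed, the CANNOT_BE_INDEXED value is not modeled, and the assumption that an update to a never-indexed row leaves it NOT_YET_INDEXED is ours.

```python
NOT_YET_INDEXED = "NOT_YET_INDEXED"
INDEXED = "INDEXED"
UPDATED = "UPDATED_SINCE_LAST_INDEXING"


def next_state(state, event):
    """Return the FT_ROW_STATE value of a data row after an event."""
    if event == "insert":
        return NOT_YET_INDEXED  # a new row has never been indexed
    if event == "validate_index":
        return INDEXED  # assumes the indexing run succeeded
    if event == "update":
        # Assumption: a row that was never indexed stays NOT_YET_INDEXED.
        return UPDATED if state in (INDEXED, UPDATED) else NOT_YET_INDEXED
    raise ValueError(f"unknown event: {event!r}")


state = None
for event in ("insert", "validate_index", "update", "validate_index"):
    state = next_state(state, event)
    print(event, "->", state)
```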

Figures 5-3 and 5-4 show the life cycle of a data row as it moves between the various possible indexing states:

Figure 5-3 State Table of a Data Row for a Periodic Table


Figure 5-4 State Table of a Data Row for an Immediate Table

How Index Modes Affect Indexing

The index modes that are assigned to columns and zones in the schema can affect the way values are indexed. For example, if your column was defined with VALUE index mode, only numeric terms in the column's data value are indexed. Any non-numeric data (such as characters or words) aren't indexed.

The LITERAL index mode causes the indexing engine to treat word separators (such as blanks and punctuation) as part of the word, rather than as a separator. If the data you enter for a column must be indexed as more than one word, use control codes (such as tab or newline) to act as word separators. If you don't, the indexing engine treats the data as one long word. For a complete description of what constitutes a word in SearchServer, see the section called "Patterns" in Fulcrum SearchServer SearchSQL Reference.

Sometimes you might want a zone to have a different index mode than its column. When this is true, you'll have to use SearchServer index control sequences to bracket the data. These sequences enable and disable the mode at the appropriate time. You must provide these control sequences in addition to the select zone sequences that are used to delimit zone segments in the text, either embedded in the data itself, or inserted dynamically in the FTDF text stream output by a text reader. The following section provides this information.


LITERAL Indexed Data in a NORMAL Indexed Column

In a column defined with NORMAL index mode, you can have LITERAL indexed data, provided that the zone containing the data is defined with LITERAL index mode.

For example, to define the term "file99x.ex0" with LITERAL index mode, you need to create a zone for the term by either manually inserting the start and stop LITERAL indexing control sequences (the start and stop zone control sequences must already be there to delimit the zone data), or by having a custom text reader insert these during indexing. The zone's data will look like this:

\C213s\C12vfile99x.ex0\C11v\Cs

where \C213s and \Cs are the start and stop zone control sequences, and \C12v and \C11v are the start and stop literal index mode control sequences, and 'file99x.ex0' is the actual data to be indexed. In the table's schema, the zone must be declared as literal index mode:

CREATE ZONE JFILE (213) LITERAL
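If a script or custom text reader prepares this data, the bracketed zone text shown earlier can be assembled from the zone number and the term. The following Python sketch is illustrative only: the function name literal_zone is hypothetical, and it uses only the four control sequences given in the example (\C<zone>s and \Cs for the zone, \C12v and \C11v for literal index mode). Whether the data itself needs any escaping is not covered here.

```python
def literal_zone(zone_number, data):
    """Wrap data in start/stop zone and literal index mode sequences."""
    start_zone = f"\\C{zone_number}s"   # for example \C213s
    stop_zone = "\\Cs"
    start_literal = "\\C12v"
    stop_literal = "\\C11v"
    return f"{start_zone}{start_literal}{data}{stop_literal}{stop_zone}"


# Reproduces the zone data shown in the example above.
print(literal_zone(213, "file99x.ex0"))
```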

This zone must be declared as part of a column in the table by using a domain. Since the data is in a document, and not in a script to insert data, the zone should be declared as part of a domain that makes up the TEXT_LOG column. For example,

CREATE ZONE DESCRIPTION (32)
CREATE ZONE JFILE (213) LITERAL
CREATE DOMAIN LOG_DMN (DESCRIPTION, JFILE) AS APVARCHAR
CREATE TABLE SUPPORT (TEXT_LOG LOG_DMN 32)

You can then search on the JFILE zone and display its data as part of the TEXT_LOG column. For example,

SELECT TEXT_LOG
FROM SUPPORT
WHERE JFILE CONTAINS 'file99x.ex0'

To display only the JFILE information, its data would have to be inserted into the catalog file by a custom text reader or by a script, in the way that exec/ins001.fte is used for the operator column, which corresponds exactly with the data in the creator zone.

CAUTION: Fulcrum software is not designed to support literally indexed data in a normally indexed zone.

Searching for part of the data in a literally indexed zone returns no rows.


Recovering from Indexing Failure

The most likely reasons for indexing failure are:

• disk space runs out

• write permission is denied

• the disk is bad (hardware failure). This condition can cause the table data or index to appear to be corrupted.

• failure in a custom text reader

• application failure

Note: Even when indexing fails, some of the table data could have already been indexed and updated in the table's index files. The updated data is valid and can be searched.

How to Use the Index Log File

When a VALIDATE INDEX statement terminates with SQL_SUCCESS_WITH_INFO or SQL_ERROR, always investigate the cause of indexing failure before you attempt to re-index a table. SearchServer maintains an index log that you can read by running ExecSQL and executing a SELECT statement on the FTT_INDEXLOG column of the TABLES system information table. For example:

SELECT FTT_INDEXLOG FROM TABLES WHERE TABLE_NAME CONTAINS 'SUPPORT'

Note: If the index log gets too large, you might not be able to view the entire contents of the FTT_INDEXLOG column. In this case, you can view the .log file that is located (by default) with the other maintenance files.

You would get a result similar to the following if a previous attempt to index the SUPPORT table failed because of a read/write access problem:

RowCount = 1 NumResultCols = 1: FTT_INDEXLOG, values: ('8: starting indexing at Fri Feb 12 15:06:25 -- 6.0E18
204: 17 documents
207: 1728 words processed
156: size of catalog files: 5612
157: size of index files: 18790
9: ending indexing at Fri Feb 12 15:06:28
8: starting indexing at Fri Feb 12 16:39:00 -- 6.0E18
109:SearchServer: read and write access to /usr/myhome/fulcrum/fultext/support.dct is required
9: ending indexing at Fri Feb 12 16:39:00
') ***
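When the log covers many indexing runs, it can help to group its entries by run and look at the most recent one. The following Python sketch is a rough illustration, not a documented log format: it assumes the log text has one "number: message" entry per line, and it treats message numbers 8 and 9 as the start and end of a run only because they appear that way in the sample above. The function name split_runs is hypothetical.

```python
import re

ENTRY = re.compile(r"^\s*(\d+):\s*(.*\S)\s*$")
START, END = 8, 9  # seen in the sample log; assumed, not documented here


def split_runs(log_text):
    """Group (number, message) entries into runs delimited by START/END."""
    runs, current = [], None
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # skip lines that are not "number: message" entries
        number, message = int(match.group(1)), match.group(2)
        if number == START:
            current = []
            runs.append(current)
        if current is not None:
            current.append((number, message))
        if number == END:
            current = None
    return runs


sample = """8: starting indexing at Fri Feb 12 16:39:00 -- 6.0E18
109:SearchServer: read and write access to /usr/myhome/fulcrum/fultext/support.dct is required
9: ending indexing at Fri Feb 12 16:39:00
"""
for number, message in split_runs(sample)[-1]:
    print(number, message)
```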

Unprotecting a Table

After an indexing failure, a table remains locked against further indexing and removal. You can execute a VALIDATE INDEX statement with the UNPROTECT parameter to clear this condition and attempt to re-index the table. In addition, you can use the UNPROTECT statement to unlock the table, remove any temporary files, and access the index files and stop file so that the table can be re-indexed (or dropped).

To unprotect a table, use the UNPROTECT statement:

UNPROTECT TABLE <table name>

Removing Temporary Files

You can delete any temporary files that remain from an unfinished indexing run. They're located in the WORKDIR and/or INDEXDIR directory, according to the value(s) in the FTT_WORKDIR or FTT_INDEXDIR columns of the TABLES system information table. There might also be temporary files in the directory named by the FULTEMP environment variable or server parameter. They can be recognized by their filenames.

As long as there aren't any SearchServer applications running, clear all files having names of the following format (where n represents any digit):

ft<nnnnnnnn>.tmp
fti<nnnnn>.*

Under Microsoft Windows, SearchServer can also create temporary filenames with the following format:

<xxxxxxxx>

where x represents any numeric or alphabetic character. Do not delete files having names of this form if there is any possibility that an application other than SearchServer has created them and might still need them.
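To review leftover files before deleting anything, you can list the names that match the two documented patterns. The following Python sketch is illustrative only: the function names are hypothetical, the digit counts (eight and five) are taken literally from the patterns above, and matching is case-sensitive, which might need relaxing on some platforms. It deliberately ignores the eight-character Microsoft Windows names, because those cannot be told apart from files created by other applications, and it only lists files.

```python
import re
from pathlib import Path

# ft<nnnnnnnn>.tmp and fti<nnnnn>.*, where n is any digit
TEMP_NAME = re.compile(r"^(ft\d{8}\.tmp|fti\d{5}\..*)$")


def is_temp_name(name):
    """Return True when a filename matches a documented temporary pattern."""
    return TEMP_NAME.match(name) is not None


def leftover_temp_files(directory):
    """List matching files in a directory, without deleting anything."""
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and is_temp_name(path.name)
    )


print(is_temp_name("ft00001234.tmp"), is_temp_name("support.dct"))
```

Run the listing only when no SearchServer applications are running, and check the result before removing any file.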


Recovery Procedures

If You Run Out of Disk Space

1. Make more disk space available. Providing adequate disk space is essential. Always estimate the disk space SearchServer requires before you index a table. If insufficient disk space is available, you can reduce the requirement by decreasing the TEMP_FILE_SIZE parameter. You could also relocate one or more of the directories used for temporary sort files to a different file system by modifying the FULTEMP environment variable or by changing the SWK= and SW2= entries in the table's configuration file.

2. Retry the indexing operation with the UNPROTECT parameter to remove any temporary files and unlock the table.

If Write Permission is Denied

1. Gain read and write access to the file system and directories needed by the indexing process according to the procedure for your particular operating environment.

2. Retry the indexing operation with the UNPROTECT parameter to remove any temporary files and unlock the table.

If the Data Appears to be Corrupted

1. You might have to perform one or all of the checks described in the following sections to determine the problem.

2. Retry the indexing operation with the UNPROTECT parameter to remove any temporary files and unlock the table.

Verifying Index Files

If indexing fails and no other cause has been determined, you can use the ftidrck utility program to check the validity of the existing index files.

Note: ftidrck is not supported for remote tables.

Standard exit values are returned. A value of zero indicates success, and non-zero indicates failure. As each error is detected, ftidrck writes an error message to the standard error stream.

The following example shows the output from ftidrck when the index files associated with the SUPPORT table are intact:

ftidrck support
317:ftidrck: checking collection support
DCT = /usr/myhome/fulcrum/fultext/support.dct
REF = /usr/myhome/fulcrum/fultext/support.ref
318:ftidrck: collection support OK

Any output message that is different from the ones shown above indicates a problem with the index files. For a complete description of the ftidrck utility program, see Appendix A, "Utility Program Summary."

Verifying Table Data Files

To check the table data files for consistency after an indexing failure, create a script to run ftcout at regular intervals (perhaps nightly).

This utility returns standard exit status values. A value of zero indicates success, and non-zero indicates failure. For each error encountered, a message is written to standard error and ftcout continues (if possible).

The format of the output text file conforms to the syntax required by the ftcin utility. Any rows that have been deleted are not written to the output.

For example, to check the table data files of the SUPPORT table but suppress checking of the indexing status of its external documents, enter:

ftcout support -x -t/dev/null

Any error messages are written to the standard error output.

For a complete description of the ftcout utility program, see Appendix A, "Utility Program Summary."

Updating and Deleting Rows in the Table

The UPDATE statement modifies data in the rows of your table and the DELETE statement removes the rows of data from your table. Both of these statements have two forms: positioned and searched. The positioned form requires application-level programming and is described in your Fulcrum SearchServer or SearchBuilder Developer's Guide. The searched form is described here and includes an optional WHERE clause to determine the rows to be updated or deleted.

What Happens During a Delete or Update?

When you execute a searched DELETE statement, the rows are deleted from the table once the statement has finished executing. Any attempt to retrieve data from that row fails and SearchServer returns an error message. Any subsequent SELECT statement won't be able to retrieve the data from the deleted row.

If you are deleting a row that references an external document, a DELETE statement only deletes the reference to the external document and not the actual external document file. If a DELETE statement is applied to a row that was automatically generated by the indexing engine as a result of row expansion, the corresponding row is regenerated when table validation is next performed, unless the associated external document file has also been deleted. To delete a container row and all the previously generated rows, you can remove or rename the container and then perform table validation.

For an IMMEDIATE table, the UPDATE and DELETE statements also update the index so that the changed or deleted information is immediately reflected in subsequent searches. Otherwise, the new or replacement data isn't searchable until a VALIDATE INDEX statement has been successfully executed. In this case, replaced or deleted data remains searchable (if it was previously indexed).

SearchServer won't execute an UPDATE statement for a row that has been marked for deletion by a DELETE statement. A searched DELETE or UPDATE statement will not operate on any row that has been modified since it was last indexed. This prevents any modification to a row that might no longer satisfy the search condition that caused it to be selected. However, if your search criteria include the FT_CID reserved column, SearchServer modifies the row.

Using the UPDATE and DELETE Statements

You can update table data using the UPDATE statement. This statement provides the parameters for specifying which values are to be merged with existing column values for the selected rows.

The general syntax of the searched form of the UPDATE statement is:

UPDATE <table name> SET <set clause item> [{, <set clause item>}...] [WHERE <search condition>]

CAUTION: If you used the row expansion feature to insert rows into the table, any column data subsequently added to those rows (using the UPDATE statement) could be deleted if the location of the associated external document is altered afterwards in any way.

When you want to delete table data, use the DELETE statement. The syntax of the searched form of the DELETE statement is:

DELETE FROM <table name> [WHERE <search condition>]

You can find complete definitions and descriptions of the UPDATE and DELETE statements in Fulcrum SearchServer SearchSQL Reference.

The table name identifier for both statements is used to identify the table that you want to work with. This name must be the valid name of an existing table. The optional WHERE clause specifies the criteria for selecting rows from a table in the form of a search condition that is TRUE for each row selected to be deleted or updated.

The search condition can be a simple test (predicate) or a combination of several tests. All rows for which the search condition is TRUE are deleted or updated. Omitting the WHERE clause results in all of the rows in the table being deleted or updated.

CAUTION: Because SearchServer doesn't support relational database transactions, rollback of updated or deleted rows isn't possible. Therefore, these statements should be used with care.

One way to avoid accidental deletion of rows is to perform a SELECT statement using an identical WHERE clause. You can then verify that your search condition matches the desired set of rows.
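The precaution of previewing with an identical WHERE clause can be made mechanical by generating both statements from one search condition. The following Python sketch is a hedged illustration only: the helper name is hypothetical, the choice of FT_CID as the selected column is an assumption made for the preview, and the search condition is passed through unchanged. The DELETE form follows the syntax shown earlier in this section.

```python
def preview_and_delete(table, search_condition):
    """Return (select_stmt, delete_stmt) that share one WHERE clause.

    The SELECT column list (FT_CID) is an assumption for illustration;
    choose whatever columns let you recognise the rows.
    """
    condition = search_condition.strip()
    if not condition:
        # Omitting the WHERE clause would affect every row in the table.
        raise ValueError("refusing to build a DELETE without a WHERE clause")
    where = f"WHERE {condition}"
    select_stmt = f"SELECT FT_CID FROM {table} {where}"
    delete_stmt = f"DELETE FROM {table} {where}"
    return select_stmt, delete_stmt
```

Run the first statement, inspect the rows it returns, and only then run the second. Because rollback isn't possible, the helper refuses an empty search condition instead of producing a statement that deletes every row.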

SearchServer obtains an exclusive lock on the table for the duration of the operation. Execution of the searched UPDATE or DELETE statement will fail if the attempt to obtain an exclusive lock fails.

The set clause item in the UPDATE statement specifies the column name and a literal value or NULL. A column name can't be specified more than once in an UPDATE statement. You can update the values of some reserved columns. For more information, see the section, "Assigning Values to Reserved Columns," earlier in this chapter.

Specifying the Data for an Update

An update value can be either a literal or a null value. Literals are of three different types, as described in the section "Inserting Data into the Table" earlier in this chapter.

Backing Up and Restoring Table Data

You must do regular backups of your tables and make sure the backup data is not corrupted if you want to be able to recover from any unforeseen loss of data. This includes the table management files and the table support files. You can optionally back up any external documents associated with your tables. These preventative measures can be done nightly through an automated process. There are many file backup/restore software packages available on the market suited to this purpose.

To keep track of any modifications to the table data since the last backup, keep a log of all update activity, including any INSERT, DELETE, and UPDATE statements executed on your tables. If necessary, this transaction log can be used in a script to restore the backup table data to the most accurate state.

Always restore the table management and support files first, then reapply all transactions since the last backup by executing the transaction log script using ExecSQL (or use your own application).
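If you use your own application to keep the log, it can be as simple as an append-only file. The following Python sketch is one possible shape for such a log, not a SearchServer facility: the helper names and the one-JSON-string-per-line file format are assumptions, chosen so that statements containing line breaks or quotes are recorded exactly as executed.

```python
import json

def log_statement(log_path, statement):
    """Append one executed INSERT, UPDATE, or DELETE statement to the log."""
    with open(log_path, "a", encoding="utf-8") as log:
        # json.dumps escapes embedded newlines, so each record is one line.
        log.write(json.dumps(statement) + "\n")

def statements_to_replay(log_path):
    """Return the logged statements in the order they were executed."""
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Start a new log file after each successful backup so that the log holds only the activity since that backup. The statements returned by statements_to_replay can then be written out in whatever form your restore script expects.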

You can run the ftcout utility program before each backup to verify the consistency of the table data files. It unloads the table data into a proprietary text file format, and checks whether the data has been corrupted as it unloads the data. You don't need to keep the output file.

Checking the Consistency of Table Data Files

To check the table data files for consistency before backing up the data, arrange for ftcout to be run at regular intervals (perhaps nightly).

This utility returns standard exit status values. A value of zero indicates success, and non-zero indicates failure. For each error encountered, a message is written to standard error and ftcout continues (if possible).

The format of the output text file conforms to the syntax required by the ftcin utility. Any rows that have been deleted are not written to the output.

For example, to check the table data files of the SUPPORT table but suppress checking of the indexing status of its external documents, enter:

ftcout support -x -t/dev/null

For UNIX systems, /dev/null is used so that the file isn't created and no disk space is used. For systems where /dev/null can't be specified, you must provide an actual filename. Any error messages are written to the standard error output.

For a complete description of the ftcout utility, see Appendix A, "Utility Program Summary."

Checking the Consistency of Index Files

It is not necessary to back up the index files, because they can always be rebuilt from the data. However, you might want to back them up along with the data files to save the time required to re-index the complete table in the event of a loss of the index files. In this case, you can also run the ftidrck utility before each backup to verify the consistency of the index files. For example:

ftidrck support

For more information about the ftidrck utility, see Appendix A, "Utility Program Summary."

Chapter 6:

Altering the Table

This chapter provides detailed information about changes to the table such as:

• changing the table's schema

• dropping the table

• moving the table and its associated files

Introduction

After you've created, populated, and used your table, you might find it necessary to change various attributes of that table. For example, you might find that the schema is not complete enough for the applications that are using it, so a new column must be added. Or your environment might be changing, so you need to move the table to another server.

These are just two examples of ways that you might have to alter your table. This chapter describes the various considerations that must be made when altering some attribute of your table.

Changing the Schema Definition

There are two methods you can use to change the schema of an existing table:

• replace the entire schema

• add or drop an individual column

Note: If you execute the examples in this chapter on the SUPPORT table, you will change the results for the examples in the subsequent chapters. If you want to execute these examples (and still receive the same results as those given in subsequent chapters) you should recreate the SUPPORT table after completing this chapter.

Replacing the Schema Definition

The CREATE SCHEMA statement uses the REPLACE option to overwrite an existing schema. You can use it to add, delete, or rename a column, or to change a column's data type or index mode. These operations alter SearchServer's interpretation of your table's data and must be carried out with great care.

When the REPLACE option is specified, SearchServer replaces the existing schema with the new one specified in the CREATE SCHEMA statement. The schema is completely overwritten, but the data values in the table aren't affected. You must reference all of the existing information that you want to retain and include the changes you want to make.

There are some attributes of the schema that cannot be changed. For example, the table parameters specified in the CREATE TABLE clause cannot be changed. To change these parameters, you must recreate the entire schema.

The reserved columns can be named (or renamed, if permitted) when using the CREATE SCHEMA statement that specifies the REPLACE option. You can also change the index mode of particular columns. However, the data type and field number of these reserved columns cannot be changed. Naming a reserved column explicitly in the schema makes the column visible in the COLUMNS system table.

CAUTION: You must ensure that no other application processes are referencing the table when you execute a CREATE SCHEMA statement that specifies the REPLACE option on it; otherwise, unpredictable results can occur.

An immediate consequence of almost all schema changes is that the index is no longer accurate. The only exceptions to this are changing the name of a column or adding or deleting a column. For all other changes, you should correct the index by executing a VALIDATE INDEX statement that specifies the ABANDON parameter before any further searching is performed on the table.

If you've already inserted data into a table, you should be aware that existing column data is identified in the table management files by field number. The field numbers associated with existing data don't change if you assign different column names to field numbers. The only way to change the field numbers of existing data is to rewrite (update) the existing rows.
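The relationship between column names and field numbers can be pictured with a toy model. The following Python sketch is an illustration only and says nothing about how the table management files are actually laid out: the field numbers, column names, and values are all hypothetical. It shows why assigning a different name to a field number changes how existing data is reached, but not where that data is stored.

```python
# Toy model only: a row stores its values by field number, and a schema
# maps column names to field numbers. All numbers and names are made up.
row = {100: "Printer jams on startup", 101: "hardware"}

def read_column(schema, row, column_name):
    """Look up a column's value through the schema's field number."""
    return row.get(schema[column_name])  # None stands in for NULL

old_schema = {"TITLE": 100, "CATEGORY": 101}
renamed_schema = {"SUBJECT": 100, "CATEGORY": 101}  # TITLE renamed to SUBJECT
swapped_schema = {"TITLE": 101, "CATEGORY": 100}    # names assigned to other fields
```

In this model, renamed_schema reaches the same stored value under the new name SUBJECT, while swapped_schema makes TITLE return what used to be the CATEGORY value. The row itself never changes, which mirrors the point that only rewriting the rows moves data to a different field number.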

Adding or Deleting a Column

You can add or delete a column either by using the CREATE SCHEMA statement with the REPLACE option, or by using the ALTER TABLE statement. The schema definition would be the same as the previous one, with the exception that one more (or one less) column would be defined.

If you add a column with either the REPLACE option of the CREATE SCHEMA statement or the ADD option of the ALTER TABLE statement, SearchServer sets all data values for the new column in the existing rows to NULL (except when the new column renames a reserved column).

When you add a column, if the field number matches the field number of a column that was previously dropped, the original column is restored with the new attributes specified in the column definition of the ALTER TABLE statement or CREATE TABLE clause. Unless the data has been deleted using the UPDATE statement, the data from the original dropped column is now accessible through the restored column.

However, if the column definition specifies the name of a column that has been previously dropped, but the field number in the column definition is not explicitly specified, the original column is not restored. Instead, the new column is created and the value for that column in all rows in the table is NULL. You can now use the UPDATE statement to store data values in that column. You must re-index the table using the VALIDATE INDEX statement so that the data can be searched.

For example, the following adds the DOC_DATE column to the table's schema to rename the FT_DATE reserved column:

ALTER TABLE SEARCHME ADD DOC_DATE 31

Note: If you've dropped a column from the schema, you won't be able to search or retrieve the associated data. However, the data for that column continues to occupy file space. To avoid this situation, you must update the column to be dropped with a NULL value for every row before altering the schema.

When the ALTER TABLE statement is executing, you can't perform a delete, update, or insert operation on the table. SearchServer reads the schema, and the column specified is added or flagged for deletion from the schema. Once the statement has finished executing, deletes, updates, and inserts are again permitted.

Changing a Column's Data Type or Index Mode

You can change a column's data type using the REPLACE option of the CREATE SCHEMA statement. The new schema definition is the same as the previous one, except that the data type specified for the column being changed is different. SearchServer assigns each data type a default index mode.

Alternatively, you can define a domain in the CREATE DOMAIN clause of the CREATE SCHEMA statement and associate it with the column to override the column's data type and/or index mode. Whatever data type and index mode you specify for the domain becomes the data type and index mode associated with the column.

CAUTION: It is important to understand how all data conforms to the new attributes of the column. If you change the data type or index mode of a column that contains data, you must re-index the table by executing a VALIDATE INDEX statement that specifies the ABANDON parameter on the table before doing any searching.

You can also change the data type of an existing column using the ALTER TABLE statement by first dropping the column and then adding it back to the table and specifying a different data type. However, because you can't specify a domain in an ALTER TABLE statement, you can use only the pre-defined data types and their default index mode.

Controlling Concurrency

When you modify a table, you must ensure that the users of the table get a consistent view of it and that you have exclusive access to it. SearchServer supports two levels of exclusive access to the table management index files: row locking and table protection. Row locking and table protection are not mutually exclusive operations.

A row can be locked when a table is protected. Conversely, a table can be protected when a row is locked.

Row Locking

A locked row ensures that updates can occur without another application retrieving or indexing the affected row in an intermediate state. This mechanism is also used during retrieval to control row updates. For more information about row locking and concurrency issues during retrieval, see the Fulcrum SearchServer Developer's Guide for your working environment.

By default, row locking is enabled when a table is created. This is also referred to as high integrity mode. When a table is created, you can specify the NOLOCKING table parameter in the CREATE TABLE clause to disable row locking.

You can change the locking attribute of a table after it has been created using the ftlock utility. This utility has two options:

• the -h option is equivalent to the ROWLOCKING table parameter of the CREATE TABLE clause and will enable row locking

• the -c option is equivalent to the NOLOCKING table parameter of the CREATE TABLE clause and will disable row locking

The value in the FTT_NOLOCKING column of the TABLES system table indicates whether or not row locking is enabled.

Note: Locked rows are skipped during indexing. They will be indexed during subsequent indexing operations if the row is unlocked and if indexing is needed.

Table Protection

A protected table can be searched or updated, but the associated table management index files (with the exception of the immediate index) can't be changed until it is unprotected. When you protect a table, you're ensuring that no other application will interfere with your changes and jeopardize the integrity of the new table management files.

A table's protection is universal. When one application protects a table, no other application can perform any of the noted operations. However, any application can release a protected table.

You will want to coordinate the use of applications or administrative procedures that need the table in a protected state. You will not want one application to undo the table protection that another has set.

When a table is protected, you can still search the table, update rows, or insert rows. However, you cannot drop or index the table.

Note: Table protection does not apply to the external documents. They can be changed at any time by any process.

You can protect a table with the PROTECT statement. Conversely, you use the UNPROTECT statement to unprotect the table.

To index a table and unprotect it immediately before the indexing operation starts, you can use the UNPROTECT parameter of the VALIDATE INDEX statement. Using this parameter ensures that no other application will prevent the indexing operation.

Moving the Table

A table comprises the table management files and, optionally, one or more external documents. One or more tables can reference the same stop file and other support files. When you move a table, you must consider where these files are presently located, and where they will fit into the directory file structure after they are moved. You should also consider whether the directory used for temporary work space during indexing will need to be changed.

Before you begin, you should follow these steps:

• Display the name of the directory being used to store the table management files by examining the FTT_INDEXDIR column of the TABLES system information table.

• If external documents are associated with the table, find the directory in which they reside by examining the FTT_BASEPATH column of the TABLES system information table.

• Display the name of the directory used for temporary work space by examining the FTT_WORKDIR column of the TABLES system information table.

You can tackle the task of moving a table as two independent subtasks:

• When the table management and/or support files need to be moved, the FULSEARCH server attribute might need to be changed. See the sections "Moving Table Management Files," and "Moving Table Support Files," later in this chapter.

• When the external documents associated with a table are to be moved, a small change to the table's configuration file is all that's required (subject to the conditions explained in the section "Moving External Documents" later in this chapter).

Moving Tables Between Systems

The table management files are binary files stored in a portable format. You can move or copy tables between systems if you have a way to transfer binary files.

SearchServer uses timestamps to optimize performance of indexing and table validation, and to set the value of the FT_ROW_STATE reserved column. To minimize the impact of moving tables between systems and time zones, timestamps are stored in UCT form. To use this feature, you must preserve the timestamps on all files and directories that are moved.

Some operating systems do not interpret UCT correctly under all circumstances. As a result, timestamps can become invalid when external documents are moved between different operating systems, or even between different versions of the same operating system. To determine whether this is the case, ensure that the value in the FT_ROW_STATE reserved column is INDEXED before you move the table. When the table is moved to its new location, confirm that this reserved column has the same value.

If, during the move, the value in FT_ROW_STATE changes to UPDATED_SINCE_LAST_INDEXING, you must regenerate all timestamps using the VALIDATE INDEX statement specifying the ABANDON parameter (and the VALIDATE TABLE parameter, if necessary).

If the table is read-only (for example, on a CD-ROM), the CHECK_TEXT_STATUS server attribute of the SERVER_INFO system table should be set to 'FALSE'. This also applies in TFA environments. Therefore, transparent file access is not supported for table update, indexing, or schema modification in heterogeneous environments.

Moving Table Management Files

There are two ways to accomplish this task. You can move the table management files, then specify where they've been relocated, as follows:

1. Move all the files including the table's configuration file.

2. Make sure the new location is included in the list of directories specified in FULSEARCH. Information about how to set FULSEARCH for finding tables is provided in Fulcrum SearchServer Getting Started.

Alternatively, you can specify the new location in the configuration file:

1. Move all the files except the table's configuration file.

2. Edit the table's configuration file to show where the other table management files have been relocated.

For more information about editing the configuration file, see Appendix C, "Table Management Files."

When you move table management files, you can resume searching on the table right away. Re-indexing isn't required.

Moving Table Support Files

If you intend to move any of the table support files to another directory or file system, you must change the value of the FULSEARCH server attribute so that SearchServer can determine where to locate these files after the move.

When you change FULSEARCH, you can resume searching on the table right away. Re-indexing is not required. Information about setting server attributes such as FULSEARCH is provided in Fulcrum SearchServer Getting Started.

Moving External Documents

CAUTION: If you used the row expansion feature to insert rows into the table, any column data that was subsequently modified or added to those rows will be deleted if the value in the FT_FLIST and FT_SFNAME reserved columns of the associated external document is changed. If these reserved columns do not change, all column information remains intact.

You can move external documents very easily when the following two conditions are met:

• all the external documents will be moved but their positions (relative to each other) in the file system will be preserved

• the external documents are currently referenced by a relative path name in the FT_SFNAME reserved column

As long as these two conditions are met, you can effectively change the value of the table's BASEPATH parameter to reflect where the external documents have been relocated. However, you can't do this by executing a CREATE SCHEMA statement that specifies the REPLACE option and a different BASEPATH parameter.

The text reader lists used to access these documents shouldn't depend on any platform-specific text reader defaults.

Changing the BASEPATH

The only way to change a table's BASEPATH is to edit or add the PTH= entry in the table's configuration file. This file can be found in the location specified by the value in the FTT_LOCATION column of the TABLES system table. The format of the configuration file is explained in the section, "Editing the Configuration File," in Appendix C, "Table Management Files." You can change the PTH= entry as described in that section.

The PTH= entry defines where a table's external documents reside. You can move the external documents and change the path in the PTH= entry any time, without having to re-index the associated table afterwards.

By default (when there is no PTH= entry in the configuration file), the external documents associated with a table are assumed to be located relative to the directory that holds the table's configuration file.

When you edit the PTH= entry, you cause SearchServer to search for the external documents in a different directory. Figure 6-1 helps to illustrate this principle. Assuming a UNIX directory structure as follows:

Figure 6-1. How SearchServer Searches for External Documents

A table named searchme, which comprises the two external documents in a directory called docdir, can be created with the following statement:

CREATE SCHEMA SEARCHME CREATE TABLE SEARCHME (TEXTLOG APVARCHAR) BASEPATH '../docdir'

As long as the FULCREATE server attribute is set to /usr/myhome/fulcrum/fultext, this statement creates a configuration file and the other table management files in the /usr/myhome/fulcrum/fultext directory.

The explicit BASEPATH parameter causes the configuration file to contain a PTH= entry that looks like this:

PTH=../docdir/

In this example, SearchServer would attempt to locate the external documents relative to the directory /usr/myhome/fulcrum/fultext, in the absolute directory location /usr/myhome/fulcrum/docdir.

To locate an external document, SearchServer uses the location of the configuration file, followed by the value of the PTH= entry, followed by the filename. If you move all of the external documents, you must edit the PTH= entry to show the new location.

For example, if you move the documents to another file system, to the directory /usr2/public/doc, the new PTH= entry would be:

PTH=/usr2/public/doc
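The lookup rule described in this section (location of the configuration file, then the PTH= value, then the filename) can be checked on paper with ordinary path arithmetic. The following Python sketch is a hedged illustration, not SearchServer code: the function name and the filename notes.txt are hypothetical, and treating an absolute PTH= value as replacing the configuration directory is an assumption based on the /usr2/public/doc example above.

```python
import posixpath

def resolve_external_document(config_dir, pth_entry, filename):
    """Join the configuration directory, PTH= value, and filename.

    posixpath.join discards earlier components when a later one is
    absolute, so an absolute PTH= value is used as given. normpath then
    collapses "..", without consulting the file system (symbolic links
    are not followed).
    """
    return posixpath.normpath(posixpath.join(config_dir, pth_entry, filename))

# Hypothetical filename; the directories come from the example above.
relative = resolve_external_document(
    "/usr/myhome/fulcrum/fultext", "../docdir/", "notes.txt")
moved = resolve_external_document(
    "/usr/myhome/fulcrum/fultext", "/usr2/public/doc", "notes.txt")
```

With these inputs, relative is /usr/myhome/fulcrum/docdir/notes.txt and moved is /usr2/public/doc/notes.txt, which matches the two locations discussed in this section.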

Changing the Temporary Work Directory

To change the name of the WORKDIR directory, you can edit the SWK= entry in the table's configuration file. It is important to remember that SearchServer must have write access to the directory.

You don't have to re-index the table afterwards. However, if you've defined a second work directory for intermediate sort files and you want to rename it too, you should edit the configuration file to change both the SWK= and SW2= entries.

When you edit the table's configuration file, you don't have to change the schema. Follow the instructions in the section, "Editing the Configuration File," in Appendix C, "Table Management Files" to change the names in the SWK= and/or SW2= entries.
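If you prefer to script the edit rather than use a text editor, the change amounts to rewriting one KEY=value line. The following Python sketch rests on an assumption that is not confirmed in this chapter: that the configuration file holds one KEY=value entry per line, as the PTH= example suggests. Appendix C remains the authority on the actual file format. The function name and the directory names are hypothetical.

```python
def set_config_entry(config_text, key, value):
    """Return config_text with the KEY= line replaced, or appended if absent.

    Assumes one KEY=value entry per line; check Appendix C before using
    anything like this on a real configuration file.
    """
    prefix = key + "="
    lines = config_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value
            replaced = True
    if not replaced:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"

# Hypothetical directories, for illustration only.
original = "PTH=../docdir/\nSWK=/tmp/work\n"
updated = set_config_entry(original, "SWK", "/var/tmp/ftwork")
updated = set_config_entry(updated, "SW2", "/var/tmp/ftsort")
```

The function works on text rather than on the file itself, so you can inspect the result and keep a copy of the original configuration file before writing anything back.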

Changing the Stop File

To change the name of the stop file being used, you can edit the STP= entry in the table's configuration file. Alternatively, you can use the same stop filename, but change the contents of the stop file. This process is described in Chapter 5, "Maintaining the Data."

If you do provide a different stop file or change the contents of the existing one, you must re-index the table. If you fail to do this, subsequent searches report that the data has changed since the last indexing request.

Dropping the Table

Use the DROP TABLE statement to delete a table. When you drop a table, the table management files are deleted. However, the table support files and external documents that were associated with it are not deleted. SearchServer might require the support files for other tables, and the operating system files that contain the external documents are stored separately from the index information in the table management files.

CAUTION: Don't delete a table while other users might be accessing it. Otherwise, they could encounter unpredictable results.

An example of the DROP TABLE statement is:

DROP TABLE SUPPORT

This statement fails if the table has been protected against removal or indexing by:

• a PROTECT statement

• a failed VALIDATE INDEX statement

You can remove the table protection by executing an UNPROTECT statement.

A complete definition and description of the DROP TABLE statement is located in Fulcrum SearchServer SearchSQL Reference.

Deleting a View

You can't use the DROP TABLE statement on a view. However, you can remove a view without removing its component tables by deleting the view file (viewname.CFG). For more information about view files, see the section, "Defining a View to Multiple Tables," in Chapter 3, "Structuring the Data."

Chapter 7:

Providing Support Files for Searching

This chapter provides information about how to create support files that influence the way searching is performed. These files control:

• character variant generation

• thesaurus expansion

• ordering of text strings during sorting

Creating a Character Variant Rules File

The character variant generation feature allows SearchServer to treat typographical variants of the same word as equivalent for search purposes. This avoids potential mismatches due to subtleties in language or other external restrictions. For example, SearchServer can be instructed to include the German word Frühling as an equivalent form of the word Fruehling in a query.

Character variant generation is controlled by the set of rules contained in a character variant rules file. The rules can contain instructions for removing or inserting accents, and instructions for modifying the suffix of a query term. SearchServer supports English, French, German, and other European language character variants.

There are three character variant rules files supplied with SearchServer. The FULTEXT.FTL character variant rules file appends the suffixes s and 's to each word. The GERMAN.FTL character variant rules file supports the typographic equivalents used for ä, ö, ü, and ß.

The FRENCH.FTL character variant rules file equates accented characters with a non-accented counterpart (for example, the character e is replaced by the three accented forms é, ê, and è). You can modify one of these three sample character variant rules files or you can create a new character variant rules file using a text editor.

You can test new word variant rules with the fthtest utility program, and optionally compile the rules with a thesaurus source file into a thesaurus object file using the fthmake utility program. More information about modifying and testing the sample character variant rules files is provided later in this chapter.

The name of the character variant rules file you want to apply at search time can be specified through the SET CHARACTER_VARIANT statement. In a distributed environment, the rules file specified in the SET CHARACTER_VARIANT statement must reside on the same node as the table being searched. SearchServer locates the file through the FULSEARCH server parameter or environment variable. If the rules file can't be read, character variant generation is disabled without warning: the search is still attempted, but no search term variants are generated.

What's in the Character Variant Rules File?

Variant generation operates on the assumption that the target string to be substituted can be completely replaced by another, regardless of context. The rules can include removing or including the accents in a query term, or modifying the suffix of a query term.

Each substitution causes a variant form to be added to the search along with the original search term. For example, given a rules file which specified replacement of every e by the three accented forms é, ê, and è, the original word donne would generate a search for donne, donné, donnê, and donnè. A replacement string can't be modified by another rule.
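
The donne example can be sketched in a few lines of Python. This is only an illustration of the idea, not SearchServer's implementation: the function name expand_variants and the dictionary form of the rules are hypothetical, and the sketch assumes that each occurrence of a target is independently either kept or replaced.

```python
from itertools import product


def expand_variants(word, rules):
    """Return the original word plus every variant the rules produce.

    `rules` maps a target string to a list of replacement strings.
    Assumption: every occurrence of a target is independently kept or
    replaced, and replaced text is never rescanned by another rule.
    """
    # Try longer targets first so that "ue" wins over "u" at the same position.
    targets = sorted((t for t in rules if t), key=len, reverse=True)
    choices = []
    i = 0
    while i < len(word):
        for target in targets:
            if word.startswith(target, i):
                # Keep the original text or use any one of its replacements.
                choices.append([target] + list(rules[target]))
                i += len(target)
                break
        else:
            # No rule applies here: the character is copied unchanged.
            choices.append([word[i]])
            i += 1
    return ["".join(parts) for parts in product(*choices)]


print(expand_variants("donne", {"e": ["é", "ê", "è"]}))
```

Because replacements are emitted as-is and the scan continues after the matched target, no replacement string is ever modified by another rule, which mirrors the restriction stated above.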

The maximum number of rules per file is 40, and a maximum of 30 substitutions can be applied simultaneously to a given word. If one of these limits is exceeded, SearchServer rejects the query. If the format of a rule doesn't conform to the syntax specified below, SearchServer could reject the query.

Syntax and Semantics of the Rules File

Each rule in the character variant rules file must be on its own line until the end of the file is reached. A rule has four fields, each with a specific starting column and maximum length as follows:

Figure 7-1 Rules Fields

Field Name | What it Contains | Starting Column | Length
substitution code | a colon (:) to indicate substitution anywhere within the word, or a percent sign (%) to indicate a suffix is to be replaced | 1 | 1
target string | the part of the word to be replaced | 2 | <=4
replacement string | the string that replaces the target string | 6 | <=4
end of rule | line feed (x0A) or EOF (end of file) | 6-10 | 1

The target and replacement strings must be padded with space characters when they occupy fewer than four characters.

A suffix matching rule can have an empty target string. In this case every original term generates a character variant that has the replacement string appended as a suffix. Suffix rules are applied only to an ordinary word by itself or as the last component of an implied phrase. For example, given the query terms friend% and micro-computer, the suffix rules are applied to computer only.

Suffix rules are not applied to single-character words. The same rule applies to the last component of an implied phrase, where the last component must contain at least two characters to be eligible for suffix substitution.
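
The eligibility conditions for suffix rules with an empty target can be sketched as follows. The function name apply_suffix_rules is hypothetical, and the sketch makes two simplifying assumptions that are not taken from the SearchServer documentation: only the percent sign is treated as a wildcard, and an implied phrase is recognized by its hyphens alone.

```python
def apply_suffix_rules(term, suffixes):
    """Return the term followed by its suffixed variants, if it is eligible.

    Assumptions: '%' is the only wildcard checked, and an implied phrase
    is a term whose components are joined by hyphens.
    """
    if "%" in term:
        # A wildcard term is not an ordinary word: no suffix variants.
        return [term]
    # Only the last component of an implied phrase receives the suffix.
    head, separator, last = term.rpartition("-")
    if len(last) < 2:
        # Single-character words (or last components) are skipped.
        return [term]
    return [term] + [head + separator + last + suffix for suffix in suffixes]


for query_term in ("friend%", "micro-computer", "a"):
    print(query_term, "->", apply_suffix_rules(query_term, ["s", "'s"]))
```

With the suffixes s and 's, the term micro-computer gains the variants micro-computers and micro-computer's, while friend% and the single-character word a are returned unchanged.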

CAUTION: The total number of character variants generated from a single query term can become very large when several substitution rules apply. Because SearchServer must look up each generated variant form in the index, a large number of variants (more than a few hundred) can cause an unacceptably slow response, even if only a few variants actually occur in the table.

Character variant generation is applied to stop words. To avoid searches on stop words, all spelling variants of each word in the stop file must be explicitly included in the stop file. For example, in a stop file for the German language, include both für and fuer.

Character Sets

The character variant rules file must be in FTICS. To allow convenient editing of this file using a 7-bit ASCII editor, the rules file can contain certain multi-character sequences to allow the representation of all characters in the FTICS.

The rules file is processed in a fashion equivalent to using the test text reader, which recognizes a 5-character sequence (beginning with \Fx and ending with a 2-character hexadecimal representation) as a single character in the FTICS. In the previous discussion of substitution rule syntax, each such sequence counts as only one character.
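
The fixed-column layout and the one-character treatment of \Fx sequences can be combined in a small parser. This is a sketch under stated assumptions rather than the SearchServer parser: the names parse_rule and parse_rules are hypothetical, trailing padding is simply stripped, and a rule longer than nine logical characters is treated as an error.

```python
import re

# One logical character: a 5-character \Fxhh escape, or any single character.
LOGICAL_CHAR = re.compile(r"\\Fx[0-9A-Fa-f]{2}|.")


def parse_rule(line):
    """Split one rule line into (code, target, replacement)."""
    chars = LOGICAL_CHAR.findall(line.rstrip("\n"))
    if not chars or chars[0] not in (":", "%"):
        raise ValueError("a rule must start with ':' or '%'")
    if len(chars) > 9:
        raise ValueError("a rule has at most 9 logical characters")
    # Column 1 is the code, columns 2-5 the target, columns 6-9 the replacement.
    target = "".join(chars[1:5]).rstrip(" ")
    replacement = "".join(chars[5:9]).rstrip(" ")
    return chars[0], target, replacement


def parse_rules(text):
    """Parse a whole rules file, enforcing the 40-rule limit."""
    rules = [parse_rule(line) for line in text.splitlines()]
    if len(rules) > 40:
        raise ValueError("a rules file can hold at most 40 rules")
    return rules


print(parse_rule(r":UE  \Fxc8U"))
```

Counting in logical characters rather than bytes is what keeps the replacement field at column 6 even when the target contains an escape sequence.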

Examples of Character Variant Rules

The following rules from FULTEXT.FTL append the plural suffix s and the English possessive suffix 's to each word:

%    S
%    'S

In both cases the suffix is separated from the percent sign by exactly four spaces. The following rules from GERMAN.FTL bidirectionally substitute the substring ue for ü:

:UE  \Fxc8U
:ue  \Fxc8u
:\Fxc8U  UE
:\Fxc8u  ue

In each rule exactly two spaces separate the target and replacement fields.

Note: The character variant rules are case sensitive. Therefore, the sample rules files included with SearchServer contain redundant rules differing only in the case (UPPER and lower) of the letters in the target field.

The case of the letters in the replacement field is not a concern, since SearchServer performs case normalization before dictionary lookup.

One additional rule might be required to extend the equivalence of ue and ü to single-character wildcard matching: an indexed accent followed by an alphabetic character is treated as one character. Extending this to the typographical equivalent substring ue requires the following rule:

:\Fx18   ue

where \Fx18 is a special code that represents a single character wildcard. This rule must contain exactly three spaces separating the target and replacement fields.
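
The spacing requirements in these examples follow from the fixed four-character target field. A minimal sketch of a helper that writes a correctly padded rule line is given here; the name format_rule is hypothetical, and the sketch assumes that the replacement string needs no trailing padding because the end-of-rule marker can start anywhere in columns 6 to 10.

```python
import re

# A 5-character \Fxhh escape counts as a single character in a rule.
ESCAPE = re.compile(r"\\Fx[0-9A-Fa-f]{2}")


def logical_len(text):
    """Length of the text with every \\Fxhh escape counted as one character."""
    return len(ESCAPE.sub("?", text))


def format_rule(code, target, replacement):
    """Build a rule line with the target padded to four logical characters."""
    if code not in (":", "%"):
        raise ValueError("the substitution code must be ':' or '%'")
    if logical_len(target) > 4 or logical_len(replacement) > 4:
        raise ValueError("target and replacement hold at most 4 characters")
    padding = " " * (4 - logical_len(target))
    return code + target + padding + replacement


print(repr(format_rule("%", "", "S")))
print(repr(format_rule(":", "UE", r"\Fxc8U")))
print(repr(format_rule(":", r"\Fx18", "ue")))
```

The three calls reproduce the four, two, and three separating spaces described for the FULTEXT.FTL, GERMAN.FTL, and wildcard rules respectively.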

Testing the Character Variant Rules

You can test your own custom variant rules file or learn about character variant generation using one of the example rules files with the fthtest utility program. This utility lets you verify how the equivalent terms generated by the rules files compare to the original search term.

Placing the Character Variant Rules File

You must place the rules file in a directory that SearchServer can access. SearchServer searches for the rules file according to the ordered list of directories specified for locating table management files (see the information about FULSEARCH as an environment variable and in the definition of an ODBC data source in Fulcrum SearchServer Getting Started). The typical location for a rules file is in one of the directories named in the FULSEARCH server parameter.

If you don't place the rules file in a location according to the ordered list of locations SearchServer uses to find files, the SET CHARACTER_VARIANT statement must always specify the full pathname to the character variant rules file.

If your rules file will be used to search tables on a remote node (through a server other than the local server), it must be accessible to the server on the remote system. You can set the FULSEARCH environment variable for the remote server through the ftserver program or set FULSEARCH in FULCRUM.INI on the remote server. Fulcrum SearchServer Getting Started contains instructions for doing this on server platforms.

Testing

The fthtest utility program searches for the rules file (using the FULSEARCH environment variable or FULSEARCH defined in FULCRUM.INI) and tests this file with the terms you enter on the command line. The format of the fthtest command line is:

fthtest <term> -l<rulesfile> [-c<table name>] [-t<outfilename>]

Enter the terms you want to test on the command line when you invoke fthtest. This utility looks up the terms and reports the results. The <rulesfile> parameter is the name of the character variant rules file that must be up to eight characters in length plus a three-character file extension.

The optional -c parameter instructs fthtest to test the generated variants against the specified table. In this case, only the equivalent terms found in the table are reported. Optionally, you can specify the name of an output file to store the results of the test.

The fthtest utility exits when it reaches the end of the input file or when you enter "quit" followed by pressing enter (under MS-DOS), pressing ctrl+z (under 32-bit Windows), or ctrl+d (under UNIX). If an invalid character variant rules file is specified, fthtest returns the following message:

can't open rulesfile

The following example records a brief test session with fthtest for the sample character variant rules file FULTEXT.FTL:

fthtest disc -lfultext.ftl
= disc
= disc's
= discs
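
When fthtest is driven from a script, it can help to assemble the argument list in one place. The sketch below only builds the list from the command-line format given above; the function name fthtest_args is hypothetical, and the file name check is an interpretation of the eight-plus-three character limit rather than a rule taken from fthtest itself.

```python
import re


def fthtest_args(term, rulesfile, table=None, outfile=None):
    """Build the argument list for testing a character variant rules file."""
    # Assumed reading of the naming limit: 1-8 characters, a dot, 3 characters.
    if not re.fullmatch(r"[^.\\/]{1,8}\.[^.\\/]{3}", rulesfile):
        raise ValueError("rules file name must fit the 8.3 pattern")
    args = ["fthtest", term, "-l" + rulesfile]
    if table:
        args.append("-c" + table)    # report only variants found in the table
    if outfile:
        args.append("-t" + outfile)  # store the results in an output file
    return args


print(fthtest_args("disc", "fultext.ftl", table="support"))
```

The resulting list could be handed to subprocess.run on a system where fthtest is installed and FULSEARCH is configured.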

Creating a Customized Thesaurus

A customized thesaurus file contains rules for generating plural and possessive variants of search terms, long forms of certain abbreviations, and synonyms for selected search terms. This operation is called thesaurus expansion. You can modify the thesaurus source file supplied with SearchServer (SUPPORT.FTS), or create your own source file using a text editor.

Using the SET THESAURUS_NAME statement you can specify a default thesaurus file. If no default is set or you want to override the default, you can specify a thesaurus file for each search term in a contains predicate. SearchServer only performs thesaurus expansion on the terms specified in a SearchSQL THESAURUS function in the contains predicate of a SELECT statement.

Thesaurus source files must be compiled with the fthmake utility program, alone or in conjunction with character variant rules. The compiled version of SUPPORT.FTS is supplied as SUPPORT.FTH. You can test a compiled thesaurus file with the fthtest utility program.

If the contents of the thesaurus source file are written in a character set other than the FTCS94, SearchServer must process the thesaurus source file through the set of text readers needed to translate the contents into the format SearchServer recognizes. You can specify which list of text readers to use when you invoke the fthmake utility program to compile your new thesaurus source file.

You should verify the correctness of a new thesaurus file using the fthtest utility program. If you don't, your application could get unexpected results if there is a problem with your thesaurus.

If your thesaurus object file will be used to search tables on a remote node (through a server other than the local server), it must be accessible to the server on the remote system. You can set the FULSEARCH environment variable for the remote server through the ftserver program or set FULSEARCH in FULCRUM.INI on the remote server.

Fulcrum SearchServer Getting Started contains instructions for doing this on server platforms. If the thesaurus object file cannot be read, thesaurus expansion is disabled without warning. The search is still attempted, but no new search terms are generated.

What's in the Customized Thesaurus?

A thesaurus source file can contain two types of rules: synonym rules and suffix rules.

A synonym rule means wherever the search term x is used, look instead/also for the term y. For example:

huge big gigantic enormous;

A suffix rule means whenever a search term ending in the suffix x is used, look instead/also for all words with the same stem, but ending in y. Suffix rules start with a plus sign (+) as the first character. Synonym rules don't have a leading character. For example, the first two rules in the following list are suffix rules and the remainder are synonym rules.

+ y : y ies ie's;
+ % s 's;
d.e.c. dec dec's : d.e.c dec dec's digital-equipment-corp digital-equipment-corporation digital-equipment-corporation's;
dec december;
one 1;
first 1st;
monkey monkeys monkey's;
whereas wherefore :;

Syntax and Semantics of the Thesaurus File

A rule can have two logical parts: a left-hand-side (LHS), and a right-hand-side (RHS), separated by a colon (:). The entire rule is terminated by a semicolon (;). Words, phrases, and suffixes are separated by spaces, and rules can span more than one line. The words in a phrase on the RHS must be joined by hyphens or any other stopped punctuation.

If the colon separator and the RHS are missing, the RHS is understood to be the same as the LHS. If the colon is present but the RHS is missing, then no alternatives are generated, and the original term is unchanged. This can be used to suppress suffix expansion for selected words.
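
The rule syntax can be illustrated with a short parser. This is a sketch of the syntax as described here, not the fthmake compiler: the function name parse_thesaurus and the returned data shape are hypothetical, and text-reader translation and error reporting are left out.

```python
def parse_thesaurus(source):
    """Return (synonym_rules, suffix_rules), each a list of (lhs, rhs) pairs."""
    if source.strip() and not source.strip().endswith(";"):
        raise ValueError("the last rule is missing its terminating semicolon")
    synonyms, suffixes = [], []
    # Rules end with a semicolon and can span several lines.
    for raw in source.split(";"):
        rule = raw.strip()
        if not rule:
            continue
        is_suffix = rule.startswith("+")
        if is_suffix:
            rule = rule[1:]
        lhs_text, colon, rhs_text = rule.partition(":")
        lhs = lhs_text.split()
        # No colon: the RHS equals the LHS. Colon with nothing after it: empty RHS.
        rhs = rhs_text.split() if colon else list(lhs)
        (suffixes if is_suffix else synonyms).append((lhs, rhs))
    return synonyms, suffixes


sample = "+ y : y ies ie's;\n+ % s 's;\nwhereas wherefore :;\none 1;"
print(parse_thesaurus(sample))
```

The two defaulting rules are the important part: "one 1;" yields identical left and right sides, while "whereas wherefore :;" yields an empty right side.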

The LHS contains words or suffixes to be matched when a search term is looked up in the thesaurus. The RHS contains a list of alternative words and phrases (synonyms) or suffixes (plurals and possessives).

When a match is made with one of the entries on the LHS, the original term is either equated with the alternatives taken directly from the RHS (for synonym rules), or another is formed by combining the search term stem with each of the alternative suffixes on the RHS (for suffix rules).

The RHS of synonym rules should include plurals, possessives, and any other alternatives that should be derived from the term(s) in the LHS. When the same word appears in the LHS of more than one rule, a synonym lookup for that word generates a combined list of alternatives from the RHS of all the matching rules.

Suffix rules are distinguished by a plus sign (+) as the first non-white space character. The LHS and (optional) RHS are lists of suffixes separated by white space. The percent symbol (%) can be used to represent a null suffix.

Suffix lookup proceeds in a way such that the longest possible suffix in the LHS of all suffix rules is matched. Thus, the percent symbol (%) represents the suffix of last resort, and should be used in the LHS of only one rule. It can be used in the RHS of several rules.

At search time, synonym rules have precedence over suffix rules. A match between a search term and a word in the LHS of a synonym rule prevents any suffix processing for that term, whether or not any alternatives were generated.

SearchServer applies certain restrictions to the way in which search terms are looked up in the current thesaurus at search time. These restrictions are:

• never look up stop words

• only look up individual search terms, including words or phrases with embedded punctuation (for example, F.2D), but exclude words containing wildcards, and any word generated by a wildcard expansion, as well as phrases containing embedded spaces

• only look up alphabetic words that have two or more letters

• never chain lookups: if "huge:big;" and "big:enormous;" are in the thesaurus, a search on "huge" does not cause documents containing "enormous" to be found.
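
The lookup order described in this section (synonym rules first, then the longest matching suffix, with no chaining) can be sketched as follows. The function name lookup and the rule representation are hypothetical, duplicates are dropped as an assumption, and the stop-word, wildcard, and word-length restrictions are not modeled.

```python
def lookup(term, synonyms, suffixes):
    """Return the alternatives for a term; an empty list means none."""
    # Synonym rules take precedence; matching rules contribute a combined list.
    matched = [rhs for lhs, rhs in synonyms if term in lhs]
    if matched:
        combined = []
        for rhs in matched:
            for word in rhs:
                if word not in combined:
                    combined.append(word)
        return combined  # single pass: alternatives are never looked up again
    # Otherwise pick the longest LHS suffix that matches; '%' is the null suffix.
    best = None
    for lhs, rhs in suffixes:
        for entry in lhs:
            suffix = "" if entry == "%" else entry
            if term.endswith(suffix) and len(term) > len(suffix):
                if best is None or len(suffix) > best[0]:
                    best = (len(suffix), term[: len(term) - len(suffix)], rhs)
    if best is None:
        return []
    stem, rhs = best[1], best[2]
    return [stem + ("" if entry == "%" else entry) for entry in rhs]


SYNONYMS = [(["huge"], ["big"]), (["big"], ["enormous"]),
            (["whereas", "wherefore"], [])]
SUFFIXES = [(["y"], ["y", "ies", "ie's"]), (["%", "s", "'s"], ["%", "s", "'s"])]
print(lookup("pony", SYNONYMS, SUFFIXES))
print(lookup("huge", SYNONYMS, SUFFIXES))
```

In this sketch a search on huge yields only big, and whereas yields no alternatives at all, because its synonym rule matches and blocks suffix processing.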

How Character Variant Generation Interacts with Thesaurus Lookup

The application can request both thesaurus lookup and character variant generation in the same query. Combining the content of the two associated files allows the application to generate meaningful queries while still providing a thorough cross-matching of terms.

When character variant generation is enabled, thesaurus possessive generation is disabled. Since the character variant rules file can contain suffix rules, this interaction prevents the generation of unwanted terms with double suffixes.

When both thesaurus lookup and variant generation are enabled, thesaurus lookup is performed first. This implies that each term produced by the thesaurus can produce its own set of variants afterwards. A large set of equivalent terms can result.

Enabling character variant generation doesn't disable the suffix component of thesaurus lookup. If you're using both the thesaurus and character variant feature, have the suffix rules in only one of the files. You can verify the interaction of these two facilities using fthtest.

CAUTION: The total number of terms generated from a single query term can become very large when several substitution rules apply. Because SearchServer must look up each generated term, a large number of query terms (more than a few hundred) can cause an unacceptably slow response, even if only a few hits actually occur in the table.

If you want to allow for the possibility of typographical variants in the terms subject to thesaurus lookup, you could explicitly include all possible variant forms in the LHS of each thesaurus rule. Alternatively, you could save time and do this automatically by compiling the thesaurus file with a variant rules file.

Examples of Thesaurus Rules

This section describes the rules in the sample thesaurus file, SUPPORT.FTH. A comment about what the rule does is shown before each rule. The first examples are suffix rules, which usually come first in a thesaurus source file.

• pony produces the alternative list pony, ponies, ponie's:

+ y : y ies ie's;

• any of pit, pits, or pit's produce all three forms:

+ % s 's;

Note that the above rules don't include the suffixes s' or ies'. Since the SearchServer character classes associated with the word indexing rules cause a trailing apostrophe (') to be ignored for indexing purposes, a search for ponies retrieves ponies' and vice versa (except in a phrase). Therefore, the normal plural possessive suffixes don't need to be included.

The following examples illustrate the various forms of synonym rules:

• d.e.c. or dec produce the alternatives d.e.c, dec, dec's, or various longer forms, as shown below. dec also produces december:

d.e.c. dec dec's : d.e.c dec dec's digital-equipment-corp digital-equipment-corporation digital-equipment-corporation's;
dec december;

• one or 1 produce both forms; similarly for first or 1st:

one 1;
first 1st;

• monkey produces monkey, monkeys, or monkey's. This rule overrides the +y... suffix rule above, which would produce monkey, monkeies, or monkeie's:

monkey monkeys monkey's;

• whereas and wherefore have no alternative forms:

whereas wherefore :;

This type of rule is not strictly necessary since alternatives produced by the suffix rules would not likely occur in any document. However, such rules can improve search performance by preventing the generation of alternatives that would otherwise have to be looked up in the index files. Any words in the thesaurus file that are also in the stop file associated with a table are not looked up.

Compiling and Testing a Thesaurus

You can compile and test your own custom thesaurus file using the fthmake and fthtest utility programs. Both of these utilities search for the thesaurus file by using the FULSEARCH environment variable or FULSEARCH as defined in FULCRUM.INI.

The fthmake utility compiles the thesaurus source file and allows you to name the output object file. The fthtest utility is an interactive test facility. It lets you check the object file that fthmake compiled and verify that the equivalent terms returned after the search compare to the original search term. These utilities are described in the following sections.

Placing the Thesaurus Object File

To avoid overwriting an existing thesaurus object file, compile your new thesaurus into a temporary object file. After you've tested it, copy or rename it to replace the existing object file.

The object file should be placed in a directory that SearchServer can access. SearchServer searches for the thesaurus object file according to the ordered list of directories specified for locating table management files (see the information about FULSEARCH as an environment variable and in the definition of an ODBC data source in Fulcrum SearchServer Getting Started). The typical location for a custom thesaurus is in one of the directories named in the FULSEARCH environment variable.

If you don't place the thesaurus file in a location according to the ordered list of locations SearchServer uses to find files, the SET THESAURUS_NAME statement must always specify the full pathname to the thesaurus object file.

If your thesaurus file will be used to search tables on a remote node (through a server other than the local server), the thesaurus object file must be accessible to the server on the remote system. You can set the FULSEARCH environment variable for the remote server through the ftserver utility program or use FULSEARCH as set in FULCRUM.INI (for 16-bit Windows environments) on the remote server. Fulcrum SearchServer Getting Started contains instructions for doing this on server platforms.

Compiling

The fthmake utility program compiles the thesaurus by reading the thesaurus source file and creating a thesaurus object file. If you're using a character variant rules file in addition to the thesaurus file, ensure that the thesaurus lookup includes any typographical rules duplicated in the rules file. This utility ensures that any duplicated rules are incorporated into the thesaurus object file and subsequently ignored in the variant rules file.

fthmake can be invoked as follows:

fthmake <sourcefile> <objectfile> [-f<text-reader_list>] [-l<rulesfile>]

where sourcefile is the name of the thesaurus source file to be compiled (it must contain the three-character extension .FTS), objectfile is the name of the object file to be created (it must have a filename containing eight or fewer characters plus the three-character extension .FTH), and text-reader_list specifies a text reader list to be applied to the thesaurus source file.

Note: The fthmake utility doesn't support any redirection of input or output. Use an explicit value for the sourcefile or objectfile on the command line instead.

The rulesfile is the name of the file that contains the variant rules you want to incorporate into the thesaurus object file. When this optional parameter is specified, each word found on the LHS of a thesaurus source rule is subject to character variant expansion according to the variant rules in the rulesfile. This enables typographic variants of search terms to be eligible for thesaurus expansion.

For example, to rebuild the sample thesaurus file SUPPORT.FTH, switch to the directory in which the corresponding thesaurus source file SUPPORT.FTS is stored and enter:

fthmake support.fts support.fth

You don't have to specify the -f parameter because the default text reader list can be used to read SUPPORT.FTS.

The default text reader list (nti:s) can be overridden by the -f parameter. If the translation text reader is selected, it translates the source text from the external character set to the FTICS equivalent. If the source text is already represented in the FTICS, a specification for the standard text reader (s) should be used rather than the default text reader specification.
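
The naming constraints and optional parameters of fthmake can be checked before the utility is invoked. The sketch below only assembles an argument list from the documented command format; the function name fthmake_args is hypothetical, and treating the extensions as case-insensitive is an assumption based on the lowercase example above.

```python
import os


def fthmake_args(sourcefile, objectfile, reader_list=None, rulesfile=None):
    """Build the argument list for compiling a thesaurus source file."""
    source_ext = os.path.splitext(sourcefile)[1]
    object_name, object_ext = os.path.splitext(os.path.basename(objectfile))
    if source_ext.upper() != ".FTS":
        raise ValueError("the source file needs the .FTS extension")
    if object_ext.upper() != ".FTH" or not 1 <= len(object_name) <= 8:
        raise ValueError("the object file needs an 8-character name plus .FTH")
    args = ["fthmake", sourcefile, objectfile]
    if reader_list:
        args.append("-f" + reader_list)  # overrides the default list nti:s
    if rulesfile:
        args.append("-l" + rulesfile)    # variant rules applied to LHS words
    return args


print(fthmake_args("support.fts", "support.fth"))
print(fthmake_args("support.fts", "support.fth", reader_list="s",
                   rulesfile="german.ftl"))
```

Because fthmake does not support redirection, both file names are always passed explicitly in the list.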

If there are any compilation errors, fthmake writes a message to standard error before it exits. For example, occasionally a source file can't be read, or an object file can't be created due to access protection. A syntax error in the source file (such as a missing semicolon character) would write a message (for example, words at end of file or need right terminator). If a suffix rule is entered in the source file more than once, the message suffix root is in twice is produced.

Any problem encountered in writing to any part of the object file (including running out of space in the file system) produces the message can't write objectfile. If an object file has been created but writing wasn't completed due to an error, it is removed.

Testing the Thesaurus Object File

The fthtest utility program searches for the compiled thesaurus object file and tests it by performing thesaurus expansion. To test the thesaurus expansion with one or more terms, use the command:

fthtest <objectfile> [<term>...]

To use the full capability of fthtest, specify the command line:

fthtest <term> -h<objectfile> [-c<table_name>] [-t<outfilename>] [-l<rulesfile>]

The objectfile is the name of your compiled thesaurus object file. If you don't specify any terms on the command line (as in the first case above), fthtest prompts you for a term, looks it up in the thesaurus, and reports the results.

You can also enter the term you want to test on the command line when you invoke fthtest (as in the second case above). When there is a term on the command line, fthtest looks it up and reports the results before exiting. Otherwise, fthtest prompts you for a term to expand.

The optional -l parameter causes fthtest to apply character variant expansion after thesaurus expansion. This tests the interaction of the two forms of expansion using the named thesaurus and variant rules files. The rulesfile is the name of the character variant rules file with which you want to test the thesaurus. If you specify a variant rules file, fthtest performs character variant expansion on the terms generated by thesaurus expansion.

The optional -c parameter instructs fthtest to test the generated terms against the specified table. In this case, only the equivalent terms found in the table are reported. You may optionally specify the name of an output file to record the results of the test.

The fthtest utility exits when it reaches the end of the input file or when you enter "quit" followed by pressing enter (under MS-DOS), pressing ctrl+z (under 32-bit Windows), or ctrl+d (under UNIX).

Results are indicated by one of the following messages:

• synonym: followed by a list of derived synonyms

• synonym empty if a matching synonym rule is found with an empty RHS

• suffix: followed by a list of words formed by combining the alternative suffixes with the word stem

• suffix empty if a matching suffix rule has no RHS

• converts to nothing, in response to an input term, indicates a failure to read the thesaurus object file after it was opened

• failed to open file for input if an invalid thesaurus is specified

The following example records an interactive test session with fthtest for the sample thesaurus file SUPPORT.FTH.

fthtest support.fth
237: enter term: pony
240: suffix: ponie's ponies pony
237: enter term: disc
238: synonym: disk disc disks floppy floppies diskette diskettes
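
If the results of a test session are captured in a file, the result lines can be picked out with a small parser. This sketch assumes that each result line has the shape seen in the session above (an optional numeric prefix, then synonym or suffix, then either a colon and a word list or the word empty); the name parse_result is hypothetical and the format is inferred from this one example.

```python
import re

# Optional "NNN:" prefix, then the kind, then ": words" or " empty".
RESULT = re.compile(r"^\s*(?:\d+:\s*)?(synonym|suffix)(?::\s*(.*)| empty)\s*$")


def parse_result(line):
    """Return (kind, words) for a result line, or None for any other line."""
    match = RESULT.match(line)
    if not match:
        return None  # prompts and other messages are ignored
    kind, words = match.group(1), match.group(2)
    return kind, (words.split() if words is not None else [])


print(parse_result("240: suffix: ponie's ponies pony"))
print(parse_result("237: enter term: pony"))
```

Prompt lines such as "237: enter term: pony" return None, so a caller can skip them and keep only the expansions.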

The following example records a session with fthtest to illustrate the interaction between the sample thesaurus file SUPPORT.FTH and the sample character variant rules file FULTEXT.FTL:

fthtest disc -hsupport.fth -lfultext.ftl

The search terms that would be generated in a search on the SUPPORT table using these support files are:

disk        floppy
disk's      floppy's
disks       floppys
disc        floppies
disc's      floppies's
discs       floppiess
disks       diskette
disks's     diskette's
diskss      diskettes
discs       diskettes
discs's     diskettes's
discss      diskettess

Thesaurus expansion is applied to the term disc first, which produces the alternative forms disk, disks, disc, discs, floppy, floppies, diskette, and diskettes. Secondly, the alternative forms are expanded by the character variant rules.
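
The two-stage expansion can be reproduced with a few lines of Python to show how quickly the term list grows. This is an illustrative sketch only: the function name expand_with_suffixes is hypothetical, and the two suffixes stand in for the FULTEXT.FTL rules in lowercase.

```python
def expand_with_suffixes(thesaurus_terms, suffixes=("'s", "s")):
    """Apply suffix variants to every term produced by thesaurus expansion."""
    expanded = []
    for term in thesaurus_terms:
        expanded.append(term)                              # the term itself
        expanded.extend(term + suffix for suffix in suffixes)
    return expanded


stage_one = ["disk", "disc", "disks", "discs",
             "floppy", "floppies", "diskette", "diskettes"]
stage_two = expand_with_suffixes(stage_one)
print(len(stage_two), "terms,", len(set(stage_two)), "distinct")
print(stage_two[:3])
```

Eight thesaurus terms become 24 search terms, of which 21 are distinct, because forms such as disks arise both from the thesaurus and from the suffix rules. Doubly suffixed forms such as diskss show why suffix rules are best kept in only one of the two files.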

For comparison, the following example applies the same list of terms to the SUPPORT table:

fthtest disc -hsupport.fth -lfultext.ftl -csupport

The search terms that would be generated in a search on the SUPPORT table using these support files are:

disk floppies
disc diskette
disks diskettes
floppy

Chapter 8: Verifying the Table

This chapter explains how to verify the table by performing ad hoc searching on the data.

Searching

You should always verify the integrity of your table once you have inserted the rows. One way to verify the integrity of a table quickly is to perform a few ad hoc searches on it that encompass all the columns and retrieve all the rows.

The following example will return the FT_CID for each row in the table. You can check these values against the INSERT statements that were used to populate the table.

SELECT FT_CID FROM support;

The following example will return the external text for the specified rows. Again, you can check the result against the INSERT statements that were used to populate the table.

SELECT FT_TEXT FROM support WHERE FT_CID IN (3, 4, 5);

Verifying with System Information Tables

Another method of verifying a new table is to use the TABLES, COLUMNS, ZONES, SEARCH_TERMS, and SERVER_INFO system tables.

The TABLES system table can be used to verify that your table is created in the expected location with the expected table parameters. The following example looks for a table called SUPPORT and identifies its location and all its table parameters. You should verify that all these values are what you expect.

SELECT * FROM TABLES WHERE TABLE_NAME CONTAINS 'SUPPORT';

If another table of the same name exists within your data source, you should consider whether you want to have two tables with the same name. To avoid potential problems, it is probably best to rename one.

The COLUMNS system table can be used to verify your application-defined columns. The following example returns all of the application-defined columns related to the SUPPORT table. You don't need to verify the default columns, so that information is removed by restricting the search to the application-defined columns.

SELECT * FROM COLUMNS WHERE TABLE_NAME CONTAINS 'SUPPORT';

If you have any segmented columns, you can verify the relationship of zones to columns. The ZONES system table identifies and describes the zones created with your schema. The following search returns all zones that you can then verify.

SELECT * FROM ZONES WHERE TABLE_NAME CONTAINS 'SUPPORT';

The SERVER_INFO system table is the last system table that gives you some information about your newly created table. From it, you see the default data source configuration for access to this table by applications. It is not useful to change any of these at this point for your particular table, but you should note these defaults so you can provide users with the changes necessary when they are connected to a data source and want to use your new table.

Some examples are:

• a specific character variant rules file

• a specific collation sequence (this is a table parameter)

• a setting for CHECK_TEXT_STATUS

As well as these, you might want to record in your files the version of SearchServer under which the table was created.

For more information about system tables, see the section, "The System Information Tables," in Chapter 2, "The Administration Tools."

Checking the Validity of Table Management Files

SearchServer provides the ftcout utility program for verifying the internal consistency of the table management files, and it ensures that all table data (except the text of any associated external documents) can be read.

The ftcout utility program outputs the data in a proprietary format, and standard exit values are returned. A value of zero indicates success; any other value indicates failure. As each error is detected, ftcout writes an error message to the standard error stream. For example:

ftcout support -l
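If you run the check from a script, the exit value and the error stream are the two things to capture. The following Python sketch is an illustration under stated assumptions, not part of SearchServer: the function name run_consistency_check is hypothetical, and the ftcout command line is copied unchanged from the example above. Standard output is discarded because the guide describes it only as a proprietary format.

```python
import subprocess


def run_consistency_check(command):
    """Run a checker command and return (succeeded, error_messages).

    succeeded is True only when the exit value is zero. error_messages
    holds the non-empty lines written to the standard error stream.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,  # proprietary-format output is not parsed
        stderr=subprocess.PIPE,
    )
    text = completed.stderr.decode(errors="replace")
    messages = [line for line in text.splitlines() if line.strip()]
    return completed.returncode == 0, messages


if __name__ == "__main__":
    try:
        # Command line taken from the example above.
        ok, errors = run_consistency_check(["ftcout", "support", "-l"])
    except FileNotFoundError:
        print("ftcout was not found on the PATH")
    else:
        print("table management files are consistent" if ok else "check failed")
        for message in errors:
            print("  " + message)
```

Treating any non-zero exit value as a failure follows the rule stated above; the sketch does not try to interpret individual exit values.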

Verifying External Documents

After you have created and populated your table, you should perform a few selects on the external text column to ensure that you retrieve the information that you're expecting. Try these scenarios:

• Check the execution time for a phrase search. Is it reasonable? If not, consider literal indexing or augmenting the stop word list.

• Execute a SELECT statement that specifies a particular zone in the external text column.

• Check the literals that you have indexed by executing a SET SHOW_MATCHES statement that specifies a value of 'TRUE'. Search several documents for the literal terms you are expecting.

• Again execute a SET SHOW_MATCHES statement that specifies a value of 'TRUE'. Search several documents for a specific search term. This allows you to verify that the highlighting is in the correct position.

Using the Thesaurus File

If your new table is to be used with a thesaurus file, you should verify that it works as expected. If you have a unique thesaurus, then refer to Chapter 7, "Providing Support Files for Searching" for a description of the thesaurus file before you proceed with verifying the table itself.

Once you have a thesaurus file that you know is correct, you can verify your table's operation with that file. A few ad hoc searches that make use of the thesaurus provide you with some confidence about your table. By knowing what relations are in the thesaurus file and what words occur in some sample text in your table, you'll be able to design SELECT statements that exercise the thesaurus with your table.

Using the Character Variant File

It may be necessary to provide alternative spellings for words based on character variations. If your external documents contain several spellings of a term, then you should use a character variant file. Refer to Chapter 7, "Providing Support Files for Searching" for a description of creating and verifying this file.

You should ensure that your character variant file is correct before using it to verify the correctness of your new table. As with the thesaurus file, a few ad hoc searches that make use of the character variant file will suffice. Design the queries with the character variant rules and term variations in the text taken into consideration.

Verifying Changes to the Stop File

If you have modified or replaced the stop file, you must completely re-index the table using the VALIDATE INDEX statement that specifies the ABANDON parameter.

To test whether the stop file is working as expected, you can index (or re-index) a single table after you've modified the stop file.

Choose a new stop word that was added to your stop file and that you know exists in your table, and use this stop word in a SELECT statement as a search term. SearchServer should return a message indicating that no matches were found.

If the stop word can be selected, verify that the format of the modified stop file is correct. An incorrect stop file is ignored.

GLOSSARY

Active Connection A connection is active when a SearchSQL statement has been or is being executed on the connection, even if an executed statement has been terminated.

AFTCS94 The Fulcrum Technologies Internal Character Set (FTICS) used for Arabic.

Alias A temporary column name used in the AS clause of the SELECT statement. The alias can be used in the optional ORDER BY clause to refer to the column in the working table.

ANSI American National Standards Institute.

API See Application Program Interface.

API Function One of the SearchServer API functions that can be used in your application to perform environment management, execute SearchSQL statements, process results, and retrieve data. See also SearchServer API.

API Library File Contains the software functions that execute SearchServer requests on behalf of the application. The Fulcrum SearchBuilder products provide a programmable interface (API) to these functions for developing a SearchServer application.

Application The client application program that is using SearchServer services. Also the total solution for which SearchServer is being used.

Application Program Interface (API) A set of functions that can be called from an application program to carry out specific operations on behalf of the application. See also SearchServer API.

Application-defined A definition that is specified by the application.

ASCII American Standard Code for Information Interchange.

Base Table The table resulting from executing a CREATE SCHEMA or CREATE TABLE statement. As with all tables, it comprises rows, columns, and zones. However, a base table is different from a working table or a system table in that the internal column data is persistent across connections. A base table is usually referred to simply as a table.

Case Normalization The conversion of all alphabetic character codes in a character set to the set of codes representing the corresponding uppercase characters. SearchServer performs case normalization by default.

Character Class A set of characters defined to be equivalent for the purposes of term recognition. The characters in a class are treated according to the indexing attributes assigned to that class.

Character Class Rules A set of rules that determine how characters are treated for indexing and in search specifications. See also Stop File.

Character Set A set of symbols and their associated code values. SearchServer supports the Fulcrum Technologies Internal Character Sets (FTICS).

Character String A sequence of characters treated as a unit.

Character Variant Generation The operation of generating several equivalent forms from a single search term by replacing a target string in the original search term with each of several different character strings, according to the rules in a SearchServer character variant rules file.

Client (or Client Application) An application process on the user end of a client/server connection.

Client/Server A networking architecture that allows the software running on users' machines (the clients) to access resources and/or capabilities through a server process executing on a remote host. Typically, client machines are PCs and the server is a high-end workstation or a mainframe. SearchServer client/server operation is distinct from transparent file access (TFA).

Column A set of values related to one attribute of a table. All values within a given column must have the same column data type. Each column has a description and an ordinal position within the table. See also Simple Column and Segmented Column.

Column Attribute A persistent characteristic of a column such as name, data type, size, and index mode. Column attributes are recorded in the COLUMNS system information table.

Column Data Type The type of data assigned to a column, as defined through SearchSQL. Column data types are defined in the schema definition. The three main column data types are character string, exact numeric, and date.

Column Definition In a schema, a column definition associates a name with the other column attributes.

Column Value The data stored in a table for a particular row and column. It is the smallest unit that can be selected from a table and the smallest unit that can be updated.

Component Table A member of a SearchServer view.

Connection The association between a client application and a data source. The duration of a connection is the interval between the SQLConnect function and the corresponding SQLDisconnect function. A connection defines a set of tables that can be accessed by the client through the data source.

Connection Handle A unique identifier assigned by the SearchServer API to identify the connection to a server. A connection handle is assigned by the SQLAllocConnect function.

Constituent Zone A zone in a segmented column. See also Segmented Column.

Container An object that contains other (member) objects, which can be external text objects or other containers. Examples include directories, document libraries, and databases.

Container Row A row in which the combination of FT_SFNAME and FT_FLIST describes a container. SearchServer sets FT_ROW_TYPE for container rows to 'DIRECTORY'. Container rows are expanded and maintained by table validation. See also Directory Row, Document Library Row, and Row Expansion.

Context Information The information SearchServer needs to allow an application user to move randomly through a text stream. See also FillBuf Function.

Context Retrieval A SearchServer retrieval feature that is enabled after a SearchSQL SET SHOW_MATCHES 'TRUE' statement is executed. When context retrieval is enabled, the text in the working table contains embedded match codes surrounding terms that match the search.

Control See GUI Control.

Control Sequence A sequence of characters that defines text indexing or display attributes. The control sequences SearchServer supports are defined in the SearchServer Data Preparation and Administration manual.

Current Directory The file system directory in which you are currently working.

Cursor A pointer to the current row of a SearchServer working table, or an online indicator that shows where a user selection will take effect.

Custom Controls Any of the GUI controls specific to some SearchBuilder kits.

Data Definition Language (DDL) DDL statements define the structure of data and related indexes. The DDL statements SearchSQL supports are CREATE SCHEMA, DROP TABLE, and VALIDATE INDEX.

Data Manipulation Language (DML) DML statements manipulate data. The DML statements SearchSQL supports are CREATE TEXT_VECTOR, DELETE FROM, INSERT INTO, SELECT, and UPDATE.

Data Source A named set of tables that is bound to a driver (such as SearchServer). See also SearchServer.

Data Type See Column Data Type.

Database A set of one or more tables about related subjects, and a set of operations for activities such as searching and sorting. See also Relational Database.

DDL See Data Definition Language.

Differential File A record of the changes made to a table between the execution of VALIDATE INDEX statements. A differential file is kept only for immediate tables. The data recorded in a differential file is merged into the periodic index of the table when a VALIDATE INDEX statement on that table is executed. See also Periodic Index and Immediate Table.

Directory A way of organizing files so that related files are stored together. A directory can contain other directories. The topmost directory is called the root directory, while directories within directories are called sub-directories.

Directory Row A container row in which the container referred to is a directory. If FT_ROW_TYPE is 'DIRECTORY', the row is a container row, but not necessarily a directory row. See also Container Row.

Directory Row Expansion A SearchServer indexing feature that automatically populates a table with individual rows for each external document in a named directory. See also Row Type.

Display Attribute Defines if and how characters are modified for display purposes. These attributes are defined through SearchServer control sequences. See also Select Graphic Rendition (SGR) Sequence.

DLL See Dynamic-Link Library.

DML See Data Manipulation Language.

Document A searchable text object that typically corresponds to one row in a SearchServer table. See also External Document.

Document Format Translation Text Reader A text reader that translates a document in a storage stream, usually in original form, to FTDF. Examples include the ftmf and nti text readers.

Document Library A storage stream that is a container for external documents. Note that directories and databases are not simple storage streams, and are therefore not document libraries (although they are containers). See also Fulcrum Library.

Document Library File A host system file that contains a document library.

Document Library Row A row in which FT_FLIST holds a library expansion text reader list. See also Container Row.

Document Text Reader A document format translation text reader.

Domain A user-defined data type.

Domain Definition In a schema, a domain definition associates zones with a column, or overrides a column's default index mode. A list of zones, or a data type and index mode can be assigned to a domain in the domain definition. See also Zone.

Dynamic-link Library (DLL) A file containing code that an application can call at run time.

EFTCS94 The Fulcrum Technologies Internal Character Set (FTICS) used for Europa-3.

Engine A specific functional part of the software technology underlying an API. See also SearchServer and Search Engine.

Expansion FillBuf Function In the expansion mode of a text reader, the FillBuf function that lists the logical documents that have been stored in a single file or database.

Expansion Text Reader List A special form of text reader list used by SearchServer for row expansion. It consists of a normal text reader list (beginning with a container expansion text reader). This is followed by a "!", and a model retrieval text reader list to use as a template for the text reader list of the resulting member rows. (Directory expansion is a special case, and does not need an expansion text reader list.) See also Library Expansion Text Reader List.

External Document Typically a self-contained text file having its own native format, although more complex storage schemes are possible. It can be created by a word processor or text editor, or it can be the by-product of some other function such as electronic mail. External documents must be read through a text reader.

External Text Text that is not stored in SearchServer's table files. This text can be searched through the external text column (FT_TEXT or its equivalent), but is stored in one or more external documents.

External Text Column A column in which SearchServer presents external text. SearchServer uses text readers to read external documents in their native format and presents the text to the application as a text column named FT_TEXT.

Field The text reader unit, identified by number, that corresponds to a SearchServer column.

FillBuf Function A text reader function that performs data transformation and creates an output stream. See also Expansion FillBuf Function.

Format Translation Text Reader A document format translation text reader.

Formatting Attribute Defines if and how retrieved text can be formatted by an application at display time. These attributes are defined through SearchServer control sequences.

FTCS94 The standard Fulcrum Technologies Internal Character Set (FTICS).

FTDF See Fulcrum Technologies Document Format.

FTDF Text Reader A text reader for which both input and output are FTDF. Used mainly for creating segmented columns and for loading internal columns. An example is the C source (r) text reader.

FTICS See Fulcrum Technologies Internal Character Set.

Fulcrum Library A proprietary Fulcrum document library form supplied with SearchServer, understood by the standard (s) and Fulcrum library (l) text readers.

Fulcrum Technologies Document Format (FTDF) An ANSI-standard-based specification for text that is represented in a Fulcrum Technologies Internal Character Set (FTICS). It uses SearchServer control sequences of the format described in Fulcrum SearchServer Data Preparation and Administration.

Fulcrum Technologies Internal Character Sets (FTICS) SearchServer translates all table data into FTICS before it is processed. These character sets are defined in Fulcrum SearchServer Data Preparation and Administration.

Full Pathname A complete filename specification. A full pathname contains the complete list of directories that must be traversed to locate a directory or file. See also Relative Pathname.

Function Either a SearchServer API function, or a SearchSQL function.

GUI Graphical user interface.

GUI Control An active display element used to receive input and display output. Every control has its own set of properties and events. Often referred to simply as a control.

Identifier In SearchSQL an identifier names a table, column, or domain. The maximum length for an identifier in SearchSQL is 18 characters. It must start with an uppercase letter, and can contain uppercase letters (A-Z), digits (0-9), and special symbols (#, $, _). In other contexts, the term identifier is used more loosely to mean any text string used as a label.

Immediate Table A table created with the IMMEDIATE table parameter. It has an index that is updated immediately by any SearchServer operation that adds, modifies, or deletes column data.

Index A list of the unique index terms that make up the text in a table along with the locations where they are stored, or the operation of preparing such a list. See also Periodic Index.

Index Files The files that contain the index information for a table.

Index Mode Determines the type of indexing performed on a column's data. There are four SearchSQL indexing modes: NORMAL, VALUE, LITERAL, and NONE.

Index Term The smallest unit of text that is indexed and searchable. Typically a single word.

Index Term Separator A character or sequence of characters that causes the SearchServer indexing engine to recognize a word break.

Indexing Attribute Defines how the SearchServer indexing engine is to interpret indexed words, and allows the search engine to recognize whether a search term matches a search.

Indexing Status An attribute of a row that indicates whether or not a row needs to be indexed. The indexing status of a row is reflected in the value of the FT_ROW_STATE column.

Info Function A text reader function that provides information to SearchServer for text reader processing.

Internal Column Any column of a table other than the external text column (named FT_TEXT by default).

Intuitive Searching A SearchServer search technique that provides the application with the means to select documents based on the similarity of their content to sample document text.

ISO International Organization for Standardization.

Keyword See Reserved Word.

Label An identifier, usually a word, symbol, or group of characters, used to identify a file or an element of an application. In an application, labels can't be altered by the user.

Library Document A member of a document library.

Library Document Row A row that represents one member of a document library. A library document row can be generated automatically through SearchServer row expansion, or it can be inserted manually when created for applications that access static table data.

Library Expansion Text Reader List An expansion text reader list for document libraries.

Library File See API Library File or Document Library File.

Library Row A document library row. Contrast with Library Document Row.

Literal A non-null value expressed directly in a SearchSQL statement. There are three types of literals: character string literal, exact numeric literal, and date literal. LITERAL is also one of four indexing modes.

Logical Document A document that is physically stored in a single file or database. See also Library Document.

Match Codes A pair of control sequences that delimit the location of a search term in the searchable text. Match codes are inserted automatically in the working table when context retrieval mode is enabled.

Member An object in a container. For example, a library document.

Network A group of computers and associated devices that are connected by communications facilities.

Node A physical or virtual location on a network. Your current location on the network is the local node; all other locations on the network are remote nodes.

NULL A SearchSQL keyword that represents a null value.

Null Value A special value that indicates the absence of data.

ODBC See Open Database Connectivity.

Open Database Connectivity (ODBC) An interoperability standard that enables applications to access data in a database management system (DBMS) using SQL as a standard for accessing data.

Open Function A document text reader function that completes the initialization of a text reader block for document text reader processing. See also Text Reader Block.

Original Form The form in which an external document was originally stored by its authoring application.

Parameter A variable that is passed between the application and SearchServer, or a portion of a SearchSQL construct.

Pathname See Full Pathname or Relative Pathname.

Pattern A sequence of characters treated as a unit for searching purposes. The search result is influenced by the index mode of the column or zone being searched, as well as the presence of punctuation, white space characters, or special characters in the pattern. Within a pattern, a search term can also contain special pattern characters.

Periodic Index A form of SearchServer index optimized for efficient storage and searching. The information about the data in a table is updated through the execution of a VALIDATE INDEX statement.

Periodic Table A table created specifying the PERIODIC table parameter in a CREATE TABLE clause. It has an index that is updated only when a VALIDATE INDEX statement is executed.

Phrase A sequence of one or more search terms and punctuation separated by white space characters.

Pointer The tool that is used to choose, move, or resize a GUI control.

Populate The operation of loading application-specific control information and data into a table according to the schema definition.

Precision The percentage of rows in a working table that are perceived by the user to be relevant to the search. See also Recall.

Predicate A SearchSQL condition that can be evaluated as true or false.

Recall The percentage of all relevant (as perceived by the user) rows in a table that were selected for the working table. See also Precision.

Relational Database A type of database where a row represents a relation among its data values. Storing information in tables, a relational database uses matching values to relate information in one table to information in another.

Relative Pathname A relative pathname identifies the location of a directory or file relative to another directory in the file storage system. See also Full Pathname.

Reserved Column A SearchServer-defined column that is created automatically for every table. The only column attributes of a reserved column you can change are the column name, and in some cases, the index mode.

Reserved Word A word that can't be used as a name or identifier in a SearchSQL construct unless it is used in quotation marks. A complete list of reserved words is included in Fulcrum SearchServer SearchSQL Reference.

Result List See Working Table.

Row A set of related values in a table. A row contains one value (or no value) for each column. A row is the smallest unit of data that can be inserted into or deleted from a table.

Row Expansion SearchServer feature in table validation that automatically populates a table with individual rows for each member in a container.

Row Type A row can be designated as DATA, DIRECTORY, or LIBRARY. One of these values is held in the FT_ROW_TYPE column of the row. A DATA row is associated with internal column data, and optionally with an external document. A DIRECTORY row refers to a directory containing external documents, and a LIBRARY row refers to a document library.

Schema A list of the labels or keys used in a program, along with a description of their logical meanings. This information is bound to one table. It describes the structure of a table: its rows, columns, domains, and other properties.

Script A file containing commands that are intended to be executed by an application such as ExecSQL or the SearchServer API test driver.

Search Engine The component of SearchServer that executes a search and builds the corresponding working table.

Search Result The set of rows and columns resulting from the execution of a SELECT statement. The search result is presented as a working table.

Search Term A sequence of one or more adjacent characters, punctuation, and spaces. The simplest case of a search term is a single word. A search term can also be a phrase, such as TEXT RETRIEVAL. See also Phrase and Index Term.

SearchServer Fulcrum's powerful search and retrieval system. It receives SearchSQL statements and SearchServer API requests from the client application, interprets them, and carries them out. SearchServer sends back status information and data to the client application through the SearchServer API.

SearchServer API The client application's point of contact with SearchServer. The client application uses the API to send SearchSQL statements and other requests to SearchServer, and to receive status information and data from the server. See also Application Program Interface and API Function.

SearchSQL The ANSI SQL-based search language that is part of the SearchServer application interface. SearchSQL provides language extensions for supporting full-text searching and retrieval.


Segmented Column A column (typically the external text column) that is partitioned into separately searchable zones. A segmented column is defined in the schema by associating it with a domain that links the zones. A segmented column is the opposite of a simple column.

Select Graphic Rendition (SGR) Sequence A control sequence that indicates where a character display attribute such as bold, italic, rapid blink, or overstrike starts or stops.

Server A software process that provides searching and table management services to applications in a client/server application. See also SearchServer and Client/Server.

Server Attribute Server attributes affect how table data is recognized and handled by SearchServer. Server attributes are recorded in the SERVER_INFO system information table and are specific to the current connection.

Server Node A node that has SearchServer software installed and is capable of running one or more server processes to service SearchServer client application requests in a client/server environment.

Session The duration of an environment as determined by a SearchServer application. The current session begins when an environment handle is assigned by a call to the SQLAllocEnv function, and it ends when the environment handle is released by the SQLFreeEnv function.

SGR Select Graphic Rendition. See also Select Graphic Rendition Sequence.

Simple Column A column in which no application-specific zones are defined. It contains only a default zone. Compare with Segmented Column.

SQL See Structured Query Language.

Statement Handle A unique identifier assigned by the SearchServer API to identify a SearchSQL statement. It retains all of the information related to the statement (such as column bindings, result values, and status information). A statement handle is assigned by the SQLAllocStmt function and is an input parameter to all API functions that affect SearchSQL statements.

Stop File A table management file that contains a list of terms that are to be ignored during indexing and searching. The stop file also contains any application-defined character class rules.

Stop Word A word that appears in a stop file.

Storage Access Text Reader A text reader that delivers a storage stream.

Storage Stream A stream of bytes in which arbitrary repositioning is possible.

Storage Text Reader A text reader that delivers an external document in its original form.

Storage Transform Text Reader A text reader that transforms one form of storage stream to another. Can be used to implement containers. An example is the Fulcrum library (l) text reader.

Structured Query Language (SQL) SQL and its variants are the languages used with relational database systems. See also SearchSQL.

System Information Table A set of reserved SearchServer tables that are part of a global repository for the physical and logical information that can be retrieved for every table in the data source.

Table A data structure that is organized into rows and columns. A table can be updatable or read-only. Unless otherwise specified, table usually refers to a base table. See also Base Table, Working Table, System Information Table, Immediate Table, and Component Table.

Table Management Files Files that contain SearchServer table data. They include the configuration file, index files, data files, index log file, and temporary index sort files.

Table Parameter Values specified in a CREATE TABLE clause that define where SearchServer can locate external documents, table management files, temporary work space, and other attributes of a table. Table parameters are recorded in the TABLES system information table.

Table Support Files SearchServer files that influence the operation of SearchServer. These files include the system configuration file, stop file, thesaurus file, and character variant rules file.

Table Validation The process of reconciling table data with the state of external documents and containers. This can include adding, deleting, or modifying rows in the table. See also Row Expansion.

Term See Search Term and/or Index Term.

Term Separator See Index Term Separator.

Text Attribute Defines whether text is displayed and what kind of indexing mode is assigned to it. These attributes are defined through SearchServer control sequences.

Text Box A GUI control into which a user can enter text.

Text Character Count The count of separately displayable characters in any contiguous region in the text that can be searched.

Text Column A column having one of the character data types of SearchSQL. Full-text searching can be applied to a text column using techniques such as word, pattern, phrase, or Intuitive Searching. See also External Text Column.

Text Reader (TR) A dynamically loadable module that is used (sometimes in combination with other text readers) to generate the external text stream. SearchServer invokes a text reader whenever an external document is indexed or retrieved.

Text Reader API An API provided for writing text readers.


Text Reader Block A data structure used by the text reader API to manage the text stream between the text readers in a text reader chain.

Text Reader Chain The sequence of text readers used to read the text in a document. The final output of a text reader chain is a data stream that conforms to the Fulcrum Technologies Document Format (FTDF).

Text Reader List A character literal that specifies the text reader chain required to process an external document or container. This value is held in the FT_FLIST column of any row associated with an external document.

Text Stream A continuous succession of characters representing text. A text stream is normally processed sequentially.

TFA See Transparent File Access.

Thesaurus Expansion The operation of generating related search terms according to the rules of a SearchServer thesaurus.

Transparent File Access (TFA) An operating system service that provides direct access to files on a remote host. Contrast with Client/Server.

View A logical grouping of several tables into one large table for search purposes. The tables that form a view are also called component tables of a view.

White Space Character The space character (x20) and control characters that separate search terms and index terms, including the horizontal tab, line feed, carriage return, vertical tab, and form feed characters.

Wildcard Used in a pattern, represents one or more characters in a search term. In SearchSQL, the two valid SearchServer wildcards are the underscore (_) which matches any single character, and the percent sign (%) which matches any number of characters.

Wildcard Expansion The process of expanding a pattern to include all indexed terms that match the pattern.

Working Table A temporary table derived from one or more other tables as a result of a search. See also Search Result.

Zone A separately searchable portion of a column. You can specify more than one zone per column, or overlap zones within a column to focus the area of data scanned during a search. However, you cannot specify a zone across multiple columns.

Zone Definition In a schema, a zone definition associates the characteristics of a zone such as its name, zone number, and data type. Zone definitions are recorded in the ZONES system information table.
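The Wildcard and Wildcard Expansion entries above can be illustrated with a short sketch. It is written in Python purely for illustration and does not use SearchServer. It assumes that the percent sign matches zero or more characters (the glossary says only "any number") and that matching is case-sensitive. The function names and the sample term list are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern that uses _ and % into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")        # underscore: exactly one character
        elif ch == "%":
            parts.append(".*")       # percent: assumed to mean zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def expand(pattern, indexed_terms):
    """Return every indexed term that matches the whole pattern."""
    rx = wildcard_to_regex(pattern)
    return [term for term in indexed_terms if rx.fullmatch(term)]


terms = ["retrieval", "retrieve", "retriever", "text"]
print(expand("retriev%", terms))   # ['retrieval', 'retrieve', 'retriever']
print(expand("retrieve_", terms))  # ['retriever']
```

The expansion here is a plain filter over a list. A real index would resolve the pattern against its own term dictionary.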


Appendix A:

Utility Program Summary

This appendix provides a summary of the command line syntax and command parameters for invoking the utility programs.

Invoking a Utility Program

You can invoke a utility program by entering the program name and its parameters on the command line. The utility programs return standard exit values (zero means success, any other value means failure). If an error occurs, the utility program displays one or more explanatory messages to the standard error stream (usually the screen) or to a log file.
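A script that drives the utilities can rely on the exit value convention described above. The following is a minimal sketch in Python, not part of SearchServer. The helper name run_utility is hypothetical, and the Python interpreter stands in for a utility program so that the example runs without SearchServer installed.

```python
import subprocess
import sys


def run_utility(argv):
    """Run a command and return (succeeded, stderr_text).

    Zero means success; any other exit value means failure.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0, result.stderr


# Stand-in commands (assumption): the interpreter plays the role of a utility.
ok, _ = run_utility([sys.executable, "-c", "raise SystemExit(0)"])
ok2, message = run_utility(
    [sys.executable, "-c",
     "import sys; sys.stderr.write('no such table'); sys.exit(2)"]
)
print(ok, ok2, message)
```

In practice the argument list would name a real utility, for example ["ftlock", "stdocs", "-h"], and the captured error text would be written to a log or shown to the operator.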

Most of the utility programs expect the name of a table as a parameter. The command line parameters are entered as literal values, followed by the filename or value that applies to that parameter.

Most of the parameters are optional, and you can enter them in any order. For example, the two following commands are acceptable and have the same meaning:

ftlock <table name> [-h]

ftlock [-h] <table name>

In most cases, each parameter is specified by a single leading dash (-) followed by a single lowercase letter and an argument (if required). The descriptions of the command line parameters shown in this appendix use UNIX-style syntax. For example:

<utility name> <table name> [-o<value1>] [-p] [-q<value2>]

This syntax applies to the command shell environments of UNIX, MS-DOS, 32-bit Windows, or Macintosh MPW. The MS-DOS command shell is included in the list because SearchServer for Microsoft 16-bit Windows includes some utility programs that must be run in a DOS shell window.

Filenames and table names are interpreted literally, and they may follow the local operating system conventions. You should not enable the caps lock key when invoking a utility program. In most cases, if no parameters or if incorrect parameters are supplied in the command line, the utility program displays a usage message, showing the expected parameter list.

The next section explains how to adapt this syntax for utilities that operate as Windows applications.
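The order independence of the parameters described in this section can be sketched as follows. This Python fragment is an illustration only and is not how the utilities are implemented. It assumes that every parameter is a dash, one letter, and an optional attached argument, and that exactly one table name is present. The function name is hypothetical.

```python
def split_command_line(args):
    """Separate the table name from dash-prefixed parameters.

    Returns (table_name, options), where options maps each parameter
    letter to its attached argument ('' when there is none).
    """
    table_name = None
    options = {}
    for arg in args:
        if arg.startswith("-") and len(arg) > 1:
            options[arg[1]] = arg[2:]      # letter, then attached argument
        elif table_name is None:
            table_name = arg
        else:
            raise ValueError(f"unexpected extra argument: {arg!r}")
    return table_name, options


# Both orderings yield the same result.
print(split_command_line(["stdocs", "-h"]))
print(split_command_line(["-h", "stdocs"]))
print(split_command_line(["stdocs", "-ovalue1", "-p", "-qvalue2"]))
```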


Utility Programs in a Microsoft 16-Bit Windows Environment

SearchServer for Microsoft Windows 3.1 contains both DOS-based and Windows-based utility programs. The most frequently used utilities are available as Windows applications under Microsoft 16-bit Windows.

Table A-1 Utility Programs as Microsoft Windows Applications

Icon What it Does

ExecSQL (icon) ExecSQL is an application program used to execute SearchSQL statements and run scripts. You can administer, modify, and maintain tables using this tool.

SearchDoc (icon) The SearchDoc documentation viewer allows you to search and display the SearchServer online documentation. You can search for specific words or phrases anywhere in the complete body of text across all SearchServer manuals, or you can limit your search to specific sections of the manuals you name.

ftcin (icon) The ftcin utility reads data supplied in the text file format created by ftcout, and loads the desired rows into a selected table. The ftcin utility is very useful when building Fulcrum library files.

ftcout (icon) The ftcout utility is used to export table data into a proprietary text file format. This text file can later be loaded with ftcin.

ftlock (no icon) The ftlock utility is used to alter table attributes. You can use it to set or cancel high integrity mode.

Using the Utilities with Microsoft 16-Bit Windows Environments

There are three methods of invoking one of these utilities.

Method 1

If Windows is already running, select Run from the Program Manager File menu. Enter the following syntax at the command prompt:

<utility name>.exe [-K] {<command line parameter>...}


If there is an icon labeled with the utility name you want, double-click on the icon to display the following screen.

Figure A-1 Example of a Utility Window

Method 2

If Microsoft Windows is not already running, use the following syntax at the DOS prompt:

win <utility name>.exe [-K] {<command line parameter> ...}

Method 3

If you specify the optional -K parameter, the program automatically exits Windows when it has finished executing. Use this parameter when you want to invoke the utility from a DOS batch file.

Note: The description that follows doesn't apply to the ExecSQL or SearchDoc applications. For a description of ExecSQL and SearchDoc, see Fulcrum SearchServer Getting Started.

Entering Command Line Parameters

To indicate what function you want to use for a particular utility, enter the command line parameters for the utility in the Command Line Text Entry Sub-Window. Once you've done this, press Return or click OK. The utility executes and displays the results (to a maximum of 64K) in the scrollable output window. If the output data is greater than 64K, save it to a file using the -1 and -2 options (described later in this appendix).

To exit from the utility, use one of the following Microsoft Windows procedures:

• double-click on the control box of the application window

• open the Control menu and select Close from the pop-up menu

• press alt+spacebar, then press c

By default, the results are written to a temporary file. The name of the file is determined by the operating system at run-time. The file is deleted when the utility exits. To prevent conflicts with other users on networked systems with a shared file server, the temporary file is located in the directory defined in the FULTEMP parameter. For a complete description of the FULTEMP parameter, see the section "Using the Utility Parameter File," later in this appendix.

If the utility is invoked from an icon, and more than one command line is entered before exiting the utility, the output data relating to the most recent command overwrites earlier data in the temporary file.

If you use the -1 or -2 option on several commands during one session of a utility, the output relating to the most recent command overwrites earlier output data in the named file. If you don't want to overwrite existing data, you can name different files for each use of the -1 and -2 options.

To append later output to existing data, use the plus sign (+) after the -1 or -2 option on the command line of the utility. The files specified by these options are written to the current directory if no directory is specified on the command line.

If you specify the -0 option on the command line, the option and its filename are removed from the command line buffer and the contents of the file are appended to the command line buffer. Execution then continues with the revised command line.

Note: If a command option is specified more than once on the command line, but with different values, the later value is used. No error is returned.
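The handling of the -0, -1, and -2 options described above can be modeled with a small sketch. It is written in Python for illustration and rests on several assumptions that the text does not confirm: that the plus sign immediately follows the option digit (for example -1+results.txt), that the contents of the -0 file are split on white space, and that the file is plain text. All function and file names are hypothetical.

```python
import os
import tempfile


def expand_parameter_file(args):
    """Remove each -0<file> option and append that file's contents."""
    remaining, appended = [], []
    for arg in args:
        if arg.startswith("-0") and len(arg) > 2:
            with open(arg[2:], encoding="utf-8") as handle:
                appended.extend(handle.read().split())  # assumed tokenization
        else:
            remaining.append(arg)
    return remaining + appended


def output_target(option_value):
    """Return (filename, mode) for the value that follows -1 or -2."""
    if option_value.startswith("+"):
        return option_value[1:], "a"   # append to existing data
    return option_value, "w"           # overwrite earlier output


# Demonstration with a throwaway parameter file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("-x -tdata.txt")
    param_path = handle.name
try:
    expanded = expand_parameter_file(["stdocs", "-0" + param_path, "-l"])
finally:
    os.remove(param_path)

print(expanded)                        # ['stdocs', '-l', '-x', '-tdata.txt']
print(output_target("+results.txt"))   # ('results.txt', 'a')
```

Because the file contents are appended to the end of the buffer, any option they repeat takes precedence over an earlier value, consistent with the note that the later value is used.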

Utilities that are MS-DOS Applications

The following utility programs must be run in a DOS shell:

• ftlin

• ftlout

• ftmload

• ftmunld

• fthmake

• fthtest

• ftidrck

To invoke one of these utility programs, use the syntax described in the section "Utility Program Summary" later in this appendix.


Note: The MS-DOS utilities do not make use of the FULCRUM.INI file or the data source. You must ensure the FULSEARCH, FULCREATE, FULTEMP, and FTNPATH parameters exist in your MS-DOS environment.

Using the Utility Parameter File

In the Microsoft 16-bit Windows environment, the utilities with names beginning with "ft" all make use of a parameter file to specify the values for certain operating parameters. The FULCRUM.INI file is in the BIN directory. You can edit this file as you would any text file on your system. The parameters in it relate directly to those that you have set for your data source (FULSEARCH, FULCREATE, FULTEMP, and FTNPATH).

Your data source is used by ExecSQL, SearchDoc, and your application. You should align these parameters with your data source parameters to avoid inconsistencies between the operation of your application and the utilities. For a complete description of these parameters, see Fulcrum SearchServer Getting Started.

Note: In a 16-bit Windows environment, the indexing environment is controlled by the operating parameters specified in the FULCRUM.INI file.

Command Line Syntax

The syntax of the commands for invoking utility programs is described in typographical terms. The parameters are shown in bold type, and the arguments to the parameters enclosed in angle brackets (< >) indicate that you must enter the value that is associated with that parameter.

Optional parameters are enclosed in square brackets ([ ]), and items that can optionally be repeated are followed by an ellipsis (...). Groupings of command line options can be shown enclosed by curly braces ({ }). Do not enter square brackets or curly braces. These characters are included in the command line syntax only to show the groupings of parameters and arguments.

A vertical bar (|) is used to separate the groupings that are mutually exclusive. You can either enter the options in front of the bar, or those following the bar, but not both. You should not enter the vertical bar itself.

In the following syntax, the utility name (ftlock) is mandatory and takes a single argument (table name). There are two optional parameters.


ftlock <table name> [-h | -c]

You can enter either -h (which enables row locking) or -c (which cancels row locking), but not both. These parameters do not have arguments associated with them.
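The notation for mutually exclusive options maps directly onto the option groups that many argument-parsing libraries provide. The following sketch models the ftlock syntax with Python's standard argparse module. It only models the syntax and does not invoke ftlock. The attribute names enable and cancel are hypothetical, and add_help=False is needed because argparse otherwise reserves -h for its own help text.

```python
import argparse


def build_ftlock_parser():
    """Model 'ftlock <table name> [-h | -c]' as an argparse parser."""
    parser = argparse.ArgumentParser(prog="ftlock", add_help=False)
    parser.add_argument("table_name")
    group = parser.add_mutually_exclusive_group()   # the vertical bar
    group.add_argument("-h", dest="enable", action="store_true")
    group.add_argument("-c", dest="cancel", action="store_true")
    return parser


parser = build_ftlock_parser()
print(parser.parse_args(["stdocs", "-h"]))
print(parser.parse_args(["-c", "stdocs"]))
# parser.parse_args(["stdocs", "-h", "-c"]) exits with a usage error,
# because the two options cannot be combined.
```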

A Word About the Table Name Compound Identifier

In the UNIX environment, SearchServer creates table management files with lowercase filenames. Therefore, you must specify the table name parameter in lowercase letters when invoking a utility program.

The table name parameter must be the name of the table that is to be processed. This parameter can be a simple table name, such as STDOCS, or a qualified table name such as C:\FULCRUM\FULTEXT\STDOCS (under any Microsoft Windows environment) or /home/fulcrum/fultext/stdocs (under UNIX).

When a simple table name is entered, SearchServer searches for the table according to the FULSEARCH parameter. When a qualified table name is entered, SearchServer only searches for the table on the local node in the specified directory.

Specifying a Server in the Table Name Compound Identifier

In a distributed environment, most utilities that expect the table name argument also accept table name@nodename. This method of naming a table lets you explicitly name the location of a table by specifying the node name of a remote host system. In this case, the table name must be a simple table name; that is, it must not be qualified by a directory specification.

For example, the parameter stdocs@rhost causes SearchServer to try to locate the stdocs table on any server running on the rhost node, which is assumed to be accessible through the network text readers listed in the FTNPATH parameter.

In a distributed environment, the simpler form of table name is still accepted. In this case, SearchServer searches for the table locally first. Failing that, SearchServer searches for the table among those accessible to each remote server (provided that the table name is a simple and unqualified table name).

When the command applies to the local node only, a period (.) must be entered for the nodename (that is, table name@.). When specifying multiple tables, one table name or table name@nodename parameter must be entered for each table.
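The naming rules in this section can be summarized in a short validation sketch. It is written in Python for illustration and is not part of SearchServer. It assumes that a table name counts as qualified when it contains a slash, a backslash, or a drive colon. The function name is hypothetical.

```python
def parse_table_reference(reference):
    """Split 'table' or 'table@nodename' into (table, node).

    node is None when no node was given and '.' for the local node.
    """
    table, separator, node = reference.partition("@")
    if not table:
        raise ValueError("table name is missing")
    if not separator:
        return table, None
    if not node:
        raise ValueError("node name is missing after '@'")
    if any(ch in table for ch in "/\\:"):
        # With @nodename the table name must be simple (unqualified).
        raise ValueError("a table name used with @nodename must be simple")
    return table, node


print(parse_table_reference("stdocs@rhost"))                  # ('stdocs', 'rhost')
print(parse_table_reference("stdocs@."))                      # ('stdocs', '.')
print(parse_table_reference("/home/fulcrum/fultext/stdocs"))  # qualified, no node
```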


Utility Program Summary

This section provides a quick reference to the utility programs and their parameters. The utility programs are presented in alphabetical order.

Note: In Microsoft 16-bit Windows environments, remember that the way you invoke a utility depends on whether it is a Windows application or a DOS program.

ExecSQL

ExecSQL is an administration utility that lets you issue SearchSQL statements or run scripts consisting of SearchSQL statements. In all Microsoft Windows environments, ExecSQL is a Windows application.

Note 1: Any error encountered during retrieval for a SELECT statement terminates retrieval for that SELECT statement.

Note 2: Windows NT does not support running 16-bit applications from the AT job scheduler. Therefore, you cannot use ExecSQL in this instance. Use ftetapi instead.

Command Line

The syntax for ExecSQL is:

execsql [-h<datasourcename>] [[-0]<infilename>] [-1<outfilename>] [-2<errfilename>] [-s<queue size>]

Parameters

The command line parameters are interpreted as follows:

-h<datasourcename>

Connect to the named data source. If a data source is not specified, ExecSQL accesses the local and remote tables available to the servers specified directly or indirectly through the SearchServer FTNPATH parameter. This parameter is not available in a Windows environment.

-0<infilename>

Provide input to ExecSQL in the form of a script file. The name of the input file is specified through infilename. The script is processed and ExecSQL exits. If this parameter is omitted, input is taken from the standard input stream. In this case, the input is terminated by pressing ctrl+d under UNIX.

-1<outfilename>

Save output from ExecSQL. The name of the output file is specified through outfilename. If a file with the same name already exists, it is overwritten. If this parameter is omitted, ExecSQL writes its output to the standard output stream, which is the screen by default.

-2<errfilename>

Maintain a log of error messages in the file errfilename during the ExecSQL session. If a file with the same name already exists, it is overwritten. If this parameter is omitted, error messages are written to the standard error stream, which is the screen by default.

-s<queue size>

Specify the size of the statement queue to be maintained by ExecSQL. If you specify a number greater than 1, then you can use the back reference predicate and intuitive searching in your queries. The default is 1.

Note: When ExecSQL encounters an error while executing a SearchSQL statement, the SQLSTATE and error message text are displayed. Statements that succeed but return a warning are processed the same as statements that succeed with no warning. That is, no SQLSTATE and message text are displayed.

For More Information About

using ExecSQL, see Fulcrum SearchServer Getting Started.
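When ExecSQL is driven from another program, the command line can be assembled from the parameters described above. The following Python sketch builds the argument list only and does not run ExecSQL. The function name and the file names in the example are hypothetical, and each option value is attached directly to its option letter as shown in the syntax line.

```python
def execsql_command(script=None, output=None, errors=None,
                    queue_size=None, data_source=None):
    """Build an argument list for the execsql command line."""
    argv = ["execsql"]
    if data_source is not None:
        argv.append(f"-h{data_source}")   # not available in a Windows environment
    if script is not None:
        argv.append(f"-0{script}")        # input script file
    if output is not None:
        argv.append(f"-1{output}")        # output file, overwritten if it exists
    if errors is not None:
        argv.append(f"-2{errors}")        # error log, overwritten if it exists
    if queue_size is not None:
        argv.append(f"-s{queue_size}")    # statement queue size, default 1
    return argv


print(execsql_command(script="load.sql", output="out.txt",
                      errors="err.txt", queue_size=5))
# ['execsql', '-0load.sql', '-1out.txt', '-2err.txt', '-s5']
```

Keeping the arguments as a list rather than a single string avoids quoting problems when a filename contains spaces.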

ftcin

ftcin is used to load a table file created by ftcout or ftlin into an existing table. Unless you're loading an IMMEDIATE table, you must execute a VALIDATE INDEX statement (perhaps through ExecSQL) to make the new data searchable after running ftcin.

This utility can insert a row containing up to approximately 8000 bytes of data into a table. However, for a table in which a larger row has been inserted by other means (for example, through ExecSQL), ftcin can insert a row that matches the largest row previously inserted, within the constraints of available memory.

If you want to use ftcin and ftcout to relocate a table, you must recreate the schema after running ftcin for the new table. In ExecSQL, use a CREATE SCHEMA statement that specifies the REPLACE option. Otherwise, the default schema is used and no application-defined columns are defined for the table. Date fields in the data being loaded using ftcin should use the ISO standard date format, yyyymmdd.
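The yyyymmdd form is easy to produce and check programmatically before a data file is handed to ftcin. A minimal Python sketch follows. The function names are hypothetical and the example date is arbitrary.

```python
from datetime import date, datetime


def to_yyyymmdd(value):
    """Format a date as an eight-digit yyyymmdd string."""
    return value.strftime("%Y%m%d")


def from_yyyymmdd(text):
    """Parse a yyyymmdd string; raises ValueError if it is not a valid date."""
    return datetime.strptime(text, "%Y%m%d").date()


print(to_yyyymmdd(date(1997, 3, 18)))   # 19970318
print(from_yyyymmdd("19970318"))        # 1997-03-18
```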

In Microsoft 16-bit Windows environments, ftcin is a Windows application.

Note 1: You can't use ftcin to load a view because views are read-only.

Note 2: There are restrictions on the amount of data per row that can be imported by ftcin.

Command Line

To load a table file into a table using the ftcin utility program, use the following syntax:

ftcin <table name> [-t<libdata_filename>] -p -q -f [-x]

To import table data from a text file written by the ftcout utility program, use the following syntax:

ftcin <table name> [-t<data_filename>] -n -p -q [-x]

Parameters

The command line parameters are interpreted as follows:

<table name>

A compound identifier that specifies the name of the table whose table management data files you're loading.

-t<libdata_filename>

Specifies the name of a table data file (created by the ftcout or ftlin utility program). If you omit this parameter, ftcin reads the data from its standard input stream.

-n

Causes ftcin to remove all rows from the table before adding the ones from the export file. In addition, the schema is replaced by the default schema. This process reorganizes the table, and SearchServer automatically supplies new row identifiers for all of the rows in the table.

-p

Indicates that the data being imported will update the existing rows.


-q

Causes ftcin to operate as if there are no rows that refer to duplicate filenames. This improves performance.

-x

Causes ftcin to lock the table for exclusive use. This ensures that no other processes are accessing the table when ftcin is used. If you are loading a large table, this option improves performance.

-f

Disables text reader verification during data loading. Using this parameter improves performance of ftcin, but delays the reporting of any text reader specification errors until indexing is performed. Use -f when loading a library data file that has been prepared by the ftlin utility.

For More Information About

• creating library files, see the section, "Preparing Libraries," in Chapter 4, "Using External Text."

• ftlin, see the section later in this appendix.

ftcout

ftcout is used to verify and save table data. It checks the consistency of the data and saves it in a proprietary text file format. The resulting output file can be used by ftcin to restore the table data.

If ftcout encounters an error, it writes a message to standard error and continues if possible with the next row of data.

In Microsoft 16-bit Windows environments, ftcout is a Windows application.

Note: The maximum amount of text that can be displayed in the ftcout window is 32K. If there is more than this amount of text, the remainder is not displayed.

Command Line

The syntax of the ftcout command line is:

ftcout <table name> [-l] [-x] -t<data_filename>

Parameters

The command line parameters are interpreted as follows:


<table name>

A compound identifier that specifies the name of the table you're verifying for consistency or exporting for import purposes later.

-l

An optional parameter that leaves date information corresponding to the FT_DATE reserved column in the output. When creating a copy of your table, include this parameter so that the information in the FT_DATE reserved column is also recorded. By default, the ftcout utility does not preserve the FT_DATE values.

-x

An optional parameter that preserves the indexing status of the external documents as it is recorded in the FT_ROW_STATE reserved column. When creating a copy of your table's indexing status, include this parameter so that the information in the FT_ROW_STATE reserved column is also recorded. By default, the ftcout utility does not preserve the FT_ROW_STATE values.

-t<data_filename>

Names the output file. If this option is omitted, data is displayed on the standard output (stdout). When you are verifying for consistency, you should provide a filename. Any error messages are written to the standard error output (stderr). On a UNIX system, /dev/null can be used so that the file isn't created and no disk space is used.

For More Information About

checking data consistency, see Chapter 2, "The Administration Tools" and Chapter 8, "Verifying the Table."
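An export with ftcout followed by a load with ftcin can be scripted by building the two command lines from the parameters described in this appendix. The following Python sketch only assembles argument lists and runs nothing. The function names, the table names, and the data file name are hypothetical, and the choice of which ftcin flags to pass is left to the caller because the syntax lines above do not make clear which ones are required.

```python
def export_command(table, data_file, keep_dates=True, keep_row_state=True):
    """Build an ftcout command line that writes table data to data_file."""
    argv = ["ftcout", table]
    if keep_dates:
        argv.append("-l")       # ftcout: keep FT_DATE values
    if keep_row_state:
        argv.append("-x")       # ftcout: keep FT_ROW_STATE values
    argv.append(f"-t{data_file}")
    return argv


def import_command(table, data_file, replace_rows=False, exclusive=False):
    """Build an ftcin command line that loads data_file into table."""
    argv = ["ftcin", table, f"-t{data_file}"]
    if replace_rows:
        argv.append("-n")       # ftcin: remove existing rows first
    if exclusive:
        argv.append("-x")       # ftcin: lock the table for exclusive use
    return argv


print(export_command("stdocs", "stdocs.dat"))
print(import_command("stdocs2", "stdocs.dat", replace_rows=True, exclusive=True))
```

Note that -x means different things to the two utilities: ftcout uses it to preserve row state, while ftcin uses it to lock the table. Separate builder functions keep that distinction explicit.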

fthmake

fthmake compiles a thesaurus object file by reading a thesaurus source file and optionally, a set of character variant rules.

fthmake can't create a thesaurus using client/server. Therefore, you must run fthmake on the server system where the thesaurus is to be used, or copy it to the server system using a method that preserves binary data.

In Microsoft 16-bit Windows environments, fthmake is a DOS application.

Note: In Microsoft 16-bit Windows environments, the fthmake utility cannot create a thesaurus file larger than 63K.


Command Line

The syntax for fthmake is:

fthmake <sourcefile> <objectfile> [-f<textreader_list>] [-l<rulesfile>]

Parameters

The command line parameters are interpreted as follows:

<sourcefile>

Names the thesaurus source file to be compiled.

<objectfile>

Names the object file to be created. The filename used is restricted to the 8-character filename with the three-character extension .FTH.

Note: fthmake doesn't support any redirection of input or output. Use an explicit sourcefile and objectfile on the command line instead.

-f<textreader_list>

Specifies a text reader list to be applied to the source file. The default text reader list (nti:s) can be overridden by the -f parameter. If the translation text reader is selected, it translates the source text from the external character set to the FTICS equivalent. If the source text is already represented in the FTICS, a specification for the standard text reader alone should be used rather than the default text reader specification.

Note: You can't use the -f parameter to specify a custom text reader in Microsoft 16-bit Windows environments.

-l<rulesfile>

Specifies the name of the file that contains the variant rules you want to incorporate into the thesaurus object file. It must include the three-character extension .FTL. If the rulesfile is an unqualified filename (no path to the file is specified), fthmake searches for the rules file in the ordered list of directories SearchServer searches to find tables.
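The filename restrictions on the object file and the rules file can be checked before fthmake is invoked. The following Python sketch is an illustration only. It assumes that "8-character filename" means a base name of one to eight characters and that the extension comparison is not case-sensitive. The function names and sample filenames are hypothetical.

```python
import ntpath


def _split_name(path):
    """Return (stem, extension) of the last path component."""
    name = ntpath.basename(path)          # accepts both / and \ separators
    stem, dot, ext = name.rpartition(".")
    return (stem, ext) if dot else (name, "")


def is_valid_object_name(path):
    """Check for a base name of at most eight characters with extension .FTH."""
    stem, ext = _split_name(path)
    return 1 <= len(stem) <= 8 and ext.upper() == "FTH"


def has_rules_extension(path):
    """Check that a rules file name carries the extension .FTL."""
    stem, ext = _split_name(path)
    return bool(stem) and ext.upper() == "FTL"


print(is_valid_object_name("MYTHES.FTH"))       # True
print(is_valid_object_name("THESAURUS1.FTH"))   # False: base name too long
print(has_rules_extension("rules.ftl"))         # True
```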

For More Information About

modifying, compiling or testing a thesaurus source file, see the section, "Creating a Customized Thesaurus," in Chapter 7, "Providing Support Files for Searching."

character variant rules, see the section, "Creating a Character Variant Rules File," in Chapter 7, "Providing Support Files for Searching."

fthtest

fthtest is a test facility used for verifying the operation of thesaurus expansion and character variant generation. This facility lets you compare an original search term with the set of generated query terms.

fthtest searches for a thesaurus object file or a character variant rules file in the directory specified by your utilities parameter file, and allows you to test the operation of the rules in the file. If you don't specify any terms on the command line, fthtest prompts you for a term, and reports the results.

fthtest can't access a thesaurus file using client/server. Therefore, you must run fthtest on a system where the thesaurus is accessible locally or through TFA.

In Microsoft 16-bit Windows environments, fthtest is a DOS application.

Command Line

To test the thesaurus expansion with one or more terms, use the following syntax:

fthtest <objectfile> [<term>...]

To use the full capability of fthtest, specify the following syntax:

fthtest <term> [-h<objectfile>] [-l<rulesfile>] [-c<table_name>] [-t<outfilename>]

Parameters

Only one term is required in the second form of the command and you must specify at least one of the -h, -l, or -c parameters.

The command line parameters are interpreted as follows:

-h<objectfile>

Specifies the name of the thesaurus object file you want to test. The filename must include a three-character extension (usually .FTH). If the objectfile is an unqualified filename (no path to the file is specified), fthtest searches for the thesaurus file in the directory specified by your utilities parameter file.

If you omit the -h parameter, fthtest doesn't perform thesaurus expansion.

-l<rulesfile>

An optional parameter that names the character variant rules file you want to test. The filename must include a three-character extension (usually .FTL). If the rulesfile is an unqualified filename (no path to the file is specified), fthtest searches for the rules file in the directory specified by your utilities parameter file.

If you specify a variant rules file, fthtest performs character variant expansion on the term or terms generated by the thesaurus expansion. Otherwise, no character variant expansion is performed.

<term>

Refers to a search term to be expanded by thesaurus and/or character variant expansion. If there are any terms on the command line, fthtest looks them up and reports the results before exiting. Otherwise (in the first form of the command), fthtest prompts you for each term to be expanded.

-c<table_name>

An optional parameter that instructs fthtest to test the expanded terms against the specified table. In this case, only the terms found in the table are reported.

-t<outfilename>

An optional parameter that specifies the name of an output file in which to store the results of the test.
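The rule that the second form needs one term plus at least one of -h, -l, or -c can be expressed as a small check. This Python sketch is illustrative only; the function name and keyword arguments are hypothetical, and it assembles the argument list without running fthtest.

```python
def build_fthtest_args(term, objectfile=None, rulesfile=None,
                       table_name=None, outfilename=None):
    """Return the argument list for the second (full) form of fthtest."""
    if objectfile is None and rulesfile is None and table_name is None:
        raise ValueError("specify at least one of -h, -l, or -c")
    args = ["fthtest", term]
    if objectfile is not None:
        args.append("-h" + objectfile)   # thesaurus expansion
    if rulesfile is not None:
        args.append("-l" + rulesfile)    # character variant expansion
    if table_name is not None:
        args.append("-c" + table_name)   # report only terms found in the table
    if outfilename is not None:
        args.append("-t" + outfilename)  # store the results in a file
    return args
```

Omitting objectfile here mirrors the documented behavior: without -h, no thesaurus expansion is performed, so only the variant rules (and the optional table check) apply to the term.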

For More Information About

modifying, compiling or testing a thesaurus source file, see the section, "Creating a Customized Thesaurus," in Chapter 7, "Providing Support Files for Searching."

character variant rules, see the section, "Creating a Character Variant Rules File," in Chapter 7, "Providing Support Files for Searching."

ftidrck

ftidrck is a command line utility program used to check the validity of a table's index files. You can run ftidrck just before you index a very large table to help prevent indexing failure due to format errors in the existing index files, or before backing up a table to ensure a clean backup.

ftidrck can't access index files using client/server. Therefore, you must run ftidrck on a system where the index files are accessible locally or through TFA.

In Microsoft 16-bit Windows environments, ftidrck is a DOS application.

Note: ftidrck does not check the validity of the differential file associated with an IMMEDIATE table. As a result, use the VALIDATEINDEX statement before using ftidrck on an IMMEDIATE table.

Command Line

The syntax for ftidrck is:

ftidrck <table name>

Parameters

The command line parameters are interpreted as follows:

<table name>

A compound identifier that specifies the name of the table associated with the index files you want to verify.

For More Information About

checking the validity of a table's index files, see the section, "Recovering from Indexing Failure," in Chapter 5, "Maintaining the Data."

ftimport

ftimport loads table data exported by other applications. In the Microsoft Windows environment, ftimport is a Windows application.

Command Line

The syntax for ftimport is:

ftimport <table name> [-s<datasource>] [-t<inputfile>] [-h<headerfile>] [-c<inputcharset>] [-m<delim>] [-q<quote>] [-e<qescape>] [-o<dateorder>] [-i] [-x] [-f] [-g]

ftimport can be executed with just the table name and no other parameters. In this case, ftimport reads the import data from standard input. The first line of the import data must be the field names, and subsequent lines must contain the field values to import into the columns.

Parameters

The command line parameters are interpreted as follows:

<table name>

A compound identifier that specifies the name and location of the table into which the data is being inserted. The table name can be qualified. For example, in the 32-bit Windows environments you could use:

C:\FULCRUM\FULTEXT\STDOCS

-s<datasource>

In 32-bit Windows environments, this option names the ODBC data source used to access the table. The default is SearchServer_3.0. This option is not available in other environments.

-t<inputfile>

Specifies the file that contains the data to import into the table. If this option is omitted or the file is `-', ftimport takes its input from the standard input.

-h<headerfile>

Specifies a file from which to obtain the header information. The header information supplies a mapping from columns in the input file to columns in the table.

-c<translation>

Specifies the character set translation for both the input file and the header file. If this option is omitted, the default character set translation is dependent on which platform is used. For more information about character set options, see the "Translation Text Reader" in Appendix B, "Text Readers."

-m<delim>

Specifies the character that marks the transition to the next column value in the input data. If this option is omitted, the default character is a comma (,). The column delimiter cannot be a newline character nor any alphabetic (a-z, A-Z) or numeric (0-9) characters. There are aliases for some of the characters that are difficult to enter on the command line.

Table A-2 lists characters that are normally difficult to enter on the command line because they can have special meaning. Aliases have been defined to simplify entering these characters as delimiter, quote, or quote escape characters. An alias represents its corresponding character when entered on the command line for the -m, -q, and -e options.

Table A-2 Special Characters and Their Aliases

Character  Alias    Character  Alias    Character  Alias
,          c        '          1        tab        t
;          s        "          2        space      a
|          p        \          b        :          l

-q<quote>

If the column data contains the delimiter character, this option specifies the character to be used in the input to surround column data. The default quote character is the single quote character (').

The quote character cannot be a newline character, nor any alphabetic (a-z, A-Z) or numeric (0-9) character. There are aliases for some of the characters that are difficult to enter on the command line. For more information about these characters, see Table A-2.

-e<qescape>

Specifies the character that can precede the quote character (specified by the -q option) so that the quote character can appear in the column value. The default character to escape the quote character is the single quote character ('). This character has the same restrictions as the column delimiter character and can be entered using the aliases in Table A-2.

-o<dateorder>

Specifies the basic format for the dates in the input file. The value is a string containing `Y', `M', and `D' in the order in which the date components are to be interpreted. For example, `MDY' specifies that all dates in the input file are formatted as month, day, year. The default format is `YMD' (year, month, day).

-i

For IMMEDIATE tables, allows indexing to occur during the import operation.

-x

Specifies non-exclusive access to the table during the import operation.

If the table is opened for exclusive access (that is, this parameter is not used), ensure that no other process is accessing the table. Exclusive access prevents other processes from opening the table, but doesn't detect whether another process has already opened it. In that case, the other process won't be able to update the table. For optimum performance, do not use this parameter.

-f

Validates the existence of the external file in the file system and that a valid text reader was specified for the external file. If either of these conditions fails, the line in the input file is ignored and ftimport moves to the next line.

-g

Ignores errors resulting from the number of input fields not matching the number of columns in the header. Normally, these rows would not be inserted.
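To make the input layout concrete, the following Python sketch produces import data in the shape described for ftimport: a first line of field names followed by one line of values per row, using the default delimiter (,), quote ('), and quote escape (') characters. It is a sketch under assumptions, not a description of the ftimport parser: the field names TITLE and AUTHOR in the usage note are hypothetical, and quoting a value that contains the quote character but no delimiter is an assumption. The alias dictionary restates Table A-2.

```python
# Aliases from Table A-2: alias -> the character it stands for.
ALIASES = {
    "c": ",", "s": ";", "p": "|",
    "1": "'", "2": '"', "b": "\\",
    "t": "\t", "a": " ", "l": ":",
}


def resolve(option_value):
    """Map a Table A-2 alias to its character; pass other values through."""
    return ALIASES.get(option_value, option_value)


def format_field(value, delim=",", quote="'", qescape="'"):
    """Surround a value with quotes when it contains the delimiter or a quote.

    Each embedded quote character is preceded by the escape character.
    """
    if delim in value or quote in value:
        return quote + value.replace(quote, qescape + quote) + quote
    return value


def format_import_data(field_names, rows, delim=",", quote="'", qescape="'"):
    """Return import text: field names on line 1, then one line per row."""
    lines = [delim.join(field_names)]
    for row in rows:
        lines.append(delim.join(format_field(v, delim, quote, qescape) for v in row))
    return "\n".join(lines) + "\n"
```

For example, format_import_data(["TITLE", "AUTHOR"], [["Guide", "O'Brien, Pat"]]) yields a header line followed by a row whose second value is quoted and has its embedded quote doubled. Passing resolve("t") as delim would correspond to entering the tab alias with -m.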

ftlin

ftlin creates a Fulcrum document library file from a list of document files. It can also create a matching table data file.

To create a library data file that is suitable for static applications, the -s parameter must be specified on the command line. Afterwards, you'll need to load the resulting table data file into a table using the ftcin utility program.

In Microsoft 16-bit Windows environments, ftlin is a DOS application.

Command Line

To create a document library file for an application that will access dynamic documents, use the following syntax:

ftlin <library_filename> [-i<filelist>]

To create a document library file and a library data file for applications that will access static documents, use the following syntax:

ftlin <library_filename> [-i<filelist>] -s [-o<libdata_filename>]

Parameters

The command line parameters are interpreted as follows:

<library_filename>

Specifies the name of the document library file you're creating.

-i<filelist>

Refers to an ASCII flat text file which contains a list of the names of files to be loaded into the library file; it must not refer to a directory. The files named in the filelist are loaded into the document library file. Alternatively, if the -i parameter is not specified, ftlin reads the list of filenames from its standard input stream. Each filename must be on a separate line.

-s

Causes ftlin to create a corresponding table save file. It is normally specified along with the -o parameter.

-o<libdata_filename>

Specifies the table save file you need for a static application. This is the name of the file you'll load into the table using ftcin. After the library data has been loaded into the table, the external document text from the library file is accessed through the standard text reader. If you omit the -o parameter but specify the -s parameter, the table save output is written to the standard output stream.

Example

In environments other than 16-bit Windows, to create a document library file (called MYDOCS.DLF) from a list of document files (named in MYDOCS.LST) and load the related row data into a table (called mytable) in one step, enter:

ftlin mydocs.dlf -i mydocs.lst -s | ftcin mytable -q -x -f
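The file list passed with -i is a plain ASCII file with one filename per line. A minimal Python sketch for producing such a file is shown below; the function name is hypothetical, and the document names in the usage note are placeholders. The sketch only writes the list and does not call ftlin.

```python
import os


def write_filelist(names, filelist_path):
    """Write one document filename per line to an ASCII file list.

    Raises ValueError if the list path is a directory or if a name
    contains a line break, since each name must be on a separate line.
    """
    if os.path.isdir(filelist_path):
        raise ValueError("the file list must not refer to a directory")
    for name in names:
        if "\n" in name or "\r" in name:
            raise ValueError("each filename must fit on a single line")
    # encoding="ascii" makes non-ASCII names fail loudly instead of being
    # written in another encoding.
    with open(filelist_path, "w", encoding="ascii", newline="\n") as f:
        for name in names:
            f.write(name + "\n")
    return len(names)
```

A call such as write_filelist(["doc1.txt", "doc2.txt"], "mydocs.lst") would produce a list suitable for the -i parameter in the example above.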

For More Information About

creating library files, see the section, "Preparing Libraries," in Chapter 4, "Using External Text."

loading a library object file into a table, see Chapter 5, "Maintaining the Data."

ftlock

ftlock alters table attributes. You can use it to set or cancel high integrity mode.

In Microsoft 16-bit Windows environments, ftlock is a Windows application.

Command Line

The syntax for ftlock is:

ftlock <table name> [-h | -c]

ftlock continues the processing of all tables named on the command line in spite of any errors encountered. If the request couldn't be performed, or if any table fails a validity check, the exit status indicates failure (a value other than zero is returned), and a message is written to standard error.

Parameters

The command line parameters are interpreted as follows:

<table name>

A compound identifier that specifies the name of a table on which you want to perform an ftlock operation.

-h

Enables row locking for the table(s) specified by the table name identifier. Sets the value in the FTT_NOLOCKING column of the TABLES system information table to `FALSE'.

Note: Row locking has no effect in a Microsoft Windows stand-alone environment.

-c

Cancels row locking for the table(s) specified by the table name identifier. Subsequent retrievals from the FTT_NOLOCKING column of the TABLES system information table obtain a value of `TRUE'.

For More Information About

unlocking a table, see the section, "Recovering from Indexing Failure," in Chapter 5, "Maintaining the Data."

ftlout

ftlout is a command line utility program used to unload document library files created using the ftlin utility program.

CAUTION: If you build a library file in a UNIX environment with names longer than eight characters, ftlout on DOS truncates the name.

In Microsoft 16-bit Windows environments, ftlout is a DOS application.

Command Line

To create a separate file for every logical member of the document library file, use the following syntax:

ftlout <library_filename>

Parameters

The command line parameter is interpreted as follows:

<library_filename>

Specifies the name of the document library file you want to unload. The character string stored in the header of each document library file member is used to name the files.

For More Information About

creating library files, see the section, "Preparing Libraries," in Chapter 4, "Using External Text."

ftmload

ftmload is a command line utility program used to load the dynamic library table FULTEXT.EFT and the FTMESS and ETMESS message files into the system configuration file.

In Microsoft 16-bit Windows environments, ftmload is a DOS application.

Command Line

The syntax for ftmload is:

ftmload <sourcefile> [-o <object_name>] [-m <minifile>] [-v] [-e] [-f <filter_list>]

Parameters

The command line parameters are interpreted as follows:

<sourcefile>

Specifies the name of the file to be loaded.

-o <object_name>

Names the object that is to be loaded from the sourcefile. If this option isn't specified, sourcefile is used as the object name. Valid object names are FTMESS, ETMESS, and FULTEXT.EFT.

-m <minifile>

Specifies the full pathname of the FULTEXT.FTC file. If -m isn't specified, FULSEARCH is used to locate FULTEXT.FTC.

-v

Causes the full pathname of the FULTEXT.FTC file loaded and the name of the object loaded to be written to the standard output stream.

-e

Merges the content of the sourcefile with the existing object. If this option isn't specified, any existing object named object_name is overwritten. This option is only valid when the object name is FULTEXT.EFT.

-f

Loads the object using a list of specified document text readers. This option can't be used when loading FULTEXT.EFT. A valid FULTEXT.EFT must be loaded before this option can be used.
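The ftmload options interact: -e is valid only for FULTEXT.EFT, and -f cannot be used when loading FULTEXT.EFT. The Python sketch below encodes those constraints before assembling an argument list. The function name is hypothetical, and comparing object names without regard to case is an assumption; the utility itself is not run.

```python
VALID_OBJECTS = {"FTMESS", "ETMESS", "FULTEXT.EFT"}


def build_ftmload_args(sourcefile, object_name=None, minifile=None,
                       verbose=False, merge=False, filter_list=None):
    """Return an ftmload argument list, enforcing the documented option rules."""
    # Without -o, the sourcefile doubles as the object name.
    name = (object_name if object_name is not None else sourcefile).upper()
    if name not in VALID_OBJECTS:
        raise ValueError("object name must be FTMESS, ETMESS, or FULTEXT.EFT")
    if merge and name != "FULTEXT.EFT":
        raise ValueError("-e is only valid when the object is FULTEXT.EFT")
    if filter_list is not None and name == "FULTEXT.EFT":
        raise ValueError("-f can't be used when loading FULTEXT.EFT")
    args = ["ftmload", sourcefile]
    if object_name is not None:
        args += ["-o", object_name]
    if minifile is not None:
        args += ["-m", minifile]
    if verbose:
        args.append("-v")
    if merge:
        args.append("-e")
    if filter_list is not None:
        args += ["-f", filter_list]
    return args
```

Checking the combination up front gives a clear error message at the point where the command is assembled, rather than relying on the utility to reject it.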

For More Information About

updating and loading the dynamic library table, see the section, "Loading the Dynamic Text Reader Table," in Chapter 4, "Using External Text."

ftmunld

ftmunld is a command line utility program used to display the dynamic library table FULTEXT.EFT.

Note: This utility can only be used for FULTEXT.EFT. The FTMESS and ETMESS message files cannot be requested with this utility.

In Microsoft 16-bit Windows environments, ftmunld is a DOS application.

Command Line

The syntax for ftmunld is:

ftmunld <target file> [-m <minifile>] [-v] [-o <object_name>]

Parameters

The command line parameters are interpreted as follows:

<target file>

Specifies the file to receive, in a readable form, a copy of the named object.

-m <minifile>

Specifies the full pathname of the FULTEXT.FTC file. If -m isn't specified, standard search methods are used to locate FULTEXT.FTC.

-v

Generates the full pathname of the FULTEXT.FTC file loaded and the name of the object loaded.

-o <object_name>

Names the object. The only valid object name is FULTEXT.EFT.

For More Information About

unloading the dynamic library table, see the section, "Loading the Dynamic Text Reader Table" in Chapter 4, "Using External Text."

ftpr

This utility can be used to verify that the printed format of the text conforms to your specifications.

The ftpr utility reads a text stream from a text reader chain, and outputs it in flat ASCII format or in Fulcrum Technologies Document Format (FTDF). It sends output to standard output or to a specified file. The output contains a form feed after each file (unless the -s parameter is specified). The first 25 lines of ftpr.txt are displayed while ftpr is running.

Note: The ftpr utility is also useful in data preparation. Documents that require a custom text reader can be converted to FTDF, and distributed in that form. Users can then access the data with the standard text reader.

Command Line

The syntax for ftpr is:

ftpr <inputfilename> [-f <textreaderlist>] [-o <filename> | -a <filename>] [-s] [-g]

Parameters

The command line parameters are interpreted as follows:

<inputfilename>

Refers to the name of the file that contains the document (or documents) you want to print. More than one file can be specified.

-f<textreaderlist>

Defines a list of document text reader identifiers (delimited by colons and without blanks) that will be used with the input file. The standard text reader can be used with ASCII files and those in FTDF.

-o<filename>

Names the file where output from the ftpr will be written. If the file exists, then it is truncated before reading the first input file. If the file does not exist, it is created. You cannot use this option with the -a option.

-a<filename>

Appends the output from ftpr to the named file. If this file does not exist, it is created. You cannot use this option with the -o option.

-s

Causes ftpr to suppress flat ASCII formatting. If this option is not present, the file is output in 8-bit ASCII format (conforming to ISO 2022, but without control sequences). An attempt is made to format the document according to the embedded control sequences.

-g

Indicates that GUI mode sequences will be emitted if supported in the document text reader. When this parameter is specified, ftpr will attempt to open the source file with the text reader in GUI mode. If this option is used, the -s option should also be specified.
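Two of the ftpr rules lend themselves to a programmatic check: -o and -a are mutually exclusive, and -s should accompany -g. The Python sketch below is one hedged way to encode them. The function name is hypothetical, and adding -s automatically when -g is requested is a design choice of this sketch that follows the recommendation above, not documented ftpr behavior.

```python
def build_ftpr_args(inputfiles, textreaderlist=None, output=None,
                    append=None, suppress=False, gui=False):
    """Return an ftpr argument list for one or more input files."""
    if not inputfiles:
        raise ValueError("at least one input file is required")
    if output is not None and append is not None:
        raise ValueError("-o and -a cannot be used together")
    args = ["ftpr"] + list(inputfiles)
    if textreaderlist is not None:
        if " " in textreaderlist:
            raise ValueError("text reader lists are colon-delimited, without blanks")
        args += ["-f", textreaderlist]
    if output is not None:
        args += ["-o", output]    # truncate or create the output file
    if append is not None:
        args += ["-a", append]    # append to (or create) the output file
    if suppress or gui:
        args.append("-s")         # -g is recommended together with -s
    if gui:
        args.append("-g")
    return args
```

For instance, requesting GUI mode for a single file produces a list ending in -s and -g, so the flat ASCII formatting is suppressed as the -g description advises.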

SearchDoc

SearchDoc is an interactive, full-screen interface for browsing the SearchServer online documentation. It is compatible with character-based terminals or terminal emulators.

In all Microsoft Windows environments, SearchDoc is a Windows application.

Command Line

The syntax for SearchDoc is:

srchdoc

For More Information About

• using SearchDoc to search the entire SearchServer and SearchBuilder documentation, refer to Fulcrum SearchServer Getting Started.

Appendix B:

Text Readers

This appendix describes the capabilities and limitations of the text readers shipped with SearchServer.

The Fulcrum Multi-Format Text Reader (ftmf)

The Fulcrum Multi-Format Text Reader (MFTR) translates text from a wide variety of document formats into the Fulcrum Technologies Document Format (FTDF).

The MFTR is now the recommended way of accessing WordPerfect and Microsoft Word documents for all new applications.

Text Reader Syntax

There are a number of text reader parameters that allow some of the default text reader settings to be changed when the document is opened. The format of the Fulcrum Multi-Format Text Reader specification (ftmf) with its optional parameters is:

ftmf [/c=<setcode>] [/h=[d][i]] [/i=<translation>] [/s] [/tp=<field number>] [/z]

These parameters are interpreted as follows:

/c=<setcode>

The /c parameter is used to recognize WordPerfect documents prepared with the Europa3 character set. The /c parameter is specified in the following way:

/c=e

/h

The /h parameter is used to choose whether or not hidden text is displayed and/or indexed. By default, hidden text is not displayed or indexed. Table B-1 shows the valid parameters and their descriptions.

Table B-1 Valid /h Parameters

Parameter      Description
h=             hidden text is not displayed and not indexed
h=d            hidden text is displayed, but not indexed
h=i            hidden text is indexed, but not displayed
h=di or h=id   hidden text is displayed and indexed

Note: All hidden text always appears in the output stream. The /h parameter controls whether this text is surrounded by control sequences to suppress display and/or indexing.

/i=<translation>

The /i parameter is used to override the default character set used to translate the document. The document is assumed to be in the character set specified by this parameter. The values supported for this parameter are the same as those for the nti text reader, except that the default is WIN_LATIN1. For example:

text reader list: ftmf/i=WIN_LATIN2

Note: For WordPerfect documents, this parameter has no effect. Use the /c parameter instead.

/s

The /s parameter is used to suppress the loading of document summary information into reserved columns.

/tp=<field number>

The /tp parameter specifies the field number to be used to store the total page count of the document. If this parameter is omitted, the total page count is not stored. If the /tp parameter is specified, then 1 plus the number of page breaks stored in the document is recorded as the page count in the column specified by the field number.

Note: Depending on the document format, the type of page breaks used in the document can affect the page count. For more information, see "Page Breaks" later in this appendix.

/z

The /z parameter disables heuristic processing. Normally, the MFTR attempts to positively identify a wide variety of document formats. When the format is not identifiable, the MFTR uses a heuristic process to discriminate flat text data from other data. It assumes that flat text does not include control characters in the ranges 0x00-0x06, 0x0e-0x1a, and 0x7f. For files presumed (by this algorithm) to be flat text, the MFTR uses its ANSI8 filter. You would use the /z parameter if you have flat text documents that contain any of these control codes.

CAUTION: If you disable this heuristic processing on a table that uses directory expansion on arbitrary directories, the MFTR treats as flat text all files whose formats are not positively identified. This can increase the overhead for indexing and searching time, and for table management file space.
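The flat-text test described for the /z parameter can be illustrated in a few lines of Python. This is a sketch of the stated rule only (no bytes in 0x00-0x06, 0x0e-0x1a, or 0x7f); the names are hypothetical, and how much of a file the MFTR actually examines is not stated in this guide, so the sketch simply checks all the bytes it is given.

```python
# Control codes that the heuristic treats as evidence of non-text data.
NON_TEXT_CODES = frozenset(
    list(range(0x00, 0x07)) + list(range(0x0E, 0x1B)) + [0x7F]
)


def looks_like_flat_text(data):
    """Return True if the bytes contain none of the excluded control codes."""
    return not any(byte in NON_TEXT_CODES for byte in data)
```

Tab (0x09), line feed (0x0a), and carriage return (0x0d) fall outside the excluded ranges, so ordinary text passes, while a file containing a NUL byte or 0x1a would be rejected unless heuristic processing is disabled with /z.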

Supported Document Formats

This text reader supports a wide variety of word processing document formats, including WordPerfect, Microsoft Word, and Ami Pro. For a complete list, see the FLISTSS.TXT file included in your install directory.

Support for non-word processing formats, such as presentations and spreadsheets, is also available.

Supported Document Features

The following document features are supported:

bold
character translation
first line indent
font name
font size
hyphen (hard/soft)
hidden text (Word for Windows, WordPerfect and Ami Pro only)
indent (left/right)
initial left and right margins
italics
justification (left, right, center, full)
page break (hard/soft)
paragraph break
space (non-breaking)
subscript
summary info
superscript
tab
tab settings
tables
underline

Note: The representation of tables is an approximation.

Unsupported Document Features

The following document features are not supported:

annotations
bookmarks
borders
character color
comments
footnotes
frames
glossaries
graphics
headers
line numbering
line spacing
macros
margins
multiple columns
paragraph spacing
set page number
text boxes

The MFTR also does not support

• bullet marks, paragraph numbers, or other non-stored text if it is a paragraph attribute rather than stored text

• Microsoft Word files that are password protected

• Microsoft Word 6.0 INSERT/FORM FIELD TEXT, CHECK BOX, and DROP DOWN

Summary Information for the Search Result

Most word processors attach information to the document to identify the author, subject, and so on. You'll typically want to search this summary information explicitly and to display it in the search result list of your application.

When row data is indexed, SearchServer copies the contents of certain fields in the summary information to reserved columns in the table. This can be suppressed with the /s parameter described in "Text Reader Syntax," earlier in this section.

Note: Summary information data is copied to these columns each time the row is re-indexed. Therefore you should not update these columns, as this data will be overwritten when the row is re-indexed.

The reserved columns that SearchServer uses for summary information are as follows:

FT_OWNER

The Author summary information is copied to the FT_OWNER reserved column. SearchServer fills FT_OWNER with the document's owner name during directory expansion. The MFTR subsequently overwrites FT_OWNER with the Author summary information during indexing.

FT_KEYWORDS

The Keywords summary information is copied to the FT_KEYWORDS reserved column.

FT_SUBJECT

The Subject summary information is copied to the FT_SUBJECT reserved column. However, if no Subject information is supplied, the Title information is used.
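The reserved-column rules described above can be summarized in a short Python sketch. It is illustrative only: the dictionary keys "Author", "Keywords", "Subject", and "Title" are hypothetical names for the summary fields, and leaving FT_OWNER at the owner name when no Author is present is an assumption, since this guide does not describe that case.

```python
def reserved_columns(summary, owner_name):
    """Return the reserved-column values implied by a document's summary info."""
    # FT_OWNER starts as the owner name set during directory expansion.
    columns = {"FT_OWNER": owner_name}
    if summary.get("Author"):
        # Indexing overwrites the owner name with the Author field.
        columns["FT_OWNER"] = summary["Author"]
    if summary.get("Keywords"):
        columns["FT_KEYWORDS"] = summary["Keywords"]
    # Subject is preferred; Title is used only when no Subject is supplied.
    subject = summary.get("Subject") or summary.get("Title")
    if subject:
        columns["FT_SUBJECT"] = subject
    return columns
```

Because these values are rewritten each time a row is re-indexed, an application would treat the resulting columns as read-only.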

Other Reserved Columns

This text reader writes values into the FT_FORMAT and FT_ORIGINAL_SIZE reserved columns. For FT_FORMAT, the possible values are documented in the SCCFI.H file. The value for FT_ORIGINAL_SIZE is the original size of the external document.

Page Breaks

Word processors use two types of page breaks, usually referred to as "hard" and "soft" breaks. A hard page break is inserted by the document author to ensure the material which follows it appears on a new page. Soft page breaks are inserted by the word processor as part of its pagination function. Their positions are calculated based on page size, margins, size of type, etc.

Most word processors store page breaks in the document. However, some (such as WordPerfect) store both hard and soft breaks, while others (such as Microsoft Word and AmiPro) store only hard breaks. The MFTR emits a page break when it encounters either type of page break stored in the document.

Documents in formats that store only hard page breaks can cause the text reader to produce an inaccurate page count. If you require a correct page count, Microsoft Word or AmiPro documents should be prepared with a hard page break at the end of each page.

For more information about the text reader's page count function, see the /tp parameter described in "Text Reader Syntax" earlier in this section.
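The effect of stored page breaks on the /tp page count can be worked through with a small Python sketch. The page count is 1 plus the number of page breaks stored in the document; the list-of-strings representation of breaks and the function names are assumptions made for illustration.

```python
def stored_breaks(breaks, stores_soft_breaks):
    """Breaks that survive in the file: hard breaks always, soft breaks
    only if the document format stores them."""
    return [b for b in breaks if b == "hard" or stores_soft_breaks]


def page_count(breaks, stores_soft_breaks):
    """Page count as recorded by /tp: 1 plus the number of stored breaks."""
    return 1 + len(stored_breaks(breaks, stores_soft_breaks))
```

Consider a four-page document whose pages are separated by a soft, a hard, and another soft break. In a format that stores both kinds of break, page_count(["soft", "hard", "soft"], True) gives 4. In a format that stores only hard breaks, page_count(["soft", "hard", "soft"], False) gives 2, which is why a hard page break at the end of each page is needed for an accurate count.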

Redistributing the Supported Formats

Each supported document format has an associated dynamic link library (DLL) that must be included with your application. It is not necessary to include all of the DLLs, just those corresponding to formats that your application supports. If your application is then used with a different document format, SearchServer will report the problem in a SQLSTATE.

The Fulcrum Library Text Reader (l)

A Fulcrum library file is a single operating system file that contains multiple logical documents. These documents are called library documents. A single document library file can contain all of the external documents that are to be included in a table, or it can be supplemented with additional external documents and document library files.

The Fulcrum library text reader allows the individual indexing of library documents as if they were single operating system files. This text reader also supports automatic library file expansion during table validation. This means that after you create the document library file and include it in a table, the library documents in that file are automatically referenced by new rows inserted into the table and then indexed in the usual way through the execution of a VALIDATEINDEX statement with the VALIDATETABLE parameter.

The input and output processing is shared between the Fulcrum library text reader and typically two other text readers. The Fulcrum library text reader retrieves the individual documents from a document library file, but translation to the FTDF and the basic input operations must be performed by other text readers.

Individual external documents must be loaded into a document library file before the associated table is indexed. SearchServer is distributed with the ftlin and ftlout utility programs, which you can use to load and unload document library files.

Text Reader Syntax

The Fulcrum library text reader requires an expansion text reader list:

l/x[:<storage access text reader list>]! [<document text reader list>:]l/r/@ [:<storage access text reader list>]

The document text reader list is optional if the documents are stored as FTDF ASCII. Otherwise, it consists of a document format text reader, possibly with FTDF parsing text readers to its left.

The storage access text reader list is optional if the library is stored in an operating system file.

Offsets are supplied by the expansion text reader automatically during the indexing of library documents, and these offsets are placed in the model text reader list at the place marked by an at-sign (@).

The relative offset is defined as the byte offset from the beginning of a library file, to the header of the document record. The offsets are expressed as eight-character sequences of hexadecimal digits.
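As a minimal sketch of the substitution described above, the following Python fragment formats a byte offset as eight hexadecimal digits and places it at the at-sign of a model text reader list. The function names are hypothetical and the use of uppercase digits is an assumption; in practice the expansion text reader performs this step itself.

```python
def format_offset(offset: int) -> str:
    """Render a byte offset as an eight-character hexadecimal string."""
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("offset does not fit in eight hexadecimal digits")
    return f"{offset:08X}"  # uppercase digits are an assumption


def fill_model_list(model_list: str, offset: int) -> str:
    """Replace the '@' placeholder in a model text reader list with the offset."""
    if model_list.count("@") != 1:
        raise ValueError("model text reader list must contain exactly one '@'")
    return model_list.replace("@", format_offset(offset))


if __name__ == "__main__":
    # 0x1A2C is an arbitrary illustrative offset.
    print(fill_model_list("ftmf:l/r/@:s", 0x1A2C))
```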


Example

The following is an example of a Fulcrum library text reader list:

l/x:s!ftmf:l/r/@:s

where

l/x:s is the expansion text reader list, where

    /x      expansion mode parameter
    s       standard s text reader

! separates the expansion text reader list from the model text reader list

ftmf:l/r/@:s is the model text reader list, where

    ftmf    document text reader to be used on each library member
    /r      retrieval mode parameter
    /@      permits automatic substitution of the offset value
    s       standard s text reader

The Translation Text Reader (nti)

The translation text reader translates text from an external character set to FTICS. It is distributed with translation tables that determine the mapping of characters between FTICS and the external character sets.

Text Reader Syntax

The format of the translation text reader is:

nti[<selector>] | {[/m=[i or o]] | [/t=<translator name>]}

<selector>

Selects the character set mapping used internally by the translation text reader. It is a single character chosen from the following table of supported character sets.

Table B-2 External Character Sets and Short Table Names for nti Text Reader

Translation Table Name   Selector   External Character Set   Internal Character Set
ASCII                    a          ASCII (7-bit)            FTCS94
MACINTOSH                c          Macintosh                FTCS94
DOS                      d          DOS                      FTCS94
EUROPA3                  e          Windows Europa3          EFTCS94
ISO_LATIN1               i          ISO 8859-1 (Latin-1)     FTCS94
WIN_LATIN1               n          Windows Latin-1          FTCS94
ISO_LATIN2               x          ISO 8859-2 (Latin-2)     FTCS94
WIN_LATIN2               y          Windows Latin-2          FTCS94

Note: The long table names correspond to the character set translation table names in FULTEXT.EFT. Translation tables can be customized. See the Fulcrum SearchServer Customization Guide for details.

For example, to interpret a text file that is coded in the ISO 8859-2 character set, the following text reader list is used:

ntix

/m Indicates whether the mode is input (i) or output (o). Input mode indicates translation from an Application Character Set (ACS) to the Fulcrum Technologies Internal Character Set (FTICS). Output mode indicates translation from FTICS to an ACS. If this option is omitted, the default is output mode.

/t Specifies the translator name. It can be one or more characters in length. If this option is omitted, the default character set translator (z) is used.

If the short table name is omitted, the default for your platform is used:

Table B-3 Platform Default Translation Specification

Platform          Default Translation Table Name
16-bit Windows    DOS
32-bit Windows    WIN_LATIN1
UNIX              ASCII
Macintosh         MACINTOSH

Additional character sets can be supported by defining custom translation tables. You need the SearchServer Customization Tools to enable this feature.
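The selector and platform-default rules can be summarized in a small lookup. This is only an illustrative Python sketch built from Tables B-2 and B-3; the function names are hypothetical, and it does not call SearchServer or read FULTEXT.EFT.

```python
# Translation table name -> selector letter, as listed in Table B-2.
SELECTORS = {
    "ASCII": "a",
    "MACINTOSH": "c",
    "DOS": "d",
    "EUROPA3": "e",
    "ISO_LATIN1": "i",
    "WIN_LATIN1": "n",
    "ISO_LATIN2": "x",
    "WIN_LATIN2": "y",
}

# Platform -> default translation table, as listed in Table B-3.
PLATFORM_DEFAULTS = {
    "16-bit Windows": "DOS",
    "32-bit Windows": "WIN_LATIN1",
    "UNIX": "ASCII",
    "Macintosh": "MACINTOSH",
}


def nti_reader(table_name):
    """Return the nti text reader name for a translation table."""
    return "nti" + SELECTORS[table_name]


def effective_table(selector, platform):
    """Return the table that applies; selector=None means it was omitted."""
    if selector is None:
        return PLATFORM_DEFAULTS[platform]
    for name, letter in SELECTORS.items():
        if letter == selector:
            return name
    raise ValueError(f"unknown selector: {selector!r}")


if __name__ == "__main__":
    print(nti_reader("ISO_LATIN2"))       # the ISO 8859-2 example from the text
    print(effective_table(None, "UNIX"))  # selector omitted on UNIX
```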


The Test Text Reader (t)

Control sequences and other binary information can be embedded in the text of external documents. However, many text editors make it difficult to embed such sequences in the text. The test text reader allows you to embed character sequences in a document that will then be converted into binary codes by the text reader. These binary codes can then be interpreted by SearchServer.

The test text reader should be used instead of the translation text reader when external character set to FTICS translation is required and arbitrary control sequences are embedded in the external document.

CAUTION: This text reader converts sequences of printable characters into control characters. This changes the FTDF stream that represents the document original in such a way that no viewer acting on the original stream is able to display the same text that appears in the output from the t text reader.

Due to this limitation, you should never use the t text reader in the FT_FLIST reserved column of a table. Instead, use it only with the ftpr utility to produce documents that are stored in FTDF form. This makes ftpr the authoring application and the FTDF form the original.

Text Reader Syntax

The syntax for the text reader is:

t/i=<translation table name>

/i Controls character set conversion. If this option is omitted, the default for your platform, as specified for the nti text reader, is used.

If the input stream is already in FTICS, character set translation is undesirable. The translation can be suppressed by using the /i=0 option. For example, you can use this option if the test text reader is used after a text reader that converts the document to FTICS.

<translation table name>

Names the translation table. See the description of the nti text reader for details.
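The character sequences that the test text reader recognizes are listed in Table B-4 under "Character Set Conversion". As a rough illustration of how such sequences could be interpreted, the Python sketch below turns a string into a list of (character set, code) pairs. It is not the text reader's actual implementation: the function name is hypothetical, the caret notation is assumed to follow the usual convention (^Z is 0x1A), octal values are assumed to be limited to one byte, and the later ECS-to-FTICS translation is left out.

```python
import re

# Two-character escapes from Table B-4: next character -> (character set, code).
SIMPLE = {
    "\\": ("ECS", 0x5C), "^": ("ECS", 0x5E),
    "E": ("FTICS", 0x1B), "C": ("FTICS", 0x9B),
    "f": ("ECS", 0x0C), "n": ("ECS", 0x0A), "r": ("ECS", 0x0D),
    "t": ("ECS", 0x09), "b": ("ECS", 0x08),
}
HEX_ECS = re.compile(r"0x([0-9A-Fa-f]{2})")    # \0xnn
HEX_FTICS = re.compile(r"Fx([0-9A-Fa-f]{2})")  # \Fxnn
OCTAL = re.compile(r"[0-3][0-7]{2}")           # \nnn, limited to one byte


def decode(text):
    """Interpret test-text-reader style escapes in an ASCII string."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            start = i + 1
            m = HEX_ECS.match(text, start)
            if m:
                out.append(("ECS", int(m.group(1), 16)))
                i = m.end()
                continue
            m = HEX_FTICS.match(text, start)
            if m:
                out.append(("FTICS", int(m.group(1), 16)))
                i = m.end()
                continue
            m = OCTAL.match(text, start)
            if m:
                out.append(("ECS", int(m.group(0), 8)))
                i = m.end()
                continue
            nxt = text[start]
            # A backslash before any unlisted character is ignored.
            out.append(SIMPLE.get(nxt, ("ECS", ord(nxt))))
            i = start + 1
        elif (ch == "^" and i + 1 < len(text)
              and text[i + 1].isascii() and text[i + 1].isalpha()):
            # Assumed conventional caret notation: ^Z -> 0x1A.
            out.append(("ECS", ord(text[i + 1].upper()) & 0x1F))
            i += 2
        else:
            out.append(("ECS", ord(ch)))
            i += 1
    return out
```

Codes tagged "ECS" would still pass through the character set translation selected by /i, while codes tagged "FTICS" would be emitted unchanged.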


Character Set Conversion

Following the conversion of the characters to binary codes, the codes in the external character set (ECS) are converted to FTICS codes. Note that some character sequences are converted directly to FTICS codes (for example, \Fxnn) and no further conversions are done.

The test text reader interprets character sequences as follows:

Table B-4 External Character Translation using the Test Text Reader

External Character Set Representation   Value Output by Test Text Reader
\\                                      ECS backslash
\E                                      FTICS escape (x1B)
\C                                      FTICS control sequence introducer (x9B)
\^                                      ECS caret (^)
\nnn (for example, \032)                ECS octal value
\0xnn (for example, \0xA3)              ECS hexadecimal value
\Fxnn (for example, \FxA3)              FTICS hexadecimal value
^a (for example, ^Z)                    ECS control character
\f                                      ECS form feed (x0C)
\n                                      ECS line feed (x0A)
\r                                      ECS return (x0D)
\t                                      ECS tab (x09)
\b                                      ECS backspace (x08)

A backslash (\) preceding any character other than those listed in the preceding table is ignored. For example, \B is converted to the single character B.

Note: \f, \r and \b have no FTICS equivalent.

The C Source Code Text Reader (r)

The C source code text reader divides a document (presumed to contain C language source code) into three zones:

• comments (zone number 121)

• quoted strings (zone number 119)

• other text (zone number 32, the default for the external text column)

The CSOURCE.STP stop file is included with the software and is useful for C source tables.
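To make the three-zone division concrete, here is a simplified Python sketch that splits C source text into spans tagged with the zone numbers given above. It is an approximation for illustration only: the function name is hypothetical, it recognizes only /* */ comments and double-quoted strings, it does not handle character literals, and whether the real text reader includes the delimiters in the zone is not stated in this guide.

```python
import re

COMMENT_ZONE = 121
STRING_ZONE = 119
OTHER_ZONE = 32

# A block comment, or a double-quoted string with backslash escapes.
TOKEN = re.compile(r'/\*.*?\*/|"(?:\\.|[^"\\])*"', re.DOTALL)


def split_zones(source):
    """Return a list of (zone number, text) spans covering the source."""
    spans = []
    pos = 0
    for match in TOKEN.finditer(source):
        if match.start() > pos:
            spans.append((OTHER_ZONE, source[pos:match.start()]))
        text = match.group()
        zone = COMMENT_ZONE if text.startswith("/*") else STRING_ZONE
        spans.append((zone, text))
        pos = match.end()
    if pos < len(source):
        spans.append((OTHER_ZONE, source[pos:]))
    return spans


if __name__ == "__main__":
    for zone, text in split_zones('int x; /* n */ puts("hi");'):
        print(zone, repr(text))
```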

The Directory Expansion Text Reader (dir)

The directory expansion text reader causes a row containing a directory name to be expanded into several rows, each of which contains a document found in this directory. The expansion can produce other directory entries. This expansion is performed through the execution of a VALIDATEINDEX statement that specifies the VALIDATETABLE parameter.
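The effect of the expansion can be pictured with a short Python sketch that lists one directory level and produces one entry per item found. This only illustrates the idea; the function name and the tuple layout are hypothetical, and the real expansion happens inside SearchServer during table validation.

```python
import os


def expand_directory(directory):
    """Return one (kind, path) entry per item found directly in a directory."""
    rows = []
    with os.scandir(directory) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            # Subdirectories become directory entries that can be expanded again.
            kind = "directory" if entry.is_dir() else "document"
            rows.append((kind, entry.path))
    return rows
```

Calling the function again on each "directory" entry mirrors the way an expansion can produce further directory entries.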

The Standard Text Reader (s)

The standard text reader is a storage access text reader that reads standard operating system files. This text reader is assumed when no storage access text reader is specified.

This text reader accepts an optional offset that can be used with a Fulcrum document library generated by the ftlin utility. (The offsets are also generated by the same utility, as a table data file.) In this case, performance is optimized by keeping the library file open between table rows that reference the same library file.

The PDF Text Reader (pdf)

The PDF text reader translates Portable Document Format (PDF) files into the Fulcrum Technologies Document Format (FTDF). It also supports retrieval of original PDF document data.

The PDF text reader is supported on Windows NT and Solaris platforms only.

Text Reader Syntax

The syntax for the text reader is:

pdf [/d=<temp file>] [/o=0|1]

/d=<temp file> The Dir (/d) parameter specifies an optional directory to use for a temporary file. This option is ignored if this is the last text reader in the chain.

/o The Order (/o) parameter determines whether the text reader extracts text in XY (0) or PDF (1) order (see "Extraction Order" below).


Specifying Text Reader Parameters in the Windows NT Registry

The PDF text reader can also accept parameters specified in the Windows NT registry, but parameters specified in the text reader list take precedence. The PDF parameters are stored by name under HKEY_LOCAL_MACHINE in \SOFTWARE\Fulcrum\PDF Text Reader\1.1\PDF Text Reader. You should change only the parameters that are listed under "Text Reader Syntax", above.

Specifying Text Reader Parameters in Unix

The PDF text reader can also accept parameters specified in the FTPDF.INI file or as environment variables, but parameters specified in the text reader list take precedence.

Using the FTPDF.INI File

You can specify parameters in the FTPDF.INI file using either the parameter name or its abbreviation (see "Text Reader Syntax" above). For example: "Order=1" or "o=1"

Using Environment Variables

You can specify parameters as environment variables. Prefix the parameter name with "FULPDF_". For example:

setenv FULPDF_order 1

Parameters specified in environment variables take precedence over parameters specified in the INI file.
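The precedence on Unix (text reader list first, then environment variables, then the INI file) can be sketched as follows for the Order parameter. This Python fragment is illustrative only: the function and argument names are hypothetical, the INI file is represented as an already-parsed dictionary, and matching names without regard to case is an assumption.

```python
def resolve_order(reader_list_value, environ, ini_values, default="0"):
    """Pick the Order value from the highest-precedence source that sets it."""
    # 1. A value given in the text reader list (for example, pdf/o=1) wins.
    if reader_list_value is not None:
        return reader_list_value
    # 2. Next, an environment variable prefixed with FULPDF_.
    for name, value in environ.items():
        if name.lower() == "fulpdf_order":  # case-insensitive match is assumed
            return value
    # 3. Next, the INI file, by parameter name or abbreviation.
    for name, value in ini_values.items():
        if name.lower() in ("order", "o"):
            return value
    # 4. Otherwise the default, XY order (0).
    return default


if __name__ == "__main__":
    print(resolve_order(None, {"FULPDF_order": "1"}, {"Order": "0"}))
```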

Text Reader List

Although you can use "pdf:s" as the text reader list, "pdf" provides better performance.

Summary Information

The PDF text reader inserts the following information into the SearchServer table during indexing:

SearchServer Column Name   Field ID   PDF Information
FT_FORMAT                  102        Format ID 3003
FT_ORIGINAL_SIZE           103        Original file size in bytes
FT_DNAME                   110        Document Title
FT_OWNER                   97         Document Author
FT_SUBJECT                 107        Document Subject
FT_KEYWORDS                104        Document Keywords


Extraction Order

The PDF text reader extracts the text from the document in one of two orders: XY or PDF. Using XY order, the text reader extracts text from the document by following the X (left to right) and Y (top to bottom) coordinates on each page. Using PDF order, the text reader extracts text in the order that it encounters PDF entities.

For example, if a document contains this two-column piece of text:

Humpty Dumpty sat on a wall,        Hickory, Dickory, Dock.
Humpty Dumpty had a great fall.     The mouse ran up the clock.

XY ordering would read the text from the PDF document as:

Humpty Dumpty sat on a wall, Hickory, Dickory, Dock. Humpty Dumpty had a great fall. The mouse ran up the clock.

While PDF ordering would extract the text as:

Humpty Dumpty sat on a wall, Humpty Dumpty had a great fall. Hickory, Dickory, Dock. The mouse ran up the clock.

The default order is XY.
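The difference between the two orders can be reproduced with a few lines of Python. The sketch below models each piece of text as a hypothetical (x, y, text) fragment, with y increasing down the page, listed in the order a PDF file might store it; the coordinates and function names are illustrative and are not taken from the text reader.

```python
# Fragments in the order they are encountered in the file: left column first.
FRAGMENTS = [
    (0, 0, "Humpty Dumpty sat on a wall,"),
    (0, 1, "Humpty Dumpty had a great fall."),
    (1, 0, "Hickory, Dickory, Dock."),
    (1, 1, "The mouse ran up the clock."),
]


def xy_order(fragments):
    """Read top to bottom, and left to right within each line."""
    ordered = sorted(fragments, key=lambda frag: (frag[1], frag[0]))
    return [text for _, _, text in ordered]


def pdf_order(fragments):
    """Keep the order in which the fragments were encountered."""
    return [text for _, _, text in fragments]


if __name__ == "__main__":
    print(" ".join(xy_order(FRAGMENTS)))
    print(" ".join(pdf_order(FRAGMENTS)))
```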


Appendix C: Table Management Files

This appendix provides information about the table management files and the data storage requirements needed for them.

Overview

Each table has an associated set of table management files. They allow SearchServer to manage and access the searchable data.

Data values for internal columns are stored in the table management data files. Indexing information is retained in the table management index files. SearchServer relies on this information to reference the exact location of all the words that make up the searchable data. A large table has large table management files.

With the exception of the configuration file and indexing log file, the table management files are binary files that can be read only by the server.

Figure C-1 The Configuration File Identifies the Other Table Management Files

The Configuration File

A configuration file is created when you define a new table. It is named according to the table name compound identifier in the CREATE TABLE clause of the CREATE SCHEMA statement or the CREATE TABLE statement. To name the configuration file, SearchServer appends the extension .CFG to the table name.

The configuration file is an ASCII file that identifies the other table management files and their locations. The other table management files include the data files, the dictionary, the reference file and the index log file. A differential file is also created to support an IMMEDIATE table. These files are integral to the table they are associated with.

Editing the Configuration File

You can use an ASCII text editor to modify the configuration file.

You will have to access the configuration file if you want to change the BASEPATH associated with a table because you're going to move one or more external documents from one directory or computer system to another.

Optionally you can access the configuration file to:

• let accented characters be indexed

• let the stop file be read by a text reader

• name a second temporary work directory for indexing

What Does the Configuration File Look Like?

There is one entry per line in the configuration file. Each associates a value with a keyword. The value specifies a file or a directory, and the form is as follows:

<keyword>=<value> -or- <keyword>:<value>

When the equals symbol (=) is used, a filename in the value can include a full or relative path specification. If a path is not specified, the path to the table's configuration file is prepended dynamically as required.

When the pair is separated with a colon (:), the value represents data other than a file or directory. In this case, value can be a single word, or multiple words enclosed in double quotation marks (").

The following example shows a configuration file that could have been created by SearchServer and edited later in a directory called /usr/myhome/fulcrum/fultext in a UNIX environment. Comments have been added on the right-hand side of the file.

CAT=searchme.cat     name of catalog file
CIX=searchme.cix     name of catalog map file
DCT=searchme.dct     name of dictionary file
REF=searchme.ref     name of reference file
DYX=searchme.dyx     name of differential file
DUP=searchme.dup     name of temporary dictionary file
RUP=searchme.rup     name of temporary reference file
SRT=searchme.srt     name of intermediate sort file
SWK=/var/tmp         name of WORKDIR directory
LOG=searchme.log     name of indexing log file
STP=fultext.stp      name of stop file
PTH=../docdir/       BASEPATH specification

The associated table name is searchme, and the example assumes that /usr/myhome/fulcrum/fultext contains a stop file called fultext.stp. During indexing, SearchServer creates the intermediate sort files in the /var/tmp directory.

This example contains most of the entries you could find in a table's configuration file. If they're needed, you can add another three entries using a standard text editor, as described in the section, "Preparing for Indexing," in Chapter 5. They are STF:, OPT:, and SW2=.
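The two entry forms can be illustrated with a small Python parser. This is a sketch under stated assumptions rather than SearchServer's own logic: the function name is hypothetical, entries are expected without the explanatory comments shown in the example above, every non-absolute value after an equals sign is resolved against the configuration file's directory, and the OPT: value used in the demonstration is invented.

```python
import posixpath


def parse_config(text, config_dir):
    """Parse <keyword>=<value> and <keyword>:<value> lines into a dictionary."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if not positions:
            continue
        sep = min(positions)  # the first separator decides the entry form
        keyword, value = line[:sep], line[sep + 1:].strip()
        if line[sep] == "=":
            # File or directory: prepend the configuration file's directory
            # when the value is not an absolute path (normpath is lexical only).
            if not posixpath.isabs(value):
                value = posixpath.normpath(posixpath.join(config_dir, value))
            entries[keyword] = ("path", value)
        else:
            # Other data: a single word, or words in double quotation marks.
            if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                value = value[1:-1]
            entries[keyword] = ("data", value)
    return entries


if __name__ == "__main__":
    sample = 'CAT=searchme.cat\nSWK=/var/tmp\nPTH=../docdir/\nOPT:"two words"\n'
    for key, entry in parse_config(sample, "/usr/myhome/fulcrum/fultext").items():
        print(key, entry)
```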

Where Do the Entries Come From?

The values of some of the keywords are specified in the CREATE TABLE clause of the CREATE SCHEMA statement or the CREATE TABLE statement. Table C-1 shows how the entries are derived from CREATE TABLE parameters (directly or indirectly), and how they are reflected in the columns of the TABLES system information table. An explanation of the entries follows the table.

Table C-1 How Configuration File Entries Reflect CREATE TABLE Clause Parameters

Configuration File Entry   CREATE TABLE Clause Table Parameter   TABLES Column Name
CAT=                       <table name>.cat
CIX=                       <table name>.cix
DCT=                       <table name>.dct
REF=                       <table name>.ref
DYX=                       <table name>.dyx and IMMEDIATE        FTT_IMMEDIATE
DUP=                       <table name>.dup
RUP=                       <table name>.rup
SRT=                       <table name>.srt
SWK=                       WORKDIR <work directory>              FTT_WORKDIR
LOG=                       <table name>.log                      FTT_INDEXLOG
STP=                       STOPFILE <stop filename>              FTT_STOPFILE
PTH=                       BASEPATH <base path>                  FTT_BASEPATH
                           INDEXDIR <index directory>            FTT_INDEXDIR

For more information about the CREATE TABLE clause parameters and system information tables, see the Fulcrum SearchServer SearchSQL Reference.

The files named by the CAT=, CIX=, DCT=, REF=, DYX=, DUP=, RUP=, and SRT= entries are created in the directory named in the INDEXDIR table parameter of the CREATE TABLE clause. It must be a directory to which SearchServer has read, write, and search access.

If the INDEXDIR table parameter isn't specified, or the CREATE TABLE statement is used, the files are placed in the same directory as the configuration file. You can view the name of the INDEXDIR directory by selecting the FTT_INDEXDIR column of the TABLES system table.

When the IMMEDIATE table parameter is specified in the CREATE TABLE clause, or the CREATE TABLE statement is used, a DYX= entry and a file called tablename.DYX are created. This means that the table makes use of immediate indexing. You can determine whether a table has an immediate index by selecting the FTT_IMMEDIATE column of the TABLES system table.

A temporary work directory must exist by the time indexing happens. You can name it through the WORKDIR table parameter of the CREATE TABLE clause, or you can add it to the configuration file through the SWK= entry. If the WORKDIR table parameter isn't specified, or the CREATE TABLE statement is used, the FULTEMP server attribute is consulted for the name of a default temporary work directory. You can look at the name of the WORKDIR directory by selecting the FTT_WORKDIR column of the TABLES system table.

The name assigned to the LOG= entry is tablename.LOG. The contents of this file can be selected through the FTT_INDEXLOG column of the TABLES system table.

The STP= entry names the stop file according to the STOPFILE table parameter of the CREATE TABLE clause. You can look at the name of the stop file associated with a table by selecting the FTT_STOPFILE column of the TABLES system table. If the CREATE TABLE statement is used to create the table, the default stop filename, FULTEXT.STP, is used.

The PTH= entry defines the path to the external documents associated with a table. If the path defined in the configuration file is a relative path, it is interpreted as being relative to the directory in which the configuration file resides. The path is specified through the BASEPATH table parameter of the CREATE TABLE clause.

You can look at the value of the BASEPATH table parameter by selecting the FTT_BASEPATH column of the TABLES system table. If there is no PTH= entry in the configuration file, or the CREATE TABLE statement is used, external documents are located relative to the directory that contains the table's configuration file.


The Table Management Data Files

Each table has table management data files. These files contain a pointer and a status flag for each record and one record for each row in a table.

Records are divided into fields. Each field directly corresponds to one column in a table. Indexing makes data directly searchable. The data values of all internal columns are stored in the table management data files. The only data not stored in these files are the contents of external documents. This external text can be stored separately from the table management data files in another area of the file system directory structure.

When making copies of the table management files for backup purposes, be sure that no inserts, updates or deletes are issued while the copies are underway. Otherwise, the integrity of the copies may be compromised.

Data Storage Requirements

SearchServer allocates as much space as needed for each row in a table. The space is allocated in segments of 128 bytes. Three of the bytes in a segment aren't available for data storage. To estimate how much space a row will need, multiply the number of columns in the row by three, and add the result to the total size of the data in all columns except the external text column, as in:

row space = total size of data in all internal columns + 3 x (total number of existing columns in row)

Round the result up to the nearest multiple of 128 bytes, after allowing for three bytes of overhead per segment.

The total size of the table management data files would be approximately:

table management data files size = 256 bytes + (total space for all rows) + 30 x (number of all columns and zones in schema) + (total length of all column and zone names) + 6 bytes + 6 x (total number of all rows (including any deleted rows))
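The two estimates above can be worked through in Python. This is one reading of the rules, offered as a sketch: the function names are hypothetical, and the rounding step assumes that each 128-byte segment holds 125 usable bytes once its three bytes of overhead are set aside.

```python
SEGMENT_BYTES = 128
SEGMENT_OVERHEAD = 3
USABLE_BYTES = SEGMENT_BYTES - SEGMENT_OVERHEAD  # 125 bytes per segment


def row_space(internal_data_bytes, column_count):
    """Estimate the space allocated for one row, in bytes."""
    needed = internal_data_bytes + 3 * column_count
    segments = (needed + USABLE_BYTES - 1) // USABLE_BYTES  # round up
    return segments * SEGMENT_BYTES


def data_files_size(total_row_space, columns_and_zones, name_length, row_count):
    """Estimate the total size of the table management data files, in bytes.

    row_count includes any deleted rows.
    """
    return (256 + total_row_space + 30 * columns_and_zones
            + name_length + 6 + 6 * row_count)


if __name__ == "__main__":
    # Illustrative figures: 10 rows of 200 data bytes and 10 columns each,
    # 12 columns and zones whose names total 100 characters.
    per_row = row_space(200, 10)
    print(per_row)
    print(data_files_size(10 * per_row, 12, 100, 10))
```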

The Table Management Index Files

This section describes the table management index files and their data storage requirements.


The Dictionary and Reference Files

Each table has a dictionary and reference file. These files, called index files, record each occurrence of every search term in a column. The dictionary file contains one entry per term, and each entry has a pointer to detailed location information in the reference file. The reference file contains the location (row and offset) of each occurrence of each search term.

Data Storage Requirements

The sizes of the dictionary and the reference file depend on the amount of data in the table, the number of unique words in the table, and the number of words in the stop file.

Some sample sizes of the dictionary and reference files for typical text are shown in Table C-2:

Table C-2 Sample Dictionary and Reference File Sizes

Data Size   Dictionary Size   Reference File Size
100 MB      7520 KB           40 KB
4 MB        2075 KB           1.5 MB

Dictionary growth slows as the amount of text increases, but the reference file grows linearly, at approximately 30 percent of the data size.

Reversed Dictionary Impact

In order to accommodate prefix wildcard searches or prefix root expansion searches, each conventional dictionary term has its mirror image in a reversed dictionary. The reversed dictionary doubles the size of the dictionary file for periodic tables, and doubles the storage of the differential file for immediate tables.

The Differential File

The differential (DYX) file is a supplementary index file that is needed only for an IMMEDIATE table. It is created when the IMMEDIATE parameter is specified in the CREATE TABLE clause or when the CREATE TABLE statement is used. The differential file is updated dynamically by any SearchServer operation that adds, modifies or deletes data.

Data Storage Requirements

The size of the differential file depends on the amount of data inserted or updated since the last indexing operation. The differential file typically requires from one to two times as much space as the data.

The differential file does not decrease in size after an indexing operation has transferred the information in the differential file to the dictionary and reference file. The existing file space is reused and extended as necessary by subsequent insert and update operations. This file space is released by executing a VALIDATEINDEX statement with the ABANDON parameter.
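The rules of thumb in this section (a reference file of roughly 30 percent of the data size, and a differential file of one to two times the data inserted or updated since the last indexing operation) can be turned into a rough planning estimate. The Python sketch below is illustrative only; the function names are hypothetical and the results are approximations, not figures produced by SearchServer.

```python
def reference_file_estimate(data_bytes):
    """Approximate reference file size as 30 percent of the data size."""
    return data_bytes * 30 // 100


def differential_file_range(changed_bytes):
    """Return the (low, high) estimate for the differential file size."""
    return (changed_bytes, 2 * changed_bytes)


if __name__ == "__main__":
    megabyte = 1024 * 1024
    print(reference_file_estimate(10 * megabyte))
    print(differential_file_range(2 * megabyte))
```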

The Indexing Log File

The indexing log file is an external character set file that records the messages generated by SearchServer when a VALIDATEINDEX statement is executed. The sizes of the table management data files, dictionary, and reference file are also recorded in the log file. You can view the contents of an index log file through the FTT_INDEXLOG column of the TABLES system table. For more information about system information tables, see the section, "The System Information Tables," in Chapter 2, "The Administration Tools."

The size of the indexing log file depends on the number of messages written to it during indexing. You can control whether messages from successive indexing operations overwrite or append to the log file with the REWIND option of the VALIDATEINDEX statement.


Appendix D: Control Characters and Control Sequences

This appendix provides information about control characters and control sequences and how you can use them to:

• change indexing attributes

• specify display attributes

Control Characters

There are two control character sets: Primary and Secondary. Their control codes are included in the FTDF stream from the text readers. All the codes associated with control sequences are the representations in the FTICS and the application character set being used. This means that the documents can include the control codes or the text reader can emit them into the text stream as it is processed. You can check for them without further translation to another character set.

Table D-1 describes the control codes in the Primary Control Character Set.

Table D-1 Primary Control Character Set

Control Code    Description

HT (0x09)       The Horizontal Tab control code is used to position to the next tab stop, as set by the last SET TAB control sequence. By default, tab stops are set to every 8 character positions.

NL (0x0A)       The New Line control code is used to position to the left-most position of the next line. Unlike NEL, the current paragraph continues and temporary paragraph format settings remain in effect.

FF (0x0C)       The Form-Feed control code is used to start a new page. The active display position is moved to the top left-most position of the next page and the page number is incremented by one.

ESC (0x1B)      The Escape control code indicates the beginning of an extended control function sequence. An ESC followed by any code in the range 0x40 to 0x5F is equivalent to the single character with the corresponding code point in the range 0x80 to 0x9F. For example, ESC [ (0x1B 0x5B) is equivalent to the control sequence introducer (CSI) (0x9B).
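The equivalence between a single code in the range 0x80 to 0x9F and its two-character ESC form is a fixed offset of 0x40, which the following Python sketch makes explicit. The function names are hypothetical, and the stream expansion is only an illustration of the translation, not SearchServer code.

```python
ESC = 0x1B


def to_esc_sequence(code):
    """Convert a single code in 0x80-0x9F to its two-byte ESC equivalent."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError("code must be in the range 0x80 to 0x9F")
    return bytes([ESC, code - 0x40])


def from_esc_sequence(sequence):
    """Convert a two-byte ESC sequence back to the single code."""
    if len(sequence) != 2 or sequence[0] != ESC or not 0x40 <= sequence[1] <= 0x5F:
        raise ValueError("expected ESC followed by a code in 0x40 to 0x5F")
    return sequence[1] + 0x40


def expand_stream(data):
    """Replace every single code in 0x80-0x9F with its ESC sequence."""
    out = bytearray()
    for byte in data:
        if 0x80 <= byte <= 0x9F:
            out += to_esc_sequence(byte)
        else:
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print(to_esc_sequence(0x9B))  # CSI as ESC followed by '['
```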


Note: The notation \E represents ESC in SearchServer documentation.

Table D-2 describes the control codes in the Secondary Control Character Set. Single character codes in the text stream are translated by SearchServer into the two-character ESC equivalent sequence.

Table D-2 Secondary Control Character Set

Control Code    ESC Sequence    Description

NEL (0x85)      (0x1B,0x45)     The Next Line control code is used to indicate the end of a paragraph. Temporary paragraph format settings (such as indent) are reset. The active display position moves to the left-most position of the next line.

PLD (0x8B)      (0x1B,0x4B)     The Partial Line Down control code indicates the beginning of a subscript, or the end of a superscript.

PLU (0x8C)      (0x1B,0x4C)     The Partial Line Up control code indicates the end of a subscript, or the beginning of a superscript.

SS2 (0x8E)      (0x1B,0x4E)     The Single Shift Two control code is used to extend the text character sets (for example, to declare line-drawing characters). These are non-locking shift codes that change the meaning of the next code only.

                                Note: External codes that do not have an FTICS equivalent are preceded by "SS2" in the FTICS stream. In the translation to the ACS during retrieval, the SS2 is stripped from the stream, leaving the original code. This technique is used, for example, in the Macintosh translation table.

                                The two-character sequence is treated like a single punctuation character for indexing purposes.

                                The character that follows this control code lies in the range 0x21-0x7E or 0xA1-0xFE.

                                Any control characters following the SS2 sequence cause the SS2 sequence to be ignored.

                                This code is not permitted within a control function sequence.

CSI (0x9B)      (0x1B,0x5B)     The Control Sequence Introducer control code indicates that the following set of characters is a control sequence.

Parsing Control Sequences

As an application designer, you are responsible for understanding the effects of all control sequences. However, an individual application or FTDF text reader typically cares about only certain control sequences.

All control sequences are built according to the syntax specified in ANSI standard X3.64-1979:

<control sequence> ::= <control sequence introducer> [<parameter> ...] [<intermediate>] <final>

<control sequence introducer> ::= 0x9B | 0x1B 0x5B

<parameter> ::= 0x30..0x3F

<intermediate> ::= 0x20..0x2F

<final> ::= 0x40..0x7E

It is possible to skip a control sequence without understanding it in detail. If a control sequence introducer (CSI) is encountered, parsing should proceed according to the syntax. Once a final has been encountered, the control sequence can be acted on.

In display, control sequences that are not understood should be scanned up to and including the point where a final is encountered, and then discarded.

In a text reader, control sequences that are not understood should be scanned up to and including the final and either passed through or discarded, depending on the requirements for that text reader.

Note: Object tags are more complex. The object tag control sequence includes a list of counts, the sum of which is the number of bytes in the complete tag. An object tag that is not acted on must be skipped by skipping the number of bytes represented by the sum of these counts. For more information, see the section called "Object Tag" later in this appendix.
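The syntax above is enough to locate the end of any ordinary control sequence without knowing what it means. The following Python sketch shows one way to scan and discard such sequences. The names ControlSequence, scan_control_sequence, and strip_control_sequences are hypothetical, the scanner accepts zero or more intermediate bytes (slightly more permissive than the grammar as printed), and it does not skip the reference and label bytes that follow an object tag.

```python
from typing import NamedTuple, Optional

class ControlSequence(NamedTuple):
    parameters: bytes      # bytes in the range 0x30..0x3F
    intermediates: bytes   # bytes in the range 0x20..0x2F
    final: int             # one byte in the range 0x40..0x7E
    end: int               # index just past the final byte

def scan_control_sequence(data: bytes, pos: int) -> Optional[ControlSequence]:
    """Return the control sequence starting at pos, or None if there is none."""
    if data[pos:pos + 1] == b"\x9b":        # single-code CSI
        i = pos + 1
    elif data[pos:pos + 2] == b"\x1b[":     # two-character form, ESC [
        i = pos + 2
    else:
        return None
    start = i
    while i < len(data) and 0x30 <= data[i] <= 0x3F:
        i += 1
    parameters = data[start:i]
    start = i
    while i < len(data) and 0x20 <= data[i] <= 0x2F:
        i += 1
    intermediates = data[start:i]
    if i < len(data) and 0x40 <= data[i] <= 0x7E:
        return ControlSequence(parameters, intermediates, data[i], i + 1)
    return None                             # malformed or truncated sequence

def strip_control_sequences(data: bytes) -> bytes:
    """Discard every well-formed control sequence, as a display routine might."""
    out = bytearray()
    i = 0
    while i < len(data):
        seq = scan_control_sequence(data, i)
        if seq is None:
            out.append(data[i])
            i += 1
        else:
            i = seq.end
    return bytes(out)
```

A text reader that passes unknown sequences through would copy data[pos:seq.end] to its output instead of dropping it.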

Selecting a Zone Number

Zones are defined in the schema using the CREATE ZONE clause of the CREATE SCHEMA statement. The CREATE ZONE clause assigns a list of one or more zone numbers to a named zone. SearchServer treats a named zone like a column for search purposes.

To enable this searching feature, zone control sequences must delimit the text that is to be considered part of each separately searchable zone. The format of the select zone control sequence is discussed below. For more detailed information about zones, see Chapter 3, "Structuring the Data."

Select Zone

The select zone control sequence is used to partition text into separately searchable areas. These areas can overlap. The format of the select zone control sequence is:


\E[parameter1;parameter2; ...;parameterNs

where each parameter can take one of the following forms:

startzone:endzone -or- zone_number

unless the parameter is empty (NULL).

A range of zones is indicated by a pair of values separated by a colon. Multiple individual zones or zone ranges are permitted, when separated by the semicolon. A specification of this type causes concurrent indexing in multiple zones, as described below.

Only zones or zone pairs in the range of 32 through 64,010 are valid. Others are ignored. Some of the zone numbers are reserved for SearchServer, so be sure to use only the zone numbers that are defined in the schema. A null zone number means that SearchServer must use the default zone number for the column containing the text.

The default zone number for a column is equal to the column's field number. All words that follow a select zone control sequence (up to the next select zone control sequence) are indexed concurrently in each zone selected. For example, to index a portion of text in zones 230, 231, 232, and 235 concurrently, prefix the text with:

\E[230:232;235s

To revert to the default zone, follow the selected portion of text with \E[s. A select zone control sequence that isn't valid will cause the words that follow to be unindexed, and therefore unsearchable. Whenever possible, try to index data in only one zone to minimize index overhead. There is a maximum on the number of zones that can be indexed concurrently. See Fulcrum SearchServer Getting Started for these limits.
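A text reader that emits select zone sequences can build them from a list of zone numbers and ranges. The following Python sketch is one way to do so. The function name select_zone is hypothetical, and the sketch rejects out-of-range zone numbers with an error rather than relying on SearchServer to ignore them; it does not check the limit on concurrently indexed zones.

```python
ESC = "\x1b"
MIN_ZONE, MAX_ZONE = 32, 64010   # valid zone range stated in the text

def _check(zone: int) -> int:
    if not MIN_ZONE <= zone <= MAX_ZONE:
        raise ValueError(f"zone {zone} is outside {MIN_ZONE}..{MAX_ZONE}")
    return zone

def select_zone(*zones) -> str:
    """Build a select zone sequence.

    Each argument is a zone number or a (start, end) tuple for a range.
    With no arguments, the result reverts to the default zone.
    """
    parts = []
    for zone in zones:
        if isinstance(zone, tuple):
            start, end = zone
            parts.append(f"{_check(start)}:{_check(end)}")
        else:
            parts.append(str(_check(zone)))
    return f"{ESC}[{';'.join(parts)}s"

# Index a portion of text in zones 230, 231, 232, and 235, then revert.
marked = select_zone((230, 232), 235) + "some text" + select_zone()
```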

Setting or Changing Text Attributes

The following section describes how to set or change text attributes.

Select Index Display Mode

The select index mode control sequence (\E[parameter1;parameter2;...;parameterNv) supports the parameter values shown in Table D-3.

Table D-3 Index Mode Parameter Values

parameter Value Effect on Index Mode

0 Reset to Default Indexing (NORMAL index mode)

1 Stop Text Indexing (without suspending character counting)

2 Start Text Indexing (NORMAL or LITERAL index mode) and Character Counting

3 Stop Text Indexing (NORMAL or LITERAL index mode) and Character Counting

4 Resume Character Counting

5 Stop Displaying Text (without affecting index mode)

6 Start Displaying Text (without affecting index mode)

7 Stop VALUE Indexing

8 Start VALUE Indexing (dates or numbers)

9 Force Term Break (without affecting index mode)

10 Reserved

11 Disable LITERAL Indexing (return to normal term rules)

12 Enable LITERAL Indexing (suspend normal term rules)

Most of these parameters affect the indexing mode and should only be used together with a select zone control sequence to mark a zone transition in the text that agrees with the schema of the table. Otherwise, unpredictable searching behavior can result. All of these control sequences cause word breaks.

Parameters 5, 6, and 9 don't affect the indexing mode. Display control sequences (parameters 5 and 6) don't affect searching, but SearchServer custom controls do act on display control sequences. Control sequences are described in the Fulcrum SearchBuilder documentation for your platform if the feature is available.

If a conflict occurs among multiple parameters, the parameter that appears last takes precedence. A missing parameter is taken as zero. Most text attributes can be set independently, or in combination, providing they do not conflict. The data in each column initially has the text attributes corresponding to the indexing mode indicated in the schema for that column.

Note: Any column that has an index mode of NONE is ignored, regardless of any text attribute control sequences that might be embedded in the data.

The valid index modes and the type of index they create are:

NORMAL The index created for the column or zone contains one entry for each word in the column. (The definition of a word is provided in Chapter 2 of the Fulcrum SearchServer SearchSQL Reference.)

VALUE The index created for the column (or zone) contains one entry for each date or numeric value in the column or zone.

LITERAL The index created for the column (or zone) contains one entry for each literal term, delimited by tabs, newlines, or other control characters.

NONE No index is created. A column (or zone) defined with this index mode cannot be referenced in the WHERE clause and is labeled as not searchable.

Examples of Transitions to New Index Modes

To make the transition from any index mode to an index mode of NONE, embed either the sequence \E[7;3v, or the sequence \E[7;1v in the column data.

To indicate a transition to VALUE indexing from another index mode, specify the sequence \E[3;8v, or the sequence \E[1;8v. \E[3;8v turns off character counting (not typically used), and \E[1;8v leaves character counting on.

To make a transition to NORMAL indexing, specify \E[0v or \E[v. To change to LITERAL indexing, specify \E[0;12v. The control sequences given above are not the shortest ones for all possible transitions, but they guarantee correct behavior, regardless of the previous index mode.

Reset to Default Attributes

A parameter value of 0 resets the text attributes to their default settings, which are:

• start displaying text

• stop VALUE indexing

• disable LITERAL mode

• start NORMAL indexing

Text Indexing and Character Counting

Text indexing refers to NORMAL and LITERAL mode, whichever is currently in effect. A parameter value of 1 disables text indexing but lets SearchServer continue counting unindexed characters so that they can accurately influence searches that depend on positioning information (such as searches on literal phrases, or involving a WITHIN clause in the contains predicate).


The parameter values 3 and 2 respectively stop and start text indexing and character counting so that the text in between disappears (although it is present in retrieved data). SearchServer sees the surrounding text as being adjacent for search purposes. A parameter value of 4 resumes character counting (after an instance of parameter 3) and can induce a separation with the same effect as parameter 1.

CAUTION: Indexing and character counting should only be started or stopped at the same time as displaying text is started or stopped. Otherwise, a viewer looking at the original stream will not know how to interpret match character counts.
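The select index mode sequences used for the transitions described under "Examples of Transitions to New Index Modes" can be generated from their parameter values. The following Python sketch shows this. The function name select_index_mode and the SAFE_TRANSITIONS table are hypothetical; the parameter combinations themselves are the ones given in this appendix.

```python
ESC = "\x1b"
RESERVED = {10}   # parameter value 10 is listed as reserved

def select_index_mode(*params: int) -> str:
    """Build a select index mode sequence from parameter values 0..12."""
    for p in params:
        if not 0 <= p <= 12 or p in RESERVED:
            raise ValueError(f"unsupported index mode parameter: {p}")
    return f"{ESC}[{';'.join(str(p) for p in params)}v"

# Sequences that give the target mode regardless of the previous mode.
SAFE_TRANSITIONS = {
    "NONE": select_index_mode(7, 3),      # stop VALUE and text indexing
    "VALUE": select_index_mode(1, 8),     # keeps character counting on
    "NORMAL": select_index_mode(0),       # reset to default attributes
    "LITERAL": select_index_mode(0, 12),  # reset, then enable LITERAL
}
```

Because a missing parameter is taken as zero, select_index_mode() with no arguments produces \E[v, which has the same effect as \E[0v.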

NORMAL Indexing

In NORMAL indexing, the text stream is parsed into index terms according to character class rules, and each term of the text stream is indexed (and therefore searchable in your queries). Stop words aren't indexed.

NORMAL indexing and LITERAL indexing are mutually exclusive. The parameter values 11 and 12 switch between NORMAL and LITERAL indexing. The parameter values 1, 2, and 3 disable or enable whichever of these two modes is in effect.

VALUE Indexing

VALUE indexing and text indexing (whether it's NORMAL or LITERAL indexing) are controlled independently. A parameter value of 1 or 2 does not affect VALUE indexing, and a parameter value of 7 or 8 does not affect text indexing. VALUE indexing happens in addition to any other indexing mode in effect. You may want to stop text indexing when you start VALUE indexing (that is, \E[3;8v), and vice versa (\E[7;2v or \E[v).

LITERAL Indexing

In LITERAL indexing mode, a term with embedded punctuation and/or spaces that would otherwise be broken into separate words is indexed as a unit. A literal term is delimited by a control code (for example, a tab or new line character).

LITERAL indexing doesn't affect the function of control codes and sequences, nor does it affect stop word processing. Like other terms, a literal term is not indexed if it is found in the stop file. Not all possible literal terms may be acceptable within the stop file syntax. For a definition of the stop file syntax, see the section, "Changing the Stop File," in Chapter 6, "Altering the Table."


Insert Character Separation

The insert character separation control sequence (\E[parameter1;parameter2;...;parameterN+v) is used to prevent SearchServer from matching the words surrounding this control sequence with a literal phrase or WITHIN clause of the contains predicate.

CAUTION: The insert character separation control sequence can appear only in external text column data and can only be used with FTDF viewers. Using this sequence with other viewers results in incorrect search term highlighting.

A null or zero parameter defaults to a value of 1000. The maximum value of each parameter is 65,000.

The cumulative separation (the total of all the parameter values) must not cause the document to appear to be longer than 2 gigabytes (2,147,483,646 bytes). Match codes are not inserted after a text character count of 16 million is reached.

Object Tag

The object tag control sequence (\E[object_ID;ref_length;label_length{) indicates that a reference to an external or embedded multimedia object follows (for example, a voice, sound, video, or hypertext link).

The value of the object_ID parameter must be a decimal value that is meaningful to the application. ref_length is optional, but when included, it must be a decimal number that indicates the number of bytes in the reference string following the control sequence.

label_length is also optional. It must be a decimal number that indicates the number of bytes in the label string following the reference string. The maximum value for label_length is 65520.

The control sequence must be followed by a reference string and a label string. The reference string has two substrings, separated by a tab character, that identify the object type and the object data filename. The first substring is a mnemonic (in which the first three characters are significant) that identifies the object's type, for example, TIF, PCX, BMP, or WMF. The second substring is the name of a file containing the object's data.

The label string is last. It can have either two or three substrings separated by tab characters and is used by the application to identify the label to the user. The first substring is a case-sensitive mnemonic (in which the first three characters are significant) that identifies the type of label. The label can be a text label (TEX) or an icon label (ICO). Text labels are indexed unless the object tag control sequence and the following reference and label are surrounded by indexing on/off control sequences.

If the first substring begins with TEX, the second substring is text that is displayed with distinctive attributes as part of the document to provide a visual clue to the user. This second substring is also indexed, unless the object tag control is surrounded by indexing on/off control sequences.

If the first substring begins with ICO, the second substring identifies an executable file or a dynamic link library containing icon resources, and the third substring is a numeric parameter that selects a particular icon from the file.

The following examples illustrate references to Windows wallpaper bitmap files. The first example has a text label and the second example has an icon label identified as member 22 in the file MORICONS.DLL.

\E[1;14;21{bmp\tleaves.bmptext\tLeaves Wallpaper

\E[2;13;20{bmp\tchess.bmpicon\tmoricons.dll\t22

Note 1: You don't have to embed object tag sequences within on/off indexing sequences.

Note 2: When SearchServer inserts Select Graphic Rendition (SGR) sequences in the text label of an object tag to effect search term highlighting, it can't adjust the parameter value in the control sequence that specifies the length of the label. Consequently, applications must compensate for this when parsing object tags. The SGR sequence that enables or disables highlighting must not appear in the external FTDF stream delivered by text readers.

Note 3: Object tag labels may not contain object tags or partial control sequences.
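Because ref_length and label_length are byte counts, they are easiest to get right when computed from the strings themselves. The following Python sketch builds an object tag and skips over one. The names build_object_tag and skip_object_tag are hypothetical, the sketch assumes ASCII reference and label strings, and it does not compensate for SGR highlighting sequences that SearchServer may insert into a text label (see Note 2).

```python
import re

# Header: ESC [ object_ID ; ref_length ; label_length {  (both lengths optional)
_TAG_HEADER = re.compile(rb"\x1b\[(\d+)(?:;(\d*))?(?:;(\d*))?\{")

def build_object_tag(object_id, obj_type, filename, label_fields=()):
    """Return the control sequence followed by its reference and label strings."""
    reference = f"{obj_type}\t{filename}".encode("ascii")
    label = "\t".join(label_fields).encode("ascii")
    header = f"\x1b[{object_id};{len(reference)};{len(label)}{{".encode("ascii")
    return header + reference + label

def skip_object_tag(data: bytes, pos: int) -> int:
    """Return the index just past the object tag at pos, or pos if none is there."""
    m = _TAG_HEADER.match(data, pos)
    if m is None:
        return pos
    ref_length = int(m.group(2) or 0)
    label_length = int(m.group(3) or 0)
    return m.end() + ref_length + label_length
```

Applied to the two wallpaper examples, the computed lengths are 14 and 21 for the text-labeled tag and 13 and 20 for the icon-labeled tag.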

Specifying Display Attributes

Display attributes define if and how characters are emphasized for display purposes. These sequences are ignored during indexing. If present, they can be interpreted by your application at display time. They are also interpreted by the SearchServer custom controls.

Select Graphic Rendition (SGR)

An SGR control sequence sets a display attribute that is associated with the characters that follow the sequence. This control sequence is represented as \E[parameter1;parameter2;...;parameterNm, where parameter can be one of the values listed in Table D-4. SGR sequences conform to ANSI standard X3.64.

Table D-4 Select Graphic Rendition Parameter Values

parameter Value Attribute Associated with Text Following Sequence

0 (default) primary rendition (no underlining or emphasis)

1 bold begin

2 faint begin

3 italic begin

4 underline begin

5 slow blink begin

6 rapid blink begin

7 negative image (inverse video) begin

9 overstrike begin

10 primary font

11 to 19 first-ninth alternative fonts

21 or 22 normal intensity (bold or faint end)

23 italic end

24 underline end

25 or 26 blink end

27 negative image end (normal video)

29 overstrike end

30 black foreground

31 red foreground

32 green foreground

33 yellow foreground

34 blue foreground

35 magenta foreground

36 cyan foreground

37 white foreground

40 black background

41 red background

42 green background

43 yellow background

44 blue background

45 magenta background

46 cyan background

47 white background

32703 search term highlighting on

32723 search term highlighting off

Zero is the default parameter value, thus both \E[m and \E[0m reset all display attributes to their default values.

Note: SearchServer automatically inserts match codes into the column data when the context retrieval feature is enabled using the SET SHOW_MATCHES statement. They have the format of an SGR sequence: \E[32703m (match code begin) and \E[32723m (match code end).
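An application that receives column data containing match codes can use them to locate the highlighted terms, or remove them to recover the plain text. The following Python sketch shows both operations. The function names are hypothetical, and the sketch assumes that each match code begin is paired with a match code end and that the two codes appear as separate sequences rather than combined with other SGR parameters.

```python
import re

MATCH_BEGIN = "\x1b[32703m"   # search term highlighting on
MATCH_END = "\x1b[32723m"     # search term highlighting off

_MATCH = re.compile(
    re.escape(MATCH_BEGIN) + "(.*?)" + re.escape(MATCH_END), re.DOTALL
)

def highlighted_terms(text: str) -> list:
    """Return the text enclosed by each begin/end pair of match codes."""
    return _MATCH.findall(text)

def strip_match_codes(text: str) -> str:
    """Remove match codes, leaving all other text and sequences in place."""
    return text.replace(MATCH_BEGIN, "").replace(MATCH_END, "")
```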

Specifying Fonts

SearchServer passes the FNT sequence or object tag variant that populates the font register through to the application from the text reader.

The process of font selection involves the following elements:

• a mechanism for registering a type face name

• a mechanism for activating a subset of the font register

• a mechanism for selecting a font for subsequent text

Note: Symbol fonts generally use a different character set than other fonts. However, because SearchServer has no knowledge of which font face names correspond to symbol fonts, characters in a symbol font are not treated differently from characters in any other font. This is true even though the character, while it is in FTICS, probably has different semantics. For example, it may be a searchable character rather than a graphic symbol. If you don't want to index characters in a symbol font, you should arrange for them to be surrounded by indexing off/on control sequences.

Augmenting the Font Register

In order to use the font register, control sequences must be used to associate the new typeface name with a new font-id, and associate the new font id with an SGR font id.

Adding a New Typeface Name

To associate a new typeface name with a font-id, use a variant of the object tag control sequence, in the following form:

\E[<font-id>;<reference-length>{

The \E[ introduces the control sequence and the semicolon (;) separates parameter values. The font-id is a decimal number that uniquely identifies a font in the register. The reference-length is a decimal number that specifies the number of bytes in the font's reference that follows the control sequence's final character. The open curly brace ({) is the sequence's final character.

In this variant of the object tag sequence, the font-id replaces object-id, and there are no label-length or object-length parameters (and consequently no object label or embedded object data).

The font reference string has two substrings separated by a tab character. The first substring is a mnemonic that must begin with FON and is case insensitive. The second substring is the typeface name.

The following example adds Lucida Sans to the font register as Font ID 8:

\E[8;16{font\tLucida Sans

Activating a Font Register Subset

SGR parameter values relating to font selection are limited to the range 10-19. Therefore, SGR font ids are limited to the range 0-9, and no more than 10 fonts can be available for selection at any one time. (Note that there is no such restriction on the value of font-ids.)

In order to make a new font available for selection, a text reader must emit a FNT sequence to associate a font id with an SGR font-id. The syntax of the FNT sequence is:

\E[SGR font number;font-id D

The first parameter is a number in the range 0-9 that identifies the SGR font id, and the second parameter is the font-id. For example, to associate font-id 8 with SGR font number 8, use the following sequence:

\E[8;8 D

Note that the 'D' is preceded by a space character.

Selecting a Font

Once a font is present in the font register, and once it is made available for selection (either implicitly or because of a preceding FNT sequence), the font is selected for rendering subsequent text by an SGR sequence with a parameter value in the range 10-19, corresponding to font numbers 0-9.

Summary

To select a font that is not in the font table, three sequences are required. For example, to select Lucida Sans you must first associate the font name with a font id. The following sequence associates Lucida Sans with font number 8.


\E[8;16{font\tLucida Sans

Then you must associate the font-id with an SGR font-id. The following sequence associates font-id 8 with SGR font-id 8.

\E[8;8 D

Finally, you can use an SGR sequence to activate the font:

\E[18m
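The three sequences can be produced together from a typeface name, a font-id, and an SGR font number. The following Python sketch does this. The function name font_selection_sequences is hypothetical, and the sketch assumes an ASCII typeface name so that the reference length in bytes equals the number of characters.

```python
def font_selection_sequences(typeface: str, font_id: int, sgr_font: int):
    """Return the (register, FNT, SGR) sequences needed to select a typeface."""
    if not 0 <= sgr_font <= 9:
        raise ValueError("SGR font numbers are limited to the range 0-9")
    reference = f"font\t{typeface}"
    ref_length = len(reference.encode("ascii"))
    # Object tag variant: adds the typeface name to the font register.
    register = f"\x1b[{font_id};{ref_length}{{{reference}"
    # FNT sequence: note the space before the final 'D'.
    fnt = f"\x1b[{sgr_font};{font_id} D"
    # SGR sequence: parameter values 10-19 select SGR fonts 0-9.
    sgr = f"\x1b[{10 + sgr_font}m"
    return register, fnt, sgr

register, fnt, sgr = font_selection_sequences("Lucida Sans", 8, 8)
```

For Lucida Sans with font-id 8 and SGR font number 8, the result matches the three sequences in the summary above.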

Specifying Formatting Attributes

Formatting attributes affect the format of retrieved data. These sequences are intended for use at data retrieval and display time, and their only effect at indexing time is to cause a word break (except for the soft hyphen, which does not).

The following control sequences specify parameters in terms of decipoints (1/720 inch).

• Set Left/Right Margin

• Temporary Left Margin

• Set Tab

• Center

For more information about how the formatting attributes affect formatting, refer to the Fulcrum SearchServer or Fulcrum SearchBuilder Developer's Guide for your operating environment.

Set Left/Right Margin

The parameters of the set left and set right margin control sequences (\E[parameter$@ and \E[parameter$A respectively) refer to the left and right positions in a formatted line where the display of characters begins and ends.

The default left margin is position 0 (which is the left-most displayable position). Set left margin takes effect at the beginning of a line. If the sequence is received in the middle of a line, the next line assumes the new margin.

For example, to set the left margin at a half-inch from the beginning of the line, specify the sequence \E[360$@. To set the right margin at 4320 (6 inches) from the beginning of the line, specify the sequence \E[4320$A.

The default right margin position is 4680 (6.5 inches). It can be changed to any value less than 4680.
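Margin positions are expressed in decipoints, so a measurement in inches is multiplied by 720. The following Python sketch converts inches to decipoints and builds the two margin sequences. The function names are hypothetical, and the sketch treats the default right margin of 4680 as an upper bound.

```python
DECIPOINTS_PER_INCH = 720
DEFAULT_RIGHT_MARGIN = 4680   # 6.5 inches

def decipoints(inches: float) -> int:
    """Convert a measurement in inches to decipoints (1/720 inch)."""
    return round(inches * DECIPOINTS_PER_INCH)

def set_left_margin(inches: float) -> str:
    return f"\x1b[{decipoints(inches)}$@"

def set_right_margin(inches: float) -> str:
    position = decipoints(inches)
    if position > DEFAULT_RIGHT_MARGIN:
        raise ValueError("right margin exceeds the default position of 4680")
    return f"\x1b[{position}$A"
```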


Set Page Number

The set page number control sequence (\E[value*v) is used to indicate the current page. SearchServer views a document stream as a sequence of contiguous pages numbered starting at one. Each page is terminated by a form feed character and the next character begins the next page. When a form feed character (0x0C) is encountered, the current page number is incremented by one.

You can override the default page number by inserting a page number control sequence before any displayable text on a page. If this value is less than the current page number, it has no effect.

Current Line Indent

This control sequence allows you to specify a positive or negative indent for the current line. To specify a positive indent, use the sequence \E[?>size$F. The parameter is a number specifying the size of the indent. To specify a negative indent, use the sequence \E[?<size$F. The '?' character must be generated by text readers but may be removed by SearchServer.

Justify

The justify control sequence (\E[parameter1;parameter2;...;parameterN F) affects the justification of the subsequent text.

Table D-5 summarizes the Justify parameter values, as specified in ANSI standard X3.64.

Table D-5 Justify Parameter Values

Parameter Interpretation

0 disable justification

1 enable fill action on (text to or from other lines)

2 enable variable interword spacing

3 enable variable letter spacing

4 enable hyphenation

5 flush left margin

6 center between margins

7 flush right margin

8 Italian form (arbitrary word break with underscore)

When a justify control sequence is encountered, the justification state implied by its parameter values replaces the previous state, as if the first parameter value were zero.

Set Tab

The set tab control sequence (\E[tab_stop1;tab_stop2;...;tab_stopN N) takes a list of parameters that are separated by semicolons. Each tab_stop indicates the position of a tab stop, where the left-most position is 0. The default tab stops are 360 decipoints apart, at 360;720;1080 and so on.

The appearance of a tab set control sequence clears all previous tab stop specifications in order to set new ones. For example, to change the first two default tab stops to a quarter-inch apart (and clear all the rest) specify the sequence \E[180;360 N.
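The basic form of the set tab sequence is a semicolon-separated list of positions followed by a space and the final character N. The following Python sketch builds that form. The function names set_tabs and default_tab_stops are hypothetical, and the sketch does not cover the extended alignment and leader syntax.

```python
def set_tabs(*stops: int) -> str:
    """Build a set tab sequence from tab stop positions in decipoints."""
    for stop in stops:
        if stop < 0:
            raise ValueError("tab stop positions start at 0")
    return f"\x1b[{';'.join(str(s) for s in stops)} N"

def default_tab_stops(count: int) -> list:
    """Return the first `count` default tab stops, 360 decipoints apart."""
    return [360 * (i + 1) for i in range(count)]
```

For example, set_tabs(180, 360) gives the quarter-inch sequence shown above, and set_tabs(*default_tab_stops(3)) restates the first three default stops explicitly.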

SearchServer provides an extended version of the set tab control sequence in which each parameter value takes the following form:

position[:alignment[:leader]]

Table D-6 summarizes the alignment specifications:

Table D-6 Alignment Specifications

Alignment Interpretation

0 (default) left alignment: text following the tab character extends to the right of the tab stop

1 center alignment: text following the tab character extends equally to the left and right of the tab stop

2 right alignment: text following the tab character extends to the left of the tab stop until the space of the tab character is filled, then text extends to the right

3 decimal alignment: text to the left of the decimal point extends to the left, with the same restriction as described in right alignment; text to the right of the decimal point extends to the right; text without a decimal point extends only to the left

The optional leader specification is the decimal value of the character that should be used to fill the space of the tab character. The default is 32 (space). Note that the first parameter of the value of this control sequence must be 0x3F ('?') if the extended features are used. This value must be generated by text readers, but may be removed by SearchServer.

Center

The center control sequence has the format \E[selector:#y, where selector can be either the number 6 (start centering) or 0 (end centering).

The start centering (\E[6:#y) control sequence immediately precedes text that is to be centered, and the end centering (\E[0#y) control sequence must immediately follow the centered text. This control sequence causes centering at the current position.

Hard Space

This sequence (\E[$H) is interpreted as a space character. In NORMAL indexing mode, this sequence (like a space) causes a word break, while in LITERAL indexing mode, the space becomes part of the indexed term.

Soft Hyphen

The soft hyphen (\E[$a) is typically used where a word processor has performed automatic hyphenation for the purpose of text justification.

The soft hyphen control sequence temporarily causes subsequent control codes that indicate spacing (the space, tab, line feed, form feed, and carriage return characters) to be ignored. In soft hyphen mode, control sequences that do not normally cause word breaks (for example, SGR sequences) are permitted, and they do not alter soft hyphen processing.

The next code or control sequence that is not otherwise ignored, or that causes a word break, terminates soft hyphen mode. A character that does not cause a word break is considered part of the word in which a soft hyphen occurs and it also terminates soft hyphen mode.
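A simplified view of soft hyphen processing is that the sequence and any spacing codes that immediately follow it are dropped, so that the two halves of the word are rejoined. The following Python sketch applies that simplification. The function name join_soft_hyphens is hypothetical, and the sketch does not handle control sequences such as SGR that may appear between the soft hyphen and the rest of the word.

```python
import re

# Soft hyphen sequence followed by any run of space, tab, line feed,
# form feed, or carriage return characters.
_SOFT_HYPHEN = re.compile(r"\x1b\[\$a[ \t\n\f\r]*")

def join_soft_hyphens(text: str) -> str:
    """Rejoin words that were split by a soft hyphen sequence."""
    return _SOFT_HYPHEN.sub("", text)
```

Under these assumptions, a word stored as cen\E[$a followed by a line break and turies is seen as the single word centuries.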

Hard Hyphen

This sequence (\E[$I) is interpreted as a hyphen. This causes a word break.

Set Positioning Unit Mode

This sequence (\E[11h) should be generated by the text reader at the beginning of the text stream to indicate that control sequences that contain positioning information (for example, set tab) are interpreted in terms of decipoints. This is required by ANSI standard X3.64.

Graphic Size

The graphic size control sequence (\E[size C) has a single numeric parameter whose value specifies the height, in decipoints, of subsequent characters in the data stream, regardless of font, until another graphic size control sequence is encountered in the data stream. Character width is implicitly defined by the height, the font selected, or both.

For example, to specify 12-point characters, use \E[120 C.

This sequence should be generated by the text reader. SearchServer may replace this sequence with an SGR control sequence with a special parameter value in the range 0x7000-0x70ff, in which the low-order byte encodes the height value.
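A text reader that emits the graphic size sequence has to convert a height in points to decipoints. The following Python sketch shows that conversion for the 12-point example above. The function name graphic_size_sequence is hypothetical, and the output uses the printed \E[ notation of this appendix rather than a raw escape character. The SGR replacement value that SearchServer may substitute is not modeled here.

```python
def graphic_size_sequence(points: float) -> str:
    """Build the graphic size sequence for a character height given in points."""
    decipoints = round(points * 10)  # 1 point = 10 decipoints
    if decipoints <= 0:
        raise ValueError("height must be positive")
    # Parameter, then a space, then the final character C.
    return rf"\E[{decipoints} C"


print(graphic_size_sequence(12))
```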

Examples of Control Sequences

The following example shows a newspaper article with embedded control sequences to enable searching through the body of the article, as well as by storyline, byline, newspaper source, and dateline. Other control sequences are embedded in the text to give specific text attributes to certain zones at display time.

Note: Spaces are used only to increase the readability of the examples.

\E[33s\E[7mThatcher, Mitterand Select Tunnel Plan\E[m\E[34s\E[1mBY LESLIE PLOMMER\E[m\E[35s\E[4mSpecial to The Chronicle\E[m\E[36sLILLE, France\E[s A high school band struck up It's a Long Way to Tipperary, but after two cen\E[$aturies of pipe dreams, that distance began to shrink yesterday as the leaders of Britain and France announced their chosen scheme for a tunnel across the English Channel.

SearchServer interprets control sequences as follows:

• Define zone 33 at the beginning of the storyline information, start an inverse video SGR sequence for display purposes, and reset the SGR sequence at the end of the storyline.

\E[33s\E[7mThatcher, Mitterand Select Tunnel Plan\E[m

• Define zone 34 at the beginning of the byline information, start a bold font SGR sequence for display purposes, and reset the SGR sequence at the end of the byline.

\E[34s\E[1mBY LESLIE PLOMMER\E[m

• Define zone 35 at the beginning of the newspaper source information, start an underline SGR sequence for display purposes, and reset the SGR sequence at the end of the newspaper source.

\E[35s\E[4mSpecial to The Chronicle\E[m

• Define zone 36 at the beginning of the dateline information.

\E[36sLILLE, France

• Mark the beginning of the external text column (zone 32 by default). Note that a soft hyphen is embedded in the body of the article.

\E[s A high school band struck up It's a Long Way to Tipperary, but after two cen\E[$aturies of pipe dreams, that distance began to shrink yesterday as the leaders of Britain and France announced their chosen scheme for a tunnel across the English Channel.
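The zone breakdown above can be sketched in Python. The example below splits a shortened copy of the article into zones and discards the display-only sequences. It is only an illustration under stated assumptions: it parses the printed \E[ notation rather than a raw escape character, the names CSI, DEFAULT_ZONE, and split_zones are hypothetical, and text that appears before any zone sequence is simply kept under the key None.

```python
import re

# One control sequence in the printed notation: parameters, intermediates, final.
CSI = re.compile(r"\\E\[([0-9:;<=>?]*)([ -/]*)([@-~])")
DEFAULT_ZONE = 32  # external text column, per the description above


def split_zones(stream: str) -> dict:
    """Return {zone number: text}, with all control sequences removed."""
    zones = {}
    zone = None
    pos = 0
    for match in CSI.finditer(stream):
        zones[zone] = zones.get(zone, "") + stream[pos:match.start()]
        pos = match.end()
        params, intermediates, final = match.groups()
        if final == "s" and not intermediates:
            # A zone sequence; no parameter selects the default zone.
            zone = int(params) if params else DEFAULT_ZONE
    zones[zone] = zones.get(zone, "") + stream[pos:]
    return {z: t.strip() for z, t in zones.items() if t.strip()}


article = (
    r"\E[33s\E[7mThatcher, Mitterand Select Tunnel Plan\E[m"
    r"\E[34s\E[1mBY LESLIE PLOMMER\E[m"
    r"\E[35s\E[4mSpecial to The Chronicle\E[m"
    r"\E[36sLILLE, France"
    r"\E[s A high school band struck up It's a Long Way to Tipperary, "
    r"but after two cen\E[$aturies of pipe dreams"
)

for number, text in split_zones(article).items():
    print(number, text)
```

Because the soft hyphen sequence is removed like any other sequence, the two halves of the hyphenated word are rejoined in the text for zone 32.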

The next example features the external document named 92012701 (supplied in your kit). It has the following information and embedded control sequences:

DATE:\E[201s92-01-27\E[s

TIME:\E[202s9:30 am\E[s

OPERATOR:\E[211sMarie\E[s

\E[6:40#yCONTACT:\E[212sDave Smith\E[s\E[#y

\E[5$@DESCRIPTION: \E[3mDave\E[m called to ask how many connection handles he could

\E[7$Fopen in one application.

\E[2FThe description of the SQLConnect\E[$Hfunction in \E[1mChapter 4\E[m of the \E[4m\E[1mAPI\E[m\E[m

\E[4m\E[1mReference Manual\E[m\E[m states that an application can establish more than one

\E[0Fconnection handle, however, only one can be active at a time.

\E[60$A You can connect to another server at any time provided you first dis\E[$Iconnect from the currently active \E[9mhookup\E[29m connection.

The sequences shown above (when interpreted by a custom user interface application) could enhance the display in a way similar to the following:

DATE:92-01-27 TIME:9:30 am OPERATOR:Marie

CONTACT:Dave Smith

DESCRIPTION: Dave called to ask how many connection handles he could open in one application.

The description of the SQLConnect function in Chapter 4 of the API Reference Manual states that an application can establish more than one connection handle, however, only one can be active at a time.

You can connect to another server at any time provided you first disconnect from the currently active connection.

Note: The manner in which SGR sequences are interpreted is dependent on the features of the user interface application. You can use a text editor or word processor to view the sequences in the source file provided in your kit, but they won't necessarily be interpreted by the application that you choose to view the file with.

Appendix D: SearchSQL

This appendix is reprinted with permission from Fulcrum Technologies, Inc., and contains the SearchServer 3.5 SearchSQL Manual.

Preface

This preface provides:

• a description of the intended audience

• a synopsis of each chapter

• a summary of the text conventions

• abstracts of the other documents in the SearchServer documentation set

About this Manual

This manual contains the description and syntax for the common language elements, syntax for the SearchSQL statements, the reference information for viewing system tables, and tables of supported character sets. It also contains information about searching tables and system information tables.

All the information contained in this manual is generic in nature. As such, it can be used with all programming languages and hardware platforms that are available for the Fulcrum SearchServer family of products.

This manual assumes you're already familiar with SearchServer terminology and concepts. If you're not, we recommend that you first review Fulcrum SearchServer Introduction to SearchServer, Fulcrum SearchServer Data Preparation and Administration, and the Fulcrum SearchServer or Fulcrum SearchBuilder Developer's Guide for your environment.

See the section "Related Documentation," later in this preface, for a brief description of each of these manuals.

The following is a brief description of what you'll find in this manual.

Chapter 1, "SearchSQL Overview," provides a brief introduction to the SearchSQL statements and their functions.

Chapter 2, "The Search Process," discusses the SearchSQL statements that you'll use to search your documents.

Chapter 3, "SearchSQL Language Elements," describes the supporting elements of the SearchSQL language.

Chapter 4, "SearchSQL Statements," presents a detailed reference of the SearchSQL statements, clauses, and predicates.

Chapter 5, "System Information Tables," provides detailed information about the system information tables.

Appendix A, "Character Sets," provides information about the various character sets supported by SearchServer.

Appendix B, "Character Classes," describes the character classes that are the basis for the parsing rules used to index terms.

Text Conventions

This manual uses the following conventions:

Convention    What it is Used For

Case Sensitivity

Filenames and directory names are shown in UPPERCASE letters; however, they can be entered in lowercase letters if this is a requirement in your environment.

Initial Capitals

Initial capitals are used in Windows application program names. For example:

SearchDoc

UPPERCASE Letters

Uppercase letters are used to represent statement names, keywords, table names, column identifiers, environment variables, mnemonic symbols, data types, filenames, and directory names. For example:

SELECT, ALL, STDOCS, FT_CID, FTNPATH, SQL_SUCCESS_WITH_INFO, SMALLINT, FULTEXT.MSG, BIN

bold

Bold letters are used to represent utility program names and function names. For example:

ftmload, SQLColAttributes

[ ]

Square brackets ([ ]) indicate that the elements of the syntax between them are optional. In the following example, the WHERE clause is optional.

DELETE FROM <table name>

[WHERE <search condition>]

< >

Angle brackets (< >) represent an element of the syntax you must substitute with a specific value. In the following example, you would supply the name of a schema.

CREATE SCHEMA [REPLACE] <schema name>

{ }

Curly braces ({ }) represent groups of elements in the syntax. For example:

CREATE TABLE <table name> (<column definition>[{, <column definition>}...])

|

An OR bar ( | ) indicates a mutually exclusive entry. You can enter one of the options shown on either side of the bar, but not both. For example:

<column name> {<data type> | <domain name>}

...

An ellipsis (...) indicates that an element of the syntax can be repeated. For example:

zone list ::= <zone number> [{,<zone number>}...]

Related SearchServer Documentation

SearchServer includes a comprehensive documentation set that provides the information you'll need to use SearchServer. If you also purchased a Fulcrum SearchServer Software Developer's Kit (SDK) or a Fulcrum SearchBuilder product, your documentation set will include manuals written for your particular development environment (for example, a SearchBuilder for Visual Basic Developer's Guide).

What's New in SearchServer 3.0: Describes what's new and changed in SearchServer 3.0 and tells you where to look for more information. It provides a description of enhancements to the SearchSQL language statements as well as a description of the enhancements to the SearchServer API functions.

Introduction to SearchServer: Provides a high-level introduction to the capabilities of SearchServer. It introduces the SearchServer concepts and describes the process required to embed text-retrieval in an application.

SearchServer Getting Started (platform specific): Provides installation and configuration instructions and all platform-specific information (including limitations).

SearchSQL Reference: Provides the complete definition (syntax and semantics) of the SearchSQL language. It also contains complete information about searching tables and system information tables.

SearchServer Messages and Error Codes: Provides quick and easy reference to the messages and error codes returned by SearchServer and the SearchServer utility programs.

SearchServer Database Integration: Describes how to use database text readers that Fulcrum provides and explains how you can modify the template code. It also provides a guide to application-level integration of text and structured data.

Fulcrum Customer Services

We've got some of the most knowledgeable experts in text-retrieval: experts in application design and development, database integration, and systems engineering. Fulcrum offers you a wide range of choices to help you leverage the value of SearchServer by analyzing your requirements and helping to design the application, by transferring knowledge to your developers through Fulcrum courses and seminars, by supporting Fulcrum products through our customer support team, and by offering expert consulting.

Customer Support

If you have a question about SearchServer, first look in the printed version of the documentation, or consult the electronic version of the documentation (using SearchDoc) or online help. You can also find late-breaking updates and technical information about SearchServer by double-clicking the Readme icon in the Fulcrum program group or folder.

If you cannot find the answer, contact Fulcrum's Customer Support Team. Our technical support staff use Fulcrum's own text-retrieval software for fast and responsive phone support. Every support person has instant access to all of Fulcrum's support tools, including a history of known problems, on-line design notes and product documentation, technical bulletins, and product source code.

Fulcrum allows you to choose the method of contact that best meets your needs, ranging from calling us directly to sending a request electronically.

Calling Directly

Fulcrum provides telephone support to registered licensees of SearchBuilder and SearchServer Software Developer's Kits (SDKs) who have up-to-date support agreements. For technical support, call:

• 1-800-209-4357 (for support within North America)

• 1-613-238-7068 (for support within Ottawa and outside of North America)

Electronic Services

When sending your request electronically, use the electronic version of the Case Report Form (CASE.TXT) and send it to:

[email protected]

Fax Services

When sending your request by fax, use the Case Report Form located at the back of this manual and dial:

• 1-613-238-7695

Product Training and Consultation

To quickly bring your developers up to speed on SearchServer, we offer hands-on, interactive education courses, featuring real-world examples. You can also benefit from Fulcrum's expertise through workshops on specialized application areas such as database integration and text reader creation.

Courses are available at Fulcrum training locations, or on-site at your offices to help maximize the use of Fulcrum tools within your own environment. Our lab at corporate headquarters in Ottawa, Canada, is also available for your development team, complete with application experts as required.

Consulting Services

Fulcrum's professional services consultants have been designing and creating powerful integrated text-retrieval solutions for years. They can help guide you to success at each stage of the development process:

• Evaluation and prototyping

• Requirements analysis and design

• Code review and walkthroughs

• Building application components such as text readers (filters), high-level APIs, user interfaces and system administration utilities

• Integrating Fulcrum products with other technologies (database, imaging)

• Benchmarking

Chapter 1:

SearchSQL Overview

This chapter provides an overview of SearchSQL. In it, you'll read about:

• Data Definition Language (DDL) statements

• Data Manipulation Language (DML) statements

• Search and Retrieval Language (SRL) statements

• Search functions

Introducing SearchSQL

SearchServer's searching, indexing, and table management functionality is accessed through the SearchSQL language. SearchSQL supports the following functionality:

• schema and table creation

• insert, update, and delete operations

• indexing operations

• search operations

SearchSQL is based on a subset of the ANSI Structured Query Language (SQL), the standard interface language for accessing database systems. It also provides language extensions that support text-retrieval applications.

SearchSQL allows you to search for information in a variety of ways. You can search by exact words or phrases, or use wildcards to allow root expansions. You can also specify Boolean search logic, thesaurus expansion, search term weighting, and relevance ranking.

It also provides an extremely powerful Intuitive Searching capability, where you can specify one or more sections of text and have SearchServer automatically determine the appropriate query required to retrieve relevant material.

This manual tells you how to use the SearchSQL language to its fullest capability for formulating text queries, manipulating search results, and managing SearchServer tables.

Data Definition Language (DDL) Statements

SearchSQL includes the DDL statements shown below. These statements can be executed directly by an application program to permit schema and table definition and administration.

ALTER TABLE: Add or delete one column in an existing schema.

CREATE SCHEMA: Define a new schema and table, or replace the schema of an existing table.

CREATE TABLE: Define a table.

DROP TABLE: Delete an existing table.

PROTECT TABLE: Protect a table from indexing, schema alteration, and removal.

UNPROTECT TABLE: Enable indexing, schema alteration, and removal.

VALIDATE INDEX: Update the index for a table.

Data Manipulation Language (DML) Statements

SearchSQL includes the following DML statements:

INSERT: Insert new rows into a table.

UPDATE: Update selected column values in table rows.

DELETE: Delete rows from a table.

Search and Retrieval Language (SRL) Statements

SearchSQL includes the following SRL statements:

SELECT: Search one or more tables and specify the columns and rows to be returned and the sort order of the resulting rows.

CREATE TEXT_VECTOR: Prepare text to be used as the source for Intuitive Searching.

SET: Set options for subsequently executed statements for the duration of a connection to a specific data source.

SELECT Statement Features

SearchSQL defines new constructs that are required to support the search requirements of text-based applications as part of the SELECT statement. The SELECT statement supports:

• a contains predicate to support text-retrieval queries

• a method of indicating that certain text is to occur within a specified distance of other text (proximity searching)

• the ability to search multiple tables

• a method of weighting search terms

• Boolean combination of search conditions

• advanced text-search techniques including relevance ranking, document similarity, and fuzzy Boolean

• advanced language-specific morphological query processing and thesauri

• a back reference predicate for iterative search refinement

Functions

Functions return special information for use within a SELECT statement. There are eleven functions that can be used when searching.

COUNT Function

The COUNT function generates a column in the working table that contains the number of rows that satisfy the query. This function is useful to determine the total number of rows in a table, or the total number of rows a specified search will produce.

CUSTOM_VIEWER Function

The CUSTOM_VIEWER function retrieves the external text document in its custom viewer format instead of the SearchServer translated format usually used for retrieving documents. This function is useful when using the SearchServer search engine for document display within document viewers.

FULLNAME Function

The FULLNAME function generates a column in the working table that contains the full pathname associated with the external text document of the current row. This function is useful to determine the location of the documents associated with the table, especially if the documents reside on a remote node.

MARKER_LIST Function

The MARKER_LIST function allows you to retrieve highlighting information that a document viewer can use to position efficiently within the external document. This function is useful when using the SearchServer search engine for document display within document viewers.

MATCH_VCC_LIST Function

The MATCH_VCC_LIST function returns the character position information for each row matching the search criteria in the working table. This function allows a document viewer to use these positions to highlight search terms in the viewed document.

TABLENAME Function

The TABLENAME function adds a column to the working table that contains the table name associated with the current row. This function is useful for determining the name of the table when you've used a UNION clause to search more than one table.

TABLEQUALIFIER Function

The TABLEQUALIFIER function adds a column to the working table that contains the node name or pathname of the table associated with the current row. This function is useful for determining the fully-qualified name of the table when you've used a UNION clause to search more than one table.

THESAURUS Function

The THESAURUS function allows the expansion of a word or phrase specified in the contains predicate of the SELECT statement. This function allows applications to use a variety of language-specific roots, such as morphological analysis and thesaurus lookup.

ORIGINAL Function

The ORIGINAL function allows you to retrieve the external text document in its original format instead of the SearchServer translated format, called Fulcrum Technologies Document Format (FTDF), usually used for retrieving documents. This function is useful when viewing documents with a non-Fulcrum product.

RELEVANCE Function

The RELEVANCE function determines the relevance value of the current row in a working table and returns that value. For each row that matches the search criteria, the relevance value is calculated based on the specified retrieval model and relevance algorithm. This value is used to sort (rank) the working table rows.

VCC_RULES Function

The VCC_RULES function returns the rules for a table necessary to return the text location of terms that match the search criteria. This function is useful when using the SearchServer search engine for document display within document viewers.

A Word About Examples

The examples used in this manual make use of the SUPPORT table. You can build this table using scripts supplied with SearchServer. For a complete description of the SUPPORT table, see Fulcrum SearchServer Data Preparation and Administration.

Chapter 2:

The Search Process

This chapter describes the search process in SearchServer. It includes information about how to:

• use the SET statement to influence a search request

• use the SELECT statement to construct a query

• combine predicates

• search multiple tables

• determine relevance ranking for a result list

• implement an Intuitive Search

• search reserved columns

Searching with SearchSQL

There are three SearchSQL statements you can use to support searching:

• The SET statement is used to create the appropriate searching environment for your application.

• The SELECT statement is used to perform searches.

• The CREATE TEXT_VECTOR statement is used to prepare the query terms for an Intuitive Search before it is processed by SELECT.

A Typical Search Session

The search process begins when a user enters a query using the query syntax that you've implemented in your user interface. When the query is entered and has been expressed in SearchSQL statements, the application passes the query to SearchServer. SearchServer expands the query to include any terms that have been defined in the thesaurus or character variant files.

SearchServer then checks the index and locates the terms. When that is complete, it locates these terms in the documents and creates a working table that consists of selected columns from matching rows. Assuming that your search has been successful, SearchServer returns the working table to the application.

When SearchServer processes the query, it determines:

• in which documents the terms occur

• how many times each term occurs in each document

• where each term occurs in the document

A "hit" occurs in a document any time SearchServer locates a term that matches the search request. The number of successful hits in a document determines how relevant that document is to the query.
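The three facts listed above (which documents, how often, and where) are what a positional index records. The following Python sketch builds a tiny in-memory index of that kind and ranks two sample documents by hit count. It illustrates the idea only and is not SearchServer's index format or ranking algorithm; the names build_index and hits and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each term to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def hits(index: dict, terms: list) -> dict:
    """Return {doc_id: total number of hits} for the given search terms."""
    counts = defaultdict(int)
    for term in terms:
        for doc_id, positions in index.get(term.lower(), {}).items():
            counts[doc_id] += len(positions)
    return dict(counts)


docs = {
    "A": "printer driver fails when the printer is offline",
    "B": "install the network driver",
}
index = build_index(docs)
ranked = sorted(hits(index, ["printer", "driver"]).items(),
                key=lambda item: item[1], reverse=True)
print(ranked)
```

Document A is listed first because it contains more hits for the two search terms than document B.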

SearchServer allows you to specify that certain terms are more important than others for any particular search. This is called search term weighting, and it can influence relevance ranking. As well, you can alter the way that search operators are interpreted for a particular search (retrieval model).

Influencing a Search

You can influence how SearchServer will conduct a search by:

• eliminating irrelevant words from a search request

• enhancing a search request by linking similar words together

• including variations of words

Eliminating Irrelevant Words

Most documents contain words that don't add any real value to a search (such as "and" and "the"). While necessary for clarity, these words are not useful to SearchServer for locating terms that match a specified query. These words are called stop words.

Stop words are not indexed because they occur so frequently that they have no value in discriminating one document from another. However, since they appear in the actual document, phrases containing them are still retrieved.

For example, if you have a typical stop file declared for a table, and then search for the phrase "index the table" in that table, you'll notice that the word "the" is not highlighted. This is because it was chosen as a stop word. In this context, "the" does not add value to the search because it does not distinguish one document from another in a meaningful way.
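One way to picture how a phrase containing a stop word can still be retrieved is an index that skips stop words but keeps the word positions of the remaining terms. The Python sketch below follows that idea. It is an assumption made for illustration, not a description of how SearchServer stores or matches phrases, and STOP_WORDS, index_terms, and phrase_matches are hypothetical names.

```python
STOP_WORDS = {"the", "and", "a", "of"}  # hypothetical stop file contents


def index_terms(text: str) -> dict:
    """Return {term: [positions]}, skipping stop words but keeping positions."""
    index = {}
    for pos, word in enumerate(text.lower().split()):
        if word not in STOP_WORDS:
            index.setdefault(word, []).append(pos)
    return index


def phrase_matches(index: dict, phrase: str) -> bool:
    """True if the non-stop words of the phrase occur at matching offsets."""
    wanted = [(off, word) for off, word in enumerate(phrase.lower().split())
              if word not in STOP_WORDS]
    if not wanted:
        return False
    first_off, first_word = wanted[0]
    for start in index.get(first_word, []):
        base = start - first_off
        if all(base + off in index.get(word, []) for off, word in wanted):
            return True
    return False


doc_index = index_terms("You must index the table before searching")
print("the" in doc_index)
print(phrase_matches(doc_index, "index the table"))
```

In this sketch the word "the" never enters the index, yet the phrase still matches because "index" and "table" are found two positions apart.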

SearchServer doesn't automatically declare a stop file. You must request it when you create a table. Once that's done, and the table is indexed, stop word processing is handled automatically by SearchServer, so your application does not need to be aware of stop words when preparing queries. However, you can determine the list of stop words for a table when creating the application. For a complete description of stop words, see Fulcrum SearchServer Data Preparation and Administration.

Using the Thesaurus to Enhance a Query

At the same time that you are eliminating irrelevant words by using the stop word feature, you can increase the value of the search by linking similar words together. In this manner, when a user specifies a particular word, you can have SearchServer expand the query to include related terms. This is done using the THESAURUS function. When you use the THESAURUS function, SearchServer automatically introduces similar forms of the selected term or terms in the query.

Similar terms, which are derived from an external thesaurus file, can be synonyms or expanded versions of the terms (including plurals, singular forms, and possessive forms). For example, the term "disc" can be expanded to include "disk," "diskette," "floppy," and so on.

SearchServer also allows you to have search terms linguistically processed to include additional word forms. You can invoke linguistic processing using the THESAURUS function.
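The effect of thesaurus expansion on a list of search terms can be sketched in a few lines of Python. The dictionary below stands in for an external thesaurus file and reuses the "disc" example; its contents, and the name expand_terms, are hypothetical and do not reflect the format of a real SearchServer thesaurus file.

```python
THESAURUS = {"disc": ["disk", "diskette", "floppy"]}  # hypothetical entries


def expand_terms(terms: list, thesaurus: dict = THESAURUS) -> list:
    """Return the terms plus their related terms, in order, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term, *thesaurus.get(term.lower(), [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_terms(["disc", "error"]))
```

A term with no thesaurus entry, such as "error" here, passes through unchanged.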

Using the Character Variant File

You can also ensure that variations in the spelling of words, or in the use of accents, are handled by including those variations in the character variant file. When you use this feature, SearchServer automatically includes the similar forms of the selected term or terms in the query.

Using Linguistic Processing to Enhance a Query

SearchServer's linguistic processing capability lets you expand the range of the words used to search for documents by providing additional word forms. These include: inflections, derivations, spelling correction, baseforms and roots of the search term.

For example, if linguistic processing is specified using inflection, the word "filter" will be expanded to search for "filter," "filters," "filter's," "filtered," and "filtering." Linguistic processing can apply both to single words in a structured search and to the text provided to the intuitive searching process.
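The inflection example above can be imitated with a deliberately naive Python sketch that appends a fixed set of suffixes to a word and then looks for those forms in a piece of text. Real linguistic processing handles irregular forms, spelling changes, and other languages; the suffix list and the names inflect and matching_words are assumptions made only for this illustration.

```python
SUFFIXES = ["", "s", "'s", "ed", "ing"]  # naive English suffixes for illustration


def inflect(word: str) -> list:
    """Return the word followed by its naive inflected forms."""
    return [word + suffix for suffix in SUFFIXES]


def matching_words(term: str, text: str) -> list:
    """Return the words in text that match any inflected form of term."""
    forms = set(inflect(term.lower()))
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return [w for w in words if w in forms]


print(inflect("filter"))
print(matching_words("filter", "Filtering is slow. The filter's cache was filtered."))
```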

A structured search can specify the use of linguistic processing by using the thesaurus function or by using the SET TERM_GENERATOR statement in conjunction with the contains or like predicates. This is not the default.

An intuitive search (using the is_about predicate) uses linguistic processing by default. This can increase the value of a search by reducing the terms in the text to their uninflected form before determining their statistical ranking within the document. For example, if the text contained the words "filter" and "filters," the word "filter" would be considered more significant than if "filter" and "filters" were seen as separate terms.

You can change the default settings for linguistic processing and intuitive searching by changing the VECTOR_GENERATOR server attribute in the system information table.

Adding a Word Wheel

To assist users who are not completely familiar with the documents in the table, you can provide a mechanism to view a list of terms that can be used to search a table. The standard interface model for displaying such information is called a word wheel. The information is accessible to SearchServer users through the SearchServer system information tables. For more information about determining what search terms are available see Chapter 5, "System Information Tables."
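On the application side, a word wheel is essentially a prefix lookup over a sorted list of indexed terms. The Python sketch below shows that lookup with a binary search. The term list is hypothetical sample data; in a real application it would come from the system information tables, which this sketch does not query. The name word_wheel is also hypothetical.

```python
from bisect import bisect_left


def word_wheel(sorted_terms: list, prefix: str, limit: int = 5) -> list:
    """Return up to `limit` terms from a sorted list that start with prefix."""
    start = bisect_left(sorted_terms, prefix)
    result = []
    for term in sorted_terms[start:start + limit]:
        if not term.startswith(prefix):
            break
        result.append(term)
    return result


terms = sorted(["disk", "diskette", "display", "driver", "filter", "floppy"])
print(word_wheel(terms, "dis"))
```

Because the list is sorted, all terms sharing a prefix are adjacent, so the lookup can stop at the first term that no longer matches.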

Altering the Retrieval Model

You can alter the way SearchServer interprets a hit by designing a retrieval model that meets your particular requirements. The retrieval model you choose will determine how SearchServer interprets the importance of search terms and how they will appear in your documents.

SearchServer's comprehensive set of search techniques can satisfy any user, no matter what their level of expertise. These search techniques include:

• Intuitive Searching

• Single Term Searching

• Boolean Searching

To help users choose the appropriate documents for browsing, SearchServer also offers relevance ranking, a feature that lists the most relevant documents first. For a complete description of how to choose a retrieval model, see the section, "Choosing a Retrieval Model," later in this chapter.

Using the SET Statements to Influence the Search

The SET statements are used to influence the entire search and retrieval process. Many of the SET statements can be implemented as choices your users will make when they search documents. For example, you could allow users to select the thesaurus and word variant file they want to use and, if they are sophisticated users, choose the retrieval model they would like to use for a search.

Table 2-1 describes the SET statements you can use to influence the search and retrieval environment.

Table 2-1 SET Statements

Statement               What It Is Used For
SET CHARACTER_VARIANT   Sets the name of the character variant file that you want to use in this search session. This feature is available only if it is set specifically.
SET MAX_EXEC_TIME       Sets the maximum execution time for the SELECT and CREATE TEXT_VECTOR statements.
SET MAX_SEARCH_ROWS     Sets the maximum number of rows you want to include in a working table.
SET RELEVANCE_METHOD    Sets a specific retrieval model and relevance ranking method.
SET SEARCH_MEMORY_SIZE  Sets the search memory size to a larger value to improve search performance. (Only applicable if the table resides in a non-Windows environment.)
SET SERVER_REPORT_TIME  Sets the amount of time that SearchServer has before it must return from an asynchronous statement execution.
SET TERM_GENERATOR      Sets the type of linguistic processing to be performed on all search terms in the contains or like predicate of subsequent SELECT statements for the current session.
SET THESAURUS_NAME      Sets the name of the thesaurus file that will be used for this session.
SET VECTOR_GENERATOR    Sets the type of linguistic processing to be performed on the search terms that make up the intuitive search. Linguistic processing can apply both before and after Intuitive Searching processing for the current session.
SET WILDCARD_OPT        Sets the type of wildcard optimization to be enabled for the table.

Using the SELECT Statement to Construct a Query

The SELECT statement is the most frequently used and is often the most complex of the SearchSQL statements. Using the SELECT statement you can create a working table by selecting a subset of the rows and columns from one or more tables.

The working table created is the set of rows and columns that satisfies the conditions at the time the SELECT statement is executed. In this case, the row and column numbers are fixed for the working table. However, the data referenced by the rows or columns might change because you or another user has updated the row.

For example, to search for all the columns and rows in the SUPPORT table, you would enter the following statement:

SELECT *
FROM SUPPORT;

This simple search creates a working table containing all the application-defined columns and renamed reserved columns in the order in which they were named in the CREATE TABLE clause. This SELECT statement is useful if you just want to browse through the entire table instead of selecting specific information.

You can build on this simple SELECT statement to create more powerful search requests. By replacing the asterisk (*) with a select list, you can choose specific columns and define the order in which they're placed in the working table. For example:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT;

This SELECT statement creates a working table that contains all the rows (including unindexed rows) from the table, but only the four named columns. You must add a WHERE clause to the SELECT statement to be able to specify which rows you want returned from the table. A SELECT statement that contains a WHERE clause returns only indexed rows.

A WHERE clause determines which rows are returned in the working table by specifying a search condition that must be satisfied. These search conditions are formulated with predicates and Boolean operators.

A predicate specifies a condition that is evaluated on a row to give a truth value of 'TRUE' or 'FALSE'. The row is selected only if the search condition is 'TRUE' when applied to the row.
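As a rough illustration of how an application might assemble these statements, the following Python sketch builds the statement text from a table name, an optional select list, and an optional search condition. The `build_select` helper is hypothetical and not part of SearchServer; it only concatenates strings and does no validation.

```python
from typing import Optional, Sequence


def build_select(table: str,
                 columns: Sequence[str] = (),
                 where: Optional[str] = None) -> str:
    """Assemble SELECT statement text (hypothetical helper, no validation)."""
    # An empty select list falls back to the asterisk form.
    select_list = ", ".join(columns) if columns else "*"
    statement = f"SELECT {select_list} FROM {table}"
    # Without a WHERE clause every row, indexed or not, is returned.
    if where:
        statement += f" WHERE {where}"
    return statement


browse_all = build_select("SUPPORT")
four_columns = build_select(
    "SUPPORT", ["PROBLEM_NUMBER", "COMPANY", "PRIORITY", "STATUS"])
filtered = build_select(
    "SUPPORT", ["PROBLEM_NUMBER", "COMPANY"], where="PRIORITY >= 2")
```

Keeping the search condition as a separate argument mirrors the way the WHERE clause is layered onto a simple SELECT in the text above.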

There are seven types of predicates:

• back reference predicate

• between predicate

• comparison predicate

• contains predicate

• in predicate


• is_about predicate

• like predicate

Boolean operators are used to connect or negate two or more predicates to form the search condition, or to connect search terms used inside a predicate. To connect or negate predicates, use one of the following Boolean operators:

• AND

• OR

• NOT

To connect or negate terms within a contains predicate, use one of the following Boolean operators:

• & (corresponds to AND)

• | (corresponds to OR)

• ~ (corresponds to NOT)

Table 2-2 shows how you can use the components of the search criteria in the SELECT statement to influence the results:

Table 2-2 SELECT Statement Components

SELECT Statement Component  Parent SELECT Clause/Predicate  What It Is Used For
WHERE                       N/A                             This clause contains the query specification.
Contains                    WHERE                           The terms that are included in the contains clause are the words or phrases that the user is trying to locate.
Between                     WHERE                           This predicate is used to search for rows that contain the specified range of numeric or date values.
Comparison                  WHERE                           The comparison predicate is represented by the comparison values "greater than," "less than," or "equal to." This predicate is used primarily for numeric and date searches.
Like                        WHERE                           This predicate is used to search for rows that contain a specific term in a text column.
In                          WHERE                           This predicate is used to select rows with column or zone values found in a specified list of exact values.
Back reference              WHERE                           This predicate is used to search the rows selected by a previous query.
Is_about                    WHERE                           This predicate is used to implement an Intuitive Search using a single statement.
Within                      Contains                        This sub-predicate is used inside a contains predicate. It is used to test for the proximity of the search terms specified in the contains predicate.
Proximity                   Contains                        This sub-predicate is used inside a contains predicate. It is used to test for the proximity of the multiple search terms specified in the contains predicate.
THESAURUS                   Contains                        This function invokes thesaurus expansion for a specified word or phrase.
RELEVANCE                   Contains                        This function allows you to determine the ranking of the rows in the result list of your match.
TABLENAME                   N/A                             This function allows you to determine the table name of the current row in a working table.
TABLEQUALIFIER              N/A                             This function allows you to determine the node name or pathname of the table of the current row in a working table.
COUNT                       N/A                             This function allows you to determine the number of rows in a working table.
ORIGINAL                    N/A                             This function allows you to retrieve the external text document in its original format instead of the SearchServer translated format usually used for retrieving documents.
ORDER BY                    N/A                             This clause allows you to order your result list according to the values in one or more columns.

Searching Unstructured Text (Contains Predicate)

The contains predicate is the most commonly used search condition for searching unstructured text. You can only use the contains predicate to search columns and zones defined with LITERAL or NORMAL index mode. When applied to a column or zone, the contains predicate evaluates to 'TRUE' whenever data is found that matches the pattern(s) that you specify.

For example, to search for FILTER in the TEXT_LOG column, you could execute the following statement:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'FILTER'

This SELECT statement creates a working table with the specified columns and all rows that contain the indexed term FILTER in the TEXT_LOG column.

The previous example used a simple one-word pattern for its search. You can also use a multiple-word pattern, or phrase, in your contains predicate. The following example searches for the phrase DOCUMENT FILTER in the TEXT_LOG column.

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'DOCUMENT FILTER'

You can also specify a word list. The following example selects all the rows containing document or filter or both in the TEXT_LOG column.

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'DOCUMENT', 'FILTER'
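When search terms come from user input, an application has to quote them before placing them in a word list. The Python sketch below shows one way this might be done. `quote_term` and `contains_clause` are hypothetical helpers, and doubling an embedded single quote is assumed from the 'DISK''S' form used in the thesaurus examples in this chapter.

```python
from typing import Sequence


def quote_term(term: str) -> str:
    """Wrap a term in single quotes, doubling any embedded single quote."""
    return "'" + term.replace("'", "''") + "'"


def contains_clause(column: str, terms: Sequence[str]) -> str:
    """Build the text of a contains predicate with a one-or-more word list."""
    if not terms:
        raise ValueError("at least one search term is required")
    return f"{column} CONTAINS " + ", ".join(quote_term(t) for t in terms)


single = contains_clause("TEXT_LOG", ["FILTER"])
phrase = contains_clause("TEXT_LOG", ["DOCUMENT FILTER"])
word_list = contains_clause("TEXT_LOG", ["DOCUMENT", "FILTER"])
```

A phrase stays a single quoted pattern, while a word list becomes several quoted patterns separated by commas, matching the two forms shown above.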

You can also have linguistic processing apply to the word or word list specified in the contains predicate. Linguistic processing can expand the term to include inflections or derivations, so that users can automatically include the different tenses or other variations of the word being searched. For example, if the TERM_GENERATOR value of the system information table is set, the following statement:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'FILTER'

will retrieve all documents containing 'filter', 'filters', 'filter's', 'filtered' and 'filtering'.

Within Predicate

The within predicate is used inside a contains predicate. It tests for the proximity of the search terms. Proximity is determined by counting indexed printable characters between the end of one search term and the start of another. The condition evaluates to 'TRUE' whenever any search term in the first list is within the specified distance of any search term in the second list. For example:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'DOCUMENT', 'FILTER'
WITHIN 10 CHARACTERS OF 'WORD', 'PROCESSING'

In this example, SearchServer creates a table with all the rows of data where any search terms in the first list are close enough to any of those in the second list.

You can use the IN_ORDER option to modify the search. This option specifies that any term from the first list must precede (by not more than the specified distance) any term from the second list. For example:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'DOCUMENT', 'FILTER'
WITHIN 10 CHARACTERS OF 'WORD', 'PROCESSING' IN_ORDER

In this case, one of the terms DOCUMENT or FILTER must occur (within not more than 10 characters) before one of the terms WORD or PROCESSING.
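The proximity test and the effect of IN_ORDER can be modelled with a short Python sketch. This is an approximation for illustration only: it counts every character between the end of one term and the start of the other, whereas SearchServer counts indexed printable characters, and the `within` function and sample strings are invented here.

```python
import re
from typing import List, Sequence, Tuple


def find_spans(text: str, term: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of whole-word, case-insensitive matches."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]


def within(text: str, first: Sequence[str], second: Sequence[str],
           distance: int, in_order: bool = False) -> bool:
    """True if any term of `first` lies within `distance` characters of any
    term of `second`. With in_order, the first-list term must come first."""
    for a in first:
        for a_start, a_end in find_spans(text, a):
            for b in second:
                for b_start, b_end in find_spans(text, b):
                    # First-list term precedes the second-list term.
                    if b_start >= a_end and b_start - a_end <= distance:
                        return True
                    # Reverse order is accepted only without IN_ORDER.
                    if (not in_order and a_start >= b_end
                            and a_start - b_end <= distance):
                        return True
    return False


first_list = ["DOCUMENT", "FILTER"]
second_list = ["WORD", "PROCESSING"]
after = "Use the word processing filter here"   # FILTER follows PROCESSING
before = "The filter for word processing"       # FILTER precedes WORD
```

In this model, `after` satisfies the plain within test but fails once `in_order=True` is requested, while `before` satisfies both.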

For more information about character counting, see "Text Indexing and Character Counting" in Fulcrum SearchServer Data Preparation and Administration.

Proximity Predicate

The proximity predicate is used inside a contains predicate. It tests for the proximity of multiple search term lists. Proximity is determined by counting indexed printable characters between the end of one search term or phrase and the start of another. The condition evaluates to 'TRUE' whenever any term in one list is within the specified distance of any term in any other list. For example:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS PROXIMITY 10 CHARACTERS (('DOCUMENT', 'TEXT') & ('PROCESSING', 'READER') & ('GENERIC', 'DATABASE'))

In this example, SearchServer creates a table with all the rows of data where any terms in the first list are within ten characters of any of the terms in the second or third list.

For more information about character counting, see "Text Indexing and Character Counting" in Fulcrum SearchServer Data Preparation and Administration.

Searching for Numeric or Date Values (Comparison Predicate)

The comparison predicate is used for numeric and date searches. Table 2-3 lists the valid comparison operators.

Table 2-3 Comparison Operators

Symbol  Arithmetic Relation       Text Relation
=       equal to                  equal to
<>      not equal to              not equal to
<       less than                 N/A
>       greater than              N/A
<=      less than or equal to     N/A
>=      greater than or equal to  N/A

This predicate evaluates to 'TRUE' for each row containing a value in the specified column or zone that satisfies the arithmetic relation. If a row contains no data in a column, that row will be included in the search result when the column is searched for values that are < or <= or <> some number.

No context information is available, meaning that match codes can't be inserted in returned data to indicate the location of numeric values that satisfy the predicate.

The following example searches for all rows of data with a value in the PRIORITY column greater than or equal to 2.

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE PRIORITY >= 2

When used with numeric columns or zones, the values are compared with respect to their algebraic value. However, when used with date columns or zones, the values are compared with respect to their calendar value. For example:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, DATE_CLOSED
FROM SUPPORT
WHERE DATE_CLOSED <= DATE '1995-01-22'
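The rule for rows with no data is easy to overlook, so the following Python sketch models it. It is only an illustration: `comparison_matches` is a hypothetical function, an empty column is represented by `None`, and treating an empty column as a non-match for =, > and >= is an assumption, because the text states the rule only for <, <= and <>.

```python
import operator
from datetime import date
from typing import Any, Optional

_OPERATORS = {
    "=": operator.eq, "<>": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}

# Operators for which a row with no data in the column is still selected.
_EMPTY_COLUMN_MATCHES = {"<", "<=", "<>"}


def comparison_matches(value: Optional[Any], symbol: str, operand: Any) -> bool:
    """Model the comparison predicate for one row's column value."""
    if symbol not in _OPERATORS:
        raise ValueError(f"unknown comparison operator: {symbol}")
    if value is None:
        # Assumption: =, > and >= never select an empty column.
        return symbol in _EMPTY_COLUMN_MATCHES
    return _OPERATORS[symbol](value, operand)


priority_hit = comparison_matches(3, ">=", 2)
empty_hit = comparison_matches(None, "<", 2)
date_hit = comparison_matches(date(1995, 1, 20), "<=", date(1995, 1, 22))
```

Numbers compare by algebraic value and dates by calendar value, which Python's own ordering of `int` and `date` objects reproduces.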

The equal to (=) and not equal to (<>) operators are exactly equivalent to the CONTAINS <single term> and NOT CONTAINS <single term>.

The equal (=) operator can also be used in some special searches involving the reserved columns. For more information, see the section, "Searching the Reserved Columns," later in this chapter.

Comparing Numeric or Date Values (Between Predicate)

The between predicate is used to compare numeric and date values in a search. This predicate can only be used for columns defined with VALUE index mode. The values in the between predicate can be entered in any order. For instance, the following example searches for all rows of data with the value in the PRIORITY column falling between 1 and 3 inclusively:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE PRIORITY BETWEEN 1 AND 3

No context information is available, meaning that match codes can't be inserted in returned data to indicate the location of values that satisfy the predicate.
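Two properties of the between predicate are worth making concrete: the bounds are inclusive, and they can be given in either order. A minimal Python model, with `between` as a hypothetical function name, might look like this:

```python
def between(value, bound_a, bound_b) -> bool:
    """Inclusive range test that accepts the two bounds in either order."""
    low, high = sorted((bound_a, bound_b))
    return low <= value <= high


in_range = between(2, 1, 3)
reversed_bounds = between(2, 3, 1)
on_boundary = between(3, 1, 3)
out_of_range = between(4, 1, 3)
```

Sorting the bounds first is what makes BETWEEN 1 AND 3 and BETWEEN 3 AND 1 select the same rows in this model.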

Searching for a Specific Term (Like Predicate)

The like predicate is used to search for a single pattern within a column. The like predicate supports wildcards, such as percent (%) and underscore (_), but doesn't support the THESAURUS function, WEIGHT, WITHIN, or ESCAPE clause.

The following example searches for any text in the SUPPORT table that contains the term filter.

SELECT COMPANY, PROBLEM_NUMBER
FROM SUPPORT
WHERE TEXT_LOG LIKE 'FILTER'

Likewise, the following example will find the document containing the phrase document filters.

SELECT COMPANY, PROBLEM_NUMBER
FROM SUPPORT
WHERE TEXT_LOG LIKE 'DOCUMENT FILTER%'
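To show how the wildcards might behave, the Python sketch below translates a like pattern into a regular expression. It rests on several assumptions that this guide does not confirm: that % stands for zero or more word characters, that _ stands for exactly one, that matching ignores case, and that the pattern is matched on word boundaries anywhere in the text. `like_to_regex` is a hypothetical helper.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a like pattern into a regex (assumed wildcard semantics)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(r"\w*")   # assumed: zero or more word characters
        elif ch == "_":
            parts.append(r"\w")    # assumed: exactly one word character
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


exact = like_to_regex("FILTER")
trailing = like_to_regex("DOCUMENT FILTER%")
single_char = like_to_regex("F_LTER")
```

Under these assumptions, 'FILTER' matches the term filter but not filters, while 'DOCUMENT FILTER%' matches both document filter and document filters.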

Linguistic processing can be used with the like predicate to expand a term to include inflections and derivations. The SET TERM_GENERATOR statement is used to set the linguistic processing that you want, as is done with the contains predicate.

Searching for a Set of Values (In Predicate)

This predicate is used when you want to find documents where a column matches any one of a set of values. For example:

SELECT PROBLEM_NUMBER, COMPANY
FROM SUPPORT
WHERE PRIME_CONTACT IN ('Dave Chisolm', 'Jessica Trew')

If the data type of the column is DATE, all values must be date literals. If the column is VALUE indexed, all values must be numeric literals. Otherwise, all values must be character literals.

The in predicate is similar to using the equal and not equal comparison operators of the comparison predicate. However, in these cases, the in predicate is more compact and readable.

No context information is available for DATE or VALUE indexed columns, meaning that match codes can't be inserted in returned data to indicate the location of values that satisfy the predicate.

One of the best uses of this predicate is to search the FT_CID reserved column. For example, the following search returns a working table that contains a row for each of the FT_CID values specified.

SELECT *
FROM SUPPORT
WHERE FT_CID IN (1, 4, 5, 6)

This would typically be used where FT_CID values serve as a link between tables, perhaps from a database table to a SearchServer table. For more information, see Fulcrum SearchServer Database Integration.

Using a Previous Search Result in a New Search (Back Reference Predicate)

The back reference predicate is used to combine the results of one or more previous searches with a new search to get a new result. This predicate is helpful when you have a working table but want to refine the search criteria to create a new working table. It allows you to build on your previous search results instead of building on the SELECT statement itself.

In your application, this predicate must be used in conjunction with the SearchServer API functions to obtain or set the cursor name of the previous SELECT statement.

The back reference predicate evaluates to 'TRUE' for each row in the table that corresponds to a row in the working table referenced by the cursor name. Use this predicate when you want to add rows or columns to an existing working table, remove rows or columns from it, or retrieve different rows or columns from the result. The alternative is to take the initial search, add a more complex WHERE clause, and rerun it.

Using a back reference predicate can be more efficient and convenient for the user. Performance may improve because the availability of previous search results means that the application doesn't have to do a completely new search. On platforms with memory limitations (such as 16-bit Windows), you can split up searches and use back reference to get the results you need. Back reference is also convenient because users don't have to repeat complicated searches just to refine their search criteria.

Note: If the table has been updated, the rows found by the original search condition might not be the same rows that would be matched if you reissued the SELECT statement.

To use a back reference predicate, you must issue a SELECT statement to create the first working table.

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE COMPANY CONTAINS 'OREO'

Next, call the SearchServer API to obtain or set the cursor name that is the reference to the first working table. The cursor name is an identifier, and therefore follows the same rules as any other identifier.

Note: You'll find that the cursor name used in these examples might not correspond to the cursor name obtained from the SearchServer API.

You can then refine your search to select a different set of columns, while still using the same rows from the first SELECT statement. For example, if the cursor name obtained from the previous search example is SQL_CUR00001, you can reference it in the following SELECT statement.

SELECT PROBLEM_NUMBER, LAST_MODIFIED, COMPANY
FROM SUPPORT
WHERE CURSOR SQL_CUR00001

The new working table refers to the same rows in the table, but is otherwise independent of the first working table. You can continue to refine a search and refer back to either working table independently.

The maximum number of SELECT statements that can be executed within a single connection is 65,535. The maximum number of cursors that can be open concurrently (that is, SELECT statements whose statement handles haven't been freed by a call to SQLFreeStmt) is dependent on available memory, but is at least 10.

If you have requested that match codes be inserted into the original working table using a SET SHOW_MATCHES statement, then the WITH CONTEXT option of this predicate instructs SearchServer to insert match codes in the new working table for data matching the back referenced search terms. The original search produced a working table with the search term OREO highlighted with match codes. A back reference to that search also highlights OREO if the WITH CONTEXT option is included in the back reference search. For example:

SELECT PROBLEM_NUMBER, LAST_MODIFIED, COMPANY
FROM SUPPORT
WHERE CURSOR SQL_CUR00001 WITH CONTEXT

The WITHOUT CONTEXT option is the default. It instructs SearchServer not to insert match codes in the new working table for data matching the back referenced search terms. This option works even if you have previously requested that match codes be provided during retrieval using a SET SHOW_MATCHES statement.

For example, the original search produced a working table with the search term OREO highlighted with match codes. A back reference to that search won't highlight OREO if the WITHOUT CONTEXT option is included in the back reference search.

SELECT PROBLEM_NUMBER, LAST_MODIFIED, COMPANY
FROM SUPPORT
WHERE CURSOR SQL_CUR00001 WITHOUT CONTEXT

Refining an Existing Search

When combining predicates, the back reference predicate can be an efficient method to broaden or narrow a search request. For example, the following SELECT statement creates a working table with the specified columns and all rows that contain the word OREO in the COMPANY column.

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE COMPANY CONTAINS 'OREO'

The following SELECT statement uses the back reference predicate to reference the working table created in the previous statement and broadens the search criteria by using the OR Boolean operator to connect a contains predicate.

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE CURSOR SQL_CUR00001 OR STATUS CONTAINS 'HOLD'

Similarly, you can use the back reference predicate to narrow the search by using the AND Boolean operator to connect a contains predicate. For example:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE CURSOR SQL_CUR00001 AND STATUS CONTAINS 'HOLD'
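Broadening and narrowing with a back reference amounts to a set union or intersection over row identities. The Python sketch below models that idea with plain sets. It does not call SearchServer. The four sample rows are made up, and `search` is a hypothetical helper that stands in for a SELECT with a contains predicate.

```python
from typing import Callable, Dict, Set

Row = Dict[str, str]


def search(table: Dict[int, Row], predicate: Callable[[Row], bool]) -> Set[int]:
    """Return the ids of the rows for which the predicate is true."""
    return {row_id for row_id, row in table.items() if predicate(row)}


# Made-up rows standing in for the SUPPORT table.
support = {
    1: {"COMPANY": "OREO", "STATUS": "OPEN"},
    2: {"COMPANY": "OREO", "STATUS": "HOLD"},
    3: {"COMPANY": "ACME", "STATUS": "HOLD"},
    4: {"COMPANY": "ACME", "STATUS": "CLOSED"},
}

# First working table: COMPANY CONTAINS 'OREO'.
first_result = search(support, lambda row: "OREO" in row["COMPANY"])
on_hold = search(support, lambda row: "HOLD" in row["STATUS"])

broadened = first_result | on_hold   # back reference OR new predicate
narrowed = first_result & on_hold    # back reference AND new predicate
```

The earlier result is reused as is. Only the new predicate is evaluated, which is the source of the performance benefit described in this section.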

Intuitive Searching (Is_about Predicate)

The is_about predicate allows the definition of the text vector in the SELECT statement. The is_about predicate accepts the text vector criteria in a single query statement. There is no named text vector created when this predicate is used. It can also use a text vector created in a previous CREATE TEXT_VECTOR statement.

The VECTOR_TERMS parameter is used for initial term selection by specifying how many terms are kept in the text vector. This parameter is equivalent to the VECTOR_TERMS parameter of the CREATE TEXT_VECTOR statement. Linguistic processing is enabled by default when using Intuitive Searching.

The is_about predicate is used to specify an Intuitive Search. For a complete description of Intuitive Searching, see "Implementing an Intuitive Search," later in this chapter.

Determining the Thesaurus for a Search (THESAURUS Function)

The THESAURUS function specifies that a word or phrase can be expanded using a thesaurus. It can be used anywhere that a pattern is used.


Note: Stop words, string wildcards, and phrases are not expanded.

The THESAURUS function can be specified in a contains predicate. It uses the thesaurus file to find related words or phrases. If there is a match, a word list is generated containing all the related words. Internally, SearchServer then expands the search to include all the words in the word list.

Note: You cannot cancel a search process until the thesaurus expansion has completed.

The thesaurus file can be specified either through the SET THESAURUS_NAME statement, or through an optional third parameter to the THESAURUS function. Using the optional parameter applies only to the current search and doesn't change the thesaurus filename in the THESAURUS_NAME server attribute in the SERVER_INFO system table.

There are six options you can specify when using this function:

• WORD_SYNONYM

• WORD_SUFFIX

• WORD_SIMILARITY

• WORD_BROADEN

• WORD_NARROW

• WORD_MODIFY

The WORD_SYNONYM option expands the word or phrase to include equivalent words before processing the predicate. For instance, the following example searches the thesaurus file for the word DISK and equivalent words.

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS THESAURUS ('DISK', WORD_SYNONYM)

The word list generated from this SELECT statement could include the following:

DISK DISC
DISKETTE FLOPPY

SearchServer then creates a new contains predicate (internally) that searches for the original word or phrase and any of the other words or phrases in the word list. The resultant search is the same as if the original contains predicate referenced all the words in the word list.

For example:

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS ('DISK', 'DISC', 'DISKETTE', 'FLOPPY')

The WORD_SUFFIX option expands the word or phrase to include its plural and possessive forms. For instance, the following example searches the thesaurus file for the plural and possessive forms of the word DISK.

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS THESAURUS ('DISK', WORD_SUFFIX)

The word list generated from this SELECT statement would include the following:

DISKS DISK'S

The resultant contains predicate that SearchServer creates would look like this:

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS ('DISKS', 'DISK''S')

The WORD_SIMILARITY option uses a combination of the WORD_SYNONYM and WORD_SUFFIX options. This option gives the synonym processing priority over the suffix processing. If there is a synonym match, there is no further search for an additional suffix match. However, if there is no synonym match, then suffix processing is performed.
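The priority rule for WORD_SIMILARITY can be sketched in a few lines of Python. This is a toy model and not the SearchServer implementation: the in-memory `THESAURUS` dictionary stands in for a thesaurus file, the suffix rule simply appends S and 'S, and the three function names are hypothetical.

```python
from typing import Dict, List

# Stand-in for a thesaurus file; holds only the DISK entry from the examples.
THESAURUS: Dict[str, List[str]] = {"DISK": ["DISC", "DISKETTE", "FLOPPY"]}


def word_synonym(term: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the term plus its equivalents, or an empty list if no match."""
    key = term.upper()
    if key not in thesaurus:
        return []
    return [key] + thesaurus[key]


def word_suffix(term: str) -> List[str]:
    """Return naive plural and possessive forms of the term."""
    key = term.upper()
    return [key + "S", key + "'S"]


def word_similarity(term: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Synonym match takes priority; fall back to suffixes only without one."""
    return word_synonym(term, thesaurus) or word_suffix(term)


disk_list = word_similarity("DISK", THESAURUS)
cable_list = word_similarity("CABLE", THESAURUS)
```

Because DISK has a synonym entry, its suffix forms are never generated. CABLE has no entry, so only the suffix forms are produced.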

The WORD_BROADEN and WORD_NARROW options are equivalent to the WORD_SYNONYM option. They are included for clarity if the thesaurus file specified in the function is intended to broaden or narrow the term specified. For instance (for this SELECT only) the following example searches the optional thesaurus file (THESALT.FTH) for the word DISK and equivalent words.

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS THESAURUS ('DISK', WORD_NARROW, 'thesalt.fth')

The word list generated from this SELECT statement would include the following:

DISK DISC

The WORD_MODIFY option uses the linguistic processing filter to expand the terms. This filter and its related options are specified in the thesaurus file parameter of this function. Refer to the SET TERM_GENERATOR statement and the THESAURUS function specification in Chapter 4, "SearchSQL Statements", for the syntax for the linguistic processing filter.

Determining the Document Relevance (RELEVANCE Function)

Document relevance is used to describe how closely a given document matches a query. The relevance of a document can determine the order of the search result that is displayed to the user if it is sorted on that value. The RELEVANCE function returns the relevance value of the current row in a working table.

In the case of a SELECT with no WHERE clause or when the RELEVANCE_METHOD is not set as a server attribute or in the RELEVANCE function specification, the relevance value is NULL for all rows.

For each row that matches the search criteria, the relevance value is calculated from the specified retrieval model and relevance order. The rows of the working table can be sorted in relevance order, if specified by the ORDER BY clause.

The relevance value can be returned in the working table represented as a non-negative exact numeric value which indicates the relative importance of the row. To do this, you specify the RELEVANCE function in the select list. The relevance value can be retrieved as if the function were defined as an INTEGER column.

To order the table by relevance, the RELEVANCE function must be specified in the select list with an AS column alias that is then used in the ORDER BY clause.

This function can be used in conjunction with the SETRELEVANCE_METHOD statement. This SET statement specifies which retrieval model and relevance order are used to cal-culate the relevance value of subsequent SELECT statements. How-ever, you can override the current relevance method for a particular SELECT statement by including the relevance method in the rele-vance function specification.

Note: When using relevance ranking in a SELECT statement, it is recommended to use the SET MAX_SEARCH_ROWS statement to limit the number of rows returned in the working table. A reasonable limit (for example, 100) optimizes performance and memory usage.
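
As a rough illustration of relevance ordering combined with a row limit, the following Python sketch models a working table sorted by descending relevance and then capped. The rows, the relevance numbers, and the rank_rows helper are hypothetical. SearchServer computes relevance internally, and this guide does not state whether the row limit is applied before or after ranking, so the sketch simply assumes the highest-ranked rows are kept.

```python
from typing import Optional

# Hypothetical rows: (problem_number, relevance). None models a NULL relevance.
ROWS = [
    (92010201, 12),
    (92011101, 40),
    (92012101, 7),
    (92012203, 40),
    (92013102, None),
]

def rank_rows(rows, max_rows: Optional[int] = None):
    """Order rows by relevance, highest first, then apply an optional cap."""
    # Rows with a NULL relevance sort last; ties keep their original order
    # because Python's sort is stable.
    ordered = sorted(rows, key=lambda r: (r[1] is None, -(r[1] or 0)))
    return ordered if max_rows is None else ordered[:max_rows]

# Comparable in spirit to ORDER BY <relevance alias> DESC with a limit of 3 rows.
top = rank_rows(ROWS, max_rows=3)
```

With the sample values, top holds the two rows with relevance 40 followed by the row with relevance 12.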

Determining the Character Variant Rules File

Variant generation allows the search for a word or phrase to be augmented using a predetermined rules file. It allows typographical variants of the same word to be treated as equivalent for search purposes. Any simple words (including stop words), compound words, and word sequences found in the search are included in the variant generation.

Single-character words are the exception. They aren't included for suffix substitution because they might generate two-character words that have no relationship to the original word (for example, "a" would generate "as"). Literal terms, dates, and numbers also aren't included.

Note: The total number of variants generated from a single search term can become large when several substitution rules apply. Because SearchServer searches for each generated variant form in the table, a large number of variants can cause unacceptably slow response, even if only a few variants actually occur in the table.

The character variant rules file is specified in the SET CHARACTER_VARIANT statement. If a rule applies to a search term, a word list is created containing all the generated words. Internally, the search is expanded to include all the words in the word list. For example, the character variant rules file named FULTEXT.FTL automatically generates plurals and possessives. You can request that this rules file be used in all following searches by executing the statement:

SET CHARACTER_VARIANT 'fultext.ftl'

When the character variant rules file is enabled, the word or phrase in a search is expanded to include equivalent words before processing the predicate. For instance, the following example checks the character variant rules file for a rule applicable to the search term word.

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'word'

The word list generated from this statement would include the following:

word
words
word's

The resultant working table is the same as if the original contains predicate referenced all the search terms in the list separately. For example:

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS ('word', 'words', 'word''s')

Note: You cannot cancel a search process until the character variant expansion has completed.
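
The expansion described above can be sketched in Python. This is only a model under stated assumptions: the format of a real rules file is not shown in this guide, so the two rules (plural and possessive) are hard-coded, and the expand_variants and contains_any helpers are hypothetical names.

```python
def expand_variants(term: str) -> list[str]:
    """Generate the plural and possessive forms used in the 'word' example."""
    if len(term) == 1:
        # Single-character words are excluded from suffix substitution.
        return [term]
    return [term, term + "s", term + "'s"]

def contains_any(text: str, terms: list[str]) -> bool:
    """True if any term occurs as a whole word in the text (case-insensitive)."""
    words = text.lower().split()
    return any(t.lower() in words for t in terms)

# The word list for the search term 'word'.
variants = expand_variants("word")

# Searching for the original term now also matches its variants.
hit = contains_any("Count the words", variants)
miss = contains_any("Swordfish", variants)
```

Because every generated variant is searched for, the cost grows with the length of the word list, which is the concern raised in the note about large numbers of variants.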

Determining the Sort Order (ORDER BY Clause)

The ORDER BY clause of the SELECT statement is used to specify how the rows of the resulting working table should be ordered. Using the ORDER BY clause, the rows of a working table can be sorted based on one or more of the following criteria:

• the contents of a column

• the row's relevance value

• the name of the base table from which a row originated

For each of these criteria, it is possible to sort in either ascending or descending order. Including the ASC keyword after a column name indicates that sorting should be done in ascending order. Including the DESC keyword indicates that sorting should be done in descending order. If neither ASC nor DESC is specified, ascending order is used by default. For instance:

SELECT LAST_MODIFIED FROM SUPPORT ORDER BY LAST_MODIFIED DESC

causes the working table to be ordered according to the contents of the LAST_MODIFIED column, with the most recent dates appearing first.

Conversely,

SELECT LAST_MODIFIED FROM SUPPORT ORDER BY LAST_MODIFIED ASC

or,

SELECT LAST_MODIFIED FROM SUPPORT ORDER BY LAST_MODIFIED

would both cause the working table to be ordered according to the contents of the LAST_MODIFIED column, with the oldest dates appearing first.

To sort on the contents of a column, the name of the column is included in the ORDER BY clause. Ordering of the working table can be based on any column or function that is listed in the SELECT statement's select list, with the exception of the external text column.

To sort on the basis of search relevance, table name, or table qualifier, one of the following functions is used in the select list of the SELECT statement:

• RELEVANCE()

• TABLENAME()

• TABLEQUALIFIER()

To sort on the value of these functions, the function must be listed in the SELECT statement's select list, and assigned an alias using the AS clause. The alias can then be included in the ORDER BY clause.

To order based on the relevance of the rows, the RELEVANCE function is used. Since the RELEVANCE function returns the highest values for the highest ranking rows, the DESC keyword should be used in conjunction with the RELEVANCE function to cause the most relevant rows to be listed first. For example:

SELECT RELEVANCE('2:1') AS RANK, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'SOFTWARE', 'FILTER%', 'DOCUMENT', 'PROBLEM'
ORDER BY RANK DESC

would cause the most relevant documents, as ranked by the hits count algorithm, to be listed before less relevant documents.

To order based on the name of a table, use the TABLENAME function. For instance,

SELECT AUTHOR, FT_TEXT, TABLENAME() AS BASETABLE
FROM EAST_REPORTS UNION SOUTH_REPORTS UNION WEST_REPORTS
ORDER BY BASETABLE, AUTHOR

would cause the rows in the working table to be ordered according to the name of their base tables.

You can also use the TABLEQUALIFIER function in the previous statement to order the rows not only by their table name, but also by their node name or pathname. For example:

SELECT AUTHOR, FT_TEXT, TABLEQUALIFIER() AS TABLEQUAL, TABLENAME() AS BASETABLE
FROM EAST_REPORTS UNION SOUTH_REPORTS UNION WEST_REPORTS
ORDER BY TABLEQUAL, BASETABLE, AUTHOR

When more than one sort criterion is listed in an ORDER BY clause, each sort criterion takes precedence over those that follow it. For example:

SELECT COMPANY, LAST_MODIFIED
FROM SUPPORT
ORDER BY COMPANY, LAST_MODIFIED

would cause the resulting working table to be sorted first by the values in the COMPANY column. Rows with identical COMPANY values would then be sorted according to the values in the LAST_MODIFIED column.
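
The precedence of sort criteria can be modeled with a short Python sketch. The rows below are hypothetical and only stand in for a working table with COMPANY and LAST_MODIFIED columns. The second half of the sketch shows one way to model a mix of ascending and descending criteria.

```python
from datetime import date

# Hypothetical rows: (COMPANY, LAST_MODIFIED).
ROWS = [
    ("Oreo", date(1992, 1, 22)),
    ("Acme", date(1992, 2, 5)),
    ("Oreo", date(1992, 1, 2)),
    ("Acme", date(1992, 1, 11)),
]

# ORDER BY COMPANY, LAST_MODIFIED:
# the first criterion dominates, the second only breaks ties.
by_company_then_date = sorted(ROWS, key=lambda r: (r[0], r[1]))

# ORDER BY COMPANY ASC, LAST_MODIFIED DESC:
# sort by the minor criterion first, then by the major one. Python's sort
# is stable, so rows with equal COMPANY values keep their newest-first order.
newest_first = sorted(ROWS, key=lambda r: r[1], reverse=True)
company_then_newest = sorted(newest_first, key=lambda r: r[0])
```

In both results all Acme rows precede all Oreo rows. Only the order of the dates within each company differs.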

Determining the Number of Rows in a Working Table (COUNT Function)

This function generates a column in the working table that contains the number of rows that satisfy the query. When used, it must be the only component of the select list. This function returns a working table that contains only one row and one column. Therefore, the ORDER BY clause and MAX_SEARCH_ROWS server attribute have no effect on a search that uses this function. You can't use this function in a search that also uses a FOR UPDATE option. This function is useful to determine the total number of rows in a table, or the total number of rows a specified search will produce.

The data type of the value returned by this function is INTEGER.

Searching Zones

A zone can be specified anywhere in a WHERE clause that a column name is allowed. It cannot be specified in the select list of the SELECT statement. These zones give you more flexibility when you are searching the table.

For example, the SUPPORT table was created with a number of different zones. The ENTRY_HEADING zone grouped a number of zones into one searchable unit. This zone contains the data from the DATE_AND_TIME, OPERATOR, DESCRIPTION, and CONTACT zones.

To search a zone, you specify it in the WHERE clause the same way you would specify a column. For example:

SELECT PROBLEM_NUMBER, PRIME_CONTACT, STATUS, CREATOR
FROM SUPPORT
WHERE ENTRY_HEADING CONTAINS 'PETER' | 'DAVE'

The working table derived from the SUPPORT table would contain all rows that contain Peter in the OPERATOR zone, Dave in the CONTACT zone, or both. If the working table could be displayed, it would look like this:

PROBLEM_NUMBER  PRIME_CONTACT  STATUS  CREATOR
92010201        Dave Chisholm  ACTIVE  Peter
92011101        Jessica Trew   CLOSED  Peter
92012101        Daryl Weaver   CLOSED  Peter
92012201        Dave Chisholm  CLOSED  Marie
92012203        Montag Hortz   CLOSED  Peter
92012701        Dave Chisholm  CLOSED  Marie
92013102        Montag Hortz   CLOSED  Peter
92020501        Montag Hortz   NULL    Peter

Combining Predicates

Boolean operators can be used to connect individual predicates or to connect two search terms within a predicate. This allows you to formulate complex searches.

Note: All the examples in the following section use the default retrieval model, strict Boolean.

For example, the following is a simple case of combining two predicates in a search condition:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE COMPANY CONTAINS 'OREO'
AND PRIORITY = 5

The preceding search utilizes the contains and comparison predicates. This search returns a working table that consists of the columns specified in the select list and only those rows of data that contain the word OREO in the COMPANY column and the value 5 in the PRIORITY column.

You can also use Boolean operators to connect the same type of predicate or predicates involving the same column. For example, the following search combines two contains predicates:

SELECT PROBLEM_NUMBER, SUBJECT
FROM SUPPORT
WHERE SUBJECT CONTAINS 'ORDER OF WORDS'
OR SUBJECT CONTAINS 'TEXT_VECTOR'

This search can also be formulated with a single contains predicate that uses a Boolean operator within it. For example:

SELECT PROBLEM_NUMBER, SUBJECT
FROM SUPPORT
WHERE SUBJECT CONTAINS 'ORDER OF WORDS' | 'TEXT_VECTOR'

The two sets of Boolean operators (symbols and words) are used in different parts of the search condition. The three symbols (|, &, ~) are only used to combine terms within a contains predicate. For example:

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR CONTAINS 'PETER' | 'MARIE'

The three Boolean words (OR, AND, NOT) are used to connect predicates to combine their results using Boolean algebra. For example:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE (COMPANY CONTAINS 'OREO' OR COMPANY CONTAINS 'SARASOTA')
AND PRIORITY = 5

Boolean operators react differently when you are using SearchServer's Intuitive Searching capability. For more information, see "Implementing an Intuitive Search," later in this chapter.
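
The equivalence between a symbol inside one contains predicate and a Boolean word between two predicates can be checked with a small Python model. The rows are taken from the sample tables in this chapter. The contains helper is a simplified, hypothetical stand-in for the contains predicate under the strict Boolean model, not SearchServer's matching logic.

```python
# (PROBLEM_NUMBER, CREATOR) pairs taken from the sample tables in this chapter.
ROWS = [
    (92010201, "Peter"),
    (92010202, "Polly"),
    (92010301, "Marie"),
    (92010302, "Margaret"),
]

def contains(value: str, term: str) -> bool:
    """Simplified CONTAINS: case-insensitive whole-word match."""
    return term.lower() in value.lower().split()

def one_predicate(creator: str) -> bool:
    # CREATOR CONTAINS 'PETER' | 'MARIE'  (symbol inside a single predicate)
    return any(contains(creator, t) for t in ("PETER", "MARIE"))

def two_predicates(creator: str) -> bool:
    # CREATOR CONTAINS 'PETER' OR CREATOR CONTAINS 'MARIE'  (word between predicates)
    return contains(creator, "PETER") or contains(creator, "MARIE")

symbol_form = [number for number, creator in ROWS if one_predicate(creator)]
word_form = [number for number, creator in ROWS if two_predicates(creator)]
```

Both forms select the same rows here, which mirrors the two SUBJECT searches shown above.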

Placement of Boolean Operators

The placement of the Boolean operators plays an important role in the searching process and in some cases returns different results. The following examples show how the placement of the NOT and ~ Boolean operators is interpreted to produce different results. The tables that follow each example are the working tables derived from the search.

To fully understand the different results you'll receive when using Boolean operators, this first example illustrates the results of a search that doesn't use Boolean operators.

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR CONTAINS 'PETER'

PROBLEM_NUMBER  CREATOR  SUBJECT
92010201        Peter    What are the valid connectors between predicates
92011101        Peter    When can I free a statement handle
92012101        Peter    Searches take too long (use relevance ordering)
92012203        Peter    Where to obtain row numbers for the CREATE DOCUMENT
92013102        Peter    How do you look for accented characters
92020501        Peter    Is there support for Ful/Text Library files?

The next three examples show the different placement of the NOT and ~ Boolean operators when there is only one search term in the WHERE clause. Notice that the working tables for these three examples contain exactly the same data. This is because the NOT and ~ Boolean operators used influence only one search term.

The following examples return all the rows where the CREATOR column contains any value other than Peter.

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR CONTAINS ~ 'PETER'

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR NOT CONTAINS 'PETER'

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE NOT CREATOR CONTAINS 'PETER'


PROBLEM_NUMBER  CREATOR   SUBJECT
92010202        Polly     What is the difference between FT_MTIME and FT_DAT
92010301        Marie     Where can comments appear
92010302        Margaret  Are the header files compatible with SUN C++
92011301        Polly     What networking software is required to connect to
92011401        Margaret  Why can't I define a column named DATE
92012201        Marie     Why does '_' not work
92012202        Polly     How to find words in sequence
92012701        Marie     How many connection handles can be opened concurre
92013001        Polly     How do special characters in a query effect a sear
92013101        Margaret  Why do alpha-numeric queries produce extra hits?
92020401        Marie     How do you look for words with the characters "_"

Multiple Search Terms

When you add more than one search term to a WHERE clause that contains the NOT and ~ Boolean operators, the statement is interpreted differently. The following three examples contain two search terms instead of just one. It is in these cases that the placements of the NOT and ~ Boolean operators are interpreted by SearchServer to produce different results.

For example, the following search returns all rows that do not contain Peter in the CREATOR column but do contain Marie:

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR CONTAINS ~ 'PETER' & 'MARIE'


PROBLEM_NUMBER  CREATOR  SUBJECT
92010301        Marie    Where can comments appear
92012201        Marie    Why does '_' not work
92012701        Marie    How many connection handles can be opened concurrently
92020401        Marie    How do you look for words with the characters "_"

In the preceding example, SearchServer interprets the (~) in the WHERE clause to influence only the search term 'PETER'. However, the contains predicate also includes the search term 'MARIE'. Therefore, this example could also be written as

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR CONTAINS ~ 'PETER' AND CREATOR CONTAINS 'MARIE'

By placing the NOT Boolean operator before the CONTAINS keyword, as in the following example, SearchServer interprets the NOT to include both search terms. Therefore, all rows in the table are returned because there is no CREATOR value in the SUPPORT table that contains the terms PETER and MARIE together.

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR NOT CONTAINS 'PETER' & 'MARIE'

PROBLEM_NUMBER  CREATOR   SUBJECT
92010201        Peter     What are the valid connectors between predicates
92010202        Polly     What is the difference between FT_MTIME and FT_DAT
92010301        Marie     Where can comments appear
92010302        Margaret  Are the header files compatible with SUN C++
92011101        Peter     When can I free a statement handle
92011301        Polly     What networking software is required to connect to
92011401        Margaret  Why can't I define a column named DATE
92012101        Peter     Searches take too long (use relevance ordering)
92012201        Marie     Why does '_' not work
92012202        Polly     How to find words in sequence
92012203        Peter     Where to obtain row numbers for the CREATE DOCUMEN
92012701        Marie     How many connection handles can be opened concurre
92013001        Polly     How do special characters in a query effect a sear
92013101        Margaret  Why do alpha-numeric queries produce extra hits?
92013102        Peter     How do you look for accented characters
92020401        Marie     How do you look for words with the characters "_"
92020501        Peter     Is there support for Ful/Text Library files?

The same results can be achieved if you place the NOT Boolean operator before the column name. For example:

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE NOT CREATOR CONTAINS 'PETER' & 'MARIE'

When you have two or more search terms that you want to exclude from the working table, you must place the ~ Boolean operator before each of the terms. For example:

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR CONTAINS ~ 'PETER' & ~ 'MARIE'


PROBLEM_NUMBER  CREATOR   SUBJECT
92010202        Polly     What is the difference between FT_MTIME and FT_DAT
92010302        Margaret  Are the header files compatible with SUN C++
92011301        Polly     What networking software is required to connect to
92011401        Margaret  Why can't I define a column named DATE
92012202        Polly     How to find words in sequence
92013001        Polly     How do special characters in a query effect a sear
92013101        Margaret  Why do alpha-numeric queries produce extra hits?

The preceding example can also be written as any of the following examples.

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR CONTAINS ~ 'PETER'
AND CREATOR CONTAINS ~ 'MARIE'

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR NOT CONTAINS 'PETER'
AND CREATOR NOT CONTAINS 'MARIE'

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE NOT CREATOR CONTAINS 'PETER'
AND NOT CREATOR CONTAINS 'MARIE'

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR NOT CONTAINS 'PETER' | 'MARIE'

Grouping Predicates

If you execute a search that uses more than two predicates, you can use parentheses to override the default precedence of the Boolean operators. SearchServer follows the standard rules of precedence. Expressions within parentheses are evaluated first.

When the order of evaluation is not specified by parentheses, the following rules apply:


1. NOT is applied before AND

2. AND is applied before OR

3. operators that are the same are applied from left to right

For example, SearchServer interprets the following two examples identically:

SELECT PROBLEM_NUMBER, PRIME_CONTACT, STATUS, CREATOR
FROM SUPPORT
WHERE CREATOR CONTAINS 'PETER'
OR STATUS CONTAINS 'CLOSED' AND PRIME_CONTACT CONTAINS 'MONTAG'

SELECT PROBLEM_NUMBER, PRIME_CONTACT, STATUS, CREATOR
FROM SUPPORT
WHERE CREATOR CONTAINS 'PETER'
OR (STATUS CONTAINS 'CLOSED' AND PRIME_CONTACT CONTAINS 'MONTAG')

If the working table for the preceding two examples could be displayed, it would look like this:

PROBLEM_NUMBER  PRIME_CONTACT  STATUS  CREATOR
92010201        Dave Chisholm  ACTIVE  Peter
92011101        Jessica Trew   CLOSED  Peter
92012101        Daryl Weaver   CLOSED  Peter
92012203        Montag Hortz   CLOSED  Peter
92013102        Montag Hortz   CLOSED  Peter
92020501        Montag Hortz   NULL    Peter

However, to change the way SearchServer interprets the preceding example, use parentheses as in the following example:

SELECT PROBLEM_NUMBER, PRIME_CONTACT, STATUS, CREATOR
FROM SUPPORT
WHERE (CREATOR CONTAINS 'PETER' OR STATUS CONTAINS 'CLOSED')
AND PRIME_CONTACT CONTAINS 'MONTAG'

If the working table for the preceding example could be displayed, it would look like this:

PROBLEM_NUMBER  PRIME_CONTACT  STATUS  CREATOR
92012203        Montag Hortz   CLOSED  Peter
92013102        Montag Hortz   CLOSED  Peter
92020501        Montag Hortz   NULL    Peter
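
The placement and grouping rules in this section can be checked against the sample data with a small Python model. This is a sketch under stated assumptions: contains and select are hypothetical helpers, matching is reduced to a whole-word test, a NULL value is treated as never matching, and SQL's three-valued logic is ignored. Python's not, and, and or happen to follow the same precedence order as the rules listed above.

```python
# (PROBLEM_NUMBER, PRIME_CONTACT, STATUS, CREATOR) rows from the sample tables.
SUPPORT = [
    (92010201, "Dave Chisholm", "ACTIVE", "Peter"),
    (92011101, "Jessica Trew", "CLOSED", "Peter"),
    (92012201, "Dave Chisholm", "CLOSED", "Marie"),
    (92012203, "Montag Hortz", "CLOSED", "Peter"),
    (92020501, "Montag Hortz", None, "Peter"),
]

# (PROBLEM_NUMBER, CREATOR) rows, one per distinct creator in the sample tables.
CREATORS = [
    (92010201, "Peter"),
    (92010202, "Polly"),
    (92010301, "Marie"),
    (92010302, "Margaret"),
]

def contains(value, term):
    """Simplified CONTAINS: whole-word match; a NULL (None) value never matches."""
    return value is not None and term.lower() in value.lower().split()

def select(rows, predicate):
    """Return the PROBLEM_NUMBER of every row that satisfies the predicate."""
    return [row[0] for row in rows if predicate(row)]

# Grouping: without parentheses, AND is applied before OR.
default = select(SUPPORT, lambda r: contains(r[3], "PETER")
                 or contains(r[2], "CLOSED") and contains(r[1], "MONTAG"))
explicit = select(SUPPORT, lambda r: contains(r[3], "PETER")
                  or (contains(r[2], "CLOSED") and contains(r[1], "MONTAG")))
regrouped = select(SUPPORT, lambda r: (contains(r[3], "PETER") or contains(r[2], "CLOSED"))
                   and contains(r[1], "MONTAG"))

# Placement of NOT and ~ with two search terms on CREATOR.
# CONTAINS ~ 'PETER' & 'MARIE': the negation covers the first term only.
not_first_term = select(CREATORS, lambda r: not contains(r[1], "PETER")
                        and contains(r[1], "MARIE"))
# NOT CONTAINS 'PETER' & 'MARIE': the negation covers the whole predicate.
not_whole_predicate = select(CREATORS, lambda r: not (contains(r[1], "PETER")
                                                      and contains(r[1], "MARIE")))
# CONTAINS ~ 'PETER' & ~ 'MARIE': each term is negated separately.
not_each_term = select(CREATORS, lambda r: not contains(r[1], "PETER")
                       and not contains(r[1], "MARIE"))
```

In this model default and explicit select the same rows, while regrouped keeps only the Montag Hortz rows. The three negation forms select only the Marie row, every row, and the Polly and Margaret rows respectively, matching the working tables shown in this section.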


Searching More Than One Table

In SearchServer, it is possible to arrange several tables into one logical table called a View. Views are typically used to combine tables that have some kind of logical relationship that makes it useful for users to search them as one table. Rows in a View cannot be updated or deleted and the View cannot be indexed as one entity.

The tables that make up a View must reside on the same server. For more information about creating Views, see Fulcrum SearchServer Data Preparation and Administration.

Grouping Tables Together in a Search (UNION Clause)

The UNION clause of the SELECT statement is used to group tables together for a specific query. You can use the UNION clause to group tables and Views that are on more than one server.

The TABLENAME function can be selected to identify the table from which each row in the working table is derived. You can specify two identically named tables that have different qualifiers in the UNION clause in a SELECT statement. However, because the TABLENAME function doesn't indicate a qualifier, you must also use the TABLEQUALIFIER function for this purpose. The TABLEQUALIFIER function returns the node name or pathname of the table for each row in the working table.

The maximum number of tables in a UNION clause is platform-dependent. For more information, see Fulcrum SearchServer Getting Started for your platform.

Choosing a View or a UNION

Table 2-4 describes some of the issues to consider when choosing between using a View or a UNION:


Characteristic                    View                              UNION
are tables on more than one       Views must be made up of tables   UNIONs can combine tables from
server?                           that reside on the same server    any location
can get tablename of component    no                                yes
tables?
can get table qualifier of        no                                yes
component tables?
are columns all the same across   yes                               only those that are used in
component tables?                                                   this query
can the ORDER BY clause sort by   rows are sorted against all       rows can be sorted across all
table name?                       tables always                     tables or sorted within each
                                                                    table
are the component tables          yes (as long as it is done        yes (as long as it is done
updatable?                        separately for each component     separately for each component
                                  table)                            table)

Table 2-4 Choosing Between View and UNION

Choosing a UNION Clause or Separate SELECT Statements

There are several performance-related reasons why you might find it preferable to use a UNION rather than a series of SELECT statements, each naming a separate table. These include:

• If a UNION contains two tables that reference server A and two that reference server B, the client side is sensitive to which tables belong to which servers, and packages network requests accordingly. If the application were to do this, it would require more server requests, resulting in a noticeable performance difference in a WAN environment.

• The UNION clause uses distributed processing to send multiple requests to the servers as a result of a single SELECT statement, then waits for the requests to complete. This can provide significant performance benefits.

• When you use the UNION clause, the application doesn't have to perform the tasks of managing many working tables and combining or merging the results.
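
The first point above can be illustrated with a short Python sketch that groups the tables of a UNION by server. The table-to-server mapping, the server names, the NORTH_REPORTS table, and the requests_per_server helper are all hypothetical. The sketch shows only the arithmetic of one request per server versus one request per table, not how the SearchServer client is implemented.

```python
from collections import defaultdict

# Hypothetical mapping of UNION member tables to the servers that hold them.
TABLE_SERVERS = {
    "EAST_REPORTS": "server_a",
    "SOUTH_REPORTS": "server_a",
    "WEST_REPORTS": "server_b",
    "NORTH_REPORTS": "server_b",
}

def requests_per_server(table_servers):
    """Group tables by server so that each server receives a single request."""
    grouped = defaultdict(list)
    for table, server in table_servers.items():
        grouped[server].append(table)
    return dict(grouped)

batched = requests_per_server(TABLE_SERVERS)

# One request per server when batched, one per table when issued separately.
batched_request_count = len(batched)
separate_request_count = len(TABLE_SERVERS)
```

With two tables on each of two servers, the batched form needs two requests where separate SELECT statements would need four.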


Choosing a Retrieval Model

There are three retrieval models that can be specified. Each model has a different impact on the search results when used with a ranking algorithm.

• strict Boolean

• fuzzy Boolean

• vector space

The strict Boolean retrieval model offers very precise control over what is retrieved, but can often eliminate useful text because it does not conform exactly to the search criteria. Fuzzy Boolean and the vector space models offer more relaxed interpretations of the Boolean operators.

The retrieval model to be used is controlled by the first character of the RELEVANCE_METHOD server attribute. Vector space is indicated by a V and fuzzy Boolean by an F. You can enter the letters V and F in either uppercase or lowercase. The strict Boolean model is selected if the first character is neither V nor F.

Vector space and fuzzy Boolean are referred to as relaxed Boolean models because they relax the strict requirements imposed by the AND, OR, and NOT Boolean operators under the strict Boolean model. You can include the RELEVANCE_METHOD server attribute for a SELECT statement by specifying an argument for the RELEVANCE function.
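
The selection rule based on the first character can be written as a short Python function. The retrieval_model name is hypothetical, and treating an empty string as strict Boolean is an assumption, since only the first-character rule is stated here.

```python
def retrieval_model(relevance_method: str) -> str:
    """Pick the retrieval model from the first character of RELEVANCE_METHOD."""
    first = relevance_method[:1].upper()  # V and F are accepted in either case
    if first == "V":
        return "vector space"
    if first == "F":
        return "fuzzy Boolean"
    # Any other first character selects strict Boolean.
    # Assumption: an empty string is treated the same way.
    return "strict Boolean"

# '2:1' is the method string used in the RELEVANCE('2:1') example earlier.
model = retrieval_model("2:1")
```

Only the first character is inspected, so the remainder of the method string has no influence on which model is chosen.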

Strict Boolean

This retrieval model is the default for all predicates other than is_about. When searching for matching rows in a table, this model uses strict Boolean operations for combining search terms. There is no measurement of the degree to which a match is found.

The strict Boolean retrieval model is applied to the Boolean operators (AND, OR, NOT) that combine individual predicates in a WHERE clause. This is true regardless of the retrieval model that you have specified.

When this model is used, one single search term influences the relevance of a row when combining terms and predicates. For the OR Boolean operator, the most relevant term with respect to matches in the row (based on the relevance method) predominates. For the AND Boolean operator, the least relevant term predominates.

The strict Boolean interpretation can be inconvenient for the following reasons:

• Searches containing AND Boolean operators discard potentially useful rows that contain most of the terms, but not all of them.

• To be effective, strict Boolean searches require an understanding of the nature of the text in the table being searched.

The strict Boolean retrieval model can be used with any of the four ranking algorithms. However, it is not the preferred model with the terms ordered and critical terms ordered ranking algorithms (3 and 4) because of the tendency for single terms to dominate the relevance value.

For a complete description of the ranking algorithms available with SearchServer, see the section, "Determining Relevance Ranking," later in this chapter.
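
The way a single term dominates under the strict Boolean model can be sketched in Python. Modeling "predominates" as a plain maximum for OR and a plain minimum for AND is an assumption made for illustration. The per-term relevance values are hypothetical, and the actual calculation depends on the ranking algorithm in use.

```python
def combine_or(term_relevances):
    """OR: the most relevant term predominates (modeled here as the maximum)."""
    return max(term_relevances)

def combine_and(term_relevances):
    """AND: the least relevant term predominates (modeled here as the minimum)."""
    return min(term_relevances)

# Hypothetical per-term relevance values for one row.
software, filter_term, document = 30, 5, 12

# Models a predicate of the form 'SOFTWARE' & ('FILTER' | 'DOCUMENT').
row_relevance = combine_and([software, combine_or([filter_term, document])])
```

Here the row's value is decided by the DOCUMENT term alone, which illustrates why a single term tends to dominate the relevance value under this model.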

Fuzzy Boolean

Like strict Boolean, this model can discriminate between all Boolean operators and allows the AND (&) Boolean operator in the CONTAINS clause to emphasize rows that contain all terms. However, unlike strict Boolean, it doesn't rule out rows that contain only some of the terms.

Fuzzy Boolean calculates a relevance value for each row that is based on the aggregate weight of search terms and the proportion of search terms matched. The aggregate weight is found by applying all search terms together, but not all search terms must be matched (as required by the strict Boolean model).

The difference between this model and the vector space model is that fuzzy Boolean distinguishes between AND (&) and OR (|) when calculating the aggregate weight. The calculation for the OR (|) Boolean operator yields an aggregate weight dominated by a few highly weighted (highly relevant) terms. In other words, the OR (|) Boolean operator emphasizes hits count (number of occurrences of the term). It causes rows that match only a few of the specified query terms to be scored highly under the following circumstances:

• those terms appear frequently in the row

• those particular terms are sufficiently rare in the table

• those terms are weighted high enough in comparison to other query terms

The AND (&) Boolean operator produces an aggregate weight in a way that can be influenced by one or a few low weighted terms, emphasizing the requirement that most, if not all, of the terms be relevant to the document. In other words, the AND (&) Boolean operator emphasizes terms count (query coverage).

The NOT (~) Boolean operator prefers dissimilar matching. It causes rows that don't match the NOT (~) condition to have a high relevance value.

Developers should be aware that users who haven't been trained in a "computer's view of the world" find the Boolean AND (&) and OR (|) concepts unnatural and often confusing. As a result, the use of these operators in a user interface is recommended only for specially trained user communities. For most users, a model that doesn't require this training (the vector space model) is usually more effective.

The fuzzy Boolean model can only be used with the terms ordered and critical terms ordered ranking algorithms to provide statistical ranking. For a complete description of the ranking algorithms available with SearchServer, see the section, "Determining Relevance Ranking," later in this chapter.

This retrieval model gives more importance to query coverage for the AND (&) Boolean operator and more importance to term occurrence for the OR (|) Boolean operator. It can improve the precision and recall of the search result when the relationships (synonyms, linguistic variants) among the query terms are known. However, because of the differentiation between the AND (&) and OR (|) Boolean operators, it can be more difficult for the end-user to understand.

Vector Space

This retrieval model relaxes the strict Boolean model by allowing the match to be measured by using some of the search terms. The relevance value is based on the proportion of the most relevant search terms that are matched in the row.

When compared with strict Boolean, this model provides a better measurement of the relevance of a row because all terms are used together to determine the outcome.

The vector space model uses the AND (&) and OR (|) Boolean operators equivalently when combining terms for matching rows, and for calculating the relevance value. If these Boolean operators are used in the search, they are not distinguished. Vector space gives equal weighting to query coverage and hits count.

This model can only be used with the statistical ranking algorithms, terms ordered and critical terms ordered. This retrieval model is recommended as the easiest for end-users to understand when doing statistical ranking, although fuzzy Boolean can improve precision and recall when the correct Boolean relationship among query terms is known.
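
The difference between a strict match and a relaxed measurement can be sketched in Python. The coverage score below is a simplified stand-in and not SearchServer's formula: it ignores term weights and hits count and only measures the proportion of query terms found in a row. The query terms echo the relevance example earlier in this chapter, and the row contents are hypothetical.

```python
def matched_terms(row_words, query_terms):
    """Return the query terms that occur in the row."""
    return [term for term in query_terms if term in row_words]

def strict_and(row_words, query_terms):
    """Strict Boolean AND: every term must be present, with no degree of match."""
    return len(matched_terms(row_words, query_terms)) == len(query_terms)

def coverage(row_words, query_terms):
    """Relaxed stand-in score: the proportion of query terms found in the row."""
    return len(matched_terms(row_words, query_terms)) / len(query_terms)

QUERY = ["software", "filter", "document", "problem"]
ROW = {"software", "filter", "problem", "crash"}  # hypothetical row contents

kept_by_strict_and = strict_and(ROW, QUERY)
relaxed_score = coverage(ROW, QUERY)
```

A row that contains three of the four terms is discarded by the strict AND, while a relaxed model can still rank it by how much of the query it covers.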

Defining a Collation Sequence

A collation sequence is used to determine the sort order of text (character) columns in the context of specific application character sets and national languages. For example, you can use a collation sequence to place accented characters in their correct order, or perform case normalization of alphabetic characters.

By default, the built-in collation sequence is used. It orders text according to English dictionary order.
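As a rough illustration of what a collation sequence does, the following Python sketch sorts a few words with a sort key that normalizes case and strips accents, so that accented characters sort next to their base letters. It is not SearchServer's built-in collation sequence; the function name collation_key and the sample words are hypothetical.

```python
import unicodedata

def collation_key(text):
    """Hypothetical sort key: case-normalize, then drop combining accents."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Keep base characters only; the combining marks carry the accents.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original text breaks ties so the ordering is deterministic.
    return (base, text)

words = ["zebra", "Éclair", "apple", "eclair", "Zulu"]
print(sorted(words))                     # plain code-point order
print(sorted(words, key=collation_key))  # dictionary-like order
```

Without the key, uppercase and accented words are pushed to the ends of the list; with it, they fall where a dictionary reader would expect them.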

To increase sort performance in a client/server environment, sorting should always be performed on the server. To ensure this is the case, the selected collation sequence must be available on the client as well as all servers to be used in any given search. Sorting on the client does not affect the functionality of the sorting, but could have an impact on performance. To verify sorting is being done on the servers for all tables, check the ftserver log. If a requested collation sequence is not available, an error message is logged.

In SearchServer, you specify the name of the collation sequence to be in effect through the SET COLLATION_SEQUENCE statement. The syntax for this SET statement is:

SET COLLATION_SEQUENCE <character string literal>

The character string literal specified in this SET statement contains the name of the desired collation sequence chosen from those available to SearchServer or from any custom collation sequences that might have been integrated with SearchServer. For details on creating a custom collation sequence, see the Fulcrum SearchServer Customization Guide.

Note: The case of the character string literal in the SET statement is relevant and must match exactly the case of the collation sequence name.

Integrating a Custom Collation Sequence

Before you can specify a custom collation sequence in the SET COLLATION_SEQUENCE statement, you must integrate it with SearchServer. The custom collation sequence you define can't be used until it has been loaded into the SearchServer dynamic library table. To do this, invoke the ftmload utility program as follows:

echo FICS cx mylib mytropen mytrinfo - > tmpfile
ftmload tmpfile -e -o fultext.eft

This loads the new entry for the custom collation sequence into the system configuration file. SearchServer recognizes the newly-installed collation sequence after you restart your SearchServer application.

You should keep a record of modifications to the dynamic library table by running ftmunld. (You should back up the FULTEXT.EFT file first.) For example:

ftmunld myfilters.eft -o fultext.eft

This creates a file called MYFILTERS.EFT. This dynamic library table source file is a standard ASCII file. You can use a text editor to edit this file to add the collation sequence line. The format of the collation sequence line is:

FICS <collation sequence name> <library name> <function name>

The collation sequence name is a character string literal that specifies the name of the collation sequence. The maximum length for the collation sequence name is 18 characters. The library name specifies the name of the user-supplied dynamic-link library (DLL). The function name specifies the name of the collation function, which must exist in the DLL specified.

The library name can be fully qualified (full pathname) or unqualified. If it's unqualified, SearchServer looks for it in platform-dependent locations. See Fulcrum SearchServer Getting Started for your platform for details.

The FULTEXT.EFT dynamic library table file is compiled and loaded into the FULTEXT.FTC system configuration file through the ftmload utility program. For complete instructions about using ftmload, see Appendix A, "Utility Program Summary."
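If collation sequence lines are generated by a script rather than typed by hand, a small helper can enforce the format and the 18-character limit described above. The following Python sketch is one way to do it. The function name fics_line, the sample field values, and the rule that fields contain no whitespace are assumptions made for illustration; they are not part of SearchServer.

```python
MAX_NAME_LENGTH = 18  # limit stated in this guide for the collation sequence name

def fics_line(sequence_name, library_name, function_name):
    """Build a 'FICS <name> <library> <function>' line for the table source file."""
    if len(sequence_name) > MAX_NAME_LENGTH:
        raise ValueError("collation sequence name exceeds 18 characters")
    # Assumption: fields are separated by spaces, so none may contain whitespace.
    for field in (sequence_name, library_name, function_name):
        if not field or any(ch.isspace() for ch in field):
            raise ValueError("fields must be non-empty and contain no whitespace")
    return f"FICS {sequence_name} {library_name} {function_name}"

# Hypothetical names, used only to show the resulting line.
print(fics_line("mycollation", "mylib", "mycollate"))
```

The resulting line can then be added to the dynamic library table source file with a text editor, as described above.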

Determining Relevance Ranking

The RELEVANCE function combines the retrieval model and ranking algorithms to provide a calculated relevance value that estimates the similarity between the row and the query criteria. You use the ORDER BY clause in the SELECT statement to order the rows in the working table by their calculated relevance. The four ranking algorithms are:

• hits count (algorithm 1)

• terms count (algorithm 2)

• terms ordered (algorithm 3)

• critical terms ordered (algorithm 4)

The ranking algorithms provide a means of ordering documents that favors documents that are most relevant to the query, as opposed to ordering by date or some other column in the table.

The selection of a relevance algorithm determines how the relevance is measured. If no relevance method is specified, no relevance is assigned. In this case, the value returned from the RELEVANCE function is NULL.

The relevance of a document can be determined by the following factors:

• how many "hits" are in the document

• how many documents contain the search term (Inverse Document Frequency)

• weights that might have been applied to search terms

What are Search Term Hits?

Search term hits are the occurrences of one or more search terms in the columns of interest in a particular row.

What is Inverse Document Frequency?

Document frequency is the number of documents in which a search term appears. Inverse document frequency (IDF) is a function of the ratio of the total number of documents in the table to the number of documents that actually contain the search term.

IDF is used to judge the importance of terms in the terms ordered and critical terms ordered algorithms. The implied assumption is that terms that occur in almost all documents are less helpful in distinguishing documents, and therefore, are less important.
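This guide does not give the exact function SearchServer applies to the ratio. As a minimal sketch, the following Python example assumes the common textbook form, the logarithm of the ratio, to show how a rare term scores higher than a term found in almost every document. The function name and the sample counts are hypothetical.

```python
import math

def inverse_document_frequency(total_documents, documents_with_term):
    """Textbook IDF (an assumption): log of total documents / documents with the term."""
    if documents_with_term <= 0:
        raise ValueError("the term must occur in at least one document")
    return math.log(total_documents / documents_with_term)

total = 1000  # hypothetical number of documents in the table
for term, document_frequency in [("the", 990), ("handle", 40), ("ftmload", 2)]:
    print(term, round(inverse_document_frequency(total, document_frequency), 3))
```

A term that appears in every document gets a value of zero under this form, which matches the assumption that such terms do not help to distinguish documents.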

What are Search Term Weights?

Search term weights are numerical values that can be assigned to search terms to indicate the relative importance of each search term. For example, when SearchServer is ranking documents using one of the relevance algorithms, you can assign weights to the search terms in order to influence which terms contribute more to the relevance calculation and cause some documents to be ranked higher than others.

Hits Count (Algorithm 1)

This ranking algorithm counts the total number of occurrences of the individual words (as opposed to phrases) matched regardless of the term frequency within the table.

Note: This algorithm for calculating the relevance of a row can be used only with the strict Boolean retrieval model. Specifying this algorithm with the vector space or fuzzy Boolean model causes a syntax error in the SET statement.

As an example of this algorithm, if you set the retrieval model to strict Boolean and the ranking algorithm to 1,

SET RELEVANCE_METHOD '2:1'

and then executed the following SELECT statement,

SELECT RELEVANCE() AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%', 'HANDLE%'
ORDER BY REL DESC

the REL column in the working table would contain a numeric value that reflected the number of times the words STATEMENT and HANDLE (and all their root expansions) were found in the TEXT_LOG column. That is, if the TEXT_LOG column for a particular row contained five occurrences of the word STATEMENT and one occurrence of the word HANDLE, the REL column would contain a value of 6. The higher the number, the more relevant the result.
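The arithmetic behind this value can be sketched in a few lines of Python. This is an illustration of the counting described here, not SearchServer's implementation; the function name hits_count and the dictionary layout are hypothetical. The second call applies term weights of 1 and 10, the same weights used with the WEIGHT option later in this section.

```python
def hits_count(occurrences, weights=None):
    """Sum the occurrences of each search term, scaled by an optional weight."""
    weights = weights or {}
    # A term without an explicit weight counts with a weight of 1.
    return sum(count * weights.get(term, 1) for term, count in occurrences.items())

row = {"STATEMENT": 5, "HANDLE": 1}  # occurrences found in TEXT_LOG for one row
print(hits_count(row))                                  # 6
print(hits_count(row, {"STATEMENT": 1, "HANDLE": 10}))  # 15
```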

If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
6    92011101        GILFORD SYSTEMS          0
4    92012203        OREO SOFTWARE SOLUTIONS  0
2    92012701        DIXIE CORP.              0
1    92010301        OREO SOFTWARE SOLUTIONS  5

Each word has a default weight of 1. That means that each word in the word list has an equal relative importance in the search.

You can use the WEIGHT option to apply different weights to the words in the list. For instance, in the following example the word HANDLE (and all its root expansions) is given a higher weight value than the word STATEMENT (and all its root expansions):

SELECT RELEVANCE AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%' WEIGHT 1, 'HANDLE%' WEIGHT 10
ORDER BY REL DESC

In this case, the same row that had a relevance value of 6 in the previous example would now have a value of 15. If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
20   92012701        DIXIE CORP.              0
15   92011101        GILFORD SYSTEMS          0
4    92012203        OREO SOFTWARE SOLUTIONS  0
1    92010301        OREO SOFTWARE SOLUTIONS  5

Note: The ordering (ranking) of rows has changed, as well as the relevance values, due to the introduction of a term weight.

Terms Count (Algorithm 2)

This ranking algorithm is different from the hits count algorithm in that it counts the number of different search terms matched. The frequency of occurrence of the terms is not considered. The relevance value is the number of terms matched.

Note: This algorithm for calculating the relevance of a row can be used only with the strict Boolean retrieval model. Specifying this algorithm with the vector space or fuzzy Boolean model causes a syntax error in the SET statement.

As an example of this algorithm, if you set the retrieval model to strict Boolean and the ranking algorithm to 2,

SET RELEVANCE_METHOD '2:2'

and then executed the following SELECT statement,

SELECT RELEVANCE() AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%', 'HANDLE%'
ORDER BY REL DESC

the REL column in the working table would contain a numeric value that reflected the number of search terms in the CONTAINS predicate that were matched in the TEXT_LOG column. That is, if the TEXT_LOG column contained five occurrences of the word STATEMENT and one occurrence of the word HANDLE, the REL column would contain a value of two: one for each original search term matched, not for each occurrence of the search term. The higher the number, the more relevant the result.
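The difference from the hits count algorithm is easy to see in code. The following Python sketch illustrates the counting described here and is not SearchServer's implementation; the function name terms_count and the dictionary layout are hypothetical. Each matched term contributes its weight once, no matter how often it occurs.

```python
def terms_count(occurrences, weights=None):
    """Add one weight per distinct search term matched, ignoring how often it occurs."""
    weights = weights or {}
    # A term without an explicit weight counts with a weight of 1.
    return sum(weights.get(term, 1) for term, count in occurrences.items() if count > 0)

row = {"STATEMENT": 5, "HANDLE": 1}  # occurrences found in TEXT_LOG for one row
print(terms_count(row))                                  # 2
print(terms_count(row, {"STATEMENT": 1, "HANDLE": 10}))  # 11
```

With weights of 1 and 10, a row that matches both terms scores 11, and a row that matches only the heavier term scores 10.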

If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
2    92011101        GILFORD SYSTEMS          0
1    92010301        OREO SOFTWARE SOLUTIONS  5
1    92012203        OREO SOFTWARE SOLUTIONS  0
1    92012701        DIXIE CORP.              0

By applying different weights to each search term, you can alter the relative importance of each row returned in the working table. In the previous example, both search terms are equally weighted.

However, in the following example, the second search term is 10 times more relevant than the first search term:

SELECT RELEVANCE() AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%' WEIGHT 1, 'HANDLE%' WEIGHT 10
ORDER BY REL DESC

If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
11   92011101        GILFORD SYSTEMS          0
10   92012701        DIXIE CORP.              5
1    92010301        OREO SOFTWARE SOLUTIONS  0
1    92012203        OREO SOFTWARE SOLUTIONS  0

Terms Ordered (Algorithm 3)

This ranking algorithm uses a mathematical formula that computes the relevance statistically. Algorithm 3 combines the characteristics of algorithms 1 and 2 and how many search terms were matched. It takes into account not only the number of occurrences of each search term, but also a statistical measurement of how common the term is over all the rows in the table (the document frequency).

Note: This algorithm for calculating the relevance of a row can be used with any retrieval model, but is most effective when used with the vector space and fuzzy Boolean retrieval models.

The value in the RELEVANCE function is a number in the 0 to 1000 range. As in the other relevance methods, the higher the number, the more relevant the result.

As an example of this algorithm, if you set the retrieval model to vector space and the ranking algorithm to 3

SET RELEVANCE_METHOD 'V2:3'

and then executed the following SELECT statement,

SELECT RELEVANCE() AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%', 'HANDLE%'
ORDER BY REL DESC

the REL column in the working table would contain a numeric value that reflected the result of the mathematical formula.

If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
962  92011101        GILFORD SYSTEMS          0
770  92012203        OREO SOFTWARE SOLUTIONS  0
475  92012701        DIXIE CORP.              0
192  92010301        OREO SOFTWARE SOLUTIONS  5

By applying different weights to each search term or search condition, you can alter the relative importance of each row that is returned in the working table. In the previous example, both search terms are equally weighted. However, in the following example the second search term is 10 times more relevant than the first search term:

SELECT RELEVANCE() AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%' WEIGHT 1, 'HANDLE%' WEIGHT 10
ORDER BY REL DESC

If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
475  92012701        DIXIE CORP.              0
237  92011101        GILFORD SYSTEMS          0
77   92012203        OREO SOFTWARE SOLUTIONS  0
19   92010301        OREO SOFTWARE SOLUTIONS  5

Critical Terms Ordered (Algorithm 4)

This ranking algorithm builds on algorithm 3 by putting more emphasis on search terms that occur in fewer rows of the table being searched. These are the terms that are most useful in discriminating between relevant and non-relevant documents.

Although search term weights can be used when this relevance algorithm is in effect, it is typically used when search term weights are not known.

Note: This algorithm for calculating the relevance of a row can be used with any retrieval model, but is most effective when used with the vector space and fuzzy Boolean retrieval models.

For example, if you set the retrieval model to vector space and the ranking algorithm to 4,

SET RELEVANCE_METHOD 'v2:4'

and then executed the following SELECT statement,

SELECT RELEVANCE() AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%', 'HANDLE%'
ORDER BY REL DESC

the REL column in the working table would contain a numeric value that reflected the result of the mathematical formula. The value in the RELEVANCE column is a number in the 0 to 1000 range. As in the other relevance methods, the higher the number, the more relevant the result.

If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
721  92011101        GILFORD SYSTEMS          0
577  92012203        OREO SOFTWARE SOLUTIONS  0
475  92012701        DIXIE CORP.              0
144  92010301        OREO SOFTWARE SOLUTIONS  5

Relevance Threshold (RELEVANCE Function)

When using the RELEVANCE function, you can also specify a lower limit on the relevance values of rows that will appear in the working table. This limit is referred to as the relevance threshold.

If used, the relevance threshold value must be at the end of the relevance string, and must follow a greater than (>) sign. The threshold is interpreted as follows:

>NNN    include only rows with RELEVANCE value greater than or equal to NNN

The following example selects all rows that contain four or more search term hits:

SELECT RELEVANCE('2:1>4') AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%', 'HANDLE%'
ORDER BY REL DESC

If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
6    92011101        GILFORD SYSTEMS          0
4    92012203        OREO SOFTWARE SOLUTIONS  0
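Note that the greater than (>) sign in the relevance string means "greater than or equal to". The following Python sketch shows that interpretation applied to the hits count values from the earlier example. It only illustrates the rule and is not how SearchServer parses the relevance string; the function names and the row layout are hypothetical.

```python
def split_threshold(relevance_string):
    """Split 'method>NNN' into (method, threshold); threshold is None if absent."""
    method, separator, threshold = relevance_string.partition(">")
    return method, (int(threshold) if separator else None)

def apply_threshold(rows, relevance_string):
    """Keep rows whose REL is greater than or equal to the threshold."""
    _, threshold = split_threshold(relevance_string)
    if threshold is None:
        return list(rows)
    return [row for row in rows if row["REL"] >= threshold]

rows = [
    {"REL": 6, "PROBLEM_NUMBER": 92011101},
    {"REL": 4, "PROBLEM_NUMBER": 92012203},
    {"REL": 2, "PROBLEM_NUMBER": 92012701},
    {"REL": 1, "PROBLEM_NUMBER": 92010301},
]
print([row["REL"] for row in apply_threshold(rows, "2:1>4")])  # [6, 4]
```

The row with a relevance value of exactly 4 is kept, which agrees with the working table shown above.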

Implementing an Intuitive Search

Intuitive Searching is a feature that allows users to search for information that is relevant to some existing text without having to construct a detailed query. SearchServer constructs the query automatically from the text selected by the user. This feature uses the is_about predicate in the SELECT statement, and optionally, the CREATE TEXT_VECTOR statement.

By default, the linguistic processing feature is used for Intuitive Searching. With this feature, all search terms in an Intuitive Search are first reduced to the uninflected form before determining their statistical ranking within the document. The resultant significant terms are then expanded to include the inflected forms before being processed by the search. This linguistic processing feature increases the accuracy of determining significant terms and therefore returns more relevant documents. You can change this default linguistic processing by using the SET VECTOR_GENERATOR statement.

As with other searches, an Intuitive Search is initiated by executing a SELECT statement. The WHERE clause of this SELECT statement can use the is_about predicate to compare the contents of a table column with the user's selected text. When using the is_about predicate, there are two approaches you can take.

Your application can specify this text directly in the is_about predicate, in the form of one of the following:

• a literal string

• a filename

• a reference to rows in a table

This single-statement approach produces a dynamic text vector that uses the same parsing rules that are used with the table being searched.

Alternatively, you can create the text vector using the CREATE TEXT_VECTOR statement. Then a subsequent SELECT statement would use that text vector by name in the is_about predicate. When using this two-step method, the search term parsing rules can only be applied if a table is referenced in the CREATE TEXT_VECTOR statement.

The CREATE TEXT_VECTOR statement must still be available (not terminated or closed) when the SELECT statement that refers to the text vector is executed. Using the CREATE TEXT_VECTOR statement is advantageous if the predicate is to be used in more than one SELECT statement, and a single statement is not required for the Intuitive Search.

Creating the Text Vector

When the user provides a selection of text as the basis of an Intuitive Search (either in the is_about predicate or in the CREATE TEXT_VECTOR statement), SearchServer constructs a text vector containing the most prevalent terms from the selection. Weights are assigned to the search terms composing the text vector according to how often each term appears in the selection of text.

The VECTOR_TERMS parameter of the CREATE TEXT_VECTOR statement is used for initial term selection by specifying how many terms are kept in the text vector. Most situations are dealt with adequately by setting this parameter to 100 (the default) or at most 200 terms.

Increasing the number of terms can require more storage for both the CREATE TEXT_VECTOR statement and the subsequent SELECT statement. Increasing the value of the VECTOR_TERMS parameter can improve recall because more terms are included in the text vector. This causes more documents to be retrieved but can cause the search to take longer when the text vector is used.
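To make the idea of a text vector concrete, the following Python sketch counts how often each term appears in a selection of text and keeps only the most frequent ones. It is a simplified illustration and not SearchServer's vector generator: it performs no linguistic processing or stop word removal, and the function name build_text_vector and its vector_terms argument (standing in for the VECTOR_TERMS parameter) are hypothetical.

```python
import re
from collections import Counter

def build_text_vector(text, vector_terms=100):
    """Keep the `vector_terms` most frequent terms, weighted by occurrence count."""
    # Simplification: terms are lowercase runs of letters and digits.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    return dict(Counter(terms).most_common(vector_terms))

vector = build_text_vector("Windows clients can talk to UNIX servers")
print(vector)  # seven terms, each with a weight of 1
```

Lowering the vector_terms argument drops the least frequent terms first, which is the trade-off between storage, search time, and recall described above.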

Selecting the Ranking Algorithm for an Intuitive Search

The is_about predicate is restricted to matching a single column or zone. For each row in the working table, it looks at the specified column in the predicate and determines the number of words in the text vector that match, and the number of times they match. It then computes a document weight value, taking into account the term weights in the text vector.

The retrieval model used for Intuitive Searching (is_about predicate) is internally set to the vector space model. The vector space model has been chosen based on the premise that the text vector contains a set of terms whose relationships are unknown. There can be terms that are related by meaning (synonyms) or linguistically (for example, plural and singular forms).

This computation depends on the ranking algorithm as set in the RELEVANCE_METHOD server attribute of the SERVER_INFO table. However, you can override the current relevance method for this SELECT statement by including the relevance method option in the argument of the RELEVANCE function. With vector space set as the retrieval model, the relevance method selected must be either terms ordered (3) or critical terms ordered (4). If one of the other relevance methods is selected, then the strict Boolean method will be used.

Note: When performing an Intuitive Search (is_about predicate) alone or in combination with other criteria (a combined structured and Intuitive search), the most intuitively correct document ranking will be obtained by specifying one of the statistical ranking algorithms, preferably CRITICAL_TERMS_ORDERED.

The following example shows how to set the relevance method, create the text vector, and then reference it in an is_about predicate in a SELECT statement:

SET RELEVANCE_METHOD 'F2:4'

CREATE TEXT_VECTOR VEC1 'Windows clients can talk to UNIX servers'

SELECT RELEVANCE AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG IS_ABOUT VEC1
ORDER BY REL DESC

The following example uses the is_about predicate to get the same results in one statement:

SELECT RELEVANCE('V2:4') AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG IS_ABOUT 'Windows clients can talk to UNIX servers'
ORDER BY REL DESC

If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
345  92012202        SARASOTA COMPUTERS       0
286  92011301        OREO SOFTWARE SOLUTIONS  0
86   92010202        SARASOTA COMPUTERS       0
53   92012701        DIXIE CORP               0
53   92020501        OREO SOFTWARE SOLUTIONS  NULL
35   92010301        OREO SOFTWARE SOLUTIONS  5
35   92011101        GILFORD SYSTEMS          0
35   92012203        OREO SOFTWARE SOLUTIONS  0
17   92010302        GMS GROUP                1
17   92013101        SARASOTA COMPUTERS       5

In the first example, the TEXT_LOG column of the SUPPORT table is checked for similarity to the text vector VEC1. This text vector is a simple string of 7 words, each with a term weight (occurrence count) of 1. To perform the similarity checking, the vector space retrieval model (not the fuzzy Boolean model as specified) and ranking algorithm 4 are used. The resulting working table indicates the rows matched and orders them by the calculated relevance.

The following example performs exactly the same functions as the previous example. However, instead of changing the relevance method for all future SELECT statements, this example uses the relevance option specified in the RELEVANCE function to override the relevance method for this SELECT statement only.

CREATE TEXT_VECTOR VEC1 'Windows clients can talk to UNIX servers'

SELECT RELEVANCE('V2:4') AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG IS_ABOUT VEC1
ORDER BY REL DESC

Table 2-5 shows which combinations of retrieval model and relevance algorithm are supported:

                      Relevance Algorithm
Retrieval Model       1              2              3              4
strict Boolean (1)    permitted      permitted      permitted (3)  permitted (3)
fuzzy Boolean         not permitted  not permitted  permitted      permitted
vector space (2)      not permitted  not permitted  permitted      permitted

Table 2-5 Retrieval Model and Relevance Algorithms

Notes:

1. Strict Boolean applies to Boolean combinations (AND/OR/NOT) of predicates regardless of retrieval model selected.

2. Vector space applies to Intuitive Searching (is_about predicate), regardless of retrieval model selected for the search.

3. This combination of retrieval model and relevance method is not as effective as using the fuzzy Boolean or vector space retrieval models.

Text Vector Refinement

The MAX_TERMS parameter of the is_about predicate affects the calculated relevance by controlling the number of terms from the text vector that are used in matching rows, and therefore the number of terms used in the relevance calculation. Use of this parameter for restricting the number of search terms is called "text vector refinement" because after the text vector is created containing all the prevalent terms, the matching that is performed in the SELECT statement uses only a subset of those terms.

This subset of terms is considered to be more useful in discriminating documents. In this case, the text vector retains the most prevalent terms in the specified text and then the Intuitive Search chooses to match only the most useful of those terms.

Setting the MAX_TERMS parameter in the is_about predicate reduces the number of matches, especially the number of spurious matches, by eliminating words that are least useful in identifying relevant documents. Both the processing time and storage requirements are also reduced. However, this can also reduce the number of matching documents.

For ranking algorithms 3 or 4, the MAX_TERMS parameter can have a positive effect. In these cases, the final relevance calculation is influenced directly by the frequency of the term in each document as well as by the document frequency (the number of documents in which the term occurs). By selecting only a few terms, the associated documents are better identified through the calculated relevance.

Specifying the MAX_TERMS parameter can boost the precision (percentage of rows in the working table that are relevant) of the search at the cost of lowering recall (percentage of all relevant rows selected for the working table). Usually, the benefit of increasing precision outweighs the decrease in recall, because the more relevant documents are ranked higher if the relevance function is specified. This remains true as the MAX_TERMS value is reduced to a small fraction of the number of terms in the text vector.
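The two measures can be computed directly from their definitions. The following Python sketch uses made-up row identifiers to show how a narrower search raises precision while lowering recall; the function name precision_and_recall and the sample sets are hypothetical and are not taken from the SUPPORT table.

```python
def precision_and_recall(selected_rows, relevant_rows):
    """Return (precision, recall) as percentages, following the definitions above."""
    selected, relevant = set(selected_rows), set(relevant_rows)
    hits = len(selected & relevant)  # relevant rows that were actually selected
    precision = 100.0 * hits / len(selected) if selected else 0.0
    recall = 100.0 * hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {1, 2, 3, 4}  # rows a user would judge relevant (made-up identifiers)
print(precision_and_recall({1, 2, 3, 4, 5, 6, 7, 8}, relevant))  # (50.0, 100.0)
print(precision_and_recall({1, 2, 3}, relevant))                 # (100.0, 75.0)
```

The first call stands for a broad search that finds every relevant row along with four spurious ones; the second stands for a refined search that returns fewer rows, all of them relevant.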

In many situations, high precision is more important than high recall, and the MAX_TERMS parameter can be set to five or ten percent of the number of terms in the text vector. The following example sets the MAX_TERMS value to 2:

SELECT RELEVANCE() AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG IS_ABOUT VEC1 MAX_TERMS 2
ORDER BY REL DESC

If the working table derived from this example could be displayed, it might look like this:

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
633  92012202        SARASOTA COMPUTERS       0
158  92010202        SARASOTA COMPUTERS       0
98   92012701        DIXIE CORP               0
98   92020501        OREO SOFTWARE SOLUTIONS  NULL
65   92010301        OREO SOFTWARE SOLUTIONS  5
65   92011101        GILFORD SYSTEMS          0
65   92012203        OREO SOFTWARE SOLUTIONS  0
32   92010302        GMS GROUP                1
32   92013101        SARASOTA COMPUTERS       5

The text vector can also be refined by specifying a document frequency with the relevance method. A third parameter can be added after the ranking algorithm when using the is_about predicate.

This value, referred to as the document frequency, is expressed as the percentage of documents in the entire table that contain a particular term from the text vector. To select a maximum means that terms are not included in the text vector if they occur in more documents than the percentage specified. This reduces common terms. If neither the MAX_TERMS parameter nor the maximum document percentage parameter is specified, no text vector refinement is performed. All the terms in the text vector (except the stop words) are used in the Intuitive Search.

If you set the value too low, some Intuitive Searches will fail because all the terms were dropped from the text vector. This is likely to occur when very short text is selected for the Intuitive Search, and the document frequency parameter is too low.

The following example excludes any terms from the text vector that occur in more than 30 percent of the documents:

SELECT RELEVANCE('V2:4:30') AS REL, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG IS_ABOUT 'Windows clients can talk to UNIX servers'
ORDER BY REL DESC

If the working table from this example could be displayed, it might look like this:

REL PROBLEM_NUMBER

COMPANY PRIORITY

422 92012202 SARASOTA COMPUTERS 0

Page 405: SA-Application Text Retrieval Software Expert 5.0 …publib.boulder.ibm.com/tividd/td/ASE/ASE50trg/en_US/PDF/...create dialog box forms, menus, toolbars, and string tables for any

The Search Process

Text Retrieval Guide 405

Combining Multiple Is_about Predicates

The Intuitive Searching technique called relevance feedback allows you to use multiple is_about predicates in a search. When multiple is_about predicates are used in a SELECT statement, each predicate can be assigned a unique weight factor. The weight is applied as a method of setting priorities on certain conditions. The higher the weight, the higher the priority of that search condition.

For instance, the following example contains three is_about predi-cates, each with a different weight:

SELECT RELEVANCE() AS REL, PROBLEM_NUMBER, COMPANY, PRIORITYFROM SUPPORTWHERE TEXT_LOG IS_ABOUT 'Windows clients can talk to UNIX servers' WEIGHT 5AND TEXT_LOG IS_ABOUT 'Software hardware' WEIGHT 10OR TEXT_LOG IS_ABOUT 'filter%' WEIGHT 2ORDER BY REL DESC

Besides the effects of the different weights in the preceding example, there is an implicit order for resolving the portions of the WHERE clause with respect to each row being matched. The results of the predicates involving the first and second text vectors are combined with the AND Boolean operator. The combined results of the first and second text vectors are then combined with the results of the third text vector using the OR Boolean operator.

When combining the is_about predicates with the AND Boolean operator, a row will appear in the working table if it is matched by both of the text vectors (the first and second text vectors in this example).

If you combine the is_about predicate with any other predicate using the AND Boolean operator, and the text vector contains stop words and non-existent terms, the working table derived from this search contains rows that match the other predicates, and not the is_about predicate.

When combining is_about predicates with an OR operator, a row will appear in the working table if it is matched by either of the specified text vectors. This behavior is a result of applying the strict Boolean retrieval model to the combinations of predicates.

REL  PROBLEM_NUMBER  COMPANY                  PRIORITY
349  92013101        OREO SOFTWARE SOLUTIONS  0
105  92010202        SARASOTA COMPUTERS       0


In more general terms, the predicates surrounding the AND Boolean operator are resolved first. It is the result of the AND Boolean operator that is used with the next Boolean operator. The AND Boolean operator takes precedence over the OR Boolean operator, and therefore is resolved first. This order can be influenced by placing parentheses where appropriate.
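The precedence rule can be sketched with Python sets, where each set stands for the rows matched by one predicate. The row numbers below are made up for illustration, and the sketch relies on the fact that Python's & and | set operators follow the same AND-before-OR order; it is not SearchServer code.

```python
# Hypothetical row IDs matched by each is_about predicate (illustrative only).
first = {1, 2, 3}    # rows matched by the first text vector
second = {2, 3, 4}   # rows matched by the second text vector
third = {5}          # rows matched by the third text vector

# A AND B OR C: the AND is resolved first, then its result is ORed with C.
default_order = first & second | third

# Parentheses change the order: A AND (B OR C).
with_parens = first & (second | third)

print(sorted(default_order))  # rows selected without parentheses
print(sorted(with_parens))    # rows selected with parentheses
```

With these sample sets, the two forms select different rows, which is why parentheses matter when mixing AND and OR.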

Searching the Reserved Columns

Most reserved columns can be specified in the select list or WHERE clause of a SELECT statement. A reserved column is searched in the same manner as a user-defined column based on its data type and index mode. However, there are four reserved columns that have special search restrictions:

• FT_CID

• FT_ROW_TYPE

• FT_ROW_STATE

• FT_TIMESTAMP

FT_CID

The FT_CID reserved column contains a unique numeric value that is automatically assigned for each row in a table. You can use this unique value to select specific rows in a table. However, there are some search restrictions for this reserved column.

You can specify this reserved column only in a SELECT statement that includes an in predicate, or a comparison predicate that uses the equal to (=) or not equal to (<>) comparison operators, because this column is defined with VALUE index mode. When using these predicates, the list of FT_CID values can be specified in any order.

FT_CID values must have been retrieved previously from SearchServer, either using the LAST_ROWID_INSERTED server attribute, or using the FT_CID in the select list of a previous SELECT statement. The FT_CID values are guaranteed to be invariant (for a table) unless you reorganize the table using the ftcout and ftcin utility programs. Consequently, FT_CID values can be stored externally (for example, as a foreign key in a database table) and used to identify particular rows at a later time.

This reserved column can't be used to search a SearchServer View. This is because a View is treated as a table, and therefore, the FT_CID values of each component table are meaningless. However, you can use it when specifying the UNION clause in a search. In this case, the search matches any occurrence of the FT_CID value in any of the tables.


When the FT_CID value is specified in a simple search (that is, no other predicates are involved), the values are not verified. Therefore, if some of the FT_CID values are invalid for the table, it is detected only when the application attempts to retrieve data from this row in the working table.

However, when the FT_CID reserved column is part of a search containing multiple or negated predicates, the FT_CID value is verified and the working table derived from this search will contain only valid rows of data.

Note: Although FT_CID searches don't make use of the conventional index created by the VALIDATE INDEX statement, they do require that such an index exists. Any attempt to search a table that has no indexed rows returns an error.

Container Rows

A search based on the FT_CID reserved column operates differently than other searches for container rows (these are rows using either a Fulcrum library data file or a directory name as the external document reference). In this case, when the FT_CID value of a container is actually specified in the search condition, then the container row is returned in the working table. However, the container row isn't returned when the search condition uses the NOT Boolean operator and the FT_CID value doesn't match the container row. For instance, the following example would return a container row with an FT_CID value of 1:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE FT_CID = 1;

However, the following example would not return a container row with an FT_CID value of 1:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE FT_CID <> 2;

FT_ROW_TYPE

The FT_ROW_TYPE reserved column can be used only with the equal to (=) comparison operator and the OR Boolean operator. The WHERE clause can't include any other types of predicates or references to any other columns. The following syntax shows the only acceptable forms of the WHERE clause for this reserved column:

SELECT <select list>
FROM <table name>
WHERE FT_ROW_TYPE = <character string literal>
[{OR FT_ROW_TYPE = <character string literal>}...];

<character string literal> ::= 'DIRECTORY' | 'DATA'

DIRECTORY is used to identify a row that refers to a container as indicated in the external document reference (FT_SFNAME). DATA is used to identify a row that contains a reference to the smallest unit of retrieval in FT_SFNAME.

Each of the possible literals can appear only once in the statement.

Note: These character string literals are case-sensitive and must be entered in uppercase. All other parts of the statement are not case-sensitive.

The following two statements return the same results:

SELECT *
FROM SUPPORT;

SELECT *
FROM SUPPORT
WHERE FT_ROW_TYPE = 'DATA';

However, the SELECT statement that specifies the FT_ROW_TYPE can't be used to search a SearchServer View.

FT_ROW_STATE

The FT_ROW_STATE reserved column can be used only with the equal to (=) comparison operator. It can't be used with any other types of predicates or with any Boolean operators, nor can this reserved column be specified in a SearchServer View.

There are five possible values for this reserved column. However, the only searchable values are NOT_YET_INDEXED and CANNOT_BE_INDEXED. The following syntax shows the only acceptable forms of the WHERE clause for this reserved column:

SELECT <select list>
FROM <table name>
WHERE FT_ROW_STATE = <character string literal>;

<character string literal> ::= 'NOT_YET_INDEXED' | 'CANNOT_BE_INDEXED'


Note: The character string literals are case-sensitive and must be entered in uppercase. All other parts of the statement are not case-sensitive.

FT_TIMESTAMP

The FT_TIMESTAMP reserved column can be specified only in the WHERE clause of a searched UPDATE or DELETE statement. It can also be specified in the column list of a SELECT statement. This read-only reserved column is defined with an index mode of NONE.

The value is assigned by SearchServer when a row is inserted or updated. This value can then be used to ensure that a row has not been modified since a previous SELECT statement was executed. The following example illustrates optimistic concurrency using the FT_TIMESTAMP reserved column. The example assumes that only one row (having an FT_CID value of 15) matches the WHERE clause criteria:

SELECT STATUS, FT_CID, FT_TIMESTAMP
FROM SUPPORT
WHERE CREATOR = 'PETER' AND STATUS CONTAINS 'HOLD'

The values from the result list must be saved by the application. To continue this example, the result list could have the following values:

FT_CID = 15
FT_TIMESTAMP = 17263

The searched UPDATE statement would then reference these values. For example:

UPDATE SUPPORT SET STATUS = 'CLOSED'
WHERE CREATOR CONTAINS 'PETER'
AND FT_CID = 15
AND FT_TIMESTAMP = '17263'

Optimistic concurrency provides an alternative data integrity approach to row locking. On platforms that don't support read-only locks (such as Windows), optimistic concurrency allows a user to edit data for an extended period, while other users can concurrently view information contained in the row.
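The read-then-conditional-update flow can be sketched in Python. This is only an in-memory illustration under stated assumptions: the table dictionary, the Row class, and the searched_update function are hypothetical stand-ins, and the way the timestamp changes on update is invented for the sketch; none of this is a SearchServer interface.

```python
from dataclasses import dataclass


@dataclass
class Row:
    status: str
    timestamp: int  # stands in for FT_TIMESTAMP; changes on every update


# Hypothetical in-memory table keyed by a stand-in for FT_CID.
table = {15: Row(status="HOLD", timestamp=17263)}


def searched_update(cid, seen_timestamp, new_status):
    """Update the row only if its timestamp still equals the one read earlier.

    Returns the number of rows updated; 0 means the row was changed
    (or removed) after it was read, so the caller should read it again.
    """
    row = table.get(cid)
    if row is None or row.timestamp != seen_timestamp:
        return 0
    row.status = new_status
    row.timestamp += 1  # assumption: any value that differs from the old one
    return 1


# Read phase: save the key and timestamp, as the application does after the SELECT.
seen = table[15].timestamp

first_try = searched_update(15, seen, "CLOSED")     # timestamp still matches
second_try = searched_update(15, seen, "REOPENED")  # saved timestamp is now stale
print(first_try, second_try, table[15].status)
```

The second call updates nothing because the saved timestamp no longer matches the row, which is the signal to re-read the row before trying again.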


Chapter 3:

SearchSQL Language Elements

This chapter describes the supporting elements of the SearchSQL language. Included in this chapter are the definitions of the following elements:

• identifiers

• literals

• data types

• user-written comments

Identifiers

Identifiers are used to specify the following names:

<column alias>
<column identifier>
<correlation name>
<cursor name>
<text vector name>
<domain name>
<schema name>
<table identifier>
<thesaurus name>
<zone name>

In SearchServer, the maximum length for most identifiers is 18 characters.

The characters for an identifier can be letters (A-Z), numeric digits (0-9), or the underscore character (_). The first character must be a letter. Embedded spaces aren't permitted and reserved words can't be used as identifiers. For a complete listing of reserved words, see the section, "Reserved Words," later in this chapter.

The following are all valid column names:

STATUS
SVR4
PROBLEM_NUMBER
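The rules for ordinary identifiers can be expressed as a small check. The following Python sketch is an illustration only: the function name is hypothetical, the reserved-word set is a short excerpt rather than the full list, and the 18-character limit is applied to every identifier even though the text says it holds for most of them.

```python
import re

# Short excerpt of the reserved-word list, for illustration only.
RESERVED_SAMPLE = {"SELECT", "TABLE", "UNIQUE", "WHERE"}
MAX_LENGTH = 18  # limit the text gives for most identifiers

# First character a letter; remaining characters letters, digits, or underscores.
IDENTIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_ordinary_identifier(name):
    """Return True if name satisfies the ordinary-identifier rules described above."""
    return (
        len(name) <= MAX_LENGTH
        and IDENTIFIER_RE.fullmatch(name) is not None
        and name.upper() not in RESERVED_SAMPLE  # identifiers are not case-sensitive
    )


for candidate in ["STATUS", "SVR4", "PROBLEM_NUMBER", "4SVR", "unique", "MY COL"]:
    print(candidate, is_ordinary_identifier(candidate))
```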

Identifiers can also be quoted. Quoted identifiers follow the same rules for construction (that is, the first character must be a letter, and remaining characters can be letters, digits, and underscores), but they are always enclosed in double quotes ("). Unlike ordinary identifiers, quoted identifiers can duplicate reserved words. However, adding quotes to an ordinary identifier doesn't make a unique identifier.

The following are valid quoted identifiers (note that "unique" is a reserved word):

"STATUS" "UNIQUE" "PROBLEM_NUMBER"

Identifiers, like the rest of the SearchSQL language, aren't case-sensitive. All letters are treated as uppercase characters. For example, the following two statements are equivalent:

SELECT COMPANY FROM SUPPORT WHERE CREATOR CONTAINS 'PETER'

select company from support where creator contains 'peter'

Note: The literals 'PETER' and 'peter' in the two previous examples are equivalent because the NORMALIZATION option in the SUPPORT table is not set to NONE. For more information about the NORMALIZATION option, see the section, "CREATE TABLE Clause/Statement," in Chapter 4, "SearchSQL Statements." This doesn't affect identifier case-insensitivity.

Compound Identifiers

Table Names

The syntax of the table name compound identifier looks like this:

<table name> ::= [<table qualifier> <table separator>] <table identifier>

A table name is a compound identifier that consists of a table identifier and an optional table qualifier with a table separator.

Table Identifier The table identifier is the actual name of the table, as defined in the CREATE TABLE clause. The length and format of table names are environment dependent.

Table Separator The table separator character is a backslash (\) or slash (/) in Windows, and a slash (/) in UNIX. It can also be a period (.) when the qualifier is a node name.


Table Qualifier If the table is accessed through a remote server, the table qualifier is the name of the node where the remote server used to access the table is executing. If the table is accessed without using a remote server, the table qualifier is the directory name where the configuration file for the table is stored.

You can find the qualifiers for every table visible to the data source in the TABLE_QUALIFIER column of the TABLES system table. The qualifier name is prefixed to the table identifier and must be followed by a table separator to separate it from the table identifier.

For example, the following is a statement fragment that creates a table for the Customer Support data source on the node called FISH (from any Windows client):

CREATE TABLE FISH\SUPPORT

Note that the table qualifier is different from the data source name to which your application connects. You can't connect to a particular remote server unless you have defined a data source which specifically uses that remote server. (The name of that data source can be the same as that of the corresponding remote server.) If you have connected to a data source that restricts access to the tables on one remote server, you can use a table qualifier only if it matches the node name of that server.

The table qualifier can also be an absolute file path for accessing local tables. An absolute file path is a sequence of characters having syntax specific to the operating system. In Windows, it consists of a drive letter, a colon (:), and optionally one or more directory names each prefixed with a slash (/) or backslash (\). Directory names consist of a sequence of letters, digits, underscores (_) and periods (.). DOS restrictions on the length and form of filenames are not enforced or checked by SearchServer. In UNIX, an absolute file path consists of one or more directory names, each prefixed by a slash (/). Directory names consist of an arbitrary number of letters, digits, underscores, and periods. A file path can't contain white space. The overall length of an absolute file path is also limited. It is 260 in Windows (66 when it refers to a local directory in 16-bit Windows environments) and 255 in UNIX. An absolute file path always contains at least one ":", "/", or "\". This distinguishes it from a node name. Qualifiers cannot be relative paths.
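The distinguishing rule (an absolute file path always contains a colon, slash, or backslash, and a node name does not) can be sketched as a one-line test. The function name below is hypothetical, and the sketch checks only that single rule; it does not validate path length, white space, or directory-name characters.

```python
def qualifier_kind(qualifier):
    """Classify a table qualifier using only the ':', '/', '\\' rule from the text."""
    if any(ch in qualifier for ch in ":/\\"):
        return "absolute file path"
    return "node name"


for q in ["FISH", "C:\\WORKING", "/usr/home/working"]:
    print(q, "->", qualifier_kind(q))
```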

For example, to create a table called TEMPDATA in a specific directory in Windows:

CREATE TABLE C:\WORKING\TEMPDATA

To create the table in the UNIX environment:

CREATE TABLE /usr/home/working/TEMPDATA


The table qualifier is case-sensitive only when it names a directory and the underlying file system distinguishes case (for example, UNIX). In addition, table qualifiers can be quoted.

If the table qualifier is omitted from a SearchSQL statement, SearchServer uses the "current qualifier" as a default if it has been specified. Otherwise, SearchServer automatically searches for the table on the list of directories specified by FULSEARCH and on the servers specified by FTNPATH.

When executing a CREATE SCHEMA or CREATE TABLE statement, if you don't specify the table qualifier and the data source in use doesn't restrict access to a particular remote server, the table is created on the local node (where the statement is executed). When executing other statements that use a table name but don't specify the qualifier, SearchServer determines the table location from among the SearchServer servers accessible to that data source.

For more information about table location, see the installation instructions in Fulcrum SearchServer Getting Started for your platform.

Column Names

The syntax of the column name compound identifier looks like this:

<column name> ::= <table identifier>.<column identifier> | <correlation name>.<column identifier> | <column identifier>

For a complete description of correlation name, see the section, "SELECT Statement," in Chapter 4, "SearchSQL Statements."

Reserved Words

Table 3-1 contains an alphabetical list of reserved words that cannot be used as identifiers:

ABANDON ABSOLUTE

ADA ADD

ALL ALLOCATE

ALTER AND

ANY APVARCHAR

ARE AS

ASC ASSERTION

ASSUME_TABLE_VALID AT

AUTHORIZATION AVG

B BASEPATH

BEGIN BETWEEN


BINARY BIT

BIT_LENGTH BLOCKSIZE

BUFFER BY

C CASCADE

CASE CAST

CATALOG CHAR

CHARACTER CHARACTERS

CHAR_LENGTH CHARACTER_LENGTH

CHARACTER_SET CHARACTER_VARIANT

CHECK CHECK_TEXT_STATUS

CLOSE COALESCE

COBOL COLLATE

COLLATION COLLATION_SEQUENCE

COLUMN COMMIT

COMMON CONNECT

CONNECTION CONSTRAINT

CONSTRAINTS CONTAINS

CONTEXT CONTINUE

CONVERT CORRESPONDING

COUNT CREATE

CURRENT CURRENT_DATE

CURRENT_TIME CURRENT_TIMESTAMP

CURRENT_UTC_TIME CURRENT_UTC_TIMESTAMP

CURSOR CUSTOM_VIEWER

DATE DAY

DEALLOCATE DEC

DECIMAL DECLARE

DEFAULT DEFERABLE

DEFERRED DELETE

DESC DESCRIBE

DESCRIPTOR DIAGNOSTICS

DICTIONARY DISCONNECT

DISPLACEMENT DISTINCT

DOCUMENT DOMAIN

DOUBLE DROP

ELSE END

END_EXEC EQ

ESCAPE EXCEPT

EXCEPTION EXEC

EXECUTE EXISTS


EXTERNAL EXTRACT

FALSE FETCH

FETCH_BUFFER_SIZE FILE

FILTER FIRST

FLOAT FOR

FOREIGN FORMAT_TEXT

FORTRAN FOUND

FRAGMENTED FROM

FTCHAR FTVARCHAR

FULL FULLNAME

GE GET

GLOBAL GO

GOTO GRANT

GROUP GT

HAVING HOUR

IDENTITY IGNORE

IMMEDIATE IN

INDEX INDEXDIR

INDICATOR INNER

IN_ORDER INPUT

INSENSITIVE INSERT

INT INTEGER

INTERSECT INTERVAL

INTO IS

IS_ABOUT ISOLATION

JOIN

KEY

LANGUAGE LAST

LE LEFT

LEVEL LIKE

LITERAL LOCAL

LOWER LT

MARKER_LIST MATCH_VCC_LIST

MATCH MAX

MAX_EXEC_TIME MAXROWS

MAX_SEARCH_ROWS MAX_TERMS

MIN MINUTE

MODIFY MODULE

MONTH MUMPS

N NAMES


NATIONAL NATURAL

NCHAR NE

NEXT NOLOCKING

NONE NORMAL

NORMALIZATION NO_SPACE_CHECK

NOT NULL

NULLIF NUMERIC

OCTET_LENGTH OF

OFF ON

ONLY OPEN

OPTION OR

ORDER ORIGINAL

OUTER OUTPUT

OVERLAPS

PARAGRAPHS PASCAL

PERIODIC PLI

POSITION POSITIONING_UNIT

PRECISION PREPARE

PRESERVE PRIMARY

PRIOR PRIVILEGES

PROCEDURE PROTECT

PROXIMITY PUBLIC

READ REAL

REFERENCES REFERENCE_SPOOLING

RELATIVE RELEVANCE

RELEVANCE_METHOD REPLACE

RESTRICT REVOKE

REWIND RIGHT

RIGHT_MARGIN ROLLBACK

ROW ROWLOCKING

ROWS

SCHEMA SCROLL

SEARCH_MEMORY_SIZE SECOND

SECTION SELECT

SENTENCES SEQUENCE

SERVER_REPORT_TIME SET

SHOW_MATCHES SHOW_SGR

SIMILAR SIZE

SMALLINT SOME


Table 3-1 Reserved Words

SQL SQLCODE

SQLERROR SQLSTATE

STOPFILE SUBSTR

SUM SYSTEM

TABLE TABLENAME

TABLEQUALIFIER TEMP_FILE_SIZE

TEMPORARY TERM_GENERATOR

TEXT_VECTOR THEN

THESAURUS THESAURUS_NAME

TIME TIMESTAMP

TO TRANSACTION

TRANSLATE TRANSLATION

TRUE

UNION UNIQUE

UNKNOWN UNLOCK

UNPROTECT UPC

UPDATE UPPER

USAGE USER

USING

VALIDATE VALUE

VALUES VARCHAR

VARYING VCC_RULES

VECTOR VECTOR_GENERATOR

VECTOR_TERMS VIEW

WEIGHT WHEN

WHENEVER WHERE

WILDCARD_OPT WITH

WITHIN WITHOUT

WORD_BROADEN WORD_MODIFY

WORD_NARROW WORD_SIMILARITY

WORD_SUFFIX WORD_SYNONYM

WORDS WORK

WORKDIR WRITE

X

YEAR

ZONE


Literals

A literal denotes a data value. The syntax for a literal is:

<literal> ::= <character string literal> | <numeric literal> | <date literal>

Literals specify values that are not NULL. Typically, they are used to specify values in a row, and to establish values for parameters in SET statements. You can also use character string literals when creating a text vector.

There are three types of literals:

<character string literal> <numeric literal> <date literal>

Character String Literal

A character string literal is a text data value. The syntax for a character string literal looks like this:

<character string literal> ::= <quote> <character representation>... <quote>

<quote> ::= '

<character representation> ::= <non quote character> | <quote symbol>

<quote symbol> ::= ''

A character string literal permits any character defined in the native character set, other than the quote character ('), to be embedded in a string. The quote character (') is used at the beginning and end of a character string literal. To embed a quote character (') in the character representation, you must use the quote symbol (''), which consists of two single quote characters.

The following example shows how a character string literal is used in a search:

SELECT PROBLEM_NUMBER, STATUS, COMPANY, LAST_MODIFIED
FROM SUPPORT
WHERE STATUS CONTAINS 'ACTIVE'

The following example shows how to embed a quote character (') in a character string literal by using the quote symbol (''):

SELECT PROBLEM_NUMBER, SUBJECT FROM SUPPORT WHERE SUBJECT CONTAINS 'WHY CAN''T I DEFINE A COLUMN'

The length of a character string literal is the number of non-quote characters and quote symbols ('') it contains. The data type of a character string literal is CHAR.

The value of a character string literal is the sequence of characters it contains. Each quote symbol represents a single quotation mark character in both the value and the length of the character string literal. For instance, the value of the previous example includes every character, space, and one quotation mark for a length of 27.
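The quoting rule and the length calculation can be checked with a short Python sketch. The two helper functions are hypothetical names introduced for illustration; they show one way an application might build a literal from arbitrary text and are not part of SearchServer.

```python
def to_literal(value):
    """Wrap text as a character string literal, doubling each quote character."""
    return "'" + value.replace("'", "''") + "'"


def literal_value(literal):
    """Recover the value of a literal produced by to_literal."""
    inner = literal[1:-1]             # drop the enclosing quote characters
    return inner.replace("''", "'")   # each quote symbol is one quotation mark


literal = to_literal("WHY CAN'T I DEFINE A COLUMN")
value = literal_value(literal)
print(literal)
print(len(value))  # length of the value, counting the quote symbol once
```

Counting the quote symbol as a single character gives the length of 27 stated above.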

Numeric Literal

A numeric literal is a decimal representation of a whole number. The syntax for a numeric literal looks like this:

<numeric literal> ::= [<sign>]<unsigned integer>

<unsigned integer> ::= <digit>...

<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

<sign> ::= + | -

A numeric literal lets you specify a numeric quantity. In a SELECT statement, a numeric literal can be used as the second value expression in a comparison predicate. For example, the following search returns all the rows in the SUPPORT table corresponding to an item for which the value of the PRIORITY column is greater than 3:

SELECT PROBLEM_NUMBER, PRIORITY, COMPANY, LAST_MODIFIED FROM SUPPORT WHERE PRIORITY > 3

Date Literal

A date literal represents a specific date value. The syntax for a date literal is shown below:


<date literal> ::= DATE <quote> <date value> <quote>

<date value> ::= <years value>-<months value>-<days value>

Date literals are used to specify a date value. Only numeric values are allowed and they must follow the rules for dates according to the Gregorian calendar. BC dates are not valid. The data type of a date literal is DATE. Date values have the form YYYY-MM-DD. Variations with fewer digits in date elements are also accepted, but are not guaranteed to be accepted in future releases. The largest date value that you can enter is 2047-12-31; the minimum date value is 100-01-01. Dates outside these limits produce unpredictable results.

If the value of the year is less than 100, 1900 is added to the value. For example, the following date literals are interpreted as September 8, 1994 and September 8, 1901, respectively:

DATE '94-09-08'

DATE '0001-09-08'

It is suggested that applications use full 4-digit years representing the actual year number rather than depending on this SearchServer feature.
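The year rule can be traced with a short Python sketch. The function name is hypothetical, and rejecting out-of-range dates with an error is a choice made for this sketch; the text only says that dates outside the limits produce unpredictable results.

```python
import datetime

MIN_DATE = datetime.date(100, 1, 1)
MAX_DATE = datetime.date(2047, 12, 31)


def parse_date_value(text):
    """Interpret a <date value> string using the year rule described above."""
    year_s, month_s, day_s = text.split("-")
    year = int(year_s)
    if year < 100:
        year += 1900  # a year below 100 has 1900 added to it
    # datetime.date raises ValueError for dates that do not exist in the calendar.
    result = datetime.date(year, int(month_s), int(day_s))
    if not (MIN_DATE <= result <= MAX_DATE):
        raise ValueError("date outside the documented limits: " + text)
    return result


print(parse_date_value("94-09-08"))
print(parse_date_value("0001-09-08"))
```

The two calls correspond to the two date literals shown above; supplying a full 4-digit year avoids the adjustment altogether.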

Patterns

A pattern is a character string that you use in the CONTAINS and LIKE predicates to search for terms (words and phrases) within a column. The syntax for a pattern is:

<pattern> ::= <character string literal>...[<escape clause>]

<escape clause> ::= ESCAPE <quote> [<non quote character>] <quote>

A pattern is formed exactly like a character string literal but is interpreted differently. It is distinguished from a character string literal by its optional escape clause and can only be used in a contains predicate. A pattern is interpreted differently, depending on the index mode (NORMAL, LITERAL, or VALUE) of the column or zone being searched.

How Does SearchServer Interpret a Pattern?

Because SearchServer recognizes the extent of a word based on the lexical rules of Latin-type languages, such as the English language, a word is defined as any sequence of letters or digits delimited by white space (spaces, newlines, tabs, etc.) or punctuation characters (&, . , ; etc.). For example, a term can be entered as a complete or incomplete word, or you can embed a comma (,) and period (.) in a numeric word to represent monetary values, as follows:

'1,016.31'

However, a space ( ) and the following punctuation characters take on a special meaning when they are embedded in a pattern:

• hyphen (-)

• backslash (\)

• underscore (_)

• percent (%)

To search for one of these characters in a column or zone defined with LITERAL index mode, use the escape character as described later in this section.

By default, SearchServer is not sensitive to the case (uppercase or lowercase) of alphabetic characters in a pattern, or in the text being searched. However, the case sensitivity of pattern matching can be controlled for each table. For more information about case normalization, see the NORMALIZATION option in the section, "CREATE TABLE Clause/Statement," in Chapter 4, "SearchSQL Statements."

Each internal character set provided with SearchServer has a set of parsing rules associated with it. These parsing rules define how indexing will treat each character in a character set. For a complete description of these parsing rules, refer to Fulcrum SearchServer Data Preparation and Administration, Appendix D.

Table 3-2 shows some possible word and phrase matches for patterns used in a column defined with NORMAL or LITERAL index mode.

Table 3-2 Examples of Word and Phrase Matches for Patterns

Space ( )
    Pattern: 'ON LINE'
    Matched text, NORMAL index mode: ON<tab>LINE, ON<newline>LINE, ON LINE
    Matched text, LITERAL index mode: ON<tab>LINE, ON<newline>LINE

Escape (\) with Space ( )
    Pattern: 'ON\ LINE'
    Matched text, NORMAL index mode: ON<tab>LINE, ON<newline>LINE, ON LINE
    Matched text, LITERAL index mode: ON LINE

Hyphen (-)
    Pattern: 'ON-LINE'
    Matched text, NORMAL index mode: ON<tab>LINE, ON ; LINE, ON<newline>LINE, ONLINE, ON &LINE, ON LINE, ON; LINE, ON-LINE, ON. LINE, ON;LINE, ON\LINE, ON.LINE, ON A LINE
    Matched text, LITERAL index mode: ON-LINE

Escape (\) with Hyphen (-)
    Pattern: 'ON\-LINE'
    Matched text, NORMAL index mode: ON<tab>LINE, ON<newline>LINE, ON LINE, ON &LINE, ON; LINE, ON-LINE, ON. LINE, ON;LINE, ON\LINE, ON.LINE, ON A LINE, ON&LINE
    Matched text, LITERAL index mode: ON-LINE

Any Punctuation Character
    Pattern: 'ON&LINE'
    Matched text, NORMAL index mode: ON<tab>LINE, ON ; LINE, ON<newline>LINE, ON LINE, ON &LINE, ON-LINE, ON; LINE, ON;LINE, ON. LINE, ON.LINE, ON\LINE, ON&LINE, ON A LINE
    Matched text, LITERAL index mode: ON&LINE

Underscore (_)
    Pattern: 'WO_D'
    Matched text, NORMAL index mode: WORD, WOOD
    Matched text, LITERAL index mode: WORD, WO D, WO-D, WO_D, WOOD, WO&D

Percent (%)
    Pattern: 'WOR%'
    Matched text, NORMAL index mode: WORD, WORDSMITH, WORDAGE, WORDY, WORDING, WOR%, WORDLESS, WOR&, WORDPLAY, WOR A, WORDS
    Matched text, LITERAL index mode: WORD, WORDSMITH, WORDAGE, WORDY, WORDING, WOR%, WORDLESS, WOR&, WORDPLAY, WOR A, WORDS

Escape (\) with Percent (%)
    Pattern: 'WOR\%'
    Matched text, NORMAL index mode: WOR%
    Matched text, LITERAL index mode: WOR%

Note: These examples don't reflect exactly where the match codes would be placed for highlighting. The LITERAL index mode examples are assumed to be extracted from text where they are delimited by LITERAL mode separator characters (for example, tab or newline). Because this table is meant only to illustrate various possibilities, some of the search terms can't be verified using the SUPPORT table.


Using Punctuation Characters in a Pattern

When searching text in NORMAL index mode, punctuation characters (other than those defined as special characters: space, hyphen, underscore, percent, and backslash) are interpreted as placeholders for a single character. For example, the pattern in the following statement:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ON&LINE'

is interpreted by SearchServer as the following statement:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ON' WITHIN 1 CHARACTERS OF 'LINE' IN_ORDER

This pattern would match the same set of phrases matched by the pattern, 'ON\-LINE':

ON<tab>LINE       ON.LINE      ON; LINE
ON<newline>LINE   ON&LINE      ON. LINE
ON LINE           ON ; LINE    ON\LINE
ON-LINE           ON. LINE     ON A LINE
ON;LINE           ON &LINE

When searching text in LITERAL index mode, these same punctuation characters are interpreted by SearchServer as that particular character. For example, the pattern in the following statement

SELECT PROBLEM_NUMBER, PRODUCT_VERSION FROM SUPPORT WHERE PRODUCT_VERSION CONTAINS '1&1'

would only match the following:

1&1

Using Special Characters in a Pattern

To help you search for words and phrases in a table, SearchServer lets you use special characters that, when embedded in a pattern, are interpreted differently. These special pattern characters allow you to:

• use the space ( ) character to separate search terms that must occur in sequence

• use the hyphen (-) character to search for hyphenated and non-hyphenated forms of a pattern

• insert an underscore (_) wildcard character so you can match a single character

• insert a percent (%) wildcard character so you can match a sequence of characters

• search for the literal meaning of a special character by inserting a backslash (\) character immediately before the special pattern character

Using a Space Character ( ) in a Pattern

In a pattern, a space separates terms that must occur in sequence. For example, the pattern in the following statement

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ON LINE'

is interpreted as the proximate predicate in the following statement:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ON' WITHIN 0 CHARACTERS OF 'LINE' IN_ORDER

When searching text indexed in NORMAL index mode, this pattern would match any occurrence of the word ON followed by spaces, tabs, newlines, or any other white space characters in any combination, followed by the word LINE.

You can think of the space character in a pattern as matching the white space (which includes tabs and newlines, as well as spaces) that separates terms in columns defined with NORMAL index mode. However, for text in columns defined with LITERAL index mode, a space character is considered to be part of a term, and a space character in a pattern won't match it. Therefore, the pattern in the following statement

SELECT PROBLEM_NUMBER, PRODUCT_VERSION FROM SUPPORT WHERE PRODUCT_VERSION CONTAINS '1 1'

would match two consecutive occurrences of 1 separated by any combination of control characters that separate words (tab, newline, carriage return, vertical tab, and form feed, but not space). It would not match the following text in a column defined with LITERAL index mode:

1 1

When searching text in a column or zone defined with NORMAL index mode, SearchServer interprets a pattern with alphabetic and numeric words juxtaposed with no intervening space as a phrase. For example, the pattern in the following statement

SELECT PROBLEM_NUMBER, TEXT_LOG FROM SUPPORT WHERE TEXT_LOG CONTAINS 'XYZ123'

is equivalent to the pattern in the following statement

SELECT PROBLEM_NUMBER, TEXT_LOG FROM SUPPORT WHERE TEXT_LOG CONTAINS 'XYZ 123'

This would match any occurrence of the word XYZ optionally followed by white space, followed by the numeric word 123, such as:

XYZ123    XYZ<tab>123    XYZ 123    XYZ 123

However, when searching text in a column or zone defined with LITERAL index mode, any sequence of alphabetic, numeric, and punctuation characters in a pattern is interpreted literally. For example, the pattern in the following statement

SELECT PROBLEM_NUMBER, PRODUCT_VERSION FROM SUPPORT WHERE PRODUCT_VERSION CONTAINS '1.1A'

would match only the term 1.1A

Note: The lexical rules followed by SearchServer when indexing column data correspond to those given above. A pattern is matched by searching its indexes, not the column directly. This ensures that the desired words are found regardless of how the text is formatted. This is true for all data defined with NORMAL index mode.

Using the Hyphen Character (-) in a Search

When searching a column defined with NORMAL index mode, a hyphen (-) in a pattern is treated as an optional punctuation character. This allows you to search for hyphenated and non-hyphenated forms of the pattern.

The non-hyphenated forms include spaces or anything that is interpreted as a punctuation character in the Fulcrum Technologies Internal Character Set (FTICS). For example, in a search, the pattern in the following statement

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ON-LINE'

is interpreted as the compound condition in the following statement:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ONLINE' | 'ON' WITHIN 1 CHARACTERS OF 'LINE' IN_ORDER

This pattern would match certain combinations of words, spaces, and punctuation including the following:

ONLINE     ON;LINE
ON LINE    ON.LINE
ON-LINE    ON&LINE

Because spaces between words in the actual data are optional when matching a proximate predicate, and the number of spaces is not considered in a search, the following text strings would also be matched:

ON<tab> LINE       ON. LINE    ON. LINE
ON<newline>LINE    ON &LINE    ON\LINE
ON ; LINE          ON; LINE    ON A LINE

However, if the column is defined with LITERAL index mode, a hyphen (-) is treated as a literal hyphen and the only match is the actual search term specified. For example, the pattern in the following statement matches only the text 1-1:

SELECT PROBLEM_NUMBER, PRODUCT_VERSION FROM SUPPORT WHERE PRODUCT_VERSION CONTAINS '1-1'
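The NORMAL index mode rewrites described for punctuation, space, and hyphen characters can be summarized in a short Python sketch. It handles only two-term patterns such as 'ON&LINE' and returns the predicate text that this guide gives as the equivalent form. The function name rewrite_normal and the regular expression are illustrative assumptions, not part of SearchServer.

```python
import re

# Illustrative model only (hypothetical helper, not SearchServer code).
# Two alphanumeric terms joined by one separator: an escaped hyphen (\-)
# or any single character other than the wildcards (_ %) and backslash.
TWO_TERMS = re.compile(r"([A-Za-z0-9]+)(\\-|[^A-Za-z0-9_%\\])([A-Za-z0-9]+)")


def rewrite_normal(pattern: str) -> str:
    """Return the equivalent predicate text for a two-term pattern."""
    match = TWO_TERMS.fullmatch(pattern)
    if match is None:
        raise ValueError("this sketch handles two-term patterns only")
    left, separator, right = match.groups()
    if separator == " ":
        # A space separates terms that must occur in sequence.
        return f"'{left}' WITHIN 0 CHARACTERS OF '{right}' IN_ORDER"
    proximity = f"'{left}' WITHIN 1 CHARACTERS OF '{right}' IN_ORDER"
    if separator == "-":
        # An unescaped hyphen also matches the joined form.
        return f"'{left}{right}' | {proximity}"
    # Other punctuation, or an escaped hyphen, is a one-character placeholder.
    return proximity


print(rewrite_normal("ON-LINE"))
```

Called with 'ON LINE', 'ON&LINE', and 'ON-LINE', the function reproduces the three equivalent statements shown in these sections.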

Using String Wildcards in a Search

String wildcards allow you to enter a partial search term in a contains predicate when one or more of the characters to be matched is not specified. Use a wildcard character to represent either a single character or an optional sequence of characters. There are two string wildcard characters:

• underscore (_) represents a single character

• percent sign (%) represents a sequence of zero or more characters

You can place a string wildcard character anywhere within a word in a pattern.

Note: Some forms of wildcard are not permitted in character columns in the TABLES, COLUMNS, ZONES, and SERVER_INFO system tables. There are no restrictions on wildcard use when searching in the SEARCH_TERMS table.

Wildcard characters retain their special interpretation in a pattern used to search in any character string column, whether the column is defined with NORMAL or LITERAL index mode. To embed a string wildcard in its literal form, you must preface the wildcard with an escape character. For more information about the escape clause, see the section, "Using the Escape Clause in a Pattern," later in this chapter.

Using the Underscore (_)

The underscore character is used to represent any single character and can be embedded anywhere in the word. It matches any character at the same position at which it appears in the search term. For example, the pattern in the following statement

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'WO_D'

would match all words beginning with WO and ending with D with any single character in between, such as:

WORD    WOOD

You can use multiple single-character wildcards together to match a fixed number of characters at the specified position. For example, the pattern in the following statement

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'W__D'

would match all words beginning with W and ending with D with any two characters in between, such as:

WARD    WILD    WORD    WIND
WEED    WELD    WOOD

Using the Percent Sign (%)

The percent sign (%) is used to represent any sequence of zero or more characters at the same position at which it appears in the search term. For example, the pattern in the following statement

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'WORD%'

would match all words beginning with WORD, such as:

WORD       WORDPLAY
WORDAGE    WORDS
WORDING    WORDSMITH
WORDLESS   WORDY
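As a rough model of how the two string wildcards select words, the following Python sketch translates a pattern into a regular expression and tests candidate words against it. It is only an approximation under stated assumptions: SearchServer matches against its indexes rather than with regular expressions, the comparison here is case-sensitive, escape characters are not handled, and the function names wildcard_to_regex and matching_words are invented for illustration.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate _ (one character) and % (zero or more) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")      # exactly one character
        elif ch == "%":
            parts.append(".*")     # any sequence, possibly empty
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching_words(pattern: str, words: list[str]) -> list[str]:
    """Return the words that the whole pattern matches, in input order."""
    regex = wildcard_to_regex(pattern)
    return [word for word in words if regex.fullmatch(word)]


print(matching_words("WO_D", ["WORD", "WOOD", "WOULD", "WOD"]))
print(matching_words("WORD%", ["WORD", "WORDY", "SWORD", "WORDPLAY"]))
```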

Note: Using the percent sign (%) by itself (without any accompanying letters or digits) is not currently supported.

Combining String Wildcards

You can also combine string wildcards in a single character string provided the string contains at least one non-wildcard character. For example, the pattern in the following statement

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'WO_D%'

would match words such as:

WONDER      WOODWIND    WORDLESS
WOOD        WOODY       WORDPLAY
WOODEN      WORD        WORDSMITH
WOODLAND    WORDAGE     WORDY
WOODLOT     WORDING

When both the underscore (_) and percent sign (%) string wildcards are used in combination, the minimum number of characters matched is determined by the number of underscore (_) wildcards used. For example, w% matches w plus zero or more trailing characters, w%_ matches w plus one or more trailing characters, w%__ matches w plus two or more trailing characters, and so on.

Two or more consecutive percent sign (%) string wildcards are interpreted as a single percent sign.
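The two rules above can be checked with a small Python sketch. Because a percent sign can match an empty sequence, only the literal characters and the underscores contribute to the shortest possible match. Both function names are hypothetical, and the sketch ignores escape characters.

```python
import re


def collapse_percents(pattern: str) -> str:
    """Treat a run of consecutive % wildcards as a single %."""
    return re.sub(r"%+", "%", pattern)


def min_match_length(pattern: str) -> int:
    """Shortest text the pattern can match: every character except %."""
    return sum(1 for ch in pattern if ch != "%")


for example in ("w%", "w%_", "w%__"):
    # Subtract 1 for the leading w to get the minimum trailing characters.
    print(example, min_match_length(example) - 1)
```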

Performance Implications of Using Wildcards

Wildcard use can have a significant negative impact on search performance. In some cases, this impact can be minimized by creating new tables that are optimized for wildcard searching by specifying a value for WILDCARD_OPT other than NONE, or by re-indexing existing tables using the same parameter or the VALIDATE INDEX statement.

When devising wildcard patterns for searching, it is useful to consider the following performance implications:

• A pattern with both leading and trailing wildcard characters (such as _oo%) requires significantly more processing than a pattern that does not begin with a wildcard. This impact is independent of any optimizations specified by the WILDCARD_OPT parameter.

• A pattern with a leading wildcard character requires significantly more processing on tables with a WILDCARD_OPT value of NONE.

Using the Escape Character (\) in a Pattern

By default, the escape character is the backslash (\). This character, when placed immediately before any special pattern character (hyphen, percent, backslash, underscore, or space) in a pattern, causes the special pattern character to be interpreted literally. For example, the pattern in the following statement

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ON\-LINE'

would only match the following if the column being searched was defined with LITERAL index mode: ON-LINE

However, when searching a column defined with NORMAL index mode, this pattern is interpreted as the following SELECT statement that contains a proximate predicate:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ON' WITHIN 1 CHARACTERS OF 'LINE' IN_ORDER

and would match words and phrases such as:

ON<tab>LINE        ON.LINE      ON; LINE
ON<newline>LINE    ON&LINE      ON. LINE
ON LINE            ON ; LINE    ON\LINE
ON-LINE            ON. LINE     ON A LINE
ON;LINE            ON &LINE

Any number of spaces between words in these phrases is permitted when matching the proximate predicate. The interpretation of backslash space (\ ) in a pattern is:

• a literal space in LITERAL index mode

• a separator between terms in a sequence in NORMAL index mode

Use this combination to match a term containing an embedded space in LITERAL index mode. When searching NORMAL text, this combination retains the term separation characteristic of a space in a pattern.

Using the Escape Clause in a Pattern

Use the escape clause to change the default escape character to any other single character except the single quote ('). This allows you to search for a pattern that contains a backslash (\).

Note: The escape character is only changed for the current pattern to which the escape clause is attached.

For example, the following statement searches for a pattern containing a backslash (\) and changes the default escape character to a colon (:):

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ON\LINE' ESCAPE ':'

This pattern would match the following only if the column being searched was defined with LITERAL index mode:

ON\LINE

However, in a column defined with NORMAL index mode, the pattern would match the same set of phrases as listed previously for ON\-LINE in NORMAL index mode:

ON<tab>LINE        ON.LINE      ON; LINE
ON<newline>LINE    ON&LINE      ON. LINE
ON LINE            ON ; LINE    ON\LINE
ON-LINE            ON. LINE     ON A LINE
ON;LINE            ON &LINE

You can also use the escape character to interpret a backslash (\) literally. For example, the pattern in the following statement would match the same phrases as the previous example that uses the escape clause:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'ON\\LINE'
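To see why 'ON\LINE' ESCAPE ':' and 'ON\\LINE' are equivalent, it can help to classify each character of a pattern as literal or special. The Python sketch below does this for a configurable escape character. It is a simplified model: the tokenize function and its token labels are hypothetical, and treating an escape character before a non-special character as "take the next character literally" is an assumption that this guide does not confirm. The sketch only classifies characters; how a literal character is then matched still depends on the index mode.

```python
# Special pattern characters named in this guide (the escape is handled apart).
SPECIAL = {" ", "-", "_", "%"}


def tokenize(pattern: str, escape: str = "\\") -> list[tuple[str, str]]:
    """Classify each pattern character as ("literal", ch) or ("special", ch)."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape:
            if i + 1 >= len(pattern):
                raise ValueError("escape character at end of pattern")
            # The character after the escape loses any special meaning.
            tokens.append(("literal", pattern[i + 1]))
            i += 2
        elif ch in SPECIAL:
            tokens.append(("special", ch))
            i += 1
        else:
            tokens.append(("literal", ch))
            i += 1
    return tokens


# Both forms yield a literal backslash between ON and LINE.
print(tokenize("ON\\LINE", escape=":") == tokenize("ON\\\\LINE"))
```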

Searching for Accented Characters in a Pattern

By default, SearchServer ignores accents. When an accented character is encountered in a pattern or in column data, SearchServer ignores the accent and retains the unaccented character. Therefore, the pattern in the following statement

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY FROM SUPPORT WHERE TEXT_LOG CONTAINS 'RESUME'

would match the following words:

RESUME

RESUMÉ

If a search applies character variant rules to search explicitly for accented characters, accent indexing must be enabled for each table to be searched. For a complete description of enabling accent indexing and the use of the character variant rules file, see Fulcrum SearchServer Data Preparation and Administration.
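The default accent-insensitive behavior can be approximated outside SearchServer by removing combining accent marks before comparing strings. The Python sketch below uses Unicode decomposition as a stand-in. SearchServer works with its own internal character set (FTICS), so this is an analogy under that assumption, not a description of its implementation, and strip_accents is a hypothetical name.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining accent marks, keeping the unaccented base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Both spellings reduce to the same unaccented term.
print(strip_accents("RESUMÉ") == strip_accents("RESUME"))
```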

Data Types

Data types are used in the CREATE SCHEMA statement to specify the type of data that is expected in each column. There are three main data types:

<character string type>

<exact numeric type>

<date type>

The syntax for the data type is as follows:

<data type> ::= <character string type> | <exact numeric type> | <date type>

Character String Data Type

A character string data type is used for columns that contain character data. The syntax for a character string data type is shown below:

<character string type> ::= CHARACTER [VARYING] [(<length>)] | CHAR [VARYING] [(<length>)] | VARCHAR [(<length>)] | APVARCHAR [(<length>)]

If the VARYING option is omitted, the column is fixed in length as specified by the length option. If the length of the value in this type of column is less than the length specified by the length option, the value string is padded on the right with spaces.

If the VARYING option is specified, the length is variable with a minimum length of 0 and a maximum length of the value specified by the length option. For this type of column, the value string is not padded.

The maximum value of the length option is 32,767. You can omit the length option if the varying option is also omitted. In this case the string length is 1.

For the APVARCHAR data type, the default and maximum value of the length option is 2,147,483,647.

CHAR VARCHAR

The CHAR data type can be assigned to a column to specify that it can contain a character string of fixed length. The VARCHAR data type can be assigned to a column to specify that it can contain a character string of varying length. (CHAR is a synonym for CHARACTER. VARCHAR is a synonym for CHAR VARYING or CHARACTER VARYING.)

The character set for these data types is specified by the native character set. However, because SearchServer stores data in one of several pre-determined but compatible internal character sets called Fulcrum Technologies Internal Character Sets (FTICS), an automatic conversion from the native character set to the internal character set is performed when the data is stored, and from the internal character set to the native character set when it is retrieved.

The internal character set is a super-set of the ASCII character set. For a complete description and chart of the FTICS, see Appendix A, "Character Sets."

The native character set might use single character codes to represent a wide variety of accented characters, suitable for most Latin-based European languages, such as English, French, and Italian. However, FTICS uses two-character codes to represent accented characters (accent and base character), and permits a wider variety of accented characters to be represented. These data types can't be specified for the external text column (FT_TEXT).

APVARCHAR

The APVARCHAR data type can be assigned to a column to specify that it can contain a character string of varying length. This data type is only used for the external text column (FT_TEXT or a column that renames FT_TEXT) and can't be used for any other column. It uses an application-defined character set that can be converted to the internal character set through a text reader. For a complete description of text readers, see Fulcrum SearchServer Data Preparation and Administration.

For example, if your external documents were created using Microsoft Word, the text reader would convert the data from the Microsoft Word character set to the internal character set.

Exact Numeric Data Type

The syntax for an exact numeric data type is:

<exact numeric type> ::= INTEGER | INT | SMALLINT

Exact numeric data types are assigned automatically to the following reserved columns and functions:

• FT_CID

• FT_DFLAG

• FT_FORMAT

• FT_MTIME

• FT_ORIGINAL_SIZE

• FT_TIMESTAMP

• RELEVANCE()

These data types are stored in a machine-independent fashion. They are presented across application interfaces as the appropriate language data type. Match codes are never returned for a column with an exact numeric data type, even when the index mode for the column is NORMAL.

INTEGER

The INTEGER data type specifies an unsigned 32-bit binary integer. Its range is restricted to 0 through 2147483647. The precision is 10 digits. (INT is a synonym for INTEGER.)

SMALLINT

The SMALLINT data type specifies a signed 16-bit binary integer. The precision is 5 digits. You can insert any number from -32768 through 32767 into a SMALLINT column.
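An application can check a value against the ranges stated above before attempting an insert. The following Python sketch is one way to do so. The RANGES table and the fits function are illustrative names, and the guide does not say what SearchServer does with out-of-range values.

```python
# Ranges as stated in this section; INT is a synonym for INTEGER.
RANGES = {
    "INTEGER": (0, 2147483647),
    "INT": (0, 2147483647),
    "SMALLINT": (-32768, 32767),
}


def fits(type_name: str, value: int) -> bool:
    """Return True if value lies within the documented range of the type."""
    low, high = RANGES[type_name.upper()]
    return low <= value <= high


print(fits("SMALLINT", 32767), fits("SMALLINT", 32768))
```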

Date Data Type

A date data type is used for columns that contain data in the date format. The syntax for the date data type looks like this:

<date type> ::= DATE

DATE

If you assign this data type to a column, you can only insert data in the date literal format. For a complete description of date literal, see the section "Literals," earlier in this chapter.

User-Written Comments

User-written comments are used to annotate statements but have no effect on the actual operation of the statement and look like this:

--comment

For example:

SELECT * FROM SUPPORT
WHERE PRIORITY > 3 --This statement returns rows from the
--SUPPORT table corresponding to problem calls on which
--the PRIORITY is greater than 3.

Comments can be of any length. They can be inserted separately on a line or follow any portion of a statement. When positioned on a line with a statement, they constitute the remainder of that line. A semicolon (;) occurring in a comment is not interpreted as a statement terminator.
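The rule that a comment runs from the two hyphens to the end of its line can be modeled with a few lines of Python. The strip_comments function is a hypothetical helper for illustration. It assumes that the two-hyphen sequence never appears inside a quoted string literal, a case this guide does not address.

```python
def strip_comments(statement: str) -> str:
    """Remove -- comments, keeping only the statement text on each line."""
    kept = []
    for line in statement.splitlines():
        # Everything from the first -- to the end of the line is a comment.
        code = line.split("--", 1)[0].rstrip()
        if code:
            kept.append(code)
    return "\n".join(kept)


annotated = (
    "SELECT * FROM SUPPORT\n"
    "WHERE PRIORITY > 3 --rows with a priority above 3; see the note\n"
    "--a comment on a line of its own\n"
)
print(strip_comments(annotated))
```

Because the semicolon in the example sits inside a comment, it is discarded with the rest of the comment text rather than ending the statement.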

Chapter 4:

SearchSQL Statements

This chapter presents a detailed reference of SearchSQL statements, clauses, functions, and predicates. For each statement, clause, and predicate you'll find:

• complete syntax

• detailed syntax descriptions

• examples

Overview

This chapter presents the complete syntax for all SearchSQL statements. A statement is the language unit executed by the API functions.

For quick reference and retrieval, the statements, clauses, functions, and predicates are organized alphabetically and syntax names are displayed in the running header of the page.

Some particularly complex statements are broken down into individual clauses, functions, and predicates. For example, there is a separate reference page for the WHERE clause, which is part of the SELECT statement.

There is at least one example for each statement or component. However, more complex statements might have several examples.

Note: There is no limit on the length of a SearchSQL statement other than that imposed by platform or available memory.

ALTER TABLE Statement

Adds or deletes one column to or from an existing schema.

Syntax

ALTER TABLE <table name> {ADD <column definition> | DROP [COLUMN] <column identifier>} [CASCADE | RESTRICT]

<column definition> ::= <column identifier> <data type> [<field number>]

Keywords and Options

<table name>

A compound identifier that specifies the name and location of the table to which the column is being added or from which it is being deleted.

ADD

Specifies that the column is to be added to the schema. This option is also useful if you want to rename an existing column or change its data type or field length.

<column identifier>

An identifier that specifies the name of the column being added or deleted in the schema. Each column name must be unique within the table and different from any zone name in the schema.

If the name of a reserved column is specified, the column is added to the select list of a SELECT statement when the asterisk (*) option is used. In addition, the reserved column is displayed in the COLUMNS system table.

The reserved column can be specified only with its default attributes (field number, data type, and index mode). If you try to override the default attributes, only the default values are used.

<data type>

Specifies the data type of the column being added. The data type can be one of the following with its associated default index mode:

Data Type                Default Index Mode
character string type    NORMAL
exact numeric type       VALUE
date type                VALUE

You can change the data type of an existing column by first dropping the column, then adding it to the table again and specifying a different data type. However, because you can't specify a domain, you can only use the pre-defined data types and their default index mode as shown in the preceding table.

<field number>

Defines the field number of the column being added. A field number is an unsigned integer that provides a link between a column in one table and a column in another table, or to a reserved column.

When you use the ALTER TABLE statement with the ADD option, you can let SearchServer assign the field number automatically, or you can specify it explicitly. However, if you want to use the UNION clause in a SELECT statement, the field numbers of identically named columns in each specified table must be the same. If they're not, your search will return unexpected results.

The range for field numbers is any number from 128 through 64010 that has not already been assigned to another user-defined column in the same table. Also, the field number must not be the same as any zone number assigned to a zone associated with a different column in the same table.

Field numbers are also used to rename reserved columns. This is done by specifying a field number that matches the field number of a reserved column. In this case, the data type and field number must match those defined for the corresponding reserved column. The only attribute that can be overridden is the column name. Once a reserved column has been redefined, its original name is unavailable.

If this option is omitted, a field number is automatically assigned to each column. If the data type of the column is APVARCHAR, SearchServer assumes that you are renaming the FT_TEXT reserved column, and assigns field number 32.

If the data type is DATE, SearchServer assigns the first field number in the following list that has not already been assigned to a column in the schema:

• 27

• 28

• 29

• 30

• the first available field number starting at 128

DATE field numbers 27 to 30 are defined in the schema as reserved columns. If SearchServer assigns one of those field numbers to a column, you are implicitly renaming a reserved column. For all other data types, SearchServer chooses the first available field number starting at 128.
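The automatic assignment rules can be expressed as a short Python sketch: APVARCHAR columns take field number 32, DATE columns take the first free number from 27 to 30, and everything else takes the first free number from 128 up to the documented maximum of 64010. The function auto_field_number and the used set are illustrative assumptions. The real assignment happens inside SearchServer and may apply checks not described here.

```python
DATE_RESERVED = (27, 28, 29, 30)   # reserved DATE field numbers
FT_TEXT_FIELD = 32                 # assigned when the data type is APVARCHAR
FIRST_USER_FIELD = 128
LAST_USER_FIELD = 64010


def auto_field_number(data_type: str, used: set[int]) -> int:
    """Pick a field number following the rules described in this section."""
    data_type = data_type.upper()
    if data_type == "APVARCHAR":
        return FT_TEXT_FIELD
    if data_type == "DATE":
        for number in DATE_RESERVED:
            if number not in used:
                return number
    # All other types, and DATE once 27 to 30 are taken, start at 128.
    for number in range(FIRST_USER_FIELD, LAST_USER_FIELD + 1):
        if number not in used:
            return number
    raise ValueError("no free field number in the permitted range")


print(auto_field_number("DATE", {27, 28}))
```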

DROP

Specifies that the column is to be flagged for deletion. However, you can't drop a column if it is the only column in the table. In this case, the ALTER TABLE statement fails and an error message is returned.

Once a column has been dropped, the data in the rows for that column is inaccessible.

COLUMN

Optional keyword that is provided for compatibility with other applications or application environments, but has no effect on the operation of the statement.

CASCADE

Optional keyword that is provided for compatibility with other applications or application environments, but has no effect on the operation of the statement.

RESTRICT

Optional keyword that is provided for compatibility with other applications or application environments, but has no effect on the operation of the statement.

Description

When the ALTER TABLE statement is executed, the schema is read, and the column specified is added or flagged for deletion from the schema. Then the schema is rewritten to the table. Any concurrently executing application that has already accessed the table will not see the change in the schema until it closes and re-establishes a connection to SearchServer, or executes its own ALTER TABLE statement on this table.

When using the ADD option, if the field number matches the field number of a column that has been previously dropped, the original column is restored with the new attributes specified in the column definition. Unless the data has been deleted using the UPDATE statement, the data from the original dropped column is now accessible through the restored column. If the index mode of the column changes, the new column might not be searched correctly until a VALIDATE INDEX statement with the ABANDON option is performed. However, if the column definition specifies the name of a column that was previously dropped but the field number in the column definition is not explicitly specified, the original column is not restored. Instead, the new column is created and the value in all rows in the table is NULL.

Examples

These two examples drop an existing column in the SUPPORT table, and then restore it with a different name and data length:

ALTER TABLE SUPPORT DROP COLUMN SUBJECT

ALTER TABLE SUPPORT ADD PROB_SUBJECT VARCHAR(85) 137

For More Information

For details about adding and deleting columns in the schema, see Fulcrum SearchServer Data Preparation and Administration.

Back Reference Predicate

A search criterion that selects rows corresponding to the rows of an existing working table.

Syntax

<back reference predicate> ::= CURSOR <cursor name> [WITH CONTEXT | WITHOUT CONTEXT]

Keywords and Options

CURSOR <cursor name>

Specifies the working table resulting from a previous SELECT statement. The cursor name must have been set or obtained by calling a SearchServer API function. The previous SELECT statement must still exist (not be terminated).

WITH CONTEXT

The context information (indicated by match codes) in the data of the referenced working table is preserved in the new working table.

WITHOUT CONTEXT

The context information from the referenced working table is not preserved. This is the default if neither option is specified. In this case, the only context information available is derived from other predicates combined with the back reference predicate in the current SELECT statement. See "Examples" later in this section.

Description

The back reference predicate evaluates to 'TRUE' for each row in the table being searched that corresponds to a row in the working table referenced by the cursor name. In other words, this predicate selects the same subset of rows from the table as the referenced SELECT statement.

This predicate must appear in a SELECT statement that selects from the same set of tables in the same order as in the SELECT statement referenced by the cursor name.

The setting of the SHOW_MATCHES server attribute for the original SELECT statement determines whether match codes are inserted into returned data when context information is available. If match codes were not kept in the original working table, then you can't ask for context when you use the back reference predicate.

The WITH CONTEXT and WITHOUT CONTEXT options control whether context information from the referenced working table is available when match codes are inserted. These options also determine the contribution of the back reference predicate towards the RELEVANCE function. The relevance isn't recalculated if the back reference predicate isn't combined with any other predicates that compute relevance (this includes the contains, like, and is_about predicates). In other words, the same relevance is assigned to each row as was assigned by the referenced SELECT statement.

The WITH CONTEXT and WITHOUT CONTEXT options don't affect the relevance in this case. However, if there is an additional predicate that can affect the relevance, and the WITHOUT CONTEXT option is specified (or neither option is specified), the relevance is recalculated based on the new search terms found and on the current ranking algorithm, without regard to the distributions of the original search terms in the table.

In these cases, the RELEVANCE function values are likely to be different from those in the original SELECT statement. Also, the order of the rows retrieved by a back reference to a ranked working table might not be the same as it was in the original ranked working table.

If the WITH CONTEXT option is specified, the relevance is calculated using the original search terms and the additional search terms used with the back reference predicate.
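The relevance rules above reduce to a three-way decision. The following Python sketch is only an illustration of that decision, not SearchServer code; the function name and the returned labels are hypothetical.

```python
def relevance_basis(has_other_relevance_predicate: bool, with_context: bool) -> str:
    """Return which search terms feed RELEVANCE for a back reference query.

    has_other_relevance_predicate: True if the back reference is combined
        with a contains, like, or is_about predicate.
    with_context: True if WITH CONTEXT is specified; False for
        WITHOUT CONTEXT or when neither option is given (the default).
    """
    if not has_other_relevance_predicate:
        # Relevance is not recalculated; the option chosen makes no difference.
        return "original relevance reused"
    if with_context:
        return "original and additional search terms"
    return "additional search terms only"
```

For instance, `relevance_basis(True, False)` corresponds to the final example in the "Examples" section, where a CONTAINS predicate is combined with a back reference that does not specify WITH CONTEXT.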

Examples

The following example creates a working table with the same rows as an existing working table (with cursor name SQL_CUR00001) but with a different set of columns:

SELECT PROBLEM_NUMBER, LAST_MODIFIED, COMPANY FROM SUPPORT WHERE CURSOR SQL_CUR00001

The next example uses the same search criteria as the previous example, and also includes the WITHOUT CONTEXT option:

SELECT PROBLEM_NUMBER, LAST_MODIFIED, COMPANY FROM SUPPORT WHERE CURSOR SQL_CUR00001 WITHOUT CONTEXT

This example is the same as the previous example, but uses the WITH CONTEXT option:

SELECT PROBLEM_NUMBER, LAST_MODIFIED, COMPANY FROM SUPPORT WHERE CURSOR SQL_CUR00001 WITH CONTEXT

The final example performs an additional search on the subset of rows in the existing working table associated with the cursor. Match codes are inserted only around the terms from the CONTAINS predicate since the back reference does not specify WITH CONTEXT:

SELECT PROBLEM_NUMBER, LAST_MODIFIED, COMPANY FROM SUPPORT WHERE COMPANY CONTAINS 'Gilford' AND CURSOR SQL_CUR00001

For More Information

• about predicates, see Chapter 2, "The Search Process."

• about using the back reference predicate, see Chapter 2, "The Search Process."

• about setting and obtaining a cursor name, see the Fulcrum SearchServer or SearchBuilder Developer's Guide for your environment.

• about the SELECT statement, see the "SELECT Statement" later in this chapter.

• about server attributes, see Chapter 5, "System Information Tables."

Between Predicate

A search criterion that selects rows with column or zone values within a specified range.

Syntax

<between predicate> ::= {<column name> | <zone name>} [NOT] BETWEEN <literal> AND <literal>

Keywords and Options

<column name>

An identifier that specifies the name of a column in a table. The column must have been defined with VALUE index mode.

<zone name>

An identifier that specifies the name of an existing zone. A zone is a separately searchable portion of a column. The zone must have been defined with VALUE index mode.

NOT

An optional keyword that negates the predicate. If the predicate is 'FALSE' for a particular row, the negated predicate is 'TRUE' even if the row contains no data (NULL value) in the specified column or zone.

BETWEEN

A keyword that identifies the between predicate and links the column name or zone name to the numeric literals.

<literal>

The data type of the literal restricts the acceptable attributes of the column associated with the specified column name or zone name. The literal must be a numeric or date literal.

If a numeric literal is specified, the column name or the zone name must be associated with a column defined with VALUE index mode.

If a date literal is specified, the associated column must be defined with VALUE index mode and the DATE data type. Arithmetic relations are applied so that earlier date values are less than later date values.

The two literals specify an inclusive range of numbers. The first literal can be less than, greater than, or equal to the second literal.

AND

A keyword that connects the two numeric literals.

Description

This predicate evaluates to 'TRUE' for each row where there is a value for the specified column or zone, and all the values in that row for that column or zone fall inclusively between the two numeric literals. Note that a column or zone defined with VALUE index mode can contain multiple values only when it is part of a character column.

If the NOT Boolean operator is included, the predicate evaluates to 'TRUE' for all rows for which the above criteria fail. Only columns or zones defined with VALUE index mode can be specified.

Note: A between predicate doesn't contribute to the value of the RELEVANCE function.
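As an informal illustration of the rules in this Description, the following Python sketch models the between predicate for a single row. It is an assumption-laden model, not SearchServer's implementation: the function names are hypothetical, and an empty sequence stands in for a NULL value.

```python
from typing import Sequence


def between(values: Sequence[float], first: float, second: float) -> bool:
    """Model of <column> BETWEEN first AND second for one row.

    values holds every value the row has in the column or zone;
    an empty sequence represents a NULL value.
    """
    # The first literal may be less than, greater than, or equal to the second.
    low, high = min(first, second), max(first, second)
    # There must be at least one value, and every value must be in range.
    return len(values) > 0 and all(low <= v <= high for v in values)


def not_between(values: Sequence[float], first: float, second: float) -> bool:
    """Model of NOT BETWEEN: TRUE wherever between() fails, NULL rows included."""
    return not between(values, first, second)
```

With this model, `between([3], 2, 5)` and `between([3], 5, 2)` agree, a row holding the values 3 and 7 does not match, and a NULL row is selected only by the negated form.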

Example

The following example searches for all rows of data where the value in the PRIORITY column is equal to or greater than 2 and equal to or less than 5:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS FROM SUPPORT WHERE PRIORITY BETWEEN 2 AND 5

For More Information

• about predicates, see Chapter 2, "The Search Process."

• about using the between predicate, see Chapter 2, "The Search Process."

• about the SELECT statement, see the "SELECT Statement" later in this chapter.

Comparison Predicate

A search criterion that selects rows by comparing column or zone values with a literal.

Syntax

<comparison predicate> ::= {<column name> | <zone name>} <comparison operator> <literal>

<comparison operator> ::= = | <> | < | > | <= | >=

Keywords and Options

<column name>

An identifier that specifies the name of a column in a table.

<zone name>

An identifier that specifies the name of an existing zone. A zone is a separately searchable portion of a column.

<comparison operator>

Specifies the arithmetic relation between data and a literal value that must be satisfied for the predicate to be 'TRUE' when applied to a given row.

Symbol   Arithmetic Relation        Valid Index Mode
=        equal to                   VALUE, NORMAL, LITERAL
<>       not equal to               VALUE, NORMAL, LITERAL
<        less than                  VALUE
>        greater than               VALUE
<=       less than or equal to      VALUE
>=       greater than or equal to   VALUE

Note: The NULL value doesn't satisfy any of the comparison operators, except the not equal to (<>) symbol. Negating a comparison predicate involving any other comparison operator, by prefixing the predicate with the NOT Boolean operator, matches all rows for which the predicate is 'FALSE', including rows that have a NULL value in the column or zone.

<literal>

The literal must be a character string, numeric, or date literal.

If a character string literal is specified, the associated column or zone must be defined with NORMAL or LITERAL index mode. The comparison predicate with the character string literal supports only the equal to (=) and not equal to (<>) operators. Unlike a pattern in a contains predicate, the only special character recognized in this context is a space ( ).

If the text being searched is indexed in NORMAL index mode, the space is interpreted as a term separator. If the text is indexed in LITERAL index mode, the space is interpreted literally and matches a single space embedded in the text. In effect, the space is interpreted like a backslash space (\ ) combination in a pattern. For more information, see the section, "Patterns," in Chapter 3, "SearchSQL Language Elements."

The case sensitivity of character string matching is controlled by the NORMALIZATION parameter in the CREATE TABLE clause.

If a numeric literal is specified, the column name or the zone name must be associated with a column defined with VALUE index mode.

If a date literal is specified, the associated column must be defined with VALUE index mode and the DATE data type. Arithmetic relations are applied so that earlier date values are less than later date values.

Description

This predicate evaluates to 'TRUE' for each row that contains at least one value (or a NULL value, in the case of the not equal to (<>) symbol) in the specified column or zone, where a value in the column or zone satisfies the arithmetic relation. For columns or zones defined with VALUE index mode, no context information is available, so match codes won't be inserted in returned data to indicate the location of numeric values that satisfy the predicate.

However, context information is available (if requested by a SET MATCH_CODES statement) for comparison predicates that involve character string literals and columns defined with NORMAL or LITERAL index mode.

Note: A comparison predicate doesn't contribute to the value of the RELEVANCE function.
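The treatment of NULL values and negation described in this section can be sketched in Python as follows. This is a simplified model under stated assumptions, not the SearchServer implementation: the function names are hypothetical, an empty sequence represents a NULL value, and index mode restrictions on the operators are not enforced.

```python
import operator
from typing import Sequence

# The six comparison operator symbols from the syntax above.
OPERATORS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def compare(values: Sequence, symbol: str, literal) -> bool:
    """Model of <column> <comparison operator> <literal> for one row."""
    if not values:
        # A NULL value satisfies only the not equal to (<>) operator.
        return symbol == "<>"
    # Otherwise, one satisfying value in the column or zone is enough.
    return any(OPERATORS[symbol](v, literal) for v in values)


def not_compare(values: Sequence, symbol: str, literal) -> bool:
    """Model of NOT applied to a comparison predicate."""
    return not compare(values, symbol, literal)
```

In this model a NULL row fails `PRIORITY >= 2`, so the negated predicate selects it, which matches the Note under <comparison operator>.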

Examples

The following example searches for all rows of data where the value in the PRIORITY column is greater than or equal to 2:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS FROM SUPPORT WHERE PRIORITY >= 2

The next example searches for all rows of data where the value in the COMPANY column matches the character string literal OREO:

SELECT PROBLEM_NUMBER, COMPANY FROM SUPPORT WHERE COMPANY = 'OREO'

For More Information

• about predicates, see Chapter 2, "The Search Process."

• about using the comparison predicate, see Chapter 2, "The Search Process."

• about the SELECT statement, see the "SELECT Statement" later in this chapter.

Contains Predicate

Tests a column or zone for the presence of a word or combination of words.

Syntax

<contains predicate> ::= {<column name> | <zone name>} [NOT] CONTAINS <contains condition>

<contains condition> ::= [<contains condition> <contains or operator>] <contains term>

<contains or operator> ::= vertical bar character (|)

<contains term> ::= [<contains term> &] <contains factor>

<contains factor> ::= [~] <contains primary>

<contains primary> ::= <subcontains predicate> | (<contains condition>)

<subcontains predicate> ::= <within predicate> | <proximity predicate> | <item list>

<within predicate> ::= <item list> WITHIN <distance> OF <item list> [IN_ORDER]

<proximity predicate> ::= PROXIMITY <distance> (<proximity item list> & <proximity item list> [{& <proximity item list>}...])

<proximity item list> ::= (<item list>) | <weighted term>

<item list> ::= <weighted item> [{, <weighted item>}...]

<weighted item> ::= <contains item> [WEIGHT <unsigned integer>]

<contains item> ::= <pattern> | <thesaurus predicate function>

<distance> ::= <unsigned integer> CHARACTERS

Keywords and Options

<column name>

An identifier that specifies the name of a column in a table. The column must have been defined with NORMAL or LITERAL index mode.

<zone name>

An identifier that specifies the name of a zone in the schema of a table. A zone is a separately searchable portion of a column. The zone must have been defined with NORMAL or LITERAL index mode.

NOT

An optional keyword that negates the predicate. If the predicate is 'FALSE' for a particular row, the negated predicate is 'TRUE', even if the row contains no data (NULL value) in the specified column or zone.

CONTAINS

A keyword that links the column name or zone name to the search condition. This keyword implies that the column or zone data must contain an instance of the specified word or combination of words.

<contains condition>

Specifies the word or combination of words to be matched in the data. The contains condition provides for one or more alternative conditions (contains terms) to be matched, such that the entire predicate is 'TRUE' if at least one of the alternatives is matched.

<contains or operator>

The vertical bar (|) serves to separate the alternative conditions in a contains condition. By default, the interpretation of the or operator (|) is OR, but this can be modified by setting the retrieval model to fuzzy Boolean or vector space.

<contains term>

Provides for one or more factors to be matched. By default, the interpretation of the & (ampersand) operator is AND, but this can be modified by setting the retrieval model to fuzzy Boolean or vector space.

<contains factor>

Provides for the negation of a contains primary.

~ (tilde)

Negates the matches of the condition in the contains primary. If the condition evaluates to 'FALSE' for a particular row, the negation is 'TRUE', even if the row contains no data (NULL value) in the specified column or zone. By default, the interpretation of ~ is NOT, but this can be modified by setting the retrieval model to fuzzy Boolean or vector space.

<contains primary>

Provides for the inclusion of a simple condition or a complex contains condition within a more complex condition.

<subcontains predicate>

Allows within predicates, proximity predicates, and search term lists to be part of a contains condition.

<within predicate>

Specifies a proximity relationship between two search terms or search term lists. This predicate evaluates to 'TRUE' (yields a match) if the two items are within the specified distance.

<proximity predicate>

Specifies a proximity relationship between multiple search term lists. This predicate evaluates to 'TRUE' if all of the items are within the specified distances.

<item list>

Provides a list of alternatives. The item list is matched if at least one of the alternatives is matched in the required context.

WITHIN

Introduces the distance parameter of the within predicate.

<distance>

Specifies the maximum number of indexed printable characters (excluding stop words) that can separate the items in a within predicate. The proximity expression is matched if two items are matched by the word lists within the specified distance.

OF

Introduces the second word list in a within predicate.

IN_ORDER

Used in a within predicate, it specifies that a match of the first word list must precede a match of the second word list, as well as satisfying the distance relationship. If these conditions are met, the within predicate yields a match. If this option is omitted, the order of matches in the data is ignored.

<weighted item>

This construct provides for a term weight that is assigned to the search item (<item>).

<contains item>

Defines a basic search item that can be a pattern or a thesaurus expansion.

WEIGHT <unsigned integer>

Indicates that the item is more important than other items in the word list, or in the entire condition (depending on the relevance method used).

The number indicates the importance of the item, relative to the other search items. If this clause is omitted, the default weight is 1.

<pattern>

A pattern specifies the text to be matched by a contains predicate. The match can occur anywhere in the zone or column specified in the predicate, subject to the context implied by the construct in which the pattern appears (for example, within predicate). Multiple matches are permitted and the location of each match is recorded and can be indicated by match codes in the returned data.

<thesaurus predicate function>

Specifies that a word or phrase be expanded using a thesaurus. It can be used anywhere a pattern is used.

CHARACTERS

Specifies that the distance parameter in a within predicate is measured in units of indexed printable characters.

Description

The contains predicate is 'TRUE' for all rows in which the column or zone data includes one or more instances of the specified contains condition. If the NOT keyword is specified, it is 'TRUE' for all rows that don't contain any instances of the contains condition, including rows that have no data (NULL value) in the specified column or zone.

When a domain is used to associate a list of zones with a column, a search on that column is effectively a search in all of the constituent zones. That is, a match in any of the zones constitutes a match in the column. Only columns or zones defined with NORMAL or LITERAL index mode can be specified.

The operation of the &, |, and ~ Boolean operators conforms to the rules of operation for the specified retrieval model (strict Boolean, vector space, or fuzzy Boolean). The current retrieval model can be determined from the RELEVANCE_METHOD server attribute in the SERVER_INFO system table.

When the vector space or fuzzy Boolean retrieval model is used, the & Boolean operator behaves more like the | Boolean operator, in that only one of its operands need be 'TRUE' to give a true result. These models affect the value of the RELEVANCE function to report how closely the specified conditions were met. In particular, the fuzzy Boolean model distinguishes the & Boolean operator from the | Boolean operator by computing different relevance values.

The ~ Boolean operator negates the basic operation, whether it involves the vector space, fuzzy Boolean, or strict Boolean retrieval model. In this case, the resulting rows are selected because they didn't satisfy the criteria of the basic operation. When the vector space or fuzzy Boolean retrieval model is used, the ~ operator will also select rows which do meet the criteria of the basic operation. These rows are given a lower relevance value depending on the degree to which the basic criteria are satisfied.
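The grammar in the Syntax section implies an operator precedence: | binds most loosely, & binds more tightly, and ~ applies to a single contains primary. The Python sketch below illustrates that grouping with a small recursive-descent parser. It is a simplified teaching model, not SearchServer's parser: the names tokenize and parse are hypothetical, every quoted pattern is treated as a contains primary, and the WITHIN, PROXIMITY, item list, and WEIGHT constructs are deliberately left out.

```python
import re

# A token is either a quoted pattern or one of the operator characters.
TOKEN = re.compile(r"\s*(?:'([^']*)'|([|&~()]))")


def tokenize(text):
    """Split a contains condition into operator strings and pattern tuples."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        if m.group(2):
            tokens.append(m.group(2))
        else:
            tokens.append(("pattern", m.group(1)))
        pos = m.end()
    return tokens


def parse_condition(tokens):
    # <contains condition> ::= [<contains condition> |] <contains term>
    node = parse_term(tokens)
    while tokens and tokens[0] == "|":
        tokens.pop(0)
        node = ("or", node, parse_term(tokens))
    return node


def parse_term(tokens):
    # <contains term> ::= [<contains term> &] <contains factor>
    node = parse_factor(tokens)
    while tokens and tokens[0] == "&":
        tokens.pop(0)
        node = ("and", node, parse_factor(tokens))
    return node


def parse_factor(tokens):
    # <contains factor> ::= [~] <contains primary>
    if tokens and tokens[0] == "~":
        tokens.pop(0)
        return ("not", parse_primary(tokens))
    return parse_primary(tokens)


def parse_primary(tokens):
    # Simplified <contains primary>: a quoted pattern or ( <contains condition> )
    if not tokens:
        raise ValueError("unexpected end of condition")
    tok = tokens.pop(0)
    if tok == "(":
        node = parse_condition(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("missing closing parenthesis")
        return node
    if isinstance(tok, tuple):
        return tok[1]
    raise ValueError(f"unexpected token {tok!r}")


def parse(text):
    """Parse a whole condition and return a nested tuple showing the grouping."""
    tokens = tokenize(text)
    node = parse_condition(tokens)
    if tokens:
        raise ValueError(f"unexpected token {tokens[0]!r}")
    return node
```

Under these assumptions, `parse("'A' | 'B' & ~'C'")` groups as `('or', 'A', ('and', 'B', ('not', 'C')))`, while parentheses override the default grouping, as in `parse("('A' | 'B') & 'C'")`. How each node is then evaluated depends on the retrieval model in effect.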

Examples

The following is a simple example of a SELECT statement that includes a contains predicate:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG FROM SUPPORT WHERE TEXT_LOG CONTAINS 'FILTER'

The next example uses the two string wildcards in an expanded search:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG FROM SUPPORT WHERE TEXT_LOG CONTAINS 'DOCUMENT% FILTER_'

This example includes the within predicate in the contains predicate:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG FROM SUPPORT WHERE TEXT_LOG CONTAINS 'DOCUMENT', 'FILTER' WITHIN 10 CHARACTERS OF 'PROCESSING'

This example uses the previous example and includes the IN_ORDER option:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG FROM SUPPORT WHERE TEXT_LOG CONTAINS 'DOCUMENT', 'FILTER' WITHIN 10 CHARACTERS OF 'PROCESSING' IN_ORDER

This example includes the proximity predicate in the contains predicate:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG FROM SUPPORT WHERE TEXT_LOG CONTAINS PROXIMITY 10 CHARACTERS ('DOCUMENT' & 'PROCESSING')

This example includes the WEIGHT option:

SELECT RELEVANCE ('2:1'), COMPANY, PRIORITY, STATUS, TEXT_LOG FROM SUPPORT WHERE TEXT_LOG CONTAINS 'DOCUMENT' WEIGHT 10, 'FILTER' WEIGHT 2

The final example uses the THESAURUS function:

SELECT RELEVANCE ('2:1'), COMPANY, PRIORITY, STATUS, TEXT_LOG FROM SUPPORT WHERE TEXT_LOG CONTAINS THESAURUS ('DISC', WORD_SYNONYM)

For More Information

• about creating patterns, see "Patterns" in Chapter 3, "SearchSQL Language Elements."

• about thesaurus expansion or searching, see Chapter 2, "The Search Process."

• about the SELECT statement, see the "SELECT Statement" later in this chapter.

• about the THESAURUS function, see the "THESAURUS Function," later in this chapter.

• about relevance, relevance methods, and retrieval models, see Chapter 2, "The Search Process."

COUNT Function

Returns a column in the working table that contains the number of rows that satisfy the search criteria.

Syntax

COUNT(*)

Description

This function generates a column in the working table that contains the number of rows that satisfy the query. When used, it must be the only component of the select list. This function returns a working table that contains only one row and one column. Therefore, the ORDER BY clause and MAX_SEARCH_ROWS server attribute have no effect on a search that uses this function. You can't use this function in a search that also uses a FOR UPDATE option. This function is useful to determine the total number of rows in a table, or the total number of rows a specified search will produce.

The data type of the value returned by this function is INTEGER.

Examples

The following example is used to determine the total number of rows in a table:

SELECT COUNT(*) FROM SUPPORT ORDER BY BASETABLE

The next example is used to determine the total number of rows in the working table that satisfy the search criteria:

SELECT COUNT(*) FROM SUPPORT UNION ARCHIVE WHERE (CREATOR CONTAINS 'PETER' OR STATUS CONTAINS 'CLOSED') AND PRIME_CONTACT CONTAINS 'MONTAG'

For More Information

• about how to use the COUNT function, see Chapter 2, "The Search Process."

CREATE DOMAIN Clause

Creates a domain in a schema.

Syntax

CREATE DOMAIN <domain name> {(<zone list>)|<index mode>} [AS] <data type>

<zone list> ::= <zone name>[{, <zone name>}...]

Keywords and Options

<domain name>

An identifier that specifies the name of the new domain. Domain names used in a table must be unique and must differ from any SearchServer data type.

When a domain is used to associate a list of zones with a column, a search on that column is effectively a search in all of the constituent zones. That is, a match in any of the zones constitutes a match in the column. A domain with a list of zone names is required when the associated column data is segmented into multiple zones.

By default, the zone list always includes a zone where the zone number is the same as the field number of the associated column. If this default zone is to be searched separately from other zones in the same column, it must be named in a CREATE ZONE clause.

<zone name>

An identifier that specifies the name of a zone defined in the schema. A zone is a separately searchable portion of a column.

<index mode>

Determines the type of indexing for a column defined with the specified domain. The following list contains all valid index modes and describes the type of index that is created:

NORMAL The index created for the column contains one entry for each word in the column. The definition of a word is provided in "Patterns" in Chapter 3, "SearchSQL Language Elements."

VALUE The index created for the column contains one entry for each numeric value in the column.

LITERAL The index created for the column contains one entry for each literal term, delimited by tabs, newlines, or other control characters.

NONE No index is created. A column defined with this index mode can't be referenced in the WHERE clause and is labeled as not searchable.

The index mode should be specified only when the associated column data isn't segmented into zones, and provides a way of overriding the default index mode. If this option is omitted, the index mode for the column is the default index mode specified by the data type.

AS

Optional keyword that can be used to improve readability in the syntax.

<data type>

Specifies the data type of the domain. The data type can be one of the following with its associated default index mode:

Data Type               Default Index Mode
character string type   NORMAL
exact numeric type      VALUE
date type               VALUE

Description

A domain is a user-defined data type that is built on a pre-defined data type. It can be used either to associate zones with a column, or to override the default index mode for a pre-defined data type. It can also be used to override the default data length.

Unlike zones and columns, domains can't be referenced anywhere outside a CREATE SCHEMA statement. A domain must be associated with only one column of the table if that domain is being used to group zones.

CREATE DOMAIN is a clause used within the CREATE SCHEMA statement and therefore can't be processed on its own.

Examples

The following example creates a domain named LOG_DMN that contains four zones and assigns the data type APVARCHAR to the zones:

CREATE DOMAIN LOG_DMN (DATE_AND_TIME, OPERATOR, CONTACT, DESCRIPTION) AS APVARCHAR

The next example alters the default index mode from NORMAL to LITERAL for the column that is assigned to the specified domain:

CREATE DOMAIN VERSION_DMN LITERAL AS VARCHAR(10)
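The way a domain's index mode interacts with the default index mode of its data type can be summarized in a short Python sketch. This is an illustrative model only; the dictionary keys and the function name are hypothetical labels for the data type classes listed under <data type>.

```python
# Default index mode for each class of data type, as listed under <data type>.
DEFAULT_INDEX_MODE = {
    "character string": "NORMAL",
    "exact numeric": "VALUE",
    "date": "VALUE",
}


def effective_index_mode(data_type_class, domain_index_mode=None):
    """Return the index mode used for a column defined with a domain.

    An index mode given in CREATE DOMAIN overrides the default;
    if it is omitted, the default for the data type applies.
    """
    if domain_index_mode is not None:
        return domain_index_mode
    return DEFAULT_INDEX_MODE[data_type_class]
```

In terms of the VERSION_DMN example, `effective_index_mode("character string", "LITERAL")` yields LITERAL, whereas omitting the index mode would leave the column with the NORMAL default.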

For More Information

• about the rules governing identifiers, see "Identifiers" in Chapter 3, "SearchSQL Language Elements."

• about creating domains, see Fulcrum SearchServer Data Preparation and Administration.

• about using index modes, see Fulcrum SearchServer Data Preparation and Administration.

• about creating zones, see Fulcrum SearchServer Data Preparation and Administration.

• about defining columns, see the "CREATE TABLE Clause/Statement" later in this chapter.

CREATE SCHEMA Statement

Creates a new table schema including its constituent zones and domains.

Syntax

CREATE SCHEMA [REPLACE] <schema name> [{<CREATE ZONE clause> | <CREATE DOMAIN clause>}...] <CREATE TABLE clause>

Keywords and Options

REPLACE

An option that allows you to overwrite an existing schema for a table. If this option isn't specified, an error is returned if the table name already exists.

When you overwrite an existing schema, the data values in the table aren't affected. However, SearchServer might interpret them differently if you change the column attributes.

Using this option can invalidate any working tables that are constructed from the table specified in this statement. Subsequent attempts to retrieve data from these working tables are allowed, but not recommended.

<schema name>

An identifier that specifies the name of the new schema. Because the schema name is not referenced anywhere else, you can use the same name for the schema and table being created.

<CREATE ZONE clause>

Defines zones in the schema. For a complete description of this clause, see the "CREATE ZONE Clause," later in this chapter.

<CREATE DOMAIN clause>

Defines domains in the schema. For a complete description of this clause, see the "CREATE DOMAIN Clause," earlier in this chapter.

<CREATE TABLE clause>

Creates a new table associated with the schema. For a complete description of this clause, see the "CREATE TABLE Clause/Statement," later in this chapter.

Description

Each CREATE SCHEMA statement must define one table. You can use this statement to create a new schema or overwrite an existing one. To overwrite an existing schema, use the REPLACE option. However, using the REPLACE option can invalidate the index for the table. To ensure data integrity, always perform a VALIDATE INDEX statement, specifying the ABANDON parameter, on the table.

You must ensure that no other application processes are referencing the table when you execute a CREATE SCHEMA REPLACE statement on it. Otherwise, unpredictable results can occur.

When using the REPLACE option, if a field number matches a field number of a column that has been previously dropped with an ALTER TABLE statement, the original column is restored with the new attributes specified in the column definition of the CREATE TABLE clause. Unless the data has been deleted using the UPDATE statement, the data from the original dropped column is now accessible through the restored column.

However, if the column definition specifies the name of a column that has been previously dropped but the field number in the column definition is not explicitly specified, the original column is not restored. Instead, the new column is created and the value in all rows in the table is NULL.

It is recommended to place the CREATE TABLE clause at the end of the statement. However, the CREATE TABLE clause can be placed before, after, or between any CREATE ZONE or CREATE DOMAIN clause.

Example

The following example shows how to create a simple schema with no zones or domains associated with a table containing two columns:

CREATE SCHEMA SIMPLE CREATE TABLE SIMPLE (DOC_DATE DATE, DOC_TEXT APVARCHAR)

For a more complex example of how to create a schema, see Fulcrum SearchServer Data Preparation and Administration.

For More Information

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about creating a schema, see Fulcrum SearchServer Data Preparation and Administration.

CREATE TABLE Clause/Statement

Creates a new table.

CREATE TABLE Clause Syntax

CREATE TABLE <table name> (<column definition>[{, <column definition>}...]) [<table parameter>...]

<column definition> ::= <column identifier> {<data type> | <domain name>} [<field number>]

<table parameter> ::= BASEPATH <base path> | IMMEDIATE | INDEXDIR <index directory> | NOLOCKING | NORMALIZATION <normalization parameter> | PERIODIC | ROWLOCKING | STOPFILE <stop filename> | WILDCARD_OPT <wildcard optimization method> | WORKDIR <work directory>

<base path> ::= <character string literal>

<index directory> ::= <character string literal>

<stop filename> ::= <character string literal>

<work directory> ::= <character string literal>

<field number> ::= <unsigned integer>

<normalization parameter> ::= { 'DEFAULT' | 'EUROPA3' | 'ARABIC' | 'NONE' | 'ASIAN' }

<wildcard optimization method> ::= { 'MINIMIZE_SEARCH_TIME' | 'MINIMIZE_INDEX_OVERHEAD' | 'NONE' }

CREATE TABLE Statement Syntax

CREATE TABLE <table name> (<column definition>[{, <column definition>}...])

<column definition> ::= <column identifier> <data type> [<constraint>]

<constraint> ::= NOT NULL

Keywords and Options

<table name>

A compound identifier that specifies the name and location of the table being created. For a complete description of how a table name is formed, see "Compound Identifiers" in Chapter 3, "SearchSQL Language Elements."

The following table identifiers can't be used in a CREATE TABLE clause or statement:

• COLUMNS

• SERVER_INFO

• SEARCH_TERMS

• TABLES

• ZONES

<column identifier>

An identifier that specifies the name of a column in a table. Each column name must be unique within the table and different from any zone name in the same schema.

If the name of a reserved column is specified, SearchServer automatically creates it with the default attributes (field number, data type, index mode) of the named reserved column. If you try to override the default attributes, only the default values are used.

For example, if you specify the FT_SFNAME reserved column with an INTEGER data type:

CREATE SCHEMA JESS CREATE TABLE JESS (FT_SFNAME INT 400)

SearchServer actually creates the table that contains the FT_SFNAME reserved column with a field number of 3, a data type of VARCHAR(260), and an index mode of NONE.

Note: All column names beginning with FT_ are reserved.

<data type>

Specifies the data type of the column. The data type of the column is stated explicitly or is inherited from the domain. The data type can be one of the following with its associated default index mode:

Data Type                Default Index Mode
character string type    NORMAL
exact numeric type       VALUE
date type                VALUE

<domain name>

An identifier that specifies the name of a previously created domain in the same schema. A domain is a user-defined data type that can be used to associate zones with a column, or to override the default index mode for a data type.

When a domain is used to associate a list of zones with a column, a search on that column is effectively a search on all of the constituent zones. That is, a match in any of the zones constitutes a match in the column.

<field number>

Defines the field number of a column in a table. A field number is an unsigned integer that provides a link between a column in one table and a column in another table, or to a reserved column.

When you use the CREATE TABLE statement, SearchServer automatically assigns the field number to a column. However, when you use CREATE TABLE as a clause within the CREATE SCHEMA statement, you can let SearchServer assign the field number automatically, or you can specify it explicitly.

However, if you want to use the UNION clause in a SELECT statement, the field numbers of identically named columns in each specified table must be the same. If they're not, your search will return unexpected results. Therefore, if you use the UNION clause, you should use the CREATE SCHEMA statement and incorporate field numbers in your initial table design.

The range for field numbers is any number from 128 through 64010 that has not already been assigned to another user-defined column in the same table. In addition, the field number must not be the same as any zone number assigned to a zone associated with a different column in the same table.
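The range and uniqueness rules above can be expressed as a small check. The following Python sketch is illustrative only: the function name and its arguments are hypothetical and are not part of SearchServer. It covers user-defined columns only and does not model the renaming of reserved columns.

```python
# Hypothetical helper, not a SearchServer API: checks an explicit field
# number for a user-defined column against the rules stated above.
FIRST_USER_FIELD = 128
LAST_USER_FIELD = 64010


def is_valid_field_number(number, column, used_fields, zone_owner):
    """Return True if `number` may be assigned to `column`.

    used_fields: set of field numbers already assigned to other
                 user-defined columns in the same table.
    zone_owner:  dict mapping each zone number to the name of the
                 column its zone is associated with.
    """
    if not FIRST_USER_FIELD <= number <= LAST_USER_FIELD:
        return False  # outside the documented range
    if number in used_fields:
        return False  # already taken by another user-defined column
    owner = zone_owner.get(number)
    # A clash with a zone number is only a problem when the zone belongs
    # to a different column.
    return owner is None or owner == column
```

For instance, with zone number 212 associated with a column named NOTES, field number 212 would be rejected for any other column but accepted for NOTES itself.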

The field numbers can also be used to rename a reserved column. This is done by specifying a field number that matches the field number of a reserved column. In this case, the data type and field number must match those defined for the corresponding reserved column. The only attribute that can be overridden is the column name. Once a reserved column has been redefined, its original name is unavailable.

If this option is omitted, SearchServer automatically assigns a field number to each column. If the data type of the column is APVARCHAR, SearchServer assumes that you are renaming the FT_TEXT reserved column and field number 32 is assigned.

If the data type is DATE, SearchServer assigns the first field number in the following list that has not already been assigned to a column in the schema:

• 27

• 28

• 29

• 30

• the first available field number starting at 128

DATE field numbers 27 to 30 are defined in the schema as reserved columns. If SearchServer assigns one of those field numbers to a column, you are implicitly renaming a reserved column. For all other data types, SearchServer chooses the first available field number starting at 128.
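The automatic assignment rules described above can be summarized in a short sketch. The Python below is an illustration under stated assumptions, not SearchServer code: the function name is hypothetical, and `used` stands for the set of field numbers already assigned in the schema.

```python
# Hypothetical sketch of the automatic field number rules described above.
RESERVED_DATE_FIELDS = (27, 28, 29, 30)  # reserved DATE columns
FT_TEXT_FIELD = 32                       # field number of FT_TEXT
FIRST_USER_FIELD = 128
LAST_USER_FIELD = 64010


def assign_field_number(data_type, used):
    """Pick a field number for a column whose field number was omitted."""
    kind = data_type.upper()
    if kind == "APVARCHAR":
        # Treated as a rename of the FT_TEXT reserved column.
        return FT_TEXT_FIELD
    if kind == "DATE":
        # Try the reserved DATE field numbers first, in order.
        for number in RESERVED_DATE_FIELDS:
            if number not in used:
                return number
    # All other types, and DATE once 27 to 30 are taken.
    for number in range(FIRST_USER_FIELD, LAST_USER_FIELD + 1):
        if number not in used:
            return number
    raise ValueError("no free field number")
```

Under these assumptions, the first DATE column in an empty schema receives 27, while a fifth DATE column falls through to the first free number at or above 128.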

NOT NULL

Although the CREATE TABLE statement permits the specification of the NOT NULL column constraint on any or all columns in the table, SearchServer ignores it. This is provided to increase interoperability with applications that automatically specify this constraint. However, it can't be specified in a column definition within the CREATE TABLE clause of a CREATE SCHEMA statement.

Table Parameters

None of the following table parameters can be specified:

• when the REPLACE option of the CREATE SCHEMA statement is used

• when the REPLACE option isn't specified but the table already exists with only the default schema

• in a CREATE TABLE statement

In the case of a CREATE TABLE statement, all parameters assume their default values. All table parameters are server attributes in the SERVER_INFO system table. You should check the SERVER_INFO table for the current settings before you create the table.

BASEPATH <base path>

Specifies the base directory location of the document files and directories. When SearchServer needs to know where a file or directory is located, it takes the filename and prepends the path specified by this parameter (unless the file or directory name is fully qualified).

When you use this parameter, you don't have to include the full path when specifying a file if the file is located in the base path location or anywhere relative to that location.

Note: The base directory doesn't have to exist when the table is first created, but must exist when a row is inserted with a value supplied for the FT_SFNAME reserved column.
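The way a base path combines with a filename can be pictured with a small sketch. The Python below is an assumption-laden illustration, not SearchServer's implementation: the function name is hypothetical, and POSIX-style paths are used so that "fully qualified" simply means an absolute path.

```python
import posixpath


def resolve_document_path(base_path, filename):
    """Prepend base_path unless filename is already fully qualified.

    Hypothetical illustration of the BASEPATH rule; POSIX path
    conventions are assumed.
    """
    if posixpath.isabs(filename):
        return filename  # fully qualified names are used as given
    return posixpath.join(base_path, filename)
```

With a base path of 'supdocs', a value of 'faq/install.txt' in FT_SFNAME would resolve to 'supdocs/faq/install.txt', while '/net/fish/92011101' would be left unchanged.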

IMMEDIATE

Each table has a periodic index that is updated only when a VALIDATE INDEX statement is executed. The IMMEDIATE parameter qualifies the table as an immediate table. When an immediate table is created, a differential index is created in addition to the standard periodic index. The differential index is updated each time a change is made to the table, so that it accumulates all changes made to the table since the last VALIDATE INDEX statement was performed.

You can't specify both the IMMEDIATE and PERIODIC parameters in the same CREATE TABLE clause. If neither parameter is specified, the effect depends on the current setting of the IMMEDIATE server attribute in the SERVER_INFO system table. If this server attribute is 'TRUE', the effect is as if the IMMEDIATE parameter had been specified in the CREATE TABLE clause. If the attribute is 'FALSE', the effect is as if the PERIODIC parameter had been specified. The IMMEDIATE server attribute defaults to 'TRUE' but can be changed by executing a SET IMMEDIATE statement.

The periodic index is updated only when a VALIDATE INDEX statement is executed. However, specifying this parameter allows a SELECT statement to match any row that has been inserted or updated, regardless of whether or when a VALIDATE INDEX statement has been performed.

INDEXDIR <index directory>

Specifies a directory that is used to contain the table's data and index files. If this parameter is omitted, these files are placed in the default location. If this option is specified, this directory must exist at the time the CREATE SCHEMA statement is executed.

If this parameter is omitted, or if it names a directory that doesn't exist, SearchServer uses the default location for temporary files, which is recorded in the FULTEMP server attribute.

NOLOCKING

This inhibits the use of row locking when retrieving, inserting, updating, or deleting data in the table. This ensures that access to a row is never denied because another application is using or modifying it, but it does not guarantee the integrity of the data in tables in which updates are occurring in multi-user environments. Tables created with this option must be updated by only one application at a time; otherwise, there is a risk that data will be corrupted.

You can't specify both the NOLOCKING and ROWLOCKING parameters in the same CREATE TABLE clause. If neither of these is specified, and the PERIODIC parameter is either specified or implied, the effect depends on the current setting of the NOLOCKING server attribute in the SERVER_INFO system table: if this server attribute is 'TRUE', the effect is as if the NOLOCKING parameter had been specified in the CREATE TABLE clause; if it is 'FALSE', the effect is as if the ROWLOCKING parameter had been specified. The NOLOCKING server attribute defaults to 'FALSE', but can be changed by executing a SET NOLOCKING statement.

The NOLOCKING parameter is ignored when the IMMEDIATE parameter is specified or implied. Tables created with the IMMEDIATE option always use row locking.
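The interaction between the IMMEDIATE, PERIODIC, NOLOCKING, and ROWLOCKING parameters and their server attribute defaults can be followed more easily as a decision procedure. The Python sketch below is a reading of the rules stated in this section, not SearchServer code; the function name is hypothetical, and `server_info` is a plain dictionary standing in for the relevant rows of the SERVER_INFO system table.

```python
def resolve_table_modes(params, server_info):
    """Return (index mode, locking mode) for a CREATE TABLE clause.

    params:      iterable of table parameter keywords given in the clause.
    server_info: dict of server attributes, e.g. {"IMMEDIATE": "TRUE"}.
    Hypothetical sketch of the documented defaults.
    """
    given = {p.upper() for p in params}
    if {"IMMEDIATE", "PERIODIC"} <= given:
        raise ValueError("IMMEDIATE and PERIODIC are mutually exclusive")
    if {"NOLOCKING", "ROWLOCKING"} <= given:
        raise ValueError("NOLOCKING and ROWLOCKING are mutually exclusive")

    if "IMMEDIATE" in given:
        immediate = True
    elif "PERIODIC" in given:
        immediate = False
    else:
        # Neither given: the IMMEDIATE server attribute decides
        # (documented default 'TRUE').
        immediate = server_info.get("IMMEDIATE", "TRUE") == "TRUE"

    if immediate:
        # NOLOCKING is ignored; immediate tables always use row locking.
        return ("IMMEDIATE", "ROWLOCKING")
    if "NOLOCKING" in given:
        return ("PERIODIC", "NOLOCKING")
    if "ROWLOCKING" in given:
        return ("PERIODIC", "ROWLOCKING")
    # Neither given: the NOLOCKING server attribute decides
    # (documented default 'FALSE').
    if server_info.get("NOLOCKING", "FALSE") == "TRUE":
        return ("PERIODIC", "NOLOCKING")
    return ("PERIODIC", "ROWLOCKING")
```

With no parameters and default server attributes, this yields an immediate table with row locking, which matches the defaults described above.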

NORMALIZATION <normalization parameter>

Selects the case normalization strategy to use when indexing the data in the table. Normalization is always performed on the data after it has been translated into FTICS.

The following table describes the normalization parameters available:

DEFAULT  Lowercase letters are mapped onto uppercase letters. Accented characters are mapped onto the unaccented uppercase letter. Characters 0xF1, 0xF2, 0xF4, 0xF6 through 0xFA, and 0xFC through 0xFE are mapped onto the corresponding character in column E. This case normalization strategy is based on the FTCS character set table.

EUROPA3  Case normalization mapping is done according to the translation table provided for the Europa3 character set. This case normalization strategy is based on the EFTCS94 character set table.

ARABIC  Case normalization mapping is done according to the translation table provided for the ARABIC character set. This case normalization strategy is based on the AFTCS94 character set table.

If this parameter is omitted, DEFAULT normalization is assumed. You can also set the case normalization to ASIAN or NONE if you have the appropriate internal character set.

PERIODIC

Each table has a periodic index that is updated only when a VALIDATE INDEX statement is executed. A SELECT statement on a periodic table will fail until at least one row has been inserted into the table and a VALIDATE INDEX statement has been executed. In this case, any modifications that you make to a table aren't reflected in a search until a VALIDATE INDEX statement has been executed successfully.

You can't specify both the IMMEDIATE and PERIODIC parameters in the same CREATE TABLE clause. If neither parameter is specified, the effect depends on the current setting of the IMMEDIATE server attribute in the SERVER_INFO system table: if this server attribute is 'TRUE', the effect is as if the IMMEDIATE parameter had been specified in the CREATE TABLE clause; if it is 'FALSE', the effect is as if the PERIODIC parameter had been specified. The IMMEDIATE server attribute defaults to 'TRUE' but can be changed by executing a SET IMMEDIATE statement.

ROWLOCKING

Specifies that standard row locking procedures are to be applied when accessing or modifying the data in the table. When locked, a row is not accessible to any application other than the one that imposed the lock. This can result in programs encountering locking conflicts. A row can be locked by an explicit API call or implicitly by retrieving data from a SELECT statement that specified the FOR UPDATE option.

These locks persist until the application removes them. Transient (short duration) locks are applied within SearchServer when fetching a row or rowset and when modifying a row using searched UPDATE or DELETE. These prevent other applications from changing the data at the critical moment, and last only for a short time. In most environments, the transient locks used during retrieval do not prevent other applications from also reading the row.

You can't specify both the NOLOCKING and ROWLOCKING parameters in the same CREATE TABLE clause. If neither parameter is specified, and the PERIODIC parameter is either specified or implied, the effect depends on the current setting of the NOLOCKING server attribute in the SERVER_INFO system table. If this server attribute is 'TRUE', the effect is as if the NOLOCKING parameter had been specified in the CREATE TABLE clause. If it is 'FALSE', the effect is as if the ROWLOCKING parameter had been specified. The NOLOCKING server attribute defaults to 'FALSE' but can be changed by executing a SET NOLOCKING statement.

STOPFILE <stop filename>

Specifies an operating system file that contains a list of words not to be indexed (stop words). Stop words can't be found by a SELECT statement if they are in a column defined with NORMAL index mode.

The stop file is assumed to be in the directory where the table configuration is created unless the stop filename is a fully qualified pathname. If the STOPFILE parameter is omitted, a stop file is not used. However, SearchServer provides a stop file called FULTEXT.STP that can be used by explicitly specifying it in this parameter.

It is not necessary for the stop file to exist when creating a periodic table. However, it must exist when you execute a VALIDATE INDEX statement on a table, or when creating an IMMEDIATE table. The current setting can be determined from the STOPFILE server attribute in the SERVER_INFO system table. The default is to use no stop file.

To create a table with no stop file, specify the stop filename as an empty string. For example:

STOPFILE ''

WILDCARD_OPT <wildcard optimization method>

Specifies the type of wildcard optimization to be enabled for the table. There are three wildcard optimization methods:

MINIMIZE_INDEX_OVERHEAD  This method minimizes indexing time and space. Performance for some prefix and infix wildcard searches is reduced as compared to the MINIMIZE_SEARCH_TIME method. The MINIMIZE_INDEX_OVERHEAD option gives nearly as good search performance as MINIMIZE_SEARCH_TIME (except for complex searches on CD-ROM) with little or no more indexing time and storage overhead than NONE.

MINIMIZE_SEARCH_TIME  This method maximizes search performance. Indexing time is increased and the space required for the index is doubled. If space permits, this method is preferred for tables located on slower mass-storage devices, such as CD-ROMs.

NONE  No wildcard optimization is enabled for the table. Performance for prefix and infix wildcard searches is substantially reduced. If this parameter is omitted, NONE is assumed unless a SET WILDCARD_OPT statement is used.

WORKDIR <work directory>

Specifies a work directory to be used for temporary files that may be required to accommodate buffer overflow. You must ensure that the specified directory is large enough to accommodate the indexing to be performed. During the execution of a VALIDATE INDEX statement, the index files are built in the work directory. Once the VALIDATE INDEX statement has completed executing, the index files are moved into the index directory.

If this parameter is omitted, or if it names a directory that doesn't exist, SearchServer uses the default location for temporary files, which is recorded in the FULTEMP server attribute.

Description

CREATE TABLE can be a clause of the CREATE SCHEMA statement or a statement on its own. The empty table that is created contains no rows. It consists of the columns defined here, and the set of reserved columns found in all tables. To insert rows into the table, use the INSERT statement.

A CREATE TABLE statement creates a simple table. Simple table definitions don't include field numbers, table parameters, or references to domains. The simple table is always created as an immediate index table. When you use the CREATE TABLE statement, SearchServer automatically assigns the field number to a column.

Examples

The following example defines a simple table with one column. This example can be executed as either a CREATE TABLE statement or as a clause within a CREATE SCHEMA statement.

CREATE TABLE SIMPLE (COMPANY VARCHAR(30))

The next example associates the VERSION_DMN domain with the PRODUCT_VERSION column. This example must be a CREATE TABLE clause that is part of a CREATE SCHEMA statement. Otherwise, SQLSTATE 37000 is generated.

CREATE SCHEMA SUPPORT CREATE TABLE SUPPORT (PRODUCT_VERSION VERSION_DMN, PROBLEM_NUMBER CHAR(8)) STOPFILE 'support.stp' BASEPATH 'supdocs'

For More Information

• about creating tables, see Fulcrum SearchServer Data Preparation and Administration.

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about data types, see Chapter 3, "SearchSQL Language Elements."

• about reserved columns or for a list of all reserved columns, see Fulcrum SearchServer Data Preparation and Administration.

• about system tables, see Chapter 5, "System Information Tables."

• about field numbers, see Fulcrum SearchServer Data Preparation and Administration.

• about the results of searches containing stop words, see Chapter 2, "The Search Process."

• about default locations of SearchServer files, see Fulcrum SearchServer Getting Started for your platform.

• about locking and unlocking tables, see the "PROTECT TABLE Statement" and the "UNPROTECT TABLE Statement," later in this chapter.

CREATE TEXT_VECTOR Statement

Creates a list of the most frequently occurring words in external documents or portions of any document.

Syntax

CREATE TEXT_VECTOR <text vector name> {<character string literal> | FILE <filename> [FILTER <text reader specification>] | ROW <row list> TABLE <table name>} [VECTOR_TERMS <unsigned integer>]

<filename> ::= <character string literal>

<text reader specification> ::= <character string literal>

<row list> ::= <unsigned integer> [{, <unsigned integer>}...]

Keywords and Options

<text vector name>

An identifier that specifies the name of the new text vector.

<character string literal>

Specifies the text fragment to be processed. Use this form when the user is entering a readily available and reasonably small text fragment from which the text vector is to be created.

FILE <filename> [FILTER <text reader specification>]

The filename specifies the name of a file, not necessarily in a table. The text reader specification (by default, the "s" text reader) specifies a valid text reader list. It is the responsibility of the text reader named in the text reader list to supply the portions of the specified file to be processed.

Use this form when the text fragment is a complete document or a document fragment in a locally available file that is not part of a table. For a complete list of valid text readers, see Fulcrum SearchServer Data Preparation and Administration.

ROW <row list> TABLE <table name>

The row list refers to the rows to be processed when creating the text vector. Only the external text column (by default, called FT_TEXT) is considered. To obtain the correct rows, you must first execute a SELECT statement that retrieves the FT_CID reserved column. For example:

SELECT FT_CID FROM SUPPORT WHERE PRIORITY = 1

The result of the SELECT statement is a list of unsigned integers that refer to each row in the table:

1 4

You then specify the rows you want processed by including the FT_CID values for the rows in the row list:

CREATE TEXT_VECTOR VEC1 ROW 1, 4 TABLE SUPPORT

The table name is a compound identifier that specifies the name and location of the table where the rows exist. For a complete description of how a table name is formed, see "Compound Identifiers" in Chapter 3, "SearchSQL Language Elements."
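An application that has already fetched FT_CID values with a SELECT statement could assemble the ROW form of the statement as text. The Python sketch below only builds the statement string following the syntax shown above; the function name is hypothetical, and how the string is submitted to SearchServer is outside its scope.

```python
def text_vector_from_rows(vector_name, table_name, cids, vector_terms=None):
    """Build a CREATE TEXT_VECTOR ... ROW ... TABLE ... statement string.

    cids is the list of FT_CID values returned by a prior SELECT.
    Hypothetical helper; it performs no identifier validation.
    """
    if not cids:
        raise ValueError("at least one row is required")
    row_list = ", ".join(str(int(cid)) for cid in cids)
    statement = f"CREATE TEXT_VECTOR {vector_name} ROW {row_list} TABLE {table_name}"
    if vector_terms is not None:
        statement += f" VECTOR_TERMS {int(vector_terms)}"
    return statement
```

Called with the name VEC1, the table SUPPORT, and the FT_CID values 1 and 4, this produces the same statement text as the example above.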

VECTOR_TERMS <unsigned integer>

Specifies a value that controls the maximum number of terms that can be included in the text vector. The terms with the highest weight are the terms selected.

The default value is 100. The recommended range is 100 to 200 terms.

Note: This option works closely with the MAX_TERMS parameter in the is_about predicate to produce a meaningful relevance calculation.
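The effect of VECTOR_TERMS can be pictured as keeping only the highest-weighted terms of a word list. The Python sketch below is a simplified illustration, not SearchServer's algorithm: it assumes that a term's weight is its raw occurrence count, that words are runs of letters and digits, and that no stop words are removed. The function name is hypothetical.

```python
import re
from collections import Counter


def build_text_vector(text, vector_terms=100):
    """Return the top `vector_terms` words of `text` mapped to their weights.

    Assumption: weight = number of occurrences. The weighting actually
    used by SearchServer is not specified here.
    """
    words = re.findall(r"[A-Za-z0-9]+", text.upper())
    counts = Counter(words)
    # Highest weight first; ties are broken alphabetically so the result
    # is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:vector_terms])
```

A lower limit produces a shorter vector that keeps only the most characteristic terms, which is the trade-off the recommended range of 100 to 200 terms is meant to balance.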

Description

Each word in the list has an associated weight value that is based on the number of occurrences of that word in the text from which the words were extracted. If the word list could be displayed, an example might be:

WORD        WEIGHT
AARDVARK    4
ABACUS      10
ABALONE     7
BABOON      5
BABY        6
CAB         2
CABANA      4
CABBAGE     1

The word list can be used to select rows containing documents similar to the original text, by using the is_about predicate in the WHERE clause of the SELECT statement.

The word list can be thought of as a multi-dimensional vector. For example, if the word "object" appears five times in a document, the vector is five units long in that dimension. By comparing the vector space distance between the vectors of two documents, you can then see how similar one document is to another.

The distance is calculated in one of three ways depending on the retrieval model specified in the relevance method.

The three ways to create the text vector are by specifying:

• a character string literal

• a file and user-defined text reader list

• rows in a table

Examples

The following example creates a text vector using a character string literal:

CREATE TEXT_VECTOR VEC1 'Windows clients can talk to UNIX servers'

The next example creates a text vector from the external document text of two rows in the SUPPORT table using the ROW option:

CREATE TEXT_VECTOR VEC2 ROW 5,11 TABLE SUPPORT

The next example runs in a UNIX environment and creates a text vector using the FILE option with the added FILTER option:

CREATE TEXT_VECTOR VEC3 FILE '/net/fish/92011101' FILTER 't:s'

The final example creates a text vector and specifies that the top 50 terms (by weight) be kept in the text vector:

CREATE TEXT_VECTOR VEC2 ROW 5,11 TABLE SUPPORT VECTOR_TERMS 50
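The idea, described under Description, of comparing two documents through their word-weight vectors can be illustrated with a generic similarity measure. The Python sketch below uses cosine similarity, which is one common choice and is an assumption here: this guide does not state which calculation SearchServer applies for each retrieval model. The function name is hypothetical.

```python
import math


def cosine_similarity(vector_a, vector_b):
    """Cosine of the angle between two word -> weight dictionaries.

    Returns a value from 0.0 (no shared terms) to 1.0 (same direction)
    for non-negative weights. Illustrative only.
    """
    dot = sum(weight * vector_b.get(term, 0) for term, weight in vector_a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in vector_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in vector_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)
```

Two documents that use the same words in the same proportions score close to 1.0, and documents with no words in common score 0.0.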

For More Information

• about using a text vector in an is_about predicate, see Chapter 2, "The Search Process."

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about character string literals, see Chapter 3, "SearchSQL Language Elements."

• about text readers, see Fulcrum SearchServer Data Preparation and Administration.

• about setting VECTOR_TERMS and its effect, see Chapter 2, "The Search Process."

CREATE ZONE Clause

Creates a zone in a schema.

Syntax

CREATE ZONE <zone name> (<zone number>[{, <zone number>}...]) [<index mode>]

Keywords and Options

<zone name>

An identifier that specifies the name of the zone that can be used in a WHERE clause of a SELECT statement. A zone can't be specified in an INSERT or UPDATE statement. A zone is a portion of a column that can be searched separately, but that is inserted or updated along with the other data in a column. Each zone name must be unique within the schema and different from any column name.

<zone number>

Defines one of the zone numbers of a zone named in a schema.

The same zone number can appear in more than one CREATE ZONE clause for the same schema, provided that all the zones are associated with the same column. However, the zone number list can't be identical for two different zones. This means that the same zone number can't be associated with two columns in the same schema. In addition, a zone number can't be the same as the field number of a column, other than the column associated with the zone.

A zone number is an unsigned integer that identifies a portion of the data in a column. Zone numbers are associated with portions of text by control sequences embedded in the column data. In the case of the FT_TEXT column, the zone control sequences can be supplied by a text reader. Any text portion of a column that is not explicitly zoned is implicitly associated with a zone number that is equal to the field number of the column. These zone numbers are then used to match only text in the corresponding portion of the data when a zone name is used in a WHERE clause of a SELECT statement.

As with field numbers, if you want to use the UNION clause in a SELECT statement, the zone number list of identically named zones in each specified table must be the same. If this is not the case, or if the tables in the UNION clause have differently named zones with the same zone numbers, your search could return unexpected results.

<index mode>

Determines the type of indexing assumed for the text portion associated with the zone. The following list contains all valid index modes and describes the type of index that is created:

NORMAL  The index created for the column contains one entry for each word in the column. (See Fulcrum SearchServer Data Preparation and Administration for the definition of a word.) This is the default index mode.

VALUE  The index created for the column contains one entry for each numeric value in the column.

LITERAL  The index created for the column (or zone) contains one entry for each literal term, delimited by tabs, newlines, or other control characters.

NONE  No index is created. A column defined with this index mode can't be referenced in the WHERE clause and is labeled as not searchable.

If this option is omitted, the index mode for the zone is the default index mode specified by the data type of the associated column. Although the index mode of a zone can be different from that of the column containing the zone, the data must conform to the schema.

If the index mode of a zone differs from that of the column that contains the zone, the application (or text reader) must bracket the zone's data with a pair of control sequences that enable and disable the required mode. These sequences are in addition to the select zone sequences that delimit the text. SearchServer doesn't track zone selections and match them against defined elements in the schema. Schema attributes are only applied at the column level.

The following table summarizes the requirements. (\E[ denotes the two-character control sequence introducer, hexadecimal value 1b5b.)

Index Mode    Enabling Sequence    Disabling Sequence
VALUE         \E[8v                \E[7v
LITERAL       \E[12v               \E[11v
NONE          \E[1v                \E[2v

Note: NORMAL index mode is the default for all data. Therefore, there is no control sequence introducer for NORMAL index mode.

Description

A zone is a separately searchable portion of a column. In the data, you can have multiple zones, zones within a zone, or zones that overlap; however, the zones must stay within the confines of the column. Each zone must be defined with a unique set of zone numbers.

CREATE ZONE is a clause of the CREATE SCHEMA statement and therefore can't be processed on its own.

Examples

The following example creates the CONTACT zone in the SUPPORT table:

CREATE ZONE CONTACT (212) NORMAL

The next example creates a zone that is associated with two zone numbers:

CREATE ZONE DATE_AND_TIME (201, 202)
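An application or text reader that supplies zone data in an index mode different from its column's must emit the enabling and disabling sequences listed under <index mode>. The Python sketch below shows one way to wrap a piece of data with those sequences. The function and constant names are hypothetical, and the separate select zone sequences that delimit the zone are not shown because they are not described in this section.

```python
# Control sequence introducer: ESC followed by '[' (hexadecimal 1b 5b).
CSI = "\x1b["

# Enabling and disabling sequences taken from the index mode table.
MODE_SEQUENCES = {
    "VALUE": ("8v", "7v"),
    "LITERAL": ("12v", "11v"),
    "NONE": ("1v", "2v"),
}


def bracket_index_mode(data, mode):
    """Wrap `data` in the sequences that enable and disable `mode`.

    NORMAL is the default mode and has no sequences, so the data is
    returned unchanged. Hypothetical helper for illustration.
    """
    mode = mode.upper()
    if mode == "NORMAL":
        return data
    enable, disable = MODE_SEQUENCES[mode]
    return f"{CSI}{enable}{data}{CSI}{disable}"
```

For example, a numeric value placed in a zone defined with the VALUE index mode inside a NORMAL column would be emitted as the enabling sequence, the digits, and then the disabling sequence.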

For More Information

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about creating zones, assigning field numbers, or using index modes, see Fulcrum SearchServer Data Preparation and Administration.

CUSTOM_VIEWER Function

Returns the external text document in a viewer proprietary data stream.

Syntax

CUSTOM_VIEWER()

Description

This function can be specified only in the select list of a SELECT statement. It allows you to retrieve the external text document in its custom viewer format instead of the SearchServer translated format usually used for retrieving documents. It specifies that SearchServer should not modify (through interpretation or translation) any part of the external document when it is retrieved.

This function is beneficial when using the SearchServer search en-gine for document display within document viewers. For more infor-mation about document viewers, see Fulcrum SearchServer Document Viewer Integration.

Example

SELECT CUSTOM_VIEWER() FROM WHERE

For More Information

• about the SELECT statement, see the "SELECT Statement," later in this chapter.

• about document viewers, see Fulcrum SearchServer Document Viewer Integration.

DELETE Statement

Deletes one or more rows from a table. There are two forms of the DELETE statement: positioned and searched.


Positioned Syntax

DELETE FROM <table name> WHERE CURRENT OF <cursor name>

Searched Syntax

DELETE FROM <table name> [WHERE <search condition>]

Note: SearchServer does not support transaction rollback, and deleted rows cannot be recovered.

Keywords and Options

<table name>

A compound identifier that specifies the name and location of the table from which the row is being deleted.

WHERE CURRENT OF <cursor name>

This clause specifies the row to be deleted according to the current position of a cursor. The row that is deleted in the table corresponds to the row in the working table on which the cursor is currently positioned. The cursor position doesn't change after this statement is executed.

Note: A row that has been updated since the last VALIDATE INDEX operation can't be deleted from a periodic table when you are using a positioned delete.

WHERE <search condition>

An optional clause that specifies the criteria for selecting rows from a table in the form of a condition that is 'TRUE' for each row selected for deletion. The search condition can be a simple test (predicate) or a combination of several tests. All rows for which the search condition is 'TRUE' are deleted.

CAUTION: With the exception of any rows that have changed since indexing, omitting the WHERE clause results in all the rows in a table being selected and deleted.

When using searched delete on a periodic table, SearchServer will not delete rows that have been changed since the last VALIDATE INDEX statement was performed. In this case, the searched delete operation continues, but SearchServer returns SUCCESS_WITH_INFO and SQLSTATE 80972 at statement completion to indicate that not all selected rows could be deleted.

However, if you specify the WHERE clause in a searched DELETE statement in one of the following forms, the referenced rows will be deleted regardless of whether they have been indexed or changed since the last indexing operation:

WHERE FT_CID = xxx
WHERE FT_CID IN (xxx[,...])
WHERE FT_ROW_TYPE = 'yyy'

where xxx is the FT_CID value retrieved from a previous SELECT statement and yyy is one of the valid row types. These are the only exceptions to the rule that a row in a periodic table must be indexed for a successful searched DELETE statement on that row. The table must have been indexed at least once, but the index need not be up to date.
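As a rough illustration of the first two forms, the following Python sketch assembles a searched DELETE from FT_CID values collected by an earlier SELECT. The helper name delete_by_cid is hypothetical and is not part of SearchServer; it only builds the statement text, and submitting it to the server is left to the application.

```python
def delete_by_cid(table, cids):
    """Build a searched DELETE that targets rows by FT_CID (hypothetical helper)."""
    # FT_CID values are returned by a previous SELECT as unsigned integers.
    ids = [int(c) for c in cids]
    if not ids:
        # Never emit a DELETE without a WHERE clause by accident.
        raise ValueError("at least one FT_CID value is required")
    if any(i < 0 for i in ids):
        raise ValueError("FT_CID values must be unsigned")
    if len(ids) == 1:
        return f"DELETE FROM {table} WHERE FT_CID = {ids[0]}"
    id_list = ", ".join(str(i) for i in ids)
    return f"DELETE FROM {table} WHERE FT_CID IN ({id_list})"


print(delete_by_cid("SUPPORT", [1, 4]))
```

Refusing an empty list is a deliberate choice in this sketch, because a DELETE with no WHERE clause selects every row in the table.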

Linguistic processing can be specified in the WHERE clause of a searched DELETE statement in the same manner as the SELECT statement. If you first execute a SELECT statement that uses linguistic processing, do not enable or disable linguistic processing before you delete the row.

Description

The rows are deleted from the table once the DELETE statement has finished executing. Any attempt to retrieve data from a deleted row fails and an error message is returned. Any subsequent SELECT statement won't be able to retrieve the data from a deleted row.

In a positioned DELETE operation, the row to be deleted from the table is indicated by the cursor position. In this case, you must first execute a SELECT statement on the same table to obtain a cursor. In addition, your application must call an API function to set or obtain the associated cursor name.

In a searched DELETE operation, the WHERE clause permits an arbitrary search condition that selects for deletion all of the rows that meet a specific set of conditions. To avoid accidental deletions, a searched DELETE statement should be preceded by a SELECT statement using an identical WHERE clause to determine which rows (or at least how many rows) will be deleted.
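One way to keep the preview SELECT and the DELETE in step is to generate both from a single search condition. The Python sketch below does only that; preview_and_delete is a hypothetical helper name, and selecting FT_CID in the preview is an assumption made here for illustration.

```python
def preview_and_delete(table, where):
    """Return (preview SELECT, searched DELETE) that share one WHERE clause."""
    where = where.strip()
    if not where:
        # Without a search condition the DELETE would select every row.
        raise ValueError("refusing to build a DELETE without a search condition")
    preview = f"SELECT FT_CID FROM {table} WHERE {where}"
    delete = f"DELETE FROM {table} WHERE {where}"
    return preview, delete


preview, delete = preview_and_delete("SUPPORT", "CREATOR CONTAINS 'PETER'")
print(preview)  # run this first and inspect the rows it returns
print(delete)   # run this only if the preview matches expectations
```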

SearchServer provides optimistic concurrency through the use of the FT_TIMESTAMP reserved column in the SELECT and searched DELETE statements. This ensures that the row to be deleted has not been updated since the SELECT statement was performed.
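The pattern can be tried out without a SearchServer installation. The sketch below uses Python's sqlite3 module as a stand-in: the SUPPORT table, its columns, and the timestamp values are invented for the demonstration, an ordinary column plays the role of FT_TIMESTAMP, and the "other user" update is simulated by hand. It shows only the general idea that a delete guarded by a stale timestamp matches no rows.

```python
import sqlite3

# SQLite stands in for SearchServer here; FT_TIMESTAMP is an ordinary column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SUPPORT (FT_CID INTEGER, CREATOR TEXT, FT_TIMESTAMP TEXT)")
conn.execute("INSERT INTO SUPPORT VALUES (1, 'PETER', '1234')")

# Step 1: read the row and remember the timestamp that was seen.
cid, seen = conn.execute(
    "SELECT FT_CID, FT_TIMESTAMP FROM SUPPORT WHERE CREATOR = 'PETER'"
).fetchone()

# Step 2: simulate another user updating the row, which changes its timestamp.
conn.execute("UPDATE SUPPORT SET FT_TIMESTAMP = '1300' WHERE FT_CID = 1")

# Step 3: the guarded delete matches nothing because the timestamp is stale.
stale = conn.execute(
    "DELETE FROM SUPPORT WHERE FT_CID = ? AND FT_TIMESTAMP = ?", (cid, seen)
).rowcount

# Step 4: re-read the row, then delete using the fresh timestamp.
fresh = conn.execute(
    "SELECT FT_TIMESTAMP FROM SUPPORT WHERE FT_CID = ?", (cid,)
).fetchone()[0]
deleted = conn.execute(
    "DELETE FROM SUPPORT WHERE FT_CID = ? AND FT_TIMESTAMP = ?", (cid, fresh)
).rowcount

print(stale, deleted)
```

The application decides what to do when the guarded delete removes nothing; re-reading the row, as in step 4, is one reasonable choice.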


A searched DELETE statement is affected by the following server attributes:

A searched operation could end before any rows are deleted if the amount of time currently specified in the MAX_EXEC_TIME server attribute expires while the table is being searched. The MAX_EXEC_TIME is ignored if it expires after the search portion of the request.

The MAX_SEARCH_ROWS server attribute is ignored during a searched delete.

The SERVER_REPORT_TIME server attribute is referenced during a searched delete. You should have an appropriate value set to ensure that your application regains control as often as required.

If you are deleting a row with a non-null external text column (FT_TEXT or its equivalent), a DELETE statement deletes only the reference to the external document and not the actual external document file.

If a DELETE statement is applied to a row that was generated automatically by SearchServer as a result of directory row expansion, the corresponding table record is regenerated when indexing is next performed, provided that the associated external document file hasn't also been deleted.

To delete the directory row, you must remove or rename the actual directory and then request SearchServer to perform table validation. However, this also removes all the automatically generated rows for the files in that directory.

Examples

The following example deletes the row indicated by the cursor position in the SUPPORT table:

DELETE FROM SUPPORT WHERE CURRENT OF SQL_CUR00005

The next example deletes all rows in the SUPPORT table that meet the specified search condition:

DELETE FROM SUPPORT WHERE CREATOR CONTAINS 'PETER'

The final example deletes a specific row in the SUPPORT table using the FT_TIMESTAMP reserved column:

DELETE FROM SUPPORT WHERE CREATOR CONTAINS 'PETER' AND FT_TIMESTAMP = '1234'


For More Information

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about deleting data, see Fulcrum SearchServer Data Preparation and Administration.

• about limiting the number of rows selected by a searched DELETE statement, see the "WHERE Clause," later in this chapter.

• about using optimistic concurrency, see Chapter 2, "The Search Process."

DROP TABLE Statement

Deletes an existing table.

Syntax

DROP TABLE <table name>

Keywords and Options

<table name>

A compound identifier that specifies the name and location of the table being deleted. For a complete description of how a table name is formed, see Fulcrum SearchServer Data Preparation and Administration.

Description

This statement destroys everything related to the table (including the index files, the data, and the schema) except the external documents and the stop file (if any). Executing this statement invalidates any working tables that are constructed from the table specified in this statement. Subsequent attempts to retrieve data from these tables will fail.

This statement fails if the table has been locked against removal and indexing by the failure of a previously executed VALIDATE INDEX statement. You can release the lock on the table with the UNPROTECT parameter in the VALIDATE INDEX statement, or by using the UNPROTECT statement.

Examples

The following example drops the SUPPORT table:

DROP TABLE SUPPORT

The next example drops the SUPPORT table on the FISH node:


For More Information

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about how a table name is formed, see Fulcrum SearchServer Data Preparation and Administration.

FULLNAME Function

Returns the full pathname of the external text document for a row.

Syntax

FULLNAME()

Description

This function determines the full pathname of the external document associated with the row in the working table. It can be used only in the select list of the SELECT statement. This function is useful for determining the location of the documents associated with the table, especially if the documents reside on another system.

The data type of the value returned by this function is VARCHAR (261).

Example

The following example returns the full pathname for the document associated with each row in the working table that satisfies the search criteria:

SELECT FULLNAME(TEXT_LOG) FROM SUPPORT WHERE COMPANY = 'OREO'

For More Information

• about how to use the FULLNAME function, see Chapter 2, "The Search Process."

In Predicate

Search criterion that selects rows with column or zone values found in a specified list of exact values.

Syntax

<in predicate> ::= {<column name> | <zone name>} [NOT] IN (<literal> [{, <literal>}...])

Keywords and Options

<column name>

An identifier that specifies the name of a column in a table.

<zone name>

An identifier that specifies the name of an existing zone. A zone is a separately searchable portion of a column.

NOT

An optional keyword that negates the predicate. If the predicate is 'FALSE' for a particular row, the negated predicate is 'TRUE', even if the row contains no data (NULL value) in the specified column or zone.

IN

A keyword that identifies the in predicate and links the column name or zone name to the literal.

<literal>

The data type of the literal restricts the acceptable attributes of the column associated with the specified column name or zone name. The literal can be a numeric literal, date literal, or character string. The literals can be specified in any order.

If a numeric literal is specified, the column name or the zone name must be associated with a column defined with VALUE index mode. If a date literal is specified, the associated column must be defined with VALUE index mode and the DATE data type. If a character string literal is specified, the associated column or zone must be defined with NORMAL or LITERAL index mode. Unlike a pattern in a contains predicate, the only special character recognized in this context is a space ( ).

If the text being searched is indexed in NORMAL index mode, the space is interpreted as a term separator. If the text is indexed in LITERAL index mode, the space is interpreted literally and matches a single space embedded in the text. In effect, the space is interpreted like a backslash space (\ ) combination in a pattern.

The case sensitivity of character string matching is controlled by the NORMALIZATION parameter in the CREATE TABLE clause.

Description


This predicate evaluates to 'TRUE' (unless the NOT Boolean operator is specified) if the row contains at least one value in the specified column or zone that is equal to one of the literals in the list.

Note: An in predicate doesn't contribute to the value of the RELEVANCE function.
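The truth rules described above, including the treatment of rows with no data, can be modelled in a few lines of Python. This is only a model of the documented behavior for a single row; in_predicate is a hypothetical name, and a NULL column is represented here as None.

```python
def in_predicate(values, literals, negate=False):
    """Model [NOT] IN for one row; `values` is None when the column is NULL."""
    # A column or zone may hold several values; one equal value is enough.
    matched = values is not None and any(v in literals for v in values)
    # NOT negates the result, so a row with no data satisfies NOT IN.
    return (not matched) if negate else matched


print(in_predicate([7], {2, 3, 7}))               # row with PRIORITY 7
print(in_predicate([5], {2, 3, 7}))               # row with PRIORITY 5
print(in_predicate(None, {2, 3, 7}, negate=True)) # NULL row under NOT IN
```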

Examples

The following example searches for all rows of data where the value in the PRIORITY column is either 2, 3, or 7:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS FROM SUPPORT WHERE PRIORITY IN (7, 2, 3)

The next two examples search for all rows of data where the name in the COMPANY column is either 'OREO' or 'DIXIE':

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS FROM SUPPORT WHERE COMPANY IN ('OREO', 'DIXIE')

This example is equivalent to the previous one:

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS FROM SUPPORT WHERE COMPANY CONTAINS 'OREO', 'DIXIE'

For More Information

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about literals, see Chapter 3, "SearchSQL Language Elements."

• about using the in predicate, see Chapter 2, "The Search Process."

• about the SELECT statement, see the "SELECT Statement" later in this chapter.

INSERT Statement

Adds a new row to a table.

Syntax

INSERT INTO <table name> (<column identifier>[{, <column identifier>}...]) VALUES (<insert value>[{, <insert value>}...])

<insert value> ::= {<literal> | NULL}

Keywords and Options

<table name>

A compound identifier that specifies the name and location of the table to which the row is being inserted. For a complete description of how a table name is formed, see Chapter 3, "SearchSQL Language Elements."

<column identifier>

An identifier that specifies the name of an existing column in the table. Each column name must identify an existing reserved or application-defined column in the table. You can't repeat a column name within one INSERT statement.

Neither the following reserved columns nor any column that renames them can be specified:

FT_CID       FT_ORIGINAL_SIZE    FT_TEXT
FT_DATE      FT_MTIME            FT_TEXT_STATUS
FT_DFLAG     FT_ROW_STATE        FT_TIMESTAMP
FT_FORMAT    FT_ROW_TYPE

VALUES

Introduces an insert value list.

<insert value>

Defines a value for each column specified in the insert column list. This value can be a literal, or NULL. The number of values and their order in the list must directly correspond to the number and order of the columns in the column name list.

If an application-defined column is omitted from the column name list, the value of that column is initialized to NULL. The NULL value is not actually stored, but serves as a placeholder in the list of insert values.

The column and the insert value must be of the same data type. The length of the insert value must be equal to or less than the length of the column. For fixed length character string data types, if the length of the value is less than the length of the column, the value string is padded on the right with spaces. For variable-length character string data types, the resultant string is not padded.
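The length and padding rules can be pictured with a short Python model. It does not call SearchServer; store_char is a hypothetical function that mimics what the text says happens to a character value of a given length.

```python
def store_char(value, length, fixed=True):
    """Model how an insert value fits a character column of `length` characters."""
    if len(value) > length:
        # The insert value must not be longer than the column.
        raise ValueError("insert value is longer than the column")
    # Fixed-length columns are padded on the right with spaces;
    # variable-length columns keep the value exactly as supplied.
    return value.ljust(length) if fixed else value


print(repr(store_char("OREO", 6)))               # fixed length: padded
print(repr(store_char("OREO", 6, fixed=False)))  # variable length: not padded
```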

If a value is specified for FT_SFNAME, but not for FT_FLIST, the latter reserved column is given a default value of 's', which identifies the standard text reader.

If supplied, the value of FT_SFNAME is a filename that is assumed to be relative to the location specified by the BASEPATH table parameter of the CREATE TABLE clause (unless the filename is fully qualified).

The new row is constructed from the information supplied in the column name list and the value list. Any column of the table that is missing from the column list is supplied with a NULL value. However, NULL values are not actually stored and therefore are not searchable.

Description

If the value in the FT_SFNAME reserved column refers to a container, then you must specify the VALIDATE TABLE option of the VALIDATE INDEX statement for row expansion. For example, a directory is expanded and a row is created for every document file in the specified directory and all its subdirectories when a table validation operation is performed. Another example is FT_SFNAME naming a library file with FT_FLIST naming the "l" library text reader. In this case, rows are created for every document in the library file when a table validation operation is performed to have the container expanded into separately searchable rows. This method of inserting rows of data is called row expansion.
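To picture which files a directory container would contribute, the following Python sketch walks a directory tree and lists one entry per file, including files in subdirectories. It is an illustration of the described outcome only: expand_directory is a hypothetical name, and SearchServer's own expansion happens inside the server during table validation.

```python
import os


def expand_directory(path):
    """Return one path per file under `path`, including all subdirectories."""
    rows = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Each document file would become its own searchable row.
            rows.append(os.path.join(root, name))
    return sorted(rows)
```

Calling expand_directory on the directory named in FT_SFNAME gives a quick estimate of how many rows to expect after validation.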

The FT_FLIST and FT_SFNAME reserved columns contain special values when a database is being referenced from the Fulcrum SearchServer table. For more information, see Fulcrum SearchServer Database Integration.

Example

The following example inserts values in two columns of the SUPPORT table:

INSERT INTO SUPPORT (COMPANY, PRIME_CONTACT) VALUES ('DIXIE CORP.', 'Dave Chisholm')
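An application that assembles INSERT statements can check several of the rules in this section before anything is sent to the server. The Python sketch below is one possible approach, not a SearchServer facility: build_insert is a hypothetical helper, the set of blocked names is copied from the reserved columns listed under <column identifier>, and columns that rename a reserved column are not detected. String values containing a quote character are rejected because this guide section does not say how to escape them.

```python
RESERVED_NOT_INSERTABLE = {
    "FT_CID", "FT_DATE", "FT_DFLAG", "FT_FORMAT", "FT_MTIME",
    "FT_ORIGINAL_SIZE", "FT_ROW_STATE", "FT_ROW_TYPE", "FT_TEXT",
    "FT_TEXT_STATUS", "FT_TIMESTAMP",
}


def to_literal(value):
    """Render a Python value as an insert value: literal or NULL."""
    if value is None:
        return "NULL"
    if isinstance(value, str):
        if "'" in value:
            raise ValueError("embedded quotes are not handled in this sketch")
        return f"'{value}'"
    return str(value)


def build_insert(table, columns, values):
    """Build an INSERT statement after checking the documented constraints."""
    if len(columns) != len(values):
        raise ValueError("number of values must match number of columns")
    if len(set(columns)) != len(columns):
        raise ValueError("a column name can't be repeated")
    blocked = RESERVED_NOT_INSERTABLE.intersection(columns)
    if blocked:
        raise ValueError(f"reserved columns can't be specified: {sorted(blocked)}")
    column_list = ", ".join(columns)
    value_list = ", ".join(to_literal(v) for v in values)
    return f"INSERT INTO {table} ({column_list}) VALUES ({value_list})"


print(build_insert("SUPPORT", ["COMPANY", "PRIME_CONTACT"],
                   ["DIXIE CORP.", "Dave Chisholm"]))
```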

For More Information

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about system tables, see Chapter 5, "System Information Tables."


• about inserting data into a table, see Fulcrum SearchServer Data Preparation and Administration.

• about data types, see Chapter 3, "SearchSQL Language Elements."

• about text readers, see Fulcrum SearchServer Data Preparation and Administration.

Is_about Predicate

Provides a way to search for data without having to use Boolean queries. It also allows you to execute Intuitive Search requests using a single statement.

Syntax

<is_about predicate> ::=
{<column name> | <zone name>} IS_ABOUT
{<text_vector name>
| <character string literal> [<vector terms>]
| FILE <filename> [FILTER <text reader specification>] [<vector terms>]
| ROW <row list> TABLE <table name> [<vector terms>]}
[<similar parameter>...]

<vector terms> ::= VECTOR_TERMS <unsigned integer>

<filename> ::= <character string literal>

<text reader specification> ::= <character string literal>

<row list> ::= <unsigned integer> [{, <unsigned integer>}...]

<similar parameter> ::= MAX_TERMS <unsigned integer> | WEIGHT <unsigned integer>

Keywords and Options

<column name>

An identifier that specifies the name of a column in a table. The column must have been defined with NORMAL or LITERAL index mode.


<zone name>

An identifier that specifies the name of an existing zone. Although any zone can be specified, a text vector created from the external text column of a row in a table would typically be used only to search a zone in the same column.

IS_ABOUT

Keyword that identifies the is_about predicate, and links the column or zone name to the text vector whose name or description follows.

<text_vector name>

Specifies the name of an existing text vector that was created with a previously executed CREATE TEXT_VECTOR statement.

<character string literal>

Specifies the text fragment to be processed. Use this form when the user is entering a readily available and reasonably small text fragment from which the text vector is to be created.

VECTOR_TERMS <unsigned integer>

Specifies a value that controls the maximum number of terms that can be included in the text vector. The terms with the highest weight are the terms selected.

The default value is 100. The recommended range is 100 to 200 terms.
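The effect of the VECTOR_TERMS limit can be sketched in Python as "keep the highest-weighted terms, up to the limit". The guide does not say how SearchServer computes term weights, so this sketch assumes, purely for illustration, that a term's weight is its occurrence count; build_text_vector is a hypothetical name.

```python
from collections import Counter


def build_text_vector(text, vector_terms=100):
    """Keep at most `vector_terms` terms, highest weight first."""
    # Assumption for this sketch: weight = occurrence count of each word.
    weights = Counter(text.lower().split())
    return dict(weights.most_common(vector_terms))


print(build_text_vector("disk error disk full disk error", vector_terms=2))
```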

FILE <filename> [FILTER <text reader specification>]

The filename specifies the name of a file, not necessarily in a table. The text reader specification (by default, the "s" text reader) specifies a valid text reader list. It is the responsibility of the text reader named in the text reader list to supply the portions of the specified file to be processed.

Use this form when the text fragment is a complete document (or a document fragment) in a locally available file that is not part of a table. For a complete list of valid text readers, see Fulcrum SearchServer Data Preparation and Administration.

ROW <row list> TABLE <table name>

The row list refers to the rows to be processed when creating the text vector. Only the external text column (by default, called FT_TEXT) is considered. To obtain the correct rows, you must first execute a SELECT statement that retrieves the FT_CID reserved column. For example:


SELECT FT_CID FROM SUPPORT WHERE PRIORITY = 1

The result of the SELECT statement is a list of unsigned integers that refer to each row in the table:

1 4

You then specify the rows you want processed by including the FT_CID values for the rows in the row list:

IS_ABOUT ROW 1, 4 TABLE SUPPORT

The table name is a compound identifier that specifies the name and location of the table where the rows exist. For a complete description of how a table name is formed see "Compound Identifier" in Chapter 3, "SearchSQL Language Elements."

MAX_TERMS <unsigned integer>

Specifies the maximum number of terms that are taken into account when considering a row for similarity to a text vector.

This parameter is used in conjunction with the VECTOR_TERMS option in the CREATE TEXT_VECTOR statement or the VECTOR_TERMS option in the is_about predicate. It is recommended that this value be five to ten percent of the actual number of terms in the text vector.
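The five to ten percent guideline is simple arithmetic, shown here as a small Python helper. The function name suggested_max_terms is hypothetical, and flooring the result to a whole number of at least 1 is a choice made for this sketch.

```python
def suggested_max_terms(actual_terms):
    """Return (low, high) MAX_TERMS bounds: 5% and 10% of the vector's terms."""
    # Integer arithmetic avoids floating-point rounding surprises.
    low = max(1, actual_terms * 5 // 100)
    high = max(low, actual_terms * 10 // 100)
    return low, high


print(suggested_max_terms(100))
print(suggested_max_terms(200))
```

For a text vector holding the default 100 terms, this gives a range of 5 to 10, which is consistent with the MAX_TERMS 5 value used in the examples for this predicate.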

WEIGHT <unsigned integer>

Assigns a weight to the text vector. This option is useful when combining an is_about predicate with other weighted predicates in a search. The default weight is 1.

Description

The is_about predicate can be used in two ways. It can allow the definition of the text vector in the SELECT statement and doesn't require a named text vector from a previous CREATE TEXT_VECTOR statement. In this case, the text_vector name is omitted and the is_about predicate functions as if the specified options were first used in a CREATE TEXT_VECTOR statement. This allows you to use text vectors in environments where the CREATE TEXT_VECTOR statement is not accessible.

The text_vector name specified in this predicate can be used only for the current is_about predicate in the current SELECT statement. After the SELECT statement is executed, the text_vector name is discarded.


The is_about predicate is intended to be used in combination with the RELEVANCE function and the ORDER BY clause to provide a list of documents ordered by decreasing similarity to the original text. The MAX_SEARCH_ROWS parameter should also be set to limit the list to a reasonable number of the top-ranked documents.

This predicate is restricted to matching on a single column or zone. For each row SearchServer looks at the text specified and compares counts for each unique word with the specified text vector. SearchServer then computes a relevance value.

With the exception of the retrieval model, the current relevance affects the calculation of the relevance value for the is_about predicate. The retrieval model for this predicate is vector space, because the semantic relationship among words in the text vector is not known.
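This guide does not give the formula SearchServer uses, so the following Python sketch should be read only as a generic illustration of vector space matching: it compares per-word counts in a row's text with a text vector by cosine similarity. The function name, the tokenization by whitespace, and the use of cosine similarity are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine_relevance(row_text, vector):
    """Cosine similarity between a row's word counts and a text vector."""
    counts = Counter(row_text.lower().split())
    # Dot product over the terms that appear in the text vector.
    dot = sum(counts[term] * weight for term, weight in vector.items())
    row_norm = math.sqrt(sum(c * c for c in counts.values()))
    vec_norm = math.sqrt(sum(w * w for w in vector.values()))
    if row_norm == 0 or vec_norm == 0:
        return 0.0
    return dot / (row_norm * vec_norm)


vector = {"disk": 2, "error": 1}
rows = ["disk disk error", "disk full", "printer jam"]
ranked = sorted(rows, key=lambda text: cosine_relevance(text, vector), reverse=True)
print(ranked)
```

Sorting by the computed score in descending order mirrors the recommended combination of the RELEVANCE function with ORDER BY ... DESC.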

Examples

The following are examples of searches that include an is_about predicate:

SELECT RELEVANCE('2:3') AS HITS, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT WHERE TEXT_LOG IS_ABOUT VEC1 ORDER BY HITS DESC

SELECT RELEVANCE('2:3') AS HITS, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT WHERE TEXT_LOG IS_ABOUT VEC1 MAX_TERMS 5 ORDER BY HITS DESC

SELECT RELEVANCE('2:3') AS HITS, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT WHERE TEXT_LOG IS_ABOUT VEC1 WEIGHT 10 AND TEXT_LOG IS_ABOUT VEC2 WEIGHT 1 ORDER BY HITS DESC

For More Information

• about creating a text vector, see the "CREATE TEXT_VECTOR Statement" earlier in this chapter.

• about searching, see Chapter 2, "The Search Process."

• about the SELECT statement, see the "SELECT Statement," later in this chapter.


Like Predicate

Search criterion that selects rows with column values that match a pattern.

Syntax

<like predicate> ::= <column name> [NOT] LIKE <pattern>

Keywords and Options

<column name>

An identifier that specifies the name of a column in a table. The column must have been defined with NORMAL or LITERAL index mode.

NOT

Optional keyword that negates the predicate. If the predicate is 'FALSE' for a particular row, the negated predicate is 'TRUE' even if the row contains no data (NULL value) in the specified column.

LIKE

Keyword that links the column name to the pattern.

<pattern>

A pattern specifies the text to be matched by a like predicate. That match can occur anywhere in the column specified in the predicate. Multiple matches are permitted and the location of each match is recorded and can be indicated by match codes in the returned data.

Description

The like predicate is 'TRUE' for all rows in which the column data includes one or more instances of the specified pattern. If the NOT keyword is specified, it is 'TRUE' for all rows that contain no instances of the pattern, including rows that have no data (NULL value) in the specified column.

Only columns defined with NORMAL or LITERAL index mode can be specified.

Example

This example shows how to use the like predicate with a pattern that contains a wildcard:


SELECT COMPANY FROM SUPPORT WHERE COMPANY LIKE 'G%' AND STATUS CONTAINS 'CLOSED'

For More Information

• about creating patterns, see "Patterns" in Chapter 3, "SearchSQL Language Elements."

• about thesaurus expansion, see Chapter 2, "The Search Process."

• about searching, see Chapter 2, "The Search Process."

• about the SELECT statement, see the "SELECT Statement," later in this chapter.

MARKER_LIST Function

Returns the marker information to provide highlighting information for document viewers.

Syntax

MARKER_LIST()

Description

This function can only be specified in the select list of a SELECT statement. It returns positioning information that a document viewer can use to position efficiently within the external document.

This function is beneficial when using the SearchServer search engine for document display within document viewers. For more information about document viewers, see Fulcrum SearchServer Document Viewer Integration.

Example

The following example returns a working table containing the marker list information of all the search terms matched by the WHERE clause:

SELECT MARKER_LIST(), COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT WHERE TEXT_LOG CONTAINS 'DOCUMENT'


For More Information

• about the SELECT statement, see the "SELECT Statement," later in this chapter.

• about document viewers, see Fulcrum SearchServer Document Viewer Integration.

MATCH_VCC_LIST Function

Returns the text location of terms matching the search criteria in the external document.

Syntax

MATCH_VCC_LIST()

Description

This function returns character position information within the external document. A document viewer can use these positions to highlight search terms in the viewed document. It returns an empty string (not a NULL value) for any rows that don't refer to an external document, for all rows of a null search, and for any rows where there are no search terms in the external document.

This function returns duplicate values if the WHERE clause of the SELECT statement contains duplicate search terms. For example,

SELECT MATCH_VCC_LIST(), COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT WHERE TEXT_LOG CONTAINS 'DOCUMENT' | 'DOCUMENT'

This function can only be specified in the select list of a SELECT statement.

This function is beneficial when using the SearchServer search engine for document display within document viewers. For more information about document viewers, see Fulcrum SearchServer Document Viewer Integration.

Example

The following example returns a working table containing the location of all the search terms matched by the WHERE clause:

SELECT MATCH_VCC_LIST(), COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT WHERE TEXT_LOG CONTAINS 'DOCUMENT'


For More Information

• about the SELECT statement, see the "SELECT Statement," later in this chapter.

• about document viewers, see Fulcrum SearchServer Document Viewer Integration.

ORDER BY Clause

Causes rows in a working table to be sorted according to the specified order.

Syntax

ORDER BY <sort criteria>

<sort criteria> ::= <sort specification>[{, <sort specification>}...]

<sort specification> ::= {<column name> | <column alias> | <unsigned integer>} [<ordering specification>]

<ordering specification> ::= ASC | DESC

Keywords and Options

<sort criteria>

Specifies the list of columns that will influence the order in which rows are written to a working table. Rows are sorted in ascending or descending order according to the contents of the specified columns.

If a function or tablename column is available (for example, a RELEVANCE, TABLENAME, or TABLEQUALIFIER function was included in the SELECT statement's select list with an AS reassignment), the rows can be sorted according to the function values. Rows can also be sorted by date, provided a date column is available.

<sort specification>

An identifier that specifies the name of a column as entered in the SELECT statement's select list, or as renamed by the column alias in the AS clause of the SELECT statement. Function column names are not permitted in an ORDER BY clause. If any values resulting from a RELEVANCE, TABLENAME, or TABLEQUALIFIER function are to be used to order the rows in a working table, the associated functions must be renamed in the AS clause. You can't duplicate a column name in the sort specification.


Alternatively, the column by which to sort can be specified by its ordinal number within the select list of the SELECT statement (1 for the first column, 2 for the second column, etc.).

When more than one sort specification is listed in an ORDER BY clause, the first criterion is evaluated and applied before the next. Rows that are identical in all sort criteria are written to the working table in an arbitrary order.

<ordering specification>

Specifies an optional keyword that determines whether the rows are sorted in ascending or descending order for a particular column. The ordering specification can be either ASC or DESC. The default is ASC, ascending order.

Description

The ORDER BY clause is an optional clause that causes the selected rows to be sorted before they are written to the working table. The sorted order of character columns is determined by the collation sequence that is declared in the COLLATION_SEQUENCE server attribute.

If the ORDER BY clause appears in a SELECT statement, it must include one or more sort specifications. If more than one sort specification is listed, they are evaluated and applied in the same order listed in the ORDER BY clause.

If you omit this clause, the order of rows in the working table isn't defined.

Examples

The following example shows how to sort the working table such that the rows are ordered by the name of the table from which they were derived:

SELECT AUTHOR, TABLENAME() AS BASETABLE
FROM STDEMO UNION STDOCS
ORDER BY BASETABLE

The next example shows how to create a working table with three columns, such that the rows presented in the working table are ordered by the values in the three columns. DOCDATE is sorted within RANK, which is sorted within AUTHOR:

SELECT AUTHOR, RELEVANCE('2:1') AS RANK, DOCDATE
FROM STDEMO
WHERE TEXT CONTAINS 'HARDWARE'
ORDER BY AUTHOR, RANK DESC, DOCDATE DESC
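The nested ordering in this example (DOCDATE within RANK within AUTHOR) can be mimicked outside SearchServer to check what result order to expect. The following Python sketch is illustrative only: the sample rows and the helper name order_by are hypothetical, and it relies on the stability of Python's sort, not on anything known about how SearchServer sorts internally.

```python
from operator import itemgetter

def order_by(rows, specs):
    """Sort dict rows by (column, direction) pairs; the first pair is most significant."""
    result = list(rows)
    # Apply the least significant key first. sorted() is stable, so the order
    # produced by earlier passes survives as the tie-breaker for later passes.
    for column, direction in reversed(specs):
        result = sorted(result, key=itemgetter(column), reverse=(direction == "DESC"))
    return result

# Hypothetical rows standing in for a working table.
rows = [
    {"AUTHOR": "Smith", "RANK": 3, "DOCDATE": "1996-05-01"},
    {"AUTHOR": "Jones", "RANK": 7, "DOCDATE": "1995-01-15"},
    {"AUTHOR": "Smith", "RANK": 9, "DOCDATE": "1994-11-30"},
    {"AUTHOR": "Jones", "RANK": 7, "DOCDATE": "1997-02-20"},
]

ordered = order_by(rows, [("AUTHOR", "ASC"), ("RANK", "DESC"), ("DOCDATE", "DESC")])
for row in ordered:
    print(row["AUTHOR"], row["RANK"], row["DOCDATE"])
```

The two Jones rows share the same RANK, so DOCDATE DESC decides their order; the Smith rows differ in RANK, so DOCDATE is never consulted for them.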

The final example illustrates the specification of sort criteria by position within the select list:

SELECT TITLE, AUTHOR
FROM STDEMO
WHERE SUBJECT CONTAINS 'OPTICAL', 'SYSTEMS'
ORDER BY 2

For More Information

• about the SELECT statement, see the "SELECT Statement," later in this chapter.

• about limiting the select criteria, see the "WHERE Clause," later in this chapter.

• about the RELEVANCE, TABLENAME, and TABLEQUALIFIER functions, see the "RELEVANCE Function," "TABLENAME Function," and "TABLEQUALIFIER Function" later in this chapter.

ORIGINAL Function

Retrieves the external text document in its original format.

Syntax

ORIGINAL()

Description

The ORIGINAL function allows you to retrieve the external text document in its original format instead of the SearchServer translated format usually used for retrieving documents. It specifies that SearchServer should not modify (through interpretation or translation) any part of the external document when it is retrieved. The text reader list is scanned from.

This function is beneficial when using the SearchServer search engine for document display within document viewers. For more information about document viewers, see Fulcrum SearchServer Document Viewer Integration.

Example

The following example retrieves the external text document in its original format from the first row in the SUPPORT table:

SELECT ORIGINAL()
FROM SUPPORT
WHERE FT_CID = 1

For More Information

• about how to use the ORIGINAL function, see Fulcrum SearchServer Data Preparation and Administration.

PROTECT TABLE Statement

Prevents a table from being indexed or dropped, or having its schema altered or replaced.

Syntax

PROTECT TABLE <table name>

Keywords and Options

<table name>

A compound identifier that specifies the name and location of the table to be protected. For a complete description of how a table name is formed, see Chapter 3, "SearchSQL Language Elements."

Description

The protection status of a table is universal. This means that when one application protects a table, altering or creating a schema and dropping or indexing the table are disallowed for all other applications.

An application can determine whether or not a table is protected by viewing the FTT_PROTECTED column value in the TABLES system table. The protection status is 'TRUE' for protected tables and 'FALSE' for unprotected tables. A table can also become protected due to an indexing failure.

The reverse operation is the UNPROTECT statement.

Example

The following example protects the SUPPORT table:

PROTECT TABLE SUPPORT

For More Information

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about how a table name is formed, see Fulcrum SearchServer Data Preparation and Administration.

RELEVANCE Function

Calculates the relevance of each row in the working table.

Syntax

RELEVANCE ([<relevance method>])

<relevance method> ::= '[<retrieval model>] [<relevance ordered>]'

<retrieval model> ::= {V | F}

<relevance ordered> ::= {2:1|2:2} [<gt> [<relevance threshold>]] | {2:3|2:4} [:<document frequency>] [<gt> [<relevance threshold>]]

<gt> ::= >

<relevance threshold> ::= <unsigned integer>

<document frequency> ::= <unsigned integer>

Keywords and Options

<relevance method>

Specifies a character string that determines the retrieval model and relevance algorithm to be used in the statement.

You can specify the retrieval model with a single character code. If the first character of the string is 'V' or 'v', you are specifying the vector space retrieval model. If it is 'F' or 'f', you are specifying the fuzzy Boolean retrieval model. Otherwise, the default strict Boolean retrieval model is assumed. The retrieval model affects only the contains predicate, where it controls the operation of Boolean operators and the calculation of a combined relevance for the predicate.

The relevance ordered option specifies which relevance algorithm is used. Table 4-1 lists the possible values and their meanings.

Table 4-1 Relevance Ordered Options

| Values | Relevance Ranking | Description |
| --- | --- | --- |
| 2:1 | Hits Count (Algorithm 1) | Counts the total number of occurrences of the individual words (not phrases) matched regardless of the term frequency within the table. |
| 2:2 | Terms Count (Algorithm 2) | Counts the number of search terms matched. The frequency of the occurrence of the terms isn't considered. |
| 2:3 | Terms Ordered (Algorithm 3) | Uses a mathematical formula that computes the relevance statistically. It combines the characteristics of algorithms 1 and 2 and takes into account not only the number of occurrences of each search term, but also a statistical measurement of how common the term is over all the rows in the table (document frequency). |
| 2:4 | Critical Terms Ordered (Algorithm 4) | Uses a mathematical formula that computes the relevance statistically and accentuates the effect of the inverse document frequency. It squares the search term importance before multiplying it with the number of occurrences of the search term. |

<document frequency>

You can specify the document frequency if you are using a ranking algorithm of 2:3 or 2:4. This document frequency is expressed as the maximum percentage of documents in the entire table that contain a particular term. A term that exceeds this maximum is not considered when matching rows or in the relevance calculation. This parameter is most useful with Intuitive Searching to ensure that very common terms don't cause spurious documents to be included in the search result.

Depending on the characteristics of your table, you might not need to set this parameter. If you want to experiment with the effect of this parameter on your application, the recommended starting value is 75. If you set the value too low, some of the Intuitive Searches might fail because all the terms have been dropped from the text vector. This is more likely to occur when a very short text is selected for the Intuitive Search, and the document frequency parameter is low.

To apply this limit, you must include a colon (:) and the percentage (between 1 and 100) after the relevance ordered parameters. For example, 'F2:3:75' specifies a maximum document frequency of 75 percent. If any term is in more than 75 percent of the documents, then that term isn't considered for the Intuitive Search process.
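The effect of the document frequency limit can be pictured with a small model. The Python sketch below is an assumption-laden illustration, not SearchServer's implementation: documents are represented as hypothetical sets of words, and the function name drop_common_terms is invented for this example. It shows both the intended filtering at 75 percent and how a value that is too low can drop every term.

```python
def drop_common_terms(query_terms, documents, max_percent):
    """Return the query terms that occur in at most max_percent of the documents."""
    total = len(documents)
    kept = []
    for term in query_terms:
        containing = sum(1 for words in documents if term in words)
        # Integer comparison avoids floating-point rounding: a term is dropped
        # only when its share of the documents exceeds the limit.
        if containing * 100 <= max_percent * total:
            kept.append(term)
    return kept

# Hypothetical table of four documents, each reduced to its set of words.
documents = [
    {"disk", "error", "the"},
    {"printer", "error", "the"},
    {"disk", "controller", "the"},
    {"network", "the"},
]

print(drop_common_terms(["the", "disk", "error"], documents, 75))
print(drop_common_terms(["the", "disk", "error"], documents, 25))
```

With a limit of 75, only "the" (present in every document) is dropped. With a limit of 25, all three terms are dropped, which corresponds to the failure case described for values that are set too low.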

<relevance threshold>

To apply a relevance threshold to the relevance results, specify a greater than symbol (>) and the lower threshold value after the relevance ordered parameters.
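An application that builds relevance method strings may want to check them before passing them to RELEVANCE. The following Python sketch parses a string according to the syntax shown for <relevance method>. It is a hypothetical helper, not part of SearchServer: the function name, the returned dictionary keys, and the tolerance for surrounding spaces are all assumptions made for illustration.

```python
import re

# Pattern follows the <relevance method> syntax: optional model letter,
# optional algorithm, optional :<document frequency>, optional ><threshold>.
_METHOD = re.compile(
    r"^\s*(?P<model>[VvFf])?\s*"
    r"(?:(?P<algorithm>2:[1-4])"
    r"(?::(?P<doc_freq>\d+))?"
    r"(?:>(?P<threshold>\d+)?)?)?\s*$"
)

def parse_relevance_method(text):
    match = _METHOD.match(text)
    if match is None:
        raise ValueError(f"not a relevance method: {text!r}")
    algorithm = match.group("algorithm")
    doc_freq = match.group("doc_freq")
    # The syntax allows a document frequency only with algorithms 2:3 and 2:4.
    if doc_freq is not None and algorithm not in ("2:3", "2:4"):
        raise ValueError("document frequency applies only to 2:3 and 2:4")
    models = {"V": "vector space", "F": "fuzzy Boolean"}
    model = (match.group("model") or "").upper()
    threshold = match.group("threshold")
    return {
        "model": models.get(model, "strict Boolean"),  # default when no letter is given
        "algorithm": algorithm,
        "document_frequency": int(doc_freq) if doc_freq else None,
        "threshold": int(threshold) if threshold else None,
    }

print(parse_relevance_method("F2:3:75"))
print(parse_relevance_method("2:1>10"))
```

The first call corresponds to the 'F2:3:75' example in the text; the second shows a threshold applied to the hits count algorithm under the default strict Boolean model.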

Description

This function can be used in the select list to request relevance ranking calculation for each row in the result list. When the ORDER BY clause is to be used with the relevance calculation, the RELEVANCE function must be given an alias, and that alias is used in the ORDER BY clause. The RELEVANCE function itself must not be used directly in the ORDER BY clause.

The data type of the value returned by the RELEVANCE function is INTEGER. The value can be NULL or a positive integer. The minimum value is one, and the maximum value depends on the selected algorithm. The value is NULL when no relevance method has been specified.

If no relevance method is specified in this function, then the relevance method specified by a previously executed SET RELEVANCE_METHOD statement is applied. You can check the current relevance method being used in a connection by viewing the value in the SERVER_INFO system table.

The relevance method affects only the calculation of the relevance value for the contains, like, and is_about predicates. When two or more of these predicates are combined in the WHERE clause, the relevance values of the predicates are combined to produce a single relevance value for the row, according to the rules of the selected relevance algorithm. Other predicates don't contribute to the final relevance value.

Example

This example uses the strict Boolean model and the hits count relevance ranking.

SELECT RELEVANCE('2:1') AS HITS, PROBLEM_NUMBER
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%' WEIGHT 10,
'HANDLE%' WEIGHT 5 WITHIN 10 CHARACTERS OF 'DOCUMENT', 'FILTER' IN_ORDER
ORDER BY PROBLEM_NUMBER, HITS DESC

For More Information

• about relevance, see Chapter 2, "The Search Process."

• about the SET statement, see the "SET Statements," later in this chapter.

SELECT Statement

Creates a working table from a subset of the rows and columns of an existing table or tables.

Syntax

SELECT [ALL | DISTINCT] <select list>
FROM <table name> [<correlation name>] [UNION <table name> [<correlation name>]]...
[WHERE <search condition>]
[ORDER BY <sort criteria>]
[FOR UPDATE]

<select list> ::= <derived column>[{, <derived column>}...] | * | COUNT(*)

<derived column> ::= <value expression> [AS <column alias>]

<value expression> ::= <column name> | CUSTOM_VIEWER() | RELEVANCE([<relevance method>]) | TABLENAME() | TABLEQUALIFIER() | FULLNAME() | ORIGINAL() | MARKER_LIST() | MATCH_VCC_LIST() | VCC_RULES()

<column alias> ::= <identifier>

Keywords and Options

ALL

Optional keyword that can be used to improve readability but has no effect on the operation of the statement.

DISTINCT

Optional keyword that is provided for compatibility with other applications or application environments but has no effect on the operation of the statement.

<select list>

Specifies the list of columns and function values that should be included in the working table. Values are derived from each row in the table that satisfies the select criteria. Each column written to the working table has the same attributes as the one from which it was derived. Columns can be renamed in the select list.

<derived column>

Specifies the name of the column from which the data is being derived, and optionally, a new name for the column data (or function values) before they are written to the working table.

* (asterisk)

Specifies a sequence of column names that includes all user-defined columns and renamed reserved columns in the table. Each column is referenced once in the order in which it was entered in the CREATE TABLE clause.

COUNT(*)

The COUNT function is used to generate a column in the working table that contains the number of rows that satisfy the query. This function returns a working table that contains only one row. Therefore, the ORDER BY clause and MAX_SEARCH_ROWS server attribute have no effect on a search that uses this function. You can't use this function in a search that also uses the FOR UPDATE option.

<value expression>

Specifies a value that influences which columns will be included in the working table. In this context, a value expression can be any one of the following:

• <column name>

• CUSTOM_VIEWER()

• RELEVANCE([<relevance method>])

• TABLENAME()

• TABLEQUALIFIER()

• FULLNAME()

• ORIGINAL()

• MARKER_LIST()

• MATCH_VCC_LIST()

• VCC_RULES()

The column name must name an existing reserved or application-defined column or a function. The functions RELEVANCE, TABLENAME, and TABLEQUALIFIER can be specified, but they must be given aliases if they are used in the ORDER BY clause. Each column that is included in the working table has the same attributes as the corresponding column in the originating table. A given column name can't appear more than once in the select list.

The RELEVANCE function is used to assign relevance values to the rows in the working table. This value is a non-negative number that is computed according to the current setting of the relevance method.

The relevance method is an optional character string literal that determines the retrieval model and relevance algorithm that will be used to calculate the relevance value for the rows in the working table. If it isn't specified, the relevance method that is currently established by the SET RELEVANCE_METHOD statement is used. You can override the current relevance method for this SELECT statement by specifying another relevance method in this option.

The TABLENAME function is used to generate a column in the working table that contains a character string that identifies the name of the originating table for each of the selected rows. This value is useful for identifying the table to which a row belongs when a UNION clause is used in the SELECT statement. However, you can't use this function to obtain the table name in a View.

The TABLEQUALIFIER function is used to generate a column in the working table that contains a character string that identifies the node name or pathname of the originating table for each of the selected rows. This value is useful for identifying the table to which a row belongs when a UNION clause is used in the SELECT statement. However, you can't use this function in a View.

The FULLNAME function is used to generate a column in the working table that contains the full pathname associated with the external text document of the current row. This function is useful to determine the location of the documents associated with the table, especially if the documents reside in another location.

The ORIGINAL function allows you to retrieve the external text document in its original format instead of the SearchServer translated format usually used for retrieving documents. This function is useful when viewing the document with a non-Fulcrum product.

The CUSTOM_VIEWER function allows you to retrieve the external text document in a viewer-proprietary data stream.

The MARKER_LIST function allows you to retrieve application data and positioning data associated with an external document to be used within document viewers.

The MATCH_VCC_LIST function allows you to retrieve position information pertaining to terms matched by the SELECT statement, to be used by document viewers.

The VCC_RULES function allows you to retrieve a list of the 255 one-digit numbers that indicate, for each character in the current application character set, how many counts that character is assigned when calculating position for the purpose of highlighting. This information is used by document viewers.

AS <column alias>

An optional clause that gives an alias for a selected column. The alias can be used in the optional ORDER BY clause to refer to the column in the working table.

Function column names are not permitted in an ORDER BY clause. If any function is to be the basis for ordering rows in the working table, the function must be given an alias, and the alias must be used in the ORDER BY clause.

FROM <table name>

The FROM clause includes a compound identifier that specifies the name and location of the table or tables being searched. For a complete description of how a table name is formed, see "Compound Identifiers" in Chapter 3, "SearchSQL Language Elements."

<correlation name>

An optional identifier that specifies an alias for a table. However, SearchServer ignores all correlation names in the syntax. It is used only for compatibility with non-Fulcrum database products.

UNION <table name>

An optional clause that allows you to combine the rows and columns of the tables specified. The working table contains all the selected rows from each of the tables in the UNION, and the columns comprise all of the unique columns selected from the constituent tables.

You can safely union two or more tables, provided that the schemas are consistent. In this case, each column that appears in more than one table must have identical column attributes. The columns that aren't common between schemas must not have zone and field numbers that conflict with any other column in the working table. A column that doesn't exist in a given table is assumed to contain NULL values.
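The NULL-filling rule for columns that are missing from one of the unioned tables can be illustrated with a small model. The Python sketch below is not how SearchServer builds a working table; the table contents, the function name union_rows, and the BASETABLE key (standing in for a TABLENAME() AS BASETABLE column) are assumptions chosen for this example.

```python
def union_rows(tables, select_list):
    """Combine rows from several tables; a column absent from a table yields None (NULL)."""
    working_table = []
    for table_name, rows in tables.items():
        for row in rows:
            # dict.get returns None when the originating table lacks the column.
            projected = {column: row.get(column) for column in select_list}
            projected["BASETABLE"] = table_name  # stands in for TABLENAME() AS BASETABLE
            working_table.append(projected)
    return working_table

# Hypothetical tables: STDOCS has no TITLE column.
tables = {
    "STDEMO": [{"AUTHOR": "Smith", "TITLE": "Optical Systems"}],
    "STDOCS": [{"AUTHOR": "Jones"}],
}

for row in union_rows(tables, ["AUTHOR", "TITLE"]):
    print(row)
```

The row that comes from STDOCS carries None for TITLE, which mirrors the NULL value assumed for a column that does not exist in a given table.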

WHERE <search condition>

An optional clause that specifies the criteria for selecting indexed rows from the table in the form of a condition that is 'TRUE' for each row selected for the working table. The search condition can be a simple test (predicate) or a combination of several tests. All rows for which the search condition is 'TRUE' are included in the working table.

Note: Omitting the WHERE clause results in all of the rows in a table being included in the working table.

For a complete description see the "WHERE Clause," later in this chapter.

ORDER BY <sort criteria>

An optional clause that causes the selected rows to appear in sorted order in the working table. The sort criteria specify the hierarchical list of columns that will influence the order of these rows.

If you omit this clause, the order of rows in the working table isn't defined. A SELECT statement without an ORDER BY clause produces a consistently ordered working table, if repeatedly executed on identical data. For a complete description see the "ORDER BY Clause," earlier in this chapter.

CAUTION: The default ordering can change between SearchServer releases. Therefore, if your application (or ExecSQL script) depends on the ordering of rows in a working table, you must always use the ORDER BY clause.

FOR UPDATE

This option declares that the SearchServer application reserves the right to update or delete any row in the working table, and therefore exclusive access is required when a row is retrieved. Updates can be performed even if this option isn't specified, but are more susceptible to failure. This option has no effect on the actual creation of the working table.

Note: To maximize sharing of data with other applications and users, use this option only if you intend to execute a positioned UPDATE or DELETE on a row.

Description

The select list specifies the subset of columns, and the WHERE clause specifies criteria that select the subset of data rows that form the working table. SearchServer never includes a directory row in a working table.

If this statement is performed on an immediate table, the working table will reflect the result of any DELETE, INSERT, or UPDATE statements issued since the last VALIDATE INDEX statement. If it is performed on a periodic table, the working table reflects the state of the table at the time of the last successful VALIDATE INDEX statement.

SearchServer issues an error message if a VALIDATE INDEX statement has never been executed successfully for a table created without the IMMEDIATE table parameter, or if the table contains no rows. These tables aren't searchable.

Examples

The following examples show how to formulate a SELECT statement from its simplest form to more complex statements that use clauses, predicates, Boolean operators, and functions:

SELECT * FROM SUPPORT

SELECT ALL PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'FILTER'

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'DOCUMENT% FILTER_'

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'DOCUMENT', 'FILTER_' WITHIN 10 CHARACTERS OF 'PROCESSING' IN_ORDER

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE PRIORITY >= 2

SELECT PROBLEM_NUMBER, LAST_MODIFIED, COMPANY
FROM SUPPORT
WHERE CURSOR SQL_CUR00001 WITHOUT CONTEXT

SELECT RELEVANCE() AS RANK, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG IS_ABOUT VEC1
ORDER BY PROBLEM_NUMBER, PRIORITY, COMPANY, RANK DESC

SELECT PROBLEM_NUMBER, SUBJECT
FROM SUPPORT
WHERE SUBJECT CONTAINS 'order of words' | 'text vector'

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR NOT CONTAINS 'PETER' & 'MARIE'

SELECT PROBLEM_NUMBER, PRIME_CONTACT, STATUS, CREATOR
FROM SUPPORT
WHERE (CREATOR CONTAINS 'PETER' OR STATUS CONTAINS 'CLOSED') AND PRIME_CONTACT CONTAINS 'MONTAG'

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS THESAURUS('DISK', WORD_SYNONYM)

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS THESAURUS('DISK', WORD_NARROW, 'narrow.fth')

For More Information

• about limiting the select criteria, see the "WHERE Clause" later in this chapter.

• about the RELEVANCE, TABLENAME, and TABLEQUALIFIER functions, see "RELEVANCE Function," "TABLENAME Function," and "TABLEQUALIFIER Function" in this chapter.

• about changing the relevance method, see the "SET Statements" later in this chapter.

• about updating or deleting a row, see Fulcrum SearchServer Data Preparation and Administration.

• about influencing the order in which rows are written to a working table, see the "ORDER BY Clause" earlier in this chapter.

SET Statements

Sets options for subsequently executed statements for the duration of a connection to a particular data source.

Syntax

SET BASEPATH <base path> |
SET CHARACTER_SET <character string literal> |
SET CHARACTER_VARIANT <character string literal> |
SET CHECK_TEXT_STATUS <character string literal> |
SET COLLATION_SEQUENCE <character string literal> |
SET FRAGMENTED <character string literal> |
SET IMMEDIATE <immediate parameter> |
SET INDEXDIR <index directory> |
SET MAX_EXEC_TIME <unsigned integer> |
SET MAX_SEARCH_ROWS <unsigned integer> |
SET NOLOCKING <nolocking parameter> |
SET NORMALIZATION <normalization parameter> |
SET POSITIONING_UNIT <character string literal> |
SET REFERENCE_SPOOLING <character string literal> |
SET RELEVANCE_METHOD <relevance method> |
SET SEARCH_MEMORY_SIZE <search memory parameter> |
SET SERVER_REPORT_TIME <unsigned integer> |
SET SHOW_MATCHES <character string literal> |
SET SHOW_SGR <character string literal> |
SET STOPFILE <stop filename> |
SET TERM_GENERATOR <character string literal> |
SET THESAURUS_NAME <character string literal> |
SET VECTOR_GENERATOR <character string literal> |
SET WILDCARD_OPT <wildcard optimization method> |
SET WORKDIR <work directory>

<base path> ::= <character string literal>

<immediate parameter> ::= {'FALSE' | 'TRUE'}

<index directory> ::= <character string literal>

<nolocking parameter> ::= {'FALSE' | 'TRUE'}

<normalization parameter> ::= {'DEFAULT' | 'EUROPA3' | 'ARABIC' | 'NONE' | 'ASIAN'}

<search memory parameter> ::= <unsigned integer>

<stop filename> ::= <character string literal>

<wildcard optimization method> ::= {'MINIMIZE_INDEX_OVERHEAD' | 'MINIMIZE_SEARCH_TIME' | 'NONE'}

<work directory> ::= <character string literal>

Keywords and Options

SET BASEPATH <base path>

Specifies the base directory location of the document files and directories. When SearchServer needs to know where a file or directory is located, it takes the filename and prepends the path specified by this SET statement (unless the file or directory name is fully qualified).

When you use this statement, you don't have to include the full path when specifying a file, provided the file is located in the base path location or anywhere relative to that location.
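The prepend-unless-fully-qualified rule can be sketched in a few lines. The Python example below is a simplified model only: it assumes POSIX-style paths and treats a leading slash as "fully qualified", whereas how SearchServer recognizes a fully qualified name on each platform is not described here. The function name resolve_document_path and the sample paths are hypothetical.

```python
import posixpath

def resolve_document_path(base_path, file_name):
    """Prepend base_path unless file_name is fully qualified or no base path is set."""
    if posixpath.isabs(file_name) or not base_path:
        return file_name
    return posixpath.join(base_path, file_name)

# Relative name: the base path is prepended.
print(resolve_document_path("/data/docs", "manuals/guide.txt"))
# Fully qualified name: the base path is ignored.
print(resolve_document_path("/data/docs", "/archive/old.txt"))
# Empty base path (the default): the name is used as given.
print(resolve_document_path("", "guide.txt"))
```

The third call models the default setting, in which no base path is specified and names are used exactly as supplied.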

Note: The base directory doesn't have to exist when the table is first created, but must exist when a row is inserted with a value supplied for the FT_SFNAME reserved column.

The value specified in this statement affects only subsequent CREATE TABLE statements, and CREATE SCHEMA statements that don't explicitly specify the BASEPATH table parameter in the CREATE TABLE clause. The default value is two quotation marks (' '), which specify no base path. The current setting can be determined from the BASEPATH server attribute in the SERVER_INFO system table.

SET CHARACTER_SET <character string literal>

Specifies the name of a character set used by the application. All character data returned by SearchServer, and all character data (such as SearchServer SQL statements) passed into SearchServer will be assumed to be in this character set.

For more information about the relationship between the character set, case normalization, and the internal character set, refer to Appendix A, "Utility Program Summary." For more information about the functional effect of this setting, refer to Fulcrum SearchServer Data Preparation and Administration and Fulcrum SearchServer C Developer's Guide.

The current setting can be determined from the CHARACTER_SET server attribute in the SERVER_INFO system table. The character set for your platform is set to the default for each connection. See Fulcrum SearchServer Getting Started for your platform for details about the default character set for your platform.

Note: The case of the literal in the SET CHARACTER_SET statement is significant. The literal must be uppercase.

SET CHARACTER_VARIANT <character string literal>

Specifies the filename of the character variant rules file. This file allows typographical variants of the same word to be treated as equivalent when searching. If you omit the filename extension, the standard extension (.FTL) is automatically included. You can also include a full pathname for the file.

If you omit the pathname, SearchServer searches for the rules file in the list of directories specified by the FULSEARCH server attribute. The rules file must be local to the table being searched. In a client/server situation, the rules file must be found on the same server node as the table.

The current setting can be determined from the CHARACTER_VARIANT server attribute in the SERVER_INFO system table. The default value is two quotation marks (' '), which specifies that the rules file is disabled. If this file can't be found, character variant generation is disabled without warning. If the file is found and can't be used, the search is terminated and an error is returned.

SET CHECK_TEXT_STATUS <character string literal>

Specifies if the timestamps of the external documents are checked against FT_MTIME before the data is retrieved. This statement can be used to improve retrieval performance for tables on CD-ROM or tables that are specified as read-only. In these cases, there is no need to check if the external document has been changed since the table was created and indexed because the external document can't be changed and the index can never be out of date.

The default value is 'TRUE', to have the timestamp checked on the external document. Alternatively, the value 'FALSE' turns off the timestamp check. The consequence of setting this value to 'FALSE' inappropriately is incorrect highlighting (incorrect positioning of match codes).

This statement has an effect only on the external document timestamp checking and not on timestamp checking of other columns. The timestamp of all other columns is always checked by SearchServer. If one of these columns has been updated (changing the timestamp for the entire row) since the last indexing operation, then SearchServer reports SQLSTATE 80972 "The data has changed since the index used to perform this search" when retrieving data. In this case, this error message is reported when positioning to the row and retrieving data from any column of the row including the external text column, by default FT_TEXT.

The current setting can be determined from the CHECK_TEXT_STATUS server attribute in the SERVER_INFO system table.

SET COLLATION_SEQUENCE <character string literal>

Specifies the name of a collation function that is used to determine the ordering of character strings. The default collation sequence provides dictionary ordering for English and French text and for most other languages based on Latin 1. It is based on the Canadian Standards Association standard CAN/CSA-Z243.4. The default collation sequence also provides dictionary ordering for English text in all the supported application character sets. To set the default collation sequence, use one of the following methods:

• SET COLLATION_SEQUENCE 'dictionary_latin1'

• SET COLLATION_SEQUENCE 'default'

Note: The SET COLLATION_SEQUENCE statement does not return an error if the collation sequence is not found.

SearchServer supports custom collation sequences that might be required for a particular application. When a custom collation sequence is available, you can activate it by specifying its name in this statement. For more information about how to customize SearchServer in this way, see the Fulcrum SearchServer Customization Guide.

The current setting can be determined from the COLLATION_SEQUENCE server attribute in the SERVER_INFO system table.

Note: The case of the literal in the SET COLLATION_SEQUENCE statement is significant.

SET FRAGMENTED <character string literal>

Specifies that there might be non-contiguous repeated field numbers in a table. This statement is required only for data compatibility with tables that have been updated by software based on Fulcrum products other than SearchServer (for example, Fulcrum Ful/Text).

The default value is 'FALSE', which indicates that there are no non-contiguous repeated field numbers in the table. When searching and retrieving data, SearchServer can operate more efficiently if all field numbers are contiguous. In this case, the occurrence of a different field number signals the end of data for the column currently being retrieved. Otherwise, if the value is 'TRUE', SearchServer must read through all the data in the columns (except the external text column) in case there is a later disjoint instance of data belonging to the column.

SearchServer allows you to retrieve and update any columns in tables based on other Fulcrum products.

The current setting can be determined from the FRAGMENTED server attribute in the SERVER_INFO system table.

SET IMMEDIATE <immediate parameter>

Each table has a periodic index that is updated only when a VALIDATE INDEX statement is executed. The SET IMMEDIATE statement qualifies the table as an immediate table. When an immediate table is created, a differential index is created in addition to the standard periodic index.

The differential index is updated each time a change is made to the table, so it accumulates all changes made to the table since the last VALIDATE INDEX statement was performed. The periodic index doesn't reflect updates until a VALIDATE INDEX statement is executed. A table must be created with the IMMEDIATE value set to 'TRUE' to have a differential index created. The differential index allows a SELECT statement to match any row that has been inserted or updated, regardless of whether or when a VALIDATE INDEX statement has been performed.

The value specified in this statement affects only subsequent CREATE TABLE statements, and CREATE SCHEMA statements that don't specify IMMEDIATE or PERIODIC in the CREATE TABLE clause. The default value is 'TRUE', which specifies immediate indexing. The current setting can be determined from the IMMEDIATE server attribute in the SERVER_INFO system table.

SET INDEXDIR <index directory>

Specifies a directory that is used to contain the data and index files for the table. The default value is two quotation marks (''), which specifies that these files are placed in the default location. If this statement is specified, the directory must exist at the time the CREATE SCHEMA or CREATE TABLE statement is executed for immediate indexed tables. Otherwise, the table isn't created. For periodic tables, the directory must exist before the first VALIDATE INDEX statement is executed on the table.

The value specified in this statement affects only subsequent CREATE TABLE statements, and CREATE SCHEMA statements that don't explicitly specify the INDEXDIR table parameter in the CREATE TABLE clause. The current setting can be determined from the INDEXDIR server attribute in the SERVER_INFO system table.

SET MAX_EXEC_TIME <unsigned integer>

Specifies the maximum execution time, in milliseconds, for the VALIDATE INDEX, CREATE TEXT_VECTOR, searched UPDATE, searched DELETE, and SELECT statements. The numeric value that is specified can be rounded up to the nearest second, depending on the operating system.

Setting this value permits an application to stop these statements from executing for a long period of time. These are the only statements that are executed in stages, and the only statements that are affected by the SET MAX_EXEC_TIME statement.

The default value is 0, which allows SearchServer to execute for an unlimited amount of time. The current setting can be determined from the MAX_EXEC_TIME server attribute in the SERVER_INFO system table. You should use the default value when executing a searched UPDATE or DELETE statement.

SET MAX_SEARCH_ROWS <unsigned integer>

Specifies the maximum number of rows to be included in a working table. Setting this value permits an application to stop very general searches from returning too many rows. For example, you can limit the search so that it retrieves only 10 result rows.

The default value is 0, which allows SearchServer to return an unlimited number of rows. The current setting can be determined from the MAX_SEARCH_ROWS server attribute in the SERVER_INFO system table.

When the SELECT statement specifies that the rows in the working table should be ranked and ordered by relevance, setting MAX_SEARCH_ROWS reduces the time SearchServer requires to execute the SELECT statement.
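As a minimal sketch of the ranked case described above, the following statements limit the working table to 10 rows and then run a relevance-ordered search. The SUPPORT table and its columns are borrowed from the examples later in this chapter. The SCORE alias is hypothetical, and the use of RELEVANCE in the select list with an alias in the ORDER BY clause and the DESC keyword are assumptions; check the "RELEVANCE Function" and "ORDER BY Clause" sections for the exact forms.

```sql
SET MAX_SEARCH_ROWS 10

SELECT RELEVANCE('2:4') AS SCORE, PROBLEM_NUMBER, COMPANY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT'
ORDER BY SCORE DESC
```

Setting MAX_SEARCH_ROWS back to 0 afterward restores the default of an unlimited number of rows.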

SET NOLOCKING <nolocking parameter>

This statement enables or disables the use of the standard row locking protocol on tables created by this session. If NOLOCKING is set to 'TRUE', row locking is not used. If it is set to 'FALSE', row locking is used.

When row locking is not used, access to a row is never denied because another application is using or modifying it, but the integrity of the data in the table is not guaranteed if updates are being performed by another application. Tables that do not use row locking must be updated by only one application at a time; otherwise, their data may be corrupted.

When locking is used, an application is refused access to a row for retrieval or update if another application is currently retrieving or updating it. This is called a locking conflict. The duration of the lock used during a normal retrieval is very short, and in most environments it does not preclude simultaneous retrieval by other applications. Locks applied explicitly, or as a result of retrieval from a SELECT statement specifying the FOR UPDATE option, persist until the locking application removes them, and they prevent both retrieval and update.

SET NORMALIZATION <normalization parameter>

Specifies the case normalization strategy used when indexing the data in the table. Normalization is always performed on the data after it has been translated into FTICS.

The following list describes the normalization parameters available:

ASIAN Case normalization is restricted to the 26 lowercase letters in the 7-bit subset of the FTCS94 table (mapped to the 26 uppercase letters).

DEFAULT Lowercase letters are mapped onto uppercase letters. Accented characters are mapped onto the unaccented uppercase letter. Characters xF1, xF2, xF4, xF6 through xFA, and xFC through xFE are mapped onto the corresponding character in column E. This case normalization strategy is based on the FTCS94 character set table.

EUROPA3 Case normalization mapping is done according to the translation table provided for the Europa3 character set. This case normalization strategy is based on the EFTCS94 character set table.

ARABIC Case normalization mapping is done according to the translation table provided for the ARABIC character set. This case normalization strategy is based on the AFTCS94 character set table.

NONE No case normalization mapping is done.

The value specified in this statement affects only subsequent CREATE TABLE statements, and CREATE SCHEMA statements that don't explicitly specify the NORMALIZATION table parameter in the CREATE TABLE clause. The default value is 'DEFAULT'. The current setting can be determined from the NORMALIZATION server attribute in the SERVER_INFO system table.

SET POSITIONING_UNIT <character string literal>

Specifies the unit to be used for positioning. If the value is set to PAGE, the current position is set to the beginning of the specified document page. The default value is DEFAULT_POSITIONING. For more information about STREAM mode or page navigation, see the Fulcrum SearchServer or Fulcrum SearchBuilder Developer's Guide for your environment.

SET REFERENCE_SPOOLING <character string literal>

Enables and disables reference file spooling when performing searches on a CD-ROM. Reference file spooling improves search time when accessing a table on a CD-ROM. It cannot be used on an IMMEDIATE table.

Because a CD-ROM has comparatively long seek times, its search speed can be improved by making a temporary copy on a high-speed disk of some of the index information needed for the search.

If the value is set to 'TRUE', all searches use reference file spooling, if appropriate. Reference spooling only occurs when more than one reference string is too long to fit in memory. That is, it only occurs if it can increase the speed of a search. If the value is set to 'FALSE', no searches use reference file spooling. The default value is 'FALSE'.

The current setting can be determined from the REFERENCE_SPOOLING server attribute in the SERVER_INFO system table.
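As a hedged illustration, a session that searches only a periodic, read-only table on CD-ROM might combine this setting with SET CHECK_TEXT_STATUS, described earlier in this section. Both values come from the descriptions above; whether the combination helps depends on the table and the drive, so treat this as a sketch, not a recommended configuration.

```sql
SET CHECK_TEXT_STATUS 'FALSE'
SET REFERENCE_SPOOLING 'TRUE'
```

Restore CHECK_TEXT_STATUS to 'TRUE' before the same session retrieves from a table whose external documents can change, because an inappropriate 'FALSE' setting leads to incorrect highlighting.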

SET RELEVANCE_METHOD <relevance method>

Specifies a character string that determines the retrieval model and relevance algorithm to be used in all subsequent query specifications. The syntax and semantics of the relevance method string are described in the section, "RELEVANCE Function," earlier in this chapter.

The relevance method applies to any subsequent SELECT statements that use the RELEVANCE function with no parameter. The initial value is an empty string (''), which indicates that no relevance method is enabled. The current setting can be determined from the RELEVANCE_METHOD server attribute in the SERVER_INFO system table.

The relevance method set by this statement can be temporarily superseded for any specific query by specifying an argument for the RELEVANCE function in the SELECT statement.

SET SEARCH_MEMORY_SIZE <search memory parameter>

Specifies the maximum search memory size (in kilobytes) to be used to construct a search. Setting this value allows you to successfully execute large searches, limited only by available system resources.

In some cases, searches with a large number of terms, or searches that generate a large number of terms through wildcards or thesaurus lookup, can return SQLSTATE "Search too general." Setting this value to a higher number can alleviate this problem. In CD-ROM environments, increasing this value can improve search performance.

The default value in a 16-bit Windows environment is 63K, which is the maximum in that environment. For all other platforms, the default value is 512K. The current setting can be determined from the SEARCH_MEMORY_SIZE server attribute in the SERVER_INFO system table.

Note: This statement is useful only when searching tables that don't reside on Windows systems.

SET SERVER_REPORT_TIME <unsigned integer>

Specifies the maximum interval (in milliseconds) during which a SearchServer API function is permitted to execute before returning control to the application. The numeric value that is specified can be rounded up to the nearest second, depending on the operating system.

When a SELECT statement includes the optional ORDER BY clause, SearchServer could ignore the SERVER_REPORT_TIME value at times during the execution of the SELECT statement.

The default value is 1, which allows SearchServer to return with SQL_STILL_EXECUTING if the asynchronous execution option has been specified. Otherwise, the SERVER_REPORT_TIME is ignored. The current setting can be determined from the SERVER_REPORT_TIME server attribute of the SERVER_INFO system table.

SET SHOW_MATCHES <character string literal>

Specifies whether match codes are inserted into the data returned from a search to indicate the location of the matched words. The codes surround each individual matched word.

The character string value determines whether matches are marked with match codes in the data returned in the working table after a SET SHOW_MATCHES statement is executed. The following is a list of all valid values for the character string literal:

'TRUE' match codes are inserted for all columns

'FALSE' match codes are not inserted for any column

'DEFAULT' match codes are not inserted for any column

'INTERNAL_COLUMNS' match codes are inserted for all internal columns and aren't inserted for the external text column

'EXTERNAL_COLUMN' match codes are inserted for the external text column and aren't inserted for internal columns

The current setting can be determined from the SHOW_MATCHES server attribute in the SERVER_INFO system table. The actual match codes used can be determined from the MATCH_CODE_START and MATCH_CODE_END server attributes in the SERVER_INFO table.

Note: There is a limit on the number of match code pairs that can be inserted into each row. The limit is environment dependent, but it is not less than 8192. When the limit is exceeded for a row, SearchServer stops inserting match codes, for that row only, without informing the application. Where this occurs in a row depends on the order of columns in the table and the number of matches in each column.

SET SHOW_SGR <character string literal>

Specifies whether control and escape sequences are inserted into the data returned from a search. The following is a list of all valid values for the character string literal:

'TRUE' control and escape sequences are inserted for all columns

'FALSE' control and escape sequences are not inserted for any column

'DEFAULT' control and escape sequences are not inserted for any column

'INTERNAL_COLUMNS' control and escape sequences are inserted for all internal columns and aren't inserted for the external text column

'EXTERNAL_COLUMN' control and escape sequences are inserted for the external text column and aren't inserted for internal columns

The current setting can be determined from the SHOW_SGR server attribute in the SERVER_INFO system table. The insertion of match code sequences is not affected by SHOW_SGR.

SET STOPFILE <stop filename>

Specifies an operating system file that contains a list of words not to be indexed (stop words). Stop words can't be found by a SELECT statement if they are in a column defined with NORMAL index mode.

The stop file is assumed to be in the directory where the table configuration is created unless the stop filename is a fully qualified pathname. The default value, two quotation marks (''), specifies that no stop file is used. SearchServer provides a stop file called FULTEXT.STP, which can be used by explicitly specifying it in this parameter.

If this statement is specified, the stop file must exist at the time the CREATE SCHEMA or CREATE TABLE statement is executed for immediate indexed tables. Otherwise, the table isn't created. For periodic tables, the stop file must exist before the first VALIDATE INDEX statement is executed on the table.

The value specified in this statement affects only subsequent CREATE TABLE statements, and CREATE SCHEMA statements that don't explicitly specify the STOPFILE table parameter in the CREATE TABLE clause. The current setting can be determined from the STOPFILE server attribute in the SERVER_INFO system table.
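Because SET IMMEDIATE, SET NORMALIZATION, and SET STOPFILE all act as defaults for later table creation, an application might issue them together before creating tables. The following is a sketch only: the values are the ones named in the descriptions above, and the quoting of each literal follows the way the defaults are written in this section, which is an assumption about the exact parameter syntax.

```sql
SET IMMEDIATE 'TRUE'
SET NORMALIZATION 'DEFAULT'
SET STOPFILE 'FULTEXT.STP'
```

Any CREATE TABLE statement executed afterward in the session picks up these values unless it specifies the corresponding table parameter explicitly. With immediate indexing in effect, the stop file must already exist when the table is created.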

SET TERM_GENERATOR <character string literal>

Specifies the type of linguistic processing to be performed on all search terms in the contains or like predicates of subsequent SELECT statements for the current connection. The character string literal has a form similar to the linguistic filter specification of the THESAURUS function, but can contain only one option. For example, if the following statement is executed:

SET TERM_GENERATOR 'WORD!FTELP/INFLECT'

all search terms specified in a contains or like predicate are expanded to include the inflected forms of the term. In this case the following WHERE clause:

WHERE FT_TEXT CONTAINS 'WORD'

expands the search to include "word," "words," "word's," "worded," and "wording."

The current setting can be determined from the TERM_GENERATOR server attribute in the SERVER_INFO system table. The initial value, an empty string (''), specifies that TERM_GENERATOR linguistic processing is disabled. The setting 'DEFAULT' is equivalent to 'WORD!FTELP/INFLECT', but note that 'DEFAULT' is not the initial setting, which is the empty string.

If this statement is specified, linguistic processing is performed on search terms in subsequent SELECT statements that specify the like predicate.
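Since 'DEFAULT' and the initial setting differ, it can help to see both side by side. In this short sketch, the first statement turns on inflection through the 'DEFAULT' shorthand and the second returns the connection to its initial state, in which TERM_GENERATOR processing is disabled. Both literals are taken from the description above.

```sql
SET TERM_GENERATOR 'DEFAULT'

SET TERM_GENERATOR ''
```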

SET THESAURUS_NAME <character string literal>

Specifies the filename of the default thesaurus file. If you omit the filename extension, the standard extension (.FTH) is included automatically. You can also include a full pathname for the file.

If you omit the pathname, SearchServer searches for the thesaurus rules file in the list of directories specified by the FULSEARCH server attribute. The thesaurus file must be local to the table being searched. In a client/server environment, the rules file must be found on the same server node as the table.

The default thesaurus name applies to an instance of the THESAURUS function that omits the thesaurus name parameter.

The current thesaurus file can be determined from the THESAURUS_NAME server attribute in the SERVER_INFO system table. The initial value, two quotation marks (''), specifies that the thesaurus is disabled. If a thesaurus file has been specified, you can remove it by setting this value to the default (''), or by terminating the session. If this file can't be found, thesaurus lookup is disabled without warning. If the file is found but can't be used, the search is terminated and an error is returned.

SET VECTOR_GENERATOR <character string literal>

Specifies the type of linguistic processing to be performed on the search terms both before and after Intuitive Searching processing for a connection. The linguistic processing pre-scans the terms being submitted to Intuitive Searching to reduce them to a base form, does the statistical ranking, and then expands the significant terms selected by using the specified expansion options. If this statement is executed, the search terms of each SELECT statement in the connection that contains an is_about predicate are affected by the linguistic processing specified.

The character string literal has one of the following forms:

'<linguistic specification>'

'<linguistic specification>|*|<linguistic specification>'

In the first form, no reduction to base form is done. In the second form, the Intuitive Searching processing step is represented by the placeholder "*". Since the left part of this specification directs the reduction to base form, this specification must result in a single term for each term in the sample text. To ensure this, the "single" ftelp option must be used. Thesaurus expansion (ftiet) should not be used. The only ftelp options that are valid in this case are "base" and "root."

The current setting can be determined from the VECTOR_GENERATOR server attribute in the SERVER_INFO system table. The default value is

'WORD!FTELP/BASE/SINGLE|*|WORD!FTELP/INFLECT'

This specifies that all terms processed through the is_about predicate are reduced to a single base form before being converted to a text vector (that is, a list of unique terms and counts) by basic Intuitive Search processing. The resultant terms are then expanded by the linguistic filter to include all inflected forms.

To reset the value to the SearchServer default, execute the following statement:

SET VECTOR_GENERATOR 'default'

To disable VECTOR_GENERATOR linguistic processing, set the value to an empty string ('').

It is assumed that exactly one Intuitive Search process, defined by the is_about predicate parameters, is required for any given search. If the "*" placeholder is not present in the query specification, the Intuitive Search process precedes any linguistic filter specifications. For example, setting VECTOR_GENERATOR to 'WORD!FTELP/INFLECT' performs the Intuitive Search process first, followed by inflection.
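To illustrate the second form, the following sketch varies the default value by reducing terms to their root instead of their base form before Intuitive Search processing. It uses only options that the text above names as valid on the left of the placeholder ("root" together with "single"). Whether root reduction suits a particular collection is not something this guide states, so treat the choice as an assumption.

```sql
SET VECTOR_GENERATOR 'WORD!FTELP/ROOT/SINGLE|*|WORD!FTELP/INFLECT'
```

The part to the left of "|*|" runs before the Intuitive Search step, and the part to the right expands the significant terms afterward.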

SET WILDCARD_OPT <wildcard optimization method>

Specifies the type of wildcard optimization to be enabled for the table. There are three wildcard optimization methods:

MINIMIZE_INDEX_OVERHEAD This method minimizes indexing time and space. Performance for some prefix and infix wildcard searches is reduced as compared to the MINIMIZE_SEARCH_TIME method. The MINIMIZE_INDEX_OVERHEAD option gives nearly as good search performance as MINIMIZE_SEARCH_TIME (except for complex searches on CD-ROM) with little more indexing time and storage overhead than NONE.

MINIMIZE_SEARCH_TIME This method maximizes search performance. Indexing time is increased and the space required for the index is doubled. If space permits, this method is preferred for tables located on slower mass-storage devices, such as CD-ROMs.

NONE No wildcard optimization is enabled for the table. Performance for prefix and infix wildcard searches is substantially reduced.

The default is NONE. The value specified in this statement affects only subsequent CREATE TABLE statements, and CREATE SCHEMA statements that don't explicitly specify the WILDCARD_OPT table parameter in the CREATE TABLE clause. The current setting can be determined from the WILDCARD_OPT server attribute in the SERVER_INFO system table.

SET WORKDIR <work directory>

Specifies a work directory to be used for temporary files that might be required to accommodate buffer overflow. You must ensure that the specified directory is large enough to accommodate indexing. During the execution of a VALIDATE INDEX statement, the index files are built in the work directory. Once the VALIDATE INDEX statement has completed executing, the index files are moved into the index directory.

If this statement is specified, the directory must exist at the time the CREATE SCHEMA or CREATE TABLE statement is executed for immediate indexed tables. Otherwise, the table isn't created. For periodic tables, the directory must exist before the first VALIDATE INDEX statement is executed on the table.

The default value is 'DEFAULT', which specifies the default location for temporary files that is recorded in the FULTEMP server attribute.

The value specified in this statement affects only subsequent CREATE TABLE statements, and CREATE SCHEMA statements that don't explicitly specify the WORKDIR table parameter in the CREATE TABLE clause. The current setting can be determined from the WORKDIR server attribute in the SERVER_INFO system table.

Description

Many of the parameters in these statements directly or indirectly affect the results of a SELECT statement. The parameters in the SET MAX_EXEC_TIME, SET MAX_SEARCH_ROWS, and SET SERVER_REPORT_TIME statements also affect other statements. The state of each parameter is recorded in the corresponding row of the SERVER_INFO system table.

Examples

The following example sets the maximum time limit to 500 milliseconds for certain SearchSQL statements to execute before execution is halted:

SET MAX_EXEC_TIME 500

The next example sets the maximum number of rows in a working table to 10:

SET MAX_SEARCH_ROWS 10

The next example sets the retrieval model to strict Boolean and the relevance method to algorithm 4:

SET RELEVANCE_METHOD '2:4'

The next example sets the maximum time limit to 300 milliseconds for the SQLExecDirect API function, using the asynchronous option, to execute before returning control to the application:

SET SERVER_REPORT_TIME 300

The next example sets the data returned from a SELECT statement to contain match codes:

SET SHOW_MATCHES 'TRUE'

The final example sets the thesaurus name to be used in subsequent SELECT statements that specify the THESAURUS function without a thesaurus name parameter:

SET THESAURUS_NAME 'SUPPORT.FTH'

For More Information

• about literals, see "Literals," in Chapter 3, "SearchSQL Language Elements."

• about system tables, see Chapter 5, "System Information Tables."

• about relevance methods, see the "RELEVANCE Function," earlier in this chapter.

• about thesaurus usage, see the "THESAURUS Function," later in this chapter.

• about character variant usage, see Fulcrum SearchServer Data Preparation and Administration.

• about formatting codes, see Fulcrum SearchServer Data Preparation and Administration.

• about application character sets, see Fulcrum SearchServer Data Preparation and Administration.

• about the functional effect of the SET COLLATION_SEQUENCE setting, see the "ORDER BY Clause," earlier in this chapter.

• about statement and connection options, see the Fulcrum SearchServer or Fulcrum SearchBuilder Developer's Guide for your environment.

TABLENAME Function

Returns the table name of the current row.

Syntax

TABLENAME()

Description

This function determines the name of the table that created this row in the working table. It is useful when you have used a UNION clause to search more than one table.

This function can be used only in the select list of a SELECT statement. To order the result list by table names, you must use an alias in the select list and use the alias in the ORDER BY clause.

The data type of the value returned by this function is VARCHAR(128).

Example

The following example retrieves the name of the table for each row retrieved (assuming that the ARCHIVE table has a schema identical to the SUPPORT table schema):

SELECT TABLENAME() AS BASETABLE, PROBLEM_NUMBER, COMPANY, LAST_MODIFIED
FROM SUPPORT UNION ARCHIVE
WHERE TEXT_LOG CONTAINS 'STATEMENT'
ORDER BY BASETABLE

For More Information

• about how to use the TABLENAME function, see Chapter 2, "The Search Process."

• about ordering the result list by table names, see the "ORDER BY Clause" earlier in this chapter.

TABLEQUALIFIER Function

Returns either a node name where the table resides or the pathname to the table of the current row.

Syntax

TABLEQUALIFIER()

Description

This function determines the node name of a remote table or the absolute path of a local table that created this row in the working table. It is useful when you have used a UNION clause to search more than one table. It should be used in conjunction with the TABLENAME function to form a fully qualified table name in the working table. This is useful when you have identically named tables in the UNION clause of a SELECT statement.

The TABLEQUALIFIER function can be used only in the select list of a SELECT statement. To order the result list by table names, you must use an alias in the select list and use the alias in the ORDER BY clause.

The TABLEQUALIFIER column in the working table is case-sensitive. This means that the node name is always returned as uppercase, but the pathname remains unchanged.

The data type of the value returned by this function is VARCHAR.

Example

The following example retrieves the name of the table and its node name or absolute path for each row retrieved:

SELECT TABLEQUALIFIER(), TABLENAME(), PROBLEM_NUMBER, COMPANY, TEXT_LOG
FROM SUPPORT UNION ARCHIVE_SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT'
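The description above notes that ordering requires an alias in the select list. As a sketch of that rule, the following variant of the example names both function columns and orders by them. The aliases QUALIFIER and BASETABLE are hypothetical names chosen for illustration, and ordering by two aliases in one ORDER BY clause is an assumption; see the "ORDER BY Clause" section for the supported forms.

```sql
SELECT TABLEQUALIFIER() AS QUALIFIER, TABLENAME() AS BASETABLE, PROBLEM_NUMBER, COMPANY
FROM SUPPORT UNION ARCHIVE_SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT'
ORDER BY QUALIFIER, BASETABLE
```

Grouping rows this way keeps identically named tables from different nodes or paths apart in the result list.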

For More Information

• about how to use the TABLEQUALIFIER function, see Chapter 2, "The Search Process."

• about ordering the result list by table names, see the "ORDER BY Clause" earlier in this chapter.

THESAURUS Function

Specifies that a word or phrase be expanded using a customizable thesaurus.

Syntax

<thesaurus predicate function> ::= THESAURUS (<thesaurus string>, <thesaurus operator> [, <thesaurus name>]) | THESAURUS (<thesaurus string>, <linguistic operator> [, <linguistic specification>])

<thesaurus string> ::= <character string literal>

<thesaurus operator> ::= WORD_SYNONYM | WORD_SUFFIX | WORD_SIMILARITY | WORD_BROADEN | WORD_NARROW

<thesaurus name> ::= <character string literal>

<linguistic operator> ::= WORD_MODIFY

<linguistic specification> ::= <linguistic filter list>

<linguistic filter list> ::= <linguistic filter specification> [{<pipe separator> <linguistic filter specification>}]

<pipe separator> ::= |

<linguistic filter specification> ::= <linguistic rules filter> { '/' <linguistic rules option> } | <international thesaurus filter> { '/' <international thesaurus option> }

<linguistic rules filter> ::= 'word!ftelp'

<international thesaurus filter> ::= 'word!ftiet'

<linguistic rules option> ::= 'base' | 'compound' | 'derive' | 'expand' | 'inflect' | 'lang=' <linguistic rules language> | 'root' | 'single' | 'spell' [ '='<unsigned integer> ]

<linguistic rules language> ::= 'english' | 'french' | 'dutch' | 'german' | 'italian' | 'spanish' | 'swedish' | 'portuguese'

<international thesaurus option> ::= 'inflect' | 'lang=' <international thesaurus language> | 'limit=' <unsigned integer> | 'recap' | 'spell' [ '=' <unsigned integer> ]

<international thesaurus language> ::= 'usenglish' | 'ukenglish' | 'danish' | 'dutch' | 'finnish' | 'french' | 'german' | 'italian' | 'norwegian' | 'spanish' | 'swedish' | 'brportuguese' | 'euportuguese'

Keywords and Options

<thesaurus string>

The thesaurus string can be a simple word, compound word, or word sequence. Words that contain string wildcards are valid in the search, but the thesaurus processing is bypassed. Stop words are ignored.

<thesaurus operator>

Specifies the type of expansion to be performed.

WORD_SYNONYM

A keyword that expands the word or phrase to include equivalent words before processing the predicate. SearchServer generates a word list containing all the equivalent words. It then expands the search to include all the words in the word list.

WORD_SUFFIX

A keyword that expands the word or phrase to include its plural and possessive forms before processing the predicate.

WORD_SIMILARITY

A keyword that expands the word or phrase to include equivalent words, or plural and possessive forms. This option gives the synonym processing priority over the suffix processing. If there is a synonym match, there is no further search for an additional suffix match. However, if there is no synonym match, then suffix processing is performed.

WORD_BROADEN

A keyword that is equivalent to the WORD_SYNONYM keyword. It can be used to clarify the use of the optional thesaurus file.

WORD_NARROW

A keyword that is equivalent to the WORD_SYNONYM keyword. It can be used to clarify the use of the optional thesaurus file.

<thesaurus name>

The thesaurus name can be specified as an optional third parameter. This allows you to attach a specific thesaurus to each instance of the search. This thesaurus file is used only for this instance and doesn't change the THESAURUS_NAME server attribute in the SERVER_INFO system table.

If this parameter is omitted, the default thesaurus is used, as specified in the last SET THESAURUS_NAME statement executed. If no thesaurus name is set, thesaurus processing is bypassed.

WORD_MODIFY

A keyword that specifies that the search term be expanded by using linguistic processing.

<linguistic rules filter>

Specifies the name of the linguistic rules filter to be used for linguistic processing. The linguistic rules filter must also be an ELP entry in the FULTEXT.FTC file (for example, FTELP).

Note: Literal elements are case sensitive, and must be entered in lowercase. Spaces are not permitted, except surrounding a pipe separator.

<international thesaurus filter>

Specifies the name of the international thesaurus filter to be used for linguistic processing. The international thesaurus filter must also be an ELP entry in the FULTEXT.FTC file (for example, FTIET).

BASE

Includes the uninflected baseform of the search term in the list of terms.

COMPOUND

Performs pre-processing normalization of the search term that can generate additional search terms that are the components of a compound word. This option is only valid for German, Swedish, and Dutch.

DERIVE

Includes derivations of the search term in the list of terms. SearchServer first uninflects the search term before it performs the derivation. If the INFLECT option is also specified, inflected forms for the derived search terms are included. This option is only valid for English.

EXPAND

Performs pre-processing normalization of the search term that can generate additional search terms by one of the following methods:

• the removal of punctuation such as a slash, hyphen, or parenthesis.

• the removal of periods separating single uppercase letters (assumed to be an abbreviation or acronym)

• the removal of clitics (only valid for French, Italian, Portuguese, and Spanish)

• for French only, the character conversion of 'ae' and 'oe' to their single-character ligature equivalents

• for German only, the character conversion of 'ae', 'oe', and 'ue' to 'a', 'o', and 'u' with a diacritical mark

INFLECT

When specifying the linguistic rules filter, this option includes inflected forms of the search term. In this case, the baseform of the search term is changed to indicate number, person, mood, or tense.

When specifying the international thesaurus filter, this option includes inflected forms of the synonyms to match the inflection of the search term.

LANG

Overrides the language specified in the dynamic library table. When specifying the linguistic rules language, the default language is English. For the international thesaurus language, the default language is US English. All languages are only valid if you installed the SearchServer Options.

LIMIT

Specifies the maximum number of synonyms for the search term in the search result.

RECAP

Recapitalizes the returned synonyms. If the search term contains an initial uppercase letter, all uppercase letters, or all lowercase letters, the synonyms returned match the capitalization of the search term. If this option is not specified, all synonyms are returned in lowercase letters.
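
The capitalization rule described for RECAP can be modeled outside SearchServer. The following Python sketch is illustrative only: the function name recapitalize is hypothetical, and the handling of a search term with mixed capitalization is an assumption, because the description above does not cover that case.

```python
def recapitalize(search_term: str, synonym: str) -> str:
    """Match a synonym's capitalization to that of the search term."""
    if search_term.isupper():
        # All uppercase letters: DISK -> DRIVE
        return synonym.upper()
    if search_term.islower():
        # All lowercase letters: disk -> drive
        return synonym.lower()
    if search_term[:1].isupper() and search_term[1:].islower():
        # Initial uppercase letter only: Disk -> Drive
        return synonym[:1].upper() + synonym[1:].lower()
    # Assumption: any other pattern falls back to lowercase, which is what
    # the text says is returned when RECAP is not specified.
    return synonym.lower()
```

For example, recapitalize('Disk', 'drive') gives 'Drive', and recapitalize('DISK', 'drive') gives 'DRIVE'.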

ROOT

Includes the derivational root form of the search term in the list of terms. SearchServer first uninflects the search term and finds the root of the uninflected term. Both the uninflected term and the root are added to the list of terms. For example, if the search term WARRANTIES is specified and this option is used, the search term list will contain the following terms:

WARRANTIES
WARRANTY
WARRANT

SINGLE

Selects a single term that includes all specified linguistic processing. This option is most useful when used in combination with the "base" and "root" options in a SET VECTOR_GENERATOR statement.

SPELL

Returns suggested words if the search term cannot be found. In this case, SearchServer assumes the search term is spelled incorrectly. If you include a number with this option, the number limits the maximum number of suggested words returned. Otherwise, all suggested words are returned. The suggested words are processed according to the other options specified, and all resulting terms are returned.

Description

The THESAURUS function can appear anywhere in a contains predicate where a pattern can be used. The function searches the thesaurus file to locate words and phrases specified in the thesaurus string. If there is a match, SearchServer expands the query to include all the related words and phrases. Otherwise, the original terms of the thesaurus string are used directly in the query.

SearchServer performs case normalization, as appropriate for the table being searched. This applies both to the thesaurus string to be looked up in the thesaurus, and to the related terms generated by the thesaurus.

When there is more than one linguistic filter specification in a linguistic filter list, each filter accepts as input the output of the filter to its left in the list. (Note that this order is the opposite of that for text reader lists.)

For example, to apply international thesaurus expansion followed by expansion to all derived forms, the filter list would be:

'word!ftiet | word!ftelp/derive'

Note: This type of multiple expansion can produce very large queries that can take a long time to complete. Therefore, you should only use them when simpler forms don't produce the required results.
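
An application that assembles THESAURUS predicates as strings can check them against the syntax above before sending them to SearchServer. The following Python sketch is a minimal illustration, not part of the product: the helper names (quote, filter_spec, thesaurus_predicate, word_modify_predicate) are hypothetical, and doubling an embedded single quote is an assumed escaping rule that this section does not confirm.

```python
THESAURUS_OPERATORS = {
    "WORD_SYNONYM", "WORD_SUFFIX", "WORD_SIMILARITY",
    "WORD_BROADEN", "WORD_NARROW",
}
LINGUISTIC_FILTERS = {"word!ftelp", "word!ftiet"}


def quote(text):
    """Wrap text in single quotes (assumption: embedded quotes are doubled)."""
    return "'" + text.replace("'", "''") + "'"


def filter_spec(filter_name, *options):
    """Build one linguistic filter specification, e.g. word!ftelp/derive."""
    if filter_name not in LINGUISTIC_FILTERS:
        raise ValueError("unknown linguistic filter: " + filter_name)
    for option in options:
        # Literal elements must be lowercase and must not contain spaces.
        if option != option.lower() or any(ch.isspace() for ch in option):
            raise ValueError("invalid filter option: " + option)
    return "/".join((filter_name,) + options)


def thesaurus_predicate(term, operator, thesaurus_name=None):
    """Build THESAURUS (<string>, <thesaurus operator> [, <thesaurus name>])."""
    if operator not in THESAURUS_OPERATORS:
        raise ValueError("unknown thesaurus operator: " + operator)
    args = [quote(term), operator]
    if thesaurus_name is not None:
        args.append(quote(thesaurus_name))
    return "THESAURUS (" + ", ".join(args) + ")"


def word_modify_predicate(term, *filter_specs):
    """Build THESAURUS (<string>, WORD_MODIFY [, <linguistic filter list>])."""
    args = [quote(term), "WORD_MODIFY"]
    if filter_specs:
        # Filters are applied left to right; spaces surround the pipe only.
        args.append(quote(" | ".join(filter_specs)))
    return "THESAURUS (" + ", ".join(args) + ")"
```

Calling word_modify_predicate('DISK', filter_spec('word!ftiet'), filter_spec('word!ftelp', 'derive')) produces a predicate whose filter list matches the one shown above.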

Examples

The following example uses the WORD_SYNONYM option of the THESAURUS function to include equivalent words before processing the predicate. The thesaurus file being used is defined in the SERVER_INFO system table, which was set by a previously executed SET THESAURUS_NAME statement.

SELECT * FROM SUPPORT WHERE TEXT_LOG CONTAINS THESAURUS ('DISK', WORD_SYNONYM)

This example uses the WORD_BROADEN option of the THESAURUS function and explicitly specifies a thesaurus file for the word 'DISK':

SELECT * FROM SUPPORT WHERE TEXT_LOG CONTAINS THESAURUS ('DISK', WORD_BROADEN, 'BROADEN.FTH')

The following example uses the linguistic rules filter and specifies the INFLECT option for the word 'DISK':

SELECT * FROM SUPPORT WHERE TEXT_LOG CONTAINS THESAURUS ('DISK', WORD_MODIFY, 'word!ftelp/inflect')

The following example uses the international thesaurus filter for the word 'DISK':

SELECT * FROM SUPPORT WHERE TEXT_LOG CONTAINS THESAURUS ('DISK', WORD_MODIFY, 'word!ftiet')

The following example combines the international thesaurus filter with the linguistic rules filter for the word 'DISK'. The word is first expanded through the international thesaurus filter, and the resulting synonyms are then piped through the linguistic rules filter to be inflected:

SELECT * FROM SUPPORT WHERE TEXT_LOG CONTAINS THESAURUS ('DISK', WORD_MODIFY, 'word!ftiet | word!ftelp/inflect')

For More Information

• about how to build a thesaurus file, see Fulcrum SearchServer Data Preparation and Administration.

• about the default thesaurus, see "SET Statements," earlier in this chapter.

• about the SERVER_INFO system table, see Chapter 5, "System Information Tables."

UNPROTECT TABLE Statement

Allows a table to be indexed or dropped, or its schema to be altered or replaced.

Syntax

UNPROTECT TABLE <table name>

Keywords and Options

<table name>

A compound identifier that specifies the name and location of the table to be unprotected. For a complete description of how a table name is formed, see Chapter 3, "SearchSQL Language Elements."

Description

An application can determine whether or not a table is protected by viewing the FTT_PROTECTED column value in the TABLES system table. The protection status is 'TRUE' for protected or locked tables and 'FALSE' for unprotected or unlocked tables.

Indexing failure leaves the table protected. Use the UNPROTECT statement to remove its protection.

Examples

The following example unprotects the SUPPORT table:

UNPROTECT TABLE SUPPORT

This example unprotects the SUPPORT table on the FISH node:

UNPROTECT TABLE FISH.SUPPORT

For More Information

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about how a table name is formed, see Fulcrum SearchServer Data Preparation and Administration.

UPDATE Statement

Changes the data in one or more rows of a table. There are two forms of the UPDATE statement: positioned and searched.

Positioned Syntax

UPDATE <table name> SET <set clause item>[{, <set clause item>}...] WHERE CURRENT OF <cursor name>

<set clause item> ::= <column name> = {<literal> | NULL}

Searched Syntax

UPDATE <table name> SET <set clause item>[{, <set clause item>}...] [WHERE <search condition>]

<set clause item> ::= <column identifier> = {<literal> | NULL}

CAUTION: SearchServer doesn't support transaction rollback, and updated rows can't be restored to their previous values.

Keywords and Options

<table name>

A compound identifier that specifies the name and location of the table that contains the rows being updated.

SET

Introduces the clause that identifies each column whose value in the selected rows will be updated. Any column not specified isn't affected.

<column identifier>

Specifies the name of an existing updatable column in the table. For positioned updates, the column you specify must be present in the working table. This implies that the column must have been specified in the select list of the previously executed SELECT statement.

Also, the order of the rows in the working table must match the exact order of the rows in the table from which they were derived. This means that the SELECT statement you used to create the working table should not have re-ordered the rows with the ORDER BY clause.

If the column you specify here was affected by rows being re-ordered before being written to the working table, then the update request won't reference the correct rows in the table. You won't be able to use it to reference a row for a positioned update operation. If any column specified in the optional ORDER BY clause of the previously executed SELECT statement is listed as a column in a positioned UPDATE statement, SQLSTATE 809D3 is generated.

The following reserved columns can't be updated:

FT_CID      FT_MTIME          FT_TEXT
FT_DATE     FT_ORIGINAL_SIZE  FT_TEXT_STATUS
FT_DFLAG    FT_ROW_STATE      FT_TIMESTAMP
FT_FORMAT   FT_ROW_TYPE

The FT_FLIST reserved column can't be updated unless the FT_SFNAME reserved column is updated at the same time. This is a characteristic of the supplied text readers and might not apply to custom text readers. Also, the FT_SFNAME reserved column should not be updated to reference a different row type (for example, from an ordinary document to a directory, or vice versa). Otherwise, the row is deleted during the next VALIDATE INDEX operation.

<literal> | NULL

Specifies a value for the column that was named in the set clause list. This value can be a character string literal, numeric literal, date literal, or NULL.

If the NULL value is specified for the column, any existing data is deleted. Otherwise, the value supplied by the literal replaces the existing data. If you delete the data for the FT_SFNAME reserved column, the external document is not deleted. SearchServer only deletes the reference to the file.

The column and the insert value must be of the same data type. The length of the character string literal must be equal to or less than the length of the data type of the column. Otherwise, an error occurs.

You can only insert a character string literal into a column defined with a character string data type. Similarly, you can only assign a date literal to a column defined with a DATE data type, and a numeric literal to a numeric column (INTEGER or SMALLINT data type).

For fixed-length character string data types, if the length of the value is less than the length of the column, the value string is padded on the right with spaces. For example, if the column is a character string with a length of 10, and the value is

'BOSTON'

the resultant value is

'BOSTON    '

For variable-length character string data types, the resultant string isn't padded.

If a value is specified for FT_SFNAME, but not for FT_FLIST, the latter reserved column is given a default value of 's', which identifies the standard text reader.
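
The length and padding rules for character string values described under <literal> | NULL can be modeled in a few lines. The following Python sketch is illustrative only; the function name stored_value and its parameters are hypothetical and are not part of SearchServer.

```python
def stored_value(value: str, column_length: int, fixed_length: bool = True) -> str:
    """Return the string as it would be held in a character string column."""
    if len(value) > column_length:
        # A literal longer than the column's data type is an error.
        raise ValueError("value is longer than the column")
    if fixed_length:
        # Fixed-length columns pad short values on the right with spaces.
        return value.ljust(column_length)
    # Variable-length columns store the value without padding.
    return value
```

With a fixed-length column of 10 characters, stored_value('BOSTON', 10) returns 'BOSTON' followed by four spaces. With fixed_length=False the value is returned unchanged.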

WHERE CURRENT OF <cursor name>

This clause specifies the row to be updated according to the current position of a cursor. The row that is updated in the table corresponds to the row in the working table on which the cursor is currently positioned. The cursor position doesn't change after this statement is executed.

WHERE <search condition>

An optional clause that specifies the criteria for selecting rows from a table in the form of a condition that is 'TRUE' for each row selected for update. The search condition can be a simple test (predicate) or a combination of several tests. All rows for which the search condition is 'TRUE' are updated.

CAUTION: With the exception of any rows that have changed since indexing, omitting the WHERE clause results in all of the rows in a table being selected and updated.

When updating a periodic table, SearchServer will not update rows that have been changed since the last VALIDATE INDEX statement was performed. In this case, the searched update operation continues, but SearchServer returns SUCCESS_WITH_INFO and SQLSTATE 80972 at statement completion to indicate that not all selected rows could be updated.

However, if you specify the WHERE clause in a searched UPDATE statement in one of the following forms, the referenced row(s) will be updated regardless of whether it has been indexed or changed since the last indexing operation:

WHERE FT_CID = xxx
WHERE FT_CID IN (xxx[,...])
WHERE FT_ROW_TYPE = 'DATA'

where xxx is the FT_CID value retrieved from a previous SELECT statement. These are the only exceptions to the rule that a row in a periodic table must be indexed for a successful searched UPDATE statement on that row. The row must have been indexed at least once, but the index need not be up to date.

Linguistic processing can be specified in the WHERE clause of a searched UPDATE statement in the same manner as the SELECT statement. If you first execute a SELECT statement that uses linguistic processing, do not enable or disable linguistic processing before you update the row.

Description

In a positioned UPDATE operation, the row to be updated in the table is indicated by the cursor position. In this case, you must first execute a SELECT statement on the same table to obtain a cursor. In addition, your application must call an API function to set or obtain the associated cursor name.

In a searched UPDATE operation, the WHERE clause permits an arbitrary search condition that selects for update all of the rows that meet a specific set of conditions. To avoid accidental update of the wrong rows, a searched UPDATE statement can be preceded by a SELECT statement using an identical search condition to determine which rows (or at least how many rows) would be affected.
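
One way to keep the preview SELECT and the searched UPDATE in step is to generate both statements from a single search condition string. The following Python sketch is illustrative only; the function name preview_and_update is hypothetical, and the statements are returned as text for the application to execute through whatever API it already uses.

```python
def preview_and_update(table, set_clause, search_condition):
    """Return a (SELECT, UPDATE) pair that share one search condition."""
    # The SELECT shows which rows the UPDATE would touch.
    select = "SELECT * FROM " + table + " WHERE " + search_condition
    # The UPDATE reuses the identical condition, so the two cannot drift apart.
    update = ("UPDATE " + table + " SET " + set_clause +
              " WHERE " + search_condition)
    return select, update


preview, change = preview_and_update(
    "SUPPORT", "STATUS = 'CLOSED'", "STATUS CONTAINS 'HOLD'"
)
```

The application would run the first statement, review the rows returned, and execute the second statement only if the result is as expected.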

A searched operation could end before any rows are updated if the amount of time currently specified in the MAX_EXEC_TIME server attribute expires while the table is being searched. In addition, MAX_EXEC_TIME is ignored if it expires after the search portion of the request, but before the update portion. The MAX_SEARCH_ROWS server attribute is ignored during a searched update. The SERVER_REPORT_TIME server attribute is always referenced during an update.

For optimistic concurrency, the use of the FT_TIMESTAMP reserved column in the SELECT and searched UPDATE statements ensures that the row to be updated has not been changed since the last SELECT statement was performed. Optimistic concurrency provides an alternative data integrity approach to row locking. On platforms that don't support read-only locks (such as Windows), optimistic concurrency allows a user to edit data for an extended period, while other users can concurrently view information contained in the row.

If the table was created using the CREATE TABLE statement with the IMMEDIATE table parameter in the CREATE TABLE clause, the UPDATE statement also updates the index so that the new or changed information is immediately searchable. Otherwise, the new or replacement data isn't searchable until a VALIDATE INDEX statement has been successfully executed, and replaced or deleted data remains searchable if it was previously indexed.

Example

The following example updates the CREATOR and STATUS columns with the specified values for the row indicated by the cursor name:

UPDATE SUPPORT SET CREATOR = 'PETER', STATUS = 'CLOSED' WHERE CURRENT OF SQL_CUR00003

The next example updates the STATUS and DATE_CLOSED columns with the specified values for all the rows indicated by the search condition:

UPDATE SUPPORT SET STATUS = 'CLOSED', DATE_CLOSED = DATE'1993-11-19' WHERE STATUS CONTAINS 'HOLD'

This example illustrates optimistic concurrency using the FT_TIMESTAMP reserved column. The example assumes that only one row (that has an FT_CID value of 15) matches the WHERE clause criteria:

SELECT STATUS, FT_CID, FT_TIMESTAMP FROM SUPPORT WHERE CREATOR = 'PETER' AND STATUS CONTAINS 'HOLD'

UPDATE SUPPORT SET STATUS = 'CLOSED' WHERE FT_CID = '15' AND FT_TIMESTAMP = '17263'
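
In an application, the second statement is normally built from the values returned by the first. The following Python sketch shows one possible way to do that. It is illustrative only: the function name optimistic_update is hypothetical, the row is assumed to be available as a dictionary keyed by column name, and the values are quoted as in the example above.

```python
def optimistic_update(table, set_clause, row):
    """Build a searched UPDATE guarded by the row's FT_CID and FT_TIMESTAMP.

    `row` holds the values returned by the earlier SELECT. If the row has
    changed since then, its FT_TIMESTAMP no longer matches the value used
    here, so the search condition selects no row.
    """
    return (
        "UPDATE " + table + " SET " + set_clause +
        " WHERE FT_CID = '" + str(row["FT_CID"]) + "'" +
        " AND FT_TIMESTAMP = '" + str(row["FT_TIMESTAMP"]) + "'"
    )


# Values as returned by the SELECT statement in the example above.
selected = {"STATUS": "HOLD", "FT_CID": 15, "FT_TIMESTAMP": 17263}
statement = optimistic_update("SUPPORT", "STATUS = 'CLOSED'", selected)
```

The resulting text is the same UPDATE statement as in the example.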

For More Information

• about the rules governing identifiers, see Chapter 3, "SearchSQL Language Elements."

• about literals, see Chapter 3, "SearchSQL Language Elements."

• about selecting rows to be updated, see the "WHERE Clause," later in this chapter.

• about the effect of the UPDATE statement on reserved columns, see Fulcrum SearchServer Data Preparation and Administration.

VALIDATE INDEX Statement

Builds or updates the indexes for a table.

Syntax

VALIDATE INDEX <table name> [<validate index parameters>...]

<validate index parameters> ::= ABANDON [IMMEDIATE] | ASSUME_TABLE_VALID | BUFFER <unsigned integer> | NO_SPACE_CHECK | REWIND | TEMP_FILE_SIZE <unsigned integer> | UNPROTECT | VALIDATE TABLE | WILDCARD_OPT <wildcard optimization method>

<wildcard optimization method> ::= 'MINIMIZE_INDEX_OVERHEAD' | 'MINIMIZE_SEARCH_TIME' | 'NONE'

Keywords and Options

<table name>

A compound identifier that specifies the name and location of the table being indexed. For a complete description of how a table name is formed, see "Compound Identifiers" in Chapter 3, "SearchSQL Language Elements."

ABANDON

This parameter merges the differential index for tables, then re-indexes all their rows and associated documents to update the periodic index. This might be required after changing the schema of a table or changing the contents of the stop file named when the table was created. Otherwise, this is only performed as a last resort after a system malfunction.

If this option is omitted, only rows that are new or modified since the last VALIDATE INDEX statement was executed are indexed. If you also need to expand container rows, you will want to use the VALIDATE TABLE parameter with ABANDON.

Including the IMMEDIATE option causes the contents of the immediate index to be discarded and all rows that were in the immediate index to be indexed into a periodic index. A new initialized immediate index is then created. Note that the table is locked when this statement executes.

ASSUME_TABLE_VALID

Specifies that the state of the table accurately reflects the state of the external document files. In particular, SearchServer should assume that the reserved columns that relate to external text files (namely FT_DATE, FT_SFNAME, FT_FLIST, FT_DNAME, FT_OWNER, or renamed versions of these) accurately describe the external files for the table.

This parameter can reduce indexing time for applications that want to control the document management process for a table, or that know that a particular table has no external document files (no row has a value for the FT_SFNAME column).

The default value is ASSUME_TABLE_VALID. This parameter is ignored if the ABANDON parameter is also specified.

BUFFER <unsigned integer>

Specifies the size in bytes of a temporary buffer that is allocated. If memory is available, a larger buffer decreases indexing time. The default buffer size is environment dependent. This value can't be less than 2048 bytes.

NO_SPACE_CHECK

Specifies that availability of disk space will not be checked. If space is not available when SearchServer attempts to create the necessary temporary files, an error will be reported. If this parameter is not specified, disk space is checked prior to indexing. The indexing request will be terminated if the required disk space is not available.

REWIND

The indexing engine places messages in the log file of each table. The log file contains the following types of messages:

• error messages if indexing fails

• number of documents and words indexed

• number of rows deleted

• size of files

The log file can be viewed only using a facility outside SearchServer such as an operating system text editor. The log file is stored in the same location as the table files.

This parameter clears the log file before indexing is performed. Otherwise, the messages accumulate with successive indexing operations.

TEMP_FILE_SIZE <unsigned integer>

Specifies the maximum size in bytes of a temporary sort file. If file space is available, a larger file size decreases indexing time.

The default file size is 8388608 (8 MB). This value can't be less than 2048 bytes.

UNPROTECT

When indexing begins, the table is protected to prevent attempts to index or drop the table. If indexing completes successfully, the table is unprotected. If indexing doesn't complete successfully, then the table is left protected and later attempts to index or drop the table are prevented. The UNPROTECT parameter removes protection on a table before indexing begins.

CAUTION: Always investigate the cause of an indexing failure (by using the log file information) before using this parameter when re-indexing.

VALIDATE TABLE

If VALIDATE TABLE is specified, SearchServer performs the following modifications to the table based on changes found since the last VALIDATE INDEX statement was executed, before updating the index:

• abandons the differential index for immediate tables, making the rows in it available to indexing to be placed in the periodic index

• deletes each row in which the FT_SFNAME value names a file that no longer exists

• inserts a new row for each new document found in a container row (for example, for each new file found in a directory or any of its sub-directories)

• updates the appropriate reserved columns (or their renames) for a row with an external document file that has been modified

WILDCARD_OPT <wildcard optimization method>

Wildcard optimization changes can be performed using the WILDCARD_OPT validate index parameter. However, in almost all cases you must discard and completely rebuild the index using the ABANDON validate index parameter. The only exception is when the table does not have any wildcard optimization enabled (that is, NONE). In that case, it is not necessary to specify the ABANDON parameter when also specifying the MINIMIZE_INDEX_OVERHEAD method.

Changes to the table's wildcard optimization method are performed during the resulting indexing operation. The change remains in effect until it is changed with a subsequent VALIDATE INDEX statement.

There are three wildcard optimization methods:

MINIMIZE_INDEX_OVERHEAD

This method minimizes indexing time and space. Performance for some prefix and infix wildcard searches is reduced as compared to the MINIMIZE_SEARCH_TIME method.

MINIMIZE_SEARCH_TIME

This method maximizes search performance. Indexing time is increased and the space required for the index is doubled. If space permits, this method is preferred for tables located on slower mass-storage devices, such as CD-ROMs.

NONE

No wildcard optimization is enabled for the table. Performance for prefix and infix wildcard searches is substantially reduced. This is the default unless changed by a SET WILDCARD_OPT statement.

The default is NONE. The value specified in this statement affects only subsequent CREATE TABLE statements, and CREATE SCHEMA statements that don't explicitly specify the WILDCARD_OPT table parameter in the CREATE TABLE clause. The current setting can be determined from the WILDCARD_OPT server attribute in the SERVER_INFO system table.

Description

SearchServer ensures that the indexes accurately reflect the current values in the table, and reorganizes the indexes to optimize search performance. In the case of a periodic table, this procedure includes eliminating obsolete index information associated with rows that have been inserted, deleted, or updated since the last VALIDATE INDEX statement was executed.

Executing this statement can invalidate any working tables that are constructed from the table specified in this statement. Subsequent attempts to retrieve data from these tables are allowed, but not recommended.

Note 1: If the external text refers to a database (the FT_FLIST refers to a database text reader), then the option selected for the VALIDATE INDEX statement is crucial and depends on the implementation of the database text reader. For a complete description about text readers, see the Fulcrum SearchServer Text Reader Developer's Guide.

Note 2: In Windows 95 environments, a successful execution of a VALIDATEINDEX statement shrinks the size of the immediate in-dex file to its minimum size.

Examples

The following example indexes the SUPPORT table without using any validate index parameters (ASSUME_TABLE_VALID is the default):

VALIDATE INDEX SUPPORT

The following example indexes the SUPPORT table using the specified validate index parameters. In this example, SearchServer would re-index the entire table, allocate the specified amount of space for the indexing buffer, re-initialize the log file, assume the availability of the specified amount of file space for sorting purposes, and unlock the table if necessary.

VALIDATE INDEX SUPPORT ABANDON BUFFER 51200 REWIND TEMP_FILE_SIZE 4096000 UNPROTECT

For More Information

• about the rules governing identifiers, see "Identifiers" in Chapter 3, "SearchSQL Language Elements."

• about system tables, see Chapter 5, "System Information Tables."

• about indexing external databases, see the Fulcrum SearchServer Text Reader Developer's Guide.

• about indexing and modifying data, see Fulcrum SearchServer Data Preparation and Administration.

VCC_RULES Function

Returns the rules necessary to return the text location of matched search terms.

Syntax

VCC_RULES()

Description

This function returns a text stream as a list of 255 integers. There is one integer for each character in the current application character set, except for the NULL character (character 0). The integer indicates the amount to increment for that character. In most cases the integer is 0 or 1, but it might be 2 if that character translates to a multiple-character sequence in FTICS.

The rules are specific to each table. All the rows from the same table return the same value for this function. The integers are returned in a character string without separating blank spaces.

This function can only be specified in the select list of a SELECT statement. It is beneficial when using the SearchServer search engine for document display within document viewers. For more information about document viewers, see Fulcrum SearchServer Document Viewer Integration.

Example

The following example returns a working table containing the rules for all the search terms matched by the WHERE clause:

SELECT VCC_RULES(), COMPANY, PRIORITY, STATUS, TEXT_LOG

FROM SUPPORT WHERE TEXT_LOG CONTAINS 'DOCUMENT'
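As an illustration of how an application might consume the returned value, the following Python sketch decodes a rules string into per-character increments. It rests on two assumptions that the text does not confirm: that each of the 255 integers occupies exactly one digit, and that an offset is obtained by summing the increments of the preceding characters. The function names are hypothetical and are not part of SearchServer.

```python
def parse_vcc_rules(rules: str) -> list:
    """Decode a VCC_RULES() string into a list indexed by character code.

    Assumption: one decimal digit per character code 1..255, no separators.
    Index 0 is a placeholder for the NULL character, which has no rule.
    """
    if len(rules) != 255 or not (rules.isascii() and rules.isdigit()):
        raise ValueError("expected a string of exactly 255 digits")
    return [0] + [int(digit) for digit in rules]


def incremented_offset(text: bytes, position: int, increments: list) -> int:
    """Sum the increments of all characters before `position` (assumed usage)."""
    return sum(increments[code] for code in text[:position])


# Made-up rules for the example: every character counts 1,
# except character 223, which is assumed to expand to two characters.
rules = "".join("2" if code == 223 else "1" for code in range(1, 256))
increments = parse_vcc_rules(rules)
```

Because the rules are the same for every row of a table, an application following this approach would only need to decode the string once per table.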

For More Information

• about the SELECT statement, see the "SELECT Statement," later in this chapter.

• about document viewers, see Fulcrum SearchServer Document Viewer Integration.

WHERE Clause

Specifies the search criteria in a statement.

Syntax

WHERE <search condition>

<search condition> ::= <Boolean term> | <search condition> OR <Boolean term>

<Boolean term> ::= <Boolean factor> | <Boolean term> AND <Boolean factor>

<Boolean factor> ::= [NOT] <Boolean primary>

<Boolean primary> ::= <predicate> | (<search condition>)

<predicate> ::= <back reference predicate> | <between predicate> | <comparison predicate> | <contains predicate> | <in predicate> | <is_about predicate> | <like predicate>

Keywords and Options

WHERE <search condition>

Defines search criteria that are applied row by row to the data in a table to determine which rows are to be selected for retrieval, update, or deletion. A row is selected depending on whether the search condition evaluates to 'TRUE' or 'FALSE' when applied to the rows in a particular column, zone, or when applied to previously selected rows. If the search condition evaluates to 'TRUE', the row is selected for subsequent processing.

The search condition can be as simple as a single predicate, or it can be an arbitrarily complex expression. A search expression consists of sub-expressions (terms, factors, and primaries) combined with Boolean operators (OR, AND, NOT).

Parentheses can be used to group sub-expressions within a WHERE clause. Expressions within parentheses are evaluated first. When the order of evaluation is not specified by parentheses, the following rules apply:

1. NOT is applied before AND

2. AND is applied before OR

3. Operators of the same type (OR, AND) follow the associative rule. The associativity of the OR and AND operators means that SearchServer can apply operators of the same type in any order to optimize execution without affecting the value of the expression.

Boolean operators are defined by the following truth tables:

| AND | 'TRUE' | 'FALSE' |
|---|---|---|
| 'TRUE' | 'TRUE' | 'FALSE' |
| 'FALSE' | 'FALSE' | 'FALSE' |

| OR | 'TRUE' | 'FALSE' |
|---|---|---|
| 'TRUE' | 'TRUE' | 'TRUE' |
| 'FALSE' | 'TRUE' | 'FALSE' |

| NOT | 'TRUE' | 'FALSE' |
|---|---|---|
| | 'FALSE' | 'TRUE' |

When the NOT Boolean operator is applied to a Boolean primary that would otherwise evaluate to 'FALSE' for a particular row (no match in the row), the resulting truth value is 'TRUE', even if the row contains no data (NULL value).

The truth value of a search condition is derived by applying the specified Boolean operations to the truth values of predicates. If no Boolean operations are specified, the result of the search condition is the result of the single predicate.

<predicate>

Specifies a condition that can be evaluated on a row to give a truth value of 'TRUE' or 'FALSE'. In this context, a predicate can be one of the following:

<back reference predicate>
<between predicate>
<comparison predicate>
<contains predicate>
<in predicate>
<is_about predicate>
<like predicate>

Table 4-2 summarizes the function of each predicate:

| Predicate | The value is 'TRUE' when |
|---|---|
| back reference | the row was selected by a previous SELECT statement |
| between | the column or zone value is within a specified range |
| comparison | the column or zone value satisfies the specified constraint |
| contains | the text column or zone contains words and phrases matching a combination of patterns |
| in | the column or zone contains one of the values in the specified list |
| is_about | the text column or zone contains any word in the specified string, phrase, or text vector |
| like | the text column or zone contains a word or phrase matching the specified pattern |

Table 4-2 Functions of Predicates

Each predicate is described in a separate section of this chapter.

Description

The WHERE clause specifies the criteria for selecting indexed rows from a table for further processing. A row is selected when the search condition evaluates to 'TRUE' when applied to that row.

Omitting the WHERE clause results in all the rows in a table being selected and processed.

Note: If a stop word is used in a simple search that contains a WHERE clause, no documents are found and the SQLSTATE 80944 "Required search term not found" is returned. In all other searches involving stop words, the results returned are as if the stop word occurred in all documents.

Examples

The following examples show how to formulate a WHERE clause in a SELECT statement from its simplest form to more complex statements that use clauses, predicates, Boolean operators, and functions:

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'DOCUMENT% FILTER_'

SELECT COMPANY, PRIORITY, STATUS, TEXT_LOG
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'DOCUMENT', 'FILTER' WITHIN 10 CHARACTERS OF 'PROCESSING' IN_ORDER

SELECT PROBLEM_NUMBER, COMPANY, PRIORITY, STATUS
FROM SUPPORT
WHERE PRIORITY >= 2

SELECT PROBLEM_NUMBER, LAST_MODIFIED, COMPANY
FROM SUPPORT
WHERE CURSOR SQL_CUR00001 WITHOUT CONTEXT

SELECT RELEVANCE() AS HITS, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG IS_ABOUT VEC1
ORDER BY PROBLEM_NUMBER, COMPANY, PRIORITY, HITS DESC

SELECT PROBLEM_NUMBER, CREATOR, SUBJECT
FROM SUPPORT
WHERE CREATOR NOT CONTAINS 'PETER' & 'MARIE'

SELECT PROBLEM_NUMBER, PRIME_CONTACT, STATUS, CREATOR
FROM SUPPORT
WHERE (CREATOR CONTAINS 'PETER' OR STATUS CONTAINS 'CLOSED') AND PRIME_CONTACT CONTAINS 'MONTAG'

SELECT *
FROM SUPPORT
WHERE TEXT_LOG CONTAINS THESAURUS('DISK', WORD_SYNONYM)

SELECT RELEVANCE() AS HITS, PROBLEM_NUMBER, COMPANY, PRIORITY
FROM SUPPORT
WHERE TEXT_LOG CONTAINS 'STATEMENT%', WEIGHT 10, 'HANDLE%' WEIGHT 5 WITHIN 10 CHARACTERS OF 'DOCUMENT', 'FILTER' IN_ORDER
ORDER BY PROBLEM_NUMBER, COMPANY, PRIORITY, HITS DESC

SELECT COMPANY
FROM SUPPORT
WHERE COMPANY LIKE 'G%' AND STATUS CONTAINS 'CLOSED'

SELECT COMPANY
FROM SUPPORT
WHERE TEXT_LOG IN ('SOFTWARE', 'HARDWARE')

SELECT PROBLEM_NUMBER
FROM SUPPORT
WHERE COMPANY = 'OREO'
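The evaluation order described for the WHERE clause (parentheses first, then NOT, then AND, then OR) can be sketched as a small recursive-descent evaluator that mirrors the grammar in the Syntax section. This is a Python illustration only, not SearchServer code: predicates are stood in for by hypothetical names such as P1, and their truth values are supplied by the caller.

```python
import re


def evaluate(condition: str, truth: dict) -> bool:
    """Evaluate a search condition over named predicates.

    Mirrors the grammar:
      search condition ::= Boolean term {OR Boolean term}
      Boolean term     ::= Boolean factor {AND Boolean factor}
      Boolean factor   ::= [NOT] Boolean primary
      Boolean primary  ::= predicate | ( search condition )
    `truth` maps each (hypothetical) predicate name to True or False.
    """
    tokens = re.findall(r"\(|\)|\w+", condition)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def primary():
        token = take()
        if token == "(":
            value = search_condition()
            take()  # consume the closing parenthesis
            return value
        return truth[token]

    def factor():
        if peek() == "NOT":
            take()
            return not primary()
        return primary()

    def term():
        value = factor()
        while peek() == "AND":
            take()
            value = factor() and value  # parse the operand before combining
        return value

    def search_condition():
        value = term()
        while peek() == "OR":
            take()
            value = term() or value  # parse the operand before combining
        return value

    return search_condition()
```

With P1 true and P2 and P3 false, `evaluate("P1 OR P2 AND P3", ...)` groups as P1 OR (P2 AND P3) and yields True, whereas `evaluate("(P1 OR P2) AND P3", ...)` yields False.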

For More Information

• about searching, see Chapter 2, "The Search Process."

• about predicates, see the individual sections in this alphabetically organized chapter.

Chapter 5: System Information Tables

This chapter provides detailed information about the following system tables:

• TABLES Table

• COLUMNS Table

• ZONES Table

• SEARCH_TERMS Table

• SERVER_INFO Table

Introduction

System tables are (with one exception) read-only tables from which a SearchServer application can retrieve information about the tables, columns, zones, and other aspects of the data source to which it is connected. All system tables are part of a global SearchServer information schema that describes the set of tables associated with the data source.

The following tables make up the global SearchServer information schema:

TABLES

This table contains one row for each table in the set of tables. The application can search this table using the SELECT statement to determine the tables it can process, which table parameters were used when each table was created, and the contents of the indexing log for each table.

COLUMNS

This table contains one row for each column in each table in the set of tables. The application can search this table using the SELECT statement to determine the definition of one or more columns for a table.

ZONES

This table contains one row for each zone name/number combination in each table in the set of tables. The application can search this table using the SELECT statement to determine the definition of one or more zones for a table.

SEARCH_TERMS

This table describes the set of searchable terms in all the columns and zones in a table. Each row provides information about one term in one zone or column of one table.

SERVER_INFO

This table describes the state of the current connection to a data source that can include one or more servers. Each row provides information about one attribute of the connection. This table is the only system table that is updatable.

How to Use System Tables

System tables can be referenced only in a SELECT statement. They can't be used in any other SearchSQL statement. There are other restrictions on how and where these system table names can be used when searching. For example, when searching a system table, the FROM clause can't contain a UNION clause.

The system table must be the only table referenced in the SELECT statement; therefore, you cannot use a UNION clause. As a result of this restriction on the FROM clause, the columns referenced in the select list and the WHERE clause can contain only the valid column names of the system table being searched.

You can use the asterisk option (*) in place of the select list in the SELECT statement. However, it is more efficient to specify the columns of the system table in your SELECT statement. (All the columns in the system tables are described in this chapter.)

All columns that have an INTEGER data type are defined with the VALUE index mode and can be searched using any comparison operator in a comparison predicate.

Usually, the contains predicate is used in the WHERE clause of a SELECT statement that references a system table for columns defined with NORMAL or LITERAL index mode. SearchServer allows you to use the is_about predicate, but this predicate is inappropriate for searching the small patterns and single words that are contained in the system tables.

Each time a SELECT statement that references a system table is executed, the information in the system table is compiled and a new table is created. Therefore, the back reference predicate can't be used in a SELECT statement that references a system table because the predicate must refer to the same table as the initial search.

Optimizing System Table Searches

There are no restrictions on the WHERE clause of a SELECT statement when referencing a system table. However, certain optimizations are performed for specific predicates.

In particular, most contains predicates that refer to the following five columns can be optimized:

• TABLE_QUALIFIER

• TABLE_NAME

• COLUMN_NAME

• ZONE_NAME

• SERVER_ATTRIBUTE

To qualify for optimization, a predicate must adhere to the following rules:

• A predicate must either be the only element in the WHERE clause or be combined with other predicates using the AND Boolean operator. If the OR or NOT Boolean operators are applied to the predicate, it can't be optimized.

• In a contains predicate, only a comma-delimited word list is allowed. Wildcards and the NOT Boolean operator (applied to the word list) can be used, but the thesaurus function and the Boolean operator symbols (&, |, ~) can't be optimized.

• The predicate is the first for that column, encountered when reading the WHERE clause from left to right, that satisfies these rules.

• The search terms should not contain wildcards. This applies to all tables except the SEARCH_TERMS system table. For more information, see "Using String Wildcards in a Search" in Chapter 3, "SearchSQL Language Elements," of this manual.

When there are multiple predicates in the WHERE clause, all the predicates must follow all these rules in order to have the entire search optimized. Otherwise, the SELECT statement will execute normally.

To achieve faster retrieval of the TABLES, COLUMNS, and ZONES system tables, specify explicit values for the TABLE_QUALIFIER and TABLE_NAME columns in the WHERE clause.

The following is an example of a SELECT statement containing optimizable predicates:

SELECT COLUMN_NAME, INDEX_MODE, CHAR_MAX_LENGTH, FIELD_NUMBER
FROM COLUMNS
WHERE TABLE_QUALIFIER CONTAINS 'FISH'
AND TABLE_NAME CONTAINS 'SUPPORT'
AND COLUMN_NAME CONTAINS 'COMPANY', 'SUBJECT'

The predicates in the following example are non-optimizable because:

• the first predicate uses the | Boolean symbol instead of a word list

• the NOT Boolean operator is applied to the second predicate

• the second and third predicates are combined using the OR Boolean operator

• the third predicate is not a simple test for equality and doesn't refer to one of the five optimizable columns.

Any one of these reasons is sufficient to make the entire search non-optimizable.

SELECT COLUMN_NAME, INDEX_MODE, CHAR_MAX_LENGTH, FIELD_NUMBER
FROM COLUMNS
WHERE TABLE_QUALIFIER CONTAINS 'FISH' | 'FOWL'
AND (NOT COLUMN_NAME CONTAINS 'TEXT%' OR FIELD_NUMBER > 32)
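The optimization rules in this section can be summarized as a checklist applied to each predicate. The Python sketch below models that checklist for the two example statements. It is a simplification under stated assumptions: the Predicate record and its fields are hypothetical, a simple equality test on one of the five columns is assumed to qualify alongside a contains predicate, and the rules about wildcards and about "the first predicate for that column" are left out.

```python
from dataclasses import dataclass

# The five columns named in the text as candidates for optimization.
OPTIMIZABLE_COLUMNS = {
    "TABLE_QUALIFIER", "TABLE_NAME", "COLUMN_NAME",
    "ZONE_NAME", "SERVER_ATTRIBUTE",
}


@dataclass
class Predicate:
    """Hypothetical description of one predicate in a WHERE clause."""
    column: str
    kind: str = "contains"        # "contains", "equals", or "other"
    uses_symbols: bool = False    # &, | or ~ inside the contains pattern
    uses_thesaurus: bool = False  # thesaurus function inside the pattern
    negated: bool = False         # NOT applied to the whole predicate
    joined_by: str = "AND"        # operator linking it to the other predicates


def blocking_reasons(p: Predicate) -> list:
    """Return the reasons (possibly none) why a predicate is not optimizable."""
    reasons = []
    if p.column not in OPTIMIZABLE_COLUMNS:
        reasons.append("column is not one of the five optimizable columns")
    if p.kind not in ("contains", "equals"):
        reasons.append("not a contains predicate or simple equality test")
    if p.uses_symbols or p.uses_thesaurus:
        reasons.append("pattern is not a plain comma-delimited word list")
    if p.negated:
        reasons.append("NOT is applied to the predicate")
    if p.joined_by != "AND":
        reasons.append("predicate is combined with OR")
    return reasons


def is_optimizable(predicates) -> bool:
    """The whole search is optimizable only if every predicate qualifies."""
    return all(not blocking_reasons(p) for p in predicates)


# The optimizable example: three contains predicates joined with AND.
optimizable = [
    Predicate("TABLE_QUALIFIER"),
    Predicate("TABLE_NAME"),
    Predicate("COLUMN_NAME"),
]
# The non-optimizable example: | symbol, NOT, and an OR with FIELD_NUMBER > 32.
non_optimizable = [
    Predicate("TABLE_QUALIFIER", uses_symbols=True),
    Predicate("COLUMN_NAME", negated=True),
    Predicate("FIELD_NUMBER", kind="other", joined_by="OR"),
]
```

Calling `blocking_reasons` on each predicate of the second list reproduces the bulleted reasons given for that example.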

Note: The system table definitions outlined in this chapter are provided for reference purposes only. System tables are created automatically when referenced in a SELECT statement. It isn't possible to create any of these system tables using the CREATE SCHEMA statement.

Selecting Tables for a Search

Your application may allow users to select the tables that they want to use for a particular search session. How you choose to have the tables identified by the user is the responsibility of the application. However, you can use the TABLES system table to determine the names of the tables and their characteristics.

How SearchServer Locates Tables

SearchServer locates tables by examining the settings of the following parameters of the current data source:

• FTNPATH

• FULSEARCH

These parameters are set during configuration of SearchServer. For more information about setting the FULSEARCH parameter, see Fulcrum SearchServer Data Preparation and Administration. For more information about setting both parameters, see Fulcrum SearchServer Getting Started for your platform.

The TABLES System Table

Each row in the TABLES system table names a table that is visible to the user. It contains the following columns:

| Column Name | Data Type | Index Mode | Description |
|---|---|---|---|
| TABLE_QUALIFIER | VARCHAR (128) | LITERAL | identifies the location of the table: the nodename of the remote server where the table resides, or directory name if the table can be accessed without using a remote server |
| TABLE_OWNER | VARCHAR (32) | LITERAL | always NULL |
| TABLE_NAME | VARCHAR (32) | LITERAL | name of a table |
| TABLE_TYPE | VARCHAR (254) | LITERAL | identifies the type of the table; value is 'BASE TABLE' or 'VIEW' |
| REMARKS | VARCHAR (254) | LITERAL | always NULL |
| FTT_PROTECTED | CHAR (5) | NORMAL | 'TRUE ' if the table is currently locked; 'FALSE' if the table is not currently locked |
| FTT_BASEPATH | VARCHAR (260) | NORMAL | base directory from which document files and directories are located; NULL for a remote table |
| FTT_INDEXDIR | VARCHAR (260) | NORMAL | directory containing the table's data and index files; NULL for a remote table |
| FTT_WORKDIR | VARCHAR (260) | NORMAL | work directory used for temporary files during VALIDATE INDEX; NULL for a remote table |
| FTT_STOPFILE | VARCHAR (260) | NORMAL | name of the operating system file containing the list of words excluded from indexing; NULL for a remote table |
| FTT_IMMEDIATE | CHAR (5) | NORMAL | 'TRUE ' if a differential index is associated with the table; 'FALSE' if only a periodic index exists |
| FTT_NOLOCKING | CHAR (5) | NORMAL | 'TRUE ' if row-locking is inhibited during searches |
| FTT_NORMALIZATION | VARCHAR (32) | LITERAL | status of case normalization |
| FTT_LOCATION | VARCHAR (260) | NORMAL | directory containing the table's configuration file |
| FTT_INDEXLOG | APVARCHAR (2147483647) | NORMAL | contents of the indexing log |
| FTT_WILDCARD_OPT | VARCHAR (32) | LITERAL | status of wildcard optimization for the table |

Table 5-1 TABLES System Table

The TABLE_QUALIFIER and TABLE_NAME columns together form the unique key of the TABLES table. Therefore, no two rows are identical in both these columns.

Searching the TABLES System Table

You can retrieve information from the TABLES system table by executing a simple SELECT statement. Typically, you would retrieve information such as

• a list of all the tables in a current data source

• a list of tables that meet specific criteria

• attributes of one or more specific tables that have been identified by name

It is important to remember that the TABLE_QUALIFIER, TABLE_OWNER, TABLE_NAME, and TABLE_TYPE columns are defined with an index mode of LITERAL. The remaining columns (except for REMARKS) are known as table parameter columns. They have column names that begin with FTT_ and are defined with NORMAL index mode.

Pathnames are always defined with NORMAL index mode so that each individual component can be searched separately. However, you can still search for a complete pathname. For example:

SELECT TABLE_NAME
FROM TABLES
WHERE FTT_LOCATION CONTAINS '/net/fish/fultext'

Even though the words net, fish, and fultext are indexed separately, their concatenation in the contains predicate ensures that only tables in the directory /NET/FISH/FULTEXT are reported.
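The effect of searching a pathname whose components are indexed separately can be pictured with a short Python sketch. It assumes, for illustration only, that NORMAL index mode splits a value into words at the path separators, that comparison ignores case, and that a multi-word pattern matches only when its words appear next to each other in the same order. The function names are hypothetical; SearchServer's actual indexing rules are not reproduced here.

```python
import re


def index_words(value: str) -> list:
    """Split a value into upper-cased words (assumed NORMAL-mode behavior)."""
    return [word.upper() for word in re.findall(r"[A-Za-z0-9_]+", value)]


def contains_phrase(value: str, pattern: str) -> bool:
    """True when the words of `pattern` occur consecutively in `value`."""
    words, phrase = index_words(value), index_words(pattern)
    if not phrase:
        return False
    last_start = len(words) - len(phrase)
    return any(words[i:i + len(phrase)] == phrase for i in range(last_start + 1))
```

Under these assumptions, a single word such as fish would match any location containing that component, while the full pathname matches only when all three components occur together and in order.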

The following table parameters can be retrieved for any table named in the TABLE_QUALIFIER column. However, if the associated table resides on a remote node, the column values will be NULL:

FTT_BASEPATH
FTT_INDEXDIR
FTT_WORKDIR
FTT_STOPFILE

Optimizing TABLES Table Searches

You can reduce the amount of time it takes to create a working table from the TABLES table by refraining from naming table parameter columns in the select list or WHERE clause of the SELECT statement, or by naming a table explicitly in the WHERE clause of a SELECT statement. However, if table parameter columns are required, use the following syntax:

SELECT <table parameter column>[{, <table parameter column>...}]
FROM TABLES
WHERE TABLE_QUALIFIER CONTAINS <literal>[{, <literal>...}]
AND TABLE_NAME CONTAINS <literal>[{, <literal>...}]

Note: The literals in the preceding syntax shouldn't contain wildcards.

The following example retrieves several columns (including table parameter columns) and, for fast data retrieval, specifies a particular table.

SELECT TABLE_QUALIFIER, FTT_BASEPATH, FTT_INDEXDIR
FROM TABLES
WHERE TABLE_QUALIFIER = '/net/fish/fultext'
AND TABLE_NAME CONTAINS 'SUPPORT'

This example retrieves the names of all accessible tables.

SELECT TABLE_QUALIFIER, TABLE_NAME
FROM TABLES

The execution time for either of these examples is minimal compared to a more general search such as:

SELECT *
FROM TABLES

Note: When selecting only the TABLE_NAME column, SearchServer might return rows for tables that can't be accessed through a subsequent SELECT, INSERT INTO, DROP TABLE, or VALIDATE INDEX statement. This occurs when any of the following conditions applies:

• a .CFG file exists, but some of the other table files are missing

• the table is a UNIX table whose name contains uppercase characters

• the table is a UNIX table whose name contains special characters (for example, a+b).

Tables not created by SearchServer might be reported in the TABLES system table, yet not be available to other SearchSQL statements. To ensure that only SearchServer tables are reported, include the TABLE_TYPE column in the select list.

The COLUMNS System Table

Each row in the COLUMNS system table names a column in a particular table that is visible to the user, and lists the column's attributes. It contains the following columns:

| Column Name | Data Type | Index Mode | Description |
|---|---|---|---|
| TABLE_QUALIFIER | VARCHAR (128) | LITERAL | identifies the location of the table: the nodename of the remote server where the table resides, or directory name if the table can be accessed without using a remote server |
| TABLE_OWNER | VARCHAR (32) | LITERAL | always NULL |
| TABLE_NAME | VARCHAR (32) | LITERAL | name of a table |
| COLUMN_NAME | VARCHAR (32) | LITERAL | name of a column in the specified table |
| DATA_TYPE | INTEGER | VALUE | a number that represents the data type of the column as referenced by the API |
| TYPE_NAME | VARCHAR (254) | NORMAL | identifies the data type of the column as used in SearchSQL; one of DATE, CHAR, VARCHAR, INTEGER, APVARCHAR, or SMALLINT |
| PRECISION | INTEGER | VALUE | precision of approximate numeric data types; always NULL |
| LENGTH | INTEGER | VALUE | maximum length for a character data type column; NULL for other data types |
| SCALE | SMALLINT | VALUE | total number of significant digits to the right of the decimal point; zero for INTEGER and SMALLINT types, otherwise NULL |
| RADIX | SMALLINT | VALUE | radix of PRECISION; always NULL |
| NULLABLE | SMALLINT | VALUE | a value indicating whether the column accepts NULLs from INSERT INTO and UPDATE statements: 1 if NULLs are permitted, 0 if not |
| REMARKS | VARCHAR (254) | NORMAL | descriptive information about the column; always NULL |
| INDEX_MODE | VARCHAR (32) | NORMAL | indexing mode of the column; one of NORMAL, VALUE, LITERAL, or NONE |
| FIELD_NUMBER | INTEGER | VALUE | field number of the column |

Table 5-2 COLUMNS System Table

The TABLE_QUALIFIER, TABLE_NAME, and COLUMN_NAME columns together form the unique key of the COLUMNS table. Therefore, no two rows are identical in all three of these columns.

Searching the COLUMNS System Table

The COLUMNS system table can contain the descriptions of columns of more than one table. However, for faster data retrieval, applications should specify explicit values in the WHERE clause for the TABLE_QUALIFIER and TABLE_NAME columns.

It is also more efficient to specify the names of the tables in a word list of a contains predicate rather than in separate contains predicates when more than one table is specified.

For instance, the following example:

SELECT COLUMN_NAME, DATA_TYPE, FIELD_NUMBER
FROM COLUMNS
WHERE TABLE_NAME CONTAINS 'TABLE1', 'TABLE2', 'TABLE3'
AND TABLE_QUALIFIER = '/net/fish/fultext'

retrieves data faster than this example:

SELECT COLUMN_NAME, DATA_TYPE, FIELD_NUMBER
FROM COLUMNS
WHERE (TABLE_NAME CONTAINS 'TABLE1' OR TABLE_NAME CONTAINS 'TABLE2' OR TABLE_NAME CONTAINS 'TABLE3')
AND TABLE_QUALIFIER = '/net/fish/fultext'

SearchServer recognizes the first example directly and optimizes the creation of the working table. Both examples produce the same working table. However, the working table derived from the first example contains no context information (match codes).

Note: The use of the '/NET/FISH/FULTEXT' notation in the examples means that this SELECT statement applies to tables in the specified directory only. Tables in other local or remote directories don't appear in the result list.

In a WHERE clause, if both the TABLE_NAME and TABLE_QUALIFIER columns specify a word list for their associated contains predicates, SearchServer matches each TABLE_NAME with every TABLE_QUALIFIER supplied. SearchServer verifies that each table exists before writing the row to the working table. This validation is performed even if the TABLE_NAME and TABLE_QUALIFIER together specify a single table.
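The pairing of every TABLE_NAME with every TABLE_QUALIFIER, followed by an existence check, can be sketched in Python as a filtered cross product. The function and the `table_exists` callback are hypothetical stand-ins for whatever check SearchServer performs internally.

```python
from itertools import product


def candidate_tables(qualifiers, table_names, table_exists):
    """Pair every qualifier with every table name, keeping existing tables only.

    `table_exists(qualifier, name)` is a caller-supplied check (an assumption
    made for this sketch).
    """
    return [
        (qualifier, name)
        for qualifier, name in product(qualifiers, table_names)
        if table_exists(qualifier, name)
    ]
```

Two qualifiers combined with three table names therefore lead to six existence checks, which is one reason for keeping both word lists short.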

For fast retrieval, use the following form of the SELECT statement syntax:

SELECT <select list>
FROM COLUMNS
WHERE TABLE_QUALIFIER CONTAINS <pattern> [{, <pattern>} ...]
AND TABLE_NAME CONTAINS <pattern> [{, <pattern>} ...]
[AND COLUMN_NAME [NOT] CONTAINS <pattern> [{, <pattern>} ...]]

For example:

SELECT COLUMN_NAME, INDEX_MODE, CHAR_MAX_LENGTH, FIELD_NUMBER
FROM COLUMNS
WHERE TABLE_QUALIFIER CONTAINS 'FISH', 'FOWL'
AND TABLE_NAME CONTAINS 'SUPPORT'
AND COLUMN_NAME NOT CONTAINS 'FT\_%', 'TEXT%'

The COLUMNS system table contains a row for each application-defined column as well as for all reserved columns that have been renamed or explicitly named in the schema. You can use the contains or comparison predicates to qualify rows using the COLUMN_NAME column. For example, the following WHERE clause retrieves data for only the reserved columns that have been named in the schema:

WHERE COLUMN_NAME CONTAINS 'FT\_%'

This WHERE clause retrieves data for only the application-defined columns:

WHERE COLUMN_NAME NOT CONTAINS 'FT\_%'

This WHERE clause retrieves data for specific application-defined columns:

WHERE COLUMN_NAME CONTAINS 'COMPANY', 'STATUS', 'CREATOR'
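The way patterns such as 'FT\_%' separate reserved columns from application-defined ones can be illustrated with a Python sketch. It assumes that % stands for any run of characters, that _ stands for exactly one character, and that a backslash makes the next character literal; these meanings are inferred from the examples above rather than taken from the wildcard rules in Chapter 3, and the function names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str):
    """Translate a wildcard pattern into a compiled regular expression."""
    parts = []
    chars = iter(pattern)
    for ch in chars:
        if ch == "\\":
            # Escaped character: take the next one literally.
            parts.append(re.escape(next(chars, "\\")))
        elif ch == "%":
            parts.append(".*")   # any run of characters (assumed meaning)
        elif ch == "_":
            parts.append(".")    # exactly one character (assumed meaning)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def select_columns(names, patterns, negate=False):
    """Keep names matching any pattern, or none of them when negate is True."""
    regexes = [pattern_to_regex(p) for p in patterns]
    return [
        name for name in names
        if any(regex.fullmatch(name) for regex in regexes) != negate
    ]
```

In this model the escaped underscore in 'FT\_%' is what keeps a name such as FTX from being treated as a reserved column.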

The ZONES System Table

Each row in the ZONES system table names a zone in a table that is visible to the user. It contains the following columns:

| Column Name | Data Type | Index Mode | Description |
|---|---|---|---|
| TABLE_QUALIFIER | VARCHAR (128) | LITERAL | identifies the location of the table: the nodename of the remote server where the table resides, or directory name if the table can be accessed without using a remote server |
| TABLE_OWNER | VARCHAR (32) | LITERAL | always NULL |
| TABLE_NAME | VARCHAR (32) | LITERAL | name of a table |
| COLUMN_NAME | VARCHAR (32) | LITERAL | name of the associated column |
| ZONE_NAME | VARCHAR (32) | LITERAL | name of a zone in the table specified by TABLE_NAME and contained in the column named by COLUMN_NAME |
| ZONE_NUMBER | INTEGER | VALUE | zone number (identifier) as given in the original CREATE ZONE clause |
| INDEX_MODE | VARCHAR (32) | NORMAL | indexing mode of the zone; one of NORMAL, VALUE, LITERAL, or NONE |
| FIELD_NUMBER | INTEGER | VALUE | field number of the associated column |

Table 5-3 ZONES System Table

The TABLE_QUALIFIER, TABLE_NAME, ZONE_NAME, and ZONE_NUMBER columns together form the unique key of the ZONES table. Therefore, no two rows are identical in all four of these columns.

Searching the ZONES System Table

If a single zone name contains multiple zone numbers, there will be multiple rows in the ZONES system table, differing only in the contents of the ZONE_NUMBER column. To uniquely identify a particular row, you must specify the following columns in the select list:

• TABLE_QUALIFIER

• TABLE_NAME

• COLUMN_NAME

• ZONE_NAME

• ZONE_NUMBER

For example, the following search retrieves two rows differing only in the value of the ZONE_NUMBER column:

SELECT COLUMN_NAME, ZONE_NAME, ZONE_NUMBER, INDEX_MODE
FROM ZONES
WHERE TABLE_QUALIFIER = '.' AND TABLE_NAME = 'SUPPORT' AND COLUMN_NAME = 'TEXT_LOG' AND ZONE_NAME = 'DATE_AND_TIME'

If the working table could be displayed, it would look like this:

COLUMN_NAME | ZONE_NAME | ZONE_NUMBER | INDEX_MODE
TEXT_LOG | DATE_AND_TIME | 201 | NORMAL
TEXT_LOG | DATE_AND_TIME | 202 | NORMAL

The SEARCH_TERMS System Table

Each row in the SEARCH_TERMS system table provides information about one term in one zone or column of one table that is visible to the user. It contains the following columns:

Table 5-4 SEARCH_TERMS System Table

Column Name | Data Type | Index Mode | Description
TABLE_QUALIFIER | VARCHAR (128) | LITERAL | identifies the location of the table: the nodename of the remote server where the table resides, or directory name if the table can be accessed without using a remote server
TABLE_NAME | VARCHAR (32) | LITERAL | name of a table (views are not valid)
TERM | VARCHAR (255) | NORMAL | word from the index
COLUMN_NAME | VARCHAR (32) | LITERAL | name of the associated column
ZONE_NUMBER | INTEGER | VALUE | zone number (identifier) as given in the original CREATE ZONE clause
ROW_COUNT | INTEGER | NONE | count of the number of rows in the table in which this term is indexed in this zone
OCCURRENCES | INTEGER | NONE | total number of occurrences of this term in this ZONE_NUMBER in the entire table

The TABLE_QUALIFIER, TABLE_NAME, TERM, COLUMN_NAME, and ZONE_NUMBER columns together comprise the unique key of the SEARCH_TERMS table. Therefore, no two rows are identical in all five of these columns.

Searching the SEARCH_TERMS System Table

Use the SELECT statement to search the SEARCH_TERMS table. The statement has two valid forms: one returns terms in order, and the other returns terms in order with ordered zone information for each term.

For both types of searches, only terms indexed with the NORMAL or LITERAL index modes are returned. Terms indexed with VALUE or NONE are not returned. When a term is returned, no highlighting information is included.

Note: You can only use the multiple-character wildcard (%) as a prefix, suffix, or infix.

All terms are returned in uppercase. Depending on the case normalization used, accents on vowels might not be returned. Only information from the periodic index of a table is returned. You should ensure that the table's index is current before using this facility. In addition, while browsing the SEARCH_TERMS table, no other application can update the index for the table being used. However, the table is still searchable.

Ordering by terms:

SELECT TERM, SUM(OCCURRENCES)
FROM SEARCH_TERMS
WHERE [ TABLE_QUALIFIER CONTAINS <literal> AND ]
TABLE_NAME CONTAINS <literal> <zone specification>
[ AND TERM CONTAINS <pattern> ]
ORDER BY TERM GROUP BY TERM

The two types of searches that can be performed on this system table give you quick access to the information required to implement a word wheel. In a search ordered by terms, the ORDER BY TERM GROUP BY TERM clause must be included. The rows in the working table are sorted in ascending order, with one row for each unique term. The ordering is based on the term as expressed in FTICS, even though the term is returned in the application character set. The ordering of the term is not affected by any collation sequence setting.

Stop words are included in the working table for this type of search. They are identified by a value of zero in the SUM(OCCURRENCES) column.
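As a rough illustration of the ordering-by-terms form, the Python sketch below assembles the statement text and filters stop words out of the fetched rows. The helper names are hypothetical, the quote-doubling rule is an assumption, and the sketch does not escape wildcard characters that might appear in the prefix. Only the COLUMN_NAME = <literal> variant of the zone specification is covered.

```python
def quote(text):
    """Wrap text in single quotes, doubling embedded quotes (assumed escaping rule)."""
    return "'" + text.replace("'", "''") + "'"


def word_wheel_query(table_name, prefix="", table_qualifier=None, column_name=None):
    """Build the ordering-by-terms SELECT for SEARCH_TERMS (hypothetical helper)."""
    where = []
    if table_qualifier is not None:
        where.append("TABLE_QUALIFIER CONTAINS " + quote(table_qualifier))
    where.append("TABLE_NAME CONTAINS " + quote(table_name))
    if column_name is not None:
        # Zone specification, column name clause variant.
        where.append("COLUMN_NAME = " + quote(column_name))
    if prefix:
        # The % wildcard used as a suffix selects terms starting with the prefix.
        where.append("TERM CONTAINS " + quote(prefix + "%"))
    return ("SELECT TERM, SUM(OCCURRENCES) FROM SEARCH_TERMS WHERE "
            + " AND ".join(where)
            + " ORDER BY TERM GROUP BY TERM")


def drop_stop_words(rows):
    """Keep terms whose SUM(OCCURRENCES) is non-zero; zero marks a stop word."""
    return [term for term, total in rows if total != 0]


print(word_wheel_query("SUPPORT", prefix="PRIN", column_name="TEXT_LOG"))
```

The rows passed to `drop_stop_words` are assumed to be (TERM, SUM(OCCURRENCES)) pairs already fetched by the application.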

Ordering by terms and zones:

SELECT TERM, ZONE_NUMBER, ROW_COUNT, OCCURRENCES
FROM SEARCH_TERMS
WHERE [ TABLE_QUALIFIER CONTAINS <literal> AND ]
TABLE_NAME CONTAINS <literal> <zone specification>
[ AND TERM CONTAINS <pattern> ]
ORDER BY TERM, ZONE_NUMBER

<zone specification> ::=
AND <column name clause>
| AND <zone number clause>
| /* */

<column name clause> ::=
COLUMN_NAME CONTAINS <literal> [{, <literal>}...]
| COLUMN_NAME LIKE <literal>
| COLUMN_NAME = <literal>

<zone number clause> ::=
ZONE_NUMBER IN (<numeric literal> [{, <numeric literal>}...])
| ZONE_NUMBER BETWEEN <numeric literal> AND <numeric literal>

When using this type of search, the ORDER BY TERM, ZONE_NUMBER clause must be included. The working table contains one row for each unique combination of term and zone number. The rows are sorted first by term (in the same manner as the first type of search) and then by zone number (in ascending numeric order) within each term. The ordering of the term is not affected by any collation sequence setting.

Stop words are not included in the working table for this type of search. You can use ExecSQL to experiment with some searches on your table, but to build a usable word wheel, you'll need to use the SearchServer C API (or another of Fulcrum's SearchServer products) in an application.

The Fulcrum SearchServer C API Reference Manual (or the corresponding user's manual provided with your SearchBuilder product) describes the syntax for using the SEARCH_TERMS table when retrieving.
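Because the rows of the terms-and-zones form arrive sorted by term and then by zone number, an application can collect the zone details for each term in a single forward pass. The Python sketch below assumes the rows have already been fetched as (TERM, ZONE_NUMBER, ROW_COUNT, OCCURRENCES) tuples; the function name and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def group_by_term(rows):
    """Group sorted (term, zone, row_count, occurrences) rows by term.

    Relies on the ORDER BY TERM, ZONE_NUMBER ordering described above:
    groupby only merges adjacent rows that share the same term.
    """
    wheel = []
    for term, group in groupby(rows, key=itemgetter(0)):
        zones = [(zone, row_count, occurrences)
                 for _, zone, row_count, occurrences in group]
        wheel.append((term, zones))
    return wheel


# Hypothetical sample rows, already in working-table order.
sample = [
    ("PRINTER", 201, 3, 5),
    ("PRINTER", 202, 1, 1),
    ("PROBLEM", 201, 2, 2),
]
for term, zones in group_by_term(sample):
    print(term, zones)
```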

When designing the word wheel feature of your application, note the following restrictions:

• MAX_SEARCH_ROWS is not applied to the result list.

• Retrieval performance improvements made in this release of SearchServer do not apply.

• The row set size used with SQLExtendedFetch must be 1.

• The prefetch feature of SQLFetch cannot be applied.

• SQLGetData must use a buffer large enough to get the entire term with the first retrieval request. SQLSetColPosition can't be used.

• Due to the internal structure of the table, positioning backwards (using SQLPosition) is slower than moving forwards.

The SERVER_INFO System Table

Each row in the SERVER_INFO system table provides information about one server attribute as it relates to the current connection. It contains the following columns:

Table 5-5 SERVER_INFO System Table

Column Name | Data Type | Index Mode | Description
SERVER_ATTRIBUTE | VARCHAR (254) | LITERAL | an attribute of the server
ATTRIBUTE_VALUE | VARCHAR (254) | LITERAL | value of the attribute identified by SERVER_ATTRIBUTE

The SERVER_INFO system table is the only system information table that can appear in SearchSQL statements other than the SELECT statement.

A searched UPDATE statement of the following form:

UPDATE SERVER_INFO SET ATTRIBUTE_VALUE = value WHERE SERVER_ATTRIBUTE = 'attribute'

is treated as equivalent to the following SET statement for those attributes for which SET statements exist:

SET ATTRIBUTE_VALUE 'value'

Attributes marked with an asterisk in the following list are read-only; trying to modify them results in an error. In addition, any attempt to set a writable attribute to an inappropriate value results in an error (for example, a value other than 'TRUE' or 'FALSE' for REFERENCE_SPOOLING, or a non-numeric value for SERVER_REPORT_TIME).
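An application might check a requested change before sending it, so that the errors described above are caught early. The Python sketch below is one possible client-side check, not part of SearchServer: the function name is hypothetical, the read-only set copies the asterisked names in this chapter's attribute list, the 'TRUE'/'FALSE' and numeric groups are taken from the attribute descriptions in this chapter, and quoting the value as a string literal is an assumption.

```python
READ_ONLY = frozenset({
    "FTNPATH", "FULCREATE", "FULSEARCH", "FULTEMP", "IDENTIFIER_CASE",
    "IDENTIFIER_LENGTH", "LAST_ROWID_INSERTED", "MATCH_CODE_END",
    "MATCH_CODE_START", "ROW_LENGTH", "TXN_ISOLATION", "USERID_LENGTH",
    "VERSION", "WILDCARD",
})
BOOLEAN = frozenset({
    "CHECK_TEXT_STATUS", "FRAGMENTED", "IMMEDIATE", "NOLOCKING",
    "REFERENCE_SPOOLING",
})
NUMERIC = frozenset({
    "MAX_EXEC_TIME", "MAX_SEARCH_ROWS", "SEARCH_MEMORY_SIZE",
    "SERVER_REPORT_TIME",
})


def server_info_update(attribute, value):
    """Return a searched UPDATE on SERVER_INFO, or raise ValueError."""
    name = attribute.upper()
    if name in READ_ONLY:
        raise ValueError(name + " is read-only")
    if name in BOOLEAN and value not in ("TRUE", "FALSE"):
        raise ValueError(name + " must be 'TRUE' or 'FALSE'")
    if name in NUMERIC and not (value.isascii() and value.isdigit()):
        raise ValueError(name + " must be an unsigned integer")
    literal = "'" + value.replace("'", "''") + "'"  # assumed quoting rule
    return ("UPDATE SERVER_INFO SET ATTRIBUTE_VALUE = " + literal
            + " WHERE SERVER_ATTRIBUTE = '" + name + "'")


print(server_info_update("max_search_rows", "500"))
```

The server remains the authority on which values are valid; this check only mirrors the rules stated in this chapter.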

No statement other than a SET or a searched UPDATE can change server attributes in the SERVER_INFO table.

The SERVER_INFO system table contains one row for each possible value of the SERVER_ATTRIBUTE column. The possible values are:

BASEPATH
BLOCKSIZE
CHARACTER_SET
CHARACTER_VARIANT
CHECK_TEXT_STATUS
COLLATION_SEQUENCE
FORMAT_TEXT
FRAGMENTED
*FTNPATH
*FULCREATE
*FULSEARCH
*FULTEMP
*IDENTIFIER_CASE
*IDENTIFIER_LENGTH
IMMEDIATE
INDEXDIR
*LAST_ROWID_INSERTED
*MATCH_CODE_END
*MATCH_CODE_START
MAX_EXEC_TIME
MAX_SEARCH_ROWS
NOLOCKING
NORMALIZATION
POSITIONING_UNIT
REFERENCE_SPOOLING
RELEVANCE_METHOD
RIGHT_MARGIN
*ROW_LENGTH
SEARCH_MEMORY_SIZE
SERVER_REPORT_TIME
SHOW_MATCHES
SHOW_SGR
STOPFILE
TERM_GENERATOR
THESAURUS_NAME
*TXN_ISOLATION
*USERID_LENGTH
VECTOR_GENERATOR
*VERSION
*WILDCARD
WILDCARD_OPT
WORKDIR

For the FTNPATH, FULCREATE, FULSEARCH, and FULTEMP rows, the ATTRIBUTE_VALUE column contains a string that represents the current setting of the data source parameter corresponding to the SERVER_ATTRIBUTE column.

For all of the other rows, the ATTRIBUTE_VALUE column contains a string that represents the current setting of the corresponding server attribute in the SERVER_ATTRIBUTE column. The possible ATTRIBUTE_VALUE strings for each server attribute are described in the remainder of this chapter.

BASEPATH

This attribute specifies the base directory location for document files and directories. The value is a character string literal specified in the last SET BASEPATH statement executed or BASEPATH table parameter. The default value, two quotation marks (''), specifies no base path.

BLOCKSIZE

This attribute is provided for backward compatibility with previous versions of SearchServer. The value is '1024'.

CHARACTER_SET

This attribute specifies the name of a character set used by the application. All character data returned by SearchServer, and all character data (such as SearchServer SQL statements) passed into SearchServer, will be assumed to be in this character set. The initial value indicates the default character set for your environment.

See Appendix A, "Character Sets," for a list of the values and a detailed description of these character sets.

Note: The literal in the SET CHARACTER_SET statement is case-sensitive and must be uppercase.

CHARACTER_VARIANT

This attribute specifies the name of a character variant rules file. This value is the character string literal specified in the last SET CHARACTER_VARIANT statement executed. The initial value, an empty string (''), disables character variant generation.

CHECK_TEXT_STATUS

This attribute specifies whether the timestamps of external documents are checked before the data is retrieved. This value is the character string literal specified in the last SET CHECK_TEXT_STATUS statement executed and is either 'TRUE' or 'FALSE'. The initial value is 'TRUE', indicating that the timestamp is checked.

COLLATION_SEQUENCE

This attribute identifies the collation sequence name for the ordering of the character set for this server. The initial value is 'default', indicating the English and French default collation sequence. The literal in the SET COLLATION_SEQUENCE statement is case-sensitive.

FORMAT_TEXT

This attribute is provided for backward compatibility with previous versions of SearchServer. The value is 'STREAM'.

FRAGMENTED

This attribute is provided for backward compatibility with previous Fulcrum products. It specifies whether there might be repeated field instances in a row. This value is the character string literal specified in the last SET FRAGMENTED statement executed and is either 'TRUE' or 'FALSE'. The initial value is 'FALSE', indicating that data isn't fragmented.

FTNPATH

This attribute specifies zero or more network connectors, possibly including server names, used to communicate with remote SearchServers. The format of a network connector is described in Fulcrum SearchServer Getting Started for your platform. If this value is the empty string (''), SearchServer can access only local tables and those that appear to be local because of transparent file access or networked file systems. The value is the current value of the FTNPATH data source parameter.

FULCREATE

This attribute specifies the directory where tables will be created on the local system. If this value is the empty string (''), SearchServer creates unqualified tables on the first server explicitly named by the FTNPATH data source parameter. The value is the current value of the FULCREATE data source parameter.

FULSEARCH

This attribute specifies the directory or list of directories that are searched when looking for an unqualified table. This must include at least the directory where the SearchServer message file (FULTEXT.FTC) is located. The value is the current value of the FULSEARCH data source parameter.

FULTEMP

This attribute specifies a location for temporary files during indexing and searching operations. SearchServer must be given permission to write to that location. The value is the current value of the FULTEMP data source parameter.

Note: If FULTEMP is not writable, ExecSQL will quit with no error message.

IDENTIFIER_CASE

This attribute identifies the case sensitivity of user-defined names such as table names and column names. The value is always 'UPPER' to indicate that lowercase letters are converted to uppercase.

IDENTIFIER_LENGTH

This attribute is the maximum length of an identifier. The value is the character string representation of the decimal value, 18.

IMMEDIATE

This attribute declares that future tables will be created as immediate tables. This value is the character string literal specified in the last SET IMMEDIATE statement executed and is either 'TRUE' or 'FALSE'. The initial value is 'TRUE', indicating that the tables created will use immediate indexing.

INDEXDIR

This attribute specifies a directory that is used to contain the data and index files for the table. This value is the character string literal specified in the last SET INDEXDIR statement executed or INDEXDIR table parameter. The initial value is an empty string (''), indicating that these files are placed in the default location.

LAST_ROWID_INSERTED

This attribute specifies the FT_CID value of the last inserted row of a connection. This value is the character string representation of the unsigned integer from the FT_CID reserved column. The initial value is '0'.

MATCH_CODE_END

This attribute is the character string that follows matched text when the value of SHOW_MATCHES is 'TRUE'. This value is '\E[32723m' where \E[ denotes the control sequence introducer (the 2-byte code with the hexadecimal values 1B and 5B).

MATCH_CODE_START

This attribute is the character string that precedes matched text when the value of SHOW_MATCHES is 'TRUE'. This value is '\E[32703m' where \E[ denotes the control sequence introducer (the 2-byte code with the hexadecimal values 1B and 5B).
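An application that displays retrieved text usually has to turn these match codes into its own highlighting. The Python sketch below shows one way to do that, assuming the text has already been fetched into a string; the function names and the << >> markers are hypothetical choices for illustration.

```python
ESC = "\x1b"  # hexadecimal 1B, the first byte of the \E[ introducer
MATCH_CODE_START = ESC + "[32703m"
MATCH_CODE_END = ESC + "[32723m"


def highlight(text, open_mark="<<", close_mark=">>"):
    """Replace the match codes with visible markers."""
    return (text.replace(MATCH_CODE_START, open_mark)
                .replace(MATCH_CODE_END, close_mark))


def matched_terms(text):
    """Return the pieces of text enclosed by start and end match codes."""
    terms = []
    pos = 0
    while True:
        start = text.find(MATCH_CODE_START, pos)
        if start < 0:
            break
        start += len(MATCH_CODE_START)
        end = text.find(MATCH_CODE_END, start)
        if end < 0:
            break  # unterminated match code: ignore the remainder
        terms.append(text[start:end])
        pos = end + len(MATCH_CODE_END)
    return terms


sample = "printer " + MATCH_CODE_START + "jam" + MATCH_CODE_END + " on tray 2"
print(highlight(sample))
print(matched_terms(sample))
```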

MAX_EXEC_TIME

This attribute is the maximum execution time (in milliseconds) allowed for the SELECT, CREATE TEXT_VECTOR, and VALIDATE INDEX statements. This value is the character string representation of the unsigned integer from the last SET MAX_EXEC_TIME statement executed. The initial value is '0', which indicates that no time limit is imposed.

MAX_SEARCH_ROWS

This attribute is the maximum number of rows allowed in a working table. This value is the character string representation of the unsigned integer from the last SET MAX_SEARCH_ROWS statement executed. The initial value is '0', indicating that no limit is imposed.

NOLOCKING

This attribute specifies whether the table is locked when retrieving and inserting data. This value is the character string literal specified in the last SET NOLOCKING statement executed or NOLOCKING table parameter and is either 'TRUE' or 'FALSE'. The initial value is 'FALSE', indicating that locking is used.

NORMALIZATION

This attribute specifies the case normalization strategy to use when indexing the data in the table. The value is the character string literal specified in the last SET NORMALIZATION statement executed or NORMALIZATION table parameter.

The value can be any of the following case normalization strategies:

'DEFAULT' | Lowercase letters are mapped onto uppercase letters. Accented characters are mapped onto the unaccented uppercase letter. Characters 0xF1, 0xF2, 0xF4, 0xF6 through 0xFA, and 0xFC through 0xFE are mapped onto the corresponding character in column E. This case normalization strategy is based on the FTCS94 character set table.
'EUROPA3' | Case normalization mapping is done according to the translation table provided for the Europa3 character set.
'ARABIC' | Case normalization mapping is done according to the translation table provided for the ARABIC character set. This case normalization strategy is based on the AFTCS94 character set table.
'ASIAN' | Case normalization is restricted to the 26 lowercase letters in the 7-bit subset of the FTCS table (mapped to the 26 uppercase letters).
'NONE' | No case normalization mapping is done.

The initial value is 'DEFAULT'.

POSITIONING_UNIT

This attribute specifies the unit to be used for positioning. This value is the character string literal specified in the last SET POSITIONING_UNIT statement executed and is either 'DEFAULT_POSITIONING' or 'PAGE'.

REFERENCE_SPOOLING

This attribute specifies whether reference file spooling is enabled or disabled. This value is the character string literal specified in the last SET REFERENCE_SPOOLING statement executed and is either 'TRUE' or 'FALSE'. The initial value is 'FALSE', indicating that reference spooling is disabled.

RELEVANCE_METHOD

This attribute specifies the retrieval model and relevance algorithm used by the RELEVANCE function in the absence of an explicit relevance method parameter. This value is the character string literal specified in the last SET RELEVANCE_METHOD statement executed. The initial value is an empty string (''), indicating that no default relevance method is defined.

RIGHT_MARGIN

This attribute is provided for backward compatibility with previous versions of SearchServer. The value is 200 display positions.

ROW_LENGTH

This attribute is the maximum length of a row. The value is the character string representation of the decimal value, 2147483647 (the maximum length of the FT_TEXT reserved column).

SEARCH_MEMORY_SIZE

This attribute specifies the current size (in kilobytes) of the search memory. This value is the character string representation of the unsigned integer from the last SET SEARCH_MEMORY_SIZE statement executed. The initial value is '63' in Windows environments and '512' on all other platforms.

SERVER_REPORT_TIME

This attribute is the minimum amount of time (in milliseconds) that the SQLExecDirect, SQLPrepare, and SQLExecute API functions will execute before returning control to the application with SQL_STILL_EXECUTING. This value is the character string representation of the unsigned integer from the last SET SERVER_REPORT_TIME statement executed. The initial value is '1', indicating that if asynchronous execution is requested, these functions can return with SQL_STILL_EXECUTING in as little as one millisecond after being called.

SHOW_MATCHES

This attribute identifies whether match codes are inserted into the text returned by the SQLFetch and SQLGetData API functions. This value is the character string literal specified in the last SET SHOW_MATCHES statement executed and can be any of the following:

'TRUE' | match codes are inserted for all columns
'FALSE' | match codes are not inserted for any column
'DEFAULT' | match codes are not inserted for any column
'INTERNAL_COLUMNS' | match codes are inserted for all internal columns and are not inserted for the external text column
'EXTERNAL_COLUMN' | match codes are inserted for the external text column and are not inserted for internal columns

The initial value is 'DEFAULT'.

SHOW_SGR

This attribute specifies whether control and escape sequences are inserted into the data returned from a search. This value is the character string literal specified in the last SET SHOW_SGR statement executed and can be any of the following:

'TRUE' | control and escape sequences are inserted for all columns
'FALSE' | control and escape sequences are not inserted for any column
'DEFAULT' | control and escape sequences are not inserted for any column
'INTERNAL_COLUMNS' | control and escape sequences are inserted for all internal columns and are not inserted for the external text column
'EXTERNAL_COLUMN' | control and escape sequences are inserted for the external text column and are not inserted for internal columns

The initial value is 'DEFAULT'.

STOPFILE

This attribute specifies the default operating system file that contains a list of words not to be indexed (stopwords). This value is the character string literal specified in the last SET STOPFILE statement executed or STOPFILE table parameter. The initial value is an empty string (''), indicating that no stopword file is defined.

TERM_GENERATOR

This attribute specifies the linguistic rules filter list. The initial value is an empty string (''), indicating that no linguistic rules filter will be used.

THESAURUS_NAME

This attribute specifies the default thesaurus file used by the THESAURUS function. This value is the character string literal specified in the last SET THESAURUS_NAME statement executed. The initial value is an empty string (''), indicating that no thesaurus file is defined.

TXN_ISOLATION

This attribute is the initial transaction isolation level that the server assumes. This value is the character string representation of the decimal value 0 (no isolation).

USERID_LENGTH

This attribute is the maximum length of a user name or qualifier identifier. The value is the character string representation of the decimal value 18.

VECTOR_GENERATOR

This attribute specifies the type of linguistic processing applied to intuitive searches. The initial value is FTELPVECDEFLT. To disable linguistic processing, set the attribute value to an empty string (''). To return to the SearchServer default, set the attribute value to 'DEFAULT'.

VERSION

This attribute identifies the current version and release of the local installation of SearchServer and the Fulcrum kernel, in that order. These two values are always separated by a slash. For example:

2.0B6/6.1B3
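Since the two parts are always separated by a slash, an application can split the value directly. A minimal Python sketch (the function name is hypothetical):

```python
def parse_version(value):
    """Split a VERSION attribute value into (SearchServer, Fulcrum kernel)."""
    searchserver, slash, kernel = value.partition("/")
    if not slash:
        raise ValueError("expected 'searchserver/kernel', got " + repr(value))
    return searchserver, kernel


print(parse_version("2.0B6/6.1B3"))
```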

WILDCARD

This attribute identifies the possible position of the string wildcard characters. The value is a three-character map where the characters indicate if the wildcards can be used for PREFIX, EMBEDDED, or SUFFIX usage.

The characters in the map can be:

Character | Meaning
% | only the percent (%) wildcard is supported in this position
_ | only the underline ( _ ) wildcard is supported in this position
B | both wildcard characters are supported in this position
N | neither wildcard is supported in this position

The value returned by SearchServer for this attribute is 'BBB'.
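The three-character map can be decoded mechanically. The Python sketch below assumes that the characters appear in the order PREFIX, EMBEDDED, SUFFIX, which is the order in which the positions are listed above; the function name is hypothetical.

```python
POSITIONS = ("PREFIX", "EMBEDDED", "SUFFIX")  # assumed order of the map
MEANING = {
    "%": frozenset({"%"}),       # only the percent wildcard
    "_": frozenset({"_"}),       # only the underline wildcard
    "B": frozenset({"%", "_"}),  # both wildcards
    "N": frozenset(),            # neither wildcard
}


def decode_wildcard_map(value):
    """Map each position to the set of wildcard characters allowed there."""
    if len(value) != len(POSITIONS):
        raise ValueError("expected a three-character map, got " + repr(value))
    return {position: MEANING[char] for position, char in zip(POSITIONS, value)}


for position, allowed in decode_wildcard_map("BBB").items():
    print(position, sorted(allowed))
```

For the value 'BBB' that SearchServer returns, every position maps to both wildcard characters.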

WILDCARD_OPT <wildcard optimization method>

This attribute specifies the type of wildcard optimization to be enabled for the table. There are three wildcard optimization methods:

MINIMIZE_INDEX_OVERHEAD | This method minimizes indexing time and space. Performance for some prefix and infix wildcard searches is reduced as compared to the MINIMIZE_SEARCH_TIME method.
MINIMIZE_SEARCH_TIME | This method maximizes search performance. Indexing time is increased and the space required for the index is doubled. If space permits, this method is preferred for tables located on slower mass-storage devices, such as CD-ROMs.
NONE | No wildcard optimization is enabled for the table. Performance for prefix and infix wildcard searches is substantially reduced.

The initial value is 'NONE'.

WORKDIR

This attribute specifies the default work directory to be used for temporary files that might be required to accommodate buffer overflow. This value is the character string literal specified in the last SET WORKDIR statement executed or WORKDIR table parameter. The initial value is 'DEFAULT', which specifies the default location for temporary files. This location is recorded in the FULTEMP server attribute.

Appendix A:

Character Sets

This appendix describes the character sets that SearchServer supports.

Introduction

SearchServer accepts statements and returns data in a wide range of character sets. The following application character sets are supported, and can be set either by your application or using SearchSQL:

Character Set | SearchServer Syntax | Appropriate Normalization | Associated FTICS
ISO 8859-1 (Latin-1) | ISO_LATIN1 | DEFAULT | FTCS94
ISO 8859-2 (Latin-2) | ISO_LATIN2 | DEFAULT | FTCS94
Windows Latin-1 | WIN_LATIN1 | DEFAULT | FTCS94
Windows Latin-2 | WIN_LATIN2 | DEFAULT | FTCS94
Windows Europa3 | EUROPA3 | EUROPA3 | EFTCS94
Macintosh | MACINTOSH | DEFAULT | FTCS94
DOS | DOS | DEFAULT | FTCS94

Internally, SearchServer stores data in one of the following Fulcrum Technologies Internal Character Sets (FTICS):

FTCS94 (for application character sets other than Windows Europa3)
EFTCS94 (for Windows Europa3)
AFTCS94 (for Arabic)

It is possible to use other application character sets if they can be internally represented in one of these internal character sets. The selection of FTICS is automatic, based on the application character sets.

You can specify the collation sequence to be used for ordering character data, using a SET COLLATION_SEQUENCE statement. Ordering of text is done after translation to the application character set.

It is recommended that you use a single application character set. However, in a heterogeneous environment, it might be necessary to search and retrieve data in a platform-specific character set. In this case, only the data that can be represented in all of the character sets of interest will be searchable and retrievable across all platforms.

When you create a table, you need to be aware of what character set your documents are using. That character set indicates:

• the internal character set to be used

• the appropriate collation sequence

• the case normalization to be applied

You or your application are directly responsible for:

• setting the character set for the text reader before indexing

• setting the character set during retrieval

• setting the application collation sequence as a server parameter

• setting the case normalization strategy

Setting a Collation Sequence

Another characteristic of text might be that it requires an alternative collation sequence for ordering text strings. This is generally a characteristic of a language rather than a character set. This sequence can be more complex than a character-by-character ordering can describe. Using the Fulcrum SearchServer Customization Guide, you can define a special collation sequence for SearchServer.

The default collation sequence provides dictionary ordering for French and English text being returned in Windows Latin-1 or ISO 8859-1 (Latin-1). It is based on the Canadian Standards Association standard CAN/CSA-Z243.4. This default collation sequence provides dictionary ordering for English text in all the supported application character sets. When data is returned in any other character set, the default collation sequence provides a dictionary ordering of English text. This might not be appropriate when used for text in French or any other language.

Setting the Case Normalization Strategy

SearchServer provides a table parameter used to select the case normalization strategy to use when indexing and searching text in the table. Like all table parameters, this one can be set only when a table is originally created. The syntax of this parameter is:

NORMALIZATION {'DEFAULT' | 'NONE' | 'ASIAN' | 'ARABIC' | 'EUROPA3'}

The case normalization you select dictates the internal character set to be used for the table.

The description of the case normalization for each of these options is:

Page 575: SA-Application Text Retrieval Software Expert 5.0 …publib.boulder.ibm.com/tividd/td/ASE/ASE50trg/en_US/PDF/...create dialog box forms, menus, toolbars, and string tables for any

Character Sets

Text Retrieval Guide 575

If you have the appropriate internal character set, you can also set the case normalization to ASIAN (to restrict normalization to the 26 lowercase letters) or NONE (to have no case normalization map-ping).

This state of the NORMALIZATION table parameter can be deter-mined by retrieving the new FTT_NORMALIZATION column from the TABLES table. Its value will be one of the literals listed above.

Note: The process of normalization is independent of the process of collation. Terms that can be considered identical for the purposes of searching (because of normalization) might not appear near each other in a list of terms ordered by a collation sequence. For example, the two terms 'abc' and 'ABC' would be considered identical for the purposes of searching through the default normal-ization, but would not appear together in a list ordered by a default collation sequence based on the binary ordering of the character val-ues.

Setting the Character Set

Your application must use the SET CHARACTER_SET statement if you intend to use anything other than the default for your system. You must ensure that the application character set that you choose uses the same internal character set (FTICS) as that used by the text reader when the documents were indexed.

You can define a custom character set if the tables supplied by Fulcrum do not provide the flexibility that you need. For more information, refer to the Fulcrum SearchServer Customization Guide.

DEFAULT Lowercase letters are mapped onto uppercase letters. Accented characters are mapped onto the unaccented uppercase letter. Characters 0xF1, 0xF2, 0xF4, 0xF6 through 0xFA, and 0xFC through 0xFE are mapped onto the corresponding character in column E. This case normalization strategy is based on the FTCS94 character set table.

ARABIC Case normalization mapping is done according to the translation table provided for the ARABIC character set. This case normalization strategy is based on the AFTCS94 character set table.

EUROPA3 Case normalization mapping is done according to the translation table provided for the Europa3 character set. This case normalization strategy is based on the EFTCS94 character set table.


FTCS94

NOTES: (Table 'Column' references below are referenced horizontally.)

• Columns 0 and 1 are the Primary Control Character Set

• Columns 2 through 7 are the Primary Text Character Set

• Columns 8 and 9 are the Supplementary Control Character Set

• Columns A through F are the Supplementary Text Character Set
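The column ranges listed above correspond to the high-order hexadecimal digit of a character code. As a minimal sketch (the function name `ftcs_region` is hypothetical and not part of any Fulcrum API), the region of a code can be derived like this:

```python
def ftcs_region(code: int) -> str:
    """Return the region of an 8-bit code, based on its table column.

    The column is the high-order hex digit of the code, as in the
    notes above (columns 0-1, 2-7, 8-9, A-F).
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("code must be in the range 0x00 to 0xFF")
    column = code >> 4
    if column <= 0x1:
        return "primary control"
    if column <= 0x7:
        return "primary text"
    if column <= 0x9:
        return "supplementary control"
    return "supplementary text"


print(ftcs_region(0x41))  # 'A' lies in column 4
print(ftcs_region(0x9B))  # CSI lies in column 9
```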

Table A-1 The Fulcrum Technologies Character Set (FTCS94)

Note 1: All undefined characters are reserved for future use.

Note 2: The supplementary codes in columns 8 and 9 are translated by SearchServer to the equivalent 2-byte sequence with the high bit off. For example, 0x9B in a data stream becomes esc 0x5B after SearchServer processes it during retrieval.
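The translation in Note 2 can be sketched in Python as follows. The only case documented here is 0x9B becoming ESC 0x5B; the general rule used below (ESC followed by the code minus 0x40, the usual 7-bit representation of such control codes) is an assumption about how SearchServer treats the rest of columns 8 and 9, and the function name is hypothetical.

```python
ESC = 0x1B


def to_seven_bit(data: bytes) -> bytes:
    """Replace each code in columns 8 and 9 (0x80-0x9F) with ESC + (code - 0x40)."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            # Assumed rule: two-byte sequence with the high bit off.
            out.append(ESC)
            out.append(b - 0x40)
        else:
            out.append(b)
    return bytes(out)


print(to_seven_bit(b"\x9b").hex())
```

All other bytes pass through unchanged, so ordinary text is not affected by the translation.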

EFTCS94

NOTES: (Table 'Column' references below are referenced horizontally.)

0 1 2 3 4 5 6 7 8 9 A B C D E F

0 SP 0 @ P ` p NBS K

1 ! 1 A Q a q ¡

2 " 2 B R b r ¢

3 # 3 C S c s £

4 $ 4 D T d t x ~

5 % 5 E U e u NEL ¥

6 & 6 F V f v | ¶ IJ ij

7 ' 7 G W g w § · · ‰

8 BS ( 8 H X h x ¤ ..

9 HT ) 9 I Y i y ` ' / Ø ø

A LF * : J Z j z "

B VT ESC + ; K [ k {] PLD CSI « » ¸

C FF , < L \ l | PLU -

D CR - = M ] m }

E SO . > N ^ n ~ SS2

F SI / ? O _ o DEL SS3 SHY ¿


• Columns 0 and 1 are the Primary Control Character Set

• Columns 2 through 7 are the Primary Text Character Set

• Columns 8 and 9 are the Supplementary Control Character Set

• Columns A through F are the Supplementary Text Character Set

Table A-2 The European Fulcrum Technologies Character Set (EFTCS94)

Note 1: All undefined characters are reserved for future use.

Note 2: The supplementary codes in columns 8 and 9 are translated by SearchServer to the equivalent 2-byte sequence with the high bit off. For example, 0x9B in a data stream becomes esc 0x5B after SearchServer processes it during retrieval.

AFTCS94

Table A-3 The Arabic Fulcrum Technologies Character Set (AFTCS94)

Note 1: All undefined characters are reserved for future use.

Note 2: The supplementary codes in columns 8 and 9 are translated by SearchServer to the equivalent 2-byte sequence with the high bit off. For example, 0x9B in a data stream becomes esc 0x5B after SearchServer processes it during retrieval.

0 1 2 3 4 5 6 7 8 9 A B C D E F

0 SP 0 @ P ` p NBS

1 ! 1 A Q a q ¡

2 " 2 B R b r ¢

3 # 3 C S c s £

4 $ 4 D T d t x ~

5 % 5 E U e u NEL ¥

6 & 6 F V f v ¿ ¶

7 ' 7 G W g w § · .

8 BS ( 8 H X h x ¤ .. ß

9 HT ) 9 I Y i y ` ' /

A LF * : J Z j z "

B VT ESC + ; K [ k { PLD CSI « » ¸

C FF , < L \ l | PLU ª

D CR - = M ] m } Æ æ ''

E SO . > N ^ n ~ SS2

F SI / ? O _ o DEL SS3


ISO Latin-1

Table A-4 The ISO Latin-1 Character Set

Note: Some characters that are necessary for correct French spelling are not part of this character set. These characters are: Latin capital letter Y with diaeresis; Latin small ligature O E; Latin capital ligature O E.

ISO Latin-2

b8 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

b7 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

b6 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

b5 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

b4 b3 b2 b1 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15

0 0 0 0 00 SP 0 @ P ` p NBSP

0 0 0 1 01 ! 1 A Q a q i

0 0 1 0 02 " 2 B R b r ¢

0 0 1 1 03 # 3 C S c s £

0 1 0 0 04 $ 4 D T d t ¤ ' Ä ô ä ô

0 1 0 1 05 % 5 E U e u ¥

0 1 1 0 06 & 6 F V f v | ¶ Æ ö æ ö

0 1 1 1 07 ' 7 G W g w § · Ç x ç

1 0 0 0 08 ( 8 H X h x ¨ ¸ È Ø è ø

1 0 0 1 09 ) 9 I Y i y

1 0 1 0 10 * : J Z j z ª

1 0 1 1 11 + ; K [ k { « » Ë Û ë û

1 1 0 0 12 , < L \ l |

1 1 0 1 13 - = M ] m } SHY

1 1 1 0 14 . > N ^ n ~

1 1 1 1 15 / ? O _ o — ¿ Ï ß ï ÿ

b8 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
b7 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
b6 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
b5 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
b4 b3 b2 b1 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
0 0 0 0 00 SP 0 @ P ` p NBSP
0 0 0 1 01 ! 1 A Q a q Á á
0 0 1 0 02 " 2 B R b r  â
0 0 1 1 03 # 3 C S c s ó ó
0 1 0 0 04 $ 4 D T d t ¤ ' Ä ô ä ô
0 1 0 1 05 % 5 E U e u
0 1 1 0 06 & 6 F V f v ö ö
0 1 1 1 07 ' 7 G W g w § Ç x ç
1 0 0 0 08 ( 8 H X h x ¨ ¸
1 0 0 1 09 ) 9 I Y i y é
1 0 1 0 10 * : J Z j z Ú ú
1 0 1 1 11 + ; K [ k { Ë ë
1 1 0 0 12 , < L \ l | Ü ü
1 1 0 1 13 - = M ] m } SHY " Í
1 1 1 0 14 . > N ^ n ~ Î î
1 1 1 1 15 / ? O _ o . . ß .

Table A-5 The ISO Latin-2 Character Set

Windows Latin-1

0 1 2 3 4 5 6 7
0 0 16 32 48 0 64 @ 80 P 96 ` 112 p
1 1 17 33 ! 49 1 65 A 81 Q 97 a 113 q
2 2 18 34 " 50 2 66 B 82 R 98 b 114 r
3 3 19 35 # 51 3 67 C 83 S 99 c 115 s
4 4 20 36 $ 52 4 68 D 84 T 100 d 116 t
5 5 21 37 % 53 5 69 E 85 U 101 e 117 u
6 6 22 38 & 54 6 70 F 86 V 102 f 118 v
7 7 23 39 ' 55 7 71 G 87 W 103 g 119 w
8 8 24 40 ( 56 8 72 H 88 X 104 h 120 x
9 9 25 41 ) 57 9 73 I 89 Y 105 i 121 y
A 10 26 42 * 58 : 74 J 90 Z 106 j 122 z
B 11 27 43 + 59 ; 75 K 91 [ 107 k 123 {
C 12 28 44 , 60 < 76 L 92 \ 108 l 124 |
D 13 29 45 - 61 = 77 M 93 ] 109 m 125 }
E 14 30 46 . 62 > 78 N 94 ^ 110 n 126 ~
F 15 31 47 / 63 ? 79 O 95 _ 111 o 127

8 9 A B C D E F
0 128 144 160 176 192 À 208 224 à 240
1 129 145 ` 161 i 177 193 Á 209 Ñ 225 á 241 ñ
2 130 ¸ 146 ' 162 ¢ 178 194 Â 210 ò 226 â 242 ò
3 131 147 " 163 £ 179 195 Ã 211 ó 227 ã 243 ó
4 132 „ 148 " 164 ¤ 180 ' 196 Ä 212 ô 228 ä 244 ô
5 133 … 149 165 ¥ 181 197 Å 213 õ 229 å 245 õ
6 134 † 150 - 166 182 ¶ 198 Æ 214 ö 230 æ 246 ö
7 135 ‡ 151 — 167 § 183 · 199 Ç 215 x 231 ç 247
8 136 152 ~ 168 ¨ 184 ¸ 200 È 216 Ø 232 è 248 ø
9 137 ‰ 153 169 185 201 É 217 Ù 233 é 249 ù
A 138 154 170 ª 186 202 Ê 218 Ú 234 ê 250 ú
B 139 ‹ 155 › 171 « 187 » 203 Ë 219 Û 235 ë 251 û
C 140 Œ 156 œ 172 188 204 Ì 220 Ü 236 ì 252 ü
D 141 157 173 - 189 205 Í 221 237 í 253
E 142 158 174 190 206 Î 222 238 î 254
F 143 159 Ÿ 175 — 191 ¿ 207 Ï 223 ß 239 ï 255 ÿ

Table A-6 Windows Latin 1 Character Set

Windows Latin-2

0 1 2 3 4 5 6 7
0 0 16 32 48 0 64 @ 80 P 96 ` 112 p
1 1 17 33 ! 49 1 65 A 81 Q 97 a 113 q
2 2 18 34 " 50 2 66 B 82 R 98 b 114 r
3 3 19 35 # 51 3 67 C 83 S 99 c 115 s
4 4 20 36 $ 52 4 68 D 84 T 100 d 116 t
5 5 21 37 % 53 5 69 E 85 U 101 e 117 u
6 6 22 38 & 54 6 70 F 86 V 102 f 118 v
7 7 23 39 ' 55 7 71 G 87 W 103 g 119 w
8 8 24 40 ( 56 8 72 H 88 X 104 h 120 x
9 9 25 41 ) 57 9 73 I 89 Y 105 i 121 y
A 10 26 42 * 58 : 74 J 90 Z 106 j 122 z
B 11 27 43 + 59 ; 75 K 91 [ 107 k 123 {
C 12 28 44 , 60 < 76 L 92 \ 108 l 124 |
D 13 29 45 - 61 = 77 M 93 ] 109 m 125 }
E 14 30 46 . 62 > 78 N 94 ^ 110 n 126 ~
F 15 31 47 / 63 ? 79 O 95 _ 111 o 127

8 9 A B C D E F
0 128 144 160 176 192 208 224 240
1 129 145 ` 161 177 193 Á 209 225 á 241
2 130 ¸ 146 ' 162 178 ` 194 Â 210 226 â 242
3 131 147 " 163 179 195 211 ó 227 243 ó
4 132 „ 148 " 164 ¤ 180 ' 196 Ä 212 ô 228 ä 244 ô
5 133 … 149 165 181 197 213 229 245
6 134 † 150 - 166 | 182 ¶ 198 214 ö 230 246 ö
7 135 ‡ 151 — 167 § 183 · 199 Ç 215 x 231 ç 247
8 136 152 168 ¨ 184 ¸ 200 216 232 248
9 137 ‰ 153 169 185 201 É 217 233 é 249
A 138 170 186 202 218 Ú 234 250 ú
B 139 ‹ 155 › 171 « 187 » 203 Ë 219 235 ë 251
C 140 156 172 188 204 220 Ü 236 252 ü
D 141 157 173 - 189 " 205 Í 221 237 í 253
E 142 158 174 190 206 Î 222 238 î 254
F 143 159 175 191 207 223 ß 239 255 .

Table A-7 Windows Latin 2 Character Set

Windows Europa-3

0 1 2 3 4 5 6 7 8 9 A B C D E F

0 0 @ P p Ç É á ó Ÿ

1 ! 1 A Q a q ü æ í u

2 " 2 B R b r é Æ ó ¡ Ê ô

3 # 3 C S c s â ô ú ¿

4 ¶ $ 4 D T d t ä ö ñ ' õ

5 § % 5 E U e u à ò Ñ Á õ

6 £ & 6 F V f v å û Â ã í

7 7 G W g w ç ù À Ã

8 ( 8 H X h x ê

9 ) 9 I Y i y ë ö Ú

A * : J Z j z è Ü .

B + ; K [ k { Ï ø

C , < L \ l | Î

D Ë - = M ] m } ì Ø

E . > N ^ n ~ Ä

F / ? O _ o Å


Table A-8 Windows Europa-3 Character Set

Macintosh

0x 1x 2x 3x 4x 5x 6x 7x 8x 9x Ax Bx Cx Dx Ex Fx
x0 nul dle sp 0 @ P ` p A ê † ¿ - ‡
x1 soh DC1 ! 1 A Q a q Å ë
x2 stx DC2 " 2 B R b r Ç ì ¢ ~ " ¸ Ú
x3 etx DC3 # 3 C S c s É í £ " „ Û
x4 eot DC4 $ 4 D T d t Ñ Î § ¥ ` ‰ Ù
x5 enq nak % 5 E U e u ö Ï
x6 ack syn & 6 F V f v Ü ñ ¶ Ê ^
x7 bel etb ' 7 G W g w á ó ß >> Á ~
x8 bs can ( 8 H X h x à ò << ÿ Ë -
x9 ht em ) 9 I Y i y â ô … Ÿ È
xA lf sub * : J Z j z ä ö nbsp / Í ·
xB vt esc + ; K [ k { ã õ ' ª À ¤ Î
xC ff fs , < L \ l | å ú ..
xD cr gs - = M ] m } ç ù õ > Ì
xE so rs . > N ^ n ~ é û Æ æ Œ fi ó
xF si us / ? O _ o del è ü Ø ø œ fl ô

Table A-9 Macintosh Character Set

DOS

0 16 32 48 0 64 @ 80 P 96 ` 112 p
1 17 33 ! 49 1 65 A 81 Q 97 a 113 q
2 18 34 " 50 2 66 B 82 R 98 b 114 r
3 19 35 # 51 3 67 C 83 S 99 c 115 s
4 20 36 $ 52 4 68 D 84 T 100 d 116 t
5 21 37 % 53 5 69 E 85 U 101 e 117 u
6 22 38 & 54 6 70 F 86 V 102 f 118 v
7 23 39 ' 55 7 71 G 87 W 103 g 119 w
8 24 40 ( 56 8 72 H 88 X 104 h 120 x
9 25 41 ) 57 9 73 I 89 Y 105 i 121 y
10 26 42 * 58 : 74 J 90 Z 106 j 122 z
11 27 43 + 59 ; 75 K 91 [ 107 k 123 {
12 28 44 , 60 < 76 L 92 \ 108 l 124 |
13 29 45 - 61 = 77 M 93 ] 109 m 125 }
14 30 46 . 62 > 78 N 94 ^ 110 n 126 ~
15 31 47 / 63 ? 79 O 95 _ 111 o 127

128 Ç 144 É 160 á 176 192 208 224 240
129 ü 145 æ 161 í 177 193 209 225 ß 241
130 é 146 ft 162 ó 178 194 210 226 242
131 â 147 ô 163 ú 179 195 211 227 243
132 ä 148 ö 164 ñ 180 196 212 228 244
133 à 149 ò 165 Ñ 181 197 213 229 245
134 å 150 û 166 ª 182 198 214 230 246
135 ç 151 ù 167 183 199 215 231 247
136 ê 152 ÿ 168 ¿ 184 200 216 232 248
137 ë 153 ö 169 185 201 217 233 249
138 è 154 Ü 170 186 202 218 234 250 .
139 ï 155 ¢ 171 187 203 219 235 251
140 î 156 £ 172 188 204 220 236 252
141 ì 157 ¥ 173 ¡ 189 205 221 237 ø 253
142 Ä 158 Pt 174 « 190 206 222 238 254
143 Å 159 ‰ 175 » 191 207 223 239 255

Table A-10 DOS Character Set


Appendix B:

Character Classes

This appendix describes the character classes that are the basis for the parsing rules used to index terms.

Character Classes Available in SearchServer

The parsing of text into the terms that are indexed is done based on rules, which are in turn based on character classes. Each character in a character set must belong to one (and only one) character class.

SearchServer contains a number of possible character classes. How they are combined defines the parsing rules of a character set. The correct setting of character classes is necessary to ensure that text is parsed correctly at indexing time, and that queries are parsed correctly. Therefore, the definition of a character set requires that the character class of each character in the set is defined.

Table B-1 describes all the available character classes.

Class Assignment/Function

AC Accents that are permitted in alphabetic terms and that are ignored for indexing.

ADJ Punctuation marks that are permitted between alphabetic or numeric substrings of a term. Multiple consecutive ADJ characters can occur when surrounded by alphas/digits. ADJ characters at the beginning or end of a term, or in any context other than those embedded within alphas/digits, can be treated as stopped or unstopped punctuation, through the use of $PN and $PY groups.

AJ Punctuation marks that are permitted in alphabetic terms, when surrounded by at least one alphabetic (AL) on each side. When an AJ character occurs at the beginning or end of a term, it is not considered part of the term. In this and other contexts, AJ characters can be treated as unsearchable (stopped) or searchable (unstopped) punctuation by the appropriate definition of the $PN and $PY groups. Punctuation marks that commonly occur in alphabetic strings should be placed in this class to prevent the index from becoming cluttered with term fragments.

AL Alphabetic characters that form alphabetic terms.


Table B-1 Character Class Definitions

Table B-2 summarizes the attributes of each character class. The columns in this table indicate whether the members of a class increment the virtual character count (VCC), act as term separators, and can be used in alphabetic or numeric terms (or both), and whether that class member is indexed at indexing time.

DI Numeric characters (digits) that form numeric terms. During normal indexing, numeric terms are indexed separately from alphabetic terms, even when they are adjacent. When a numeric string is evaluated in the context of numeric range searching, value range searching and value indexing, any leading non-digit character is ignored, provided the character belongs to the DI class.

DJ Punctuation marks that are permitted in numeric terms, with similar restrictions as for AJ.

DJS Means the same as DJ, but also redefines numeric and string separator characters such as commas and periods.

esc Control function sequence introducers. These are not term separators by themselves, but the control sequence as a whole may be treated as a term separator. The membership of this class cannot be altered.

IAC Accents that are permitted in alphabetic terms, have a VCC value, and are indexed.

NFE Other format effectors that are always treated as term separators.

nul Characters that are ignored in all contexts. This class contains all characters that belong to no other class.

PST Stopped punctuation marks that are term separators, and that are treated like single-character stopwords. Any character in the primary or supplementary character sets (except x7F [DEL], xA0 and xFF) which belongs to none of the above classes, belongs to the PST class by default.

PUN Unstopped punctuation marks that are term separators, and that are treated like single-character indexable words.

SP Space. A term separator, except when it follows a soft hyphen, or when indexing occurs in literal mode.

WS Non-printing format effectors that are treated like term separators, except when they follow a soft hyphen, in which case they are ignored.
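To make the class definitions more concrete, the following Python sketch parses text into terms using only four of the classes (AL, DI, AJ, and DJ) with ASCII members. It is a deliberately simplified illustration, not SearchServer's parser: the function name `parse_terms` is hypothetical, every other character is treated as a term separator, and accents, ADJ characters, soft hyphens, stopped and unstopped punctuation, and literal mode are all ignored.

```python
import string

AL = set(string.ascii_letters)  # alphabetic characters
DI = set(string.digits)         # numeric characters
AJ = {"'"}                      # joiner allowed inside alphabetic terms
DJ = {",", "."}                 # joiners allowed inside numeric terms


def parse_terms(text: str) -> list[str]:
    """Split text into alphabetic and numeric terms (simplified)."""
    terms = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in AL:
            body, joiner = AL, AJ
        elif ch in DI:
            body, joiner = DI, DJ
        else:
            i += 1  # treated as a term separator in this sketch
            continue
        j = i + 1
        while j < n:
            if text[j] in body:
                j += 1
            elif text[j] in joiner and j + 1 < n and text[j + 1] in body:
                # A joiner counts only when a body character follows it.
                j += 2
            else:
                break
        terms.append(text[i:j])
        i = j
    return terms


print(parse_terms("don't pay 1,000.5 for abc123."))
```

The sketch keeps the apostrophe in "don't" and the separators in "1,000.5" because they are surrounded by characters of the matching class, drops the trailing period, and indexes "abc" and "123" as separate terms even though they are adjacent.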


Class  Indexed  Increments VCC  Permitted in alphabetic terms *  Permitted in numeric terms *  Can be term separator
AC     no       no              yes                              no                            no
ADJ    yes      yes             yes                              yes                           no
AJ     yes      yes             yes                              no                            no
AL     yes      yes             yes                              no                            no
DI     yes      yes             no                               yes                           no
DJ     yes      yes             no                               yes                           no
DJS    yes      yes             no                               yes                           no
esc    no       no              yes                              yes                           no
IAC    yes      yes             yes                              no                            no
NFE    no       no              no                               no                            yes
nul    no       no              yes                              yes                           no
PST    no **    yes             no                               no                            yes **
PUN    yes      yes             no                               no                            yes **
SP     no **    no **           no                               no                            yes **
WS     no       no              no                               no                            yes

* Attributes that do not apply when the character is used in a literally indexed term.
** Attributes that change when the character is used in a literally indexed term.

Table B-2 Character Class Attributes

Character Classes of SearchServer's Internal Character Sets

As described in Appendix A, "Character Sets," several internal character sets are provided with SearchServer. Each of these has its own set of default character class specifications. Table B-3 lists the default character classes for the FTCS94 character set.

Class  Characters That Are Part of This Class
AC     0xc0 - 0xcf
ADJ    Empty class.
AJ     ' (0x27)
AL     A - Z, a - z, 0xe0 - 0xfe
DI     0 - 9
DJ     , (0x2c) . (0x2e)
DJS    Empty class. Can replace DJ.
esc    ESC (0x1b), CSI (0x9b)
IAC    Empty class.
NFE    VT(0x08), NEL(0x85), PLD(0x8b), PLU(0x8c)
nul    All codes not listed above.
PST    0x21 - 0x7e and 0xa1 - 0xdf, except those codes listed above.
PUN    Empty class.
SP     SPACE (0x20)
WS     HT(0x09), NL(0x0a), FF(0x0c), CR(0x0d)

Table B-3 FTCS94 Default Character Classes

Table B-4 lists the default character classes for the EFTCS94 character set.

Class  Characters That Are Part of This Class
AC     0xc0 - 0xcf
ADJ    Empty class.
AJ     ' (0x27)
AL     A - Z, a - z, 0xac - 0xaf, 0xbc - 0xbf, 0xd0 - 0xfe
DI     0 - 9
DJ     , (0x2c) . (0x2e)
DJS    Empty class. Can replace DJ.
esc    ESC (0x1b), CSI (0x9b)
IAC    Empty class.
NFE    VT(0x08), NEL(0x85), PLD(0x8b), PLU(0x8c)
nul    All codes not listed above.
PST    0x21 - 0x7e and 0xa1 - 0xbb, except those codes listed above.
PUN    Empty class.
SP     SPACE (0x20)
WS     TAB(0x09), NL(0x0a), FF(0x0c), CR(0x0d)

Table B-4 EFTCS94 Default Character Classes
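The FTCS94 defaults in Table B-3 can be expressed as a lookup function. The Python sketch below is an illustration only: the function name `ftcs94_class` is hypothetical, the checks are ordered so that the specific classes win over the PST ranges, and the NFE codes are taken exactly as printed in the table (including 0x08, which the table labels VT).

```python
def ftcs94_class(code: int) -> str:
    """Return the default character class of an FTCS94 code, per Table B-3."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must be in the range 0x00 to 0xFF")
    if 0xC0 <= code <= 0xCF:
        return "AC"
    if code == 0x27:
        return "AJ"
    if 0x41 <= code <= 0x5A or 0x61 <= code <= 0x7A or 0xE0 <= code <= 0xFE:
        return "AL"
    if 0x30 <= code <= 0x39:
        return "DI"
    if code in (0x2C, 0x2E):
        return "DJ"
    if code in (0x1B, 0x9B):
        return "esc"
    if code in (0x08, 0x85, 0x8B, 0x8C):  # codes as printed in the table
        return "NFE"
    if code == 0x20:
        return "SP"
    if code in (0x09, 0x0A, 0x0C, 0x0D):
        return "WS"
    if 0x21 <= code <= 0x7E or 0xA1 <= code <= 0xDF:
        return "PST"  # remaining codes in the listed ranges
    return "nul"      # all codes not listed above


print(ftcs94_class(ord("A")), ftcs94_class(ord("!")), ftcs94_class(0x7F))
```

The empty classes (ADJ, DJS, IAC, and PUN) never appear as results because no codes are assigned to them by default. A similar function for EFTCS94 would differ only in the AL and PST ranges given in Table B-4.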


Index

Symbols

$QUAL 31
$TextSearch 26, 27

A

additional predicates 18
APVARCHAR 13, 14
ASCII characters 16

B

build scripts 30
Building a SearchServer index 18

C

Create Schema 13, 18
Create Table 13, 18
Creating a database text index 23
Creating indexes during a build 30
Creating Indexes With SABuild 29

D

Defining a SearchServer index 13
Deleting a database text index 25
Deleting a SearchServer index 19
Deleting and updating text indexes 31
Description 30
Drop Table 19

E

External documents, defined 13

F

Filter Options 17
fonts
    monospace 8
FT_CID 14
FT_FLIST 14, 16, 17
FT_SFNAME 14, 16, 17
FT_TEXT 14

I

icons 8
IMMEDIATE 15
Increased SA-Script text retrieval functionality 23
Indexing Database Text Fields 21
Indexing External Documents 11
Inserting data into a SearchServer index 16

K

Key column 14
KEYLITERA 31
KEYLITERAL 31
KEYVALUE 31

L

LITERAL 31
LONGCHAR 31

M

Memo column 17
monospace fonts 8

O

Other SearchServer index activities 19

P

parameter 17

Q

Querying a database text index 26
Querying a SearchServer index 18
Querying multiple indexes 27

R

Reserved columns 14
Rules for creating database text Indexes 31

S

s filter 16
SAI_NDX_REBUILD 25
SAI_NDX_UPDATE 25
SAIDBTextIndexDelete 25
SOL_DESC 30
Solution_Id 30
SQL_ExecuteImmediate 13
SQLDBTextIndexCreate 23
SQLFetch 13
SQLInsert 16
SQLSelect 18, 26
SQLSelectInt 18
Storing supplementary information in an index 14

T

text
    monospace 8
Text readers 23
text retrieval index 22

U

Updating database text indexes 25
Using a schema 15
Using SQL with SearchServer 22

V

Validate Index 15, 18
VALUE 31

W

Where clause 26
wildcard 19
ww filter 16