4/66—3r indexing theory, indexing methods and search devices. 1964. frederick jonker. scarecrow,...



ing computers. For a more scholarly and complete reference work with excellent bibliography one must use Bourne. For anyone developing a retrieval system to rely solely on Kent would be foolhardy. A library school course completely dependent on Kent without the addition of Bourne is not exploring information systems in depth and is not using the basic tool in the field.

AUDREY RUBIN

4/66-2R Towards Information Retrieval. 1961. Robert A. Fairthorne. Butterworths, London. 211 pp.

Towards Information Retrieval is a collection of papers written over a thirteen-year period. While no attempt has been made to add textual material which would unify them, the papers do form a composite picture of some of the theoretical aspects of information retrieval, bringing to the reader various facets of a common theme.

Robert A. Fairthorne is a noted figure in documentation, although perhaps better known in England than in the United States. The earliest paper presented in this collection dates from 1947, indicating his long concern with the field. Some thirty-five years of the author’s career were spent at the Royal Aircraft Establishment, where he was consultant at the time of this publication. During the first twenty years of his career he applied his mathematical training to a wide variety of technical problems. During the next fifteen years he was also much involved with the organization of the library at the Royal Aircraft Establishment, and the practical and theoretical problems involved became his major concern. He is now with Herner and Company in the United States.

In his article “Identifying Key Contributions in Information Science” (Am Doc 15: 289-295, Oct. 1964), Carlos A. Cuadra singles out this volume as a major text in the field. Fairthorne’s name also appears in Cuadra’s table of frequently cited authors, derived in the same study from a count of entries in bibliographies in the field. The Cuadra study shows the article “Basic Postulates and Common Syntax,” which appears in this volume, as being cited by five major textbooks in the field.

However, the book cannot be considered as a text in the field in the sense of providing a survey of the entire field and fundamental information on its key aspects. It does, nevertheless, represent the best (according to the reviews surveyed) and probably the most frequently cited of Fairthorne’s works. The theme which ties the papers together is probably best described in Fairthorne’s own words taken from the preface:

For some millenia librarians have had to deal with texts as carriers of concepts, and with texts as heavy objects with marks on. They have evolved efficient techniques and principles to cope with these aspects severally. Rarely have they discussed texts in both capacities at once. The selection of papers published here explores activities in which indefinite neglect of either aspect, the conceptual or the mechanical, will lead to practical and theoretical disaster. They centre on the recovery of records according to their subject matter.

The articles explore various areas of documentation, analyze and criticize existing systems, and seek new insights for blending the conceptual with the manipulative. Throughout the papers, Fairthorne’s intent appears to be that of raising problems and drawing attention to them rather than offering solutions. In an introductory section entitled “Comments,” Lea M. Bohnert states that the best introduction to the field and to the author’s general approach is the paper “The Pattern of Retrieval,” originally published in American Documentation in 1956 and reprinted here.

Fairthorne indicates the nature of his concerns when he writes, “A deep question of great theoretical and practical importance is how far can we go in documentation, as in computing, by using ritual in place of understanding?” On notation: “The bridge between the concepts and physics of retrieval is notation, or systems of marking the texts.” “They [librarians] have given little attention and have had little need to give attention to the mechanical consequence of notation considered as instruction for retrieving rather than recognizing documents.”

Fairthorne spends some time in discussing the classification of tasks in information work, especially those which, in his words, may be “delegated” to the machine. “Fortunately,” says Bohnert, “Fairthorne belongs to the economic breed that considers it efficient to have human machines perform the unusual and variegated types of work.” Fairthorne does not appear to expect classification to solve the problems of retrieval as seems currently fashionable. In fact, he does not seem to expect much from classification at all, in spite of his several writings on the subject.

Another of his major concerns is that of cost. He states that theory can be used to produce a fair estimate of costs when we study “all the links in the operational chain.” But: “The theory can give only the least cost of clerical operations. Evidently the greatest cost depends only on what the author of the system can get other people to put up with. In practice, the limit seems to have been reached by the time the entries needed for retrieval exceed those in the documents to be retrieved.” He believes that models of document retrieval systems should be used for experimental study before more money is spent on expensive varieties of retrieval machinery.

On the whole the volume is not easy reading. Much of it is theoretical and requires slow deliberate concentration for comprehension, and even then some is elusive. Fairthorne works through what he has to say with precision. His mathematical interests and ability are evident in the many diagrams and formulas. To the mathematically untrained the volume appears rather frightening by its not infrequent complicated passages.

Yet Fairthorne cannot be criticized for being deliberately obscure, or unnecessarily complex. The writing is straightforward and lucid. He apparently attempts to write with great clarity, so much so that he often achieves a disarming simplicity in his statements. His tendency to reduce complex notions to ordinary terminology is often evident: “marking” and “parking” as the two physical methods for organizing information or the retrieval process. He is often amusing or witty. When he is critical, his criticism is often biting, as in the opening of his article on “Delegation of Classification.”

This volume is a most valuable contribution to the literature. That librarians have failed to appreciate Fairthorne can, according to Vickery, be attributed to a number of factors: “By and large, librarians are concerned to emphasize the intellectual content of their work, and display a marked psychological resistance to a description of part of it as ‘clerical’ and capable of performance by automation. It almost seems that they spurn labour-saving devices, despite their constant complaint of overwork. They have a fear of automata to overcome. They should ponder Fairthorne’s words ‘automatism is merely remote control in time.’” Vickery adds that Fairthorne has never actually participated in building a retrieval system. “In short, he is a theorist, and suffers the usual fate of lack of understanding by ‘practical men.’”

The index is by Calvin Mooers.

While much of the material is not recent, most of Fairthorne’s questions are as valid today as they were when he first raised them. This is true mainly because the author’s concern is with basic theory, and not with descriptions of current practice.

MARGARET LINN

4/66-3R Indexing Theory, Indexing Methods and Search Devices. 1964. Frederick Jonker. Scarecrow, New York. 124 pp.

Frederick Jonker’s chief purpose in writing this book was to give a full exposition of a “generalized theory of indexing” which he had begun to develop a few years earlier. The expression “generalized theory” may be understood as referring to the process of describing a group or series of events in words sufficiently general to encompass all aspects of those events, and sufficiently specific that the description is recognizable as being uniquely of those events. Some groups of events lend themselves quite readily to such treatment by exhibiting many characteristics in common. An author then says that he has formulated a theory, because he has noticed the common characteristics. Needless to say, once a theory has been formulated, it is very easy to see subsequent events as operating within its framework. Indeed, sometimes it is almost impossible not to see them that way. Moreover, when events are described in the terminology of the theory, they seem subtly to change in character to fit it.

American Documentation - April 1966 109

Jonker points out that the only valid criterion for designing an information retrieval system is cost: “how to deliver a specified quantity, quality and speed of service at the lowest possible cost.” Since the initial indexing, and not the entering of data or the search, is by far the greatest cost factor, an analysis and understanding of this part of the I. R. structure is the prime necessity. However, since he is attempting to provide the common precepts by which individual systems can be judged for their suitability to a particular I. R. problem, the author has formulated his theory to cover all aspects of the systems.

Jonker begins the development of his theory by defining mechanized I. R. as “search by coincidence of terms.” He proves this by demonstrating that all the logical relationships among indexing terms which a system may be required to provide may be reduced to readout functions, to coincidence-of-terms search, or to a combination of the two. He does not, however, limit his theory to a description of these activities. He points out that most systems in actual use are combinations of hierarchical or classified grouping with term coordination, and therefore attempts to encompass both.
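As an illustration only, a coincidence-of-terms search can be sketched as the intersection of the document sets posted under each query term. The function names and the three-document store below are hypothetical and are not taken from Jonker's book; the sketch assumes a simple term-to-documents index held in memory.

```python
from collections import defaultdict


def build_term_index(documents):
    """Map each index term to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def coincidence_search(index, query_terms):
    """Return the ids of documents on which all query terms coincide."""
    postings = [index.get(term, set()) for term in query_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical store: document id -> index terms assigned to it.
documents = {
    "D1": {"indexing", "cost", "retrieval"},
    "D2": {"indexing", "classification"},
    "D3": {"retrieval", "cost"},
}
index = build_term_index(documents)
hits = coincidence_search(index, ["cost", "retrieval"])
print(sorted(hits))  # ['D1', 'D3']
```

A term that is absent from the index yields an empty posting set, so any query containing it returns no documents.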

The two basic factors in any index are the kind of terminology used, and the ways in which the words are made to relate to each other and to indicate relationships among the concepts embodied in the information store.

For the first of these, Jonker postulates a “terminological continuum” which he conceives as a direct function of the development of knowledge. He represents it schematically as a straight line proceeding from left to right. It is his contention that the language of a field of knowledge develops from longer to shorter terms. When a new concept is born and recognized, words are taken from several older concepts to describe it. He considers this the left end of the continuum. As the new concept becomes accepted and widely used, and in turn forms the basis for further developments in the field, new and unique words are used to describe it. Sometimes two or more older-concept words are simply combined to form one; sometimes they are hyphenated into a single inseparable expression; in other cases, a new word is coined. Since this is a natural evolutionary process, it cannot be depended upon to happen consistently, or at a particular rate of speed. It does not eliminate ambiguities caused by synonyms, homonyms, and shades of accepted meaning when the same word is used in different but related fields. It does not obviate the problem that different people in referring to the same concept will use words from different stages of development of the vocabulary. For greatest precision, therefore, an indexing system should, in principle, assign a unique word or code indication to every unique concept. This is the extreme right end of the continuum.

Such accuracy can be achieved only at great cost. In practice, the theory has two lessons for the system designer. He must be aware of the level of the vocabulary development (within the field with which the system deals) of the users of the system. He must also understand the language of the body of literature to which he is providing access. His job is to create a bridge between the language of the user and that of the system and, then, through really effective indexing, between the system and the literature. The first span may consist simply of a list of the index terms used; it may be in the form of a thesaurus; or it may be a translation mechanism built right into the machine.

The author puts forward cogent arguments against the possibility of a universal indexing vocabulary, applicable to all fields and all users. He claims that there is no standard criterion upon which to base such a language. In some cases, the use of something may be the best way to describe it. In others, for other people, the structure of that same thing may be more important.

Jonker postulates a “connective continuum” to describe the historical development of ways of showing relationships among concepts indexed. At one end is the classification system where a term is placed with others to which it bears a hierarchical relationship. This produces very long index terms, since each one carries with it all its relatives. At the opposite end is the keyword technique. Here, the only thing that can be discovered is what other items of information are stored in the same document or what other documents bear the same information. This produces the shortest index terms. Between the two ends is the subject heading list that overlaps both extremes by frequently giving some hierarchical indication, while also giving several subject headings for a particular item of information. The greatest potential for “indexing depth” (defined by the author as the number of criteria by which an item of information may be indexed) is at the short-term end of the continuum. However, since an item of information is entered at this end only on the hierarchical level on which it appears in the literature being indexed, it will be lost in a search by a word applying to a higher or lower level. Searches must be made on various levels if generic information is sought. On the other hand, the short-term end can handle ideas at all levels of their development, since as many terms as deemed necessary can be used to describe them, with no need to fit them into a preconceived pattern. The system designed at the short-term end of the continuum is, therefore, inexpensive (relatively) to feed, but may incur great expense in search time or coordination mechanisms at the output. A classified system is more expensive to feed, and may lose new ideas by erroneously placing them in hierarchies in which they are later found not to belong, but should be the simplest and cheapest at the output.
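The point about generic searches at the keyword end can be made concrete with a small sketch. It assumes a hypothetical table of broader-to-narrower terms kept outside the keyword index; that table, the sample terms, and the function names are illustrative inventions, not part of Jonker's exposition. A search on the broad term alone finds nothing, and the search must be repeated on each lower level to recover the items.

```python
def narrower_terms(hierarchy, term):
    """Return the term followed by every term below it in the hierarchy."""
    found = [term]
    for child in hierarchy.get(term, []):
        found.extend(narrower_terms(hierarchy, child))
    return found


def generic_search(keyword_index, hierarchy, term):
    """Search on the term and on every narrower level, pooling the hits."""
    hits = set()
    for level_term in narrower_terms(hierarchy, term):
        hits |= keyword_index.get(level_term, set())
    return hits


# Hypothetical broader -> narrower relationships (assumed acyclic).
hierarchy = {"vehicle": ["aircraft"], "aircraft": ["glider"]}

# Each item is entered only at the level on which it appears in the literature.
keyword_index = {"aircraft": {"D1"}, "glider": {"D2"}}

print(sorted(keyword_index.get("vehicle", set())))                   # []
print(sorted(generic_search(keyword_index, hierarchy, "vehicle")))   # ['D1', 'D2']
```

The extra lookups per level are one way of picturing the search-time expense that the short-term end may incur at the output.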

Integrating the two continua, any I. R. system might be viewed as a point on a two-dimensional plane. The horizontal might be considered the indexing type moving from classification to keyword, and the vertical the terminological type, moving up from lay to professional language (long to shorter terms). The decision must always be made at which point along each of these lines a particular system will operate. Lines drawn from these points, and perpendicular to the axes, will intersect at a point which may be said to define the system.

Developing his theory further, Jonker goes on to analyze the mechanization of I. R. systems. At the present time, he feels, the most important time- and labor-saving functions of mechanization lie in facilitating term correlation. Other operations, such as automatic encoding, printout, etc., are simply added benefits in more complex systems.

For students of information systems, perhaps the most valuable section of this book is the chapter entitled “Primary design consideration” (pp. 90-114). Here, the author gives a clear discussion of existing commercially available systems from the point of view of their suitability to particular I. R. needs. Working from his theory, he considers the efficiency with which they accomplish term correlation with respect to the way they handle three basic operations:

store organization (document or term grouping)
matching (simultaneous or sequential)
access to the store (single or multiple)

Using this gauge, eight basic types of systems are possible, in the form of different combinations of these operations. The author gives examples of systems embodying each of the combinations. He gives excellent diagrams which demonstrate the principles by which they work, and which are far more valuable than the photographs in, for example, Bourne’s Methods of Information Handling. There is also a list of sources of supply, but with none of the valuable cost information given by Bourne.
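The count of eight follows from three operations with two alternatives each (2 x 2 x 2). A short sketch of the enumeration is given here for illustration; the dictionary layout and function name are assumptions, and the option labels are taken from the list of operations in the review, not from Jonker's own tables.

```python
from itertools import product

# The three basic operations, each with its two alternatives.
OPERATIONS = {
    "store organization": ("document grouping", "term grouping"),
    "matching": ("simultaneous", "sequential"),
    "access to the store": ("single", "multiple"),
}


def system_types(operations):
    """Enumerate every combination of one alternative per operation."""
    names = list(operations)
    return [dict(zip(names, choice)) for choice in product(*operations.values())]


types = system_types(OPERATIONS)
print(len(types))  # 8
for system in types:
    print(system)
```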

Jonker seems somewhat overly impressed with the theoretical approach. He makes the following statement about the machine “art”: “In developing his design, the designer proceeds from the most fundamental considerations available to him to considerations which are usually of a less abstract nature, and from there to design details” (p. 85). This is a theoretically sound approach, but, in practice, if the process takes place, it must often be somewhere below the conscious level. Nevertheless, this book has much valuable material for the student, and the “general theory” may be at least a way to analyze the systems available when a potential consumer must decide which one suits him best.

EDITH WARD

American Documentation - April 1966 111