Filing and Word Breaking Procedures
Session Agenda
• Pre-14.x
• tab_word_breaking table
  • Structure
  • Procedures
• Special remarks
• tab_filing table
  • Structure
  • Procedures
Pre-14.x
• Various filing and word breaking procedures existed. Each procedure included many parts, but it was a closed box.
• Each procedure was assigned a code, such as B1, B5, C1, A3, AM, etc.
• Each procedure was a separate program, so creating a new procedure required new program development. For example, there was no A3 + AM filing procedure.
From 14.1 onwards
• ALEPH provides ready-made components (programs) for creation of filing and word breaking procedures
• /tab/tab_word_breaking - an ALEPH table that identifies word breaking procedures and defines their component parts
• /tab/tab_filing - an ALEPH table that identifies filing procedures and defines their component parts
tab_word_breaking
• /tab/tab_word_breaking is an ALEPH table that identifies word breaking procedures and defines their component parts.
• Each word breaking procedure is made up of a group of one or more programs.
tab_word_breaking - Structure
1 2 3 4
!!-!-!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03 L abbreviation
03 L numbers
03 L compress -
03 L to_blank !@#$%^&*()_+={}[]:";'<>,.?/|\
• col. 1: procedure identifier
• col. 2: alpha (alphabet) of the text
• col. 3: procedure name
• col. 4: procedure parameters
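As a rough illustration of the four columns, the Python sketch below splits a table line into procedure identifier, alpha, procedure name and parameters. It is not ALEPH code: the names ProcedureLine and parse_procedure_line are hypothetical, and it assumes whitespace-separated columns in which only col. 4 may contain spaces, whereas the real table is positional (as the "!" ruler line shows).

```python
from typing import NamedTuple, Optional


class ProcedureLine(NamedTuple):
    """One tab_word_breaking line (hypothetical model, not an ALEPH structure)."""
    proc_id: str           # col. 1: procedure identifier, e.g. "03"
    alpha: str             # col. 2: alpha of the text, e.g. "L"
    name: str              # col. 3: procedure name, e.g. "to_blank"
    param: Optional[str]   # col. 4: procedure parameters, None if absent


def parse_procedure_line(line: str) -> Optional[ProcedureLine]:
    """Return the four columns of a table line, or None for blank/ruler lines.

    Assumption: columns are separated by whitespace and only col. 4 may
    contain spaces; lines starting with "!" are treated as comments.
    """
    if not line.strip() or line.startswith("!"):
        return None
    parts = line.rstrip("\n").split(None, 3)  # at most 4 fields
    if len(parts) < 3:
        return None
    param = parts[3] if len(parts) == 4 else None
    return ProcedureLine(parts[0], parts[1], parts[2], param)


sample = [
    "!!-!-!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!",
    "03 L abbreviation",
    "03 L compress -",
    "01 L to_blank !@#$%^&*()_+- ={}[]:\";<>?,./~`",
]
for raw in sample:
    entry = parse_procedure_line(raw)
    if entry is not None:
        print(entry)
```

Keeping col. 4 as a single unsplit string matters because a to_blank parameter can itself contain a blank, as in the last sample line.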
Procedures (1)
• compress: Strips the characters listed in col. 4
• del_subfield: Changes the subfield sign (e.g., $$x) to a blank
• to_blank: Changes the characters listed in col. 4 to blanks
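The three procedures above are simple character operations. A minimal Python sketch of what they do to a string follows; the function names mirror the procedure names but are hypothetical, and del_subfield assumes the subfield sign is "$$" followed by a one-character subfield code.

```python
import re


def compress(text: str, chars: str) -> str:
    """compress: strip every character listed in col. 4."""
    return "".join(ch for ch in text if ch not in chars)


def del_subfield(text: str, sign: str = "$$") -> str:
    """del_subfield: change each subfield sign plus code (e.g. "$$x") to a blank.

    Assumption: a subfield sign is "$$" followed by one subfield-code character.
    """
    return re.sub(re.escape(sign) + ".", " ", text)


def to_blank(text: str, chars: str) -> str:
    """to_blank: change every character listed in col. 4 to a blank."""
    return "".join(" " if ch in chars else ch for ch in text)


print(repr(compress("self-made", "-")))               # 'selfmade'
print(repr(del_subfield("$$aSmith, John$$d1950-")))  # ' Smith, John 1950-'
print(repr(to_blank("o'hara, john", "',")))          # 'o hara  john'
```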
Procedures (2)
• subf_to_sign: Changes the second and subsequent subfield signs to the single character listed in col. 4
• blank_to_carat: Changes blanks to carets (^)
• marc21_41: For separating languages in MARC21 field 041
Procedures (3)
• abbreviation: Compresses a dot between single characters (e.g., I. B. M. changes to I B M; I.B.M. changes to IBM)
• numbers: Compresses a comma or a dot between numbers (e.g., 2,153 changes to 2153)
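Both procedures can be approximated with regular expressions. The Python sketch below is one possible reading of the two descriptions, not ALEPH's code: it treats a "single character" as a letter not preceded by another letter or digit, and "between numbers" as between two digits. The function names are hypothetical.

```python
import re

# A lone letter (not preceded by a word character) followed by a dot.
_SINGLE_LETTER_DOT = re.compile(r"(?<!\w)([^\W\d_])\.")
# A comma or dot with a digit on both sides.
_DIGIT_SEPARATOR = re.compile(r"(?<=\d)[.,](?=\d)")


def abbreviation(text: str) -> str:
    """abbreviation: drop the dot after each single letter (I.B.M. -> IBM)."""
    return _SINGLE_LETTER_DOT.sub(r"\1", text)


def numbers(text: str) -> str:
    """numbers: drop a comma or dot between digits (2,153 -> 2153)."""
    return _DIGIT_SEPARATOR.sub("", text)


for sample in ["I. B. M.", "I.B.M.", "Dr. Smith"]:
    print(repr(abbreviation(sample)))
print(numbers("2,153"))
```

The lookbehind keeps "Dr." intact, because its "r" is preceded by another letter and so is not a single character.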
Procedures (4)
• IMPORTANT NOTE: The procedures must be listed in logical order. For example, numbers must be listed before compress or to_blank if a comma or a dot is included in their parameters. Otherwise, the comma or dot will no longer be present when the numbers procedure runs.
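Why the order matters can be shown with a short Python sketch that runs procedures in the order they are listed. The two procedures are simplified, hypothetical stand-ins for numbers and for a to_blank line whose col. 4 includes a comma and a dot.

```python
import re
from typing import Callable, List


def numbers(text: str) -> str:
    """numbers: drop a comma or dot that sits between two digits."""
    return re.sub(r"(?<=\d)[.,](?=\d)", "", text)


def to_blank(text: str) -> str:
    """to_blank with col. 4 assumed to contain ',' and '.'."""
    return text.replace(",", " ").replace(".", " ")


def run(text: str, procedures: List[Callable[[str], str]]) -> str:
    """Apply the procedures one after another, in table order."""
    for procedure in procedures:
        text = procedure(text)
    return text


correct = run("2,153", [numbers, to_blank])  # numbers still sees the comma
wrong = run("2,153", [to_blank, numbers])    # comma already turned into a blank
print(correct, "|", wrong)                   # 2153 | 2 153
```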
Procedures (5)
• Reminder: Word breaking procedures are used in tab11, section W. A line can be listed several times in tab11 in order to index the same field multiple times, with different word breaking each time. For example, a name with an apostrophe, O'hara, can be indexed both as Ohara and as O hara:
11 W 100## abcdq 01 B WRD WAU
11 W 100## abcdq 04 B WRD WAU
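The effect of listing the same field twice can be sketched in Python. Which procedure does what is an assumption made only for this example: procedure 01 is taken to change the apostrophe to a blank and procedure 04 to compress it; the real behaviour depends on how those procedures are defined in tab_word_breaking. The PROCEDURES dictionary and both functions are hypothetical.

```python
from typing import Dict, List

# Hypothetical behaviour of two word breaking procedures, for this example only.
PROCEDURES: Dict[str, Dict[str, str]] = {
    "01": {"'": " "},  # apostrophe -> blank   (O'hara -> O hara)
    "04": {"'": ""},   # apostrophe compressed (O'hara -> Ohara)
}


def break_words(text: str, proc_id: str) -> List[str]:
    """Apply one procedure's character rules, then split on blanks."""
    for char, replacement in PROCEDURES[proc_id].items():
        text = text.replace(char, replacement)
    return text.split()


def index_field(text: str, proc_ids: List[str]) -> List[str]:
    """Index one field once per tab11 line, keeping each distinct word once."""
    words: List[str] = []
    for proc_id in proc_ids:
        for word in break_words(text, proc_id):
            if word not in words:
                words.append(word)
    return words


print(index_field("O'hara", ["01", "04"]))  # ['O', 'hara', 'Ohara']
```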
unicode_to_word_gen
Word indexing routines, as well as retrieval routines, use the table defined under the instance WORD-FIX in ./alephe/unicode/tab_character_conversion_line. The table is traditionally called unicode_to_word_gen.
This table defines character equivalencies for the purpose of creating words in the words file. All characters naturally retain their Unicode value and are stored in the system in UTF encoding. To translate one character into another (e.g., an accented "e" into a plain "e"), you can set an equivalency, which can be up to 5 characters long. For example, the following line translates æ (00E6) into a (0061) followed by e (0065):
00E6 0061 0065 #LATIN SMALL LETTER AE
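Reading such a line is straightforward: the first hexadecimal code point is the source character, the following ones (up to 5) are its equivalent, and text after "#" is a comment. The Python sketch below builds a translation table from lines in that layout and applies it. It illustrates the format under those assumptions and is not the ALEPH loader; the second sample line (for é) is invented for the example, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def load_equivalencies(lines: Iterable[str]) -> Dict[str, str]:
    """Map each source character to its equivalent string (up to 5 characters)."""
    table: Dict[str, str] = {}
    for raw in lines:
        fields = raw.split("#", 1)[0].split()  # drop the trailing comment
        if not fields:
            continue
        source, *targets = fields
        if len(targets) > 5:
            raise ValueError(f"more than 5 target characters: {raw!r}")
        table[chr(int(source, 16))] = "".join(chr(int(t, 16)) for t in targets)
    return table


def to_words_form(text: str, table: Dict[str, str]) -> str:
    """Replace every character that has an equivalency; keep the rest as-is."""
    return text.translate(str.maketrans(table))


table = load_equivalencies([
    "00E6 0061 0065 #LATIN SMALL LETTER AE",
    "00E9 0065 #LATIN SMALL LETTER E WITH ACUTE (example line)",
])
print(to_words_form("cæsar café", table))  # caesar cafe
```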
The library's tab_word_breaking table can define different treatment for the same characters: in separate procedures, specific characters can be set to be compressed or to be changed to blanks. Characters handled in this manner should be left with their natural value, and not translated in this table.
For example, you might want an apostrophe to be treated as a blank, as itself, and as if it were not there at all (e.g., o hara, o'hara, ohara). In order to set the apostrophe in tab_word_breaking both as a compressed character and as a blank, it must retain its natural value and must NOT be translated in this table.
Special Remarks
2. When browsing a word index in the OPAC, special characters are always displayed in their converted state.
For example, if the unicode_to_word_gen table converts an umlaut (e.g., ü) to ue, the word will be displayed with ue, and not with the umlaut.
tab_filing - Example
01 L del_subfield
01 L to_lower
01 L abbreviation
01 L suppress
01 L compress '
01 L to_blank !@#$%^&*()_+- ={}[]:";<>?,./~`
01 L mc_to_mac
01 L pack_spaces
01 L char_conv FILING-KEY-01
01 C chi
tab_filing - Structure
1 2 3 4
!!-!-!!!!!!!!!!!!!!!!!!!!-!!!!!!!!!!!!!!>
01 L compress '
01 L char_conv FILING-KEY-01
• col. 1: procedure identifier
• col. 2: alpha (alphabet) of the text
• col. 3: procedure name
• col. 4: procedure parameters
tab_filing Procedures (1)
• compress: Strips the characters listed in col. 4 (e.g., ()[]:,)
• del_subfield: Changes the subfield sign (e.g., $$x) to a blank
• to_blank: Changes the characters listed in col. 4 to blanks
tab_filing Procedures (2)
• to_lower: Changes all characters to lower case
• to_carat: Changes the subfield sign to two carets (^^) in order to achieve hierarchical sorting of headings
• suppress: Suppresses all text enclosed in <<…>>, as well as the signs themselves
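A Python sketch of these three procedures follows. It is illustrative only: the function names are hypothetical, and to_carat assumes, as in the del_subfield description, that the subfield sign includes its one-character code (e.g. "$$x"). The sketch shows only the string transformation; it does not model how ALEPH then compares the resulting keys.

```python
import re


def to_lower(text: str) -> str:
    """to_lower: change all characters to lower case."""
    return text.lower()


def to_carat(text: str, sign: str = "$$") -> str:
    """to_carat: change each subfield sign (assumed "$$" + code) to "^^"."""
    return re.sub(re.escape(sign) + ".", "^^", text)


def suppress(text: str) -> str:
    """suppress: remove text enclosed in <<...>>, together with the signs."""
    return re.sub(r"<<.*?>>", "", text)


print(to_lower("McKay"))                 # mckay
print(to_carat("History$$xCongresses"))  # History^^Congresses
print(repr(suppress("<<The>> Times")))   # ' Times'
```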
tab_filing Procedures (3)
• expand_num: For filing numbers numerically, pads numbers with leading zeroes to a fixed length of 7 (e.g., 17 -> 0000017)
• mc_to_mac: Changes an initial "mc" to "mac" (for interfiling McKay and MacKay)
• non_filing: Suppresses initial text according to the non-filing indicator defined in tab11
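These three filing procedures can also be sketched in Python. The functions are hypothetical re-implementations for illustration: expand_num pads every run of digits, mc_to_mac assumes the key has already been lower-cased (to_lower precedes it in the tab_filing example) and looks only at the start of the key, and non_filing assumes the indicator value is the number of leading characters to skip, as with MARC 21 non-filing character indicators.

```python
import re


def expand_num(text: str, width: int = 7) -> str:
    """expand_num: pad each number with leading zeroes to a fixed width."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), text)


def mc_to_mac(key: str) -> str:
    """mc_to_mac: change an initial "mc" to "mac" (key assumed lower-case)."""
    return "mac" + key[2:] if key.startswith("mc") else key


def non_filing(text: str, indicator: int) -> str:
    """non_filing: skip the number of leading characters given by the indicator."""
    return text[indicator:]


titles = ["vol 17", "vol 9"]
print(sorted(titles))                           # ['vol 17', 'vol 9'] (plain string order)
print(sorted(titles, key=expand_num))           # ['vol 9', 'vol 17'] (numeric order)
print(mc_to_mac("mckay"), mc_to_mac("mackay"))  # mackay mackay
print(non_filing("The Times", 4))               # Times
```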
tab_filing Procedures (4)
• compress_blank: Strips blanks (e.g., in ISBNs)
• numbers: Compresses a comma or a dot between numbers (e.g., 2,153 changes to 2153)
• non_numeric: Deletes all non-numeric characters (e.g., for ISBN, ISSN)
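For identifiers such as ISBNs, compress_blank and non_numeric reduce a value to a compact key. A minimal Python sketch, with hypothetical function names and a made-up sample number:

```python
def compress_blank(text: str) -> str:
    """compress_blank: strip all blanks."""
    return text.replace(" ", "")


def non_numeric(text: str) -> str:
    """non_numeric: delete every character that is not an ASCII digit."""
    return "".join(ch for ch in text if ch in "0123456789")


print(compress_blank("0 123 45678 9"))           # 0123456789
print(non_numeric("ISBN 0-123-45678-9 (pbk.)"))  # 0123456789
```

Note that this literal sketch would also delete an ISBN-10 check digit "X".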
tab_filing Procedures (5)
• abbreviation: Compresses a dot between single characters (e.g., I. B. M. changes to I B M; I.B.M. changes to IBM)
• build_filing_key_lc_call_no: Special procedure for correct sequencing of LC call numbers
tab_filing Procedures (6)
• char_conv: Translates one character into another (or into up to 5 characters), using the conversion defined in the line of tab_character_conversion_line (in alephe/unicode) whose name matches col. 4. For example:
01 L char_conv FILING-KEY-01
refers to the line
FILING-KEY-01 ##### # line_utf2line_sb unicode_to_filing_01
unicode_to_filing_nn_source
This table is used for character conversion for filing. The table must be processed using UTIL P/3 in order to create the unicode_to_filing_nn table. This latter table is the one actually used by the system. It performs an additional translation in order to remove null characters.
unicode_to_filing_01_source
Examples:
• Latin capital letter AE: 00C6 0041 0045
• Small letter sharp s: 00DF 0053 005A
IMPORTANT NOTE
The procedures must be listed in logical order. For example:
numbers must be listed before compress or to_blank if a comma or a dot is included in their parameters. Otherwise, the comma or dot will no longer be present when the numbers procedure runs.
./tab/tab_filing - usage
• Filing procedures are used when building the filing keys for headings (Z01), index entries (Z11) and sort keys (Z101)
• Note: If no procedure for the creation of sort keys has been defined in tab01.lng, the system will use the default filing procedure 99.
Filing procedure 99 MUST be defined in tab_filing, since it determines the default sort order.
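The fallback rule can be expressed as a small lookup. The Python sketch below is hypothetical: it models the sort-key setup from tab01.lng as a plain dictionary (the field names "title" and "author" are invented) and the identifiers defined in tab_filing as a set, and shows why a missing procedure 99 is a configuration error.

```python
from typing import Dict, Set

DEFAULT_FILING_PROCEDURE = "99"


def sort_key_procedure(field: str,
                       tab01_sort_procedures: Dict[str, str],
                       tab_filing_ids: Set[str]) -> str:
    """Return the filing procedure for a sort field, falling back to 99."""
    proc_id = tab01_sort_procedures.get(field, DEFAULT_FILING_PROCEDURE)
    if proc_id not in tab_filing_ids:
        raise LookupError(f"filing procedure {proc_id} is not defined in tab_filing")
    return proc_id


defined = {"01", "99"}
print(sort_key_procedure("title", {"title": "01"}, defined))   # 01
print(sort_key_procedure("author", {"title": "01"}, defined))  # 99
```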