procedure for creating digital libraries based on …€¦  · web viewit is possible to convert...

33
Second Draft (05-03-2006) CREATING DIGITAL LIBRARIES BASED ON CDS/ISIS DATABASES (Guide for use with Greenstone releases prior to 2.70) by Pablo Morete and John Rose This guide is intended to help users of UNESCO's CDS/ISIS software 1 to convert their databases to digital libraries under the Greenstone Digital Library program. Like all Greenstone libraries, these libraries can then be disseminated on multiple Web platforms (Windows, Unix, Linux, Mac OS X) or on CD-ROM. Two types of conversion are possible: 1. "As is" conversion of a CDS/ISIS database to a Greenstone digital library . Since CDS/ISIS records are limited to 32,000 characters (in version 1.5), CDS/ISIS databases generally do not contain the full text of documents; this first type of conversion is thus normally used to provide easier Web or CD- ROM access to a bibliographic or referral database (through indexing or browsing on any of the CDS/ISIS fields). If any of the CDS/ISIS fields contain hyperlinks to external resources, they will be active in the resultant Greenstone library. The Greenstone library cannot be enlarged or edited in Greenstone; it will typically be generated periodically from the master CDS/ISIS database. 2. Creation of a Greenstone digital library of full-text documents from a CDS/ISIS bibliographic database and the files containing the full-text documents associated with the database records. This means that the original document can be imported into Greenstone; and that the text of the document and/or the CDS/ISIS metadata can be accessed through indexing or browsing on any of the CDS/ISIS fields. The resulting digital library can then be updated/maintained as an autonomous application, or if preferred can be periodically regenerated from the master CDS/ISIS database with associated documents. 1 http://www.unesco.org/webworld/isis

Upload: others

Post on 23-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Second Draft (05-03-2006)

CREATING DIGITAL LIBRARIES BASED ON CDS/ISIS DATABASES(Guide for use with Greenstone releases prior to 2.70)

by Pablo Morete and John Rose

This guide is intended to help users of UNESCO's CDS/ISIS software1 to convert their databases to digital libraries under the Greenstone Digital Library program. Like all Greenstone libraries, these libraries can then be disseminated on multiple Web platforms (Windows, Unix, Linux, Mac OS X) or on CD-ROM. Two types of conversion are possible:

1. "As is" conversion of a CDS/ISIS database to a Greenstone digital library . Since CDS/ISIS records are limited to 32,000 characters (in version 1.5), CDS/ISIS databases generally do not contain the full text of documents; this first type of conversion is thus normally used to provide easier Web or CD-ROM access to a bibliographic or referral database (through indexing or browsing on any of the CDS/ISIS fields). If any of the CDS/ISIS fields contain hyperlinks to external resources, they will be active in the resultant Greenstone library. The Greenstone library cannot be enlarged or edited in Greenstone; it will typically be generated periodically from the master CDS/ISIS database.

2. Creation of a Greenstone digital library of full-text documents from a CDS/ISIS bibliographic database and the files containing the full-text documents associated with the database records. This means that the original document can be imported into Greenstone; and that the text of the document and/or the CDS/ISIS metadata can be accessed through indexing or browsing on any of the CDS/ISIS fields. The resulting digital library can then be updated/maintained as an autonomous application, or if preferred can be periodically regenerated from the master CDS/ISIS database with associated documents.

The second type of conversion is of particular interest for CDS/ISIS users, since it enables one to provide full digital library services (indexed access to original documents) while retaining all of the power and flexibility of CDS/ISIS in the handling of the corresponding metadata.2

Both types of CDS/ISIS to Greenstone conversion will be documented in the following two sections of this guide, while the third section will briefly address the issue of configuring the user interface

1 http://www.unesco.org/webworld/isis2 Other FOSS/freeware options available for presentation of native CDS/ISIS databases "as is" on the Web include:1. GenISIS is freeware available from UNESCO (http://www.unesco.org/webworld/isis) to create a customized server-

end application for querying CDS/ISIS databases. It requires the WWWISIS software (also called WXIS from version 4.0) of the Latin American and Caribbean Center on Health Sciences Information (BIREME), but for serving under Windows, the freeware version of WWWISIS (version 3.0) is sufficient.

2. The JavaISIS package of UNESCO requires installation of the JavaISIS Server at the server end and the JavaISIS Client at the user end. Users can retrieve, create, modify and display records, and import and export records using the ISO2709 format. Multilingual encoding support is provided. Since JavaISIS also requires WWWISIS, it works only under a Windows based Web server unless the paid version of WWWISIS is acquired.

3. CLABEL can be used to serve a CDS/ISIS database over Linux/Unix. It requires OpenISIS and PHP-OpenISIS (all three are FOSS programs available at http://www/sourceforge.org).

4. Igloo (http://igloo.lib.itb.ac.id/) can be used either over OpenISIS and PHP-OpenISIS or over PHP-OpenIsis for Windows.

Page 2: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

once the database is converted. However, in order to follow these instructions, the user should first become familiar with the basic use of the Greenstone Librarian Interface (GLI) in creating digital libraries.3

The CDS/ISIS to Greenstone conversion functionality is continually being improved. The present instructions are being issued to enable users to test this functionality prior to the availability of Greenstone version 2.70 scheduled for April 2006 (an updated version will be provided when this version is released). It is recommended that users wishing to convert CDS/ISIS databases to Greenstone install the latest "stable" version of Greenstone (version 2.62 at the time of writing) from the website at http://www.greenstone.org. Then, if your installation is 2.63 or older you should download i) the updated ISISPlug plugin at http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/temp-2.63/ISISPlug.pm (and use it to replace the existing "ISISPlug.pm" in the "Greenstone\perllib\plugins" directory, normally found under C:\Program Files\ in Windows) and ii) the updated "explode" program at http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/temp-2.63/explode_metadata_database.pl (and use it to replace the existing "explode_metadata_database.pl" file in the "Greenstone\bin\script" directory, normally found under C:\Program Files\ in Windows).4, 5

1- IMPORTING A CDS/ISIS DATABASE INTO GREENSTONE

This procedure converts your CDS/ISIS database "as is" into a Greenstone digital library. It requires the ISISPlug plugin which has been available in Greenstone starting with version 2.50. Data formatted according to most CDS/ISIS data structures (e.g. subfields, repeatable fields, and "pseudo" repeatable fields using " angular brackets (< and >) to delimit occurrences (CDS/ISIS indexing type 2)6) should be imported correctly without special user action.

The following steps are required:A- Load the Greenstone Librarian Interface (GLI) create a new collection by clicking on the

File/New... main menu item, providing a short collection name (for convenience it may be the name of CDS/ISIS database) and a description. Keep the default setting for "Base this collection on:" as -- New Collection --, click on OK (see Figure 1). You are proposed a list of metadata sets; uncheck Dublin Core Metadata Element Set and click OK and then OK again in the warning box (you do not need a metadata set for this step, metadata will be generated from the CDS/ISIS file structure).

3 See, for example, Chapter 3 of the Greenstone User's Guide (http://prdownloads.sourceforge.net/greenstone/User-en.pdf) or the FAO IMARK training module on Digitization and Digital Libraries (http://www.imarkgroup.org/), followed as required by the more advanced sources in section 3 of this guide.4 In versions of Greenstone prior to 2.62, the main Greenstone directory was called "gsdl" rather than "Greenstone", and you should put the new files in that directory.5 It is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata editing) using Greenstone 2.50 or above, and to import associated documents and edit CDS/ISIS metadata in Greenstone using version 2.60 or above; users who do not want to update their entire installation can in principle also carry out the functions in this guide by updating only the ISISPlug.pm file for "as is" conversion and both files for full conversion.6 In versions of Greenstone up to 2.63, only the CDS/ISIS field named "Keywords" can be formatted in this way and it must be thus formatted.

Page 3: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Figure 1: Creating a new collection

B- The GLI Gather panel will appear. Locate the MST, XRF, and FDT files from your CDS/ISIS database in the directory tree of Workspace pane (normally they will be in the "C:\WINISIS\data\" directory of the Local Filespace) and drag them one by one into the Collection pane. When you copy MST file into the Collection pane Greenstone will propose to load ISISPlug (see figure 2); accept the proposal. For the XRF and FDT files Greenstone will say that there is no applicable plugin applicable (see figure 3) - this is OK.

Figure 2: Warning box when the MST file is dragged to the Collection pane

Page 4: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Figure 3: Warning box when the FDT file is dragged to the Collection pane

Page 5: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

You will then have the screen shown in Figure 4.

Figure 4: Collection files ready to process

C- If you are using non-ASCII characters in your database, go to the Design panel and click on Document Plugins. Then click on ISISPlug and on Configure Plugin (see Figure 5). Check input_encoding and select the appropriate DOS codepage (for Latin alphabets using accented letters, it is "dos_850-DOS codepage 850 (Latin 1)") and click OK. This will set ISISPlug to correctly recognize the character set of your database (see Figure 6).

Page 6: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Figure 5: Select the ISISPlug under Document Plugins in the Design panel

Figure 6: Select the appropriate character encoding scheme

Page 7: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

If either i) you want to incorporate and index electronic document files into your Greenstone library or ii) you want to edit metadata in Greenstone (not typically required for an "as is" conversion of the CDS/ISIS database), then go to Section 2 on "Assigning CDS/ISIS Metadata to Electronic Documents". If on the other hand your CDS/ISIS database is a self-contained resource database pointing to paper documents, or if you are not interested in indexing the electronic documents referenced in a bibliographic field in your database, continue with the steps just below.

D- Go to the Create panel of GLI and click on Build Collection (see Figure 7).7

Figure 7: The Build Collection button in the Create panel

E- Normally after several seconds or minutes (depending on the size of your database, it could be longer), a box will appear informing you that the collection has been built (see Figure 8), and you may then preview it by clicking on the "Preview Collection" button.

7 Note that it is only after the collection is built the first time that the CDS/ISIS metadata are extracted and available to create indexes, browsing classifiers and display formats.

Page 8: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Figure 8: Box announcing that the collection has been built

F- By default the new collection will be in simple search mode (single search index) with two browsing classifiers ("titles"8 and "filenames"9) and a primitive display format showing the raw records. To provide an appropriate user interface to the collection, you will now need to configure your search types, search indexes, browsing classifiers and format features in the Design panel. This is a task common to all Greenstone collections, for which some specific guidance and general references are given in Section 3.

ALTERNATIVE METHOD:If you have a CDS/ISIS database whose fields are equivalent to a subset of those of the CDS/ISIS example database provided with Greenstone (which follows the structure of UNESCO's bibliographic database), you can consider building your Greenstone library on the model of this example provided with Greenstone.10 The advantage would be to use or to adapt the rather complex record display format of this example rather than having to start from scratch on the format.

In this case, specify when you create the Greenstone collection that it is to be based on the "CDS/ISIS example (isis-e)" by selecting this in the drop-down list for the parameter: "Base this collection on:" in the initial screen for creating the collection (see Figure 1). If your fields correspond exactly to those in the example, your library will work just as the example after you complete steps B-E above, provided that afterwards you replace "isis-e" in the "&c=isis-e" specification of the 'format DocumentText' statement (Design panel, Format Features) by the short name of your Greenstone collection (see Figure 9, and also Section 3, point D concerning editing of formats in the Format Features view). If your field names are different, you should modify your CDS/ISIS FDT to use the field names of the example;11 the convenience of this approach will depend on whether the example collection's presentation appeals to you and the extent of the differences between the two structures.

8 This classifier will work if the CDS/ISIS database has a field entitled "Title".9 The filename for all of the records is the MST file, so this classifier is not useful.10 The example collections can be downloaded from http://prdownloads.sourceforge.net/greenstone/gsdl-documented-collections-aug2005.zip (if your CDS/ISIS example collection came with a version of Greenstone prior to 2.62, for example the UNESCO CD-ROMs published in 2004 (Greenstone version 2.50) or 2005 (Greenstone version 2.60), it will only work properly in version 2.62 and greater if you download and install the updated version of the Greenstone\collect\isis-e\etc\collect.cfg file at http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/collect.cfg).11 It would also be possible to modify the configuration of the Greenstone CDS/ISIS example collection to your needs, but this would be considerably more complex - refer to the "How the collection works" presentation on the homepage of the example collection.

Page 9: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Figure 9: Format parameter to change (shown within the red box in HTML Format String)

Remember that if you are going on to assign CDS/ISIS metadata to electronic documents (second type of database conversion), there is no need to improve the display interface of the simple "as is" Greenstone collection based only on metadata, or to use the above alternative method based on such a collection.

2- ASSIGNING CDS/ISIS METADATA TO ELECTRONIC DOCUMENTS

The "Explode Metadata Set" option" provides a way of reorganizing a Greenstone collection consisting of metadata only ("as is" conversion of a CDS/ISIS database) so that each record appears as an individual document with the associated metadata assigned to it.

Exploding metadata is an irreversible process, so that if you have built an "as is" CDS/ISIS library in Greenstone, and want to keep it, start this next step by creating a new Greenstone collection based on your "as is" collection (as explained above under ALTERNATIVE METHOD).12

In the Gather panel, you will notice that the MST file has a different coloured icon than the other files. This green icon indicates that the file it is a metadata database that can be exploded. Right-

12 The only reason to save the "as is" library would normally be that you have developed a user interface (search types, search indexes, browsing classifiers, and display formats) that you want to save. The explode function erases the copy of the MST file which was dragged into GLI, but not the original MST file in your CDS/ISIS database.

Page 10: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

click on the icon and choose Explode metadata database by right clicking on this line in the menu (see Figure 10).

Figure 10: Menu presented after right click on the MST file

There are now two ways to integrate the electronic documents into the collection: A) automatically importing them through a hyperlink existing in the CDS/ISIS database or B) creating dummy files which are later replaced by the corresponding documents.

A- Automatic importation of the electronic documents

When the Explode Metadata Database window opens, the explode parameters including the CDS/ISIS field containing the hyperlink should be specified as in Figure 11.

Page 11: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Figure 11: Specifying the Explode Metadata Database parameters

input_encoding: As before, if you are using non-ASCII characters in your database select the appropriate character set from the dropdown menu (for Latin alphabets it is "dos_850").

document_field: indicate the label (name) of the CDS/ISIS field containing the file name (in which case the document_prefix parameter is also required, as in this example) or the path and filename (C:\qqq\rrr\sss\ttt.pdf ) of the electronic version of the document associated with the record (the document can be in any format accepted by Greenstone: htm, pdf, doc, ppt, etc.). Here we have specified the "Notes" field of the example CDS/ISIS application in which the document names have been added (they are not present in the example collection).

document_prefix: indicate the path (if any) to be prefixed to the contents of document_field.

Leave the other options blank and click on Explode. Greenstone will explode the database, copying the corresponding full-text document files into the "Greenstone\collection\xxx\import\YYY\" directory, where xxx is the name of the Greenstone database (in this case "cds") and YYY is the name of the CDS/ISIS database (in this case also "cds"). The metadata for the records is in a new "metadata.xml" file in the "Greenstone\collection\xxx\import\YYY" directory. The YYY.MST file is deleted automatically (and the copies in Greenstone of the YYY.XRFand YYY.FDT files are no longer needed - they can be erased in the Collection pane of the Gather panel if the user wishes). Click on the OK button when informed that the explode process has been completed.

B- Creation and replacement of dummy documents

Page 12: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

If the user finds it inconvenient to place the paths and filenames of the associated electronic documents in Greenstone,13 the documents can be temporarily replaced by a dummy document file (a file with zero length and ".nul" extension). In this case, the bibliographic metadata are attached to the dummy file. When the full document is available it can be copied into the collection to replace the dummy file, but then before the collection is rebuilt, the user must use a text editor like WordPad to change the line "<FileName>filename\.nul</FileName>" in the metadata.xml file (in the Program Files\Greenstone\collect\xxx\import\YYY directory) by replacing "filename\.nul" by the full file name (name and extension, e.g. "actualfilename\.doc") of the actual electronic document.14

Note that if there are some records in the collection which are based on dummy documents, you must include the NULPlug plugin when building the collection (this will normally not require action unless you base your collection on one without NULPlug installed, since NULPlug is loaded by the default configuration).

There are two ways in Greenstone to insert dummy documents referenced in CDS/ISIS documents: If the name of the documents' filenames (but not the paths) are available in a CDS/ISIS

field, this field can be specified as the filename_field when exploding the database (see Figure 11); in this case the document_field parameter should be left blank. A filename.nul dummy document will be created for each record at the time of exploding and identified as the source document for the record in the metadata.xml file (note that if "zzz.doc" occurs in the field, the dummy document will be called "zzz.doc.nul").

If you do not specify a value for either filename_field or document_field in the Explode Metadata Database window, then dummy files will be created for all of the records with filenames 0001.nul, 0002.nul, etc.15

You can now build the collection in GLI (as in Section 1, point D), making sure that you have specified in the Document Plugins view of the Design panel the plugins you will need to process your documents (e.g. WordPlug and HTMLPlug for Word documents, NULPlug for dummy documents) - action will normally not be required unless you base your collection on one without the desired plugins, since most of the available plugins are loaded by the default configuration. You will be able to search and to browse using the default interface, which should now be finalized by configuring the search types, search indexes, browsing classifiers and format features in the Design panel (see Section 3).

At this stage the CDS/ISIS metadata elements are considered as "extracted" rather than assigned, and all their names start with the prefix "ex.", e.g. "ex.Title". Although they can be used in configuring the search indexes, browsing classifiers and format features (see Section 3), they do not

13 For example, because the user wishes to set up a bibliographic database in Greenstone with the intention of gradually incorporating the associated documents, or simply because it is inconvenient the automate the transfer due to documents scattered among different storage units and directories.14 Note that in Greenstone 2.70, the change in the metadata.xml file will be implemented automatically with a new Greenstone renaming function.15 The disadvantage of this method is that you will then have to rename your documents to correspond to the names of the dummy files (e.g. 0001.doc) or correct the filename in the metadata.xml file (which can be done when changing the nul extension to the correct one).

Page 13: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

appear in the Enrich panel of GLI and therefore cannot be edited, nor assigned to new documents added directly to the new Greenstone collection in the Gather panel.

If you want to be able to edit metadata in and to add documents to the collection using GLI, you most associate the extracted metadata with a valid Greenstone metadata set recognized by GLI (file with mds extension in the "Greenstone\GLI\metadata" directory). After this process is completed, the original extracted metadata will have been mapped into a new metadata set with editable content. For example, the ex.Title field may now be called zzz.Title, where zzz is the short name of the metadata set which you select (option 1 below) or create (option 2 below):16

1. Insert the metadata in an existing metadata set , for example "Dublin Core Metadata Element Set". This is not recommended unless one of the existing metadata sets closely matches your own data structure.17 Go to Design panel and click on Metadata Sets. Click on Add Metadata Set Select the metadata set that best matches your own metadata (e.g. "dublin.mds") and click on Add Metadata Set. GLI will ask you how to deal with the metadata it finds in your records. For each metadata field (see Figure 12), you have the

Figure 12: Box for mapping extracted metadata element Imprint^a to Dublin Core

option of adding it (Add) to the existing metadata set with its original name (e.g. the content of "ex.Dingdong" field will replicated in "dl.Dingdong", irrespective of the target element specified in the dialogue box), mapping it (Merge) into a specified existing target metadata element of the selected set (e.g. the content of the "ex.Title" field will be replicated in "dl.Creator"), or ignoring it (Ignore). Add will not be available if an element of the same name already exists in the target metadata set. One strategy would be, for each element, to Add it if it doesn't have a corresponding element in the selected set, and Merge it if it does (this will result in a collection with metadata elements mainly from the selected metadata set - it doesn't matter that the names differ from the original ones, as long as you use these names in formatting for the user interface). Another strategy would be to Add all of the elements (and ignore the unused similar metadata elements of the selected metadata set when formatting for the user interface).

16 This arcane requirement comes from the fact that Greenstone the metadata were originally assigned in CDS/ISIS rather than in GLI.17 If you do not recognize an appropriate metadata set from the list provided in the Metadata Sets feature of Design panel, you can easily view the details of the available metadata sets using the Greenstone Editor for Metadata Sets (GEMS) which is installed with the Greenstone package (see the second option for assigning metadata presented subsequently).

Page 14: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

2. Use a new metadata set for your metadata elements . This is very simply done with the Greenstone Editor for Metadata Sets (GEMS) that is provided with Greenstone. Open the program and open a new metadata set (File/New... main menu item). Give it a full Name and a unique short name (Namespace) and click OK (see Figure 13). Then select the metadata

Figure 13: Naming a new metadata set in GEMS

set in the left panel and click on the Save item in the File menu. A blank metadata set will be saved in a file named Name.mds (in this case, cds.mds). Now go back to your Greenstone collection in GLI and add the newly created blank metadata set in the Metadata Sets view of the Design panel. Add all of the extracted metadata set items one by one as promoted to the new dataset.18

Now if you can go to the Enrich panel and select in the left pane one of the documents in the collection (they are in the YYY directory), you will see associated editable metadata in the right pane. Before building your collection in final form (Build Collection button in the Create panel) you should configure the search indexes, browsing classifiers and format features (see Section 3).

18 It is also possible to create in GEMS a new metadata set corresponding exactly to the CDS/ISIS extracted set. There is no advantage in doing this unless you wish to document the new dataset structure, or to reuse or share it in other applications.

Page 15: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

3- CONFIGURING THE USER INTERFACE

This guide cannot go into all of the details of how to configure the end-user interface for your collection converted from CDS/ISIS. This section will provide only guidance for obtaining a basic acceptable configuration of search types, search indexes, browsing classifiers and format features, as well as a list of Greenstone documentation for more detailed or advanced configuration. The discussion that follows will refer to the default configuration parameters that are obtained by creating a new collection (see step 1.A.). If you base your collection on an existing collection the parameters will be those of the model collection.

A- Search Types

The default configuration is simple search (all search fields in the same index) and plain search type (a single box for search terms). In order to configure for advanced (multi-field) search with a the multi-field search form as default for the end user, go to the Search Types view of the Design panel and check the Enable Advanced Searches box. Then click on the Add Search Type button to add "form" search. Select "form" in the Currently Assigned Search types and click the Move Up button. You will then have the correct configuration shown in Figure 14.

Figure 14: Parameters for form search default using field searching

Page 16: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

B- Search Indexes

Go to the Search Indexes view of Gather panel.

If you have an "as is" database or an exploded database without assignment of a metadata file, the Currently Assigned Indexes by default will be (see Figure 15, but attention, if you have not enabled advanced searches as described under point A above, it will look different):

text "text" [index on the text of the record (in "as is" conversion) or of the linked document (in exploded conversion)]

ex.Title "Title" [index on the extracted title - this is only assured to be the real title if you have a CDS/ISIS field called Title]

ex.Source "Source" [index on the source file name (for an "as is" conversion it is YYY.MST for all records, thus useless)]

Figure 15: The view for setting search indexes

If your library contains textual documents, you should keep the text index line. ex.Title should be kept if there is a CDS/ISIS field called title; otherwise you should change to the element which does represent the title (e.g. "ex.Name"). If you wish to search on file names in the case of an exploded database, then you may keep the ex.Source line. For any index, you can change the "Index Name" (presented in quotes in the corresponding line in the Currently Assigned Indexes box) and/or "Index Source" (the metadata element to be indexed) by selecting a the corresponding line in the Currently Assigned Indexes box, and then changing the data just below it (NOTE that the Add

Page 17: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Index and/or Replace Index buttons only become active when you have made a change in "Index Name" or "Index Source" parameters).

C- Browsing Classifiers

Classifiers define the browsing lists available within a Greenstone application (in Greenstone, indexing a field for search access and specifying it for browsing access are completely separate functions: an indexed field may or may not be presented in a browsing list and vice versa). The classifiers are configured by selecting the Browsing Classifiers view in the Design panel which is shown in Figure 16.

Figure 16: Browsing Classifiers view showing default classifiers

The browsing lists defined by the classifiers can be accessed through buttons on the homepage of the collection, as seen in Figure 17 (note that the search button is always at the upper left, and the browsing classifier buttons are to its right).

Page 18: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Figure 17: Homepage of the "cds" collection

The classifiers proposed by default are:

classify AZList -metadata ex.Title [alphabetically sorted list ordered according to the extracted title - this is only assured to be the real title if you have a CDS/ISIS field called Title]

classify AZList -metadata ex.Source [alphabetically sorted list ordered according to the source file name (for an "as is" conversion it is YYY.MST for all records, thus useless)]

There are numerous classifier types available in Greenstone but for simplicity this guide will refer only the AZList (display of simple vertical list of records in alphabetical order) and AZCompactList (display of an alphabetically sorted vertical list of metadata values - clicking on one of the values yields a vertical list of the corresponding records in alphabetical order, similar to the list provided by AZList). The AZCompactList is used to browse metadata for which the same value can occur in several records, such as for authors and keywords.

Classifiers are added, configured or removed using the Browsing Classifiers view (Figure 16). As a model, we will assume that the user wishes to enable browsing on title, authors and keywords, for which the following steps should be followed (in any order):

Page 19: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Remove the source file browser by selecting the ex.Source line in the Currently Assigned Classifiers box and clicking on the Remove Classifier button.

If the name of the title field of your CDS/ISIS database is other than "Title" then change the metadata element to be browsed in the title classifier line to the correct metadata name (e.g. if the title field in CDS/ISIS is "Name", then change the metadata element from. ex.Title to ex.Name). This is done by selecting the ex.Title line in the Currently Assigned Classifiers box and clicking on the Configure Classifier button to display the "Configuring Arguments" window (Figure 18):

Figure 18: Configuration window for AZList classifier

Then you change the name of the metadata element to be browsed on in the top field and click OK.

Add the additional authors and keywords fields using the AZCompactList classifier type. For each classifier, Select AZCompactList in the Select classifier to add field of the Browsing Classifiers view (Figure 16), then click on the Add Classifier button to get the "Configuring Arguments" window (Figure 19). Then select the name of the metadata element to be browsed on in the top field (if the field is repeatable select the occurrence metadata element, e.g. "ex.Keywords", rather than the entire field, e.g. "ex.Keywords^all"), set the minigroup parameter to "1", and click OK.

Page 20: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Figure 19: Configuration window for AZCompactList classifier

If the collections is rebuilt and previewed in the Create view, the new browser structure will appear in the homepage of the collection as seen in Figure 20. The order of the browser classifiers can be changed by selecting a line in the Currently Assigned Classifiers box, clicking on the Move Up or Move Down buttons as appropriate, and rebuilding the collection.

Page 21: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

Figure 20: Modified homepage of the "cds" collection

The two new browsing lists are identified on the homepage as underlined text rather than as buttons, because there is no button in Greenstone corresponding to the names of the classifiers (here PersonalAuthors and Keywords). There are buttons present for all of the metadata elements of the metadata sets provided with Greenstone (you can see the elements of any metadata set by adding in the Metadata Sets view of the Design panel, and then removing if not needed, or by using the Greenstone Editor for Metadata Sets (GEMS) described under point 2.B.2 above). For example, you will see that "Keyword" rather than "Keywords" exists in the Development Library Subset (dls) metadata set, and knowing this, you can set the buttonname parameter in the classifier configuration window (Figure 19) to "Keyword" and rebuild to get the "keywords" button on the collection homepage. If you want to insert a button with an unsupported name, there is a page on the Greenstone website that can be used to generate a new button in the default Greenstone style (http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/make-images.html).

D- Format Features

After completing the above steps it is possible that the record references in browsing lists and/or the full texts to be displayed when a record is selected will not appear correctly. The two most common format features to be considered in this context, presented briefly below, are respectively the VList and DocumentText parameters in the Format Features view of the Design view (note that these can be changed and previewed in GLI without rebuilding the collection). A third relevant

John Rose, 01/03/-1,
Waikato please check, is it all of the sets or only Dublin Core and dls?
Page 22: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

formatting point covered here is the display configuration parameters of ISISPlug. For more detailed configuration needs, the user is referred to the resources in Section E.

VList is a format in Greenstone formatting language (modified html) which determines how the record reference (including the title and other metadata determined by the user) is displayed in the search results and browsing lists. It can be edited by selecting the line starting with "format VList" in the Currently Assigned Format Commands box (DO NOT select it in the "Affected Component" field which is for adding new formats rather than modifying existing ones). The box with VList selected for editing is shown in Figure 21 (MAKE SURE to expand the window to full screen to see the contents of the Currently Assigned Format Commands box).

Figure 21: Format Features window ready for the editing of VList

VList can now be edited in the HTML Format String box by simply typing in the box and/or inserting specified metadata elements or standard format elements by choosing them in the drop-down list of Variables and clicking on the Insert button. The default version is:

<td valign=top>[link][icon][/link]</td><td valign=top>[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]</td><td valign=top>[highlight]{Or}{[dls.Title],[dc.Title],[ex.Title],Untitled}

Page 23: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

[/highlight]{If}{[ex.Source],<br><i>([ex.Source])</i>}</td>

For example, if one wants the list of authors in parentheses instead of the source file name in parentheses, the last line may be changed to:

[/highlight]{If}{[ex.PersonalAuthors^all],<br><i>([ex.PersonalAuthors^all])</i>}</td>

The only major problem with the default VList format would be if the CDS/ISIS title field is called something other than "Title". For example if this field is called "Name", the fourth line of the Vlist format should be changed to the following:

{Or}{[ex.Name],Untitled} [This displays the value of ex.Name if this field exists, else the mention "Untitled".]

To get back the default setting of the format, remove the format, then reopen the collection clicking on the File/Open... main menu item

The DocumentText format displays the full text associated with a selected record (the full CDS/ISIS record in the case of "as is" conversion, or the full electronic document if it has been imported through the explode method). If an electronic document has been imported, then the default value of this format will suffice.

However, if the display text is intended to be the CDS/ISIS record (which will be presented in an inconvenient form with the default format in an "as is" conversion, and not at all if dummy files have been created), the DocumentText format should be edited (in the same way as was done for VList above) to show the metadata elements in the desired way.

For example, if the default format is changed as follows:

Title: [ex.Title]<br>Authors: [ex.PersonalAuthors^all]<br>Publisher: [ex.Imprint^all]<br>Keywords: [ex.Keywords^all]

the text of the records would display as follows:

Title: Policy Guidelines for the Development and Promotion of Governmental Public Domain InformationAuthors: Uhlir, P.F.Publisher: Paris, UNESCO, 2004Keywords: public domain, digital information, open access, copyright

instead of the inconvenient result with the default format:

---------- tag=24 data=Policy Guidelines for the Development and Promotion of Governmental Public Domain Information tag=26 data=^aParis^bUNESCO^c2004 tag=50 data=Policy Guidelines final.doc tag=69 data=<public domain><digital information><open access><copyright> tag=70 data=Uhlir, P.F.

The ISISPlug plugin has two formatting parameters which affect the display metadata. These are the entry_separator and subfield_separator paramaters shown in the "Configuring Arguments" window for ISISPlug (Figure 22) which is obtained by selecting

Page 24: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

ISISPlug in the Document Plugins view of the Design panel, and then clicking on the Configure Plugin button (see Figure 5). These parameters are useful in changing the display in Greenstone of the full field in the case of repeatable fields and fields with subfields.

The entry_separator parameter controls how the combined content of a repeatable field (designated in Greenstone as fieldname^all) is derived from the individual occurrences; the default is <br>19 which generates a combined field with the a new line between occurrences. NOTE that this parameters relates to fields designated as repeatable in the CDS/ISIS field definition (FDT) file, not to "pseudo repeatable" fields in which the occurrences are separated by angular brackets (< and >) or slashes (/ and /).

The subfield_separator controls how combined content of field with subfields (also designated in Greenstone as fieldname^all) is derived from the individual subfields; the default is a comma followed by a space (which is entered in the parameter but not explicitly displayed in the "Configuring Arguments" window), which generates a combined field consisting of the subfields separated by commas.

The values shown in gray characters on gray background are active even if the corresponding parameters are not checked in the "Configuring Arguments" window; the parameters need be changed only if the user desires to modify them.

Figure 22: Formatting parameters selected in ISISPlug

E- Resources for detailed guidance

The following resources may be consulted for more complete/advanced guidance on configuring Greenstone search types, search indexes, browsing classifiers and format features.19 The "&lt;" and "&gt;" sequences represent the" less than" (<) and "greater than" (>) brackets in HTML.

John Rose, 01/03/-1,
Waikato please check, are any parameters in gray active or only the default parameters?
Page 25: PROCEDURE FOR CREATING DIGITAL LIBRARIES BASED ON …€¦  · Web viewIt is possible to convert CDS/ISIS databases "as is" into Greenstone metadata (presentation only, no metadata

o A summary of all formatting features and commands:http://www.greenstone.org/cgi-bin/library?e=p-en-faq-utfZz-8&a=p&p=faqcustomize#customizeformat

o Some example collections which have documentation about their configuration:http://www.nzdl.org/cgi-bin/library#documentedexamplecollections

o Section 2.3 of the Greenstone Developer's Guide.http://prdownloads.sourceforge.net/greenstone/Develop-en.pdf

o Fiji workshop, greenstone tutorials, downloadable fromhttp://www.greenstone.org/cgi-bin/library?e=p-en-home-utfZz-8&a=p&p=docs