The Chinese University of Hong Kong Library
香港中文大學圖書館
Old Collections in a New Bottle: How The Chinese University of Hong Kong Library Uncovers Hidden Treasures of the University
Louisa LAM, Jeff LIU
Islandora Conference, Aug 4, 2015
Outline
• About CUHK Library Digital Initiatives
• Islandora@CUHK
• New features implemented to handle the idiosyncratic nature of Chinese texts and make the CUHK Digital Collections more discoverable
  • Book Flipping / Page Progression Direction
  • Display of Transcribed Chinese Text
  • Cross search of different forms of Chinese characters
About CUHK Library
• 7 branches on 2 campuses
Lower campus: Chung Chi College Library, Architecture Library
Central campus: University Library, Law Library
Upper campus: New Asia College Library, United College Library
Medical Library at Prince of Wales Hospital
About CUHK Library
Collection size:
• Print: 2.4 million
• Databases: 670+
• E-journals: 130,000+
• E-books: 4,500,000+
http://www.lib.cuhk.edu.hk/
About CUHK Library Digital Initiatives
Started in 1995; nearly 25 digitization projects have since been developed, with a total of over 5.5 million images. One of the most popular databases receives several million hits per year.
Time to Move to a New Digital Repository
These initiatives suffer from a number of limitations that require migration to a new platform:
• Individual web sites for browsing and searching create difficulties in branding, maintenance and future technological development
• Non-standard descriptive metadata schemas
• No single platform for the management of digital objects and front-end display
• No cross search amongst the digital collections
• Lack of advanced features that are popular with users (faceted search, social networking, federated search, etc.) and poor discoverability
Time to move from individual databases to Islandora
Why Islandora?
• Flexible front-end thematic design and back-end management of digital objects, with much potential for adapting to new technologies
• Many modular functions for rapid development (e.g. Drupal’s i18n to support a trilingual interface, Google Analytics, Apache Solr, OAI-PMH, etc.; we all know more about it ^-^)
• Open source with a large user community, documentation and forums
• Supports digital humanities research: text mining, TEI, GIS, etc.
• Supports multiple metadata schemas
• Digital preservation and curation
Meets our current and future needs!
CUHK Library is the first Asian library to implement Islandora
Source: http://islandora.ca/islandora-installations
Is Islandora Sufficient for CUHK Library Use Cases?
• The majority of CUHK Library digital collections are rare books in Traditional Chinese
• Some idiosyncratic features of Chinese texts are beyond the scope of a Unicode-based, Solr-supported repository system.
PROBLEM / FEATURE 1: BOOK FLIPPING / PAGE PROGRESSION DIRECTION
Book Flipping / Page Progression Direction
• The default Internet Archive Reader of Islandora is perfect for some modern Chinese books and almost all English books, but …
• It does not work as expected for our Chinese Rare Book Collection, whose books must be flipped from right to left
• By default, the Internet Archive Reader flips the book from left to right, which makes for a confusing user experience and an incorrect reading order.
Incorrect Flipping Direction for Chinese Rare Books
Sample link: http://repository.lib.cuhk.edu.hk/en/islandora/object/islandora%3A145#page/3/mode/2up
Implementation of a New Book Flipping / Page Progression Direction
• We partnered with discoverygarden to develop a new Page Progression option in CUHK Islandora
Implementation of a New Book Flipping / Page Progression Direction
• A Drush parameter (--page_progression) was also developed for batch ingestion:
sudo drush --root=/var/www/drupal7 \
  --uri=http://repository.lib.cuhk.edu.hk --user=admin \
  islandora_book_batch_preprocess --namespace=islandora \
  --parent=islandora:daoist-text --content_models=islandora:bookCModel \
  --type=zip --page_progression=rl --do_not_generate_ocr \
  --target=/mnt/daoist/007294481.zip
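When many zipped books share the same settings, the batch-ingestion command can be generated rather than typed by hand. The following is a minimal sketch and not part of the original slides: the helper name build_preprocess_command and the second zip path are hypothetical, and only the options that appear on the slide are used. It builds and prints the argument list; it does not run Drush.

```python
import shlex

# Values copied from the slide; adjust them to the local installation.
DRUPAL_ROOT = "/var/www/drupal7"
SITE_URI = "http://repository.lib.cuhk.edu.hk"


def build_preprocess_command(zip_path, parent="islandora:daoist-text",
                             page_progression="rl"):
    """Return the drush argument list for preprocessing one zipped book.

    Hypothetical helper: it only assembles the options shown on the slide.
    """
    return [
        "sudo", "drush",
        f"--root={DRUPAL_ROOT}",
        f"--uri={SITE_URI}",
        "--user=admin",
        "islandora_book_batch_preprocess",
        "--namespace=islandora",
        f"--parent={parent}",
        "--content_models=islandora:bookCModel",
        "--type=zip",
        f"--page_progression={page_progression}",  # rl = right to left
        "--do_not_generate_ocr",
        f"--target={zip_path}",
    ]


if __name__ == "__main__":
    # The second path is a made-up example.
    zip_files = [
        "/mnt/daoist/007294481.zip",
        "/mnt/daoist/example-book-2.zip",
    ]
    for zip_file in zip_files:
        # shlex.join quotes each argument so the line can be pasted in a shell.
        print(shlex.join(build_preprocess_command(zip_file)))
```

Printing the commands first makes it easy to review the page progression setting for each book before anything is ingested.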
Corrected Book Flipping / Page Progression Direction for Chinese Rare Books
• Sample link: http://repository.lib.cuhk.edu.hk/en/islandora/object/islandora%3A9860#page/1/mode/2up
PROBLEM / FEATURE 2: DISPLAY OF TRANSCRIBED CHINESE TEXT
Writing Direction of Chinese Text (1)
• Like other East Asian scripts, Chinese can be written vertically or horizontally.
First issue of 《中國學生周報》 (“The Chinese Student Weekly”), published on 25 July 1952 (year 41 of the Republic of China): http://hklit.lib.cuhk.edu.hk/pdf/journal/78/1952/160001p.pdf
Writing Direction of Chinese Text (2)
• Traditionally, vertical writing was the standard system and was widely used in publications
Reading order: 1. Vertical, 2. Right to Left
文昌帝君陰騭文廣義節錄 : [ 三卷 ] / 周夢顏述 . http://repository.lib.cuhk.edu.hk/en/islandora/object/islandora%3A9860#page/19/mode/2up
Writing Direction of Chinese Text (3)
• Horizontal writing is mainly used for signs
• Over the past 40 years, publications have gradually shifted to horizontal writing, possibly owing to the influence of English and the inability of some software and browsers to fully support vertical display.
Header logo of “The Chinese Student Weekly”, published in 1952: horizontal, from right to left
Header logo of “Hong Kong Literature”, published in 2013: horizontal, from left to right http://hklit.lib.cuhk.edu.hk/pdf/journal/97/2013/1000065.pdf
Sheng Xuanhuai Archive
• The Library collaborated with the CUHK Art Museum to develop a Sheng Xuanhuai manuscript archive.
• The Sheng Xuanhuai Archive contains letters and other correspondence of Sheng Xuanhuai, a very influential entrepreneur of the late Qing Dynasty.
• The texts of the manuscripts were transcribed by an expert in Shanghai.
• The transcribed Chinese text needs to be displayed alongside the digitized images in Islandora.
Display Problem in Islandora
• A side-by-side OpenSeadragon viewer and transcription viewer are used to display the images and the transcribed text.
• However, the image and the transcription are harder to read together because the reading directions of the two viewers differ.
Image viewer: 1. Vertical, 2. Right to Left
Transcription viewer: 1. Left to Right, 2. Vertical
Display of Chinese Text in Vertical Direction
• We partnered with discoverygarden to develop a new option in Islandora that enables vertical display of transcribed text
• The implemented solution is based on the CSS3 writing-mode property
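To illustrate the CSS3 writing-mode approach, here is a minimal sketch that wraps transcribed text in an HTML block styled for vertical, right-to-left display. It is not the code developed with discoverygarden: the function name, the CSS class name and the use of inline styles are assumptions made for this example. Only the writing-mode property itself (with the prefixed forms some browsers required at the time) comes from the CSS specification.

```python
from html import escape

# "vertical-rl" lays characters out top to bottom, with columns running
# right to left. The prefixed declarations target older WebKit and IE.
VERTICAL_STYLE = (
    "-webkit-writing-mode: vertical-rl; "
    "-ms-writing-mode: tb-rl; "
    "writing-mode: vertical-rl;"
)


def vertical_transcription_html(text):
    """Wrap transcribed text in a div that is displayed vertically.

    Hypothetical helper: the class name is illustrative only.
    """
    # Escape each line so the transcription cannot inject markup.
    lines = "<br>".join(escape(line) for line in text.splitlines())
    return (
        f'<div class="transcription-vertical" style="{VERTICAL_STYLE}">'
        f"{lines}</div>"
    )


if __name__ == "__main__":
    print(vertical_transcription_html("香港中文大學\n圖書館"))
```

In a real theme the declarations would more likely live in a stylesheet than in an inline style attribute; the inline form just keeps the example self-contained.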
Corrected Display Direction for Chinese Text
• http://repository.lib.cuhk.edu.hk/en/islandora/object/namespace%3A2
Image viewer: 1. Vertical, 2. Right to Left
Transcription viewer: 1. Vertical, 2. Right to Left
Transcription Display for Chinese Text
Known issues / limitations:
• The enhancement works in all recent versions of popular browsers except Firefox, which does not currently support the CSS3 writing mode
• The vertical text display is not supported in the admin edit mode of Islandora
PROBLEM / FEATURE 3: CJK TSVCC SEARCH
Chinese Search in default Islandora / Solr
Search of one character 中 : no result
Search of 2 characters 中文 : no result
Search of one phrase 中文大學圖書館 (title): no result
Chinese Search in default Islandora / Solr
• This is not a bug in Islandora or Solr
• Like many other systems, Solr was developed primarily with Western languages in mind.
• At CUHK, our integrated library system, Innopac / Millennium, has similar problems.
• Customization is required to enable searching of Chinese characters
Structure of Chinese Characters
Word
• 香 (Incense)
• 港 (Port)
• 中 (Center)
• 文 (Language)
• 大 (Large)
• 學 (Learn)
Phrase
• 香港 (Hong Kong)
• 中文 (Chinese)
• 文大 (Meaningless)
• 大學 (University)
• 中文大學 (Chinese University)
• 香港中文大學 (Chinese University of Hong Kong)
Form
• Traditional Chinese 繁體中文 (Proper Chinese): used in Hong Kong and Taiwan
• Simplified Chinese 简体中文: used in Mainland China after 1949 and in Singapore
Different Unicode for Different Forms of the Same Chinese Character
• Traditional Chinese: 中文大學 (Chinese University), 學 = U+5B78
• Simplified Chinese: 中文大学 (Chinese University), 学 = U+5B66
• Traditional Chinese: 中國 (China), 國 = U+570B
• Simplified Chinese: 中国 (China), 国 = U+56FD
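The code point difference can be checked directly. The short sketch below (the helper name codepoint is made up for this example) prints the Unicode code points of the traditional and simplified characters and shows that the two spellings of "Chinese University" compare as different strings, which is why an exact-match index treats them as unrelated terms.

```python
def codepoint(ch):
    """Format a single character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# (traditional, simplified) pairs
PAIRS = [("學", "学"), ("國", "国")]

for traditional, simplified in PAIRS:
    print(traditional, codepoint(traditional),
          simplified, codepoint(simplified))

# The two forms are different strings, so a plain comparison fails.
print("中文大學" == "中文大学")  # False
```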
Preferred Search and Display Mode of CUHK Library Users
• In Hong Kong, a Special Administrative Region of China, we need to support indexing and searching of both Traditional Chinese and Simplified Chinese in our publications, websites, and ... Islandora
• CUHK Library users include Mainland students and faculty, who use Simplified Chinese, and local students and faculty, who use Traditional Chinese
• Most prefer to cross-search and retrieve materials in both Traditional and Simplified Chinese by entering a single form of the characters
Different Unicode for Variant Forms of the Same Chinese Character
• Because of the long history of China, there are variant forms of the same character carrying the same meaning: 台灣 (Taiwan), 台 = U+53F0, vs 臺灣 (Taiwan), 臺 = U+81FA
• This is similar to American English and British English
• “Center” vs “Centre” and “Digitization” vs “Digitisation”
Unique Searching Problems of Chinese Characters
• How should cross search handle 1) different forms of the same character and 2) variant forms of the same character, all of which have different Unicode code points?
• Hong Kong libraries developed a unique way of handling this special nature of Chinese characters
TSVCC Table
• Maps Traditional Chinese, Simplified Chinese and variant forms of Chinese characters to one another.
• Developed since 2003 to support CJK and Unicode in the web-based Online Public Access Catalog.
Mapping in TSVCC Table
• Traditional Chinese 學 (U+5B78) maps to Simplified Chinese 学 (U+5B66)
• Variant forms:
  • U+4E00 一 | U+5F0C 弌 | U+58F9 壹 | U+58F1 壱 (One)
  • U+4E8C 二 | U+5F0D 弍 | U+8CB3 貳 | U+8D30 贰 | U+5F10 弐 | U+8CAE 貮 (Two)
  • U+53F0 台 | U+81FA 臺 | U+98B1 颱 | U+6AAF 檯 | U+67B1 枱 (Table)
Total: 4,515 entries
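The idea behind the table can be sketched in a few lines: fold every character of a group onto one representative, apply the same folding to both the indexed text and the query, and then compare. This is only an illustration of the principle, not the CUHK implementation. The four groups are the examples from the slide, while the function names and the choice of the first character of each group as the representative are assumptions.

```python
# Each string is one group of equivalent characters, taken from the slide.
# Assumption: the first character of a group serves as its representative.
VARIANT_GROUPS = [
    "學学",
    "一弌壹壱",
    "二弍貳贰弐貮",
    "台臺颱檯枱",
]


def build_folding_table(groups):
    """Map the code point of every group member to the representative."""
    table = {}
    for group in groups:
        representative = group[0]
        for ch in group:
            table[ord(ch)] = representative
    return table


FOLDING_TABLE = build_folding_table(VARIANT_GROUPS)


def fold(text):
    """Replace every known variant in text with its representative."""
    return text.translate(FOLDING_TABLE)


def matches(query, document):
    """Substring search that ignores traditional/simplified/variant forms."""
    return fold(query) in fold(document)


if __name__ == "__main__":
    print(matches("大学", "香港中文大學"))        # simplified query
    print(matches("台灣", "國立臺灣大學圖書館"))  # variant query
```

Because both sides are folded with the same table, the user can type any one form and still retrieve records written in the others.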
Implementation of TSVCC in Islandora@CUHK
• We partnered with discoverygarden to implement the TSVCC table in Islandora.
• The mapping table mapping-tsvcc.txt was loaded into /usr/local/fedora/solr/collection1/conf
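The slides do not show the contents of mapping-tsvcc.txt. Assuming it follows the plain-text format read by Solr's MappingCharFilterFactory (one "source" => "target" rule per line), a file of that shape could be generated from variant groups as sketched below. The function name, the sample groups and the choice of the first character of each group as the target are illustrative assumptions, not details taken from the CUHK configuration.

```python
def mapping_lines(groups):
    """Yield one '"source" => "target"' rule per non-target character.

    Assumption: the first character of each group is the mapping target.
    """
    for group in groups:
        target = group[0]
        for source in group[1:]:
            yield f'"{source}" => "{target}"'


if __name__ == "__main__":
    # Two example groups taken from the TSVCC slide.
    sample_groups = ["學学", "台臺颱檯枱"]
    for line in mapping_lines(sample_groups):
        print(line)
```

A character filter of this kind runs before tokenization at both index time and query time, so either form entered by the user is reduced to the same stored character.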
Implementation of TSVCC in Islandora@CUHK
Extract from schema.xml
Implementation of TSVCC in Islandora@CUHK
• Input in either Traditional Chinese or Simplified Chinese retrieves exactly the same result
Search in Simplified Chinese
Search in Traditional Chinese
Conclusion: Islandora@CUHK
• Islandora is the right move for the continued development of CUHK digital initiatives
• The problems found in the display and search of Chinese characters turned out to be the main development features of Islandora@CUHK
• The enhancements built around the idiosyncratic features of Chinese characters further strengthen the platform and improve the discoverability of CUHK Library treasures
• A new team, Research Support & Digital Initiatives, has just been established in the Library to lead the development of e-Research; the digital collections in Islandora are one of the core components of this strategy.
References
Chinese Characters Searching
• https://en.wikipedia.org/wiki/Ambiguities_in_Chinese_character_simplification
• http://hkiug.ln.edu.hk/unicode
• http://hkiug.ln.edu.hk/unicode/hkiug_tsvcc_table-UnicodeVersion-1.0.html
Transcription Display for Chinese Text
• https://en.wikipedia.org/wiki/Horizontal_and_vertical_writing_in_East_Asian_scripts