Document management (aka ‘digital libraries’)
The Greenstone Group:
Professor Ian Witten (leader); David Bainbridge, Dave Nichols, S.J. Cunningham, Steve Jones, Te Taka Keegan, Annika Hinze
Our work includes…
• Document management
• Content management
• Metadata management
• Multimedia documents
• Alerting and event notification support
• OCR-ing services
• Document & collection visualization
• User needs analysis
• Text mining
• Automatic metadata extraction
Greenstone software
• ‘digital library’ construction, use, and maintenance software
• Developed at Waikato (www.greenstone.org)
• Open Source
• Widely used internationally (UNESCO, FAO, Texas A&M Uni, Kyrgyz Republic, …)
Digital library: a collection of digital objects (text, video, audio) along with methods for access and retrieval [user], and for selection, organisation, and maintenance [librarian]
Greenstone software features
• “Library” = set of separate collections
• “Collection” = set of separate documents
• Multigigabyte collections
• Hierarchical document model
• Multimedia: picture, voice, music, video collections
• Multi-language documents: Unicode throughout
• Multi-language interfaces: French, Chinese, Arabic …
• Web browser or CD-ROM
• Searching: full-text and fielded, ranked or boolean
• Browsing: hierarchical indexes created from metadata
• Metadata: Dublin Core + collection-specific extensions
• Plugins: different document types and metadata specifications
• Classifiers: create browsing indexes (collection editor decides)
• Compression techniques throughout: uses MG
• Distributed collections: coming soon, with CORBA
• Open-source software: free, extensible
Collections
Documents
Access
Importing
Distributing
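The features slide says that classifiers create browsing indexes from metadata. A minimal sketch of that idea follows. It is not Greenstone's own classifier code: the `classify` function, the `dc.Title` / `dc.Subject` field names, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical documents carrying Dublin Core style metadata (illustrative data only).
DOCS = [
    {"id": "d1", "dc.Title": "Farming in the tropics", "dc.Subject": "Agriculture"},
    {"id": "d2", "dc.Title": "Aquaculture basics", "dc.Subject": "Agriculture"},
    {"id": "d3", "dc.Title": "Food storage", "dc.Subject": "Nutrition"},
]


def classify(docs, field, bucket=lambda value: value):
    """Group document ids under a browsing key derived from one metadata field."""
    index = defaultdict(list)
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # documents lacking the field are left out of this index
        index[bucket(value)].append(doc["id"])
    # Sort keys and ids so the browsing list is stable.
    return {key: sorted(ids) for key, ids in sorted(index.items())}


# One index per classifier: a subject list and an A-Z title list.
by_subject = classify(DOCS, "dc.Subject")
by_initial = classify(DOCS, "dc.Title", bucket=lambda title: title[0].upper())
```

Which fields get an index, and how values are bucketed, is the kind of choice the slide leaves to the collection editor.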
Greenstone supports: multilanguage documents
Greenstone supports: hierarchically structured documents
A book
Greenstone supports: collection design, maintenance
Designing a collection with the Gatherer
Greenstone supports: a wide (and growing) set of file formats
• DOC
• PDF
• XLS
• LaTeX
• Refer
• MARC
• …
• highly extensible through ‘plugin’ mechanism
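The ‘plugin’ mechanism above can be pictured as a registry that maps file types to import handlers. The sketch below shows the general pattern only, under stated assumptions: `PLUGINS`, `plugin`, `import_text`, `import_html`, and `import_document` are hypothetical names, not Greenstone's plugin API.

```python
import os

PLUGINS = {}  # hypothetical registry: file extension -> handler function


def plugin(*extensions):
    """Register the decorated function as the handler for the given extensions."""
    def register(func):
        for ext in extensions:
            PLUGINS[ext.lower()] = func
        return func
    return register


@plugin(".txt")
def import_text(path, raw):
    return {"source": path, "format": "text", "text": raw.decode("utf-8")}


@plugin(".htm", ".html")
def import_html(path, raw):
    return {"source": path, "format": "html", "text": raw.decode("utf-8")}


def import_document(path, raw):
    """Dispatch to the plugin registered for the file's extension."""
    ext = os.path.splitext(path)[1].lower()
    handler = PLUGINS.get(ext)
    if handler is None:
        raise ValueError(f"no plugin registered for {ext!r}")
    return handler(path, raw)
```

Supporting a new format then means registering one more handler; the dispatch code stays unchanged.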
Mobile document access
• handheld information access
• browsing methods for varying screen sizes
• studies on search behaviour (on- and off-line)
• support for non-text documents (FunkyZoom views of maps, images)
Browsing and exploration: hierarchical phrase index
What’s in this collection?
Is it any good?
What coverage for topic X?
My query returned too much/little, what now?
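A phrase index lets a user start from a word and drill down into the longer phrases that contain it, which helps answer the questions above. The sketch below uses plain word n-gram counting as a stand-in. It is not the algorithm behind Greenstone's phrase browser, and `phrase_counts`, `expand`, and the sample texts are assumptions for illustration.

```python
from collections import Counter


def phrase_counts(texts, max_len=3):
    """Count every word n-gram of length 1..max_len across all texts."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def expand(counts, phrase, min_count=2):
    """List phrases one word longer than `phrase` that contain it, most frequent first."""
    target = tuple(phrase.lower().split())
    n = len(target)
    longer = [
        (p, c) for p, c in counts.items()
        if len(p) == n + 1 and c >= min_count and (p[:n] == target or p[1:] == target)
    ]
    # Highest count first; ties broken alphabetically.
    longer.sort(key=lambda pc: (-pc[1], pc[0]))
    return [(" ".join(p), c) for p, c in longer]


# Illustrative mini-collection (made-up titles).
TEXTS = [
    "forest management in tropical regions",
    "sustainable forest management",
    "tropical forest management plans",
    "forest fires",
]
COUNTS = phrase_counts(TEXTS)
```

Calling `expand(COUNTS, "forest")` gives the next level of the hierarchy; calling `expand` again on one of the returned phrases goes one level deeper.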
Recent and proposed projects
• Making documents mobile: moving between large online collections and a PDA
• Text mining: extracting quality metadata from legacy documents
• User needs analysis: what sort of documents does a given set of users require, and how can the collection be managed?
• Visualization: making it easy to ‘see’ what’s in a collection, and supporting effective browsing
Recent and proposed projects
• Multi-language collections: tailoring a document collection interface and interaction mechanisms to the language of its users
• Alerting services: bringing potentially useful documents to the user’s attention, without overwhelming them
• Supporting unusual users: collections for the physically disabled, illiterate or semi-literate, children, …
• Audio and image collections: novel browsing and searching mechanisms
Recent and proposed projects
• Storage and searching: developed highly efficient techniques for storing, indexing, and searching text documents; implemented in Greenstone, but portable to other document management software
• Usability analysis: how easy is it to use your current document collection? How can access be improved?
• And a host of wacky and cool things: collaging document collections, music retrieval systems, ‘aerial’ views of documents, …
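The storage and searching project above concerns indexing and searching text. As a rough picture of the two query styles the features slide mentions (boolean and ranked), here is a minimal inverted-index sketch. It is a toy, not the MG implementation: there is no compression, ranking is a plain sum of term frequencies, and every name and sample document is an assumption for illustration.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a Counter of {doc_id: term frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def boolean_and(index, terms):
    """Return ids of documents containing every query term, sorted."""
    postings = [set(index.get(t.lower(), ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


def ranked(index, terms):
    """Order documents matching any term by summed term frequency, highest first."""
    scores = Counter()
    for t in terms:
        scores.update(index.get(t.lower(), Counter()))
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered]


# Illustrative mini-collection (made-up text).
DOCS = {
    "d1": "digital library software",
    "d2": "library of music and music scores",
    "d3": "music retrieval",
}
INDEX = build_index(DOCS)
```

A boolean query returns only documents matching all terms, while a ranked query returns every partial match in score order, which is why the two suit different search tasks.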