Digital Libraries Spring 2006, 1 March Bharat Mehra IS 520 (Organization and Representation of Information) School of Information Sciences University of Tennessee


Page 1

Digital Libraries

Spring 2006, 1 March

Bharat Mehra
IS 520 (Organization and Representation of Information)

School of Information Sciences
University of Tennessee

Page 2

Digital Libraries

What does the digital library concept mean to you

as a user?
as an information professional?
as an author?

Is the Web a digital library? Why? Why not?

Your definition or notion?

Page 3

Digital Libraries

What is the role of a librarian or information professional? How has this role changed in the context of digital libraries?

Page 4

The Web: Implications for DLs

Ubiquitous information source: Why is the web “a much more engaging medium and teacher” than textbooks or a local librarian?

Page 5

Identify pros and cons for specific situations in the different quadrants?

Page 6

Finding Information on the Web

Web directories for browsing
Yahoo! -- human indexers/catalogers
classificatory structure

Web search engines for querying
AltaVista, Google -- robots
automatically generated indexes

Combination of directory and engine

Page 7

Paradigm shift

Classic IR vs. Web IR

Collection
Classic IR: professionals, selection policy
Web IR: polling (robot)

Representation
Classic IR: description, access points
Web IR: full text, metadata

Search
Classic IR: algorithms, master file, inverted indexes
Web IR: non-Boolean, proprietary

Interface
Classic IR: good functionality, complex
Web IR: simplistic, trade-off
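
The slide names inverted indexes as a search structure without showing one. The following is a minimal sketch of the idea, assuming a tiny made-up collection; the function names `build_inverted_index` and `boolean_and` and the sample documents are illustrative and not from the lecture.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return the ids of documents that contain every query term (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical three-document collection used only for illustration.
docs = {
    1: "digital libraries organize information",
    2: "web search engines index information",
    3: "digital web portals",
}
index = build_inverted_index(docs)
result = boolean_and(index, "digital", "information")
```

A lookup touches only the posting sets for the query terms instead of scanning every document, which is why this structure suits large collections.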

Page 8

Digital Library Features

community-based users
extension and enhancement of classic IRs
digital resources are multimedia: text, images, sounds, etc.
technical capabilities for creating, searching, and using information
distributed using networks (the Web, etc.)

Page 9

Digital Library Features

content of digital libraries includes
data
metadata that describe various aspects of the data
links (or relations) to other data or metadata (internal or external)

context portals to support individual users’ information needs and work tasks

Page 10

Digital Library Projects

Digital Libraries Initiative Phase II <http://www.dli2.nsf.gov/>

LC American Memory Website <http://memory.loc.gov/>

standards <http://lcweb.loc.gov/standards/metadata.html>

Page 11

Example Digital Libraries

The National Science Digital Library
http://nsdl.org/

Library portals extend and serve classrooms, offices, laboratories, homes, and public spaces.

Page 12

Information Theory (for DLs)

Joseph Goguen: A theory of information should

Be useful for understanding and designing info systems (or DLs)
Address the meanings that users give to events, including social and political nuances
Address ethical issues
Account for the fact that different individuals and groups can construe meanings in very different ways

Joseph Goguen, “Towards a Social Ethical Theory of Information,” in Social Science Research, Technical Systems and Cooperative Work, edited by Geoffrey Bowker, Les Gasser, Leigh Star and William Turner (Erlbaum, 1997).

Page 13

Goguen’s Info Qualities Relevant to DLs

1. Situated: Info can only be fully understood in relation to the particular, concrete situation in which it actually occurs

2. Local: Interpretations are constructed in some particular context, including a particular time, place, and group

3. Emergent: Info cannot be fully understood at the level of the individual, that is, at the level of individual psychology, because it arises through ongoing interactions with other people/technologies

4. Contingent: Interpretation of info depends upon the current situation, which may include the current interpretation of prior events

5. Embodied: Info is tied to documents/bodies in particular situations, so that the particular way that bodies are embedded in a situation may be essential to some interpretations

6. Vague: In practice, info is only elaborated to the degree that it is useful to do so; the rest is grounded in intangible knowledge

7. Open: Info cannot in general be given a final and complete form, but must remain open to revision in the light of future developments

“Wet” information: strongly situated, less mobile
“Dry” information: weakly situated, more mobile

Joseph Goguen, “Towards a Social Ethical Theory of Information,” in Social Science Research, Technical Systems and Cooperative Work, edited by Geoffrey Bowker, Les Gasser, Leigh Star and William Turner (Erlbaum, 1997).

Page 14

Issues of Text Representation in DLs

Storing textual material involves representing its:

Structure (characters, words, paragraphs, headings): represented by mark-up, e.g., Standard Generalized Markup Language (SGML)

Appearance (choice of format, size of font, margins, line spacing, how headings are represented, location of figures): described precisely by page-description languages, e.g., TeX, PostScript, Portable Document Format (PDF)

Page 15

Alternative renderings of a single document

Page 16

Converting Text

Scanning: Optical character recognition

Encoding characters: ASCII, Unicode

Document type definitions (DTDs) in the Text Encoding Initiative (TEI), Encoded Archival Description (EAD)
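
To make the ASCII versus Unicode point concrete, here is a small sketch, assuming a made-up sample string. It shows that a character outside the ASCII range still has a Unicode code point and a multi-byte UTF-8 encoding; UTF-8 is one Unicode encoding chosen here for illustration, since the slide does not name one.

```python
# Hypothetical sample string: "caf" followed by e-acute (U+00E9).
text = "caf\u00e9"

# Unicode assigns every character a code point.
code_points = [ord(ch) for ch in text]

# UTF-8 serializes those code points as bytes; non-ASCII characters take more than one byte.
utf8_bytes = text.encode("utf-8")

# ASCII covers only code points 0-127, so encoding this string as ASCII fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Text that passes through OCR and ends up with accented or non-Latin characters therefore needs a Unicode encoding rather than plain ASCII.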

Page 17

Three General Types of Metadata

1. Object-descriptor metadata (Dublin Core)
Designed to describe global characteristics of entire objects with external references

2. Internal/structural metadata (HTML, XML, RDF)
Designed to describe internal semantic structure of objects with internal and external references

3. Display metadata (HTML, style sheets)
Designed to describe how objects or parts of objects should be visualized or displayed. Not necessarily related to semantic structure
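
As an illustration of object-descriptor metadata, the sketch below builds a small Dublin Core record for this slide deck. The wrapper element name `record`, the helper `dublin_core_record`, and the choice of four elements are assumptions made for the example; the field values come from the title slide.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML record from a dict of Dublin Core element names to values."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = dublin_core_record({
    "title": "Digital Libraries",
    "creator": "Bharat Mehra",
    "date": "2006-03-01",
    "type": "Text",
})
xml_text = ET.tostring(record, encoding="unicode")
```

Each element describes the object as a whole (who made it, when, what kind of thing it is) and says nothing about its internal structure or how it should be displayed.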

Page 18

What is a Database?

A database is a collection of data that is organized so that its contents can easily be accessed, managed and updated. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network.

Page 19

Relational Databases

A database system in which the database is organized and accessed according to the relationships between data items without the need for any consideration of physical orientation and relationship. Relationships between data items are expressed by means of tables.

Page 20

Features of Databases

• Collection of data stored together as a unit

• Databases are useful for storing data and making it available for retrieval

• Within the database, data is organized into different tables

• Each table has columns and rows. Indexes on tables provide speedy access to data

• Information in the database can be retrieved, modified, or deleted using a query language like SQL

• Some common database systems are Oracle, SQL Server, DB2, Sybase, etc.
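
The features listed on this slide can be tried out in a few lines. This is a minimal sketch that uses SQLite through Python's standard `sqlite3` module as a stand-in for the systems named on the slide; the `member` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# A table has columns and rows; an index speeds up lookups on a column.
conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_member_city ON member (city)")

conn.executemany(
    "INSERT INTO member (name, city) VALUES (?, ?)",
    [("Ana", "Knoxville"), ("Ben", "Nashville"), ("Cy", "Knoxville")],
)

# Retrieve rows with SQL.
knoxville = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM member WHERE city = ? ORDER BY name", ("Knoxville",)
    )
]

# Modify and delete rows with SQL.
conn.execute("UPDATE member SET city = ? WHERE name = ?", ("Memphis", "Ben"))
conn.execute("DELETE FROM member WHERE name = ?", ("Cy",))
remaining = conn.execute("SELECT name, city FROM member ORDER BY name").fetchall()
```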

Page 21

Relational Database Model

Data is presented as a collection of relations

Each relation is depicted as a table

Columns are attributes

Rows represent entities

Every table has a set of attributes that, taken together as a "key" (technically, a "superkey"), uniquely identifies each entity
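
A short sketch of what "uniquely identifies" means in practice, again assuming SQLite via Python's `sqlite3` module. The `offering` table, its two-attribute key, and the second and third rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite key: no two rows may share the same (course, term) pair.
conn.execute("""
    CREATE TABLE offering (
        course     TEXT,
        term       TEXT,
        instructor TEXT,
        PRIMARY KEY (course, term)
    )
""")
conn.execute("INSERT INTO offering VALUES ('IS 520', 'Spring 2006', 'Mehra')")
# Same course in a different term: the key differs, so the row is accepted.
conn.execute("INSERT INTO offering VALUES ('IS 520', 'Fall 2006', 'Mehra')")

# Same (course, term) pair again: the database rejects the duplicate key.
try:
    conn.execute("INSERT INTO offering VALUES ('IS 520', 'Spring 2006', 'Someone Else')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM offering").fetchone()[0]
```

Neither `course` nor `term` identifies a row by itself here; only the two attributes taken together do.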

Page 22

Relational Database Model

Views in a database

Company maintains a database of its employees

• Other attributes of its employees: age, salary, emergency contacts, appraisal, etc.

• Different needs for different applications of the database: e.g., company may need to make available demographic data to a governmental agency

• Only some attributes need be supplied - and others ought not to so as to protect privacy: different views can be provided into the same data
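
The employee scenario above can be sketched with a SQL view. This example assumes SQLite via Python's `sqlite3` module; the table layout, the view name `demographics`, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, age, salary) VALUES (?, ?, ?)",
    [("Ana", 34, 52000.0), ("Ben", 47, 61000.0)],
)

# The view exposes only id and age; names and salaries are left out.
conn.execute("CREATE VIEW demographics AS SELECT id, age FROM employee")

cursor = conn.execute("SELECT * FROM demographics ORDER BY id")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```

An application given access only to `demographics` sees the same underlying data as the payroll system does, minus the attributes that should stay private.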

Page 23

Database Design

Identify entities that we are dealing with, their various attributes, and their relationships

An entity is some object with a real or conceptual existence in the world -- tofu, Advanced Java Class, Guggenheim Museum, Elaine, company

An attribute is a property of an entity -- address, size, mother, age

A relational column is an attribute

A relationship defines roles in which entities work together -- "Bill WORKS-FOR Motorola", "jbs TEACHES advanced-java"

RDBMSs represent relationships as tables
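
One way to read "relationships as tables" is shown below, using the slide's "Bill WORKS-FOR Motorola" example. This is a sketch assuming SQLite via Python's `sqlite3` module; the table and column names are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two entity tables, one row per entity.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")

# The WORKS-FOR relationship becomes its own table of (person, company) pairs.
conn.execute("""
    CREATE TABLE works_for (
        person_id  INTEGER REFERENCES person(id),
        company_id INTEGER REFERENCES company(id),
        PRIMARY KEY (person_id, company_id)
    )
""")

conn.execute("INSERT INTO person VALUES (1, 'Bill')")
conn.execute("INSERT INTO company VALUES (1, 'Motorola')")
conn.execute("INSERT INTO works_for VALUES (1, 1)")

pairs = conn.execute("SELECT person_id, company_id FROM works_for").fetchall()
```

Storing the relationship separately lets one person work for several companies, or one company employ many people, without duplicating the entity rows.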

Page 24

Database Design as ER Diagrams

Rectangles represent entity types, diamonds relationship types, and ovals attributes. Underlined attribute names represent keys

Rectangles: Object/concept nouns
Diamonds: Verbs
Ovals: Characteristics

Page 25

Functions: Join
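
The slide's illustration of the join did not survive in this transcript, so the following is only a generic sketch of an inner join, assuming SQLite via Python's `sqlite3` module. The tables and the second course row are hypothetical; "jbs" and "advanced-java" echo the earlier TEACHES example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE course (title TEXT, instructor_id INTEGER)")

conn.execute("INSERT INTO instructor VALUES (1, 'jbs')")
conn.execute("INSERT INTO course VALUES ('advanced-java', 1)")
conn.execute("INSERT INTO course VALUES ('intro-databases', 2)")  # no matching instructor row

# Inner join: combine rows from both tables where instructor_id matches an instructor id.
joined = conn.execute("""
    SELECT instructor.name, course.title
    FROM instructor
    JOIN course ON course.instructor_id = instructor.id
""").fetchall()
```

The course with no matching instructor is dropped from the result, which is the defining behavior of an inner join.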

Page 26

Microsoft Access provides a graphical user interface that makes it very easy to define and manipulate databases. E.g., membership records in an organization

Access allows you to define and then store a set of queries and give these queries names that are meaningful to you. Note the Tables and Queries tabs in particular (Reports is useful for generating hardcopy output, such as mailing labels).                                                           

Page 27

Tables in Microsoft Access

Page 28

Final Projects

o Two-student teams work on projects for DiscoverET.org or develop their own

o Each team will present final results to the class during a public forum and produce a document of the project

o Information Organization and Representation Portfolio (IORP) Includes analysis and/or commentary related to class topics

o Intellectual works and their manifestations, metadata standards in various environments, cataloging and authority control, metadata coding and crosswalks, digital library development, subject access and vocabulary control, concept mapping, indexing and abstracting, classification systems, cognitive category analysis, system design

o Evaluation based on : Creativity of project outcomes (recommendations/ solutions proposed), Relevance and practicality of implementation, Thoroughness and examination of details

Page 29

Final Project General Guidelines

o Purpose is to apply knowledge to real-life situations and to gain hands-on experience.

o I. You must sign up for the project and work in a two-student team.

o II. Each group must schedule a meeting with the instructor to discuss the project no later than the due date indicated in schedule.

o III. Each group must document the process and activities. Turn in your project documentation including the following parts:

Introduction: Topic description and project goals; members

Specific tasks that are distributed among members

The final product plus description and examples (this is the main part of the document)

Conclusions and experiences (summarize what you have learned and your thoughts; you may add what you would do differently if you did the project again)

Page 30

Final Projects: Road Map/TOC/Outline for the Information Organization Portfolio

I. Introduction

• What is your project? Expectations, required elements, etc.
• Issues/concerns specific to your project topic that play a role in developing an IOP

II. Class topics and their relationship to your project

3-5 key considerations about each topic that are significant in developing an IOP on the specific project

III. Case-Studies and their Critique based on class topics or more

List of web resources (DL or web portal) with short description and location

3 or more case studies as relevant

Comparative analysis

IS 520~Mehra

Page 31

Final Projects: Road Map/TOC/Outline for the Information Organization Portfolio

IV. Design Solutions/Templates

Design solutions reflecting key aspects
Web design solutions
Analysis of designs

V. Recommendations

VI. Future Considerations

VII. Documentation Report

Page 32

Final Project Examples

1. On the existing DiscoverET.org website, develop an IORP for presenting community-based information for a selected subject category “Health.”

• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.

• Do a case-analysis of existing content and representation scheme(s) on websites of other community networks and provide alternative design solutions.

• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.

• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.

Page 33

Final Project Examples

2. On the existing DiscoverET.org website, develop an IORP for presenting community-based information for a selected subject category “Tourism.”

• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.

• Do a case-analysis of existing content and representation scheme(s) on websites of other community networks and provide alternative design solutions.

• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.

• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.

Page 34

Final Project Examples

3. For the existing DiscoverET.org website, develop an IORP for presenting community-based information for a new subject category of “Diversity Resources.”

• Do a case-analysis of existing content and representation scheme(s) (related to “Diversity”) on the website and provide alternative design solutions.

• Do a case-analysis and critique of existing content and representation scheme(s) on selected websites/web portals (other community networks) on the subject site and provide alternative design solutions.

• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.

• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.

Page 35

Final Project Examples

4. Select one county in Tennessee and develop an IORP for presenting community-based information for the county.

• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.

• Your IORP should include a comprehensive collection of website listings for that county, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.

• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.

• Provide a test-bed for implementation based on selection for one selected county from the adjoining states or select from the following website: URL: http://www.discoveret.org/index.php?p=DirCountySearch


Page 36

Final Project Examples

5. Based on a study of the use of wikis in existing and emerging community-based web portals, develop an IORP for presenting community-based interactive communication and information-sharing tools via development of wikis on the DiscoverET.org website.

• Do a case-analysis and critique of existing content and representation scheme(s) on selected websites/web portals (other community networks) that have wikis and provide alternative design solutions.

• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.

• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities. Evaluate the forms of interaction taking place via the different wikis in the different settings.

• Present the pros and cons based upon your analysis while you make recommendations for the DiscoverET.org website. Present summary reports on the use of wikis as community-based interactive communication and information-sharing tools that include design options and an implementation plan for application.

Page 37

Final Project Examples

6. Based on a study of the use of interactive databases for organizing, representing, and managing community-based information in representative case examples, provide a scheme for a community client (Fish) at DiscoverET.org that wants to develop a system to keep track of its activities/events and organize its work and human resources (time schedules, working responsibilities, etc.).

• Based on case-analysis and critique of existing content and representation scheme(s) in databases on selected websites/web portals (other community networks), identify what kinds of databases the client can use and discuss the pros and cons of each, cost-benefit ratios, etc.

• Your IORP should include a comprehensive collection of database examples, identification of entities and attributes for your designed database, classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.

• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.

Page 38

For the DiscoverET.org website

1. Present community-based information for a selected subject category “Health”

2. Present community-based information for a selected subject category “Tourism”: Pam, Suzanne

3. Present community-based information for a new subject category “Diversity Resources”: Hannah, Deborah

4. Select one county in Tennessee and develop an IORP for presenting community-based information for the county: Sara, Christa

5. Study of the use of wikis in existing and emerging community-based web portals: Margaret, Emily

6. Study of the use of interactive databases for organizing, representing, and managing community-based information in representative case examples: Bridger, Roger

Page 39

Critical Reflection 7

In pairs, identify a subject domain and select at least five items to form a template design for a digital library. Brainstorm various topics/aspects covered in class that will be pertinent for creating an effective information organization and representation scheme for your digital library. Design a database for your collection and identify key entities, attributes, and relationships. Present an ER Diagram to reflect some aspects of your database design.

Page 40

Critical Reflection

Goals for the metadata and users: Are you clear about what you want to achieve with this metadata? Are you clear about your users’ use of the resources?

Granularity: What level of granularity is most appropriate to the items and user needs?

Sources of info: Is it clear or even stated where you get your information? For example, if title is a field, is the cataloger told where to find that info? With a videotape, for example, do you look on the label? The box?

Complexity of record creation: Are special skills required to formulate the records? Are the records designed to be created by the info ‘publisher’ or centrally by service providers?

Content: The content of different metadata record formats can be compared from aspects of structure and syntax, but perhaps most important is an evaluation of the usefulness and purpose of the info within them. How useful are the records you have created?

Works well or not: What fields or characteristics work well (or do not work well) in describing your objects?

Tweaking: How could/should the metadata be “tweaked” to accommodate your needs?