Metadata: What, How and Why? IMT595B April 6, 2007 Mike Crandall University of Washington Information School [email protected]


Page 1: Metadata:  What, How and Why?

Metadata: What, How and Why?

IMT595B

April 6, 2007

Mike Crandall

University of Washington Information School

[email protected]

Page 2: Metadata:  What, How and Why?

April 6, 2007 Metadata: What, How and Why? 2

Web 2.0… The Machine is Us/ing Us

http://youtube.com/watch?v=6gmP4nk0EOE

Page 3: Metadata:  What, How and Why?

Roadmap

• What is metadata?
  – The basics
  – Metadata standards

• How can you use metadata?
  – What is it for?
  – When do you use it?
  – How much does it cost?
  – What about maintenance?

• Why would you use metadata?
  – What value does it add?
  – When are alternatives a better choice?
  – Social tagging vs. metadata

• Things to think about

Page 4: Metadata:  What, How and Why?

What is Metadata?

Page 5: Metadata:  What, How and Why?


Page 6: Metadata:  What, How and Why?

What is Metadata?

• Data about data

• Definitional data that provides information about or documentation of other data managed within an application or environment… metadata may include descriptive information about the context, quality and condition, or characteristics of the data (FOLDOC)

• Levels of complexity
  – Simple (embedded in object; e.g., a hyperlink)
  – Structured (Dublin Core, content management)
  – Rich (library MARC records, Encoded Archival Description)

Page 7: Metadata:  What, How and Why?

Origins

• Library science
  – Focus is on entities as containers for information
  – Emphasis is on resource discovery
  – Tight focus resulted in widespread standards

• Data management
  – Focus is on the information itself
  – Much more complex information spaces (e.g., NASA satellite data)
  – Much more varied types of information and use
  – Emphasis is on data use (authenticity, authority)
  – Standards tend to be associated with data types

Page 8: Metadata:  What, How and Why?

Types of Metadata

• Administrative
  – Object management
  – Rights and access management
  – Maintenance and preservation
  – Meta-metadata for managing metadata

• Structural or technical
  – Describes relationships between parts
  – Enables recognition and use of objects by systems

• Descriptive
  – Describes characteristics of object
  – Physical and aboutness (subject)

Page 9: Metadata:  What, How and Why?

Metadata Schemas

• Sets of metadata elements designed to meet the needs of a community

• The elements are the fields that hold values authorized for use in the schema

• Many different needs, so many different schemas are available

• Three primary components
  – Structure: the model used to derive the schema (e.g., RDF)
  – Semantics: the meaning of the elements
    • Values are specified through rules or vocabularies (“encoding schemes” or authority control)
  – Syntax: the method for encoding the schema (e.g., XML, XHTML)
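The three components can be illustrated with a minimal Python sketch. It assumes a flat element/value record as the structure (not RDF), a hypothetical `SCHEMA` dictionary whose controlled vocabularies are invented for illustration, and XML as the syntax. The element names and namespace come from Dublin Core; everything else is an assumption rather than part of the presentation.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace; only a small subset of elements is used here.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical schema: each element maps to an optional controlled vocabulary.
# None means free text is allowed. The vocabularies are invented for this sketch.
SCHEMA = {
    "title": None,
    "creator": None,
    "language": {"en", "fr", "de"},
    "type": {"Text", "Image", "Dataset"},
}


def validate(record):
    """Semantics: return a list of problems found when checking record against SCHEMA."""
    problems = []
    for element, value in record.items():
        if element not in SCHEMA:
            problems.append(f"unknown element: {element}")
            continue
        vocabulary = SCHEMA[element]
        if vocabulary is not None and value not in vocabulary:
            problems.append(f"value {value!r} not allowed for {element}")
    return problems


def to_xml(record):
    """Syntax: serialize a record as XML, one namespaced child per element."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


record = {
    "title": "Metadata: What, How and Why?",
    "creator": "Mike Crandall",
    "language": "en",
    "type": "Text",
}
print(validate(record))
print(to_xml(record))
```

Keeping the vocabulary check separate from the serializer mirrors the split on the slide: the same validated record could be encoded in a different syntax without touching the semantics.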

Page 10: Metadata:  What, How and Why?


Page 11: Metadata:  What, How and Why?

How Can You Use Metadata?

Page 12: Metadata:  What, How and Why?

Information Systems

[Figure: Soergel, 1985]

Page 13: Metadata:  What, How and Why?

Objectives of Metadata

• Find
  – Through search engines, catalogs, etc.

• Identify
  – Distinguishing between items for purposes of use

• Select
  – By attributes such as language, format, genre, etc.

• Obtain
  – Either directly or through location/ordering metadata

• Navigate
  – For example, categories on web sites

• Manage
  – Content management systems
  – Document repositories

Page 14: Metadata:  What, How and Why?

Finding

[Diagram: components of a search system. Box labels: Indexing; Searching; User; Other Users; User Interface; Query Preprocessing; Index(es); Result Set Manipulation; Result Refining; Indexer; Data Stores; Data Analysis; Index Metadata; Independent Metadata; User Metadata.]

[Diagram annotations:
• database schemas, thesauri
• file system, http, messaging stores, document store, databases, directory stores
• string manipulation, synonym sets & thesauri, stemming, word breaking
• adaptive crawling, word breaking, word stemming, NLP
• deduping, concatenation, ranking]
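A few of the steps named in the diagram (word breaking, synonym sets, indexing, query preprocessing, ranking) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the system shown on the slide: the `SYNONYMS` table and the sample documents are hypothetical, stemming is left out, and ranking is simply the number of matching query terms.

```python
import re
from collections import defaultdict

# Hypothetical synonym set standing in for a thesaurus: variant -> preferred term.
SYNONYMS = {"automobile": "car", "autos": "car", "cars": "car"}


def break_words(text):
    """Word breaking: lowercase the text and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(word):
    """Map a word to its preferred term if the synonym set lists one."""
    return SYNONYMS.get(word, word)


def build_index(documents):
    """Indexing: map each normalized term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in break_words(text):
            index[normalize(word)].add(doc_id)
    return index


def search(index, query):
    """Preprocess the query, look up each term, and rank by number of matched terms."""
    terms = {normalize(word) for word in break_words(query)}
    scores = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


DOCS = {
    "d1": "Used cars for sale",
    "d2": "Automobile repair manual",
    "d3": "Bicycle repair manual",
}
INDEX = build_index(DOCS)
print(search(INDEX, "automobile repair"))
```

Applying the same `normalize` step at index time and at query time is what lets a query for "automobile" find a document that only says "cars".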

Page 15: Metadata:  What, How and Why?


MSWeb Search

Page 16: Metadata:  What, How and Why?


News Publishing Tool

Page 17: Metadata:  What, How and Why?


Navigating

Page 18: Metadata:  What, How and Why?

Facets at wine.com

Facet / Metadata      # of vocabulary terms
Type                  46
Region                16
Winery                750
Price                 6
Rating                6
Total terms           824
Total combinations    1,656,824

Morante, Marcia. Creating Useful Taxonomies: Metadata, Taxonomies and Controlled Vocabularies. SLA – PER Division, June 8, 2004. http://www.kcurve.com/Metadata_Taxonomy%20Development_SLA_060804.ppt
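Faceted navigation of this kind can be sketched in a few lines of Python. The facet names (Type, Region, Price) follow the slide, but the catalog records, their values, and the function names are hypothetical and exist only to show the mechanics of filtering and counting.

```python
from collections import Counter

# Hypothetical catalog records; facet names follow the slide, values are invented.
WINES = [
    {"name": "A", "Type": "Red", "Region": "Napa", "Price": "$20-$40"},
    {"name": "B", "Type": "Red", "Region": "Sonoma", "Price": "Under $20"},
    {"name": "C", "Type": "White", "Region": "Napa", "Price": "Under $20"},
]


def filter_by_facets(items, selections):
    """Keep the items whose facet values match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of a facet (e.g. for navigation links)."""
    return Counter(item[facet] for item in items if facet in item)


reds = filter_by_facets(WINES, {"Type": "Red"})
print([wine["name"] for wine in reds])
print(facet_counts(reds, "Region"))
```

Each facet is an independent metadata element with its own small vocabulary, so users can combine selections in any order instead of walking a single fixed hierarchy.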

Page 19: Metadata:  What, How and Why?

Managing

Layers in the Darwin Information Typing Architecture

[Diagram layers, top to bottom:
• Delivery contexts: Web site; information portal; aggregate printing; helpset
• Typed topic structures: topic, concept, task, reference
• Specialized vocabularies (domains) across information types
  – Typed topic: concept, task, reference
  – Included domains: highlighting, software, programming, user interface
• Common structures: OASIS (CALS) table; metadata]

http://www-128.ibm.com/developerworks/xml/library/x-dita1/

Page 20: Metadata:  What, How and Why?

Costs of Metadata

• The basic question should really be: what are you trying to accomplish, and does metadata add value to your project?

• Startup costs can be high, but maintenance costs will be at least as high, if not higher

• Good metadata systems require resources: people, machines, and time

• Don’t start without an understanding of what those might be

Page 21: Metadata:  What, How and Why?

Example Startup Costs

1) INITIAL COSTS

SOFTWARE AND OUTSIDE SERVICES
Taxonomy Development Software | $125,000
Search and Auto-classification Engine | $140,000
Cross Application Schema Repository | $125,000
Onsite Installation and Training | $30,000
SUB TOTAL | $420,000

PROJECT RESOURCES
MILESTONES / TASKS | LABOR (HOURS) | COSTS
System evaluation and purchase | Knowledge Architecture Manager (80), Search Architect (40), Knowledge Architect (40) | $7,600
Approvals | Knowledge Architecture Manager (40), Knowledge Auditor/Customer Liaison (80) | $5,600
Audit (interviews) | Knowledge Architect (40), Knowledge Auditor/Customer Liaison (100) | $5,800
Audit (systems) | Knowledge Architect (20), Taxonomy Designer (100), Search Architect (40) | $6,700
Imported Structures | Taxonomy Designer (80), Knowledge Auditor/Customer Liaison (40) | $4,800
Modeling | Knowledge Architect (40), Taxonomy Designer (60), Search Architect (60) | $6,900
Application Development | Search Architect (100), Application Developers (300), Knowledge Architect (20), Taxonomy Designer (20), System Engineer (120) | $23,000
Refinement and Validation | Taxonomy Designer (50), Taxonomy Engineers (300) | $12,500
SUB TOTAL | | $72,900

TOTAL INITIAL COSTS | $492,900

Page 22: Metadata:  What, How and Why?

Example Maintenance Costs

2) ONGOING COSTS (annualized)

MILESTONES / TASKS | LABOR (HOURS) | COSTS
Customer Relations | Knowledge Architecture Manager (1200), Knowledge Architect (240), Knowledge Auditor/Customer Liaison (800) | $102,800
Planning and Management | Knowledge Architecture Manager (400), Knowledge Architect (120), Search Architect (120) | $30,800
Architectural Design | Knowledge Architect (600), Search Architect (120), Taxonomy Designer (240) | $42,000
Change History and Reporting | Knowledge Auditor/Customer Liaison (160), Application Developers (120) | $11,200
Sea Change Events | Knowledge Architecture Manager (120), Knowledge Architect (240), Knowledge Auditor/Customer Liaison (240), Taxonomy Designer (240), Taxonomy Engineers (960), Search Architect (160), Application Developers (200) | $84,800
Reconciliation | Knowledge Auditor/Customer Liaison (80), Taxonomy Designer (320), Taxonomy Engineers (640) | $38,400
Synchronization | Knowledge Auditor/Customer Liaison (120), Taxonomy Designer (120), Taxonomy Engineers (480), Search Architect (240), Application Developers (480) | $56,400
Application Upgrades and Modifications | Search Architect (900), Knowledge Architect (120), Taxonomy Designer (240), Application Developers (1800) | $127,500
System Maintenance and Upkeep | System Engineer (1800) | $72,000

TOTAL ANNUAL ONGOING COSTS | $565,900
TOTAL PROJECT OUTLAYS | $1,058,800
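The totals on the two cost slides can be checked with a short Python sketch. The dollar figures are copied from the slides; the variable names are illustrative, and "total outlays" is taken here to mean the initial costs plus one year of ongoing costs, which is how the slide's figures add up.

```python
# Figures copied from the two cost slides (US dollars).
SOFTWARE_AND_SERVICES = [125_000, 140_000, 125_000, 30_000]
PROJECT_RESOURCES = [7_600, 5_600, 5_800, 6_700, 4_800, 6_900, 23_000, 12_500]
ONGOING_ANNUAL = [102_800, 30_800, 42_000, 11_200, 84_800,
                  38_400, 56_400, 127_500, 72_000]

# Initial costs are the software subtotal plus the project labor subtotal.
initial = sum(SOFTWARE_AND_SERVICES) + sum(PROJECT_RESOURCES)
ongoing = sum(ONGOING_ANNUAL)

# Initial costs plus one year of ongoing costs.
total_outlays = initial + ongoing

print(f"Initial costs:         ${initial:,}")
print(f"Annual ongoing costs:  ${ongoing:,}")
print(f"Total project outlays: ${total_outlays:,}")
print(f"Ongoing / initial:     {ongoing / initial:.2f}")
```

In this example a single year of maintenance already exceeds the startup spend, which is consistent with the earlier point that maintenance costs are at least as high as startup costs.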

Page 23: Metadata:  What, How and Why?

Why Use Metadata?

Page 24: Metadata:  What, How and Why?


It’s Not Just the Tools

"Content" has been treated like a kind of soup that "content providers" scoop out of pots and dump wholesale into information systems. But it does not work that way. Good information retrieval design requires just as much expertise about information and systems of information organization as it does about the technical aspects of systems.

Bates, Marcia J. “After the Dot-Bomb: Getting Web Information Retrieval Right This Time.” First Monday 7(7), July 2002. http://firstmonday.org/issues/issue7_7/bates/index.html

Page 25: Metadata:  What, How and Why?

The Big Picture

[Figure: Selamat & Choudrie, 2004. Annotations: “We’re here” and “But don’t forget the rest”]

Page 26: Metadata:  What, How and Why?

Alternative Approaches

• What about folksonomies and social tagging?
  – What problems can they solve?
  – What issues do they raise?
    • How many people are likely to tag?
    • What about synonym control?
    • Does it matter?

• Civilizations in decline are consistently characterised by a tendency towards standardization and uniformity. Arnold Toynbee, historian (1889-1975)
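The synonym control question can be made concrete with a small Python sketch that compares raw tag counts with counts after mapping variants to a preferred term. The `PREFERRED` table and the sample tags are hypothetical; real tagging systems may handle variants very differently, or not at all.

```python
from collections import Counter

# Hypothetical synonym ring: variant tag -> preferred term.
PREFERRED = {
    "nyc": "new york city",
    "new york": "new york city",
    "newyork": "new york city",
    "cats": "cat",
    "kitty": "cat",
}


def normalize_tag(tag):
    """Lowercase, collapse whitespace, and map a free-form tag to its preferred term."""
    cleaned = " ".join(tag.lower().split())
    return PREFERRED.get(cleaned, cleaned)


def tag_counts(tags, controlled=True):
    """Count tags, optionally merging variants through the synonym ring."""
    if controlled:
        tags = [normalize_tag(tag) for tag in tags]
    return Counter(tags)


RAW_TAGS = ["NYC", "new york", "Cats", "kitty", "cat", "newyork", "photo"]
print(tag_counts(RAW_TAGS, controlled=False))
print(tag_counts(RAW_TAGS))
```

Without control, the seven sample tags stay as seven distinct entries. With the synonym ring they collapse to three terms, at the price of someone having to build and maintain the mapping.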

Page 27: Metadata:  What, How and Why?


Alternative Approaches

Page 28: Metadata:  What, How and Why?

Page 29: Metadata:  What, How and Why?

Page 30: Metadata:  What, How and Why?

Page 31: Metadata:  What, How and Why?

Page 32: Metadata:  What, How and Why?

Where Does Metadata Fit?

We tend to think that the hard problems are the big ones. So we believe that searching the Web is hard because it's so huge. But I've been thinking lately that the really hard problems are actually the ones in the middle. In the middle, many algorithms don't work that well with moderate document sets, context becomes more important, interaction is critical, and you can't get the user "in the ballpark" anymore--you have to get them right to the thing they're looking for.

Karl Fast, http://lists.ibiblio.org/mailman/private/aifia-members/2004-February/001129.html

Page 33: Metadata:  What, How and Why?

A Continuum: When to Use Formal Metadata

[Figure: Braly & Froh (2007) after Shirky (2005)]

Page 34: Metadata:  What, How and Why?


Things to Think About

• Make sure you can measure results

• Don’t assume one size fits all

• Choose user access points wisely

• Provide user tools and education for effective use of your metadata

• Make sure you’re adding value

• Balance theory with practical needs

• Consider trust and provenance

Page 35: Metadata:  What, How and Why?

Readings

• Soergel, D. (1985). Organizing information. Principles of data base and retrieval systems. Orlando, FL: Academic Press. 450 p.
• Taylor, A. (2004). The Organization of Information. 2nd ed. Westport, Conn: Libraries Unlimited. 417 p.
• Burnett, K. (1999). “A Comparison of the Two Traditions of Metadata Development”. Journal of the American Society for Information Science, 50(13), 1209-1217.
• Rosenfeld, L. & P. Morville. (2002). Chapter 9, “Thesauri, Controlled Vocabularies, and Metadata” in Information Architecture for the World Wide Web. 2nd ed. Sebastopol, CA: O’Reilly. (p. 176-208).
• Zeng, M.L. (2005). Construction of controlled vocabularies: A primer. NISO. http://www.slis.kent.edu/~mzeng/Z3919/index.htm.
• Bates, Marcia J. (2002). “After the Dot-Bomb: Getting Web Information Retrieval Right This Time”. First Monday 7(7), July 2002. http://firstmonday.org/issues/issue7_7/bates/index.html
• Bryar, J.V. (2001). “Taxonomies: The value of organized business knowledge”. A White Paper Prepared for NewsEdge.
• Byrne, T. (2004). “Enterprise information architecture: Don’t do ECM without it”. Econtent 27.5 (May 2004): 22-29.
• Earley, S. (2005). “Developing enterprise taxonomies”. Earley & Associates. http://www.earley.com/Earley_Report/ER_Taxonomy.htm.
• Montague Institute. (2001). “Managing taxonomies strategically”. http://www.montague.com/abstracts/taxonomy3.html.
• Selamat, M.H. & J. Choudrie. (2004). “The diffusion of tacit knowledge and its implications on information systems: The role of meta-abilities”. Journal of Knowledge Management, 8(2), 128-139.
• Bulterman, D.C.A. (2004). “Is It Time for a Moratorium on Metadata?” IEEE MultiMedia, vol. 11, no. 4, pp. 10-17, 2004. http://homepages.cwi.nl/~dcab/PDF/ieeeMM2004.pdf
• Fitzgerald, M. (2006). “The Name Game: Tagging tools let users describe the world in their own terms as taxonomies become ‘folksonomies.’” CIO Magazine, April 1, 2006. http://www.cio.com/archive/040106/et_main.html?action=print
• Braly, M. & G. Froh (2007). “Tagging”. Presentation for IMT530 Organization of Information Resources (Feb. 10, 2007). EnterpriseTagging.org http://enterprisetagging.org/assets/pdf/IMT530_Tagging_Presentation.pdf.
• Shirky, C. (2005). “Ontology is Overrated: Categories, Links, and Tags”. Clay Shirky’s Writings About the Internet. http://www.shirky.com/writings/ontology_overrated.html.

Page 36: Metadata:  What, How and Why?


Questions???