Whither subject access?
Karen Markey, Professor, University of
Outline
6 reasons why subject access is so difficult
4 end-user searcher types
Helping the predominant type (~80% of queries) to overcome difficulties
4 system improvements
Our improvement approach: A web-based board game that teaches players how to build their knowledge about a topic
Why do subject access? (The library context)
I don’t know something, and I want to find out
Why is subject access so difficult?-1
If you don’t know something, how can you formulate a question, query, keywords, search statement, etc., to answer it?
“Precisely because of the inquirer's lack of knowledge about a problem area,
it is impossible to specify what would resolve it.”
– Belkin 1980, 137 –
Outcome of subject searches (The library context)
Vetted scholarship
Read, analyze, and synthesize
Act: Satisfy the information need that set the subject-access episode in motion
Why is subject access so difficult?-2
Where to find the answer?
OPAC
Library-licensed databases (at U-M = 1,023)
The web: Google and other search engines
Institutional repositories
Subject archives (e.g., arXiv, Cogprints)
Invisible web
Why is subject access so difficult?-3
In the course of satisfying the need to know, you encounter “doing”
Buying and selling
Playing
Managing assets
Talking to other people
Computing
Developing … so that people can buy and sell, play, manage assets, talk, and compute
Outcome of subject searches (The e-context)
Vetted scholarship
And a whole lot more
New technologies for scientists & scholars
OCLC Environmental Scan: Pattern Recognition: Executive Summary, p. 3. 2003.
Expanded role for librarians
No longer just about the finished products of research
Selecting, organizing, preserving, etc., the products and by-products of “doing” science & scholarship
Orienting information seekers about the “doing” science & scholarship artifacts they encounter
Maybe using some of these same new technologies to facilitate what we do…
Outcome of subject searches (The e-library context)
Vetted scholarship
And a whole lot more:
Limiting this “whole lot more” to the products and by-products of the research enterprise … to the doing of science and scholarship
Why is subject access so difficult?-4
The seeker’s present level of expertise vis-à-vis their retrievals
  Grade school
  High school
  College
  Graduate school
  Terminal degree, e.g., MD, JD, PhD, MFA, licenses, certifications, ordinations, initiations, etc.
Topics:
  Kukulcan
  Making aerogel affordable
  How do birds migrate?
  Tibetan Buddhism
  Pop rocks
  The Black Death
  Using extremophiles to clean up radioactive wastes
Knowledge is like the dust. You can't see it building up because it builds up so slowly
but after a while when you check, you can see it has built up
quite a bit.
Why is subject access so difficult?-5
Different document representations
  Titles
  Uncontrolled keywords
  Controlled vocabularies
  Abstracts
  Web pages
  E-journal articles
  Citation data
  E-reviews
  E-encyclopedia articles
  E-newspaper articles
  E-books
Why is subject access so difficult?-6
Different search engines and search functionality
  Boolean
  Probabilistic
  Manual or automatic truncation
  Word proximity
  Spelling correction
  Phrase searching
  Relevance ranking
  Popularity ranking
Plus… Our knowledge of people’s feelings during search exacerbates the problem (Kuhlthau’s ISP Model)
1. Task initiation: apprehension, uncertainty
2. Topic selection: confusion, anxiety, brief elation after selection
3. Prefocus exploration: confusion, doubt, threat, uncertainty
4. Focus formulation: optimism, confidence in ability to complete the task
5. Information collection: so much work to do but confidence in ability…
6. Closure: relief, satisfaction or disappointment
Summing up: Subject access is difficult
  Knowing so little about what I want to know
  Expressing my query in words
  Formulating my query into a search statement that yields useful retrievals
  Continuing the search beyond the web
  Eliminating the noise
  Retrieving something I can understand given my present knowledge of the subject
  Roller coastering up and down emotionally
What really matters = system & domain knowledge
                         Low system knowledge    High system knowledge
Low domain expertise     ~79%                    ~7%
High domain expertise    ~14%                    less than 0.5%
Most people are looking for information on topics they know nothing about
They have low system knowledge and low domain knowledge
When double novices search…
Low domain knowledge
  Not knowing the right jargon, names of movers & shakers, an expert other than their instructor
Low system knowledge
  Searches that are frenetic, aimless, random, meandering …
Low procedural knowledge
  Not knowing what sources to search or the order of searching sources
  Success starting with Google, but then what?
Low metacognitive knowledge
  Not thinking about searching, search strategies, search tactics, making progress, knowing when to stop…
(The vast majority of users and uses)
Low system knowledge & high domain knowledge-1
High procedural knowledge
  Familiar with in-domain sources
  The order for searching these sources
High domain knowledge; they know:
  Experts contributing to their field
  Jargon and language of their field
  Other domain experts for recommendations
Channel this knowledge into these successful search strategies:
  Author searching
  Backward chaining
  Forward chaining
  Journal runs
Low system knowledge & high domain knowledge-2
Rely on their domain knowledge to quickly spot relevant retrievals
Don’t need Google for basic information in their domain
Not as frenetic …
Do they generalize the search strategies of their in-domain searches to their out-of-domain searches?
The rise of the professional-amateur class-1
Becoming a birdwatcher (1960s)
  Library: field guides, picture books, how-to books
  Parents (?)
  Scout leaders (?)
Becoming a birdwatcher (today)
  All of the above + e-birding: rare bird alerts; chat and experts on mailing lists; photo archive; hot spots: directions, lists, and maps; meeting and field trip notes; commercial tours; travel preparation
  Doing: Go on field trips and benefit from volunteer expertise
Professional-amateur class-2
Greying of America → Increase in professional-amateurs
  Boomers retire in good health with leisure time and money …
Professional classes harness professional-amateur enthusiasm and expertise
  Cornell Laboratory of Ornithology’s Citizen Science (http://www.birds.cornell.edu/)
    Status and population trends ~ as simple as counting birds at your feeders
    Threatened species: Tanagers, Cerulean warblers, Golden-winged warblers
  U.S. Forest Service
    Endangered Kirtland’s Warbler
Swelling the ranks of searchers with low system knowledge and high domain expertise
Kirtland’s Warbler (Ron Austing photography)
Double experts
High domain knowledge
  Use in-domain search strategies
  Know jargon, active researchers, other domain experts…
High system knowledge
  Use the wide range of search-system functionality
High procedural knowledge
  Know the relevant sources and their order
High metacognitive knowledge
  Thinking about searching, search strategies, search tactics, assessing their progress, knowing when to stop…
(Minuscule percentage of users and uses)
Low domain knowledge & high system knowledge
High system knowledge
  Rarely frenetic …
  Use the wide range of system search functionality
  Use in-domain search strategies for out-of-domain searches
    Author searching
    Backward and forward chaining
    Journal runs
Cognizant of procedural knowledge
  What are the in-domain sources?
  How should these sources be ordered?
Cognizant of metacognitive knowledge
  Think about searching
Improve searching for double novices
Reduce the impact of the end user’s
  Low system knowledge
  Low domain expertise
  Low procedural knowledge
Reduce these conditions → End users can focus on thinking about searching (metacognitive knowledge)
Reduce the impact of low system knowledge: Post-Boolean-1
Build future systems with post-Boolean searching
Quoting Susan Feldman:
“These systems are doing what expert searchers have learned to do yourselves. They look for terms that can distinguish one document from another, they ask for the terms to appear close together in the document, they stem words, they count words that appear in the title more heavily than those appearing in the rest of the text …”
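The behaviors Feldman lists can be sketched in a few lines. This is a minimal illustration, not any real system's algorithm: the suffix list in `stem`, the logarithmic rarity weight, and the size of the proximity bonus are all assumptions chosen for readability, and the document texts are made up.

```python
import math
import re


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer (assumed suffix list)."""
    for suffix in ("ion", "ing", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and stem each one."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def rank(query, docs):
    """Rank docs (a dict of id -> text) best-first for a free-text query."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    q_terms = set(tokenize(query))
    n = len(docs)
    scores = {}
    for doc_id, toks in tokens.items():
        score = 0.0
        positions = []
        for term in q_terms:
            if term in toks:
                # Terms found in few documents distinguish them better,
                # so they earn a larger weight.
                df = sum(1 for t in tokens.values() if term in t)
                score += math.log(1 + n / df)
                positions.append(toks.index(term))
        # Proximity bonus: larger when the query terms first occur close together.
        if len(positions) > 1:
            score += 1.0 / (1 + max(positions) - min(positions))
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": "The Black Death in medieval Europe",
    "d2": "Death rates and black bears in national parks",
    "d3": "Medieval trade routes",
}
print(rank("black death", docs))  # ['d1', 'd2', 'd3']
```

The searcher types two plain words with no operators; the adjacent phrase in d1 outranks the scattered words in d2, and d3 falls to the bottom.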
Reduce the impact of low system knowledge: Post-Boolean-2
Post-Boolean systems don’t require people to:
  Understand Boolean retrieval
  Enter complicated search syntax
  Scan unranked retrievals
Post-Boolean systems rank potentially relevant retrievals at the top
  Let people use their energy spotting relevant retrievals
  (That’s what people with high domain knowledge and low system knowledge are doing)
Reduce the impact of low domain expertise: Ranking retrievals
Profile ranking algorithms and relevance feedback routines to:
  Give higher weights to titles, subject headings, and table of contents entries than to words buried deep in the text
  Produce retrievals that give a comprehensive rather than a cursory treatment of the desired topic
  Ensure relevant retrievals are ranked at the top
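The first of these points, field weighting, can be sketched as follows. The field names and the numeric weights in `FIELD_WEIGHTS` are illustrative assumptions, not values from any catalog system; the sketch does not attempt the comprehensive-versus-cursory judgment.

```python
# Assumed field names and weights: a match in the title counts more than
# a match in a subject heading, which counts more than a match deep in the text.
FIELD_WEIGHTS = {
    "title": 3.0,
    "subject_headings": 2.0,
    "contents": 1.5,
    "full_text": 1.0,
}


def field_score(query, record):
    """Sum weighted counts of query words across a record's fields."""
    terms = query.lower().split()
    score = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        words = record.get(field_name, "").lower().split()
        score += weight * sum(words.count(t) for t in terms)
    return score


def rank_records(query, records):
    """Return records ordered best-first by their weighted score."""
    return sorted(records, key=lambda r: field_score(query, r), reverse=True)


records = [
    {"id": "r2", "title": "Medieval towns",
     "full_text": "the plague struck and the plague returned"},
    {"id": "r1", "title": "Plague and society", "full_text": "a history"},
]
print([r["id"] for r in rank_records("plague", records)])  # ['r1', 'r2']
```

Record r1 mentions the word only once, but in its title, so it outranks r2, which mentions it twice in running text.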
Reduce the impact of low domain expertise: Feedback
Enhance relevance feedback routines with the search strategies of domain experts
  Backward chaining
  Forward chaining
  Author searching
  Journal runs
These strategies require input that is straightforward and objective
  Author names
  Citation data
  Journal titles
Reduce the impact of low procedural knowledge: Process models-1
(Some background first)
Google searching is easy
Google searches “everything” in one fell swoop
No deliberating or second guessing about database selection
Google’s popularity ranking algorithm ranks the simple, low-granularity stuff at the top
Google is a great starting point, then what?
Reduce the impact of low procedural knowledge: Process models-2
Library gateways feature metasearching to mirror Google searching
  Gateways categorize databases by discipline and let people search across these databases
Metasearching in gateways is not effective because it ignores procedural knowledge
  Given one’s knowledge about a topic, knowing what sources to search and in what order
Reduce the impact of low procedural knowledge: Process models-3
Building systems with procedural knowledge should be the next leap forward in online system design
Process models to simulate the procedural knowledge of system experts selecting databases
  General-to-specific model (Tom Kirk)
  Gateway at Ohio State (Virginia Tiefel)
  Learning-the-library models (Beaubien, Hogan, & George)
Reduce the impact of low procedural knowledge: Needed metadata-1
Add more cataloging because data in existing bibliographic records cannot approximate procedural knowledge:
  In a discipline: in biology, mathematics, physics …
  With knowledge of this subject at a particular academic level: with an elementary education, with a high school education, with a college education …
  To what extent the author is an authority on the topic at hand
  For a particular class of people: for teens, for seniors, for shut-ins, etc.
Reduce the impact of low procedural knowledge: Needed metadata-2
Add more cataloging (contd.)
  Is a particular genre or of a particular literary nature: encyclopedias, newspapers, poetry, history, bibliography, research, diary, statistics …
  What can be done with the artifact: read, calculate, play, chat, sell, gamble…
  How others benefited from using the artifact (reviews and ratings)
Survey existing databases for controlled vocabularies for these elements
To-do list:
1. Post-Boolean retrieval systems
  Profile ranking algorithms to weight titles, subject headings, and table of contents entries higher than words buried deep in the text
  Produce retrievals that give a comprehensive rather than a cursory treatment of the desired topic
2. Enhance relevance feedback routines with the search strategies of domain experts
  Author searches
  Backward chaining
  Forward chaining
  Journal runs
To-do list:
3. Build systems with procedural knowledge for searching scholarly and scientific information
  The next major leap forward in online system design!
4. Add more subject cataloging
  In a discipline
  For a particular class of people
  Is a particular genre of literature…
  (Don’t build vocabularies from scratch; cull vocabularies from other databases)
Desired outcome = Relevant ranked retrievals that are in keeping with people’s knowledge of their topics
To-do list:
Then let users focus on putting the relevant information they find to work for them
  Making a decision
  Taking an action
  Adding to their knowledge base about a topic
Knowledge is like the dust. You can't see it building up because it builds up so slowly
but after a while when you check, you can see it has built up
quite a bit.
Storygame Project-1
A web-based board game:
  Gain knowledge and depth in a real research topic (“The Black Death”)
  Navigate what is written about the Black Death in a systematic way and get practice, practice, practice
Tom Kirk’s General-to-Specific Model for Library Research
  Start with the web
  Consult encyclopedias
  Read books
  Locate edited works
  Find journal articles
  Use a favorite, relevant publication to find more via the Web of Science
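The general-to-specific model is, at bottom, an ordered list of source types, so a system could encode it directly and use it to suggest the next step or to order retrievals. A minimal sketch: the step labels are shorthand for the steps named on the slide, and the function names are hypothetical, not part of the Storygame software.

```python
# Source types in general-to-specific order (labels are shorthand).
GENERAL_TO_SPECIFIC = [
    "web",
    "encyclopedias",
    "books",
    "edited works",
    "journal articles",
    "citation index",
]


def next_source(consulted):
    """Suggest the first source type in the model not yet consulted."""
    for step in GENERAL_TO_SPECIFIC:
        if step not in consulted:
            return step
    return None  # every step in the model has been covered


def order_retrievals(retrievals):
    """Sort (title, source_type) pairs so general sources come first."""
    return sorted(retrievals, key=lambda r: GENERAL_TO_SPECIFIC.index(r[1]))


print(next_source(["web", "encyclopedias"]))  # books
```

A searcher who has only used Google so far would be pointed to encyclopedias next, which addresses the "Google is a great starting point, then what?" problem.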
Storygame Project-2
Games: Popular pastime for college students
Games have good learning principles (Gee)
  Lower the consequence of failure
  Repetition and practice
  Reward
  Becoming expert in a domain and being recognized for their expertise
  Discovery …
The Solution: Gaming with a Strong Storytelling Element-2
Our immediate mission:
  Build a game prototype in which players become certified library researchers
  Host game play with incoming freshmen
  Evaluate the prototype
  Improve the prototype: game genre, functionality, interactivity, more instructions, etc.
Our long-term mission:
  Give students games that they want to play
  Learn, practice, and reinforce information-literacy skills
  Accommodate large numbers of students
  Export beyond U-M (Interested? Let’s get an IMLS grant to do this.)
Game Play Basics
Board game (Like Risk, Monopoly, Clue, Chutes & Ladders…)
  Monopoly = the game of becoming a real-estate tycoon
  Our game = the game of becoming a certified library researcher
Players accumulate wealth, territory, and knowledge
  Wealth = gold
  Territory = libraries that gamers acquire by proving their fitness as researchers
  Knowledge = quota of correct answers to questions
Game winner = Fastest and most accurate researcher
Modest prizes → Delmas Foundation grant
Game Demonstration
Backstory
The Black Death has reached the Duchy of Hidgeon
Duke of Hidgeon must develop a plan to handle the impending crisis
Duchy libraries are stocked with knowledge past, present, and future
The duke needs certified researchers to find answers
This is the game of research certification
< Let’s play! >
Storygame web links
The Storygame Project: http://www.si.umich.edu/~ylime/storygame.html
The game: http://ics.umflint.edu:3904/team/login
Use "demo" for name and "secret" for password
The video: http://www.youtube.com/watch?v=u76tW-ne-yY
Summing Up
6 subject-access difficulties + emotions
4 searcher types
Help double novices! (Low domain knowledge and low system knowledge)
4 improvements:
  Post-Boolean retrieval
  Built-in strategies
  Built-in process models
  Needed metadata
Our contribution: Storygaming to teach process models
Fellow speakers’ contributions …