© 2009 Tefko Saracevic 1
Information Science: Where does it come from and where is it going?
Tefko Saracevic, PhD
School of Communication & Information
Rutgers University
New Brunswick, New Jersey, USA
http://www.scils.rutgers.edu/~tefko
Information science: a short definition
“the collection, classification, storage, retrieval, and
dissemination of recorded knowledge treated both as a pure and as an
applied science”
Merriam-Webster
actually, it all started long ago
In China:
Wang Zhen developed wooden movable type & published first book in 1313
In Europe:
Johannes Gutenberg is credited with being the first European to use movable type printing, around 1439
Organization of presentation
1. Big picture – problems, solutions, social place
2. Structure – main areas in research & practice
3. Technology – information retrieval – largest part
4. Information – representation; bibliometrics
5. People – users, use, seeking, context
6. Digital libraries – whose are they anyhow?
7. Conclusions – big questions for the future
Part 1. The big picture
Problems addressed
Bit of history: Vannevar Bush (1890-1974), writing in 1945:
Defined problem as “... the massive task of making more accessible our bewildering store of knowledge.”
Problem still with us & growing
… solution
Bush suggested a machine: “Memex ... association of ideas ... duplicate mental processes artificially.”
Technological fix to problem
Still with us: technological determinant
At the base of information science: the problem of information explosion
Trying to control content in:
Information explosion
exponential growth of information artifacts, if not of information itself
PLUS today:
Communication explosion
exponential growth of means and ways by which information is communicated, transmitted, accessed, used
Dealing with effects of this abundance
technological solution, BUT …
applying technology to solving problems of effective use of information
BUT: from a HUMAN & SOCIAL, and not only TECHNOLOGICAL, perspective
or a symbolic model: Information, Technology, People
Problems & solutions: SOCIAL CONTEXT
Information science:
Professional practice AND scientific inquiry related to:
Effective communication of knowledge records (‘literature’) among humans in the context of social, organizational, & individual need for and use of information
Taking advantage of modern information technology
Elaboration
Knowledge records = content-bearing structures
texts, sounds, images, multimedia, web ... ‘literature’ in given domains
Communication = human-information interaction
the study of information science is the interface between people & information
Information need, seeking, and use = raison d'être
Effectiveness = relevance, utility
General characteristics
Interdisciplinarity - relations with a number of fields, some more or less predominant
Technological imperative - driving force, as in many modern fields
Information society - social context and role in evolution - shared with many fields
Part 2. Structure
Composition of the field
Like many fields, information science has different areas of concentration & specialization
They change, evolve over time:
grow closer, grow apart
ignore each other, less or more
sometimes fight
most importantly different areas…
receive more or less funding & emphasis, producing great imbalances in work & progress
attract different audiences & fields
this includes vastly different levels of support for research and huge commercial investments & applications
How to view structure?
by decomposing areas & efforts in research & practice emphasizing
Technology or Information or People
Three big questions for information science (Bates, 1999)
The design question [Technology]: How can access to recorded information be made most rapid and effective?
The physical question [Information]: What are the features and laws of the recorded information universe?
The social question [People]: How do people relate to, seek and use information?
Part 3. Technology
Identified with information retrieval (IR)
by far biggest effort and investment
international & global
commercial interest large & growing
Information Retrieval – definition & objective
“IR: ... intellectual aspects of description of information, ... search, ... & systems, machines...”
Calvin Mooers (1919-1994), 1951
How to provide users with relevant information effectively?
For that objective:
1. How to organize information intellectually?
2. How to specify the search & interaction intellectually?
3. What techniques & systems to use effectively?
Streams in IR research & development
1. Information science: Services, users, use; Human-computer interaction; Cognitive aspects
2. Computer science: Algorithms, techniques; Systems aspects
3. Information industry: Products, services, Web, search engines; Market aspects
Problem: relative isolation between these streams
IR research
Started in the US through government support & in information science
Now mostly done within computer science, e.g. the Special Interest Group on IR of the Association for Computing Machinery (ACM SIGIR)
Gerard Salton, 1927-1995
Contemporary IR research
Spread globally e.g. major IR research communities emerged in China, Korea, Singapore
Branched outside of information science - “everybody does information retrieval”
search engines, natural language processing, data mining, artificial intelligence, organization – ontologies …
Testing in IR
Major component of IR; made it strong & affected innovation
Long history – started with Cranfield tests in the late 1950s
Measures – precision & recall based on relevance
Cyril Cleverdon 1914-1997
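The two measures named above can be sketched in a few lines. This is a minimal illustration using the standard set-based definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant), not a reconstruction of the Cranfield procedure. The function name and the document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant by assessors
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 judged relevant, 2 in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Both measures depend entirely on the relevance judgments supplied, which is why the notion of relevance matters so much for IR testing.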
Talking about relevance
Major objective of IR is to retrieve RELEVANT information
But what is “relevance”?
Spurred many research studies in IS:
manifestations, models, theories, experimental studies on people's behavior, effects, criteria in judgments, variability, clues …
Still major criterion in search engines
Text REtrieval Conference (TREC)
Major research, laboratory effort
Started in 1992
“support research within the IR community by providing the infrastructure necessary for large-scale evaluation”
Methods: provides large test beds, queries, relevance judgments, comparative analyses
essentially using the Cranfield 1960s methodology
Organized around tracks
various topics, changing over the years
TREC impact
International – big impact on creating research communities, incl. in Asia
Annual conferences: report & exchange results, foster cooperation
Results mostly in reports, available at http://trec.nist.gov/pubs.html
overviews provided as well
but only a fraction published in journals
Book (2005): TREC: Experiment and Evaluation in Information Retrieval. Edited by Ellen M. Voorhees and Donna K. Harman
Broadening of IR – ever changing, ever new areas added
Cross language IR (CLIR)
Natural language processing (NLP IR)
Specific media IR: music, spoken language, image, video, multimedia
IR for bioinformatics, genomics, law …
Categorization, clustering, filtering
Information summarization & extraction
Question answering
Machine learning
IR & database search – XML retrieval; structured queries
Web IR; Web search engines
Digital libraries
Commercial IR
Search engines based on IR
But added many elaborations & significant innovations
dealing with HUGE number of pages fast
countering spamming & page rank games – adversarial IR – combat of algorithms
adding context for searching
Spread & impact worldwide
about 2000 engines in over 160 countries
English was dominant, but not any more
Commercial IR: brave new world
Large investments & economic sector
hope for big profits, as yet questionable
Leading to proprietary, secret IR
also aggressive hiring of best talent
new commercial research centers in different countries (e.g. MS in China)
Academic research funding is changing
brain drain from academe
Commercial search engines & IR
facing many challenges
lead in innovations
IR successfully effected:
Emergence & growth of the INFORMATION INDUSTRY
Evolution of IS as a PROFESSION & SCIENCE
Many APPLICATIONS in many fields including on the Web – search engines
Improvements in HUMAN - COMPUTER INTERACTION
Evolution of INTERDISCIPLINARITY
IR has a long, proud history
Part 4. Information
Several areas of investigation:
as basic phenomenon:
not much progress
measures such as Shannon's not successful
concentrated on manifestations and effects
information representation:
large area connected with IR, librarianship
metadata
bibliometrics:
scientometrics, informetrics, webometrics
structures of literature – authors, journals …
impact of authors, journals, institutions …
What is information?
Intuitively well understood, but formally not well stated
Several viewpoints, models emerged:
Signals: transmission source-channel-destination; signals not content; not really applicable, despite many tries
Cognitive: changes in cognitive structures; content processing & effects
Social: context, situation dependent; information seeking, tasks
Information in information science: Three senses (from narrowest to broadest)
1. Information in terms of decision involving little or no cognitive processing
signals, bits, straightforward data, e.g. information theory (Shannon), economics
2. Information involving cognitive processing & understanding
understanding, changes in cognitive states
3. Information also as related to context, situation, problem-at-hand
users, use, task
For information science (including information retrieval):
third, broadest interpretation necessary
Bibliometrics
“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.” Fairthorne, 1969
Many quantitative studies & some laws
Bradford’s law, Lotka’s law – regularities
quantity/yield distributions of journals, authors
also related areas:
Scientometrics: covering science in general, not just publications
Informetrics: all information objects
Webometrics or cybermetrics: using bibliometric techniques to study the web
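As an illustration of what such a regularity looks like, here is a minimal sketch of Lotka's law in its classical inverse-square form: the number of authors with n publications is roughly the number of single-paper authors divided by n squared. The exponent of 2 is the classical value and an assumption here; fitted values vary by field. The function name and the figure of 100 authors are hypothetical.

```python
def lotka_expected_authors(single_paper_authors, n, exponent=2.0):
    """Expected number of authors with exactly n publications under Lotka's law.

    single_paper_authors: observed number of authors with one publication
    exponent: 2.0 is the classical value (an assumption; empirical fits differ)
    """
    return single_paper_authors / (n ** exponent)


# Hypothetical literature with 100 single-paper authors.
for n in range(1, 5):
    print(n, round(lotka_expected_authors(100, n), 2))
```

The steep drop-off is the point: a few authors account for a large share of the publications in a literature.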
Major branches of bibliometrics
Relational (older)
Patterns, structures, relations, mappings
where bibliometrics started
Data on what was observed, e.g. no. of articles/citations by/to an author; no. of journals with articles relevant to a topic; no. of articles/citations in/to a journal …
Used for description, mapping of relations & prediction
Evaluative (newer)
Impacts, effects
where bibliometrics became a big deal in many arenas
Data from what was observed but looking for measures of impact, prominence, ranking …
Discovers who’s up & how much up
Used for decisions, policies
Major bibliometric factors for evaluation of academic performance
For individuals
Number of publications in peer reviewed journals
impact factor of those journals
Citation tracking
The h-index: combines no. of publications & no. of citations
For institutions
Total no. of publications
Total no. of citations
Institutional impact factor
Various ratios - per faculty, project …
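How the h-index combines the two counts can be shown with a short sketch. It uses the standard definition (the largest h such that h publications each have at least h citations). The function name and the citation counts are hypothetical.

```python
def h_index(citations):
    """Return the h-index for a list of per-publication citation counts."""
    ranked = sorted(citations, reverse=True)  # most cited first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical author with five publications.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

One heavily cited paper does not raise the index by itself: `h_index([25, 8, 5, 3, 3])` is 3, which is one reason the measure is used alongside, not instead of, raw citation counts.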
Example: University rankings
Times Higher Education ranking: QS World University Rankings 2008 - Top 400 Universities
http://www.topuniversities.com/worlduniversityrankings/results/2008/overall_rankings/fullrankings/
Shanghai ranking: Academic Ranking of World Universities – 2007 - Shanghai Jiao Tong University
http://www.arwu.org/rank/2007/ranking2007.htm
Miscellaneous Information on University Rankings
http://www.arwu.org/rank/2008/200810/ARWU2008Resources.htm
Leiden ranking: Top 100 & 250 universities, Europe & world, 2008 - Leiden University, Netherlands
http://www.cwts.nl/ranking/LeidenRankingWebSite.html
SCImago Journal & Country Rank (SJR) a great resource – from Spain
Used in a variety of functions & areas
In collection development
identifying the most-useful materials: by analyzing circulation records; journal / e-journal usage statistics; etc.
In information retrieval
identifying top-ranked documents, authors: those most highly-cited; most highly co-cited; most popular; etc.
In the sociology of knowledge
identifying structural & temporal relationships between documents, authors, research areas, universities etc.
In policy making
justifying, managing or prioritizing support for course of action in a number of areas
e.g. science policy, institutional policy, promotion & tenure, grants, support for journals, evaluation of institutions
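The idea of "most highly co-cited" can be made concrete with a small sketch: two documents are co-cited once for every citing paper whose reference list contains both. This is a bare counting illustration under that assumption, not a full co-citation analysis. The function name and the document labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each pair of cited documents, how many citing papers cite both."""
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate and sort so each pair has one canonical (a, b) key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of three citing papers.
citing_papers = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_a", "doc_b"],
    ["doc_b", "doc_c"],
]
counts = cocitation_counts(citing_papers)
print(counts.most_common())
```

Pairs with high counts are the ones a literature treats as related, which is what mapping studies of authors or documents build on.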
Part 5. People
Research
user & use studies
interaction studies
broadening to information seeking studies, social context, collaboration
relevance studies
social informatics
Professional services
in organization – moving toward knowledge management, competitive intelligence
in industry – vendors, aggregators, Internet,
User & use studies
Oldest area
covers many topics, methods, orientations
many studies related to IR
e.g. searching, multitasking, browsing, navigation
theoretical & experimental studies on relevance
Branching into Web use studies
quantitative & qualitative studies
emergence of webometrics
Interaction and IS
Three streams:
computer-human interaction
human-computer interaction
human-information interaction
Many studies on:
machine aspects of interaction
human variables in interaction
interaction with information
Web interactions: a major area
Another interdisciplinary area
computer science, information science, cognitive science, ergonomics …
Interaction & IR
Traditional IR model concentrates on matching but not on user side & interaction
Several interaction models suggested
Ingwersen’s cognitive, Belkin’s episode, Saracevic’s stratified model
hard to get experiments & confirmation
Considered key to providing
basis for better design
understanding of use of systems
Web interactions: a major new area
Information seeking
Concentrates on broader context: not only IR or interaction, but people as they move in life & work
Number of models provided e.g. Kuhlthau’s information search process, Järvelin’s information seeking
Includes studies of ‘life in the round,’ making sense, information encountering, work life, information discovery
Based on concept of social construction of information
Part 6. Digital libraries
LARGE & growing area
many in IS involved, but also in computer science & other fields (e.g. Perseus – classics)
“Hot” area in R&D
a number of large grants & projects in the US, European Union, & other countries
but “DIGITAL” big & “libraries” small
but in the US & Europe funding is drying up
“Hot” area in practice
building & managing digital collections, hybrid libraries, digitizing, preservation
many projects throughout the world
Technical problems
Substantial - larger & more complex than anticipated, e.g.:
representing, storing & retrieving of library objects
particularly if originally designed to be printed & then digitized
operationally managing large collections - issues of scale
dealing with diverse & distributed collections
interoperability; federated searching
assuring preservation & persistence
incorporating rights management
Research issues
understanding objects in DL
representing in many formats
metadata, automating representation
conversion, digitization
organizing large collections
managing collections, scaling
preservation, archiving
interoperability, standardization
accessing, using, searching
federated searching of distributed collections
evaluation of digital libraries
DL projects in practice
Heavily oriented toward institutions & their missions
in libraries, but also others: museums, societies, government, commercial
come in many varieties
Spread globally, including digitization
Spending increasing significantly
most often a trade-off for other resources
U California, Berkeley’s Libweb “lists over 7700 pages from libraries in over 146 countries”
Connection?
DL research & DL practice presently are conducted mostly independently of each other, minimally informing each other and having slight, or no connection
Parallel universes with little connection & interaction, at present
not good for either research or practice
Part 7. Conclusions
IS contributions
IS effected handling of information in society
Developed an organized body of knowledge & professional competencies
Applied interdisciplinarity
IR reached a mature stage
penetrated many fields & human activities
Stressed HUMAN in human-computer interaction
Challenges
Adjust to the growing & changing social & organizational role of inf. & related inf. infrastructure
Play a positive role in globalization of information
Respond to technological imperative in human terms
Respond to changes from inf. to communication explosion - bringing own experiences to resolutions, particularly to the web
Join competition with quality
Join DIGITAL with LIBRARIES
Juncture
IS is at a critical juncture in its evolution
Many fields, groups ... moving into information
big competition
entrance of powerful players
fight for stakes
To be a major player IS needs to keep progressing in its:
research & development
professional competencies
educational efforts
interdisciplinary relations
Reexamination necessary
Thank you Miró!
Thank you Picasso!
Thank you:
& Prof. Sam Chu for inviting me!
Selective bibliography
Bates, M. J. (1999). Invisible Substrate of Information Science. Journal of the American Society for Information Science, 50 (12), 1043-1050.
Bush, V. (1945). As We May Think. Atlantic Monthly, 176 (1), 101-108. Available: http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
Hjørland, B. (2000). Library and Information Science: Practice, Theory, and Philosophical Basis. Information Processing & Management, 36 (3), 501-531.
Pettigrew, K. E. & McKechnie, L. E. F. (2001). The Use of Theory in Information Science Research. Journal of the American Society for Information Science and Technology, 52 (1), 62-73.
Saracevic, T. (1999). Information Science. Journal of the American Society for Information Science, 50 (12), 1051-1063. Available: http://www.scils.rutgers.edu/~tefko/JASIS1999.pdf
Saracevic, T. (2005). How were digital libraries evaluated? Presentation at the course and conference Libraries in the Digital Age (LIDA), 30 May-3 June 2005, Dubrovnik, Croatia. Available: http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf
Webber, S. (2003). Information Science in 2003: A Critique. Journal of Information Science, 29 (4), 311-330.
White, H. & McCain, K. (1998). Visualizing a Discipline: An Author Co-citation Analysis of Information Science 1972-1995. Journal of the American Society for Information Science, 49 (4), 327-355.