02 scientific information sources
8/2/2019 02 Scientific Information Sources
http://slidepdf.com/reader/full/02-scientific-information-sources 1/41
UNIVERSITAT DE BARCELONA
Facultat de Biblioteconomia i Documentació
Research Methodology (Metodologia de la recerca)
Professor: Ángel Borrego
Scientific Information Sources
Contents
• From author to reader: the scholarly communication process
• Index & Abstract (I&A) databases
• Assessment of I&A databases
• Searching I&A databases
Scholarly communication chain
Authors (scientists)
Journal Editors and Referees
Journals
I&A databases
Librarians
End users
Authors (researchers)
Scientists are the first link in the scholarly communication chain. They create new knowledge and describe it in articles, books, patents, etc.
Publishing an article
Source: Weller, 2000
Editors and publishers
• Scientific editor: an expert in the field of the book or journal who manages the review of manuscripts.
• Referees or reviewers (usually two or three): experts in the field who (blindly) evaluate the work for the editor, noting weaknesses or problems along with suggestions for improvement, and including an explicit recommendation on what to do with the manuscript (accept or reject).
• Publisher: some journals are published by non-profit scientific societies or universities; others are published by commercial publishers (Elsevier, Emerald, Springer, Wiley…) that expect to make a profit, especially through library subscriptions.
Scholarly journals
A periodical publication reporting new research in the form of:
  – Articles: complete descriptions of current original research findings.
  – Review articles: accumulate the results of many articles on a topic into a coherent narrative about the state of the art in that discipline.
  – Letters (not to be confused with letters to the editor) or short communications: short descriptions of important current research findings.
In 2004, Carol Tenopir (Library Journal, 2/1/2004) estimated that there were about 43,500 active academic journals.
I&A databases
• Databases are produced and/or hosted by public administrations or private companies.
• These organizations select the most important journals in a field and analyse them in order to create Index & Abstract (I&A) databases.
• These databases usually offer additional services such as user profiles, email alerts, etc.
• Hosts commercialize databases from several producers and provide users with engines to search them.
Where to search for scientific information?
• Bibliographic (I&A) databases, produced and distributed by public administrations or private companies:
  – Instituto de Estudios Documentales sobre Ciencia y Tecnología (cindoc.csic.es)
  – National Library of Medicine (www.nlm.nih.gov)
  – Dialog (dialog.com)
• Journal gateways:
  – Elsevier ScienceDirect (sciencedirect.com)
  – EmeraldInsight (emeraldinsight.com)
• Internet search engines:
  – Google Scholar (scholar.google.com)
  – Scirus (scirus.com)
Librarians
• They are intermediaries between information and end users:
  – Know the best information sources in any given field.
  – Have the ability to transform a user’s information need into a search equation that can be addressed to an automatic system.
• Tasks:
  – Exploit information sources.
  – Create new information sources.
  – Train users in the use of these sources.
Are you sure about this?
Where do scientists search for information?
Rowlands & Nicholas, 2005
Fry et al., 2009
Where???
Schonfeld & Housewright, 2009
End users
• The main users of scientific information are scientists (i.e. authors) and some professionals (doctors, for instance). Articles in scientific journals are written by scientists for scientists.
• There is also an ‘education market’, i.e. handbooks and manuals that explain the basics of each discipline for educational purposes.
• Finally, there is also a market for “popular science”, including books, journals, mass media, museums, etc.
In summary
Information is the main input and output of science
Contents
• From author to reader: scholarly communication process
• I&A databases: concept
• Assessing I&A databases
• Searching I&A databases
Databases
• A database is “an organized collection of data, usually in digital form so that its contents can easily be accessed, managed, and updated....”
• … but you already know what a database is!
Access to scientific information
Timeline, 1665 to 2010:
1665: first scientific journals
1840: print indexes
1960: access to databases through telephone lines
1980: databases on CD-ROM
2000: web online access
The database market
Year   Databases   Producers   Hosts
1980   411         269         71
1985   2,247       1,316       414
1990   3,943       1,950       645
1994   5,307       2,220       812
1997   10,000      3,400       1,800
2007   20,000      n/a         n/a
Source: Large et al., p. 46
Gale Directory of Databases
• Volume 1: online databases. Profiles nearly 11,000 online databases made publicly available from the producer or an online service.
• Volume 2: CD-ROM, DVD, etc. Profiles more than 8,000 database products offered in portable form or through batch processing.
• “In its 34th edition (2011), Gale Directory of Databases contains contact and descriptive information on nearly 19,000 databases and over 3,300 producers, online services, and vendors/distributors of database products.”
Gale Directory of Databases (2)
• Product descriptions.
• Database producers: contact information for database producers and a list of the products they produce.
• Vendors and distributors: contact information for vendors and distributors, conditions of use, and a list of the products they offer.
• Geographic index: lists producers and vendors/distributors by country.
• Subject index: classifies products under 1,800 subject terms.
• Master index: lists all names in a single alphabetic sequence.
2011 edition
Gale Directory of Databases (3)
Contents
• From author to reader: scholarly communication process
• I&A databases: concept
• Assessing I&A databases
• Searching I&A databases
Assessment criteria
• Contents:
  – Coverage, accuracy, consistency, updating
• Information retrieval:
  – Interface and search options
• Management:
  – Price, hardware and software requirements, authentication, information provided by the producer, integration with other library products, support, etc.
Database contents
• Coverage:
  – Topics, source types, chronological, geographical, languages
  – Local availability of the indexed sources
• Accuracy:
  – Grammar and typing mistakes
  – Duplicate records
• Consistency:
  – Formal description: names of authors and journals
  – Subject description: indexing and classification
• Updating:
  – Growth in the number of records
  – Delay between publication and the introduction of records
Interface and search options
• Search page:
  – Database structure and searchable fields
  – Simple / advanced / command search
  – Operators (Boolean, proximity, wildcards, etc.)
  – Field indexes and thesaurus
  – Search in a specific database, search history, multilingual interface, etc.
• Results page:
  – Visualisation: format and number of records
  – Ranking criteria
  – Select and manage records
  – Record clustering
  – Similar records
  – Refine search
  – Information on errors (0 results)
• Record visualisation:
  – Record formats
  – Navigation between records and linked fields
  – Highlighting of search terms in records
• Additional pages: database description, structure, help, etc.
Database management
• Price and payment options
• Hardware and software requirements
• Authentication (password / IP / federated authentication)
• User manuals, online help, languages, etc.
• Integration with other library products (metasearch engines, reference management software, other databases from the same host).
• Library support
• Access (CD / online)
• And listen to your users: log analysis, surveys, observation…!!
“The JISC Academic Database Assessment Tool (ADAT) aims to help libraries to make informed decisions about future subscriptions to bibliographic databases.”
http://www.jisc-adat.com
Precision and recall
• Precision = a / (a + b) × 100, where a = relevant documents retrieved and a + b = retrieved documents
• Recall = a / (a + c) × 100, where a = relevant documents retrieved and a + c = relevant documents in the database

                Relevant      Non-relevant   Total
Retrieved       a             b (noise)      a + b
Non-retrieved   c (silence)   d              c + d
Total           a + c         b + d          a + b + c + d
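As a quick illustration, both measures can be computed directly from the cells of the table. This is only a minimal sketch: the function names and the counts used (30, 20 and 10) are made up for the example and do not come from the slides.

```python
def precision(a: int, b: int) -> float:
    """Percentage of retrieved documents that are relevant: a / (a + b) x 100."""
    retrieved = a + b
    return 100 * a / retrieved if retrieved else 0.0


def recall(a: int, c: int) -> float:
    """Percentage of relevant documents that were retrieved: a / (a + c) x 100."""
    relevant = a + c
    return 100 * a / relevant if relevant else 0.0


# Hypothetical search: 30 relevant documents retrieved (a),
# 20 non-relevant documents retrieved (b, noise),
# 10 relevant documents missed (c, silence).
a, b, c = 30, 20, 10
print(f"Precision: {precision(a, b):.1f}%")  # Precision: 60.0%
print(f"Recall: {recall(a, c):.1f}%")        # Recall: 75.0%
```

Note that d (non-relevant documents that were not retrieved) plays no part in either measure.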
Drawbacks of precision and recall
• What is a relevant document?
• We assume that relevance is binary.
• Different users may require different levels of precision and recall.
• There is an inverse relationship between precision and recall.
• Recall is just an estimate.
• If the system ranks documents by relevance, then precision and recall vary as the user examines the retrieved records.
Example
• Relevant documents in the database for query Q1: D3, D5, D9, D25, D39, D44, D56, D71, D89, D123
• Retrieved documents for query Q1, ranked by relevance (relevant documents are marked with a dot):
   1. D123 •     6. D9 •      11. D38
   2. D84        7. D511      12. D48
   3. D56 •      8. D129      13. D250
   4. D6         9. D187      14. D113
   5. D8        10. D25 •     15. D3 •
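To see how precision changes as the user moves down the ranked list, both measures can be recomputed each time a relevant document is found. The sketch below uses the Q1 data from this example; the function name precision_recall_points is only illustrative.

```python
# Relevant documents in the database for query Q1 (from the example).
relevant = {"D3", "D5", "D9", "D25", "D39", "D44", "D56", "D71", "D89", "D123"}

# Retrieved documents for Q1, in ranked order (from the example).
ranking = ["D123", "D84", "D56", "D6", "D8", "D9", "D511", "D129",
           "D187", "D25", "D38", "D48", "D250", "D113", "D3"]


def precision_recall_points(ranking, relevant):
    """Return (rank, recall %, precision %) at each rank where a relevant document appears."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            recall = 100 * hits / len(relevant)
            precision = 100 * hits / rank
            points.append((rank, recall, round(precision, 1)))
    return points


for rank, recall, precision in precision_recall_points(ranking, relevant):
    print(f"rank {rank:2d}: recall {recall:.0f}%, precision {precision}%")
```

With these data the (recall, precision) points are (10, 100), (20, 66.7), (30, 50), (40, 40) and (50, 33.3). Recall never exceeds 50% because only five of the ten relevant documents are retrieved.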
Precision at different levels of recall
[Figure: precision (%, vertical axis) plotted against recall (%, horizontal axis), both from 0 to 100]
Contents
• From author to reader: scholarly communication process
• I&A databases: concept
• Assessing I&A databases
• Searching I&A databases
Query process
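[Diagram: on the system side, contents are represented and organized; on the user side, an information need is represented as a search. Matching the two representations produces the retrieved records.]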
Search options
• Truncation and wildcards
• Natural vs. controlled vocabulary
• Boolean operators
• Proximity operators
• Search limits: date, type of source, language, etc.
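(The options are listed from those that tend to increase recall, at the top, to those that tend to increase precision, at the bottom.)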
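As a rough illustration of truncation and wildcards, the sketch below matches patterns against a small made-up vocabulary using Python's standard fnmatch module. The symbols * and ? are those of fnmatch; the symbols actually used vary from one database host to another, so treat the patterns as assumptions.

```python
from fnmatch import fnmatchcase

# Hypothetical index terms, invented for the example.
vocabulary = ["library", "libraries", "librarian", "liberty",
              "organisation", "organization"]


def expand(pattern: str, terms: list[str]) -> list[str]:
    """Return the terms that match a truncation or wildcard pattern."""
    return [term for term in terms if fnmatchcase(term, pattern)]


# Truncation: * stands for any number of characters at the end of the stem.
print(expand("librar*", vocabulary))       # ['library', 'libraries', 'librarian']

# Wildcard: ? stands for exactly one character inside the word.
print(expand("organi?ation", vocabulary))  # ['organisation', 'organization']
```

Each pattern stands for several search terms at once, which is why these options tend to raise recall.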
Drawbacks of the Boolean model
• It does not matter whether the search term occurs once in the document or a hundred times.
• It does not matter whether a document meets one or all of the conditions of an “or” search.
• Partial matches (for instance, meeting almost all the “and” conditions) are not taken into account.
• It is not possible to reflect the importance of each search term.
• A Boolean search just divides the database into two sets, relevant and non-relevant documents, depending on whether or not they fulfil the search conditions. All retrieved documents are assumed to be equally relevant, so there is no mechanism to rank them.
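The Boolean model can be sketched with ordinary set operations. The inverted index below is invented for the example (the terms and document identifiers are hypothetical); the point is that each query returns an unordered set, with no information about how often or how well a document matches.

```python
# Hypothetical inverted index: each term maps to the set of documents containing it.
index = {
    "database": {"d1", "d2", "d3"},
    "search": {"d2", "d3"},
    "thesaurus": {"d3", "d4"},
}

# AND: documents containing both terms (set intersection).
and_result = index["database"] & index["search"]

# OR: documents containing at least one of the terms (set union).
or_result = index["database"] | index["thesaurus"]

# NOT: documents containing the first term but not the second (set difference).
not_result = index["database"] - index["search"]

print(sorted(and_result))  # ['d2', 'd3']
print(sorted(or_result))   # ['d1', 'd2', 'd3', 'd4']
print(sorted(not_result))  # ['d1']
```

In the OR query, d3 contains both terms while d1 contains only one, yet both are simply members of the result set.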
Relevance sorting
• A simple method consists in assigning a weight to each term in each document.
• The easiest way to assign a weight to a term is to count its frequency in the document.
• The total weight of a document in response to a query is the sum of the weights of all search terms.
• Documents with a higher weight are ranked first.
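The four steps above can be sketched as follows. This is a minimal illustration that assumes documents are plain strings split on whitespace; the document texts, identifiers and function name are invented for the example.

```python
from collections import Counter


def score(document: str, query_terms: list[str]) -> int:
    """Total weight of a document: the sum of the frequencies of the search terms in it."""
    counts = Counter(document.lower().split())
    return sum(counts[term.lower()] for term in query_terms)


# Hypothetical documents.
docs = {
    "doc1": "database search database index database",
    "doc2": "search engine search search",
    "doc3": "journal article",
}
query = ["database", "search"]

# Documents with a higher total weight are ranked first.
ranked = sorted(docs, key=lambda name: score(docs[name], query), reverse=True)
for name in ranked:
    print(name, score(docs[name], query))
```

Here doc1 scores 4 (three occurrences of "database" plus one of "search"), doc2 scores 3 and doc3 scores 0.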
Relevance sorting: example
             Term A   Term B   Term C   Term D
Document 1   8        6        0        3
Document 2   4        0        7        6
Document 3   3        0        4        2
Relevance sorting: example
• Retrieved documents sorted by relevance for each query:
  – A AND C: Doc. 2; Doc. 3
  – A OR C: Doc. 2; Doc. 1; Doc. 3
  – A NOT C: Doc. 1
• Improving relevance sorting:
  – Weight the frequency of each term in the database: less frequent terms are more useful to discriminate documents.
  – Position of the search term (title, for instance).
  – Number of incoming links from other documents (in digital environments).
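The rankings in this example can be reproduced from the term frequencies in the table: a Boolean condition selects the documents, and the sum of the weights of the search terms orders them. The sketch also shows one common way of giving rarer terms more weight, a logarithmic inverse document frequency. That particular formula is an assumption added for illustration; the slides do not specify one.

```python
import math

# Term frequencies from the example table.
weights = {
    "Doc. 1": {"A": 8, "B": 6, "C": 0, "D": 3},
    "Doc. 2": {"A": 4, "B": 0, "C": 7, "D": 6},
    "Doc. 3": {"A": 3, "B": 0, "C": 4, "D": 2},
}


def containing(term):
    """Set of documents in which the term occurs at least once."""
    return {doc for doc, freqs in weights.items() if freqs[term] > 0}


def rank(selected, terms):
    """Sort the selected documents by the sum of the weights of the search terms."""
    return sorted(selected, key=lambda doc: sum(weights[doc][t] for t in terms),
                  reverse=True)


print(rank(containing("A") & containing("C"), ["A", "C"]))  # A AND C
print(rank(containing("A") | containing("C"), ["A", "C"]))  # A OR C
print(rank(containing("A") - containing("C"), ["A"]))       # A NOT C


def idf(term):
    """Assumed formula: log(number of documents / documents containing the term)."""
    return math.log(len(weights) / len(containing(term)))


for term in "ABCD":
    print(term, round(idf(term), 2))
```

Terms A and D occur in all three documents, so under this formula they get a weight of zero and do not help to discriminate between documents, while term B, which occurs in only one document, gets the highest weight.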
Pay attention to the presentation of results
• Good presentation increases the potential use of the information by users, improves their comprehension of the information, helps them save time, and increases their satisfaction.
  – Specify the sources searched and the search strategy.
  – Summarise the results.
  – Organise the references (alphabetically, by relevance...) and present them in a standard format.
  – Pay attention to the format (headers, fonts, margins, etc.).
  – Include recommendations: full text access, relevant sources, etc.