Prague, November 9, 2010
M’hamed el Aisati, Head of Product Technology, S&T Elsevier
CERIF/euroCRIS and Elsevier: Where do they meet?
Outline
• What’s Elsevier (S&T) from a data and technology perspective?
  - Data types
  - Data processing
  - Data
• Technology adopted and deployed at Elsevier
• From Elsevier data models to CERIF
• Is there a role for a publisher?
  - What role
  - More opportunities
Elsevier S&T = Scientific Data + Technology + much more
> 43 M Abstracts (A&I)
> 10 M Full-text Articles
> 55 K Main organization profiles
> 20 M Author profiles
> 60 M Patents
> 1 M Awarded grants
> 10 years of ScienceDirect and > 5 years of Scopus usage/analytics data
> 10 K Books
> 500 M Quality scientific web pages
Scopus coverage
Nearly 18,000 titles, including:
• 16,500 peer-reviewed titles
• 600 trade journals
• 350 book series
• Extensive conference proceedings
• 40 languages are covered

A rich and extended coverage, including:
• Abstracts and citations from 5,000 publishers (ELS 15%)
• 3.6 million conference papers (10% of Scopus records)
• “Articles in Press” from more than 3,000 titles
• 23 million patents
• 1,200 Open Access journals
• 80% of all Scopus records have an abstract
• Abstracts going back to 1823 (Scopus includes all historical material of ELS, Springer, ACS, AIP, Nature, Science, etc.)
• Nearly 2,700 Arts & Humanities titles
• 430 M integrated scientific websites via Scirus.com

Scopus info on www.info.scopus.com
“Your companion for a scientific life”
Manager/Admin
Librarian
Department head
Researcher
Funding agent
Dean/Provost
Australian Research Council – ERA 2010
• Assessment of research quality within Australia's higher education institutions using a combination of indicators and expert review by committees comprising experienced, internationally-recognized experts.
• ERA uses leading researchers to evaluate research in eight discipline clusters.
• ERA will detail areas within institutions and disciplines that are internationally competitive, as well as point to emerging areas where there are opportunities for development and further investment.
• Early January 2010 – Aug/Sep 2010
• First trial (PCE) in 2009
• Scopus selected as source information provider and partner
More info on: http://www.arc.gov.au/era/default.htm
Australian Research Council – ERA 2010
3 main components:
- EID tagging
- Dedicated web service (API)
- Reports:
  » Citation Benchmark report (cpp)
  » Centile threshold report
  » Ranked journal ‘Indicative World Distribution’ Benchmark Report
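The arithmetic behind the first two report types can be sketched in a few lines. This is only an illustration, assuming that "cpp" stands for citations per paper and that a centile threshold is the citation count a paper needs to reach the top N% of a benchmark set (nearest-rank method). The citation counts are invented and the actual ERA methodology may differ.

```python
import math

def citations_per_paper(citations):
    """Mean number of citations per paper (cpp) for a list of citation counts."""
    return sum(citations) / len(citations) if citations else 0.0

def centile_threshold(citations, top_percent):
    """Citation count of the last paper inside the top `top_percent`% (nearest rank)."""
    ranked = sorted(citations, reverse=True)
    k = max(1, math.ceil(len(ranked) * top_percent / 100))
    return ranked[k - 1]

# Invented benchmark set of citation counts, one entry per paper
world = [0, 0, 1, 2, 3, 5, 8, 13, 21, 34]

benchmark_cpp = citations_per_paper(world)
top10_threshold = centile_threshold(world, 10)
top50_threshold = centile_threshold(world, 50)
```

A paper from this invented set would need at least `top10_threshold` citations to count as a top-10% paper.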
Database technologies at Elsevier (1)
XML native database for large bulk of data, e.g. full-text articles, abstract and indexing records
• No ETL process involved
• “Search Interface” as top layer for retrieving data – XQueries instead of SQL queries
• No (upfront) data modelling is required
• Leveraging and retaining original XML structure
• Multiple DTDs and schemas supported concurrently. DTD or schema not a prerequisite for data loading
• With XQuery whole web applications can be built, i.e. no integration with an additional web programming language (e.g. PHP, JavaScript, etc.)

• Though an expensive technology
• Straightforward loading and querying of huge amounts of data might be challenging
• Requires specific skills
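The "no ETL, no upfront data model" point can be illustrated with a small sketch. It is not an XML native database and it does not use XQuery: Python's standard ElementTree and its limited XPath support stand in for both. The two records and their element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two invented records with different structures, kept exactly as they arrive:
# no schema is declared and nothing is transformed before querying.
RECORDS = [
    "<article><title>Wheat transformation</title><year>2007</year></article>",
    "<abstract-record><head><title>Oxide sensors</title></head>"
    "<pubyear>2008</pubyear></abstract-record>",
]

def titles(xml_strings):
    """Return the text of every <title> element, wherever it sits in each document."""
    found = []
    for xml_text in xml_strings:
        root = ET.fromstring(xml_text)
        found.extend(t.text for t in root.findall(".//title"))
    return found

all_titles = titles(RECORDS)
```

The same path expression works on both record shapes because the original XML structure is retained.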
Database technologies at Elsevier (2)
RDBMS databases for lightweight information, e.g. article and journal metadata
• Known and established technology (e.g. SQL)
• Typically heavy lifting is done at the ETL stage in order to boost query performance
• Plenty of open source choice and thus free (e.g. MySQL), low threshold for adoption
• Ideal for small amounts of information

• ETL process can be lengthy
• XML structure is ‘lost’ once data is loaded. Separate DTD or schema required for exporting data
• SQL is typically a back-end technology. Front-end (web) application programming requires a different language (e.g. PHP, JSP, ASP)
• Data modelling is required. Updating the data model usually requires data re-loading
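A minimal sketch of the ETL step described above, using SQLite from the Python standard library rather than any database named on the slide. The XML source, the `article` table and its columns are all invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented XML source; the element and attribute names are hypothetical.
SOURCE = """<articles>
  <article id="a1"><title>Wheat transformation</title><journal>Journal A</journal><year>2007</year></article>
  <article id="a2"><title>Oxide sensors</title><journal>Journal B</journal><year>2008</year></article>
</articles>"""

def load(conn, xml_text):
    """Extract fields from the XML, flatten them into rows, and load them into a table."""
    conn.execute(
        "CREATE TABLE article (id TEXT PRIMARY KEY, title TEXT, journal TEXT, year INTEGER)"
    )
    rows = [
        (a.get("id"), a.findtext("title"), a.findtext("journal"), int(a.findtext("year")))
        for a in ET.fromstring(xml_text).iter("article")
    ]
    conn.executemany("INSERT INTO article VALUES (?, ?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = load(conn, SOURCE)
recent = conn.execute(
    "SELECT title FROM article WHERE year >= ? ORDER BY year", (2008,)
).fetchall()
```

Once loaded, the rows no longer carry the nesting of the source XML, and adding a new field means changing the table and loading the data again.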
Some data models at Elsevier
• Authors are disambiguated and profiled. Unique and persistent identifier
• Affiliations are disambiguated and profiled. Unique and persistent identifier
• Backward and forward citations captured through reference linking
• Funding data aggregated to affiliations
Simple relational data model example
Covers publications, journals, classifications (disciplines), authors, affiliations, journal metrics, citations, etc.
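The slide's diagram is not reproduced in this transcript. The sketch below is therefore a hypothetical schema covering a few of the listed entities, not Elsevier's actual model: every table name, column name and row is an assumption made for illustration.

```python
import sqlite3

# Hypothetical tables; names and columns are assumptions, not the slide's model.
SCHEMA = """
CREATE TABLE journal     (journal_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE affiliation (affiliation_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE author      (author_id INTEGER PRIMARY KEY, name TEXT,
                          affiliation_id INTEGER REFERENCES affiliation);
CREATE TABLE publication (publication_id INTEGER PRIMARY KEY, title TEXT, year INTEGER,
                          journal_id INTEGER REFERENCES journal);
CREATE TABLE authorship  (publication_id INTEGER REFERENCES publication,
                          author_id INTEGER REFERENCES author);
CREATE TABLE citation    (citing_id INTEGER REFERENCES publication,
                          cited_id INTEGER REFERENCES publication);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO journal VALUES (1, 'Journal A')")
conn.execute("INSERT INTO affiliation VALUES (1, 'Institute X')")
conn.executemany("INSERT INTO author VALUES (?, ?, 1)", [(1, "Author A"), (2, "Author B")])
conn.executemany(
    "INSERT INTO publication VALUES (?, ?, ?, 1)",
    [(10, "Paper 10", 2007), (11, "Paper 11", 2008), (12, "Paper 12", 2009)],
)
conn.executemany("INSERT INTO authorship VALUES (?, ?)", [(10, 1), (11, 2), (12, 2)])
conn.executemany("INSERT INTO citation VALUES (?, ?)", [(11, 10), (12, 10), (12, 11)])

# Citations received per author: join authors to their papers, then to incoming citations.
citations_per_author = conn.execute("""
    SELECT a.name, COUNT(c.citing_id)
    FROM author a
    JOIN authorship s ON s.author_id = a.author_id
    LEFT JOIN citation c ON c.cited_id = s.publication_id
    GROUP BY a.author_id
    ORDER BY a.author_id
""").fetchall()
```

Keeping citations as their own table of (citing, cited) pairs lets one table serve both backward and forward citation lookups.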
Affiliation Profile XML snippet
<xocs:doc content-type="Profile" dbname="scopusbase"
    xsi:schemaLocation="http://www.elsevier.com/xml/xocs/dtd xocs-ip502.xsd"
    xmlns:xocs="http://www.elsevier.com/xml/xocs/dtd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <xocs:meta>
    <xocs:eid>10-s2.0-101718729</xocs:eid>
    <xocs:timestamp>2009-01-09T13:04:06.067735-05:00</xocs:timestamp>
  </xocs:meta>
  <xocs:institution-profile>
    <institution-profile affiliation-id="101718729">
      <status>update</status>
      <date-created year="2008" month="02" day="03"/>
      <date-revised year="2008" month="05" day="14" timestamp="2008-05-14T00:05:34.000034+01:00"/>
      <date-revised year="2008" month="06" day="30" timestamp="2008-06-30T02:09:24.000024+01:00"/>
      <date-revised year="2009" month="01" day="01" timestamp="2009-01-01T13:57:41.000041+00:00"/>
      <date-revised year="2009" month="01" day="09" timestamp="2009-01-09T17:44:11.000011+00:00"/>
      <preferred-name>Balearic Islands Government</preferred-name>
      <sort-name>Balearic Islands Government</sort-name>
      <name-variant>Balearic Islands Government</name-variant>
      <name-variant>Govern Balear</name-variant>
      <name-variant>Govern de les Illes Balears</name-variant>
      <address country="es">
        <address-part>C/. Foners 10</address-part>
        <city>Palma</city>
        <postal-code>07006</postal-code>
      </address>
      ……
Unique and persistent affiliation ID
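A minimal sketch of reading such a profile with the Python standard library. The sample below is a trimmed copy of the snippet with closing tags added so that it parses; since the slide elides the rest of the record, that closing structure is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's snippet; the closing tags are added for the example.
SNIPPET = """<xocs:doc content-type="Profile" dbname="scopusbase"
    xmlns:xocs="http://www.elsevier.com/xml/xocs/dtd">
  <xocs:meta><xocs:eid>10-s2.0-101718729</xocs:eid></xocs:meta>
  <xocs:institution-profile>
    <institution-profile affiliation-id="101718729">
      <preferred-name>Balearic Islands Government</preferred-name>
      <name-variant>Balearic Islands Government</name-variant>
      <name-variant>Govern Balear</name-variant>
      <name-variant>Govern de les Illes Balears</name-variant>
      <address country="es"><city>Palma</city></address>
    </institution-profile>
  </xocs:institution-profile>
</xocs:doc>"""

NS = {"xocs": "http://www.elsevier.com/xml/xocs/dtd"}

def read_affiliation(xml_text):
    """Pull the persistent ID, preferred name and distinct name variants from a profile."""
    root = ET.fromstring(xml_text)
    profile = root.find("xocs:institution-profile/institution-profile", NS)
    return {
        "affiliation_id": profile.get("affiliation-id"),
        "preferred_name": profile.findtext("preferred-name"),
        "variants": sorted({v.text for v in profile.findall("name-variant")}),
        "country": profile.find("address").get("country"),
    }

affiliation = read_affiliation(SNIPPET)
```

All name variants resolve to the single `affiliation-id`, which is what makes the identifier usable as a join key in other records.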
Author Profile XML snippet
<author-profile id="7401581436" type="author" suppress="false">
  …..
  <preferred-name>
    <initials>A.W.</initials>
    <indexed-name>MacDonald A.</indexed-name>
    <surname>MacDonald</surname>
    <given-name>Alistair W.</given-name>
  </preferred-name>
  <name-variant>
    <initials>A.W.</initials>
    <indexed-name>Macdonald A.</indexed-name>
    <surname>MacDonald</surname>
    <given-name>A. W.</given-name>
  </name-variant>
  …
  <classificationgroup>
    <classifications type="ASJC">
      <classification frequency="7">1306</classification>
      <classification frequency="1">1315</classification>
  <publication-range start="1989" end="2009"/>
  …
  <journal-history type="author">
    <journal type="j">
      <sourcetitle>Clinical Cancer Research</sourcetitle>
  …..
  <affiliation-current>
    <affiliation affiliation-id="106499546" parent="60019718"/>
  </affiliation-current>
  <affiliation-history>
    <affiliation affiliation-id="104228751" parent="60024340"/>
  </affiliation-history>
</author-profile>
Unique and persistent author ID
Reference to affiliation
Publication XML snippet
<bibrecord>
  <item-info>
    <copyright type="Elsevier">Copyright 2008 Elsevier B.V., All rights reserved.</copyright>
    <itemidlist>
      <itemid idtype="SCP">34147094726</itemid>
    <history>
      <date-created year="2007" month="04" day="18"/>
    </history>
    <dbcollection>SNCABS</dbcollection>
    <dbcollection>Scopusbase</dbcollection>
  </item-info>
  <head>
    <citation-info>
      <citation-type code="ar"/>
      <citation-language xml:lang="en"/>
  …..
  <author seq="3" auid="7003372933">
    <ce:initials>P.</ce:initials>
    <ce:indexed-name>Barret P.</ce:indexed-name>
    <ce:surname>Barret</ce:surname>
    <ce:given-name>Pierre</ce:given-name>
    <preferred-name>
      <ce:initials>P.</ce:initials>
      <ce:indexed-name>Barret P.</ce:indexed-name>
      <ce:surname>Barret</ce:surname>
      <ce:given-name>Pierre</ce:given-name>
    </preferred-name>
    <ce:e-address type="email">[email protected]</ce:e-address>
  </author>
  <affiliation country="fr" afid="60001542">
    <organization>Plateforme de Transgénèse du Blé</organization>
    <organization>UMR ASP 1095 INRA</organization>
    <organization>Université Blaise Pascal</organization>
    <city-group>63100 Clermont-Ferrand</city-group>
  </affiliation>
  <references count="27">…..</references>
</bibrecord>
Unique and persistent publication ID
Reference to author
Reference to affiliation
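The three identifiers called out above are what link a publication to its author and affiliation profiles. Below is a minimal sketch of extracting those links. It assumes a reduced, well-formed record in which the `ce:` name elements are left out and an `<author-group>` wrapper is used, as in the Custom Data example; the real record layout is elided on the slide.

```python
import xml.etree.ElementTree as ET

# Reduced record built from the IDs on the slide; the layout is an assumption.
RECORD = """<bibrecord>
  <item-info>
    <itemidlist><itemid idtype="SCP">34147094726</itemid></itemidlist>
  </item-info>
  <head>
    <author-group>
      <author seq="3" auid="7003372933"/>
      <affiliation country="fr" afid="60001542"/>
    </author-group>
  </head>
</bibrecord>"""

def extract_links(xml_text):
    """Return (publication ID, author ID, affiliation ID) triples found in one record."""
    root = ET.fromstring(xml_text)
    pub_id = root.findtext(".//itemid[@idtype='SCP']")
    return [
        (pub_id, author.get("auid"), aff.get("afid"))
        for group in root.iter("author-group")
        for author in group.findall("author")
        for aff in group.findall("affiliation")
    ]

links = extract_links(RECORD)
```

Each triple can then be resolved against the author and affiliation profiles through their persistent IDs.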
Scopus Custom Data
Custom Data is:
• A big bucket of highly structured XML items
• Extracted directly from Scopus
• Accompanied by the articles’ cited-by counts
• Supported by extensive documentation and test data upon request
• FTP-ed or shipped via mobile (USB) drives
Example of XML data
<author-group>
  <author seq="1" auid="7005613516">
    <ce:initials>A.</ce:initials>
    <ce:indexed-name>Rothschild A.</ce:indexed-name>
    <ce:surname>Rothschild</ce:surname>
    <ce:given-name>Avner</ce:given-name>
    <preferred-name>
      <ce:initials>A.</ce:initials>
      <ce:indexed-name>Rothschild A.</ce:indexed-name>
      <ce:surname>Rothschild</ce:surname>
      <ce:given-name>Avner</ce:given-name>
    </preferred-name>
    <ce:e-address type="email">[email protected]</ce:e-address>
  </author>
  <author seq="2" auid="8625399100">
    <ce:initials>S.J.</ce:initials>
• Scopus contains ~42 million items
• In principle all articles can be ordered
• Custom Data can be grouped using the following criteria:
  • On ASJC code (All Science Journal Classification Code). (see next slide)
  • Per country
  • List of countries
  • Per year
  • Range of years
  • Further refining possible in close cooperation with Product Team
• Certain fields can be taken out if preferred:
  • Abstracts
  • References
  • Etc.
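The selection criteria above amount to filtering items and stripping optional fields. A minimal sketch follows, assuming a simplified, hypothetical item format: the `country` and `year` attributes and the `abstract` and `references` element names are invented for the example and are not the Custom Data schema.

```python
import copy
import xml.etree.ElementTree as ET

def select_items(items, countries=None, years=None, drop=()):
    """Keep items matching the country/year criteria and strip unwanted fields from copies."""
    selected = []
    for item in items:
        if countries and item.get("country") not in countries:
            continue
        if years and int(item.get("year")) not in years:
            continue
        trimmed = copy.deepcopy(item)  # leave the source item untouched
        for tag in drop:
            for parent in list(trimmed.iter()):
                for child in parent.findall(tag):
                    parent.remove(child)
        selected.append(trimmed)
    return selected

# Hypothetical bucket of items
BUCKET = """<items>
  <item country="fr" year="2007"><title>Wheat</title><abstract>...</abstract><references/></item>
  <item country="es" year="2007"><title>Islands</title><abstract>...</abstract></item>
  <item country="fr" year="2001"><title>Older</title></item>
</items>"""

items = list(ET.fromstring(BUCKET))
picked = select_items(items, countries={"fr"}, years=range(2007, 2009),
                      drop=("abstract", "references"))
```

Here `countries` covers both "per country" and "list of countries", and a `range` covers both "per year" and "range of years".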
A wide variety of Web Services and APIs
• SOAP and REST: simple and accessible to low-level development
• Different service levels supported
• Access to different content types
  - Hub
  - ScienceDirect articles
  - Scopus abstracts, author profiles, affiliation profiles
• Both search and retrieval
• XML and other formats supported
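To make the search/retrieval and format options concrete, here is a minimal sketch of how a client might compose REST request URLs. The host, path layout and parameter names are placeholders invented for the example; they are not Elsevier's actual API, and no request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.example.org"  # placeholder host, not a real endpoint

def build_request_url(action, content_type, params, fmt="xml"):
    """Compose a URL for a hypothetical search or retrieval call on one content type."""
    if action not in ("search", "retrieve"):
        raise ValueError("action must be 'search' or 'retrieve'")
    query = urlencode({**params, "format": fmt})
    return f"{BASE_URL}/{action}/{content_type}?{query}"

search_url = build_request_url("search", "affiliation",
                               {"query": "Balearic Islands Government"})
retrieve_url = build_request_url("retrieve", "author", {"id": "7401581436"}, fmt="json")

# Round-trip check: the encoded query decodes back to the original parameters.
decoded = parse_qs(urlsplit(search_url).query)
```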
Significant part of Research Information is at publisher
Publishers have lots of info about publications and researchers
Publishers have been dealing with research info for many years
Early adopters of XML and database technologies
Are at the front of changes taking place in the research area
More and more publishers – certainly Elsevier - are working closely with institutions on topics related to research information management and performance evaluation
euroCRIS and CERIF as seen by Elsevier
• CERIF as a standardized format is a great initiative
• Elsevier is happy to partner with euroCRIS to improve, maintain and update the ‘standard’
• Elsevier, on the other hand, is “agnostic” to CRIS implementations
• What is the future of data models moving forward with evolving technologies? Do you need one today?
• Do you care about how systems are implemented and set up? Shouldn’t the focus be on the interface/exchange layer?
• With web services according to a standard (CERIF), back-end systems are less relevant
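The "focus on the exchange layer" argument can be illustrated with a small export function: whatever the back-end is, the record leaves the system in one agreed format. The element names below are modeled on CERIF entity names (`cfResPubl`, `cfPers`) but are assumptions. The output has not been checked against the CERIF XML schema and should not be read as valid CERIF. The publication title is invented.

```python
import xml.etree.ElementTree as ET

def to_cerif_like(publication):
    """Serialize an internal publication record to a CERIF-style XML string (illustrative)."""
    root = ET.Element("CERIF")
    pub = ET.SubElement(root, "cfResPubl")
    ET.SubElement(pub, "cfResPublId").text = publication["id"]
    ET.SubElement(pub, "cfTitle").text = publication["title"]
    for author_id in publication["author_ids"]:
        # Link entity connecting the publication to a person record
        link = ET.SubElement(pub, "cfPers_ResPubl")
        ET.SubElement(link, "cfPersId").text = author_id
    return ET.tostring(root, encoding="unicode")

# Internal record: the IDs come from the publication snippet, the title is invented.
record = {"id": "34147094726", "title": "Example title", "author_ids": ["7003372933"]}
exported = to_cerif_like(record)
```

A consuming CRIS only needs to understand this exported form, not whether the source was an XML store or a relational database.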
Opportunities for euroCRIS and Elsevier
• Work collaboratively on further standardization of CERIF
• Ensure completeness of research information exchanged through CERIF
• Adopt CERIF as one of the exporting formats straight into local systems (CRIS or non-CRIS)
• Elsevier and euroCRIS to help accelerate research community management and the population of local systems and repositories
• Expand CERIF to include metric-based report information for performance evaluation
• Exchange technology and knowledge for potential CRIS implementation recommendation
• Accelerate integration of Elsevier and other vendors’ products and their data with local systems (e.g. HR, etc.)