Prague, November 9, 2010
M’hamed el Aisati, Head of Product Technology, S&T Elsevier



CERIF/euroCRIS and Elsevier: Where do they meet?

Outline
• What’s Elsevier (S&T) from a data and technology perspective?
  • Data types
  • Data processing
• Data technology adopted and deployed at Elsevier
• From Elsevier data models to CERIF
• Is there a role for a publisher?
  • What role
  • More opportunities

Elsevier S&T = Scientific Data + Technology + much more

• > 43 M abstracts (A&I)
• > 10 M full-text articles
• > 55 K main organization profiles
• > 20 M author profiles
• > 60 M patents
• > 1 M awarded grants
• > 10 years of ScienceDirect and > 5 years of Scopus usage/analytics data
• > 10 K books
• > 500 M quality scientific web pages

Scopus coverage

Nearly 18,000 titles, including:
• ~16,500 peer-reviewed titles
• 600 trade journals
• 350 book series
• Extensive conference proceedings
• 40 languages are covered

A rich and extended coverage including:
• Abstracts and citations from 5,000 publishers (ELS 15%)
• 3.6 million conference papers (10% of Scopus records)
• “Articles in Press” from more than 3,000 titles
• 23 million patents
• 1,200 Open Access journals
• 80% of all Scopus records have an abstract
• Abstracts going back to 1823 (Scopus includes all historical material of ELS, Springer, ACS, AIP, Nature, Science, etc.)
• Nearly 2,700 Arts & Humanities titles
• 430 million integrated scientific websites via Scirus.com

Scopus info on www.info.scopus.com

Scientific Data + Technology provides extra value


“Your companion for a scientific life”


Manager/Admin

Librarian

Department head

Researcher

Funding agent

Dean/Provost

Performance Evaluation

Baden-Württemberg

Australian Research Council – ERA 2010
• Assessment of research quality within Australia's higher education institutions using a combination of indicators and expert review by committees comprising experienced, internationally recognized experts.
• ERA uses leading researchers to evaluate research in eight discipline clusters.
• ERA will detail areas within institutions and disciplines that are internationally competitive, as well as point to emerging areas where there are opportunities for development and further investment.
• Early January 2010 – Aug/Sep 2010
• First trial (PCE) in 2009
• Scopus selected as source information provider and partner
• More info on: http://www.arc.gov.au/era/default.htm

Australian Research Council – ERA 2010
3 main components:
- EID tagging
- Dedicated web service (API)
- Reports:
  » Citation Benchmark report (cpp)
  » Centile threshold report
  » Ranked journal ‘Indicative World Distribution’ Benchmark Report

ARC – Scopus – Universities interaction

EID tagging process

Dedicated Web Service


Database technologies at Elsevier (1)
XML native database for the large bulk of data, e.g. full-text articles and abstract and indexing records

• No ETL process involved
• “Search interface” as top layer for retrieving data: XQueries instead of SQL queries
• No (upfront) data modelling is required
• Leverages and retains the original XML structure
• Multiple DTDs and schemas supported concurrently; a DTD or schema is not a prerequisite for data loading
• With XQuery whole web applications can be built, i.e. no integration with an additional web programming language (e.g. PHP, JavaScript, etc.)

• Though an expensive technology
• Loading and querying huge amounts of data in a straightforward way might be challenging
• Requires specific skills
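To make the "query the stored XML directly" idea concrete, here is a minimal Python sketch. It is not XQuery and not the database product used at Elsevier: it uses the limited path syntax of Python's standard `xml.etree.ElementTree` as a stand-in, and the two documents are trimmed versions of the profile records shown in this deck. The `query` helper is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Two documents with different structures, stored as-is:
# no shared schema and no ETL step before they can be queried.
DOCS = [
    """<institution-profile affiliation-id="101718729">
         <preferred-name>Balearic Islands Government</preferred-name>
         <address country="es"><city>Palma</city></address>
       </institution-profile>""",
    """<author-profile id="7401581436">
         <preferred-name><surname>MacDonald</surname></preferred-name>
       </author-profile>""",
]


def query(docs, path):
    """Run one path expression against every stored document and
    return the text of all matching elements (hypothetical helper)."""
    hits = []
    for raw in docs:
        root = ET.fromstring(raw)
        hits.extend(element.text for element in root.findall(path))
    return hits


# The same collection answers questions about either document type.
cities = query(DOCS, "./address/city")
surnames = query(DOCS, ".//surname")
print(cities, surnames)
```

Documents that do not contain the requested path simply contribute no hits, which is what allows several DTDs or schemas to live side by side in one collection.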

Database technologies at Elsevier (2)
RDBMS databases for lightweight information, e.g. article and journal metadata

• Known and established technology (e.g. SQL)
• Typically the heavy lifting is done at the ETL stage in order to boost query performance
• Plenty of open source choice and thus free (e.g. MySQL), low threshold for adoption
• Ideal for small amounts of information

• ETL process can be lengthy
• XML structure is ‘lost’ once data is loaded; a separate DTD or schema is required for exporting data
• SQL is typically a back-end technology; front-end (web) application programming requires a different language (e.g. PHP, JSP, ASP)
• Data modelling is required; updating the data model usually requires data re-loading
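The trade-off can be sketched in a few lines of Python with the standard `sqlite3` module. This is an illustration under assumptions, not Elsevier's pipeline: the `affiliation` table and its columns are hypothetical, and the input is a trimmed version of the affiliation profile shown in this deck.

```python
import sqlite3
import xml.etree.ElementTree as ET

RAW = """<institution-profile affiliation-id="101718729">
  <preferred-name>Balearic Islands Government</preferred-name>
  <name-variant>Govern Balear</name-variant>
  <name-variant>Govern de les Illes Balears</name-variant>
  <address country="es"><city>Palma</city></address>
</institution-profile>"""

conn = sqlite3.connect(":memory:")
# Data modelling happens up front: the (hypothetical) table fixes
# which fields exist before any record is loaded.
conn.execute(
    "CREATE TABLE affiliation ("
    "id TEXT PRIMARY KEY, name TEXT, country TEXT, city TEXT)"
)

# ETL step: extract from the XML, transform to a flat row, load.
root = ET.fromstring(RAW)
row = (
    root.get("affiliation-id"),
    root.findtext("preferred-name"),
    root.find("address").get("country"),
    root.findtext("address/city"),
)
conn.execute("INSERT INTO affiliation VALUES (?, ?, ?, ?)", row)

# Querying is now plain SQL, but the nesting and the name variants,
# which this table does not model, are no longer available.
result = conn.execute(
    "SELECT name, city FROM affiliation WHERE country = ?", ("es",)
).fetchall()
print(result)
```

Keeping the name variants would mean adding a second table and re-running the load, which is the "updating the data model usually requires data re-loading" point in practice.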


Elsevier logical fit

Some data models at Elsevier
• Authors are disambiguated and profiled. Unique and persistent identifier
• Affiliations are disambiguated and profiled. Unique and persistent identifier
• Backward and forward citations captured through reference linking
• Funding data aggregated to affiliations

Simple relational data model example
• Covers publications, journals, classifications (disciplines), authors, affiliations, journal metrics, citations, etc.

Affiliation Profile XML snippet

<xocs:doc content-type="Profile" dbname="scopusbase" xsi:schemaLocation="http://www.elsevier.com/xml/xocs/dtd xocs-ip502.xsd" xmlns:xocs="http://www.elsevier.com/xml/xocs/dtd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <xocs:meta>
    <xocs:eid>10-s2.0-101718729</xocs:eid>
    <xocs:timestamp>2009-01-09T13:04:06.067735-05:00</xocs:timestamp>
  </xocs:meta>
  <xocs:institution-profile>
    <institution-profile affiliation-id="101718729">
      <status>update</status>
      <date-created year="2008" month="02" day="03"/>
      <date-revised year="2008" month="05" day="14" timestamp="2008-05-14T00:05:34.000034+01:00"/>
      <date-revised year="2008" month="06" day="30" timestamp="2008-06-30T02:09:24.000024+01:00"/>
      <date-revised year="2009" month="01" day="01" timestamp="2009-01-01T13:57:41.000041+00:00"/>
      <date-revised year="2009" month="01" day="09" timestamp="2009-01-09T17:44:11.000011+00:00"/>
      <preferred-name>Balearic Islands Government</preferred-name>
      <sort-name>Balearic Islands Government</sort-name>
      <name-variant>Balearic Islands Government</name-variant>
      <name-variant>Govern Balear</name-variant>
      <name-variant>Govern de les Illes Balears</name-variant>
      <address country="es">
        <address-part>C/. Foners 10</address-part>
        <city>Palma</city>
        <postal-code>07006</postal-code>
      </address>
      ……

Unique and persistent affiliation ID
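As a rough illustration of how a receiving system might read such a profile, the Python sketch below parses a trimmed copy of the record. The closing tags after the address block are added here only to make the sample well-formed (the slide elides the rest of the record), and `read_affiliation` is a hypothetical helper, not part of any Elsevier toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared on the xocs:doc element in the slide.
NS = {"xocs": "http://www.elsevier.com/xml/xocs/dtd"}

# Trimmed copy of the slide's record, closed so that it parses.
SAMPLE = """<xocs:doc xmlns:xocs="http://www.elsevier.com/xml/xocs/dtd" content-type="Profile">
  <xocs:meta><xocs:eid>10-s2.0-101718729</xocs:eid></xocs:meta>
  <xocs:institution-profile>
    <institution-profile affiliation-id="101718729">
      <preferred-name>Balearic Islands Government</preferred-name>
      <name-variant>Balearic Islands Government</name-variant>
      <name-variant>Govern Balear</name-variant>
      <name-variant>Govern de les Illes Balears</name-variant>
    </institution-profile>
  </xocs:institution-profile>
</xocs:doc>"""


def read_affiliation(xml_text):
    """Extract the identifiers and name variants of one profile."""
    root = ET.fromstring(xml_text)
    # Only the wrapper elements are namespaced; the inner profile is not.
    profile = root.find("xocs:institution-profile/institution-profile", NS)
    return {
        "eid": root.findtext("xocs:meta/xocs:eid", namespaces=NS),
        "affiliation_id": profile.get("affiliation-id"),
        "preferred_name": profile.findtext("preferred-name"),
        "variants": sorted({v.text for v in profile.findall("name-variant")}),
    }


affiliation = read_affiliation(SAMPLE)
print(affiliation)
```

All name variants resolve to the single `affiliation-id`, which is what makes the identifier usable as a stable key in a local system.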

Author Profile XML snippet

<author-profile id="7401581436" type="author" suppress="false">
  …..
  <preferred-name>
    <initials>A.W.</initials>
    <indexed-name>MacDonald A.</indexed-name>
    <surname>MacDonald</surname>
    <given-name>Alistair W.</given-name>
  </preferred-name>
  <name-variant>
    <initials>A.W.</initials>
    <indexed-name>Macdonald A.</indexed-name>
    <surname>MacDonald</surname>
    <given-name>A. W.</given-name>
  </name-variant>
  …
  <classificationgroup>
    <classifications type="ASJC">
      <classification frequency="7">1306</classification>
      <classification frequency="1">1315</classification>
  <publication-range start="1989" end="2009"/>
  …
  <journal-history type="author">
    <journal type="j">
      <sourcetitle>Clinical Cancer Research</sourcetitle>
  …..
  <affiliation-current>
    <affiliation affiliation-id="106499546" parent="60019718"/>
  </affiliation-current>
  <affiliation-history>
    <affiliation affiliation-id="104228751" parent="60024340"/>
  </affiliation-history>
</author-profile>

Unique and persistent author ID

Reference to affiliation
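A short Python sketch of how the author profile could be summarized follows. It works on a trimmed copy of the record in which the elided parts are dropped and the open elements are closed so that it parses. `summarize_author` is a hypothetical helper, and reading the highest `frequency` as the author's main subject area is an assumption made here for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the slide's author profile.
SAMPLE = """<author-profile id="7401581436" type="author" suppress="false">
  <preferred-name>
    <indexed-name>MacDonald A.</indexed-name>
    <surname>MacDonald</surname>
    <given-name>Alistair W.</given-name>
  </preferred-name>
  <classificationgroup>
    <classifications type="ASJC">
      <classification frequency="7">1306</classification>
      <classification frequency="1">1315</classification>
    </classifications>
  </classificationgroup>
  <affiliation-current>
    <affiliation affiliation-id="106499546" parent="60019718"/>
  </affiliation-current>
  <affiliation-history>
    <affiliation affiliation-id="104228751" parent="60024340"/>
  </affiliation-history>
</author-profile>"""


def summarize_author(xml_text):
    """Return the author ID, most frequent ASJC code and affiliation IDs."""
    root = ET.fromstring(xml_text)
    codes = root.findall(".//classifications[@type='ASJC']/classification")
    # Assumption: the code with the highest frequency is the main area.
    top = max(codes, key=lambda c: int(c.get("frequency")))
    current = root.find("affiliation-current/affiliation")
    history = root.findall("affiliation-history/affiliation")
    return {
        "author_id": root.get("id"),
        "top_asjc": top.text,
        "current_affiliation": current.get("affiliation-id"),
        "parent_affiliation": current.get("parent"),
        "past_affiliations": [a.get("affiliation-id") for a in history],
    }


author = summarize_author(SAMPLE)
print(author)
```

The `affiliation-id` values returned here are the same kind of identifier that keys the affiliation profiles, so the two record types can be joined without name matching.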

Publication XML snippet

<bibrecord>
  <item-info>
    <copyright type="Elsevier">Copyright 2008 Elsevier B.V., All rights reserved.</copyright>
    <itemidlist>
      <itemid idtype="SCP">34147094726</itemid>
    <history>
      <date-created year="2007" month="04" day="18"/>
    </history>
    <dbcollection>SNCABS</dbcollection>
    <dbcollection>Scopusbase</dbcollection>
  </item-info>
  <head>
    <citation-info>
      <citation-type code="ar"/>
      <citation-language xml:lang="en"/>
  …..
  <author seq="3" auid="7003372933">
    <ce:initials>P.</ce:initials>
    <ce:indexed-name>Barret P.</ce:indexed-name>
    <ce:surname>Barret</ce:surname>
    <ce:given-name>Pierre</ce:given-name>
    <preferred-name>
      <ce:initials>P.</ce:initials>
      <ce:indexed-name>Barret P.</ce:indexed-name>
      <ce:surname>Barret</ce:surname>
      <ce:given-name>Pierre</ce:given-name>
    </preferred-name>
    <ce:e-address type="email">[email protected]</ce:e-address>
  </author>
  <affiliation country="fr" afid="60001542">
    <organization>Plateforme de Transgénèse du Blé</organization>
    <organization>UMR ASP 1095 INRA</organization>
    <organization>Université Blaise Pascal</organization>
    <city-group>63100 Clermont-Ferrand</city-group>
  </affiliation>
  <references count="27">…..</references>
</bibrecord>

Unique and persistent publication ID

Reference to author

Reference to affiliation
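The three identifiers called out above can be pulled together into link records. The Python sketch below does this on a trimmed sample and rests on several assumptions: the `ce` namespace URI is not declared in the slide and is assumed here, the `author-group` wrapper is taken from the Custom Data example in this deck, and `link_records` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI: the slide does not declare the ce: prefix.
CE = "http://www.elsevier.com/xml/common/dtd"

# Trimmed sample; the author-group wrapper is an assumption.
SAMPLE = f"""<bibrecord xmlns:ce="{CE}">
  <item-info>
    <itemidlist><itemid idtype="SCP">34147094726</itemid></itemidlist>
  </item-info>
  <author-group>
    <author seq="3" auid="7003372933">
      <ce:surname>Barret</ce:surname>
      <ce:given-name>Pierre</ce:given-name>
    </author>
    <affiliation country="fr" afid="60001542">
      <organization>UMR ASP 1095 INRA</organization>
    </affiliation>
  </author-group>
</bibrecord>"""


def link_records(xml_text):
    """Return (publication ID, author ID, affiliation ID) triples."""
    root = ET.fromstring(xml_text)
    publication_id = root.findtext(".//itemid[@idtype='SCP']")
    triples = []
    for group in root.iter("author-group"):
        affiliation = group.find("affiliation")
        afid = affiliation.get("afid") if affiliation is not None else None
        for person in group.findall("author"):
            triples.append((publication_id, person.get("auid"), afid))
    return triples


links = link_records(SAMPLE)
print(links)
```

Each triple is effectively one row of a publication-author-affiliation link table, the kind of relation a CRIS needs in order to attribute output to people and organizations.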

Scopus Custom Data

Custom Data is:
• A big bucket of highly structured XML items
• Extracted directly from Scopus
• Accompanied by the articles’ cited-by counts
• Supported by extensive documentation and test data upon request
• FTP-ed or shipped via mobile (USB) drives

Example of XML data

<author-group>
  <author seq="1" auid="7005613516">
    <ce:initials>A.</ce:initials>
    <ce:indexed-name>Rothschild A.</ce:indexed-name>
    <ce:surname>Rothschild</ce:surname>
    <ce:given-name>Avner</ce:given-name>
    <preferred-name>
      <ce:initials>A.</ce:initials>
      <ce:indexed-name>Rothschild A.</ce:indexed-name>
      <ce:surname>Rothschild</ce:surname>
      <ce:given-name>Avner</ce:given-name>
    </preferred-name>
    <ce:e-address type="email">[email protected]</ce:e-address>
  </author>
  <author seq="2" auid="8625399100">
    <ce:initials>S.J.</ce:initials>

• Scopus contains ~42 million items
• In principle all articles can be ordered
• Custom Data can be grouped using the following criteria:
  • On ASJC code (All Science Journal Classification Code) (see next slide)
  • Per country
  • List of countries
  • Per year
  • Range of years
  • Further refining possible in close cooperation with Product Team
• Certain fields can be taken out if preferred:
  • Abstracts
  • References
  • Etc.
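The grouping criteria can be read as a simple filter over items. The Python sketch below shows one way to express that. The `Item` fields, the `select_items` function and the three sample items are all hypothetical and exist only for illustration; they do not describe the actual Custom Data ordering process or record layout.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical, flattened view of one Custom Data item."""
    item_id: str
    asjc_codes: Tuple[str, ...]
    country: str
    year: int
    abstract: Optional[str] = None


def select_items(items, asjc=None, countries=None, years=None,
                 drop_abstracts=False):
    """Keep items matching every criterion that was supplied.

    asjc: a single ASJC code; countries: a set of country codes;
    years: an inclusive (first, last) range.
    """
    selected = []
    for item in items:
        if asjc is not None and asjc not in item.asjc_codes:
            continue
        if countries is not None and item.country not in countries:
            continue
        if years is not None and not (years[0] <= item.year <= years[1]):
            continue
        # "Certain fields can be taken out": blank the abstract on request.
        selected.append(replace(item, abstract=None) if drop_abstracts else item)
    return selected


# Made-up items for illustration only.
ITEMS = [
    Item("example-1", ("1306",), "es", 2008, "Abstract text"),
    Item("example-2", ("1315",), "fr", 2007, "Abstract text"),
    Item("example-3", ("1306", "1315"), "fr", 2001, "Abstract text"),
]

subset = select_items(ITEMS, asjc="1306", countries={"es", "fr"},
                      years=(2005, 2010), drop_abstracts=True)
print([item.item_id for item in subset])
```

A single year is the range `(y, y)` and a single country is a one-element set, so the five listed criteria reduce to three parameters.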

A wide variety of Web Services and APIs
• SOAP and REST: simple and accessible to low-level development
• Different service levels supported
• Access to different content types:
  • Hub
  • ScienceDirect articles
  • Scopus abstracts, author profiles, affiliation profiles
• Both search and retrieval
• XML and other formats supported


A significant part of research information is at the publisher
• Publishers have lots of info about publications and researchers
• Publishers have been dealing with research info for many years
• Early adopters of XML and database technologies
• Are at the forefront of changes taking place in the research area
• More and more publishers, certainly Elsevier, are working closely with institutions on topics related to research information management and performance evaluation

euroCRIS and CERIF as seen by Elsevier
• CERIF as a standardized format is a great initiative
• Elsevier is happy to partner with euroCRIS to improve, maintain and update the ‘standard’
• Elsevier, on the other hand, is “agnostic” to CRIS implementations
• What is the future of data models moving forward with evolving technologies?
• Do you need one today?
• Do you care about how systems are implemented and set up?
• Shouldn’t the focus be on the interface/exchange layer?
• With web services according to a standard (CERIF), back-end systems are less relevant
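The "focus on the exchange layer" argument can be illustrated with a small Python sketch that turns two internal identifiers into a CERIF-style XML payload. The element names follow CERIF entity naming as commonly documented (`cfPers`, `cfOrgUnit`, and the link entity `cfPers_OrgUnit`), but the output is a simplified illustration and is not claimed to be schema-valid CERIF XML. `to_cerif_like` is a hypothetical function; the IDs are the author and affiliation IDs from the author profile in this deck.

```python
import xml.etree.ElementTree as ET


def to_cerif_like(author_id, affiliation_id):
    """Serialize one person, one organization and the link between
    them as a simplified, CERIF-style XML string (not schema-valid)."""
    root = ET.Element("CERIF")

    person = ET.SubElement(root, "cfPers")
    ET.SubElement(person, "cfPersId").text = author_id

    org_unit = ET.SubElement(root, "cfOrgUnit")
    ET.SubElement(org_unit, "cfOrgUnitId").text = affiliation_id

    # Link entity: relates the person to the organizational unit.
    link = ET.SubElement(root, "cfPers_OrgUnit")
    ET.SubElement(link, "cfPersId").text = author_id
    ET.SubElement(link, "cfOrgUnitId").text = affiliation_id

    return ET.tostring(root, encoding="unicode")


payload = to_cerif_like("7401581436", "106499546")
print(payload)
```

Whether the identifiers came from a native XML store or from a relational database is invisible in the payload, which is the sense in which back-end systems become less relevant once the exchange format is agreed.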

Opportunities for euroCRIS and Elsevier
• Work collaboratively on further standardization of CERIF
• Ensure completeness of research information exchanged through CERIF
• Adopt CERIF as one of the exporting formats straight into local systems (CRIS or non-CRIS)
• Elsevier and euroCRIS to help the research community accelerate and manage the population of local systems and repositories
• Expand CERIF to include metric-based report information for performance evaluation
• Exchange technology and knowledge for potential CRIS implementation recommendations
• Accelerate integration of Elsevier's and other vendors’ products and their data with local systems (e.g. HR, etc.)


Thanks

For questions and/or follow up: M’hamed el Aisati, [email protected]