data exchange alternatives, sbis conference in stockholm (2008)

Biodiversity Data Provider Software

Hands-on exercises with TAPIR

Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008)

Dag Terje Filip Endresen, Nordic Genetic Resources Center (Sweden) / Bioversity International (Italy)


Fallacies of Distributed Computing

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

This list of fallacies came about at Sun Microsystems around 1994.
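For a client of a remote data provider, the first two fallacies mean that requests can fail or stall. A minimal Python sketch of one common response is to retry a failing call a few times with a growing delay. The helper name `call_with_retry` and its parameters are invented for illustration and are not part of any provider software discussed here.

```python
import time


def call_with_retry(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on OSError, retry with exponential backoff.

    Hypothetical helper for illustration only. Network errors raised by
    the standard library (timeouts, refused connections, URLError) are
    subclasses of OSError.
    """
    for attempt in range(attempts):
        try:
            return func()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

A caller would wrap the actual request, for example `call_with_retry(lambda: urllib.request.urlopen(url, timeout=10).read())`, so that a timeout is always set and a single dropped connection does not abort a harvest.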


TAPIR

Cartoon by Sasha Kopf (Creative Commons)


TAPIR

• TAPIR: TDWG Access Protocol for Information Retrieval.

• During the 2004 TDWG meeting in Christchurch, NZ, work started on a unified protocol, which was named TAPIR.

• TAPIR is based on the protocols of two data provider software packages, BioCASE and DiGIR.


Provider software, wrappers

• DiGIR (2002, not active)
  – http://digir.sourceforge.net

• BioCASE (2003, PyWrapper)
  – http://www.biocase.org

• PyWrapper3 (2006, not active)
  – http://trac.pywrapper.org/

• TapirLink (2007)
  – http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink

• GBIF Provider Toolkit (2009)
  – http://code.google.com/p/gbif-providertoolkit


BioCASE 2.5.0RC

• The BioCASE provider software is a product of the EU-funded BioCASE project (2001-2004).

• Developed at BGBM in Berlin.

• Last updated in April 2008, with support for Python version 2.5 and fewer required external libraries.

• Implement the BioCASE provider to share data as ABCD 2.06.

http://www.biocase.org




BioCASE 2.5.0RC

1. Make sure you have Python 2.5 installed (command line: python -V).

2. Download the latest provider software from http://www.biocase.org

3. Uncompress the BioCASE provider software to a folder on your system [provider_software_2.5.0RC.tar.gz] (tar -xzvf provider_...tar.gz).

4. Run setup.py (python setup.py).

5. Configure your web server to mount biocase/www as http://localhost/biocase/

Hint: You will find an example for httpd.conf as the last terminal output from running setup.py.


BioCASE 2.5.0RC

6. Visit the library test page: http://localhost/biocase/utilities/testlibs.cgi

6a. Download the latest 4Suite from http://4suite.org/, then uncompress and install it [4Suite-XML-1.0.2.tar.bz2].

6b. Install additional Python libraries, including the desired database driver. For each Python package: python setup.py install

6c. Graphviz is useful for visualizing the database table structure.
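The version check from step 1 and the library check from step 6 can also be run from Python itself. The sketch below is written for a current Python 3 interpreter rather than the Python 2.5 these slides target, and the module names it checks are placeholders, not the list that testlibs.cgi actually verifies.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    # Placeholder names: replace them with the libraries your setup needs,
    # for example your database driver.
    wanted = ["json", "sqlite3", "no_such_driver"]
    print("Missing:", missing_modules(wanted))
```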


BioCASE 2.5.0RC

7. Configuration

• Add datasource (dsa)

• Database connection

• Database table structure

• Mapping of data model to standard schema
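The four configuration layers can be pictured as one nested structure. The sketch below is only a mental model in Python. The class and field names, the table and column names, and the shortened concept paths are all invented for illustration and do not reflect BioCASE's actual configuration files.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Hypothetical model of one configured datasource (dsa)."""
    name: str                                     # datasource (dsa) name
    connection: dict                              # database connection settings
    tables: dict = field(default_factory=dict)    # table -> primary key
    mapping: dict = field(default_factory=dict)   # schema concept -> table.column

    def unmapped_tables(self):
        """Tables declared in the structure but not used by any mapping."""
        used = {target.split(".")[0] for target in self.mapping.values()}
        return sorted(set(self.tables) - used)


dsa = DataSource(
    name="demo",
    connection={"driver": "mysql", "host": "localhost", "database": "genebank"},
    tables={"accession": "id", "taxon": "id", "audit_log": "id"},
    mapping={
        "Unit/UnitID": "accession.accession_number",
        "Unit/ScientificName": "taxon.full_name",
    },
)
print(dsa.unmapped_tables())  # ['audit_log']
```

Each layer depends on the one before it: the mapping can only refer to tables that the structure layer has declared, and those are only reachable through the connection.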


BioCASE 2.5.0RC

8. Query form

The manual query form is a good way to see exactly how the wrapper software works: http://localhost/biocase/utilities/queryforms/qf_manual.cgi?dsa=sesto


PyWrapper3

Home: http://trac.pywrapper.org/
Primary developers: Markus Döring, Javier de la Torre

14/07/2008 - Development stalled
"We are sorry to inform you that development of the TAPIR branch of PyWrapper has been stalled. The latest 3.1 alpha version is not stable and not recommended for production!" (Message from the home page)

• PyWrapper 3.0.0 (latest stable version, requires Python 2.4)
• PyWrapper 3.1.0 alpha (development version, works with Python 2.5)

PyWrapper is tested and verified to work on Windows, Mac OS X and Linux.


Required configuration

Web server: Any CGI-compliant web server, such as Apache or IIS. (The built-in CherryPy web server can also be used.)

Database: Major databases are supported, including MySQL, Oracle, SQL Server, Sybase, Access and PostgreSQL. In theory, any database with a Python library should work.

Python: PyWrapper is developed in the Python programming language. (The latest version from the SVN code repository works with Python version 2.5.)

Apache, MySQL and Python are open source software, free to use - even for commercial products.
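The remark that any database with a Python library should work rests on Python's common database interface (DB-API 2.0, PEP 249): drivers expose the same connect/cursor/execute pattern. A minimal sketch using the standard-library sqlite3 driver as a stand-in follows. The table and its rows are invented sample data, and PyWrapper's own database layer is not shown here.

```python
import sqlite3


def fetch_names(connection, genus):
    """Run a parameterized query through the generic DB-API cursor interface."""
    cursor = connection.cursor()
    # sqlite3 uses "?" placeholders; other drivers may use "%s" (see paramstyle).
    cursor.execute(
        "SELECT accession_number, species FROM accession WHERE genus = ? "
        "ORDER BY accession_number",
        (genus,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, sample data only
conn.execute("CREATE TABLE accession (accession_number TEXT, genus TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO accession VALUES (?, ?, ?)",
    [("A1", "Hordeum", "vulgare"), ("A2", "Triticum", "aestivum")],
)
print(fetch_names(conn, "Hordeum"))  # [('A1', 'vulgare')]
```

Switching to another database mostly means importing a different driver module and adjusting the connection arguments and placeholder style; the cursor calls stay the same.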


Installation

http://trac.pywrapper.org/pywrapper/wiki/InstallationGuide

1. Download the latest PyWrapper3 installer. Use SVN export or checkout for Python 2.5 support.

2. Uncompress to a folder of your choice.
Example: “/usr/local/pywrapper3/”
Example: “C:\pywrapper\”

Local installation: If you have a Subversion client installed, you may use the automatic installer. (A local Python and its libraries are installed to your pywrapper folder.)

prompt$ svn export svn://svn.pywrapper.org:80/pywrapper/trunk pywrapper
prompt$ cd pywrapper/tools
prompt$ /bin/sh install.sh

This requires a bash shell, and probably a Unix-like system such as FreeBSD, Linux or Mac OS X.

3. Execute pywrapper/setup.py.
Example: prompt$ python setup.py (Mac OS X, Linux)
On Windows, locate setup.py and double-click it.


Start standalone server

Execute start_server.py (default port is 8080).

prompt$ cd webapp/
prompt$ ./start_server.py 8088 (example to start on port 8088)

On a Windows system you may do this in an MS-DOS window (or double-click the file, if you accept the default port).

Some messages will pass across your screen. Please be patient, this could take a minute. Wait for the message “start server …” and then find PyWrapper at:

http://localhost:8088/pywrapper
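Instead of watching the terminal for the start message, you can probe the port. The following is a small sketch and not part of PyWrapper: the function name is invented, and the default port simply follows the 8088 example above.

```python
import socket


def server_is_listening(host="localhost", port=8088, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connection, timeout, or unresolvable host.
        return False


if __name__ == "__main__":
    print("PyWrapper reachable:", server_is_listening())
```

A successful connection only shows that something is listening on the port; opening the URL in a browser is still the real test.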


Configuration

After successful installation, you will need to configure your data provider. Follow the instructions from the PyWrapper documentation web page to configure.

Data sources. If you provide several datasets or several databases, each is configured as an individual data source (dsa).

Database connection. For PyWrapper to access your database.

Database structure. Define the relevant database tables, the primary keys and foreign keys.

Data model. Map your database model to the standard represented by the XML Schemas you choose.

http://trac.pywrapper.org/pywrapper/wiki/Documentation



Screen examples

PyWrapper comes with a graphical web based configuration tool

For more information and more screen dumps from the configuration of PyWrapper, see: http://circa.gbif.net/Public/irc/gbif/nodes/library?l=/meetings/2007_10_amsterdam/tapir_pywrapper3/_EN_1.0_&a=i


TapirLink 0.6.1


Home: http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink
Primary developers: Renato De Giovanni, Dave Vieglais
Download: http://sourceforge.net/project/showfiles.php?group_id=38190

Uncompress the PHP source code.
E.g.: /usr/local/tapir/tapirlink

Set read permissions on all directories, and write permissions on cache, config, log and statistics.

Mount the admin and www directories for your web server.

Example: Apache "httpd.conf"

Alias /tapirlink "/usr/local/tapir/tapirlink/www"
Alias /tapirlink-admin "/usr/local/tapir/tapirlink/admin"
<Location /tapirlink>
    Order allow,deny
    Allow from all
</Location>
<Location /tapirlink-admin>
    Order allow,deny
    Allow from all
</Location>
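The permission requirements can be checked with a short script. This is a sketch under two assumptions: the four writable directories sit directly under the TapirLink folder, and the script runs as the same user as the web server (os.access reports permissions for the current user only). The function name is invented for this example.

```python
import os

WRITABLE_DIRS = ("cache", "config", "log", "statistics")


def permission_problems(root):
    """List (path, problem) pairs for directories lacking required access."""
    problems = []
    # Collect the root and every subdirectory that os.walk can see.
    to_check = [root]
    for dirpath, dirnames, _filenames in os.walk(root):
        to_check.extend(os.path.join(dirpath, d) for d in dirnames)
    for path in to_check:
        if not os.access(path, os.R_OK | os.X_OK):
            problems.append((path, "not readable"))
    for name in WRITABLE_DIRS:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            problems.append((path, "missing"))
        elif not os.access(path, os.W_OK):
            problems.append((path, "not writable"))
    return problems


if __name__ == "__main__":
    for path, problem in permission_problems("/usr/local/tapir/tapirlink"):
        print(problem + ":", path)
```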


TapirLink 0.6.1

Step 1: Describe your new resource

Start by adding a new resource: http://localhost/tapirlink-admin/


TapirLink 0.6.1

Step 2: Data source, database connection

Step 3: Table structure


TapirLink 0.6.1

Step 4: Filter

Step 5: Select mapping standards to use


TapirLink 0.6.1

Step 5b: Mapping data schema (ABCD 2.06)

etc…


TapirLink 0.6.1

Step 5c: Mapping data concepts (Darwin Core)

etc…

Step 5d: Remember that DwC has an extension for geospatial descriptors

etc…
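Concept mapping amounts to pairing each local database column with a concept of the chosen standard. In the sketch below the column names are invented, and the `dwc:` and `geo:` prefixes are just labels used here to separate Darwin Core elements from those of its geospatial extension; they are not TapirLink identifiers.

```python
# Concept -> local column. All column names are hypothetical.
CONCEPT_MAPPING = {
    "dwc:ScientificName": "taxon_name",
    "dwc:Country": "origin_country",
    "geo:DecimalLatitude": "lat",
    "geo:DecimalLongitude": "lon",
}


def map_row(row, mapping=CONCEPT_MAPPING):
    """Translate one database row (a dict) into concept -> value pairs.

    Concepts whose column is absent or NULL (None) are left out.
    """
    return {
        concept: row[column]
        for concept, column in mapping.items()
        if row.get(column) is not None
    }


row = {"taxon_name": "Hordeum vulgare", "origin_country": "SWE", "lat": None}
print(map_row(row))
# {'dwc:ScientificName': 'Hordeum vulgare', 'dwc:Country': 'SWE'}
```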


TapirLink 0.6.1

Step 6: Settings

New resource successfully configured


TapirLink 0.6.1

Test the resource with the client form: http://localhost/tapirlink/tapir_client.php

The XML client form is very useful for understanding exactly how the wrapper software works!


Service interface


EXAMPLE OF A SERVICE REQUEST

All exchanged data is formatted with XML tags.


EXAMPLE OF A SERVICE RESPONSE

...


EXAMPLE TAPIR SERVICE REQUEST
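Besides XML request documents, the TAPIR specification also defines key-value-pair (KVP) requests sent as URL parameters, with the operation selected by the `op` parameter. The sketch below only builds such URLs. The endpoint shown is a made-up example for a local TapirLink resource, so take the real one from your own installation.

```python
from urllib.parse import urlencode

TAPIR_OPERATIONS = ("metadata", "capabilities", "inventory", "search", "ping")


def tapir_kvp_url(endpoint, operation, **params):
    """Build a TAPIR key-value-pair request URL for the given operation."""
    if operation not in TAPIR_OPERATIONS:
        raise ValueError(f"unknown TAPIR operation: {operation!r}")
    query = {"op": operation, **params}
    return f"{endpoint}?{urlencode(query)}"


# Hypothetical endpoint for a local TapirLink resource named "demo".
endpoint = "http://localhost/tapirlink/tapir.php/demo"
print(tapir_kvp_url(endpoint, "ping"))
# http://localhost/tapirlink/tapir.php/demo?op=ping
```

A ping or capabilities request is a cheap first test of a new provider before attempting inventory or search requests, which need further parameters described in the specification.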


EXAMPLE TAPIR SERVICE RESPONSE

singer:/sourcename
singer:/taxonomy/genus
singer:/taxonomy/species
singer:/taxonomy/subspecies
singer:/holding/ID
singer:/holding/name
singer:/origin/collecting/countrysource
singer:/origin/collecting/countrysourceID
singer:/status/biologicalstatus
singer:/status/biologicalstatusID

...


EXAMPLE TAPIR SERVICE SEARCH REQUEST


EXAMPLE TAPIR SERVICE SEARCH RESPONSE


EXAMPLE OF OAI-PMH SERVICE REQUEST

http://an.oa.org/OAI-script?verb=GetRecord&identifier=oai:arXiv.org:hep-th/9901001&metadataPrefix=oai_dc

OAI-PMH requests are expressed as HTTP requests.

OAI-PMH requests must be submitted using either the HTTP GET or POST methods.
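Because an OAI-PMH request is just a URL with a verb and its arguments, it can be assembled with the standard library. This sketch rebuilds the example request above; the function name is an invention for illustration. Note that urlencode percent-encodes the ":" and "/" in the identifier, which a server decodes back to the same value.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL from a verb and its arguments."""
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


url = oai_pmh_url(
    "http://an.oa.org/OAI-script",
    "GetRecord",
    identifier="oai:arXiv.org:hep-th/9901001",
    metadataPrefix="oai_dc",
)
print(url)
```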






EXAMPLE OF OAI-PMH SERVICE RESPONSE

OAI-PMH responses are formatted as HTTP responses, with the Content-Type text/xml.
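Since the response body is XML, a client can read it with any XML parser. The sketch below parses a hand-written, trimmed sample of a GetRecord response (the dates are invented and the metadata part is omitted) and extracts the record identifier. The namespace is the one defined by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response, trimmed for illustration (metadata omitted).
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-09-01T12:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:hep-th/9901001"
           metadataPrefix="oai_dc">http://an.oa.org/OAI-script</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:hep-th/9901001</identifier>
        <datestamp>1999-01-01</datestamp>
      </header>
    </record>
  </GetRecord>
</OAI-PMH>"""


def record_identifiers(xml_text):
    """Return the identifiers of all record headers in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


print(record_identifiers(SAMPLE_RESPONSE))  # ['oai:arXiv.org:hep-th/9901001']
```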


OAI-PMH PROTOCOL, METADATA FORMATS

Request types (verb):

• Identify
• ListMetadataFormats
• ListSets
• GetRecord
• ListIdentifiers
• ListRecords

For purposes of interoperability, the metadataPrefix 'oai_dc' is reserved for Dublin Core.

Communities adopt their own metadataPrefixes for their own metadata formats. Relevant formats/schemas for Biodiversity Informatics are Darwin Core and ABCD.
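Each verb expects particular arguments, which a client can check before sending a request. The table in this sketch lists the required arguments per verb as given in the OAI-PMH 2.0 specification, leaving aside the alternative form of the list requests that carries only a resumptionToken. The function name is invented for illustration.

```python
# Required arguments per verb (resumptionToken-only requests not covered).
REQUIRED_ARGUMENTS = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "GetRecord": {"identifier", "metadataPrefix"},
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
}


def missing_arguments(verb, arguments):
    """Return the sorted names of required arguments absent from a request."""
    if verb not in REQUIRED_ARGUMENTS:
        raise ValueError(f"not an OAI-PMH verb: {verb!r}")
    return sorted(REQUIRED_ARGUMENTS[verb] - set(arguments))


print(missing_arguments("GetRecord", {"identifier": "oai:arXiv.org:hep-th/9901001"}))
# ['metadataPrefix']
```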


Why share data?


[http://data.gbif.org/]

[http://data.gbif.org/datasets/resource/1487]

http://data.gbif.org/datasets/network/2





GBIF PGR Network 2 [http://data.gbif.org/datasets/network/2]






Distributed network

The image is from the BioCASE web site


Decentralized network

EURISCO (Europe)
NordGen (Northern Europe)
IPK Gatersleben (Germany)
IHAR (Poland)
(Other European gene banks...)
SINGER (CGIAR)
(CGIAR International Future Harvest gene banks...)
USDA GRIN (USA)
(USDA ARS National Germplasm Repositories...)
WUR CGN (Netherlands)
GBIF (Global Biodiversity Information Facility)
USER
ALIS (Accession Level Information System)
Web Services
MCPD
Svalbard Global Seed Vault (Safe Backup)


Crop Wild Relatives

ARM
LKA
BOL
MDG
UZB

National datasets are shared with the central CWR data index.

The national datasets, as well as access to other international datasets, are provided from the CWR data portal.

EURISCO

SINGER

http://www.cropwildrelatives.org/
http://www.bioversityinternational.org/
http://singer.grinfo.net/
http://eurisco.ecpgr.org/
http://data.gbif.org/


Data portal example

http://wwwdev.ngb.se/portal/index.php?scope=demo

Outlook

• The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community.

• Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work.

• Use of GBIF technology and integration of PGR collection data into GBIF allows PGR users to search PGR collections and other biodiversity collections simultaneously, and to get access to the data (and possibly the material) of relevant biodiversity collections.

• New data portals for a specific crop, a regional thematic network or a similar subset of the total global biodiversity datasets can be established with rather little effort. This requires only that all the relevant datasets are provided by GBIF-compatible web services (like the BioCASE PyWrapper).


• Participation and the sharing of your institute's datasets with global and national biodiversity projects
  – is important for your public and scientific visibility,
  – promotes the use (usefulness) of your data,
  – and is ultimately important for the continued funding of your institutional activities.


Special thanks to

• Bioversity International [http://www.bioversityinternational.org]

• GBIF, Global Biodiversity Information Facility [http://www.gbif.org]

• BioCASE, The Biological Collection Access Service for Europe. [http://www.biocase.org]

• TDWG, Biodiversity Information Standards [http://www.tdwg.org]


Special thanks to

• BioCASE and PyWrapper3 software
  – Markus Döring
  – Javier de la Torre

• DiGIR and TapirLink software
  – Renato De Giovanni
  – Dave Vieglais


Thank you for listening!
