Data exchange alternatives, SBIS conference in Stockholm (2008)
DESCRIPTION
Biodiversity Data Publishing Software for the Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008). Stockholm, 3rd December 2008. Dag Endresen (Bioversity/NordGen).
TRANSCRIPT
Biodiversity Data Provider Software
Hands-on exercises with TAPIR
Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008)
Dag Terje Filip Endresen, Nordic Genetic Resources Center (Sweden) / Bioversity International (Italy)
2
Fallacies of Distributed Computing
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
This list of fallacies came about at Sun Microsystems around 1994.
3
TAPIR
Cartoon by Sasha Kopf (Creative Commons)
4
TAPIR
• TAPIR: TDWG Access Protocol for Information Retrieval.
• During the 2004 TDWG meeting in Christchurch, NZ, work started on a unified protocol, which was named TAPIR.
• TAPIR is based on the protocols of two data provider software packages, BioCASE and DiGIR.
5
Provider software, wrappers
• DiGIR (2002, not active)
– http://digir.sourceforge.net
• BioCASE (2003, PyWrapper)
– http://www.biocase.org
• PyWrapper3 (2006, not active)
– http://trac.pywrapper.org/
• TapirLink (2007)
– http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink
• GBIF Provider Toolkit (2009)
– http://code.google.com/p/gbif-providertoolkit
6
BioCASE 2.5.0RC
• The BioCASE provider software is a product of the EU-funded BioCASE project (2001-2004).
• Developed at BGBM in Berlin.
• Last updated in April 2008, with support for Python version 2.5 and fewer required external libraries.
• Implement the BioCASE provider to share data as ABCD 2.06.
http://www.biocase.org
7
1. Make sure you have Python 2.5 installed
(command line: python -V)
2. Download the latest provider software from http://www.biocase.org
3. Uncompress the BioCASE provider software to a folder on your system [provider_software_2.5.0RC.tar.gz]
(tar -xzvf provider_...tar.gz)
4. Run setup.py (python setup.py)
5. Configure your web server to mount biocase/www as http://localhost/biocase/
Hint: You will find an example for httpd.conf as the last terminal output from running setup.py
8
6. Visit the library test page: http://localhost/biocase/utilities/testlibs.cgi
6a. Download the latest 4Suite from http://4suite.org/, then uncompress and install it [4Suite-XML-1.0.2.tar.bz2]
6b. Install additional Python libraries, including the desired database driver. For each Python package: (python setup.py install)
6c. Graphviz is useful for visualizing the database table structure.
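The library test page reports which Python libraries are present. A rough command-line equivalent can be sketched as below. This is an illustration only, written for a current Python 3 interpreter rather than the Python 2.5 used on the slides; it is not how testlibs.cgi is implemented, and the module list is a hypothetical placeholder.

```python
import importlib.util
import sys


def is_importable(name):
    """Return True if the module `name` can be found by this interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return False


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be imported here."""
    return [name for name in module_names if not is_importable(name)]


if __name__ == "__main__":
    # Hypothetical list: replace it with the libraries your installation
    # needs, for example the Python driver for your database.
    wanted = ["xml.dom", "sqlite3", "some_database_driver"]
    print("Python version:", sys.version.split()[0])
    for name in missing_modules(wanted):
        print("missing:", name)
```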
9
7. Configuration
• Add datasource (dsa)
• Database connection
• Database table structure
• Mapping of data model to standard schema
10
8. Query form
The manual query form is a good way to see exactly how the wrapper software works: http://localhost/biocase/utilities/queryforms/qf_manual.cgi?dsa=sesto
12
PyWrapper3
Home: http://trac.pywrapper.org/
Primary developers: Markus Döring, Javier de la Torre
14/07/2008 - Development stalled
We are sorry to inform you that development of the TAPIR branch of PyWrapper has been stalled. The latest 3.1 alpha version is not stable and not recommended for production! (Message from the home page)
• PyWrapper 3.0.0 (latest stable version, requires Python 2.4)
• PyWrapper 3.1.0 alpha (development version, works with Python 2.5)
PyWrapper is tested and verified to work fine with Windows, Mac OS X and Linux.
13
Required configuration
Web server: Any CGI-compliant web server: Apache, IIS etc. (The built-in CherryPy web server can also be used.)
Database: Major databases are supported, including MySQL, Oracle, SQL Server, Sybase, Access, PostgreSQL. In theory, any database with a Python library should work.
Python: PyWrapper is developed in the Python programming language. (The latest version from the SVN code repository works with Python version 2.5.)
Apache, MySQL and Python are open source software, free to use - even for commercial products.
14
Installation
http://trac.pywrapper.org/pywrapper/wiki/InstallationGuide
1. Download the latest PyWrapper3 installer. Use SVN export or checkout for Python 2.5 support.
2. Uncompress to a folder of your choice.
Example: “/usr/local/pywrapper3/”
Example: “C:\pywrapper\”
Local installation: If you have a Subversion client installed, you may use the automatic installer. (A local Python and its libraries are installed into your pywrapper folder.)
prompt$ svn export svn://svn.pywrapper.org:80/pywrapper/trunk pywrapper
prompt$ cd pywrapper/tools
prompt$ /bin/sh install.sh
This requires a bash shell, and probably a Unix-like system such as FreeBSD, Linux or Mac OS X.
3. Execute pywrapper/setup.py
Example: prompt$ python setup.py (Mac OS X, Linux)
On Windows, locate setup.py and double-click it.
15
Start standalone server
Execute start_server.py (default port is 8080)
prompt$ cd webapp/
prompt$ ./start_server.py 8088 (example to start on port 8088)
On a Windows system you may do this in an MS-DOS window (or double-click the file, if you accept the default port).
Some messages will scroll across your screen. Please be patient, this could take a minute. Wait for the message “start server …” and then find PyWrapper at:
http://localhost:8088/pywrapper
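Because the server can take a minute to start, a script that depends on it may want to wait until the port accepts connections. The sketch below shows one way to do that with the standard library; `wait_for_port` is a hypothetical helper, not part of PyWrapper, and it only checks that something is listening on the port, not that PyWrapper itself answers.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Retries every `interval` seconds and returns False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # 8088 matches the example port above; adjust it to your own setting.
    if wait_for_port("localhost", 8088, timeout=5.0):
        print("A server is accepting connections on port 8088")
    else:
        print("No server answered on port 8088")
```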
16
Configuration
After successful installation, you will need to configure your data provider. Follow the instructions from the PyWrapper documentation web page to configure.
Data sources. If you provide several datasets or databases, each is configured as an individual data source (dsa).
Database connection. The connection settings PyWrapper needs to access your database.
Database structure. Define the relevant database tables, the primary keys and foreign keys.
Data model. Map your database model to the standard represented by the XML Schemas you choose.
http://trac.pywrapper.org/pywrapper/wiki/Documentation
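The idea behind the data model mapping can be illustrated with a few lines of Python: each concept of the standard is tied to a column in the local database, and records are then returned under the standard names. This is a sketch under stated assumptions, not how PyWrapper stores or applies its configuration. The table name, column names and the single record are made up; only the Darwin Core term names on the left of the mapping come from the standard.

```python
import sqlite3

# Hypothetical local column names, mapped to Darwin Core terms.
CONCEPT_MAPPING = {
    "catalogNumber": "accession_no",
    "scientificName": "taxon_name",
    "country": "origin_country",
}
TABLE = "accessions"  # hypothetical table name


def fetch_mapped_records(connection):
    """Read local rows and return them keyed by the standard concept names."""
    concepts = list(CONCEPT_MAPPING)
    columns = ", ".join(CONCEPT_MAPPING[concept] for concept in concepts)
    rows = connection.execute(f"SELECT {columns} FROM {TABLE}").fetchall()
    return [dict(zip(concepts, row)) for row in rows]


def demo_connection():
    """Build an in-memory database holding one made-up record."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE accessions ("
        "accession_no TEXT PRIMARY KEY, taxon_name TEXT, origin_country TEXT)"
    )
    connection.execute(
        "INSERT INTO accessions VALUES (?, ?, ?)",
        ("DEMO-1", "Hordeum vulgare", "SE"),
    )
    return connection


if __name__ == "__main__":
    for record in fetch_mapped_records(demo_connection()):
        print(record)
```

The wrapper software does the same job in a more general way: the primary and foreign keys defined in the database structure step let it join several tables before the mapping is applied.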
17
Screen examples
PyWrapper comes with a graphical web-based configuration tool.
For more information and more screenshots of the PyWrapper configuration, see: http://circa.gbif.net/Public/irc/gbif/nodes/library?l=/meetings/2007_10_amsterdam/tapir_pywrapper3/_EN_1.0_&a=i
18
TapirLink 0.6.1
19
Uncompress PHP source code
E.g.: /usr/local/tapir/tapirlink
Home: http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink
Primary developers: Renato De Giovanni, Dave Vieglais
Download: http://sourceforge.net/project/showfiles.php?group_id=38190
Read permissions on all directories
Write permissions on cache, config, log, statistics
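The permission requirements can be checked with a short script before opening the admin pages. The sketch below is an illustration only and not part of TapirLink. It assumes the four writable directories sit directly under the installation folder, and it tests permissions for the user running the script, which may differ from the account your web server runs under.

```python
import os

# Directory names taken from the slide; their location directly under the
# installation folder is an assumption.
WRITABLE_DIRS = ("cache", "config", "log", "statistics")


def permission_problems(root):
    """Return human-readable problems found under the TapirLink folder `root`."""
    if not os.path.isdir(root):
        return [f"{root}: not a directory"]
    problems = []
    if not os.access(root, os.R_OK | os.X_OK):
        problems.append(f"{root}: not readable")
    # os.walk silently skips unreadable directories, so each directory is
    # checked from its parent's listing.
    for current, dirs, _files in os.walk(root):
        for name in dirs:
            path = os.path.join(current, name)
            if not os.access(path, os.R_OK | os.X_OK):
                problems.append(f"{path}: not readable")
    for name in WRITABLE_DIRS:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            problems.append(f"{path}: missing")
        elif not os.access(path, os.W_OK):
            problems.append(f"{path}: not writable")
    return problems


if __name__ == "__main__":
    # Example installation path from the slide.
    for problem in permission_problems("/usr/local/tapir/tapirlink"):
        print(problem)
```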
Mount admin and www directory for your web server.
Example: Apache “httpd.conf”
Alias /tapirlink "/usr/local/tapir/tapirlink/www"
Alias /tapirlink-admin "/usr/local/tapir/tapirlink/admin"
<Location /tapirlink>
    Order allow,deny
    Allow from all
</Location>
<Location /tapirlink-admin>
    Order allow,deny
    Allow from all
</Location>
20
Step 1: Describe your new resource
Start by adding a new resource at http://localhost/tapirlink-admin/
21
Step 2: Data source, database connection
Step 3: Table structure
22
Step 4: Filter
Step 5: Select mapping standards to use
23
Step 5b: Mapping data schema (ABCD 2.06)
etc…
24
Step 5c: Mapping data concepts (Darwin Core)
etc…
Step 5d: Remember that DwC has an extension for geospatial descriptors
etc…
25
Step 6: Settings
New resource successfully configured
26
Test resource with client form: http://localhost/tapirlink/tapir_client.php
The XML client form is a good way to see exactly how the wrapper software works!
27
Service interface
28
EXAMPLE OF A SERVICE REQUEST
All exchanged data is formatted with XML tags.
29
EXAMPLE OF A SERVICE RESPONSE
...
30
EXAMPLE TAPIR SERVICE REQUEST
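As a hedged illustration of what a TAPIR request can look like, the sketch below assembles a request URL in the key-value-pair form of the protocol, where the `op` parameter names the operation. The access point URL and the resource name `myresource` are assumptions made for this example, and `tapir_url` is a hypothetical helper, not part of any provider software; check the TAPIR specification for the parameters each operation accepts.

```python
from urllib.parse import urlencode

# The five operations defined by TAPIR.
TAPIR_OPERATIONS = ("metadata", "capabilities", "inventory", "search", "ping")


def tapir_url(endpoint, operation, **parameters):
    """Build a key-value-pair TAPIR request URL for the given operation."""
    if operation not in TAPIR_OPERATIONS:
        raise ValueError(f"not a TAPIR operation: {operation}")
    query = {"op": operation}
    query.update(parameters)
    return endpoint + "?" + urlencode(query)


if __name__ == "__main__":
    # Hypothetical access point for a resource called "myresource".
    endpoint = "http://localhost/tapirlink/tapir.php/myresource"
    print(tapir_url(endpoint, "ping"))
    print(tapir_url(endpoint, "capabilities"))
```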
31
EXAMPLE TAPIR SERVICE RESPONSE
singer:/sourcename
singer:/taxonomy/genus
singer:/taxonomy/species
singer:/taxonomy/subspecies
singer:/holding/ID
singer:/holding/name
singer:/origin/collecting/countrysource
singer:/origin/collecting/countrysourceID
singer:/status/biologicalstatus
singer:/status/biologicalstatusID
...
32
EXAMPLE TAPIR SERVICE SEARCH REQUEST
33
EXAMPLE TAPIR SERVICE SEARCH RESPONSE
34
EXAMPLE OF OAI-PMH SERVICE REQUEST
http://an.oa.org/OAI-script?verb=GetRecord&identifier=oai:arXiv.org:hep-th/9901001&metadataPrefix=oai_dc
OAI-PMH requests are expressed as HTTP requests.
OAI-PMH requests must be submitted using either the HTTP GET or POST methods.
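The request above can be assembled programmatically. This is a minimal sketch using only the Python standard library; `oai_request_url` is a hypothetical helper name. Note that `urlencode` percent-encodes the reserved characters `:` and `/` in the identifier, so the resulting URL looks slightly different from the hand-written one while carrying the same arguments.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL from a verb and its arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    url = oai_request_url(
        "http://an.oa.org/OAI-script",
        "GetRecord",
        identifier="oai:arXiv.org:hep-th/9901001",
        metadataPrefix="oai_dc",
    )
    print(url)
    # Decoding the query string recovers the original argument values.
    print(parse_qs(urlsplit(url).query))
```

For a POST request, the same `urlencode(query)` string would be sent as the request body instead of being appended to the URL.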
35
EXAMPLE OF OAI-PMH SERVICE RESPONSE
OAI-PMH responses are formatted as HTTP responses, with the Content-Type text/xml.
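A client typically checks the Content-Type header and then reads the fields it needs from the XML body. The sketch below shows this with the standard library. The sample response is made up for illustration and is not output from a real repository; the namespace URIs are those of OAI-PMH 2.0, its oai_dc format and Dublin Core.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response body, shortened to the parts used below.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-12-03T10:00:00Z</responseDate>
  <request verb="GetRecord">http://an.oa.org/OAI-script</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:demo-1</identifier>
        <datestamp>2008-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Demo record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def is_xml_content_type(header_value):
    """Return True if a Content-Type header value announces text/xml."""
    return header_value.split(";")[0].strip().lower() == "text/xml"


def record_summary(xml_bytes):
    """Extract the record identifier and Dublin Core title from a response."""
    root = ET.fromstring(xml_bytes)
    identifier = root.findtext(
        "oai:GetRecord/oai:record/oai:header/oai:identifier",
        namespaces=NAMESPACES,
    )
    title = root.findtext(".//dc:title", namespaces=NAMESPACES)
    return {"identifier": identifier, "title": title}


if __name__ == "__main__":
    if is_xml_content_type("text/xml; charset=UTF-8"):
        print(record_summary(SAMPLE_RESPONSE.encode("utf-8")))
```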
36
OAI-PMH PROTOCOL, METADATA FORMATS
Request types (verb):
Identify
ListMetadataFormats
ListSets
GetRecord
ListIdentifiers
ListRecords
For purposes of interoperability, the metadataPrefix 'oai_dc' is reserved for Dublin Core.
Communities adopt their own metadataPrefixes for their own metadata formats. Relevant formats/schemas for Biodiversity Informatics are Darwin Core and ABCD.
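Each verb expects a particular set of arguments, which a client can check before sending a request. The sketch below encodes the required arguments as given in the OAI-PMH 2.0 specification; it is a simplified illustration that ignores optional arguments such as `from`, `until` and `set`, and `check_request` is a hypothetical helper name. Verify the table against the specification before relying on it.

```python
# Required arguments per verb (optional arguments are not listed).
REQUIRED_ARGUMENTS = {
    "Identify": (),
    "ListMetadataFormats": (),
    "ListSets": (),
    "GetRecord": ("identifier", "metadataPrefix"),
    "ListIdentifiers": ("metadataPrefix",),
    "ListRecords": ("metadataPrefix",),
}

# List requests that may be continued with a resumptionToken.
RESUMABLE_VERBS = {"ListSets", "ListIdentifiers", "ListRecords"}


def check_request(verb, arguments):
    """Return a list of problems with an OAI-PMH request; empty means OK."""
    if verb not in REQUIRED_ARGUMENTS:
        return [f"unknown verb: {verb}"]
    # A resumptionToken continues an earlier list request and stands in
    # for the other arguments, so nothing else is required alongside it.
    if verb in RESUMABLE_VERBS and "resumptionToken" in arguments:
        return []
    return [
        f"missing argument: {name}"
        for name in REQUIRED_ARGUMENTS[verb]
        if name not in arguments
    ]


if __name__ == "__main__":
    print(check_request("GetRecord", {"identifier": "oai:arXiv.org:hep-th/9901001"}))
    print(check_request("ListRecords", {"metadataPrefix": "oai_dc"}))
```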
37
Why share data?
38
[http://data.gbif.org/]
[http://data.gbif.org/datasets/resource/1487]
39
GBIF PGR Network 2
[http://data.gbif.org/datasets/network/2]
40
Distributed network
The image is from the BioCASE web site
41
Decentralized network
EURISCO (Europe)
NordGen (Northern Europe)
IPK Gatersleben (Germany)
IHAR (Poland)
(Other European gene banks...)
SINGER (CGIAR)
(CGIAR International Future Harvest gene banks...)
USDA GRIN (USA)
(USDA ARS National Germplasm Repositories...)
WUR CGN (Netherlands)
GBIF (Global Biodiversity Information Facility)
USER
ALIS (Accession Level Information System)
Web Services
MCPD
Svalbard Global Seed Vault (Safe Backup)
42
Crop Wild Relatives
ARM
LKA
BOL
MDG
UZB
National datasets are shared with the central CWR data index.
The national datasets as well as access to other international datasets are provided from the CWR data portal.
EURISCO
SINGER
http://www.cropwildrelatives.org
43
Data portal example
44
http://wwwdev.ngb.se/portal/index.php?scope=demo
49
Outlook
• The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community.
• Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work.
• Use of GBIF technology and integration of PGR collection data into GBIF allows PGR users to simultaneously search PGR collections and other biodiversity collections, and to get access to the data (and possibly the material) of relevant biodiversity collections.
• New data portals for a specific crop, a regional thematic network or a similar subset of the global biodiversity datasets can be established with relatively little effort. This requires only that all the relevant datasets are provided by GBIF-compatible web services (like the BioCASE PyWrapper).
50
• Participating in global and national biodiversity projects, and sharing your institute's datasets with them,
• is important for your public and scientific visibility,
• promotes the use (and usefulness) of your data,
• and ultimately supports the continued funding of your institutional activities.
51
Special thanks to
• Bioversity International [http://www.bioversityinternational.org]
• GBIF, Global Biodiversity Information Facility [http://www.gbif.org]
• BioCASE, The Biological Collection Access Service for Europe. [http://www.biocase.org]
• TDWG, Biodiversity Information Standards [http://www.tdwg.org]
52
Special thanks to
• BioCASE and PyWrapper3 software
– Markus Döring
– Javier de la Torre
• DiGIR and TapirLink software
– Renato De Giovanni
– Dave Vieglais
53
Thank you for listening!