Richard White
Biodiversity Informatics Projects
Thoughts
• Role of biodiversity data in bioinformatics
– assisting with organising and retrieving bioinformatic (molecular) data
– a separate area with different users (taxonomy, ecology, conservation, resource management …)
• Demand from users for taxonomic and species diversity information on the Web
• Pressure on the taxonomic community to deliver
• Demand for more sophisticated use of available data: interoperability, meaning online analysis rather than just browsing
Assembling biodiversity information sources
Delivering species diversity information by:
• assembling,
• merging and
• linking databases, and
• publishing on the Web,
with special emphasis on linking
Issues in assembling and linking biodiversity information sources
• Assembling a website (ERMS)
• Assembling databases by merging (ILDIS)
• Linking on-line databases through a gateway (Species 2000 and SPICE)
• Onward links to related information
• Checking the reliability of links (LITCHI)
• Intelligent linking
• Persistent identifiers
Assembling species databases
First of all, before we start merging and linking databases, let’s assemble a database from scratch:
• ERMS (European Register of Marine Species)
• Now at www.marbef.org/data/erms.php
ERMS
Incoming data
• Approximately 100 separate lists for different taxonomic groups
• Mostly compiled as spreadsheets
• Scientific names, synonyms, geography (at least Atlantic or Mediterranean)
• Some optional fields
• Objective to create a book and a website, partially supported by a database
List conversion
The conversion was carried out in several stages:
• Excel spreadsheets were exported to text files
• Tab-delimited text files were imported into a client-server database (MySQL)
• Database query results are passed through templates to generate either RTF (for the printed publication) or HTML (for the website)
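The three conversion stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ERMS code: the standard-library `sqlite3` module stands in for MySQL, and the sample rows, column names (`genus`, `species`, `authority`, `region`) and the two templates are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited text exported from one contributor's spreadsheet.
TSV_EXPORT = (
    "genus\tspecies\tauthority\tregion\n"
    "Caretta\tcaretta\t(Linnaeus, 1758)\tAtlantic\n"
    "Chelonia\tmydas\t(Linnaeus, 1758)\tMediterranean\n"
)

# One template per output format; the field names are illustrative only.
TEMPLATES = {
    "html": "<li><i>{genus} {species}</i> {authority} ({region})</li>",
    "rtf": r"{{\i {genus} {species}}} {authority} ({region})\par",
}


def load(tsv_text: str) -> sqlite3.Connection:
    """Import tab-delimited text into a database (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxon (genus TEXT, species TEXT, authority TEXT, region TEXT)")
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    conn.executemany("INSERT INTO taxon VALUES (:genus, :species, :authority, :region)", rows)
    return conn


def render(conn: sqlite3.Connection, fmt: str) -> list[str]:
    """Pass query results through the template for the requested output format."""
    template = TEMPLATES[fmt]
    cursor = conn.execute(
        "SELECT genus, species, authority, region FROM taxon ORDER BY genus, species"
    )
    return [
        template.format(genus=g, species=s, authority=a, region=r) for g, s, a, r in cursor
    ]


if __name__ == "__main__":
    database = load(TSV_EXPORT)
    print("\n".join(render(database, "html")))
```

Keeping one database and one template per output is what lets the same query feed both the printed book and the website.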
Variations on a theme
• Fields may be combined or separated, e.g. genus species authority date
• Higher taxa may be:
– repeated in fields of the species record
– given once in separate preceding records, in various different formats
• Synonyms may be:
– in a separate field of the species record, or mixed with other remarks, with various delimiters and separators
– in separate records, linked by code or by name, or even abbreviated
– implied, e.g. Genus1 specname (Smith as Genus2)
• Geographical information is often free text
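Normalising these variations is mostly string parsing. The sketch below handles three of the cases listed, under assumptions that real lists will break: a combined field is taken to be `genus species authority-with-date`, synonym fields are assumed to use `;`, `|` or `=` as delimiters, and `Genus1 specname (Smith as Genus2)` is read as Smith having recorded the species under Genus2, implying the synonym `Genus2 specname`. All function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    genus: str
    species: str
    authority: str
    year: Optional[int]


def split_combined(field: str) -> ParsedName:
    """Split a combined 'genus species authority date' field (needs at least two words)."""
    parts = field.split(maxsplit=2)
    genus, species = parts[0], parts[1]
    authority = parts[2] if len(parts) > 2 else ""
    # Assume the last plausible four-digit year in the authority is the date.
    years = re.findall(r"\b(1[5-9]\d\d|20\d\d)\b", authority)
    return ParsedName(genus, species, authority, int(years[-1]) if years else None)


def split_synonyms(text: str) -> list[str]:
    """Split a synonym field; ';', '|' and '=' are assumed delimiters."""
    return [s for s in re.split(r"\s*[;|=]\s*", text.strip()) if s]


# Implied synonym written as "Genus1 specname (Smith as Genus2)".
IMPLIED = re.compile(r"^(\S+)\s+(\S+)\s+\((.+?)\s+as\s+(\S+)\)$")


def implied_synonym(field: str) -> Optional[tuple[str, str]]:
    """Return (name as listed, implied synonym), or None if the pattern is absent."""
    match = IMPLIED.match(field)
    if match is None:
        return None
    genus, species, _author, other_genus = match.groups()
    return f"{genus} {species}", f"{other_genus} {species}"
```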
ERMS book page
Osteichthyes: brief checklist
Reptilia: full details
Taxonomic hierarchy for Reptilia
Merging versus linking
• Merging databases to create a single larger database
• Linking databases to create a distributed information system
Merging species databases
1 The original databases are physically copied into a new combined database.
2 The user interacts with the new combined database.
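Diagram: the original databases (Plants of Europe, Plants of Africa) are copied (1) into the combined database (Plants of the World), with which the user interacts (2).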
Linking
1 The user interacts with an access system which does not itself contain data.
2 When the user requests data, it is fetched from the appropriate database.
Diagram: the user interacts (1) with a Plants of the World access system, which fetches data (2) from Plants of Europe and Plants of Africa.
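The difference between the two approaches can be shown with plain Python dictionaries standing in for the regional databases. This is a toy sketch; the names `merge` and `Gateway` and the sample data are hypothetical. A merged copy is fixed at the moment of copying, while a gateway sees each source as it is when the request is made.

```python
from typing import Callable, Optional

# Hypothetical regional databases: species name -> distribution note.
PLANTS_OF_EUROPE = {"Lotus corniculatus": "Europe"}
PLANTS_OF_AFRICA = {"Abrus precatorius": "tropical Africa"}


def merge(*sources: dict[str, str]) -> dict[str, str]:
    """Merging: physically copy every source into one new combined database."""
    combined: dict[str, str] = {}
    for source in sources:
        combined.update(source)  # a later source overwrites an earlier entry for the same name
    return combined


class Gateway:
    """Linking: an access system that holds no data, only ways of fetching it."""

    def __init__(self, fetchers: list[Callable[[str], Optional[str]]]):
        self.fetchers = fetchers

    def lookup(self, name: str) -> Optional[str]:
        # Data is fetched from the member databases only when the user asks for it.
        for fetch in self.fetchers:
            result = fetch(name)
            if result is not None:
                return result
        return None
```

The trade-off follows directly: the merged database is self-contained and can be edited into a consistent whole, but it goes stale; the linked system is always current but depends on every member database being reachable and compatible.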
Assembling databases by merging
Now we have some databases, let’s build a bigger one by merging:
• ILDIS (International Legume Database and Information Service)
ILDIS
International Legume Database and Information Service
• International collaborative project
– 10 Regional Centres
– 30 Taxonomic Coordinators
• Its goals include
– building, maintaining and enhancing the ILDIS World Database of Legumes
– designing and providing services from it to users, including:
• ILDIS LegumeWeb
• via Species 2000
ILDIS World Database of Legumes v. 7.00
• Taxa: 19,500
– Species 15,500
– Subspecies 1,600
– Varieties 2,400
• Names: 39,500
– Accepted names 19,500
– Synonyms 19,000
ILDIS’s data model: core data
• A core taxonomic checklist, assembled from regional data sets and nearing completion, provides a consensus taxonomy: a unified taxonomic treatment or backbone on which other data can be hung
• Various kinds of additional data may be attached to this backbone (see later)
Features of ILDIS LegumeWeb
We’ll look at examples of the use of LegumeWeb, to show a couple of features:
• Two-stage access with “synonymic indexing”
• A gateway to external information: “onward links” (direct species name links) to further sources of information
User access to LegumeWeb: Step 1
• The user types in a name, which may be incomplete (or wrong!)
• LegumeWeb responds by showing a list of the species names which fit the user’s specification
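The slides do not say how LegumeWeb matches incomplete names. One simple rule, assumed here purely for illustration, treats each word the user types as a case-insensitive prefix of the corresponding word of a name, so `ab c` finds `Abrus cyaneus`. The function names and name list are hypothetical, and this rule does not catch misspelt names.

```python
def matches(query: str, name: str) -> bool:
    """True if each query word is a case-insensitive prefix of the corresponding name word."""
    query_words = query.lower().split()
    name_words = name.lower().split()
    if not query_words or len(query_words) > len(name_words):
        return False
    return all(word.startswith(prefix) for prefix, word in zip(query_words, name_words))


def search(query: str, names: list[str]) -> list[str]:
    """Step 1: every name, accepted or synonym, that fits the user's specification."""
    return sorted(name for name in names if matches(query, name))


NAMES = ["Abrus precatorius", "Abrus cyaneus", "Acacia senegal"]
```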
User access to LegumeWeb: Step 2
• The user chooses one of the species names provided (which may be a synonym or an accepted name)
– In this example, the user chooses Abrus cyaneus (a synonym for Abrus precatorius)
• LegumeWeb responds by showing a standard set of information about the chosen species
Synonymic indexing
• Automated synonymic indexing:
– synonym entered → accepted name found (name → taxon)
– taxon found → synonyms listed
• Types of synonyms:
– Unambiguous
– Ambiguous:
• pro parte
• homonyms
• misapplied names
• For ambiguous synonyms, an explanation is offered to the user
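Synonymic indexing can be modelled as a mapping from each name to the taxon or taxa it refers to. In this sketch (the data structure and function names are assumptions, not LegumeWeb's design) an unambiguous synonym such as *Abrus cyaneus* resolves to a single accepted name, while an ambiguous one, here the pro parte case of *Caesalpinia crista* that reappears in a later example, resolves to several taxa and produces an explanation for the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameUsage:
    taxon: str   # accepted name of the taxon this name points to
    status: str  # status of the name for that taxon: "accepted", "synonym", "pro parte", ...


# Hypothetical index: one name can point to more than one taxon.
INDEX: dict[str, list[NameUsage]] = {
    "Abrus precatorius": [NameUsage("Abrus precatorius", "accepted")],
    "Abrus cyaneus": [NameUsage("Abrus precatorius", "synonym")],
    "Caesalpinia bonduc": [NameUsage("Caesalpinia bonduc", "accepted")],
    "Caesalpinia crista": [
        NameUsage("Caesalpinia crista", "accepted"),
        NameUsage("Caesalpinia bonduc", "pro parte"),
    ],
}


def resolve(name: str) -> tuple[list[str], str]:
    """Name -> taxon: return the accepted name(s) and an explanation if ambiguous."""
    usages = INDEX.get(name, [])
    taxa = sorted({usage.taxon for usage in usages})
    if not taxa:
        return [], f"{name} not found"
    if len(taxa) == 1:
        return taxa, ""
    kinds = ", ".join(sorted({u.status for u in usages if u.taxon != name}))
    return taxa, f"{name} is ambiguous ({kinds}): it may refer to {' or '.join(taxa)}"


def synonyms_of(taxon: str) -> list[str]:
    """Taxon -> names: every other name that points to this taxon."""
    return sorted(
        name for name, usages in INDEX.items()
        if name != taxon and any(u.taxon == taxon for u in usages)
    )
```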
Assembling databases by linking
Now we have some biggish databases, let’s build something even bigger by linking databases together:
• Species 2000
– SPICE
– Species 2000 Europa
The Catalogue of Life (Species 2000)
• An international collaborative project to provide access to an authoritative and up-to-date checklist of all the world’s species
• A distributed array of Global Species Databases (GSDs) can be accessed through a Web gateway or Common Access System (CAS)
• The array of GSDs provides an index to a further range of information about each species, using onward links (see later)
• www.sp2000.org
Species 2000 organisation
Diagram, from top to bottom:
• Taxonomic hierarchy (or hierarchies)
• Species: the species index, made up of Global Species Databases (GSDs) and interim checklists
• Species information sources (SISs): regional faunas and floras, specialist or sectoral databases, web pages etc.
Architecture of Species 2000
Diagram, from top to bottom:
• User interface
• Data collector (CAS)
• Several wrappers, each in front of one GSD (Wrapper → GSD)
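The wrapper pattern in this architecture can be sketched in Python: each wrapper hides how its GSD stores data and answers in one common form, and the CAS simply forwards a request to every wrapper and merges the replies. Everything here runs in one process, whereas the real GSDs are distributed; the class names, the two storage formats and the sample GSDs are hypothetical.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Translates a CAS request into one GSD's native query and returns a common form."""

    def __init__(self, gsd_name: str):
        self.gsd_name = gsd_name

    @abstractmethod
    def search(self, fragment: str) -> list[dict[str, str]]:
        """Return matches as {'name', 'status', 'gsd'} dictionaries."""


class DictWrapper(Wrapper):
    """Hypothetical GSD held as a {name: status} mapping."""

    def __init__(self, gsd_name: str, records: dict[str, str]):
        super().__init__(gsd_name)
        self.records = records

    def search(self, fragment: str) -> list[dict[str, str]]:
        return [
            {"name": name, "status": status, "gsd": self.gsd_name}
            for name, status in self.records.items()
            if name.lower().startswith(fragment.lower())
        ]


class TabTextWrapper(Wrapper):
    """Hypothetical GSD that only offers lines of 'name<TAB>status' text."""

    def __init__(self, gsd_name: str, lines: list[str]):
        super().__init__(gsd_name)
        self.lines = lines

    def search(self, fragment: str) -> list[dict[str, str]]:
        results = []
        for line in self.lines:
            name, status = line.split("\t")
            if name.lower().startswith(fragment.lower()):
                results.append({"name": name, "status": status, "gsd": self.gsd_name})
        return results


class DataCollector:
    """The CAS: forwards one request to every wrapper and merges the responses."""

    def __init__(self, wrappers: list[Wrapper]):
        self.wrappers = wrappers

    def search(self, fragment: str) -> list[dict[str, str]]:
        results = [hit for wrapper in self.wrappers for hit in wrapper.search(fragment)]
        return sorted(results, key=lambda hit: (hit["name"], hit["gsd"]))
```

Adding a new GSD then means writing one new wrapper; the CAS and user interface are unchanged.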
Species 2000’s Common Access System
• Species 2000 gives users a single point of access to GSDs
• Access involves a two-stage search process similar to that used in LegumeWeb
• In the second stage, the user sees a screen of “standard data” about a species
The “standard data”
This comprises the information about a species which Species 2000 wishes to provide:
– Accepted name (with references)
– Synonyms (with references)
– Common names (with references)
– Family or other higher taxon
– Geography
– Comment
– Scrutiny information
– URL or URLs linking to further data sources for this species
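The standard data can be written down as a record type. This Python dataclass is one possible reading of the list above, not the Species 2000 schema: the field names are assumptions, and `scrutiny` is assumed to hold who last checked the record and when. The `missing_fields` helper shows which optional parts a GSD has not yet supplied.

```python
from dataclasses import dataclass, field


@dataclass
class NameWithRefs:
    name: str
    references: list[str] = field(default_factory=list)


@dataclass
class StandardData:
    """One species' "standard data"; field names are illustrative."""

    accepted_name: NameWithRefs
    synonyms: list[NameWithRefs] = field(default_factory=list)
    common_names: list[NameWithRefs] = field(default_factory=list)
    higher_taxon: str = ""  # family or other higher taxon
    geography: str = ""
    comment: str = ""
    scrutiny: str = ""  # assumed: who last checked the record, and when
    urls: list[str] = field(default_factory=list)  # links to further data sources

    def missing_fields(self) -> list[str]:
        """List the optional parts that are still empty."""
        optional = ["synonyms", "common_names", "higher_taxon", "geography",
                    "comment", "scrutiny", "urls"]
        return [name for name in optional if not getattr(self, name)]
```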
Need for communication
• Different people are building the various components of the system:
– GSDs
– wrappers
– CAS
– user interface
• We need to ensure they all have a common understanding of the data to avoid mistakes
Common Data Model
• We use a Common Data Model (CDM)
– A definition of the information being passed to and fro
– Human-readable, not machine-readable
– Helps to manage complexity
– Used to create specific machine-readable implementations for CORBA (IDL), CGI/XML (DTD, XML Schema), Web Services, etc.
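To illustrate how one human-readable definition becomes a machine-readable implementation, here is a hypothetical XML encoding of a name-search response built with Python's standard `xml.etree.ElementTree`. The element and attribute names (`response`, `name`, `status`, `gsd`) are invented for this sketch; the actual Species 2000 DTDs and schemas are not reproduced here.

```python
import xml.etree.ElementTree as ET


def search_response_xml(gsd: str, results: list[tuple[str, str]]) -> str:
    """Encode a name-search (type 1) response as a hypothetical XML message."""
    root = ET.Element("response", {"type": "1", "gsd": gsd})
    for name, status in results:
        element = ET.SubElement(root, "name", {"status": status})
        element.text = name
    return ET.tostring(root, encoding="unicode")


def parse_search_response(xml_text: str) -> list[tuple[str, str]]:
    """Decode the same message back into (name, status) pairs."""
    root = ET.fromstring(xml_text)
    return [(element.text or "", element.get("status", "")) for element in root.findall("name")]
```

A wrapper and the CAS only need to agree on this encoding; the CDM itself stays independent of it, so the same definition could equally be turned into a CORBA IDL interface.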
What does the CDM look like?
• It defines the input (“request”) and output (“response”) for six fundamental operations which the system needs to be able to carry out
Request Types 0-5
– Type 0: Get the CDM version supported by a GSD’s wrapper
– Type 3: Get information about a GSD
– Type 1: Search for a name in a GSD
– Type 2: Fetch “standard data” about a chosen species
– Type 4: Move up the taxonomic hierarchy (towards the root of the tree)
– Type 5: Move down the taxonomic hierarchy (towards the species level)
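The six request types map naturally onto an enumeration, and the two navigation requests onto a parent table. In this sketch the enum member names, the `handle` dispatcher, the placeholder version string and the small hierarchy are all hypothetical; only the type numbers come from the list above. Requests 1 to 3 are left unimplemented.

```python
from enum import IntEnum
from typing import Optional


class RequestType(IntEnum):
    """The six CDM request types; member names are illustrative."""

    GET_CDM_VERSION = 0
    SEARCH_NAME = 1
    FETCH_STANDARD_DATA = 2
    GET_GSD_INFO = 3
    MOVE_UP = 4
    MOVE_DOWN = 5


# Hypothetical fragment of a taxonomic hierarchy, stored as child -> parent.
PARENT = {
    "Abrus precatorius": "Abrus",
    "Abrus": "Leguminosae",
    "Leguminosae": "Fabales",
}


def move_up(taxon: str) -> Optional[str]:
    """Type 4: one step towards the root of the tree (None at the root)."""
    return PARENT.get(taxon)


def move_down(taxon: str) -> list[str]:
    """Type 5: one step towards the species level."""
    return sorted(child for child, parent in PARENT.items() if parent == taxon)


def handle(request: RequestType, argument: str = "") -> object:
    if request == RequestType.GET_CDM_VERSION:
        return "0.0-sketch"  # placeholder version string
    if request == RequestType.MOVE_UP:
        return move_up(argument)
    if request == RequestType.MOVE_DOWN:
        return move_down(argument)
    raise NotImplementedError(f"{request.name} is not implemented in this sketch")
```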
Spice CAS in use
Screen-shots of an old version of the Spice system in use:
Spice 1 CAS
Onward links to related external data
Species databases such as ILDIS and federated systems such as Species 2000 envisage providing links from their data to external sources of related data, so-called “onward links”
• Example from ILDIS ...
“Onward links”
• The user may follow a hyperlink to some other data source for further information, not managed by ILDIS
– In this example, the user chooses to go to W3Tropicos at Missouri Botanical Garden to see more information
• In this way LegumeWeb acts as a gateway to other information about legume species
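An onward link is, at its simplest, a URL built from the species name according to each external site's query syntax. The templates below use placeholder `example.org` addresses, not the real W3Tropicos URL format, and the function name is hypothetical. Names are percent-encoded because they contain spaces.

```python
from urllib.parse import quote

# Placeholder templates; a real gateway would hold each partner site's agreed query syntax.
ONWARD_LINK_TEMPLATES = {
    "Example herbarium": "https://herbarium.example.org/search?name={name}",
    "Example image bank": "https://images.example.org/taxon/{name}",
}


def onward_links(accepted_name: str) -> dict[str, str]:
    """Build one direct species-name link per external source."""
    encoded = quote(accepted_name)  # "Abrus precatorius" -> "Abrus%20precatorius"
    return {
        site: template.format(name=encoded)
        for site, template in ONWARD_LINK_TEMPLATES.items()
    }
```

Because each link is built from a name, it silently assumes that the target site uses that name for the same taxon, which is exactly the assumption questioned in the following slides.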
LegumeWeb page with onward links
Destination of an onward link
Further information obtained
Checking the reliability of links
• Whether in
– merging data sets to construct a species database like ILDIS, or in
– linking from one data set to another,
• it is necessary to ensure that the species concepts in the different databases do not conflict
Example 1
Database A
• Caragana arborescens Lam. [accepted name]
– Caragana sibirica Medikus [synonym]
Database B
• Caragana sibirica Medikus [accepted name]
– Caragana arborescens Lam. [synonym]
Example 2
Database A
• Caesalpinia crista L. [accepted name]
Database B
• Caesalpinia crista L. [accepted name]
• Caesalpinia bonduc (L.) Roxb. [accepted name]
– Caesalpinia crista L., p.p. [synonym]
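The two examples can be checked mechanically by comparing the status of each shared name in the two databases. This is a deliberately simplified check written for illustration, not the LITCHI rule set: each database is assumed to map every name to the set of accepted names it points to, authorities are dropped, and a name that points to more than one taxon is labelled ambiguous. Function names are hypothetical.

```python
Checklist = dict[str, set[str]]

# Example 1 (authorities omitted): each name maps to the accepted name(s) it points to.
EX1_A: Checklist = {
    "Caragana arborescens": {"Caragana arborescens"},
    "Caragana sibirica": {"Caragana arborescens"},
}
EX1_B: Checklist = {
    "Caragana sibirica": {"Caragana sibirica"},
    "Caragana arborescens": {"Caragana sibirica"},
}
# Example 2: in B, part of what was called C. crista belongs to C. bonduc.
EX2_A: Checklist = {"Caesalpinia crista": {"Caesalpinia crista"}}
EX2_B: Checklist = {
    "Caesalpinia crista": {"Caesalpinia crista", "Caesalpinia bonduc"},
    "Caesalpinia bonduc": {"Caesalpinia bonduc"},
}


def status(checklist: Checklist, name: str) -> str:
    targets = checklist.get(name, set())
    if not targets:
        return "absent"
    if len(targets) > 1:
        return "ambiguous"
    return "accepted" if name in targets else "synonym"


def conflicts(a: Checklist, b: Checklist) -> list[str]:
    """Report shared names whose status differs between checklist A and checklist B."""
    report = []
    for name in sorted(a.keys() & b.keys()):
        status_a, status_b = status(a, name), status(b, name)
        if status_a != status_b:
            report.append(f"{name}: {status_a} in A, {status_b} in B")
    return report
```

A check at this level catches status swaps (Example 1) and pro parte splits (Example 2), but not, for instance, two databases that use the same accepted name for differently circumscribed taxa; that needs richer rules.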
LITCHI project
• We modelled the knowledge integrity rules in a taxonomic treatment
• The knowledge tested is implicit in the assemblage of scientific names and synonyms used to represent each taxon
• Practical uses include:
– helping a taxonomist to detect and resolve taxonomic conflicts when merging or linking two databases
– helping a non-taxonomist user follow links from one database to another, in which the species may be differently classified
Conflict display
Outcome of LITCHI project
• A prototype tool for merging checklists and checking the integrity of individual checklists was implemented
• In the Species 2000 Europa project, we are now creating a completely new second version, with a view to allowing:
– dynamic linking (so-called “taxonomically intelligent links”)
– “attached data” to be presented, organised, merged and used to support conflict resolution
“Intelligent” linking
• The Catalogue of Life (Species 2000) is
– not just a catalogue (which lists things)
– but also an index (which points to things)
• GSDs, and gateways to them such as the Catalogue of Life, can serve not only as catalogues of species but also as indexes giving access, potentially, to all species information on the Internet
“Intelligent” linking
• Species 2000 plans to provide links to take a user
– from a species entry (from a GSD)
– to further sources of information about that particular species (Species Information Sources or SISs)
“Intelligent” species links
• Given that it is possible to detect many cases of potential taxonomic conflict when linking species databases, how can such links be managed?
• There are a number of choices in the ways links may be made and handled
![Page 59: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/59.jpg)
Cross-mapping
So how can we make intelligent links work, especially in the difficult cases where a species in one database does not have an exact match in the other?
– One way is to create and maintain “cross-maps” which describe how one or more taxa in one resource (such as the Species 2000 index) relate to one or more taxa in another resource
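A cross-map can be stored as a list of entries relating a taxon in one resource to a taxon in another. The relation vocabulary below (congruent, includes, included in, overlaps) is an assumption borrowed from set-style comparisons of taxon concepts; the slides do not specify one. The entries are illustrative, loosely based on Examples 1 and 2 earlier in the talk, and are not verified taxonomic statements; `follow_link` is a hypothetical helper.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    CONGRUENT = "is congruent with"
    INCLUDES = "includes"
    INCLUDED_IN = "is included in"
    OVERLAPS = "overlaps"


@dataclass(frozen=True)
class CrossMapEntry:
    source_taxon: str  # taxon as treated in resource A (e.g. the species index)
    relation: Relation
    target_taxon: str  # taxon as treated in resource B


# Illustrative entries loosely based on Examples 1 and 2; not verified taxonomy.
CROSS_MAP = [
    CrossMapEntry("Caragana arborescens", Relation.CONGRUENT, "Caragana sibirica"),
    CrossMapEntry("Caesalpinia crista", Relation.INCLUDES, "Caesalpinia crista"),
    CrossMapEntry("Caesalpinia crista", Relation.OVERLAPS, "Caesalpinia bonduc"),
]


def follow_link(cross_map: list[CrossMapEntry], taxon: str) -> tuple[list[str], bool]:
    """Return candidate targets for a taxon and whether the link is exact."""
    entries = [entry for entry in cross_map if entry.source_taxon == taxon]
    targets = sorted(entry.target_taxon for entry in entries)
    # Exact only when there is a single target whose concept is congruent.
    exact = len(entries) == 1 and entries[0].relation is Relation.CONGRUENT
    return targets, exact
```

A link manager built this way could follow exact links automatically and, for the rest, show the user the candidate taxa together with their relations rather than silently picking one.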
A dream
A system for managing intelligent species links would maximise the potential of the plethora of species-based catalogues, indexes and rich species resources currently being assembled all over the world
• Perhaps on the Web, as with the current Spice/Species 2000 prototypes
• Or ...
The Grid
The Grid is often thought of as a new toy for particle physicists, with:
– very high bandwidth
– distributed computational resources
But it also provides opportunities for more structured and reliable access to data and information sources, using improved protocols with metadata
– for example, access to such knowledge sources as these cross-maps
Using biodiversity information resources
• Helping Biodiversity Researchers to do their Work
• Collaborative e-Science and Virtual Organisations
Biodiversity analysis and modelling
Scientists working with biodiversity information employ a wide variety of resources:
• data sources
• statistical analysis and modelling tools
• presentation or visualisation software
which may be available on various local and remote computer platforms.
Examples of biodiversity resources
Data sources:
• Names: Species 2000 & ITIS Catalogue of Life
• Data: GBIF, sequence databases
• Geography: Gazetteers
• Collections and distributions: BioCASE, MaNIS
Analysis tools:
• Statistical and multivariate analysis
• Modelling
• Visualisation
Use of resources together
Scientists frequently need to use several of these resources in sequence to carry out their research.
Much effort is currently expended in
• initially acquiring resources
• installing and sometimes adapting them to run on the user’s own machine
• converting and transporting data sets between stages of the analysis process
Biodiversity research
Biologists are working to understand the adaptation of organisms to their environmental niche, and to predict their interactions with their environment, eventually by combining knowledge of all the levels of biological organisation:
• genome
• transcription
• proteome
• metabolic pathways
• cell
• tissue
• organ
• individual whole organism
• population
• species
• evolutionary pathways
![Page 67: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/67.jpg)
Workflows
Resources are called into use in an appropriate sequence from an interactive workflow.
The ability of scientists to create their own workflows, without needing regular assistance from computer scientists, is an essential part of the BDWorld system. Accessible tools for resource discovery and for workflow design, enactment and re-use are therefore required.
![Page 68: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/68.jpg)
For example
Changes in distribution in response to climate changes brought about by global warming
![Page 69: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/69.jpg)
CSM: Climate-space modelling
Modelling and predicting changes in distribution in response to climate changes such as those brought about by global warming
An unreasonably brief explanation:
• Get the current distribution of a species (e.g. specimen records)
• Get current or recent climate data for those localities
• Calculate a model for the climate space the species can occupy
• Predict the distribution the species would have in any specified climate (which may differ from the climate used above)
• Project the prediction back on to a world map
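The five steps above can be sketched in a few lines of Python. This is a minimal illustration assuming the simplest possible climate-space model, a rectangular "envelope" of per-variable minimum and maximum values (in the spirit of BIOCLIM-style models); it is not necessarily the model BDWorld used. The climate variable names, the grid-cell coordinates and the toy data are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical types: climate variable name -> value, and a (row, col) map cell.
Climate = dict[str, float]
Cell = tuple[int, int]


@dataclass
class Envelope:
    """Rectangular climate space: per-variable (min, max) bounds."""
    bounds: dict[str, tuple[float, float]]

    def contains(self, climate: Climate) -> bool:
        # A place is suitable only if every variable lies inside its bounds.
        return all(lo <= climate[var] <= hi for var, (lo, hi) in self.bounds.items())


def fit_envelope(occurrence_climates: list[Climate]) -> Envelope:
    """Calculate the climate space occupied at the species' known localities."""
    if not occurrence_climates:
        raise ValueError("need at least one occurrence")
    variables = occurrence_climates[0].keys()
    return Envelope({
        v: (min(c[v] for c in occurrence_climates), max(c[v] for c in occurrence_climates))
        for v in variables
    })


def predict(envelope: Envelope, climate_surface: dict[Cell, Climate]) -> set[Cell]:
    """Predict suitable cells under any climate surface (current or projected)."""
    return {cell for cell, climate in climate_surface.items() if envelope.contains(climate)}


def project(cells: set[Cell], rows: int, cols: int) -> list[str]:
    """Project predicted cells back on to a text 'base map': '#' suitable, '.' not."""
    return ["".join("#" if (r, c) in cells else "." for c in range(cols)) for r in range(rows)]


if __name__ == "__main__":
    # Climate at three hypothetical specimen localities.
    occurrences = [{"temp": 12.0, "rain": 800.0},
                   {"temp": 15.0, "rain": 650.0},
                   {"temp": 13.5, "rain": 900.0}]
    env = fit_envelope(occurrences)
    current = {(0, 0): {"temp": 14.0, "rain": 700.0}, (0, 1): {"temp": 18.0, "rain": 700.0}}
    warmer = {(0, 0): {"temp": 16.0, "rain": 700.0}, (0, 1): {"temp": 14.5, "rain": 850.0}}
    print(sorted(predict(env, current)))                     # [(0, 0)]
    print("\n".join(project(predict(env, warmer), 1, 2)))    # .#
```

Fitting on one climate surface and predicting on another is what lets the same model answer "where could this species live under a warmer climate?". Real climate-space models are usually far more sophisticated than a min/max box.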
![Page 70: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/70.jpg)
Example work-flow (Climate-space Modelling)
• SPICE: submit scientific name; retrieve accepted name & synonyms for species
• Localities: retrieve distribution maps for species of interest
• Climate: climate surfaces
• Climate Space Model: model of climatic conditions where species is currently found
• Climate: possibly different climate surfaces (e.g. predicted climate)
• Prediction: prediction of suitable regions for species of interest
• Base Maps: world or regional maps
• Projection: projection of predicted distribution on to base map
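The first two stages of this work-flow, resolving a submitted name to its accepted name and synonyms and then retrieving localities recorded under any of those names, can be sketched as follows. The in-memory checklist and specimen records are hypothetical stand-ins for SPICE and the collection databases (the species names are invented placeholders); no real service API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameResult:
    accepted: str
    synonyms: frozenset[str]


# Hypothetical checklist fragment (stand-in for SPICE): synonym -> accepted name.
SYNONYM_TO_ACCEPTED = {
    "Exemplia communis": "Exemplia vulgaris",
    "Demonstria vulgaris": "Exemplia vulgaris",
}

# Hypothetical specimen record: (name as filed, latitude, longitude).
Record = tuple[str, float, float]


def resolve_name(name: str) -> NameResult:
    """Return the accepted name and all known synonyms for a submitted name."""
    # A known synonym is replaced by its accepted name; anything else is
    # treated as already accepted.
    accepted = SYNONYM_TO_ACCEPTED.get(name, name)
    synonyms = frozenset(s for s, a in SYNONYM_TO_ACCEPTED.items() if a == accepted)
    return NameResult(accepted, synonyms)


def localities_for(result: NameResult, records: list[Record]) -> list[tuple[float, float]]:
    """Collect localities filed under the accepted name or any of its synonyms."""
    names = {result.accepted} | result.synonyms
    return [(lat, lon) for name, lat, lon in records if name in names]
```

Resolving synonyms first matters because specimen records are often filed under older names; searching only for the submitted name would silently miss part of the species' distribution.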
![Page 71: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/71.jpg)
Triana screen-shots 1
Creation (design, editing)
![Page 72: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/72.jpg)
Triana screen-shots
![Page 73: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/73.jpg)
Triana screen-shots
![Page 74: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/74.jpg)
Triana screen-shots
![Page 75: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/75.jpg)
Triana screen-shots 2
Execution (enactment, run-time)
![Page 76: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/76.jpg)
Triana screen-shots
![Page 77: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/77.jpg)
Triana screen-shots
![Page 78: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/78.jpg)
Triana screen-shots
![Page 79: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/79.jpg)
Triana screen-shots
![Page 80: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/80.jpg)
And finally …
![Page 81: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/81.jpg)
Triana screen-shots
![Page 82: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/82.jpg)
Elements of the BDWorld system
• What did the system have to do to make that example happen?
![Page 83: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/83.jpg)
Role of the work-flow engine
• Create and edit a workflow
– locate an appropriate resource
– check interoperability
– arrange any necessary transformations
– record provenance of generated data sets
• Execute a workflow, passing data sets to and fro
• Create a log or ‘lab book’ for user
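These responsibilities can be illustrated with a minimal sketch, assuming each resource declares a single input and output data type. The `Resource` and `Engine` classes, the type labels and the transformation registry are hypothetical; this is not Triana's or BDWorld's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Resource:
    name: str
    input_type: str
    output_type: str
    run: Callable[[Any], Any]


# Hypothetical registry of transformations: (from_type, to_type) -> converter.
TRANSFORMS: dict[tuple[str, str], Callable[[Any], Any]] = {}


@dataclass
class Engine:
    log: list[str] = field(default_factory=list)  # the user's 'lab book'

    def plan(self, steps: list[Resource], start_type: str) -> list[tuple[str, Callable[[Any], Any]]]:
        """Check interoperability of consecutive steps, inserting transformations."""
        plan, current = [], start_type
        for step in steps:
            if step.input_type != current:
                key = (current, step.input_type)
                if key not in TRANSFORMS:
                    raise TypeError(f"{step.name}: cannot convert {current} to {step.input_type}")
                plan.append((f"transform {current}->{step.input_type}", TRANSFORMS[key]))
            plan.append((step.name, step.run))
            current = step.output_type
        return plan

    def execute(self, steps: list[Resource], data: Any, start_type: str) -> Any:
        """Run the workflow, passing each data set on and logging its provenance."""
        for label, func in self.plan(steps, start_type):
            data = func(data)
            # Provenance: which step produced the current data set, and its type.
            self.log.append(f"{label}: produced {type(data).__name__}")
        return data
```

Planning before executing means an incompatible workflow is rejected while the user is still editing it, rather than failing part-way through a long run.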
![Page 84: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/84.jpg)
Difficulties with resources
• Finding the resources
• Knowing how to use these heterogeneous resources
– Originally constructed for various reasons, often with little attention to standards or interoperability
– Have to pass data sets from one to another
– Some involve user interaction
![Page 85: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/85.jpg)
Role of metadata
Metadata is needed to enable discovery of resources and to indicate how they are to be used.
• Properties to help locate appropriate resources
• Check interoperability, suggest transformations
• Provenance of data sets
• Log of work-flows executed
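Two of these metadata uses, locating resources by their properties and suggesting a chain of transformations between incompatible formats, can be sketched as below. The metadata fields, format labels and converter table are hypothetical; breadth-first search is used here simply because it returns the shortest conversion chain.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceMetadata:
    name: str
    keywords: frozenset[str]
    input_format: str
    output_format: str


def discover(catalogue: list[ResourceMetadata], keyword: str) -> list[str]:
    """Locate resources whose metadata lists the given keyword."""
    return [r.name for r in catalogue if keyword in r.keywords]


def suggest_transformations(converters: dict[str, set[str]], source: str,
                            target: str) -> Optional[list[str]]:
    """Shortest chain of formats from source to target, or None if none exists.

    `converters` maps each format to the formats it can be converted into.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in converters.get(path[-1], set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Two resources are interoperable when `suggest_transformations(converters, producer.output_format, consumer.input_format)` returns a path; a one-element path means no conversion is needed.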
![Page 86: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/86.jpg)
What is biodiversity informatics?
• The preceding project, among others, shows that the challenges facing biodiversity informatics include not only
– Describing the diversity of life at all levels of organisation, so that biologists can understand, conserve and exploit it,
• But also
– Inventing ways to describe the ever-increasing diversity of information resources and analysis tools available, so that users can find and use them
![Page 87: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/87.jpg)
A challenge to link resources
• It is potentially very difficult to link all these resources together
• Much attention is currently being given to:
– Providing unique identifiers for data objects
– Which can return metadata about themselves
– Which can be stitched together into a distributed collaborative information system: see the biodiversity informatics organisations TDWG and GBIF (later)
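The idea of identifiers that return metadata about themselves, and that can be stitched together by following links, can be sketched with an in-memory resolver. The `urn:example:` identifiers, record fields and `Resolver` class are hypothetical stand-ins for a distributed resolution service, not the mechanism adopted by TDWG or GBIF.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Metadata returned when an identifier is resolved."""
    identifier: str
    kind: str
    properties: dict[str, str]
    links: list[str] = field(default_factory=list)  # identifiers of related objects


class Resolver:
    """In-memory stand-in for a distributed identifier resolution service."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def register(self, record: Record) -> None:
        # Identifiers must be unique: refuse to reuse one.
        if record.identifier in self._records:
            raise ValueError(f"identifier already in use: {record.identifier}")
        self._records[record.identifier] = record

    def resolve(self, identifier: str) -> Record:
        # Raises KeyError for an identifier this resolver does not know.
        return self._records[identifier]

    def stitch(self, start: str) -> list[str]:
        """Follow links depth-first to gather every connected identifier."""
        seen, stack, order = set(), [start], []
        while stack:
            ident = stack.pop()
            if ident in seen:
                continue
            seen.add(ident)
            order.append(ident)
            # Reverse so links are visited in the order they are listed.
            stack.extend(reversed(self.resolve(ident).links))
        return order
```

Because every object is reached only through its identifier, the records could live in different institutions' databases; only `resolve` would need to change.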
![Page 88: Richard White Biodiversity Informatics Projects. Thoughts Role of biodiversity data in bioinformatics – assisting with organising and retrieving bioinformatic](https://reader036.vdocuments.us/reader036/viewer/2022070406/56649e055503460f94af16b7/html5/thumbnails/88.jpg)
End