Towards a Data Model for the Australian Microbial Resources Information Network (AMRiN) Lynette Woodburn Atlas of Living Australia Version: 0.03 17/09/2010


Page 1:

Towards a Data Model for the Australian Microbial Resources Information Network (AMRiN)

Lynette Woodburn
Atlas of Living Australia

Version: 0.03
17/09/2010

Page 2:

TIP

Each slide in this presentation comes with accompanying Notes.

You can’t see them if you display this presentation in ‘Slide Show’ mode.

If you’d like to see the Notes:

• view the presentation in ‘Normal’ mode, and
• expand the pane below the slide (the Notes pane) to see the extra text.

Only then will you have a chance of understanding all the crazy diagrams.

Page 3:

Towards a data model for AMRiN

Requirement

a standard set of data fields for all micro-organisms
. to support the sharing and integration of data through AMRiN
. to pre-configure BioloMICS

Options

. choose an existing set
. develop something new

Recommendation

. surprise!

Page 4:

1. Requirements

2. Options

3. Recommendation

Page 5:

[Diagram: AMRiN and the AMRiN community]

Page 6:

[Diagram: AMRiN and the AMRiN community]

Page 7:

[Diagram: AMRiN and the AMRiN community]

Page 8:

1. Requirements

2. Options
   - existing: CABRI, MCL

3. Recommendation

Page 9:

CABRI: Common Access to Biological Resources and Information

http://www.cabri.org/

a European organization of partner collections who contribute data to searchable ‘catalogues’ covering

• bacteria & archaea
• fungi & yeasts
• animal & human cell lines
• plant cell lines
• hybridomas
• phages
• plasmids
• plant cell viruses
• genomic libraries

Page 10:

CABRI’s sets of data elements

Elements per set:

• bacteria & archaea: 26
• fungi & yeasts: 23
• animal cell lines: 29
• plant cell lines: 17
• hybridomas: 15
• phages: 33
• plasmids: 30
• plant cell viruses: 12
• genomic libraries: 7

Example elements: Original_host_plant, Doubling_time, Lysogenicity, Isolated_from, Morphology

Page 11:

CABRI: Common Access to Biological Resources and Information

For each different kind of biological resource, CABRI defines nested sets of data elements:

Mandatory, Recommended, Full

Page 12:

CABRI : bacteria & archaea

Mandatory: Strain_number, Other_collection_numbers, Restrictions, Organism_type, Name, Infrasubspecific_names, Status, History, Conditions_for_growth, Form_of_supply

Recommended: Serovar, Other_names, Isolated_from, Geographic_origin, Mutant, Genotype, Literature

Full: Sexual_state, Pathogenicity, Enzyme_production, Metabolite_production, Applications, Catalogue_entry, Remarks, Price_code, Plasmids

Page 13:

CABRI : fungi & yeasts

Mandatory: Strain_number, Other_collection_numbers, Name, Status, Organism_type, History, Restrictions, Form_of_supply, Conditions_for_growth

Recommended: Misapplied_names, Race, Substrate, Geographic_origin, Literature, Applications, Mutant, Sexual_state

Full: Price_code, Remarks, Pathogenicity, Metabolite_production, Enzyme_production, Genotype

Page 14:

CABRI : animal & human cell lines

Mandatory: Accession_number, Cell_line_name, Brief_description, Description, Depositor, Bibliographic_references, Morphology, Culture_conditions, Viruses, Properties, Release_conditions, Hazard, Passage_number

Recommended: Species_validation

Full: Tumorigenicity, Karyology, Freezing_medium, Sterility, Validation_assays, Further_bibliography, Comments, Storage, Doubling_time, Mycoplasma, Fingerprint, Cytogenetics, Karyotype, Comments, Research_council_deposit, BIOMED_1

Page 15:

CABRI’s sets of data elements

• bacteria & archaea: 26
• fungi & yeasts: 23
• animal cell lines: 29
• plant cell lines: 17
• hybridomas: 15
• phages: 33
• plasmids: 30
• plant cell viruses: 12
• genomic libraries: 7

Total: 192

Page 16:

Sharing data about one kind of biological resource is easy

e.g. phages

Page 17:

Sharing data about one kind of biological resource is easy

e.g. plasmids

Page 18:

Sharing data about multiple kinds of biological resources is hard

Other_culture_collection_numbers

Other_collection_numbers

Page 19:

What is the prospect of deriving a common model from CABRI for describing several different kinds of biological resources?

133 distinct data elements … distributed across 9 sets:

• bacteria & archaea
• fungi & yeasts
• animal cell lines
• plant cell lines
• hybridomas
• phages
• plasmids
• plant cell viruses
• genomic libraries

Page 20:

CABRI as a common model ?

each of 92 elements is found in only one set

only 41 elements are found in more than one set

Page 21:

CABRI as a common model ?

27 data elements are found in two sets
10 are found in three
4 are found in four

No elements are found in more than 4 sets
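The counts on these slides can be cross-checked with a little arithmetic. The sketch below uses only the figures quoted on the slides (92, 27, 10 and 4 elements found in one, two, three and four sets); the variable names are illustrative.

```python
# Number of CABRI data elements found in exactly k sets, as quoted on the slides.
elements_by_set_count = {1: 92, 2: 27, 3: 10, 4: 4}

# Distinct elements: each element is counted once, however many sets it appears in.
distinct = sum(elements_by_set_count.values())

# Elements that appear in more than one set.
shared = sum(n for k, n in elements_by_set_count.items() if k > 1)

# Total set memberships: an element found in k sets is counted k times.
memberships = sum(k * n for k, n in elements_by_set_count.items())

print(distinct, shared, memberships)  # 133 41 192
```

The membership total of 192 agrees with the sum of the per-set element counts shown earlier, so the figures are mutually consistent: fewer than a third of the distinct elements (41 of 133) are shared at all.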

Page 22:

Distribution of data elements across CABRI sets

• bacteria & archaea
• fungi & yeasts
• animal cell lines
• hybridomas
• phages
• plant cell lines
• plant cell viruses
• plasmids
• genomic libraries

Count of data elements found in:

one set (per set, in the order listed above): 6, 3, 22, 7, 14, 12, 9, 13, 6
two sets: 11, 4, 12
three sets: 2, 1, 2, 2, 1, 1, 1
four sets: 3, 1

Page 23:

CABRI data element ‘themes’

Sets:

• bacteria & archaea
• fungi & yeasts
• animal cell lines
• plant cell lines
• hybridomas
• phages
• plasmids
• plant cell viruses
• genomic libraries

Themes:

• ID of item in collection
• name / classification of item
• item admin
• handling & distribution regulations
• care / maintenance
• characteristics
• literature
• … origin

Page 24:

CABRI : comparison of elements across sets

• different names, same meaning (definition)

Accession_number, Strain_number

History, History_of_deposit

Bibliographic_references, Reference_paper, Literature, Reference, Further_bibliography

Restricted_distribution, Release_conditions, Restrictions, Distribution

Morphology, Morphology_and_growth

….
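One way to handle ‘different names, same meaning’ is a synonym table that maps each CABRI element name to a single shared label. The sketch below is illustrative only: the element names come from the slide, but the canonical labels (item_id, history, and so on) are hypothetical and are not part of CABRI or of any proposed AMRiN model.

```python
# Groups of CABRI element names that the slide lists as sharing one meaning.
# The keys (canonical labels) are hypothetical, chosen only for this sketch.
SYNONYM_GROUPS = {
    "item_id": ["Accession_number", "Strain_number"],
    "history": ["History", "History_of_deposit"],
    "literature": [
        "Bibliographic_references", "Reference_paper", "Literature",
        "Reference", "Further_bibliography",
    ],
    "restrictions": [
        "Restricted_distribution", "Release_conditions",
        "Restrictions", "Distribution",
    ],
    "morphology": ["Morphology", "Morphology_and_growth"],
}

# Reverse lookup: CABRI element name -> canonical label.
CANONICAL = {
    name: label
    for label, names in SYNONYM_GROUPS.items()
    for name in names
}


def canonical(name: str) -> str:
    """Return the canonical label for an element name, or the name itself if unmapped."""
    return CANONICAL.get(name, name)


print(canonical("Strain_number"))     # item_id
print(canonical("Accession_number"))  # item_id
print(canonical("Doubling_time"))     # Doubling_time (no synonym group listed)
```

A table like this only resolves naming differences. It does nothing for elements that share a name but differ in meaning, which need a human decision.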

Page 25:

CABRI : comparison of elements across sets

• same name, different meanings

Type
  phages: type of element (phage, transposon, minitransposon, IS element, …)
  plasmids: type of element (plasmid, phasmid, cosmid, shuttle vector, transposon, minitransposon, IS element, …)
  genomic libraries: type of library (PAC, BAC, YAC, PI, cDNA, …)

Brief_description
  hybridomas: listing of species, strain, antibody specificity
  animal cell lines: listing of species, strain, tissue, tumour, pathology, transformed/transfected

Page 26:

CABRI : comparison of data element sets

• varying levels of scope

Conditions_for_growth (bacteria & archaea; fungi & yeasts) covers:
  culture medium
  atmospheric and light conditions
  temperature conditions
  additional remarks on cultivation

Medium (plasmids, phages)
Medium_1 (plant cell lines)
Light_regime (plant cell lines)
Light_conditions (plant cell lines)
Temperature (plant cell lines)
Humidity (plant cell lines)
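The difference in scope matters because conversion only works cleanly in one direction. As a minimal sketch (the function name and the sample record are invented for illustration), fine-grained plant cell line fields can be folded into one coarse, free-text value of the Conditions_for_growth kind:

```python
# Fine-grained elements listed for plant cell lines on the slide.
FINE_GRAINED = ["Medium_1", "Light_regime", "Light_conditions", "Temperature", "Humidity"]


def to_conditions_for_growth(record: dict) -> str:
    """Fold the fine-grained fields that are present into one free-text value."""
    parts = [f"{name}: {record[name]}" for name in FINE_GRAINED if record.get(name)]
    return "; ".join(parts)


# Hypothetical plant cell line record; the values are invented.
record = {
    "Medium_1": "MS",
    "Temperature": "25 C",
    "Light_regime": "16 h light / 8 h dark",
}

print(to_conditions_for_growth(record))
# Medium_1: MS; Light_regime: 16 h light / 8 h dark; Temperature: 25 C
```

Going the other way, from a free-text Conditions_for_growth value back to separate medium, light and temperature fields, cannot be done mechanically, which is one reason a common set is hard to derive from elements of differing scope.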

Page 27:

CABRI : fitness for our purpose

• 9 sets of data elements (but does not cover algae)
  good for sharing information about one kind of organism

• few elements common to several sets
  hard to share information about more than one kind of organism

• does not lend itself to the derivation of a common set
  elements of ‘different names, same meaning’
  elements of ‘same name, different meanings’
  elements with meanings of varying scope

• has international acceptance / presence (but no longer funded?)

Page 28:

1. Requirements

2. Options
   - existing: CABRI, MCL

3. Recommendation

Page 29:

MCL: Microbiological Common Language

• a new data exchange standard for microbiological information
  Research in Microbiology, 161(6), 439-445
  http://www.straininfo.net/projects/mcl

• a pluggable framework, easily extended

• has the same ancestor as CABRI (MINE)

• underpins StrainInfo (www.straininfo.net)
  “a world-wide, virtual catalog integrating the information from BRC [Biological Resource Centres] catalogs with related information”

Page 30:

CABRI compared with MCL

CABRI: partitioned by kind of biological resource
MCL: partitioned by workflow step

Page 31:

The abstract model of Microbiological Common Language (MCL)

… follows the logical flow from sampling to subsequent deposits

[Diagram: Sample, Isolation, Culture, Deposit, Medium, Publication, Strain]

Page 32:

mcl : Sample

Sample

sampleDate
sampleCultureStrainNumber
sampleCollector
sampleCollectorInstitute
comments
sampleDescription
sampleLocationDescription
sampleLocationCountry
sampleLocationPlace
sampleAlt
sampleLat
sampleLong
sampleHabitatEnvoTerm
sampleHabitat
sampleCulture

Page 33:

mcl : Culture

Culture

[otherStrainNumbers]
id
cultureLastUpdateDate
otherStrainNumber
strainNumber
catalogURL
speciesName
history
isolationDate
isolator
isolatorInstitute
isolationMethod
typeStrainOfSpecies
typeStrainOf
typeStrainOfGenus
comments
minimalGrowthTemperature
[growthTemperature]
optimalGrowthTemperature
maximalGrowthTemperature
oxygenRelationship
nomenclaturalPublication
publication
environmentPublication
historyPublication
taxonomicPublication
hasSample
recommendMedium

Page 34:

some Object Properties

Culture
  hasSample → Sample
  recommendMedium → Medium
  publication, nomenclaturalPublication, environmentPublication, historyPublication, taxonomicPublication → Publication

Page 35:

mcl : Medium

Medium

mediumName
mediumNumber
mediumURL
mediumDescription
comments

mcl : Publication

Publication

dcterms:bibliographicCitation
dc:title
dc:creator
prism:publicationName
prism:volume
prism:number
prism:startingPage
prism:pageRange
dcterms:issued
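To make the shape of the model concrete, here is a rough sketch of a few MCL entities as Python dataclasses, with object properties held as references to other objects rather than as literal values. It is not MCL's own serialization: the field names are taken from the slides, but the subset chosen, the types, the cardinalities and the sample values are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    sampleDate: Optional[str] = None
    sampleLocationCountry: Optional[str] = None
    sampleHabitat: Optional[str] = None


@dataclass
class Medium:
    mediumName: Optional[str] = None
    mediumNumber: Optional[str] = None


@dataclass
class Publication:
    title: Optional[str] = None   # dc:title
    issued: Optional[str] = None  # dcterms:issued


@dataclass
class Culture:
    strainNumber: str
    speciesName: Optional[str] = None
    otherStrainNumbers: list[str] = field(default_factory=list)
    # Object properties: links to other entities (cardinalities are assumed).
    hasSample: Optional[Sample] = None
    recommendMedium: list[Medium] = field(default_factory=list)
    publication: list[Publication] = field(default_factory=list)


# Invented example record.
culture = Culture(
    strainNumber="XYZ 123",
    speciesName="Escherichia coli",
    hasSample=Sample(sampleLocationCountry="Australia"),
    recommendMedium=[Medium(mediumName="Nutrient agar")],
)

print(culture.hasSample.sampleLocationCountry)  # Australia
```

Because the links are separate from the literal fields, a new entity can be added by defining one more class and one more object property, without touching the existing ones.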

Page 36:

MCL : fitness for our purpose

• MCL offers a broadly-applicable suite of data elements

. data elements are grouped according to workflow steps, not organism type

. applicable to algae and cyanobacteria

. the Strain concept supports the logical linking of related cultures

• the model is modular and easily extensible

. model cohesion is achieved through Object Properties

. links easily with genomic standards (see StrainInfo)

• born and raised in Europe (StrainInfo), but now going global

. Asian biorepositories network is considering adoption

. we’re invited to contribute to ongoing development

• primarily devised (custom-built) as a data exchange standard

Page 37:

1. Requirements

2. Options

3. Recommendation

Page 38:

Recommendation : dip a toe into the water

• MCL, custom-built for describing microbiological data, deserves consideration

Proposal

undertake a pilot, involving a small group of AMRiN participants, to assess the suitability of MCL for AMRiN’s purpose.

Page 39:

[Diagram: AMRiN and the AMRiN community]

Page 40:

AMRiN participants’ input

map local elements to MCL elements
  Note: some MCL elements may not have a local equivalent

identify local elements to be kept ‘private’

identify other local elements to be shared; provide English definitions to enable reconciliation with other participants’ elements
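A participant's return for the pilot could be as simple as one table. The sketch below shows one possible shape; the local field names, the flags and the definition are hypothetical, and only the MCL element names (strainNumber, speciesName) come from the slides.

```python
# One participant's local elements. Each entry records the MCL element it maps
# to (or None), whether it stays private, and an English definition if shared.
LOCAL_ELEMENTS = {
    "acc_no": {"mcl": "strainNumber", "private": False},
    "organism": {"mcl": "speciesName", "private": False},
    "freezer_shelf": {"mcl": None, "private": True},
    "host_plant": {
        "mcl": None,
        "private": False,
        "definition": "Plant species from which the organism was isolated.",
    },
}

# 1. Local elements mapped to MCL elements.
mapped = {name: e["mcl"] for name, e in LOCAL_ELEMENTS.items() if e["mcl"]}

# 2. Local elements to be kept private.
private = sorted(name for name, e in LOCAL_ELEMENTS.items() if e["private"])

# 3. Other local elements to be shared, with their definitions.
shared_extra = {
    name: e["definition"]
    for name, e in LOCAL_ELEMENTS.items()
    if e["mcl"] is None and not e["private"]
}

print(mapped)        # {'acc_no': 'strainNumber', 'organism': 'speciesName'}
print(private)       # ['freezer_shelf']
print(shared_extra)  # {'host_plant': 'Plant species from which the organism was isolated.'}
```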

Page 41:

Pilot assessment

• Coverage?
  How much orange overlaps purple?

• What additional common elements exist amongst the set to be shared?
  How much purple overlaps purple?

• Other assessment criteria?
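Both overlap questions can be put as simple set arithmetic. The sketch below assumes that coverage means the fraction of MCL elements for which a participant has a local equivalent, and that common extra elements can be found by intersecting the participants' shared lists once their names have been reconciled. The participant names and element lists are invented, and the MCL list is only a small sample.

```python
# A few MCL element names from the slides; a real pilot would use the full list.
MCL_ELEMENTS = {"strainNumber", "speciesName", "isolationDate", "sampleLocationCountry"}

# Hypothetical pilot returns: the MCL elements each participant could map to,
# and the extra (non-MCL) elements each is willing to share.
mapped = {
    "collection_a": {"strainNumber", "speciesName", "isolationDate"},
    "collection_b": {"strainNumber", "speciesName"},
}
extra_shared = {
    "collection_a": {"host_plant", "pathogenicity"},
    "collection_b": {"host_plant", "antibiotic_resistance"},
}


def coverage(mapped_elements: set, mcl_elements: set) -> float:
    """Fraction of MCL elements that have a local equivalent."""
    return len(mapped_elements & mcl_elements) / len(mcl_elements)


for name, elements in mapped.items():
    print(name, coverage(elements, MCL_ELEMENTS))
# collection_a 0.75
# collection_b 0.5

# Extra elements that every participant is willing to share.
common_extra = set.intersection(*extra_shared.values())
print(common_extra)  # {'host_plant'}
```

The intersection is only meaningful after the English definitions have been compared, since two collections may use the same name for different things or different names for the same thing.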

Page 42:

Pulling the pieces together

Please consider the foregoing proposal.

Does it seem reasonable to you?

Do you think there’s a better way?