

GigaScience (http://www.gigasciencejournal.com) is an online, open-access journal that includes, as part of its publishing activities, the database GigaDB (http://www.gigadb.org). GigaScience is co-published by BGI and BioMed Central to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data.” The journal’s scope covers studies from the entire spectrum of the life sciences that produce and use large-scale data as the center of their work. Data from these articles are hosted in GigaDB, from where they can be cited to provide a direct link between the study and the data supporting it, as well as access to relevant tools for reproducing or reusing these data.

Due to the scope of GigaScience, GigaDB needs to host a wider variety of data types than most biological databases. To make this possible, we have created and launched a new version of GigaDB that uses a fully extensible database schema capable of handling this variety of data types and standards.

The schema has 3 main areas, centered around these tables:

• Dataset

• Sample

• Data/File

These are roughly analogous to those used by other common systems for submitting/curating biological data, including the SRA (http://www.ebi.ac.uk/ena/submit/metadata-model) and the ISA infrastructure (http://isatab.sourceforge.net/format.html). The Dataset area includes tables to store information about the overall study design, the authors and funding bodies. It also acts as a holder to link together all the samples and data associated with it, as well as providing links to external sources. The Sample area of the schema hosts the sample metadata and sample relationships, including their relationship to particular data files.

Here we present the schema, and in a poster we show how it is implemented for metadata capture in our Submission Wizard.
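As a rough illustration of how the three schema areas described above relate to one another, the following Python sketch models a dataset that holds its samples and files, with each file pointing back to the samples it derives from. The class names, field names and the placeholder identifier are assumptions made for illustration only; they are not taken from the actual GigaDB schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the three schema areas.
# All names below are illustrative assumptions, not real GigaDB tables.


@dataclass
class DataFile:
    name: str
    # Names of the samples this file was generated from.
    sample_names: List[str] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    species: str


@dataclass
class Dataset:
    identifier: str  # placeholder identifier, not a real DOI
    title: str
    authors: List[str] = field(default_factory=list)
    funders: List[str] = field(default_factory=list)
    external_links: List[str] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)
    files: List[DataFile] = field(default_factory=list)

    def files_for_sample(self, sample_name: str) -> List[DataFile]:
        """Return every file linked to the named sample."""
        return [f for f in self.files if sample_name in f.sample_names]


# The dataset acts as the holder that ties samples and files together.
dataset = Dataset(
    identifier="example-dataset-1",
    title="Example dataset",
    authors=["A. Author"],
    samples=[Sample("sample_A", "Homo sapiens"), Sample("sample_B", "Oryza sativa")],
    files=[
        DataFile("reads_A.fastq", sample_names=["sample_A"]),
        DataFile("combined.vcf", sample_names=["sample_A", "sample_B"]),
    ],
)

linked = [f.name for f in dataset.files_for_sample("sample_B")]
print(linked)
```

In a relational schema the same links would normally be expressed with foreign keys and join tables rather than nested lists; the nesting here is only meant to show which area owns which relationship.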

Schema update to accommodate the growing variety of data.

Christopher I. Hunter, Scott C. Edmunds, Peter Li, Xiao Si Zhe, Robert L Davidson, Laurie Goodman.

Christopher I Hunter, GigaScience 2015. DOI:

Submit your next manuscript containing large-scale data and workflows to GigaScience and take full advantage of:

• No space constraints, and included data and workflow hosting in GigaDB and GigaGalaxy

• Open access, open data, and highly visible work freely available for distribution

• Collation of Data Citation Numbers by Thomson Reuters

• Indexing of papers in PubMed Central, Google Scholar, etc.

Thanks to: Yong Zhang (BGI; China National Genebank), Shaoguang Liang (BGI-SZ), Alex Wong, Dennis Chan (BGI-HK)

With financial support from:


Any attribute can be linked to any sample, file, experiment or dataset object, which gives a fully extensible and indexable way to capture the wide variety of metadata covered by the scope of the database.
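One common way to get this kind of extensibility is an attribute link table, where new kinds of metadata become new rows rather than new columns. The sketch below shows the idea for samples only, using Python's built-in sqlite3 module. The table names, column names and example values are assumptions for illustration and should not be read as the actual GigaDB table definitions.

```python
import sqlite3

# Hypothetical attribute-link layout; names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE attribute (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Link table: any attribute can be attached to any sample.
    CREATE TABLE sample_attribute (
        sample_id    INTEGER NOT NULL REFERENCES sample(id),
        attribute_id INTEGER NOT NULL REFERENCES attribute(id),
        value        TEXT NOT NULL
    );
    -- Index so that lookups by attribute and value stay fast.
    CREATE INDEX idx_sample_attribute
        ON sample_attribute (attribute_id, value);
    """
)


def add_sample_attribute(conn, sample_id, name, value):
    """Attach an attribute to a sample, creating the attribute if needed."""
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (name,))
    (attribute_id,) = conn.execute(
        "SELECT id FROM attribute WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO sample_attribute (sample_id, attribute_id, value) "
        "VALUES (?, ?, ?)",
        (sample_id, attribute_id, value),
    )


def find_samples(conn, name, value):
    """Return the names of samples carrying the given attribute value."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM sample s
        JOIN sample_attribute sa ON sa.sample_id = s.id
        JOIN attribute a ON a.id = sa.attribute_id
        WHERE a.name = ? AND sa.value = ?
        ORDER BY s.name
        """,
        (name, value),
    ).fetchall()
    return [row[0] for row in rows]


conn.execute("INSERT INTO sample (id, name) VALUES (1, 'sample_A')")
conn.execute("INSERT INTO sample (id, name) VALUES (2, 'sample_B')")
add_sample_attribute(conn, 1, "tissue", "liver")
add_sample_attribute(conn, 1, "sex", "female")
add_sample_attribute(conn, 2, "tissue", "leaf")

matches = find_samples(conn, "tissue", "liver")
print(matches)
```

The same pattern could be repeated with a link table per object type (file, experiment, dataset), so that a new metadata standard needs no change to the table definitions.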

We link to multiple external sites to aid interoperability.