An Overview of Intersect’s GDM System Joe Thurbon Intersect

Post on 18-Jan-2018

DESCRIPTION

The Brief: the research activities and communities it supports; the data management and analysis issues it seeks to resolve; the types and volumes of data/metadata it will manage; the philosophy behind its design (e.g. why use semantic web technologies, if you are).

TRANSCRIPT

An Overview of Intersect’s GDM System

Joe Thurbon Intersect

Prelude

The Brief

• The research activities and communities it supports

• The data management and analysis issues it seeks to resolve

• The types and volumes of data/metadata it will manage

• The philosophy behind its design (e.g. why use semantic web technologies, if you are)

Supported Communities

Issues Addressed

Issues Addressed

Data / Metadata

• We keep
– All the short reads
– All the bioanalyser output
– All the trimmed reads
– All the tertiary analysis output ‘that works’
– All parameters used to generate all of the above
    • Chemistry versions
    • Command line parameters
    • Species, Locations, etc

Data Counts

Platform / Metric                     Tissue Samples Per Year   Chunks of Data Per Year   Fields of Metadata Per Year
Illumina with standard multiplexing   ~2050                     > 12000                   ~ 100000
454 with standard multiplexing        ~3500                     > 14000                   ~ 120000

Data Sizes

Platform / Metric                     Untrimmed / Year   Trimmed / Year
Illumina with standard multiplexing   1.2TB              300GB
454 with standard multiplexing        ~2.5TB (?)         600GB
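The figures in the two tables above can be turned into rough per-sample ratios and a yearly storage estimate. The following is only a back-of-the-envelope sketch: it treats the approximate values ("~", ">") as point estimates, assumes 1 TB = 1000 GB, and assumes both untrimmed and trimmed reads are retained (as the "We keep" list suggests). The dictionary keys and the function name are illustrative, not part of GDM.

```python
# Figures copied from the Data Counts and Data Sizes tables.
# Assumption: "~" and ">" values are used as point estimates; 1 TB = 1000 GB.
PLATFORMS = {
    "Illumina": {"samples": 2050, "chunks": 12000, "fields": 100000,
                 "untrimmed_gb": 1200, "trimmed_gb": 300},
    "454": {"samples": 3500, "chunks": 14000, "fields": 120000,
            "untrimmed_gb": 2500, "trimmed_gb": 600},
}


def per_sample_summary(platform):
    """Derive simple ratios and a yearly storage total for one platform."""
    return {
        "chunks_per_sample": platform["chunks"] / platform["samples"],
        "fields_per_sample": platform["fields"] / platform["samples"],
        # Fraction of the untrimmed volume that remains after trimming.
        "trimmed_fraction": platform["trimmed_gb"] / platform["untrimmed_gb"],
        # Assumes both untrimmed and trimmed reads are kept.
        "total_gb_per_year": platform["untrimmed_gb"] + platform["trimmed_gb"],
    }


if __name__ == "__main__":
    for name, platform in PLATFORMS.items():
        summary = per_sample_summary(platform)
        print(name, {key: round(value, 2) for key, value in summary.items()})
```

On these estimates, trimming leaves roughly a quarter of the raw volume on both platforms, and each tissue sample carries on the order of tens of metadata fields.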

Philosophy

• Anthropology
• Epistemology
• History and Philosophy of Science
• Bio-informatics

• GDM

What do people do?

What do Researchers do?

What do Scientists do?

What do Bio-Informaticians do?

What does GDM Do?

• Allows bio-informaticians to
– Stand on the shoulders of giants (including themselves)
– Record their observations
– Record their experimental parameters
– Manage their data

• Iteratively

Demo

A Results corresponds to a single experiment

• What experiment did I do? (steps)
• What parameters did I set? (parameters)
• What observations did I make? (outputs)
• What did I end up with? (data)
• What did I start from? (parents)

Metadata + Data + Parents = One Results
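The "Metadata + Data + Parents" idea can be sketched as a small data structure. This is a minimal illustration, not GDM's actual schema: the class name, field names, the `lineage` helper, and every file name and parameter value below are hypothetical, chosen only to mirror the five questions on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Results:
    """One experiment: metadata (steps, parameters, outputs) + data + parents."""
    name: str
    steps: List[str] = field(default_factory=list)            # what experiment did I do?
    parameters: Dict[str, str] = field(default_factory=dict)  # what parameters did I set?
    outputs: List[str] = field(default_factory=list)          # what observations did I make?
    data: List[str] = field(default_factory=list)             # what did I end up with?
    parents: List["Results"] = field(default_factory=list)    # what did I start from?

    def lineage(self) -> List[str]:
        """Return the names of all ancestors, nearest first, each listed once."""
        seen, order, queue = set(), [], list(self.parents)
        while queue:
            current = queue.pop(0)
            if current.name not in seen:
                seen.add(current.name)
                order.append(current.name)
                queue.extend(current.parents)
        return order


# Hypothetical three-step chain: raw reads -> trimmed reads -> assembly.
raw = Results("raw-reads", data=["reads.fastq"])
trimmed = Results("trimmed-reads", steps=["trim"],
                  parameters={"min_quality": "20"},
                  data=["reads.trimmed.fastq"], parents=[raw])
assembly = Results("assembly", steps=["assemble"],
                   parameters={"kmer": "31"},
                   outputs=["assembly looks fragmented"],
                   data=["contigs.fa"], parents=[trimmed])

print(assembly.lineage())
```

Because each Results holds references to its parents, any result can be traced back to the data it started from, which is what lets an analysis be repeated or extended iteratively.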

GDM Repository

[Diagram labels: SCU, Ramaciotti, ANU, AC3; VELVET, T-Coffee, BLAST; NCBI, EMBL, DDBJ]