Monthly Program Update
April 12, 2012
Andrew J. Buckler, MS
Principal Investigator
WITH FUNDING SUPPORT PROVIDED BY NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY
Agenda
• Working discussion on data curation, using facilities of Iterate for storage and provenance documentation model.
• Updates on:
  – Metrology Workshop results.
  – QIBA 3A Test bed progress.
Part of our discussion on data curation and processing workflow from last month…
// Business Requirements

FNIH, QIBA, and C-Path participants don’t have a way to provide precise specification for context for use and applicable assay methods (to allow semantic labeling):
BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);

Researchers and consortia don’t have an ability to exploit existing data resources with high precision and recall:
ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService} );

Technology developers and contract research organizations don’t have a way to do large-scale quantitative runs:
ReferenceDataSet.CollectedValue+ = Execute (ReferenceDataSet.RawData);

The community lacks a way to apply definitive statistical analyses of annotation and image markup over specified context for use:
BiomarkerDB.SummaryStatistic+ = Analyze ( { ReferenceDataSet.CollectedValue } );

Industry lacks standardized ways to report and submit data electronically:
efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet} );
…and the associated storage model…

Subject               Predicate  Object
ClinicalUtility       is         Investigation URI
ClinicalValidity      is         Investigation URI
TechnicalPerformance  is         Investigation URI
Investigation         has        SummaryStatisticType
Investigation         has        Study URI
Study                 has        DescriptiveStatisticType
Study                 has        Protocol URI
Study                 has        Assay URI
Assay                 has        RawData URI
Assay                 has        AnnotationData URI
AIM file              is         AnnotationData URI
Mesh                  is         AnnotationData URI
(using “Share” and “Duplicate” functions of RDSM to leverage cases across investigations)
(self-generating knowledgebase from RDSM hierarchy and ISA-TAB description files)
Reference Data Set Manager: Heavyweight Storage with URIs
Knowledgebase: Lightweight Storage linking to URIs
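
As a rough illustration of the storage model, the subject/predicate/object rows can be held as plain triples and walked to see what hangs off a given node. This is only a sketch: the function names `build_index` and `reachable` are hypothetical, the trailing "URI" marker from the table is dropped for simplicity, and nothing here reflects how RDSM or the knowledgebase is actually implemented.

```python
from collections import defaultdict

# Subject-predicate-object triples transcribed from the storage model table.
# Simplification (assumption): the "URI" marker on objects is omitted.
TRIPLES = [
    ("ClinicalUtility", "is", "Investigation"),
    ("ClinicalValidity", "is", "Investigation"),
    ("TechnicalPerformance", "is", "Investigation"),
    ("Investigation", "has", "SummaryStatisticType"),
    ("Investigation", "has", "Study"),
    ("Study", "has", "DescriptiveStatisticType"),
    ("Study", "has", "Protocol"),
    ("Study", "has", "Assay"),
    ("Assay", "has", "RawData"),
    ("Assay", "has", "AnnotationData"),
    ("AIM file", "is", "AnnotationData"),
    ("Mesh", "is", "AnnotationData"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for quick lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def reachable(index, start):
    """Return every node reachable from `start` via 'is'/'has' links, breadth first."""
    seen = []
    queue = [start]
    while queue:
        node = queue.pop(0)
        for predicate in ("is", "has"):
            for obj in index.get((node, predicate), []):
                if obj not in seen:
                    seen.append(obj)
                    queue.append(obj)
    return seen
```

Calling `reachable(build_index(TRIPLES), "ClinicalUtility")` walks from an investigation type down through Study and Assay to RawData and AnnotationData, which mirrors the hierarchy the lightweight knowledgebase links into.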
…leading us to: Principles of Provenance
Central to the scientific method is the idea of replicating prior experiments such that they are transparent and verifiable.
We need to keep track of:
• the origin of data
• transformation methods applied to the data
  – not just which programs
  – version information is critical
  – copies of actual programs used (git).
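
One way to picture what "keeping track" could mean in practice is a small record that stores the origin of a data item plus, for each transformation, the program name, its version, and the git commit of the exact code used. The sketch below is illustrative only: the class and field names are hypothetical, and the SHA-256 content checksum is an added assumption, not something the slides specify.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def digest(data: bytes) -> str:
    """SHA-256 hex digest of the data (the checksum is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ProgramRef:
    name: str        # which program was run
    version: str     # version information is critical
    git_commit: str  # pointer to the copy of the actual program used


@dataclass(frozen=True)
class Step:
    description: str
    program: ProgramRef
    output_sha256: str


@dataclass
class ProvenanceRecord:
    origin: str          # where the data came from
    content_sha256: str  # checksum of the data as originally received
    steps: List[Step] = field(default_factory=list)

    def record_step(self, description: str, program: ProgramRef, output: bytes) -> None:
        """Append one transformation and the checksum of what it produced."""
        self.steps.append(Step(description, program, digest(output)))

    def current_sha256(self) -> str:
        """Checksum of the latest version of the data."""
        return self.steps[-1].output_sha256 if self.steps else self.content_sha256
```

Storing the commit alongside the version is what lets a later reader retrieve the program as it was actually run, rather than only its name.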
• Taverna keeps provenance data in a database on the machine from which the workflow is initiated
• We need to expose provenance for external users of QI-Bench
  – example: provenance of the data in an exported ISA-TAB
Provenance architecture of Iterate
Taverna allows access to the provenance data via a Java API.
• We have not explored this area of Taverna yet.
• Taverna’s documentation indicates this is an area under active development.
Iterate Demonstration
• Obtaining a list of communities to which a user belongs
• Nesting a workflow
• Listing the items in a folder
Workflow to list community memberships in Iterate
Workflow to list community memberships in Iterate using a nested workflow
Provenance application in QI-Bench Demonstrators: Investigation and Studies level (ISA-TAB compliant)
Provenance application in QI-Bench Demonstrators: Assay and Data levels (not ISA-TAB compliant yet)
Application
• Provenance of
  – Demonstrator40 data [input for analysis]
  – Demonstrator40 Output [obviously the output]
  – …
Application
• So we can answer
  – What is Demonstrator40_download.zip?
  – How did we get the Demonstrator40 data?
  – What was the original dataset and where did it come from?
  – What transformation on the original dataset created the Demonstrator40 data folder?
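
The questions above amount to walking a chain of "derived from" links back to an original source. A minimal sketch of that walk follows. Everything in the `LINEAGE` table is a placeholder: the slides do not name the original dataset or the transformations, and the assumption that the zip file was packaged from the Demonstrator40 data folder is ours, made only so the example has something to trace.

```python
from typing import Dict, List, NamedTuple, Optional


class Derivation(NamedTuple):
    source: Optional[str]  # artifact this one was derived from; None for an original
    transformation: str    # how it was produced


# Placeholder lineage (hypothetical entries, not taken from QI-Bench records).
LINEAGE: Dict[str, Derivation] = {
    "original dataset (placeholder)": Derivation(None, "obtained from its source (unspecified)"),
    "Demonstrator40 data": Derivation("original dataset (placeholder)", "transformation (unspecified)"),
    "Demonstrator40_download.zip": Derivation("Demonstrator40 data", "packaged for download (assumed)"),
}


def trace(artifact: str, lineage: Dict[str, Derivation]) -> List[str]:
    """Follow 'derived from' links from an artifact back to its original source."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = artifact
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in lineage at {current!r}")
        if current not in lineage:
            raise KeyError(f"no provenance recorded for {current!r}")
        seen.add(current)
        chain.append(current)
        current = lineage[current].source
    return chain
```

With such a table, `trace("Demonstrator40_download.zip", LINEAGE)` lists the artifact, the data folder it came from, and the original dataset in that order, and the `transformation` field of each entry answers the "what created it" question. A missing entry raises an error, which is the useful failure mode: it marks exactly where provenance was not captured.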
Update: Metrology Workshop results
Update: QIBA 3A Test bed progress
Value proposition of QI-Bench
• Efficiently collect and exploit evidence establishing standards for optimized quantitative imaging:
  – Users want confidence in the read-outs
  – Pharma wants to use them as endpoints
  – Device/SW companies want to market products that produce them without huge costs
  – Public wants to trust the decisions that they contribute to
• By providing a verification framework to develop precompetitive specifications and support test harnesses to curate and utilize reference data
• Doing so as an accessible and open resource facilitates collaboration among diverse stakeholders
Summary: QI-Bench Contributions
• We make it practical to increase the magnitude of data for increased statistical significance.
• We provide practical means to grapple with massive data sets.
• We address the problem of efficient use of resources to assess limits of generalizability.
• We make formal specification accessible to diverse groups of experts who are not skilled or interested in knowledge engineering.
• We map both medical and technical domain expertise into representations well suited to emerging capabilities of the semantic web.
• We enable a mechanism to assess compliance with standards or requirements within specific contexts for use.
• We take a “toolbox” approach to statistical analysis.
• We provide the capability in a manner that is accessible to varying levels of collaborative models, from individual companies or institutions to larger consortia or public-private partnerships to fully open public access.
QI-Bench Structure / Acknowledgements
• Prime: BBMSC (Andrew Buckler, Gary Wernsing, Mike Sperling, Matt Ouellette)
• Co-Investigators
  – Kitware (Rick Avila, Patrick Reynolds, Julien Jomier, Mike Grauer)
  – Stanford (David Paik)
• Financial support as well as technical content: NIST (Mary Brady, Alden Dima, John Lu)
• Collaborators / Colleagues / Idea Contributors
  – Georgetown (Baris Suzek)
  – FDA (Nick Petrick, Marios Gavrielides)
  – UMD (Eliot Siegel, Joe Chen, Ganesh Saiprasad, Yelena Yesha)
  – Northwestern (Pat Mongkolwat)
  – UCLA (Grace Kim)
  – VUmc (Otto Hoekstra)
• Industry
  – Pharma: Novartis (Stefan Baumann), Merck (Richard Baumgartner)
  – Device/Software: Definiens, Median, Intio, GE, Siemens, Mevis, Claron Technologies, …
• Coordinating Programs
  – RSNA QIBA (e.g., Dan Sullivan, Binsheng Zhao)
  – Under consideration: CTMM TraIT (Andre Dekker, Jeroen Belien)