Genedata Profiler & iRODS: An Open & Collaborative Enterprise Software Platform for Patient and Compound Profiling
Marc Flesch, Tamas Rujan
© 2015 Genedata | Confidential and Proprietary
Genedata – Corporate Snapshot
Roots: Established in 1997 | Privately owned | Headquartered in Switzerland
Global Reach: ~200 employees | Offices in Europe (Basel, Munich), North America (Boston, San Francisco) & Asia (Tokyo)
Dedicated to Drug Discovery & Biotechnology: Innovative portfolio of enterprise systems increasing productivity of data-rich & complex research processes
Domain Expertise: Experienced Ph.D.-level experts coupled with efficient software engineering processes
Marquee Customer Base: Leading pharmaceutical, biotechnology, and other life science organizations
Customer Base – Pharma
[Map: San Francisco, Boston, Munich, Basel, Tokyo]
25 of the top 25 pharmas, and more …
Supporting the Patient Profiling Process
[Figure: patient cohorts, NGS sequence reads, responder vs. non-responder groups]
Patient stratification | Drug response prediction
Major Challenges of Patient Profiling Process
• Efficiently managing, processing, and analyzing data
– Huge & complex datasets containing patient-related omics data
– Integrating disease & genomic information from different studies
• Facilitating collaboration within interdisciplinary teams
– Enabling easy data, method & result sharing
– Global distribution of data generators & data consumers
• Working with data from human samples in research environments
– Ensuring privacy of patient information
– Maintaining chain of custody
“Using data from clinical samples is challenging, because we need to take patient privacy very seriously.” (Henrik Seidel, Bayer)
Problem Statement
Data Privacy within a Global Organization
… how to efficiently work with distributed data?
[Diagram: Illumina sequencers, HPC clusters, and user groups]
At Present…
Common technologies applied include
• UNIX file permissions
• POSIX Access Control Lists (ACLs)
• CIFS Shares (SAMBA)
With the following shortcomings
• UNIX permissions are too simple to model project-centric access patterns
• Paths on UNIX file systems can’t replace data management systems
• Permissions have to be maintained manually, which is extremely cumbersome
• ACLs are hard to manage
• The distributed storage problem stays unresolved
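The first shortcoming in the list above can be made concrete with a small Python model. This is only a sketch: the `UnixFile` class, the group names `project_a`, `project_b`, `project_c`, and the user `eve` are hypothetical, and the project-level access table is a generic illustration, not the access model of iRODS or Genedata Profiler. It shows that a file with a single owning group cannot grant read access to two project teams without also opening it to everyone else.

```python
from dataclasses import dataclass


@dataclass
class UnixFile:
    owner: str
    group: str  # a UNIX file has exactly one owning group
    mode: int   # permission bits, e.g. 0o640


def unix_can_read(f: UnixFile, user: str, user_groups: set) -> bool:
    """Classic owner/group/other check: the first matching class decides."""
    if user == f.owner:
        return bool(f.mode & 0o400)
    if f.group in user_groups:
        return bool(f.mode & 0o040)
    return bool(f.mode & 0o004)


def project_can_read(acl: dict, user_groups: set) -> bool:
    """Project-centric check: any project the user belongs to may grant read."""
    return any("read" in acl.get(g, set()) for g in user_groups)


# Hypothetical users and their project memberships.
GROUPS = {
    "alice": {"project_a"},
    "bob": {"project_b"},
    "eve": {"project_c"},
}

# Hypothetical project-level access table for one raw data file.
PROJECT_ACL = {"project_a": {"read"}, "project_b": {"read"}}
```

With `UnixFile("sequencer", "project_a", 0o640)`, Alice can read and Bob cannot. Widening the mode to `0o644` lets Bob in, but Eve as well, whereas the project table admits Alice and Bob and still excludes Eve.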
Our Solution
Marrying Security with Performance
[Diagram: HPC compute cluster with input data, cache copy, temp results, and result data]
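One possible reading of the diagram above is a staging pattern: input data is copied to a cache on the compute side, temporary results are written next to that copy, and only the final result is moved back to managed storage. The Python sketch below follows that reading. The function names `run_staged` and `count_lines`, the scratch directory, and the `.result` suffix are assumptions made for illustration and do not describe how Genedata Profiler or iRODS implement this step.

```python
import shutil
import tempfile
from pathlib import Path


def count_lines(src: Path, dst: Path) -> None:
    """Stand-in compute step (hypothetical): write the line count of src to dst."""
    with src.open() as handle:
        n = sum(1 for _ in handle)
    dst.write_text(str(n))


def run_staged(input_file: Path, result_dir: Path, compute) -> Path:
    """Copy input to scratch, compute there, and publish only the final result."""
    result_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="scratch_") as scratch:
        scratch_dir = Path(scratch)
        # Cache copy: the compute step never touches the managed original.
        cache_copy = scratch_dir / input_file.name
        shutil.copy2(input_file, cache_copy)
        # Temp results live only in scratch.
        temp_result = scratch_dir / "result.tmp"
        compute(cache_copy, temp_result)
        # Result data: moved back to the managed location.
        final = result_dir / (input_file.stem + ".result")
        shutil.move(str(temp_result), str(final))
    # Leaving the with-block deletes the cache copy and any leftovers.
    return final
```

Calling `run_staged(Path("reads.fq"), Path("results"), count_lines)` would leave `results/reads.result` behind and nothing else; the cache copy and temporary files disappear with the scratch directory.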
RNA-Seq Data-Processing Pipeline
and Interaction Points with
Profiler
Chain-of-Custody
[Diagram: raw data files rna1_1.fq, rna1_2.fq, rna2_1.fq, rna2_2.fq, rna3_1.fq, rna3_2.fq, rna4_1.fq, rna4_2.fq; users Tina, Alice, Bob, Joe; processing steps: sequence alignment, RNA quantification, data export]
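A chain of custody like the one in the diagram above amounts to a record of who ran which step on which files. The Python sketch below keeps such a record as an append-only log in which each entry carries a SHA-256 digest of its content and of the previous entry, so later tampering is detectable. Hash chaining is a generic technique chosen here for illustration; the slides do not say how Genedata Profiler or iRODS store this information. The `CustodyLog` class and its methods are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

_FIELDS = ("user", "step", "inputs", "outputs", "prev")


def _digest(body: dict) -> str:
    """Deterministic SHA-256 over the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class CustodyLog:
    entries: list = field(default_factory=list)

    def record(self, user, step, inputs, outputs):
        """Append an entry that is chained to the previous entry's digest."""
        prev = self.entries[-1]["digest"] if self.entries else ""
        body = {"user": user, "step": step, "inputs": list(inputs),
                "outputs": list(outputs), "prev": prev}
        self.entries.append({**body, "digest": _digest(body)})

    def verify(self) -> bool:
        """Return True only if no entry was altered, removed, or reordered."""
        prev = ""
        for entry in self.entries:
            body = {k: entry[k] for k in _FIELDS}
            if entry["prev"] != prev or _digest(body) != entry["digest"]:
                return False
            prev = entry["digest"]
        return True

    def lineage(self, filename):
        """Return the (user, step) pairs that led to filename, oldest first."""
        wanted, trail = {filename}, []
        for entry in reversed(self.entries):
            if wanted & set(entry["outputs"]):
                trail.append((entry["user"], entry["step"]))
                wanted |= set(entry["inputs"])
        return list(reversed(trail))
```

As a usage example, one could record `("Tina", "sequence alignment", ["rna1_1.fq", "rna1_2.fq"], ["rna1.bam"])` followed by quantification and export entries, then call `lineage` on the exported file to list every person and step involved. The output file names and the assignment of users to steps in such an example are illustrative; the diagram residue does not preserve which user performed which step.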
Enabling Intuitive Raw Data Management
1. Visualization of clinical sample annotation together with corresponding raw data
2. Flexible search functionalities across the whole database
3. Powerful annotation curation capabilities including bulk editing and annotation information protection
Marrying Raw Data with Sample Annotation
[Screenshot: sample annotation alongside raw data]
Providing ‘Google-Like’ Search
[Screenshot callouts: search result; complex search]
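The two search modes named on this slide can be sketched over plain sample annotation records. The Python below is a minimal illustration, assuming annotations are simple key/value dictionaries. The functions `free_text` and `complex_search`, the field names, and the sample records are all hypothetical and are not taken from Genedata Profiler.

```python
def free_text(samples, term):
    """Keyword search: keep samples where any annotation value contains term."""
    term = term.lower()
    return [s for s in samples
            if any(term in str(v).lower() for v in s.values())]


def complex_search(samples, **conditions):
    """Fielded search: every condition must hold.

    A condition is either an exact value or a predicate function.
    """
    def matches(sample):
        for key, cond in conditions.items():
            if key not in sample:
                return False
            value = sample[key]
            if callable(cond):
                if not cond(value):
                    return False
            elif value != cond:
                return False
        return True

    return [s for s in samples if matches(s)]


# Hypothetical annotation records linking samples to raw data files.
SAMPLES = [
    {"sample_id": "S1", "tissue": "lung", "response": "responder",
     "age": 61, "raw_data": "rna1_1.fq"},
    {"sample_id": "S2", "tissue": "lung", "response": "non-responder",
     "age": 48, "raw_data": "rna2_1.fq"},
    {"sample_id": "S3", "tissue": "colon", "response": "responder",
     "age": 70, "raw_data": "rna3_1.fq"},
]
```

For instance, `free_text(SAMPLES, "lung")` returns S1 and S2, while `complex_search(SAMPLES, response="responder", age=lambda a: a >= 60)` returns S1 and S3. Note that the keyword search is a substring match, so the term "responder" would also match "non-responder"; exact matching needs the fielded form.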
Sample Annotation Curation
[Screenshot callouts: locked-down attribute; multiple values including units; browse sequence]
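The curation features called out on this slide (attributes with multiple values and units, locked-down attributes) and the bulk editing mentioned earlier can be modelled in a few lines of Python. Storing each annotation as a value with an optional unit resembles the attribute-value-unit triples that iRODS uses for metadata, but the `Sample` class, `bulk_set`, and `LockedAttributeError` below are hypothetical names for illustration, not part of iRODS or Genedata Profiler.

```python
from dataclasses import dataclass, field


class LockedAttributeError(Exception):
    """Raised when an edit targets a locked-down attribute."""


@dataclass
class Sample:
    name: str
    # attribute name -> list of (value, unit) pairs
    annotations: dict = field(default_factory=dict)
    locked: set = field(default_factory=set)

    def add(self, attribute, value, unit=""):
        """Add one more value (with optional unit) to an attribute."""
        if attribute in self.locked:
            raise LockedAttributeError(attribute)
        self.annotations.setdefault(attribute, []).append((value, unit))

    def lock(self, attribute):
        """Protect an attribute against further edits."""
        self.locked.add(attribute)


def bulk_set(samples, attribute, value, unit=""):
    """Overwrite an attribute on many samples; return names that were skipped."""
    skipped = []
    for sample in samples:
        if attribute in sample.locked:
            skipped.append(sample.name)
            continue
        sample.annotations[attribute] = [(value, unit)]
    return skipped
```

A bulk edit in this sketch reports locked samples instead of failing midway, which is one reasonable design choice for curating many records at once; raising on the first locked attribute would be an equally valid alternative.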
Summary
• The smooth integration of Genedata Profiler with iRODS enables scientists to preserve their research ecosystem when working with confidential data
• Genedata Profiler’s data processing and management capabilities, together with iRODS’ metadata and security concepts, form a unique combination for establishing the chain of custody when analyzing personalized medicine data