Working towards multi-omics integration: Tools and workflows within Galaxy-P Platform Pratik Jagtap Galaxy-P Team University of Minnesota December 6, 2019


Page 1: Working towards multi-omics integration: Tools and ...galaxyp.org/wp-content/uploads/2020/01/Working_towards_multi-omi… · Pratik Jagtap Galaxy-P Team University of Minnesota December

Working towards multi-omics integration:

Tools and workflows within Galaxy-P Platform

Pratik Jagtap, Galaxy-P Team

University of Minnesota

December 6, 2019

Page 2:

Acknowledgements

Minnesota Supercomputing Institute: James Johnson, Thomas McGowan, Michael Milligan

University of Minnesota: Timothy Griffin (PI), Praveen Kumar, Candace Guerrero, Subina Mehta, Adrian Hegeman (Co-I), Art Eschenlauer, Ray Sajulga, Caleb Easterly, Andrew Rajczewski

Biologists / collaborators: Laurie Parker, Joel Rudney, Maneesh Bhargava, Amy Skubitz, Chris Wendt, Brian Crooker, Steven Friedenberg, Kevin Viken, Kristin Boylan, Marnie Peterson, Somiah Afiuni, Brian Sandri, Alexa Pragman, Wanda Weber, Amy Treeful

Ira Cooke and Maria Doyle, Melbourne, Australia

Harald Barsnes, Marc Vaudel, University of Bergen, Norway

Bjoern Gruening, Bérénice Batut, University of Freiburg, Freiburg, Germany

Lennart Martens (Co-I), Bart Mesuere, Robbert G Singh, VIB, UGhent, Belgium

Judson Hervey, Naval Research Institute, Washington, D.C.

Matt Chambers, Nashville, TN

Alessandro Tanca, Porto Conte Ricerche, Italy

Carolin Kolmeder, University of Helsinki, Finland

Thilo Muth, Bernhard Renard, Robert Koch Institut

Thomas Doak, Jeremy Fisher, Haixu Tang, Sujun Li, Indiana University

Josh Elias, Stanford University

Brook Nunn, U of Washington

Lloyd Smith (Co-I), Michael Shortreed, UW-Madison

Anamika Krishanpal, Priyabrata Panigrahi, Persistent Systems Limited

Stephan Kang, Intero Life Sciences

Magnus Øverlie Arntzen, Francesco Delogu, NMBU, Oslo, Norway

Funding

galaxyp.org

Page 3:

Proteogenomics: A primer

[Figure: MS1 mass spectrum (+TOF MS: 24 MCA scans from Myo_tryptic.wiff, max. 5191.0 counts). X-axis: m/z, amu (1000 to 3000). Y-axis: intensity, counts (0 to 5191). Labeled peaks include m/z 1360.7892, 1606.8892, 1938.0629, 1815.9397 and 1378.8696. Panels: MS1, MS2.]

Page 4:

Matching amino acid sequences to MS/MS data

Page 5:

Detecting protein variants via proteogenomics

Comprehensive database (sample-specific, all possible sequences)

RNA sequences (e.g. RNA-seq), 3-frame translation: UCGAUCAGGGCAAU

DNA sequences, 6-frame translation: TCGATCAGGGCAAT / AGCTAGTCCCGTTA

In-silico translation
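The in-silico translation step on this slide can be illustrated with a minimal Python sketch. It is not the code used by any Galaxy-P tool; the function names are hypothetical, the standard genetic code is assumed, and incomplete trailing codons are simply dropped. RNA is translated in the three forward frames, and DNA in six frames (three forward plus three on the reverse complement).

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, frame=0):
    """Translate one reading frame; '*' marks a stop, 'X' an unknown codon."""
    seq = seq.upper().replace("U", "T")  # treat RNA like DNA
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame(seq):
    """Forward-strand translations, suitable for stranded RNA sequences."""
    return [translate(seq, frame) for frame in range(3)]


def six_frame(dna):
    """Three forward frames followed by three reverse-complement frames."""
    return three_frame(dna) + three_frame(reverse_complement(dna))


if __name__ == "__main__":
    # Example sequences taken from the slide.
    print(three_frame("UCGAUCAGGGCAAU"))
    print(six_frame("TCGATCAGGGCAAT"))
```

In a real database-generation step the translated frames would usually be split at stop codons and filtered by length before being written to FASTA; that step is left out here.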

Page 6:

Proteogenomic outcomes

Confirms translation of variants

Direct evidence of potential functional variants

Applications in neoantigen discovery (immuno-oncology)

Nature Methods, Vol. 11, No. 11, November 2014

Page 7:

Bringing proteogenomics to the masses: informatics challenges

J. Proteome Res., 2014, 13, pp 5898–5908

• Many software tools, integration, automation…

• RNA-Seq assembly and analysis
• Customized protein database generation
• Matching sequences to MS/MS data
• Filtering and QC!
• Interpretation! Beyond a list…

Page 8:

PROTEOGENOMICS & ITS CHALLENGES

Ruggles et al. Mol Cell Proteomics 2017;16:959-981

© 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

Challenges
• Large search database sizes
• False-positive sources and their elimination
• Validation of novel peptide identifications
• PSM quality evaluation
• Targeted proteomics of identified peptides
• Genomic localization
• Disparate tools and numerous processing steps

Page 9:

Galaxy Platform

• A web-based bioinformatics data analysis platform.
• Software accessibility and usability.
• Shareability of tools, workflows and histories.
• Reproducibility, and the ability to test and compare results obtained with multiple parameter settings.
• Ability to assimilate disparate software into integrated workflows.

https://galaxyproject.org/

Page 10:

Solution: Galaxy Platform

For example, Protein Database Downloader downloads UniProt protein FASTA databases for various organisms.

Software tools can be used in a sequential manner to generate analytical workflows that can be reused, shared and creatively modified.

Page 11:

Proteogenomics Workflows in Galaxy

Workflow #1: RNA-Seq to Variant FASTA database

Inputs: RNA-Seq data, GTF file, UniProt FASTA

• HISAT: alignment tool
• FreeBayes: variant calling
• CustomProDB: variant annotation & genome mapping
• StringTie: RNA-Seq to transcripts
• GFF Compare: compares assembly with annotated transcripts

Outputs: genome mapping files, protein sequence FASTA

10th Annual Meeting of Proteomics Society, India, 2018
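To make the goal of Workflow #1 concrete, here is a minimal Python sketch of how a variant protein FASTA database could be assembled from reference sequences plus single amino-acid substitutions. In the Galaxy workflow this job is handled by CustomProDB; the sketch is not its implementation, and the function names, the variant tuple layout and the header format (for example `P1_T3A`) are hypothetical.

```python
def apply_substitution(protein, position, ref_aa, alt_aa):
    """Apply a single amino-acid substitution at a 1-based position."""
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the protein")
    if protein[position - 1] != ref_aa:
        raise ValueError(
            f"expected {ref_aa} at position {position}, found {protein[position - 1]}"
        )
    return protein[:position - 1] + alt_aa + protein[position:]


def fasta_record(identifier, sequence, width=60):
    """Format one FASTA record with fixed-width sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{identifier}\n" + "\n".join(lines) + "\n"


def build_variant_database(reference, variants):
    """Reference records first, then one record per variant.

    reference: dict mapping protein ID to sequence.
    variants: iterable of (protein_id, position, ref_aa, alt_aa) tuples.
    """
    records = [fasta_record(pid, seq) for pid, seq in reference.items()]
    for pid, position, ref_aa, alt_aa in variants:
        variant_seq = apply_substitution(reference[pid], position, ref_aa, alt_aa)
        # Hypothetical header convention: <protein>_<ref><position><alt>
        records.append(fasta_record(f"{pid}_{ref_aa}{position}{alt_aa}", variant_seq))
    return "".join(records)


if __name__ == "__main__":
    reference = {"P1": "MKTAYIAK"}            # toy sequence, not real data
    variants = [("P1", 3, "T", "A")]
    print(build_variant_database(reference, variants), end="")
```

Checking the reference residue before substituting is a cheap guard against coordinate mismatches between the variant calls and the protein annotation.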

Page 12:

Proteogenomics Workflows in Galaxy

Workflow #1 (from the previous slide): HISAT, FreeBayes, CustomProDB, StringTie and GFF Compare, producing the protein sequence FASTA

Workflow #2: Database Searching Using MS/MS Data

Input: RAW files

• SearchGUI and PeptideShaker
• mz to SQLite

Outputs: peptides for BLAST search, PSM report

10th Annual Meeting of Proteomics Society, India, 2018

Page 13:

Proteogenomics Workflows in Galaxy

Workflow #1 (from the earlier slides): HISAT, FreeBayes, CustomProDB, StringTie and GFF Compare

Workflow #2 (from the previous slide): SearchGUI and PeptideShaker, mz to SQLite, producing the peptides for BLAST search and the PSM report

Workflow #3: Identifying Novel Variants and Visualization

Output: summary of peptides

10th Annual Meeting of Proteomics Society, India, 2018
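The idea behind the novel-variant step in Workflow #3 can be sketched in Python as follows. The Galaxy workflow sends candidate peptides to a BLAST search; this sketch swaps in a much simpler exact substring match against reference proteins, so it is only a stand-in. The function names are hypothetical, and treating isoleucine and leucine as equivalent (they have identical mass) is an assumption of this sketch rather than something stated on the slide.

```python
def normalize(sequence):
    """Upper-case and map I to L, since the two residues share the same mass."""
    return sequence.upper().replace("I", "L")


def find_novel_peptides(peptides, reference_proteins):
    """Return peptides that occur in none of the reference proteins."""
    reference = [normalize(protein) for protein in reference_proteins]
    novel = []
    for peptide in peptides:
        key = normalize(peptide)
        if not any(key in protein for protein in reference):
            novel.append(peptide)
    return novel


if __name__ == "__main__":
    reference_proteins = ["MKTAYIAKQR"]        # toy reference, not real data
    identified = ["TAYLAK", "TAYVAK"]          # toy peptide identifications
    print(find_novel_peptides(identified, reference_proteins))
```

For a full proteome the linear scan would be slow; an index of reference k-mers, or the BLAST search used in the workflow itself, scales better.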

Page 14:

PROTEOGENOMICS WORKFLOW

Proteo-transcriptomics workflows within Galaxy are used to determine protein expression and detect variant proteins expressed.

Page 15:

Transcriptomics workflows within Galaxy are used to generate customized protein databases, estimate gene expression, and detect expressed variant genes.

Page 16:

Quantitative proteotranscriptomics

Kumar P, Panigrahi P, Johnson J, Weber WJ, Mehta S, Sajulga R, Easterly C, Crooker BA, Heydarian M, Anamika K, Griffin TJ, Jagtap P. J Proteome Res. 2019;18:782-790.

Praveen Kumar (Krishanpal Anamika / Priyabrata Panigrahi)

Page 17:

QuanTP: interactive visualization of RNA-protein response

Distribution

Transcriptome Data Proteome Data

Page 18:

QuanTP: interactive visualization of RNA-protein response

Differential Expression


Transcriptome Data Proteome Data

Page 19:

QuanTP: interactive visualization of RNA-protein response

Principal component analysis

Transcriptome Data Proteome Data

Page 20:

QuanTP: interactive visualization of RNA-protein response

Cluster Analysis

Page 21:

Correlation of RNASeq and proteomics data

QuanTP: interactive visualization of RNA-protein response

Correlation
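As an illustration of the correlation view, the sketch below pairs transcript and protein measurements by a shared gene identifier and computes a Pearson correlation coefficient. It is not QuanTP's code; the function names, the dictionary layout and the toy values are hypothetical, and the choice of Pearson (rather than a rank-based measure) is an assumption.

```python
import math


def paired_values(rna, protein):
    """Keep only genes measured in both datasets, in a stable order."""
    shared = sorted(set(rna) & set(protein))
    return shared, [rna[g] for g in shared], [protein[g] for g in shared]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy log fold-changes keyed by gene ID (made-up values).
    rna = {"geneA": 1.0, "geneB": 2.0, "geneC": 3.0, "geneD": 9.0}
    protein = {"geneA": 2.0, "geneB": 4.0, "geneC": 6.0}
    genes, x, y = paired_values(rna, protein)
    print(genes, pearson_r(x, y))
```

Genes present in only one dataset (here `geneD`) are dropped before the correlation is computed, which is why the pairing step comes first.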

Page 22:

Cook’s Distance Analysis

QuanTP: interactive visualization of RNA-protein response

Influential Points

Correlation of RNASeq and proteomics data
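Cook's distance measures how much a fitted regression changes when one observation is left out, which is how influential points are flagged. For a simple linear regression of protein response $y$ on transcript response $x$ with $p = 2$ coefficients, $D_i = \frac{e_i^2}{p\,s^2}\cdot\frac{h_i}{(1-h_i)^2}$, where $e_i$ is the residual, $s^2$ the mean squared error and $h_i = \frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum_j (x_j-\bar{x})^2}$ the leverage. The Python sketch below implements this textbook formula; it is not taken from QuanTP, and the function name and toy data are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance per observation for the least-squares fit y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    if n != len(y) or n <= p:
        raise ValueError("need equal-length lists with more than 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    if mse == 0:
        return [0.0] * n  # perfect fit: no observation is influential
    leverage = [1 / n + (a - mean_x) ** 2 / sxx for a in x]
    return [
        (e * e / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]      # toy transcript values
    y = [1, 2, 3, 4, 10]     # toy protein values; the last point is off-trend
    for i, d in enumerate(cooks_distance(x, y)):
        print(i, round(d, 4))
```

In the toy data the last point has by far the largest distance. A common rule of thumb flags points with $D_i > 4/n$, but that cutoff is a convention rather than something stated on the slide.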

Page 23:

Multi-Omics Visualization Platform:

Characterizing the nature of detected variants

• HTML-based Galaxy plugin
• Interactive reading of mzsqlite database

https://www.biorxiv.org/content/10.1101/842856v2.abstract

Tom McGowan

Page 24:

MULTI-OMICS VISUALIZATION PLATFORM FOR VISUALIZING NOVEL PROTEOFORMS

SPECTRAL QUALITY VISUALIZATION (Lorikeet Viewer)

GENOMIC LOCALIZATION (Integrative Genomics Viewer)

https://www.biorxiv.org/content/10.1101/842856v2.abstract

Page 25:

CRAVAT-P: Assessing potential impact of variants

Sajulga R, Mehta S, Kumar P, Johnson JE, Guerrero CR, Ryan MC, Karchin R, Jagtap PD, Griffin TJ. J Proteome Res. 2018;17:4329-4336.

Cancer-Related Analysis of Variants Toolkit (cravat.us) developed by Rachel Karchin and Michael Ryan

Page 26:

Assessing potential impact of protein-level variants: CRAVAT-P

• Intersection of transcript variants and confirmed protein variants

Ray Sajulga
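The intersection described on this slide can be sketched in Python as a lookup of peptide-confirmed variants among the variants called at the transcript level. This is not CRAVAT-P's implementation; the function name, the variant key `(protein_id, position, ref_aa, alt_aa)` and the genomic coordinates in the example are hypothetical.

```python
def confirmed_variants(transcript_variants, peptide_variants):
    """Return transcript-level variants that are also supported by peptides.

    transcript_variants: dict mapping (protein_id, position, ref_aa, alt_aa)
        to a genomic location string.
    peptide_variants: iterable of the same kind of key, one per variant
        observed in an identified peptide.
    """
    supported = set(peptide_variants)
    return {
        key: location
        for key, location in transcript_variants.items()
        if key in supported
    }


if __name__ == "__main__":
    # Made-up variant calls for illustration only.
    from_rnaseq = {
        ("P1", 3, "T", "A"): "chr1:1000",
        ("P2", 10, "G", "R"): "chr2:2000",
    }
    from_peptides = [("P1", 3, "T", "A"), ("P9", 5, "K", "E")]
    print(confirmed_variants(from_rnaseq, from_peptides))
```

Keeping the genomic location alongside each confirmed variant is what allows the result to be passed on to a variant-impact tool that works on genomic coordinates.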

Page 27:

Unleashing the power of CRAVAT on proteogenomic results

Sajulga R, Mehta S, Kumar P, Johnson JE, Guerrero CR, Ryan MC, Karchin R, Jagtap PD, Griffin TJ. J Proteome Res. 2018;17:4329-4336.

ndexbio.org

https://jraysajulga.github.io/cravatp-galaxy-docker/

• HTML-based Galaxy plugin

• Interactive viewer

Page 28:

COMING SOON

• PepQuery Tool uses a peptide-centric approach for validation by a) competitive filtering; b) statistical evaluation; c) unrestricted modification search and d) visualization of peptides corresponding to novel proteoforms.

Wen et al. Genome Res. 2019;29(3):485–493. doi: 10.1101/gr.235028.118

• Extend MVP, QuanTP and CRAVAT-P tools

• Integrate newer tools from our collaborators to extend the existing workflows.

Page 29:

Accessing the Multi-omic Workflows

PUBLIC INSTANCES

Proteogenomics Gateway: z.umn.edu/proteogenomicsgateway

Step-by-step instructions: z.umn.edu/pginnov18

Metaproteomics Gateway: z.umn.edu/metaproteomicsgateway

Step-by-step instructions: z.umn.edu/suppS1

Tools and workflows are also available at: https://proteomics.usegalaxy.eu/

Page 30:

Accessing the Multi-omic Workflows

ALSO AVAILABLE ON:

GitHub: https://github.com/galaxyproteomics

Galaxy Toolshed: https://toolshed.g2.bx.psu.edu/

Docker: https://jraysajulga.github.io/cravatp-galaxy-docker/

Training workflows are also available at: https://training.galaxyproject.org

Page 31:

Conclusions

• Proteogenomics workflows that generate quantitative peptide and protein-level values are available within the Galaxy platform.

• Post-search analysis tools such as QuanTP, MVP and CRAVAT-P help understand the biological context of the data. We plan to extend these tools.

• There is a need to integrate statistical tools and methods to offer a much more comprehensive perspective of proteogenomics data.

Page 32:

We can be reached at:

Published Manuscripts: z.umn.edu/galaxypreferences

Galaxy-P Presentations: http://galaxyp.org/conference-presentations

Contact: http://galaxyp.org/contact/

Twitter: twitter.com/usegalaxyp

galaxyp.org

Page 33:

Acknowledgements

Funding

galaxyp.org/contact

Follow us on: twitter.com/usegalaxyp

The Galaxy-P Team at University of Minnesota