Taming Snakemake 1/27/14

Upload: jeremy-leipzig

Post on 13-Jul-2015


TRANSCRIPT

Page 1: Taming Snakemake

Taming Snakemake

1/27/14

Page 2: Taming Snakemake

Why Make?

What are Make's advantages (over Perl and shell scripts)?

Make forces you to think about file transformation in terms of inputs and outputs, recipes and rules. In Perl you are forced to think at the level of variables, conditionals, and loops. In Shell you are forced to think like a caveman.

Unfortunately, bioinformatics is still largely about files and their suffixes. Make has a very powerful syntax based almost entirely around file suffixes.

Make knows what's been made and what hasn't. Make can be interrupted and restarted safely, and without overwriting finished work.

Make knows what's changed and what hasn't. If an input is newer than an output, it will attempt to rebuild the output.

Make allows you to add new input files without worrying about overwriting old ones.

Make is well supported. There are 1333 Make questions on Stack Overflow alone.

When people see a Makefile, they immediately know how to run it.

Make does not force you to wrap shell statements in quotes.

Make is a DSL. It will attempt to validate your syntax.

Make is ancient, ubiquitous, and reliable.

Make can parallelize with --jobs.

Make recipes encourage reuse.

https://share.chop.edu/pages/viewpage.action?pageId=138478819
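The "knows what's been made" and "knows what's changed" points above come down to a timestamp comparison. A minimal Python sketch of that rule follows; the function name `needs_rebuild` and the file names are hypothetical, and real Make applies the check across the whole dependency graph rather than one file at a time.

```python
import os
import tempfile


def needs_rebuild(output, inputs):
    """Return True if `output` is missing or older than any of `inputs`."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


# Demonstration in a scratch directory, with timestamps set explicitly.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "reads.fastq")   # hypothetical input
out = os.path.join(workdir, "reads.bam")     # hypothetical output

open(src, "w").close()
missing_output = needs_rebuild(out, [src])   # output has not been made yet

open(out, "w").close()
os.utime(src, (1000, 1000))                  # input older than output
os.utime(out, (2000, 2000))
up_to_date = needs_rebuild(out, [src])

os.utime(src, (3000, 3000))                  # input modified after output
stale = needs_rebuild(out, [src])
```

Only the first and last cases trigger a rebuild, which is why an interrupted run can be restarted without redoing finished work.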

Page 3: Taming Snakemake

Make review

http://github.research.chop.edu/BiG/err_chip_seq/blob/master/Makefile

Page 4: Taming Snakemake

Pipelines and Workflows

Page 5: Taming Snakemake

Other pipelines

Ruffus

GKNO

Queue

Page 6: Taming Snakemake

Why Snakemake?

Addresses Makefile weaknesses without throwing out the good stuff

Difficult to implement control flow

No cluster support

Inflexible wildcards

Too much reliance on sentinel files

No reporting mechanism

Johannes Köster
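On the "inflexible wildcards" point: a Make pattern rule allows a single `%` stem per target, while Snakemake allows several named wildcards in one path. The Python sketch below illustrates the idea of named wildcards by translating a `{name}` pattern into a regular expression. It is an illustration only, not Snakemake's own implementation; the helper names and the example paths are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'dir/{a}.{b}.ext' into a regex with one named group per wildcard."""
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)          # literal text
        else:
            regex += "(?P<%s>.+)" % part      # wildcard, matches one or more chars
    return re.compile(regex)


def match_wildcards(pattern, filename):
    """Return a dict of wildcard values, or None if the file does not match."""
    found = pattern_to_regex(pattern).fullmatch(filename)
    return found.groupdict() if found else None


hit = match_wildcards("mapped/{sample}.{lane}.bam", "mapped/tumor.L001.bam")
miss = match_wildcards("mapped/{sample}.{lane}.bam", "mapped/tumor.bam")
```

Here `hit` carries both a `sample` and a `lane` value from one file name, which a single `%` stem cannot express directly.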

Page 7: Taming Snakemake

Syntax

Make vs. Snakemake

Variables

Targets

Rules
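The side-by-side code from this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of how variables, targets, and rules look on the Snakemake side, written out from Python so it can be dropped into a scratch directory. The sample names, file names, and the `sort` rule are assumptions chosen for illustration, not the contents of the original slide.

```python
import os
import tempfile

SNAKEFILE = '''\
# Variables: plain Python
SAMPLES = ["a", "b"]

# Targets: the first rule lists the final files to build
rule all:
    input:
        expand("{sample}.sorted.txt", sample=SAMPLES)

# Rules: how to make one output from one input
rule sort:
    input:
        "{sample}.txt"
    output:
        "{sample}.sorted.txt"
    shell:
        "sort {input} > {output}"
'''


def write_snakefile(directory):
    """Write the example Snakefile into `directory` and return its path."""
    path = os.path.join(directory, "Snakefile")
    with open(path, "w") as handle:
        handle.write(SNAKEFILE)
    return path


snakefile_path = write_snakefile(tempfile.mkdtemp())
```

In Make terms, `SAMPLES` plays the role of a variable, `rule all` the role of the default `all:` target, and `rule sort` the role of a pattern rule such as `%.sorted.txt: %.txt`.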

Page 8: Taming Snakemake

Utilities

Logs - wire them up manually

Cluster support pretty decent

Cores/jobs/resources

source /nas/is1/leipzig/martin/variome-env/bin/activate
snakemake --directory /nas/is1/leipzig/martin/snake-env --snakefile /nas/is1/leipzig/martin/snake-env/Snakefile -c qsub -j 16

source /mnt/isilon/cbmi/variome/leipzig/martin/respublica-env/bin/activate
snakemake --directory /mnt/isilon/cbmi/variome/leipzig/martin/snake-env --snakefile /mnt/isilon/cbmi/variome/leipzig/martin/snake-env/Snakefile -c qsub -j 16

Page 9: Taming Snakemake

Useful stuff

dry-runs

keep-going

touch

version changes

workflow diagrams
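Most of the bullets above correspond to a Snakemake command-line flag. The Python sketch below maps four of them to their invocations; the helper name `command_line` and the default Snakefile path are hypothetical, and the "version changes" bullet is left out because its flag has varied between Snakemake releases.

```python
import shlex

# Slide bullet -> Snakemake flag
USEFUL_FLAGS = {
    "dry-runs": "--dryrun",            # show what would run, execute nothing
    "keep-going": "--keep-going",      # continue with independent jobs after a failure
    "touch": "--touch",                # update output timestamps without running rules
    "workflow diagrams": "--dag",      # print the job graph in Graphviz dot format
}


def command_line(bullet, snakefile="Snakefile"):
    """Build a shell-quoted snakemake command for one of the bullets above."""
    argv = ["snakemake", "--snakefile", snakefile, USEFUL_FLAGS[bullet]]
    return " ".join(shlex.quote(arg) for arg in argv)


dry_run = command_line("dry-runs")
diagram = command_line("workflow diagrams") + " | dot -Tsvg > dag.svg"
```

The `--dag` output is plain dot text, so it needs Graphviz's `dot` to become an image, as in the `diagram` string.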

Page 10: Taming Snakemake

Python legal

Page 11: Taming Snakemake

Client websites with Jekyll

Jekyll is a templating engine for blogs that accepts Markdown

Layouts use the Liquid markup

http://mitomap.org/martin-rna-seq/
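Jekyll picks a layout for a Markdown page from the YAML front matter at the top of the file. A workflow step that publishes results to such a site only has to prepend that header. The Python sketch below shows one way to do it; the function name, the `default` layout, and the page title are assumptions for illustration.

```python
import json


def jekyll_page(title, markdown_body, layout="default"):
    """Wrap a Markdown body in Jekyll front matter naming the layout to use."""
    # A JSON string is also a valid double-quoted YAML string, so json.dumps
    # keeps titles containing colons or quotes from breaking the header.
    front_matter = "---\nlayout: %s\ntitle: %s\n---\n" % (layout, json.dumps(title))
    return front_matter + "\n" + markdown_body.rstrip("\n") + "\n"


page = jekyll_page("RNA-seq results", "## Summary\n\nResults go here.")
```

The named layout is a Liquid template that receives the rendered Markdown as `{{ content }}`.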

Page 12: Taming Snakemake

A workflow that reports itself

Page 13: Taming Snakemake

Avoiding Sweave-Hell

Page 14: Taming Snakemake

The bad way

Page 15: Taming Snakemake

Caching chunks?

Page 16: Taming Snakemake

Avoiding Sweave-Hell

Page 17: Taming Snakemake

Avoiding Sweave-Hell

Page 18: Taming Snakemake

R/Snakemake integration

git submodule add [email protected]:BiG/rna-seq-common-functions.git common/rna-seq

Page 19: Taming Snakemake

Leave a paper trail

Page 20: Taming Snakemake

Reproducible Checklist

repository github.research.chop.edu

workflow of some kind from beginning to end

website at mybic.chop.edu

Page 21: Taming Snakemake

Ties that bind