taming snakemake

Post on 13-Jul-2015

Taming Snakemake

1/27/14

Why Make?

What are Make's advantages (over Perl and shell scripts)?

Make forces you to think about file transformation in terms of inputs and outputs, recipes and rules. In Perl you are forced to think at the level of variables, conditionals, and loops. In Shell you are forced to think like a caveman.

Unfortunately, bioinformatics is still largely about files and their suffixes. Make has a very powerful syntax based almost entirely around file suffixes.

Make knows what's been made and what hasn't. Make can be interrupted and restarted safely, and without overwriting finished work.

Make knows what's changed and what hasn't. If an input is newer than an output, it will attempt to rebuild the output.

Make allows you to add new input files without worrying about overwriting old ones.

Make is well supported. There are 1333 Make questions on Stack Overflow alone.

When people see a Makefile, they immediately know how to run it.

Make does not force you to wrap shell statements in quotes.

Make is a DSL. It will attempt to validate your syntax.

Make is ancient, ubiquitous, and reliable.

Make can parallelize with --jobs.

Make recipes encourage reuse.

https://share.chop.edu/pages/viewpage.action?pageId=138478819
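The timestamp rule described above (rebuild when the output is missing or an input is newer than it) can be sketched in a few lines of Python. This is only an illustration of the idea, not Make's actual implementation; the function name `needs_rebuild` and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(output, inputs):
    """Return True if `output` is missing or older than any file in `inputs`.

    Assumes every input exists; a missing input raises OSError here,
    whereas Make would first try to build that input from another rule.
    """
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "sample.fastq")  # hypothetical input
        dst = os.path.join(workdir, "sample.bam")    # hypothetical output
        open(src, "w").close()
        print(needs_rebuild(dst, [src]))  # output does not exist yet

        open(dst, "w").close()
        os.utime(src, (1000, 1000))       # force input older than output
        os.utime(dst, (2000, 2000))
        print(needs_rebuild(dst, [src]))  # output is up to date

        os.utime(src, (3000, 3000))       # input touched after the output
        print(needs_rebuild(dst, [src]))  # output is stale again
```

Because the check depends only on what is on disk, an interrupted run can be restarted and finished outputs are skipped rather than overwritten.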

Make review

http://github.research.chop.edu/BiG/err_chip_seq/blob/master/Makefile

Pipelines and Workflows

Other pipelines

Ruffus

GKNO

Queue

Why Snakemake?

Addresses Makefile weaknesses without throwing out the good stuff

Difficult to implement control flow

No cluster support

Inflexible wildcards

Too much reliance on sentinel files

No reporting mechanism
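To picture the wildcard point: a Make pattern rule has a single anonymous `%` stem, while Snakemake lets a file pattern carry several named wildcards such as `{sample}`. The Python sketch below imitates that matching with named regex groups. It is an illustration only, not Snakemake's own code; `pattern_to_regex` and `match_wildcards` are hypothetical helpers, and each wildcard name may appear only once in a pattern here.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern like 'mapped/{sample}.{ext}' into a regex with named groups."""
    # With one capture group, re.split alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)          # literal text, matched exactly
        else:
            regex += "(?P<%s>.+)" % part      # named wildcard, greedy
    return re.compile(regex + "$")


def match_wildcards(pattern, filename):
    """Return a dict of wildcard values, or None if the filename does not fit."""
    found = pattern_to_regex(pattern).match(filename)
    return found.groupdict() if found else None


if __name__ == "__main__":
    print(match_wildcards("mapped/{sample}.{ext}", "mapped/tumor.bam"))
    print(match_wildcards("mapped/{sample}.bam", "mapped/tumor.vcf"))
```

Named values are what make it possible to reuse a wildcard in several inputs and outputs of one rule, which a lone `%` stem cannot express.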

Johannes Köster

Syntax

Make vs. Snakemake

Variables

Targets

Rules

Utilities

Logs - wire them up manually

Cluster support pretty decent

Cores/jobs/resources

source /nas/is1/leipzig/martin/variome-env/bin/activate
snakemake --directory /nas/is1/leipzig/martin/snake-env --snakefile /nas/is1/leipzig/martin/snake-env/Snakefile -c qsub -j 16

source /mnt/isilon/cbmi/variome/leipzig/martin/respublica-env/bin/activate
snakemake --directory /mnt/isilon/cbmi/variome/leipzig/martin/snake-env --snakefile /mnt/isilon/cbmi/variome/leipzig/martin/snake-env/Snakefile -c qsub -j 16

Useful stuff

dry-runs

keep-going

touch

version changes

workflow diagrams

Python is legal inside a Snakefile
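Several of the items above correspond to Snakemake command-line flags. The Python sketch below only assembles the command lines so they can be inspected or passed to `subprocess.run`; it does not run anything. The helper `snakemake_command` and the mode names are hypothetical, and flag spellings should be checked against `snakemake --help` for the installed version.

```python
import shlex

# Mode names are made up for this sketch; the flags are Snakemake CLI options.
MODES = {
    "dry-run": ["-n"],               # show what would run, execute nothing
    "keep-going": ["--keep-going"],  # continue with independent jobs after a failure
    "touch": ["--touch"],            # mark outputs as up to date without rerunning
    "diagram": ["--dag"],            # emit the job graph in Graphviz dot format
}


def snakemake_command(mode, snakefile="Snakefile", jobs=1):
    """Build an argument list for one of the modes above."""
    if mode not in MODES:
        raise ValueError("unknown mode: %r" % mode)
    return ["snakemake", "--snakefile", snakefile, "-j", str(jobs)] + MODES[mode]


if __name__ == "__main__":
    for mode in MODES:
        command = snakemake_command(mode, jobs=16)
        print(" ".join(shlex.quote(arg) for arg in command))
```

The `--dag` output is typically piped through Graphviz (for example `dot -Tsvg`) to get the workflow diagram.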

Client websites with Jekyll

Jekyll is a templating engine for blogs that accepts Markdown

Layouts use the Liquid markup

http://mitomap.org/martin-rna-seq/

A workflow that reports itself

Avoiding Sweave-Hell

The bad way

Caching chunks?

R/Snakemake integration

git submodule add git@github.research.chop.edu:BiG/rna-seq-common-functions.git common/rna-seq

Leave a paper trail

Reproducible Checklist

repository github.research.chop.edu

workflow of some kind from beginning to end

website at mybic.chop.edu

Ties that bind
