Short Introduction to Software Engineering for Bioinformatics

Joe Miyamoto

Post on 23-Jan-2018

Progress of software development practice

Waterfall -> Agile -> DevOps

There is no silver bullet in software engineering, and each approach has its own advantages. In bioinformatics, however, there is very little chance to adopt Waterfall.

Evolution rather than progress?

Waterfall -> Agile -> DevOps

More precisely:

• Waterfall: non-spiral, spiral
• Agile: Scrum, eXtreme Programming (XP)
• DevOps

Waterfall-style development

• A very old and popular style of product development
• We do not go back to a previous phase
• What we really want to make has to be clear from the start
• Suited to large-scale development: "a few A-class architects and a mass of C-class programmers"

Ref: http://fireside.gamejolt.com/post/the-game-creation-process-part-2-designing-the-idea-viq5rk2t

Waterfall-style development

Advantage

• Easy to manage progress (if no contingencies arise)

Disadvantage

• Hard to manage progress (if contingencies do arise)

Waterfall (spiral)

• Iterations of waterfall

Advantage

• The task becomes clearer with each iteration

Disadvantage

• Time-consuming

• Hard to determine how much to elaborate in the first iteration

Ref: http://www.qmetry.com/spiral.html

Agile

• The antithesis of waterfall

• Not a technique but a philosophy

• One iteration lasts 1 to 4 weeks, with one feature per iteration

Ref: https://www.linkedin.com/pulse/essential-resources-services-technologies-your-startup-jason-oh

Agile

Advantage

• Easy to adapt to changes

• Makes clear where we are and where we want to go

Disadvantage

• Requires refactoring -> CI (covered later)

• Communication cost -> works for teams of no more than about 20 people

Difference between agile and spiral

• Spiral: builds every feature in each iteration

• Agile: implements only one feature per iteration

Scrum

One incarnation of agile

Focus on communication among developers

• Make a list of the features we want to implement and update it constantly

• Each iteration is 30 days, and the software has to be deployable at the end

• A 15-minute stand-up meeting every day

• No partitioning

eXtreme Programming (XP)

One incarnation of agile

Focus on maintainability of code

• Test-Driven Development (TDD)

• Pair programming

• Joint ownership of code

• Continuous Integration (CI)

• Issue tracking

Two purposes of software testing

• Tests for users

• Tests for developers

Agile focuses on running the tests every time we make a change to the source code.
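As a minimal sketch of a developer-facing test that can be rerun after every change, the example below tests a small helper with Python's standard `unittest` module. The `gc_content` function and its test cases are illustrative assumptions, not something taken from the slides.

```python
import unittest


def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (hypothetical helper)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    gc_count = sum(1 for base in upper if base in "GC")
    return gc_count / len(upper)


class GcContentTest(unittest.TestCase):
    """In TDD these tests would be written first, then the function made to pass them."""

    def test_half_gc(self):
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_case_insensitive(self):
        self.assertAlmostEqual(gc_content("ggcc"), 1.0)

    def test_empty_sequence_is_rejected(self):
        with self.assertRaises(ValueError):
            gc_content("")
```

Saved to a file, the tests can be run with `python -m unittest <file>` after each edit, which is the same command a CI server would typically run.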

Distributed Version Control System (DVCS)

• Able to share the history of changes

• Cut a branch for every single feature or subproject

Ref: http://gotgroove.com/ecommerce-blog/guide-to-version-control-for-magento-using-git-and-beanstalk/

Mercurial (a simpler DVCS, popular with Pythonistas) could be enough for some bioinformaticians, though…

Workflow using Git (≒ how to branch)

There are several branching practices, but the following are the basic rules:

• One feature, one branch

• Master always has to be deployable

Ref: https://www.atlassian.com/ja/git/workflows#!workflow-gitflow

GitHub

• A hosting service for Git

• Filing an issue for every topic makes the project trackable

Coding -> Pull Request -> Review -> Merge

Following this flow makes the source code less dependent on any particular person.

Workflow using Git & GitHub

Fork & clone -> Work in local repository -> Push -> Pull Request -> Code Review -> Merge

Ref: http://acrl.ala.org/techconnect/post/coding-collaboration-on-github

In the same Git & GitHub workflow:

• Ticketing -> Issue tracking

• Build test -> CI

Continuous Integration (CI)

• Run automated tests constantly

• Makes it easy to track down a problem

Jenkins: the CI tool

Ref: http://www.slideshare.net/whyme/jenkins-reviewbot

GitHub and CI tools

Run the tests on every push to the remote

A common combination is GitHub + [Travis CI or Jenkins]

Ref: https://github.com/hltfbk/Excitement-Open-Platform/wiki/Developers

Practice for issue tracking

• The rough schedule is tracked with a Gantt chart or a burn-down chart

Ref: https://en.wikipedia.org/wiki/Gantt_chart

Ref: http://chandoo.org/wp/2009/07/21/burn-down-charts/

• The more precise schedule is managed with tickets or issues (Redmine, GitHub + ZenHub)
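A burn-down chart compares the work remaining with an ideal straight line that falls from the initial estimate to zero. The sketch below computes both series. It assumes work is estimated in points and the iteration length is known in days. The function names and the numbers are hypothetical.

```python
def ideal_burndown(total_points, days):
    """Ideal remaining work from day 0 to the last day, falling linearly to zero."""
    if days <= 0:
        raise ValueError("days must be positive")
    return [total_points * (days - day) / days for day in range(days + 1)]


def actual_burndown(total_points, completed_per_day):
    """Remaining work after each day, given the points completed on each day."""
    remaining = [total_points]
    for done in completed_per_day:
        remaining.append(remaining[-1] - done)
    return remaining


# Example: a 20-point iteration over 4 days.
ideal = ideal_burndown(20, 4)
actual = actual_burndown(20, [3, 5, 0, 8])
# A positive gap means the team is behind the ideal line on that day.
gap = [a - i for a, i in zip(actual, ideal)]
```

Plotting `ideal` and `actual` against the day number gives the chart. The `gap` list shows at a glance on which days the iteration fell behind.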

Ticket-Driven Development (TiDD)

• Manage tasks centrally as tickets

• Make small tasks clear and trackable

• An issue tracker such as Redmine is a commonly used tool

Ref: http://itpro.nikkeibp.co.jp/article/COLUMN/20130927/507265/?SS=imgview&FD=55983188&ST=devops

DevOps

• Extends "Agile" from development to operations

That is:

• Changes are reflected in the running system as soon as we update the code

• We develop not only a piece of software but a whole system

Technologies for DevOps

• Virtualization using containers

• Configuration management tools (e.g. Fabric)

Ref: http://blog.xebialabs.com/2014/12/05/rocket-vs-docker-myth-simple-lightweight-enterprise-platform/

Typical situation in bioinformatics

• Small daily analyses on a laptop

• Realize the need for more computational power

• Move the pipeline to a high-performance server

• Able to use the cloud? Use CloudBioLinux or another VM image from bioimg.org

-> Dependency hell!

• Software (or package) version differences

-> No reproducibility!
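One way to notice version differences before they break reproducibility is to compare the installed package versions with the versions the analysis was developed against. The sketch below uses the standard library's `importlib.metadata`. The function name, the package names, and the pinned versions are placeholders, not values from the slides.

```python
from importlib import metadata


def find_version_mismatches(pinned, lookup=metadata.version):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    `lookup` defaults to importlib.metadata.version and can be swapped for testing.
    """
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


# Hypothetical pins recorded on the laptop where the pipeline was developed.
PINNED = {"numpy": "1.9.2", "biopython": "1.65"}
```

Calling `find_version_mismatches(PINNED)` on the server reports which packages differ from the laptop. A container image removes the need for this check by shipping the exact versions together with the pipeline.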

Container virtualization (Docker)

• Include all third-party software in one container

• Build once, run anywhere

• Version-controllable, with a GitHub-like hosting service

Easy to transport between servers

Develop the whole container as "software"

Progress of virtualization

• chroot, cgroups: isolation of file and process space

• KVM, VirtualBox: OS virtualization

Drawbacks of these approaches:

• Heavy
• Not easy to provision
• Hard to use base images
• (chroot carries) a risk that one user depletes the computational resources

Container virtualization tries to take advantage of both

Emergence of counterforce

• Security problem: becoming root is a must

• Dockerfile problem: some bugs around caching? Peculiar syntax -> better to use Packer

• Portability problem: better run on Linux kernel version >= 3.8

Cloudius OSv

Problem of Docker

• Not user-friendly enough so far

• Not enough community resources such as base images

• Not mature enough to use

Infrastructure as code

• Maintain server configuration as code

• Make sure every operation is idempotent

• Easily transport pipelines between servers
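"Idempotent" here means that applying the same configuration step twice leaves the server in the same state as applying it once. The sketch below shows the idea in plain Python. It is not Chef or Fabric code, and the function name and the example line are hypothetical.

```python
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the file was changed, False if it was already in the
    desired state, so running it repeatedly is safe.
    """
    target = Path(path)
    text = target.read_text() if target.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the existing content lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with target.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True
```

The first call, for example `ensure_line("/tmp/pipeline_profile", "export PATH=/opt/tools/bin:$PATH")`, changes the file and returns True. Every later call with the same arguments returns False and leaves the file untouched. Configuration management tools aim to make each of their operations behave this way.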

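[Figure: configuration management tools, including Fabric and Chef Zero, arranged by language (Ruby-based vs. Python-based) and from simple to complex]

• Chef requires users to remember a lot of fancy jargon

• CloudBioLinux supports Fabric

Better to start with Fabric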