master unix 2011

51
Bioinformatics master course, ‘11/’12 Paolo Marcatili Paolo Marcatili “Sapienza” University of Rome Biocomputing group Introduction to Programming and Algorithms

Upload: paolo-marcatili

Post on 09-May-2015

297 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Paolo Marcatili “Sapienza” University of Rome Biocomputing group

Introduction to Programming and Algorithms

Page 2: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Course Calendar & structure

Fri 2, Dec: Introduction & Unix Fri 13, Jan: Perl – Data types Fri 20, Jan: Perl – IO and loops Fri 27, Jan: Algorithms: Sort and Search Thu 2, Feb: Perl – Parsing Thu 9, Feb: Seminar – Machine Learning Fri 10, Feb: Perl – Practicals Fri 17, Feb: Perl – Practicals

Page 3: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Unix for Bioinformaticians

A survival guide

Page 4: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Agenda

•  Unix •  Folders •  Files •  Processes •  Redirection

Page 5: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Unix

Page 6: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

What is Unix? Unix

Page 7: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

What is Unix?

Operating system stable, multi-user, multi-tasking for servers, desktops and laptops.

Page 8: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Types of Unix

•  Solaris"

•  OS-X"

•  Linux!

Page 9: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Unix Operating System

Page 10: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

What’s in Unix?

Files (data) Processes (actions)

Page 11: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Unix Filesystem

Page 12: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Terminal

It makes the difference! Powerful Transparent User-unfriendly

Page 13: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Our Task Today

Page 14: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Human Immunoglobulins

Page 15: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Human Immunoglobulins

Page 16: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

The data

We have 3 files Fasta Sequences of Heavy (heavy.fasta) Lambda (lambda.fasta) Kappa (kappa.fasta)

Page 17: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

The Task

•  Look data •  Correct errors •  Do some analysis •  Make a single fasta file

Page 18: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Folders: read and write

Page 19: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Change Directory

To know where you are >pwd /home/bioinfoSMFN Let’s move! >cd /home/bioinfoSMFN/Desktop/task Or >cd Desktop/

Page 20: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Folder content

Let’s check what’s in the folder: >ls Better >ls -la Cheat: parameters are useful! >ls -lart

Page 21: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Wildcards

Extremely useful! >ls -la *heavy* >ls -la heavy.?asta *= 0,1,2,…. occurences of whatever ?= exactly 1 occurence of whatever

Page 22: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Commands

To know more about a command: >man ls >whatis ls >apropos ls Or Google!

Page 23: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Unix Permissions

Page 24: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Folders - summary

Command Meaning ls list files and directories ls -a list all files and directories mkdir make a directory cd directory change to named directory cd change to home-directory cd ~ change to home-directory cd .. change to parent directory pwd path of the current directory

Page 25: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Files: read and write

Page 26: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Backup original data

Create a directory >mkdir backup Copy all the files in the directory >cp heavy.fasta backup/ >cp heavy.fasta heavy_copy.fasta >mv heavy_copy.fasta backup/

heavy_backup.fasta

Page 27: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Read a file

Let’s see… >cat heavy.fasta

Page 28: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Read a file

Let’s see… >cat heavy.fasta Mmh… >less heavy.fasta Hint: q to quit… but if you are stuck try ctrl+c

Page 29: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Look for text

If we want to look for something while in less: / 91979410 (enter)

Page 30: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Edit a file

Let’s remove the first H sequence >vi heavy.fasta >nano heavy.fasta For those who live in 2008 >gedit heavy.fasta

Page 31: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Look for text

How many Igs in each file? We can count the lines! >wc heavy.fasta

Page 32: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Look for text

How many Igs in each file? We can count the lines! >wc heavy.fasta Lines > proteins!!!

Page 33: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Grep

# of proteins = # of “>” in a file >grep “>” heavy.fasta >grep -c “>” heavy.fasta Do it for all the files and write the result (I think they are too many…)

Page 34: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Grep

>grep -c V-region heavy.fasta >grep -c v-region heavy.fasta >grep -ci v-region heavy.fasta Ok, better!

Page 35: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Files: sumary

cp file1 file2 copy file1 and call it file2 mv file1 file2 move or rename file1 to file2 rm file remove a file cat file display a file less file display a file a page at a time head file display the first few lines tail file display the last few lines grep 'key' file search a file for keywords wc file number of lines

Page 36: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Processes

Page 37: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Run a process

Process = execution of some instructions Usually the instructions are in a file. e.g. less -> /usr/bin/less So >/usr/bin/less heavy.fasta

Page 38: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Run - It’s simple

Ok, so let’s try >./loop.pl

Page 39: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Run - It’s simple

Ok, so let’s try >./loop.pl Ok, and now?

Page 40: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Controlling processes

Ctrl+z -> go to sleep Now you’ve got the control again! But it’s not still dead… bg Ctrl+C

Page 41: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Controlling processes

Or: >./loop.pl Ctrl+z >ps >top (q to exit) >kill the_number_that_you_have_just_read

Page 42: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Redirection

Control the force

Page 43: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Inputs and outputs

Keyboard Mouse Tablet

Kernel

Display Printer File

Page 44: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Write into a file

>Cat > lista.txt Heavy 29061 Ctrl+d And >less lista.txt

Page 45: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Append to a file

>Cat >> lista.txt Kappa 7476 Ctrl+d And >less lista.txt

Page 46: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

The big one!

>cat heavy.fasta > bigone.fasta >cat kappa >> bigone.fasta >cat lambda.fasta >> bigone.fasta >grep -c “>” bigone.fasta

Page 47: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Extract headers

>grep “>” bigone.fasta > headers.head >sort headers.head > headers_sort.head Do we need the first file?

Page 48: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Extract headers

>grep “>” bigone.fasta > headers.head >sort headers.head > headers_sort.head Do we need the first file? No! >grep “>” bigone.fasta | sort >

headers_sorted.head | is called pipe, and it’s something

Page 49: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Redirect everything!

>grep ">" biggone.fasta 2> err.log >less err.log

Page 50: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Redirect - summary

command > file redirect std output to a file command >> file append std output to a file command < file redirect std input from a file cmd1 | cmd2 pipe the output of cmd1 to the

input of cmd2 cat f1 f2 > f0 concatenate f1 and f2 to f0 sort sort data

Page 51: Master unix 2011

Bioinformatics master course, ‘11/’12 Paolo Marcatili

Quiz

•  What does this command perform?">grep -v ">" heavy.fasta

•  How many proteins in each file? •  How many residues in each file?"

(hint: wc -m counts the # of char in a line) •  Average length of proteins in each file? •  Average content of Prolines of VH, VL and

VK?"(hint: look at grep parameters)