master unix 2011
Bioinformatics master course, ‘11/’12 Paolo Marcatili
Paolo Marcatili “Sapienza” University of Rome Biocomputing group
Introduction to Programming and Algorithms
Course Calendar & structure
Fri 2, Dec: Introduction & Unix
Fri 13, Jan: Perl – Data types
Fri 20, Jan: Perl – IO and loops
Fri 27, Jan: Algorithms: Sort and Search
Thu 2, Feb: Perl – Parsing
Thu 9, Feb: Seminar – Machine Learning
Fri 10, Feb: Perl – Practicals
Fri 17, Feb: Perl – Practicals
Unix for Bioinformaticians
A survival guide
Agenda
• Unix
• Folders
• Files
• Processes
• Redirection
Unix
What is Unix?
An operating system: stable, multi-user and multi-tasking, for servers, desktops and laptops.
Types of Unix
• Solaris
• OS X
• Linux
Unix Operating System
What’s in Unix?
• Files (data)
• Processes (actions)
Unix Filesystem
Terminal
It makes the difference!
• Powerful
• Transparent
• User-unfriendly
Our Task Today
Human Immunoglobulins
The data
We have 3 files of FASTA sequences:
• Heavy (heavy.fasta)
• Lambda (lambda.fasta)
• Kappa (kappa.fasta)
The Task
• Look at the data
• Correct errors
• Do some analysis
• Make a single fasta file
Folders: read and write
Change Directory
To know where you are:
>pwd
/home/bioinfoSMFN
Let’s move!
>cd /home/bioinfoSMFN/Desktop/task
Or
>cd Desktop/
Folder content
Let’s check what’s in the folder:
>ls
Better:
>ls -la
Cheat: parameters are useful!
>ls -lart
Wildcards
Extremely useful!
>ls -la *heavy*
>ls -la heavy.?asta
* = 0, 1, 2, … occurrences of any character
? = exactly 1 occurrence of any character
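A minimal sketch of how the two wildcards behave, run in a throwaway directory. The file names heavy.pasta and heavy_copy.fasta are made up for the demo, and the > prompt of the slides is left out so that the lines can be pasted as they are.

```shell
# Work in a temporary directory so that no real data is touched.
demo_dir=$(mktemp -d)

# Hypothetical, empty files created only for this demo.
touch "$demo_dir/heavy.fasta" "$demo_dir/heavy.pasta" \
      "$demo_dir/heavy_copy.fasta" "$demo_dir/kappa.fasta"

# $( ... ) runs in a subshell, so the cd does not move the current shell.
# * matches any number of characters (also none): 3 of the 4 files match.
star_matches=$(cd "$demo_dir" && echo *heavy*)
# ? matches exactly one character: heavy_copy.fasta is left out.
question_matches=$(cd "$demo_dir" && echo heavy.?asta)

echo "$star_matches"
echo "$question_matches"

rm -rf "$demo_dir"
```

It is the shell, not ls, that expands the wildcards, which is why echo is enough to see what they match.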
Commands
To know more about a command:
>man ls
>whatis ls
>apropos ls
Or Google!
Unix Permissions
Folders - summary
Command          Meaning
ls               list files and directories
ls -a            list all files and directories
mkdir            make a directory
cd directory     change to named directory
cd               change to home-directory
cd ~             change to home-directory
cd ..            change to parent directory
pwd              path of the current directory
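A quick sketch of mkdir, cd, cd .. and pwd working together. The folder name task is only an example, and everything happens inside a temporary directory.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/task"          # hypothetical folder name

# Each $( ... ) runs in its own subshell, so the cd inside it
# does not move the shell that runs this sketch.
inside=$(cd "$demo_dir/task" && basename "$(pwd)")
parent=$(cd "$demo_dir/task" && cd .. && basename "$(pwd)")

echo "after cd task: $inside"
echo "after cd ..:   $parent"

rm -rf "$demo_dir"
```

basename keeps only the last part of the path printed by pwd, which is enough to see where each cd has landed.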
Files: read and write
Backup original data
Create a directory:
>mkdir backup
Copy the files into the directory:
>cp heavy.fasta backup/
>cp heavy.fasta heavy_copy.fasta
>mv heavy_copy.fasta backup/heavy_backup.fasta
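The difference between cp and mv is easier to see on a toy file. The sketch below repeats the backup steps in a temporary directory; the one-record heavy.fasta it creates is invented for the demo.

```shell
demo_dir=$(mktemp -d)
# A made-up, one-record stand-in for the real heavy.fasta.
printf '%s\n' '>seq1' 'EVQLVESGGG' > "$demo_dir/heavy.fasta"

mkdir "$demo_dir/backup"
# cp leaves the original where it is and adds a copy in backup/.
cp "$demo_dir/heavy.fasta" "$demo_dir/backup/"
# mv moves and renames in one go: the old name disappears.
cp "$demo_dir/heavy.fasta" "$demo_dir/heavy_copy.fasta"
mv "$demo_dir/heavy_copy.fasta" "$demo_dir/backup/heavy_backup.fasta"

if [ -e "$demo_dir/heavy_copy.fasta" ]; then copy_left="yes"; else copy_left="no"; fi
if [ -e "$demo_dir/heavy.fasta" ]; then original_left="yes"; else original_left="no"; fi
n_backups=$(ls "$demo_dir/backup" | wc -l | tr -d ' ')

echo "heavy_copy.fasta still there: $copy_left"
echo "heavy.fasta still there: $original_left"
echo "files in backup/: $n_backups"

rm -rf "$demo_dir"
```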
Read a file
Let’s see…
>cat heavy.fasta
Mmh…
>less heavy.fasta
Hint: q to quit… but if you are stuck try Ctrl+c
Look for text
If we want to look for something while in less:
/91979410 (Enter)
Edit a file
Let’s remove the first H sequence
>vi heavy.fasta
>nano heavy.fasta
For those who live in 2008:
>gedit heavy.fasta
Look for text
How many Igs are in each file? We can count the lines!
>wc heavy.fasta
But there are more lines than proteins!!!
Grep
# of proteins = # of ">" in a file
>grep ">" heavy.fasta
>grep -c ">" heavy.fasta
Do it for all the files and write down the result (I think they are too many…)
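To see why counting lines overestimates the proteins, here is a small sketch on an invented FASTA file with 2 records spread over 5 lines. The sequences and names are placeholders, not data from the course files.

```shell
demo_dir=$(mktemp -d)

# A made-up FASTA file: 2 records, 5 lines in total.
printf '%s\n' \
  '>seq1 example heavy chain' \
  'EVQLVESGGG' \
  'LVQPGGSLRL' \
  '>seq2 example heavy chain' \
  'QVQLQESGPG' > "$demo_dir/sample.fasta"

# wc -l counts every line, headers and sequence lines alike.
# tr removes the padding that some versions of wc print.
n_lines=$(wc -l < "$demo_dir/sample.fasta" | tr -d ' ')
# grep -c counts only the lines that contain ">", one per record.
n_records=$(grep -c ">" "$demo_dir/sample.fasta")

echo "lines: $n_lines, records: $n_records"

rm -rf "$demo_dir"
```

The quotes around ">" matter: without them the shell reads > as a redirection and grep never sees it.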
Grep
>grep -c V-region heavy.fasta
>grep -c v-region heavy.fasta
>grep -ci v-region heavy.fasta
Ok, better!
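A sketch of what -i changes, on four invented header lines that spell the keyword in different ways.

```shell
demo_dir=$(mktemp -d)

# Made-up headers: same keyword, different case.
printf '%s\n' \
  '>seq1 V-region' \
  '>seq2 v-region' \
  '>seq3 V-REGION' \
  '>seq4 constant region' > "$demo_dir/headers.txt"

exact_upper=$(grep -c 'V-region' "$demo_dir/headers.txt")   # case-sensitive
exact_lower=$(grep -c 'v-region' "$demo_dir/headers.txt")   # case-sensitive
any_case=$(grep -ci 'v-region' "$demo_dir/headers.txt")     # -i ignores case

echo "V-region: $exact_upper, v-region: $exact_lower, any case: $any_case"

rm -rf "$demo_dir"
```

Each case-sensitive search finds only its own spelling, while -ci counts all three.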
Files: summary
Command            Meaning
cp file1 file2     copy file1 and call it file2
mv file1 file2     move or rename file1 to file2
rm file            remove a file
cat file           display a file
less file          display a file a page at a time
head file          display the first few lines
tail file          display the last few lines
grep 'key' file    search a file for keywords
wc file            count lines, words and characters
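head, tail and wc were not shown on the earlier slides, so here is a short sketch on a five-line file written just for the demo.

```shell
demo_dir=$(mktemp -d)
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5' > "$demo_dir/five.txt"

# -n sets how many lines head and tail print.
first_two=$(head -n 2 "$demo_dir/five.txt")
last_one=$(tail -n 1 "$demo_dir/five.txt")
# wc prints lines, words and characters; -l, -w and -m pick one of them.
n_lines=$(wc -l < "$demo_dir/five.txt" | tr -d ' ')
n_words=$(wc -w < "$demo_dir/five.txt" | tr -d ' ')

echo "$first_two"
echo "$last_one"
echo "lines: $n_lines, words: $n_words"

rm -rf "$demo_dir"
```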
Processes
Run a process
Process = execution of some instructions
Usually the instructions are in a file, e.g. less -> /usr/bin/less
So:
>/usr/bin/less heavy.fasta
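One way to find the file behind a command name is command -v. The sketch uses ls instead of less only because ls needs no keyboard input; the path it prints depends on the system.

```shell
# command -v prints what the shell would run for a command name.
# In a script this is normally the full path of the program file.
ls_path=$(command -v ls)
echo "$ls_path"

# Calling the file by its full path starts the same program.
"$ls_path" / > /dev/null
full_path_status=$?
echo "exit status: $full_path_status"
```

In an interactive terminal where ls is defined as an alias, command -v may print the alias instead of a path.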
Run - It’s simple
Ok, so let’s try
>./loop.pl
Ok, and now?
Controlling processes
Ctrl+z -> go to sleep
Now you’ve got control again!
But it’s still not dead…
>bg (resume it in the background)
Ctrl+c (kill it, while it runs in the foreground)
Controlling processes
Or:
>./loop.pl
Ctrl+z
>ps
>top (q to exit)
>kill the_number_that_you_have_just_read (the PID of loop.pl)
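Ctrl+z needs a keyboard, but the ps and kill part can be tried in a script. In this sketch sleep 30 stands in for loop.pl, and & starts it directly in the background, which is an assumption made to keep the example non-interactive.

```shell
# sleep stands in for a long-running script such as loop.pl.
sleep 30 &
pid=$!                   # PID of the job just started in the background

# ps -p shows only that process; this is the number to give to kill.
ps -p "$pid"

kill "$pid"              # send the default signal (TERM)
# Collect the finished job; its non-zero exit status is expected here.
wait "$pid" 2>/dev/null || true

# kill -0 sends no signal, it only checks whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
  status="still running"
else
  status="terminated"
fi
echo "$status"
```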
Redirection
Control the force
Inputs and outputs
Inputs (Keyboard, Mouse, Tablet) -> Kernel -> Outputs (Display, Printer, File)
Write into a file
>cat > lista.txt
Heavy 29061
Ctrl+d
And:
>less lista.txt
Append to a file
>cat >> lista.txt
Kappa 7476
Ctrl+d
And:
>less lista.txt
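The same two steps can be sketched without typing into cat, using echo to supply the text. The file is written in a temporary directory, and the last step shows what happens if > is used where >> was meant.

```shell
demo_dir=$(mktemp -d)
list_file="$demo_dir/lista.txt"

# > creates the file, or empties it if it already exists.
echo "Heavy 29061" > "$list_file"
# >> adds to the end and keeps what is already there.
echo "Kappa 7476" >> "$list_file"

n_entries=$(wc -l < "$list_file" | tr -d ' ')
cat "$list_file"
echo "entries: $n_entries"

# Using > again throws away both lines.
echo "start over" > "$list_file"
after_overwrite=$(cat "$list_file")
echo "$after_overwrite"

rm -rf "$demo_dir"
```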
The big one!
>cat heavy.fasta > bigone.fasta
>cat kappa.fasta >> bigone.fasta
>cat lambda.fasta >> bigone.fasta
>grep -c ">" bigone.fasta
Extract headers
>grep ">" bigone.fasta > headers.head
>sort headers.head > headers_sort.head
Do we need the first file? No!
>grep ">" bigone.fasta | sort > headers_sorted.head
| is called pipe, and it’s something
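A sketch that checks the two routes give the same result, on an invented three-record file. The output names two_steps.head and one_step.head are made up for the comparison.

```shell
demo_dir=$(mktemp -d)
# Made-up FASTA file with the headers out of order.
printf '%s\n' '>seqC' 'AAA' '>seqA' 'CCC' '>seqB' 'GGG' > "$demo_dir/sample.fasta"

# Two steps, with an intermediate file.
grep ">" "$demo_dir/sample.fasta" > "$demo_dir/headers.head"
sort "$demo_dir/headers.head" > "$demo_dir/two_steps.head"

# One step: the pipe feeds the output of grep straight into sort.
grep ">" "$demo_dir/sample.fasta" | sort > "$demo_dir/one_step.head"

# cmp -s prints nothing and succeeds only if the two files are identical.
if cmp -s "$demo_dir/two_steps.head" "$demo_dir/one_step.head"; then
  same="yes"
else
  same="no"
fi
first_header=$(head -n 1 "$demo_dir/one_step.head")
echo "identical: $same, first header: $first_header"

rm -rf "$demo_dir"
```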
Redirect everything!
>grep ">" biggone.fasta 2> err.log
>less err.log
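Std output and std error are two separate streams, and 2> catches only the second. A sketch, assuming a file name that does not exist (no_such_file.fasta is invented), which sends each stream to its own log:

```shell
demo_dir=$(mktemp -d)

# The file does not exist: grep writes nothing to std output
# and an error message to std error.
grep_status=0
grep ">" "$demo_dir/no_such_file.fasta" \
  > "$demo_dir/out.log" 2> "$demo_dir/err.log" || grep_status=$?

# -s is true for a file that exists and is not empty.
if [ -s "$demo_dir/out.log" ]; then out_state="not empty"; else out_state="empty"; fi
if [ -s "$demo_dir/err.log" ]; then err_state="not empty"; else err_state="empty"; fi

echo "exit status: $grep_status, out.log: $out_state, err.log: $err_state"

rm -rf "$demo_dir"
```

The exact wording of the message in err.log depends on the version of grep, so the sketch only checks that something was written.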
Redirect - summary
Command            Meaning
command > file     redirect std output to a file
command >> file    append std output to a file
command < file     redirect std input from a file
cmd1 | cmd2        pipe the output of cmd1 to the input of cmd2
cat f1 f2 > f0     concatenate f1 and f2 to f0
sort               sort data
Quiz
• What does this command do?
>grep -v ">" heavy.fasta
• How many proteins in each file?
• How many residues in each file? (hint: wc -m counts the # of characters in its input)
• Average length of proteins in each file?
• Average content of Prolines of VH, VL and VK? (hint: look at grep parameters)