![Page 1: Unix Essentials - Massachusetts Institute of Technologybarc.wi.mit.edu/education/hot_topics/unix_essentials/Unix_Essentials_Oct2014.pdfOct 10, 2012 · Advantages of Unix •Processing](https://reader033.vdocuments.us/reader033/viewer/2022041921/5e6bde8d6cd1285bdf61f18c/html5/thumbnails/1.jpg)
Unix Essentials
Bingbing Yuan
Next Hot Topics: Unix – Beyond Basics (Mon Oct 20th at 1pm)
Objectives
• Unix Overview
• Whitehead Resources
• Unix Commands
• BaRC Resources
• LSF
Objectives: Hands-on
• Parsing Human Body Index (HBI) array data
Goal: Process a large data file to extract important information: find genes of interest, sort expression values, and subset the data for further investigation.
Advantages of Unix
• Processing files with thousands, or millions, of lines
How many reads are in my fastq file?
Sort by gene name or expression values
• Many programs run on Unix only
Command-line tools
• Automate repetitive tasks or commands
Scripting
• Other software, such as Excel, is not able to handle large files efficiently
• Open Source
Scientific computing resources
Shared packages/programs
https://tak.wi.mit.edu
Installed packages/programs
Request new packages/programs
Login
• Requesting a tak account http://iona.wi.mit.edu/bio/software/unix/bioinfoaccount.php
• Windows
PuTTY or Cygwin
Xming: setup X-windows for graphical display
• Macs
Access through Terminal
Connecting to tak for Windows
Command prompt: user@tak ~$
Unix Commands
• General syntax: command options arguments
Command
Options or switches (zero or more)
Arguments (zero or more)
Example: uniq -c myFile.txt
Options can be combined
ls -l -a or ls -la
• Manual (man) page: man uniq
• One-line description: whatis ls
Unix Directory Structure
/ (root)
  home
    jdoe            (/home/jdoe)
  dev
  bin
  nfs
    BaRC_Public
  lab
    solexa_public
    solexa_lodish
    page            (/lab/page)
  . . .
Accessing Shared Resources at Whitehead
• Unix
/nfs/BaRC_Public
/lab/solexa_public
/lab/page
• Windows (access using Start Menu Search)
\\wi-files1\BaRC_Public
\\wi-files1\fink_lab
\\wi-files2\page
\\wi-htdata\solexa_public
• Macs (access using Go > Connect to Server…)
cifs://wi-files1/BaRC_Public
cifs://wi-htdata/solexa_public
Where’s my lab’s share?
• http://wi-inside.wi.mit.edu/departments/it/services/filestorage/labshares
Directory Contents
• List files/directories
ls lists the contents of a directory
ls -l includes additional info (e.g., permissions, time stamp)
Options:
-l long listing
-h human-readable file sizes (used with -l)
thiruvil@tak /nfs/BaRC_Public$ ls -l
total 4740
drwxrwxr-x 5 gbell barc 4096 2012-03-16 15:56 apps/
drwxrwxr-x 4 gbell barc 4096 2011-10-18 09:48 BaRC_code/
drwxrwxrwx 5 gbell barc 4096 2012-09-17 15:03 Bartel_Lab/
drwxrwsrwx 3 gbell barc 4096 2012-05-04 16:17 Cheeseman_Lab/
drwxrwsrwx 3 byuan barc 4096 2010-11-23 14:22 chip_seq/
drwxrwsrwx 2 gbell barc 4096 2012-02-21 16:26 CMT/
-rw-r--r-- 1 gbell barc 192568 2012-10-10 10:14 du.20121010a.txt
Columns: permissions, number of links, owner, group, size (bytes), time stamp, file or directory name
Permissions
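drwxrwxr-x
Type (first character): directory (d), symbolic link (l), regular file (-)
Then three sets of rwx, in order: User, Group, Others
r read
w write
x execute
• Use chmod to change permissions
user (u), group (g), others (o), all (a)
chmod u+x foo.pl (user can execute)
chmod g-w foo.pl (group can’t write)
• A "permission denied" error means the permission needed for the action is missing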
thiruvil@tak /nfs/BaRC_Public$ ls -l myFile.txt
-rw-r--r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt
thiruvil@tak /nfs/BaRC_Public$ chmod g+w myFile.txt
thiruvil@tak /nfs/BaRC_Public$ ll myFile.txt
-rw-rw-r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt
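The chmod notation above can be tried safely in a scratch directory. The following is a minimal sketch, assuming a POSIX shell; the temporary directory and the empty foo.pl file are created only for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch foo.pl
chmod 644 foo.pl                      # start from rw-r--r--
before=$(ls -l foo.pl | cut -c1-10)   # keep only the 10 permission characters

chmod u+x foo.pl                      # user can execute
chmod g+w foo.pl                      # group can write
after=$(ls -l foo.pl | cut -c1-10)

echo "before: $before"
echo "after:  $after"
```

Reading the two strings side by side shows which of the nine permission characters each chmod call changed.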
Navigating in Unix
• pwd print working directory
byuan@tak ~$ pwd
/home/byuan
• cd change directory
cd fink_lab # relative path: works if you are in /lab
cd to home directory
cd ~
cd to directory above
cd ..
cd to a specific directory
cd /nfs/BaRC_Public
• A "No such file or directory" error means the path does not exist as typed
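As a rough illustration of relative versus absolute paths, the sketch below builds a small directory tree in a temporary location and moves around in it. The lab/fink_lab layout only mimics the slide, and no_such_dir is a deliberately missing, hypothetical name.

```shell
# Build a small scratch tree that mimics /lab/fink_lab.
start=$(mktemp -d)
mkdir -p "$start/lab/fink_lab"

cd "$start/lab" || exit 1        # absolute path: works from anywhere
cd fink_lab || exit 1            # relative path: works only from inside lab
here=$(basename "$(pwd)")

cd ..                            # go to the directory above
parent=$(basename "$(pwd)")

# cd to a path that does not exist fails with "No such file or directory".
if cd no_such_dir 2>/dev/null; then
  status="found"
else
  status="missing"
fi

echo "$here $parent $status"
```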
Organizing Files and Directories
• Commands
mkdir make a directory
mkdir my_foo
rmdir remove a directory (must be empty)
rmdir my_foo
mv move or rename a file/directory
mv myOldFile myNewFile
cp copy a file
cp myOldFile myNewFile
rm remove or delete a file
rm myFile
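The commands above can be chained into a short session. This is a sketch in a scratch directory; the file names follow the slide, and myCopy is a hypothetical name added for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir my_foo                        # make a directory
printf 'gene\tvalue\n' > myOldFile  # create a small file to work with
cp myOldFile my_foo/myCopy          # copy the file into the directory
mv myOldFile myNewFile              # rename the original
rm my_foo/myCopy                    # rmdir needs an empty directory
rmdir my_foo
listing=$(ls)                       # only the renamed file is left

echo "$listing"
```

Note that rm and rmdir delete immediately; there is no trash folder to recover from.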
Unix Tips
• Use ↑ (up arrow) to reuse previous commands
• Ctrl-c: stop a process that is running
• Tab-completion: complete commands/file names
• Unix is case-sensitive
Getting Files
• Getting files or directories
Files
wget http://www.broadinstitute.org/igv/projects/downloads/IGV_2.1.17.zip
Directories from (outside) servers
scp -r [email protected]:/broad/lab/works .
(Un)Compressing Files
• .gz file
Compress: gzip expression.txt (creates expression.txt.gz and removes expression.txt)
Uncompress: gunzip expression.txt.gz
• .tar.gz file
Compress: tar -czvf myFiles.tar.gz myFiles
Uncompress: tar -xzvf myFiles.tar.gz
Options
-c create an archive (files to archive, archive from files)
-x extract an archive (archive to files, files from archive)
-f FILE name of archive
-v be verbose, list all files being archived/extracted
-z create/extract archive with gzip/gunzip
• View compressed files using zmore, zgrep
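A round trip through gzip and tar may make the options easier to remember. The sketch below assumes gzip and tar are installed; the two-line expression.txt is made-up sample data, and gunzip -c piped into grep is used as a stand-in for zgrep.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'geneA\t1.5\ngeneB\t2.0\n' > expression.txt

gzip expression.txt                  # leaves only expression.txt.gz
# Search the compressed file without unpacking it on disk.
gz_hits=$(gunzip -c expression.txt.gz | grep -c geneB)
gunzip expression.txt.gz             # restores expression.txt

mkdir myFiles
cp expression.txt myFiles/
tar -czf myFiles.tar.gz myFiles      # -c create, -z gzip, -f archive name
rm -r myFiles
tar -xzf myFiles.tar.gz              # -x extract the directory again
restored=$(wc -l < myFiles/expression.txt | tr -d ' ')

echo "$gz_hits $restored"
```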
Editing a File
• Command-line editors
pico
nano
emacs (emacs -nw)
vi
• Graphical editors (Windows users need an X-windows emulator)
Note: may not be part of standard installation
nedit
gedit
xemacs
• Put an & at the end of the command line to run a graphical editor in the background so that you can continue to use the terminal window
e.g., gedit myFile.txt &
Viewing a File
• Display on a page-by-page basis
more myFile.txt
Use: Enter to scroll one line, space for next page, and q to quit
• Display first 15 lines of a file
head -n 15 myFile.txt
• Display last 15 lines of a file
tail -n 15 myFile.txt
• Show all contents of a file
cat myFile.txt
Show hidden characters (e.g., ^M for carriage return)
cat -A myFile.txt
• Display number of lines in a file
wc -l myFile.txt
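To see what head, tail, and wc return, it helps to use a file whose contents are predictable. This sketch generates a hypothetical 100-line myFile.txt with awk and then inspects it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate 100 numbered lines: line1 ... line100.
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line" i }' > myFile.txt

first_block_end=$(head -n 15 myFile.txt | tail -n 1)   # last of the first 15 lines
last_block_start=$(tail -n 15 myFile.txt | head -n 1)  # first of the last 15 lines
count=$(wc -l < myFile.txt | tr -d ' ')                # number of lines

echo "$first_block_end $last_block_start $count"
```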
Output Redirection and Piping
• Write output of a command to a file
Write to output file (creates it, or overwrites an existing file)
sort myFile.txt > myFile_sorted.txt
Replace output file even if the shell's noclobber option is set
sort myFile.txt >| myFile_sorted.txt
Append to output file
sort myFile.txt >> myFile_sorted.txt
• Piping “|”: use output of one command as input for another command
sort myFile.txt | more
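The difference between >, >| and >> shows up most clearly when the shell's noclobber option is switched on. The following sketch assumes a POSIX shell such as bash; the three-line myFile.txt is sample data made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pear\napple\nfig\n' > myFile.txt

sort myFile.txt > myFile_sorted.txt      # write: 3 lines
sort myFile.txt >> myFile_sorted.txt     # append: now 6 lines
appended=$(wc -l < myFile_sorted.txt | tr -d ' ')

set -o noclobber                         # protect existing files from >
if { sort myFile.txt > myFile_sorted.txt; } 2>/dev/null; then
  clobber="overwritten"
else
  clobber="refused"                      # > will not replace an existing file now
fi
sort myFile.txt >| myFile_sorted.txt     # >| replaces the file anyway
set +o noclobber
forced=$(wc -l < myFile_sorted.txt | tr -d ' ')

first=$(sort myFile.txt | head -n 1)     # pipe: sort feeds head

echo "$appended $clobber $forced $first"
```

Without noclobber, which is the default in most setups, > silently overwrites, so >> is the safer choice when collecting output from several commands.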
Parsing a File: cut
• Select columns of interest
cut -f 9,12-15 myGeneValues.txt > col_9.12to15.txt
Options:
-f output only these fields
-d field delimiter (default is tab)
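A small example of cut on tab-delimited and comma-delimited input. The four-column myGeneValues.txt below is invented sample data, not the HBI file used in the hands-on.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tab-delimited sample: id, gene, liver, brain
printf 'id\tgene\tliver\tbrain\nP1\tTMEM49\t5.2\t7.1\n' > myGeneValues.txt

cols=$(cut -f 2,4 myGeneValues.txt | tail -n 1)    # columns 2 and 4 of the data row
range=$(cut -f 2-3 myGeneValues.txt | head -n 1)   # columns 2 to 3 of the header
csv=$(printf 'a,b,c\n' | cut -d ',' -f 1,3)        # comma as the field delimiter

printf '%s\n%s\n%s\n' "$cols" "$range" "$csv"
```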
Parsing a File: sort and uniq
• Sort on column(s)
sort -k 3,3 myGeneExpression.txt | more
Options:
-n numerical sort
-r reverse
-k pos1,pos2 start a key at pos1, end it at pos2
• Get only unique entries
ensure file is sorted before running uniq
uniq mySortedGenes.txt > myUniqGenes.txt
Options:
-c count entries
-d print only duplicated entries
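The sort and uniq options above combine naturally in a pipeline. This is a sketch on made-up data; myGenes.txt and the expression values are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'geneB\t3\ngeneA\t10\ngeneC\t2\n' > myGeneExpression.txt
# Numeric (-n), reversed (-r) sort on column 2: highest value first.
top=$(sort -n -r -k 2,2 myGeneExpression.txt | head -n 1 | cut -f 1)

printf 'TMEM49\nTMEM66\nTMEM49\nTMEM9B\n' > myGenes.txt
# uniq only collapses adjacent lines, so sort first.
unique_count=$(sort myGenes.txt | uniq | wc -l | tr -d ' ')
dups=$(sort myGenes.txt | uniq -d)                   # entries seen more than once
most_common=$(sort myGenes.txt | uniq -c | sort -n -r | head -n 1 | awk '{print $1, $2}')

echo "$top $unique_count $dups $most_common"
```

Leaving out -n would compare the values as text, so 10 would sort before 2 and 3.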
Regular Expressions
• Pattern matching
• Easier to search
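• Commonly used regular expressions
Regular Expression   Matches
.                    Any single character
*                    Zero or more of the preceding character
^                    Beginning of a line
$                    End of a line
Example: list all txt files
ls *.txt
Note: ls *.txt uses a shell file name pattern, where * by itself matches any string of characters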
Searching Within a File
• grep (global regular expression print)
• Find words, or patterns, occurring in lines of a file
grep TMEM geneList.txt
TMEM131
TMEM9B
TMEM14C
TMEM66
TMEM49
Options:
-v select non-matching lines
-i ignore case
-n print line number
Example: get TMEM that does not end with 9
grep TMEM geneList.txt | grep -v '9$' | more
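The grep options and the ^ and $ anchors can be checked on a short gene list. The sketch below reuses the TMEM names from the slide and adds two hypothetical entries (ACTB and lowercase tmem50) so that -i and -n have something to show; -c, which counts matching lines, is used here only to keep the output short.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'TMEM131\nTMEM9B\nACTB\nTMEM14C\nTMEM66\nTMEM49\ntmem50\n' > geneList.txt

tmem=$(grep -c TMEM geneList.txt)                      # lines containing TMEM
any_case=$(grep -c -i tmem geneList.txt)               # -i also matches tmem50
not9=$(grep TMEM geneList.txt | grep -v -c '9$')       # TMEM lines not ending in 9
ends9=$(grep 'TMEM.*9$' geneList.txt)                  # TMEM lines ending in 9
starts=$(grep -c '^TMEM1' geneList.txt)                # lines beginning with TMEM1
where=$(grep -n ACTB geneList.txt)                     # -n prefixes the line number

echo "$tmem $any_case $not9 $ends9 $starts $where"
```

Single quotes around the pattern keep the shell from interpreting $ and * before grep sees them.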
BaRC Resources
• jura.wi.mit.edu
BaRC SOP
http://barcwiki.wi.mit.edu/wiki/SOPs
BaRC Scripts
Running Scripts on Unix
• Perl bed2gff.pl
• R run_rma_customCDF.R
• Python myScript.py
• Matlab matlab -nodesktop -nosplash myScript.m
• Java Archive (JAR) java -Xmx1000m -jar /usr/local/share/IGVTools/igv.jar
Running Programs/Tools on Unix
• bedtools
bedtools intersect -a myGenes1.bed -b myGenes2.bed
Other utilities: http://code.google.com/p/bedtools/wiki/Usage
• samtools
samtools view myFile.bam
Other utilities: http://samtools.sourceforge.net/samtools.shtml
• Fastx toolkit
fastx_quality_stats -i mySeq.fastq -o fastxStats_mySeq
• FastQC
fastqc mySeq.fastq
• BLAST
blastp -task blastp -db myProtDB.fa -query myProt.fa -out out.txt
Commonly Used Data Locations at Whitehead
Location             Description
/nfs/genomes         Genome data: gff, gtf, fasta, bowtie indexed files, blat indexed files, etc. for several organisms
/nfs/seq/Data        Sequence data, including blast databases, for several organisms
/nfs/BaRC_datasets   Large (array/NGS) datasets: HBI, HBM 2.0
Scientific computing resources
LSF cluster jobs
https://tak.wi.mit.edu
Load Sharing Facility (LSF) Cluster
• More computing power
• Multiple jobs running at the same time
LSF Commands
• bsub to submit jobs
bsub wc -l reads.fq
bsub "sort foo.txt > sorted.txt"
Options:
-e error file
-o standard output file
-m machine (host) to run on
-u email address
• bjobs to view your jobs
bjobs
• bkill to kill a job
bkill 237878
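Because bsub exists only on hosts with LSF installed, the sketch below just assembles a submission line and prints it instead of running it (a dry run). The log file names sort.out and sort.err are hypothetical; -o and -e are the options listed above.

```shell
# The job itself: quoted so that the > redirection is part of the job,
# not applied to bsub on the login host.
job='sort foo.txt > sorted.txt'

# Assemble the submission line with standard output and error files.
submit="bsub -o sort.out -e sort.err \"$job\""

# Dry run: print the command for review; paste it on a host that has LSF.
echo "$submit"
```

Without the quotes, the shell would redirect the output of bsub itself into sorted.txt, and the job would run as a plain sort foo.txt.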
Further Reading
• BaRC: Unix Info
http://iona.wi.mit.edu/bio/education/unix_intro.php
• LSF Cluster (incl. examples)
http://iona.wi.mit.edu/bio/bioinfo/docs/LSF_help.php
• Whitehead IT Scientific Computing Tutorials
http://wi-inside.wi.mit.edu/departments/it/services/scientificcomputing/scitutorials