lecture i : introductory linux and shell scripts -...

Lecture I : Introductory Linux and Shell Scripts

I. BRIEF INTRODUCTION

First let us begin by learning what Linux is:

• Linux is just another operating system much like Windows.

• It can be obtained free of charge and programs are developed by volunteers.

• It is a very stable operating system and is therefore prefered in large computer systems and for long applications.

• Originally written as proprietary software under the name Unix, then rewritten by Linus Torwald and offeredfree of charge.

• There are several distributions and which one you choose depend on what you expect from an operating system.

• Linux allows you to use the graphical user interface via icons (like Windows) as well as typed commands (likeMS-DOS). Commands are issued on a terminal.

• The collection of commands used in Linux to interact with the operating system is called the shell. The shell isat the same time a (rather clumsy and limited) programming language.

Next let us look at how Linux operates. Although the look and usage is slightly different than Windows, the ideaof storing flies and folders is similar.

A. The filesystem

A filesystem, as the name implies, is a scheme for organizing the various files that are needed for the operationof your system. Disk space is divided into directories, which may branch into several subdirectories. The smallestindivisible storage unit is a file, which is a concept you should already be familiar with from Windows. A file maycontain an image format, plain text or computer code. In a Linux system, the topmost directory is usually simplycalled the root directory and is represented by the symbol /. This directory is usually divided up into subdirectorieseach having crucial content for the operation of the system. The exact content varies among different implementations.Most users find it easy to visualize the filesystem as an upside-down tree where the root is the origin. Directories inthe root directory and their subdirectories keep branching out until you hit the real files that the user creates and/oruses, which can be thought of as the ”leaves” of the tree as illustrated in the figure below.

2

You can view the content of the / directory by typing

hande@p439a:~/teaching/phys343/notes/lab01$ ls /bin dev home media opt root srv sys usr windowsboot etc lib mnt proc sbin subdomain tmp var

The most common of these subdirectories are [1] [2]

• /bin : Contains utilities that are essential to system operation. Hence, the shells, file manipulation commandssuch as cp and chmod all live here.

• /etc : Dedicated to system configuration. Contains configuration files for the system daemons, startup scripts,system parameters, and more.

• /lib : Core applications and utilities installed with Unix require the libraries in /lib to run.

• /usr : End-user applications, such as editors, games, and interfaces to system features are here, as is the libraryof man pages, and more. Chances are that if the file is useful, but not mandatory for system operation, it’ll befound in /usr.

• /var : Repository for files that typically grow in size over time. Mailboxes, log files, printer queues and databasescan be found in /var. It’s commonplace for Web sites to be kept in /var as well, since a Web site tends toamass data preternaturally over time.

• /home : This is the most important. Contains all of the users’ home directories. A home directory is wherea regular user keeps his/her non-application files and modifies them. In most Linux systems, it is the defaultdirectory that you find yourself in when you log in to the system.

B. Linux commands

In contrast to the Windows operating system, actions in Linux are often performed with commands that you writeon a terminal instead of clicking on icons. The near infinite number of Linux commands that we use not only enableus to perform many different tasks but also provide flexibility within each command through options.

A Linux command has the following general structure :

<command> [OPTIONS] [ARGUMENTS]

where

• <command> is a special instruction that causes the system to perform an action (such as ls, mkdir, grep)

• [OPTIONS] are special expressions for the particular command which increases its capabilities. Options areusually supplied by means of a single dash and a letter, e.g. -l, -k, -a. You might also encounter options givenas two dashes and a full expression, e.g. - -escape, - -ignore-backups (for the command ls).

• [ARGUMENTS] refer to entities which we would like the command to act upon. These can be files, directories,sentences, words or even numbers.

Because it is impossible to know off the top of your head all the options to all the commands, a very handy toolhas been designed by Linux developers, which is called the man pages. man stands for manual and if you would like tolist all the options for a particular command, you invoke the related man pages by typing

man <command>

The man command, being a command itself, also has options and a particularly useful one is -k, which lists theavailable commands for a keyword supplied as an argument. So suppose you would like to know which commands areavailable for manipulating, displaying and converting pdf files. You can just type

3

hande@p439a:~/teaching/phys343/notes/lab01$ man -k pdfpdfeinitex (1) - PDF output from TeXtexi2dvi4a2ps (1) - Compile Texinfo and LaTeX files to DVI or PDF.....

and it will list all the available commands. You can then man the particular command of interest and learn moreabout it. I find man -k to be very useful because you find yourself learning other commands while you are lookingfor a particular one, much like looking up a word in a dictionary and learning other words in the process.

C. The shell

Every time you type a command, it instructs the system to perform an action, such as invoke an editor or list thefiles in a directory. The interpreter that converts the commands you type into actual actions is called a shell. Thereare different kinds of shells varying slightly from each other. The most commonly known shells are

1. The original Unix shell sh.

2. The Bourne-Again-SHell bash.

3. The Korn shell ksh.

4. The C shell csh.

The Linux system allows you to choose which shell you want to work with. For our purposes, any of the aboveshells will work.

There are two ways to use shell commands : on the command line (from the terminal) and from a file. Linuxcommands are a very versatile set, which means they allow you to do almost anything you can think of but it mighttake a while to find out how to conduct a given task through arguments and options. This can only be improvedthrough lots of practice.

The shell not only provides us with a very useful collection of commands, it also gives us the option of puttingtogether a set of commands in executable files and creating, in a way, our own commands. This is useful if we findourselves repeating the set of commands several times. Such files are called shell scripts and they usually carry theextension “.sh”.

Shell scripts may be quite complicated but here we shall cover only two important aspects : shell variables andcontrol structures.

1. What’s a variable?

A variable in a computer program is, in a way, similar to a variable in math. It’s basically a symbol whosecontent may change during the course of the program. The difference between mathematical variables and variablesin computer programs is that mathematical variables are an abstract concept whose values may never be explicitlyassigned whereas in most computer codes (except those which perform analytical computations), each variable has awell-defined value at any given time. This is in contrast with a constant whose value is assigned to a number, stringor character once and for all without undergoing any changes.

Let’s see an example of a variable in Octave, which is one of the computer languages we will be using in this course.

octave:1> a=pia = 3.1416octave:2> b=a^eb = 22.459octave:3> c=a+bc = 25.601octave:4> a=(sin(a/6))^2a = 0.25000

4

Here a and b are variables. At the beginning of the Octave session, a is assigned to the predefined mathematicalconstant pi. On the next line, the second variable b is calculated by raising the first variable a to another specialconstant e. Notice here that the computer remembers the value of the variable a when called while calculating thevalue of the variable b and replaces in its place the number pi. The third operation uses the already stored valuesof the variables a and b to construct a third variable c. In the final line, we observe a common usage of variables,available in many computer languages, which is the reassignment of a variable through self-manipulation. On theright-hand side of the equation, we see an operation involving the old value of the variable a, namely pi. After thisoperation is completed, the resulting value is reassigned to the same variable a whereafter the value of this variableis modified to 0.25. Variables are necessary in computer programming for the flexibility of our programmes. It wouldbe silly to write a program that knows only to add 2 and 4 but not 2 and 3, much like how absurd algebraic additionwould be if it were only defined for 2 and 4 and not any other number.

2. Shell variables

In practice, variables in different computing languages are accessed in different ways. In some languages such as Cand Octave, they are called simply by their name, whereas in others, you might need a special additional character toaccess the value assigned to a particular variable. The shell programming language is an example to those languageswhere values of variables are extracted using the special character, $.

hande@p439a:~$ a="Phys343 is a great class."hande@p439a:~$ echo $aPhys343 is a great class.hande@p439a:~$ b=4.3hande@p439a:~$ echo The number that you entered is $b.The number that you entered is 4.3.

Notice the difference in the way we assign and extract variables in shell. In assigning a number, a string or a characterto a shell variable, we just simply use the name of the variable with the = sign. In accessing it however, we need touse the $ sign. Compare this to how variables are accessed directly without the need of a special character in Octaveas in the example above.

The concept of variables in computer programming necessitates the expansion of the meaning of the variable. Incontrast to mathematical variables, which are numerical, variables in a computer program may be a variety of objectsuch as numbers, characters, files, strings and arrays.

3. Shell scripts

As mentioned above, it’s possible to work in Linux by executing command after command from a terminal. Youwill see very soon, though, that this proves rather inefficient if you find yourself repeating the same set of commandsover and over again. For example, if you wish to calculate the terminal speed of a biker in the presence of friction,this is difficult to do from the command line for two reasons :

1. There are two many steps and you cannot take back what you have written from the command line. You haveto start all over again every time you’ve made a mistake.

2. If you want to repeat the same calculation for different sets of parameters, say for bikers with different poweroutputs, you would have to type all those lines of code on the command line again.

Instead, you can collect all those commands in a file and call that file from the command line with different arguments.For example, suppose we want to assign a number two a variable, multiply this variable by two and three and assign

each result to different variables and finally display all three variables. Compare the following two ways of doing this.

• On the command line

hande@p439a:~$ a=3hande@p439a:~$ b=‘expr $a \* 2‘hande@p439a:~$ c=‘expr $a \* 3‘hande@p439a:~$ echo The three numbers are $a, $b and $c.The three numbers are 3, 6 and 9.

5

• From a file (a shell script)

hande@p439a:~/$ cat multiply.sha=3b=‘expr $a \* 2‘c=‘expr $a \* 3‘echo The three numbers are $a, $b and $c.hande@p439a:~/$ sh multiply.shThe three numbers are 3, 6 and 9.

Notice that shell scripts are generally run by using the interpreter sh. When you run the set of commands from thescript, it becomes very easy to change the value of parameters and rerun. You only need to go into the file multiply.shand change the line a=3 to, say, a=4. In fact you can even do better by making only a minor modification to thecode and supplying command line arguments to your customized shell script. These arguments are very much likearguments you would give to an ordinary shell command and this way you wouldn’t even have to modify your file.

hande@p439a:~/$ cat multiply.sha=$1b=‘expr $a \* 2‘c=‘expr $a \* 3‘echo The three numbers are $a, $b and $c.hande@p439a:~/$ sh multiply.sh 3The three numbers are 3, 6 and 9.hande@p439a:~/$ sh multiply.sh 4The three numbers are 4, 8 and 12.

In shell, the construct $<n> designates a special function, which accesses the value of the nth argument that yousupply to your function when you call it from the command line. You can call your script with different argumentseach time and you can use as many arguments as you wish although it is in general good practice to keep the commandline arguments to a minimal set.

4. File manipulation commands

One of the most commonly used features of the shell is its file manipulation functions. If you are looking for a givenpattern in a text file, trying to extract a column of data from a file containing several columns or trying to print thefirst/last few lines of a file, you may make use of the vast number of commands available in the shell. We shall nowdemonstrate some of these functions through several examples which we will extend in the lab.

• cat, grep, wc, | : First let us create a file using the command cat

hande@p439a:~$ cat > file1.txtI have one dog.You have one dog.I have two cats.You have no cats.I have two sisters.Ali has three sisters.I have one brother.Ayse has no brothers.Ctrl-d

Next let us find all the lines where there is an occurence of the word dog.

hande@p439a:~$ grep dog file1.txtI have one dog.You have one dog.

6

Now let us count the number of lines of the file file1.txt using the command wc, which stands for word count.

hande@p439a:~$ wc file1.txt8 32 154 file1.txt

In the output of this command the first number is the number of lines, the second is the number of words andthe third is the number of bytes which are the basic units of information stored in a computer.

Now let us combine the two operations above, namely let us find, say, all the instances of the word one in thefile and count the number. For such a chain of commands we use a structure called a pipe and denoted by thevertical line |. It is used for taking the output of one command and channeling it over to another command asinput.

hande@p439a:~$ grep one file1.txt | wc3 12 54

We thus have three occurences of the word one in our file. Notice that if you are using the pipe, the name ofthe file is no longer necessary as the argument of wc.

• head, tail, awk : Suppose you are given the following text file, which you might imagine could be the outputfile of a scientific calculations. (Notice the alternative use of cat for displaying the contents of a file.

hande@p439a:~$ cat file2.txtThe energy at step 1 is 3.4 eV.The energy at step 2 is 2.1 eV.The energy at step 3 is 1.2 eV.The energy at step 4 is 1.1 eV.The energy at step 5 is 1.1 eV.End of calculation.

Let us say that you would like to display the first three lines or the last two lines of this file. The commandshead and tail take the number of lines you would like together with the name of the file and display the linesfrom the beginning or end respectively.

hande@p439a:~$ head -3 file2.txtThe energy at step 1 is 3.4 eV.The energy at step 2 is 2.1 eV.The energy at step 3 is 1.2 eV.hande@p439a:~$ tail -2 file2.txtThe energy at step 5 is 1.1 eV.End of calculation.

While dealing with output files in text formats we want, most of the time, to extract the important results(mostly numbers) among the rest of the unuseful text. The command awk allows us to do this by means of acumbersome but straightforward syntax.

hande@p439a:~$ awk ’{print $7}’ file2.txt3.42.11.21.11.1

hande@p439a:~$ awk ’{print $5,$7}’ file2.txt1 3.42 2.13 1.24 1.15 1.1

7

hande@p439a:~$ awk ’{print "Iteration:",$5, "Energy:", $7}’ file2.txtIteration: 1 Energy: 3.4Iteration: 2 Energy: 2.1Iteration: 3 Energy: 1.2Iteration: 4 Energy: 1.1Iteration: 5 Energy: 1.1Iteration: Energy:

Above is three different ways of displaying the results. In the last one though there is the unwanted final linewhich is a problem. There are many ways of getting rid of the final line. Here are two examples.

hande@p439a:~$ awk ’{print "Iteration:",$5, "Energy:", $7}’ file2.txt | head -5Iteration: 1 Energy: 3.4Iteration: 2 Energy: 2.1Iteration: 3 Energy: 1.2Iteration: 4 Energy: 1.1Iteration: 5 Energy: 1.1grep -v End file2.txt | awk ’{print "Iteration:",$5, "Energy:", $7}’Iteration: 1 Energy: 3.4Iteration: 2 Energy: 2.1Iteration: 3 Energy: 1.2Iteration: 4 Energy: 1.1Iteration: 5 Energy: 1.1hande@p439a:~$ nolines=‘wc file2.txt | awk ’{print $1}’‘hande@p439a:~$ nolinesminusone=‘expr $nolines - 1‘hande@p439a:~$ awk ’{print "Iteration:",$5, "Energy:", $7}’ file2.txt | head -$nolinesminusoneIteration: 1 Energy: 3.4Iteration: 2 Energy: 2.1Iteration: 3 Energy: 1.2Iteration: 4 Energy: 1.1Iteration: 5 Energy: 1.1

• diff, paste : If you have two files that are very similar except for minor differences and you would like tocompare them, you can use the command diff.

hande@p439a:~$ cat file3.txtabcThis file is the first file.123hande@p439a:~$ cat file4.txtabcThis file is the second file.123hande@p439a:~$ diff file3.txt file4.txt4c4< This file is the first file.---> This file is the second file.

8

hande@p439a:~$ diff -u file3.txt file4.txt--- file3.txt 2009-10-01 01:00:47.000000000 +0300+++ file4.txt 2009-10-01 01:00:57.000000000 +0300@@ -1,7 +1,7 @@abc-This file is the first file.+This file is the second file.123hande@p439a:~$ paste file3.txt file4.txta ab bc cThis file is the first file. This file is the second file.1 12 23 3

The command option -u used with diff allows the user to display the differences between the files in somecontext. The final command paste simply displays the two files side-by-side.

II. EXERCISES

Exercise 0 : Download some files and practice a few commands.In order to complete this lab session, you will need to download some files from web site of our course. Open aterminal and type the following :

hande@p439a:~$ wget www.physics.metu.edu.tr/~hande/teaching/343-labs/lab-01/files/files.tar.gz

wget is a fast way to download files from the Web directly to the directory you are in as long as you have a connection.Now pay attention to the extension of the files, .tar.gz. This file really has two extension. The first, .tar meansthat this files is in fact a collection of files that you combine to form one file; in computing jargon such a file is calleda tarball. This is convenient in cases when you find yourself having to send or receive multiple files, just like in ourcase. The next extension is .gz, which means that this file is zipped or compressed to save space. In order to get anidea of the size reduction as a result of this operation, see the size of the file by typing

hande@p439a:~$ ls -lh files.tar.gz

As you see the size is really small. Now let us unzip and untar this file :

hande@p439a:~$ gunzip files.tar.gz

If you list the files in your directory you will see that you have a file with the name files.tar. Now, let us see onceagain the size of this new file.

hande@p439a:~$ ls -lh files.tar

As you see, the file is now several times larger. The files that are in this tarball are text files and therefore compressreally well. Other file types such as binary do not compress very well since they are already in a somewhat compressedformat. Now let us exract our files from the tarball. To do this, type

hande@p439a:~$ hande@p439a:~$ tar xvf files.tarcolumns.txt

9

einstein1.txteinstein2.txtsun1.txtsun2.txtsun3.txtsun4.txt

The options xvf stand for extract, verbose and file respectively. Now you have extracted all the files necessaryto complete this lab exercise. The opposite of the actions described above, namely compressing and unzippingcan be achieved using the commands gzip and tar cvf (c stands for create). Please refer to the man pages for details.

Exercise 1 : Learn to work with an editor.

The first thing that a programmer should do is to choose a good editor and get to know its functionalities. Aneditor is an interface between you and the computer that allows you to create and edit files. A valid analogy inWindows would be NotePad. Conversely, Windows Word is not a good analogy since the document that you createin Windows is converted into machine language and is not human-readable as text. Editors, in contrast, allow you toread and write text files directly. While you can mix text and figures in a Word document, editors only accept text.To create a Word-like document such as a report, you need to use word-processing software such as LaTex.

There is a plethora of editors that are available for Linux. In this lecture, we will concentrate on emacs but feelfree to explore other possibilities and find the one that is right for you. Some choices are vim (or vi), nano, pico andnedit. Let us practice some basic emacs with a step-by-step tutorial. Important : It will take some time toget the hang of using the very extensive functionality of emacs but once you have, it will make yourprofessional life a whole lot easier.

1. Open up a terminal, and invoke emacs.

hande@p439a:~$ emacs my-file.txt

Now come back to your terminal and try to type something. Your terminal probably looks something like thiswhen you try that :

It seems like you have lost your prompt and your terminal is totally inactive. If you would like to go back andforth between your emacs window and your terminal and use both simultaneously, you cannot do that. Youcan, of course, circumvent this problem by opening a second terminal and leave the inactive terminal hangingaround on your desktop but that is hardly a very elegant solution. A better way is to invoke the emacs window

10

in the background, so that it doesn’t interfere with your terminal. Now go to your inactive terminal and pressCtrl-c (that is hold down the Ctrl and while holding it press c). This liberates your terminal, killing youremacs window.

Now re-invoke emacs with the following comment.

hande@p439a:~$ emacs my-file &The & sign sends a job to the background, with an external window popping up.[1] 4771 This is the process ID assigned to emacs.hande@p439a:~$ jobs[1]+ Running emacs my-file.txt &

The command jobs allows you to list all the processes that you have running in the background. We will revisitthis command some time later.

2. In the emacs window that opens, type in the text normally as if you were typing in a Word document.

3. It’s good practice to save your work often while using an editor, so click on the File icon on the toolbar on topof screen. Editors, in particular emacs usually allows tasks to be performed through keyboard shortcuts, whichare key combinations that speed up typing. Saving your work in emacs can be done by the key combinationCtrl-x Ctrl-s. This sequence is to be interpreted in the following way : Hold down the Ctrl key and pressfirst x and then s, without releasing the Ctrl key.

4. Next, we are going to create a third line that is almost identical to the other second line, except we’ll replacethe word “second” with the word “ third”. For this go to the beginning of the second line and type Ctrl-k.This will cut the text and store it in a sort of a clipboard. Then type Ctrl-y once to undo the cutting of thesecond line, press enter to go on to the next line and then type Ctrl-y again to form the third line. Save yourwork.

11

Now, you are at the beginning of the third line. Go to the end of the word “second” and type Alt-Backspace.This action erases entire words instead of erasing them letter-by-letter.

You can now finally type the word “third”.

Make sure to save again! As long as you don’t save your work, it won’t actually get written to disk and will belost when you close your emacs session.

You can alternatively do all this by using the various menus and icons on the top of the screen. However, learningkeyboard shortcuts using emacs allows you much flexibility and speed. A complete list of keyboard shortcuts isbasically impossible but a concise collection can be found at various pages on the web. Two examples :

• http://www.cs.duke.edu/courses/fall01/cps100/emacs.html• http://naveen.wikidot.com/emacs-commonly-used-shortcuts

5. Suppose you are done with the first file my-file.txt and would like to open a second file. One very rudimentaryapproach would be to close your present emacs window and open another one, but as you can guess there arebetter ways of doing this. You can actually continue to use your current, active emacs window to navigatebetween files. We are now going to open one of the text files we have downloaded in the previous exercise, sayeinstein1.txt. To do this, use the keyboard shortcut Cntr-f. This will prompt you to enter the name of thefile you wish to open in the small bar underneath the main text window, like so :

Now your cursor is in the small bar and anything you type appears there until you press Enter. Type the nameof the file you would like to see here and press Enter. NOTE that Tab-completion can be used in the smallbar as well. If you make a mistake or if you change your mind about opening the file, you can always use theshortcut Ctrl-g to cancel your operation. In fact, this key combination can be used for most errors.

12

6. Finally let us close the emacs window. Simply use the shortcut ctrl-x ctrl-c and watch the window disappear.

7. OPTIONAL : A big advantage of using emacs is that, if configured properly, it recognizes the particularlanguage used and colors the special words of the language appropriately. In addition, it also does theindentation properly, which is a great help when coding. As an example, consider the following two codesnippets written in Octave and C.

Octave mode C mode

If you are interested in making use of this feature of emacs in addition to many others, you can download theemacs configuration file from the course Web site. In order for it to work correctly, you need to save it directlyunder you home directory with the name .emacs. Next time you call up an emacs window, you will see thechanges.

Here are some similar notes on other editors. Please go through these as well by yourself and decide which one isbetter for you. In the next lecture, we will take a poll to see how many people decided on which editor.

Vi : (pronounced vee-i)

1. Invoke the editor vi.

hande@p439a:~$ vi my-file.txt

There’s no need to send vi to the background because unlike emacs, it opens the editor in the terminal andnot in an external window. As a result of this there is no option to use the process and the originating terminalsimultaneously.

2. vi has two modes : the insert mode and the command mode. The insert mode lets you type while the commandmode is for actions such as removing line, moving the cursor around and yanking text. In order to type into thefile, type i. This will take you into the insert mode. You can then type text normally.

3. After you are done typing, you need to save your work before, exiting. For this, type Esc. This is allow you toexit the insert mode and go into the command mode. Now, type :, you will see it appear at the bottom of thescreen. If you would like to save and quit the file, type x. If you would like to save and continue working, typew. If you, instead, want to quit without saving changes, type q!. The ! forces quit without saving.

13

4. You can read more about vi at http://www.cs.colostate.edu/helpdocs/vi.html.

Pico :

1. Invoke the editor pico.

hande@p439a:~$ pico my-file.txt

Once again, you do not need to send the editor to the background.

2. Pico doesn’t have modes so you can type right away.

3. Some keyboard shortcuts in pico are the same as those in emacs. You can for instance use Ctrl-k to cut a line.Yanking a line, however, has a different binding, namely Ctrl-u.

4. Once you are done writing, you can either type Ctrl-o and save without exiting or Ctrl-x to exit, which willprompt you to save the file.

5. The good thing about pico is that it lists relevant actions at the bottom of the screen although I find itscapabilities rather limited in comparison to emacs.

Exercise 2 : Learn to manipulate and list files and directories : mv, cp, mkdir, touch, ls

The first exercise of this lab period is also going to be a lesson in good computing habits. The best way to confuseyourself about your work is by keeping your files all in one place without making use of directories. You should divideyour files into meaningfully-named directories and possibly further into sub-directories. So let’s start by making adirectory to contain the work you’ll be doing for this class and sub-directories to separate your work between classes.

hande@p439a:~$ mkdir phys343hande@p439a:~$ cd phys343hande@p439a:~$ mkdir lab-01hande@p439a:~$ cd lab-01hande@p439a:~$ pwd/home/hande/phys343/lab-01

The command mkdir creates a new directory with the name you supply to it. Then you can go into the newlycreated directory using the command cd, which stands for change directory and does precisely that. You can equallywell create the directory phys343 and the sub-directory lab-01 at the same time.

14

hande@p439a:~$ mkdir -p phys343/lab-01

You do, however, have to use the -p option to force the system to simultaneously create the parents of the deepestsub-directory lab-01. You can then make sure you are in the right place by using the pwd command. pwd gives youthe full path or address of the current directory.

Now, we can create the files that are necessary for this lab. In most Linux systems, there are several ways to createa file.

1. Using the touch command :

hande@p439a:~$ ls my-file.txtls: my-file.txt: No such file or directoryhande@p439a:~$ touch my-file.txthande@p439a:~$ ls -l my-file.txt-rw-r--r-- 1 hande users 0 2007-10-02 21:22 my-file.txthande@p439a:~$ touch my-file.txthande@p439a:~$ ls -l my-file.txt-rw-r--r-- 1 hande users 0 2007-10-02 21:23 my-file.txt

Here the command touch creates an empty file if the file does not already exist and it only changes its modi-fication date if it does. The command ls lists all the files and directories that are under the directory you arein. The option -l displays different aspects of the files being listed. (See below)

2. Using the cat command :

hande@p439a:~$ ls my-file.txtls: my-file.txt: No such file or directoryhande@p439a:~$ cat > my-file.txtThe > sign directs the following into the file.This is the first line.This is the second line. Terminate with Ctrl-d.^Ctrl-d

hande@p439a:~$ cat my-file.txt Without the > sign, it only prints contents of file.This is the first line.This is the second line.hande@p439a:~$ cp my-file.txt my-file2.txthande@p439a:~$ cat my-file.txt my-file2.txtThis is the first line.This is the second line.This is the first line.This is the second line.hande@p439a:~$ cat my-file.txt my-file2.txt > my-file3.txt > sign redirects into a filehande@p439a:~$ cat my-file3.txtThis is the first line.This is the second line.This is the first line.This is the second line.

cat is a command that assumes different functions in different situations. It can be used

1. to create files,

2. to concatanate files, meaning to add two or more files consecutively,

3. or to just simply display a file.

15

Exercise 3 : Comparing files : diffSuppose you have two files that are almost the same but you know that there are a few differences between them.You would like to know these differences and where in the file they occur. For this exercise, we will work on the fileseinstein1.txt and einstein2.txt we have downloaded from the Web site earlier. They both contain similar text with afew different words in each paragraph. First let’s just see a brief account which says whether the two files differ.

hande@p439a:~$ diff -q einstein1.txt einstein2.txtFiles einstein1.txt and einstein2.txt differ

Now let’s see the lines where they differ. For this we’ll simply invoke the diff command without any options.

hande@p439a:~$ diff einstein1.txt einstein2.txt9,10c9,10< Jack Rosenberg remembers the time Einstein’s coworker asked him to< turn the tables and help give the famous scientist a present.---> Jack Rosenberg remembers the time Einstein’s colleagues asked him to> turn the tables and help give the famous scientist a gift......

You’ll see that the differences are given line by line together with line numbers.Sometimes, it is useful to see the differences in context, meaning displaying the differences with some lines around

it. This is especially valuable if your text is sparse and it’s hard to understand the location of the differences withoutseeing what’s around it. This function can be invoked using the -U option. You can decide how many lines you wantto be displayed around the lines with differences.

hande@p439a:~$ diff -U 1 einstein1.txt einstein2.txt--- einstein1.txt 2007-09-19 16:47:39.000000000 +0300+++ einstein2.txt 2007-09-19 16:48:34.000000000 +0300@@ -8,4 +8,4 @@

Albert Einstein was well known in Princeton for his generosity. But- Jack Rosenberg remembers the time Einstein’s colleagues asked him to- turn the tables and help give the famous scientist a gift.+ Jack Rosenberg remembers the time Einstein’s coworkers asked him to+ turn the tables and help give the famous scientist a present.

@@ -22,3 +22,3 @@Einstein’s most important findings, the theory of special relativity,

- plans are now under way to remember the findings of the man who+ plans are now under way to remember the discoveries of the man who

revolutionized physics in 1905 by redefining scientists’ perception of

Example 4 : Extract information from files : awk, head and tail; the pipe.

Suppose we have a file that contains several columns. We’d like to display the first three lines of items that are onthe fifth column of this file. For this exercise, we will use the file columns.txt that we have downloaded earlier.

First let’s learn how to display the first N lines of a file. You can do this using the command head. The defaultnumber of lines displayed with head is 10, however, you can override this by specifying the number of lines you wouldlike to be displayed.

hande@p439a:~$ head -3 columns.txt! total energy = -1090.13343774 Ry! total energy = -1090.20757070 Ry! total energy = -1090.24296462 Ry

This is almost what we want except we would like only the fifth column (i.e. the numbers column) to be displayed.We do this using a command called awk. awk is really a pattern searching language in itself, which is very helpful forcertain tasks. awk is ordinarily used to extract columns from files in the following way :

16

hande@p439a:~$ awk ’{print $5}’ columns.txt-1090.13343774-1090.20757070-1090.24296462-1090.25563488-1090.27085564-1090.27693129-1090.28213580-1090.29131927

In awk and in a lot of shell scripting, the direction of the quotation marks is very important. Whatever is insidethe {́ }´ is interpreted as the instruction given to awk, which in our case is to print the fifth column of the file,designated by $5.

However, this isn’t what we wanted to do. Instead of displaying the fifth column of the entire file, we are onlyinterested in displaying the fifth column of the first three lines. A very common construct in Linux when we wantto process the outcome of a command using a second command is the pipe, which is the vertical line |. Instead ofhaving awk read from a file, we can pipe the output of head to awk directly without having to save it to a file first.

hande@p439a:~$ head -3 columns.txt | awk ’{print $5}’-1090.13343774-1090.20757070-1090.24296462

Note that the first part of the pipe, namely the one starting with head is a full command with the file name, whereasthe second part does not have the file name as the argument anymore.

To add a final complication, let’s say that we are interested in displaying only the fifth column of the second lineof this command. We can do this by adding another pipe with the command tail. The usage of tail is very muchlike that of head except that it displays the last N lines of text.

hande@p439a:~$ head -2 columns.txt | awk ’{print $5}’ | tail -1-1090.20757070

The action head -2 displays the first two lines, awk then selects the fifth column of the output and finally tail -1displays the last line of the second output. You can form a chain of pipes of arbitrary length in this way.

Exercise 5 : Extracting information from files : ls, less, cat, grepIn this exercise we will use the four files sun1.txt, sun2.txt, sun3.txt and sun4.txt we have downloaded earlier. Firstmake sure that you have all the files and that they are not empty by using the ls command.

hande@p439a:~$ ls -l -l option lists informationabout content, permissions,size, owner etc.

total 136permissions links owner group size modification time name

datedrwxr-xr-x 2 hande users 176 2007-09-18 22:03 figs-rw-r--r-- 1 hande users 113 2007-09-18 22:03 my-file.txt-rw-r--r-- 1 hande users 69323 2007-09-18 22:31 notes.pdf-rw-r--r-- 1 hande users 5982 2007-09-18 22:38 notes.tex-rw-r--r-- 1 hande users 436 2007-09-18 22:31 notes.toc-rw-r--r-- 1 hande users 2640 2007-09-18 22:46 sun1.txt

Perhaps you are not interested in seeing all the files in your directory but those which start with the word “sun”.

hande@p439a:~$ ls -l sun*-rw-r--r-- 1 hande users 563 2007-09-18 22:46 sun1.txt

17

-rw-r--r-- 1 hande users 1462 2007-09-18 22:46 sun2.txt-rw-r--r-- 1 hande users 1992 2007-09-18 22:46 sun3.txt-rw-r--r-- 1 hande users 2640 2007-09-18 22:46 sun4.txt

Or equally, you want to view just those files which have a .txt extension.

hande@p439a:~$ ls -l *.txt-rw-r--r-- 1 hande users 3048 2007-09-18 22:34 ideas.txt-rw-r--r-- 1 hande users 563 2007-09-18 22:46 sun1.txt-rw-r--r-- 1 hande users 1462 2007-09-18 22:46 sun2.txt-rw-r--r-- 1 hande users 1992 2007-09-18 22:46 sun3.txt-rw-r--r-- 1 hande users 2640 2007-09-18 22:46 sun4.txt

The asterisk(*) is called a wild-card and it is used to list files that start with, end in or contain a given pattern.Now that we are convinced that the files exist and they are not empty, let’s take a look at one of them. When

you try to use the command cat like we did before, you will see that the file is too long to fit into a single screen.What would be nice is to be able to control the portion of the file that is shown on screen. This can be done withthe command less.

hande@p439a:~$ cat sun3.txt...... Runs off the screen!of the Moon is an awesome experience. For a few precious minutes it getsdark in the middle of the day. The stars come out. The animals and birdsthink it’s time to sleep. And you can see the solar corona. It is wellworth a major journey.hande@p439a:~$ less sun3.txt......total eclipse of the Sun. Partial eclipses are visible over a wide area ofthe Earth but the region from which a total eclipse is visible, called thepath of totality, is very narrow, just a few kilometers (though it issun3.txt lines 1-23/32 69% Scroll using up and down arrows.

Next, let’s count the number of lines, words and bytes in the file sun2.txt. This can be done using the wc command,which stands for word count.

hande@p439a:~$ wc sun2.txt96 1086 5976 sun2.txt

Doing research, we often find ourselves looking for a particular word or a pattern in a given file. This could be, forexample, energy or result. While this search can be done by visual inspection if the file is small, for larger files, thiswould be impossible. Linux has a very powerful command, grep, for conducting pattern search. It takes as argumentsthe pattern being searched and a file name. If called without any options, it prints lines containing the pattern onthe screen. Suppose we want to know the number of times the word “sun” occurs in the file sun2.txt.

hande@p439a:~$ grep Sun sun2.txtThe surface of the Sun, called the photosphere, is at a temperature ofabout 5800 K. Sunspots are "cool" regions, only 3800 K (they look dark only.....

Calling grep with the option -n causes the number of the line to be displayed where the given pattern occurs.

hande@p439a:~$ grep -n Sun sun2.txt1:The surface of the Sun, called the photosphere, is at a temperature of2:about 5800 K. Sunspots are "cool" regions, only 3800 K (they look dark only.....

If we are only interested in the number of occurrences of the pattern, we can use the -c option.

18

hande@p439a:~$ grep -c Sun sun2.txt33

You might have noticed that grep not only finds the instances of the word “sun” but also words which “sun” is apart of, such as “Sunspot” (Notice also that grep is not case-sensitive, but that can be modified through options).Suppose now that we would like to count all occurrences of the word “Sun” but not Sunspot. This can again beachieved by piping the output of the grep from above to a second grep, this time using the -v option, which matchesnonoccurences of the given pattern. Because we are only interested in the number, we can do a further pipe to wc, asbefore.

hande@p439a:~$ grep Sun sun2.txt | grep -v spot | wc -l24

Exercise 6 : Learn to write Shell scipts.

As mentioned previously, the shell not only provides us with a very useful and versatile set of commands, it alsogives us the option of collecting a set of commands in executable files and creating, in a way, our on commands. Thisis useful if we find ourselves repeating the set of commands several times. Such files are called shell scripts and theyusually carry the extension “.sh”. Shell scripts may be quite complicated but here we shall cover only two importantaspects : the control structures and shell variables.

Let’s go back to our earlier example of the sun.txt files and let’s say that we would now like to do the operation ofcounting the occurrences of the word “Sun” in all the files that start with “sun” and also separate the occurrences ofSunspot from them. In addition, we would also like to know how many occurrences we get for which file. Foreseeingthat we might need to do this several times, it’s better to write the shell instructions in a file. Let’s go to emacs again.

hande@p439a:~$ emacs count.sh &

Shell scripts are run from the command line (or terminal) using the command sh.

hande@p439a:~$ sh count.shsun1.txt : 5sun2.txt : 24sun3.txt : 8sun4.txt : 8

So we see that there are 5, 24, 8 and 8 occurrences respectively of Sun but not Sunspot in these files. In the simpleshell script above, we recognize the line grep... from the earlier example. The only difference is that we know definea shell variable, called file and let it run over all the files starting with the word sun using the for, do, done

19

construct. This, as mentioned before, is made possible by the asterisk, *. In shell, once a variable is assigned, it canbe recalled using the dollar sign, $. The for loop assigns one by one all the files that start with the word sun to thevariable file and between the keywords do and done, we specify what we’d like shell to do with these files. For each$file, we first display the name of the file using the echo command. The -n option suppresses the newline. Thenext line is already familiar from the previous example and we finally close our loop with the keyword done.

In fact, we can make our shell script even more similar to a real shell command by supplying a command lineargument. Let’s say that we would like to have the flexibility to tell the shell script the prefix of the file, in this case“sun”. We can modify the script in the following way.

hande@p439a:~$ sh count.sh sunThe prefix you have supplied is sun.sun1.txt : 5sun2.txt : 24sun3.txt : 8sun4.txt : 8

Here $1 refers to the first command line argument supplied to the shell script. Clearly, you can have any numberyou like. Once supplied, command line arguments are treated in exactly the same way as regular variables, using the$ sign whenever they need to be expanded.

IMPORTANT : A good programmer always writes sufficient comments in their programs. A comment is a line orlines that start with a special character (# for shell scripts) which are not interpreted by the shell but which provideexplanation of the usually cryptic program. This is as much for the readability of the code by other people as by theprogrammer because it’s guaranteed that to months later he will have forgotten most of what he’s written.

[1] Condensed from Linux Magazine, A Tour of the Linux System by Martin Streicher, Thursday, August 23rd, 2007[2] You can find a nice comparison of the Linux filesystem to the Windows filesystem at

http://polishlinux.org/first-steps/filesystem-and-disks/

lecture i : introductory linux and shell scripts -...

Documents