10 The Awk Programming Language Mauro Jaskelioff (originally by Gail Hopkins)


Post on 13-Jan-2016

Page 1: 10 The Awk Programming Language

10 The Awk Programming Language

Mauro Jaskelioff (originally by Gail Hopkins)

Page 2: 10 The Awk Programming Language

Introduction

• What is awk?
• Command line syntax
• Patterns and procedures
• Commands
• Variables
  – Built in variables
  – Variable assignment
• Arrays
• Defining functions

Page 3: 10 The Awk Programming Language

What is awk?

• A pattern matching program for processing files
• There are different versions of awk:
  – awk - the original version, sometimes called old awk, or oawk
  – New awk - additional features added in 1984. Often called nawk
  – GNU awk (gawk) - has even more features
• The version installed in unnc-cslinux is GNU awk 3.1.3

Page 4: 10 The Awk Programming Language

What does awk do?

• A text file is thought of as being made up of records and fields
• On this file you can:
  – Do arithmetic and string operations
  – Use loops and conditionals (if-then-else)
  – Produce formatted reports

Page 5: 10 The Awk Programming Language

What does awk do? (2)

• awk (new awk) also allows you to:
  – Execute UNIX commands from within a script
  – Process the output from UNIX commands
  – Work with multiple input streams
  – Define functions

Page 6: 10 The Awk Programming Language

What does awk do? (3)

• awk can also be combined with sed and shell scripting!
  – Shell is very easy and quick to write, but it lacks functionality.
  – sed, awk and shell are designed to be integrated
    • Simply invoke the sed or awk interpreter from within the shell script, rather than from the command line!

Page 7: 10 The Awk Programming Language

awk Command Line Syntax

• From the command line, you can invoke awk in two ways:
  – awk [options] 'script' var=value file(s)
    • Here, a script is specified directly on the command line
  – awk [options] -f scriptfile var=value file(s)
    • Here, a script is stored in a scriptfile and specified with the -f flag
    • nawk allows you to specify more than one scriptfile at a time (-f scriptfile1 -f scriptfile2, etc.)
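
The two invocation forms can be sketched as follows. The sample data and the file names data.txt and first.awk are invented for illustration; under that assumption both commands should print the first field of every line.

```shell
# Hypothetical sample data and script file, created in a scratch directory.
tmp=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$tmp/data.txt"
printf '{ print $1 }\n' > "$tmp/first.awk"

# Form 1: the script is given directly on the command line.
inline_out=$(awk '{ print $1 }' "$tmp/data.txt")

# Form 2: the script is read from a file named with -f.
file_out=$(awk -f "$tmp/first.awk" "$tmp/data.txt")

echo "$inline_out"
echo "$file_out"
rm -r "$tmp"
```

Keeping longer scripts in a file avoids shell quoting problems; the inline form is handy for one-liners.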

Page 8: 10 The Awk Programming Language

awk Command Line Syntax - assigning values to variables

• You can assign a value to a variable on the command line (nawk only):
  – This value can be one of three things:
    • A literal, e.g. count=5
      – awk -f scriptFile count=5
    • A shell variable, e.g. $count
      – awk -f scriptFile count=$count
    • A command substitution, e.g. `cmd`
      – awk -f scriptFile count=`who | wc -l`
• The value is ONLY available after the BEGIN statement within the script has been executed
  – To make the value available to the BEGIN statement:
    • awk -v count=5 -f scriptFile
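
A small sketch of the difference between the two kinds of assignment. The variable name count comes from the slide; the input line "x" and the output labels are made up for the demonstration.

```shell
# Assignment given as an operand: not yet set while BEGIN runs,
# but set by the time the first input record is read.
late=$(echo x | awk 'BEGIN { print "BEGIN: [" count "]" } { print "main: [" count "]" }' count=5)

# Assignment given with -v: already set inside BEGIN.
early=$(awk -v count=5 'BEGIN { print "BEGIN: [" count "]" }')

echo "$late"
echo "$early"
```

In the first command the brackets printed by BEGIN are empty because count has not been assigned yet.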

Page 9: 10 The Awk Programming Language

awk Command Line Syntax - giving awk a file to operate on

• awk operates on one or more files
• You do not have to give awk any files to operate on
  – Either don’t specify one
  – Or name standard input explicitly using '-'
    • awk -f scriptFile -
• If you don’t give awk a file to operate on it takes input from STDIN
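
As a quick illustration with invented input, both of the following read from standard input, once implicitly and once through the '-' operand:

```shell
# No file operand: awk reads standard input.
implicit=$(printf 'one two\nthree four\n' | awk '{ print $2 }')

# A lone '-' names standard input explicitly.
explicit=$(printf 'one two\nthree four\n' | awk '{ print $2 }' -)

echo "$implicit"
echo "$explicit"
```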

Page 10: 10 The Awk Programming Language

awk Command Line Syntax - Field separators

• You can set a field separator
  – In other words, a symbol (or even a regular expression in nawk) that should appear between fields of a record
• Do this using -F
  – E.g. awk -F';' -f scriptFile count=5 myFile
    • Would look for fields in a record (or line) in myFile separated by a semi-colon
  – Also awk -f scriptFile FS=';' count=5 myFile
• Fields are referred to by the variables $1, $2, etc.
  – $0 means the whole record
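
A minimal sketch of both ways of setting the separator, using a made-up semicolon-separated record rather than a real file:

```shell
record='anna;42;Nottingham'   # hypothetical semicolon-separated record

# -F sets the separator before any input is read.
with_flag=$(echo "$record" | awk -F';' '{ print $2 }')

# FS can also be assigned as an operand ahead of the input.
with_assign=$(echo "$record" | awk '{ print $3 }' FS=';')

# $0 is the whole record, whatever the separator is.
whole=$(echo "$record" | awk -F';' '{ print $0 }')

echo "$with_flag"
echo "$with_assign"
echo "$whole"
```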

Page 11: 10 The Awk Programming Language

Field Separators - example

• Suppose you want to extract and print the first three (colon-separated) fields of each record in /etc/passwd, on separate lines

$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
rootnir:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

Page 12: 10 The Awk Programming Language

Field Separators - example (2)

$ awk -F: '{print $1; print $2; print $3}' /etc/passwd
root
x
0
rootnir
x
0
bin
x
1
daemon
x
2
adm
x
3
…

• -F: Look for fields separated by a colon
• {print $1; print $2; print $3} Print the first ($1), second ($2) and third ($3) field
• /etc/passwd Look in the file /etc/passwd

Page 13: 10 The Awk Programming Language

Patterns and Procedures

• awk scripts consist of patterns and procedures:

awk -F: '/^...:/ {print $1}' /etc/passwd

  – Pattern: /^...:/
  – Procedure: {print $1}

• Either the pattern or the procedure may be omitted
  – If a pattern is missing, the procedure applies to all lines
  – If the procedure is missing, the matched line (matched by pattern) is printed
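
The two "missing" cases can be tried out on a few made-up, passwd-like lines (the data below is invented for the sketch):

```shell
data='root:x:0
bin:x:1
daemon:x:2'   # made-up, passwd-like lines

# Pattern only: matching records are printed in full.
pattern_only=$(printf '%s\n' "$data" | awk -F: '/^...:/')

# Procedure only: it runs for every record.
procedure_only=$(printf '%s\n' "$data" | awk -F: '{ print $1 }')

echo "$pattern_only"
echo "$procedure_only"
```

Only the bin line has exactly three characters before its first colon, so it is the only record the pattern selects.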

Page 14: 10 The Awk Programming Language

Patterns

A pattern can be:

• /regular expression/
  – Use the metacharacters we have already seen
  – ^ and $ mean the beginning and end of a string (e.g. the fields) NOT beginning/end of a line

awk -F: '/^...:/ {print $1}' /etc/passwd

• Relational expression
  – Use relational operators, e.g. $1 > $2
  – Can do numeric or string comparisons

awk -F: '$1=="gdm" {print $0}' /etc/passwd
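
A short sketch contrasting a regular expression pattern with string and numeric relational patterns. The user records are invented; the point of the last command is that fields that look like numbers are compared as numbers.

```shell
users='root:x:0:0
gdm:x:42:42
games:x:12:100'   # invented sample in /etc/passwd style

# Regular expression pattern, tested against the whole record.
regex=$(printf '%s\n' "$users" | awk -F: '/^g/ { print $1 }')

# String comparison on a single field.
by_name=$(printf '%s\n' "$users" | awk -F: '$1 == "gdm" { print $3 }')

# Numeric comparison between two fields (100 > 12 as numbers).
by_number=$(printf '%s\n' "$users" | awk -F: '$4 > $3 { print $1 }')

echo "$regex"
echo "$by_name"
echo "$by_number"
```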

Page 15: 10 The Awk Programming Language

Patterns (2)

• Pattern-matching expression
  – E.g. quoted strings, numbers, operators, defined variables…
  – ~ means match, !~ means don’t match

awk -F: '$1 ~ /.dm.*/ {print $0}' /etc/passwd

• BEGIN
  – Specifies procedures that take place before the first input line is processed

awk 'BEGIN {print "Version 1.0"}' dataFile

• END
  – Specifies procedures that take place after the last input record is read

awk 'END {print "end of data"}' dataFile
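
The pattern types on this slide can be combined in one script. The following is only a sketch on invented data; the counter names matched and skipped are hypothetical.

```shell
lines='gdm:x:42
adm:x:3
root:x:0'   # invented sample records

report=$(printf '%s\n' "$lines" | awk -F: '
BEGIN        { print "Version 1.0" }      # runs before the first record
$1 ~ /.dm/   { matched++; print $1 }      # first field matches .dm
$1 !~ /.dm/  { skipped++ }                # first field has no match
END          { print matched " matched, " skipped " skipped" }')

echo "$report"
```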

Page 16: 10 The Awk Programming Language

Procedures

• Consist of one or more:
  – Commands
  – Functions
  – Variable assignments
• These are separated by newlines or semi-colons and are contained within curly brackets { }

Page 17: 10 The Awk Programming Language

Commands used with Procedures

• There are 5 types of commands:
  – Assignments of variables or arrays
  – Commands that print
  – Built-in functions
  – Control-flow commands
  – User-defined functions (in nawk only)

Page 18: 10 The Awk Programming Language

Some Examples using Patterns and Procedures

awk -F: '{print $1}' /etc/passwd
  – print first field of each line in /etc/passwd

awk '/root/' /etc/passwd
  – print all lines in /etc/passwd that contain the pattern "root"

awk -F: '/root/ {print $1}' /etc/passwd
  – print first field of lines that contain "root" in /etc/passwd

awk '{print NR}'
  – print the number of the current record

Page 19: 10 The Awk Programming Language

awk Built-in Variables

• awk has a number of built in variables:
  – FILENAME - current filename
  – FS - Field separator
  – NF - Number of fields in current record
  – NR - Number of current record
  – RS - Record separator
  – $0 - Entire input record
  – $n - nth field in current record
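
A minimal sketch that prints a few of these variables for each record. The file name sample.txt and its contents are assumptions made for the example.

```shell
# Hypothetical two-line input file in a scratch directory.
tmp=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmp/sample.txt"

# FILENAME, NR and NF are set by awk for every record it reads.
summary=$(cd "$tmp" && awk '{ print FILENAME ": record " NR " has " NF " fields" }' sample.txt)

echo "$summary"
rm -r "$tmp"
```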

Page 20: 10 The Awk Programming Language

awk Operators

Symbol              Meaning
$                   Field reference
++ --               Increment, decrement
+ - !               Addition, subtraction, negation
* / %               Multiplication, division, modulus
< <= > >= != ==     Relational operators
~ !~                Match regular expression and negation
in                  Array membership
&& ||               Logical and, Logical or
?:                  If-then-else for expressions, e.g. x == y ? "Equal" : "Not equal"
= += -= *= /= %=    Assignment

Page 21: 10 The Awk Programming Language

Variable Assignments

• Assign variables with an =, E.g.:
  – FS = ":"
  – var1 = count+2
  – var2 = max-min
  – var3 = 2 < 3 ? 4 : 5
• Access variables using just the name
  – {print var3}
• What’s the result?
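
One way to check the answer is to run the assignments in a BEGIN block. The starting values for count, max and min below are assumptions chosen for the sketch; the slide does not give any.

```shell
out=$(awk 'BEGIN {
    count = 5; max = 10; min = 3     # assumed starting values
    var1 = count + 2
    var2 = max - min
    var3 = 2 < 3 ? 4 : 5             # 2 < 3 is true, so the first branch is taken
    print var1, var2, var3
}')

echo "$out"
```

Since 2 < 3 holds, var3 ends up as 4.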

Page 22: 10 The Awk Programming Language

Arrays in awk

• awk has arrays with elements subscripted with strings (associative arrays)

• Assign arrays in one of two ways:
  – Name them in an assignment statement
    • myArray[i]=n++
  – Use the split() function
    • n=split(input, words, " ")
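
Both ways of filling an array, sketched with made-up names (colour, words) and a made-up input string:

```shell
out=$(awk 'BEGIN {
    # Direct assignment creates the element; subscripts are strings.
    colour["apple"] = "red"
    # split() fills words[1] .. words[n] and returns n.
    n = split("in theory there is", words, " ")
    print colour["apple"], n, words[1], words[n]
}')

echo "$out"
```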

Page 23: 10 The Awk Programming Language

Reading elements in an array

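• Using a for loop:

for (item in array)
    print array[item]

• Using the operator in:

if (idx in array)
    ...

• …use this to see if an index exists. (nawk)
• Note that the test variable cannot be called index, because index is the name of a built-in awk function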
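
A small sketch of both constructs on an invented stock array. The order in which for (item in array) visits the elements is not specified, so the listing is piped through sort here to make it predictable.

```shell
# Loop over every subscript; sort the lines because the visiting order is unspecified.
listing=$(awk 'BEGIN {
    stock["pear"] = 3; stock["plum"] = 7
    for (item in stock)
        print item, stock[item]
}' | LC_ALL=C sort)

# Membership test with the in operator.
membership=$(awk 'BEGIN {
    stock["pear"] = 3
    if ("pear" in stock)   print "pear present"
    if (!("fig" in stock)) print "fig absent"
}')

echo "$listing"
echo "$membership"
```

Testing with in does not create the element, whereas merely referring to stock["fig"] would.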

Page 24: 10 The Awk Programming Language

Defining Functions in awk

• You can define your own functions in awk, in much the same way as you define a function in C or Java
  – Thus code that is to be repeated can be grouped together inside a function
  – Allows code reuse!
  – NOTE: when calling a function you have defined yourself, no space is allowed between the function name and the left bracket.
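
A minimal sketch of a user-defined function. The function name twice and the input numbers are invented for the example; note that the call twice($1) has no space before the bracket.

```shell
out=$(printf '2\n5\n' | awk '
function twice(x) { return 2 * x }   # hypothetical helper function
{ print twice($1) }')

echo "$out"
```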

Page 25: 10 The Awk Programming Language

An Example using a Function and an Array

# capitalise the first letter of each word in a string
function capitalise(input) {
    result = ""
    n = split(input, words, " ")
    for (i = 1; i <= n; i++) {
        w = words[i]
        w = toupper(substr(w, 1, 1)) substr(w, 2)
        if (i > 1)
            result = result " "
        result = result w
    }
    return result
}

# this is the main program
{ print capitalise($0) }

Page 26: 10 The Awk Programming Language

Break-down of Example

# capitalise the first letter of each word in a string
function capitalise(input) {
…

• input: Variable to be used in function
  – input contains whatever the caller called the function with

Page 27: 10 The Awk Programming Language

Break-down of Example (2)

…
result = ""
n = split(input, words, " ")

• result = "": Set result to be an empty string
• split(input, words, " "): Take the input and split it up into the array "words" - divide the input wherever there is a space
• n is the result returned by the split command and contains the number of elements in the array "words"

Page 28: 10 The Awk Programming Language

Break-down of Example (3)

…
for (i = 1; i <= n; i++) {
    w = words[i]
    w = toupper(substr(w, 1, 1)) substr(w, 2)
    if (i > 1)
        result = result " "
    result = result w
}
return result
}
…

• for (i = 1; i <= n; i++): For each element of array from 1 to the number of elements…
• w = words[i]: Assign element to w
• toupper(substr(w, 1, 1)): Take the substring which starts at the first character and has a length of 1 and capitalise using toupper()
• substr(w, 2): Take remainder of string starting at 2nd character and append it to capitalised character
• result = result " ": Tag a space on to the end of the result string
• result = result w: Tag the next word on to the end of the result string

Page 29: 10 The Awk Programming Language

Break-down of Example (4)

…
# this is the main program
{ print capitalise($0) }

• # this is the main program: This is a comment in awk
• { print capitalise($0) }: Call the capitalise function with the entire input record. Print the result.

Page 30: 10 The Awk Programming Language

Output from Example

• Given the input file:

In theory there is no difference between theory and practice, but in practice there is

• …our capitalise function will output:

In Theory There Is No Difference Between Theory And Practice, But In Practice There Is
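
To try the example, the script can be saved to a file and run with -f. The file name capitalise.awk and the shortened input line are assumptions made for this sketch.

```shell
# Save the script from the slides under a hypothetical file name.
tmp=$(mktemp -d)
cat > "$tmp/capitalise.awk" <<'EOF'
# capitalise the first letter of each word in a string
function capitalise(input) {
    result = ""
    n = split(input, words, " ")
    for (i = 1; i <= n; i++) {
        w = words[i]
        w = toupper(substr(w, 1, 1)) substr(w, 2)
        if (i > 1)
            result = result " "
        result = result w
    }
    return result
}
# this is the main program
{ print capitalise($0) }
EOF

# Feed one line of input on standard input.
out=$(echo 'in theory there is no difference' | awk -f "$tmp/capitalise.awk")

echo "$out"
rm -r "$tmp"
```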

Page 31: 10 The Awk Programming Language

Summary

• An introduction to awk
• Using awk patterns and procedures on the command line
• Writing awk scripts