unix - class7 - awk

UNIX - awk Data extraction and formatted Reporting Tool Presentation By Nihar R Paital

Upload: nihar-ranjan-paital

Post on 29-Jan-2018

TRANSCRIPT

Page 1: Unix - Class7 - awk

UNIX - awk

Data extraction and formatted Reporting Tool

Presentation By

Nihar R Paital

Page 2: Unix - Class7 - awk

Introduction

Developers : Alfred Aho, Peter Weinberger, Brian Kernighan

Appears in : Version 7 UNIX onwards

Developed during : 1970s

Developed at : Bell Labs

Category : UNIX Utility

Supported by : All UNIX flavors

Page 3: Unix - Class7 - awk

Definition

The AWK utility is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports.

Page 4: Unix - Class7 - awk

Formatting using an input file
$ awk '{print $n}' filename
Example:
$ awk '{print $1}' awk.txt > awk.txt.bak

Formatting using a filter in a pipeline
$ generate_data | awk '{print $1}'
Example:
$ cat awk.txt | awk '{print $1}' > awk.txt.bak

Before proceeding to the next slide, please create a file named awk.txt with the following contents:

07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"

awk performs basic text formatting on an input stream (a file or input from a pipeline).
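As a minimal sketch of the two invocation styles above, the following creates the sample file in a scratch directory and checks that reading the file directly and reading it through a pipeline select the same first field. The scratch directory and the variable names from_file and from_pipe are assumptions made for illustration; they are not part of the slides.

```shell
# Work in a scratch directory so an existing awk.txt is not overwritten (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the slide: two access-log lines.
cat > awk.txt <<'EOF'
07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"
EOF

# Form 1: awk reads the file named on the command line.
from_file=$(awk '{print $1}' awk.txt)

# Form 2: awk reads standard input from a pipeline.
from_pipe=$(cat awk.txt | awk '{print $1}')

# Both forms yield the two IP addresses, one per line.
echo "$from_file"
```

The single quotes around the awk program matter: without them the shell would try to expand $1 itself before awk ever sees it.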

Page 5: Unix - Class7 - awk

Basic but important for awk

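Syntax : awk '{print $n}' filename
In a pipeline : generate_data | awk '{print $n}'

An awk action starts with a "{" and ends with a "}". On the command line the whole program is enclosed in single quotes so that the shell does not interpret $n itself.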

$0 is the entire line

Awk parses each line into fields for you automatically, using any run of whitespace (spaces, tabs) as the delimiter by default.

The fields of each line are available as $1, $2, $3, … etc.

NF : A special variable that contains the number of fields in the current line. We can print the last field by printing $NF.

NR : A special variable that contains the number of the row (record) currently being processed.
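The points above can be sketched on two throwaway lines. The input text and the variable name report are assumptions for illustration. The second line separates its words with two tabs and three spaces, to show that a run of whitespace counts as a single delimiter.

```shell
# Feed awk two lines; the second mixes tabs and repeated spaces.
# For each line print: row number, field count, first field, last field.
report=$(printf 'this is a test\none\t\ttwo   three\n' |
  awk '{print NR, NF, $1, $NF}')

echo "$report"
# 1 4 this test
# 2 3 one three
```

The second line reports three fields, not five or six, because consecutive whitespace characters do not produce empty fields under the default separator.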

Page 6: Unix - Class7 - awk

Basic Examples

$ awk '{print $0}' awk.txt

It prints all the lines exactly as they are in the file.

$ echo 'this is a test' | awk '{print $3}'

It will print 'a'

$ echo 'this is a test' | awk '{print $NF}'

It prints "test"

$ awk '{print $1, $(NF-2)}' awk.txt

It prints the first field and the third-from-last field of each line of awk.txt.

$ awk '{print NR ") " $1 " -> " $(NF-2)}' awk.txt

Output:

1) 07.46.199.184 -> 200

2) 123.125.71.19 -> 304
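The parentheses in $(NF-2) are significant. As a small sketch (the input numbers and the variable name result are made up for illustration), compare it with $NF-2, which awk reads as "the last field, minus 2":

```shell
# Three numeric fields, so NF is 3.
# $(NF-2) is field number 3-2 = 1, i.e. "10".
# $NF-2   is the value of field 3 minus 2, i.e. 30-2 = 28.
result=$(echo '10 20 30' | awk '{print $(NF-2), $NF-2}')

echo "$result"   # 10 28
```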

Page 7: Unix - Class7 - awk

Advanced use of AWK

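The examples from here on read logs.txt, which is assumed to have the same contents as awk.txt.

$ awk '{print $2}' logs.txt

Output:

[28/Sep/2010:04:08:20]
[28/Sep/2010:04:20:11]

The date field is separated by "/" and ":" characters. Suppose I want to print only the date part, like this:

[28/Sep/2010
[28/Sep/2010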

$ awk '{print $2}' logs.txt | awk 'BEGIN{FS=":"}{print $1}'

Output:

[28/Sep/2010
[28/Sep/2010

Here FS=":" sets the field separator (FS) to a colon (:).
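The standard -F command-line option sets the same separator without a BEGIN block. A minimal sketch comparing the two spellings on one timestamp (the variable names line, with_begin and with_flag are assumptions for illustration):

```shell
line='[28/Sep/2010:04:08:20]'

# Separator set inside the program, before any input is read.
with_begin=$(echo "$line" | awk 'BEGIN{FS=":"}{print $1}')

# Same separator set with the -F option.
with_flag=$(echo "$line" | awk -F: '{print $1}')

echo "$with_begin"   # [28/Sep/2010
echo "$with_flag"    # [28/Sep/2010
```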

$ awk '{print $2}' logs.txt | awk 'BEGIN{FS=":"}{print $1}' | sed 's/\[//'

Output:

28/Sep/2010
28/Sep/2010

Here sed substitutes the "[" with an empty string, which removes it.
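The three-stage pipeline above can also be written as a single awk invocation. This is an alternative sketch, not the method used in the slides: it relies on the standard awk functions split() and sub(). The scratch directory and the names parts and dates are assumptions for illustration.

```shell
# Recreate the sample log in a scratch directory (assumption).
workdir=$(mktemp -d)
cat > "$workdir/logs.txt" <<'EOF'
07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"
EOF

dates=$(awk '{
  split($2, parts, ":")      # parts[1] holds "[28/Sep/2010"
  sub(/^\[/, "", parts[1])   # remove the leading "["
  print parts[1]
}' "$workdir/logs.txt")

echo "$dates"
# 28/Sep/2010
# 28/Sep/2010
```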

Page 8: Unix - Class7 - awk

Advanced use of AWK

If I want to return only the lines with status 200:

$ awk '{if ($(NF-2) == "200") {print $0}}' logs.txt

Output:

07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
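The same filter can be written more compactly in awk's pattern form: a condition with no action prints every line for which it is true. The sketch below compares both spellings on the sample log. The scratch directory and the names with_if, with_pattern and expected are assumptions for illustration.

```shell
# Recreate the sample log in a scratch directory (assumption).
workdir=$(mktemp -d)
cat > "$workdir/logs.txt" <<'EOF'
07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"
EOF

# Explicit if inside an action, as on the slide.
with_if=$(awk '{if ($(NF-2) == "200") {print $0}}' "$workdir/logs.txt")

# Pattern only: the default action is to print the whole line.
with_pattern=$(awk '$(NF-2) == 200' "$workdir/logs.txt")

echo "$with_pattern"
```

Both commands print only the msnbot line.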

$ awk '{a+=$(NF-2); print "Total so far:", a}' logs.txt

Output:

Total so far: 200
Total so far: 504

$ awk '{a+=$(NF-2)}END{print "Total:", a}' logs.txt

Output:

Total: 504
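The difference between the last two commands is where the print sits: inside the main action it runs once per line, inside END it runs once after the last line. As a hedged sketch of how BEGIN, the main action and END can combine into a small formatted report (the header text, the scratch directory and the names total and report are assumptions for illustration):

```shell
# Recreate the sample log in a scratch directory (assumption).
workdir=$(mktemp -d)
cat > "$workdir/logs.txt" <<'EOF'
07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"
EOF

report=$(awk '
BEGIN { print "IP -> status" }                   # runs once, before any input
      { total += $(NF-2)                         # runs for every line
        printf "%d) %s -> %s\n", NR, $1, $(NF-2) }
END   { print "Lines:", NR, "Total:", total }    # runs once, after the last line
' "$workdir/logs.txt")

echo "$report"
# IP -> status
# 1) 07.46.199.184 -> 200
# 2) 123.125.71.19 -> 304
# Lines: 2 Total: 504
```

Summing status codes follows the slide and is only meant to show accumulation; the same pattern applies to any numeric column.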
