Chapter Five
Advanced File Processing
Lesson A
Selecting, Manipulating, and Formatting Information
Objectives
Use the pipe operator to redirect the output of one command to another command
Use the grep command to search for a specified pattern in a file
Use the uniq command to remove duplicate lines from a file
Use the comm and diff commands to compare two files
Use the wc command to count words, characters, and lines in a file
Use the manipulate and format commands: sed, tr, and pr
Advancing Your File Processing Skills
The select commands extract data
The manipulation and transformation commands alter and transform data into useful and appealing formats
Using the Select Commands
Select commands: grep, diff, uniq, comm, wc
Using Pipes
- The pipe operator (|) redirects the output of one command to the input of another command
- An example would be to redirect the output of the ls command to the more command
- The pipe operator can connect several commands on the same command line
Using Pipes
Using pipe operators and connecting commands is useful when viewing directory information
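A minimal sketch of connecting several commands with pipes. The scratch directory and file names are made up for illustration.

```shell
# Work in a scratch directory so the listing is predictable (hypothetical file names).
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/report.txt" "$workdir/data.csv"

# ls writes one name per line when its output goes to a pipe;
# sort -r reverses the order; head -n 2 keeps the first two lines.
top_two=$(ls "$workdir" | sort -r | head -n 2)
echo "$top_two"

# At a terminal, a long listing is typically paged instead:
#   ls -l /etc | more

rm -r "$workdir"
```

Each command in the chain only sees the output of the one before it, so stages can be added or removed without changing the others.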
Using the grep Command
Used to search for a specific pattern in a file, such as a word or phrase
grep’s options and wildcard support allow for powerful search operations
You can increase grep’s usefulness by combining it with other commands, such as head or tail
grep can take input from other commands and also be directed to provide input for other commands
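A short sketch of grep used on a file, feeding another command, and reading from another command. The staff.txt data is invented for the example.

```shell
workdir=$(mktemp -d)
# Hypothetical sample data: a name and a department on each line.
printf '%s\n' 'alice sales' 'bob engineering' 'carol sales' 'dave sales' > "$workdir/staff.txt"

# Search a file directly for a pattern.
sales=$(grep 'sales' "$workdir/staff.txt")

# grep providing input for another command: keep only the first two matches.
first_two=$(grep 'sales' "$workdir/staff.txt" | head -n 2)

# grep taking input from another command: -c counts the matching lines.
count=$(sort "$workdir/staff.txt" | grep -c 'sales')

echo "$first_two"
echo "$count"
rm -r "$workdir"
```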
Using the uniq Command
Removes duplicate lines from a file
It compares only consecutive lines; therefore, uniq requires sorted input
uniq has an option (-d) that allows you to generate output that contains one copy of each line that has a duplicate
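A small sketch of why uniq needs sorted input, and of the -d option. The fruit.txt file is a made-up example.

```shell
workdir=$(mktemp -d)
printf '%s\n' pear apple pear fig apple pear > "$workdir/fruit.txt"

# Unsorted input: no duplicate lines are adjacent, so uniq removes nothing.
unsorted=$(uniq "$workdir/fruit.txt")

# Sorting first places duplicates next to each other, so uniq can drop them.
deduped=$(sort "$workdir/fruit.txt" | uniq)

# -d prints one copy of each line that occurs more than once.
repeated=$(sort "$workdir/fruit.txt" | uniq -d)

echo "$deduped"
echo "$repeated"
rm -r "$workdir"
```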
Using the comm Command
Used to identify duplicate lines in sorted files
Unlike uniq, it does not remove duplicates, and it works with two files rather than one
It compares lines common to file1 and file2, and produces three-column output
- Column one contains lines found only in file1
- Column two contains lines found only in file2
- Column three contains lines found in both files
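The three columns can be sketched as follows, assuming two small sorted files with invented contents. The -1, -2, and -3 options each suppress one column.

```shell
workdir=$(mktemp -d)
# Both inputs must already be sorted.
printf '%s\n' apple banana cherry > "$workdir/file1"
printf '%s\n' banana cherry date > "$workdir/file2"

# Full three-column output (columns are separated by tabs).
comm "$workdir/file1" "$workdir/file2"

# Suppress columns to isolate one of them.
only_in_file1=$(comm -23 "$workdir/file1" "$workdir/file2")
only_in_file2=$(comm -13 "$workdir/file1" "$workdir/file2")
in_both=$(comm -12 "$workdir/file1" "$workdir/file2")

rm -r "$workdir"
```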
Using the diff Command
Attempts to determine the minimal changes needed to convert file1 to file2
The output displays the line(s) that differ
The associated codes in the output indicate that in order for the files to match, specific lines must be added (a), changed (c), or deleted (d)
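A brief sketch of diff's default output on two invented files. In that format, lines from file1 are prefixed with < and lines from file2 with >.

```shell
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/file1"
printf '%s\n' apple blueberry cherry date > "$workdir/file2"

# diff exits with status 1 when the files differ; "|| true" keeps a
# script that uses "set -e" from stopping at this point.
changes=$(diff "$workdir/file1" "$workdir/file2" || true)
echo "$changes"

# Here line 2 has to be changed (code c) and a line has to be
# added after line 3 (code a) for file1 to match file2.
rm -r "$workdir"
```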
Using the wc Command
Used to count the number of lines, words, and bytes or characters in text files
You may specify all three options in one issuance of the command
If you don’t specify any options, you see counts of lines, words, and bytes (in that order)
The options for the wc command:
-l for lines
-w for words
-c for bytes (the same as characters in single-byte text; use -m to count characters)
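A minimal sketch of the three counts on an invented two-line file. Reading from standard input keeps the file name out of the output.

```shell
workdir=$(mktemp -d)
printf 'one two three\nfour five\n' > "$workdir/sample.txt"

# Some wc implementations pad the number with spaces; tr -d ' ' strips the padding.
lines=$(wc -l < "$workdir/sample.txt" | tr -d ' ')
words=$(wc -w < "$workdir/sample.txt" | tr -d ' ')
bytes=$(wc -c < "$workdir/sample.txt" | tr -d ' ')

echo "$lines lines, $words words, $bytes bytes"
rm -r "$workdir"
```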
Using the Manipulate and Format Commands
These commands are: sed, tr, pr
Used to edit and transform the appearance of data before it is displayed or printed
Introducing sed
sed is a UNIX editor that allows you to make global changes to large files
Minimum requirements are an input file and a command that lets sed know what actions to apply to the file
sed commands have two general forms
- Specify an editing command on the command line
- Specify a script file containing sed commands
The many options of sed allow you to create new files containing only the data you specify
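A sketch of both forms of a sed command, using invented file names and contents. sed writes to standard output, so redirection is what creates the new file.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'unix is old' 'unix is small' 'linux is new' > "$workdir/notes.txt"

# Form 1: the editing command is given on the command line.
# s/unix/UNIX/g replaces every occurrence of "unix" on each line.
sed 's/unix/UNIX/g' "$workdir/notes.txt" > "$workdir/notes_upper.txt"

# Form 2: the commands live in a script file, named with -f.
printf '%s\n' 's/is/was/' '/linux/d' > "$workdir/edits.sed"
sed -f "$workdir/edits.sed" "$workdir/notes.txt" > "$workdir/notes_edited.txt"

upper=$(cat "$workdir/notes_upper.txt")
edited=$(cat "$workdir/notes_edited.txt")
echo "$upper"
echo "$edited"
rm -r "$workdir"
```

The input file is left unchanged in both cases; only the redirected output files contain the edits.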
Translating Characters Using the tr Command
tr copies data from the standard input to the standard output, substituting or deleting characters specified by options and patterns
The patterns are strings and the strings are sets of characters
A popular use of tr is converting lowercase characters to uppercase
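A quick sketch of substituting and deleting characters with tr. The input strings are arbitrary examples.

```shell
# tr reads standard input only, so data arrives through a pipe or redirection.
# Substitution: map each lowercase letter to its uppercase counterpart.
upper=$(echo 'unix file processing' | tr '[:lower:]' '[:upper:]')

# Deletion: -d removes every character that belongs to the given set.
no_digits=$(echo 'a1b2c3' | tr -d '[:digit:]')

echo "$upper"
echo "$no_digits"
```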
Using the pr Command to Format Your Output
pr prints specified files on the standard output in paginated form
By default, pr formats the specified files into single-column pages of 66 lines
Each page has a five-line header containing the file name, its latest modification date, and the current page number, and a five-line trailer consisting of blank lines
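A sketch that checks the default page layout on a short, invented file. The exact header text differs between pr implementations, so only the page number is inspected here.

```shell
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$workdir/numbers.txt"

# A ten-line file still fills one whole page: header, body, and trailer.
page_lines=$(pr "$workdir/numbers.txt" | wc -l | tr -d ' ')

# The third line of the five-line header carries the date, file name, and page number.
header=$(pr "$workdir/numbers.txt" | sed -n 3p)

# -t omits the header and trailer and stops after the last line of the file.
body_lines=$(pr -t "$workdir/numbers.txt" | wc -l | tr -d ' ')

echo "$page_lines"
echo "$header"
echo "$body_lines"
rm -r "$workdir"
```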
Lesson B
Using UNIX File-Processing Tools to Create an Application
Objectives
Design a new file-processing application
Design and create files to implement the application
Use awk to generate formatted output
Use cut, sort, and join to organize and transform selected file information
Develop customized shell scripts to extract and combine file data
Test individual shell scripts and combine all scripts into a final shell program
Designing a New File-Processing Application
The most important phase in developing a new application is the design
The design defines the information an application needs to produce
The design also defines how to organize this information into files, records, and fields, which are called logical structures
Designing Records
The first task is to define the fields in the records and produce a record layout
A record layout identifies each field by name and data type (numeric or nonnumeric)
Design the file record to store only those fields relevant to the record’s primary purpose
Linking Files with Keys
Multiple files are joined by a key: a common field that each of the linked files shares
Another important task in the design phase is to plan a way to join the files
The flexibility to gather information from multiple files composed of simple, short records is the essence of a relational database system. UNIX provides several commands that offer this flexibility
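A minimal sketch of linking two flat files on a shared key with join. The record layouts, the colon separator, and the names below are assumptions made for illustration, not the layouts of the application's own files.

```shell
workdir=$(mktemp -d)
# Hypothetical layouts: project_id:title and project_id:programmer.
# The key is the first field of each file, and both files are sorted on it.
printf '%s\n' '1:Payroll' '2:Inventory' > "$workdir/projects"
printf '%s\n' '1:Alice' '2:Bob' '2:Carol' > "$workdir/programmers"

# -t sets the field separator; join matches records whose key fields are equal.
joined=$(join -t ':' "$workdir/projects" "$workdir/programmers")
echo "$joined"
rm -r "$workdir"
```

Each output record starts with the key, followed by the remaining fields from the first file and then from the second.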
Creating the Programmer and Project Files
With the basic design complete, you now implement your application design
UNIX file processing predominantly uses flat files. Working with these files is easy, because you can create and manipulate them with text editors like vi and Emacs
Formatting Output
The awk command is used to prepare formatted output
For the purposes of developing a new file-processing application, we will focus primarily on the printf action of the awk command
Awk provides a shortcut to other UNIX commands
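A short sketch of awk's printf action producing aligned columns. The hours.txt file, its colon-separated layout, and the column widths are invented for the example.

```shell
workdir=$(mktemp -d)
# Hypothetical records: name:employee_id:hours
printf '%s\n' 'Alice:101:42' 'Bob:102:7' > "$workdir/hours.txt"

# -F: splits each record on colons.
# %-10s left-justifies a string in 10 columns; %5d right-justifies an integer in 5.
report=$(awk -F: '{ printf "%-10s %5d %5d\n", $1, $2, $3 }' "$workdir/hours.txt")
echo "$report"
rm -r "$workdir"
```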
Using a Shell Script to Implement the Application
Shell scripts should contain:
- The commands to execute
- Comments to identify and explain the script so that users or programmers other than the author can understand how it works
Use the pound (#) character to mark comments in a script file
Running a Shell Script
You can run a shell script in virtually any shell that you have on your system
The Bash shell accepts more variations in command structures than other shells
Run the script by typing sh followed by the name of the script, or make the script executable and type ./ prior to the script name
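Both ways of running a script can be sketched as follows. The script name hello.sh and its contents are illustrative only.

```shell
workdir=$(mktemp -d)

# Create a small commented script.
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/sh
# hello.sh: print a greeting (illustrative script)
echo "Hello from a shell script"
EOF

# Option 1: hand the script to sh; no execute permission is needed.
out1=$(sh "$workdir/hello.sh")

# Option 2: make it executable, then run it with ./ from its own directory.
chmod +x "$workdir/hello.sh"
out2=$(cd "$workdir" && ./hello.sh)

echo "$out1"
echo "$out2"
rm -r "$workdir"
```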
Putting it all together to Produce the Report
An effective way to develop applications is to combine many small scripts in a larger script file
Have the last script added to the larger script print a report indicating script functions and results
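One possible way to combine small scripts is sketched below. Here a final script calls two smaller ones rather than pasting their text into one file; that is a design choice for this example. All script names and the staff file are hypothetical.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Bob:102' 'Alice:101' > "$workdir/staff"

# Small script 1, testable on its own.
cat > "$workdir/sorted.sh" <<'EOF'
#!/bin/sh
# sorted.sh: list the records of the file named in $1 in sorted order
sort "$1"
EOF

# Small script 2, testable on its own.
cat > "$workdir/count.sh" <<'EOF'
#!/bin/sh
# count.sh: print the number of records in the file named in $1
wc -l < "$1" | tr -d ' '
EOF

# Final script: runs the tested pieces and prints the report.
cat > "$workdir/report.sh" <<'EOF'
#!/bin/sh
# report.sh: combine sorted.sh and count.sh into one report
dir=$(dirname "$0")
echo "Staff report"
sh "$dir/sorted.sh" "$1"
echo "Records: $(sh "$dir/count.sh" "$1")"
EOF

report=$(sh "$workdir/report.sh" "$workdir/staff")
echo "$report"
rm -r "$workdir"
```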
Chapter Summary
The UNIX file-processing commands can be organized into two categories: (1) select and (2) manipulation and transformation
The uniq command removes duplicate lines from a sorted file
The comm command compares lines common to file1 and file2, and produces output that shows the variances between the two
The diff command attempts to determine the minimal set of changes needed to convert file1 into file2
The tr command copies data read from the standard input to the standard output, substituting or deleting characters specified
The sed command is a file editor designed to make global changes to large files
The pr command prints files on the standard output in pages
The design of a file-processing application reflects what the application needs to produce
Use a record layout to identify each field by name and data type
Shell programs should contain commands to execute programs and comments to identify and explain the programs. The pound (#) character denotes comments
Write shell scripts in stages so that you can test each part before combining them into one script. Using small shell scripts and combining them in a final shell script file is an effective way to develop applications