Compiler and Shell Language


Compiler

From Wikipedia, the free encyclopedia.

[Figure: a diagram of the operation of a typical multi-language, multi-target compiler]

A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.

The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language or machine code). If the compiled program can run on a computer whose CPU or operating system is different from the one on which the compiler runs, the compiler is known as a cross-compiler. A program that translates from a low level language to a higher level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.

A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis (Syntax-directed translation), code generation, and code optimization.
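These phases can be observed from the shell with a typical C toolchain. The following is a minimal sketch assuming GCC is installed and a hello.c source file exists (the file name is illustrative):

$ gcc -E hello.c -o hello.i    # stop after preprocessing (macro expansion, #include handling)
$ gcc -S hello.i -o hello.s    # lexing, parsing, semantic analysis, code generation; emits assembly
$ gcc -c hello.s -o hello.o    # assemble into object code
$ gcc hello.o -o hello         # link into an executable

Each command exposes one slice of the pipeline; in everyday use a single gcc hello.c -o hello invocation runs all of these stages internally.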

Program faults caused by incorrect compiler behavior can be very difficult to track down and work around; therefore, compiler implementors invest significant effort to ensure the correctness of their software.


The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used to help create the lexer and parser.

Contents

1 History
  1.1 Compilers in education
2 Compilation
  2.1 Structure of a compiler
3 Compiler output
  3.1 Compiled versus interpreted languages
  3.2 Hardware compilation
4 Compiler construction
  4.1 One-pass versus multi-pass compilers
  4.2 Front end
  4.3 Back end
5 Compiler correctness
6 Related techniques
7 International conferences and organizations
8 See also
9 Notes
10 References
11 External links

History

Main article: History of compiler construction

Software for early computers was primarily written in assembly language. Higher level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs started to become significantly greater than the costs of writing a compiler. The limited memory capacity of early computers led to substantial technical challenges when designing the first compilers.

Towards the end of the 1950s machine-independent programming languages were first proposed. Subsequently several experimental compilers were developed. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960.[1]

In many application domains the idea of using a higher level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more complex.

Early compilers were written in assembly language. The first self-hosting compiler — capable of compiling its own source code in a high-level language — was created in 1962 for Lisp by Tim Hart and Mike Levin at MIT.[2] Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for implementation language. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by hand, by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) by running the compiler in an interpreter.

Compilers in education

Compiler construction and compiler optimization are taught at universities and schools as part of a computer science curriculum.[3] Such courses are usually supplemented with the implementation of a compiler for an educational programming language. A well-documented example is Niklaus Wirth's PL/0 compiler, which Wirth used to teach compiler construction in the 1970s.[4] In spite of its simplicity, the PL/0 compiler introduced several influential concepts to the field:

1. Program development by stepwise refinement (also the title of a 1971 paper by Wirth)[5]

2. The use of a recursive descent parser
3. The use of EBNF to specify the syntax of a language
4. A code generator producing portable P-code
5. The use of T-diagrams[6] in the formal description of the bootstrapping problem

Compilation

Compilers enabled the development of programs that are machine-independent. Before the development of FORTRAN (FORmula TRANslator), the first higher-level language, in the 1950s, machine-dependent assembly language was widely used. While assembly language produces more reusable and relocatable programs than machine code on the same architecture, it has to be modified or rewritten if the program is to be executed on a different hardware architecture.

With the advance of high-level programming languages that followed FORTRAN, such as COBOL, C, and BASIC, programmers could write machine-independent source programs. A compiler translates the high-level source programs into target programs in the machine language of the specific hardware. Once the target program is generated, the user can execute the program.

Structure of a compiler

Compilers bridge source programs in high-level languages with the underlying hardware. A compiler must (1) determine the correctness of a program's syntax, (2) generate correct and efficient object code, (3) organize the run-time environment, and (4) format output according to assembler and/or linker conventions. A compiler consists of three main parts: the front end, the middle end, and the back end.

The front end checks whether the program is correctly written in terms of the programming language syntax and semantics. Here legal and illegal programs are recognized, and any errors are reported in a useful way. Type checking is also performed by collecting type information. The front end then generates an intermediate representation (IR) of the source code for processing by the middle end.

The middle end is where optimization takes place. Typical transformations for optimization are removal of useless or unreachable code, discovery and propagation of constant values, relocation of computation to a less frequently executed place (e.g., out of a loop), or specialization of computation based on the context. The middle-end generates another IR for the following backend. Most optimization efforts are focused on this part.

The back end is responsible for translating the IR from the middle-end into assembly code. The target instruction(s) are chosen for each IR instruction. Register allocation assigns processor registers for the program variables where possible. The backend utilizes the hardware by figuring out how to keep parallel execution units busy, filling delay slots, and so on. Although most algorithms for optimization are in NP, heuristic techniques are well-developed.

Compiler output

One classification of compilers is by the platform on which their generated code executes. This is known as the target platform.

A native or hosted compiler is one whose output is intended to run directly on the same type of computer and operating system that the compiler itself runs on. The output of a cross compiler is designed to run on a different platform. Cross compilers are often used when developing software for embedded systems that are not intended to support a software development environment.
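As a hedged illustration of the native/cross distinction, assuming both a native GCC and an ARM cross toolchain are installed (the arm-linux-gnueabihf-gcc name follows a common GNU triplet convention and is illustrative, not prescribed by this article):

$ gcc hello.c -o hello                            # native: the output runs on this machine
$ arm-linux-gnueabihf-gcc hello.c -o hello-arm    # cross: the output targets ARM Linux
$ file hello-arm                                  # file(1) reports the architecture of the produced binary

The cross-compiled binary cannot be executed on the build machine directly; it must be copied to the target device (or run under an emulator).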

The output of a compiler that produces code for a virtual machine (VM) may or may not be executed on the same platform as the compiler that produced it. For this reason such compilers are not usually classified as native or cross compilers.

The lower level language that is the target of a compiler may itself be a high-level programming language. C, often viewed as a sort of portable assembly language, can itself be the target language of a compiler. For example, Cfront, the original compiler for C++, used C as its target language. The C created by such a compiler is usually not intended to be read and maintained by humans, so indent style and pretty intermediate code are irrelevant. Some features of C make it a good target language; for example, C code with #line directives can be generated to support debugging of the original source.

Compiled versus interpreted languages

Higher-level programming languages usually appear with a type of translation in mind: either designed as compiled language or interpreted language. However, in practice there is rarely anything about a language that requires it to be exclusively compiled or exclusively interpreted, although it is possible to design languages that rely on re-interpretation at run time. The categorization usually reflects the most popular or widespread implementations of a language — for instance, BASIC is sometimes called an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters.


Interpretation does not replace compilation completely. It only hides it from the user and makes it gradual. Even though an interpreter can itself be interpreted, a directly executed program is needed somewhere at the bottom of the stack (see machine language). Modern trends toward just-in-time compilation and bytecode interpretation at times blur the traditional categorizations of compilers and interpreters.

Some language specifications spell out that implementations must include a compilation facility; for example, Common Lisp. However, there is nothing inherent in the definition of Common Lisp that stops it from being interpreted. Other languages have features that are very easy to implement in an interpreter, but make writing a compiler much harder; for example, APL, SNOBOL4, and many scripting languages allow programs to construct arbitrary source code at runtime with regular string operations, and then execute that code by passing it to a special evaluation function. To implement these features in a compiled language, programs must usually be shipped with a runtime library that includes a version of the compiler itself.
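The shell itself is a convenient illustration of this style of runtime evaluation: a command can be assembled as an ordinary string and then handed to the shell's eval built-in. A minimal sketch:

$ cmd="expr 6 + 3"
$ eval $cmd                      # the string is parsed and executed as shell code
9
$ op="+" ; eval "expr 6 $op 3"   # source text constructed at runtime, then evaluated
9

A compiled implementation of such a language has to carry an evaluator (or a copy of the compiler) along with every program, which is exactly the runtime-library cost described above.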

Hardware compilation

The output of some compilers may target computer hardware at a very low level, for example a Field Programmable Gate Array (FPGA) or structured Application-specific integrated circuit (ASIC). Such compilers are said to be hardware compilers or synthesis tools because the source code they compile effectively controls the final configuration of the hardware and how it operates; the output of the compilation is not instructions that are executed in sequence - only an interconnection of transistors or lookup tables. For example, XST is the Xilinx Synthesis Tool used for configuring FPGAs. Similar tools are available from Altera, Synplicity, Synopsys and other vendors.

Compiler construction


Main article: Compiler construction

In the early days, the approach taken to compiler design was directly affected by the complexity of the processing, the experience of the person(s) designing it, and the resources available.

A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. When the source language is large and complex, and high quality output is required, the design may be split into a number of relatively independent phases. Having separate phases means development can be parceled up into small parts and given to different people. It also becomes much easier to replace a single phase by an improved one, or to insert new phases later (e.g., additional optimizations).

The division of the compilation processes into phases was championed by the Production Quality Compiler-Compiler Project (PQCC) at Carnegie Mellon University. This project introduced the terms front end, middle end, and back end.


All but the smallest of compilers have more than two phases. However, these phases are usually regarded as being part of the front end or the back end. The point at which these two ends meet is open to debate. The front end is generally considered to be where syntactic and semantic processing takes place, along with translation to a lower level of representation (than source code).

The middle end is usually designed to perform optimizations on a form other than the source code or machine code. This source code/machine code independence is intended to enable generic optimizations to be shared between versions of the compiler supporting different languages and target processors.

The back end takes the output from the middle. It may perform more analysis, transformations and optimizations that are for a particular computer. Then, it generates code for a particular processor and OS.

This front-end/middle/back-end approach makes it possible to combine front ends for different languages with back ends for different CPUs. Practical examples of this approach are the GNU Compiler Collection, LLVM, and the Amsterdam Compiler Kit, which have multiple front-ends, shared analysis and multiple back-ends.

One-pass versus multi-pass compilers

Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations.

The ability to compile in a single pass has classically been seen as a benefit because it simplifies the job of writing a compiler and one-pass compilers generally perform compilations faster than multi-pass compilers. Thus, partly driven by the resource limitations of early systems, many early languages were specifically designed so that they could be compiled in a single pass (e.g., Pascal).

In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a subsequent pass.

The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.

Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.


While the typical multi-pass compiler outputs machine code from its final pass, there are several other types:

A "source-to-source compiler" is a type of compiler that takes a high level language as its input and outputs a high level language. For example, an automatic parallelizing compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations (e.g. OpenMP) or language constructs (e.g. Fortran's DOALL statements).

Stage compiler that compiles to assembly language of a theoretical machine, like some Prolog implementations

o This Prolog machine is also known as the Warren Abstract Machine (or WAM).

o Bytecode compilers for Java, Python, and many more are also a subtype of this.

Just-in-time compiler , used by Smalltalk and Java systems, and also by Microsoft .NET's Common Intermediate Language (CIL)

o Applications are delivered in bytecode, which is compiled to native machine code just prior to execution.

Front end

The compiler frontend analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as location, type and scope. This is done over several phases, which includes some of the following:

1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within identifiers require a phase before parsing, which converts the input character sequence to a canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. Atlas Autocode and Imp (and some implementations of ALGOL and Coral 66) are examples of stropped languages whose compilers would have a line reconstruction phase.

2. Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier or symbol name. The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner.

3. Preprocessing. Some languages, e.g. C, require a preprocessing phase which supports macro substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as Scheme support macro substitutions based on syntactic forms. (A standalone preprocessing sketch follows this list.)

4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.


5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), or object binding (associating variable and function references with their definitions), or definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase, and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.
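Preprocessing (phase 3 above) is easy to observe in isolation. A hedged sketch using the C preprocessor through GCC, with an illustrative file name:

$ cat demo.c
#ifdef DEBUG
    puts("debug build");
#endif
$ gcc -E -DDEBUG demo.c    # macro substitution and conditional compilation only; the guarded line survives
$ gcc -E demo.c            # without -DDEBUG, the preprocessor drops the guarded line

No parsing or semantic analysis happens here; -E stops the compiler after the preprocessing phase.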

Back end

The term back end is sometimes confused with code generator because of the overlapped functionality of generating assembly code. Some literature uses middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators.

The main phases of the back end include the following:

1. Analysis : This is the gathering of program information from the intermediate representation derived from the input. Typical analyses are data flow analysis to build use-define chains, dependence analysis, alias analysis, pointer analysis, escape analysis etc. Accurate analysis is the basis for any compiler optimization. The call graph and control flow graph are usually also built during the analysis phase.

2. Optimization : the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation and even automatic parallelization.

3. Code generation : the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (see also Sethi-Ullman algorithm). Debug data may also need to be generated to facilitate debugging.

Compiler analysis is the prerequisite for any compiler optimization, and the two work tightly together. For example, dependence analysis is crucial for loop transformation.

In addition, the scope of compiler analysis and optimizations vary greatly, from as small as a basic block to the procedure/function level, or even over the whole program (interprocedural optimization). Obviously, a compiler can potentially do a better job using a broader view. But that broad view is not free: large scope analysis and optimizations are very costly in terms of compilation time and memory space; this is especially true for interprocedural analysis and optimizations.

Interprocedural analysis and optimizations are common in modern commercial compilers from HP, IBM, SGI, Intel, Microsoft, and Sun Microsystems. The open source GCC was criticized for a long time for lacking powerful interprocedural optimizations, but it is changing in this respect. Another open source compiler with a full analysis and optimization infrastructure is Open64, which is used by many organizations for research and commercial purposes.

Due to the extra time and space needed for compiler analysis and optimizations, some compilers skip them by default. Users have to use compilation options to explicitly tell the compiler which optimizations should be enabled.
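For example, with GCC (other compilers use similar but not identical flags) the trade-off looks like this; a sketch, not an exhaustive list:

$ gcc -O0 prog.c -o prog          # little or no optimization; fastest compile
$ gcc -O2 prog.c -o prog          # a standard set of optimizations; slower compile, faster code
$ gcc -O2 -flto prog.c -o prog    # adds link-time (interprocedural) optimization across translation units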

Compiler correctness

Main article: Compiler correctness

Compiler correctness is the branch of software engineering that deals with trying to show that a compiler behaves according to its language specification.[citation needed] Techniques include developing the compiler using formal methods and using rigorous testing (often called compiler validation) on an existing compiler.

Related techniques

Assembly language is a type of low-level language and a program that compiles it is more commonly known as an assembler, with the inverse program known as a disassembler.

A program that translates from a low level language to a higher level one is a decompiler.

A program that translates between high-level languages is usually called a language translator, source to source translator, language converter, or language rewriter. The last term is usually applied to translations that do not involve a change of language.

A program that translates into an object code format that is not supported on the compilation machine is called a cross compiler and is commonly used to prepare code for embedded applications.

International conferences and organizations

Every year, the European Joint Conferences on Theory and Practice of Software (ETAPS) sponsors the International Conference on Compiler Construction (CC), with papers from both the academic and industrial sectors.[7]

See also

Abstract interpretation
Attribute grammar
Binary recompiler
Bottom-up parsing
Byzantine fault tolerance
Compile and go loader
Compile farm
Compiler-compiler (or parser generator)
Compiler correctness
Decompiler
History of compiler writing
Just-in-time compilation
Linker
List of compilers
List of important publications in computer science#Compilers
Metacompilation
Overhead code
Semantics encoding
Transcompiler

Notes

1. "IP: The World's First COBOL Compilers". interesting-people.org. 12 June 1997.
2. T. Hart and M. Levin. "The New Compiler, AIM-39 - CSAIL Digital Archive - Artificial Intelligence Laboratory Series". publications.ai.mit.edu.
3. Chakraborty, P.; Saxena, P. C.; Katti, C. P.; Pahwa, G.; Taneja, S. "A new practicum in compiler construction". Computer Applications in Engineering Education, in press. http://onlinelibrary.wiley.com/doi/10.1002/cae.20566/pdf
4. "The PL/0 compiler/interpreter".
5. "The ACM Digital Library".
6. T-diagrams were first introduced for describing bootstrapping and cross-compiling compilers in McKeeman et al., A Compiler Generator (1971). Conway described the broader concept before that with his UNCOL in 1958, to which Bratman added in 1961: H. Bratman, "An alternate form of the 'UNCOL diagram'", Comm. ACM 4 (March 1961) 3, p. 142. Later on, others, including P. D. Terry, gave an explanation and usage of T-diagrams in their textbooks on compiler construction; cf. Terry, 1997, Chapter 3. T-diagrams are also now used to describe client-server interconnectivity on the World Wide Web; cf. Patrick Closhen et al., 1997: T-Diagrams as Visual Language to Illustrate WWW Technology, Darmstadt University of Technology, Darmstadt, Germany.
7. ETAPS - European Joint Conferences on Theory and Practice of Software. Cf. the "CC" (Compiler Construction) subsection.

References

Compiler textbook references: a collection of references to mainstream compiler construction textbooks.

Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey D., Compilers: Principles, Techniques and Tools. ISBN 0-201-10088-6. Also known as "The Dragon Book".

Allen, Frances E., "A History of Language Processor Technology in IBM", IBM Journal of Research and Development, v. 25, no. 5, September 1981.

Allen, Randy; Kennedy, Ken, Optimizing Compilers for Modern Architectures, Morgan Kaufmann Publishers, 2001. ISBN 1-55860-286-0.

Appel, Andrew Wilson, Modern Compiler Implementation in Java, 2nd edition, Cambridge University Press, 2002. ISBN 0-521-82060-X.

Appel, Andrew Wilson, Modern Compiler Implementation in ML, Cambridge University Press, 1998. ISBN 0-521-58274-1.

Bornat, Richard, Understanding and Writing Compilers: A Do It Yourself Guide, Macmillan Publishing, 1979. ISBN 0-333-21732-2.

Cooper, Keith D.; Torczon, Linda, Engineering a Compiler, Morgan Kaufmann, 2004. ISBN 1-55860-699-8.

Leverett; Cattell; Hobbs; Newcomer; Reiner; Schatz; Wulf, "An Overview of the Production Quality Compiler-Compiler Project", Computer 13(8):38-49, August 1980.

McKeeman, William Marshall; Horning, James J.; Wortman, David B., A Compiler Generator, Prentice-Hall, Englewood Cliffs, N.J., 1970. ISBN 0-13-155077-2.

Muchnick, Steven, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, 1997. ISBN 1-55860-320-4.

Scott, Michael Lee, Programming Language Pragmatics, 2nd edition, Morgan Kaufmann, 2005, 912 pages. ISBN 0-12-633951-1.

Srikant, Y. N.; Shankar, Priti, The Compiler Design Handbook: Optimizations and Machine Code Generation, CRC Press, 2003. ISBN 0-8493-1240-X.

Terry, Patrick D., Compilers and Compiler Generators: An Introduction with C++, International Thomson Computer Press, 1997. ISBN 1-85032-298-8.

Wirth, Niklaus, Compiler Construction, Addison-Wesley, 1996, 176 pages; revised November 2005. ISBN 0-201-40353-6.

External links

Wikibooks has a book on the topic of: Compiler Construction

The dictionary definition of compiler at Wiktionary
Compilers at the Open Directory Project
Compile-Howto
Basics of Compiler Design by Torben Ægidius Mogensen
Short animation explaining the key conceptual difference between compilers and interpreters


Note: This file is part of the Linux Shell Scripting Tutorial and contains many Linux/Unix definitions, miscellaneous concepts, and answers to the shell script exercise sections.

Free

Linux is free.

First, it's available free of cost (you don't have to pay to use this OS; other OSes like MS-Windows or commercial versions of Unix may cost you money).

Second, free means freedom to use Linux: when you get Linux you also get its source code, so you can modify the OS (yes, the OS itself!) according to your taste.

It also offers many free software applications, programming languages, and development tools. Most of these programs are under the GNU General Public License (GPL).


Unix Like 

Unix is an almost 35-year-old OS.

In 1964 an OS called MULTICS (Multiplexed Information and Computing System) was developed by Bell Labs, MIT, and General Electric, but this OS was not a success.

Then Ken Thompson (a systems programmer at Bell Labs) thought he could do better (in 1991, Linus Torvalds felt he could do better than Minix - history repeats itself). So Ken Thompson wrote an OS, an assembler, and a few utilities on a PDP-7 computer; this became known as Unix (1969). But this version of Unix was not portable, so Unix was rewritten in C. Because it was written in C, Unix became portable, meaning it could run on a variety of hardware platforms (1970-71).

At the same time Unix began to be distributed to universities, where students and professors ran more experiments on it. Because of this, Unix gained popularity, and several new features were added. Then the US government and military used Unix for their inter-network (now known as the INTERNET).

So Unix is a multi-user, multitasking, Internet-aware network OS. Linux has almost the same Unix-like features, for example:

Like Unix, Linux is also written in C.
Like Unix, Linux is also a multi-user/multitasking/32- or 64-bit network OS.
Like Unix, Linux is rich in development/programming environments.
Like Unix, Linux runs on different hardware platforms, for example:

o Intel x86 processors (Celeron/PII/PIII/PIV/old Pentiums/80386/80486)
o Macintosh PCs
o Cyrix processors
o AMD processors
o Sun Microsystems SPARC processors
o Alpha processors (Compaq)

Open Source

Linux is developed under the GNU General Public License. This is sometimes referred to as a "copyleft", to distinguish it from a copyright.

Under the GPL the source code is available to anyone who wants it and can be freely modified, developed, and so forth. There are only a few restrictions on the use of the code: if you make changes to the programs, you have to make those changes available to everyone. This basically means you can't take the Linux source code, make a few changes, and then sell your modified version without making the source code available. For more details, please visit the open-source home page.

Common vi editor command list


For this purpose : use this vi command syntax

To insert new text : esc + i (press the Escape key, then 'i')
To save the file : esc + : + w (press the Escape key, then the colon, and finally 'w')
To save the file under a file name (save as) : esc + : + w "filename"
To quit the vi editor : esc + : + q
To quit without saving : esc + : + q!
To save and quit the vi editor : esc + : + wq
To search for a specified word in the forward direction : esc + /word (press the Escape key, then type /word-to-find; e.g. to find the word 'shri', type /shri)
To continue with the search : n
To search for a specified word in the backward direction : esc + ?word (press the Escape key, then type ?word-to-find)
To copy the line where the cursor is located : esc + yy
To paste the text just deleted or copied at the cursor : esc + p
To delete the entire line where the cursor is located : esc + dd
To delete a word from the cursor position : esc + dw
To find all occurrences of a given word and replace them globally without confirmation : esc + :%s/word-to-find/word-to-replace/g (e.g. :%s/mumbai/pune/g replaces the word "mumbai" with "pune")
To find all occurrences of a given word and replace them globally with confirmation : esc + :%s/word-to-find/word-to-replace/gc
To run a shell command like ls, cp or date within vi : esc + :!shell-command (e.g. :!pwd)

How Shell Locates the file

To run a script, you need to be in the same directory where you created it; if you are in a different directory, your script will not run (because of path settings). For example, suppose your home directory is /home/vivek (use $ pwd to see the current working directory) and you created a script called 'first' there. After creating this script you moved to some other directory, say /home/vivek/Letters/Personal. Now if you try to execute the script it will not run, since 'first' is in the /home/vivek directory. To overcome this problem there are two ways. First, specify the complete path of your script whenever you want to run it from another directory, with a command like:

$ /bin/sh /home/vivek/first


Now every time you would have to give all this detail as you work in another directory; this takes time, and you have to remember the complete path.

There is another way. Notice that all of our normal programs (in the form of executable files) are marked as executable and can be directly executed from the prompt from any directory (to see the executables of normal programs, give the command $ ls -l /bin), by typing commands like:

$ bc
$ cc myprg.c
$ cal

How is this possible? All these executable files are installed in a directory called /bin, and /bin is listed in your PATH setting. When you type the name of a command at the $ prompt, the shell first looks for that command among its internal commands (which are part of the shell itself and always available to execute); if found, the shell executes it. If not found, the shell looks in the current directory and, if the command is there, executes it from the current directory. If still not found, the shell goes through the PATH setting and tries to find the requested command's executable file in each of the directories mentioned in PATH; if found, it executes it, otherwise it gives the message "bash: xxxx: command not found". One question remains: can you run your own shell script the same way as these executables? Yes. For this purpose, create a bin directory in your home directory and then copy your tested shell script into it. After this you can run your script as an executable file without using a command like $ /bin/sh /home/vivek/first. The commands to create your own bin directory are:

$ cd
$ mkdir bin
$ cp first ~/bin
$ first

Each of the above commands can be explained as follows:

$ cd : go to your home directory
$ mkdir bin : create the bin directory, to install your own shell scripts so that a script can be run as an independent program and accessed from any directory
$ cp first ~/bin : copy your script 'first' to your bin directory
$ first : test whether the script runs or not (it will run)
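To see why the copy into ~/bin works, you can inspect and extend the search path yourself. A sketch (many distributions already place ~/bin on PATH; if yours does not, the lines below add it for the current session):

$ echo $PATH                   # colon-separated list of directories the shell searches
$ type first                   # ask the shell how it would resolve the name 'first'
$ PATH="$HOME/bin:$PATH"       # put ~/bin at the front of the search path
$ export PATH

To make the change permanent, the PATH line is usually added to a shell startup file such as ~/.bash_profile (the exact file varies by shell and distribution).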


Answers to the Variables section exercise

Q.1. How do you define a variable x with value 10 and print it on screen?
$ x=10
$ echo $x

Q.2. How do you define a variable xn with value Rani and print it on screen?
$ xn=Rani
$ echo $xn

Q.3. How do you print the sum of two numbers, say 6 and 3?
$ echo 6 + 3
This will print "6 + 3", not the sum 9. To do sums or math operations in the shell, use expr. The syntax is as follows:
Syntax: expr op1 operator op2
where op1 and op2 are integer numbers (numbers without a decimal point) and operator can be:
+ addition
- subtraction
/ division
% modulo, to find the remainder (e.g. 20 / 3 = 6; to find the remainder, 20 % 3 = 2 - remember it is integer calculation)
\* multiplication
$ expr 6 + 3
Now it will print the sum as 9. But
$ expr 6+3
will not work, because space is required between the numbers and the operator (see Shell Arithmetic).
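As a side note (not part of the original exercise), POSIX shells also provide built-in arithmetic with $((...)), which avoids running the external expr command:

$ expr 6 + 3        # external command; spaces around the operator are required
9
$ echo $((6 + 3))   # shell built-in arithmetic; no external command needed
9
$ expr 6 \* 3       # * must be escaped so the shell does not expand it as a wildcard
18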

Q.4. How do you define two variables x=20, y=5 and then print the division of x by y (i.e. x/y)?
$ x=20
$ y=5
$ expr $x / $y

Q.5. Modify the above to store the division of x and y in a variable called z.
$ x=20
$ y=5
$ z=`expr $x / $y`
$ echo $z

Q.6. Point out the error, if any, in the following script.

$ vi variscript
#
#
# Script to test MY knowledge about variables!
#
myname=Vivek
myos   =  TroubleOS    -----> ERROR 1
myno=5
echo "My name is $myname"
echo "My os is $myos"
echo "My number is   myno,   can you see this number"  ----> ERROR 2

ERROR 1: no whitespace is allowed around = in a variable assignment.
ERROR 2: the variable must be written as $myno; without the $ prefix the literal word myno is printed instead of the variable's value.

The following script should work now, after the bug fixes:

$ vi variscript
#
#
# Script to test MY knowledge about variables!
#
myname=Vivek
myos=TroubleOS
myno=5
echo "My name is $myname"
echo "My os is $myos"
echo "My number is   $myno,   can you see this number"

Parameter substitution.

Now consider the following command:

$ echo `expr 6 + 3`

Enclosing a command in backquotes, as with `expr 6 + 3` above, is what this tutorial calls parameter substitution (more commonly known as command substitution). When a command is enclosed in backquotes, the command gets executed and its output takes its place on the command line. Mostly this is used in conjunction with other commands. For example:

$ pwd
$ cp /mnt/cdrom/lsoft/samba*.rpm `pwd`

Suppose we are working in the directory /home/vivek/soft/artical/linux/lsst and want to copy some Samba files from /mnt/cdrom/lsoft to the current working directory; the command would be something like:

$ cp /mnt/cdrom/lsoft/samba*.rpm /home/vivek/soft/artical/linux/lsst

Instead of giving the above command, I can give the command as follows:

$ cp /mnt/cdrom/lsoft/samba*.rpm `pwd`

Here the file is copied to the working directory: the parameter substitution `pwd` expands itself to /home/vivek/soft/artical/linux/lsst, which saves time.

Future point: what is the difference between the following two commands?

$ cp /mnt/cdrom/lsoft/samba*.rpm `pwd`

and

$ cp /mnt/cdrom/lsoft/samba*.rpm .

Try to note down the output of the following parameter substitutions:

$ echo "Today's date is `date`"
$ cal > menuchoice.temp.$$
$ dialog --backtitle "Linux Shell Tutorial" --title "Calendar" --infobox "`cat menuchoice.temp.$$`" 9 25 ; read

Answers to the if command exercise

A) There is a file called foo on your disk and you give the command $ ./trmfi foo. What will the output be?
Ans.: The foo file will be deleted, and the message "foo file deleted" will be printed on screen.

B) If the bar file is not present on your disk and you give the command $ ./trmfi bar, what will the output be?
Ans.: The message "rm: cannot remove `bar': No such file or directory" will be printed, because the bar file does not exist on disk and we have called the rm command, so the error comes from rm.

C) And if you type $ ./trmfi, what will the output be?
Ans.: The following message will be shown by rm, because rm is called from the script without any parameters:
rm: too few arguments
Try `rm --help' for more information.

Answers to Variables in Linux

1) If you want to print your home directory location, you give the command:

     (a) $ echo $HOME

                    or

     (b) $ echo HOME

Which of the above commands is correct, and why?

Ans.: Command (a) is correct, since we have to print the contents of the variable HOME and not the word HOME. You must use $ followed by the variable name to print a variable's contents.

Answers to the Process section

1) Is this an example of multitasking?
Ans.: Yes, since you are running two processes simultaneously.

2) How will you find the two running processes (MP3 playing and letter typing)?
Ans.: Try $ ps aux or $ ps ax | grep process-you-want-to-search

3) "Currently only two processes are running in your Linux/PC environment": is this true or false, and how will you verify it?
Ans.: No, it's not true. When you start Linux, various processes start in the background for different purposes. To verify this, simply use the top or ps aux command.

4) You don't want to listen to music (MP3 files) but want to continue with other work on the PC. Which of the following actions will you take?

1. Turn off the speakers
2. Turn off the computer / shut down Linux
3. Kill the MP3-playing process
4. None of the above

Ans.: Use action no. 3, i.e. kill the MP3 process.
Tip: First find the PID of the MP3-playing process by issuing the command:
$ ps ax | grep mp3-process-name
The first column then gives the PID of the process. Kill this PID to end the process:
$ kill PID

Or you can try the killall command to kill a process by name:
$ killall mp3-process-name

Linux Console (Screen) 

How can I write colorful messages on the Linux console? This kind of question is mostly asked by newcomers (especially those who are learning shell programming!). As you know, in Linux everything is considered a file, and our console is one such special file. You can write special character sequences to the console, which control every aspect of it, such as colors on screen, bold or blinking text effects, clearing the screen, and showing text boxes. For this purpose we have to use special codes called escape sequence codes. The Linux console is based on the DEC VT100 serial terminals, which support ANSI escape sequence codes.


What is a special character sequence and how do you write it to the console?

By default, whatever you send to the console is printed as-is. For example, consider the following echo statement:

$ echo "Hello World"
Hello World

The above echo statement prints a sequence of characters on screen. But if there is a special escape sequence (control character) in the sequence, then first some action is taken according to the escape sequence (or control character), and then the normal characters are printed on the console. For example, the following echo command prints a message in blue on the console:

$ echo -e "\033[34m   Hello Colorful  World!"
Hello Colorful  World!

The above echo statement uses the ANSI escape sequence \033[34m; the entire string (i.e. "\033[34m   Hello Colorful  World!") is processed as follows:

1) First, \033 is the escape character, which causes some action to be taken.
2) Here it sets the screen foreground color to blue using the [34m escape code.
3) Then it prints our normal message, Hello Colorful World!, in blue.

Note that an ANSI escape sequence begins with \033 (an octal value), which is represented as ^[ in the termcap and terminfo files of terminals and their documentation.

You can use the echo statement to print such messages; to use an ANSI escape sequence you must use the -e option (switch) with echo. The general syntax is as follows:

Syntax: echo -e "\033[escape-code your-message"

In the above syntax you have to use \033[ as-is, with different escape codes for different operations. As soon as the console receives the message it starts to process/read it, and if it finds the escape character (\033) it moves into escape mode. It then reads the "[" character and moves into Command Sequence Introduction (CSI) mode. In CSI mode the console reads a series of ASCII-coded decimal numbers (known as parameters), separated by semicolons (;). The numbers are read until a console action letter or character is found, which determines what action to take. In the above example:

\033 : escape character
[    : start of CSI
34   : 34 is the parameter
m    : m is the action letter

The following table shows an important list of such escape codes / action letters:

h : sets the ANSI mode. Example: echo -e "\033[h"
l : clears the ANSI mode. Example: echo -e "\033[l"
m : useful to show characters in different colors or effects such as BOLD and blink; see below for the parameters m understands. Example: echo -e "\033[35m Hello World"
q : turns the keyboard num lock, caps lock, or scroll lock LED on or off; see below. Example: echo -e "\033[2q"
s : stores the current cursor x,y position (column, row) and attributes. Example: echo -e "\033[7s"
u : restores the cursor position and attributes. Example: echo -e "\033[8u"

m understands the following parameters:

0 : sets the default color scheme (white foreground, black background), normal intensity, no blinking, etc.

1 : sets BOLD intensity. Example:
$ echo -e "I am \033[1m BOLD \033[0m Person"
I am BOLD Person
This prints the word BOLD in bold intensity; the following ANSI sequence \033[0m removes the bold effect.

2 : sets dim intensity. Example:
$ echo -e "\033[1m  BOLD \033[2m DIM  \033[0m"

5 : blink effect. Example:
$ echo -e "\033[5m Flash!  \033[0m"

7 : reverse-video effect, i.e. black foreground and white background in the default color scheme. Example:
$ echo -e "\033[7m Linux OS! Best OS!! \033[0m"

11 : shows special control characters as graphics characters. For example, before issuing this command, hold down the Alt key, press 178 on the numeric keypad, then release both keys; nothing will be printed. Now give the command shown in the example and try the same again: it works. (You must know the extended ASCII characters for this!) Example:
$ press alt + 178
$ echo -e "\033[11m"
$ press alt + 178
$ echo -e "\033[0m"
$ press alt + 178

25 : removes/disables the blink effect.

27 : removes/disables the reverse-video effect.

30-37 : set the foreground color (31 = red, 32 = green, xx - try to find the rest yourself; this is left as an exercise for you :-)). Example:
$ echo -e "\033[31m I am in Red"

40-47 : set the background color (xx - try to find the rest yourself; this is left as an exercise for you :-)). Example:
$ echo -e "\033[44m Wow!!!"

q understands the following parameters:

0 : turns off all keyboard LEDs
1 : scroll lock LED on, others off
2 : num lock LED on, others off
3 : caps lock LED on, others off
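Tying the m parameters together, the color codes can be wrapped in a small reusable function. This is my own sketch (the function name is illustrative, not part of the tutorial):

print_color() {
    # $1 = ANSI color parameter (e.g. 31 for red, 32 for green), $2 = message
    echo -e "\033[${1}m${2}\033[0m"   # print in color, then reset attributes with 0
}
print_color 31 "Error: something failed"
print_color 32 "OK: all good"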


This is just a quick introduction to the Linux console and what you can do using these escape sequences. The table above does not contain the entire set of CSI sequences. My upcoming tutorial series on the C programming language will definitely tell the entire story, with S-Lang and curses. Whatever knowledge you gain here will definitely be a first step toward serious programming in C. This much knowledge is sufficient for shell programming; now try the following exercise. I am hungry, give me more programming exercises & challenges! :-)

1) Write a function box() that draws a box on screen (in shell script):

   box(left, top, height, width)

For example: box(20, 5, 7, 40)

Hints: use ANSI escape sequences.
1) Use the 11 parameter to m.
2) Use the following for cursor movement:

   row;col H    or    row;col f

For example:
   $ echo -e "\033[5;10H Hello"
   $ echo -e "\033[6;10f Hi"

The above example prints the message Hello at row 5, column 10, and Hi at row 6, column 10.
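One possible sketch of the box() function built from these hints (my own solution using the cursor-movement codes above; it draws with plain - and | characters rather than the graphics characters of the 11 parameter):

box() {
    left=$1 ; top=$2 ; height=$3 ; width=$4
    # top and bottom edges
    c=0
    while [ $c -lt $width ]; do
        echo -en "\033[${top};$(($left + $c))H-"
        echo -en "\033[$(($top + $height - 1));$(($left + $c))H-"
        c=$(($c + 1))
    done
    # left and right edges
    r=0
    while [ $r -lt $height ]; do
        echo -en "\033[$(($top + $r));${left}H|"
        echo -en "\033[$(($top + $r));$(($left + $width - 1))H|"
        r=$(($r + 1))
    done
    echo -e "\033[$(($top + $height));1H"   # park the cursor below the box
}
clear
box 20 5 7 40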

Shell Built-in Variables

Variable : meaning

$# : number of command line arguments; useful to test the number of command line arguments in a shell script
$* : all arguments to the shell
$@ : same as above
$- : options supplied to the shell
$$ : PID of the shell
$! : PID of the last background process started (with &)

See the example of the $@ and $* variables below.
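A short sketch showing the difference (the script name showargs is illustrative). Given the following script:

$ cat showargs
#!/bin/sh
echo "Got $# argument(s); shell PID is $$"
for a in "$*"; do echo "star: $a"; done   # "$*" joins all arguments into one word
for a in "$@"; do echo "at  : $a"; done   # "$@" preserves each argument separately

running it produces:

$ chmod +x showargs
$ ./showargs one "two three"
Got 2 argument(s); shell PID is 12345
star: one two three
at  : one
at  : two three

(The PID shown is of course arbitrary.)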


Compiler-compilers

In computer science, a compiler-compiler or compiler generator is a tool that creates a parser, interpreter, or compiler from some form of formal description of a language and machine. The earliest and still most common form of compiler-compiler is a parser generator, whose input is a grammar (usually in BNF) of a programming language, and whose generated output is the source code of a parser often used as a component of a compiler. Similarly, code generator-generators (such as JBurg) exist, but such tools have not yet reached maturity.

The ideal compiler-compiler takes a description of a programming language and a target instruction set architecture, and automatically generates a usable compiler from them. In practice, the state of the art has yet to reach this degree of sophistication and most compiler generators are not capable of handling semantic or target architecture information.

Bootstrapping compiler

In computer science, bootstrapping is the process of writing a compiler (or assembler) in the target programming language which it is intended to compile. Applying this technique leads to a self-hosting compiler.

Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Java, Python, Scala and more.

Contents

1 Advantages
2 The chicken and egg problem
3 History
4 List of languages having self-hosting compilers
5 See also
6 References

Advantages

Bootstrapping a compiler has the following advantages:[1][2]

it is a non-trivial test of the language being compiled;
compiler developers only need to know the language being compiled;
compiler development can be done in the higher-level language being compiled;
improvements to the compiler's back end improve not only general-purpose programs but also the compiler itself;
it is a comprehensive consistency check, as it should be able to reproduce its own object code.