Compilers and Language Translation

Download Compilers and Language Translation

Post on 05-Feb-2016




0 download


Compilers and Language Translation. Gordon College. Whats a compiler?. All computers only understand machine language Therefore, high-level language instructions must be translated into machine language prior to execution. This is a program. 10000010010110100100101. Whats a compiler?. - PowerPoint PPT Presentation


<ul><li><p>Compilers and Language TranslationGordon College</p></li><li><p>Whats a compiler?All computers only understand machine language</p><p>Therefore, high-level language instructions must be translated into machine language prior to execution</p><p>10000010010110100100101This isa program</p></li><li><p>Whats a compiler?Compiler A piece of system software that translates high-level languages into machine language</p><p>10000010010110100100101Congrats!while (c!='x') { if (c == 'a' || c == 'e' || c == 'i')printf("Congrats!"); else if (c!='x') printf("You Loser!"); } </p><p>Compilergcc -o prog program.cprogram.cprog</p></li><li><p>Assembler (a kind of compiler)LOADXAssembly0101 0000 0000 1001Machine Language(symbol table)(opcode table)One-to-one translation</p></li><li><p>Compiler (high-level language translator)a = b + c - d;One-to-many translation0101 00001110001LOAD B0111 00001110010ADD C0110 00001110011SUBTRACT D0100 00001110100STORE A0101 00001110001 0111 00001110010.</p></li><li><p>Goals of a compilerCode produced must be correct</p><p>A = (B+C)-(D+E);Possible translation:LOAD BADD CSTORE BLOAD DADD ESTORE DLOAD BSUBTRACT DSTORE ANo - STORE B and STORE D changes the values of variables B and D which is the high-level language does not intendIs this correct?</p></li><li>Goals of a compilerCode produced should be reasonably efficient and conciseCompute the sum - 2x1+ 2x2+ 2x3+ 2x4+. 2x50000sum = 0.0for(i=0;i</li><li><p>General Structure of a Compiler</p></li><li><p>The Compilation ProcessPhase I: Lexical analysisCompiler examines the individual characters in the source program and groups them into syntactical units called tokensPhase II: ParsingThe sequence of tokens formed by the scanner is checked to see whether it is syntactically correctScannerSourcecodeGroups oftokensParserGroups oftokenscorrectnot correct</p></li><li><p>The Compilation ProcessPhase III: Semantic analysis and code generationThe compiler analyzes the meaning of the high-level language statement and generates the machine language instructions to carry out these actions</p><p>Code GeneratorGroupsof tokensMachinelanguage</p></li><li><p>The Compilation ProcessPhase IV: Code optimizationThe compiler takes the generated code and sees whether it can be made more efficientCode OptimizerMachinelanguageMachinelanguage</p></li><li><p>Overall Execution Sequence on a High-Level Language Program</p></li><li><p>The Compilation ProcessSource programOriginal high-level language programObject programMachine language translation of the source program</p></li><li><p>Phase I: Lexical AnalysisLexical analyzerThe program that performs lexical analysisMore commonly called a scannerJob of lexical analyzerGroup input characters into tokensTokens: Syntactical units that are treated as single, indivisible entities for the purposes of translationClassify tokens according to their type</p></li><li><p>Phase I: Lexical AnalysisProgram statementsum = sum + a[i];</p><p>Digital perspective:tab,s,u,m,blank,=,blank,s,u,m,blank,+,blank,a,[,i,],;</p><p>Tokenized:sum,=,sum,+,a[i],;</p></li><li><p>Phase I: Lexical Analysis</p><p>TOKEN TYPECLASSIFICATION NUMBERSymbol1Number2=3+4- 5;6==7If8Else9(10)11[ 12] 13Typical Token Classifications</p></li><li><p>Phase I: Lexical AnalysisLexical Analysis Process 1. Discard blanks, tabs, etc. - look for beginning of token. 2. Put characters together 3. Repeat step 2 until end of token 4. Classify and save token 5. Repeat steps 1-4 until end of statement 6. Repeat steps 1-5 until end of source codeScannersum=sum+a[i];sum 1=3+4a1[12i1]13;6</p></li><li><p>Phase I: Lexical AnalysisInput to a scanner - A high-level language statement from the source programScanners output - A list of all the tokens in that statement - The classification number of each token foundScannersum=sum+a[i];sum 1=3+4a1[12i1]13;6</p></li><li><p>Phase II: Parsing Parsing phaseA compiler determines whether the tokens recognized by the scanner are a syntactically legal statementPerformed by a parser</p></li><li><p>Phase II: Parsing Output of a parserA parse tree, if such a tree existsAn error message, if a parse tree cannot be constructedSuccessful construction of a parse tree is proof that the statement is correctly formed</p></li><li><p>ExampleHigh-level language statement: a = b + c</p></li><li><p>Grammars, Languages, and BNFSyntaxThe grammatical structure of the languageThe parser must be given the syntax of the languageBNF (Backus-Naur Form) Most widely used notation for representing the syntax of a programming languageliteral_expression ::= integer_literal | float_literal | string | character </p></li><li><p>Grammars, Languages, and BNFIn BNFThe syntax of a language is specified as a set of rules (also called productions)A grammarThe entire collection of rules for a languageStructure of an individual BNF ruleleft-hand side ::= definition</p></li><li><p>Grammars, Languages, and BNFBNF rules use two types of objects on the right-hand side of a productionTerminalsThe actual tokens of the languageNever appear on the left-hand side of a BNF ruleNonterminalsIntermediate grammatical categories used to help explain and organize the languageMust appear on the left-hand side of one or more rules</p></li><li><p>Grammars, Languages, and BNFGoal symbolThe highest-level nonterminalThe nonterminal object that the parser is trying to produce as it builds the parse treeAll nonterminals are written inside angle bracketsJava BNF</p></li><li><p>BNF Example ::= ::= | ::= | "." ::= ::= "," ::= "Sr." | "Jr." | | ""Identify the following:Goal symbol, terminals, nonterminals, a individual rule</p><p>Is this a legal postal address?Steve Moses Sr. 215 Rose Ave.Everywhere, NC 43563 </p></li><li><p>Parsing Concepts and TechniquesFundamental rule of parsing:By repeated applications of the rules of the grammar-If the parser can convert the sequence of input tokens into the goal symbol the sequence of tokens is a syntactically valid statement of the language else the sequence of tokens is not a syntactically valid statement of the language</p></li><li><p>Parsing Example ::= http:// [ / ] [ ? ] ::= [ : ] ::= | ::= [ . ] ::= . . . ::= ::= | [ / ] ::= [ + ] ::= | | | | ::= [ ] ::= | + ::= [ ] ::= [ ] ::= a | b | | z | A | B | | Z ::= 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ::= $ | - | _ | @ | . | &amp; | ~ ::= ! | * | " | ' | ( | ) | : | ; | , | ::= % ::= | a | b | c | d | e | f | A | B | C | D | E | F ::= [ ] ::=Is the following http address legal:</p></li><li><p>Parsing Concepts and TechniquesLook-ahead parsing algorithms - intelligent parsersOne of the biggest problems in building a compiler is designing a grammar that:Includes every valid statement that we want to be in the languageExcludes every invalid statement that we do not want to be in the language</p></li><li><p>Parsing Concepts and TechniquesAnother problem in constructing a compiler: Designing a grammar that is not ambiguousAn ambiguous grammar allows the construction of two or more distinct parse trees for the same statement NOT GOOD - multiple interpretations</p></li><li><p>Phase III: Semantics and Code GenerationSemantic analysisThe compiler makes a first pass over the parse tree to determine whether all branches of the tree are semantically validIf they are valid the compiler can generate machine language instructions else there is a semantic error; machine language instructions are not generated</p></li><li><p>Phase III: Semantics and Code GenerationSemantic analysisSyntactically correct, but semantically incorrect example:sum = a + b;int a; double sum;data type mismatch char b;Semantic recordsaintegersumdoublebchar</p></li><li><p>Phase III: Semantics and Code GenerationSemantic analysis</p><p>a integerb char</p><p>temp ?</p><p> + </p><p>Parse treeSemantic recordSemantic recordSemantic record</p></li><li><p>Phase III: Semantics and Code GenerationSemantic analysis</p><p>a integerb integer</p><p>temp integer</p><p> + </p><p>Parse treeSemantic recordSemantic recordSemantic record</p></li><li><p>Phase III: Semantics and Code GenerationCode generationCompiler makes a second pass over the parse tree to produce the translated code</p></li><li><p>Phase IV: Code OptimizationTwo types of optimizationLocal Global Local optimizationThe compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code blockRelatively easy; included as part of most compilers:</p></li><li><p>Phase IV: Code OptimizationExamples of possible local optimizationsConstant evaluation x = 1 + 1 ---&gt; x = 2Strength reduction x = x * 2 ---&gt; x = x + xEliminating unnecessary operations</p></li><li><p>Phase IV: Code OptimizationGlobal optimizationThe compiler looks at large segments of the program to decide how to improve performanceMuch more difficult; usually omitted from all but the most sophisticated and expensive production-level optimizing compilersOptimization cannot make an inefficient algorithm efficient - only makes an efficient algorithm more efficient</p></li><li><p>SummaryA compiler is a piece of system software that translates high-level languages into machine languageGoals of a compiler: Correctness and the production of efficient and concise codeSource program: High-level language program</p></li><li><p>SummaryObject program: The machine language translation of the source programPhases of the compilation processPhase I: Lexical analysisPhase II: ParsingPhase III: Semantic analysis and code generationPhase IV: Code optimization</p></li></ul>


View more >