The Classical Tool Chain
Module 4 – Part 1
CS230 – Winter 2020

Classical Tool Chain
- Compiler: translates a high-level language into an assembly program.
- Assembler: translates an assembly program into machine code in the form of an object file.
- Linker: combines multiple object files of machine code into a program file.
- Loader: loads a program file into main memory.

Classical Tool Chain (example)
[Figure: source code -> Compiler -> assembly code -> Assembler -> object file (machine code) -> Linker, which also takes other object files and libraries -> Loader -> main memory]
Source code:
    if a:
        b += 1
Assembly code:
    beq  $2, $0, endif
    addi $3, $3, 1
    endif:
Machine code (bits as shown on the slide): 00010000000010000000

Other Execution Approaches
- Interpretation
  - execute source code directly
  - execute binary code by software
- Byte Code
  - compile to intermediate binary representation
- Just-In-Time Compilation
  - compile during runtime

A Simple View of Compilation
Module 4 – Part 2

Compiler
- Program translation
  - from source language
  - to target language (usually assembly code)
- Typically followed by an assembler
  - to generate machine code
Source code:
    if a:
        b += 1
Assembly code:
    beq  $2, $0, endif
    addi $3, $3, 1
    endif:

Basic Compilation Steps
- Scanning
  - source code to token sequence
- Syntax analysis
  - token sequence to parse tree
- Semantic analysis
  - use the parse tree to generate a symbol table
  - type check the parse tree against the symbol table
- Code generation
  - parse tree and symbol table to target language
[Figure: source code -> scanner -> tokens -> parser -> parse tree; the figure also shows the symbol table]

Scanner/Tokenizer
- Also called Lexical Analysis
- Convert program text into a stream of tokens
- Types of tokens (sample):
  - keyword – for, while, etc.
  - operator – +, &&, etc.
  - constant – 1000, 3.5, etc.
  - delimiter – :, brackets, etc.
  - variable name – minpos, etc.
  - subroutine name – power2, etc.
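
As a rough illustration of the scanning step, here is a minimal sketch in Python built on the standard `re` module. The names `KEYWORDS`, `TOKEN_SPEC`, and `tokenize` and the patterns themselves are assumptions made for this example, not part of the course material. Variable names and subroutine names are both reported as `name`, because telling them apart generally needs more context than a scanner has.

```python
import re

# Token classes follow the sample list above; the concrete patterns are assumptions.
KEYWORDS = {"for", "while", "if", "else"}
TOKEN_SPEC = [
    ("constant", r"\d+\.\d+|\d+"),
    ("name", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("operator", r"&&|\+=|[+\-*/<>=]"),
    ("delimiter", r"[:()\[\]{},]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise ValueError on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "name" and lexeme in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, lexeme))
    return tokens
```

For instance, `tokenize("b += 1")` yields a name, an operator, and a constant, in that order.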

Background
- Formal languages
  - tokens
  - language specification and rules
- Well-studied in theoretical computer science
  - deterministic finite automata (DFA)
  - non-deterministic finite automata (NFA)
  - regular expressions (RE)
- The set of languages that can be represented by one of these forms (DFA, NFA, RE) is exactly the set of languages that can be represented by any of the others
  - standard algorithms exist to convert from one form to another (beyond the scope of CS230)
  - CS230 will consider some ad hoc techniques

Finite Automata
Module 4 – Part 3

Deterministic Finite Automata (DFA)
- Also known as a deterministic finite state machine (FSM)
- Consists of
  - A finite set of states
    - Includes exactly one start state
    - Includes at least one final (also called accept) state
  - A finite set of input symbols known as the alphabet
  - A finite set of transitions from one state to another based on the input
- Can determine if input is accepted or rejected

Regular Languages
- We call the set of all strings accepted by a DFA, NFA, or RE the language
- The strings of the language are formed from the characters in the alphabet
- The alphabet is specified by describing a set and is normally designated with the symbol Σ
- For example, Σ = {0, 1} describes a language of strings that include only the characters 0 and 1
- A DFA, NFA, or RE describes which strings are accepted (are in the language) as well as which strings are rejected

Finite Automata
- Also known as Finite State Machines (FSM)
- Consist of
  - A finite set of states
    - Includes exactly one start state
    - Includes at least one final (also called accept) state
  - A finite set of input symbols: the alphabet
  - A finite set of transitions from one state to another based on the input
- A finite set of states and transitions can describe an infinite set of acceptable strings

DFA Example 1
[Figure: DFA state diagram with one transition labeled a and one labeled b; the diagram itself did not survive extraction]
- The start state has an arrow from nowhere
- Final (accept) states are double circles
- Notice the start state is also an accept state
- Transitions are marked with the input(s) they consume

Let's consider the input string ab:
- Begin in the start state
- Place our marker at the start of the string
- Look at what transitions (arrows) are available
- We take the a transition because a is the first character after our marker
- Consume the a character and move to the state at the end of the a transition
- Look at what transitions are available
- We take the b transition because b is the first character after our marker
- Consume the b character and move to the state at the end of the b transition
- Look at what transitions are available
- Our marker is at the end of the string
- No transitions are available
- Check if we are in an accept state
- We say this DFA accepts the string ab
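
The marker-and-transition procedure above can be sketched in a few lines of Python. The transition table below is an assumed reconstruction, since the original diagram is not available: it uses three hypothetical states `q0`, `q1`, `q2`, with `q0` as both start and accept state, which is consistent with the bullets but may not match the slide exactly.

```python
# Assumed reconstruction of the diagram; state names are hypothetical.
START = "q0"
ACCEPT = {"q0", "q2"}  # the start state is also an accept state
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q2",
}


def dfa_accepts(string, start=START, accept=ACCEPT, transitions=TRANSITIONS):
    """Simulate a DFA: consume one character per step, then check the final state."""
    state = start
    for symbol in string:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition available: we are stuck, so reject
        state = transitions[key]
    return state in accept  # marker is at the end of the string
```

Under these assumptions `dfa_accepts("ab")` returns `True`, and so does `dfa_accepts("")` because the start state is an accept state, while `dfa_accepts("a")` returns `False`.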

DFA Example 2-1
[Figure: DFA state diagram; the surviving transition labels are a, b, d, a, c]
Let's consider the input string aac:
- We say the DFA accepts the string aac

DFA Example 2-2
[Figure: the same DFA state diagram as in Example 2-1]
Now let's consider the input string abc:
- We're stuck! No transition is available for the next input character
- We say the DFA rejects the string abc

Non-deterministic Finite Automata (NFA)
- The same as a DFA except that:
  - NFAs may have transitions from the same state on the same input to different states
    - DFAs can only have one transition per input per state
  - NFAs can include a λ (lambda) transition
    - Move to a new state without consuming input
- Easier to design than an equivalent DFA
- More complex to evaluate input
- Can always create a DFA from an NFA
- All DFAs are also legal NFAs

NFA Example
[Figure: NFA state diagram; the surviving transition labels are a, b, b, and λ]
- The middle state has two b transitions
- The start state has a lambda transition to the accept state

Let's consider the input string abab:
- Now we have two choices, so we create "clones"
- There were two choices, so we have two clones
- One clone got stuck and died!
- At least one clone is in an accept state
- The NFA accepts abab
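
The "clones" idea can be modeled by tracking the set of states the NFA could currently be in. The sketch below assumes a particular three-state NFA (`s0` start, `s1` middle, `s2` accept) chosen to agree with the two bullets above. The real diagram is not recoverable, so the exact arrows are a guess, and all names are hypothetical.

```python
LAMBDA = ""  # label used here for transitions that consume no input

# Assumed reconstruction of the diagram; state names are hypothetical.
START = "s0"
ACCEPT = {"s2"}
TRANSITIONS = {
    ("s0", "a"): {"s1"},
    ("s0", LAMBDA): {"s2"},      # start state has a lambda transition to the accept state
    ("s1", "b"): {"s0", "s2"},   # middle state has two b transitions
}


def lambda_closure(states, transitions):
    """Add every state reachable through lambda transitions alone."""
    closure = set(states)
    frontier = list(states)
    while frontier:
        state = frontier.pop()
        for nxt in transitions.get((state, LAMBDA), set()):
            if nxt not in closure:
                closure.add(nxt)
                frontier.append(nxt)
    return closure


def nfa_accepts(string, start=START, accept=ACCEPT, transitions=TRANSITIONS):
    """Each element of `clones` is one clone; clones with no transition simply vanish."""
    clones = lambda_closure({start}, transitions)
    for symbol in string:
        moved = set()
        for state in clones:
            moved |= transitions.get((state, symbol), set())
        clones = lambda_closure(moved, transitions)
    return bool(clones & accept)  # accept if at least one clone is in an accept state
```

With this assumed machine, `nfa_accepts("abab")` returns `True` and `nfa_accepts("aba")` returns `False`.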

Regular Expressions
Module 4 – Part 4

Regular Expression
- Regular expressions (regexes) are another way to define regular languages
- Define a set of strings over an alphabet Σ
  - Σ is the set of all legal characters in the language
- Constants:
  - empty set: Ø
  - empty string: ε
  - literal character: a ∈ Σ

Basic Regex Operations
- Alternation: R|S = R ∪ S
  - ∪ is the union operator
- Concatenation: RS = {αβ : α in R and β in S}
- Kleene star: R* = smallest superset of R containing ε and closed under concatenation
  - "Zero or more copies of R"

Basic Regex Examples
- a* = {ε, a, aa, aaa, ...}
- b|a* = {b, ε, a, aa, aaa, ...}
- (0|1)* = binary numbers, plus the empty string
- (h|c)at = {hat, cat}
- (a|b)(c|d) = {ac, ad, bc, bd}
- while = {while}

More Regex Operations
- a+ = {a, aa, aaa, ...}
- a? = {ε, a}
- ab+ = {ab, abb, abbb, ...}
- (h|c)?at = {hat, cat, at}
- (a|b)+(c|d) = {ac, ad, bc, bd, aac, aad, bbc, bbd, aaac, aaad, bbbc, bbbd, abc, abd, abac, ...}
- wh?ile? = {while, wile, whil, wil}
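
The example sets above can be checked mechanically. This sketch uses Python's `re` module, whose syntax agrees with the formal notation for these particular patterns. The helper names `matches` and `language` are made up for this illustration. `language` simply tries every string over a small alphabet up to a length bound, so it is only practical for tiny cases.

```python
import re
from itertools import product


def matches(pattern, string):
    """True if the whole string (not just a prefix) is described by the pattern."""
    return re.fullmatch(pattern, string) is not None


def language(pattern, alphabet, max_len):
    """List every string over `alphabet`, up to `max_len` characters, that the pattern accepts."""
    compiled = re.compile(pattern)
    found = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                found.append(candidate)
    return found
```

For example, `language("(a|b)(c|d)", "abcd", 2)` returns `['ac', 'ad', 'bc', 'bd']`, and `matches("(a|b)+(c|d)", "abac")` is `True`.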

Regular Expressions and DFA/NFA
- Regular expressions define regular languages
- NFAs and DFAs define regular languages
- Simple conversion rules:
  - a*
  - a?
  - a+
  - a|b
[Figure: one state diagram per rule; only the transition labels (a and b) survive in the extracted text]

Regex to DFA Example
- Try this regular expression: (ab)?(cd)+
- We know we start with an optional ab, so let's use that rule first
- Now we add one or more copies of cd
- But now how do we have the ab first, then cd?
- We need to add a c arrow and remove the extra accept state. Now it works!
[Figure: the DFA built up step by step; the final version has transitions labeled a, b, d, c, c, c]
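
One way to gain confidence in a hand-built DFA is to compare it with the regular expression on every short string. The table below is one DFA for (ab)?(cd)+ with a single accept state. It is written from the regular expression rather than copied from the slide, so the state names `q0` to `q4` and the exact layout are assumptions.

```python
import re
from itertools import product

# Hypothetical state names; one possible DFA for (ab)?(cd)+.
START = "q0"
ACCEPT = {"q4"}
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q2",
    ("q0", "c"): "q3",  # ab is optional, so c may come first
    ("q2", "c"): "q3",
    ("q3", "d"): "q4",
    ("q4", "c"): "q3",  # loop back for further copies of cd
}


def dfa_accepts(string):
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False  # stuck: no transition for this symbol
    return state in ACCEPT


def agrees_with_regex(max_len=6):
    """Compare the DFA with the regex on every string over {a, b, c, d} up to max_len."""
    pattern = re.compile(r"(ab)?(cd)+")
    for length in range(max_len + 1):
        for chars in product("abcd", repeat=length):
            candidate = "".join(chars)
            if dfa_accepts(candidate) != (pattern.fullmatch(candidate) is not None):
                return False
    return True
```

Exhaustive comparison up to a length bound is not a proof of equivalence, but it catches mistakes such as a missing c arrow or a leftover accept state.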

DFA to Regex Example
[Figure: DFA state diagram; the surviving transition labels are a, a, b, b, b]
- Write out as many accepted strings as you can find until you see a pattern:
  - aa, bb, bbb, bbbb
- We can see a choice of either two a or at least two b
- Regex: aa|bb+ or aa|bbb*

Syntax/Extensions
- Square brackets (with ranges)
  - match one of the given characters
  - [a-z] matches any single lowercase letter
- Dot matches any single character
  - .at matches hat, cat, fat, mat, bat, 7at, Aat, etc.
- Escape character
  - match actual brackets, dots, etc.
  - [0-9]+\.[0-9]+ to match fractional numbers
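
These extensions behave the same way in Python's `re` module, which makes them easy to try out. The helper name `fully_matches` is made up for this sketch. The last pattern, with the dot left unescaped, is included only to show why the escape matters.

```python
import re


def fully_matches(pattern, string):
    """True if the pattern describes the entire string."""
    return re.fullmatch(pattern, string) is not None


LOWERCASE = r"[a-z]"             # one character from the range a to z
ANY_AT = r".at"                  # any single character followed by "at"
FRACTION = r"[0-9]+\.[0-9]+"     # escaped dot: a literal "."
UNESCAPED = r"[0-9]+.[0-9]+"     # unescaped dot: any character between the digit runs
```

Here `fully_matches(FRACTION, "3.5")` is `True` and `fully_matches(FRACTION, "3x5")` is `False`, whereas `fully_matches(UNESCAPED, "3x5")` is `True` because the bare dot accepts the x.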

Real-world Example
- Search for all occurrences of a name in a file
  - using the Unix command "egrep"
- Different spellings of Georg Friedrich Händel:
  - Händel
  - Haendel
  - Handel
  - Hendel
    egrep "[Hh](ae|a|e|ä)ndel" ex.txt

Parsing and Context Free Grammars
Module 4 – Part 5

Parsing/Syntax Analysis
- Clear and unambiguous description of
  - valid sentences -> natural language
  - valid program -> programming language
- Need a rule set for classification of valid sequences
  - called a Grammar
- Programming language – formal specification
- English – informal specification

Grammar
- Grammar
  - formal description of a language
  - the quality of the grammar will determine the quality of the compiler; i.e., if the grammar is flawed, the compiler will not be able to work properly
- Parsing
  - validity checking and error reporting
  - compares a sequence of tokens with a grammar
  - builds a parse tree
    - if we can make a parse tree: the sequence is in the language
    - if we cannot: the sequence is not in the language

Context-Free Grammar
- Formalism to specify languages
  - simple and precise recursive definition
  - building block structure of languages
- Well-studied in theoretical computer science
- Context-free: sequences do not depend on any external context

CFG Components
- Terminal/Token: atomic (indivisible) symbol
- Non-terminal/Variable: abstract component
  - does not literally appear in input
  - designated non-terminal start symbol
    - first non-terminal in grammar
  - notation: angle brackets <...>
- Production/Rule:
  - possible expansion of a non-terminal into zero or more terminals and/or non-terminals
  - more than one rule per non-terminal possible

Example – Basic Arithmetic CFG
<Expr> → <Expr> + <Term> | <Expr> - <Term> | <Term>
<Term> → ( <Expr> ) | integer

Example – Parse Tree
- Parse tree for 1 + 2 - 3 (children are indented under their parent):
<Expr>
  <Expr>
    <Expr>
      <Term>
        1
    +
    <Term>
      2
  -
  <Term>
    3

Example 2 – Parse Tree
- Parse tree for 1 + (2 - 3) (children are indented under their parent):
<Expr>
  <Expr>
    <Term>
      1
  +
  <Term>
    (
    <Expr>
      <Expr>
        <Term>
          2
      -
      <Term>
        3
    )

Example – Bigger Arithmetic CFG
<Expr> → <Expr> + <Term> | <Expr> - <Term> | <Term>
<Term> → <Term> * <Factor> | <Term> / <Factor> | <Factor>
<Factor> → ( <Expr> ) | integer

Example 3 – Parse Tree
- Parse tree for 1 + 2 * 3 (children are indented under their parent):
<Expr>
  <Expr>
    <Term>
      <Factor>
        1
  +
  <Term>
    <Term>
      <Factor>
        2
    *
    <Factor>
      3
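
To make the link between grammar and parse tree concrete, here is a small Python sketch that builds such trees for the bigger arithmetic grammar. The slides do not prescribe a parsing algorithm. This sketch uses recursive descent, and because the grammar's rules are left-recursive it replaces the left recursion with loops while still nesting the nodes to the left, so the resulting tree has the same shape as the one above. The names `tokenize` and `parse` and the tuple representation of tree nodes are assumptions for illustration.

```python
import re


def tokenize(text):
    """Split the input into integer and symbol tokens; reject anything else."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("unexpected character in input")
    return tokens


def parse(text):
    """Return a parse tree as nested tuples: (non-terminal, child, child, ...)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        # <Expr> -> <Expr> + <Term> | <Expr> - <Term> | <Term>
        node = ("Expr", term())
        while peek() in ("+", "-"):
            op = advance()
            node = ("Expr", node, op, term())  # earlier result becomes the left child
        return node

    def term():
        # <Term> -> <Term> * <Factor> | <Term> / <Factor> | <Factor>
        node = ("Term", factor())
        while peek() in ("*", "/"):
            op = advance()
            node = ("Term", node, op, factor())
        return node

    def factor():
        # <Factor> -> ( <Expr> ) | integer
        token = advance()
        if token == "(":
            inner = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return ("Factor", "(", inner, ")")
        if token is not None and token.isdigit():
            return ("Factor", token)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

If `parse` returns a tree, the token sequence is in the language. If it raises `SyntaxError`, the sequence is not.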

Semantic Analysis and Code Generation
- The parse tree is analyzed
  - semantic analysis
    - type checking
    - variables exist
    - function parameters match
- Results are converted into the target language
  - usually assembly language
  - called code generation