layered combinator parsers with a unique state pieter koopman rinus plasmeijer nijmegen, the...
Post on 14-Dec-2015
214 Views
Preview:
TRANSCRIPT
Layered Combinator Layered Combinator Parsers with a Unique Parsers with a Unique
StateState
Pieter KoopmanRinus Plasmeijer
Nijmegen, The Netherlands
Parser Combinators Pieter Koopman 2
Overview
conventional parser combinators requirements new combinators system-architecture new parser combinators separate scanner and parser error handling
Parser Combinators Pieter Koopman 3
parser combinators
Non deterministic, list of results:: Parser s r :== [s] -> [ ParseResult s r ]:: ParseResult s r :== ([s],r)
fail & yieldfail = \ss = []yield r = \ss = [(ss,r)]
recognize symbolsatisfy :: (s->Bool) -> Parser s ssatisfy f = pwhere p [s:ss] | f s = [(ss,s)] p _ = []
symbol sym :== satisfy ((==) sym)
Parser Combinators Pieter Koopman 4
parser combinators 2 sequence-combinators(<&>) infixr 6::(Parser s r)(r->Parser s t)->Parser s t(<&>) p1 p2 = \ss1 = [ tuple \\ (ss2,r1) <- p1 ss1 , tuple <- p2 r1 ss2 ] (<+>)infixl 6::(Parser s(r->t))(Parser s r)->Parser s t(<+>) p1 p2 = \ss1 = [ (ss3,f r) \\ (ss2,f) <- p1 ss1 , (ss3,r) <- p2 ss2 ]
choose-combinator(<||>) infixr 4::(Parser s r) (Parser s r)->Parser s r(<||>) p1 p2 = \ss = p1 ss ++ p2 ss
Parser Combinators Pieter Koopman 5
parser combinators 3
some useful abbreviations(@>) infixr 7(@>) f p :== yield f <+> p
(<:>) infixl 6(<:>) p1 p2 :== (\h t=[h:t]) @> p1 <+> p2
Parser Combinators Pieter Koopman 6
parser combinators 4
Kleene starstar p = p <:> star p
<||> yield []
plus p = p <:> star p
parsing an identifieridentifier :: Parser Char String
identifier = toString @> satisfy isAlpha
<:> star (satisfy isAlphanum)
Parser Combinators Pieter Koopman 7
parser combinators 5
context sensitive parserstwice the same character
doubleChar = satisfy isAlpha <&> \c -> symbol c
arbitrary look aheadlookAhead
= symbol 'a' +> symbol 'b' <||> symbol 'a' +> symbol 'c'
Parser Combinators Pieter Koopman 8
parser combinators 5
context sensitive parserstwice the same character
doubleChar = satisfy isAlpha <&> \c -> symbol c
arbitrary look aheadlookAhead
= symbol 'a' +> symbol 'b' <||> symbol 'a' +> symbol 'c' <||> star (satisfy isSpace) +> symbol 'a' <||> symbol 'x'
Parser Combinators Pieter Koopman 9
properties of combinators
+ concise and clear parsers+ full power of fpl available+ context sensitive + arbitrary look-ahead+ can be efficient, continuations IFL '98
- no error handling (messages & recovery)
- no unique symbol tables- separate scanner yields problems
scan entire file before parser starts
Parser Combinators Pieter Koopman 10
Requirements parse state with
error file notion of position user-defined extension e.g. symbol table
possibility to add separate scanner efficient implementation, continuations
for programming languages we want a single result (deterministic grammar)
Parser Combinators Pieter Koopman 11
Uniqueness
files and windows that should be single-threaded are unique in Clean
fwritec :: Char *File -> *File
data-structures can be updated destructively when they are unique
only unique arrays can be changed
Parser Combinators Pieter Koopman 12
System-architecture
replace the list of symbols by a structure containing actual input position error administration user defined part of the state
use a type constructor class to allow multiple levels
Parser Combinators Pieter Koopman 13
Type constructor class
Reading a symbolclass PSread ps s st :: (*ps s *st)->(s, *ps s *st)
Copying the state is not allowed,use functions to manipulate the input
class PSsplit ps s st :: (s, *ps s *st)->(s, *ps s *st)
class PSback ps s st :: (s, *ps s *st)->(s, *ps s *st)
class PSclear ps s st :: (s, *ps s *st)->(s, *ps s *st)
Minimal parser state requires Clean 2.0class ParserState ps symbol state
| PSread, PSsplit, PSback, PSclear ps symbol state
Parser Combinators Pieter Koopman 14
New parser combinators
Parsers have three arguments1. success-continuation
determines action upon successSuccCont :== Item failCont State -> (Result, State)
2. fail-continuation specifies what to do if parser failsFailCont :== State -> (Result, State)
3. current input stateState :== (Symbol, ParserState)
Parser Combinators Pieter Koopman 15
New parser combinators 2 yield and fail, apply appropriate
continuationyield r = \succ fail tuple = succ r fail tuple
failComb = \succ fail tuple = fail tuple
sequence of parsers, change continuation<&> p1 p2 = \sc fc t -> p1 (\a _ -> p2 a sc fc) fc t
choice, change continuations(<|>) p1 p2
= \succ fail tuple = p1 (\r f t = succ r fail (PSclear t)) (\t2 = p2 succ fail (PSback t2)) (PSsplit tuple)
Parser Combinators Pieter Koopman 16
string input a very simple instance of ParserState:: *StringInput symbol state = { si_string :: String // string holds input , si_pos :: Int // index of current char , si_hist :: [Int] // to remember old positions , si_state :: state // user-defined extension , si_error :: ErrorState }instance PSread StringInput Char statewhere PSread si=:{si_string,si_pos} = (si_string.[si_pos],{si & si_pos = si_pos+1})
instance PSsplit StringInput Char statewhere PSsplit (c,si=:{si_pos,si_hist}) = (c,{si & si_hist = [si_pos:si_hist]})
instance PSback StringInput Char statewhere PSback (_,si=:{si_string,si_hist=[h:t]}) = (si_string.[h-1],{si & si_pos = h, si_hist = t})
Parser Combinators Pieter Koopman 17
Separate scanner and parser sometimes it is convenient to have a
separate scannere.g. to implement the offside rule
task of scanner and parser is similar.So, use the same combinators
due to the type constructor class we can nest parser states
Parser Combinators Pieter Koopman 18
a simple scanner
use of combinators doesn’t change produces tokens (algebraic datatype)
scanner = skipSpace +> ( generateOffsideToken <|> satisfy isAlpha <:> star (satisfy isAlphanum) <@ testReserved o toString <|> plus (satisfy isDigit)<@ IntToken o to_number 0 <|> symbol '=' <@ K EqualToken <|> symbol '(' <@ K OpenToken <|> symbol ')' <@ K CloseToken )
Parser Combinators Pieter Koopman 19
generating offside tokens use an ordinary parse functiongenerateOffsideToken
= pAcc getCol <&> \col -> // get current coloumn
pAcc getOffside <&> \os_col -> // get offside position
handleOS col os_col
where
handleOS col os_col
| EndGroupGenerated os_col
| col < os_col
= pApp popOffside (yield EndOfGroupToken)
= pApp ClearEndGroup failComb
| col <= os_col
= pApp SetEndGroup (yield EndOfDefToken)
= failComb
Parser Combinators Pieter Koopman 20
Parser state for nesting parser state contains scanner and its state:: *NestedInput token state = E. .ps sym scanState: { ni_scanSt :: (ps sym scanState) , ni_scanner :: (ps sym scanState) -> *(token, ps sym scanState)) , ni_buffer :: [token] , ni_history :: [[token]] , ni_state :: state }
can be nested to any depth we can, but doesn’t have to, use this
Parser Combinators Pieter Koopman 21
Parser state for nesting 2
NestedInput
*File
*ErrorState
*OffsideState
ScanState
scanner
*HashTable
Parser Combinators Pieter Koopman 22
Parser state for nesting 3
apply scanner to read token
instance PSread NestedState token state
where
PSread ns=:{ns_scanner, ns_scanSt}
# (tok, state) = ns_scanner ns_scanSt
= (tok, {ns & ns_scanSt = state})
here, we ignored the buffer define instances for other functions in class ParserState
Parser Combinators Pieter Koopman 23
error handling general error correction is difficult correct simple errors skip to new definition otherwise
Good error messages: location: position in file what are we parsing: stack of
contexts
Error [t.icl,20,[caseAlt,Expression]]: ) expected instead of =
Parser Combinators Pieter Koopman 24
error handling 2 basic error generationparseError expected val = \succ fail (t,ps) = let msg = toString expected +++ " expected instead of " +++ toString t in succ val fail (PSerror msg (PSread ps))
useful primitiveswantSymbol sym = symbol sym <|> parseError sym sym
want p msg value = p <|> parseError msg value
skipToSymbol sym = symbol sym <|> parseError sym sym +> star (satisfy ((<>) sym)) +> symbol sym
Parser Combinators Pieter Koopman 25
Parser
Parsing expressionspExpression = "Expression" ::> BV @> match mBasicValue <|> pIdentifier <|> symbol CaseToken +> pDeter Case @> pCompoundExpression <+ wantSymbol OfToken <+> star pCaseAlt <+ skipToSymbol EndOfGroupToken
<|> symbol OpenToken +> pCompoundExpression <+ wantSymbol CloseToken
Parser Combinators Pieter Koopman 26
identifiers in hashtable
use a parse-function hashtable is user defined state in
ParserStatepIdentifier
= match mIdentToken
<&> \ident = pAccSt (putNameInHashTable ident)
<@ \name={app_symb=UnknownSymbol name, app_args=[]}
the function pAccSt applies a function to the user defined state
Parser Combinators Pieter Koopman 27
limitations of this approach syntax specified by parse
functions grammar is not a datastructure no detection of left recursion
runtime error instead of nice message no automatic left-factoring
do it by hand, or runtime overhead
p1 = p <&> q1 <|> p <&> q2
p2 = p <&> (q1 <|> q2)
Parser Combinators Pieter Koopman 28
discussiondiscussion old advantages
concise, fpl-power, arbitrary look ahead, context sensitve
new advantages unique and extendable parser state one or more layers decent error handling,
simple error correction can be added still efficient, overhead < 2 non-determinism only when needed
top related