parboiled explained

Post on 09-Feb-2017


Parboiled2 explained

Covered

● Why Parboiled2
● Library basics
● Performance optimizations
● Best Practices
● Migration

Features

● PEG
● No lexer required
● Flexible, typesafe EDSL
● Compile-time optimizations
● Decent error reporting
● scala.js support

When regexes fail

Parsing arbitrary HTML with regexes is like asking Paris Hilton to write an operating system (c)

Performance (regex)

[Chart: Parboiled2 vs. Regex, with "Parsing" and "Warmup" series. Values shown: 620.38 and 621.95. Lower is better.]

Data is taken from here: http://bit.ly/1XHAJaA

Performance (json)

Parboiled1      85.64
Parboiled2      13.17
Argonaut         7.01
Json4SNative     8.06
Json4SJackson    4.09

Lower is better

Data is taken from here: http://myltsev.name/ScalaDays2014/#/

Performance (json)

Parser combinators  2385.78
Parboiled1            85.64
Parboiled2            13.17
Argonaut               7.01
Json4SNative           8.06
Json4SJackson          4.09

Lower is better

Data is taken from here: https://groups.google.com/forum/#!topic/parboiled-user/bGtdGvllGgU

Alternatives

● Grappa [java]
● ANTLR
● Regexps
● Parser-combinators
● Language Workbenches (xtext, MPS)

<dependency>
  <groupId>org.parboiled</groupId>
  <artifactId>parboiled_2.11</artifactId>
  <version>2.1.0</version>
</dependency>

import org.parboiled2._

class MyParser(val input: ParserInput) extends Parser {
  // Your grammar
}

Rule DSL

Basic match

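def CaseDoesntMatter = rule { ignoreCase("string") }

// Char and String literals are converted to rules implicitly
def MyCharRule = rule { 'a' }
def MyStringRule = rule { "string" }

// The same rules, written explicitly
def MyCharRule = rule { ch('a') }
def MyStringRule = rule { str("string") }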

Basic match

def CaseDoesntMatter: Rule0 = rule { ignoreCase("string") }

def MyCharRule: Rule0 = rule {'a'}

def MyStringRule: Rule0 = rule { "string" }

Syntactic predicates

● ANY – matches any character except EOI
● EOI – a virtual character that represents the end of input

val EOI = '\uFFFF'

You must match EOI at the end of the main (root) rule

Syntactic predicates

● anyOf – matches one character from the given set
● noneOf – matches one character that is not in the given set

def Digit = rule { anyOf("1234567890") }

def Visible = rule { noneOf(" \n\t") }

Character ranges

def Digit = rule { '0' - '9' }
def AlphaLower = rule { 'a' - 'z' }

Good, but not flexible (the main issue of parboiled1):

● Sometimes you don't need ANY character
● You have a range of characters

Character predicates

There is a set of predefined char predicates:

● CharPredicate.All
● CharPredicate.Digit
● CharPredicate.Digit19
● CharPredicate.HexDigit

Of course, you can define your own:

def AllButQuotes = rule {
  CharPredicate.Visible -- "\"" -- "'"
}

def ValidIdentifier = rule {
  CharPredicate.AlphaNum ++ "_"
}

CharPredicate from (_.isSpaceChar)
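A rough sketch of how such custom predicates behave outside a rule. It assumes the parboiled2 2.1.0 dependency shown earlier is on the classpath; the object name PredicateDemo and its val names are made up for this illustration.

```scala
import org.parboiled2._

// Hypothetical holder object; only CharPredicate comes from parboiled2.
object PredicateDemo {
  // Visible ASCII characters without either quote character.
  val AllButQuotes: CharPredicate = CharPredicate.Visible -- "\"" -- "'"

  // Letters, digits and the underscore.
  val IdentifierChar: CharPredicate = CharPredicate.AlphaNum ++ "_"

  // A CharPredicate is a Char => Boolean, so it can be applied directly.
  val underscoreOk: Boolean = IdentifierChar('_') // true
  val quoteOk: Boolean = AllButQuotes('"')        // false
}
```

Keeping the predicates in vals means they are built once and can be reused inside any rule.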

anyOf/noneOf

def ArithmeticOperation = rule {
  anyOf("+-*/^")
}

// noneOf matches every character that is NOT listed
def NonWhiteSpaceChar = rule { noneOf(" \t\n") }

N times

def cows = rule { 1000 times "cow" }

def PRI = rule { 1 to 3 times Digit }

0+/1+

// same as Whitespace.*
def OptWs = rule { zeroOrMore(Whitespace) }

// same as Digit.+
def UInt = rule { oneOrMore(Digit) }

def CommaSeparatedNumbers = rule { oneOrMore(UInt).separatedBy(",") }
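To show what the repetition rules yield once values are pushed, here is a small sketch. It assumes parboiled2 2.1.0 on the classpath; NumbersParser and NumbersDemo are names invented for the example, and it relies on capture and ~>, which convert matched text into values.

```scala
import org.parboiled2._

// Hypothetical parser; the rule names mirror the slide above.
class NumbersParser(val input: ParserInput) extends Parser {
  // capture pushes the matched text; ~> converts it to an Int.
  def UInt: Rule1[Int] = rule {
    capture(oneOrMore(CharPredicate.Digit)) ~> (_.toInt)
  }

  // oneOrMore over a Rule1[Int] collects the values into a Seq[Int].
  def CommaSeparatedNumbers: Rule1[Seq[Int]] = rule {
    oneOrMore(UInt).separatedBy(",") ~ EOI
  }
}

object NumbersDemo {
  // run() returns a scala.util.Try by default.
  val parsed = new NumbersParser("1,22,333").CommaSeparatedNumbers.run()
}
```

The separator is only matched between elements, so an input such as "1,,2" is rejected.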

Sequence

import CharPredicate.Digit

// "yyyy-mm-dd"
def SimplifiedRuleForDate = rule { Year ~ "-" ~ Month ~ "-" ~ Day }

def Year = rule { Digit ~ Digit ~ Digit ~ Digit }

def Month = rule { Digit ~ Digit }
def Day = rule { Digit ~ Digit }

Optional

// zeroOrOne
def Newline = rule { optional('\r') ~ '\n' }

// the same rule with the postfix operator
def Newline = rule { '\r'.? ~ '\n' }

Ordered choice

def Signum = rule { '+' | '-' }

def bcd = rule { 'b' ~ 'c' | 'b' ~ 'd' }

Order matters

// why order matters
def Operator = rule {
  "+=" | "-=" | "*=" | "++" | "--" | "+" | "-" | "*" | "/" ...
}

def Operators = rule {
  ("+" ~ ("=" | "+").?) |
  ("-" ~ ("=" | "-").?) | ...
}
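A minimal sketch of why the longer alternative has to come first. It assumes parboiled2 2.1.0 on the classpath; OperatorParser, ShortFirst, LongFirst and OrderDemo are hypothetical names used only here.

```scala
import org.parboiled2._

// Hypothetical parser with the same two alternatives in different order.
class OperatorParser(val input: ParserInput) extends Parser {
  // "+" is tried first and succeeds on "+=", so "+=" is never tried;
  // EOI then fails on the leftover "=".
  def ShortFirst: Rule0 = rule { ("+" | "+=") ~ EOI }

  // The longer alternative is tried first, so "+=" is consumed as a whole.
  def LongFirst: Rule0 = rule { ("+=" | "+") ~ EOI }
}

object OrderDemo {
  val shortFirstOk = new OperatorParser("+=").ShortFirst.run().isSuccess // false
  val longFirstOk  = new OperatorParser("+=").LongFirst.run().isSuccess  // true
}
```

An ordered choice commits to the first alternative that matches and does not come back to the others when a later part of the sequence fails.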

Running the parser

class MyParser(val input: ParserInput) extends Parser {
  def MyStringRule: Rule0 = rule {
    ignoreCase("match") ~ EOI
  }
}

Running the parser

val p1 = new MyParser("match")
val p2 = new MyParser("much")

p1.MyStringRule.run() // Success

p2.MyStringRule.run() // Failure

Different delivery schemes are also available
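With the default delivery scheme, run() returns a scala.util.Try, so the result can be pattern matched. The sketch below assumes parboiled2 2.1.0 on the classpath; MatchParser and RunDemo.check are illustrative names, and formatError is the parser's built-in helper for rendering a ParseError.

```scala
import org.parboiled2._
import scala.util.{Failure, Success}

// Same grammar as MyParser above, under a name of its own.
class MatchParser(val input: ParserInput) extends Parser {
  def MyStringRule: Rule0 = rule { ignoreCase("match") ~ EOI }
}

object RunDemo {
  // Returns "ok" on success, otherwise a readable error description.
  def check(text: String): String = {
    val parser = new MatchParser(text)
    parser.MyStringRule.run() match {
      case Success(_)             => "ok"
      case Failure(e: ParseError) => parser.formatError(e)
      case Failure(e)             => s"unexpected error: ${e.getMessage}"
    }
  }
}
```

For example, RunDemo.check("MATCH") yields "ok", while RunDemo.check("much") yields a formatted parse error.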

BKV

server.name = "webserver"
server {
  port = "8080"
  address = "192.168.88.88"

  settings {
    greeting_message = "Hello!\n It's me!"
  }
}

Performance

Unroll n.times for n <= 4

// Slower
rule { 4 times Digit }

// Faster
rule { Digit ~ Digit ~ Digit ~ Digit }

Faster stack operations

// Much faster
// #(c) stands for a helper that turns a digit character into its Int value (c - '0')
def Digit4 = rule {
  Digit ~ Digit ~ Digit ~ Digit ~
    push(#(charAt(-4))*1000 + #(charAt(-3))*100 + #(charAt(-2))*10 + #(lastChar))
}

Do not recreate CharPredicate

class MyParser(val input: ParserInput) extends Parser {
  // a val, so the predicate is built once per parser
  val Uppercase = CharPredicate.from(_.isUpper)
}

Use predicates

def foo = rule { capture(zeroOrMore(noneOf("\n"))) }

// loops here: !'\n' is a predicate and consumes no input
def foo = rule { capture(zeroOrMore(!'\n')) }

def foo = rule { capture(zeroOrMore(!'\n' ~ ANY)) }

Best Practices

● Unit tests
● Small rules
● Decomposition
● Case objects instead of strings

Push case objects

def LogLevel = rule {
  capture("info" | "warning" | "error")
}

def LogLevel = rule {
  "info" ~ push(LogLevel.Info) |
  "warning" ~ push(LogLevel.Warning) |
  "error" ~ push(LogLevel.Error)
}

Simple syntax for object capture

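// AST is assumed to be the common base type of the tree nodes
case class Text(s: String) extends AST

def charsAST: Rule1[AST] = rule {
  capture(Chars) ~> ((s: String) => Text(s))
}

// shorter: pass the case class companion directly
def charsAST = rule {
  capture(Chars) ~> Text
}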
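The short form can be tried end to end with a sketch like the following. It assumes parboiled2 2.1.0 on the classpath; the Chars definition (letters only) and the TextParser name are assumptions made for the example, since the slides do not define them.

```scala
import org.parboiled2._

case class Text(s: String)

// Hypothetical parser; Chars is assumed to mean "one or more letters".
class TextParser(val input: ParserInput) extends Parser {
  def Chars: Rule0 = rule { oneOrMore(CharPredicate.Alpha) }

  // capture pushes the matched String; ~> Text wraps it in the case class.
  def CharsAST: Rule1[Text] = rule { capture(Chars) ~> Text ~ EOI }
}

object TextDemo {
  val parsed = new TextParser("hello").CharsAST.run() // Success(Text(hello))
}
```

Passing the companion object works because the companion of a case class can be used as its constructor function.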

Named rules

def Header: Rule1[Header] =
  rule("I am header") { ... }

def Header: Rule1[Header] = namedRule("header") { ... }

def UserName = rule {
  Prefix ~ oneOrMore(NameChar).named("username")
}

Migration

● Separate classpath: org.parboiled vs. org.parboiled2
● Grammar is hard to break
● Composition: trait → abstract class
● Removing primitives library

Drawbacks

● PEG (absence of lexer)
● No support for left-recursive grammars
● No error recovery mechanism
● No IDE support
● No support for indentation-based grammars
● Awful, uninformative error messages

Q/A
