yacc no more

Session # 2221 YACC no more Sriram Srinivasan (“Ram”) Integrating parsers, interpreters and compilers into your application

Post on 09-Jan-2016

Page 1: YACC no more

Session # 2221

YACC no more

Sriram Srinivasan (“Ram”)

Integrating parsers, interpreters and compilers into your application

Page 2: YACC no more

This is he

• Sriram Srinivasan
• One of the core engineers of the WebLogic app server
  – Wrote the first commercially available EJB implementation
  – Wrote the TP engine in the WLS
• Author: “Advanced Perl Programming” (O’Reilly)

Page 3: YACC no more

Why this talk?

• Quest for higher level programming patterns
  – More productive / faster / maintainable etc…
• Integrating compilers, parsers, interpreters into your application

Page 4: YACC no more

Embeddable Parsers

• JDK parsers for configuration data
  – java.util.Properties, XML, regex library
• java.util.Properties
  – Limited to “property = value” format
  – Takes care of comments, multi-line values, quotes

Case Study: Configuration Data

    # app server properties
    connectionPoolName = testPool
    numThreads = 10

    …
    Properties p = new Properties();
    p.load(inputStream);   // load() returns void, so it cannot be chained
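
A self-contained version of the Properties case study might look like the sketch below. The class name PropertiesDemo and the helper parse() are made up for illustration; only java.util.Properties and its load() and getProperty() methods come from the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesDemo {
    /** Parses "property = value" text into a Properties object. */
    public static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text)); // comments and blank lines are skipped
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String config = "# app server properties\n"
                + "connectionPoolName = testPool\n"
                + "numThreads = 10\n";
        Properties p = parse(config);
        // Every value comes back as a String; numeric conversion is the caller's job.
        int numThreads = Integer.parseInt(p.getProperty("numThreads"));
        System.out.println(p.getProperty("connectionPoolName") + " " + numThreads);
    }
}
```

Note that values are plain strings: nothing is evaluated, which is the limitation the later slides address.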

Page 5: YACC no more

XML parsers

• Good for structured, hierarchical data
• DOM (Document Object Model) parser
  – Converts an entire XML document into a corresponding tree of Nodes.
• SAX (Simple API for XML)
  – Callback class extends DefaultHandler
  – Supplies methods for startDocument(…), startElement(…), endElement(…) etc.
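
As a rough illustration of the SAX callback style, the sketch below extends DefaultHandler and records element events in document order. SaxDemo, ElementLogger and events() are illustrative names, not part of the talk; the parser classes are the standard JDK ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    /** Callback class: the parser calls these methods as it walks the document. */
    static class ElementLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    /** Parses the XML text and returns the element events that were seen. */
    public static List<String> events(String xml) {
        ElementLogger handler = new ElementLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(events("<server><pool name=\"testPool\"/></server>"));
    }
}
```

Unlike DOM, no tree is kept in memory; the handler decides what to retain.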

Page 6: YACC no more

Adding code to data

• Problem: We want to add macros and expressions to our properties.

    numThreads = numProcessors
    # Ensure that connection pool is smaller than
    # thread pool.
    connectionPoolSize = min(numThreads - 2, 1)

• This requires an expression evaluator
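
To give a feel for what "an expression evaluator" involves when written by hand, here is a minimal recursive descent sketch. It assumes a tiny grammar invented for this example (integers, names, "+", "-", parentheses, and two-argument min/max); ExprEval and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class ExprEval {
    private final String src;
    private final Map<String, Integer> vars;
    private int pos = 0;

    private ExprEval(String src, Map<String, Integer> vars) {
        this.src = src;
        this.vars = vars;
    }

    /** Evaluates src, looking up names in vars. */
    public static int eval(String src, Map<String, Integer> vars) {
        ExprEval e = new ExprEval(src, vars);
        int v = e.expr();
        e.skipSpaces();
        if (e.pos != src.length()) throw new IllegalArgumentException("Unexpected input at " + e.pos);
        return v;
    }

    // expr := term (('+' | '-') term)*
    private int expr() {
        int v = term();
        while (true) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }

    // term := NUMBER | NAME | NAME '(' expr ',' expr ')' | '(' expr ')'
    private int term() {
        if (accept('(')) {
            int v = expr();
            expect(')');
            return v;
        }
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected operand at " + pos);
        String tok = src.substring(start, pos);
        if (Character.isDigit(tok.charAt(0))) return Integer.parseInt(tok);
        if (accept('(')) { // function call; only min and max are known
            int a = expr();
            expect(',');
            int b = expr();
            expect(')');
            if (tok.equals("min")) return Math.min(a, b);
            if (tok.equals("max")) return Math.max(a, b);
            throw new IllegalArgumentException("Unknown function " + tok);
        }
        Integer v = vars.get(tok);
        if (v == null) throw new IllegalArgumentException("Undefined name " + tok);
        return v;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = new HashMap<>();
        vars.put("numThreads", 10);
        System.out.println(eval("min(numThreads - 2, 1)", vars));
    }
}
```

Even this toy version needs error handling, whitespace handling and a symbol table, which is the argument for embedding an existing interpreter instead.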

Page 7: YACC no more

Embeddable interpreters

• Plethora of free, high quality interpreters available
  – BeanShell (Java-like syntax)
  – Rhino (JavaScript)
  – Jython (Python in Java)
  – Kawa (Scheme in Java)
• When embedded, flow of control passes easily from java to the interpreter and back.
• Command-line shell always included

Page 8: YACC no more

BeanShell

• Expressions identical to java
• Types are inferred dynamically

    add( a, b ) { return a + b; }

    sum = add(1, 2);             // 3
    str = add("Web", "Logic");   // "WebLogic"

Page 9: YACC no more

Embedding BeanShell

    import bsh.Interpreter;
    import java.io.FileReader;

    Interpreter i = new Interpreter();
    i.set("foo", 5);
    i.eval("bar = foo*10");
    System.out.println("bar = " + i.get("bar"));

    i.eval(new FileReader("config.properties"));
    Integer n = (Integer) i.get("connectionPoolSize");   // get() returns Object

• Instead of writing code to parse the properties file, just eval it!
  – Comments should be “// …”, not “# …”
  – Each property definition line should end in “;”

Page 10: YACC no more

BeanShell features

• Strict java expression syntax – no class declarations
• Loose convenience syntax

    b = new java.awt.Button();
    b.label = "Yo";   // eqvt. to b.setLabel("Yo")
    h = new Hashtable();
    h{"spud"} = "potato";
    // Swing stuff
    b = new JButton("My Button");
    f = new JFrame("My Frame");
    f.getContentPane().add(b, "Center");
    f.pack();
    f.show();

Page 11: YACC no more

Rhino

• Free ECMAScript interpreter from Mozilla

• Slightly more cumbersome to embed than BeanShell

• Contains bytecode compiler that can be called from within java

• Closures

• Regex support built-in. Good for text manipulation

Page 12: YACC no more

Case study: Command pattern

• Undo/Redo in an editor

    function insertCommand(text) {
        this.pos = buf.pos
        buf.insert(text)
        this.len = text.length
        this.undo = function () {
            buf.moveTo(this.pos)
            buf.erase(this.len);
        }
        undoStack.push(this);
    }

    new insertCommand("foo")
    undoStack.pop().undo()
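
For comparison, roughly the same idea can be sketched in java with lambdas, which postdate this talk. UndoDemo is a hypothetical class; a StringBuilder stands in for the editor buffer, and text is always appended at the end to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class UndoDemo {
    private final StringBuilder buf = new StringBuilder();
    private final Deque<Runnable> undoStack = new ArrayDeque<>();

    /** Appends text and pushes a closure that knows how to remove it again. */
    public void insert(String text) {
        final int pos = buf.length();
        final int len = text.length();
        buf.append(text);
        // The lambda captures pos and len, like this.pos and this.len in the script.
        undoStack.push(() -> buf.delete(pos, pos + len));
    }

    /** Reverts the most recent insert, if there is one. */
    public void undo() {
        if (!undoStack.isEmpty()) undoStack.pop().run();
    }

    public String text() {
        return buf.toString();
    }

    public static void main(String[] args) {
        UndoDemo editor = new UndoDemo();
        editor.insert("foo");
        editor.insert("bar");
        editor.undo();
        System.out.println(editor.text());
    }
}
```

Each command carries its own undo logic, so the stack never needs to know what kind of edit it is reverting.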

Page 13: YACC no more

Python

• Python (Java implementation is "Jython") – powerful high-level language
  – Compiles to bytecode.
  – True scripting language
  – Can extend java classes
  – Static compilation and standalone execution

Page 14: YACC no more

More case studies

• Embedded expressions
  – Spreadsheet formulae
• Customizable GUIs
  – Macro facility, keyboard mapping
• Remote agents
• Monitoring
• Performance through partial evaluation

Page 15: YACC no more

Case Study: Remote Agents

• Example: Test Agents
• Can upload script to each agent to launch processes, control them locally.
  – Jython is well-suited for this kind of task
• Example: Scriptable IMAP mail server
  – "All messages that contain this regex, make a copy in this folder"

Page 16: YACC no more

Case Study: Monitoring

• SNMP model: Obtain attributes from each node over the network, do calculation
• Alternatively, upload script to each node, and let it return the result
  – Conserves network bandwidth
• Can insert any kind of probe
• Study application data structures
• Application-specific profiling

Page 17: YACC no more

Case Study: Performance

• Partial evaluation can yield substantial performance benefits
• Object - RDBMS adaptors
  – Code generator studies class and db schema
  – Omits unnecessary conversions, null checks
• Vector dot product

    dp = a[0]*b[0] + a[1]*b[1] + a[2]*b[2];

    // But if 'a' is fixed {16,0,4} …
    // (parentheses needed: '+' binds tighter than '<<')
    dp = (b[0] << 4) + (b[2] << 2);
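
The specialization step itself can be sketched as follows. DotProductSpecializer and specialize() are hypothetical names; the sketch builds a chain of closures rather than generating source or bytecode, so it shows the idea (drop zero terms, turn single-bit coefficients into shifts) rather than the speedup a real code generator might achieve.

```java
import java.util.function.ToIntFunction;

public class DotProductSpecializer {
    /** General version: multiplies every pair on every call. */
    public static int dot(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    /** Builds a function of b alone, doing the work that depends only on 'a' once. */
    public static ToIntFunction<int[]> specialize(int[] a) {
        ToIntFunction<int[]> f = b -> 0;
        for (int i = 0; i < a.length; i++) {
            final int idx = i;
            final int coeff = a[i];
            if (coeff == 0) continue; // zero terms are dropped entirely
            final ToIntFunction<int[]> rest = f;
            if (Integer.bitCount(coeff) == 1) {
                // exactly one bit set: the multiply becomes a shift
                final int shift = Integer.numberOfTrailingZeros(coeff);
                f = b -> rest.applyAsInt(b) + (b[idx] << shift);
            } else {
                f = b -> rest.applyAsInt(b) + coeff * b[idx];
            }
        }
        return f;
    }

    public static void main(String[] args) {
        ToIntFunction<int[]> dp = specialize(new int[] {16, 0, 4});
        System.out.println(dp.applyAsInt(new int[] {1, 2, 3}));
    }
}
```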

Page 18: YACC no more

Generating java

• Moving from embedded interpreters to generating java source
  – Example: JSP.
• Convert template to java, compile and dynamically load
• BEA/WebLogic's weblogic.dtdc
  – Converts XML DTD to a high performance SAX parser tuned to that DTD
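
The "convert, compile, dynamically load" cycle can be sketched with the javax.tools compiler API, which was added to the JDK after this talk and needs a full JDK at run time. GenerateAndLoad and the generated class GeneratedPage are made-up names, and the "template" here is just a single line of static text, far simpler than a JSP.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class GenerateAndLoad {
    /** Turns a one-line template into java source, compiles it, loads it, and calls it. */
    public static String render(String template) {
        try {
            // Step 1: convert the template to source (escaped so it is a legal string literal).
            String literal = template.replace("\\", "\\\\").replace("\"", "\\\"");
            String source = "public class GeneratedPage {\n"
                    + "    public static String render() { return \"" + literal + "\"; }\n"
                    + "}\n";

            // Step 2: compile. getSystemJavaCompiler() returns null when only a JRE is present.
            Path dir = Files.createTempDirectory("generated");
            Path file = dir.resolve("GeneratedPage.java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) throw new IllegalStateException("A JDK is required");
            int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (status != 0) throw new IllegalStateException("Compilation failed");

            // Step 3: load the fresh class and invoke it reflectively.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Method m = loader.loadClass("GeneratedPage").getMethod("render");
                return (String) m.invoke(null);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not generate page", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("Hello from generated code"));
    }
}
```

Compilation is slow, so this approach pays off only when the generated class is reused many times.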

Page 19: YACC no more

Generating code with Doclets

• javadoc is a general purpose parser

    javadoc -doclet ListClass foo.java

• ListClass.start() called with a hierarchy of *Doc nodes

    import com.sun.javadoc.*;

    public class ListClass {
        public static boolean start(RootDoc root) {
            ClassDoc[] classes = root.classes();
            for (int i = 0; i < classes.length; ++i) {
                System.out.println(classes[i]);
            }
            return true;
        }
    }

• Arbitrary tags can be introduced at any level

Page 20: YACC no more

Case study: iContract

• Pattern: doclet expressions converted to annotated java code

    /**
     * Ensure that argument is always >= 0
     * @pre f >= 0.0
     *
     * Ensure that the function produces the sqrt
     * within a
     * @post Math.abs((return * return) - f) < 0.001
     */
    public float sqrt(float f) { ... }
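
What the instrumented method might amount to can be written by hand as below. This is only a guess at the shape of such code, not iContract's actual output; CheckedSqrt and the exception types are assumptions made for the example.

```java
public class CheckedSqrt {
    /** Hand-written equivalent of "@pre f >= 0.0" and the "@post" tolerance check. */
    public static float sqrt(float f) {
        // @pre f >= 0.0
        if (!(f >= 0.0f)) {
            throw new IllegalArgumentException("precondition violated: f >= 0.0");
        }
        float result = (float) Math.sqrt(f);
        // @post Math.abs((return * return) - f) < 0.001
        if (!(Math.abs(result * result - f) < 0.001)) {
            throw new IllegalStateException("postcondition violated");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sqrt(4.0f));
    }
}
```

The absolute tolerance is taken from the slide as-is; with float arithmetic it is likely too strict for very large arguments.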

Page 21: YACC no more

Case Study: EJBGen

    /**
     * @ejbgen:entity
     *   ejb-name = AccountEJB-OneToMany
     *   data-source-name = demoPool
     *   table-name = Accounts
     */
    abstract public class AccountBean implements EntityBean {
        /**
         * @ejbgen:cmp-field column = acct_id
         * @ejbgen:primkey-field
         * @ejbgen:remote-method transaction-attribute = Required
         */
        abstract public String getAccountId();
        // ...
    }

Page 22: YACC no more

Generating bytecode

• Example: WebLogic RMI adaptors
• Sometimes, some facilities are available only in bytecode (goto's!)
• Example: fast string matching
  – Given a search string, encode the state machine into bytecode
  – Worth it if the same pattern is going to be used many times
    • Virus scanners
    • Searching genome sequences

Page 23: YACC no more

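Example: String matching

• Problem: match "10100"
  – Convert to a state machine
  – Each state encodes a successful prefix match

[Figure: state machine with states S0 to S5; the forward transitions S0 → S1 → S2 → S3 → S4 → S5 are labeled 1, 0, 1, 0, 0, with further edges back to earlier states on a mismatch]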

Page 24: YACC no more

String matching (contd.)

• If only goto were allowed in java …

    try {
        // buf is the buffer to be searched
        int i = -1;
        s0: i++; if (buf[i] != '1') goto s0;
        s1: i++; if (buf[i] != '0') goto s1;
        s2: i++; if (buf[i] != '1') goto s0;
        s3: i++; if (buf[i] != '0') goto s1;
        s4: i++; if (buf[i] != '0') goto s3;
        s5: i++; return i-5;
    } catch (ArrayIndexOutOfBoundsException e) {
        return -1;
    }

• But, goto's are allowed in bytecode!
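
Without goto, the same state machine can still be written in plain java as a loop over a state variable. The sketch below is one such rendering; Match10100 is a hypothetical class and, like the pseudocode, it assumes the buffer contains only '0' and '1'.

```java
public class Match10100 {
    /** Returns the index of the first "10100" in buf, or -1 if it does not occur. */
    public static int indexOf(char[] buf) {
        int state = 0; // number of pattern characters matched so far
        for (int i = 0; i < buf.length; i++) {
            boolean one = buf[i] == '1';
            switch (state) {
                case 0: state = one ? 1 : 0; break; // ""     : wait for the first 1
                case 1: state = one ? 1 : 2; break; // "1"    : another 1 still leaves "1"
                case 2: state = one ? 3 : 0; break; // "10"   : "100" matches nothing
                case 3: state = one ? 1 : 4; break; // "101"  : "1011" leaves "1"
                case 4: state = one ? 3 : 5; break; // "1010" : "10101" leaves "101"
                default: throw new AssertionError("unreachable state " + state);
            }
            if (state == 5) return i - 4; // i is the last matched character
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("0010100".toCharArray()));
    }
}
```

The switch costs a dispatch per character that the goto version avoids, which is the motivation for emitting bytecode directly.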

Page 25: YACC no more

String matching (contd.)

• Using an assembler like jasmin

        iconst_m1
        istore_1
    S0: ;; i++; if a[i] != '1' goto S0;
        iinc 1 1        ; i++
        aload_0         ; load a[i]
        iload_1
        caload
        bipush 49       ; load '1'
        if_icmpne S0    ; if .. goto S0
    S1: ;; i++; if a[i] != '0' goto S1
        iinc 1 1
        aload_0
        iload_1
        caload
        bipush 48
        if_icmpne S1

Page 26: YACC no more

Custom languages

• Craft a language that fits the context you are working in
  – Avoid XML ugliness: SRML (Simple Rule Markup)
  – Instead of "if s.purchaseAmount > 100 … "

    <simpleCondition className="ShoppingCart" objectVariable="s">
      <binaryExp operator="gt">
        <field name="purchaseAmount"/>
        <constant type="float" value="100"/>
      </binaryExp>
    </simpleCondition>

Page 27: YACC no more

Antlr Introduction

• Antlr: Generates recursive descent parsers with configurable lookahead (LL(k) parser)
• Much, much simpler than lex/yacc
  – Yacc error messages are cryptic, tough for non-CS types to understand
  – Even generated code easy to understand
• Includes tree building and recognition
  – No such facility in yacc
• Lexer, parser and tree recognizer phase have similar syntax

Page 28: YACC no more

Antlr

• Example: hierarchical property list
  – A list consists of name value pairs
  – Names are identifiers, values are numbers or lists

    ( a 200 b (c 10 d 20))

Page 29: YACC no more

Antlr (contd.)

class LispLexer extends Lexer;

ID : ('a' .. 'z')+;

NUM: ('0' .. '9')+;

LP : '(';

RP : ')';

class LispParser extends Parser;

list : LP (nameValuePair)+ RP;

nameValuePair : ID value ;

value : NUM | list;

Page 30: YACC no more

Antlr (contd.)

• Adding code, arguments, return values

    nameValuePair returns [NVP ret=null]
        {Object v;}
        : t:ID v=value
          {ret = new NVP(t.getText(),v);}
        ;

    value returns [Object ret=null]
        : t:NUM {ret=t.getText();}
        | ret=list
        ;
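
To show what a recursive descent parser for this grammar does, here is a hand-written sketch with one method per rule. It is not the code Antlr generates; PropertyListParser is a made-up class, the result is a nested Map instead of NVP objects, and numbers are converted to Integer where the grammar action keeps the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyListParser {
    private final String src;
    private int pos = 0;

    private PropertyListParser(String src) { this.src = src; }

    /** Parses text such as "( a 200 b (c 10 d 20))" into nested maps. */
    public static Map<String, Object> parse(String src) {
        PropertyListParser p = new PropertyListParser(src);
        Map<String, Object> result = p.list();
        p.skipSpaces();
        if (p.pos != src.length()) throw new IllegalArgumentException("Trailing input at " + p.pos);
        return result;
    }

    // list : LP (nameValuePair)+ RP ;
    private Map<String, Object> list() {
        expect('(');
        Map<String, Object> pairs = new LinkedHashMap<>();
        do {
            nameValuePair(pairs);
        } while (peek() != ')');
        expect(')');
        return pairs;
    }

    // nameValuePair : ID value ;
    private void nameValuePair(Map<String, Object> into) {
        String name = token('a', 'z');
        into.put(name, value());
    }

    // value : NUM | list ;   (one character of lookahead picks the alternative)
    private Object value() {
        if (peek() == '(') return list();
        return Integer.valueOf(token('0', '9'));
    }

    // Lexer rules ID and NUM: one or more characters in the range lo..hi.
    private String token(char lo, char hi) {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= lo && src.charAt(pos) <= hi) pos++;
        if (start == pos) throw new IllegalArgumentException("Unexpected input at " + pos);
        return src.substring(start, pos);
    }

    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        pos++;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(parse("( a 200 b (c 10 d 20))"));
    }
}
```

The grammar file expresses the same structure in a few lines and leaves the lookahead and error handling to the generator.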

Page 31: YACC no more

Way out there …

• Configurable hardware
  – New circuits on the fly
• Intentional programming
  – Code not represented as a stream of characters

Page 32: YACC no more

Summary

• Run-time evaluation gives you a lot of power

• Other languages add features (e.g. closures) to java

• Lots of simple, free, quality parsers, interpreters

• Produce custom java source or byte code for performance

• Roll your own domain-specific language with ANTLR or javacc.

• Yacc No More.

Page 33: YACC no more

References

• Doclets
  – Doclet tools: www.doclet.com
  – EJBGen: www.beust.com, Cedric Beust
  – iContract: www.reliable-systems.com, Reto Kramer
• Languages, interpreters
  – BeanShell: www.beanshell.org
  – Rhino: www.mozilla.org/rhino
  – Python: www.python.org, www.jython.org
  – ANTLR: www.antlr.org
  – More … flp.cs.tu-berlin.de/~tolk/vmlanguages.html
• SRML: xml.coverpages.org/srml.html

Page 34: YACC no more

References (contd.)

• Bytecode manipulation:
  – Jasmin: mrl.nyu.edu/~meyer/jasmin/
  – Jikes Bytecode toolkit: www.alphaworks.ibm.com/tech/jikesbt
  – BCEL: bcel.sourceforge.net
• "Rapid" - Reconfigurable hardware
  – www.cs.washington.edu/research
• "The death of computer languages, the birth of intentional programming", Charles Simonyi
  – research.microsoft.com/scripts/pubs/trpub.asp
  – Microsoft tech report MSR-TR-95-52
• Thinking in Patterns with Java, Bruce Eckel
  – www.mindview.net/Books/TIPatterns