language oriented programming and language workbenches: building domain languages atop java neal...
TRANSCRIPT
Language Oriented Programming and
Language Workbenches: Building Domain
Languages atop Java
Neal FordApplication Architect
ThoughtWorkswww.nealford.com
Blog: memeagora.blogspot.com
2
Questions, Slides, and Samples
• Please feel free to ask questions anytime• The slides and samples will be available at
www.nealford.com• I’ll show that address again at the end
3
Dynamic Languages
• Look at the way experienced developers use dynamic languages (like Lisp or Ruby)• Developers build meta-languages on top of the dynamic
language• Sometimes they create layers of languages, each closer to the
problem domain and further from the language domain
• The Unix tradition of “little languages”• Active Data Models
4
Examples
• The first on-line “build your own” ecommerce solution created in Lisp• Documented in Hackers and Painters by Paul Graham
• Implementing a PHP-like language in Lisp to create Lisp HTML templates• Created by Gene Michael Stover, Documented in Dr. Dobbs
Journal, Oct 2004• Managed to recreate most of the functionality of PHP (with
some additions) in 200 lines of Lisp code
5
Philosophy
• Well stated in The Art of Unix Programming by Eric Raymond• In Chapter 1, Section 1.6.14• The Rule of Generation: Avoid hand-hacking; write programs to
write programs when you can
• The central idea is that programming languages reside at too low a level
• This notion has been neglected in statically typed modern languages
• But it’s still a good idea!
6
Nomenclature
• Some terminology coined by Martin Fowler
• Domain Specific Language• A limited form of computer language designed for a specific class
of problems.
• Some communities like to use DSL only for problem domain languages, but I'm following the usage that uses DSL for any limited domain.
• Language Oriented Programming• The general style of development which operates about the idea
of building software around a set of domain specific languages
• Language Workbench• A generic term for a new breed of tools that makes it easy to
perform language oriented programming
7
Building Domain Languages
• This session talks about building your own domain languages in and around Java
• The motivation for building DSLs• Internal DSLs
• Built using Java syntax
• External DSL’s• Built using “compiler compilers”
• Language Workbenches
8
Domain Languages Already Underfoot
• Two examples:• The Front Controller in a J2EE web framework• Reading a positional text file
• We’re talking about two types of code• The web framework code; the Reader and Strategy classes
• Abstractions to enable functionality
• The properties file; the configureXXX() methods• Configuration information, utilizing the abstraction framework
• Framework abstraction and configuration are 2 different things• Why do you think there is so much XML in J2EE frameworks?
9
Internal Domain Specific Languages
• These are languages built using the syntactic elements of the underlying language
• In the case of Java, building a DSL using Java classes and methods
10
Java Building Blocks
• Java (and other similar language) make it a little tougher• That annoying strong typing• The fairly rigid infrastructure required by the language• (Not that there’s anything wrong with this)
• You can mitigate the shortcomings of Java by using some coding standards
11
Standards: Composed Method
• Composed method (originally in Smalltalk Best Practice Patterns by Kent Beck)
• Method names are long, readable descriptions of what the method does
• Every method is as discrete as possible• No method longer than 15 lines of code
• Some limit this to 5!
• All public methods read as outlines of tasks, implemented as private methods
12
Composed Method
• Allows you to create methods that mean something in the problem domain
• Each method should perform 1 (and only 1) application to the problem domain
• If the problem domain insists on multi-step operations, model is as composed methods
febLog.add(
new Swim().
onDate("02/01/2005").
forDistance(1250));
13
Example: Workout Log
• This application demonstrates:• Building your own domain language
• Infrastructure
• Application
• Interesting Classes• Exercise (and sub-classes)
• Log
• MonthlyExerLog
• Improvements:• Better robustness• Less state dependent on object instances
14
What Would Make this Cleaner?
• Java doesn’t support operator overloading• One of the few times it would actually come in handy!
• Java’s strong typing forces you to “cruft” up the code just a bit
• Enumerations help• TypeSafe enum is ugly in this conext
• Java forces you to create a class with main()• The compiler is very picky• All goods things…except when you are creating a new
language on top of Java
15
Another Example: jMock
• jMock is a library for testing Java code using mock objects
• Typical jMock code
import org.jmock.*;
class PublisherTest extends MockObjectTestCase { public void testOneSubscriberReceivesAMessage() { Mock mockSubscriber = mock(Subscriber.class); Publisher publisher = new Publisher(); publisher.add((Subscriber) mockSubscriber.proxy()); final String message = "message"; // expectations mockSubscriber.expects(
once()).method("receive").with( eq(message) ); // execute publisher.publish(message); }}
16
Nomenclature
• The concrete syntax of a language is its syntax in its representation that we see
• Is this a legitimate way to write the reader configuration?<ReaderConfiguration> <Mapping Code = "SVCL" TargetClass = "dsl.ServiceCall"> <Field name = "CustomerName" start = "4" end = "18"/> <Field name = "CustomerID" start = "19" end = "23"/> <Field name = "CallTypeCode" start = "24" end = "27"/> <Field name = "DateOfCallString" start = "28" end = "35"/> </Mapping> <Mapping Code = "USGE" TargetClass = "dsl.Usage"> <Field name = "CustomerID" start = "4" end = "8"/> <Field name = "CustomerName" start = "9" end = "22"/> <Field name = "Cycle" start = "30" end = "30"/> <Field name = "ReadDate" start = "31" end = "36"/> </Mapping></ReaderConfiguration>
17
An Easier to Read Version
• Here is the same information, in another syntax
• This is a domain specific language, suitable only for the purpose of mapping fixed length fields into classes
• A classic example of a Unix “little language”
mapping SVCL dsl.ServiceCall 4-18: CustomerName 19-23: CustomerID 24-27 : CallTypeCode 28-35 : DateOfCallString
mapping USGE dsl.Usage 4-8 : CustomerID 9-22: CustomerName 30-30: Cycle 31-36: ReadDate
18
Nomenclature
• The XML file and the DSL file have different concrete syntaxes
• Both share the same basic structure• Multiple mappings• Each with
• A code
• A target class name
• A set of fields
• This basic structure is the abstract syntax of the language
19
Domain Specific Languages
• All three representations of the abstract syntax are in fact DSL• The Java reader configuration code• The XML equivalent• The simple text format
20
The Obligatory Ruby Slide
• What about this version?
• This is one of the reasons people rave about Ruby• Minimally intrusive syntax• Literals for ranges• Flexible runtime evaluation
• Its DSL and Ruby code all rolled into one!
mapping('SVCL', ServiceCall) doextract 4..18, 'customer_name'extract 19..23, 'customer_ID'extract 24..27, 'call_type_code'extract 28..35, 'date_of_call_string'
endmapping('USGE', Usage) do
extract 9..22, 'customer_name'extract 4..8, 'customer_ID'extract 30..30, 'cycle'extract 31..36, 'read_date'
end
21
External Domain Specific Languages
• External DSLs• Written in a different language than the main (host) language of
the application • Transformed into it using some form of compiler or interpreter
• May include• XML configuration files• Plain text configuration files• Full-blown languages
22
Building a Language with XML
• Another option is to use XML as the data container and Java as the “active” parts• Java classes read information from XML for data• Java methods form the domain layer
• Example• Every J2EE framework you’ve ever encountered!• Examples too numerous to mention
• XML is ugly!• Uglier than Java?• Which would your end users like less?• You can mitigate this with XSLT
• A little further away from the problem domain • Now you have XML and Java cruft to deal with
23
The Ultimate External DSL
• Build your own language!• You need 3 things
• A Grammar• A Parser• A Lexer
24
Using Antlr
• Antlr – ANother Tool for Language Recognition• http://www.antlr.org/• Antlr is an open source project that is used in many
other projects• It provides a framework for constructing recognizers,
compilers, and translators from grammatical descriptions
• Supports tree construction, tree walking, and translation
25
Using Antlr: “Hello Whomever”
• First, specify what the language will look like with a grammar
• The grammar is placed in a Lexer• The lexer is a scanner or tokenizer• Responsible for breaking the input stream into a series of
tokens
class L extends Lexer;
NAME: ( 'a'..'z'|'A'..'Z' )+ NEWLINE
;
NEWLINE
: '\r' '\n' // DOS
| '\n' // UNIX
;
26
Parser
• Next, you create a parser• The parser applies the language grammar to the input stream• It reports compilation errors• Both the lexer and parser can reside in the same file
class P extends Parser;
startRule
: n:NAME
{System.out.println( "Hi there, “ + n.getText());}
;
27
Generate the Java Code
• Next, you run the Antlr tool to generate 2 classes, and interface, and a text file: java antlr.Tool t.g
• ANTLR generates: • L.java
The lexical analyzer (Lexer). • P.java
The Parser. • PTokenTypes.java
The token type definitions (integer constants). The Lexer breaks up the input stream of characters into vocabulary symbols called tokens, which are identified by token types.
• PTokenTypes.txtA file containing all the token types that ANTLR can easily read back in if a parser in another file wants to use the vocabulary, lexer, or parser defined in t.g. You can look at this as a persistence file.
28
A Slightly More Interesting Example
• Let’s do math• Consider this lexer
class IntAndIDLexer extends Lexer;
INT : ('0'..'9')+ ;
ID : ('a'..'z')+ ;
COMMA: ',' ;
NEWLINE
: '\r' '\n' // DOS
| '\n' // UNIX
;
29
Math: Parser
class SeriesParser extends Parser;
series
{ int n = 1; // how many elements? At least 1 }
: element (COMMA element {n++;})*
{
System.out.println("there were "+ n + " elements");
}
;
/* Match either an INT or ID */
element
: a:INT { System.out.println(a.getText()); }
| b:ID { System.out.println(b.getText()); }
;
Match an element (INT or ID) with possibly a bunch of ", element" pairs to follow, matching input that looks like 32,a,size,28923,i
This is considered an initialization action and is done before recognition of this rule begins. These look like local variables to the resulting method SeriesParser.series()
30
Abstract Syntax Trees
• To perform real work, you frequently must make multiple passes over the source code
• Antlr makes this easy be facilitating the creation of Abstract Syntax Trees
• AST’s are a structured representation of the text input• You can either
• Walk them by hand• Specify an Antlr tree grammar that describes the tree structure
(along with appropriate actions)
31
AST Example
class SeriesTreeParser extends TreeParser;
series
: ( a:INT {System.out.println(a.getText());}
| b:ID {System.out.println(b.getText());}
)+
;
sumpass {
int sum = 0;
}
: ( a:INT {sum += Integer.parseInt(a.getText());}
| ID
)+
{System.out.println("sum is " + sum);}
;
Match a flat tree (a list) of one or more INTs or IDs. This rule differs from SeriesParser.series(), which is in a different grammar.
Sum up all the integers
32
AST Example
• When walking the tree, you can choose which target you want to execute
• Handy for creating multi-pass languages
parser.series();
// Get the tree out of the parser
AST resultTree = parser.getAST();
// Make an instance of the tree parser
SeriesTreeParser treeParser = new SeriesTreeParser();
treeParser.series(resultTree); // walk AST once
treeParser.sumpass(resultTree); // walk AST again!!
33
Domain Language with Antlr
• Here is a subset of the Exercise Log domain language, implemented with Antlr
• Changes• Weekdays instead of dates (because parsing dates is tougher)• Simple summary target• More “hard coded” values in the parse tree
• Could be made just as dynamic as the original, but at the expense of complexity
34
The ExerLang Domain Language
• Sample “source” file
• The interesting files:• Exer.g• Main
swim on WED for 2150run on TUE for 6.5bike on SAT for 50summary
35
Other Antlr Examples
• Antlr comes with grammars for a huge number of languages• HTML, Java, XML, etc.
• It has hooks for many other languages
36
External DSLs
• Advantages• You are free to use any form you like• Limited only by your ability to parse and lex the language
• Disadvantages• You have to build the translator• They lack symbolic integration• You don’t get the same support for your language as you get for
your base language• What are you going to edit it with? Emacs?• A common objection is that DSLs lead to language cacophony
• Not if used correctly!
37
Internal DSLs
• Advantages• You have the full power of the underlying language• You have full access to sophisticated tools
• Disadvantages• Hard to write in modern “curly brace” languages
• Easier in Ruby and Lisp
• Limited by the syntax and structure of the language• You must understand the base language or you are in syntax
trouble
38
Language Workbenches
• A language workbench is a tool that supports Language oriented programming
• A brand new software category• Today’s language workbenches
• Intentional Software (developed by Charles Simonyi)• Meta Programming System (developed by JetBrains)• Software Factories (developed by Microsoft)
39
Elements of a Language Workbench
• Traditional compilation
40
Post-IntelliJ IDEs
• IntelliJ isn’t just a cool IDE• They were the first mainstream tool that allowed you to
edit against the AST instead of text• That’s how refactoring and other intelligent support
works• Now all the Java IDEs do this
41
Language Workbenches
42
Language Workbenches
• The editable representation is merely a projection of the abstract representation• There is no need for the editable representation to be complete• Some aspects can be missing if they are needed for the task at
hand• You can have multiple projections - each showing a different
aspect of the abstract representation
• The storage representation still has to be worked out• The abstract representation has to be comfortable with
errors and ambiguities
43
Defining a DSL with a Workbench
• Three steps to designing a DSL• Define the abstract syntax (the schema of the abstract
representation)• Define an editor to let people manipulate the abstract
representation through a projection• Define a generator for the executable representation
44
JetBrains MPS
• Creating a Hello, World Language• Define a schema• Define an editor• Define a generator
• Notice that the semantics are completely removed from the implementation
• Schema is very loosely coupled to the generator
45
Summarizing the Tradeoffs in LOP
• The fundamental issue is the benefit of using DSLs versus the cost of building the necessary infrastructure• Using internal DSLs reduces the tool cost • Adds serious constraints on the DSL
• An external DSL gives you the most potential but at a higher cost• This is why LOP hasn’t caught on yet• The advent of Language Workbenches solves the external DSL
issue
46
External DSLs and Non-Programmers
• One of the supposed strengths of COBOL was that you could let non-programmers use it• The COBOL Inference - most technologies that are supposed to
eliminate professional programmers do nothing of the sort
• The Holy Grail of Language oriented programming is to create DSLs that non-programmers can read and write
• DSLs make everyone more productive (programmer and non-programmer)
• If we create code that a business analyst can verify, that’s a big win
• We don’t think that DSLs are the Holy Grail…but it’s closer than COBOL or Java
47
Summary
• Language oriented programming has the potential to be the next big revolution in programming paradigms• Once the tools and techniques permeate the market and
developers minds• The productivity of the tools starts making a big impact
• A level of encapsulation not possible in OOP• Languages close to the problem domain, not just
structures
Neal Ford
www.nealford.com
memeagora.blogspot.com
Questions?
Samples & slides at www.nealford.com