language oriented programming and language workbenches: building domain languages atop java neal...

48
Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks www.nealford.com www.thoughtworks.com [email protected] Blog: memeagora.blogspot.com

Upload: mildred-ball

Post on 24-Dec-2015

226 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

Language Oriented Programming and

Language Workbenches: Building Domain

Languages atop Java

Neal FordApplication Architect

ThoughtWorkswww.nealford.com

[email protected]

Blog: memeagora.blogspot.com

Page 2: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

2

Questions, Slides, and Samples

• Please feel free to ask questions anytime• The slides and samples will be available at

www.nealford.com• I’ll show that address again at the end

Page 3: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

3

Dynamic Languages

• Look at the way experienced developers use dynamic languages (like Lisp or Ruby)• Developers build meta-languages on top of the dynamic

language• Sometimes they create layers of languages, each closer to the

problem domain and further from the language domain

• The Unix tradition of “little languages”• Active Data Models

Page 4: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

4

Examples

• The first on-line “build your own” ecommerce solution created in Lisp• Documented in Hackers and Painters by Paul Graham

• Implementing a PHP-like language in Lisp to create Lisp HTML templates• Created by Gene Michael Stover, Documented in Dr. Dobbs

Journal, Oct 2004• Managed to recreate most of the functionality of PHP (with

some additions) in 200 lines of Lisp code

Page 5: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

5

Philosophy

• Well stated in The Art of Unix Programming by Eric Raymond• In Chapter 1, Section 1.6.14• The Rule of Generation: Avoid hand-hacking; write programs to

write programs when you can

• The central idea is that programming languages reside at too low a level

• This notion has been neglected in statically typed modern languages

• But it’s still a good idea!

Page 6: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

6

Nomenclature

• Some terminology coined by Martin Fowler

• Domain Specific Language• A limited form of computer language designed for a specific class

of problems.

• Some communities like to use DSL only for problem domain languages, but I'm following the usage that uses DSL for any limited domain.

• Language Oriented Programming• The general style of development which operates about the idea

of building software around a set of domain specific languages

• Language Workbench• A generic term for a new breed of tools that makes it easy to

perform language oriented programming

Page 7: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

7

Building Domain Languages

• This session talks about building your own domain languages in and around Java

• The motivation for building DSLs• Internal DSLs

• Built using Java syntax

• External DSL’s• Built using “compiler compilers”

• Language Workbenches

Page 8: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

8

Domain Languages Already Underfoot

• Two examples:• The Front Controller in a J2EE web framework• Reading a positional text file

• We’re talking about two types of code• The web framework code; the Reader and Strategy classes

• Abstractions to enable functionality

• The properties file; the configureXXX() methods• Configuration information, utilizing the abstraction framework

• Framework abstraction and configuration are 2 different things• Why do you think there is so much XML in J2EE frameworks?

Page 9: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

9

Internal Domain Specific Languages

• These are languages built using the syntactic elements of the underlying language

• In the case of Java, building a DSL using Java classes and methods

Page 10: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

10

Java Building Blocks

• Java (and other similar language) make it a little tougher• That annoying strong typing• The fairly rigid infrastructure required by the language• (Not that there’s anything wrong with this)

• You can mitigate the shortcomings of Java by using some coding standards

Page 11: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

11

Standards: Composed Method

• Composed method (originally in Smalltalk Best Practice Patterns by Kent Beck)

• Method names are long, readable descriptions of what the method does

• Every method is as discrete as possible• No method longer than 15 lines of code

• Some limit this to 5!

• All public methods read as outlines of tasks, implemented as private methods

Page 12: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

12

Composed Method

• Allows you to create methods that mean something in the problem domain

• Each method should perform 1 (and only 1) application to the problem domain

• If the problem domain insists on multi-step operations, model is as composed methods

febLog.add(

new Swim().

onDate("02/01/2005").

forDistance(1250));

Page 13: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

13

Example: Workout Log

• This application demonstrates:• Building your own domain language

• Infrastructure

• Application

• Interesting Classes• Exercise (and sub-classes)

• Log

• MonthlyExerLog

• Improvements:• Better robustness• Less state dependent on object instances

Page 14: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

14

What Would Make this Cleaner?

• Java doesn’t support operator overloading• One of the few times it would actually come in handy!

• Java’s strong typing forces you to “cruft” up the code just a bit

• Enumerations help• TypeSafe enum is ugly in this conext

• Java forces you to create a class with main()• The compiler is very picky• All goods things…except when you are creating a new

language on top of Java

Page 15: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

15

Another Example: jMock

• jMock is a library for testing Java code using mock objects

• Typical jMock code

import org.jmock.*;

class PublisherTest extends MockObjectTestCase { public void testOneSubscriberReceivesAMessage() { Mock mockSubscriber = mock(Subscriber.class); Publisher publisher = new Publisher(); publisher.add((Subscriber) mockSubscriber.proxy()); final String message = "message"; // expectations mockSubscriber.expects(

once()).method("receive").with( eq(message) ); // execute publisher.publish(message); }}

Page 16: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

16

Nomenclature

• The concrete syntax of a language is its syntax in its representation that we see

• Is this a legitimate way to write the reader configuration?<ReaderConfiguration> <Mapping Code = "SVCL" TargetClass = "dsl.ServiceCall"> <Field name = "CustomerName" start = "4" end = "18"/> <Field name = "CustomerID" start = "19" end = "23"/> <Field name = "CallTypeCode" start = "24" end = "27"/> <Field name = "DateOfCallString" start = "28" end = "35"/> </Mapping> <Mapping Code = "USGE" TargetClass = "dsl.Usage"> <Field name = "CustomerID" start = "4" end = "8"/> <Field name = "CustomerName" start = "9" end = "22"/> <Field name = "Cycle" start = "30" end = "30"/> <Field name = "ReadDate" start = "31" end = "36"/> </Mapping></ReaderConfiguration>

Page 17: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

17

An Easier to Read Version

• Here is the same information, in another syntax

• This is a domain specific language, suitable only for the purpose of mapping fixed length fields into classes

• A classic example of a Unix “little language”

mapping SVCL dsl.ServiceCall 4-18: CustomerName 19-23: CustomerID 24-27 : CallTypeCode 28-35 : DateOfCallString

mapping USGE dsl.Usage 4-8 : CustomerID 9-22: CustomerName 30-30: Cycle 31-36: ReadDate

Page 18: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

18

Nomenclature

• The XML file and the DSL file have different concrete syntaxes

• Both share the same basic structure• Multiple mappings• Each with

• A code

• A target class name

• A set of fields

• This basic structure is the abstract syntax of the language

Page 19: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

19

Domain Specific Languages

• All three representations of the abstract syntax are in fact DSL• The Java reader configuration code• The XML equivalent• The simple text format

Page 20: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

20

The Obligatory Ruby Slide

• What about this version?

• This is one of the reasons people rave about Ruby• Minimally intrusive syntax• Literals for ranges• Flexible runtime evaluation

• Its DSL and Ruby code all rolled into one!

mapping('SVCL', ServiceCall) doextract 4..18, 'customer_name'extract 19..23, 'customer_ID'extract 24..27, 'call_type_code'extract 28..35, 'date_of_call_string'

endmapping('USGE', Usage) do

extract 9..22, 'customer_name'extract 4..8, 'customer_ID'extract 30..30, 'cycle'extract 31..36, 'read_date'

end

Page 21: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

21

External Domain Specific Languages

• External DSLs• Written in a different language than the main (host) language of

the application • Transformed into it using some form of compiler or interpreter

• May include• XML configuration files• Plain text configuration files• Full-blown languages

Page 22: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

22

Building a Language with XML

• Another option is to use XML as the data container and Java as the “active” parts• Java classes read information from XML for data• Java methods form the domain layer

• Example• Every J2EE framework you’ve ever encountered!• Examples too numerous to mention

• XML is ugly!• Uglier than Java?• Which would your end users like less?• You can mitigate this with XSLT

• A little further away from the problem domain • Now you have XML and Java cruft to deal with

Page 23: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

23

The Ultimate External DSL

• Build your own language!• You need 3 things

• A Grammar• A Parser• A Lexer

Page 24: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

24

Using Antlr

• Antlr – ANother Tool for Language Recognition• http://www.antlr.org/• Antlr is an open source project that is used in many

other projects• It provides a framework for constructing recognizers,

compilers, and translators from grammatical descriptions

• Supports tree construction, tree walking, and translation

Page 25: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

25

Using Antlr: “Hello Whomever”

• First, specify what the language will look like with a grammar

• The grammar is placed in a Lexer• The lexer is a scanner or tokenizer• Responsible for breaking the input stream into a series of

tokens

class L extends Lexer;

NAME: ( 'a'..'z'|'A'..'Z' )+ NEWLINE

;

NEWLINE

: '\r' '\n' // DOS

| '\n' // UNIX

;

Page 26: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

26

Parser

• Next, you create a parser• The parser applies the language grammar to the input stream• It reports compilation errors• Both the lexer and parser can reside in the same file

class P extends Parser;

startRule

: n:NAME

{System.out.println( "Hi there, “ + n.getText());}

;

Page 27: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

27

Generate the Java Code

• Next, you run the Antlr tool to generate 2 classes, and interface, and a text file: java antlr.Tool t.g

• ANTLR generates: • L.java

The lexical analyzer (Lexer). • P.java

The Parser. • PTokenTypes.java

The token type definitions (integer constants). The Lexer breaks up the input stream of characters into vocabulary symbols called tokens, which are identified by token types.

• PTokenTypes.txtA file containing all the token types that ANTLR can easily read back in if a parser in another file wants to use the vocabulary, lexer, or parser defined in t.g. You can look at this as a persistence file.

Page 28: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

28

A Slightly More Interesting Example

• Let’s do math• Consider this lexer

class IntAndIDLexer extends Lexer;

INT : ('0'..'9')+ ;

ID : ('a'..'z')+ ;

COMMA: ',' ;

NEWLINE

: '\r' '\n' // DOS

| '\n' // UNIX

;

Page 29: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

29

Math: Parser

class SeriesParser extends Parser;

series

{ int n = 1; // how many elements? At least 1 }

: element (COMMA element {n++;})*

{

System.out.println("there were "+ n + " elements");

}

;

/* Match either an INT or ID */

element

: a:INT { System.out.println(a.getText()); }

| b:ID { System.out.println(b.getText()); }

;

Match an element (INT or ID) with possibly a bunch of ", element" pairs to follow, matching input that looks like 32,a,size,28923,i

This is considered an initialization action and is done before recognition of this rule begins. These look like local variables to the resulting method SeriesParser.series()

Page 30: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

30

Abstract Syntax Trees

• To perform real work, you frequently must make multiple passes over the source code

• Antlr makes this easy be facilitating the creation of Abstract Syntax Trees

• AST’s are a structured representation of the text input• You can either

• Walk them by hand• Specify an Antlr tree grammar that describes the tree structure

(along with appropriate actions)

Page 31: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

31

AST Example

class SeriesTreeParser extends TreeParser;

series

: ( a:INT {System.out.println(a.getText());}

| b:ID {System.out.println(b.getText());}

)+

;

sumpass {

int sum = 0;

}

: ( a:INT {sum += Integer.parseInt(a.getText());}

| ID

)+

{System.out.println("sum is " + sum);}

;

Match a flat tree (a list) of one or more INTs or IDs. This rule differs from SeriesParser.series(), which is in a different grammar.

Sum up all the integers

Page 32: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

32

AST Example

• When walking the tree, you can choose which target you want to execute

• Handy for creating multi-pass languages

parser.series();

// Get the tree out of the parser

AST resultTree = parser.getAST();

// Make an instance of the tree parser

SeriesTreeParser treeParser = new SeriesTreeParser();

treeParser.series(resultTree); // walk AST once

treeParser.sumpass(resultTree); // walk AST again!!

Page 33: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

33

Domain Language with Antlr

• Here is a subset of the Exercise Log domain language, implemented with Antlr

• Changes• Weekdays instead of dates (because parsing dates is tougher)• Simple summary target• More “hard coded” values in the parse tree

• Could be made just as dynamic as the original, but at the expense of complexity

Page 34: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

34

The ExerLang Domain Language

• Sample “source” file

• The interesting files:• Exer.g• Main

swim on WED for 2150run on TUE for 6.5bike on SAT for 50summary

Page 35: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

35

Other Antlr Examples

• Antlr comes with grammars for a huge number of languages• HTML, Java, XML, etc.

• It has hooks for many other languages

Page 36: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

36

External DSLs

• Advantages• You are free to use any form you like• Limited only by your ability to parse and lex the language

• Disadvantages• You have to build the translator• They lack symbolic integration• You don’t get the same support for your language as you get for

your base language• What are you going to edit it with? Emacs?• A common objection is that DSLs lead to language cacophony

• Not if used correctly!

Page 37: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

37

Internal DSLs

• Advantages• You have the full power of the underlying language• You have full access to sophisticated tools

• Disadvantages• Hard to write in modern “curly brace” languages

• Easier in Ruby and Lisp

• Limited by the syntax and structure of the language• You must understand the base language or you are in syntax

trouble

Page 38: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

38

Language Workbenches

• A language workbench is a tool that supports Language oriented programming

• A brand new software category• Today’s language workbenches

• Intentional Software (developed by Charles Simonyi)• Meta Programming System (developed by JetBrains)• Software Factories (developed by Microsoft)

Page 39: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

39

Elements of a Language Workbench

• Traditional compilation

Page 40: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

40

Post-IntelliJ IDEs

• IntelliJ isn’t just a cool IDE• They were the first mainstream tool that allowed you to

edit against the AST instead of text• That’s how refactoring and other intelligent support

works• Now all the Java IDEs do this

Page 41: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

41

Language Workbenches

Page 42: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

42

Language Workbenches

• The editable representation is merely a projection of the abstract representation• There is no need for the editable representation to be complete• Some aspects can be missing if they are needed for the task at

hand• You can have multiple projections - each showing a different

aspect of the abstract representation

• The storage representation still has to be worked out• The abstract representation has to be comfortable with

errors and ambiguities

Page 43: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

43

Defining a DSL with a Workbench

• Three steps to designing a DSL• Define the abstract syntax (the schema of the abstract

representation)• Define an editor to let people manipulate the abstract

representation through a projection• Define a generator for the executable representation

Page 44: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

44

JetBrains MPS

• Creating a Hello, World Language• Define a schema• Define an editor• Define a generator

• Notice that the semantics are completely removed from the implementation

• Schema is very loosely coupled to the generator

Page 45: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

45

Summarizing the Tradeoffs in LOP

• The fundamental issue is the benefit of using DSLs versus the cost of building the necessary infrastructure• Using internal DSLs reduces the tool cost • Adds serious constraints on the DSL

• An external DSL gives you the most potential but at a higher cost• This is why LOP hasn’t caught on yet• The advent of Language Workbenches solves the external DSL

issue

Page 46: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

46

External DSLs and Non-Programmers

• One of the supposed strengths of COBOL was that you could let non-programmers use it• The COBOL Inference - most technologies that are supposed to

eliminate professional programmers do nothing of the sort

• The Holy Grail of Language oriented programming is to create DSLs that non-programmers can read and write

• DSLs make everyone more productive (programmer and non-programmer)

• If we create code that a business analyst can verify, that’s a big win

• We don’t think that DSLs are the Holy Grail…but it’s closer than COBOL or Java

Page 47: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

47

Summary

• Language oriented programming has the potential to be the next big revolution in programming paradigms• Once the tools and techniques permeate the market and

developers minds• The productivity of the tools starts making a big impact

• A level of encapsulation not possible in OOP• Languages close to the problem domain, not just

structures

Page 48: Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java Neal Ford Application Architect ThoughtWorks

Neal Ford

www.nealford.com

[email protected]

memeagora.blogspot.com

Questions?

Samples & slides at www.nealford.com