Dynamic Language Embedding with Homogeneous Tool Support
DESCRIPTION
Domain-specific languages (DSLs) are increasingly used as embedded languages within general-purpose host languages. DSLs provide a compact, dedicated syntax for specifying parts of an application related to specialized domains. Unfortunately, such language extensions typically do not integrate well with existing development tools. Editors, compilers and debuggers are either unaware of the extensions, or must be adapted at a non-trivial cost. Furthermore, these embedded languages typically conflict with the grammar of the host language and make it difficult to write hybrid code; few mechanisms exist to control the scope and usage of multiple tightly interconnected embedded languages. In this dissertation we present Helvetia, a novel approach to embed languages into an existing host language by leveraging the underlying representation of the host language used by these tools. We introduce Language Boxes, an approach that offers a simple, modular mechanism to encapsulate (i) compositional changes to the host language, (ii) transformations to address various concerns such as compilation and syntax highlighting, and (iii) scoping rules to control visibility of fine-grained language changes. We describe the design and implementation of Helvetia and Language Boxes, discuss the required infrastructure of a host language enabling language embedding, and validate our approach by case studies that demonstrate different ways to extend or adapt the host language syntax and semantics.
TRANSCRIPT
Dynamic Language Embedding With Homogeneous Tool Support
PhD Defense: Lukas Renggli
Advisor: Oscar Nierstrasz
1
2
SELECT email FROM users WHERE username = 'lr'
3
SELECT email FROM users WHERE username = 'lr'
Syntax
4
SELECT email FROM users WHERE username = 'lr'
Semantics
General Purpose Host Language
5
+
SELECT email FROM users WHERE username = 'lr'
General Purpose Host Language
6
?
SELECT email FROM users WHERE username = 'lr'
Syntax (SQL), Semantics (SQL)
Syntax (Host), Semantics (Host)
Tools (Host)
7
8
Host Language
External Language
9
10
Host Language
Internal Language
11
12
Host Language
Embedded Language
13
Non-Standard Host Language
Embedded Language
14
15
Marco Zanoli, cc-by-sa 2.5, www.wikipedia.de
16
Conventional Language
17
18
Multiple Context Specific Languages
19
20
Homogeneous Code and Data Abstraction
21
22
Homogeneous Tool Support
23
Thesis
To support seamless integration of context-dependent languages without breaking the tools, we need
1. a host-language grammar that can be changed by language extensions,
2. a first-class language description used by the development environment, and
3. a transformation mechanism of the embedded language into a common executable representation.
24
Language Boxes
Host Language
Dynamic Grammars
Language and Tool Extensions
25
31
32
Types of Embedded Languages
33
                    Syntax  Vocabulary  Semantics
Host Language         ◦         ◦           ◦
Argot                 ◦         ◦           ●
Internal Language     ◦         ●           ◦
Pidgin                ◦         ●           ●
—                     ●         ◦           ◦
—                     ●         ◦           ●
—                     ●         ●           ◦
Creole                ●         ●           ●
40
                    Syntax  Vocabulary  Semantics
Pidgin                ◦         ●           ●
Creole                ●         ●           ●
Argot                 ◦         ◦           ●
41
42
43
44
45
46
[Figure: grid layout of a shape with a "Package Name" label; columns x = 1 and x = 2, rows y = 1 and y = 2; cells (2, 1), (1, 2), (2, 2)]
47
aBuilder row grow.
aBuilder row fill.
aBuilder column grow.
aBuilder column fill.
aBuilder x: 1 y: 1 add: (LabelShape new
    text: [ :each | each name ];
    borderColor: #black;
    borderWidth: 1;
    yourself).
aBuilder x: 1 y: 2 w: 2 h: 1 add: (RectangleShape new
    borderColor: #black;
    borderWidth: 1;
    width: 200;
    height: 100;
    yourself)
48
row = grow.
row = fill.
column = grow.
column = fill.
(1 , 1) = label text: [ :each | each name ]; borderColor: #black; borderWidth: 1.
(1 , 2) - (2 , 1) = rectangle borderColor: #black; borderWidth: 1; width: 200; height: 100.
49
shape {
    cols: #grow, #fill;
    rows: #grow, #fill;
}
label {
    position: 1 , 1;
    text: [ :each | each name ];
    borderColor: #black;
    borderWidth: 1;
}
rectangle {
    position: 1 , 2;
    colspan: 2;
    borderColor: #black;
    borderWidth: 1;
    width: 200;
    height: 100;
}
50
51
52
Conventional Language
Context Specific
Homogeneous Code & Data
Homogeneous Tool Support
Language Boxes
Host Language
Dynamic Grammars
Language and Tool Extensions
57
Renggli et al., ECOOP 2010
58
59
Editor
Compiler
Debugger
Syntax
Language 1
60
Editor
Compiler
Debugger
Syntax
Language 1
61
Language 2
Editor
Compiler
Debugger
Syntax
Language 1
62
Language 2
Editor
Compiler
Debugger
Syntax
Language 1
63
Language 2
Language Boxes
64
65
SELECT * FROM users
66
| r |
r := SELECT * FROM users.
^ User fromRow: r
LanguageScope
LanguageConcern
LanguageChange
LanguageBox
67
Language Scope
Active?
68
Language Scope
69
‣ System
‣ Packages
‣ Classes
‣ Methods
Language Scope
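The scope levels listed on this slide can be read as a predicate that answers the "Active?" question for a given piece of code. The following Python sketch illustrates that reading only. The names `CodeContext`, `LanguageScope` and `is_active` are hypothetical and are not taken from Helvetia.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodeContext:
    """Where a piece of source code lives (hypothetical model)."""
    package: str
    class_name: str
    method: str


@dataclass(frozen=True)
class LanguageScope:
    """Restricts a language extension to a part of the system.

    A field left as None places no restriction at that level, so a scope
    with all fields None is active system-wide.
    """
    package: Optional[str] = None
    class_name: Optional[str] = None
    method: Optional[str] = None

    def is_active(self, context: CodeContext) -> bool:
        # Every restricted level must agree with the context.
        return all(
            wanted is None or wanted == actual
            for wanted, actual in (
                (self.package, context.package),
                (self.class_name, context.class_name),
                (self.method, context.method),
            )
        )


system_wide = LanguageScope()
only_user_class = LanguageScope(package="Model", class_name="User")

in_user = CodeContext(package="Model", class_name="User", method="findAll")
in_order = CodeContext(package="Model", class_name="Order", method="total")
```

With these definitions the system-wide scope is active everywhere, while `only_user_class` is active for `in_user` but not for `in_order`.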
70
Language Concern
Semantics
71
Language Concern
Transformation
72
Language Concern
73
Context Menus
Navigation Search
Code Expansion
Code Completion
Error Correction
Custom Inspector
Refactorings
Code Folding
Highlighting
Language Change
74
Syntax
Language Change
75
Host Language
Language Change
76
Host Language
+ SQL Language
Language Change
77
Host Language
+ SQL Language
+ ...
Host Language
+ SQL Language
+ ...
= Custom Host Language
Language Change
78
79
Language Boxes
Host Language
Dynamic Grammars
Language and Tool Extensions
80
Renggli et al., SLE 2009
Language Boxes
Host Language
Dynamic Grammars
Language and Tool Extensions
81
LanguageChange
LanguageConcern
LanguageScope
LanguageBox
82
LanguageChange = Host Language δ
83
LanguageChange = Host Language Grammar δ
84
LanguageChange = Grammar Transformation
85
scanIdentifier
    self step.
    ((currentCharacter between: $A and: $Z)
        or: [ currentCharacter between: $a and: $z ]) ifTrue: [
            [ self recordMatch: #IDENTIFIER.
              self step.
              (currentCharacter between: $0 and: $9)
                  or: [ (currentCharacter between: $A and: $Z)
                      or: [ currentCharacter between: $a and: $z ] ] ] whileTrue.
            ^ self reportLastMatch ]
86
#(#[1 0 9 0 25 0 13 0 34 0 17 0 40 0 21 0 41] #[1 0 9 0 25 0 13 0 34 0 93 0 76 0 157 0 112] #[1 2 38 0 21 2 38 0 25 2 38 0 26 0 13 0 34] #[0 1 154 0 16 0 21 0 25 0 26 0 34 0 40 0 41] #[0 1 210 0 76 0 81] #[0 1 214 0 76 0 81] #[1 0 173 0 76 0 177 0 81] #[0 1 134 0 16 0 21 0 25 0 26 0 34 0 40 0 41] #[1 1 46 0 21 1 46 0 25 1 46 0 26 1 69] #[1 1 54 0 21 1 54 0 25 1 54 0 26 1 54 0 34] #[0 2 102 0 21 0 25 0 26 0 34 0 40 0 41 0 76]
#[0 2 50 0 21 0 25 0 26 0 76 0 79] #[1 1 13 0 76 2 85 0 124 1 21 0 125] #[1 2 89 0 17 2 30 0 21 2 30 0 82] #[1 2 93 0 21 2 97 0 82] )
87
Scannerless Parser Combinator
88
a..z a..z
0..9
ID ::= letter { letter | digit } ;
89
letter
letter digit
sequence
choice
many
ID ::= letter { letter | digit } ;
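The graph of `letter`, `digit`, `sequence`, `choice` and `many` nodes for the `ID` rule can be illustrated with a few scannerless parser combinators. This is a minimal Python sketch of the general technique, not the parser framework used in the talk. All function names are chosen for illustration, and `letter` is assumed to cover both lower-case and upper-case ASCII letters.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the position after the match,
# or None if it does not match at that position.
Parser = Callable[[str, int], Optional[int]]


def char_where(predicate: Callable[[str], bool]) -> Parser:
    """Match a single character that satisfies the predicate."""
    def parse(text: str, pos: int) -> Optional[int]:
        if pos < len(text) and predicate(text[pos]):
            return pos + 1
        return None
    return parse


def sequence(*parsers: Parser) -> Parser:
    """Match all parsers one after the other."""
    def parse(text: str, pos: int) -> Optional[int]:
        for parser in parsers:
            pos = parser(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """Match the first parser that succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        for parser in parsers:
            end = parser(text, pos)
            if end is not None:
                return end
        return None
    return parse


def many(parser: Parser) -> Parser:
    """Match the parser zero or more times."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            end = parser(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse


letter = char_where(lambda c: "a" <= c <= "z" or "A" <= c <= "Z")
digit = char_where(lambda c: "0" <= c <= "9")

# ID ::= letter { letter | digit } ;
identifier = sequence(letter, many(choice(letter, digit)))


def matches(parser: Parser, text: str) -> bool:
    """True if the parser consumes the whole text."""
    return parser(text, 0) == len(text)
```

Because the parsers work directly on characters, no separate scanner is needed, and the grammar is an ordinary object graph that can be inspected and changed at run time.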
90
Grammar Transformation
91
letter
letter digit
sequence
choice
many
letter → letter | "_"
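The rule on this slide, which lets `letter` also accept an underscore, can be seen as a function from one grammar to another. The Python sketch below shows the idea under loud assumptions: grammars are encoded as plain tuples, and `extend_production`, `parse` and `accepts` are hypothetical names, not Helvetia's API.

```python
# Grammar expressions are plain tuples (a hypothetical encoding):
#   ("range", lo, hi)   ("lit", s)   ("ref", name)
#   ("seq", e1, e2, ...)   ("choice", e1, e2, ...)   ("many", e)

grammar = {
    "letter": ("range", "a", "z"),
    "digit": ("range", "0", "9"),
    "ID": ("seq", ("ref", "letter"),
           ("many", ("choice", ("ref", "letter"), ("ref", "digit")))),
}


def parse(grammar, expr, text, pos):
    """Return the position after matching expr at pos, or None on failure."""
    kind = expr[0]
    if kind == "range":
        ok = pos < len(text) and expr[1] <= text[pos] <= expr[2]
        return pos + 1 if ok else None
    if kind == "lit":
        return pos + len(expr[1]) if text.startswith(expr[1], pos) else None
    if kind == "ref":
        return parse(grammar, grammar[expr[1]], text, pos)
    if kind == "seq":
        for part in expr[1:]:
            pos = parse(grammar, part, text, pos)
            if pos is None:
                return None
        return pos
    if kind == "choice":
        for alternative in expr[1:]:
            end = parse(grammar, alternative, text, pos)
            if end is not None:
                return end
        return None
    if kind == "many":
        while True:
            end = parse(grammar, expr[1], text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    raise ValueError(f"unknown expression kind: {kind}")


def extend_production(grammar, name, alternative):
    """Return a new grammar in which `name` also accepts `alternative`."""
    transformed = dict(grammar)
    transformed[name] = ("choice", grammar[name], alternative)
    return transformed


def accepts(grammar, start, text):
    """True if the production `start` consumes the whole text."""
    return parse(grammar, grammar[start], text, 0) == len(text)


# The transformation from the slide: letter may now also be "_".
with_underscore = extend_production(grammar, "letter", ("lit", "_"))
```

Since `ID` refers to `letter` by name, every rule that uses `letter` picks up the change, and the original grammar stays untouched.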
92
letter
digit
sequence
choice
many
choice
_
letter
choice
_
letter → letter | "_"
93
letter digit
sequence
choice
many
choice
_
Optimizations
94
Grammar Composition
95
Insert grammar fragment
before/after grammar production
as a choice/sequence/replacement.
96
Language Change
Insert SQL grammar
after expression production
as an additional choice
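The composition options named here (before or after a production, as a choice, sequence or replacement) can be sketched as one function over a grammar table. This Python example is an illustration only: `compose`, its parameters and the production names `Expr` and `SQL` are assumptions, and grammars are reduced to lists of symbol names.

```python
def compose(host, fragment, fragment_start, target, how="choice", where="after"):
    """Return a copy of `host` with `fragment` attached to production `target`.

    host, fragment: dict mapping a production name to a list of alternatives,
                    each alternative being a list of symbol names.
    how:   "choice" adds the fragment as a further alternative,
           "sequence" concatenates it to every existing alternative,
           "replacement" swaps the production for the fragment.
    where: "before" or "after" the existing content.
    """
    clash = set(host) & set(fragment)
    if clash:
        raise ValueError(f"production names clash: {sorted(clash)}")
    # Copy both grammars so the host grammar itself is never modified.
    composed = {name: [list(alt) for alt in alts]
                for name, alts in {**host, **fragment}.items()}
    existing = composed[target]
    if how == "replacement":
        composed[target] = [[fragment_start]]
    elif how == "choice":
        added = [[fragment_start]]
        composed[target] = existing + added if where == "after" else added + existing
    elif how == "sequence":
        composed[target] = [
            alt + [fragment_start] if where == "after" else [fragment_start] + alt
            for alt in existing
        ]
    else:
        raise ValueError(f"unknown composition: {how}")
    return composed


host = {"Expr": [["Variable"], ["Literal"], ["Parens"]]}
sql = {"SQL": [["SELECT", "Columns", "FROM", "Table"]]}

# Insert the SQL grammar after the expression production as an additional choice.
custom = compose(host, sql, "SQL", target="Expr", how="choice", where="after")
```

Here `custom["Expr"]` lists `Variable`, `Literal`, `Parens` and then `SQL`, while `host` keeps its three original alternatives.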
97
Language Change
98
Language Change
Variable
Literal
Parens
Expr:
99
Language Change
Variable
Literal
Parens
Expr:
SQL
Conflicts & Ambiguities
100
SELECT * FROM users
101
SELECT * FROM users
102
| r |
r := SELECT * FROM users.
^ User fromRow: r
expr | sql
103
<SQL: SELECT * FROM users>
104
Parsing Expression Grammars
105
expr / sql
ordered
106
expr / sql
no conflict
107
expr / sql
108
surprise
sql / expr
109
surprise
expr | sql
110
unordered
expr | sql
111
!sql expr / !expr sql
expr $ sql
112
!sql expr / !expr sql / ui
expr $ sql
113
!sql expr / !expr sql / ui
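The difference between the ordered choice of parsing expression grammars and the unordered choice built from not-predicates can be tried out on the query from the earlier slides, which happens to be valid in both languages. The Python sketch below makes several assumptions: the two regular expressions are toy stand-ins for the host expression and SQL grammars, whole-input matching replaces positional PEG matching, and `ui` is read as a fallback that hands ambiguous input to the user.

```python
import re


def language(pattern):
    """Build a toy recognizer: True if the whole input matches `pattern`."""
    compiled = re.compile(pattern)
    return lambda text: compiled.fullmatch(text) is not None


# Toy stand-ins for the two languages (assumptions, far simpler than real grammars).
expr = language(r"\w+(?: [*+-] \w+| \w+)*")   # host expression
sql = language(r"SELECT .+ FROM \w+")          # embedded SQL


def ordered_choice(*alternatives):
    """PEG choice `a / b`: try the alternatives in order, the first match wins."""
    def parse(text):
        for name, recognizer in alternatives:
            if recognizer(text):
                return name
        return None
    return parse


def unordered_choice(a, b, on_ambiguity):
    """`!b a / !a b / on_ambiguity`: an alternative wins only if the other fails."""
    (name_a, rec_a), (name_b, rec_b) = a, b

    def parse(text):
        matches_a, matches_b = rec_a(text), rec_b(text)
        if matches_a and not matches_b:
            return name_a
        if matches_b and not matches_a:
            return name_b
        if matches_a and matches_b:
            return on_ambiguity(text)
        return None
    return parse


query = "SELECT * FROM users"
host_first = ordered_choice(("expr", expr), ("sql", sql))
sql_first = ordered_choice(("sql", sql), ("expr", expr))
either = unordered_choice(("expr", expr), ("sql", sql),
                          on_ambiguity=lambda text: "ask user")
```

With ordered choice the result for `query` silently depends on the order of the alternatives, which is the "surprise" on the slides. The unordered variant reports the ambiguity instead of hiding it.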
Language Boxes
Host Language
Dynamic Grammars
Language and Tool Extensions
Renggli et al., DYLA 2010
114
Language Boxes
Host Language
Dynamic Grammars
Language and Tool Extensions
115
Assignments and Swapping
Asynchronous Messages
Automaton
Brainfuck Language
Functional Pattern Matching
Grammar Definition
Message Pipes
✓ Mondrian
Object Relationships
Positional Arguments
‣ Program Checking
‣ Quasiquoting
Regular Expression
Roman Numbers
SPath Expression
✓ SQL
Schematic Tables
String Interpolation
‣ Transactional Memory
Tuple Space
[http://scg.unibe.ch/research/helvetia/examples]
118
Transactional Memory (class diagram)
Process: currentTransaction (0..1 Transaction)
Transaction: escapeContext; do: aBlock, retry: aBlock, checkpoint, abort: anObject; changes (* Change)
Change: object; apply, hasChanged, hasConflict
ObjectChange: previousCopy, workingCopy
CustomChange: applyBlock, conflictTestBlock
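The roles of the previous copy and the working copy in this design can be illustrated with a small optimistic transaction. This Python sketch is one guess at how those roles fit together, not the implementation behind the slide. The explicit `working_copy` call, the `commit` method and the `ConflictError` and `Account` classes are assumptions made for the example.

```python
import copy


class ConflictError(Exception):
    """Raised when an object was modified outside the transaction."""


class ObjectChange:
    def __init__(self, obj):
        self.object = obj
        self.previous_copy = copy.copy(obj)   # state when first touched
        self.working_copy = copy.copy(obj)    # state edited inside the transaction

    def has_changed(self):
        return vars(self.working_copy) != vars(self.previous_copy)

    def has_conflict(self):
        # The real object no longer looks like our snapshot of it.
        return vars(self.object) != vars(self.previous_copy)

    def apply(self):
        vars(self.object).update(vars(self.working_copy))


class Transaction:
    def __init__(self):
        self.changes = {}

    def working_copy(self, obj):
        """Return the private copy of obj, creating it on first access."""
        change = self.changes.get(id(obj))
        if change is None:
            change = self.changes[id(obj)] = ObjectChange(obj)
        return change.working_copy

    def commit(self):
        changed = [c for c in self.changes.values() if c.has_changed()]
        if any(c.has_conflict() for c in changed):
            raise ConflictError("object modified outside the transaction")
        for change in changed:
            change.apply()


class Account:
    def __init__(self, balance):
        self.balance = balance


account = Account(100)
transaction = Transaction()
transaction.working_copy(account).balance -= 30   # not visible until commit
```

Until `transaction.commit()` runs, `account.balance` stays at 100. A language extension could hide the explicit `working_copy` calls by rewriting the code inside a transaction block.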
119
Meta-Programming Facilities
``(`,(aString) asRegex)
120
Domain-Specific Program Checking
121
Language Boxes
Host Language
Dynamic Grammars
Language and Tool Extensions
Renggli et al., CLSS 2009
Renggli et al., IWST 2009
Nierstrasz et al., LNCS 2009
Renggli et al., TOOLS 2010
122
Language Boxes
Host Language
Dynamic Grammars
Language and Tool Extensions
123
To support seamless integration of context-dependent languages without breaking the tools, we need
1. a host-language grammar that can be changed by language extensions,
2. a first-class language description used by the development environment, and
3. a transformation mechanism of the embedded language into a common executable representation.
Language Boxes
Host Language
Dynamic Grammars
Language and Tool Extensions
124