
Assessing the Quality of Model-to-Model Transformations Based on Scenarios

by

Sebastian Lehrig

Faculty of Electrical Engineering – Computer Science – Mathematics
Heinz Nixdorf Institute and Department of Computer Science

Software Engineering Group
Zukunftsmeile 1
33102 Paderborn


Master’s Thesis

Submitted to the Software Engineering Research Group
in Partial Fulfillment of the Requirements for the Degree of
Master of Science

by
Sebastian Lehrig
Königstr. 34
33098 Paderborn

Thesis Supervisor:
Jun.-Prof. Dr.-Ing. Steffen Becker
and
Prof. Dr. Gregor Engels

Paderborn, October 2012

Abstract

In model-driven software development, model-to-model (M2M) transformations are primary software artifacts. Consequently, software engineers need to select a reasonable M2M transformation technology (i.e., transformation approach, language, and engine) for a given transformation scenario. However, there is a lack of guidance in selecting a suitable M2M transformation technology and applying it with high quality standards: (1) scenario classifications characterizing the scenario’s requirements are immature and (2) comparisons of different M2M technologies are not supported by empirical evidence regarding their quality properties.

To cope with these issues, this thesis introduces an initial framework for assessing and comparing the quality of M2M transformation approaches, languages, and engines. In particular, the framework provides a more mature scenario classification for characterizing a given M2M transformation scenario. The thesis shows the applicability of the framework by comparing Java, QVT-O, and QVT-R transformations regarding their maintainability. This comparison is supported by empirical data collected, among other means, with a questionnaire. The insights gained provide initial guidance in selecting a suitable M2M transformation technology.

Zusammenfassung (German Abstract)

In der modellgetriebenen Softwareentwicklung sind Modell-zu-Modell-Transformationen (M2M-Transformationen) primäre Softwareartefakte. Folglich müssen Softwareingenieure zur Implementierung eines Transformationsszenarios eine angemessene Auswahl an M2M-Transformationstechnologien (d.h. Transformationsansatz, -sprache und -engine) treffen. Es gibt allerdings einen Mangel an Hilfestellungen, die die Selektion einer geeigneten Technologie und ihre Anwendung unter hohen Qualitätsstandards ermöglichen: Bisherige Szenarioklassifizierungen zur Charakterisierung ihrer Anforderungen sind unausgereift und Vergleiche von verschiedenen M2M-Transformationstechnologien werden nicht durch empirische Belege bezüglich ihrer Qualitätseigenschaften gestützt.

Um hierfür eine Lösung zu bieten, führt diese Arbeit ein Framework zum Bewerten und Vergleichen der Qualität von verschiedenen M2M-Transformationsansätzen, -sprachen und -engines ein. Insbesondere bietet das Framework eine ausgereiftere Szenarioklassifizierung zur Charakterisierung eines gegebenen M2M-Transformationsszenarios. Die Arbeit zeigt die Anwendbarkeit des Frameworks durch den Vergleich von Java-, QVT-O- und QVT-R-Transformationen bezüglich ihrer Wartbarkeit. Der Vergleich wird durch empirische Daten unterstützt, die u.a. im Rahmen einer Umfrage gesammelt wurden. Die gewonnenen Erkenntnisse bieten eine erste Hilfestellung bei der Auswahl einer geeigneten M2M-Transformationstechnologie.

Declaration (Translation from German)

I hereby declare that I prepared this thesis entirely on my own and have not used outside sources without declaration in the text. Any concepts or quotations applicable to these sources are clearly attributed to them. This thesis has not been submitted in the same or substantially similar version, not even in part, to any other authority for grading and has not been published elsewhere.

Original Declaration Text in German:

Erklärung

Ich versichere, dass ich die Arbeit ohne fremde Hilfe und ohne Benutzung anderer als der angegebenen Quellen angefertigt habe und dass die Arbeit in gleicher oder ähnlicher Form noch keiner anderen Prüfungsbehörde vorgelegen hat und von dieser als Teil einer Prüfungsleistung angenommen worden ist. Alle Ausführungen, die wörtlich oder sinngemäß übernommen worden sind, sind als solche gekennzeichnet.

City, Date Signature


Contents

1 Introduction
  1.1 Motivation
  1.2 Solution Strategy
  1.3 Overview

2 Fundamentals
  2.1 Model-Driven Software Development (MDSD)
    2.1.1 Models and Metamodels
    2.1.2 Transformations
  2.2 Evaluation Methods
    2.2.1 Goal/Question/Metric (GQM) Method
    2.2.2 Software Quality Models
    2.2.3 Data Collection Procedures

3 Model-to-Model (M2M) Transformations
  3.1 Features
    3.1.1 Transformation Rules
    3.1.2 Location Determination
    3.1.3 Scheduling
    3.1.4 Rule Organization
    3.1.5 Source-Target Relationship
    3.1.6 Incrementality
    3.1.7 Directionality
    3.1.8 Tracing
  3.2 Approaches
    3.2.1 Direct-Manipulation
    3.2.2 Structure-Driven
    3.2.3 Operational
    3.2.4 Template-Based
    3.2.5 Relational
    3.2.6 Graph-Transformation-Based
    3.2.7 Hybrid
  3.3 Languages
    3.3.1 Java and EMF
    3.3.2 Query/View/Transformation (QVT)
    3.3.3 Atlas Transformation Language (ATL)
  3.4 Engines
    3.4.1 Medini QVT
    3.4.2 ModelMorf
    3.4.3 QVT Operational
    3.4.4 SmartQVT
    3.4.5 ATL Transformation Engine
  3.5 Scenarios
    3.5.1 Related Classifications
    3.5.2 Classification
    3.5.3 Case Studies
    3.5.4 Discussion of the Scenario Classification
  3.6 Quality Properties
    3.6.1 Functional Suitability
    3.6.2 Performance Efficiency
    3.6.3 Compatibility
    3.6.4 Usability
    3.6.5 Reliability
    3.6.6 Security
    3.6.7 Maintainability
    3.6.8 Portability

4 Qualitative Comparison
  4.1 Approach/Language/Engine Combinations
  4.2 Case Study Scenarios

5 Goal/Question/Metric Plan
  5.1 General Goal
  5.2 Related Work with Similar Goals
  5.3 Goal
  5.4 Template for Questions, Metrics, and Hypotheses
    5.4.1 What is the quality property X of the implementations?
    5.4.2 What are the reasons for differences in quality property X?
  5.5 Questions, Metrics, and Hypotheses
    5.5.1 ModuQ1/ModuQ2: Modularity
    5.5.2 ReuseQ1/ReuseQ2: Reusability
    5.5.3 AnaQ1/AnaQ2: Analyzability
    5.5.4 ModiQ1/ModiQ2: Modifiability
    5.5.5 ConsQ1/ConsQ2: Consistency
    5.5.6 ApproQ1/ApproQ2: Appropriateness Recognizability
    5.5.7 LearnQ1/LearnQ2: Learnability
  5.6 Discussion of the GQM Plan

6 Data Collection
  6.1 Refined Measurement Scope
  6.2 Implementation
    6.2.1 Case Studies
    6.2.2 Measurement Tool “M2M Quality”
    6.2.3 Questionnaire
  6.3 Measurements
    6.3.1 Automated
    6.3.2 Questionnaire
    6.3.3 Manual
  6.4 Results

7 Interpretation
  7.1 Answers to the Questions of the GQM Plan
    7.1.1 What is the Modularity of the Implementations?
    7.1.2 What are the Reasons for Differences in Modularity?
    7.1.3 What is the Reusability of the Implementations?
    7.1.4 What are the Reasons for Differences in Reusability?
    7.1.5 What is the Analyzability of the Implementations?
    7.1.6 What are the Reasons for Differences in Analyzability?
    7.1.7 What is the Modifiability of the Implementations?
    7.1.8 What are the Reasons for Differences in Modifiability?
    7.1.9 What is the Consistency of the Implementations?
    7.1.10 What are the Reasons for Differences in Consistency?
    7.1.11 What is the Appropriateness Recognizability of the Implementations?
    7.1.12 What are the Reasons for Differences in Appropriateness Recognizability?
    7.1.13 What is the Learnability of the Language/Engine Combinations?
    7.1.14 What are the Reasons for Differences in Learnability?
  7.2 Derived Decision Tree
  7.3 Discussion of the Goal Attainment
  7.4 Threats to Validity
    7.4.1 Conclusion Validity
    7.4.2 Internal Validity
    7.4.3 Construct Validity
    7.4.4 External Validity
    7.4.5 Discussion of Threats to Validity

8 Conclusions
  8.1 Summary
  8.2 Knowledge Gained
  8.3 Future Work

Appendix

A Feature Models

B Deliverables
  B.1 Case Study Implementations
  B.2 Measurement Tool “M2M Quality”
  B.3 Questionnaire

C Results
  C.1 Questionnaire Results
    C.1.1 Average X Points in Questionnaire (GM1.2.i)
    C.1.2 Answers to GQ2 in Questionnaire (GM2.11)
  C.2 Results Independent of Scenario Implementations
    C.2.1 Qualitative Differences (GM2.1)
    C.2.2 Possible Language Constructs (Learn1.1.1)
    C.2.3 Applied Language Constructs (Learn1.1.2)
    C.2.4 Size of the Documentation (Learn2.12 - Learn2.16)
  C.3 Measurements per Scenario
    C.3.1 Measurement Template
    C.3.2 SimpleUML to SimpleRDBMS
    C.3.3 Copy
    C.3.4 Rule1
    C.3.5 Rule2
    C.3.6 Rule3
    C.3.7 Rule4
    C.3.8 Rule5
    C.3.9 Rule6
    C.3.10 Rule7
    C.3.11 Rule8
    C.3.12 Rule9
    C.3.13 Rule10
    C.3.14 Rule11
    C.3.15 Rule12
  C.4 Measurement Diagrams
  C.5 Evaluation of Metric Measurements and Hypotheses per Question
    C.5.1 General Hypotheses of the Reason Question
    C.5.2 ModuQ1
    C.5.3 ModuQ2
    C.5.4 ReuseQ1
    C.5.5 ReuseQ2
    C.5.6 AnaQ1
    C.5.7 AnaQ2
    C.5.8 ModiQ1
    C.5.9 ModiQ2
    C.5.10 ConsQ1
    C.5.11 ConsQ2
    C.5.12 ApproQ1
    C.5.13 ApproQ2
    C.5.14 LearnQ1
    C.5.15 LearnQ2

Bibliography

Figures

2.1 The Four Phases and Deliverables of the GQM Method
2.2 ISO/IEC’s Product Quality Model
3.1 General Concept of M2M Transformations
3.2 Top-Level Feature Diagram
3.3 Features of Transformation Rules
3.4 Features of Domains
3.5 Features of Patterns
3.6 Features of Logic
3.7 Features of Typing
3.8 Features of Parametrization
3.9 Features of Location Determination
3.10 Features of Scheduling
3.11 Features of Rule Organization
3.12 Features of the Source-Target Relationship
3.13 Features of Incrementality
3.14 Features of Directionality
3.15 Features of Tracing
3.16 Conceptual Overview of the QVT Specification
3.17 Features of Higher-Order Transformations
3.18 Feature-Based Classification of Model Transformation Scenarios
3.19 Criteria of Model Transformation Scenarios
6.1 General Process of “M2M Quality”
7.1 Derived Decision Tree
7.2 Experiment Principles and Threats to Validity
B.1 SimpleUML Metamodel
B.2 SimpleRDBMS Metamodel
B.3 Shapes Metamodel
B.4 Component Diagram of M2M Quality
C.1 Number of Included Modules
C.2 Number of Applied Reuse Mechanisms
C.3 Average Number of When Clauses
C.4 Number of Intermediate Structures
C.5 Average Distinct Phases per Rule
C.6 Average Number of Domains
C.7 Average Fan-Out
C.8 Average Rule Dependency Depth
C.9 Average Number of Explicit Internal Scheduling Calls
C.10 Number of Changed Rules when Moving from Copy to RuleX
C.11 Number of Additional Rules when Moving from Copy to RuleX
C.12 Number of Additional Reused Rules when Moving from Copy to RuleX
C.13 Lines of Code
C.14 Number of Starts
C.15 Number of Rules
C.16 Number of Top-Level Rules
C.17 Average Size of the Domain Pattern
C.18 Number of Additional/Changed Comment Lines of Code when Moving from Copy to RuleX
C.19 Number of Possible Language Constructs & Number of Applied Language Constructs over all Scenarios
C.20 Percentage of Applied Language Constructs Regarding Possible Language Constructs
C.21 Time Until a Scenario was Implemented Successfully
C.22 Number of Newly Introduced Language Constructs when Moving from Copy to RuleX
C.23 Size of Language Documentation (in Pages)
C.24 Size of Language Documentation (in Lines)
C.25 Size of Language Documentation (in Words)
C.26 Size of Language Documentation (in Characters)
C.27 Size of Language Documentation (in Figures)
C.28 Questionnaire: Modularity
C.29 Questionnaire: Reusability
C.30 Questionnaire: Analyzability
C.31 Questionnaire: Modifiability
C.32 Questionnaire: Consistency
C.33 Questionnaire: Recognizability
C.34 Questionnaire: Learnability
C.35 Average Decrease in Other Quality Properties when Moving from Copy to RuleX
C.36 Number of Newly Introduced Inconsistencies when Moving from Copy to RuleX

Tables

4.1 Qualitative Comparison of M2M Approach/Language/Engine Combinations
4.2 Qualitative Comparison of Scenarios
5.1 General Goal Definition
5.2 Mapping Between Quality Models
5.3 Goal Definition
5.4 Template for the Quality Question
5.5 Template for the Reason Question
5.6 Refined Quality Question Template for Modularity
5.7 Refined Reason Question Template for Modularity
5.8 Refined Quality Question Template for Reusability
5.9 Refined Reason Question Template for Reusability
5.10 Refined Quality Question Template for Analyzability
5.11 Refined Reason Question Template for Analyzability
5.12 Refined Quality Question Template for Modifiability
5.13 Refined Reason Question Template for Modifiability
5.14 Refined Quality Question Template for Consistency
5.15 Refined Reason Question Template for Consistency
5.16 Refined Quality Question Template for Approp. Recognizability
5.17 Refined Reason Question Template for Approp. Recognizability
5.18 Refined Quality Question Template for Learnability
5.19 Refined Reason Question Template for Learnability
A.1 Cardinality-Based Feature Modeling Notation
C.1 Averaged Points of Quality Properties in Questionnaire
C.2 General Measurement Template
C.3 Results of the Data Collection Phase for the SimpleUML to SimpleRDBMS Scenario
C.4 Results of the Data Collection Phase for the Copy Scenario
C.5 Results of the Data Collection Phase for the Rule1 Scenario
C.6 Results of the Data Collection Phase for the Rule2 Scenario
C.7 Results of the Data Collection Phase for the Rule3 Scenario
C.8 Results of the Data Collection Phase for the Rule4 Scenario
C.9 Results of the Data Collection Phase for the Rule5 Scenario
C.10 Results of the Data Collection Phase for the Rule6 Scenario
C.11 Results of the Data Collection Phase for the Rule7 Scenario
C.12 Results of the Data Collection Phase for the Rule8 Scenario
C.13 Results of the Data Collection Phase for the Rule9 Scenario
C.14 Results of the Data Collection Phase for the Rule10 Scenario
C.15 Results of the Data Collection Phase for the Rule11 Scenario
C.16 Results of the Data Collection Phase for the Rule12 Scenario

Listings

3.1 Example Transformation in QVT-R
3.2 Example Relation in QVT-R
3.3 Example Transformation in QVT-O
3.4 Example Mapping Operation in QVT-O

1 Introduction

Engineering disciplines, such as software, mechanical, and electrical engineering, are characterized by their ability to predict the outcomes of design decisions [Pre01, p. 16]. Moreover, in software engineering itself, but also in interdisciplinary fields such as systems engineering, engineers are concerned with the design and development of software. Consequently, important design decisions in these disciplines relate to the methods and processes applied in software development.

This thesis focuses on one particular approach to software development: model-driven software development (MDSD). The thesis is especially interested in applying MDSD as an engineering discipline, i.e., in evaluating the quality properties of the methods it applies.

Therefore, Section 1.1 describes the problem domain in more depth and discusses the problem this thesis is concerned with. Section 1.2 briefly introduces the solution idea of the thesis. The introduction closes with an overview of this thesis in Section 1.3.

1.1 Motivation

One approach to software development is model-driven software development (MDSD) [VSC06, pp. 14-15]. MDSD uses (semi-)formal, domain-specific models as primary software artifacts, thus raising software implementations to a higher level of abstraction. Using MDSD induces two necessities: firstly, (source) models need to be transformed into other models (target models) and, secondly, models need to be transformed into implementation code. The former is implemented by model-to-model (M2M) transformations, the latter by model-to-text (M2T) transformations. This thesis focuses on M2M transformations applied by transformation engineers.

To cope with the lack of guidance in selecting and applying reasonable M2M transformation technologies, this thesis introduces an initial framework for assessing and comparing their quality. One main idea of the framework is to base comparisons on concrete M2M scenarios implemented in an M2M language and compliant with an M2M engine.

M2M transformations can be characterized along five transformation dimensions: features, approaches, languages, engines, and scenarios. Transformation features are all the variable or common attributes that technical realizations (language/engine combinations) of transformations can have conceptually. This includes, for instance, the supported direction in which the transformation is executed. Transformation languages allow a transformation to be specified following a general transformation approach. The transformation approach is the paradigm a transformation language follows. For instance, a transformation language can follow an operational (e.g., QVT-O) or relational (e.g., QVT-R) approach. This is similar to programming languages, which can, for instance, follow imperative (e.g., C) or declarative (e.g., Prolog) approaches. The transformation engine is a program capable of executing the specification expressed in a transformation language. Thus, it is the entity that executes the transformation by transforming a source model into a target model. Finally, the transformation scenario is the problem domain an M2M transformation targets. For instance, the scenario describes the number of source and target models or the structure of the models themselves.

To evaluate quality properties of M2M transformations, all five dimensions have to be considered. For instance, a concrete language/engine combination may be the best fit for a transformation scenario with one source and one target model. Changing the scenario to two source models can make the language/engine combination infeasible. The reasons for this could be that, for instance, (a) the engine does not support two source models but the language does (in principle), (b) the language does not support two source models but the approach does (in principle), (c) the approach does not cover two source models but the features do (in principle), or (d) there is no combination of features for two source models (the scenario may be impossible to implement).¹
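The layered reasoning in cases (a) to (d) can be sketched as a small lookup. This is only an illustration, not part of the thesis's framework: the `Technology` record, the `max_source_models` field, and the example values are hypothetical, and the sketch assumes that support can be summarized as a maximum number of source models per level.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Levels ordered from most concrete to most abstract, mirroring cases (a) to (d).
LEVELS = ("engine", "language", "approach", "features")

# Reason reported when the given level is the most concrete one that still
# supports the scenario; None means that no level supports it.
REASONS: Dict[Optional[str], str] = {
    "language": "(a) engine lacks support, language supports it in principle",
    "approach": "(b) language lacks support, approach supports it in principle",
    "features": "(c) approach lacks support, features cover it in principle",
    None: "(d) no feature combination covers the scenario",
}


@dataclass
class Technology:
    """Hypothetical record: maximum number of source models per level."""
    name: str
    max_source_models: Dict[str, int]


def most_concrete_supporting_level(tech: Technology, source_models: int) -> Optional[str]:
    """Return the most concrete level that can handle the scenario, if any."""
    for level in LEVELS:
        if tech.max_source_models[level] >= source_models:
            return level
    return None


def diagnose(tech: Technology, source_models: int) -> str:
    """Explain whether, and at which level, the scenario becomes infeasible."""
    level = most_concrete_supporting_level(tech, source_models)
    if level == "engine":
        return "feasible"
    return REASONS[level]


# Made-up values for illustration only; they describe no real technology.
example = Technology(
    name="hypothetical-combination",
    max_source_models={"engine": 1, "language": 2, "approach": 2, "features": 3},
)
print(diagnose(example, 1))
print(diagnose(example, 2))
```

The sketch considers only one scenario property (the number of source models); a fuller diagnosis would have to repeat the same check for every feature of the scenario classification.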

To make such an evaluation structured and operational, a classification for each dimension is needed first. This enables a qualitative comparison of M2M transformations and also serves as a basis for a quantitative comparison. While the existing literature provides mature classifications of the first four dimensions (cf. [CH06, VSC06]), the classifications of transformation scenarios are less mature. Baier et al. [BBJ+08, pp. 99-100] propose a promising classification as they explicitly name criteria an M2M transformation can target. Nonetheless, there are features that do not fit into their categorization but which are obviously of interest when describing a transformation scenario. The number of source and target models is such a case. Biehl [Bie10] also mentions the problem of less mature classifications for transformation scenarios and provides a classification based on a literature study of existing classifications. Biehl provides a combination of features from Baier et al. [BBJ+08, pp. 99-100] and Czarnecki and Helsen [CH06], who mainly base their features on concrete languages and not on scenarios. However, Biehl [Bie10] neither evaluates his classification nor uses the classification for an evaluation of M2M approaches.

¹ Note that, in fact, there are language/engine combinations that support two source models (e.g., QVT-O with the QVT Operational engine).

Other authors typically classify transformation scenarios into categories like “simple” and “complex” based on insufficient or too few metrics. For instance, Gardner et al. [GGKH03] differentiate between (a) transformation scenarios that transform single elements of the source model into single elements of the target model (simple), and (b) transformation scenarios that build structures in the target model that do not directly correspond to an individual element in the source model (complex). However, even transforming single source elements to single target elements can become complex, e.g., if the source and target models are very large. In this regard, Kapova et al. [KGBH10] use a more convincing metric, as they use the number of elements in the source and target model to differentiate between simple (few elements) and complex (many elements) scenarios.² Still, sticking to only one metric does not provide a convincing classification into “simple” and “complex”, as even models with few elements can induce complex transformations.

Furthermore, the classification into “simple” and “complex” is itself not convincing since it severely restricts the quality of evaluation results: when comparing two M2M language/engine combinations A and B in a given complex scenario, possible evaluation results are (a) A handles the scenario “better” than B, (b) B handles the scenario “better” than A, and (c) A and B handle complex scenarios “equally well”. Taking an additional complex scenario into consideration requires the evaluation to deliver the same result regarding the relation between A and B if meaningful conclusions are to be drawn. That is, a result is either meaningful (e.g., “A handles a scenario better than B because the scenario is complex”) or too general (e.g., “A and B could be used for the scenario because the scenario is complex in which case A is not absolutely better than B and vice versa”).

In contrast, a typical use case for a transformation engineer is to select a suitable M2M technology (M2M approach, language, and engine) for a given scenario. However, the lack of mature scenario classifications makes it hard for transformation engineers to make objective decisions in this selection, since a mapping from features of scenarios to suitable M2M technologies does not exist. Such a mapping needs a scenario classification first and can then be derived, for instance, from empirical studies.

Consequently, there is the need to (a) develop a reasonable classification of transformation scenarios, and (b) compare the approaches based on the classifications of all five dimensions. Considering all dimensions allows drawing more meaningful conclusions than considering only some of them. These conclusions finally allow a transformation engineer to make objective decisions when selecting an M2M approach, language, or engine.

The contribution of this thesis is an initial step towards filling these gaps by providing a framework for assessing and comparing the different M2M technologies regarding quality properties. It allows deriving the means for a transformation engineer to decide, in a given transformation scenario, which M2M transformation approach, language, or engine best fits the scenario’s requirements. The framework provides a more mature scenario classification and describes classifications for the other M2M transformation dimensions as well as for transformation quality properties. Based on these classifications, it shows how to use the classifications for qualitative and quantitative comparisons of M2M technologies. The thesis illustrates the application of the framework by applying it to three M2M technologies (involving the M2M languages Java with EMF, QVT-R, and QVT-O), comparing the maintainability of these technologies on a quantified basis, and compiling, in the form of a decision tree, first rules for transformation engineers who want to select the M2M technology that best fits their maintainability requirements.

² More precisely, they use the number of metamodel classes (cf. Section 2.1.1).

1.2 Solution Strategy

This thesis follows the Goal/Question/Metric (GQM) method. The GQM method is a goal-oriented approach to measure properties of software systems; the GQM plan is the central artifact of the method and plans the measurement and interpretation of the properties [vSB99, pp. 21-25].

The thesis begins with GQM’s planning phase in which the five dimensions of model transformations are qualitatively described and analyzed. For developing a classification of transformation scenarios, the thesis starts with a first “prototype” classification and refines it further. The prototype is a combination of several literature sources similar to Biehl’s approach [Bie10] and partially based on my own experience. Based on feedback by my supervisors and by applying several concrete transformation scenarios to the classification, the prototype is refined further.

After this, the thesis develops the GQM plan which specifies how to compare different M2M transformation technologies. The GQM plan is based on concrete transformation scenarios and plans measurements as well as corresponding interpretations related to the dimensions of model transformations. Finally, the GQM plan is executed and the corresponding data sets are collected. The interpretation of the data leads to valuable insights into the advantages and disadvantages of the corresponding approach, language, or engine in a particular transformation scenario.

1.3 Overview

This thesis is structured as follows. Chapter 2 explains the fundamentals needed within the thesis. This includes the ideas, concepts, and terms of model-driven software development as well as the evaluation methods used within the thesis. Afterwards, Chapter 3 continues with describing M2M transformations, the main area of interest of the thesis. It includes descriptions and classifications of the different transformation dimensions and the selection of concrete M2M technologies and transformation scenarios used as case studies for this thesis. Chapter 4 qualitatively compares the selected technologies and scenarios. This serves as a basis for a quantitative comparison as planned by the GQM plan which is compiled in Chapter 5. The plan especially concentrates on maintainability properties of M2M transformations but also covers M2M quality properties in general. In particular, this includes the main related work of this thesis, i.e., related work targeting similar qualitative comparisons. Thereafter, Chapter 6 executes the data collection as planned by the compiled GQM plan. Chapter 7 interprets the collected results, derives a first, simple decision tree, and discusses threats to validity. Finally, Chapter 8 concludes this thesis with a summary and suggestions for future work.

2 Fundamentals

This thesis builds on fundamentals in the area of model-driven software development and on software evaluation methods. Model transformations are of main interest for the thesis and a vital part of model-driven software development. Furthermore, the assessment of methods and approaches for model transformations (and software in general) requires appropriate measurement techniques, i.e., software evaluation methods.

This chapter describes the idea, concepts, and fundamental terms of model-driven software development in Section 2.1. Afterwards, Section 2.2 explains the software evaluation methods important for this thesis.

2.1 Model-Driven Software Development (MDSD)

In MDSD, models describe a domain’s problem in an abstract and (semi-)formal way. The abstraction allows software engineers to concentrate on the essential parts of the problem by omitting irrelevant details and, thus, keeping the model compact. Since the models are (semi-)formal, software engineers can transform them into other models or into source code. This way, models can become a part of the software and serve for documentation [VSC06, p. 366]. The formalization requires the use of Domain-Specific Languages (DSLs) which are (semi-)formal languages designed and implemented for a specific domain. A DSL allows engineers to specify a domain-specific model. One example and standard for MDSD is the Model Driven Architecture (MDA) which was launched and published by the Object Management Group (OMG) [Obj06]. It provides a set of basic terminology and standards aiming at tool interoperability as well as platform independence.

This section gives further insights into MDSD topics relevant for this thesis. In order to avoid any ambiguity, Section 2.1.1 gives clear definitions for models and metamodels. Afterwards, Section 2.1.2 provides a definition for model transformations and a general classification into model-to-text and model-to-model transformations.

2.1.1 Models and Metamodels

Models and metamodels are central artifacts of MDSD. Therefore, this section gives clear definitions for these concepts and describes them.

Models

Models have the most important role in MDSD. Nonetheless, there are several different definitions for this term. Baier et al. [BBJ+08, p. 94] give a definition of a model which fits the needs of this thesis.

Definition 2.1 (Model) “A model describes a (real) system in a simplified (abstract) manner in pursuance of a concrete goal.” [BBJ+08, p. 94] (translated by the author)

Based on a fundamental book of Stachowiak [Sta73, p. 207], Baier et al. [BBJ+08, p. 94] emphasize that a model has three characteristics: abstraction, pragmatics, and homomorphism¹. Abstraction describes the property that the model removes details which are not needed to serve a specific purpose. The purpose reflects the goal of creating the model (pragmatics). Thus, the pragmatics dictate the attributes of interest when abstracting. Additionally, statements on the model should also relate to the modeled entity (with respect to the pragmatics), i.e., there must be a homomorphism between model and represented entity.

Metamodels

Metamodels are important to formally specify the structure of models. Baier et al. [BBJ+08, p. 94] define a metamodel as follows.

Definition 2.2 (Metamodel) “A metamodel is a precise definition of constructs and rules for the creation of models. It includes an abstract syntax, at least one concrete syntax as well as static and dynamic semantics.” [BBJ+08, p. 94] (translated by the author)

This means that a metamodel describes a set of models which conform to it, i.e., each model uses the constructs and obeys the rules dictated by the metamodel. A model conforming to a metamodel is called an instance of the metamodel. The shorter form model instance is also common. Furthermore, a metamodel can have its own metamodel, the so-called meta-metamodel, and so forth.

Definition 2.2 is based on the concept formation introduced by Völter et al. [VSC06, pp. 56-58]. Accordingly, the abstract syntax specifies the set of syntactically correct model instances independent of their concrete representation. A concrete representation is specified by exactly one concrete syntax which complies with the abstract syntax. There can be several concrete syntaxes such as, e.g., a textual and a graphical syntax.

The static semantics put further constraints (criteria for well-formedness) on the set of syntactically valid model instances. These semantics can be checked without knowing the intention of the model. In contrast, the dynamic semantics specify the model’s intention, which makes it possible to interpret the model in a given context.

¹ Stachowiak [Sta73, p. 207] uses the German words “Verkürzungsmerkmal”, “Pragmatisches Merkmal”, and “Abbildungsmerkmal”. Abstraction, pragmatics, and homomorphism are not literal translations of these words but precisely reflect their meaning (cf. [Bec08, p. 31]).

One important technology to specify metamodels is the Meta Object Facility (MOF) [Obj11b] which lies at the core of the MDA. For metamodel specification, MOF specifies the MOF model as a metamodel for metamodels. Another important technology is the Object Constraint Language (OCL) [Obj10] which is a part of the UML and specified by the MOF model. OCL is a formal language without side effects that was initially intended for describing expressions on UML models. In fact, OCL is able to specify queries and constraints on any MOF-based modeling language [VSC06, p. 96]. For MDSD, OCL is especially important since (1) OCL can enrich metamodels by constraints, and (2) transformations can use OCL to operate on models.

2.1.2 Transformations

In MDSD, models play the most important role and are part of the software. Consequently, software engineers need mechanisms to convert one model into another model in order to apply a refinement, a refactoring, etc. to the former one. For this purpose, MDSD uses model transformations as a key technology.

Definition 2.3 (Model Transformation) “A model transformation is a computable function which maps model instances of a set of source models onto model instances of a set of target models.” [BBJ+08, p. 97] (translated by the author)

This thesis also uses the term transformation for brevity. The attribute computable is essential for an automated processing of transformations. It allows transformations to be reused which, among other advantages, decreases costs while increasing development speed and software quality [VSC06, p. 13]. Chapter 5 examines these quality issues in detail.

The definition allows different relationships (m:n, m:1, 1:n, and 1:1) between models. Gardner et al. [GGKH03] use these relationships to classify transformation scenarios into different complexities and discuss the need to handle them. Furthermore, they review six transformation approaches and determine the type of relationship they support. The 1:1 relationship is the most common one. Therefore, this thesis assumes in general 1:1 relationships and explicitly states whenever an approach uses a different relationship.

Another common classification of transformations is to distinguish between model-to-model (M2M) transformations and model-to-text (M2T) transformations [BBJ+08, p. 97]. An M2M transformation transforms a model (source model) into another model (target model). Typically, the metamodels of source and target model are different. Still, they could also have the same metamodel, e.g., for refactoring purposes. As M2M transformations are of main interest for this thesis, they are treated separately and in detail in Chapter 3. An M2T transformation generates texts like source code, configuration files, XML, etc. from a model. Consequently, M2T transformations can be seen as a special case of M2M transformations since the target metamodel could be an arbitrary text file [CH06].
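The distinction between M2M and M2T transformations can be sketched in a few lines of Python. The sketch assumes a hypothetical scenario that is not one of the scenarios of this thesis: a source model of classes is mapped to a target model of tables (M2M, 1:1 relationship), and the table model is then turned into SQL-like text (M2T). Models are represented as plain dictionaries, and all function names are chosen for illustration only.

```python
def class_to_table(uml_class):
    """M2M: maps one source element (a class) onto one target element (a table)."""
    return {
        "table": uml_class["name"].lower(),
        "columns": [attr["name"] for attr in uml_class["attributes"]],
    }


def m2m(source_model):
    """1:1 relationship: one source model in, one target model out."""
    return [class_to_table(c) for c in source_model]


def m2t(table_model):
    """M2T: generates text (here SQL-like statements) from a model."""
    lines = []
    for table in table_model:
        columns = ", ".join(f"{c} TEXT" for c in table["columns"])
        lines.append(f"CREATE TABLE {table['table']} ({columns});")
    return "\n".join(lines)


source = [{"name": "Person", "attributes": [{"name": "first"}, {"name": "last"}]}]
target = m2m(source)
text = m2t(target)
```

Both functions are computable in the sense of Definition 2.3: the same source model always yields the same target, which is what makes automated and repeated execution possible.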

2.2 Evaluation Methods

Prechelt [Pre01, p. 30] describes software engineering as an engineering discipline (as the name suggests). Consequently, methods and approaches in the area of software engineering have to be evaluated with respect to their quality properties. Typically, this is done empirically, based on experience, and with a concrete goal in mind.

One of the main goal-oriented approaches to measure properties of software systems is the Goal/Question/Metric (GQM) method [vSB99, pp. 21-25]. It is based on defining goals, deriving questions related to the goals, and specifying metrics which help to answer the questions.

This thesis uses the GQM method to assess the quality properties of several M2M transformation approaches in given transformation scenarios. Thus, the section continues with introducing the GQM method in detail (Section 2.2.1). Afterwards, Section 2.2.2 describes quality models which specify possible quality properties of software products. In the context of the GQM method, a quality model can thus help to derive a set of questions if the corresponding goal is concerned with quality properties. Finally, Section 2.2.3 describes data collection procedures which carry out the measurement tasks regarding a metric.

2.2.1 Goal/Question/Metric (GQM) Method

The Goal/Question/Metric (GQM) method is a goal-oriented approach to establish a software measurement program. The method was developed by Basili and Weiss [BW84] and later expanded by Basili et al. (cf. [BCR02]). Furthermore, Solingen and Berghout [vSB99] contribute to the original work by providing further analyses and examples as well as by adding techniques like a cost/benefit analysis. They describe the GQM method in the form of a practical guide. This thesis follows this guide and, therefore, mainly refers to Solingen and Berghout [vSB99] regarding the GQM method.

The GQM method, as illustrated in Figure 2.1, consists of four phases: the planning, definition, data collection, and interpretation phase [vSB99, pp. 21-22]. Figure 2.1 visualizes each of the four phases as a grey-shaded box.

The planning phase is the first phase and constitutes the framework of the measurement program. It includes the selection, specification and characterization, as well as the planning of a project the measurement is applied on. The main deliverable of this phase is a project plan.

The second phase is the definition phase. It compiles a GQM plan which specifies the measurement program. It consists of three main parts which are developed top-down [vSB99, p. 23]. Firstly, the GQM plan includes a set of explicit measurement goals. Secondly, the GQM plan refines the goals to questions making the goal attainment operational: by analyzing the answers to the questions, an engineer can decide whether the goals are attained. A hierarchical ordering of questions (with sub-questions) is also possible. Thirdly, the GQM plan identifies metrics that provide the information to answer the questions. In addition to these three parts, the GQM plan specifies hypotheses which state the expected outcomes when taking the measurements. This is important to increase the learning effect from the measurement by eventually comparing the measurement expectations with its outcomes (cf. [vSB99, pp. 55-56]).

[Diagram: the planning, definition, data collection, and interpretation phases with the elements Project Plan, Goal, Question, Metric, Measurement, Collected Data, Answer, and Goal Attainment.]

Figure 2.1: The Four Phases and Deliverables of the GQM Method (derived from [vSB99, p. 22])

The data collection phase is the third phase. It executes the measurements of the project and collects the data as specified by the metrics of the GQM plan. For this, it uses so-called data collection procedures which are further described in Section 2.2.3. The phase includes the implementation of the data collection procedure as well as the data collection and storage itself.

The interpretation phase is the fourth and final phase. Its purpose is to use, interpret, and evaluate the collected data to draw conclusions regarding the measurement program. Like the definition phase, it consists of three main parts but interprets the parts bottom-up instead of top-down [vSB99, p. 23]. The measurement results are the first part. A measurement result refers to a concrete metric and uses the collected data to provide the result. Secondly, the measurement results allow answering the respective questions of the GQM plan. Thirdly, the answers are used to evaluate the goal attainment. Besides these three parts, the interpretation phase compares the hypotheses of the GQM plan to the actual outcomes. If a hypothesis and an outcome differ, an engineer has to find the reason for this. The engineer can, for instance, further inspect and analyze the collected data or take additional measurements.
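The top-down structure of a GQM plan and its bottom-up interpretation can be sketched as a small data structure in Python. This is a minimal sketch and not the tooling used in this thesis: the classes `Goal`, `Question`, and `Metric`, the function `interpret`, and the "lines of code" metric with its threshold are hypothetical and serve only to show how measurement results lead to answers, how answers lead to goal attainment, and how hypotheses are compared with outcomes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Metric:
    name: str
    measure: Callable[[dict], float]  # data collection procedure (hypothetical)
    hypothesis: float                 # expected outcome stated in the GQM plan


@dataclass
class Question:
    text: str
    metrics: list
    answer: Callable[[dict], bool]    # interprets the measurement results


@dataclass
class Goal:
    text: str
    questions: list = field(default_factory=list)


def interpret(goal, artifact):
    """Bottom-up interpretation: measurement results -> answers -> goal attainment."""
    report = {"answers": {}, "deviations": []}
    for question in goal.questions:
        results = {m.name: m.measure(artifact) for m in question.metrics}
        for m in question.metrics:
            # Deviations between hypothesis and outcome need further inspection.
            if results[m.name] != m.hypothesis:
                report["deviations"].append((m.name, m.hypothesis, results[m.name]))
        report["answers"][question.text] = question.answer(results)
    report["goal_attained"] = all(report["answers"].values())
    return report


# Hypothetical plan: one goal, one question, one metric.
loc = Metric("lines_of_code",
             lambda artifact: float(len(artifact["source"].splitlines())),
             hypothesis=3.0)
question = Question("Is the transformation small?", [loc],
                    lambda results: results["lines_of_code"] <= 5)
goal = Goal("Assess the size of a transformation", [question])

report = interpret(goal, {"source": "rule A\nrule B"})
```

Here the measured value (2.0) differs from the hypothesis (3.0), so the report lists a deviation that an engineer would have to explain, even though the goal itself is attained.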

2.2.2 Software Quality Models

Software quality is the degree to which a software product satisfies the different needs (quality properties) of its users, developers, and other stakeholders [ISO11]. Software quality models provide a categorization of quality properties for software systems. Thus, they allow refining the general term “quality” into concrete properties which can be handled individually. In the context of the GQM method, quality models are useful to derive concrete questions (“What is the product’s quality property X?”) whenever a goal is related to quality. Furthermore, the GQM method can profit from the existing literature that addresses metrics related to the quality properties.

In the past, several quality models were developed as, for instance, by Boehm [Boe78] or McCall et al. [MRW77]. Based on these first developments, the ISO/IEC defined the ISO/IEC 9126-1 [ISO01] standard which presents ISO/IEC’s quality model. The successor of this standard is the ISO/IEC 25010 [ISO11] standard. As the latter is the newest standard currently available, this thesis concentrates on the ISO/IEC 25010 quality model only and does not evaluate other quality models.

The feature model² in Figure 2.2 shows the eight main quality properties of the ISO/IEC 25010 product quality model: functional suitability, performance efficiency, compatibility, usability, reliability, security, maintainability, and portability. ISO/IEC 25010 decomposes each of these main properties into subproperties. For instance, maintainability includes modularity, reusability, analyzability, modifiability, and testability. For concrete descriptions of each (sub)property, the thesis refers to the ISO/IEC 25010 [ISO11] standard as the (sub)properties should be well-known by software engineers. Nonetheless, this thesis gives a detailed description of a used property at the relevant place. In particular, Section 3.6 uses the properties of ISO/IEC 25010 as a basis to define quality properties in the context of M2M transformations.

Besides the quality model, the ISO/IEC also describes metrics corresponding to it. The ISO/IEC 9126-2 [ISO03a] standard describes external and the ISO/IEC 9126-3 [ISO03b] standard internal metrics for each quality property of the quality model. External metrics are only applicable during runtime while internal metrics are applicable without executing the software. The new standards that will replace the ISO/IEC 9126-2 and ISO/IEC 9126-3 are the ISO/IEC 25022 and the ISO/IEC 25023 standard (cf. [ISO05]). As these were not published at the time of writing this thesis, this thesis does not consider them and sticks to the former standards. Whenever this thesis refers to one of ISO/IEC’s metrics, the thesis describes the metric at the relevant place.

² See Appendix A for a description of the syntax and semantics of feature models.

[Feature diagram: System/Software Product Quality with the properties Functional Suitability, Performance Efficiency, Compatibility, Usability, Reliability, Security, Maintainability, and Portability.]

Figure 2.2: ISO/IEC’s Product Quality Model (derived from [ISO11])

2.2.3 Data Collection Procedures

Data collection procedures execute data collection tasks, i.e., the measurements as specified by a GQM plan [vSB99, p. 66]. According to Solingen and Berghout, data collection procedures can utilize manual forms, electronic forms, and automated data collection tools [vSB99, pp. 67-68]. Manual and electronic forms provide the means for collecting data from participants of the measurement. Automated data collection tools use predefined algorithms for calculating a GQM plan’s metrics on the respective artifact.

Prechelt [Pre01, pp. 35-48] provides a wider scope concerning data collection procedures. He distinguishes between six empirical methods: questionnaires, case studies, benchmarks, field studies, controlled experiments, and meta studies.

Questionnaires collect subjective data from their participants by collecting their answers to a set of specified questions. For this, questionnaires can utilize manual and electronic forms as well as conduct structured interviews.

Case studies collect data of tools or methods by means of application examples. They allow comparing methods without keeping every influencing factor constant, which, however, also limits the expressiveness of the comparison.

Benchmarks are precisely specified use cases for tools or methods. Furthermore, benchmarks specify algorithms for calculating quantitative properties of the tool or method. Thus, they are similar to automated data collection tools and are a special case of case studies.

Field studies are also similar to case studies but observe real software projects instead of artificial ones. Their advantage is that they provide realistic results. Their disadvantage is that their results are hard to interpret.

Controlled experiments compare tools or methods by case studies but keep the number of variable properties low. Their advantage is that the low number of variable properties makes the interpretation of the data easy as varying comparison results can be explained by the variable properties. Their disadvantage is that they are costly. For instance, whenever artifacts created by an engineer are compared, individual traits need to be removed. Hence, many other engineers need to create these artifacts, too.

Meta studies derive their data from studies related to a common topic. Typically, they are inexpensive but require several studies on the same subject.

This thesis mainly uses case studies (the implemented scenarios) to obtain its results. These results are validated by simple, subjective questionnaires in which ten domain experts participated. However, this thesis could consider only a small number of such experts. A controlled experiment is out of the scope of the thesis; nevertheless, the thesis provides the data and setting needed for controlled experiments in future work. Furthermore, the thesis contains aspects of benchmarks and meta studies. Firstly, the implemented tool for measuring metrics can be used to benchmark other scenario implementations. Secondly, the developed scenario classification is based on a study of the existing literature similar to a meta study.

3 Model-to-Model (M2M) Transformations

As Section 2.1.2 explains, M2M transformations transform source into target models where the models’ metamodels can also be the same. These four artifacts (target and source model/metamodel) specify the transformation scenario. Since a transformation is a computable function, a transformation engineer also needs to specify the transformation via a (formal) transformation language. As Figure 3.1 illustrates, an M2M transformation specification refers to a source and a target metamodel. The transformation specification induces one possible solution to the transformation scenario. From a technical perspective, a transformation engine can then read the source model, execute the M2M transformation with respect to its specification, and create or modify the target model.

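[Diagram: the source model and the target model are instances of the source metamodel and the target metamodel, respectively; the transformation specification refers to both metamodels; the transformation engine executes the specification, reads the source model, and writes the target model.]

Figure 3.1: General Concept of M2M Transformations (derived from [CH06])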

This chapter continues with providing a classification for features of M2M transformations (regarding approaches, languages, and engines) in Section 3.1. Afterwards, Section 3.2 gives an overview of general approaches to realize M2M transformations. Section 3.3 lists and describes concrete languages for M2M transformation specification and Section 3.4 describes transformation engines capable of executing a language’s transformation specifications. Section 3.5 provides a classification for transformation scenarios as well as a set of concrete transformation scenarios. Section 3.6 concludes the chapter by applying the ISO/IEC 25010 [ISO11] standard to M2M transformations, thus defining quality properties for M2M transformations. The following chapters refer to these quality properties when planning and executing the assessment of the quality of M2M transformations.

3.1 Features

Based on a domain analysis of several model transformation approaches and languages, Czarnecki and Helsen [CH06] propose a categorization for these approaches¹. They focus on features of concrete languages and engines rather than on features of scenarios. Their categorization of features is especially important for this thesis since it enables a qualitative comparison of approaches as well as serves as a basis to derive metrics for a quantitative comparison.

Czarnecki and Helsen [CH06] make their categorization explicit by means of a feature model². The feature model consists of a top-level feature diagram and several sub-diagrams, each covering a different leaf node of the top-level diagram.

Figure 3.2 shows the top-level feature diagram. Accordingly, a model transformation has the following (top-level) features:

Relational Constraints: A model transformation can provide specifications like pre- and post-conditions. Typically, these specifications describe relations and are not executable.³

Transformation Rules: Czarnecki and Helsen describe transformation rules as the smallest unit of a transformation. This includes any approach which is able to specify how to map source to target model elements like, for instance, a function.

Rule Application Control: A transformation has to control the application of rules which involves the two aspects scheduling and location determination. Scheduling is in charge of the order in which transformation rules are applied. Location determination describes the mechanisms to find the source model elements a rule is applied to.

Rule Organization: Rule organization is concerned with issues related to the composition and structure of several transformation rules. This includes, for instance, inheritance mechanisms.

Source-Target Relationship: The source-target relationship specifies whether source and target model are the same (in-place) or two separate models.

Incrementality: Incrementality describes issues related to changes of the source model: the effect on the target model, the parts which have to be re-examined by a transformation, and retainment of custom changes inside the target model.

Directionality: A transformation either supports execution in one direction (unidirectional) or multiple directions (multidirectional).

Tracing: Tracing keeps record of the transformation execution. It logs, for instance, which source elements correspond to the respective target elements.

¹ Czarnecki and Helsen published a first version in 2003 [CH03]. In 2006, they revisited their first proposal, taking into account feedback on the first version and approaches which came up after 2003 [CH06].

² See Appendix A for a description of the syntax and semantics of feature models.

³ This thesis uses the term “relational constraints” instead of the term “specification” chosen by Czarnecki and Helsen. The term “specification” is unfortunate in the context of this thesis since the thesis describes “transformation specifications” as the language artifacts to describe a transformation. A transformation engine can then execute these specifications.

[Feature diagram: Model Transformation with the features Relational Constraints, Transformation Rules, Rule Application Control (Scheduling, Location Determination), Rule Organization, Source-Target Relationship, Incrementality, Directionality, and Tracing.]

Figure 3.2: Top-Level Feature Diagram (derived from [CH06])

Czarnecki and Helsen [CH06] give refined descriptions for each feature node which includes a black arrow. The following sections give a summary of these descriptions for each of these nodes.
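Some of the top-level features, namely transformation rules, scheduling, location determination, and tracing, can be made more tangible with a deliberately tiny rule interpreter in Python. The sketch is an assumption-laden illustration and does not reflect how QVT-O, QVT-R, or any other engine works internally: the `Rule` class, the priority-based scheduling, and the element dictionaries are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # location determination: which source elements?
    apply: Callable[[dict], dict]    # maps one source element to one target element
    priority: int = 0                # used by the (hypothetical) scheduler


def run(rules, source_model):
    target_model, trace = [], []
    # Scheduling: here simply ascending priority, one of many possible strategies.
    for rule in sorted(rules, key=lambda r: r.priority):
        # Location determination: scan the source model for matching elements.
        for element in source_model:
            if rule.matches(element):
                created = rule.apply(element)
                target_model.append(created)
                # Tracing: record which source element led to which target element.
                trace.append((rule.name, element["id"], created["id"]))
    return target_model, trace


rules = [
    Rule("attr2column", lambda e: e["kind"] == "attribute",
         lambda e: {"id": "col_" + e["id"], "kind": "column"}, priority=2),
    Rule("class2table", lambda e: e["kind"] == "class",
         lambda e: {"id": "tab_" + e["id"], "kind": "table"}, priority=1),
]
source = [{"id": "name", "kind": "attribute"}, {"id": "Person", "kind": "class"}]
target, trace = run(rules, source)
```

Because the source and target models are separate lists, the sketch corresponds to a transformation with two separate models rather than an in-place transformation. The trace shows that the scheduler applied `class2table` before `attr2column` although the attribute comes first in the source model.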

3.1.1 Transformation Rules

Figure 3.3 shows the different features of transformation rules:

[Feature diagram: Transformation Rules with the features Domain [1..*], Syntactic Separation, Multidirectionality, Application Conditions, Intermediate Structures, Parametrization, and Reflection & Aspects.]

Figure 3.3: Features of Transformation Rules (derived from [CH06])

Domain: A transformation rule can have several domains. A domain of a rule is its means to access the model the domain refers to. It is, for instance, a metamodel variable serving as an entry point to a model. The number of domains induces the different domain relationships (m:n, m:1, 1:n, and 1:1) as described in Section 2.1.2.

A domain has the features as shown in Figure 3.4:

[Feature diagram: Domain with the features Domain Language, Static Mode (In, Out, In/Out), Dynamic Mode Restriction, Body (Variables, Patterns, Logic), and Typing.]

Figure 3.4: Features of Domains (from [CH06])

Domain Language: The domain language is the language specification for valid metamodels. For instance, MDA’s domain language has the form of a metamodel specified via MOF. Furthermore, if source and target metamodel are the same, the transformation is endogenous; otherwise, it is exogenous.

Static Mode: In programming languages, a procedure can have in, out, or in/out parameters. Analogously, a domain has a static mode which can either be in, out, or in/out. The static mode is expressed explicitly or assumed implicitly.

Dynamic Mode Restrictions: A transformation can restrict its static mode at execution time. For instance, a transformation can change an in/out-domain to an in-domain, thus inducing the execution direction.

Body: The body of a domain is the domain-related specification part of a transformation. It includes variables as well as patterns or logic. Variables can have source and/or target model elements as a value.

Patterns are model fragments related to the domain and can make use of variables. Figure 3.5 shows their features. The structure of a pattern relates to the internal structure of the domains’ models. In M2T transformations, string patterns are commonly used (in so-called textual templates). M2M transformations commonly use term or graph patterns. Furthermore, patterns can use a concrete textual or graphical syntax. Czarnecki and Helsen [CH06] additionally add the “abstract syntax” of a pattern to their classification. However, it is not clear what they mean by this as transformation specifications always apply a concrete syntax. Therefore, this thesis does not consider this feature further.

[Feature diagram: Patterns with the features Structure (Strings, Terms, Graphs) and Syntax (Abstract, Concrete with Textual and Graphical).]

Figure 3.5: Features of Patterns (from [TJF+09])

Logic is the domains’ means to put constraints on model elements or to execute computations on them. Figure 3.6 shows the features of logic. Logic has a language paradigm which can either be object-oriented, functional, logic, or procedural. This is similar to the paradigms of programming languages. Furthermore, logic dictates how values are specified and elements are created depending on whether an imperative or declarative approach is used. Imperative approaches use a direct imperative assignment for value specification and thereby create elements explicitly. Declarative approaches use value binding (like in functional programming) or constraints for value specification. Thus, they create elements implicitly or explicitly.

Figure 3.6: Features of Logic (from [TJF+09]). Feature tree: Logic > Language Paradigm (Object-Oriented, Functional, Logic, Procedural); Logic > Value Specification (Imperative Assignment, Value Binding, Constraint); Logic > Element Creation (Implicit, Explicit).

Typing Domains can differ in their support for typing their variables, logic, and patterns. According to Czarnecki and Helsen [CH06], these elements can be untyped, syntactically typed, or semantically typed (Figure 3.7). Syntactic typing allows a variable to be associated with the metamodel class whose instances it can hold. Semantic typing allows asserting the static and dynamic semantics of a transformation. For instance, static semantics allow checking the well-formedness of rules, and dynamic semantics allow checking behavioral properties of transformations.

This classification of Czarnecki and Helsen [CH06] mixes concepts known from programming languages (for instance, cf. [Seb12]). There, typing provides the features (1) typed and untyped, (2) weak and strong, and (3) static and dynamic. These features relate to when and how strictly typing takes place. In contrast, syntax relates to the form and semantics to the meaning of programming language constructs. As the programming language literature is more accurate, this thesis uses the programming languages terminology when comparing transformation approaches regarding the “typing” feature of Czarnecki and Helsen.

Figure 3.7: Features of Typing (from [TJF+09]). Feature tree: Typing > Untyped, Syntactically Typed, Semantically Typed.

Syntactic Separation Syntactic separation describes whether a rule explicitly handles the parts belonging to different domains separately.

Multidirectionality Multidirectionality describes whether a rule can be applied in different directions. For instance, a rule which transforms an element A to an element B may also be applied to transform from B to A. If this is possible, the rule is bidirectional; otherwise, it is unidirectional.

Application Conditions A rule can have conditions which state whether the rule can be executed (application conditions). During the execution of a transformation, the result of evaluating a rule’s application condition can change (from true to false or the other way round).

Intermediate Structures A rule can create elements which are not part of its domains’ models. For instance, traceability links are such a case (cf. Section 3.1.8).

Parameterization Rules can have parameters. Parameters can be control parameters, generics, and higher-order rules as shown in Figure 3.8. Control parameters are values which can be passed as control flags. Generics provide the means to pass data types as parameters. This particularly includes model element types. Finally, higher-order rules take other rules as parameters. Czarnecki and Helsen [CH06] claim that generics make transformations more reusable. Furthermore, they claim that higher-order rules provide an even higher level of reusability and abstraction. Still, neither Czarnecki and Helsen nor the references they give provide an evaluation of these claims.

Figure 3.8: Features of Parametrization (from [CH06]). Feature tree: Parameterization > Control Parameters, Generics, Higher-Order Rules.

Reflection & Aspects Reflection allows reflective access to transformations during execution. Reflection can particularly be used to realize aspects, i.e., crosscutting concerns. An example for this is logging.

3.1.2 Location Determination

Figure 3.9 shows the different features of location determination:

Figure 3.9: Features of Location Determination (from [CH06]). Feature tree: Location Determination > Deterministic, Non-Deterministic (One-Point, Concurrent), Interactive.

Deterministic The mechanism to find source model elements can be deterministic. For instance, the mechanism could follow a depth-first strategy.

Non-Deterministic The mechanism can be non-deterministic. The mechanism could apply a transformation rule to a non-deterministically selected location (one-point) or concurrently to all matching source locations (concurrent).

Interactive The mechanism can be interactive, i.e., it allows the user to select a source location.
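The deterministic and non-deterministic strategies can be illustrated with a small Python sketch. The tree representation (nested tuples), the function names, and the `matches` predicate are assumptions introduced for illustration; they do not stem from [CH06].

```python
import random

# Hypothetical source model: (name, children) tuples forming a tree.
MODEL = ("root", [("a", [("a1", []), ("a2", [])]), ("b", [])])

def depth_first(node):
    """Deterministic strategy: visit elements in depth-first pre-order."""
    name, children = node
    yield name
    for child in children:
        yield from depth_first(child)

def one_point(node, matches, rng):
    """Non-deterministic one-point strategy: pick one matching location."""
    candidates = [n for n in depth_first(node) if matches(n)]
    return rng.choice(candidates) if candidates else None

def concurrent(node, matches):
    """Concurrent strategy: return all matching locations at once."""
    return {n for n in depth_first(node) if matches(n)}
```

An interactive mechanism would replace `rng.choice` by a selection made by the user.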

3.1.3 Scheduling

Figure 3.10 shows the different features of scheduling:

Figure 3.10: Features of Scheduling (from [CH06]). Feature tree: Scheduling > Form (Implicit, Explicit (Internal, External)); Scheduling > Rule Selection (Explicit Condition, Non-Determinism, Conflict Resolution, Interactive); Scheduling > Rule Iteration (Recursion, Looping, Fixpoint Iteration); Scheduling > Phasing.

Form The scheduling form describes how the scheduling mechanism is expressed. This can either be implicit or explicit. The implicit form does not give a user explicit control over the order in which rules are applied. In contrast, the explicit form allows the user to explicitly control the order. The user can implement the scheduling through internal mechanisms, which allow rules to invoke other rules, or external mechanisms, which are dedicated scheduling mechanisms (e.g., via a finite state machine) separated from the rule specification.

Rule Selection Rule selection describes the way rules are selected for application. Possible rule selection mechanisms are explicit conditions, non-deterministic choices, conflict resolution mechanisms (e.g., based on priorities), and interactive selection.

Rule Iteration Rule iteration mechanisms handle the repeated application of rules. Possible rule iteration mechanisms are recursion, looping, and fixpoint iteration. Fixpoint iteration repeats the application of rules until the target does not change anymore or no rule is applicable anymore.


Phasing A transformation can have several phases. Each phase has a specific purpose such that only certain rules can be applied within this phase. For instance, a first phase could build up the structure of a target model, and a second phase could set the values of attributes of target model elements.
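Of the rule iteration mechanisms, fixpoint iteration is the least familiar from general-purpose programming. A minimal Python sketch follows; the function names and the example rule are assumptions for illustration. Rules are modeled as functions returning a new model, and the loop ends only if the rules actually reach a fixpoint.

```python
def fixpoint(model, rules):
    """Apply rules repeatedly until no rule changes the model anymore."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_model = rule(model)
            if new_model != model:
                model, changed = new_model, True
    return model

# Hypothetical rule: add an edge (a, d) for every pair (a, b), (b, d).
def close_transitively(edges):
    extra = {(a, d) for (a, b) in edges for (c, d) in edges if b == c}
    return edges | extra
```

Applied to the edges (1, 2), (2, 3), and (3, 4), the loop stops once the transitive closure is complete, i.e., as soon as a further pass yields no new edge.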

3.1.4 Rule Organization

Figure 3.11 shows the different features of rule organization:

Figure 3.11: Features of Rule Organization (from [CH06]). Feature tree: Rule Organization > Modularity Mechanisms; Rule Organization > Reuse Mechanisms (Inheritance, Logical Composition); Rule Organization > Organizational Structure (Source-Oriented, Target-Oriented, Independent).

Modularity Mechanisms A transformation language can allow rules to be packaged into modules. Rules and other modules can then import a module and access its rules.

Reuse Mechanisms Reuse mechanisms are mechanisms to specify rules based on other rules. On the one hand, scheduling mechanisms provide basic support for this as they allow rules to invoke other rules. On the other hand, dedicated mechanisms are also possible. These include inheritance between rules and modules as well as logical composition. Examples for inheritance between rules are rule inheritance, derivation, extension, and specialization. An example for inheritance between modules is unit inheritance.

Organizational Structure There are three ways rules can be organized: source-oriented, target-oriented, and independent. Source-oriented rules are organized according to the structure of the source domain language whereas target-oriented rules are organized according to the target domain language. Otherwise, i.e., if rules have their own organization, they are independent.

3.1.5 Source-Target Relationship

Figure 3.12 shows the different features of the source-target relationship:



Figure 3.12: Features of the Source-Target Relationship (from [CH06]). Feature tree: Source-Target Relationship > New Target; Source-Target Relationship > Existing Target (Update (Destructive, Extension Only), In-Place).

New Target Transformations can create a new target model separated from the source model.

Existing Target Transformations can use an existing target model. If model transformations allow source and target model to be equal, they support in-place transformations. Furthermore, a transformation can update an existing target either destructively or by extension only. The latter means that existing elements cannot be removed by the transformation.
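The difference between a destructive and an extension-only update can be made concrete with a small Python sketch. Models are reduced to sets of element names here, and the function and its parameters are assumptions for illustration.

```python
def update_target(existing, computed, destructive):
    """Update an existing target (set of element names) from a fresh result.

    destructive=True  : elements that are no longer computed are removed.
    destructive=False : extension only, existing elements are never removed.
    """
    if destructive:
        return set(computed)
    return set(existing) | set(computed)
```

With an existing target {t1, t2} and a newly computed result {t2, t3}, the destructive update yields {t2, t3}, whereas the extension-only update keeps t1 and yields {t1, t2, t3}.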

3.1.6 Incrementality

Figure 3.13 shows the different features of incrementality:

Figure 3.13: Features of Incrementality (from [CH06]). Feature tree: Incrementality > Target-Incrementality, Source-Incrementality, Preservation of User Edits in the Target.

Target-Incrementality Target-incrementality is a basic feature for model transformations which support incrementality. It describes the mechanisms which handle changes in the source model regarding the target model. The mechanisms create the target models on the first execution. Afterwards, they handle source changes by updating the target models, i.e., they only apply necessary changes.


Source-Incrementality Source-incrementality describes the mechanisms which re-examine the source when the source changes. A model transformation supports source-incrementality if it minimizes the number of re-examinations.

Preservation of User Edits in the Target A user can manually change the target model after a transformation. A transformation supports the “preservation of the user edits” if the changes do not get lost when the transformation is re-executed, i.e., the user modifications are retained.
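Target-incrementality and the preservation of user edits can be sketched in Python as follows. The dictionary-based target model, the `transform` rule, and the explicit `user_edited` set are assumptions for illustration. The sketch re-examines every source element and therefore does not show source-incrementality.

```python
def transform(element):
    """Hypothetical rule: derive a table name from a class name."""
    return "t_" + element

def incremental_update(source, target, user_edited):
    """Bring `target` (dict: source element -> target value) up to date.

    Returns the set of source elements whose target entry was written
    or removed. Entries listed in `user_edited` are kept as they are.
    """
    touched = set()
    for element in source:
        if element in user_edited:
            continue  # preserve manual changes in the target
        expected = transform(element)
        if target.get(element) != expected:
            target[element] = expected  # only apply necessary changes
            touched.add(element)
    for element in set(target) - set(source):
        del target[element]  # the source element no longer exists
        touched.add(element)
    return touched
```

On the first execution, every target entry is created. Later executions only touch entries whose source elements were added, changed, or removed.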

3.1.7 Directionality

Figure 3.14 shows the different features of directionality:

Figure 3.14: Features of Directionality (from [CH06]). Feature tree: Directionality > Unidirectional, Multidirectional.

Unidirectional A transformation is unidirectional if it can be executed in only one direction: based on a source model, the transformation computes a target model.

Multidirectional A transformation is multidirectional if it can be executed in multiple directions. For instance, bidirectional transformations can compute a model B from model A and vice versa. Thus, in the former case A has the role of the source model and B the role of the target model. In the latter case, they exchange their roles: A is the target and B the source model.

3.1.8 Tracing

Figure 3.15 shows the different features of tracing. If a transformation has dedicated support for tracing, it provides the following features:

Creation The creation of trace information can be realized manually or automatically. The manual creation can be realized by creating trace target elements via the transformation itself. A transformation language could also require tracing information to be encoded manually. Automatic creation relies on dedicated support by the language and engine. Still, it can be possible to tune the tracing mechanism. For instance, a transformation engineer can put restrictions on the kind of traces the engineer is interested in.



Figure 3.15: Features of Tracing (from [CH06]). Feature tree: Tracing > Dedicated Support with Creation (Manual, Automatic), Tunable, and Storage Location (Model (Source, Target), Separate).

Storage Location Tracing information can either be stored inside the source and/or target model or separately.
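A separately stored trace with rule-driven creation of trace links can be sketched in Python as follows. The class `Trace`, its methods, and the rule `class_to_table` are assumptions for illustration and do not correspond to the API of a concrete engine.

```python
class Trace:
    """Separately stored trace links between source and target elements."""

    def __init__(self):
        self.links = []  # (rule name, source element, target element)

    def record(self, rule, source, target):
        self.links.append((rule, source, target))

    def resolve(self, source):
        """Return all target elements created from `source`."""
        return [t for (_, s, t) in self.links if s == source]

def class_to_table(cls, trace):
    """Hypothetical rule that records a trace link for each application."""
    table = "t_" + cls
    trace.record("class_to_table", cls, table)
    return table
```

Storing the links inside the source or target model instead would mean attaching them to the model elements themselves.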

3.2 Approaches

Czarnecki and Helsen [CH06] name and describe seven general approaches for M2M transformations: direct-manipulation, structure-driven, operational, template-based, relational, graph-transformation-based, and hybrid approaches. One main difference is their varying focus on features as described in Section 3.1. This section summarizes each approach briefly.

3.2.1 Direct-Manipulation

Direct-manipulation approaches consist of (1) an internal representation of the target model, and (2) an (imperative) API for the manipulation of the model. Typically, the API is an object-oriented framework where a transformation engineer has to implement aspects such as transformation rules and scheduling from scratch.


3.2.2 Structure-Driven

Structure-driven approaches apply two distinct phases: (1) the creation of the target model as a hierarchical structure, and (2) the modification of attributes as well as references of the target model. Frameworks for this approach handle scheduling and the application strategy, whereas transformation engineers specify the transformation rules.
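The two phases can be sketched in Python as follows. The dictionary-based models and the attribute rule (upper-casing a label) are assumptions for illustration. In a real framework, the two loops would belong to the framework, and only the per-element rules would be supplied by the transformation engineer.

```python
def transform(source):
    """source: dict name -> {"parent": name or None, "label": str}."""
    # Phase 1: create the hierarchical target structure.
    target = {name: {"children": [], "label": None} for name in source}
    for name, element in source.items():
        if element["parent"] is not None:
            target[element["parent"]]["children"].append(name)
    # Phase 2: set attributes of the created target elements.
    for name, element in source.items():
        target[name]["label"] = element["label"].upper()
    return target
```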

3.2.3 Operational

Operational approaches are similar to direct-manipulation approaches. The difference lies in the higher degree of dedicated support for model transformations like facilities for expressing computations on metamodels or tracing. One common approach is to use a MOF implementation in combination with OCL extended with imperative constructs.

3.2.4 Template-Based

Approaches in this category apply model templates. Model templates consist of (1) models which conform to the target metamodel, and (2) embedded metacode for variable model parts. Typically, model templates are represented by a concrete syntax of the target metamodel and the metacode is annotated on model elements. Metacode can include conditions, iterations, and expressions as expressible by OCL, for instance.

3.2.5 Relational

Relational approaches follow the concept of mathematical relations. For this, they specify relations between source and target element types via constraints. Consequently, relational approaches are declarative and create target elements implicitly whereas direct-manipulation and operational approaches are imperative and create target elements explicitly. Relational approaches are side-effect free, support multidirectional rules, and can provide backtracking. Usually, relational approaches do not allow in-place updates since they require source and target model to be distinct.

3.2.6 Graph-Transformation-Based

This category is based on theoretical work on graph transformations. A graph transformation rule consists of a left-hand side (LHS) and a right-hand side (RHS) graph pattern. To apply a rule on a source model, two steps are executed consecutively: (1) elements that occur in the LHS and not in the RHS are removed; and (2) elements that occur in the RHS and not in the LHS are added. The result of this execution is the target model.
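The two steps can be sketched in Python with models and patterns reduced to sets of element identifiers. This is a strong simplification: the sketch assumes that the LHS has already been matched to concrete model elements, whereas actual graph transformation engines search for matches of the LHS pattern in the model graph. The function name is an assumption for illustration.

```python
def apply_rule(model, lhs, rhs):
    """Apply a rule whose LHS is already matched to concrete elements.

    Returns the unchanged model if the LHS does not occur in it.
    """
    if not lhs <= model:
        return model
    removed = lhs - rhs   # step (1): in the LHS but not in the RHS
    added = rhs - lhs     # step (2): in the RHS but not in the LHS
    return (model - removed) | added
```

Elements that occur in both the LHS and the RHS are preserved by the rule application.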



Graph-transformation-based approaches apply typed, attributed, and labeled graphs. The graphs are displayed in abstract or concrete syntax. Implementations of this approach differ in the features they provide. Among these features are scheduling (explicit and implicit), unidirectional, in-place as well as multidirectional transformations, negative application conditions for the LHS, and verification support.

3.2.7 Hybrid

Hybrid approaches are a combination of two or more approaches as described in Section 3.2.1 to Section 3.2.6. There are two extremes for the combination: (1) as separate components and (2) at the level of individual rules.

3.3 Languages

This section describes languages for the specification of M2M transformations. This thesis concentrates on languages being part of the Eclipse Modeling Project. In their survey, Streekman and Kruse [SK09] identify these languages as commonly applied languages within the MDSD community.

The Eclipse Modeling Project is an MDSD project of the Eclipse Foundation [Gro09, p. 8]. The core of the project is the Eclipse Modeling Framework (EMF), which includes a MOF implementation (Ecore) and thus allows metamodels to be specified. Several projects exist which use EMF to provide MDSD capabilities like abstract and concrete syntax development as well as transformations. Particularly, the M2M transformation languages QVT and ATL are part of the Eclipse Modeling Project and make use of EMF.

First, this thesis considers a direct-manipulation approach which is a combination of using Java and the EMF framework (Section 3.3.1). This combination is often used in practice [SK09]. Next, this thesis considers dedicated M2M languages of the Eclipse Modeling Project: the operational approach QVT Operational Mapping and the relational approach QVT Relations (both discussed in Section 3.3.2), as well as the hybrid approach ATL (Section 3.3.3). For the quantitative and qualitative comparison, this thesis does not further consider ATL as this is out of the thesis’ scope and thus left as future work.

3.3.1 Java and EMF

EMF comes with a generator able to create Java code from an EMF metamodel specification. The generator is implemented via the template-based M2T transformation language JET (Java Emitter Templates) [SBPM09, p. 342]. For the generation of classes, interfaces, references, etc. of the metamodel, EMF uses several patterns as described by Steinberg et al. [SBPM09, pp. 239-308].


When using EMF in the context of M2M transformations, target and source models are object instances of the respective EMF metamodel classes. Methods provided by the EMF framework allow a transformation engineer to execute several tasks relevant for a transformation, for instance, to

(1) navigate through the source model,

(2) create the target model programmatically, and

(3) alter the source model (for in-place transformations).

This corresponds to the direct-manipulation approach to an M2M transformation: besides these mechanisms, EMF does not provide dedicated facilities for transformation specification, tracing, or scheduling.
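The following sketch illustrates the direct-manipulation style. It is written in Python rather than Java and does not use EMF. The data classes `UmlClass` and `Table` are hypothetical stand-ins for classes that the EMF generator would create from a metamodel, and the mapping rule is an assumption for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class UmlClass:          # stand-in for a generated source metamodel class
    name: str
    attributes: list = field(default_factory=list)

@dataclass
class Table:             # stand-in for a generated target metamodel class
    name: str
    columns: list = field(default_factory=list)

def transform(classes):
    """Hand-written rule application and scheduling in a single loop."""
    tables = []
    for cls in classes:                      # (1) navigate the source model
        table = Table(name=cls.name)         # (2) create target elements
        table.columns = [cls.name + "_tid"] + list(cls.attributes)
        tables.append(table)
    return tables
```

The sketch shows that the order of rule application, the creation of elements, and any tracing have to be coded by hand.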

3.3.2 Query/View/Transformation (QVT)

This section gives an overview of the Query/View/Transformation (QVT) specification [Obj11a]. QVT is an OMG standard and can be seen as a hybrid M2M transformation approach with three separate components [CH06]: the two relational approaches QVT Core and QVT Relations (QVT-R), as well as the operational approach QVT Operational Mapping (QVT-O). In addition, the specification considers the non-standard component Black Box Implementations, which allows enriching QVT Core or QVT-R with operational aspects. Any programming language with a MOF binding can implement such a black box.

Figure 3.16 illustrates how the different parts of the specification relate to each other. QVT Core is based on small extensions to MOF and OCL that allow declarative relationships between MOF models to be specified. The specification also presents the semantics of the QVT Core language. For defining the semantics of QVT-R, the specification gives an M2M transformation from QVT-R to QVT Core. The higher level of abstraction is the advantage of QVT-R.

Besides the declarative part, the specification allows QVT Core and QVT-R to invoke the operational components QVT-O and black box implementations. For this, a relation of the declarative part specifies a class responsible for tracing. In addition, the class has a 1:1 mapping to an operation signature implemented within one of the two operational approaches. Figure 3.16 illustrates this dependency of the operational components by arrows to the relational components.

The idea of mapping from QVT-R to QVT Core for defining the semantics of QVT-R is feasible in principle but requires QVT Core to be as expressive as QVT-R. In fact, this is in general not true as shown by Stevens [Ste11]. Stevens shows that even a simple example expressible via QVT-R is not expressible via QVT Core. On the other hand, QVT-R’s intended semantics are also described in the specification in a combination of English and first-order predicate logic. Furthermore, Völter et al. [VSC06, p. 208] mention that the detour via QVT Core is not a practical approach for implementing an engine supporting QVT-R.



Figure 3.16: Conceptual Overview of the QVT Specification (derived from [Obj11a]). Components: QVT Relations, QVT Core, the QVT Relations to QVT Core Transformation, QVT Operational Mapping, and Black Box Implementations (Java, .NET, ...); arrows labeled “extends” connect the operational components to the relational components.

Instead, Völter et al. suggest implementing the engine directly based on QVT-R’s semantics. As a consequence, this section continues with describing QVT-R (with its intended semantics) and QVT-O only.

QVT Relations (QVT-R)

QVT-R specifies transformations via a set of relations. A relation can either be a top-level or a non-top-level relation. Top-level relations must always hold for a successful transformation execution. Non-top-level relations need only to hold whenever they are invoked by other relations, i.e., by QVT-R’s internal scheduling mechanism. The transformation itself references candidate models which are constrained by the relations. Candidate models are named references to model instances of the respective metamodel type. QVT-R supports multidirectional transformations where the concrete direction is determined by selecting one of the candidate models as a target. A QVT-R engine determines how a transformation engineer can select candidate models as a target.

Listing 3.1 gives an example of a QVT-R transformation specification which transforms a simplified UML model into a simplified relational database model (cf. Section 3.5.3 for a detailed description of the scenario). The transformation umlRdbms references the candidate models uml and rdbms. SimpleUML is the metamodel of uml and SimpleRDBMS is the metamodel of rdbms. umlRdbms includes two top-level relations (PackageToSchema and ClassToTable) as well as one non-top-level relation (AttributeToColumn).

    transformation umlRdbms (uml : SimpleUML, rdbms : SimpleRDBMS) {
        top relation PackageToSchema { ... }
        top relation ClassToTable { ... }
        relation AttributeToColumn { ... }
    }

Listing 3.1: Example Transformation in QVT-R (from [Obj11a])


QVT-R relations implement the concepts as described in Section 3.1.1 (“Domain” feature). They consist of variable declarations, two or more domains, as well as a when and where clause:

Variable declarations Variable declarations specify which variables are used within the relation. The variables are named and typed. Utilized variables are intermediately created structures.

Domains QVT-R domains allow relations to access elements of the candidate models. Domains can be marked as checkonly or enforce, which determines the behavior of a transformation whenever the relevant candidate model is selected as a target. If the domain is marked as checkonly, the transformation only checks whether the candidate model matches the relation and does not modify the target model to satisfy the relationship. Hence, the static mode of a checkonly domain can be seen as “in”. In contrast, if the domain is marked as enforce, the transformation checks whether the candidate model matches the relation and, in case it does not match the relation, modifies the target model to satisfy the relationship (check-before-enforce semantics). Hence, the static mode of an enforce domain is “in/out”.

The body of domains uses patterns structured by terms and with a concrete textual as well as graphical syntax. Moreover, the body’s logic follows the logic programming language paradigm with implicit element creation and allows value specification by OCL constraints as well as value binding.

QVT-R domains are strongly typed. Whether the domains are statically or dynamically typed depends on the engine used. If the engine applies type checking before executing the transformation specification, it supports static typing.

Because QVT-R specifies domains separately, it supports syntactic separation. Furthermore, relations can be executed multidirectionally and support control parameters. For multidirectionality, the concrete direction is induced by the selection of the target candidate model. For specifying parameters, dedicated domains have to be added to the relation. The parameters of the relation are derived from these domains: an output parameter for enforce domains and an input parameter for checkonly domains.

When Clauses When clauses specify the application conditions of relations, i.e., a relation needs only to hold whenever its when clause evaluates to true. When clauses can contain OCL expressions as well as reference other relations, i.e., check whether another relation holds.

Where Clauses Where clauses specify conditions which need to be satisfied by a relation. Similar to when clauses, where clauses can contain OCL expressions as well as reference other relations. The latter mechanism is QVT-R’s internal scheduling as it allows other relations to be invoked. This way, non-top-level relations can also be invoked.
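The difference between checkonly and enforce domains can be illustrated with a small Python sketch. Models are reduced to sets of names, the relation (“every class has a table of the same name”) is a hypothetical example, and the `mode` parameter stands in for the marking of the target domain. Deletion of target elements is not covered.

```python
def relation_holds(classes, tables):
    """The relation: every class name has a table of the same name."""
    return all(name in tables for name in classes)

def execute(classes, tables, mode):
    """Run the relation with the table model selected as target.

    mode == "checkonly": only report whether the relation holds.
    mode == "enforce":   add tables for unmatched classes only
                         (check-before-enforce), then report the result.
    """
    if mode == "enforce":
        for name in classes:
            if name not in tables:   # check before enforcing
                tables.add(name)
    return relation_holds(classes, tables)
```

In checkonly mode, the target model is left unchanged even if the relation does not hold. In enforce mode, only the missing elements are created.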



Listing 3.2 presents an example for a QVT-R relation. It shows the implementation of the ClassToTable top-level relation of Listing 3.1. ClassToTable declares two string variables (cn and prefix), two domains, and a when and where clause.

The first domain is a checkonly domain which uses c as identifier to access the Class element of the uml candidate model. The second domain is an enforce domain which uses t as identifier to access the Table element of the rdbms candidate model. The body of the relation assigns values to attributes of c and t via (extended) OCL expressions. This pattern can also be nested as, for instance, in line 11 where an instance cl of type Column is assigned to the column attribute of t. cl’s attributes are assigned within the nested brackets.

The when clause dictates that ClassToTable must hold if PackageToSchema with the parameters p and s evaluates to true. As PackageToSchema transforms a UML package to an RDBMS schema, this ensures that the namespace of c and the schema of t are set up correctly.

The where clause ensures that the prefix variable is empty and invokes the AttributeToColumn relation with the three parameters c, t, and prefix. This allows AttributeToColumn to ensure that every attribute of c is mapped to a column of t.

     1 top relation ClassToTable {
     2   cn, prefix : String;
     3   checkonly domain uml c : Class {
     4     namespace = p : Package {},
     5     kind = 'Persistent',
     6     name = cn
     7   };
     8   enforce domain rdbms t : Table {
     9     schema = s : Schema {},
    10     name = cn,
    11     column = cl : Column { name = cn + '_tid', type = 'NUMBER' },
    12     key = k : Key { name = cn + '_pk', column = cl }
    13   };
    14   when { PackageToSchema(p, s); }
    15   where { prefix = ''; AttributeToColumn(c, t, prefix); }
    16 }

Listing 3.2: Example Relation in QVT-R (from [Obj11a])

QVT Operational (QVT-O)

A QVT-O transformation is an imperative and unidirectional M2M transformation with a signature that states the involved models. Similar to a class, it can be instantiated and consists of a set of properties and operations. Possible operations are entry, mapping, constructor, helper, and blackbox operations.

The entry operation is the (unique) entry point of the transformation execution. Its name is main and, in contrast to the other operations, it has no parameters (but can access global properties and parameters). A mapping operation specifies the mapping between one or more source into one or more target model elements. Similar to QVT-R relations, mapping operations can have a mapping body as well as when and where clauses, thus allowing application conditions and postconditions to be specified. Constructors are dedicated operations for the creation of metaclass instances. Helpers are operations that provide a result, typically based on a computation performed on model elements. If a helper has no side effects, the helper is a query operation. Finally, each operation can be a black box operation if its body is not provided within the QVT-O transformation but implemented elsewhere (e.g., in Java).

Like the example in Section 3.3.2, the QVT-O transformation Uml2Rdbms of Listing 3.3 shows a transformation from a simplified UML model into a simplified relational database model. The signature of Uml2Rdbms identifies uml of type UML as source model (in direction) and rdbms of type RDBMS as target model (out direction). The entry operation main first creates a list of Package objects of the uml model. Afterwards, it applies the mapping operation packageToSchema on each element of the list.

    transformation Uml2Rdbms (in uml : UML, out rdbms : RDBMS) {
        main() {
            uml.objectsOfType(Package)->map packageToSchema();
        }
        ...
    }

Listing 3.3: Example Transformation in QVT-O (from [Obj11a])

The domain features (cf. the transformation rules feature of Section 3.1.1) of QVT-O operations include the following. Operation parameters support the static modes “in”, “out”, and “inout” as directions. Operations have no dedicated mechanism for a dynamic mode restriction. The body of operations allows variables, uses patterns structured as terms and with a concrete textual syntax, and follows an imperative language paradigm. Thus, the logic of QVT-O operations supports imperative assignments for value specification and an explicit element creation mechanism. Furthermore, the language paradigm provides object-oriented aspects like the instantiation of other transformations as objects. Finally, QVT-O operations are strongly typed. As for QVT-R relations, static and dynamic typing depends on the language’s engine.

Regarding the further transformation rules features, QVT-O operations support no clear syntactic separation. The left-hand side of expressions usually addresses target elements and the right-hand side source elements, but exceptions exist. For instance, variables not related to target elements can get a value assigned. An example for this is the pre-defined result variable which either relates to a unique parameter or (in case of non-uniqueness) a list of declared result parameters.

QVT-O operations are not multidirectional. As mentioned, the application conditions of mapping operations are realized via when clauses. Intermediate structures include traces and intermediate variables but also intermediate metaclasses and additional properties of metaclasses. Generally, operation parameters are simple control parameters. However, QVT-O also has dedicated mechanisms to cast to a transformation instance: the asTransformation operation casts a transformation specification to an instance of a QVT compliant transformation class. Thus, QVT-O implicitly allows higher-order rules as parameters. Finally, QVT-O has no dedicated reflection and aspect mechanisms.

    1 mapping Class::class2table() : Table when { self.isPersistent() } {
    2   name := 't_' + self.name;
    3   column := self.attribute->map attr2Column();
    4   key := object Key {
    5     name := 'k_' + self.name;
    6     column := result.column[kind='primary'];
    7   };
    8 }

Listing 3.4: Example Mapping Operation in QVT-O (derived from [Obj11a])

Listing 3.4 shows an example for a QVT-O mapping operation (class2table) without parameters. class2table specifies how a UML Class has to be transformed into an RDBMS Table. The when clause dictates that the transformation body can only be executed if the contextual parameter (an element of type Class) accessed via the pre-defined variable self is a persistent class. Otherwise, the operation returns the null value. The transformation body creates the output parameters name, column, and key of the target model by imperative assignments. The object expression construct in line 4 specifies an inline mapping operation. This expression ensures that a Key instance with the output parameters as specified in lines 5 and 6 is created and assigned to the key parameter.
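For readers more familiar with general-purpose languages, the control flow of class2table can be approximated in Python as follows. The dictionary-based model elements are an assumption for illustration, and `attr2column` is a hypothetical helper since Listing 3.4 does not show attr2Column. The sketch ignores QVT-O specifics such as implicit tracing.

```python
def class2table(cls):
    """Rough Python counterpart of the class2table mapping operation."""
    if not cls["persistent"]:          # when clause (guard)
        return None                    # guard fails: null is returned
    result = {"name": "t_" + cls["name"]}
    result["column"] = [attr2column(a) for a in cls["attribute"]]
    result["key"] = {                  # inline object expression
        "name": "k_" + cls["name"],
        "column": [c for c in result["column"] if c["kind"] == "primary"],
    }
    return result

def attr2column(attr):
    """Hypothetical helper: the listing does not show attr2Column."""
    return {"name": attr["name"], "kind": attr.get("kind", "regular")}
```

As in the listing, the key refers to those columns of the result whose kind is primary.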

Further features of QVT-O include mechanisms to access tracing information for accessing target model elements previously created, rule organization features like modules, composition of transformations, the reuse mechanisms inheritance and merge, disjunction of mapping operations, and the parallel execution of transformations. Details of these mechanisms can be found in the QVT specification document [Obj11a].

3.3.3 Atlas Transformation Language (ATL)

The Atlas Transformation Language (ATL) [Ecl12a] is an M2M transformation language which follows a hybrid approach [JABK08]. It provides declarative as well as operational parts. The language is mainly based on OMG’s QVT and OCL specifications. Thus, it has a lot of similarities with QVT-R and QVT-O. ATL is a part of the Eclipse Modeling Project.

3.4 Engines

This section describes M2M transformation engines. M2M transformation engines read a source and create or modify a target model by executing the specification of an M2M transformation. The specification generally uses the syntax of an M2M transformation language as described in Section 3.3 but may add own constructs. The specification’s semantics are in general consistent with the semantics intended by the specification of the language. Nonetheless, engines may extend or change parts of a language’s semantics, defining their own language interpretation.

This section concentrates on engines which provide interpretations of the languages discussed in Section 3.3. As Java with EMF and the JVM can already be seen as an engine, this section does not describe it again and refers to Section 3.3.1. For QVT-R, Stevens [Ste11] identifies two main engines: Medini QVT (Section 3.4.1) and ModelMorf (Section 3.4.2). For QVT-O, this thesis discusses the open source tools QVT Operational (Section 3.4.3) and SmartQVT (Section 3.4.4). ATL comes with its own engine, the “ATL transformation engine” (Section 3.4.5).

3.4.1 Medini QVT

Medini QVT⁴ is an engine for QVT-R developed by ikv++ technologies ag [ikv12]. The engine comes as an Eclipse plugin and is able to handle EMF models. Medini QVT provides its own interpretation of the QVT-R language. The main differences from the QVT specification [Obj11a] are the following:

Checking Semantics: check-before-enforce The QVT specification requires that an enforce domain is not instantiated whenever a match can already be found in the target [Obj11a, p. 15]. However, Medini QVT does not perform target domain matching [KE08]. Instead, it always instantiates the target domain if there are no traces pointing to created elements or no keys can be applied.

Enforcement Semantics: enforce-by-deletion The QVT specification describes how to handle the situation that a target domain pattern can be matched in the target model but the source domain pattern cannot be matched [Obj11a, pp. 20-21]. The QVT specification requires that the relation is enforced. For this, a transformation must delete the matched and bound elements of the target model.

As mentioned, Medini QVT does not perform target domain matching [KE08]. Instead, Medini QVT checks whether elements created by a former transformation are also created by the current transformation. If this is not the case, the elements are deleted. These semantics are similar to QVT-R’s semantics but not the same.

Pattern Matching Semantics: Collection Templates The QVT specification describes collection template expressions as a part of QVT-R [Obj11a, pp. 30-31]. These expressions are patterns that match a collection of elements. However, Medini QVT does not support collection templates [KE08].

⁴ This thesis considers Medini QVT Version 1.7.0 [ikv12].



Black Box Implementations Medini QVT does not support the Black Box Implementations as given by the QVT specification [KE08]. Instead, Kiegeland and Eichler [KE08] suggest using operations implemented inside metamodel classes which can then be called via OCL (OCL operation call).

Bidirectional Transformations As mentioned by the Medini QVT User Guide [ikv07], a bidirectional transformation has to be left and right unique, i.e., bijective. In contrast, QVT’s specification does not require bidirectional transformations to be bijective. Thus, Medini QVT puts more limitations on its QVT-R interpretation.

Graphical Syntax Although QVT-R provides a graphical syntax in general, Medini QVT does not support it. However, once a transformation is implemented in the textual syntax, a corresponding graphical representation could be created manually, too.

Extends/overrides concepts The QVT specification describes the “extends and overrides” concepts as inheritance mechanisms. The extends concept allows including all rules of a first transformation in the execution context of a second transformation [Obj11a, p. 25]. The overrides concept specifies that those included (inherited) rules can be overridden by a new rule, i.e., the new rule is executed instead of the original one (in case its application conditions are fulfilled during execution) [Obj11a, p. 27]. Neither concept is implemented in Medini QVT [Nol09, p. 130].
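The difference between check-before-enforce and a purely trace-based strategy, as described in the first item of this list, can be illustrated with a deliberately simplified sketch. The function names, the representation of models as lists of element names, and the trace dictionary are assumptions made for this illustration; keys are left out, and neither function reproduces the actual implementation of an engine.

```python
from typing import Dict, List


def enforce_with_target_matching(source: List[str], target: List[str]) -> List[str]:
    """Check-before-enforce: create an element only if no match exists in the target."""
    result = list(target)
    for name in source:
        if name not in result:      # target domain matching
            result.append(name)
    return result


def enforce_with_traces(source: List[str], target: List[str],
                        traces: Dict[str, str]) -> List[str]:
    """Trace-based variant: create an element unless a trace of an earlier run exists."""
    result = list(target)
    for name in source:
        if name not in traces:      # no target matching, only a trace lookup
            result.append(name)
            traces[name] = name     # remember what this run created
    return result
```

With a target model that already contains a manually created element "A" and no traces, the first function leaves "A" untouched, whereas the second one creates a second "A". A repeated run of the second function with the recorded traces creates nothing new.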

3.4.2 ModelMorf

ModelMorf⁵ is an M2M transformation engine of the Tata Research Development and Design Centre and is free for academic use [Tat12]. ModelMorf conforms closely to the QVT-R specification [Ste11]. It comes as a command-line tool taking XMI [Obj11c] files of models as input. As its implementation only differs slightly from the QVT-R specification, this section does not describe a special language interpretation.

3.4.3 QVT Operational

The QVT Operational⁶ engine is a QVT-O transformation engine and part of the Eclipse Modeling Project [Ecl12b]. It comes as an Eclipse plugin and uses an imperative OCL extension. The engine aims at being fully compliant with the QVT-O standard⁷. The main differences from the QVT specification [Obj11a] include the following:

⁵ This thesis considers ModelMorf Beta 1 [Tat12].

⁶ This thesis considers QVT Operational Version 3.1.0 [Ecl12b].

⁷ The developers refer to an older version of the QVT specification (cf. [Dvo08] and [Ecl12c]). Nonetheless, this thesis keeps referring to the newer specification [Obj11a] but explicitly states whenever a feature is not supported by the engine.


AST Model The AST model used by the engine comes “with some differences from the spec” [Dvo08] due to legacy reasons. Dvorak [Dvo08] announces plans to standardize the implemented QVT AST in order to enable the XMI export functionality and to fix the issue.

Concrete Syntax QVT Operational’s concrete syntax realizes the main concepts but is not complete [Dvo08]. Dvorak [Dvo08] announces plans to complete the syntax with the exception of “parallel transf. etc.” [Dvo08]. This last statement may refer to Section 8.1.18 of the QVT specification [Obj11a, pp. 76-77] where the advanced features “dynamic definition” and “parallelism” are described.

Dvorak’s presentation slides at EclipseCon 2008 [Dvo08] do not describe where exactly the two problems lie. The slides of the succeeding conference [BDI09] do not mention the issues anymore, which may be a sign that the issues were solved. A further indication is that Nolte [Nol10, p. 161] reports that the engine is mostly compliant with the QVT specification [Obj11a]; however, he does not give concrete arguments for this but reports his experiences. Consequently, this thesis states explicitly whenever the engine does not comply with the QVT specification [Obj11a].

The documentation of the engine is very sparse. Hence, this thesis can only consider the following resources. General information about the engine can be found at the QVT Operational page [Ecl12b] and at its Wiki page [Ecl12c]. The mentioned presentation slides [Dvo08, BDI09] and the QVT Operational Developer Guide [Ecl08] give a good overview of the engine’s main functionality. In general, the QVT specification [Obj11a] is also a good resource since the engine is mostly compliant with it. Nolte [Nol10, pp. 161ff] describes the engine briefly and presents a case study with the engine.

3.4.4 SmartQVT

SmartQVT [Fra12] is an open source QVT-O compliant transformation engine developed by France Telecom R&D. The engine compiles QVT-O transformations into Java programs which can perform the transformation based on EMF. It comes as an Eclipse plugin. As there seems to be no update activity anymore (last update on 2008-08-07 [Fra12]), this thesis does not take a more detailed look at SmartQVT and sticks to the QVT Operational engine as the representative engine for QVT-O.

3.4.5 ATL Transformation Engine

ATL’s engine⁸ provides the means for compiling and executing ATL transformation specifications [JABK08]. Similar to Java, ATL transformations are compiled into byte-code which can be executed on ATL’s Virtual Machine (VM). The VM uses an abstraction layer for manipulating models. For its realization, the layer accesses components following the direct-manipulation approach, e.g., EMF. The engine accepts the full ATL language syntax and is compliant with its semantics. Consequently, no special language interpretation has to be described.

⁸ This thesis considers ATL Version 3.2.1 [Ecl12a].

3.5 Scenarios

The feature model of Czarnecki and Helsen [CH06] (as described in Section 3.1) concentrates on features provided by approaches and languages. In contrast, this section proposes a feature model for transformation scenarios. That is, it shows what has to be transformed and not how the scenarios are implemented via transformations.

This section first describes important related scenario classifications in Section 3.5.1. Based on the related classifications, Section 3.5.2 introduces the scenario classification used and proposed by this thesis. Section 3.5.3 continues with concrete examples of model transformation scenarios as case studies. Thereby, it highlights their distinct features with respect to the proposed classification, which shows the applicability of the classification. Section 3.5.4 completes the section with a discussion of the suggested classification.

3.5.1 Related Classifications

This section describes the scenario classifications Vertical and Horizontal Scenario Criteria by Baier et al. [BBJ+08, pp. 99-100], Rule Level Scenario Criteria by Iacob et al. [ISH08], and Higher-Order Transformations by Tisi et al. [TJF+09]. As argued in Section 1.1, other literature resources are less mature and, hence, are not discussed.

Vertical and Horizontal Scenario Criteria

The classification of Baier et al. [BBJ+08, pp. 99-100] differentiates between vertical and horizontal transformations characterizing the possible criteria of a scenario. Vertical transformations are either abstractions or refinements. Abstractions leave out details to focus on the main aspects of interest, e.g., in the context of reengineering where a system architecture shall be derived from an implementation. In contrast, refinements add details to provide a more (platform) specific model.

A horizontal transformation, on the other hand, can be a refactoring, optimization, migration, renovation, normalization, or another change in structure. A refactoring improves the system’s inner structure without changing its outer behavior; typically, bad smells are removed [FBB+99]. Optimization targets the improvement of extra-functional characteristics like performance. With a migration transformation, systems are moved to different platforms or languages, e.g., from Java 1.5 to Java 1.6. A renovation removes functional errors (bugs) or applies changes due to new or changed requirements. Normalization reduces the syntactic complexity; often, redundancies are removed.

Several of these scenario criteria can apply to one transformation scenario. In particular, a transformation scenario can require both a vertical and a horizontal transformation. Baier et al. [BBJ+08, p. 100] name these situations skew transformation scenarios.

Furthermore, this classification is not complete: for instance, the number of source models also characterizes a scenario. Therefore, the classification of Baier et al. gives one view on a scenario classification but needs to be extended to cover more aspects that are of interest for this thesis.

Rule Level Scenario Criteria

Iacob et al. [ISH08] propose a set of transformation design patterns targeting different transformation scenarios. Thus, they implicitly induce possible features for a scenario classification. In contrast to the scenario criteria of Baier et al. [BBJ+08, pp. 99-100], Iacob et al. consider a more fine-grained view on scenario criteria which is close to the rule level (instead of the transformation level) of scenario criteria. Concerning model instances, they differentiate between the two element types “nodes” (instances of metaclasses) and “edges” (instances of metaclasses realizing an association between a source and a target node).

The proposed patterns are mapping, duality, refinement, abstraction, and flattening. Mapping establishes a 1:1 relationship between elements of source and target models without changing the element type (node or edge). That is, (a) source nodes are mapped to target nodes or (b) source edges are mapped to target edges. In contrast, duality describes a 1:1 relationship which changes the element type. That is, (a) source nodes are mapped to target edges or (b) source edges are mapped to target nodes. Refinements can be edge or node refinements. Edge refinements transform one edge into a set of edges interleaved by nodes. Node refinements transform one node into a set of nodes interleaved by edges. Abstractions abstract away from nodes or edges by applying edge or node refinements, respectively, in the opposite direction. Finally, flattening removes containment hierarchies from models.
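As an illustration of the rule-level patterns, the following sketch applies a 1:1 node mapping together with an edge refinement to a small graph. The representation of a model as a set of node names and a list of edges, the function name refine_edges, and the naming scheme for inserted nodes are assumptions; the sketch further assumes that no existing node already uses one of the generated names.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def refine_edges(nodes: Set[str], edges: List[Edge]) -> Tuple[Set[str], List[Edge]]:
    """Edge refinement: replace each edge by two edges interleaved by a new node."""
    new_nodes = set(nodes)          # existing nodes are kept (1:1 node mapping)
    new_edges: List[Edge] = []
    for index, (src, dst) in enumerate(edges):
        mid = f"n{index}"           # hypothetical name of the inserted node
        new_nodes.add(mid)
        new_edges.extend([(src, mid), (mid, dst)])
    return new_nodes, new_edges
```

For the graph with the nodes a and b and the single edge (a, b), the result contains the additional node n0 and the edges (a, n0) and (n0, b). Reading the function from output to input corresponds to the abstraction pattern.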

Higher-Order Transformation (HOT)

Higher-order transformations (HOTs) are transformations which have transformation models as source and/or target [TJF+09]. A transformation model is a model instance of a metamodel for transformations. This implies that there exists a metamodel for a concrete model transformation language. This thesis adds HOTs to the scenario classification as it assumes HOTs to be a common transformation scenario.



Tisi et al. [TJF+09] differentiate between synthesis, analysis, (de)composition, and modification HOTs as shown in Figure 3.17. Synthesis HOTs have no source but a target transformation model. In contrast, analysis HOTs have a source but no target transformation model. (De)composition HOTs have at least one source and target transformation model. Furthermore, while composition HOTs have at least two source models, decomposition HOTs have at least two target transformation models. Finally, modification HOTs have one source as well as one target transformation model.

[Feature diagram: Higher-Order Transformation with the subfeatures Synthesis, Analysis, (De)Composition, and Modification]

Figure 3.17: Features of Higher-Order Transformations (derived from [TJF+09])
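The distinction between the HOT kinds depends only on how many transformation models occur on the source and on the target side. The following sketch encodes one possible reading of the definitions above; the function name hot_kinds and the treatment of boundary cases (e.g., counting only transformation models for composition, or returning several kinds at once) are assumptions rather than part of the classification by Tisi et al.

```python
from typing import Set


def hot_kinds(source_tm: int, target_tm: int) -> Set[str]:
    """Return the HOT kinds that match the given numbers of transformation models."""
    kinds: Set[str] = set()
    if source_tm == 0 and target_tm >= 1:
        kinds.add("synthesis")      # no source but a target transformation model
    if source_tm >= 1 and target_tm == 0:
        kinds.add("analysis")       # a source but no target transformation model
    if source_tm == 1 and target_tm == 1:
        kinds.add("modification")   # exactly one on each side
    if source_tm >= 2 and target_tm >= 1:
        kinds.add("composition")    # at least two sources
    if source_tm >= 1 and target_tm >= 2:
        kinds.add("decomposition")  # at least two targets
    return kinds
```

A transformation without any transformation model on either side yields the empty set, i.e., it is not a HOT under this reading.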

The parametrization feature “higher-order rules” of Czarnecki and Helsen’s classification (cf. Section 3.1.1) is similar to HOTs but (a) is not defined via transformation metamodels, and (b) does not address transformations as an explicit transformation goal.

3.5.2 Classification

Figure 3.18 introduces the transformation scenario classification proposed by this thesis. At the top level of the feature diagram, the feature model reuses ideas from the classification of Czarnecki and Helsen [CH06] (domain, source-target relationship, and directionality) and from the classification of Baier et al. [BBJ+08, pp. 99-100] (scenario criterion). The classification of Iacob et al. [ISH08] is reused within the refinement of the scenario criterion feature as it generally fits into this category but has a more fine-grained view on the respective feature (e.g., Iacob et al. also describe “abstraction” and “refinement” as scenario criteria). In contrast, the features described by Czarnecki and Helsen are orthogonal to these classifications (e.g., directionality does not depend on whether the scenario describes an abstraction or a refinement). Hence, the features by Czarnecki and Helsen are separated from the scenario criterion feature and placed in the top-level feature diagram.

The different classifications are reused and modified (for the case of transformation scenarios) as follows:

Domains A scenario can have several domains. The typical case is two domains, i.e., one source and one target domain. This thesis also distinguishes whether the domains refer to a single metamodel (endogenous) or not (exogenous) (cf. Section 3.1.1). The “static mode” feature of domains is important since it specifies whether a domain is used as source (“in”), target (“out”), or both (“in/out”). Domain features not important for scenarios are the “body” feature and the “typing” feature as these features directly relate to transformation specifications.

[Feature diagram: Transformation Scenario with the subfeatures Domain [1..*], Source-Target Relationship, Directionality, and Scenario Criterion [1..*]]

Figure 3.18: Feature-Based Classification of Model Transformation Scenarios

Source-Target Relationship A scenario can (a) use the source model as a target (in-place) or (b) create or update a target model distinct from the source model (cf. Section 3.1.5). The update feature can retain user modifications within the existing target or only provide extensions to it (cf. Section 3.1.5).

Directionality On an execution basis, a transformation scenario can only be transformed in one direction at a time. However, a scenario can be implemented in a way that executing the same transformation in another direction is also possible. As this influences the characteristics of a transformation scenario, it needs to be considered within this classification: a scenario can be defined to be unidirectional or multidirectional (cf. Section 3.1.7).

Scenario Criteria A scenario can have several criteria which a transformation has to consider (as illustrated in Figure 3.19). The criteria are derived from the related classifications of Section 3.5.1: the scenario criterion feature has hierarchical, 1:1 relation, abstraction level, structure, and HOT as subfeatures. The hierarchical feature relates to changes of containment hierarchies. While Iacob et al. [ISH08] only consider flattening, this thesis also adds clustering, its opposite. The 1:1 relation states whether the scenario realizes a mapping or duality (according to Iacob et al. [ISH08]). Abstraction level and structure resemble the classification of Baier et al. [BBJ+08, pp. 99-100] based on features. The abstraction level feature also fits the classification of Iacob et al. [ISH08]. Finally, the HOT feature states whether a scenario describes a HOT.



[Feature diagram: Scenario Criterion with the subfeatures Structure (horizontal): Refactoring, Optimization, Migration, Renovation, Normalization; Abstraction Level (vertical): Abstraction, Refinement; Higher-Order Transformation; Hierarchical: Flattening, Clustering; 1:1 Relation: Mapping, Duality]

Figure 3.19: Criteria of Model Transformation Scenarios (derived from [BBJ+08, pp. 99-100])

3.5.3 Case Studies

This section describes several transformation scenarios as case studies. It also applies the proposed classification to each scenario in order to show the classification’s applicability.

SimpleUML to SimpleRDBMS

The QVT specification [Obj11a, pp. 203ff] describes a scenario where a simplified UML model (SimpleUML) of classes is mapped to a simple model of relational database management system tables (SimpleRDBMS). SimpleUML classes consist of several attributes which can be primitive (integer, string, etc.) or non-primitive (their type is specified via another class, which allows specifying sets, for instance). SimpleRDBMS tables consist of several columns which can have keys and foreign keys. The appendix in Section B.1 illustrates both metamodels. The specification provides both an implementation for QVT-R and one for QVT-O.

When considering the feature model for scenarios, the scenario can be described as follows:

Domains The scenario involves two exogenous domains (SimpleUML and SimpleRDBMS). The SimpleUML metamodel has six and the SimpleRDBMS metamodel has eight metamodel classes.

Source-Target Relationship The target model (an instance of SimpleRDBMS) is created anew.

Directionality The scenario is unidirectional (from SimpleUML to SimpleRDBMS). However, it can be checked whether a SimpleRDBMS model is consistent with a SimpleUML model.


Scenario Criteria The scenario includes one main criterion: a refinement from SimpleUML to SimpleRDBMS (vertical transformation). The reason is that SimpleUML can be seen as a description of the conceptual design of a database and SimpleRDBMS as its logical design (in terms of a database schema). Then, the details added are logical relations inherent to relational databases: the SimpleUML associations are refined to foreign key relationships between the corresponding tables, which is one possible way to implement associations. If the database context did not define a change of the abstraction level, the scenario would be seen as a migration from SimpleUML to SimpleRDBMS (horizontal transformation), i.e., scenario criteria can depend on the context of the transformation scenario.

Furthermore, the scenario includes a flattening and several 1:1 mapping relations. The flattening feature describes that (1) non-primitive attributes of SimpleUML have to be flattened to a set of database columns and (2) inherited attributes have to be mapped to columns of the corresponding table. Examples for 1:1 mapping relations on a “node” basis are transformations from packages to schemas, classes to tables, and attributes to columns. An example for a 1:1 mapping relation on an “edge” basis is the mapping from associations to foreign keys of tables.
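The application of the feature model to a scenario can also be written down as a small data structure. The following sketch encodes the top-level features of Figure 3.18 and instantiates them for the SimpleUML to SimpleRDBMS scenario. All type and attribute names are assumptions of this sketch, and the static modes “in” and “out” assigned to the two domains are an interpretation of the unidirectional scenario described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple


class Mode(Enum):
    IN = "in"
    OUT = "out"
    IN_OUT = "in/out"


@dataclass(frozen=True)
class Domain:
    metamodel: str
    mode: Mode


@dataclass(frozen=True)
class Scenario:
    domains: Tuple[Domain, ...]
    in_place: bool
    multidirectional: bool
    criteria: FrozenSet[str]

    def __post_init__(self) -> None:
        # Cardinality [1..*] of the Domain and Scenario Criterion features.
        if not self.domains or not self.criteria:
            raise ValueError("at least one domain and one criterion are required")

    def is_endogenous(self) -> bool:
        # Endogenous: all domains refer to a single metamodel.
        return len({d.metamodel for d in self.domains}) == 1


simple_uml_to_rdbms = Scenario(
    domains=(Domain("SimpleUML", Mode.IN), Domain("SimpleRDBMS", Mode.OUT)),
    in_place=False,
    multidirectional=False,
    criteria=frozenset({"refinement", "flattening", "1:1 mapping"}),
)
```

Such an encoding makes scenario descriptions comparable, e.g., by checking is_endogenous() or by intersecting the criteria sets of two scenarios.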

The scenario is especially important due to its simplicity and popularity. For instance, Nolte explains the QVT-R [Nol09, pp. 134-140] and QVT-O [Nol10, pp. 117-141] implementations in detail to demonstrate the development with the respective language. Czarnecki and Helsen [CH06] also use excerpts of the scenario to illustrate the different transformation approaches they describe. Another example is the evaluation of the maintainability of QVT-R transformation specifications by Kapova et al. [KGBH10] who use the scenario as a case study.

Ecore to Copy HOT

Goldschmidt and Wachsmuth [GW08] present a transformation scenario which maps from EMF’s Ecore to the QVT-R metamodel. This transformation generates QVT-R copy rules for a given EMF metamodel. The EMF metamodel allows, for instance, modeling a package including a set of metaclasses which can have a set of meta attributes. Example elements of the QVT-R metamodel are “transformations” which can include a set of “relations”. Therefore, the scenario describes that, for instance, packages and classes need to be mapped to copy transformations (copy package transformation) and copy relations (copy class rule). As the target model describes a transformation specification of its own and the input model is arbitrary, the scenario describes a HOT of type synthesis.
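The idea of synthesizing copy rules from a metamodel can be sketched in a few lines. The following code does not reproduce the transformation of Goldschmidt and Wachsmuth and does not use the QVT-R metamodel; the classes Relation and Transformation, the function name, and the naming scheme of the generated rules are hypothetical stand-ins for a transformation model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Relation:
    name: str
    copied_class: str
    copied_attributes: List[str]


@dataclass
class Transformation:
    name: str
    relations: List[Relation]


def synthesize_copy_transformation(package: str,
                                   classes: Dict[str, List[str]]) -> Transformation:
    """Build a transformation model with one copy relation per metaclass."""
    relations = [
        Relation(name=f"Copy{cls}", copied_class=cls, copied_attributes=list(attrs))
        for cls, attrs in classes.items()
    ]
    # The package is mapped to the copy transformation that owns the relations.
    return Transformation(name=f"Copy{package}", relations=relations)
```

The output is itself a (simplified) transformation model, which is what makes the scenario a HOT of type synthesis.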

The scenario can be applied to the feature model as follows:

Domains The scenario involves two exogenous domains (Ecore and the QVT-R metamodel). The Ecore metamodel has 31 and the QVT-R metamodel has 110 metamodel classes.



Source-Target Relationship The target model (an instance of the QVT-R metamodel) is created anew.

Directionality The scenario is unidirectional (from Ecore to the QVT-R metamodel).

Scenario Criteria The scenario is a HOT of type synthesis. Examples for included 1:1 mapping relations are from packages to transformations, classes to relations, and attributes to relations. An example for a 1:1 duality relation is the transformation from a reference to a relation.

Message-oriented-Middleware (MOM) Completion

Happe et al. [HFBR08] specify a scenario which adds platform-specific, performance-relevant details to a component-based software architecture. The scenario has two source domains: one for the software architecture and one for the added details (a so-called mark- or annotation model). The two source domains result in one target domain which includes the information of both source domains, i.e., the scenario describes a completion. The markmodel describes aspects of a concrete message-oriented middleware (MOM). For instance, the markmodel could specify that messaging channels support guaranteed delivery.

Happe et al. [HFBR08] use the “Ecore to Copy HOT” (as described in this section) to generate an initial transformation rule set for realizing the completion of the scenario. They manually enrich the initial rule set with specific rules in charge of merging the details of the MOM model. This new scenario can be applied to the feature model as follows:

Domains The scenario involves two source domains: the Palladio Component Model (PCM) metamodel [Bec08] describes the component-based software architecture and the MOM markmodel the details of a MOM. Furthermore, it includes one target domain which also refers to the PCM metamodel. Thus, the scenario includes endogenous (from PCM to PCM metamodel) as well as exogenous (from MOM markmodel to PCM) aspects. The PCM metamodel has 110 metamodel classes.

Source-Target Relationship The target model (an instance of the PCM metamodel) is created anew.

Directionality The scenario is unidirectional (from PCM metamodel and MOM markmodel to the PCM metamodel).

Scenario Criteria The scenario includes a refinement: from a general architecture model to a more concrete architecture that takes the applied MOM into account. A more fine-grained view on refinements is the transformation from (annotated) connectors to MOM completions. Furthermore, the scenario is no HOT but uses one to create an initial rule set.


Medini QVT’s Shapes-Tutorial

Medini QVT’s Shapes-Tutorial is a set of several transformation scenarios that come with the Medini QVT engine (cf. Section 3.4.1). For each scenario, the involved source and target metamodels refer to the shapes metamodel [ISH08]. The shapes metamodel consists of shapes and arrows that relate the shapes. Shapes can either be simple or blocks. Simple shapes are either circles, triangles, or squares. Blocks can contain zero or more shapes and arrows, thus allowing containment hierarchies to be built up. Moreover, each shapes model has one unique root block, modeled as a specialization of the block class. The appendix in Section B.1 provides an illustration of this metamodel.

The set of scenarios consists of one basic scenario “Copy” (a copy transformation which simply copies a shapes model) and twelve other scenarios named “Rule1” to “Rule12” which are based on the basic scenario. The rationale behind Rule1 to Rule12 is that each scenario demonstrates a different type of transformation feature: basically, every transformation copies a shapes model but applies a scenario-specific change to the copy pattern. When applying the feature model for scenarios, this is made explicit:

Domains Except for Rule11, each scenario involves exactly one source and one target (in/out) domain. In sum, Rule11 has three in/out domains as it transforms two source models to their intersection target model. Rule11 builds the intersection by comparing its two source models element by element regarding equal names (serving as IDs).

Furthermore, each domain (of every rule) refers to the shapes metamodel. Thus, every scenario specifies an endogenous transformation. The shapes metamodel has nine metamodel classes.

Source-Target Relationship Except for Rule9, every scenario creates the target model (an instance of the shapes metamodel) anew. Rule9 additionally applies a backward transformation (from target to source) after an initial forward transformation to update the source model. Furthermore, Rule9 retains user modifications made in the target model when retransforming.

Directionality Except for Rule9, every scenario is unidirectional (from shapes metamodel to shapes metamodel). Rule9 is bidirectional as it also allows backward transformations (see “Source-Target Relationship”).

Scenario Criteria Each scenario includes aspects of a 1:1 mapping relation as they include several copy rules. Besides this, the scenario-specific scenario criteria are as follows:

Copy Specifies a pure 1:1 mapping relation.

Rule1 Specifies a 1:1 mapping relation while replacing circles by squares.

Rule2 Specifies a hierarchical flattening that moves every shape from inner blocks to the root block.

Rule3 Specifies a refinement that transforms an arrow to an arrow-square-arrow combination.

Rule4 Specifies a refinement that transforms a square to a block which contains a circle.

Rule5 Specifies an abstraction that removes all triangles.

Rule6 Specifies an abstraction that replaces all empty blocks by a square.

Rule7 Specifies a 1:1 duality relation by transforming circles to arrows and vice versa.

Rule8 Specifies a hierarchical clustering that groups fork/join pairs of circles.

Rule9 No special scenario criteria; see “Source-Target Relationship” and “Directionality”.

Rule10 Specifies a refinement that transforms a circle to a square-arrow-circle combination.

Rule11 No special scenario criteria; see “Domains”.

Rule12 Specifies a refactoring that rewrites mathematical expressions without changing their value.
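Two of the scenario-specific changes can be sketched on a reduced shapes model. The following code is not taken from the Medini QVT tutorial: the classes Simple and Block, the omission of arrows, and the decision to drop inner blocks after flattening are assumptions made to keep the example small.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Simple:
    name: str
    kind: str                       # "circle", "triangle", or "square"


@dataclass
class Block:
    name: str
    shapes: List[Union["Block", Simple]] = field(default_factory=list)


def flatten(root: Block) -> Block:
    """Rule2-like flattening: move every simple shape of inner blocks to the root."""
    collected: List[Simple] = []

    def visit(block: Block) -> None:
        for shape in block.shapes:
            if isinstance(shape, Block):
                visit(shape)        # descend into the containment hierarchy
            else:
                collected.append(shape)

    visit(root)
    return Block(root.name, list(collected))


def remove_triangles(block: Block) -> Block:
    """Rule5-like abstraction: copy the model but leave out all triangles."""
    kept: List[Union[Block, Simple]] = []
    for shape in block.shapes:
        if isinstance(shape, Block):
            kept.append(remove_triangles(shape))
        elif shape.kind != "triangle":
            kept.append(shape)
    return Block(block.name, kept)
```

Both functions follow the copy pattern described above: they build a new model and deviate from a pure copy only in the scenario-specific step.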

3.5.4 Discussion of the Scenario Classification

The case studies of Section 3.5.3 show that the introduced scenario classification is applicable and demonstrate how to apply it. Further important characteristics of the classification are (1) its completeness, (2) whether scenarios can be unambiguously applied to it, and (3) its degree of granularity.

Firstly, the main goal of the introduced classification was not to provide a complete classification but a “first version” of it that unites related work and meets the needs of this thesis. That is, every important scenario-specific feature of the considered case studies could be covered. An example for a feature that may have too few subfeatures is the “structure” feature. It is derived from the classification of Baier et al. [BBJ+08, pp. 99-100] who just give examples of possible structural transformations (refactoring, optimization, etc.). Hence, there can be other structural features not stated explicitly. However, adding new features is not a problem when using feature diagrams. Furthermore, “synchronization” can be seen as a feature of scenarios. This thesis does not add it explicitly to the classification as it is implicitly included in the “multidirectionality” feature. Researchers focusing on synchronization scenarios can add a more fine-grained classification of it to the provided feature model. Also, “model weaving” is only implicitly included by allowing several domains, and a feature regarding computation-intensive needs (e.g., passing source elements to complex algorithms that provide a result necessary for a target element) is not considered at all. In general, this “first version” of the feature model can serve as a starting point for refined and improved classifications covering needs outside this thesis.

Secondly, the features differ in their degree of unambiguousness. Features like “number of domains” and “direction” can directly and unambiguously be applied to scenarios. In contrast, features like “refinement” and “abstraction” need knowledge about the context the scenario is used in. For instance, the SimpleUML to SimpleRDBMS case study shows that two views on the scenario are possible: it can be seen as a refinement from a UML diagram to a relational database or as a migration, depending on whether the context describes a change of the abstraction level. It is also important to note that scenarios can have several scenario criteria. For instance, the Ecore to Copy HOT scenario does not just include “synthesis” as a feature but also several “1:1 relations”. Also, depending on the context, a hierarchical flattening can relate to a change of the abstraction level or simply to a structural change (hence, it is listed as a separate feature). In summary, engineers need to understand the requirements and the context of scenarios in order to apply the classification, but the classification helps to work towards understanding these issues by providing important features that need to be considered.

Thirdly, the degree of granularity can differ (a) between different features and (b) between different interpretations of a single feature. The first issue is obvious when comparing features and subfeatures as subfeatures are generally more fine-grained than their parent features. But the issue can also occur when comparing features on the same level regarding their granularity. For instance, statements about the abstraction level can be less fine-grained than statements about 1:1 relationships (the SimpleUML to SimpleRDBMS scenario is an example for this). An example for the second issue is the MOM Completion scenario. The refinement feature can be interpreted for the scenario in general (“the scenario refines a general architecture model to a concrete MOM”) or considering concrete, more fine-grained refinements (“an annotated connector is refined to a MOM completion”).

The lessons learned are that the scenario classification neither relieves engineers of a detailed requirements analysis that also involves the application context, nor can it be seen as entirely complete. Instead, it provides a guideline as to which aspects of the problem domain have to be considered for identifying requirements; domain-specific issues may have to be added to the feature model. As the application of the classification to the case studies considered within this thesis already identifies scenario-specific features, the classification is sufficient for this thesis: the scenario-specific features make it possible to argue about the reasons for quantitative differences when comparing the different scenario implementations.

3.6 Quality Properties

This section describes quality properties in the context of M2M transformations. Quality properties are defined in the ISO/IEC 25010 standard [ISO11] (cf. Section 2.2.2). However, the standard is not specific to M2M transformations. As a consequence, this section defines the quality properties with respect to M2M transformations. The general process for these definitions is to take the definitions of the standard as a basis and to refine them such that they are restricted to M2M transformations. For a discussion and related work in this area, see Section 5.2.

Furthermore, this section focuses on the “product quality” properties as described in Section 2.2.2. That is, it focuses on quality properties which can be applied just to the product (M2M scenario implementations, languages, and engines) rather than focusing on the “quality in use”. The latter is related to the product’s impact on its stakeholders (e.g., the users of a transformation implementation).9

In the following, this section describes each of the main quality properties of the standard in a separate subsection.

3.6.1 Functional Suitability

The “functional suitability” quality property gives the degree to which a transformation is implemented and working as intended. In particular, it gives the degree to which a transformation scenario is correctly implemented by a transformation.

Functional suitability includes three subproperties: (1) functional completeness, (2) functional correctness, and (3) functional appropriateness. Functional completeness is the degree to which a transformation’s rules cover the tasks of a given scenario. Functional correctness is the degree to which a transformation provides correct results (target models) with the needed degree of precision. Functional appropriateness is the degree to which a transformation’s rules facilitate the accomplishment of a scenario’s tasks.

3.6.2 Performance Efficiency

“Performance efficiency” is important when a transformation engineer wants to execute a transformation with an engine. As this execution can take time, the transformation engineer can have requirements regarding the engine’s performance relative to the amount of resources used.

Subproperties of performance efficiency are (1) time behavior, (2) resource utilization, and (3) capacity. The time behavior of an M2M transformation execution relates to the response and processing times as well as the throughput rates of the execution carried out by an engine. More precisely, it is the degree to which these aspects meet a scenario’s requirements. Resource utilization is the degree to which the amounts and types of resources used by an engine when performing a transformation execution meet a scenario’s requirements. Capacity is the degree to which the maximum limits of an engine meet a scenario’s requirements.

9 The ISO/IEC 25010 standard [ISO11] provides a “quality in use” model. It is not further discussed within this thesis as the thesis’ focus is on “product quality” properties.

3.6.3 Compatibility

“Compatibility” of a transformation is the degree to which the transformation can exchange information with other systems and/or perform its required functions, while sharing the same hardware or software environment. This relates to the capabilities of the engine and the language.

The quality property includes two subproperties: (1) co-existence and (2) interoperability. Co-existence is the degree to which a transformation engine can fulfill a scenario’s requirements efficiently while sharing a common environment and resources with other systems, without detrimental impact on any other system. Interoperability is the degree to which transformation engines can exchange information and use the information that has been exchanged. For instance, all considered language/engine combinations support model interchange via XMI serializations, which can be used for the exchange of information.

3.6.4 Usability

The quality property “usability” of transformations generally includes two viewpoints on the degree of effectiveness and efficiency when working with transformations: the viewpoint of (1) transformation engineers regarding M2M approaches, languages, and engines for transformation development, maintenance, and execution, and (2) users within an MDSD project regarding the execution of transformations provided by transformation engineers. The second viewpoint relates to the “quality in use” properties and is, hence, not further investigated.

The first viewpoint is specified via its six usability subproperties: (1) appropriateness recognizability, (2) learnability, (3) operability, (4) user error protection, (5) user interface aesthetics, and (6) accessibility. Appropriateness recognizability is the degree to which transformation engineers can recognize whether a transformation in a given scenario is appropriate for their needs, e.g., by means of the provided documentation and by reading the implemented transformation specification. This relates to the M2M approach, language, and engine as well as the specification of the transformation. Learnability is the degree to which a transformation language or engine can be used by transformation engineers to achieve specified goals of learning to use the language or engine with effectiveness, efficiency, freedom from risk, and satisfaction in the context of transformation development, maintenance, and execution. Operability is the degree to which a transformation engine has attributes that make it easy to operate and control. User error protection is the degree to which a transformation engine protects transformation engineers against errors. User interface aesthetics describe the degree to which a transformation engine’s user interface enables pleasing and satisfying interaction for transformation engineers. Accessibility is the degree to which a transformation engine can be used by transformation engineers with the widest range of characteristics and capabilities (including disabilities associated with age, for instance) to achieve a specified goal in the context of transformation development, maintenance, and execution.

3.6.5 Reliability

The “reliability” quality property specifies the degree to which a transformation engine executes transformations and can be used for the development of transformations under specified conditions for a specific period of time. An area where this quality property is especially important is the application of M2M transformations in safety-critical environments, e.g., if an engine is in charge of dynamic reconfigurations of a (model-driven) car system.

Reliability has four subproperties: (1) maturity, (2) availability, (3) fault tolerance, and (4) recoverability. Maturity is the degree to which a transformation engine meets the needs for reliability under normal operation. Availability is the degree to which a transformation engine is operational and accessible when required for use. Fault tolerance describes the degree to which a transformation engine operates as intended despite the presence of hardware or software faults. Recoverability is the degree to which, in the event of an interruption or a failure, a transformation engine can recover the data directly affected and re-establish the desired state of the transformation.

3.6.6 Security

“Security” describes the degree to which a transformation protects information and data so that persons or other systems have the degree of data access appropriate to their types and levels of authorization. For instance, an MDSD company may want to sell an implemented transformation without allowing its customers to inspect and copy the transformation’s specification. One way to cope with this issue is to provide compiled transformation specifications only.

The quality property includes five subproperties: (1) confidentiality, (2) integrity, (3) non-repudiation, (4) accountability, and (5) authenticity. Confidentiality is the degree to which a transformation specification ensures that data is accessible only to those authorized to have access. Thus, the example above relates to confidentiality. Integrity is the degree to which a transformation specification is protected from unauthorized access or modification. Non-repudiation describes the degree to which transformation executions or changes to transformation specifications can be proven to have taken place, so that the events or actions cannot be repudiated later. Accountability is the degree to which transformation executions or changes to transformation specifications can be traced uniquely to the respective entity. Finally, authenticity is the degree to which the identity of an entity can be proved to be the one claimed.

The described subproperties partly relate to the organizational structure of an M2M transformation project. For instance, non-repudiation, accountability, and authenticity can relate to the information and security provided by an applied revision control system. However, such a system can be seen as a part of the applied engine and the tooling it provides.

3.6.7 Maintainability

“Maintainability” is important as transformations may need to be modified (e.g., if metamodels change, new requirements occur, or bugs within the transformation are discovered). Consequently, a transformation engineer can have requirements regarding the effectiveness and efficiency of transformation maintenance.

Subproperties of maintainability are (1) modularity, (2) reusability, (3) analyzability, (4) modifiability, and (5) testability. Modularity describes the degree to which a transformation specification is composed of discrete components such that a change to one component has minimal impact on the other components. Reusability is the degree to which a transformation specification can be used in more than one scenario. Analyzability is the degree of effectiveness and efficiency with which it is possible to assess the impact on a transformation specification of an intended change to one or more of its parts, or to diagnose the transformation specification for deficiencies or causes of failures, or to identify parts to be modified. Modifiability is the degree to which a transformation specification can be effectively and efficiently modified without introducing defects or degrading existing transformation specification quality. Finally, testability is the degree of effectiveness and efficiency with which test criteria can be established for a transformation specification and tests can be performed to determine whether those criteria have been met.

3.6.8 Portability

The “portability” quality property describes the degree of effectiveness and efficiency with which an M2M transformation project can be transferred to different environments. Portability is mainly concerned with the transformation engine.

Portability includes three subproperties: (1) adaptability, (2) installability, and (3) replaceability. Adaptability is the degree to which a transformation engine can effectively and efficiently be adapted for different evolving hardware, software, or other operational or usage environments. Installability describes the degree of effectiveness and efficiency with which a transformation engine can be successfully installed and/or uninstalled in a specific environment. Finally, replaceability is the degree to which a transformation engine or transformation specification can be replaced by another engine or transformation specification for the same purpose in the same environment.

4 Qualitative Comparison

This chapter qualitatively compares the approach/language/engine combinations and case study scenarios selected in Chapter 3. For this, the chapter applies the classifications also described in Chapter 3. These comparisons help to derive questions, metrics, and hypotheses when compiling the GQM plan in Chapter 5 and help to interpret the measured results in Chapter 7.

Each comparison is based on a table that reflects the features of the respective classification. That is, the respective table includes a “Feature” column which explicitly states the features of interest. For clarity, the features are indented and the rows are colored according to their level within the corresponding feature model: lower levels are indented further and shaded in a lighter gray. Furthermore, each table includes a “Value Range” column which makes explicit the possible values that approach/language/engine combinations and scenarios, respectively, can take. The possible values can be listed in a row (separated by commas) and can include additional comments in brackets. If no value is applicable for a given feature, the hyphen sign is used. The values for concrete approach/language/engine combinations and scenarios are derived from the descriptions in Chapter 3.

This chapter is structured as follows. Section 4.1 presents the comparison for approach/language/engine combinations and Section 4.2 the comparison for scenarios.

4.1 Approach/Language/Engine Combinations

Table 4.1 compares three approach/language/engine combinations: (1) Java with EMF and the JVM representing the direct-manipulation approach, (2) QVT-R with Medini QVT representing the relational approach, and (3) QVT-O with QVT Operational representing the operational approach. Each combination has a separate column which makes the respective features of a combination explicit. These feature assignments are based on the classification in Section 3.1 as well as the descriptions of the respective M2M dimension in Section 3.2, Section 3.3, and Section 3.4. The assignment only considers features for which native, dedicated support regarding transformations exists. For instance, tracing could be implemented within Java but it is not a feature already supported by the standard Java libraries. For clarity, this section only names the respective language (Java, QVT-O, or QVT-R) when, in fact, comparing the respective combination of approach, language, and engine.

| Feature | Value Range | Direct-Manipulation / Java with EMF / JVM | Relational / QVT-R / Medini QVT | Operational / QVT-O / QVT Operational |
|---|---|---|---|---|
| Relational Constraints | {Yes, No} | No | No | No |
| Transformation Rules | {} | - | - | - |
| Domain | {} | - | - | - |
| # Supported Domains | {*, 1, 2, ...} | * | * | * |
| Domain Language | {Ecore, MOF, ...} | Ecore (EMOF implementation) | Ecore (EMOF implementation) | Ecore (EMOF implementation) |
| Static Mode | {In, Out, In/Out} | In, Out, In/Out | In, In/Out | In, Out, In/Out |
| Dynamic Mode Restr. | {Yes, No} | No | No | No |
| Body | {} | - | - | - |
| Variables | {Yes, No} | Yes | Yes | Yes |
| Patterns | {} | - | - | - |
| Structure | {Strings, Terms, Graphs} | Terms | Graphs | Terms |
| Syntax | {} | - | - | - |
| Abstract | {} | - | - | - |
| Concrete | {Textual, Graphical} | Textual | Textual (Medini QVT does not support graphical syntax) | Textual |
| Logic | {} | - | - | - |
| Language Paradigm | {Object-Oriented, Functional, Logic, Procedural, Imperative, Declarative, ...} | Imperative (Object-Oriented and Procedural aspects) | Declarative (Functional and Logic aspects) | Imperative (Object-Oriented and Procedural aspects) |
| Value Specification | {Imperative Assignment, Value Binding, Constraint} | Imperative Assignment | Value Binding, Constraint | Imperative Assignment |
| Element Creation | {Implicit, Explicit} | Explicit | Implicit | Explicit |
| Typing | {Untyped, Typed, Weak, Strong, Static, Dynamic} | Typed, Strong, Static | Typed, Strong, Static | Typed, Strong, Static |
| Syntactic Separation | {Yes, No} | No | Yes | No |
| Multidirectionality | {Yes, No} | No | Yes | No |
| Application Conditions | {Yes, No} | No | Yes (when clause) | Yes (when clause) |
| Intermediate Structures | {Yes, No} | Yes (e.g., any Java Objects) | Yes (Variables) | Yes (Traces, Variables, Metaclasses, Metaattributes) |
| Parametrization | {Control Parameters, Generics, Higher-Order Rules} | Control Parameters, Generics | Control Parameters | Control Parameters, Higher-Order Rules |
| Reflection | {Yes, No} | Yes | No | Yes ("this" variable) |
| Aspects | {Yes, No} | No | No | No |
| Rule Application Control | {} | - | - | - |
| Location Determination | {} | - | - | - |
| Deterministic | {Yes, No} | Yes | Yes | Yes |
| Non-Deterministic | {No, Concurrent, One-Point} | No | No | No |
| Interactive | {Yes, No} | No | No | No |
| Scheduling | {} | - | - | - |
| Form | {} | - | - | - |
| Implicit | {Yes, No} | No | Yes | No |
| Explicit | {No, Internal, External} | Internal | Internal (where clause) | Internal |
| Rule Selection | {Explicit Condition, Non-Determinism, Conflict Resolution, Interactive} | Explicit Condition | Explicit Condition | Explicit Condition |
| Rule Iteration | {No, Recursion, Looping, Fixpoint Iteration} | Recursion, Looping | Recursion | Recursion, Looping (e.g., over collections) |
| Phasing | {Yes, No} | No | No | Yes (on operation basis, e.g., init, end, lateresolve) |
| Rule Organization | {} | - | - | - |
| Modularity Mechanisms | {Yes, No} | Yes (Classes) | Yes (Compilation Units) | Yes (Libraries) |
| Reuse Mechanisms | {Inheritance, Logical Composition} | Inheritance, Logical Comp. | No | Inheritance, Logical Comp. |
| Organizational Structure | {Source-Oriented, Target-Oriented, Independent} | Independent | Independent | Independent |
| Source-Target Relationship | {} | - | - | - |
| New Target | {Yes, No} | Yes | Yes | Yes |
| Existing Target | {Yes, No} | Yes | Yes | Yes |
| Update | {Destructive, Extension Only} | Destructive | Destructive | Destructive |
| In-Place | {Yes, No} | Yes | Yes | Yes |
| Incrementality | {} | - | - | - |
| Target-Incrementality | {Yes, No} | No | Yes (cf. check-before-enforce) | No |
| Source-Incrementality | {Yes, No} | No | Yes (cf. enforce-by-deletion) | No |
| Preservation of User Edits | {Yes, No} | No | Yes | No |
| Directionality | {Unidirectional, Multidirectional} | Unidirectional | Multidirectional | Unidirectional |
| Tracing | {} | - | - | - |
| Dedicated Support | {Yes, No} | No | Yes | Yes |
| Creation | {Manual, Automatic, Automatic/Tunable} | - | Automatic | Automatic |
| Storage Location | {} | - | - | - |
| Model | {No, Source, Target} | - | No | No |
| Separate | {Yes, No} | - | Yes | Yes |

Table 4.1: Qualitative Comparison of M2M Approach/Language/Engine Combinations

Java and QVT-O generally have the most features in common. They mainly differ (1) in their dedicated support for transformation features and (2) in the number of features specialized for transformations. Regarding the first difference, QVT-O supports application conditions in the form of the when clause. Java has no dedicated support for specifying the conditions under which a method can be executed (in the form of preconditions). On the one hand, Java assertions can be seen as a limited form of application conditions enabling a non-public (in-method) specification of conditions. However, as they are a general-purpose assertion mechanism, they cannot be seen as a dedicated feature for checking the applicability of a method. The same argument holds for “simulating” assertions with an if-construct that causes an error in case the condition evaluates to false. On the other hand, frameworks providing a design-by-contract facility allow the specification of application conditions in the form of preconditions. An example of such a framework is iContract [Kra98]. However, as this thesis considers Java and EMF only, applying external frameworks is not considered and is left as future work.
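The “simulation” of an application condition with an if-construct can be sketched as follows. This is a minimal illustration only: the classes `UmlClass` and `Table`, the `persistent` flag, and the rule class `ClassToTableRule` are hypothetical, simplified stand-ins and are neither EMF-generated code nor taken from the case study implementations.

```java
/** Hypothetical, simplified source element (not EMF-generated). */
final class UmlClass {
    final String name;
    final boolean persistent; // assumed flag used as the application condition

    UmlClass(String name, boolean persistent) {
        this.name = name;
        this.persistent = persistent;
    }
}

/** Hypothetical, simplified target element (not EMF-generated). */
final class Table {
    final String name;

    Table(String name) {
        this.name = name;
    }
}

final class ClassToTableRule {

    /** Plays the role of an application condition: only persistent classes are mapped. */
    static boolean isApplicable(UmlClass source) {
        return source != null && source.persistent;
    }

    /** Applies the rule; the guard is a general-purpose if-construct that raises an error. */
    static Table apply(UmlClass source) {
        if (!isApplicable(source)) {
            throw new IllegalArgumentException("rule is not applicable to the given class");
        }
        return new Table(source.name);
    }
}
```

The guard is ordinary Java control flow: nothing in the language marks `isApplicable` as the condition of a transformation rule, which is the sense in which the support is not dedicated.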

Furthermore, QVT-O provides dedicated support for rule-based phasing, e.g., allowing rules to be structured according to an initialization and a finishing phase, and dedicated support for tracing. In contrast to Java, QVT-O also has dedicated support for higher-order rules for transformation rule parametrization. The parametrization feature of Java supports generics; QVT-O does not.
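To illustrate what the missing dedicated tracing support means on the Java side, the following sketch shows a hand-written trace store. The class `TraceStore` and its methods are hypothetical and only indicate the kind of bookkeeping a transformation engineer would have to implement manually; the sketch also uses generics, the parametrization feature named above.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical hand-written trace store; not part of Java's standard libraries or EMF. */
final class TraceStore<S, T> {

    // Identity-based lookup: model elements are compared by reference, not by equals().
    private final Map<S, T> sourceToTarget = new IdentityHashMap<>();

    /** Records that the given target element was created from the given source element. */
    void record(S source, T target) {
        sourceToTarget.put(source, target);
    }

    /** Returns the target created for the source element, or null if none was recorded. */
    T resolve(S source) {
        return sourceToTarget.get(source);
    }

    /** Number of recorded trace links. */
    int size() {
        return sourceToTarget.size();
    }
}
```

Every rule that creates a target element would have to call `record` explicitly, whereas the table above lists trace creation as automatic for QVT-O and QVT-R.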

Secondly, QVT-O provides more specialized support for features, i.e., it leaves out details that may not be needed for transformations. While Java allows arbitrary Java objects to be created as intermediate structures, QVT-O is restricted to creating traces, variables holding values of primitive data types or metaclasses, as well as intermediate metaclasses or metaattributes. Java provides general reflection mechanisms, e.g., for reflective access to Java classes. QVT-O’s reflection mechanisms are specialized for accessing a transformation only: the pre-defined this variable allows referring to the transformation’s attributes and operations. Moreover, QVT-O provides specialized rule iteration mechanisms to loop rules over collections and allows rules to be organized with the help of transformation libraries as a modularity mechanism. Java provides looping mechanisms, but these are not dedicated structures for applying transformation rules. Furthermore, Java classes are a modularity mechanism in general but are not specialized for the context of specifying transformation rules.

The observations regarding dedicated and specialized support for transformation features coincide with the statement in Section 3.2.3 derived from the work of Czarnecki and Helsen [CH06]: operational approaches are similar to direct-manipulation approaches. While Czarnecki and Helsen [CH06] name facilities for expressing computations on metamodels and tracing as the main reasons for this, the comparison in Table 4.1 provides a more fine-grained view on the reasons. The “facilities for expressing computations on metamodels” are not explicitly covered by the classification, i.e., the classification is incomplete in this area. QVT-O uses an imperative OCL variant and Java sticks to EMF and standard Java mechanisms. Again, QVT-O provides a more specialized variant.

Taking QVT-R into these considerations, three classes of differences can be found: (1) features exclusively supported by QVT-R, i.e., features unsupported by Java as well as QVT-O or features having a different value assigned than in any other language, (2) features solely unsupported or less supported by QVT-R, and (3) features that do not fit into either of the first two categories.

Firstly, the features exclusively supported by QVT-R are the following. QVT-R is the only language that follows declarative paradigms (“Language Paradigm” feature) as the other languages are imperative. This also results in the differences in the features “Value Specification” (QVT-R uses value binding and constraints instead of imperative assignments), “Element Creation” (QVT-R uses implicit creation instead of explicit creation), and “Implicit” scheduling (QVT-R’s engine is responsible for determining the order of rule applications whereas Java and QVT-O stick to internal scheduling mechanisms only).

QVT-R exclusively allows multidirectional transformation rules and transformations. The syntactic separation feature for transformation rules relates to these features and is also exclusively supported by QVT-R. Java and QVT-O allow multidirectional transformations in principle but have no dedicated support for these transformations (it is always possible to specify two sets of transformation rules: one for the forward direction and one for the backward direction).
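The workaround of specifying two sets of rules can be sketched in Java as follows. The classes `UmlAttribute`, `DbColumn`, and `AttributeColumnRules` are hypothetical, simplified stand-ins chosen for illustration; they are not EMF-generated code and not taken from the case studies.

```java
/** Hypothetical, simplified model elements (not EMF-generated). */
final class UmlAttribute {
    final String name;

    UmlAttribute(String name) {
        this.name = name;
    }
}

final class DbColumn {
    final String name;

    DbColumn(String name) {
        this.name = name;
    }
}

/** Two unidirectional rules that together emulate one bidirectional relation. */
final class AttributeColumnRules {

    /** Forward direction: attribute to column. */
    static DbColumn forward(UmlAttribute attribute) {
        return new DbColumn(attribute.name);
    }

    /** Backward direction: written and maintained separately from forward(). */
    static UmlAttribute backward(DbColumn column) {
        return new UmlAttribute(column.name);
    }
}
```

Nothing ties `forward` and `backward` together: if one of them changes, keeping the other consistent is a manual task, whereas a single QVT-R relation covers both directions.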

Furthermore, the “Incrementality” feature is exclusively supported by QVT-R by providing the means for target- and source-incrementality as well as for retaining user modifications. Medini QVT implements target-incrementality by its check-before-enforce semantics and source-incrementality by its enforce-by-deletion semantics.
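The general idea behind a check-before-enforce style of target-incrementality can be approximated by hand in Java as sketched below. This is an assumption-laden illustration, not Medini QVT's actual algorithm: target elements are matched by name only, and the classes `NamedTable` and `IncrementalTableRule` are hypothetical.

```java
import java.util.List;

/** Hypothetical, simplified target element with a field a user may edit. */
final class NamedTable {
    final String name;
    String comment = ""; // stands for a manual user edit in the target model

    NamedTable(String name) {
        this.name = name;
    }
}

final class IncrementalTableRule {

    /**
     * Check: reuse an existing table with the given name if there is one.
     * Enforce: create and add a new table only if the check fails.
     */
    static NamedTable checkBeforeEnforce(String className, List<NamedTable> target) {
        for (NamedTable existing : target) {
            if (existing.name.equals(className)) {
                return existing; // existing element is kept, including user edits
            }
        }
        NamedTable created = new NamedTable(className);
        target.add(created);
        return created;
    }
}
```

Re-running the rule on an unchanged source then leaves the target model untouched, which is what allows user modifications to survive; in Java, this behavior has to be coded for every rule.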

Another important issue is that QVT-R provides a graphical syntax in general but Medini QVT does not support it. As a consequence, the influence of using a graphical syntax during transformation development cannot directly be measured within the Medini QVT development tool. However, creating the graphical syntax manually for a comparison is still an option. In the comparison, every language therefore uses a textual syntax.

Secondly, the features solely unsupported by QVT-R include reflection and reuse mechanisms (the QVT specification describes the extends/overrides concepts as reuse mechanisms but Medini QVT does not support these concepts; cf. Section 3.4.1). Features that are more limited in QVT-R only are “Intermediate Structures” (QVT-R only supports variables), rule “Parametrization” (QVT-R only supports control parameters), “Explicit” scheduling (QVT-R’s internal scheduling is restricted to where clauses), and “Rule Iteration” (QVT-R only provides support for recursion).

Thirdly, the “Phasing”, “Modularity Mechanisms”, and “Tracing” features do not fit into either of the first two categories for QVT-R. Like Java, QVT-R does not provide phasing mechanisms, which makes QVT-O the only representative of this feature. QVT-R’s modularity mechanisms (compilation units [Obj11a]) can be compared to QVT-O libraries as these provide dedicated support to organize rules, too. Finally, tracing is supported by QVT-O and QVT-R but not by Java.

To summarize, QVT-R mostly coincides with the description of relational approaches in Section 3.2.5, which is based on the work of Czarnecki and Helsen [CH06]. An exception is the statement that relational approaches do not allow in-place updates, as these are supported by QVT-R. Furthermore, the features exclusively supported by QVT-R provide a more detailed view on features inherent to relational approaches (compared to the more general descriptions by Czarnecki and Helsen [CH06]). The features of the second class point to features which may be added to the Medini QVT engine. For instance, they include features that target maintainability quality properties like reusability and modularity (e.g., “Reuse Mechanisms” and “Parametrization”). Whether these features really achieve an improved maintainability is subject to quantitative comparisons. The third class of features suggests similar statements when comparing the considered approach/language/engine combinations. These considerations serve as a basis for corresponding hypotheses of the GQM plan in Chapter 5.

4.2 Case Study Scenarios

Table 4.2 compares the case study scenarios: (1) SimpleUML to SimpleRDBMS, (2) Ecore to Copy HOT, (3) MOM Completion, and (4) Medini QVT’s Shapes-Tutorial. The first three scenarios each have a separate column which makes the respective features of a scenario explicit. The Shapes-Tutorial is split up into two columns: one regular column for the Copy scenario and one column combining the feature assignments of scenarios Rule1 to Rule12. The latter column states for a scenario RuleX (X ∈ {1, . . . , 12}) which feature assignments differ from the assignments of the Copy scenario. For instance, Rule1 is equal to the Copy rule but maps elements of type “Circle” to elements of type “Square” instead of mapping them to elements of type “Circle”. Another example is Rule2 which, in contrast to the Copy scenario, includes a hierarchical flattening. The feature assignments are based on the classification of Section 3.5.2 as well as the descriptions of the respective scenario in Section 3.5.3.

Concerning the “Domain” features, the scenarios can be divided into thirteen endogenous scenarios (Copy and Rule1-Rule12), two exogenous scenarios (SimpleUML to SimpleRDBMS and Ecore to Copy HOT), and one scenario containing both endogenous and exogenous aspects (MOM Completion). MOM Completion and Rule11 are the only scenarios involving two source models (static modes “in” and “in/out”, respectively); the other scenarios have exactly one source model (static mode “in”, or “in/out” for the Shapes-Tutorial scenarios). Moreover, every scenario includes exactly one target model (static mode “out”, or “in/out” for the Shapes-Tutorial scenarios). The considered metamodels are generally distinct when considering different scenarios; only the Shapes-Tutorial scenarios work on a common metamodel.
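The distinction between endogenous and exogenous aspects can be made concrete with a small sketch. It assumes that a source/target pair counts as endogenous when both models conform to the same metamodel and as exogenous otherwise. Metamodels are represented by plain string labels, and the class `DomainClassifier` is hypothetical.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class DomainClassifier {

    enum Kind { ENDOGENOUS, EXOGENOUS }

    /**
     * Classifies each source/target pair of a scenario with one target model.
     * A scenario with several source models can show both kinds at once.
     */
    static Set<Kind> classify(List<String> sourceMetamodels, String targetMetamodel) {
        Set<Kind> kinds = EnumSet.noneOf(Kind.class);
        for (String source : sourceMetamodels) {
            kinds.add(source.equals(targetMetamodel) ? Kind.ENDOGENOUS : Kind.EXOGENOUS);
        }
        return kinds;
    }
}
```

A scenario with a single source model yields exactly one kind, while a scenario with two source models of which only one shares the target's metamodel yields both, which corresponds to the mixed case described above.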

| Feature | Value Range | SimpleUML to SimpleRDBMS | Ecore to Copy HOT | MOM Completion | Shapes: Copy | Shapes: RuleX |
|---|---|---|---|---|---|---|
| Domain | {} | - | - | - | - | - |
|   Endogenous | {DomainA -> DomainA, ...} | - | - | PCM -> PCM | Shapes -> Shapes | Rule11: 2x Shapes -> Shapes |
|   Exogenous | {DomainA -> DomainB, ...} | SimpleUML -> SimpleRDBMS | Ecore -> QVT-R | MOM Markmodel -> PCM | - | - |
|   Static Mode | {} | - | - | - | - | - |
|     In (Source) | {DomainA, DomainB, ...} | SimpleUML | Ecore | PCM, MOM Markmodel | - | - |
|     Out (Target) | {DomainA, DomainB, ...} | SimpleRDBMS | QVT-R | PCM | - | - |
|     In/Out (Source & Target) | {DomainA, DomainB, ...} | - | - | - | 2x Shapes | Rule11: 3x Shapes |
| Source-Target Relationship | {} | - | - | - | - | - |
|   New Targets | {DomainA Model1, ...} | SimpleUML Model | QVT-R Model | PCM Model | Shapes Target Model | - |
|   Existing Targets | {DomainA Model1, ...} | - | - | - | - | Rule9: Shapes Source/Target Model |
|     Update | {DomainA Model1, ...} | - | - | - | - | Rule9: Shapes Source Model |
|       Retain User Modifications | {DomainA Model1, ...} | - | - | - | - | Rule9: Shapes Target Model |
|     In-Place | {DomainA Model1, ...} | - | - | - | - | - |
| Directionality | {} | - | - | - | - | - |
|   Unidirectional | {DomainA -> DomainB, ...} | SimpleUML -> SimpleRDBMS | Ecore -> QVT-R | PCM -> PCM, MOM Markmodel -> PCM | Shapes -> Shapes | Rule9: Not unidirectional |
|   Multidirectional | {DomainA <-> DomainB, ...} | - | - | - | - | Rule9: Shapes <-> Shapes |
| Scenario Criteria | {} | - | - | - | - | - |
|   Hierarchical | {} | - | - | - | - | - |
|     Flattening | {Hierarchical Element(s) A -> Flattened Element(s) B, ...} | Non-primitive Attributes -> Set of Database Columns | - | - | - | Rule2: Inner Block Shapes -> Root Block Shapes |
|     Clustering | {Flattened Element(s) A -> Hierarchical Element(s) B, ...} | - | - | - | - | Rule8: Fork/Join Circles -> Block |
|   1:1 Relation | {} | - | - | - | - | - |
|     Mapping | {Node [Edge] Element(s) A -> Node [Edge] Element(s) B, ...} | Packages -> Schemas, Classes -> Tables, Attributes -> Columns, Associations -> Foreign Keys | Packages -> Transformations, Classes -> Tables, Attributes -> Relations | - | Elements of Type A -> Elements of Type A | Rule1: Circles -> Squares (additional special case) |
|     Duality | {Node [Edge] Element(s) A -> Edge [Node] Element(s) B, ...} | - | References -> Relations | - | - | Rule7: Circles -> Arrows, Arrows -> Circles |
|   Abstraction Level | {} | - | - | - | - | - |
|     Abstraction | {Concrete Element(s) A -> Abstract Element(s) B, ...} | - | - | - | - | Rule5: Triangles -> {}, Rule6: Empty Blocks -> Squares |
|     Refinement | {Abstract Element(s) A -> Concrete Element(s) B, ...} | SimpleUML -> SimpleRDBMS (coarse-grained) | - | General PCM Model -> Concrete MOM (coarse-grained), Annotated Connectors -> MOM Completions | - | Rule3: Arrow -> Arrow-Square-Arrow, Rule4: Square -> Block with Circle, Rule10: Circle -> Square-Arrow-Circle |
|   Structure | {} | - | - | - | - | - |
|     Refactoring | {Bad Smell Element(s) A -> Refactored Element(s) B, ...} | - | - | - | - | Rule12: Mathematical Expression A -> Mathematical Expression B |
|     Optimization | {Low Quality Element(s) A -> High Quality Element(s) B, ...} | - | - | - | - | - |
|     Migration | {Platf./Lang. A Element(s) B -> Platf./Lang. A' Element(s) B', ...} | - | - | - | - | - |
|     Renovation | {Bugged/Outdated Element(s) A -> Renovated Element(s) B, ...} | - | - | - | - | - |
|     Normalization | {Redundant Element(s) A, ...} | - | - | - | - | - |
|   Higher-Order Transformation | {} | - | - | - (uses HOT) | - | - |
|     Synthesis | {Target Transformation Metamodel A, ...} | - | QVT-R Model | - | - | - |
|     Analysis | {Source Transformation Metamodel A, ...} | - | - | - | - | - |
|     (De)Composition | {Source/Target Metamodel A, ...} | - | - | - | - | - |
|     Modification | {Source/Target Metamodel A, ...} | - | - | - | - | - |

Table 4.2: Qualitative Comparison of Scenarios

The “Source-Target Relationship” features show that creating a new target model is the common case for the considered scenarios. Rule9 additionally applies update transformations to the source model as well as to the target model where user modifications within the target model are retained.

Regarding the “Directionality” features, the common case are unidirectionaltransformations. Rule9 is an exception as the scenario describes transformationsin two directions. The static mode of Rule9 (in/out) is an important prerequisiteto accomplish this bidirectional transformation.

The scenarios show several differences in their “Scenario Criteria” features. Hi-erarchical flattening only occurs in the SimpleUML to SimpleRDBMS and Rule2scenarios. Rule8 is the sole representative of hierarchical clustering.

In contrast, 1:1 relation mappings are common scenario criteria as they areapplied by every scenario except for the MOM Completion scenario. The MOMCompletion scenario includes only indirectly 1:1 relation mappings because theHOT it uses includes 1:1 relation mappings. 1:1 relation duality occurs in theMOM Completion and the Rule7 scenarios only.

The feature assignments related to changes in the abstraction level subdivide into a coarse-grained and a fine-grained view. The view is generally fine-grained when considering transformations of single elements and coarse-grained, depending on the context, when considering the overall transformation. As the Shapes-Tutorial is a generic set of transformations for illustrating transformation patterns, the context is not clearly defined. Hence, only the fine-grained view is considered. Rule5 and Rule6 are the only scenarios that involve abstractions (fine-grained view). Refinements from a coarse-grained view are included in the SimpleUML to SimpleRDBMS and the MOM Completion scenarios. Here, the context makes the coarse-grained view possible: for instance, adding information about a message-oriented middleware to a general component structure is commonly seen as a refinement step towards an implementation of a system modeled by components. Refinements from a fine-grained view are included in the MOM Completion scenario, too, as well as in the Rule3, Rule4, and Rule10 scenarios.

Scenario criteria related to structural changes are less common in the considered case studies. Solely the Rule12 scenario involves a structural change, namely a refactoring. This can be a sign that, for the considered case studies, the provided classification is too fine-grained. However, when adding more case studies to the comparison in future work, a more fine-grained view on structural changes can turn out to be useful.

Similar statements hold for the “Higher-Order Transformation” feature. The Ecore to Copy HOT is the only case study that is a HOT. The MOM Completion only uses a HOT but is not a HOT on its own.


5 Goal/Question/Metric Plan

This chapter compiles the Goal/Question/Metric (GQM) plan used within this thesis. It plans the measurements needed to assess and quantitatively compare quality properties of the different M2M transformation approaches, languages, and engines. The GQM plan builds on the assumption that concrete scenario implementations can be used to accomplish the comparison. That is, a planned measurement can relate to a concrete scenario implementation specified in a concrete M2M language. This implementation can then be executed by a concrete M2M engine, which allows measurements regarding the engine to be taken. Furthermore, the thesis assumes that a combination of language and engine represents a concrete M2M approach. The GQM plan is compiled on the basis of the results of the qualitative comparisons in Chapter 4 since these results allow metrics and hypotheses to be derived for concrete measurements.

The chapter follows the standard process for compiling a GQM plan (cf. “definition phase” of the GQM method in Section 2.2.1). Therefore, it first states and discusses the general goal of the GQM plan. The thesis refers to this goal as the “general goal” as it aims to cover quality properties in general, which is the overall goal of the framework. However, a full coverage of all possible quality properties is out of the thesis’ scope. To handle this, this thesis reuses results of the existing literature and refines the general goal towards maintainability properties since maintainability properties are covered by the related work to a great degree. This approach illustrates how to apply the framework to a particular quality property but leaves the refinement of other quality properties as future work. Nonetheless, this chapter compiles a general template for deriving GQM questions, metrics, and hypotheses for M2M transformation quality assessment in general. The template has the advantage that it can be applied for refining the general goal, i.e., it is suitable for quality properties in general. It is therefore a part of the provided framework.

This chapter is structured as follows. Section 5.1 states and discusses the general goal of the GQM plan. Thereafter, Section 5.2 describes and discusses the related work with goals similar to the general goal. Section 5.3 refines the general goal by selecting the quality properties the thesis is interested in (maintainability properties) and discusses this selection based on the related work. Section 5.4 compiles the general template. After this, Section 5.5 derives the concrete questions, metrics, and hypotheses corresponding to the refined goal (targeting maintainability). This shows the applicability of the template to one concrete set of quality properties. Finally, Section 5.6 provides a short discussion of the compiled GQM plan.


5.1 General Goal

A GQM plan’s measurement goal should be understandable and clearly structured [vSB99, p. 51]. For this purpose, van Solingen and Berghout [vSB99, p. 51] provide a GQM goal definition template. The template covers purpose (the object under measurement and why it is measured), perspective (the aspects of measurement and who measures), and context characteristics (the environment in which the measurements take place) of the goal to be reached.

Table 5.1 applies the template in the context of this thesis and, thus, defines the general goal of the GQM plan. The first two rows of Table 5.1 reflect the central idea and research goal of this thesis, namely to use concrete M2M scenario implementations to compare different M2M dimensions (purpose). Rows three and four restrict the aspects of the measurement to quality properties important from the viewpoint of a transformation engineer (perspective). Finally, the last row dictates that this thesis is only interested in a scenario in which a transformation engineer needs to implement, execute, and maintain a transformation (context characteristics).

Analyze                 M2M scenario implementations
For the purpose of      comparing M2M approaches, languages, and engines
With respect to         quality properties
From the viewpoint of   the transformation engineer
In the context of       M2M transformation implementation, execution, and maintenance

Table 5.1: General Goal Definition

5.2 Related Work with Similar Goals

This section describes and discusses the related work targeting quality assessment goals similar to the general goal defined in Section 5.1. A general observation is that no related work covers all quality properties as defined in Section 3.6. Instead, the different authors focus on a single main quality property or on a set of subproperties of the main quality properties.

Kapova et al. [KGBH10] focus on the maintainability of M2M transformations. Their main contribution is to provide a set of maintainability metrics for QVT-R transformation specifications. They argue which metric values might improve or reduce maintainability and relate the metrics to different subproperties of maintainability. They use the SimpleUML to SimpleRDBMS, Ecore to Copy HOT, and MOM Completion case studies as described in Section 3.5.3 to show the applicability of their approach and to provide initial evaluation results. However, they do not empirically evaluate the relation between metrics and maintainability quality properties and leave this as future work.¹ The main differences to this thesis are that Kapova et al. (1) do not compare different M2M languages and (2) do not use a mature scenario classification as a basis for measurements (they use a classification into “simple”, “medium-complex”, and “complex” depending on the number of metamodel classes). Nevertheless, the metric set they provide is useful when deriving maintainability metrics for the GQM goal of this thesis.

In his PhD thesis, van Amstel [vA11] makes several contributions in the area of maintainability and development-related quality properties for M2M transformations. Firstly, he provides definitions for the quality properties he considers important in the context of M2M transformations [vA11, pp. 48-50]. Van Amstel’s selection of important properties is based on a combination of the quality models by Boehm [Boe78] and the ISO/IEC 25010 standard [ISO11]. However, van Amstel does not give any reasons why the quality properties he excluded should be less important (e.g., testability and learnability). Nonetheless, the results for the quality properties he does consider can be used within this thesis.

For this, Table 5.2 maps between the quality properties of Section 3.6 (column 1) and the quality properties selected by van Amstel (column 2).² Most properties can be mapped directly as van Amstel’s definitions coincide with the definitions of Section 3.6 for these properties. This is the case for the mapping between functional completeness/correctness and completeness, functional appropriateness and conciseness, appropriateness recognizability and understandability, as well as for the mappings of modularity, reusability, and modifiability. Consistency has no direct counterpart within ISO/IEC 25010’s quality model. Van Amstel defines consistency as the degree to which “a model transformation is implemented in a uniform manner” [vA11, p. 50]. This may be implicitly included in the appropriateness recognizability property; however, this is only a hypothesis. Therefore, this thesis considers consistency as an additional and separate subproperty of maintainability. This makes it possible to use van Amstel’s contribution regarding consistency without having to rely on the latter hypothesis.

Secondly, van Amstel derives metrics for the transformation languages ATL, QVT-O, Xtend, and ASF+SDF regarding each of his quality properties [vA11, pp. 105-113]. The metrics for ATL are similar to the QVT-R metrics of Kapova et al. [KGBH10]. Furthermore, van Amstel provides measurements on case study implementations in ASF+SDF and ATL regarding the metrics he identified [vA11, pp. 57-103]. A strong point of van Amstel’s work is that he also empirically evaluates the relation between his metrics and the quality properties. For this, he used questionnaires regarding the concrete scenario implementations and handed these out to experienced users of ASF+SDF [vA11, pp. 66-77] and ATL [vA11, pp. 94-103]. He consulted four users regarding six implementations for ASF+SDF and nineteen users regarding seven implementations for ATL.

Properties of Section 3.6                                Properties of van Amstel
Functional Completeness & Correctness (Section 3.6.1)    Completeness
Functional Appropriateness (Section 3.6.1)               Conciseness
-                                                        Consistency
Appropriateness Recognizability (Section 3.6.4)          Understandability
Modularity (Section 3.6.7)                               Modularity
Reusability (Section 3.6.7)                              Reusability
Modifiability (Section 3.6.7)                            Modifiability

Table 5.2: Mapping Between the Quality Properties in Section 3.6 and van Amstel’s [vA11, pp. 48-50]

¹ In personal communication, the authors of the cited paper provided me with the follow-up paper, which includes empirical evidence [KGH12]. They use questionnaires for this, similar to van Amstel [vA11, pp. 198-201]. Their questionnaires are already available at http://www.furcas.org/survey/qvtr-questionnaire.html (last retrieved 2012-10-17).

² Kapova et al. [KGBH10] use similar terms as van Amstel. Hence, the table allows this thesis to use the results of Kapova et al., too. Moreover, Berander et al. [BDE+06] provide a general mapping between the predecessor of ISO/IEC 25010 [ISO11], the ISO/IEC 9126-1 standard [ISO01], and Boehm’s quality model.

The differences between this thesis and van Amstel’s work are similar to the differences between this thesis and the work of Kapova et al. [KGBH10]. That is, van Amstel (1) does not compare different M2M languages and (2) does not use any scenario classification as a basis for his measurements. However, van Amstel’s work is especially important for deriving the metrics in Section 5.5 as his work gives empirical evidence that the applied metrics can be used to assess specific quality properties.

Goldschmidt and Kubler [GK08] follow an approach similar to this thesis. They derive a GQM plan for assessing the maintainability of M2M transformations. However, their plan is at an early stage: it only covers analyzability and is not applied to concrete measurements. They also do not consider the comparison of several languages or the idea of using a scenario classification.

In his Master’s thesis, Bosems [Bos11] assesses the time behavior quality property. He compares the time behavior of two different case study scenarios implemented in QVT-R, QVT-O, and ATL and executed by Medini QVT, QVT Operational, and the ATL Transformation Engine, respectively. Furthermore, he defines size and complexity metrics for classifying the applied models along these dimensions. For the measurements, he varies the case studies along these dimensions and shows the impact of these variations on the time behavior. Bosems comes to the overall conclusion that ATL performs best with respect to the execution time of transformations³, followed by QVT-O, and finally QVT-R. He identifies the fact that the ATL Transformation Engine is implemented as a compiler as one main reason that ATL performs best. The QVT-O and QVT-R engines he considers are implemented as interpreters. He considers using neither (1) SmartQVT (cf. Section 3.4.4), which compiles QVT-O transformations, nor (2) the QVT-O transformation AST model, which is always created anew by the default settings of the QVT Operational engine, as a compiled version of the QVT-O transformation⁴. Therefore, Bosems’ conclusion is modest as it is consistent with the general (and well-known) observation that compilers usually perform better than interpreters regarding their execution time. The comparison between the time behavior of QVT-O and QVT-R indicates that relational approaches can come with higher rule scheduling costs, but Bosems does not investigate this issue in detail. Furthermore, he does not consider a mature scenario classification (only complexity and size metrics). In particular, the fact that he only considers two case studies lowers the statistical significance of his work. Nonetheless, his general idea of how to compare different languages regarding a quality property is very similar to this thesis, and his investigations regarding time behavior are a good starting point for future work.

³ To measure the “execution time”, he uses the sum of the time for loading and serializing metamodels and models as well as the execution time itself.

5.3 Goal

This section refines the general goal introduced in Section 5.1 to ensure the GQM goal fits the scope of a Master’s thesis. This thesis simply refers to the refined goal as the “goal” (of the GQM plan) or the “GQM goal”.

Table 5.3 introduces the goal. Except for row three, the goal equals the general goal. Row three further restricts the perspective of the general goal: instead of considering all quality properties, this thesis is only interested in the maintainability quality property. One reason for this refinement is the related work discussed in Section 5.2. Although the related work considers maintainability an important quality property for M2M transformations, it lacks comparisons between different languages and mature scenario classifications. Moreover, this thesis can profit from the maintainability metrics derived within the related work.

Analyze                 M2M scenario implementations
For the purpose of      comparing M2M approaches, languages, and engines
With respect to         maintainability
From the viewpoint of   the transformation engineer
In the context of       M2M transformation implementation, execution, and maintenance

Table 5.3: Goal Definition

Restricting the perspective to the maintainability quality property does not necessarily remove the consideration of other quality properties and subproperties within the GQM plan. For instance, functional suitability, usability, and maintainability can influence each other. An example is the “learnability” subproperty of usability as, in order to maintain a transformation specification, the corresponding M2M language needs to be learned first. The easier a language is to learn, the easier it can be to (re)understand and maintain a transformation specification. Consequently, this thesis includes learnability properties in its considerations when comparing maintainability even though it does not provide a full coverage of usability properties in general. Regarding functional suitability, this thesis assumes that there are no differences between the different scenario implementations and does not take functional suitability into its considerations. The “testability” property involves the establishment of test criteria and the execution of tests checking whether the criteria have been met. A complete investigation of possible test criteria is out of this thesis’ scope. Therefore, this thesis excludes testability from its considerations and leaves it as future work.

⁴ Doing so has been reported as successful in the Eclipse.org forums; see http://www.eclipse.org/forums/index.php/t/169070/ (last retrieved 2012-10-17).

5.4 Template for Questions, Metrics, and Hypotheses

When deriving questions, metrics, and hypotheses for the general goal or for the GQM goal, several of these issues follow a similar scheme and hold for more than one quality property. Therefore, this section introduces a template for deriving general questions, metrics, and hypotheses for the general GQM goal. This has the advantage that the template can be included in the framework without restricting its applicability to maintainability. The basic idea for the goal refinements is that each subproperty X of a quality property (as defined by the ISO/IEC 25010 [ISO11] standard) helps in formulating concrete questions regarding the quality property. Examples of these subproperties X are the “modularity” subproperty of the “maintainability” quality property or the “learnability” subproperty of the “usability” quality property. The related work discussed in Section 5.2 provides further guidance for deriving questions.

The template provides two generic questions: (1) “What is the quality property X of the implementations?” and (2) “What are the reasons for differences in quality property X?”. Section 5.4.1 explains the first question in detail and derives corresponding (generic) metrics and hypotheses for it. Afterwards, Section 5.4.2 treats the second question analogously.

5.4.1 What is the quality property X of the implementations?

Question. The first question “What is the quality property X of the implementations?” asks for a quantified value of a quality (sub)property X regarding an implementation within one M2M approach, language, and engine. The answer to this question allows X to be assessed in a given scenario and regarding a concrete M2M approach/language/engine. In particular, the answer makes the comparison requested by the GQM goal operational if it is expressed as an ordinal value per M2M approach/language/engine combination and scenario. In the following, this thesis refers to the first question as the “quality question”.


Metrics. This GQM plan considers two types of metrics for the quality question: (1) metrics measurable by inspecting involved artifacts and (2) subjective metrics measurable by questionnaires.

The first type needs to be further refined per quality property X as appropriate metrics differ per property. These refined metrics should be expected to either positively or negatively correlate with X. As an example, consider the subproperty “modularity” of the quality property maintainability (X = modularity). One possible metric for modularity is the “average number of domains”, which is measured by inspecting the transformation specification of a concrete M2M language. In Java, it measures the number of parameters per method; in QVT-O, the number of parameters per operation (entry operation, mappings, helpers, and constructors); in QVT-R, the number of applied domains per relation. Kapova et al. [KGH12] identify a positive correlation with modularity for this metric (at least for QVT-R). Note that this metric cannot be applied to all quality properties. For instance, the “accountability” subproperty of security is obviously independent of the number of domains.
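The “average number of domains” metric described above can be illustrated with a minimal sketch. It assumes that the number of domains (or parameters) per rule has already been extracted from the transformation specification; the function name and the rule names and counts below are hypothetical and only serve as an illustration.

```python
from statistics import mean


def average_number_of_domains(domains_per_rule):
    """Return the mean number of domains (or parameters) over all rules.

    `domains_per_rule` maps a rule, operation, or method name to the number
    of domains/parameters it declares (hypothetical input format).
    """
    if not domains_per_rule:
        raise ValueError("at least one rule is required")
    return mean(domains_per_rule.values())


# Hypothetical counts, not measured values from the thesis.
qvtr_relations = {"ClassToTable": 2, "AttributeToColumn": 2}
java_methods = {"transform": 1, "mapClass": 3, "mapAttribute": 2}

print(average_number_of_domains(qvtr_relations))
print(average_number_of_domains(java_methods))
```

How the counts are obtained (e.g., by parsing the specification) depends on the M2M language and is not covered by this sketch.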

The second type of metrics assesses the quality of X by questionnaires. This type also needs to be refined per quality property as the same questions cannot be reused for other quality properties. For instance, the question “How would you rate the modularity of the transformation?” regarding the modularity subproperty of maintainability is clearly inappropriate for the “accountability” subproperty of security. For evaluating the questions, it holds that the more points quality property X gets within a questionnaire, the higher X (positive correlation). This makes it possible to empirically evaluate whether the metrics selected for the first type can be used for assessing the quality property X (cf. [vA11, pp. 66ff]). Furthermore, van Amstel [vA11, p. 68] emphasizes that it is important to check the consistency of the answers to the questions as participants can answer carelessly, which lowers the statistical significance of the questionnaire. Therefore, he suggests asking several similar but different questions per quality property. The quality property definitions of Section 3.6 are a good source for deriving more questions for a given quality property. For instance, another question regarding modularity could be “To what degree is the transformation composed of distinct components such that a change to one component has minimal impact on the other components?” (cf. Section 3.6.7). Further sources for questions are the questionnaires by van Amstel for ASF+SDF [vA11, pp. 192ff] and ATL [vA11, pp. 197ff] as well as the questionnaire for QVT-R by Kapova et al. [KGH12]⁵.

There also exist other metric types besides the two types considered here. The data collection procedures discussed in Section 2.2.3 point to these other metrics. For instance, metrics could be related to field studies or controlled experiments. This thesis does not consider these issues in more depth as they are out of its scope.

⁵ They published their questionnaire at http://www.furcas.org/survey/qvtr-questionnaire.html (last retrieved 2012-10-17).





Hypotheses. It is possible to provide a set of common hypotheses for the quality question, i.e., this thesis expects these hypotheses to be true for any quality property in general. The GQM plan considers three general hypotheses as described subsequently.

The first general hypothesis states that quality properties generally differ between different M2M approach, language, and engine combinations in a given scenario. This is a central hypothesis as it induces the need for transformation engineers to assess quality properties of M2M techniques in order to select a technique which best fits a given context. To check this hypothesis, the standard deviation for each of the measurements related to a quality question metric (i.e., a sample) can be estimated. This thesis uses Bessel’s correction [NK64, Appendix A] as a formula for estimating the standard deviation since not every possible combination can be sampled (and, thus, the mean of the whole population is unknown):

$$\sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2},$$

where $N$ is the number of considered combinations (hence, $N = 3$), $x_i$ is the sample value for combination $i$, $\bar{x}$ is the mean of the samples, and $\sigma$ is the estimated standard deviation. Low values for $\sigma$ falsify the hypothesis while high values confirm the hypothesis. However, it is hard to determine the precise limit for “high” and “low” values. Those limits need to be determined once M2M quality is better understood and more empirical evidence is available. As a consequence, this thesis leaves a detailed investigation of this issue as future work. Nonetheless, I (subjectively) classify the calculated $\sigma$ values into “low”, “medium”, or “high” to indicate a tendency whether the hypothesis is correct and to serve as a starting point for future work.
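The estimate of the standard deviation with Bessel’s correction can be sketched in a few lines. The sample values below are hypothetical (one value per combination, i.e., N = 3) and are not measurements from this thesis; the function name is likewise only illustrative. The sketch also cross-checks the result against `statistics.stdev` from the Python standard library, which computes the sample standard deviation with the same N − 1 denominator.

```python
import math
import statistics


def bessel_std(samples):
    """Estimate the standard deviation with Bessel's correction (divide by N - 1)."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    sample_mean = sum(samples) / n
    squared_deviations = sum((x - sample_mean) ** 2 for x in samples)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical metric values for the three combinations (Java, QVT-O, QVT-R).
samples = [2.0, 3.0, 4.0]
sigma = bessel_std(samples)

# The standard library's sample standard deviation uses the same formula.
assert math.isclose(sigma, statistics.stdev(samples))
print(sigma)
```

Classifying the resulting value as “low”, “medium”, or “high” remains a subjective step, as stated above.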

The second general hypothesis states that no M2M technique dominates (performs better for all considered scenarios) another M2M technique regarding a quality property X. This hypothesis is based on the current popularity of the considered M2M techniques [SK09] and the fact that no related work completely rejects a concrete M2M technique because of weaknesses regarding a certain quality property. To check this hypothesis, the “dominates” relation needs to be formalized. This thesis applies the notations known from game theory for this: $Q_S^X(A) \in \{\text{low}, \text{medium}, \text{high}\}$ denotes whether the (sub)quality X of an M2M approach, language, or engine A in scenario S is low, medium, or high (in order of increasing quality). As this classification is a totally ordered set, its values are ordinal and order relations ($<$, $\leq$, etc.) can be applied to these values. Then, the dominates relation $\succeq_S^X$ for (sub)quality X and scenario S can be defined as $A \succeq_S^X B$ with $Q_S^X(A) \geq Q_S^X(B)$ for every M2M approach, language, or engine A and B. Thus, it states that M2M approach, language, or engine A performs at least as well as M2M approach, language, or engine B regarding quality (sub)property X and in scenario S. Furthermore, A dominates B regarding X if $A \succeq_S^X B$ for all scenarios S. With these formalisms, the hypothesis can simply be checked if the ordinal quality values low, medium, or high are known for X in a given scenario. These values can be derived from the subjective questionnaires (second metric type of the quality question) by classifying the gained results into these three ordinal values. If the metrics of the first type truly correlate with X, their measured values can also be used and classified into the three ordinal values.
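The check of the second general hypothesis can be sketched as follows, assuming the ordinal quality values are already known per scenario and technique. The quality values, the dictionary layout, and the function names are hypothetical illustrations and not results or tooling of this thesis.

```python
# Ordinal encoding of the three quality levels (in order of increasing quality).
LEVELS = {"low": 0, "medium": 1, "high": 2}


def at_least_as_good(quality, scenario, a, b):
    """True if technique a performs at least as well as b in the given scenario."""
    return LEVELS[quality[scenario][a]] >= LEVELS[quality[scenario][b]]


def dominates(quality, a, b):
    """True if a performs at least as well as b in all scenarios."""
    return all(at_least_as_good(quality, s, a, b) for s in quality)


def no_technique_dominates(quality, techniques):
    """Check the second general hypothesis for one quality property X."""
    return not any(
        dominates(quality, a, b)
        for a in techniques
        for b in techniques
        if a != b
    )


# Hypothetical ordinal values: scenario -> technique -> quality level.
quality = {
    "Copy": {"Java": "medium", "QVT-O": "high", "QVT-R": "high"},
    "Rule1": {"Java": "high", "QVT-O": "medium", "QVT-R": "low"},
}

print(dominates(quality, "QVT-O", "QVT-R"))
print(no_technique_dominates(quality, ["Java", "QVT-O", "QVT-R"]))
```

Because the relation is defined with “at least as well as”, two techniques with identical values in every scenario dominate each other under this definition.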

The third general hypothesis expresses that quality properties change when varying the scenario for a given M2M technique. This reflects another central hypothesis of this thesis: the scenario affects the quality of the applied M2M technique. Hence, it is important to base the selection of a concrete M2M technique on a given scenario. In terms of the formalisms introduced for the second general hypothesis, the third hypothesis can be formalized as follows: for every $A \succeq_S^X B$ there exists an $S'$ with $S \neq S'$ and $A \not\succeq_{S'}^X B$ (i.e., A does not dominate B in $S'$ regarding X). Again, only the ordinal values need to be determined for A and B for every scenario and per quality property for checking this hypothesis.
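The formalization of the third general hypothesis can be checked literally for a pair of techniques once the ordinal values are known. The following minimal sketch assumes the same hypothetical data layout as a nested dictionary (scenario, then technique); all names and values are illustrative only.

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}


def at_least_as_good(quality, scenario, a, b):
    """True if technique a performs at least as well as b in the given scenario."""
    return LEVELS[quality[scenario][a]] >= LEVELS[quality[scenario][b]]


def scenario_changes_ranking(quality, a, b):
    """For every scenario S in which a is at least as good as b, require
    another scenario S' in which a is not at least as good as b."""
    scenarios = list(quality)
    for s in scenarios:
        if not at_least_as_good(quality, s, a, b):
            continue
        if not any(
            other != s and not at_least_as_good(quality, other, a, b)
            for other in scenarios
        ):
            return False
    return True


# Hypothetical ordinal values: the ranking of A and B flips between scenarios.
varying = {
    "Copy": {"A": "high", "B": "low"},
    "Rule1": {"A": "low", "B": "high"},
}
# Hypothetical ordinal values: A is better than B in every scenario.
uniform = {
    "Copy": {"A": "high", "B": "low"},
    "Rule1": {"A": "high", "B": "low"},
}

print(scenario_changes_ranking(varying, "A", "B"))
print(scenario_changes_ranking(uniform, "A", "B"))
```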

Template. Table 5.4 shows the template for the quality question, listing the discussed question and summarizing its metrics and hypotheses. The template consists of three columns: (1) the type dealt with in the corresponding row (question, metric, or hypothesis), (2) an associated, unique ID per question, metric, and hypothesis, and (3) a brief description of the respective question, metric, or hypothesis.

Type        ID         Description
Question    GQ1∗       What is the quality property X of the implementations?
Metric      GM1.1.i∗   +/− Metrics specific for quality property X
            GM1.2.i∗   + Average X points in questionnaire
Hypothesis  GH1.1      Different M2M approaches/languages/engines come with different X values in a given scenario.
            GH1.2      No M2M approach/language/engine dominates another M2M approach/language/engine regarding quality property X.
            GH1.3      Different scenarios come with different X values for a given M2M approach/language/engine.

∗ : needs to be refined per quality property

Table 5.4: Template for the Quality Question

Some IDs are marked by an asterisk. The question and the metrics belonging to these marked IDs need to be refined for a concrete property X as described above. As multiple refinements are possible for the first and second metric type, the variable i makes it possible to uniquely name those instances of the respective metric type. For example, the “average number of domains” metric for modularity could get the instance ID “Modu1.1.1”. The next modularity metric could get the instance ID “Modu1.1.2” and so forth.



Furthermore, each of the metrics has a “+” or “−” as the first sign of its description, indicating whether the metric is expected to positively (+) or negatively (−) correlate with the respective quality property X. In cases where this cannot be determined yet, i.e., in cases where the correlation is determined within the refinement step, the template uses the “+/−” sign.

5.4.2 What are the reasons for differences in quality property X?

Question. The second question “What are the reasons for differences in quality property X?” asks for the reasons of differences regarding X when comparing different scenarios and/or approaches, languages, and engines. This builds on the assumption that the first and third hypothesis (GH1.1 and GH1.3) of the quality question are true, i.e., there are differences regarding a quality property X.

As the question asks for a (pair-wise) difference, the entities for which this difference is considered need to be explicitly named. These entities can either be two different scenarios or two different approach, language, and engine combinations. Therefore, the GQM plan uses the following four conventions: (1) $S_1$ and $S_2$ with $S_1 \neq S_2$ are two different scenarios, (2) $S_1, S_2 \in \mathcal{S}$ := {UML2RDBMS, Ecore2Copy, GeneratedCopyRules, MOM, Copy, Rule1, . . . , Rule12} where the elements of $\mathcal{S}$ represent the respective scenario, (3) $T_1$ and $T_2$ with $T_1 \neq T_2$ are two different M2M approach, language, and engine combinations, as well as (4) $T_1, T_2 \in \mathcal{T}$ := {Java, QVT-R, QVT-O} where the elements of $\mathcal{T}$ represent the respective combination of approach, language, and engine considered within this thesis. Furthermore, this thesis refers to the second question as the “reason question”.

Metrics. The GQM plan considers eleven metrics for answering the reason question. Each of these metrics is described in the following. The first metric suggests considering the qualitative differences which Chapter 4 makes explicit. Under the assumption that the classifications which underlie the qualitative comparison are correct, complete, and fine-grained enough, this metric provides a set of the possible reasons. To formalize this metric, let $\Delta_F(T_1, T_2) \subseteq F_{\mathcal{T}}$ be the set of approach, language, and engine combination features where $T_1$ and $T_2$ differ regarding the features $F_{\mathcal{T}}$ induced by Table 4.1 and $\Delta_F(S_1, S_2) \subseteq F_{\mathcal{S}}$ the set of scenario features where $S_1$ and $S_2$ differ regarding the features $F_{\mathcal{S}}$ induced by Table 4.2. For instance, “Element Creation $\in \Delta_F$(QVT-R, QVT-O)” indicates that QVT-R and QVT-O differ regarding the element creation feature. In particular, $\Delta_F$ defines the first metric: it maps to a discrete unordered set of sets (nominal scale) which gives the set of features where two combinations or scenarios, respectively, have a qualitative difference. Each of its elements is a possible reason for the difference in X.
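The first metric can be sketched as a comparison of feature assignments. In the following minimal sketch, each combination (or scenario) is represented as a dictionary from feature name to assigned value. Only the feature name “Element Creation” is taken from the text above; the other feature names and all values are hypothetical placeholders and do not reproduce Table 4.1.

```python
def delta_f(features_1, features_2):
    """Return the set of features whose assigned values differ between two
    combinations (or two scenarios)."""
    all_features = features_1.keys() | features_2.keys()
    return {
        feature
        for feature in all_features
        if features_1.get(feature) != features_2.get(feature)
    }


# Placeholder feature assignments (hypothetical values, not from Table 4.1).
qvt_r = {"Element Creation": "value-a", "Feature Y": "value-c", "Feature Z": "value-d"}
qvt_o = {"Element Creation": "value-b", "Feature Y": "value-c", "Feature Z": "value-e"}

differences = delta_f(qvt_r, qvt_o)
print(sorted(differences))
```

Each element of the returned set is then a candidate reason for an observed difference in X.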

The second and third metric are based on investigating the differences when comparing the Copy scenario with each of the Rule scenarios of Medini QVT’s Shapes-Tutorial. The idea is that each Rule scenario is characterized by only a single or a few changes regarding scenario features when comparing it to the Copy scenario. Hence, changes regarding quality properties directly point to these scenario features as possible reasons for the change.

The second metric counts the number of changes that affect the quality property of interest. As this depends on the concrete quality property considered, the metric must be refined for each possible quality property of interest. This refinement could be based on the definitions of the respective quality property in Section 3.6. Here, it is important to be precise when specifying what a “change that affects X” is and how changes are counted. For instance, the definition of “modularity” defines a change as the “impact on the other components when one component is changed”. One possible metric is to count the absolute number of changes (sum of renamed variables and transformation rules, additional language constructs, etc.) made within the transformation. The higher this absolute number, the lower the modularity because a component (in the sense of a transformation rule) has changed and several other changes had to be applied, too. Note that this metric does not separately count the changes made in transformation rules other than the one that was changed (cf. the modularity definition). Therefore, other metrics taking this aspect into account could be derived, too.

In contrast to the second metric, the third metric measures the relative difference of the ratio-scaled values identified by the quality question’s refined metrics of the first type. It is important to work with ratio scales for this metric as nominal, ordinal, and interval scales do not allow (reasonable) relative differences to be calculated. Thus, the disadvantage of this metric is that it cannot be applied to all of the refined metrics. The advantage is that this metric is generic as it generically defines what a “change that affects X” is. As an example, consider the “average number of domains” metric of modularity again. It is ratio-scaled since its origin is 0. If Copy has an average number of domains of 1 and RuleX (both in QVT-R; with RuleX ∈ {Rule1, . . . , Rule12}) an average of 2, the value of RuleX is 200% of the value of Copy, i.e., a relative difference of +100%. High values for the relative difference between Copy and RuleX indicate that the elements of ∆F(Copy, RuleX) are the reason for a change in quality.
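The calculation behind the third metric can be sketched as follows. The sketch assumes that the relative difference is defined as (changed − base) / base, which is consistent with the later use of negative relative differences; the function name and variable names are hypothetical. Under this definition, moving from an average of 1 domain to an average of 2 domains yields +100%, i.e., the new value is 200% of the base value.

```python
def relative_difference(base, changed):
    """Relative change from `base` to `changed`; requires a ratio scale and base != 0."""
    if base == 0:
        raise ValueError("relative difference is undefined for a base value of 0")
    return (changed - base) / base


copy_domains = 1.0   # average number of domains in the Copy scenario (example value)
rulex_domains = 2.0  # average number of domains in a RuleX scenario (example value)
print(f"{relative_difference(copy_domains, rulex_domains):.0%}")  # 100%
```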

The second and third metric try to measure the impact on quality attribute X when altering a single or a few scenario features. However, they are specific to Medini QVT’s Shapes-Tutorial (although not to QVT-R as the Medini QVT scenarios can also be implemented in other M2M languages). Nonetheless, their basic idea can be reused for similar cases. This is demonstrated by the fourth and fifth metric which are equivalent to the second and third but consider the differences when comparing the initial rule set, consisting of the copy rules created by the Ecore to Copy HOT for the MOM Completion, with the MOM Completion itself.

The sixth to tenth metrics help to check whether the fourth to seventh hypotheses are correct. Therefore, the description of these metrics is given together with the description of the corresponding hypotheses below.

The eleventh metric is similar to the quality question metric of the second type (GM1.2.i). It enables the empirical evaluation of the selected metrics for the reason question by questionnaires. For each quality property X, the question “What influences the quality of X in your opinion?” is added to the questionnaire. The answers to this are nominal-scaled values.

To formalize this metric, let Q_XR be the list of answers for all questionnaires regarding the reason question for quality X. This list first needs to be normalized as the answers are, in general, full sentences in natural language. Therefore, the list is modified to Q'_XR by inspecting each of these sentences and identifying the respective feature of the M2M classifications. Identified features which are not part of the classifications are collected in a separate list F*. F* points to missing features of the classification, concepts misunderstood by the participants, features which may target dimensions not considered within this thesis (e.g., the quality of the M2M tool itself is not considered within this thesis), etc. Therefore, F* needs to be further inspected and appropriate conclusions have to be drawn.

Q'_XR, on the other hand, can be converted to a set Q*_XR by eliminating duplicate entries. However, the number of duplicates should be counted as it can serve as a metric for the importance and also the obviousness of the respective feature identified as a reason. Moreover, let F := {x | (x ∈ ∆F(S1, S2) ∨ x ∈ ∆F(T1, T2)) ∧ S1, S2 ∈ S ∧ T1, T2 ∈ T} be the set of all features that were identified as possible reasons by the first reason question metric. The set F ∩ Q*_XR contains the features identified by the first metric as well as by the questionnaire. Therefore, the features of this set empirically validate the respective features of the first metric. The set Q*_XR \ F gives the set of features solely empirically found as reasons. These need to be further investigated as they could point, e.g., to mistakes by the questionnaire participants or wrong classifications. The set F \ Q*_XR, on the other hand, contains the features not identified by the participants. This set also needs to be further inspected as it might be an indication that the respective features are no reason for differences regarding a quality property. Another interpretation could be that the participants simply did not think about the respective feature.
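The evaluation of the questionnaire answers can be sketched with plain set operations. The sketch assumes that the manual normalization step has already mapped every answer to a feature name; the function name, the dictionary keys, and the feature names in the usage example are hypothetical. In set terms, the difference "questionnaire set minus F" contains the features named only by participants, and "F minus questionnaire set" contains the features found only by the first metric.

```python
from collections import Counter


def evaluate_answers(normalized_answers, classification_features, f_possible):
    """Split normalized answers into the sets discussed for the eleventh metric.

    normalized_answers:      list of feature names, one per answer (duplicates allowed)
    classification_features: all feature names known to the classifications
    f_possible:              the set F of possible reasons found by the first metric
    """
    counts = Counter(normalized_answers)  # number of mentions per feature
    f_star = {name for name in counts if name not in classification_features}
    q_star = set(counts) - f_star         # duplicate-free answers within the classifications
    return {
        "confirmed": f_possible & q_star,           # found by both sources
        "only_questionnaire": q_star - f_possible,  # named by participants only
        "only_first_metric": f_possible - q_star,   # not named by any participant
        "f_star": f_star,                           # outside the classifications
        "counts": counts,
    }
```

The mention counts are kept alongside the sets because the number of duplicates is itself meant to indicate how important or obvious a feature is.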

Hypotheses. The GQM plan includes eight general hypotheses regarding the reason question. Like the hypotheses for the quality question, this thesis expects these hypotheses to be true for any quality property in general.

The first hypothesis states that features collected by the first metric (thus, the elements of the set F introduced by the eleventh metric) can be used to answer the reason question and that possible answers are features inherent to the scenario or approach/language/engine classifications. The eleventh metric allows this hypothesis to be validated empirically: the larger the set F ∩ Q*_XR, the higher the probability that this hypothesis is true. However, the hypothesis cannot completely be falsified or validated as the two considered sets could, for instance, both include correct reasons for differences but be disjoint. Therefore, this hypothesis only indicates an expected tendency.

The second and third hypotheses target special features of QVT-R. Related work often suggests using QVT-R for bidirectional transformations and for “simple” transformations (e.g., Guduric et al. [GPT09]). This observation is the basis for these two hypotheses.

The second hypothesis states the former issue directly as a hypothesis: bidirectional scenarios perform best regarding X in QVT-R. To check this hypothesis, Rule9 of Medini QVT’s Shapes-Tutorial needs to be investigated as it is the only bidirectional scenario. For comparing X of Rule9 between QVT-R, Java, and QVT-O, this thesis uses the questionnaire results of the quality question regarding X (GM1.2.i). As the questionnaire provides ordinal values, X can directly be compared between QVT-R, Java, and QVT-O such that the hypothesis can directly be checked. Note that it would also be possible to use other metrics of the quality question for checking the hypothesis. As the other metrics are not refined at this point, this thesis uses the (more generic) questionnaire metric and leaves this issue as future work.

In contrast, the third hypothesis targets the latter issue: QVT-R is especially appropriate when the scenario is characterized by a high degree of “1:1 relations”. The hypothesis targets the issue only indirectly as “simple” transformations are not sufficiently defined (cf. Section 1.1). The rationale behind this is that 1:1 relations fit very well into the concept of mathematical relations. In contrast to the “simple” criterion, the “1:1 relations” criterion is more convincing as it targets a concrete feature of the scenario. Therefore, showing that this hypothesis is correct allows better reasoning about whether to choose QVT-R in a given scenario.

To check the hypothesis, the scenarios with a high degree of 1:1 relations have to be identified first. A naive approach to accomplish this is to select all the scenarios that provide 1:1 relations in Table 4.2. With this table, the set R of scenarios with 1:1 relations is induced as R = {UML2RDBMS, Ecore2Copy, Copy, Rule1, . . . , Rule12}. R covers all scenarios except for the MOM scenario. The problem is that this first, naive approach for determining a high degree of 1:1 relations selects most of the scenarios and, thus, may be too coarse-grained for a selection. However, the fact that the MOM scenario is the only scenario which has no direct 1:1 relations suggests stating the opposite hypothesis: the MOM scenario is not appropriate for QVT-R. Checking this hypothesis is analogous to the check of the second hypothesis except that the subject under study is the MOM scenario instead of the Rule9 scenario. As it is out of the thesis’ scope, deriving suitable metrics for a “high degree of 1:1 relations” is left as future work.

The fourth, fifth, and sixth hypotheses target special features of Java (with EMF).

The fourth hypothesis states that modularity and reuse mechanisms are primarily applied in Java, which generally affects quality properties positively. The rationale behind this is that modularity mechanisms are only useful when mature and highly reusable modules exist. As the Java community has developed a rich set of these modules (e.g., Java libraries consisting of several classes), it is to be expected that these libraries can effectively and efficiently be applied within Java. The modules within QVT-O and QVT-R are, in contrast, less mature and very few module libraries exist. Another reason for the hypothesis is that inheritance concepts are not available for QVT-R at all (more precisely, the QVT specification describes the extends/overrides concepts but Medini QVT does not support these concepts; cf. Section 3.4.1). For QVT-O, inheritance exists, which can influence QVT-O implementations positively, similar to the possibilities within Java.

To verify the fourth hypothesis, the sixth metric measures the number of included modules and the seventh metric the number of applied reuse mechanisms. The number of included modules equals the number of “import” statements within the transformation specification (for all languages). In Java, this represents the number of imported classes and in QVT-O as well as QVT-R the number of imported transformation libraries. In contrast, the number of applied reuse mechanisms equals (a) in Java the number of “extends” statements (as Java does not support multiple inheritance, the number can either be “0” or “1”), (b) in QVT-O the sum of the number of mapping inheritances, merges, and disjunctions, and (c) in QVT-R “0” because Medini QVT does not provide reuse mechanisms. A more accurate metric for the case of Java would additionally count the number of delegations. Delegation allows object composition and can replace inheritance as a mechanism for reusing code [GHJV95, pp. 20-21]. However, as the number of compositions is more complicated to calculate, this thesis leaves the handling of delegations as future work. Both metrics can, finally, be used to check the fourth hypothesis since the respective measurements are ordinal-scaled.
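For the Java case, the sixth and seventh metric could be approximated by a simple text scan. The following is only a rough sketch under stated assumptions: it uses regular expressions instead of a Java parser, so imports or “extends” keywords inside comments or string literals would be miscounted, and generic class declarations are not handled. The function names and the sample class names are hypothetical.

```python
import re

# One match per Java import statement at the start of a line
# (optionally a static import).
IMPORT_RE = re.compile(r"^[ \t]*import\s+(?:static\s+)?[\w.*]+\s*;", re.MULTILINE)
# One match per class declaration of the form "class Name extends Base".
EXTENDS_RE = re.compile(r"\bclass\s+\w+\s+extends\s+\w+")


def count_imports(java_source):
    """Approximate the number of included modules of a Java compilation unit."""
    return len(IMPORT_RE.findall(java_source))


def count_extends(java_source):
    """Approximate the number of applied reuse mechanisms of a Java compilation unit."""
    return len(EXTENDS_RE.findall(java_source))


SAMPLE = """
import java.util.List;
import org.eclipse.emf.ecore.EObject;

public class CopyTransformation extends BaseTransformation {
}
"""

print(count_imports(SAMPLE), count_extends(SAMPLE))  # 2 1
```

For QVT-O and QVT-R, analogous counts would have to be derived from the respective concrete syntax, which this sketch does not attempt.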

The fifth hypothesis targets the absence of dedicated application conditions within Java (this thesis does not consider Java assertions or frameworks like iContract as a dedicated mechanism; cf. Section 4.1). The hypothesis states that this generally lowers quality properties. The reason for this is that application conditions make it possible to decide locally (per rule) under which conditions the rule is executed. Therefore, the prerequisites of the rules make explicit which assumptions the rule makes and, thus, foster maintainability.

The eighth metric measures the number of application conditions to check whether application conditions have a significant influence on quality properties. In QVT-R and QVT-O, the average number of when clauses per transformation rule is calculated. For Java, the metric always yields “0” as argued above.

The sixth hypothesis states that the general purpose constructs for intermediate structures⁶ within Java generally lower transformation quality properties. The reason for this is that Java’s constructs are too powerful and unspecialized to fit the needs of transformations.

For checking the sixth hypothesis, the ninth metric measures the number of intermediate structures within every language. The metric, thus, builds the basis to check whether these structures influence quality properties. For Java, the metric uses the number of inner classes plus the number of member variables. In QVT-O, the number of intermediate classes and properties is counted. For QVT-R, the metric counts the total number of variables. To check the sixth hypothesis, it needs to be checked whether there is a negative correlation between the ninth metric and each quality property X. As only maintainability properties are investigated within this thesis, the hypothesis can only be checked for maintainability properties. For identifying a correlation, the GQM plan uses the questionnaire results of the quality question (regarding X). The description of the second hypothesis gives more details on using the questionnaire results.

⁶ See Section 3.1.1 for a definition and Section 4.1 for an application of intermediate structures.

The seventh hypothesis targets QVT-O’s special feature “phasing on operation basis”. As it provides the means to systematically structure rules, the hypothesis states that it generally improves quality properties of transformations.

To check this, the tenth metric measures the average number of distinct phases per mapping within QVT-O. For Java and QVT-R, the metric always yields “0” since these languages have no phasing mechanisms. Analogously to the sixth hypothesis, the seventh hypothesis is checked by looking for a correlation between the tenth metric and the questionnaire results; since phasing is expected to improve quality, the expected correlation is positive.
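The thesis does not prescribe a particular correlation coefficient for these checks. As one possible choice for ordinal questionnaire data, the following sketch computes Spearman's rank correlation (Pearson correlation of average ranks); this choice and all function names are assumptions made for illustration. The sign of the result indicates the direction of the correlation between a metric and the questionnaire results.

```python
def ranks(values):
    """Assign 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_values, questionnaire_scores):
    """Spearman rank correlation between a metric and questionnaire scores."""
    return pearson(ranks(metric_values), ranks(questionnaire_scores))
```

Note that a metric which is constant across all observations (e.g., always “0”) has no defined correlation, which the sketch reports as an error rather than as a value.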

Finally, the eighth hypothesis states that HOTs generally have a lower maintainability compared to scenarios that are no HOT. The rationale behind this is that HOTs are transformations on a higher level of abstraction and, thus, are typically harder to maintain. The scenario classification of Section 4.2 shows that the Ecore2Copy scenario is the only HOT of the case studies. To check the hypothesis, the Ecore2Copy questionnaire results of the quality question need to be compared with the results of the other scenarios for every maintainability property X. If the hypothesis is correct, the Ecore2Copy scenario provides lower values for every comparison.

Template. Table 5.5 shows the template for the reason question giving the discussed question and summarizing its metrics and hypotheses. It has the same structure as the template of the quality question.

Type        ID         Description

Question    GQ2*       What are the reasons for differences in quality property X?
Metric      GM2.1      Qualitative differences identified in Chapter 4
            GM2.2.i*   Changes affecting X when moving from Copy to RuleX
            GM2.3      Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
            GM2.4.i*   Changes affecting X when moving from generated copy rules to MOM completion
            GM2.5      Relative differences between ratio-scaled GM1 metrics when moving from generated copy rules to MOM completion
            GM2.6      Number of included modules (classes/compilation units/libraries)
            GM2.7      Number of applied reuse mechanisms (inheritance/logical composition)
            GM2.8      Average number of when clauses
            GM2.9      Number of intermediate structures
            GM2.10     Average distinct phases per rule
            GM2.11     Answers to GQ2 in questionnaire
Hypothesis  GH2.1      Features identified by GM2.1 point to scenario or approach/language/engine features which influence X.
            GH2.2      Bidirectional scenarios (e.g., Rule9) generally perform best regarding X within QVT-R. The reason is that QVT-R has dedicated support for multidirectional rules and transformations.
            GH2.3      Scenarios with a high degree of 1:1 relations (e.g., Shapes: Copy) generally perform best regarding X within QVT-R. The reason is that central ideas of relational approaches come from mathematical relations which can efficiently and effectively be used to express 1:1 relations.
            GH2.4      Modularity and reuse mechanisms in rule organization are primarily applied in Java. Applying these mechanisms has generally a positive effect on X (related to GM2.6 and GM2.7).
            GH2.5      The absence of dedicated application conditions in Java generally lowers X (related to GM2.8).
            GH2.6      The general purpose constructs for intermediate structures in Java generally lower X (related to GM2.9).
            GH2.7      Phasing on operation basis generally improves X in QVT-O (related to GM2.10).
            GH2.8      Higher-order Transformations generally suffer from a lower X compared to scenarios which are no HOTs.

* : needs to be refined per quality property

Table 5.5: Template for the Reason Question

5.5 Questions, Metrics, and Hypotheses

This section derives the concrete questions, metrics, and hypotheses for the GQM goal. As the perspective of the goal is on maintainability, the subproperties of the maintainability quality property (modularity, reusability, analyzability, modifiability, and consistency) as well as partly of the usability quality properties (appropriateness recognizability and learnability; cf. Section 5.3) are applied to the general template of Section 5.4.

Note that the applied metrics for the refinement of the quality questions’ first metric (GM1.1.i) primarily refer to the work of Kapova et al. [KGH12]. Kapova et al. [KGH12] identify metrics that correlate with understandability, modularity, consistency, conciseness, completeness, reusability, and modifiability of QVT-R. By using Table 5.2, the understandability metrics can particularly be applied for appropriateness recognizability. However, for using the metrics within this thesis, the thesis generalizes the metrics such that they can be applied to Java with EMF and QVT-O as well. The refinement of the quality questions’ second metric (GM1.2.i) tries to validate, by means of refined questionnaire questions, whether these generalizations are feasible. Furthermore, this thesis only describes the generalized forms of the metrics. For examples and the rationales of the metrics, the thesis refers to the work of Kapova et al. [KGH12].

Section 5.5.1 to Section 5.5.7 continue with describing the refinements for the quality properties. Each section describes one quality question (GQ1) and one reason question (GQ2) refinement corresponding to a different quality property, respectively.

5.5.1 ModuQ1/ModuQ2: Modularity

This section refines the quality (ModuQ1) and reason (ModuQ2) questions for modularity (defined in Section 3.6.7).

ModuQ1. Regarding the quality question, Kapova et al. [KGH12] identify four⁷ metrics that correlate with modularity: the average (1) number of domains, (2) fan-out, (3) rule dependency depth, and (4) number of explicit internal scheduling calls. Therefore, these metrics can be used to refine the first metric (GM1.1.i) of the quality question:

Number of domains (Modu1.1.1): Number of domains (cf. Section 3.1.1) per transformation rule. In Java the average number of parameters per method, in QVT-O the average number of parameters per operation (mapping, helper, constructor, and entry operations), and in QVT-R the average number of domains per relation.

Fan-out (Modu1.1.2): Number of distinct explicit internal scheduling calls (cf. Section 3.1.3) of a rule. In Java the average number of distinct calls from methods to methods per method (method fan-out), in QVT-O the average number of distinct calls from operations to operations per operation (operation fan-out), and in QVT-R the average number of distinct calls (i.e., “RelationCallExps” in when and where clauses) from relations to relations per relation (relation fan-out).

Rule dependency depth (Modu1.1.3): Length of the path from a leaf rule to a start rule. Leaf rules are rules which neither occur as an application condition (cf. Section 3.1.1) within other rules nor apply explicit internal scheduling calls (cf. Section 3.1.3) to other rules. Start rules are top level rules without application conditions, where top level rules are rules which are not invoked by explicit internal scheduling mechanisms. In Java the dependency depth of methods, in QVT-O the dependency depth of operations, and in QVT-R the dependency depth of relations.

Number of explicit internal scheduling calls (Modu1.1.4): Number of (not necessarily distinct; cf. “fan-out”) explicit internal scheduling calls (cf. Section 3.1.3) of a rule. This metric corresponds to the “NWWP-C” metric of Kapova et al. [KGH12]. It can be applied to the respective M2M language analogously to “fan-out”.

⁷ In fact, Kapova et al. [KGH12] identify five metrics. However, the “number of when- and where-predicates” metric is too specific for QVT-R and, thus, not considered.
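The averaged metrics Modu1.1.1, Modu1.1.2, and Modu1.1.4 can be sketched on a language-independent representation of a transformation. The sketch assumes that the number of domains per rule and the explicit internal scheduling calls per rule have already been extracted from the Java, QVT-O, or QVT-R source; the function names and the rule names in the example are hypothetical. The rule dependency depth (Modu1.1.3) is not covered here.

```python
def average_domains(domains_per_rule):
    """Modu1.1.1: average number of domains (parameters) per rule."""
    return sum(domains_per_rule.values()) / len(domains_per_rule)


def average_fan_out(calls_per_rule):
    """Modu1.1.2: average number of distinct called rules per rule."""
    return sum(len(set(calls)) for calls in calls_per_rule.values()) / len(calls_per_rule)


def average_scheduling_calls(calls_per_rule):
    """Modu1.1.4: average number of calls per rule, counting repeated calls."""
    return sum(len(calls) for calls in calls_per_rule.values()) / len(calls_per_rule)


# Hypothetical transformation with three rules; "main" calls "copyShape" twice.
calls = {
    "main": ["copyShape", "copyShape", "copyBlock"],
    "copyShape": ["copyBlock"],
    "copyBlock": [],
}
domains = {"main": 2, "copyShape": 2, "copyBlock": 2}

print(average_domains(domains))  # 2.0
print(average_fan_out(calls))    # 1.0
```

The example shows the difference between the two call-based metrics: the repeated call in "main" counts once for fan-out but twice for the number of explicit internal scheduling calls.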

For the refinement of the second metric (GM1.2.i), which specifies questions for a questionnaire, the questions derived in the examples of Section 5.4.1 can be used as the examples relate to modularity. Therefore, the refinement includes the questions “How would you rate the modularity of the transformation?” and “To what degree is the transformation composed of distinct components such that a change to one component has minimal impact on the other components?”.

Table 5.6 shows the refined template for the quality question of modularity, thus summarizing the descriptions from above. Note that it does not include an additional hypothesis. However, the general hypotheses of the quality question are inherited and expected to hold regarding modularity, e.g., there are differences in modularity when comparing the scenario implementations (GH1.3). As this is specified via the general template, it does not need to be included within Table 5.6 (nor in other template refinements). Moreover, the descriptions of the refined metrics of GM1.1.i additionally include a reference to related work that argued for using the respective metric. This is also consistently done for other template refinements.

Type        ID          Description

Question    ModuQ1      What is the modularity of the implementations?
Metric      Modu1.1.1   + Average number of domains [KGH12]
            Modu1.1.2   + Average fan-out [KGH12]
            Modu1.1.3   − Average rule dependency depth [KGH12]
            Modu1.1.4   + Average number of explicit internal scheduling calls [KGH12]
            Modu1.2.1   + Questionnaire: “How would you rate the modularity of the transformation?”
            Modu1.2.2   + Questionnaire: “To what degree is the transformation composed of distinct components such that a change to one component has minimal impact on the other components?”
Hypothesis  -           (no additional hypothesis)

Table 5.6: Refined Quality Question Template for Modularity

ModuQ2. The reason question refinement asks for the reasons of modularity differences. For this, GM2.2.i (targeting changes between Copy and RuleX) is refined twice according to the modularity definition in Section 3.6. The first refinement targets the number of changed rules and the second refinement the number of additional rules:

Changed Rules (Modu2.2.1): Number of rules changed within RuleX when compared to Copy. In Java the number of changed methods, in QVT-O the number of changed operations, and in QVT-R the number of changed relations.

Additional Rules (Modu2.2.2): Number of rules added within RuleX when compared to Copy. It can be applied to the respective M2M language analogously to “Changed Rules”.

Both refinements target the “impact on other components” of the modularity definition. The same metrics can analogously be applied to the refinement of GM2.4.i (targeting changes between the generated copy rule set and the MOM completion scenario) as well.
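Counting changed and additional rules can be sketched as a comparison of two rule sets. The sketch assumes that rules are matched by name and that a rule counts as changed when its (already normalized) source text differs; both assumptions, the function name, and the rule names in the example are hypothetical. Rules that were removed are not counted, as the two refinements do not mention them.

```python
def rule_changes(base_rules, changed_rules):
    """Return (number of changed rules, number of additional rules).

    Both arguments map a rule name (method, operation, or relation) to its
    normalized source text.
    """
    changed = sum(
        1
        for name, body in changed_rules.items()
        if name in base_rules and base_rules[name] != body
    )
    additional = sum(1 for name in changed_rules if name not in base_rules)
    return changed, additional


# Hypothetical rule sets for the Copy scenario and one RuleX scenario.
copy_rules = {"copyShape": "body v1", "copyBlock": "body v1"}
rulex_rules = {"copyShape": "body v2", "copyBlock": "body v1", "mergeBlocks": "body v1"}

print(rule_changes(copy_rules, rulex_rules))  # (1, 1)
```

The same function could be applied to the generated copy rule set and the MOM completion to obtain the corresponding refinements of GM2.4.i.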

Table 5.7 shows the refined template for the reason question of modularity. In contrast to the refined quality question template, it includes the additional hypothesis ModuH2. ModuH2 states that the implicit scheduling of QVT-R improves modularity. The reason for this is that implicit scheduling frees a transformation engineer from maintaining the explicit scheduling mechanisms. The hypothesis can be checked by comparing the questionnaire results regarding modularity (Modu1.2.1 and Modu1.2.2) between QVT-R and the other languages (cf. GH2.2 in Section 5.4.2 for a detailed discussion on this approach).

Type        ID          Description

Question    ModuQ2      What are the reasons for differences in modularity?
Metric      Modu2.2.1   Number of changed rules when moving from Copy to RuleX
            Modu2.2.2   Number of additional rules when moving from Copy to RuleX
            Modu2.4.1   Number of changed copy rules within MOM completion
            Modu2.4.2   Number of additional rules within MOM completion
Hypothesis  ModuH2      QVT-R’s implicit scheduling improves modularity.

Table 5.7: Refined Reason Question Template for Modularity

5.5.2 ReuseQ1/ReuseQ2: Reusability

This section refines the quality (ReuseQ1) and reason (ReuseQ2) questions for reusability (defined in Section 3.6.7).

ReuseQ1. The metrics for reusability identified by Kapova et al. [KGH12] are the same as for modularity (Modu1.1.1 to Modu1.1.4). The reasons for this can be differentiated based on a technical and an interpreted view. From the technical view, the respective metrics correlate with reusability in the same way as with modularity. More precisely, their correlation coefficients are similar (Kapova et al. [KGH12] consider two variables with a correlation coefficient from the interval 〈−0.4, 0.4〉 as independent where the correlation coefficient lies between −1 and 1). From an interpreted view, reusability and modularity are inherently related and influence each other: for instance, the structuring of a transformation into modules to improve modularity can, at the same time, foster reusability as these modules may be (re)used independently of each other for several other transformations.

Therefore, the hypothesis ReuseH1 explicitly states that reusability correlates positively with modularity as shown in the refined template Table 5.8. For checking this hypothesis, it is especially important to refine the “questionnaire metric” (GM1.2.i) for reusability. Therefore, the refinement includes the questions “How would you rate the reusability of the transformation?” and “To what degree is it possible to apply the transformation rules the transformation specifies also in other scenarios?”. The hypothesis can then be checked by comparing the questionnaire results for modularity (Modu1.2.1 and Modu1.2.2) with those for reusability (Reuse1.2.1 and Reuse1.2.2) for each scenario (cf. GH2.2 in Section 5.4.2 for a detailed discussion on this approach).

Type        ID           Description

Question    ReuseQ1      What is the reusability of the implementations?
Metric      Reuse1.1.i   See Modu1.1.1 to Modu1.1.4 [KGH12]
            Reuse1.2.1   + Questionnaire: “How would you rate the reusability of the transformation?”
            Reuse1.2.2   + Questionnaire: “To what degree is it possible to apply the transformation rules the transformation specifies also in other scenarios?”
Hypothesis  ReuseH1      Reusability correlates positively with modularity.

Table 5.8: Refined Quality Question Template for Reusability

ReuseQ2. The reason question refinement asks for the reasons of reusability differences. For this, Reuse2.2.1 refines the “changes” of GM2.2.i to “the number of additional reused rules”. This corresponds to the number of explicit internal scheduling calls of a Rule scenario minus the number of explicit internal scheduling calls of the Copy scenario. In Java “rules” correspond to methods, in QVT-O to operations, and in QVT-R to relations. The refinement of GM2.4.i to the Reuse2.4.1 metric behaves analogously.

Table 5.9 shows the refined template for the reason question of reusability. It includes the additional hypothesis ReuseH2, which states that the implicit scheduling of QVT-R improves reusability. The reasons for this are the same as for ModuH2. Furthermore, note that if ModuH2 and ReuseH1 are true, ReuseH2 can be deduced. However, if ModuH2 and ReuseH1 are incorrect, ReuseH2 can still be correct.

Type        ID           Description

Question    ReuseQ2      What are the reasons for differences in reusability?
Metric      Reuse2.2.1   Number of additional reused rules when moving from Copy to RuleX
            Reuse2.4.1   Number of additional reused rules when moving from MOM Copy to MOM completion
Hypothesis  ReuseH2      QVT-R’s implicit scheduling improves reusability.

Table 5.9: Refined Reason Question Template for Reusability

5.5.3 AnaQ1/AnaQ2: Analyzability

This section refines the quality (AnaQ1) and reason (AnaQ2) questions for analyzability (defined in Section 3.6.7).

AnaQ1. Goldschmidt and Kubler [GK08] propose metrics similar to the appropriateness recognizability metrics discussed in Section 5.5.6. Therefore, the hypothesis AnaH1 explicitly states that analyzability correlates positively with appropriateness recognizability and Ana1.1.i links to the respective appropriateness recognizability metrics.

Similarly to the case of reusability (cf. the ReuseH1 hypothesis), the measurements of GM1.2.i (with respect to analyzability) can be used to verify this hypothesis. For the case of analyzability, the refinement therefore includes the questions “How would you rate the analyzability of the transformation?” and “To what degree is it possible to diagnose the transformation for deficiencies?”. The second question is based on the analyzability definition in Section 3.6.7.

Table 5.10 summarizes the refinements of the quality question for analyzability.

Type        ID         Description

Question    AnaQ1      What is the analyzability of the implementations?
Metric      Ana1.1.i   See Appro1.1.1 to Appro1.1.6 [GK08]
            Ana1.2.1   + Questionnaire: “How would you rate the analyzability of the transformation?”
            Ana1.2.2   + Questionnaire: “To what degree is it possible to diagnose the transformation for deficiencies?”
Hypothesis  AnaH1      Analyzability correlates positively with appropriateness recognizability.

Table 5.10: Refined Quality Question Template for Analyzability

AnaQ2. The reason question refinement asks for the reasons of analyzability differences. On the one hand, these could be derived from the results of appropriateness recognizability (in case AnaH1 holds). On the other hand, the analyzability definition is similar to the one of modularity regarding “changes” to one implementation. The GQM plan follows the latter idea by relating Ana2.2.i and Ana2.4.i to the equivalent modularity metrics Modu2.2.i and Modu2.4.i. This has the advantage that it lowers the strong dependency on AnaH1. AnaH2.1 additionally makes the idea explicit by stating that the more changes/additions a given scenario requires, the harder it is to analyze. This hypothesis can, again, be checked by comparing the questionnaire results of analyzability (Ana1.2.i) with the metrics applied for measuring changes and additions (Ana2.2.i and Ana2.4.i). In case the values correlate positively, the hypothesis is falsified.

AnaH2.2 is a second additional hypothesis for the reason question of analyzability. It states that changes regarding hierarchies and abstraction levels suffer more from low analyzability than changes regarding 1:1 relations and structural changes. This hypothesis is based on the observation that the considered changes are applied to scenario implementations consisting of copy rule sets. As these have a high amount of 1:1 relations, adding hierarchical and abstraction level features increases the number of distinct scenario criteria which can reduce analyzability. Structural changes, on the other hand, also introduce new scenario criteria but can be easier to analyze as the abstraction level does not change.

For checking the second hypothesis, two sets are created first. The first set S1 collects the scenario pairs that specify changes regarding hierarchies and abstraction levels. When consulting Table 4.2, the first set is induced as S1 = {(Copy, Rule2), (Copy, Rule4), (Copy, Rule5), (Copy, Rule8), (Copy, Rule10), (GeneratedCopyRules, MOM)}. The second set S2 collects the scenario pairs that specify changes regarding 1:1 relations and structural changes. Table 4.2 induces S2 = {(Copy, Rule1), (Copy, Rule7), (Copy, Rule12)}. The next step is to calculate for every pair (a, b) of each set the relative difference between the questionnaire analyzability results of a and b. Note that this is generally not possible for questionnaires as questionnaires are generally not ratio-scaled (cf. the discussion of LearnQ1 in Section C.5.14). Therefore, this thesis assumes ratio-scaled values for this case and emphasizes that any interpretations based on this assumption only allow tendencies to be derived. With this assumption, the relative differences describe a tendency by which factor analyzability changes from a to b, respectively. As it is likely that analyzability is lowered when altering copy rule sets, the GQM plan assumes that the relative difference is negative. In case it is positive, the latter assumption is shown to be wrong and the change fosters analyzability.

If this assumption holds, the relative differences of S1 elements can be compared pairwise to the relative differences of S2 elements. Note that this comparison only requires interval-scaled values; these are provided by the questionnaire. The hypothesis AnaH2.2 can then be falsified if the relative difference of any S1 element is greater than the relative difference of any S2 element (meaning that a change regarding hierarchies/abstractions lowered analyzability less than a change regarding 1:1 relations/structural changes). Another possibility is to analogously compare the average values of the relative differences over S1 and S2, respectively. This has the advantage that (1) no pairwise comparison is needed, and (2) the hypothesis is not interpreted too strictly but as a general tendency where exceptions are possible. The latter point is also the downside of the second possibility, as the result is “a rule of thumb” and not a rule holding for every case.
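The two checks for AnaH2.2 can be illustrated with a minimal Python sketch. It assumes that the relative difference of a pair (a, b) is computed as (b − a) / a, which the text does not state explicitly. The scores are hypothetical values, not data from this thesis, and only a subset of the pairs of S1 and S2 is used.

```python
from statistics import mean

def relative_difference(a, b):
    """Relative change of a score when moving from scenario a to scenario b.

    Assumed definition: (b - a) / a, so a negative value means a decrease.
    """
    return (b - a) / a

# Hypothetical mean questionnaire scores for analyzability (not thesis data).
scores = {"Copy": 4.0, "Rule1": 3.8, "Rule2": 3.0, "Rule4": 2.8, "Rule7": 3.6}

# Subsets of S1 and S2, chosen only for illustration.
s1 = [("Copy", "Rule2"), ("Copy", "Rule4")]  # hierarchy/abstraction-level changes
s2 = [("Copy", "Rule1"), ("Copy", "Rule7")]  # 1:1 relation/structural changes

def differences(pairs, scores):
    """Relative differences for all scenario pairs of one set."""
    return [relative_difference(scores[a], scores[b]) for a, b in pairs]

def falsified_pairwise(d1, d2):
    """Strict reading: one S1 difference above one S2 difference falsifies."""
    return any(x > y for x in d1 for y in d2)

def supported_on_average(d1, d2):
    """Lenient reading: S1 lowers analyzability more than S2 on average."""
    return mean(d1) < mean(d2)

d1, d2 = differences(s1, scores), differences(s2, scores)
print(falsified_pairwise(d1, d2), supported_on_average(d1, d2))
```

The pairwise check corresponds to the strict interpretation, the average check to the “rule of thumb” interpretation described above.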

Table 5.11 summarizes the refinements of the reason question for analyzability.

| Type | ID | Description |
|---|---|---|
| Question | AnaQ2 | What are the reasons for differences in analyzability? |
| Metric | Ana2.2.i | See Modu2.2.1 and Modu2.2.2 |
| | Ana2.4.i | See Modu2.4.1 and Modu2.4.2 |
| Hypothesis | AnaH2.1 | The more changes/additions needed according to a given scenario, the harder to analyze it. |
| | AnaH2.2 | Changes regarding “hierarchical” and “abstraction level” scenario criteria features suffer more from low analyzability than changes regarding “1:1 relations” and “structure” scenario criteria features. |

Table 5.11: Refined Reason Question Template for Analyzability



5.5.4 ModiQ1/ModiQ2: Modifiability

This section refines the quality (ModiQ1) and reason (ModiQ2) questions for modifiability (defined in Section 3.6.7).

ModiQ1. The metrics for modifiability identified by Kapova et al. [KGH12] overlap with the metrics Modu1.1.2 to Modu1.1.4 for modularity. From a technical view, the reasons for this are the same as for reusability; from an interpreted view, they are similar (cf. Section 5.5.2). From an interpreted view, modifiability is also inherently related to modularity. For instance, structuring a transformation into modules to improve modularity can, at the same time, foster modifiability, as these modules are encapsulated entities that can be modified locally without side effects on other modules.

Therefore, the hypothesis ModiH1 explicitly states that modifiability correlates positively with modularity, as shown in the refined template in Table 5.12. For checking this hypothesis, it is especially important to refine the “questionnaire metric” (GM1.2.i) for modifiability. Therefore, the refinement includes the questions “How would you rate the modifiability of the transformation?” and “To what degree is it possible to alter the transformation without introducing defects?”. The hypothesis can then be checked by comparing the questionnaire results for modularity (Modu1.2.1 and Modu1.2.2) and modifiability (Modi1.2.1 and Modi1.2.2) for each scenario with each other (cf. GH2.2 in Section 5.4.2 for a detailed discussion of this approach).

| Type | ID | Description |
|---|---|---|
| Question | ModiQ1 | What is the modifiability of the implementations? |
| Metric | Modi1.1.i | See Modu1.1.2 to Modu1.1.4 [KGH12] |
| | Modi1.2.1 | + Questionnaire: “How would you rate the modifiability of the transformation?” |
| | Modi1.2.2 | + Questionnaire: “To what degree is it possible to alter the transformation without introducing defects?” |
| Hypothesis | ModiH1 | Modifiability correlates positively with modularity. |

Table 5.12: Refined Quality Question Template for Modifiability
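One conceivable way to compare the per-scenario questionnaire results for ModiH1 is a rank correlation, which only needs ordinal data. The following Python sketch computes Spearman's rank correlation by hand; it is an assumption of this sketch and not necessarily the approach discussed for GH2.2 in Section 5.4.2. All scores are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if one input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman's rank correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical mean scores per scenario (not thesis data).
modularity = [4.2, 3.1, 3.8, 2.5]
modifiability = [4.0, 2.9, 3.5, 2.7]
print(spearman(modularity, modifiability))
```

A value close to +1 would be in line with ModiH1, a value close to −1 would contradict it.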

ModiQ2. The reason question refinement asks for the reasons of modifiability differences. For this, Modi2.2.1 refines the “changes” of GM2.2.i to “the average decrease in other quality properties”. Thus, Modi2.2.1 tries to measure the degree to which the changes are “degrading existing transformation specification quality”, which is consistent with the modifiability definition (cf. Section 3.6.7). Note that this is only one way to define this metric: every considered quality property influences the average with the same weight. Another possibility would be to specify different weights for the quality properties.

To measure the “average decrease”, the decrease in quality when comparing the Copy scenario with the respective “Rule scenario” is averaged over every considered quality property except for modifiability. For this, the thesis uses the respective refinements of the “questionnaire metric” GM1.2.i. Measuring the decreases is analogous to measuring the relative differences described for the reason question of analyzability (cf. Section 5.5.3). The refinement of GM2.4.i to Modi2.4.1 can be realized analogously.
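The averaging behind Modi2.2.1 can be sketched in Python as follows. The sketch assumes that the decrease of a property is (copy − rule) / copy, i.e., the negated relative difference; the function name, property names, and scores are hypothetical. The optional weights illustrate the alternative definition mentioned above.

```python
def average_decrease(copy_scores, rule_scores, weights=None):
    """Average decrease over all quality properties except modifiability.

    copy_scores and rule_scores map a quality property to its questionnaire
    score for the Copy scenario and for the Rule scenario, respectively.
    Without weights, every property contributes with the same weight.
    """
    properties = [p for p in copy_scores if p != "modifiability"]
    if weights is None:
        weights = {p: 1.0 for p in properties}
    total_weight = sum(weights[p] for p in properties)
    weighted_sum = sum(
        weights[p] * (copy_scores[p] - rule_scores[p]) / copy_scores[p]
        for p in properties
    )
    return weighted_sum / total_weight

# Hypothetical questionnaire scores (not thesis data).
copy = {"modularity": 4.0, "analyzability": 4.0, "modifiability": 5.0}
rule = {"modularity": 3.0, "analyzability": 2.0, "modifiability": 1.0}

print(average_decrease(copy, rule))  # equal weights
print(average_decrease(copy, rule, {"modularity": 3.0, "analyzability": 1.0}))
```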

Table 5.13 shows the refined template for the reason question of modifiability. It includes the additional hypothesis ModiH2, which states that the implicit scheduling of QVT-R improves modifiability for the same reasons as for ModuH2. Note that if ModuH2 and ModiH1 are true, ModiH2 can be deduced. However, if ModuH2 and ModiH1 are incorrect, ModiH2 can still be correct.

| Type | ID | Description |
|---|---|---|
| Question | ModiQ2 | What are the reasons for differences in modifiability? |
| Metric | Modi2.2.1 | Average decrease in other quality properties when moving from Copy to RuleX |
| | Modi2.4.1 | Average decrease in other quality properties when moving from MOM Copy to MOM completion |
| Hypothesis | ModiH2 | QVT-R’s implicit scheduling improves modifiability. |

Table 5.13: Refined Reason Question Template for Modifiability

5.5.5 ConsQ1/ConsQ2: Consistency

This section refines the quality (ConsQ1) and reason (ConsQ2) questions for consistency (defined in Section 5.2).

ConsQ1. Kapova et al. [KGH12] identify seven⁸ metrics that correlate with consistency: the (1) lines of code, the number of (2) starts, (3) rules, and (4) top-level rules, as well as the average (5) number of domains, (6) fan-out, and (7) number of explicit internal scheduling calls:

Lines of code (Cons1.1.1) Lines of code of a scenario implementation. This GQM plan considers the “source lines of code”, i.e., the total lines of code minus the lines only containing comments and minus the lines which are completely blank.

Number of starts (Cons1.1.2) Number of start rules (as defined in Section 5.5.1). Equals “1” in Java when requiring that the transformation is started from a main method, always equals “1” in QVT-O as QVT-O transformation specifications always have exactly one entry operation, and equals in QVT-R the number of top-level rules which have no application conditions and are not invoked by other rules.

Number of rules (Cons1.1.3) Number of rules. In Java the number of methods, in QVT-O the number of operations (entry, mapping, helper, and constructor operations), and in QVT-R the number of relations.

⁸ Again, the “number of when- and where-predicates” metric by Kapova et al. [KGH12] is not considered.



Number of top-level rules (Cons1.1.4) Number of top-level rules (as defined in Section 5.5.1). Equals “1” in Java when requiring that the transformation is started from a main method, always equals “1” in QVT-O as QVT-O transformation specifications always have exactly one entry operation, and equals in QVT-R the number of top-level rules.

Number of domains (Cons1.1.5) See Modu1.1.1.

Fan-out (Cons1.1.6) See Modu1.1.2.

Number of explicit internal scheduling calls (Cons1.1.7) See Modu1.1.4.
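The “source lines of code” definition of Cons1.1.1 can be illustrated with a small Python sketch. It is a simplified counter, not the measurement of the “M2M Quality” tool: it assumes Java-style `//` and `/* ... */` comments and ignores comment markers inside string literals or after a trailing `//` comment.

```python
def source_lines_of_code(text):
    """Count lines that are neither blank nor comment-only (simplified)."""
    sloc = 0
    in_block = False  # True while inside an unclosed /* ... */ comment
    for raw in text.splitlines():
        line = raw.strip()
        if in_block:
            end = line.find("*/")
            if end == -1:
                continue  # the whole line belongs to the block comment
            in_block = False
            line = line[end + 2:].strip()
        # Drop block comments that start at the beginning of the line.
        while line.startswith("/*"):
            end = line.find("*/", 2)
            if end == -1:
                in_block = True
                line = ""
                break
            line = line[end + 2:].strip()
        if not line or line.startswith("//"):
            continue  # blank or comment-only line
        sloc += 1
        # A block comment opened after code continues on the next line.
        if line.rfind("/*") > line.rfind("*/"):
            in_block = True
    return sloc

sample = """\
// header comment
/* block
   comment */
public class T {

    int x = 1; // trailing comment, still a code line
    /* inline */ int y = 2;
}
"""
print(source_lines_of_code(sample))
```

Lines with code and a trailing comment count as code lines, which matches the definition “lines only containing comments” for the subtracted lines.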

Kapova et al. [KGH12] provide the respective rationale for each metric. Furthermore, they empirically show that these metrics correlate with consistency.

Moreover, the “questionnaire metric” (GM1.2.i) for consistency also needs to be refined. Therefore, the refinement includes the questions “How would you rate the consistency of the transformation?” and “To what degree is the transformation implemented in a uniform manner?”. The second question follows directly from the consistency definition in Section 5.2.

Table 5.14 gives the refined template for the quality question of consistency. It does not have an additional hypothesis. Of course, the general hypotheses associated with the quality question hold nonetheless since they are inherited.

| Type | ID | Description |
|---|---|---|
| Question | ConsQ1 | What is the consistency of the implementations? |
| Metric | Cons1.1.1 | − Lines of code [KGH12] |
| | Cons1.1.2 | − Number of starts [KGH12] |
| | Cons1.1.3 | − Number of rules [KGH12] |
| | Cons1.1.4 | − Number of top-level rules [KGH12] |
| | Cons1.1.5 | + Average number of domains [KGH12] |
| | Cons1.1.6 | + Average fan-out [KGH12] |
| | Cons1.1.7 | + Average number of explicit internal scheduling calls [KGH12] |
| | Cons1.2.1 | + Questionnaire: “How would you rate the consistency of the transformation?” |
| | Cons1.2.2 | + Questionnaire: “To what degree is the transformation implemented in a uniform manner?” |
| Hypothesis | - | (no additional hypothesis) |

Table 5.14: Refined Quality Question Template for Consistency

ConsQ2. The reason question refinement asks for the reasons of consistency differences. Therefore, Cons2.2.1 tries to find newly introduced inconsistencies when changes are applied to the Copy scenario (this is consistent with the consistency definition in Section 5.2). To measure this, a special task is added to the questionnaire. This task is the following: “Assume the implementation of B is created on the basis of the implementation of A. Count and name newly introduced inconsistencies (if any) in the implementation of B. Think of elements as variables, rules, etc.” where A is the implementation of Copy or GeneratedCopyRules, and B is the implementation of Rule1 to Rule12 or the MOM scenario implementation. The more inconsistencies are found, the more likely it is that the respective changes (identified by the scenario classification) cause inconsistencies.

Table 5.15 shows the refined template for the reason question of consistency. It includes the additional hypothesis ConsH2. ConsH2 states that changes regarding hierarchies and abstraction levels suffer more from inconsistencies than changes regarding 1:1 relations and structural changes. The same argumentation as for AnaH2.2 holds here and the hypothesis can be checked analogously.

| Type | ID | Description |
|---|---|---|
| Question | ConsQ2 | What are the reasons for differences in consistency? |
| Metric | Cons2.2.1 | Questionnaire: “Assume the implementation of RuleX is created on the basis of the implementation of Copy. Count and name newly introduced inconsistencies (if any) in the implementation of RuleX. Think of elements as variables, rules, etc.” |
| | Cons2.4.1 | Questionnaire: “Assume the implementation of MOM is created on the basis of the implementation of GeneratedCopyRules. Count and name newly introduced inconsistencies (if any) in the implementation of MOM. Think of elements as variables, rules, etc.” |
| Hypothesis | ConsH2 | Changes regarding “hierarchical” and “abstraction level” scenario criteria features suffer more from inconsistencies than changes regarding “1:1 relations” and “structure” scenario criteria features. |

Table 5.15: Refined Reason Question Template for Consistency

5.5.6 ApproQ1/ApproQ2: Appropriateness Recognizability

This section refines the quality (ApproQ1) and reason (ApproQ2) questions for appropriateness recognizability (defined in Section 3.6.4).

ApproQ1. Kapova et al. [KGH12] identify six metrics that correlate with appropriateness recognizability: the (1) lines of code, the number of (2) starts, (3) rules, and (4) top-level rules, as well as the average (5) size of domain patterns, and (6) number of explicit internal scheduling calls:

Lines of code (Appro1.1.1) See Cons1.1.1.

Number of starts (Appro1.1.2) See Cons1.1.2.

Number of rules (Appro1.1.3) See Cons1.1.3.

Number of top-level rules (Appro1.1.4) See Cons1.1.4.

Size of domain pattern (Appro1.1.5) The size of a domain pattern. Domain patterns are defined in Section 3.1.1. The size is measured in the number of expressions and subexpressions that are part of the pattern: in Java the number of all expressions within one method, in QVT-O the number of all expressions within one operation, and in QVT-R the number of all expressions within one domain.

Number of explicit internal scheduling calls (Appro1.1.6) See Modu1.1.4.

Again, Kapova et al. [KGH12] provide the rationale for each metric. Furthermore, they empirically show that these metrics correlate with appropriateness recognizability. Also note that the correlations of the referenced metrics can differ from the correlations used for appropriateness recognizability (shown in Table 5.16). These references just point to the description of the respective metric.

The refinement of the “questionnaire metric” (GM1.2.i) for appropriateness recognizability includes the questions “How would you rate the appropriateness recognizability (also known as understandability) of the transformation?” and “To what degree is it possible to recognize whether the transformation is appropriate for the transformation scenario?”. The first question additionally includes the alternative term “understandability” as described in Section 5.2 since it may be better known. The second question follows directly from the appropriateness recognizability definition in Section 3.6.4.

| Type | ID | Description |
|---|---|---|
| Question | ApproQ1 | What is the appropriateness recognizability of the implementations? |
| Metric | Appro1.1.1 | + Lines of code [KGH12] |
| | Appro1.1.2 | + Number of starts [KGH12] |
| | Appro1.1.3 | + Number of rules [KGH12] |
| | Appro1.1.4 | + Number of top-level rules [KGH12] |
| | Appro1.1.5 | − Average size of domain pattern [KGH12] |
| | Appro1.1.6 | + Average number of explicit internal scheduling calls [KGH12] |
| | Appro1.2.1 | + Questionnaire: “How would you rate the appropriateness recognizability (also known as understandability) of the transformation?” |
| | Appro1.2.2 | + Questionnaire: “To what degree is it possible to recognize whether the transformation is appropriate for the transformation scenario?” |

Table 5.16: Refined Quality Question Template for Approp. Recognizability

Table 5.16 shows the refined template for the quality question of appropriateness recognizability. The quality question does not have an additional hypothesis beyond the ones it inherits.

ApproQ2. The reason question refinement asks for the reasons of appropriateness recognizability differences. For this, the metric Appro2.2.1 refines the “changes” of GM2.2.i to “the number of additional/changed comment lines of code” as appropriateness recognizability particularly relates to the documentation. The refinement of GM2.4.i to Appro2.2.2 can be realized analogously. The idea is that copy rule sets are generic. Hence, only the purpose of the change to the copy rules needs to be understood in order to improve appropriateness recognizability. Additional or changed comments relate to the documentation and are one way to guide the reader through the changes.

In particular, the hypothesis ApproH2 states that the changes to generic rule sets are commented and that this improves appropriateness recognizability. For checking this hypothesis, (1) the values measured for Appro2.2.1 and Appro2.2.2 must be positive and (2) appropriateness recognizability needs to be higher for the changed scenario compared with the generic copy rule set. For the latter point, the questionnaire results of the “questionnaire metric” Appro1.2.i can be used for a direct comparison.
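As an illustration, the two conditions for ApproH2 could be checked as in the following Python sketch. It assumes that comment lines start with `//` or `--` and that a line-based diff from Python's `difflib` module is an acceptable approximation of “additional/changed comment lines”; the function names, the sample implementations, and the scores are hypothetical.

```python
import difflib

COMMENT_PREFIXES = ("//", "--")  # assumed single-line comment markers

def comment_lines(source):
    """Return the stripped comment-only lines of a source text."""
    return [ln.strip() for ln in source.splitlines()
            if ln.strip().startswith(COMMENT_PREFIXES)]

def added_or_changed_comment_lines(before, after):
    """Count comment lines of `after` that are new or changed w.r.t. `before`."""
    a, b = comment_lines(before), comment_lines(after)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return sum(j2 - j1
               for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
               if tag in ("insert", "replace"))

def appro_h2_supported(comment_delta, score_before, score_after):
    """Both conditions: comments were added and the rating improved."""
    return comment_delta > 0 and score_after > score_before

# Hypothetical implementations of Copy and Rule1 (not thesis artifacts).
copy_impl = "// copy every class\nrule copyClass\n"
rule_impl = ("// copy every class\nrule copyClass\n"
             "// Rule1: rename the attribute\nrule rename\n")

delta = added_or_changed_comment_lines(copy_impl, rule_impl)
print(delta, appro_h2_supported(delta, 3.0, 3.5))
```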

Table 5.17 summarizes the refinements of the reason question for appropriateness recognizability.

| Type | ID | Description |
|---|---|---|
| Question | ApproQ2 | What are the reasons for differences in appropriateness recognizability? |
| Metric | Appro2.2.1 | Number of additional/changed comment lines of code when moving from Copy to RuleX |
| | Appro2.2.2 | Number of additional/changed comment lines of code when moving from MOM Copy to MOM completion |
| Hypothesis | ApproH2 | Changes and additions to copy rules are documented and, thus, improve the appropriateness recognizability independent of the scenario. |

Table 5.17: Refined Reason Question Template for Approp. Recognizability

5.5.7 LearnQ1/LearnQ2: Learnability

This section refines the quality (LearnQ1) and reason (LearnQ2) questions for learnability (defined in Section 3.6.4). As Kapova et al. [KGH12] do not consider learnability, the GQM plan derives appropriate metrics from the work of Grossman et al. [GFA09]. They provide a survey of several metrics targeting learnability for software in general. Therefore, this thesis refines a subset of these metrics such that they can be applied to M2M transformations as well.

Furthermore, the applied metrics are particularly related to the specifications of different M2M languages. For Java, the GQM plan sticks to “The Java Language Specification” by Gosling et al. [GJSB05] and, for QVT-O as well as for QVT-R, to the QVT specification by the OMG [Obj11a].

LearnQ1. For answering the quality question with respect to learnability, this thesis identifies three metrics: the (1) number of possible language constructs, (2) number of applied language constructs over all scenarios per language, and (3) time until a scenario was implemented successfully:



Number of possible language constructs (Learn1.1.1) Measures the number of language constructs as provided by M2M language specifications. The metric relates to the documentation metric “time taken to review documentation until starting a task” of Grossman et al. [GFA09] and serves as a basis for Learn1.1.2.

One possibility to measure the constructs is to count the number of nonterminals and terminals of the respective M2M language grammar (as given in the language's specification). A problem of this approach is that the specification may not provide the whole grammar or may spread it over the whole document. For instance, the grammar of the Java specification [GJSB05][pp. 585-596] contains the nonterminal “NullLiteral” but does not list the corresponding production rule on these pages. This production rule has to be looked up at [GJSB05][p. 30].

Another possibility is to count the number of keywords as given by the language specification. The advantage of this is that a specification lists these keywords clearly arranged (e.g., at [GJSB05][p. 21] for Java). Its disadvantage is that it does not give any clues about how these keywords can be combined and, thus, a part of a language's complexity is obscured.

A third possibility is to count the number of metamodel classes as provided by a language's EMF metamodel (which indirectly relates to the respective language specification). EMF metamodels are available for all considered M2M languages (these are used for an AST creation by the “M2M Quality” tool implemented for this thesis; cf. Section 6.2.2). The advantage of this approach is that this number can easily be determined and also be automated by a tool. The disadvantage is that the respective metamodel can contain elements not needed for programming within the language. For instance, the QVT-O metamodel coming with the QVT Operational engine contains the metaclass “VisitableASTNode” only needed for parsing QVT-O files.

This thesis sticks to the second possibility (i.e., to count the keywords) since it is the simplest approach. In the long run, tool support for the third possibility in particular should be added, as it also enables an automated measurement of Learn1.1.2.

Number of applied language constructs (Learn1.1.2) Measures the number of language constructs as applied by the transformation engineer over all scenarios regarding one M2M language/engine combination. For its calculation, applied constructs are measured and related to the Learn1.1.1 metric. Moreover, Learn1.1.2 relates to the command metric “percent of commands used by user” of Grossman et al. [GFA09].

Time for successful implementation (Learn1.1.3) Measures the time for a successful scenario implementation. As I am the only one implementing scenarios, this metric is not representative. Nonetheless, it can give a trend and a weak verification for the stated hypothesis. Furthermore, future work can measure this metric in a controlled environment and with more participants. The metric relates to the task metric “time until user completes a certain task successfully” of Grossman et al. [GFA09].
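The keyword-based variant of Learn1.1.1 and Learn1.1.2 can be sketched in Python as follows. The keyword set is only an illustrative subset of the Java keywords, not the full list of the language specification, and the tokenizer is a simplification that does not skip comments or string literals. Relating the applied constructs to Learn1.1.1 as a ratio is an assumption of this sketch.

```python
import re

# Illustrative subset of Java keywords; the complete list has to be taken
# from the language specification.
KEYWORDS = frozenset({
    "class", "public", "private", "static", "void",
    "if", "else", "for", "while", "return", "new",
})

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def applied_keywords(sources, keywords=KEYWORDS):
    """Keywords used at least once over all scenario implementations."""
    used = set()
    for text in sources:
        used.update(tok for tok in IDENTIFIER.findall(text) if tok in keywords)
    return used

def learn_1_1_1(keywords=KEYWORDS):
    """Number of possible language constructs (here: keywords)."""
    return len(keywords)

def learn_1_1_2(sources, keywords=KEYWORDS):
    """Applied constructs as a count and as a share of the possible ones."""
    count = len(applied_keywords(sources, keywords))
    return count, count / learn_1_1_1(keywords)

# Hypothetical scenario implementations (not thesis artifacts).
scenarios = [
    "public class A { void f() { if (x) return; } }",
    "class B { static int g() { return new B().h(); } }",
]
print(learn_1_1_2(scenarios))
```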

As no related work empirically proves that these metrics relate to the learnability quality property for M2M transformations, the subjective metric GM1.2.i regarding learnability is especially important to provide first evidence for any correlations. Grossman et al. [GFA09] describe this kind of metrics as “subjective metrics” and also consider the application of questionnaires. The GQM plan includes the questions “How would you rate the learnability of the transformation language?” and “To what degree can the transformation language be taught to use it with effectiveness, efficiency, freedom from risk, and satisfaction?”. The second question follows directly from the learnability definition in Section 3.6.4.

Table 5.18 gives the refined template for the quality question of learnability. It does not have an additional hypothesis; only the general hypotheses associated with the quality question are inherited.

| Type | ID | Description |
|---|---|---|
| Question | LearnQ1 | What is the learnability of the language/engine combinations? |
| Metric | Learn1.1.1 | − Number of possible language constructs [GFA09] |
| | Learn1.1.2 | + Number of applied language constructs over all scenarios per language [GFA09] |
| | Learn1.1.3 | − Time until a scenario was implemented successfully [GFA09] |
| | Learn1.2.1 | + Questionnaire: “How would you rate the learnability of the transformation language?” |
| | Learn1.2.2 | + Questionnaire: “To what degree can the transformation language be taught to use it with effectiveness, efficiency, freedom from risk, and satisfaction?” |


Table 5.18: Refined Quality Question Template for Learnability

LearnQ2. The reason question refinement asks for the reasons of learnability differences. For this, the metric Learn2.2.1 refines the “changes” of GM2.2.i to “the number of newly introduced language constructs”. This metric relates to the command metric “increase in commands used over certain time interval” of Grossman et al. [GFA09]. To measure this metric, the relative difference of applied language constructs of the first and second scenario is calculated. The refinement of GM2.4.i to Learn2.4.1 can be implemented analogously.

Hypothesis LearnH2.1 targets the latter two metrics. It states that changes not related to 1:1 relations especially foster learnability. The rationale is that the considered changes start with copy rule sets which contain several 1:1 relations. Therefore, it can be assumed that 1:1 relations are already well understood. Adding another 1:1 relation therefore has only a small learning effect. On the other hand, if the changes do not relate to 1:1 relations, new language constructs may have to be found, which comes with a higher learning effect. When considering the changes between the Copy scenario and the Rule1 to Rule12 scenarios, only Rule1 and Rule7 have changes regarding 1:1 relations (cf. Table 4.2). The hypothesis can then be checked by comparing measurements of Learn2.2.1 for Rule1 and Rule7 with the measurements for the other Rule scenarios.
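A minimal Python sketch of Learn2.2.1 and of the check for LearnH2.1 is given below. It assumes that the relative difference is computed on the number of distinct applied constructs and that the two groups of scenarios are compared by their mean values; both choices as well as all construct sets are hypothetical.

```python
from statistics import mean

def newly_introduced(first, second):
    """Constructs applied in the second scenario but not in the first."""
    return set(second) - set(first)

def relative_difference(first, second):
    """Relative change in the number of distinct applied constructs."""
    return (len(set(second)) - len(set(first))) / len(set(first))

def learn_h2_1_supported(measurements, one_to_one):
    """Compare the mean of the 1:1 scenarios with the mean of all others."""
    base = mean(v for k, v in measurements.items() if k in one_to_one)
    other = mean(v for k, v in measurements.items() if k not in one_to_one)
    return other > base

# Hypothetical sets of applied constructs (not thesis data).
copy = {"class", "for", "return", "new"}
rules = {
    "Rule1": copy | {"if"},
    "Rule7": copy,
    "Rule2": copy | {"if", "while"},
    "Rule4": copy | {"if", "while", "else"},
}
measurements = {name: relative_difference(copy, used)
                for name, used in rules.items()}
print(measurements)
print(learn_h2_1_supported(measurements, {"Rule1", "Rule7"}))
```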

Learnability is often related to the size of the documentation (e.g., the documentation metric “time taken to review documentation until starting a task” described by Grossman et al. [GFA09]). However, the hypothesis LearnH2.2 states that the size of the language documentation does not influence learnability significantly. The reason for this hypothesis is that relational languages like QVT-R can typically be described briefly and precisely. However, related work often states that QVT-R “is not easy to learn” [GPT09]. To verify this hypothesis, the metrics Learn2.12 to Learn2.16⁹ try to measure the size of the considered M2M language documentations. They measure the number of pages, lines, words, characters, and figures of a documentation, respectively. This thesis uses the following documents for representing the “documentation” of the respective language: [GJSB05] for Java, [Obj11a][Chapter 7] for QVT-R, and [Obj11a][Chapter 8] for QVT-O. The hypothesis can then be validated by checking whether these metrics negatively correlate with the “questionnaire metrics” Learn1.2.i.
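For the text-based size metrics, a minimal Python sketch could look as follows. It assumes that the documentation is available as plain text; the numbers of pages and figures cannot be derived this way and have to be determined from the original document. The function name and the sample text are hypothetical.

```python
def documentation_size(text):
    """Size of a plain-text documentation in lines, words, and characters."""
    return {
        "lines": len(text.splitlines()),
        "words": len(text.split()),
        "characters": len(text),
    }

# Hypothetical documentation excerpt (not taken from a specification).
sample = "A relation has domains.\nEach domain has a pattern.\n"
print(documentation_size(sample))
```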

Table 5.19 summarizes the refinements of the reason question for learnability.

| Type | ID | Description |
|---|---|---|
| Question | LearnQ2 | What are the reasons for differences in learnability? |
| Metric | Learn2.2.1 | Number of newly introduced language constructs when moving from Copy to RuleX |
| | Learn2.4.1 | Number of newly introduced language constructs when moving from MOM Copy to MOM completion |
| | Learn2.12 | Size of language documentation (number of pages) |
| | Learn2.13 | Size of language documentation (number of lines) |
| | Learn2.14 | Size of language documentation (number of words) |
| | Learn2.15 | Size of language documentation (number of characters) |
| | Learn2.16 | Size of language documentation (number of figures) |
| Hypothesis | LearnH2.1 | Moving from a copy rule set to a scenario containing fewer 1:1 relations fosters learnability as other scenario criteria require other language constructs which have to be understood. |
| | LearnH2.2 | The size of the language documentation does not significantly influence learnability. |

Table 5.19: Refined Reason Question Template for Learnability

⁹ These are additional metrics to the eleven inherited ones. Therefore, the metric number starts from “12”.



5.6 Discussion of the GQM Plan

This chapter compiles a GQM plan for assessing the quality of M2M transformations based on scenarios. As the scope of this Master’s thesis is limited, it targets only quality properties related to maintainability. However, the general template of Section 5.4 can also be applied to other quality properties. Section 5.5 illustrates the applicability of the template for maintainability in particular.

The derived metrics for the respective question are mainly based on related work. Related work that targets the quality of M2M transformations, e.g., the work of Kapova et al. [KGH12], is especially useful as it already provides first evidence for the feasibility of metrics regarding certain quality properties. In these cases, the GQM plan could adopt the metrics without an exhaustive justification of their selection. This was important as a full inspection of every metric is out of the thesis’ scope.

However, the metrics for learnability derived in Section 5.5.7 were not applied in the context of M2M transformations before. Therefore, these metrics suffer from the fact that no empirical evidence exists indicating that they really reflect learnability in the context of M2M transformations. Here, it is especially important to measure subjective metrics as specified by the “questionnaire metric” GM1.2.i and to provide the rationales behind each hypothesis and metric.

The compiled GQM plan is a first step towards a systematic process to quantitatively compare quality properties of M2M transformations. However, it does not claim to be complete or exhaustive. Future work needs to identify problems inherent to the GQM plan, improve it, and provide external validations. A particular advantage of this GQM plan is that it can be compared with measurement plans and measurements derived in future work. This way, the quality of the M2M transformation quality comparison itself can be incrementally improved.


6 Data Collection

This chapter describes how this thesis carries out the measurements specified by the GQM plan of Chapter 5 and presents the collected and stored data. Hence, it describes the third phase (data collection) of the GQM method (cf. Section 2.2.1).

In addition to the data collection, this chapter describes a refined scope of the measurements. That is, the data collection does not follow the complete GQM plan and leaves some issues for future work. The main reasons for this are problems during the implementation of the scenarios and the need to stay within the scope of this thesis.

Therefore, Section 6.1 discusses these issues in detail and derives the consequences and modifications for the GQM plan. Section 6.2 continues with explaining the implementation of the artifacts (tools, questionnaires, etc.) involved in the measurement. Thereafter, Section 6.3 describes the applied process for measurement execution. Section 6.4 concludes this chapter by presenting the measurement results.

6.1 Refined Measurement Scope

This section describes a refined scope of the measurements planned by the GQM plan. One reason for this refinement is that there were several problems during the implementation of the case studies. These problems concern the “Ecore to Copy HOT” and the “Message-oriented-Middleware (MOM) Completion” scenarios: the metamodels and transformations provided by Kapova et al. [KGH12] are inconsistent, i.e., metamodel classes and attributes required by the transformations are not covered by the provided metamodels¹. Furthermore, the implementations and measurements of the other fourteen scenarios are already sufficient for the scope of this thesis. The remaining two scenarios mentioned above are, therefore, left for future work. For this, the successfully implemented and measured scenarios can serve as an example and reference of how the unimplemented scenarios can be implemented and measured. Further, note that the GQM plan is complete regarding these remaining scenarios and can directly be used.

¹ Metamodels and transformations can be downloaded at http://www.furcas.org/survey/qvtr-questionnaire.html (last retrieved 2012-10-17). In private communication with Kapova et al., they also provided additional metamodels and a documentation. However, even these additional artifacts did not solve all inconsistencies. The deliverables coming with this thesis contain all files gathered for the respective case study and can be used in future work.





As a consequence of this refined scope, several planned measurements cannot be executed in the scope of this thesis and, thus, need to be modified according to the new scope. The modifications concern (1) the general template (Section 5.4) and (2) the refined templates (Section 5.5).

The general template consists of a template for the quality question and a template for the reason question. The general quality question (Section 5.4.1) is not affected by the refined scope. In contrast, the general reason question (Section 5.4.2) is affected in two respects. Firstly, the metrics targeting the MOM completion scenario (GM2.4.i and GM2.5) cannot be measured anymore. Since GM2.4.i is a metric which needs to be refined, every refinement related to this metric is also removed from the scope of the measurements. However, as GM2.4.i and GM2.5 have equivalent metrics that target other scenarios (the scenarios of Medini QVT’s Shapes-Tutorial), the idea of the metrics is still applied even though it is not validated on another scenario. Secondly, the hypothesis GH2.8, which targets the HOT feature, is not operational anymore since no HOT scenario is present in the refined scope.

The only modification needed for the refined templates is to remove the refined metrics for GM2.4.i as described above. For instance, the two metrics Modu2.4.1 and Modu2.4.2 of modularity (Section 5.5.1) are removed from the scope of the measurements.

In summary, two types of metrics and hypotheses are removed: metrics directly related to changes of the removed scenarios and hypotheses related to HOTs. The first modification does not remove a general idea of this thesis as similar metrics for other scenarios are still applied. However, by removing these metrics, an empirical evaluation of whether the metrics can be generalized is missing and left for future work. The second modification removes the measurements regarding HOTs. Therefore, hypotheses related to HOTs are not empirically validated within this thesis and are also left for future work.

6.2 Implementation

This section describes the implementation of the artifacts created and used for the measurements specified by the GQM plan. Section 6.2.1 describes the issues and process applied for the implementation of the different case studies in Java, QVT-O, and QVT-R. Several measurements applied to these implementations can be automated. Therefore, Section 6.2.2 gives an overview of the “M2M Quality” tool which incorporates these automated measurements. It was implemented in the context of this thesis and provides a framework for M2M quality measurement. Section 6.2.3 finally discusses the design of the questionnaire derived from the GQM plan. It enables the collection of empirical evidence for the validity of the applied metrics. Concrete artifacts and their locations can be found in Appendix B.

6.2.1 Case Studies

The case studies as described in Section 3.5.3 were implemented in QVT-R using Medini QVT, in QVT-O using the QVT Operational engine, and in Java with EMF using the JVM. This section describes the applied process and issues for the implementation for each of these language/engine combinations.

QVT-R. The QVT-R implementations considered within this thesis are nearly a one-to-one copy of the implementations included in the plug-ins of the Medini QVT engine (cf. Section 3.4.1). The engine includes the plug-ins “UML2RDBMS” and “Shapes-Tutorial” which provide QVT-R implementations for the “SimpleUML to SimpleRDBMS” case study as well as for all scenarios of “Medini QVT’s Shapes-Tutorial”, respectively. Therefore, this covers all considered scenarios. The only difference within the copy for this thesis is a changed namespace in the metamodels applied.

There are some consequences induced by this process. First of all, the implementation style is different in QVT-R when compared to QVT-O and Java. This is most explicit when inspecting the QVT-R implementation regarding comments as these implementations contain only few comments. In contrast, the implementations of the other languages provide detailed comments. Therefore, comparing the “amount of documentation” between different languages is biased.

Another issue is that it was not possible to measure the time until a scenario was implemented successfully (metric Learn1.1.3 of learnability; cf. Section 5.5.7) in QVT-R. Therefore, this thesis does not consider this metric for QVT-R. Controlled experiments should target measuring this time in future work to get representative results.

QVT-O. I implemented every considered scenario in QVT-O. I started with the “SimpleUML to SimpleRDBMS” case study. For this, I used the implementation provided by the QVT specification [Obj11a] while using the “SimpleUML” and “SimpleRDBMS” metamodels provided by the Medini QVT engine (cf. Section 3.4.1). As the transformation was neither complete nor free of errors when directly using the QVT specification’s implementation, I modified the transformation implementation such that it worked correctly. For this, I tried to modify as little as possible regarding the original transformation implementation. As several code fragments were copied for this implementation, no implementation time was recorded (hence, metric Learn1.1.3 of learnability is not considered for this scenario).

For the implementations of the “Medini QVT’s Shapes-Tutorial” scenarios, I recorded the time for each implementation. I started with the “Copy” scenario and continued with “Rule1” to “Rule12” one after the other until every scenario produced the same output models as the corresponding QVT-R transformations (based on the example models provided in the deliverables). One important design decision for the QVT-O implementations was to use the “deepclone” operation for copying a source to a target model instead of creating a whole transformation rule set for every element to be copied. This affects, for instance, the Copy scenario and causes fundamental differences to the approach applied in QVT-R (which follows the latter approach). The reason for this is that creating a generic rule set for copy operations is too artificial in QVT-O as directly using the “deepclone” operation is the straightforward way for realizing a scenario that needs to copy most elements.

Every scenario was implemented completely successfully except for the “Rule9” scenario. Rule9 describes a scenario which (1) can be transformed forwards as well as backwards and (2) is capable of retaining user modifications made in the target model (cf. Section 3.5.3). The first issue could also be realized in QVT-O by introducing a “configuration property” supported by the QVT Operational engine. It allows setting up a boolean configuration variable within the QVT-O run configuration. With this, it was possible to configure whether the transformation should be executed backwards or forwards. As the scenario only creates one-to-one copies of all model elements in an endogenous transformation scenario, the implemented rules for transforming these elements could be reused for both directions.
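The direction switch can be illustrated by a minimal Java sketch. It is not the QVT-O implementation of this thesis and does not use the API of the QVT Operational engine; the class `Shape`, the method `copyRule`, and the flag `backwards` are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model element; not taken from the metamodels used in the thesis.
final class Shape {
    final String name;

    Shape(String name) {
        this.name = name;
    }
}

final class DirectionSketch {
    // A single rule that creates a one-to-one copy. Because source and target
    // share one metamodel (endogenous scenario), it serves both directions.
    static Shape copyRule(Shape in) {
        return new Shape(in.name);
    }

    // 'backwards' plays the role of the boolean configuration property
    // (assumption: the property only selects which side is read as the source).
    static List<Shape> transform(List<Shape> left, List<Shape> right, boolean backwards) {
        List<Shape> source = backwards ? right : left;
        List<Shape> result = new ArrayList<>();
        for (Shape s : source) {
            result.add(copyRule(s));
        }
        return result;
    }
}
```

The sketch only shows why one rule set suffices here: the flag changes the model that is read, not the rules that are applied.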

The second issue could not be realized. For retaining user modifications it is necessary to include tracing information into the transformation implementation which identifies the elements already mapped to target elements. With this information it is possible to identify modifications by users and to leave these unaffected by another transformation execution. However, as neither the QVT Operational engine nor the QVT-O standard itself has dedicated support for this kind of tracing, I did not include the support for the retainment of user modifications. Furthermore, using the (simple) trace model created by QVT-O as an input and handling the identification of user modifications was out of the thesis’ scope.
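To make the idea of such tracing information more concrete, the following Java sketch shows one conceivable (and deliberately simplified) realization. It was not implemented for this thesis. All names are hypothetical, models are reduced to maps from element identifiers to values, and the rule "a target value that differs from the value last written counts as a user modification" is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class TraceSketch {
    // Trace: source element id -> value that the transformation last wrote
    // to the corresponding target element.
    private final Map<String, String> trace = new HashMap<>();

    // Re-executes a one-to-one copy and returns the updated target model.
    Map<String, String> execute(Map<String, String> source, Map<String, String> target) {
        Map<String, String> result = new HashMap<>(target);
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String id = entry.getKey();
            String lastWritten = trace.get(id);
            String current = target.get(id);
            // Assumption: the element was mapped before and its target value
            // no longer equals what the transformation wrote.
            boolean userModified = lastWritten != null
                    && current != null
                    && !lastWritten.equals(current);
            if (userModified) {
                continue; // leave the user's modification unaffected
            }
            result.put(id, entry.getValue());
            trace.put(id, entry.getValue());
        }
        return result;
    }
}
```

The sketch ignores deleted or newly added target elements; handling these cases is part of what makes the retainment of user modifications non-trivial.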

Java. I implemented every considered scenario in Java. I started with the “SimpleUML to SimpleRDBMS” case study and continued with the scenarios of “Medini QVT’s Shapes-Tutorial” (in the same order as for QVT-O). I implemented every transformation in Java after finishing with all QVT-O implementations. Furthermore, I recorded the implementation time for every implementation.

The general process applied was to open the corresponding QVT-O implementation and (manually) transform it into an equivalent Java transformation. Furthermore, the source code for (1) starting the transformation with a static main method, (2) loading source models, (3) initializing target models, and (4) serializing the target model to XMI was copied into each scenario separately. The reason for this is that the scenarios are seen as independent of each other and, hence, no common superclass was created. However, future work should target the investigation of other possibilities to use Java with EMF for M2M transformations as there are several design possibilities. This includes, for instance, the usage of libraries and frameworks specialized or useful for transformations such as (1) the usage of superclasses including generic transformation methods as described above, (2) Xtend², or (3) Guava³. The provided Java implementations should be seen as a starting point for improved Java implementations and as a first reference implementation.
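The superclass alternative mentioned as design possibility (1) could look roughly like the following sketch. It is not part of the provided Java implementations. To stay self-contained, it replaces EMF models and XMI serialization by lists of strings and a comma-separated text; all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical common superclass: the steps that were copied into each
// scenario are written once here.
abstract class TransformationSketch {
    // Stand-in for loading a source model (a real version would load XMI).
    protected List<String> loadSource(String serialized) {
        return new ArrayList<>(Arrays.asList(serialized.split(",")));
    }

    // Stand-in for initializing an empty target model.
    protected List<String> initTarget() {
        return new ArrayList<>();
    }

    // Stand-in for serializing the target model.
    protected String serialize(List<String> target) {
        return String.join(",", target);
    }

    // The only scenario-specific part.
    protected abstract void transform(List<String> source, List<String> target);

    // Template method that fixes the order of the shared steps.
    public final String run(String serializedSource) {
        List<String> source = loadSource(serializedSource);
        List<String> target = initTarget();
        transform(source, target);
        return serialize(target);
    }
}

// Toy scenario that copies every element.
final class CopySketch extends TransformationSketch {
    @Override
    protected void transform(List<String> source, List<String> target) {
        target.addAll(source);
    }
}
```

With such a superclass, a scenario shrinks to its `transform` method. Whether this improves the measured quality properties is exactly the kind of question left to future work.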

As a consequence of the strong dependence on QVT-O, the Java implementations have several things in common with the QVT-O implementations. For instance, the comments for describing the transformations or transformation rules (QVT-O operations and Java methods) could be reused to a great extent. Variable and transformation rule names are also often equivalent.

Moreover, in Java it was not possible to implement the Rule9 scenario due to Java’s lack of a tracing facility for model transformations (similarly to QVT-O). Also for the case of Java, an implementation of such a facility is out of this thesis’ scope.

In general, there were no problems in transforming the QVT-O implementations to Java manually. It is likely that the reason for this is the common, imperative language paradigm where QVT-O can be seen as a language more specialized for the application for M2M transformations. This fosters the hypothesis that other Java design alternatives should be investigated (as described above), too.

6.2.2 Measurement Tool “M2M Quality”

This section presents the “M2M Quality” tool which measures several metrics of the GQM plan automatically. It is based on the measurement tool provided by Kapova et al. [KGBH10] which is specific for QVT-R. In the context of this thesis, the components of the latter tool were abstracted away from QVT-R to cover M2M languages in general. The issues specific for QVT-R were extracted into a separate “QVT-R component”. Similar components were created for QVT-O and Java as well.

Figure 6.1 illustrates the general process of M2M Quality via a flow chart. The inputs to M2M Quality are the transformation specification implemented within the different M2M languages. M2M Quality parses these specifications into language-specific AST representations (process (1)). As each AST refers to a language-specific metamodel, process (1) is effectively a text-to-model transformation (T2M). Next, process (2) transforms each AST into language-specific metric measurements via M2M transformations. As these transformations are language-specific, they need to be included into the respective language-specific component. Furthermore, the metric measurements are manifested in model instances of a quality metamodel. The quality metamodel is a part of M2M Quality and specifies how metric measurements are stored (e.g., each measurement has a name and one or many values). Finally, process (3) is an M2T transformation which takes the three measurement models as an input and creates a measurement table as an output. This table includes all measurements; it consists of one column per language and one line per metric.

² See http://www.eclipse.org/xtend/ (last retrieved 2012-10-17).
³ See https://code.google.com/p/guava-libraries/ (last retrieved 2012-10-17).

Note that the process shown in Figure 6.1 is simplified. For instance, the measurement of lines of code (which is also supported by the tool) cannot be realized via an AST representation. Instead, this “special” metric is directly measured on the respective transformation specification file. The appendix in Section B.2 gives more details on the implementation of M2M Quality (e.g., on its architecture).
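The following Java sketch illustrates two parts of this process under simplifying assumptions: the direct measurement of lines of code on the specification text and process (3), i.e., rendering a measurement table with one column per language and one line per metric. It is not the implementation of M2M Quality. The class and method names are hypothetical, each measurement is reduced to a single value (the quality metamodel allows several), and counting only non-blank lines is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MeasurementTableSketch {
    // Measured directly on the specification text, without an AST.
    // Assumption: blank lines are not counted.
    static int linesOfCode(String specification) {
        int count = 0;
        for (String line : specification.split("\n")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Process (3): 'models' maps a language to its measurements
    // (metric name -> value). The result is a tab-separated table.
    static String toTable(Map<String, Map<String, Double>> models) {
        List<String> languages = new ArrayList<>(models.keySet());
        List<String> metrics = new ArrayList<>();
        for (Map<String, Double> model : models.values()) {
            for (String metric : model.keySet()) {
                if (!metrics.contains(metric)) {
                    metrics.add(metric);
                }
            }
        }
        StringBuilder table = new StringBuilder("metric");
        for (String language : languages) {
            table.append('\t').append(language);
        }
        table.append('\n');
        for (String metric : metrics) {
            table.append(metric);
            for (String language : languages) {
                Double value = models.get(language).get(metric);
                // "-" marks a metric that was not measured for a language.
                table.append('\t').append(value == null ? "-" : value.toString());
            }
            table.append('\n');
        }
        return table.toString();
    }
}
```

Passing a map with a stable iteration order (e.g., a `LinkedHashMap`) keeps the column order predictable.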

[Figure 6.1: flow chart. The Java, QVT-O, and QVT-R transformation specifications are (1) parsed into language-specific ASTs (T2M), (2) transformed from these ASTs into language-specific metric measurements (M2M), and (3) transformed from the metric measurements into one measurement table (M2T). Legend: document, process, data flow.]

Figure 6.1: General Process of “M2M Quality” (simplified)

M2M Quality measures metrics of two types: (1) some of the general metrics of the general reason question (cf. Section 5.4.2) as well as (2) most refinements of the quality question’s first metric (GM1.1.i; cf. Section 5.5). The supported metrics of the first type are GM2.6 to GM2.10. The supported metrics of the second type are all GM1.1.i refinements except for the one of learnability. Therefore, the metrics that have to be manually measured include all metrics related to (1) questionnaires (i.e., GM1.2.i refinements and GM2.11; cf. Section 5.4.1 and Section 5.4.2, respectively), (2) changes between two scenarios (i.e., GM2.2.i refinements and GM2.5; cf. Section 5.4.2), and (3) learnability (cf. Section 5.5.7).

6.2.3 Questionnaire

In the context of this thesis, I handed out a questionnaire to ten MDSD experts. The questionnaire results (presented in Section C.1 of the appendix) provide a first empirical evidence whether the GQM plan metrics assigned to certain quality properties really relate to these properties. This section discusses how the questionnaire was compiled. The questionnaire itself is given in Section B.3 of the appendix.

The questionnaire is based on the QVT-R questionnaire⁴ provided by Kapova et al. [KGH12] as well as the ATL questionnaire provided by van Amstel [vA11, pp. 94-103]. Analogously to these questionnaires, the questionnaire of this thesis is divided into four parts: (1) an introduction to the questionnaire, (2) background questions, (3) evaluation questions, and (4) open questions. In the following, the parts of this thesis’ questionnaire are discussed.

The introduction to the questionnaire consists of a description and instructions. The description explains the questionnaire’s goal and the role of the participant. The instructions give the concrete tasks of the participant step-by-step. The introduction is mainly adopted from the introduction by van Amstel [vA11, pp. 94-103]. Only the instructions and the references regarding ATL are altered such that the introduction complies with a comparison of Java, QVT-O, and QVT-R.

The background questions include formality questions (e.g., asking for name and e-mail of the participants) as well as questions asking for the estimated prior knowledge about a concrete M2M language. This thesis directly adopts the background questions by van Amstel [vA11, pp. 94-103] for its questionnaire. Therefore, the questionnaire does not include a question asking for prior knowledge regarding quality properties. The interpretation of the results in Chapter 7, however, shows that there can be large differences regarding this knowledge. Consequently, future work should consider this aspect.

Evaluation questions are the refined questions for metric GM1.2.i derived within the GQM plan (cf. Section 5.4.1). For instance, modularity adds the questions “How would you rate the modularity of the transformation?” and “To what degree is the transformation composed of distinct components such that a change to one component has minimal impact on the other components?” to this category (cf. Section 5.5.1). The order of the questions is distributed randomly over the set of considered questions.

Open questions are the questions specified via GM2.11 (cf. Section 5.4.2). There is one of those questions per quality property. For instance, the questionnaire adds the question “What are the reasons for differences in modularity?” for modularity.

As a pretest, I handed out the questionnaire to one volunteer working within an MDSD context. The volunteer checked the questionnaire for its understandability but did not assess the implemented scenarios. An extensive pretest is out of the thesis’ scope but suggested for future work to increase the validity of collected results (cf. Section 7.4).

Furthermore, the participants had to indicate their evaluation of the questions via (1) a five- or seven-point Likert scale [Lik32] or (2) a text field where they could answer freely. The background questions contain all of these types. The evaluation questions contain only seven-point Likert scales and the open questions only text fields.

Using a Likert scale corresponds to the so-called “direct-rating method” as described by Eisenführ and Weber [EW03][pp. 105-107]. Its advantage is that it is the simplest method for determining a value function (e.g., for modularity). Its disadvantage is that it is prone to errors as questionnaire participants often give bad weights when asked directly; participants perform better when asked for a pair-wise comparison [EW03][pp. 105-107]. Nonetheless, I decided to use the direct-rating method as I could reuse the question design of the related work ([vA11, pp. 94-103] and [KGH12]). Thus, I was able to stay within the scope of this thesis as well as to provide an external validation of the questionnaires applied by the related work. Future work should, however, consider a redesign of the questionnaire which focuses more on pair-wise comparisons.

⁴ http://www.furcas.org/survey/qvtr-questionnaire.html (last retrieved 2012-10-17).

6.3 Measurements

This section describes how the measurements planned by the GQM plan were carried out. Section 6.3.1 describes the execution of the automated measurements by the M2M Quality tool (cf. Section 6.2.2). Thereafter, Section 6.3.2 explains the process applied for executing the questionnaire with its participants (cf. Section 6.2.3). Finally, Section 6.3.3 describes the measurement of the remaining metrics that had to be measured manually.

6.3.1 Automated

For the automated measurements, a Java, QVT-O, and QVT-R file for each scenario implementation was passed to the M2M Quality tool. The tool directly outputs the supported metric measurements (Section 6.2.2 lists supported metrics).

6.3.2 Questionnaire

I handed out the questionnaire (cf. Section 6.2.3) to ten MDSD experts. For this, I used “Google Forms”⁵. Compared to a “paper and pencil” solution, this had the advantage that I could directly access the results within a downloadable spreadsheet. The questionnaire is available at https://docs.google.com/spreadsheet/viewform?formkey=dDF3X0phQk9BWjVOenZuWlhYb1k0QlE6MQ (last retrieved 2012-10-17). Additionally, the appendix in Section B.3 provides a reproduction of the online version.

At the time the participants evaluated the questionnaire, they worked in the Software Engineering Group of the Heinz Nixdorf Institute and Department of Computer Science at the University of Paderborn either as student workers, PhD candidates, or as an assistant professor. They were selected for participation as they were familiar with M2M transformations. To measure their concrete skills regarding the considered M2M languages, the questionnaire asks the participants for their knowledge about these.

⁵ http://www.google.com/google-d-s/forms/ (last retrieved 2012-09-02).

Each participant was assigned randomly to one of three sets of scenarios: (1) UML2RDBMS, Copy, Rule1, Rule2, and Rule3, (2) Copy, Rule4, Rule5, Rule6, Rule7, and Rule8, as well as (3) Copy, Rule9, Rule10, Rule11, and Rule12. The reason for this is that the respective participants could not be allocated for a complete evaluation of all implemented scenarios. A controlled experiment with more participants should be executed as future work to increase the representativeness of the questionnaire. This is out of the thesis’ scope. The goal of this thesis is only to gather first empirical data and to show the basic principles that have to be considered for an empirical study with more participants.
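For a follow-up study, such a random assignment could be made reproducible along the lines of the following Java sketch. It does not document the procedure used for this thesis; the balanced round-robin distribution after shuffling, the seed parameter, and all names are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class AssignmentSketch {
    // Shuffles the participants and deals them to the scenario sets in turn,
    // so that set sizes differ by at most one (assumption: balance is wanted).
    static Map<String, Integer> assign(List<String> participants, int sets, long seed) {
        List<String> shuffled = new ArrayList<>(participants);
        Collections.shuffle(shuffled, new Random(seed));
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (int i = 0; i < shuffled.size(); i++) {
            assignment.put(shuffled.get(i), (i % sets) + 1); // set numbers 1..sets
        }
        return assignment;
    }
}
```

Calling `assign` with ten participant identifiers and three sets yields sets of four, three, and three participants; the fixed seed makes the assignment repeatable.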

6.3.3 Manual

I manually measured the metrics which are covered neither by the M2M Quality tool nor by the questionnaire (Section 6.2.2 lists these metrics). As these metrics partially relate to the results of M2M Quality and the questionnaire, I had to gather the latter results first.

6.4 Results

Appendix C provides the metric measurements as specified by the GQM plan. In particular, Appendix C includes processed data, illustrating diagrams as proposed by Solingen and Berghout [vSB99][pp. 72-73], and detailed descriptions of the data. This is especially important for the interpretation phase of the GQM plan in Chapter 7.

7 Interpretation

This chapter executes the interpretation phase of the GQM method, i.e., its fourth and final phase (cf. Section 2.2.1). Therefore, it investigates the measurement results as described in Chapter 6 in detail as well as interprets and evaluates these to draw appropriate conclusions.

The chapter follows a bottom-up approach for this: firstly, it answers the GQM plan’s questions (Section 7.1), secondly, it manifests derived conclusions within a decision tree (Section 7.2), and finally, it evaluates the goal attainment (Section 7.3). Additionally, Section 7.4 provides a threats to validity discussion which is especially important for empirical studies (cf. [Yin03][Chapter 5]).

7.1 Answers to the Questions of the GQM Plan

This section provides answers to the questions specified by the GQM plan. It focuses on answering the question on a higher level of abstraction rather than investigating every metric and hypothesis associated to the question on a detailed level. However, the results also need to be investigated in detail to make the interpretations of this section reproducible. Therefore, the appendix in Section C.5 provides this detailed evaluation for the metric measurements and hypotheses. The content of this section can, however, be understood without reading Section C.5.

For every considered quality property, this section provides a subsection for the evaluated answer to the quality and the reason question, respectively. Each subsection that answers a reason question additionally discusses the answers by the questionnaire participants to the respective reason question. The reason for this is that the participants provided additional and important insights. In particular, this approach allows a joint evaluation of (1) the reasons identified by the metric measurements and hypotheses and (2) the statements by the questionnaire participants.

Accordingly, this section answers the quality question and the reason question corresponding to modularity (Section 7.1.1 and Section 7.1.2), reusability (Section 7.1.3 and Section 7.1.4), analyzability (Section 7.1.5 and Section 7.1.6), modifiability (Section 7.1.7 and Section 7.1.8), consistency (Section 7.1.9 and Section 7.1.10), appropriateness recognizability (Section 7.1.11 and Section 7.1.12), as well as learnability (Section 7.1.13 and Section 7.1.14).

7.1.1 Evaluation of ModuQ1: “What is the modularity of the implementations?”

This section summarizes and evaluates the detailed modularity results of the appendix in Section C.5.2. The interpretations of the metrics chosen for modularity partly contradict each other. This is, for instance, the case for the interpretations of Modu1.1.1 and Modu1.1.2. Furthermore, there is no direct correlation between the modularity values of the questionnaire evaluation and the considered metrics. This is particularly the case for the QVT-R metrics identified by Kapova et al. [KGH12]. Therefore, it is generally hard to answer the quality question for modularity.

There are two main reasons for these vague results. Firstly, the modularity definition itself needs to be revised and extended by examples to guide the participants of a questionnaire. It was, for instance, unclear to the participants that the Copy scenario has, in fact, a high modularity in QVT-O (cf. Figure C.28 in the appendix). Another example is that the applied metrics partly try to measure the cohesion of the transformation implementations. For instance, the average fan-out (Modu1.1.2) is one way to indicate how strongly different transformation rules belong together. Rules with a high fan-out have a high cohesion and should, thus, be included within one “transformation library” to foster the modularity between different transformation modules. However, no implementation makes use of those libraries even though it is, e.g., possible to specify a transformation library within QVT-O.
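As an illustration of the average fan-out, consider the following Java sketch. It is not taken from the M2M Quality tool and assumes a simple reading of the metric: the fan-out of a rule is the number of distinct rules it invokes, and the average is taken over all rules. The precise definition of Modu1.1.2 is given in Section 5.5.1; the class and method names here are hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class FanOutSketch {
    // 'calls' maps each transformation rule to the set of distinct rules
    // it invokes (assumed reading of "fan-out").
    static double averageFanOut(Map<String, Set<String>> calls) {
        if (calls.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        for (Set<String> callees : calls.values()) {
            total += callees.size();
        }
        return (double) total / calls.size();
    }
}
```

For three rules where the first calls the other two, the second calls the third, and the third calls none, the sketch yields (2 + 1 + 0) / 3 = 1.0.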

As for the concept of classes and methods in Java, it can be easier for a transformation engineer (or a questionnaire participant) to evaluate the modularity when a set of related transformation libraries is given. For instance, a new question within a questionnaire could be “do you think transformation rule X fits better into transformation library A or B?”. These kinds of questions illustrate the meaning of modularity more intuitively.

The fact that no questionnaire participant has suggested using transformation libraries indicates that (a) the participants did not know that creating transformation libraries is possible, (b) they did not think about using libraries even though they knew about the possibility, or (c) they explicitly decided for every scenario that the selected set of transformation rules perfectly fits the provided transformation. If case (c) held, the modularity of each implementation would have been evaluated very high for every scenario. As this is not the case, (a) or (b) are likely to hold. Both of these cases show that there is a lack of understanding of modularity mechanisms for M2M transformations¹. Therefore, future work should investigate the possibilities of those modularity mechanisms and try to provide transformation engineers with a deeper understanding of these concepts. This particularly involves a set of transformation scenarios that apply these mechanisms. A starting point for this can be the “language AST to quality model” transformations that come with the M2M Quality tool implemented for this thesis because these transformations are structured by several transformation libraries (cf. Section 6.2.2).

¹ This holds at least for some of the questionnaire participants, indicating that the knowledge regarding modularity mechanisms is not widespread among the transformation community. However, there is already literature that studies transformation modularity mechanisms. For instance, Klar et al. [KKS07] discuss generalization and packaging concepts for model transformations.

The second reason for the vague results is the selection of the metrics which is mainly based on the results by Kapova et al. [KGH12]. Their results (1) also suffer from the problems in our questionnaire (cf. the critique regarding the direct-rating method in Section 6.2.3 as well as the insufficient understanding of M2M transformation quality properties as described above) and (2) are specialized for QVT-R. Regarding the second issue, this thesis assumes that the respective metrics are appropriate for other M2M languages, too. Whether this assumption is wrong is not determined yet as the first issue is the more critical one and should be solved by future work first. Finally, note that the first issue particularly affects the work of van Amstel [vA11] as he uses a similar questionnaire (cf. Section 5.2 for a description of van Amstel’s work).

7.1.2 Evaluation of ModuQ2: “What are the reasons for differences in modularity?”

This section summarizes and evaluates the detailed modularity results of the appendix in Section C.5.3 and relates the statements of the questionnaire participants (cf. the appendix in Section C.1.2) to these results. Several questionnaire participants identified reasons affecting modularity that are similar to those covered by the applied modularity metrics. For instance, statements like “Interdependent rules”, “Many calls to other rules”, or “small amount of parameters” clearly relate to the “fan-out” and “rule dependency” metrics. As the evaluation of the quality question states, it is important at this point to differentiate between the modularity within one transformation module and the modularity between different transformation modules. Directly related to this difference are the cohesion and the coupling of transformation modules. For a high modularity, the cohesion of the module should be high and the coupling (with respect to other modules) low. However, this difference is neither covered by the modularity definition nor considered by the participants. A task for future work is, thus, to consolidate these aspects by a refined modularity definition for transformations and to support the definition by concrete examples. This is particularly a necessary preparation for a controlled experiment that targets the assessment of modularity as well as the reinvestigation of the mentioned metrics.
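The distinction between cohesion and coupling can be made tangible with a small Java sketch. It is not a metric of this thesis' GQM plan. It assumes that calls between rules of the same module indicate cohesion and calls across modules indicate coupling, and that every rule is assigned to exactly one module; all names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class CouplingSketch {
    // Returns {intraModuleCalls, interModuleCalls} for a rule call graph.
    // 'moduleOf' must contain an entry for every rule (assumption).
    static int[] countCalls(Map<String, Set<String>> calls, Map<String, String> moduleOf) {
        int intra = 0;
        int inter = 0;
        for (Map.Entry<String, Set<String>> entry : calls.entrySet()) {
            String callerModule = moduleOf.get(entry.getKey());
            for (String callee : entry.getValue()) {
                if (callerModule.equals(moduleOf.get(callee))) {
                    intra++; // call stays inside one module: indicates cohesion
                } else {
                    inter++; // call crosses a module boundary: indicates coupling
                }
            }
        }
        return new int[] { intra, inter };
    }
}
```

A rule-to-module assignment with many intra-module and few inter-module calls would, under these assumptions, correspond to the high cohesion and low coupling asked for above.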

Another class of statements by the questionnaire participants targets features not (directly) covered by the modularity metrics of this thesis. One participant, for instance, notes that modularity is negatively influenced by “global data variables, e.g., hash maps or lists”. The general metric GM2.9 measures the number of intermediate structures and directly relates to the latter constructs. According to Figure C.4, especially Java makes use of those constructs. This indicates that Java generally suffers from these additional dependencies regarding modularity. In particular, this also confirms the hypothesis GH2.6 stating that Java’s general purpose constructs for intermediate structures lower quality properties.

A second example is the statement that a “matching between meta-model constructs and transformation functions [...], e.g. one function for each class in the meta-model” has a positive influence on modularity. This statement is closely related to the “organizational structure” subfeature of the “rule organization” feature (cf. Section 3.1.4): there are languages that dictate to align transformation rules along the source or target domain, respectively. However, the languages considered by this thesis are independent, i.e., they have their own organization. This has the consequence that a transformation engineer is in charge of deciding whether the rules need to be aligned according to source or target domains. As this thesis does not measure the effect of these different design alternatives (nor provides metrics to measure these), it leaves this issue as future work.

A further reason for differences in modularity can be the M2M approach and language itself. For instance, QVT-R is often claimed to be modular without having empirical evidence at hand (e.g., by Bradfield and Stevens [BS12]). One participant also states that in “a relational language you seem to be forced to program tranformations [sic] in a modular way”. In fact, these claims can partly be confirmed based on the questionnaire results for modularity. However, due to the mentioned problems of the questionnaire, this result is only preliminary and needs to be confirmed in future work.

In summary, the reasons for differences in modularity could not completely be identified due to the mentioned problems. The gained results indicate, however, that relational approaches promise a good modularity. The main lesson learned is that the notion of modularity needs to be clearly defined for transformation engineers by relating modularity to “cohesion” and “coupling”. This definition needs to be supported by (1) examples that include transformation modules and (2) controlled experiments.

7.1.3 Evaluation of ReuseQ1: “What is the reusability of the implementations?”

This section summarizes and evaluates the detailed reusability results of the appendix in Section C.5.4. The results for reusability are similar to the results for modularity. Java and QVT-O show a clear (positive) correlation with modularity while QVT-R seems to be less influenced by the modularity of the scenarios. However, the different languages generally do not show large differences in reusability when compared to their differences in modularity (cf. Figure C.28 and Figure C.29 in the appendix).

The results for Java and QVT-O particularly indicate that the participants had a modularity definition in mind that is closely related to reusability (“this transformation rule is modular because it can easily be reused in other scenarios”). This is a further indication that (1) the modularity definition must be further refined for a clearer distinction between the two quality properties and (2) questionnaire (or controlled experiment) participants need more guidance in evaluating these properties.

The results for QVT-R, however, indicate that modularity itself was not the only basis for evaluation. This is due to the fact that, compared to modularity, QVT-R suffers from lower values regarding reusability. One reason for this is that QVT-R provides less support to specify “generic” transformation rules when compared to Java and QVT-O, i.e., rules which can be applied independently of concrete metamodel elements. Therefore, QVT-R can be programmed in a modular way but is generally restricted to concrete usage scenarios (with the consequence of a lower reusability). For instance, QVT-O provides “abstract mappings” that allow specifying transformation rules that are not specific to only one metamodel element. QVT-O additionally comes with a larger set of dedicated and generic transformation operations like, for instance, the copy operation. QVT-R, in contrast, is basically restricted to relation specifications that strongly depend on the elements of the metamodel. Note that QVT-R’s “query” operation can, to a limited extent, be seen as an exception to this (cf. Section C.5.4).

7.1.4 Evaluation of ReuseQ2: “What are the reasons for differences in reusability?”

This section summarizes and evaluates the detailed reusability results of the appendix in Section C.5.5 and relates the statements of the questionnaire participants (cf. the appendix in Section C.1.2) to these results. The questionnaire participants identified four main classes of factors that influence reusability: (1) the degree between generic and specialized transformation rules, (2) modularity, (3) intermediate structures and rule organization, as well as (4) appropriateness recognizability.

Concerning the first class of factors, the participants stated that transformations “bound to special classes”, with a “dependence to [a] meta-model”, and with a “hard-coded model access (e.g., using EMF functions)” have a negative influence on reusability while “’abstract’ transformation parts” have a positive influence. This coincides with the evaluation for ReuseQ1 and can, thus, be confirmed.

The second class of factors relates reusability to modularity. For instance, participants stated that “non-existing library concepts” and “no real good inheritance / module concepts” have a negative influence on reusability while a “high modularity” has a positive influence. These observations also fit the evaluation results for ReuseQ1. Interestingly, the participants commented on the usage of “transformation modules”, which was not the case for modularity itself. This is, again, an indication that (1) the understanding of modularity needs to be improved and (2) there is a strong dependency between modularity and reusability.


7. Interpretation

Two participants target the third class of factors by stating that “global data variables” have a negative influence on reusability while the possibility that a “single rule only relies on one meta-model element on the source-side (or the target side [...])” and the “independence of target/source meta-models” have a positive influence. These issues can generally be confirmed. The evaluation of ModuQ2 discusses these issues in detail; the evaluation for reusability can be performed analogously.

The fourth class of factors relates reusability to appropriateness recognizability. One participant stated that a transformation “has to be understandable, so the user can understand what it does and if it fits his purpose” in order to efficiently reuse transformation rules of the transformation. However, the questionnaire results for appropriateness recognizability do not provide a clear correlation with reusability. Since this does not disprove that appropriateness recognizability can have an influence on reusability, future work needs to investigate this issue.

7.1.5 Evaluation of AnaQ1: “What is the analyzability of the implementations?”

The results for analyzability indicate that analyzability positively correlates with appropriateness recognizability. There are generally only small differences between these two quality properties. Therefore, this section refers to Section 7.1.11 for a detailed discussion on applied metrics and the interpretation of measured values.

7.1.6 Evaluation of AnaQ2: “What are the reasons for differences in analyzability?”

This section summarizes and evaluates the detailed analyzability results of the appendix in Section C.5.7 and relates the statements of the questionnaire participants (cf. the appendix in Section C.1.2) to these results. The questionnaire participants identified five main classes of factors that influence analyzability: (1) appropriateness recognizability metrics, (2) tooling aspects, (3) syntax, (4) modularity, and (5) application conditions.

The first class of factors covers appropriateness recognizability metrics, as also expected by this thesis. The participants stated, for instance, that “some control flow in the transformation helps identifying where the error occurred” and that “many lines of code”, “many [sic] infrastructure code” and a “bad documentation” have a negative influence on analyzability. These issues relate to the metrics measuring the control flow (e.g., the “number of explicit scheduling calls” metric), size (e.g., the “lines of code” and “size of the domain pattern” metrics), and the amount of documentation (e.g., the “number of additional/changed comment lines of code” metric) as applied for measuring appropriateness recognizability. This confirms that there is a strong dependency on appropriateness recognizability. Interestingly, no participant directly related analyzability to appropriateness recognizability when asked for AnaQ2. However, when asked for appropriateness recognizability (ApproQ2), the participants also stated that appropriateness recognizability positively correlates with analyzability. A factor that could have caused this result is the order of the questions asked within the questionnaire: the participants were asked for appropriateness recognizability after they evaluated analyzability.

The second class of factors covers tooling aspects such as the support for tracing and debugging. It was, e.g., stated that a “trace model to analyze is good, a well-designed debugger is better, both is perfect” and that the “ability to be debugged” has a positive influence on analyzability. Section 7.1.14 discusses debugging in more detail; it concludes that future work needs to address debugging. Tracing, on the other hand, directly relates to the “tracing” feature for M2M approach/language/engine combinations. Table 4.1 illustrates that only QVT-O and QVT-R have dedicated support for tracing. This fact could theoretically be a reason for the generally lower analyzability questionnaire result values for Java when compared to QVT-O. As, however, the participants did not investigate issues related to tracing, these issues cannot have affected the evaluation results. An investigation of issues related to tracing is, therefore, left as future work.

Two statements by the questionnaire participants are related to the syntax (third class of factors) of transformation specifications: “if you have when clauses you have to read from back to front what makes it [...] complicated” and that “bad syntax” has a negative influence on analyzability. Note that the former issue is an example of the latter. As this thesis does not focus on measuring a “bad syntax”, it leaves this issue as future work. The hypothesis that the syntax of a transformation language has an influence on analyzability is generally reasonable.

The fourth class of factors relates analyzability to modularity. One participant directly states that a high “modularity” has a positive influence on analyzability. This assumption was also the basis for deriving the metrics Ana2.2.i and Ana2.4.i for AnaQ2: these metrics link to modularity metrics as the definitions of modularity and analyzability are similar. As AnaH2.1 could successfully be checked via these metrics (cf. the appendix in Section C.5.7), there is a clear indication that the latter assumption is correct.

Interestingly, the hypothesis AnaH2.2 could not be verified (cf. the appendix in Section C.5.7). A reason for this relates to the assumption that the scenarios Rule1 to Rule12 each have a lower analyzability than the Copy scenario. This assumption is falsified by the questionnaire results. For instance, Rule4 has higher values for Java and QVT-R (cf. Figure C.30 in the appendix). These observations indicate that the questionnaire suffers from an inability to compare values between different scenarios: there is no concrete reason why the Copy scenario should have a lower analyzability than another scenario. This is a further indication that the discussed problems inherent to the questionnaire are critical (cf. the appendix in Section 6.2.3).

The fifth class of factors targets application conditions. One questionnaire participant stated that “missing pre- / post-conditions” have a negative influence on analyzability. This coincides with the general hypothesis GH2.5 stating that Java’s missing support for application conditions generally comes with a lower quality. As the measurements for GM2.8 show (illustrated in Figure C.3 of the appendix), QVT-R generally applies the most application conditions. However, the questionnaire results indicate that this does not have a significant positive influence on analyzability. For instance, QVT-R’s average analyzability lies at approximately 3.9, while QVT-O comes with a value of 5.3, followed by Java with a value of 4.5. To investigate the influence of application conditions further, I suggest that future work compares QVT-O implementations with and without application conditions (which is generally always possible). Afterwards, the results of this investigation can be used to establish a comparison between different languages.

7.1.7 Evaluation of ModiQ1: “What is the modifiability of the implementations?”

The results (cf. the appendix in Section C.5.8) for modifiability indicate that modularity generally affects modifiability positively and correlates with it. However, there are situations in which other factors outweigh the positive influence of modularity. One of these factors is the analyzability of the implementation. For instance, although the Rule8 implementation has a high modularity in QVT-R, its analyzability is especially low. The latter issue could, therefore, cause a low modifiability for the QVT-R implementation of Rule8. In fact, a causal relation between these three quality properties is likely: to modify a transformation, a transformation engineer would first try to identify the appropriate part of the transformation (which is positively influenced by a high modularity). In case the identified part is hard to analyze, the transformation engineer finds it difficult to modify the transformation.

For a detailed discussion on applied metrics and the interpretation of measured values, this section refers to Section 7.1.1 as well as to Section C.5.8.

7.1.8 Evaluation of ModiQ2: “What are the reasons for differences in modifiability?”

This section summarizes and evaluates the detailed modifiability results of the appendix in Section C.5.9 and relates the statements of the questionnaire participants (cf. the appendix in Section C.1.2) to these results. The questionnaire participants identified five main classes of factors that influence modifiability: (1) modularity, (2) analyzability, (3) unit tests, (4) rule organization and specialization, as well as (5) tracing.

The first and second class of factors relate modifiability to modularity and to analyzability. For instance, the questionnaire participants stated that “a modular structure makes it easier to find the place where to modify a transformation” as well as that “few files/functions” have a positive and “no local analyzability” has a negative influence on modifiability. This coincides with the evaluation of ModiQ1 (cf. Section 7.1.7) and can, therefore, be confirmed. Furthermore, the participants identified the “usage of external libraries” and “dedicated library functions” as factors that influence modifiability negatively. One participant gives the “copy” operation of QVT-O’s Copy scenario implementation as an example. This participant emphasizes that this scenario can be understood easily but is hard to modify due to the “copy” operation. The questionnaire results for the Copy scenario confirm this hypothesis: the QVT-O implementation comes with an average value of approximately 3.11 for modifiability while the Java implementation has an average of approximately 3.94 and the QVT-R implementation an average of approximately 4.94.

The third class of factors suggests using unit tests for improving modifiability: according to one participant, “no unit tests” have a negative influence on modifiability. The reason for this is that such tests allow determining automatically whether pre-defined test cases evaluate successfully on transformation execution [CFM10]. This helps, for instance, to test whether a transformation still provides correct results after a modification. However, this thesis does not cover unit testing for M2M transformations. In particular, the classification in Section 3.1 does not provide features that target “testing”. Therefore, future work should investigate this issue further and extend the classification accordingly.
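As an illustration of how unit tests can guard a transformation against regressions after a modification, consider the following minimal Python sketch. It does not reproduce the approach of [CFM10]; the toy transformation `copy_model`, the dictionary-based model representation, and the test names are assumptions made purely for this example.

```python
import unittest


def copy_model(source):
    """Toy 'Copy' transformation (hypothetical): copies every model element."""
    return [dict(element) for element in source]


class CopyTransformationTest(unittest.TestCase):
    """Pre-defined test cases that are re-run after each modification."""

    def test_target_equals_source(self):
        source = [{"type": "Circle", "name": "c1"}, {"type": "Block", "name": "b1"}]
        self.assertEqual(copy_model(source), source)

    def test_target_is_independent_of_source(self):
        source = [{"type": "Circle", "name": "c1"}]
        target = copy_model(source)
        target[0]["name"] = "changed"
        # Changing the target model must not alter the source model.
        self.assertEqual(source[0]["name"], "c1")
```

Saved as a module, these tests can be executed with `python -m unittest`; a failing test after a change to `copy_model` signals that the modification altered the transformation results.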

The fourth class of factors targets the rule organization and specialization of rules. For instance, the questionnaire participants stated that “specialization can allow easier modifications for single elements, since other parts of the transformation will not be effected” and that “treating every meta-model element in a separate rule [...]” has a positive influence on modifiability. The evaluation of ModuQ2 already discusses the latter issue, which is related to the rule organization. The former issue relates to the usage of dedicated library functions as discussed above for “modularity factors”: one possibility for implementing a functionality is to use an existing “dedicated library function” (if one exists) while another possibility is to implement a specialized transformation rule for the desired purpose. If a transformation engineer expects that the requirements of the applied transformation rule change often, the transformation engineer will profit from a specialized transformation rule. On the other hand, “dedicated library functions” are generally easier to understand. The usage of the “copy” operation (also described above for “modularity factors”) provides an example for these statements.

The fifth class of factors targets the relation between tracing and modifiability. One participant stated that “an automatically maintained trace model” has a positive influence on modifiability. In personal communication with this participant, he further reported on his experience with triple graph grammars [Sch95, KS06] where changes of transformation rules can also affect the associated tracing model which must, hence, also be modified. The modification of the trace model can either be manual or automatic. This kind of tracing model is neither covered by the classification of M2M approach/language/engine combinations nor by the considered languages. Therefore, future work should investigate this issue further and extend the classification accordingly.


In addition to the reasons identified by the questionnaire participants, the appendix in Section C.5.9 identifies further reasons for differences in modifiability when comparing different M2M languages: scenarios that are characterized by a high amount of the scenario features “hierarchical clustering”, “refinement”, “multiple source domains”, and “refactoring” are not suited for QVT-R regarding modifiability. The reason for this is that these scenario features cannot easily be derived from a generic copy rule set.

7.1.9 Evaluation of ConsQ1: “What is the consistency of the implementations?”

This section summarizes and evaluates the detailed consistency results of the appendix in Section C.5.10. The questionnaire results for consistency (cf. Figure C.32 in the appendix) indicate that QVT-R generally provides the best consistency, followed by QVT-O and Java. One important question is which kind of consistency the questionnaire participants assessed: as Figure C.36 shows, only very few inconsistencies in terms of inconsistent naming of variables and rules, missing comments, etc. have been identified. This generally indicates a high consistency within every language. However, the questionnaire clearly shows that QVT-R performs best, which, at first, seems like a contradiction.

A reason for this can be another aspect of consistency: the degree to which several implementations share a similar structure. Most QVT-R implementations share, for instance, the structure as provided by the Copy scenario (e.g., most QVT-R implementations specify the “MapCircle”, “MapBlock”, and “MapSquare” relations). Within Java or QVT-O, on the other hand, the structure of the implementations varies more strongly. For instance, the QVT-O Copy scenario implementation has only one operation while the implementations for Rule1 to Rule12 have up to seven other operations.

Several metric measurements, indeed, indicate that the QVT-R implementations provide a more constant structure than the Java and QVT-O implementations. First of all, the discussed “number of rules” (Cons1.1.3) metric measurements provide this indication. This metric was, however, expected to correlate negatively with consistency. As this is not reflected by the measurements, this expected correlation must be revised. Nonetheless, given a set of scenarios that have to be consistently implemented, the results indicate that there is a negative correlation between consistency and the standard deviation over the scenarios’ number of rules. Similarly, the measurements for the “average number of domains” (Cons1.1.5) and the “average rule dependency depth” (Modu1.1.3) allow for the same interpretation since the QVT-R implementations provide the most constant number of domains as well as rule dependency depth over the considered scenarios. These metrics confirm the assessments by the questionnaire participants as they indicate that QVT-R is most consistent when considering the structure of several scenario implementations.
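The structural indicator discussed above (the standard deviation of the number of rules over a set of scenarios) can be sketched as follows. The rule counts and the language labels are hypothetical placeholders chosen for illustration only; they are not the measurements of Cons1.1.3, and the function names are assumptions of this sketch.

```python
from statistics import pstdev

# Hypothetical rule counts per scenario (NOT measurements from this thesis).
rules_per_scenario = {
    "LanguageA": [3, 3, 4, 3, 3],
    "LanguageB": [1, 4, 7, 2, 8],
}


def structural_spread(rule_counts):
    """Population standard deviation of the number of rules over scenarios."""
    return pstdev(rule_counts)


def rank_by_structural_constancy(measurements):
    """Order languages from the most constant structure (lowest spread) upwards."""
    return sorted(measurements, key=lambda lang: structural_spread(measurements[lang]))


if __name__ == "__main__":
    for lang in rank_by_structural_constancy(rules_per_scenario):
        print(lang, round(structural_spread(rules_per_scenario[lang]), 2))
```

Under the interpretation given above, a lower spread suggests a more constant structure across scenario implementations; the same computation could be applied to the number of domains or the rule dependency depth.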



7.1.10 Evaluation of ConsQ2: “What are the reasons for differences in consistency?”

This section summarizes and evaluates the detailed consistency results of the appendix in Section C.5.11 and relates the statements of the questionnaire participants (cf. the appendix in Section C.1.2) to these results. The questionnaire participants identified four main classes of factors that influence consistency: (1) naming, (2) documentation, (3) structure, and (4) reuse.

The first and second class of factors target a consistent naming of variables, transformation rules, etc. as well as a consistent documentation, e.g., by transformation code comments. For instance, the participants stated that “naming conventions for functions” and a “documentation [that] describes what is actually implemented” positively influence consistency. As the questionnaire participants found only few inconsistencies of this kind (cf. Section 7.1.9), a detailed investigation of the influence of these factors is left as future work.

The third class of factors targets the structure of transformation implementations. The participants stated, for instance, that the “partition of [a] transformation in consistent parts, i.e., one method/transformation rule per meta model element” has a positive influence on consistency. Section 7.1.9 confirms this statement.

The fourth class of factors relates to the reuse of existing transformation rules. For instance, the questionnaire participants stated that “using the same code for the same tasks” and the degree to which “similar calculations/relations are implemented in the same way” positively influence consistency. Reuse is generally related to the structure (third class of factors) of transformation implementations as well as to the reusability quality property. The latter aspect suggests comparing the questionnaire results for reusability with the results for consistency. However, the reusability measurements show no clear correlation between consistency and reusability. This indicates that the influence of reusability itself is minor. A detailed investigation of this issue is left as future work.

In general, the measured results do not provide clear indications for concrete scenario features that influence consistency. For instance, the hypothesis ConsH2 stating that the scenario features “hierarchical” and “abstraction level” generally cause a lower consistency than the “1:1 relations” and “structure” features could not be confirmed (cf. Section C.5.11).

7.1.11 Evaluation of ApproQ1: “What is the appropriateness recognizability of the implementations?”

This section summarizes and evaluates the detailed appropriateness recognizability results of the appendix in Section C.5.12. The questionnaire results corresponding to Appro1.2.1 and Appro1.2.2 indicate that QVT-O generally performs best, followed by Java and QVT-R. However, the results of Appro1.1.1 to Appro1.1.6 do not reflect these results when sticking to the expected correlations of these metrics with appropriateness recognizability. On the contrary, the correlations appear to be the opposite of the expected ones, i.e., instead of positive correlations, negative correlations can be observed. For instance, under the expected correlation, the measurements of the “lines of code” metric (Appro1.1.1) suggest that QVT-O performs worst, but the opposite is the case.

I conclude, therefore, that the correlations of appropriateness recognizability metrics as identified by Kapova et al. [KGH12] do not hold for a comparison between different languages. First empirical evidence collected by this thesis indicates that the correlations are generally the opposite. The reason for this can be that the applied metrics measure the complexity of the different implementations. This allows for the following interpretation: the higher a measured value, the more complicated the implementation is to understand. This interpretation particularly induces a negative correlation with appropriateness recognizability.

Still, the correlations could hold when staying within one language, i.e., when comparing the measured values for different scenarios within one language. This should especially be the case for QVT-R as Kapova et al. [KGH12] focus on this language. However, the measured correlations also deviate from the expected ones in this case. For instance, the QVT-R measurements for Appro1.1.1, Appro1.1.3, and Appro1.1.6 indicate that the Rule8 implementation has a high appropriateness recognizability when compared to the other scenarios implemented in QVT-R. In contrast, the questionnaire results indicate that the Rule8 implementation performs worst.

I conclude that, for appropriateness recognizability, the correlations identified by Kapova et al. [KGH12] cannot be seen as proven, solid, and final. The main reason for this is that I could not generalize the identified correlations to scenarios other than those applied by Kapova et al. [KGH12]. This indicates that (a) there are unidentified factors that influence the correlation of the applied metrics or (b) the applied metrics do not causally relate to the quality property of interest. The evaluation of the reason question for appropriateness recognizability provides a first inspection of factors that can influence appropriateness recognizability. A complete investigation of the mentioned issues is, however, left as future work.
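To make the notion of a positive or negative correlation between a metric and questionnaire scores concrete, the following Python sketch computes Spearman's rank correlation coefficient, which only relies on the ordering of values and therefore suits ordinal questionnaire data. The choice of Spearman's coefficient is an assumption of this sketch and not necessarily the procedure applied in this thesis; the sample values are hypothetical.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant samples")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical values: lines of code and questionnaire score per scenario.
    lines_of_code = [120, 80, 200, 150]
    questionnaire_score = [4.5, 5.0, 3.0, 4.0]
    print(spearman(lines_of_code, questionnaire_score))
```

A negative coefficient for such data would correspond to the interpretation given above: the higher the measured value, the lower the assessed appropriateness recognizability.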

7.1.12 Evaluation of ApproQ2: “What are the reasons for differences in appropriateness recognizability?”

This section summarizes and evaluates the detailed appropriateness recognizability results of the appendix in Section C.5.13 and relates the statements of the questionnaire participants (cf. the appendix in Section C.1.2) to these results. The questionnaire participants identified four main classes of factors that influence appropriateness recognizability: (1) structural aspects, (2) number and purpose of language constructs, (3) documentation, and (4) other quality properties.

Structural aspects relate to the amount of infrastructure code (e.g., for loading a model instance) and to the general structure of transformation rules. Participants stated, for instance, that “hiding of repeated maintenance code” and “short language concepts to query, to map, to access trace” influence appropriateness recognizability positively and that a “large overhead of the programming/transformation language, due to its syntax” has a negative influence. These statements can be confirmed based on the questionnaire results for the quality question: the “average size of the domain pattern” metric (Appro1.1.5) indicates that Java has more infrastructure code than QVT-O (on average, the values for Java are approximately 130% higher than for QVT-O). At the same time, the participants evaluated the appropriateness recognizability of Java approximately 17% lower than for QVT-O. This indicates that a causal relation between the amount of infrastructure code and appropriateness recognizability is likely.

Concerning the general structure of transformation rules, the participants stated that the “reuse of [the] rule structure” and “the clear mapping structure of QVT-R” positively influence appropriateness recognizability. These statements can be confirmed based on the measurements for QVT-R: the transformation rules as applied for the Copy scenario are reused for several other scenarios. The Rule8 scenario is, however, an exception to this. Besides inspecting the different implementations, this exception can also be identified via the measurements of the “number of changed rules when moving from Copy to RuleX” (Modu2.2.1) and the “number of additional rules when moving from Copy to RuleX” (Modu2.2.2) metrics. The measurements of these metrics show especially high values for Rule8 in QVT-R. On the other hand, the appropriateness recognizability as evaluated by the questionnaire participants is especially low for Rule8. This indicates that changes in appropriateness recognizability are caused by deviating from well-known structures and clear mapping structures as inherent to the Copy scenario in QVT-R. With respect to QVT-R, this particularly indicates that scenarios with a high degree of 1:1 relations perform better regarding appropriateness recognizability than scenarios with fewer 1:1 relations. However, this does not mean that QVT-R outperforms other languages given a scenario with a high degree of 1:1 relations. On the contrary, the questionnaire participants assessed the QVT-O implementation of the Copy scenario (which has a high degree of 1:1 relations) with the highest appropriateness recognizability among the considered languages.

As the Copy scenario provides a “clear mapping structure for QVT-R”, the reason for the appropriateness recognizability difference between the QVT-O and QVT-R implementation for the Copy scenario is not a structural aspect (the structural aspect would suggest that QVT-R performs best). The second main class of factors influencing appropriateness recognizability (i.e., the number and purpose of language constructs) provides a possible reason that outweighs influences by structural aspects: QVT-O provides the “copy” operation as a dedicated language construct for copying models. Even though restricted to endogenous transformation scenarios, applying this special-purpose language construct seems to have a positive effect on appropriateness recognizability.

Regarding the number and purpose of language constructs, the questionnaire participants stated that “a huge set of keywords and short-cuts in the language” has a negative influence and “Few but clear language concepts” have a positive influence on appropriateness recognizability. As the observations above show, these statements are not differentiated enough: there is a trade-off between the number of keywords and the appropriateness of the keywords in a given scenario. In the case of the Copy scenario, QVT-O (which provides more keywords than QVT-R) performs better because it comes with a language construct that perfectly fits the scenario’s requirements. Therefore, it also comes with a better appropriateness recognizability. Note that these issues also influence the learnability quality property (as discussed in Section 7.1.13).

Regarding the third class of factors (documentation), one questionnaire participant stated that a “good documentation” has a positive influence on appropriateness recognizability. This statement can partly be confirmed based on the measurements for the “number of additional/changed comment lines of code when moving from Copy to RuleX” metric (Appro2.2.1): the measurements indicate that QVT-R performs worst regarding appropriateness recognizability when compared to QVT-O and Java. However, this result is very limited due to the issues discussed for the measurements of Appro2.2.1 (cf. Section C.5.14). Therefore, the influence of the documentation should be reinvestigated in future work.

Finally, some statements by the questionnaire participants relate appropriateness recognizability to other quality properties (fourth class of factors). One participant states that appropriateness recognizability correlates with analyzability and another participant states that it positively correlates with modularity. This can be confirmed for analyzability as discussed in Section 7.1.5. However, there is no clear correlation between appropriateness recognizability and modularity when sticking to the questionnaire results.

7.1.13 Evaluation of LearnQ1: “What is the learnability of the language/engine combinations?”

This section summarizes and evaluates the detailed learnability results of the appendix in Section C.5.14. The evaluation of the questionnaire corresponding to Learn1.2.1 and Learn1.2.2 provides clear results for this question: QVT-O performs best, followed by Java and QVT-R. The (relative) differences between the different evaluations (compared to QVT-O, the learnability values for Java are 21.83% and for QVT-R 36.93% lower; cf. the evaluation of Learn1.2.1/Learn1.2.2 in Section C.5.14) indicate a tendency of the extent to which learnability differs between the different languages. However, these differences should be used with caution as the values are not ratio-scaled.
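A relative difference of the kind reported above can be computed as sketched below, assuming that it is defined as the decrease relative to the reference language. The function name and the sample averages are hypothetical and only illustrate the arithmetic; they are not the questionnaire values of this thesis.

```python
def relative_decrease(reference, value):
    """Percentage by which `value` lies below `reference`."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (reference - value) / reference * 100.0


if __name__ == "__main__":
    # Hypothetical average scores (NOT the questionnaire results).
    reference_average = 5.0
    other_average = 4.0
    print(f"{relative_decrease(reference_average, other_average):.2f}% lower")
```

As noted above, such percentages only indicate a tendency because the underlying questionnaire values are not ratio-scaled.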

Sticking to these empirical results, the consequences for the other metrics corresponding to this question are the following. Firstly, there is no negative correlation between learnability and the number of possible language constructs (Learn1.1.1) but the values suggest that the opposite is true, i.e., there is a positive correlation. One reason for high values in QVT-O is that QVT-O provides a rich set of language constructs dedicated to M2M transformations. Having such a rich set of (useful) constructs at hand can foster the learnability of an M2M language. In fact, Learn1.1.2 shows that (in absolute terms) most constructs have been applied in QVT-O. On the other hand, learnability of a language does not necessarily cause this positive correlation between Learn1.1.1 and Learn1.1.2. As QVT-O provides more language constructs, it seems natural to also use more of these when compared to other languages. However, being capable of doing so indicates that QVT-O has a good learnability.

The percentage of applied language constructs (additional metric to Learn1.1.2; illustrated by Figure C.20) has no positive correlation (as expected) but a negative correlation with learnability. This illustrates that languages with few language constructs (e.g., QVT-R) suffer from another problem than learning the language constructs themselves. This could, therefore, indicate that the problems in learning a language result from using the language constructs efficiently and effectively. For imperative languages, the results indicate that this is less of a problem than it is for declarative languages.

Finally, the empirical results confirm that the “time until a scenario was implemented successfully” (Learn1.1.3) has a negative correlation with learnability. This impression is strengthened by the hypothesis mentioned in Section C.5.14 that the Java implementations profited from the QVT-O implementations which were completed before implementing the corresponding scenarios in Java.

In summary, Learn1.1.3 most clearly correlates with the questionnaire results for learnability. Furthermore, the causal relationship between this metric and learnability is intuitive. Whether the other metrics really allow for deriving the learnability of a language (i.e., that there is a causal relationship), needs to be confirmed in future work with more M2M languages and with a higher variance of metrics for the number of language constructs (cf. the “Learn1.1.1” paragraph in Section 5.5.7).

7.1.14 Evaluation of LearnQ2: “What are the reasons for differences in learnability?”

This section summarizes and evaluates the detailed learnability results of the appendix in Section C.5.15 and relates the statements of the questionnaire participants (cf. the appendix in Section C.1.2) to these results. Four statements of the participants are directly related to the number of language constructs: “huge set of keywords and short-cuts in the language” (negative effect), “few but clear language concepts” (positive effect), “The expressiveness of a language negatively affects its learnability. A custom DSL for model transformations is easier to learn.”, and “a suitable DSL (Java is no transformation language!)”. The questionnaire results for QVT-O regarding learnability show that the former two statements can (at least partly) be discarded. The results indicate that a rich library of dedicated transformation language constructs (as the latter two statements indicate) fosters a good learnability. Languages with fewer constructs suffer from the problem that it is harder to compose the constructs.

Another factor that influences learnability is the language paradigm, as indicated by one participant: “Way of thinking (imperative will be easier to learn for most of the programmers)”. This statement can be confirmed based on the questionnaire results.

As both QVT-O and Java are imperative languages, it is also of interest why there are differences between these languages. One aspect is that QVT-O is a dedicated transformation language. This relates to the statements above (“a custom DSL”) as well as to another statement by one participant: “Less infrastructure code” has a positive effect on learnability. This statement precisely points to one problem of model transformations within Java: in contrast to QVT-O, a transformation engineer has to manually implement methods for loading, serializing, executing, etc. models and transformations, respectively, with the consequence of a lower learnability.

One participant further stated that the “ability to be debugged” fosters learnability. This is closely related to the capabilities of M2M engines. However, the M2M language itself also influences the possibilities for debugging (e.g., debugging of the graphical and the textual syntax of QVT-R needs to be realized in different ways). These concepts should, therefore, be added to the classification of M2M approaches, languages, or engines and further inspected in future work.

7.2 Derived Decision Tree

This section manifests the interpretation results that can assist a transformation engineer in selecting an appropriate M2M approach/language/engine combination in a given transformation scenario. To accomplish this manifestation, the section applies and proposes a first version of a decision tree. This decision tree is neither exhaustive nor based on sufficient empirical evidence. Instead, its purpose is to give a first impression of the possibilities this thesis’ framework for M2M quality assessment offers. Future work needs, therefore, to incrementally extend the decision tree and to provide further empirical evidence regarding its correctness.

The decision tree is based on three main observations: (1) only QVT-R has dedicated support for the retainment of user modifications, (2) regarding modifiability, the scenario features “hierarchical clustering”, “refinement”, “multiple source domains”, and “refactoring” are not suited for QVT-R, and (3) QVT-O performs best regarding learnability.

Table 4.1 induces the first observation and Section 6.2.1 confirms the observation as it was not possible to implement a scenario requiring the retainment of user modifications in QVT-O and Java. Section 7.1.8 provides evidence for the second observation as QVT-R performs badly for scenarios that are characterized by the mentioned scenario features. The third observation is based on the clear interpretation results in Section 7.1.13 and Section 7.1.14. The latter two sections conclude that QVT-O generally performs best regarding learnability.


Figure 7.1 shows the derived decision tree based on these three observations. Given an M2M scenario, the first decision node (upper left) asks whether the retainment of user modifications is a primary requirement. If this is the case, the transformation engineer is recommended to use QVT-R with Medini QVT to implement the scenario (observation (1)).

[Figure 7.1 (decision tree), reconstructed from the figure labels:

Decision node 1: Retain user modifications main requirement?
    yes: Use QVT-R with Medini QVT
    no: continue with decision node 2

Decision node 2: High degree of scenario features in S?
    yes*: Do not use QVT-R with Medini QVT but use QVT-O with QVT Operational
    no: continue with decision node 3

Decision node 3: High experience in a language/engine combination X ∈ C?
    yes: Use X
    no: Use QVT-O with QVT Operational

S := { "hierarchical clustering", "refinement", "multiple source domains", "refactoring" }
C := { "Java/EMF with JVM", "QVT-O with QVT Operational", "QVT-R with Medini QVT", ... }
* : especially if the quality property "modifiability" is important]

Figure 7.1: Derived Decision Tree

Otherwise, the transformation engineer needs to check the second decision node. The second node requires the transformation engineer to investigate the given scenario regarding the degree to which it is characterized by hierarchical clusterings, refinements, etc. In case this degree is high, the decision tree explicitly states that QVT-R with Medini QVT should not be applied. Instead, it suggests using QVT-O with the QVT Operational engine because of its generally higher learnability (observations (2) and (3)). In case modifiability is a primary concern, this suggestion should be seen as a hard rule.

If, on the other hand, the scenario is not characterized by the mentioned scenario features, the final decision node has to be considered. The final decision node asks whether the transformation engineer is highly experienced in a language/engine combination X. If this is the case, the tree suggests using X as it takes the transformation engineer only little effort to use the language/engine combination efficiently and effectively. Otherwise, the tree suggests using QVT-O with the QVT Operational engine. This is based on the observation that QVT-O generally provides the best learnability (observation (3)) and, thus, the effort of learning an M2M language is kept minimal.
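The three decision nodes of Figure 7.1 can be sketched as a small function. The following Python sketch is only an illustration under stated assumptions: the names Scenario, recommend, and high_degree_threshold are hypothetical and not part of this thesis' tooling. In particular, the decision tree does not quantify what a "high degree" of scenario features is, so the threshold of two features is merely a placeholder.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Scenario features for which QVT-R is not suited regarding modifiability
# (set S in Figure 7.1).
S = frozenset({"hierarchical clustering", "refinement",
               "multiple source domains", "refactoring"})

QVT_R = "QVT-R with Medini QVT"
QVT_O = "QVT-O with QVT Operational"


@dataclass
class Scenario:
    """Hypothetical description of an M2M scenario and its engineer."""
    retain_user_modifications: bool            # primary requirement?
    features: FrozenSet[str] = frozenset()     # scenario features present
    experienced_in: Optional[str] = None       # combination X, if any


def recommend(scenario: Scenario, high_degree_threshold: int = 2) -> str:
    """Walk the three decision nodes of Figure 7.1 in order.

    ASSUMPTION: a "high degree" of features in S is approximated by
    counting at least `high_degree_threshold` of them.
    """
    # Node 1: retainment of user modifications (observation (1)).
    if scenario.retain_user_modifications:
        return QVT_R
    # Node 2: high degree of features in S (observations (2) and (3)).
    if len(scenario.features & S) >= high_degree_threshold:
        return QVT_O
    # Node 3: prior experience in a language/engine combination X.
    if scenario.experienced_in is not None:
        return scenario.experienced_in
    # Default: best learnability (observation (3)).
    return QVT_O
```

For example, recommend(Scenario(False, frozenset(), "Java/EMF with JVM")) follows the "no", "no", "yes" path and returns the combination the engineer already knows.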

Note that it was necessary to use observation (3) for two different decisions. In particular, this points to a general drawback of decision trees: decision trees grow exponentially with the number of variables used as a basis for decision nodes (the variable “learning” in this case) [BS99]. To cope with this issue, influence diagrams [BS99] can, for instance, be used instead. Another advantage of influence diagrams is that they allow including the full set of influencing factors as identified in Chapter 7. Furthermore, the outlined decision making problem can be extended by the transformation engineers’ preferences or utility functions. An extensive investigation and evaluation of these issues is, therefore, suggested as future work. As a starting point, Bielza and Shenoy [BS99] provide an overview and comparison of different techniques for these decision making problems. In this context, the decision tree in Figure 7.1 is a first, yet important step towards a structured guidance for transformation engineers.

7.3 Discussion of the Goal Attainment

This section evaluates whether this thesis attains the goal of the GQM plan to “analyze M2M implementations for the purpose of comparing M2M approaches, languages, and engines with respect to maintainability from the viewpoint of the transformation engineer in the context of M2M transformation implementation, execution, and maintenance” (cf. Section 5.3). This evaluation is generally the final step of the GQM method (cf. Section 2.2.1).

Section 7.1 successfully answers several questions of the GQM plan. Therefore, the goal is attained to a certain extent. For instance, the derived decision tree is a new, clear, and useful result that contributes to the goal attainment. However, the goal cannot be seen as completely and exhaustively attained in the sense that all possible analyses regarding maintainability were executed. Instead, the answers to the GQM questions provide initial insights and need to be confirmed and further investigated by future work. Section 8.3 points, therefore, to several recommendations for future work.

7.4 Threats to Validity

This section describes the most important threats to validity of this thesis’ empirical study. The section focuses on the four types of validity described by Wohlin et al. [WRH+12, Section 8.7]: (1) conclusion validity, (2) internal validity, (3) construct validity, and (4) external validity.

Wohlin et al. [WRH+12, Section 8.7] apply the terminology illustrated in Figure 7.2 to describe the four types of validity. Firstly, they generally distinguish between the theory (experiment objective) and the observation of an empirical study. Within this thesis, the theory (upper half in Figure 7.2) is a hypothesis as stated by the GQM plan. The hypothesis describes a relationship between a cause and an effect construct. For instance, the hypothesis LearnH2.2 states that the size of a language documentation (cause) does not have a significant influence on the learnability of a language (effect).



[Figure 7.2, reconstructed from the figure labels:

Theory (experiment objectives, hypothesis):
    Cause Construct (e.g., "Size of Language Specification")
    cause-effect construct
    Effect Construct (e.g., "Learnability")

Observation (experiment operation, data collection):
    Treatment (e.g., "QVT-O Specification")
    treatment-outcome construct
    Outcome/Output (e.g., "82")

Factor (e.g., "Language Specification") ∈ Independent variables (e.g., "{Language Specification, Scenario, Implementation, ...}")
Dependent variable (metric; e.g., "Number of Pages")

Legend: 1: Conclusion Validity, 2: Internal Validity, 3: Construct Validity, 4: External Validity]

Figure 7.2: Experiment Principles and Threats to Validity (derived from [WRH+12, p. 103])

The observation (lower half in Figure 7.2) describes the experiment to test this hypothesis. The experiment consists of treatments and outcomes. A treatment is one particular value a factor can take and a factor is an independent variable that is subject of the study. For instance, a “language specification” is a factor and the “QVT-O specification” is a treatment. The outcomes (or outputs) are the values the dependent variables can take. This thesis specifies the dependent variables via the metrics of the GQM plan. For instance, the metric “number of pages” (Learn2.12) is an example of a dependent variable and the concrete measurement result “82” is the corresponding outcome for the QVT-O specification.

The arrows in Figure 7.2 depict relationships between causes, effects, treatments, and outcomes. These relationships have an influence on the validity of the experiment. Therefore, Figure 7.2 additionally illustrates which types of validity need to be considered for the respective relationship: each number within a circle corresponds to exactly one type of validity (“1” for conclusion validity, “2” for internal validity, “3” for construct validity, and “4” for external validity).

This section describes each validity type and applies it to the empirical study of this thesis. Section 7.4.1 covers conclusion validity, Section 7.4.2 internal validity, Section 7.4.3 construct validity, and Section 7.4.4 external validity.


7.4.1 Conclusion Validity

Conclusion validity is the degree to which conclusions regarding the relationship between treatments and outcomes are correct. Examples of issues affecting it are the care taken in the implementation of the transformation scenarios and the correctness of the counted number of pages from the example depicted in Figure 7.2. Circle “1” in Figure 7.2 annotates the relationship this threat affects.

Wohlin et al. [WRH+12, Section 8.8.1] provide a list of threats to conclusion validity. In the following, this section uses the most important (regarding this thesis) items of this list to comment on conclusion validity threats that concern this thesis (cf. Wohlin et al. [WRH+12, Section 8.8.1] for a detailed description of the different threats):

Low statistical power: This thesis applies no hypothesis testing which, e.g., involves the specification of a null hypothesis. The hypothesis GH1.1, for instance, has no predefined limits for its classification of standard deviation values into “low”, “medium”, or “high”. This has the consequence that there is a high risk that erroneous conclusions are drawn. The reason for this is that it is hard to reject the hypothesis in case it is erroneous. Furthermore, the lack of hypothesis testing makes it impossible to state whether observed correlations are significant based on a significance level.

Another major threat is the low number of questionnaire participants. In particular, these participants were divided into three different groups that had to evaluate different scenarios. This lowers the statistical power regarding single scenarios further.

Moreover, the questionnaire is based on the direct-rating method (cf. Section 6.2.3). Section 7.1.1 concludes, for instance, that the application of this method caused questionnaire results with a low statistical power.

Violated assumptions of statistical tests: This thesis assumes that the result values of the questionnaire are (1) interval-scaled and (2) comparable regarding different languages as well as scenarios. However, the values for analyzability are not comparable regarding different scenarios (as shown in Section 7.1.6).

Reliability of measures: This threat especially concerns experiments that involve human judgment. Accordingly, the questionnaire suffers from different interpretations of the quality properties by the participants. For instance, Section 7.1.1 concludes that modularity was interpreted differently by the participants. The automated measurements, on the other hand, provide a better reproducibility and, thus, suffer less from this threat. Nonetheless, bugs in the implementation of the M2M Quality tool and the scenario implementations are a possible threat to these measurements.



Random heterogeneity of subjects: The questionnaire participants generally had different knowledge in transformation languages. They were not asked for their knowledge regarding quality properties but I also suspect a high heterogeneity regarding this aspect. On the other hand, the fact that I selected the participants only from the Software Engineering Group of the Heinz Nixdorf Institute and Department of Computer Science at the University of Paderborn (cf. Section 6.3.2) lowers their heterogeneity. Note that a high heterogeneity has a negative effect on conclusion validity but a positive effect on external validity (cf. Section 7.4.4).

7.4.2 Internal Validity

Internal validity is the degree to which conclusions regarding the relationship between treatments and outcomes are causal (given an observed relationship; cf. “conclusion validity” in Section 7.4.1). This particularly involves the degree to which uncontrolled factors influence outcomes (a high degree has a negative influence on internal validity). Examples for threats to internal validity are the selection of questionnaire participants and their division into different groups. Circle “2” in Figure 7.2 annotates the relationship this threat affects.

Wohlin et al. [WRH+12, Section 8.8.2] provide a list of threats to internal validity. In the following, this section uses the most important (regarding this thesis) items of this list to comment on internal validity threats that concern this thesis (cf. Wohlin et al. [WRH+12, Section 8.8.2] for a detailed description of the different threats):

History: The questionnaire participants did not fill in the questionnaire within a controlled environment. Therefore, the history (e.g., a participant had to catch a bus) could have influenced the outcomes of the questionnaire.

Maturation: On the one hand, questionnaire participants can get bored during the experiment. The length of the questionnaire and the size and number of presented scenario implementations are certainly factors that influence the mood of the participants. On the other hand, there is also a learning effect when evaluating several scenarios. For instance, Section 7.1.14 observes results that indicate a learning effect regarding the M2M languages and implementation structure.

Instrumentation: The questionnaire was handed out via Google Forms (cf. Section 6.3.2). The questions in this online questionnaire differ from the questions presented in Section B.3 of the appendix in that one question had to be asked separately for each considered language (in contrast, Section B.3 includes every language within one evaluation question). In particular, this could have negatively influenced the comparability between the values for different languages.


Selection: I selected the participants only from the Software Engineering Group of the Heinz Nixdorf Institute and Department of Computer Science at the University of Paderborn (cf. Section 6.3.2). Furthermore, all participants were volunteers. This could have had an effect on the general performance of the participants. Furthermore, the selection of the scenarios has an influence on internal validity as mainly scenarios that were already implemented in QVT-R were selected. Another example is the selection of the M2M languages as well as the selection of a language documentation for the metrics Learn2.12 to Learn2.16.

Mortality: One participant dropped out during the questionnaire after evaluating the questionnaire for exactly one scenario (UML2RDBMS). In personal communication, this participant argued that he should have evaluated the Copy scenario first because the UML2RDBMS scenario implementation is very large and, thus, his motivation decreased. Furthermore, he argued that he did not find the time for finishing the remaining evaluations.

7.4.3 Construct Validity

Construct validity is the degree to which conclusions regarding the relationship between theory and observation are correct. This includes that (1) the treatment sufficiently reflects the cause, and (2) the outcome sufficiently reflects the effect. For instance, the number of possible language constructs (Learn1.1.1) may be a poor measure for the learnability of a language while the time until a scenario was implemented successfully (Learn1.1.3) may be a better measure. The two circles “3” in Figure 7.2 annotate the relationships this threat affects.

Wohlin et al. [WRH+12, Section 8.8.3] provide a list of threats to construct validity. In the following, this section uses the most important (regarding this thesis) items of this list to comment on construct validity threats that concern this thesis (cf. Wohlin et al. [WRH+12, Section 8.8.3] for a detailed description of the different threats):

Inadequate preoperational explication of constructs: Some metrics (e.g., the language metrics in Section 5.5.7) have not been applied to M2M transformations before. Therefore, they can suffer from unclear definitions or be inadequate to measure the effect construct of interest. As mentioned in the introduction of this section, the metric Learn1.1.1 may, for instance, be inadequate.

Interaction of different treatments: As the participants had to evaluate several different scenarios and language implementations, this may have influenced their assessment of the quality properties, i.e., their evaluation scheme could have been different for different scenarios. For instance, after evaluating several scenario implementations with a low modularity, a scenario with a medium modularity may be assessed too high regarding modularity.



7.4.4 External Validity

External validity is the degree to which (internally valid) results can be generalized. For instance, if the results of this thesis have a high external validity, they will also hold for scenarios not considered within this thesis, i.e., the relationship between cause construct and effect construct is generalizable. Another example is the generalizability from QVT-R to declarative approaches. Circle “4” in Figure 7.2 annotates the relationship this threat affects.

Wohlin et al. [WRH+12, Section 8.8.4] provide a list of threats to external validity. In the following, this section uses the most important (regarding this thesis) items of this list to comment on external validity threats that concern this thesis (cf. Wohlin et al. [WRH+12, Section 8.8.4] for a detailed description of the different threats):

Interaction of selection and treatment: All selected participants had at least some prior experience with M2M transformations. For an engineer with no prior knowledge who wants to learn an M2M transformation language, the results (e.g., regarding learnability) can, therefore, be biased.

Interaction of setting and treatment: The selected scenarios can be seen as small. Therefore, it is unclear whether the gained results also hold for larger examples, especially within an industrial context.

7.4.5 Discussion of Threats to Validity

Section 7.4.1 to Section 7.4.4 show that there are especially threats to the validity that relate to the design and execution of the questionnaire as well as to the results gained by the questionnaire. The interpretation in Section 7.1 takes this generally into account by drawing conclusions that relate to the questionnaire with caution.

This special care worked out well due to a threat to construct validity that did not affect the empirical study of this thesis: the thesis copes well with the “mono-method bias” threat [WRH+12, Section 8.8.3]. The mono-method bias describes the threat of using only a single type of measurement or observation with the consequence of misleading experiments. However, by generally inspecting several metric measurements, it was possible to cross-check results against each other and detect inconsistencies.

For instance, by inspecting the Copy scenario (thus, providing a second observation), Section 7.1.6 concludes that analyzability cannot be compared between different scenarios when just sticking to the questionnaire results. Section 7.1.2 provides another example by inspecting the reason for a low modularity in Java (regarding the questionnaire results and compared to the other languages). Based on a statement by one participant (stating that global data variables influence modularity negatively), Section 7.1.2 inspects the measurement results for GM2.9 (number of intermediate structures) which provide a measure for “global data variables”. As especially Java makes use of these constructs, the questionnaire results could be confirmed.


8 Conclusions

This chapter concludes this thesis. Firstly, Section 8.1 summarizes the contents of the thesis. Afterwards, Section 8.2 provides the lessons learned and outlines the benefits for transformation engineers and researchers. Section 8.3 closes this chapter by highlighting suggestions for future work.

8.1 Summary

To cope with the lack of guidance in selecting reasonable model-to-model (M2M) transformation technologies and applying these, this thesis introduces an initial framework for assessing and comparing the quality of M2M approaches, languages, and engines. One main idea of the framework is to base comparisons on concrete M2M scenarios implemented within an M2M language and compliant to an M2M engine.

This thesis applies the Goal/Question/Metric (GQM) method for a structured development of the framework. For qualitative comparisons, the framework provides suitable classifications of the five M2M dimensions (features, approaches, languages, engines, and scenarios). For quantitative comparisons, the framework shows how to compile a GQM plan, execute planned measurements, and interpret collected results to gain valuable insights. The framework is supported by the “M2M Quality” tool which can measure quality metrics on M2M transformation specifications.

Furthermore, this thesis illustrates the applicability of the framework by comparing the M2M languages Java¹, QVT-O, and QVT-R with each other. The thesis comes, therefore, with several M2M scenarios implemented in Java, QVT-O, as well as QVT-R. Firstly, the thesis provides a qualitative comparison of the considered M2M languages and scenarios. Secondly, it provides quantitative comparisons based on metric measurements that target, e.g., the scenario implementations as well as the qualitative differences of scenarios and languages. The quantitative comparison focuses on maintainability quality properties like, for instance, modularity and analyzability. Moreover, the qualitative comparison is supported by empirical data collected within a questionnaire that involved ten M2M transformation experts.

The thesis manifests the insights gained by the evaluation and interpretation of the collected data within a decision tree. The decision tree provides guidance for transformation engineers in selecting the M2M language and engine that best fits a given M2M scenario. Therefore, this thesis allows a transformation engineer to make a reasonable selection of suitable M2M approaches, languages, and engines.

¹ This thesis considers the combination of Java and the EMF framework to be an M2M language.

8.2 Knowledge Gained

The results of this thesis help transformation engineers in selecting a suitable M2M approach, language, and engine that best fits the requirements of a given M2M scenario. Transformation engineers can use the classification of M2M scenarios to identify these requirements and to characterize the scenario (cf. Section 3.5). The provided qualitative comparison of M2M approach/language/engine combinations helps them to determine whether required features are generally provided by the considered M2M approach/language/engine combinations (cf. Section 4.1). Finally, the derived decision tree (cf. Section 7.2) allows transformation engineers to apply reasonable rules to their selection process.

Furthermore, this thesis offers several insights and directions for future work to researchers interested in M2M transformations. Firstly, the thesis provides new and discusses existing classifications of the different M2M dimensions. For instance, the provided M2M scenario classification consolidates several findings of related work into a common model and Chapter 7 points to several features that can be added to the feature model by Czarnecki and Helsen [CH06] (e.g., debugging abilities of transformation engines, an automatically maintained trace model, and support for unit testing of transformations). Researchers are invited to investigate, extend, and apply the provided classifications as well as the evaluation results. In particular, researchers can include newly identified features into new or existing M2M transformation languages and engines.

Secondly, the thesis provides researchers with a detailed analysis of factors that influence the maintainability of M2M transformation specifications, approaches, languages, and engines (cf. Chapter 7). Researchers can, thus, more easily conduct empirical experiments that measure maintainability. It is, for instance, necessary to identify influencing factors when planning a controlled experiment. Furthermore, the thesis provides several insights for understanding the maintainability properties in the context of M2M transformations (cf. Chapter 7) which allows for more precise definitions of these properties.

Thirdly, researchers can profit from the described framework for assessing and comparing the quality of M2M transformations itself. On the one hand, they can externally validate the results gained by this thesis to strengthen the confidence in these results. For this, the GQM plan allows reproducing the measurements of this thesis. On the other hand, researchers now have a structured approach at hand which allows assessing and comparing quality properties. Therefore, they can (1) reinvestigate maintainability quality properties by, for instance, focusing on modularity only or (2) derive a GQM plan for other quality properties by applying the framework.


Finally, the provided scenario implementations in Java, QVT-O, and QVT-R (cf. the appendix in Section B.1) can be reused by transformation engineers as well as researchers. Transformation engineers benefit from the relatively small size and comparability of the scenarios as this makes it easier to learn a particular language. For instance, if a transformation engineer is already familiar with Java and wants to learn QVT-O, the engineer can simply compare corresponding Java and QVT-O implementations with each other. By this, the engineer particularly sees and learns how QVT-O language constructs can be applied.

Researchers, on the other hand, can inspect and alter the scenarios to compare different design alternatives. Accordingly, they can apply the framework to see how different design decisions influence quality properties.

8.3 Future Work

This thesis opens up several possibilities for future work. One main area of future work is to cope with the identified threats to validity (cf. Section 7.4). To improve the statistical power, future work should, therefore, increase the number of (1) experiment participants, (2) investigated scenarios, (3) considered M2M approaches, languages, and engines, and (4) covered quality properties.

Future empirical studies that involve questionnaire participants should not apply the questionnaire as used by this thesis. Throughout this thesis, several flaws regarding this questionnaire have been identified (e.g., in Section 7.4). These observations suggest a redesign of the questionnaire that, for instance, does not apply the direct-rating method (cf. Section 6.2.3). However, future work is not restricted to questionnaires as, e.g., controlled experiments can be applied, too. This thesis can serve as a basis for these experiments. For instance, different experiment groups could be given the task to execute the scenarios of this thesis by using different engines. Each group gets a different execution order regarding the engines. The experiment results can show which engine provides the best usability and, thus, cover an additional quality property.
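One simple way to give each experiment group a different execution order is a cyclic Latin square, in which every engine appears exactly once at every position across the groups. The following Python sketch is only an illustration: this thesis does not prescribe a counterbalancing scheme, the function name latin_square_orders is hypothetical, and the engine names are taken from Figure 7.1.

```python
from typing import List


def latin_square_orders(engines: List[str]) -> List[List[str]]:
    """Return one execution order per group as a cyclic Latin square.

    Group g starts with engine g and then cycles through the rest, so
    each engine occurs once per group and once per position.
    """
    n = len(engines)
    return [[engines[(g + i) % n] for i in range(n)] for g in range(n)]


# Engines considered in this thesis (cf. the set C in Figure 7.1).
ENGINES = ["Medini QVT", "QVT Operational", "JVM"]
ORDERS = latin_square_orders(ENGINES)
# Group 1: Medini QVT, QVT Operational, JVM
# Group 2: QVT Operational, JVM, Medini QVT
# Group 3: JVM, Medini QVT, QVT Operational
```

A cyclic square balances the position of each engine but not which engine directly precedes which; if such carry-over effects matter, a different design would be needed.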

Additional M2M scenarios can either be new scenarios or variations of the considered scenarios. Possible new scenarios are, for instance, the transformation scenarios from (1) .NET’s CLR byte code to Java’s JVM byte code as implemented and compared for QVT-R and QVT-O by Guduric et al. [GPT09], (2) UML to EJB as provided by Nolte for QVT-R [Nol09] as well as for QVT-O [Nol10, pp. 117-141], and (3) the transformations from an M2M language’s AST to a quality metrics model as implemented within the “M2M Quality” tool (cf. Section 6.2.2). The first scenario is especially interesting as Guduric et al. [GPT09] compare QVT-O and QVT-R on a subjective basis instead of following a structured approach. The MOM scenario can also be reconsidered to investigate a scenario that specifies a higher-order transformation (cf. Section 6.1). Variations of existing scenarios, on the other hand, allow investigating design alternatives and influences of certain factors. For instance, a metamodel could be systematically altered by increasing its number of metaclasses. This can, e.g., allow studying the impact on the execution time of transformation engines (Bosems [Bos11] investigates this issue in his Master’s thesis).

Examples for further M2M languages are ATL [Ecl12a], Xtend [Ecl12d], and triple graph grammars [Sch95, KS06]. Besides Java, QVT-O, and QVT-R, these languages are commonly applied within the MDSD community [SK09].

Another area of future work targets the extension and improvement of existing M2M languages and engines. For instance, the thesis concludes that QVT-R is especially suited for implementing scenarios that need to retain user modifications (cf. Section 7.2). This is due to the lack of dedicated support for this feature in QVT-O and Java. Therefore, the QVT Operational engine could be enriched by a new construct providing support for this feature. Another possibility is to implement a dedicated Java library for transformations which includes this feature.

Finally, the answers to the GQM plan provide a rich source of recommendations for future work (Section 7.1 and Section C.5). These recommendations focus on investigating maintainability properties of M2M transformations further.


Appendix A

Feature Models

Feature models are models to describe and analyze domains by means of a hierarchy of variable and common features of the domain [CH06]. Feature models were introduced by Kang et al. [KCH+90] and later extended and described in detail by Czarnecki and Eisenecker [CE00, Chap. 4]. Furthermore, Czarnecki et al. [CHE04] proposed the “cardinality-based feature modeling notation” which extends feature models by cardinalities for features. Czarnecki et al. [CHE05] formalized these concepts afterwards and applied the cardinality-based feature modeling in several of their papers (e.g., in [KC05] and [CH06]).

This thesis uses the cardinality-based feature modeling notations as appliedby Kim and Czarnecki [KC05] and Czarnecki and Helsen [CH06] for two reasons:(1) this thesis heavily refers to the work of Czarnecki and Helsen [CH06] and reusestheir feature models that, in particular, include cardinalities (cf. Chapter 3), and(2) the feature models introduced within this thesis also require cardinalities (e.g.,the feature model illustrated in Figure 3.18).

Element     Description
F           Root feature F
F           Mandatory feature F (cardinality [1..1])
F           Optional feature F (cardinality [0..1])
F [m..n]    Mandatory clonable feature F (cardinality [m..n] with 0 ≤ m ≤ n and n > 1)
F           Grouped feature F (cardinality [0..1])
F           Feature model reference F
            XOR-feature-group (cardinality 〈1–1〉)
            OR-feature-group (cardinality 〈1–k〉 where k is the group size)

Table A.1: Cardinality-Based Feature Modeling Notation (derived from [CH06])

Therefore, Table A.1 describes the applied elements for visualizing feature models, i.e., their concrete syntax. This thesis makes use of eight distinct feature elements: (1) root, (2) mandatory, (3) optional, (4) mandatory clonable, and (5) grouped features, (6) feature model references, as well as (7) XOR-feature-groups and (8) OR-feature-groups.

The root feature is the only feature which has no parent feature. The root and the other features can contain child features which can either be solitary or grouped. Firstly, solitary features include the mandatory feature which is always a feature of the parent feature, the optional feature which can be a feature of the parent feature, and the mandatory clonable feature which occurs at least once as a child feature of the parent.

Secondly, grouped features are optional features of a group of features. Groups of features are either specified via an XOR-feature-group, which allows exactly one feature to be part of the parent, or via an OR-feature-group, which allows multiple features to be part of the parent. Table A.1 also shows the cardinalities induced by these descriptions. Finally, a feature model reference links to sub-feature models: if it is used as a leaf feature, it links to another feature model representing its sub-features. This feature model is identified via a feature model reference used as the root of the feature model. A diagram that applies these elements for feature models is referred to as a feature diagram.
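The cardinalities described above can be illustrated with a small sketch. It is not part of the thesis' tooling: the class names, the `violations` helper, and the example feature model are hypothetical, and a configuration is simplified to a mapping from feature names to instance counts (feature model references are left out).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Feature:
    """A feature with cardinality [low..high]; high=None means unbounded."""
    name: str
    low: int = 1
    high: Optional[int] = 1
    children: List["Feature"] = field(default_factory=list)  # solitary features
    groups: List["Group"] = field(default_factory=list)      # feature groups


@dataclass
class Group:
    """XOR groups admit exactly one member, OR groups between 1 and k members."""
    kind: str  # "xor" or "or"
    members: List[Feature]

    def bounds(self) -> Tuple[int, int]:
        return (1, 1) if self.kind == "xor" else (1, len(self.members))


def violations(parent: Feature, counts: Dict[str, int]) -> List[str]:
    """List the cardinality violations below a selected parent feature."""
    found: List[str] = []
    for child in parent.children:
        n = counts.get(child.name, 0)
        if n < child.low or (child.high is not None and n > child.high):
            found.append(f"{child.name}: {n} outside [{child.low}..{child.high}]")
        if n > 0:
            found += violations(child, counts)
    for group in parent.groups:
        chosen = [m for m in group.members if counts.get(m.name, 0) > 0]
        low, high = group.bounds()
        if not low <= len(chosen) <= high:
            found.append(f"{group.kind} group under {parent.name}: "
                         f"{len(chosen)} outside <{low}-{high}>")
        for member in chosen:
            found += violations(member, counts)
    return found


# Hypothetical feature model, not one of the models used in this thesis.
root = Feature(
    "Transformation",
    children=[
        Feature("Source"),                  # mandatory [1..1]
        Feature("Tracing", low=0, high=1),  # optional [0..1]
        Feature("Rule", low=1, high=3),     # mandatory clonable [1..3]
    ],
    groups=[Group("xor", [Feature("Declarative", 0, 1),
                          Feature("Imperative", 0, 1)])],
)

print(violations(root, {"Source": 1, "Rule": 2, "Declarative": 1}))  # prints []
```

Selecting both members of the XOR group, or omitting the mandatory feature, yields one reported violation each.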


Appendix B

Deliverables

B.1 Case Study Implementations

The implementations of the different case studies (cf. Section 3.5.3) specified via Java, QVT-O, and QVT-R come with the deliverables of this thesis. They are included within the de.upb.m2m.quality.casestudies plug-in. The Java implementation resides under the package de.upb.m2m.quality.casestudies.javaemf. The QVT-O and QVT-R implementations can be found within the qvto and qvtr folder, respectively.

Along with the scenario implementations come the respective metamodels and example models. The metamodels are included within the metamodel and the models within the model folder of de.upb.m2m.quality.casestudies. In addition to this, Figure B.1 and Figure B.2 illustrate the involved metamodels of the “SimpleUML to SimpleRDBMS” case study and Figure B.3 the metamodel used within the “Medini QVT’s Shapes-Tutorial” case study as this thesis is especially interested in these (cf. Section 6.1).

Figure B.1: SimpleUML Metamodel


Figure B.2: SimpleRDBMS Metamodel


Figure B.3: Shapes Metamodel

B.2 Measurement Tool “M2M Quality”

Figure B.4 shows the component diagram for the “M2M Quality” tool as introduced in Section 6.2.2. The illustrated components provide the capabilities to execute the process shown in Figure 6.1.

The main component is the tool “M2M Quality” itself. It provides the means to measure several metrics within Java, QVT-O, and QVT-R. The output can be model instances of the quality model included in M2M Quality as well as a table conforming to the GQM plan of this thesis. To work properly, M2M Quality requires the Palladio Workflow Engine¹ for structuring the process of Figure 6.1 within the implementations as well as a component for each supported language to parse and create an AST of the respective transformation implementation (MoDisco² for Java, QVT Operational for QVT-O, and Medini QVT for QVT-R).

The M2M Quality component owns nine basic components. These components are described in the following.

UI provides the user of M2M Quality with the capabilities to configure and execute M2M Quality run configurations within the Eclipse IDE. A user can configure the location of Java, QVT-O, and QVT-R files as well as the locations of trace files and output files in general. When the run configuration is executed, UI calls the Configurator and passes the run configurations to it.

[Component diagram: M2M Quality with its basic components UI, Configurator, Parser, Languages, Java, QVT-O, QVT-R, Model, and GQM; required external components: Palladio Workflow Engine, MoDisco, QVT Operational, and Medini QVT]

Figure B.4: Component Diagram of M2M Quality

¹ http://sdqweb.ipd.kit.edu/wiki/Palladio_Workflow_Engine (last retrieved 2012-10-17).

² http://www.eclipse.org/MoDisco/ (last retrieved 2012-10-17).

The Configurator creates and executes the workflow of the process shown in Figure 6.1. Each action of the process is implemented within a separate workflow job. The Configurator uses the Palladio Workflow Engine as underlying framework for implementing the workflow. For receiving language-specific information (e.g., the name of the M2M language currently parsed or the language-specific parser), it calls the Languages component. For executing the parsing, it requires the Parser component.

The Languages component provides the interface to language-specific information. A language-specific component can register itself to Languages such that Languages is capable of receiving language-specific information (e.g., the language’s name). Furthermore, Languages provides methods to count different lines of code metrics within any M2M transformation language. Registered language-specific components only have to provide the information which comments the respective language supports (e.g., “//” are one-line comments and “/* ... */” multiple-line comments in Java).

Java, QVT-O, and QVT-R are examples for language-specific components. Each of these components makes use of an appropriate framework to parse and create an AST (as described above). Furthermore, each language-specific component needs to know the M2M Quality Model component as it must specify the transformations that map from its AST representation to the quality model of M2M Quality.

Model contains the quality model of M2M Quality. The model allows storing metric measurements as a model instance.

Finally, the GQM component realizes the M2T transformation to the GQM table. For this, it uses the M2T language Xpand³. Currently, it needs to be manually invoked by the user of M2M Quality.

³ http://www.eclipse.org/modeling/m2t/?project=xpand (last retrieved 2012-09-06).
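The division of labor between Languages and the language-specific components, as described above, can be sketched as follows. This is a simplified illustration and not the code of M2M Quality: `CommentSyntax`, `register`, and `count_lines` are hypothetical names, and the counting rule is an assumption (a line holding code plus a trailing comment counts as code, and comment markers inside string literals are not detected).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CommentSyntax:
    """The only language-specific input: the comment markers of a language."""
    line: str                                # one-line comment marker
    block: Optional[Tuple[str, str]] = None  # (open, close) of multi-line comments


REGISTRY: Dict[str, CommentSyntax] = {}


def register(language: str, syntax: CommentSyntax) -> None:
    """Called by a language-specific component to announce itself."""
    REGISTRY[language] = syntax


def count_lines(source: str, language: str) -> Dict[str, int]:
    """Classify every line as code, comment, or blank (language independent)."""
    syntax = REGISTRY[language]
    counts = {"code": 0, "comment": 0, "blank": 0}
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            counts["comment"] += 1
            if syntax.block and syntax.block[1] in line:
                in_block = False
        elif not line:
            counts["blank"] += 1
        elif line.startswith(syntax.line):
            counts["comment"] += 1
        elif syntax.block and line.startswith(syntax.block[0]):
            counts["comment"] += 1
            # Stay in comment mode unless the block is closed on the same line.
            in_block = syntax.block[1] not in line[len(syntax.block[0]):]
        else:
            counts["code"] += 1
    return counts


# Java registers the comment syntax named in the text above.
register("Java", CommentSyntax(line="//", block=("/*", "*/")))

sample = "\n".join([
    "// header",
    "",
    "int x = 1; // trailing",
    "/* multi",
    " line */",
    "return x;",
])
print(count_lines(sample, "Java"))  # {'code': 2, 'comment': 3, 'blank': 1}
```

Other languages would register their own comment markers in the same way; the counting logic itself stays untouched.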


B.3 Questionnaire

The questionnaire handed out to the participants is available at https://docs.google.com/spreadsheet/viewform?formkey=dDF3X0phQk9BWjVOenZuWlhYb1k0QlE6MQ (last retrieved 2012-10-17). The questionnaire provided below reproduces this questionnaire.

Description

The goal of our research is to make the quality of model-to-model transformations in Java (with EMF), QVT-R, and QVT-O measurable and comparable. To investigate the influences on the quality of Java, QVT-R, and QVT-O model transformations, we have to know what their perceived quality is. Therefore, we would like you to participate in our experiment. We would like you, as an MDSD expert, to evaluate the quality of (a number of) Java, QVT-R, and QVT-O model transformations manually. The strength of the results of this experiment will benefit from the amount of model transformations that are evaluated. Filling out this survey will take approximately 15 to 20 minutes of your time.

Instructions

Download the material needed for this survey first: https://docs.google.com/uc?id=0B4_NA9DeRAbMaHBkQnpHY19FWk0&export=download

You were assigned to some of the transformation scenarios provided by the material. You only have to evaluate those scenarios. Start with the first one assigned to you and follow the instructions provided below.

Start with the first scenario assigned to you and follow these instructions:

1. Open the scenario implementations in Java, QVT-R, and QVT-O.

2. Open the metamodel(s) required for the scenario (Copy, Rule1, . . . , Rule12 require the “Shapes” metamodel; UML2RDBMS the “SimpleUML” and “SimpleRDBMS” metamodels).

3. Evaluate the implementation first for Java, then in QVT-R, and finally in QVT-O.

4. Fill out the questionnaire provided below. Note the following:

• The term “transformation language” refers to the respective model-to-model language in general.

• The terms “transformation” as well as “transformation implementation” refer to the implementation provided by the material.

• The term “transformation scenario” refers to what has to be transformed and not how the scenarios are implemented.


5. Submit the questionnaire.

6. Continue at “1.” with the next scenario assigned to you (until you have evaluated every assigned scenario).

Survey

Background questions

1. Please enter your name.

2. Please enter your e-mail address. We might use this to contact you with regard to your answers to the open questions.

3. How would you rate your knowledge of Java (with EMF)?

Scale: 1 (Very Low) to 5 (Very High)

4. How would you rate your knowledge of QVT-R?


5. How would you rate your knowledge of QVT-O?


6. What is the number of the transformation scenario you are evaluating?

7. Please enter the name of the transformation scenario you are evaluating.

8. How familiar are you with the transformation scenario you are analyzing?

Scale: 1 (I never heard about it) to 5 (I designed it)


Evaluation questions

1. To what degree is the transformation composed of distinct components such that a change to one component has minimal impact on the other components?

Scale, rated separately for Java, QVT-R, and QVT-O: 1 (Very Low) to 7 (Very High)

2. To what degree is the transformation implemented in a uniform manner?



3. How would you rate the reusability of the transformation?



4. How would you rate the analyzability of the transformation?



5. To what degree is it possible to alter the transformation without introducingdefects?



6. How would you rate the appropriateness recognizability (also known as understandability) of the transformation?




7. To what degree can the transformation language be taught to use it with effectiveness, efficiency, freedom from risk, and satisfaction?



8. How would you rate the consistency of the transformation?



9. To what degree is it possible to diagnose the transformation for deficiencies?



10. How would you rate the modularity of the transformation?



11. How would you rate the learnability of the transformation language?



12. To what degree is it possible to recognize whether the transformation is appropriate for the transformation scenario?




13. How would you rate the modifiability of the transformation?



14. To what degree is it possible to apply the transformation rules the transformation specifies also in other scenarios?



15. Special task (if you currently are evaluating RuleX ∈ {Rule1, . . . , Rule12}): Assume the implementation of RuleX is created on the basis of the “Copy” implementation (open it in the respective language). Count and name newly introduced inconsistencies (if any) in the implementation of RuleX. Think of elements such as variables, rules, etc.

Java:

QVT-R:

QVT-O:


Open questions

Please answer these questions at least once. If you fill out this questionnaire a second time, you can leave the fields below empty. You are allowed to answer in general or to give examples. You do not need to reason for your opinion even though this might be helpful.

1. What affects modularity of a model transformation in your opinion (either positively or negatively)?

2. What affects reusability of a model transformation in your opinion (either positively or negatively)?

3. What affects analyzability of a model transformation in your opinion (either positively or negatively)?

4. What affects modifiability of a model transformation in your opinion (either positively or negatively)?

5. What affects consistency of a model transformation in your opinion (either positively or negatively)?

6. What affects appropriateness recognizability (also known as understandability) of a model transformation in your opinion (either positively or negatively)?

7. What affects learnability of a model transformation in your opinion (either positively or negatively)?

8. If you have any other remarks, please put them in the text field below.

Appendix C

Results

This chapter provides the results of the measurement as planned by the GQM plan (cf. Chapter 5). Section C.1 gives the results of the questionnaire. Next, Section C.2 presents the results which were measured without requiring a scenario implementation. Thereafter, Section C.3 provides a complete measurement table for each scenario and Section C.4 provides diagrams illustrating the measurement values of the measurement tables. Section C.5 concludes the chapter by providing a detailed evaluation of metric measurements and stated hypotheses.

C.1 Questionnaire Results

This section presents the results collected within the questionnaire. Section C.1.1 provides the values for the different quality properties as evaluated by the questionnaire participants for each scenario and M2M language. Afterwards, Section C.1.2 gives the answers of the participants to the open questions of the questionnaire, i.e., answers to the question which factors affect the different quality properties.

C.1.1 Average X Points in Questionnaire (GM1.2.i)

Table C.1 shows the average values for the different quality properties as evaluated by the questionnaire participants. For each quality property, two questions Q1 and Q2 were asked (cf. Chapter 5). The average of these questions is built over the number of collected answers to these questions per scenario and M2M language, respectively. Table C.1 presents these average values in the “Q1” and “Q2” rows of the respective quality property. The “Average” row is the average of the respective Q1 and Q2 values (i.e., the sum of the value of Q1 and Q2 divided by two). Each of these “Average” rows is additionally illustrated by a diagram (Figure C.28 to Figure C.34). The “Std. Dev.” row gives the standard deviation corresponding to the average value; it is calculated based on Bessel’s correction (cf. Section 5.4.1).
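The computation of one table cell group can be sketched as follows. The ratings are made up, and one point is an assumption because the text above does not spell it out: the standard deviation is taken here over the per-participant means of the two answers. `statistics.stdev` applies Bessel's correction, i.e., it divides by n − 1.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def summarize(q1: Sequence[float], q2: Sequence[float]) -> Dict[str, float]:
    """Compute the Q1, Q2, Average, and Std. Dev. values for one scenario
    and one M2M language; q1[i] and q2[i] belong to the same participant."""
    if len(q1) != len(q2) or len(q1) < 2:
        raise ValueError("need paired answers from at least two participants")
    q1_avg, q2_avg = mean(q1), mean(q2)
    # Assumption: the deviation is measured across participants.
    per_participant = [(a + b) / 2 for a, b in zip(q1, q2)]
    return {
        "Q1": q1_avg,
        "Q2": q2_avg,
        "Average": (q1_avg + q2_avg) / 2,
        "Std. Dev.": stdev(per_participant),  # sample deviation, divides by n - 1
    }


# Two hypothetical participants answering both questions on the 1 to 7 scale.
row = summarize([3, 4], [3, 4])
print({key: round(value, 2) for key, value in row.items()})
# {'Q1': 3.5, 'Q2': 3.5, 'Average': 3.5, 'Std. Dev.': 0.71}
```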


Quality Property  Name       UML2RDBMS           Copy                Rule1               Rule2
                  Type       Java  QVT-O QVT-R   Java  QVT-O QVT-R   Java  QVT-O QVT-R   Java  QVT-O QVT-R
Modularity        Q1         3.67  5.00  5.00    4.78  3.67  6.11    3.50  5.00  6.50    3.50  3.50  6.50
                  Q2         4.00  4.67  5.00    3.78  3.67  5.56    3.50  4.00  6.50    4.50  4.50  6.50
                  Average    3.83  4.83  5.00    4.28  3.67  5.83    3.50  4.50  6.50    4.00  4.00  6.50
                  Std. Dev.  0.76  1.61  1.00    2.11  3.16  1.09    0.71  1.41  0.71    0.00  2.12  0.71
Reusability       Q1         3.67  4.33  4.67    4.44  5.22  4.22    4.50  5.00  4.50    4.00  5.50  4.50
                  Q2         2.67  3.67  3.67    4.33  4.89  3.89    3.50  5.00  4.50    4.50  6.00  4.50
                  Average    3.17  4.00  4.17    4.39  5.06  4.06    4.00  5.00  4.50    4.25  5.75  4.50
                  Std. Dev.  1.04  1.00  1.15    1.36  2.04  2.16    0.71  0.71  3.54    0.35  0.35  3.54
Analyzability     Q1         3.33  4.67  5.67    3.67  5.11  4.22    4.00  6.50  5.00    4.50  6.00  5.00
                  Q2         2.67  5.00  4.67    3.78  5.67  4.56    4.50  6.50  5.00    4.00  6.00  4.50
                  Average    3.00  4.83  5.17    3.72  5.39  4.39    4.25  6.50  5.00    4.25  6.00  4.75
                  Std. Dev.  0.50  1.15  0.29    1.94  2.62  2.00    1.06  0.71  1.41    0.35  0.00  1.77
Modifiability     Q1         4.33  5.00  5.00    3.67  3.00  5.67    3.00  4.50  6.50    3.50  5.00  6.50
                  Q2         4.67  5.67  4.33    4.22  3.22  4.22    3.00  4.50  6.00    3.00  4.50  5.50
                  Average    4.50  5.33  4.67    3.94  3.11  4.94    3.00  4.50  6.25    3.25  4.75  6.00
                  Std. Dev.  1.50  1.15  1.53    1.81  1.65  1.61    1.41  2.83  0.35    1.06  1.77  0.71
Consistency       Q1         3.67  5.67  5.67    4.78  6.11  5.67    3.50  5.00  6.50    3.50  4.00  6.50
                  Q2         4.67  5.00  5.33    4.78  6.89  6.67    3.50  5.00  6.50    4.00  4.00  6.50
                  Average    4.17  5.33  5.50    4.78  6.50  6.17    3.50  5.00  6.50    3.75  4.00  6.50
                  Std. Dev.  1.04  1.15  0.87    1.72  1.00  1.09    0.00  0.71  0.71    1.06  1.41  0.71
Appropr. Recogn.  Q1         4.00  4.67  5.33    4.22  6.78  3.33    4.50  6.50  6.00    4.00  6.00  4.50
                  Q2         3.33  5.00  5.33    4.22  6.44  3.33    3.50  6.50  4.50    4.50  6.00  4.50
                  Average    3.67  4.83  5.33    4.22  6.61  3.33    4.00  6.50  5.25    4.25  6.00  4.50
                  Std. Dev.  0.58  0.76  0.58    1.52  0.86  1.66    0.71  0.71  1.77    0.35  0.00  2.12
Learnability      Q1         3.33  5.33  4.00    4.33  5.89  3.56    4.50  6.50  5.00    3.50  6.00  4.00
                  Q2         3.67  6.00  4.33    3.78  6.00  3.78    3.50  6.00  4.50    3.50  5.50  4.50
                  Average    3.50  5.67  4.17    4.06  5.94  3.67    4.00  6.25  4.75    3.50  5.75  4.25
                  Std. Dev.  1.32  0.29  1.44    1.31  0.92  1.30    0.71  1.06  1.77    0.71  0.35  1.77

Quality Property  Name       Rule3               Rule4               Rule5               Rule6               Rule7
                  Type       Java  QVT-O QVT-R   Java  QVT-O QVT-R   Java  QVT-O QVT-R   Java  QVT-O QVT-R   Java  QVT-O QVT-R
Modularity        Q1         4.50  5.50  5.00    4.67  5.33  6.67    6.00  5.33  5.33    5.33  5.00  6.33    5.33  4.00  6.33
                  Q2         3.50  4.50  5.50    6.33  6.00  5.67    6.00  5.67  5.33    5.67  4.67  6.33    5.33  4.33  6.33
                  Average    4.00  5.00  5.25    5.50  5.67  6.17    6.00  5.50  5.33    5.50  4.83  6.33    5.33  4.17  6.33
                  Std. Dev.  0.00  1.41  2.47    1.80  1.15  0.29    1.00  1.32  1.15    1.80  2.02  0.58    1.53  0.76  0.58
Reusability       Q1         4.50  6.00  4.00    6.00  4.33  5.67    5.67  5.00  6.00    5.33  3.33  6.67    3.33  3.67  5.33
                  Q2         4.00  4.50  5.00    6.00  5.33  6.00    5.67  5.00  4.67    5.00  3.67  6.00    3.67  3.33  5.00
                  Average    4.25  5.25  4.50    6.00  4.83  5.83    5.67  5.00  5.33    5.17  3.50  6.33    3.50  3.50  5.17
                  Std. Dev.  0.35  0.35  2.83    0.50  1.53  0.76    0.58  1.00  0.76    1.04  0.87  0.58    1.32  1.50  1.04
Analyzability     Q1         4.50  6.00  5.00    4.33  5.67  3.33    6.00  5.33  3.00    6.00  6.33  4.33    3.33  4.67  4.00
                  Q2         4.50  6.00  4.50    4.67  5.67  3.33    5.67  5.67  3.00    5.67  6.33  4.33    4.67  4.67  3.67
                  Average    4.50  6.00  4.75    4.50  5.67  3.33    5.83  5.50  3.00    5.83  6.33  4.33    4.00  4.67  3.83
                  Std. Dev.  0.71  0.00  1.77    2.18  0.58  2.31    0.29  0.50  2.00    0.29  0.58  1.76    1.50  1.15  0.76
Modifiability     Q1         4.50  6.00  4.50    6.00  3.67  3.67    6.00  4.67  3.33    6.00  4.67  3.67    5.33  4.00  4.33
                  Q2         4.00  5.50  4.00    5.00  4.33  3.33    6.00  4.67  3.00    5.67  4.00  4.67    4.67  3.67  3.67
                  Average    4.25  5.75  4.25    5.50  4.00  3.50    6.00  4.67  3.17    5.83  4.33  4.17    5.00  3.83  4.00
                  Std. Dev.  0.35  0.35  2.47    1.80  1.80  3.04    0.87  1.44  2.02    1.04  2.02  2.02    1.32  2.02  1.73
Consistency       Q1         4.50  5.50  5.00    4.67  6.00  6.33    6.00  6.00  6.00    5.33  5.67  6.33    5.33  5.33  6.67
                  Q2         4.00  5.50  5.50    5.33  6.00  6.67    5.67  5.67  6.00    5.33  6.00  6.67    5.67  6.00  6.33
                  Average    4.25  5.50  5.25    5.00  6.00  6.50    5.83  5.83  6.00    5.33  5.83  6.50    5.50  5.67  6.50
                  Std. Dev.  0.35  0.71  1.77    2.65  1.00  0.50    1.26  1.26  1.00    2.08  1.26  0.50    1.32  1.26  0.87
Appropr. Recogn.  Q1         5.00  6.50  5.00    5.33  6.00  3.33    6.00  5.67  3.00    6.00  6.33  5.00    5.00  5.00  3.67
                  Q2         4.50  6.50  5.50    4.67  4.67  5.67    5.00  5.33  3.67    5.00  6.00  4.00    4.67  4.67  4.67
                  Average    4.75  6.50  5.25    5.00  5.33  4.50    5.50  5.50  3.33    5.50  6.17  4.50    4.83  4.83  4.17
                  Std. Dev.  0.35  0.71  1.77    0.50  0.76  1.80    0.50  0.50  2.25    0.50  0.29  0.87    0.29  0.76  0.29
Learnability      Q1         4.50  6.00  4.50    4.33  5.00  3.67    4.33  5.00  2.67    4.33  5.00  2.67    4.67  3.67  2.67
                  Q2         4.50  6.00  4.50    3.33  5.67  2.67    5.00  5.67  3.00    4.33  5.67  3.00    4.33  4.00  2.67
                  Average    4.50  6.00  4.50    3.83  5.33  3.17    4.67  5.33  2.83    4.33  5.33  2.83    4.50  3.83  2.67
                  Std. Dev.  0.71  0.71  2.12    0.76  0.76  1.76    0.76  0.58  1.26    0.58  1.53  1.61    0.87  1.76  1.53

Quality Property  Name       Rule8               Rule9               Rule10              Rule11              Rule12
                  Type       Java  QVT-O QVT-R   Java  QVT-O QVT-R   Java  QVT-O QVT-R   Java  QVT-O QVT-R   Java  QVT-O QVT-R
Modularity        Q1         4.00  4.67  5.33    4.75  5.75  5.75    4.00  4.00  5.33    3.33  2.33  5.67    3.00  2.33  5.67
                  Q2         3.67  4.67  5.00    4.50  5.50  4.75    4.67  4.67  4.67    3.00  2.00  4.67    3.33  2.67  5.00
                  Average    3.83  4.67  5.17    4.63  5.63  5.25    4.33  4.33  5.00    3.17  2.17  5.17    3.17  2.50  5.33
                  Std. Dev.  2.08  1.15  1.04    1.25  0.48  0.65    1.26  1.04  0.87    0.58  1.26  0.58    1.15  1.80  0.76
Reusability       Q1         2.67  2.67  2.67    3.25  4.00  4.00    2.67  3.67  4.33    3.00  2.00  4.00    1.67  1.67  3.00
                  Q2         2.33  2.00  2.33    3.75  5.00  4.75    4.33  4.33  4.00    2.67  2.67  4.67    2.67  2.33  3.67
                  Average    2.50  2.33  2.50    3.50  4.50  4.38    3.50  4.00  4.17    2.83  2.33  4.33    2.17  2.00  3.33
                  Std. Dev.  1.32  1.53  1.80    1.22  1.08  1.11    1.32  1.50  1.44    1.53  1.53  1.61    0.29  0.50  2.02
Analyzability     Q1         3.33  4.67  2.00    5.50  5.75  4.75    4.00  2.33  3.33    5.00  4.33  3.00    5.00  4.67  3.33
                  Q2         3.33  4.33  2.33    5.75  6.00  5.00    5.00  4.67  2.67    5.33  4.33  3.00    4.33  4.33  2.67
                  Average    3.33  4.50  2.17    5.63  5.88  4.88    4.50  3.50  3.00    5.17  4.33  3.00    4.67  4.50  3.00
                  Std. Dev.  1.53  1.80  0.76    0.48  1.03  0.25    1.73  1.00  1.32    1.53  2.08  1.00    1.26  2.00  0.87
Modifiability     Q1         3.33  2.67  2.33    4.50  5.75  4.75    4.00  4.33  3.00    3.67  3.67  4.33    4.67  5.00  3.33
                  Q2         3.67  3.33  2.67    4.75  5.50  4.75    4.33  4.33  3.33    4.33  4.33  3.67    3.67  4.33  3.67
                  Average    3.50  3.00  2.50    4.63  5.63  4.75    4.17  4.33  3.17    4.00  4.00  4.00    4.17  4.67  3.50
                  Std. Dev.  2.50  2.00  1.32    1.75  0.48  0.65    0.76  1.04  0.58    1.80  2.00  0.87    1.26  1.15  1.00
Consistency       Q1         4.00  4.33  5.67    4.75  4.75  4.25    4.00  4.00  3.67    3.33  3.67  4.67    3.00  3.33  4.00
                  Q2         4.67  5.33  6.00    4.50  5.75  6.50    3.33  3.67  5.00    2.67  3.00  5.33    3.33  3.33  5.33
                  Average    4.33  4.83  5.83    4.63  5.25  5.38    3.67  3.83  4.33    3.00  3.33  5.00    3.17  3.33  4.67
                  Std. Dev.  2.75  1.61  0.29    0.85  0.65  0.95    1.15  1.89  1.04    0.87  1.04  0.87    0.76  1.04  1.26
Appropr. Recogn.  Q1         3.33  4.33  2.00    4.50  5.75  3.75    5.00  4.00  2.33    5.00  4.67  2.33    4.67  5.33  3.00
                  Q2         2.67  3.67  3.33    4.25  6.25  4.25    4.00  4.33  3.00    4.67  5.00  4.00    4.67  5.00  4.00
                  Average    3.00  4.00  2.67    4.38  6.00  4.00    4.50  4.17  2.67    4.83  4.83  3.17    4.67  5.17  3.50
                  Std. Dev.  1.32  1.73  1.53    1.75  1.08  0.71    0.00  1.04  1.61    1.61  1.04  0.58    1.53  1.26  1.73
Learnability      Q1         4.67  4.67  2.33    4.50  6.25  3.00    5.00  6.00  2.67    4.67  5.00  3.00    5.00  5.00  3.00
                  Q2         4.00  4.67  2.00    4.25  6.00  4.50    4.00  5.67  3.67    4.67  5.33  3.00    4.67  5.00  3.33
                  Average    4.33  4.67  2.17    4.38  6.13  3.75    4.50  5.83  3.17    4.67  5.17  3.00    4.83  5.00  3.17
                  Std. Dev.  0.58  0.58  1.26    1.60  0.85  1.26    1.32  0.29  1.26    1.53  1.44  1.00    1.26  1.73  0.76

Table C.1: Averaged Points of Quality Properties in Questionnaire



C.1.2 Answers to GQ2 in Questionnaire (GM2.11)

This section reproduces the questionnaire answers to the GQ2 questions (GM2.11) as provided by the questionnaire participants. For this, the section provides the answers per quality property via a list. Each list entry corresponds to the answer by exactly one participant regarding the respective question. In case a participant used line breaks or hyphens for creating a custom list within an answer, the corresponding list entry uses a sublist for expressing this.

Modularity: “What affects modularity of a model transformation in your opinion (either positively or negatively)?”

• – Model Navigation (positive) if you can access forward and backward you can better split the matching and even can choose the order of matchings.

– Readability (negative) if a separation makes it much more difficult to understand it is often not done.

• Modularity is negatively influenced by global data variable, e.g., hash maps or lists. Then, nearly all operations rely on these global containers such that the different parts cannot be split. It is positively influenced by the ability to have user-defined helper functions for repeating sub-tasks.

• If a transformation consists of distinct components then it is easier to change concrete components (like methods, relations, rules etc.) such that other components must not be changed.

• – Local analysability of sub-rules (+)

– Interdependent rules (-)

– Many calls to other rules (-)

• – +: few other method/transformation rule calls

– +: small amount of parameters

– +: division of larger rules into smaller ones (however, the points above are important in this case)

• – Number and size of rules

– Relation between rules

• In a relational language you seem to be forced to program tranformations [sic] in a modular way.

• Matching between meta-model constructs and transformation functions (positively), e.g., one function for each class in the meta-model

• – The ability to create transformation rules at arbitrary levels of granularity. This would allow for the creation of very fine-grained rules which could then be used to build bigger ones.

– Black box library functions, like in the copy scenario, affect the modularity in a negative way. It may be convenient to use them but they can only be used “as provided”.

• Neg: Direct dependencies between rules

Reusability: “What affects reusability of a model transformation in your opinion (either positively or negatively)?”

• – Bound to special classes (negative)

– Methode [sic] calling and nesting (positive)

• – Dependence to meta-model (-)

– Non-existing library concepts (-)

– No real good inheritance / module concept (-)

• – +: high modularity

– +: “abstract” transformation parts

• Again, global data variables, e.g., hash maps or lists. Then, the code cannot be reused in another transformation. Thus, it should be possible that each single rule only relies on one meta-model element on the source side (or the target side depending on the direction).

• If you can reuse components of a transformation (like methods, relations, rules etc.), you need less effort to create new transformation.

• Modularity :)

• Modularity positively affects reusability, because components can be reused for similar transformations with only minor adaptations.

• Independence of target/source meta-models (positively)

• Reusability requires a balanced transformation. It has to be understandable, so the user can understand what it does and if it fits his purpose. It has to be of general such that there are cases in which it can be applied besides the one that it was originally designed for. For that it may not be too large to avoid doing “too much”. It has to be large enough so that reusing it provides a reasonable time saving even when the transformation has to be found and understood first.

• Neg: Hard-coded model access (e.g., using EMF functions), worst when used inside the actual transformation methods

Analyzability: “What affects analyzability of a model transformation in your opinion (either positively or negatively)?”

• Binding order: if you have when clauses you have to read from back to front what makes it much more complicated.

• A trace model to analyze is good, a well-designed debugger is better, both is perfect. In addition, some control flow in the transformation helps identifying where the error occured [sic]. If the complete operationalization is hidden in the interpreter, it is hard to comprehend the effect of the transformation. Rich helper functions influence analyzability negatively because they hide part of the intend.

• If a transformation is analyzable then

– it is easier for developers to work with it

– it is easier to find errors in the transformation

• – Many lines of code (-)

– Many [sic] infrastructure code (-)

– Bad documentation (-)

– Missing pre- / post-conditions (-)

– Bad syntax (-)

• – -: complex flow within one transformation rule

– -: complex transformation rule call hierarchy

– +: documentability ;-)

• – Tool support

– Rate of LOC that actually contribute to the transformation itself (low in java, high in transformation languages)

• A clear mapping structure, like in QVT-r is easier to understand and analyze.

• – Use of external libraries (negatively)

– Ability to be debugged (positively)

• The length (which is greatly affected by the availability of dedicated library functions). A great example is the comparison of the Copy scenario in Java and QVT-O. The former is cluttered with far too many lines of code which do not contribute to the transformation itself. The latter is very simple.

• Pos: Modularity, good documentation

Modifiability: “What affects modifiability of a model transformation in your opinion (either positively or negatively)?”

• – Specialization can allow easier modifications for single elements, since other parts of the transformations will be not effected.

– Modularity can be helpful for the same reason.

• – No local analysability (-)

– No unit tests (-)

• – -: low analyzability

– -: side effects

• – Few files/functions (positively)

– Usage of external libraries (negatively)

• An automatically maintained trace model affects it positively. In addition, treating every meta-model element in a separate rule with an automatic dispatching based on the meta-model elements type would greatly increase modifiability.

• If a transformation is modifiable, you can change some elements like names of variables if you need it, such that the transformation stays the same.

• Modulatity [sic] again

• A modular structure makes it easier to find the places where to modify a transformation.

• Dedicated library functions affect the modifiability negatively. Its dead simple to understand the copy scenario in QVT-O but nothing can be done to modify it. Its easier to modify transformations that are built from a number of smaller building blocks which can be modified themselves. It stands to reason that relational transformations are a good candidate for that.

• Pos: Analyzability, modularity

Consistency: “What affects consistency of a model transformation in your opinion (either positively or negatively)?”

• – Naming

– Structure

– Using the same code for the same tasks

• – Naming conventions for functions (positively)

– Similar meta-model constructs are handled differently (negatively)

• Consistency depends often on analyzabiliy [sic] and recognizability. It normally comes only to inconsistencies if there is a lack or the transformation language lacks of expressivness [sic].

• Pro: Control flow

• If the code of a transformation is consistent, it is easier to read it.

• +: partition of transformation in consistent parts, i.e. one method/transformation rule per meta model element

• Mainly the programmer has influence on the consistency of a transformation – either positive or negative.

• The design of the language can greatly affect the consistency. If the language provides similar but distinguishable constructs for similar tasks that can greatly improve consistency.

• Pos: Similar calculations/relations are implemented in the same way, documentation describes what is actually implemented

Appropriateness Recognizability: “What affects appropriateness recognizability (also known as understandability) of a model transformation in your opinion (either positively or negatively)?”

• – Pos: concise syntax, hiding of repeated maintenance code (load and save of the models, initialization of the transformation)

– Neg: rich interpreter implementation hiding a lot a execution information, a huge set of keywords and short-cuts in the language

• – Short language concepts to query, to map, to access trace (+)

– Closeness to existing programming languages (-)

– Few but clear language concepts (+)

– Reuse of rule structure (+)

• – +: high analyzability

– -: large overhead of the programming/transformation language, due to its syntax

• – Neg: Size of the transformation (e.g., LoC)

– Pos: Modularity, good documentation

• Commenly [sic] used structures like method call makes it easier.

• If a transformation is understanable [sic] then it is easier for developers to work with this transformation if they did not created it.

• Again, I think the clear mapping structure of QVT-r helps to understand a transformation.

• Usage of external libraries (positively)

• In contrast to consistency, I think that appropriateness recognizability does not depend so much on the language design but on the language use, i.e., on the concrete transformation. A transformation can have the same effect but can be written either obscurely or clearly.

Learnability: “What affects learnability of a model transformation in your opinion (either positively or negatively)?”

• – Pos: integration with heavily used other standard languages (e.g., OCL), a suitable DSL (Java is no transformation language!)

– Neg: rich interpreter implementation hiding a lot a execution information, a huge set of keywords and short-cuts in the language

• If a transformation language is learnable then

– you need less time to learn it

– it is easier to understand existing transformations

– it is easier to create own transformations

• – Few but clear language concepts (+)

– Good, standardized documentation (+)

– Less infrastructure code (+)

• – +: good understandability

– +: “forced” structure of transformation

• – Way of thinking (imperative will be easier to learn for most of the programmers)

– Number of keywords

– Resemblance to existing well-established languages

• – Ability to be debugged (positively)

– Usage of a general purpose language (positively)

• Same as recognizability. And how easy the matching concept works.


• The expressiveness of a language negatively affects its learnability. A custom DSL for model transformations is easier to learn.

• I think the learnability is affected by properties of both, consistency and appropriateness recognizability. A consistent and understandable transformation is easier to learn (and teach by the way).

• Neg: Several possible ways of implementing a transformation exist

C.2 Results Independent of Scenario Implementations

This section presents all measurement results which are independent of a concrete scenario implementation. This includes the qualitative differences of M2M approach/language/engine combinations and M2M scenarios (Section C.2.1), learnability metrics related to the number of language constructs (Section C.2.2 and Section C.2.3), as well as learnability metrics related to the size of an M2M language documentation (Section C.2.4).

C.2.1 Qualitative Differences (GM2.1)

This section presents the measurement data for the metric “GM2.1” (cf. Section 5.4.2) which asks for the “qualitative differences identified in Chapter 4”.

Differences in M2M Approach/Language/Engine Combinations

The following list gives the qualitative differences in M2M approach/language/engine combinations as required for the metric “GM2.1”. Table 4.1 induces the listed sets.

∆ F(Java, QVT-R) = { Static Mode, Language Paradigm, Value Specification, Element Creation, Syntactic Separation, Multidirectionality, Application Conditions, Intermediate Structures, Parametrization, Reflection, Implicit Scheduling, Rule Iteration, Reuse Mechanisms, Target-Incrementality, Source-Incrementality, Preservation of User Edits, Directionality, Tracing }

∆ F(Java, QVT-O) = { Application Conditions, Intermediate Structures, Parametrization, Phasing, Tracing }

∆ F(QVT-R, QVT-O) = { Static Mode, Language Paradigm, Value Specification, Element Creation, Syntactic Separation, Multidirectionality, Intermediate Structures, Parametrization, Reflection, Implicit Scheduling, Rule Iteration, Reuse Mechanisms, Target-Incrementality, Source-Incrementality, Preservation of User Edits, Directionality }


The cardinalities of these sets are |∆ F(Java, QVT-R)| = 18, |∆ F(Java, QVT-O)| = 5, and |∆ F(QVT-R, QVT-O)| = 16.
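How such a difference set and its cardinality can be derived is sketched below in Python. This is a minimal illustration, not the procedure used for Table 4.1: the helper `feature_difference` and the placeholder classifications `tech_a` and `tech_b` (with feature names `f1` to `f3`) are assumptions introduced only for this example, while the three ∆ F sets are copied from the list above.

```python
def feature_difference(features_a, features_b):
    """Return the names of all features whose values differ between a and b."""
    names = set(features_a) | set(features_b)
    return {n for n in names if features_a.get(n) != features_b.get(n)}


# Difference sets exactly as listed in this section.
DELTA_JAVA_QVTR = {
    "Static Mode", "Language Paradigm", "Value Specification",
    "Element Creation", "Syntactic Separation", "Multidirectionality",
    "Application Conditions", "Intermediate Structures", "Parametrization",
    "Reflection", "Implicit Scheduling", "Rule Iteration", "Reuse Mechanisms",
    "Target-Incrementality", "Source-Incrementality",
    "Preservation of User Edits", "Directionality", "Tracing",
}
DELTA_JAVA_QVTO = {
    "Application Conditions", "Intermediate Structures", "Parametrization",
    "Phasing", "Tracing",
}
DELTA_QVTR_QVTO = DELTA_JAVA_QVTR - {"Application Conditions", "Tracing"}

# Hypothetical classifications; the real feature values come from Table 4.1.
tech_a = {"f1": "x", "f2": "y", "f3": "z"}
tech_b = {"f1": "x", "f2": "other", "f3": "z2"}

print(sorted(feature_difference(tech_a, tech_b)))  # ['f2', 'f3']
print(len(DELTA_JAVA_QVTR), len(DELTA_JAVA_QVTO), len(DELTA_QVTR_QVTO))  # 18 5 16
```

Representing each classification as a dictionary keeps the difference computation symmetric, which matches the unordered reading of ∆ F(A, B) used in the list.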

Differences in M2M Scenarios

The following list gives the qualitative differences in M2M scenarios as required for the metric “GM2.1”. Table 4.2 induces the listed sets. Some pair-wise differences are omitted as there would be too many possible combinations. Moreover, the Copy scenario and Rule1 to Rule12 have most features in common such that a complete pair-wise comparison would be of low value. Therefore, the differences between the Copy scenario and the other scenarios are sufficient.

∆ F(Copy, SimpleUML to RDBMS) = { Endogenous, Exogenous, Static Mode (In and Out vs. In/Out), Flattening, Refinement }

∆ F(Copy, Rule1) = { Mapping }

∆ F(Copy, Rule2) = { Flattening }

∆ F(Copy, Rule3) = { Refinement }


∆ F(Copy, Rule5) = { Abstraction }

∆ F(Copy, Rule6) = { Abstraction }

∆ F(Copy, Rule7) = { Duality }

∆ F(Copy, Rule8) = { Clustering }

∆ F(Copy, Rule9) = { Existing Target, Update, User Edit Preserving, Directionality (Unidirectional vs. Multidirectional) }


∆ F(Copy, Rule11) = { Endogenous, Static Mode (In/Out) }

∆ F(Copy, Rule12) = { Refactoring }

The cardinalities of these sets are |∆ F(Copy, SimpleUML to RDBMS)| = 4, |∆ F(Copy, Rule9)| = 4, |∆ F(Copy, Rule11)| = 2, and “1” for every remaining set.


C.2.2 Possible Language Constructs (Learn1.1.1)

The following list gives the keywords for Java [GJSB05][p. 21], QVT-O [Obj11a][pp. 137-138], and QVT-R [Obj11a][p. 36] as required for the metric “Learn1.1.1” (cf. Section 5.5.7).

Java Keywords = { abstract, assert, boolean, break, byte, case, catch, char, class, const, continue, default, do, double, else, enum, extends, final, finally, float, for, if, goto, implements, import, instanceof, int, interface, long, native, new, package, private, protected, public, return, short, static, strictfp, super, switch, synchronized, this, throw, throws, transient, try, void, volatile, while }

QVT-O Keywords = { Bag, Collection, Dict, OrderedSet, Sequence, Set, Tuple, List, abstract, access, and, any, assert, blackbox, break, case, class, collect, collectNested, collectOne, collectselect, collectselectOne, composes, compute, configuration, constructor, continue, datatype, default, derived, disjuncts, do, elif, else, end, endif, enum, except, exists, extends, exception, false, forAll, forEach, forOne, from, helper, if, implies, import, in, inherits, init, inout, intermediate, invresolve, invresolveIn, invresolveone, invresolveoneIn, isUnique, iterate, late, let, library, literal, log, main, map, mapping, merges, metamodel, modeltype, new, not, null, object, one, or, ordered, out, package, population, primitive, property, query, raise, readonly, references, refines, reject, resolve, resolveIn, resolveone, resolveoneIn, return, select, selectOne, sortedBy, static, switch, tag, then, transformation, true, try, typedef, unlimited, uses, var, when, where, while, with, xcollect, xmap, xor, xselect }

QVT-R Keywords = { checkonly, domain, enforce, extends, implementedby, import, key, overrides, primitive, query, relation, top, transformation, when, where }

The cardinalities of these sets are |Java Keywords| = 50, |QVT-O Keywords| = 117, and |QVT-R Keywords| = 15.

C.2.3 Applied Language Constructs (Learn1.1.2)

The following list gives the applied keywords over all scenarios for Java, QVT-O, and QVT-R as required for the metric “Learn1.1.2” (cf. Section 5.5.7). Additionally, the unapplied keywords were collected, too, as they could be easily derived from the former ones. For creating this list, every language implementation was searched for the keywords of Section C.2.2 (per language). For instance, the “abstract” keyword was not found via “grep -r 'abstract' .” in any of the Java implementations.


Java Applied Keywords = { assert, boolean, catch, class, double, else, extends, final, for, if, import, instanceof, int, new, package, private, public, return, static, this, throw, try, void, while }

Java Unapplied Keywords = { abstract, break, byte, case, char, const, continue, default, do, enum, finally, float, goto, implements, interface, long, native, protected, short, strictfp, super, switch, synchronized, throws, transient, volatile }

QVT-O Applied Keywords = { Bag, OrderedSet, Sequence, Set, abstract, and, any, assert, class, configuration, constructor, disjuncts, else, endif, forEach, helper, if, in, inherits, init, inout, intermediate, log, main, map, mapping, modeltype, new, not, null, object, out, property, query, resolve, resolveone, return, select, then, transformation, true, uses, var, when, while, with }

QVT-O Unapplied Keywords = { Collection, Dict, Tuple, List, access, blackbox, break, case, collect, collectNested, collectOne, collectselect, collectselectOne, composes, compute, continue, datatype, default, derived, do, elif, end, enum, except, exists, extends, exception, false, forAll, forOne, from, implies, import, invresolve, invresolveIn, invresolveone, invresolveoneIn, isUnique, iterate, late, let, library, literal, merges, metamodel, one, or, ordered, package, population, primitive, raise, readonly, references, refines, reject, resolveIn, resolveoneIn, selectOne, sortedBy, static, switch, tag, try, typedef, unlimited, where, xcollect, xmap, xor, xselect }

QVT-R Applied Keywords = { checkonly, domain, enforce, key, primitive, query, relation, top, transformation, when, where }

QVT-R Unapplied Keywords = { extends, implementedby, import, overrides }

The cardinalities of these sets are |Java Applied Keywords| = 24 (48.00%), |Java Unapplied Keywords| = 26 (52.00%), |QVT-O Applied Keywords| = 46 (39.32%), |QVT-O Unapplied Keywords| = 71 (60.68%), |QVT-R Applied Keywords| = 11 (73.33%), and |QVT-R Unapplied Keywords| = 4 (26.67%). The percentage value given in brackets denotes the share of applied/unapplied keywords in relation to all possible keywords of the respective language (rounded to two decimal places).
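The keyword search and the percentage computation can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions, not the procedure used for the thesis: it matches keywords as whole words with a regular expression (whereas `grep -r 'abstract' .` also reports substring matches), it does not skip comments or string literals, and the QVT-R snippet `DEMO_SOURCE` as well as the helper names `applied_keywords` and `share` are hypothetical. The QVT-R keyword sets are taken from the lists above.

```python
import re

QVTR_KEYWORDS = {
    "checkonly", "domain", "enforce", "extends", "implementedby", "import",
    "key", "overrides", "primitive", "query", "relation", "top",
    "transformation", "when", "where",
}
QVTR_APPLIED = {
    "checkonly", "domain", "enforce", "key", "primitive", "query",
    "relation", "top", "transformation", "when", "where",
}


def applied_keywords(keywords, sources):
    """Return the keywords that occur as whole words in any source text."""
    found = set()
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
        if any(pattern.search(text) for text in sources):
            found.add(keyword)
    return found


def share(part, whole):
    """Percentage of `part` in relation to `whole`, rounded to two decimals."""
    return round(100 * len(part) / len(whole), 2)


# Hypothetical QVT-R fragment, used only to exercise the search.
DEMO_SOURCE = """
transformation Demo(src : A, tgt : B) {
  top relation R {
    checkonly domain src x : X {};
    enforce domain tgt y : Y {};
  }
}
"""

print(sorted(applied_keywords(QVTR_KEYWORDS, [DEMO_SOURCE])))
print(share(QVTR_APPLIED, QVTR_KEYWORDS))                  # 73.33
print(share(QVTR_KEYWORDS - QVTR_APPLIED, QVTR_KEYWORDS))  # 26.67
```

Deriving the unapplied set as a set difference mirrors the remark above that the unapplied keywords follow directly from the applied ones.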

C.2.4 Size of the Documentation (Learn2.12 - Learn2.16)

For measuring the size metrics for learnability (described in Section 5.5.7), I proceeded as follows. I used [GJSB05] for Java, [Obj11a][Chapter 7] for QVT-R, and [Obj11a][Chapter 8] for QVT-O. For QVT-R and QVT-O, I created a separate PDF file containing only the respective chapter. For the number of pages, I opened the corresponding PDF file and looked up the total number of pages as given by my PDF viewer. I used the wc command line tool for measuring the number of lines, words, and characters of the PDF files. Finally, I counted the number of figures manually by inspecting the whole document. I counted neither “tables” (e.g., the tables included within the Java specification like at [GJSB05][p. 556]) nor “diagrammatic notations” (i.e., the notations given for QVT-R at [Obj11a][pp. 43-44]) as figures.

The list of results below shows these size measurements:

Java: 684 pages, 48414 lines, 834455 words, 8122120 characters, and 2 figures

QVT-O: 82 pages, 8714 lines, 51422 words, 900130 characters, and 7 figures

QVT-R: 50 pages, 6781 lines, 38324 words, 549449 characters, and 13 figures
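For readers without access to wc, the three counts can be approximated in Python. The following is a rough sketch that assumes the behaviour of `wc -l -w -c` (newline count, whitespace-separated words, byte count); whether the thesis counted bytes or characters is not stated, and wc's word splitting depends on the locale, so the numbers above may not be reproduced exactly. The function names `wc_counts` and `wc_file` are hypothetical.

```python
from pathlib import Path


def wc_counts(data):
    """Approximate `wc -l -w -c` for a bytes object.

    Returns (lines, words, size): lines is the number of newline bytes,
    words the number of whitespace-separated chunks, size the byte count.
    """
    lines = data.count(b"\n")
    words = len(data.split())
    return lines, words, len(data)


def wc_file(path):
    """Apply wc_counts to the raw content of a file (e.g., a PDF file)."""
    return wc_counts(Path(path).read_bytes())


print(wc_counts(b"one two\nthree\n"))  # (2, 3, 14)
```

Because the counts are taken on the raw PDF bytes, they measure the size of the file content rather than the visible text, which is consistent with running wc directly on the PDF files.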


C.3 Measurements per Scenario

This section presents the measurements of the GQM plan scenario-wise. For this, Section C.3.1 first explains the general template of how the measurement table is built up. Afterwards, Section C.3.2 to Section C.3.15 provide the measurements for each scenario by sticking to this template.

C.3.1 Measurement Template

Table C.2 provides the general template that needs to be filled out per scenario. It consists of one row per metric of the GQM plan (cf. Chapter 5). The first column gives the ID of the metric and the fifth column a description of the metric, respectively. The second, third, and fourth column provide the measurements for Java, QVT-O, and QVT-R, respectively.

In cases where the descriptions of the GQM plan just link to other metrics the template resolves those links and provides a full description of the respective metric. For instance, the reusability metrics “Reuse1.1.i” link to the modularity metrics “Modu1.1.1” to “Modu1.1.4” (cf. Section 5.5.2). Therefore, the template includes the rows with IDs “Reuse1.1.1” to “Reuse1.1.4” (instead of just “Reuse1.1.i”). The description column of those rows is copied from the corresponding modularity metric description.

The fields with the measurements are (1) already filled out, (2) AUTO, (3) MANUAL, or (4) QUEST.

The data for the already filled out fields is available as these fields are scenario-independent. First of all, metrics that need to be refined (i.e., metrics marked by an asterisk “*”) cannot have a value; only their refinements. Each field of those metrics is filled out by a hyphen “-” indicating that there is no value measured for the respective metric. Similarly, Table C.2 also uses a hyphen if the metric relates to the MOM scenario (which is not subject to measurements for this thesis; cf. Section 6.1). The GM2.1 metrics are measured in Section C.2.1 and the GM2.11 metrics in Section C.1.2. Table C.2 simply refers to those sections within the respective fields. Finally, Table C.2 includes some learnability metric measurements as these are measured in Section C.2.2, Section C.2.3, and Section C.2.4. These learnability metrics are Learn1.1.1 and Learn1.1.2 as well as Learn2.12 to Learn2.16. Furthermore, the QVT-R field of Learn1.1.3 (time until a scenario was implemented successfully) is filled out by “n/a” (not available) as the QVT-R implementations are obtained from the Medini QVT plugin which does not provide any information about the time that was needed for implementing the different scenarios.

Fields which are marked by “AUTO” can be automatically filled out by the M2M Quality tool (cf. Section 6.2.2). The last step of an execution of M2M Quality is a M2T transformation to a measurement table. Table C.2 is the underlying template for this transformation where the fields containing the “AUTO” string contain appropriate transformation code.

Fields which are marked by “MANUAL” need to be manually filled out per scenario. These measurements relate to refinements of the GM2.2.i metric (changes affecting X when moving from Copy to RuleX). I manually gathered the data for these measurements by following the respective metric definitions.

Fields which are marked by “QUEST” are refinements of the “questionnaire metric” GM1.2.i. For these values, I copied the respective values from Section C.1.1 into the measurement table of the respective scenario.
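The four kinds of fields can be illustrated with a small Python sketch that resolves the markers of one table column. It is a plain dictionary lookup for illustration only and does not reproduce the M2T transformation of the M2M Quality tool; the function name `fill_template` and the dictionary layout are assumptions. The example values are the Java entries of the SimpleUML to SimpleRDBMS scenario (Table C.3).

```python
def fill_template(template, auto, manual, quest):
    """Resolve AUTO/MANUAL/QUEST markers for one language column.

    template maps a metric ID to either a marker string or an already
    filled-out value ("-", "n/a", a number, or a section reference).
    A marker without a value in its source raises a KeyError.
    """
    sources = {"AUTO": auto, "MANUAL": manual, "QUEST": quest}
    filled = {}
    for metric_id, field in template.items():
        if isinstance(field, str) and field in sources:
            filled[metric_id] = sources[field][metric_id]
        else:
            filled[metric_id] = field  # scenario-independent, kept as-is
    return filled


# Excerpt of the Java column of the template (Table C.2).
java_template = {
    "GM2.1": "Cf. Section C.2.1",
    "GM2.5": "-",
    "GM2.6": "AUTO",
    "Modu1.2.1": "QUEST",
    "Modu2.2.1": "MANUAL",
    "Learn1.1.1": 50,
}

java_column = fill_template(
    java_template,
    auto={"GM2.6": 29.00},                   # computed by tooling
    manual={"Modu2.2.1": "Not applicable"},  # gathered by hand
    quest={"Modu1.2.1": 3.67},               # copied from the questionnaire
)
print(java_column["GM2.6"], java_column["Modu1.2.1"], java_column["GM2.5"])
```

Keeping already filled-out fields untouched reflects that these values are scenario-independent and therefore identical in every per-scenario table.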

ID | Java | QVT-O | QVT-R | Description
GM1.1.i* | - | - | - | +/− Metrics specific for quality property X
GM1.2.i* | - | - | - | + Average X points in questionnaire
GM2.1 | Cf. Section C.2.1 | Cf. Section C.2.1 | Cf. Section C.2.1 | Qualitative differences identified in Chapter 4
GM2.2.i* | - | - | - | Changes affecting X when moving from Copy to RuleX
GM2.3 | MANUAL | MANUAL | MANUAL | Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
GM2.4.i* | - | - | - | Changes affecting X when moving from generated copy rules to MOM completion
GM2.5 | - | - | - | Relative differences between ratio-scaled GM1 metrics when moving from generated copy rules to MOM completion
GM2.6 | AUTO | AUTO | AUTO | Number of included modules (classes/compilation units/libraries)
GM2.7 | AUTO | AUTO | AUTO | Number of applied reuse mechanisms (inheritance/logical composition)
GM2.8 | AUTO | AUTO | AUTO | Average number of when clauses
GM2.9 | AUTO | AUTO | AUTO | Number of intermediate structures
GM2.10 | AUTO | AUTO | AUTO | Average distinct phases per rule
GM2.11 | Cf. Section C.1.2 | Cf. Section C.1.2 | Cf. Section C.1.2 | Answers to GQ2 in questionnaire
Modu1.1.1 | AUTO | AUTO | AUTO | + Average number of domains [KGH12]
Modu1.1.2 | AUTO | AUTO | AUTO | + Average fan-out [KGH12]
Modu1.1.3 | AUTO | AUTO | AUTO | − Average rule dependency depth [KGH12]
Modu1.1.4 | AUTO | AUTO | AUTO | + Average number of explicit internal scheduling calls [KGH12]
Modu1.2.1 | QUEST | QUEST | QUEST | + Questionnaire: “How would you rate the modularity of the transformation?”
Modu1.2.2 | QUEST | QUEST | QUEST | + Questionnaire: “To what degree is the transformation composed of distinct components such that a change to one component has minimal impact on the other components?”
Modu2.2.1 | MANUAL | MANUAL | MANUAL | Number of changed rules when moving from Copy to RuleX
Modu2.2.2 | MANUAL | MANUAL | MANUAL | Number of additional rules when moving from Copy to RuleX
Modu2.4.1 | - | - | - | Number of changed copy rules within MOM completion
Modu2.4.2 | - | - | - | Number of additional rules within MOM completion
Reuse1.1.1 | AUTO | AUTO | AUTO | + Average number of domains [KGH12]
Reuse1.1.2 | AUTO | AUTO | AUTO | + Average fan-out [KGH12]
Reuse1.1.3 | AUTO | AUTO | AUTO | − Average rule dependency depth [KGH12]
Reuse1.1.4 | AUTO | AUTO | AUTO | + Average number of explicit internal scheduling calls [KGH12]
Reuse1.2.1 | QUEST | QUEST | QUEST | + Questionnaire: “How would you rate the reusability of the transformation?”
Reuse1.2.2 | QUEST | QUEST | QUEST | + Questionnaire: “To what degree is it possible to apply the transformation rules the transformation specifies also in other scenarios?”
Reuse2.2.1 | MANUAL | MANUAL | MANUAL | Number of additional reused rules when moving from Copy to RuleX
Reuse2.4.1 | - | - | - | Number of additional reused rules when moving from MOM Copy to MOM completion
Ana1.1.1 | AUTO | AUTO | AUTO | + Lines of code [KGH12]
Ana1.1.2 | AUTO | AUTO | AUTO | + Number of starts [KGH12]
Ana1.1.3 | AUTO | AUTO | AUTO | + Number of rules [KGH12]
Ana1.1.4 | AUTO | AUTO | AUTO | + Number of top-level rules [KGH12]
Ana1.1.5 | AUTO | AUTO | AUTO | − Average size of domain pattern [KGH12]
Ana1.1.6 | AUTO | AUTO | AUTO | + Average number of explicit internal scheduling calls [KGH12]
Ana1.2.1 | QUEST | QUEST | QUEST | + Questionnaire: “How would you rate the analyzability of the transformation?”
Ana1.2.2 | QUEST | QUEST | QUEST | + Questionnaire: “To what degree is it possible to diagnose the transformation for deficiencies?”
Ana2.2.1 | MANUAL | MANUAL | MANUAL | Number of changed rules when moving from Copy to RuleX
Ana2.2.2 | MANUAL | MANUAL | MANUAL | Number of additional rules when moving from Copy to RuleX
Ana2.4.1 | - | - | - | Number of changed copy rules within MOM completion
Ana2.4.2 | - | - | - | Number of additional rules within MOM completion
Modi1.1.1 | AUTO | AUTO | AUTO | + Average fan-out [KGH12]
Modi1.1.2 | AUTO | AUTO | AUTO | − Average rule dependency depth [KGH12]
Modi1.1.3 | AUTO | AUTO | AUTO | + Average number of explicit internal scheduling calls [KGH12]
Modi1.2.1 | QUEST | QUEST | QUEST | + Questionnaire: “How would you rate the modifiability of the transformation?”
Modi1.2.2 | QUEST | QUEST | QUEST | + Questionnaire: “To what degree is it possible to alter the transformation without introducing defects?”
Modi2.2.1 | MANUAL | MANUAL | MANUAL | Average decrease in other quality properties when moving from Copy to RuleX
Modi2.4.1 | - | - | - | Average decrease in other quality properties when moving from MOM Copy to MOM completion
Cons1.1.1 | AUTO | AUTO | AUTO | − Lines of code [KGH12]
Cons1.1.2 | AUTO | AUTO | AUTO | − Number of starts [KGH12]
Cons1.1.3 | AUTO | AUTO | AUTO | − Number of rules [KGH12]
Cons1.1.4 | AUTO | AUTO | AUTO | − Number of top-level rules [KGH12]
Cons1.1.5 | AUTO | AUTO | AUTO | + Average number of domains [KGH12]
Cons1.1.6 | AUTO | AUTO | AUTO | + Average fan-out [KGH12]
Cons1.1.7 | AUTO | AUTO | AUTO | + Average number of explicit internal scheduling calls [KGH12]
Cons1.2.1 | QUEST | QUEST | QUEST | + Questionnaire: “How would you rate the consistency of the transformation?”
Cons1.2.2 | QUEST | QUEST | QUEST | + Questionnaire: “To what degree is the transformation implemented in a uniform manner?”
Cons2.2.1 | MANUAL | MANUAL | MANUAL | Questionnaire: “Assume the implementation of RuleX is created on the basis of the implementation of Copy. Count and name newly introduced inconsistencies (if any) in the implementation of RuleX. Think of elements as variables, rules, etc.”
Cons2.4.1 | - | - | - | Questionnaire: “Assume the implementation of MOM is created on the basis of the implementation of GeneratedCopyRules. Count and name newly introduced inconsistencies (if any) in the implementation of MOM. Think of elements as variables, rules, etc.”
Appro1.1.1 | AUTO | AUTO | AUTO | + Lines of code [KGH12]
Appro1.1.2 | AUTO | AUTO | AUTO | + Number of starts [KGH12]
Appro1.1.3 | AUTO | AUTO | AUTO | + Number of rules [KGH12]
Appro1.1.4 | AUTO | AUTO | AUTO | + Number of top-level rules [KGH12]
Appro1.1.5 | AUTO | AUTO | AUTO | − Average size of domain pattern [KGH12]
Appro1.1.6 | AUTO | AUTO | AUTO | + Average number of explicit internal scheduling calls [KGH12]
Appro1.2.1 | QUEST | QUEST | QUEST | + Questionnaire: “How would you rate the appropriateness recognizability (also known as understandability) of the transformation?”
Appro1.2.2 | QUEST | QUEST | QUEST | + Questionnaire: “To what degree is it possible to recognize whether the transformation is appropriate for the transformation scenario?”
Appro2.2.1 | MANUAL | MANUAL | MANUAL | Number of additional/changed comment lines of code when moving from Copy to RuleX
Appro2.2.2 | - | - | - | Number of additional/changed comment lines of code when moving from MOM Copy to MOM completion
Learn1.1.1 | 50 | 117 | 15 | − Number of possible language constructs [GFA09]
Learn1.1.2 | 24 (48.00%) | 46 (39.32%) | 11 (73.33%) | + Number of applied language constructs over all scenarios per language [GFA09]
Learn1.1.3 | MANUAL | MANUAL | n/a | − Time until a scenario was implemented successfully [GFA09]
Learn1.2.1 | QUEST | QUEST | QUEST | + Questionnaire: “How would you rate the learnability of the transformation language?”
Learn1.2.2 | QUEST | QUEST | QUEST | + Questionnaire: “To what degree can the transformation language be taught to use it with effectiveness, efficiency, freedom from risk, and satisfaction?”
Learn2.2.1 | MANUAL | MANUAL | MANUAL | Number of newly introduced language constructs when moving from Copy to RuleX
Learn2.4.1 | - | - | - | Number of newly introduced language constructs when moving from MOM Copy to MOM completion
Learn2.12 | 684 | 82 | 50 | Size of language documentation (number of pages)
Learn2.13 | 48414 | 8714 | 6781 | Size of language documentation (number of lines)
Learn2.14 | 834455 | 51422 | 38324 | Size of language documentation (number of words)
Learn2.15 | 8122120 | 900130 | 549449 | Size of language documentation (number of characters)
Learn2.16 | 2 | 7 | 13 | Size of language documentation (number of figures)

Table C.2: General Measurement Template



C.3.2 SimpleUML to SimpleRDBMS

Table C.3 provides the measurement results for the SimpleUML to SimpleRDBMS case study.





Cf. Section C.2.1 | Cf. Section C.2.1
GM2.3 | Not applicable | Not applicable | Not applicable | Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
GM2.6 | 29.00 | 0.00 | 0.00 | Number of included modules (classes/compilation units/libraries)
GM2.7 | 0.00 | 0.00 | 0.00 | Number of applied reuse mechanisms (inheritance/logical composition)
GM2.8 | - | 0.43 | 0.63 | Average number of when clauses
GM2.9 | 9.00 | 2.00 | 19.00 | Number of intermediate structures
GM2.10 | - | 1.57 | - | Average distinct phases per rule
GM2.11 | Cf. Section C.1.2 | Cf. Section C.1.2 | Cf. Section C.1.2 | Answers to GQ2 in questionnaire
Modu1.1.1 | 1.08 | 0.64 | 2.50 | + Average number of domains [KGH12]
Modu1.1.2 | 1.07 | 2.20 | 0.75 | + Average fan-out [KGH12]
Modu1.1.3 | 3.00 | 3.67 | 2.00 | − Average rule dependency depth [KGH12]
Modu1.1.4 | 1.07 | 2.20 | 1.50 | + Average number of explicit internal scheduling calls [KGH12]
Modu1.2.1 | 3.67 | 5.00 | 5.00 | + Questionnaire: “How would you rate the modularity of the transformation?”
Modu1.2.2 | 4.00 | 4.67 | 5.00 | + Questionnaire: “To what degree is the transformation composed of distinct components such that a change to one component has minimal impact on the other components?”
Modu2.2.1 | Not applicable | Not applicable | Not applicable | Number of changed rules when moving from Copy to RuleX
Not applicable | Not applicable | Number of additional rules when moving from Copy to RuleX
Reuse1.1.1 | 1.08 | 0.64 | 2.50 | + Average number of domains [KGH12]
Reuse1.1.2 | 1.07 | 2.20 | 0.75 | + Average fan-out [KGH12]
Reuse1.1.3 | 3.00 | 3.67 | 2.00 | − Average rule dependency depth [KGH12]
Reuse1.1.4 | 1.07 | 2.20 | 1.50 | + Average number of explicit internal scheduling calls [KGH12]
Reuse1.2.1 | 3.67 | 4.33 | 4.67 | + Questionnaire: “How would you rate the reusability of the transformation?”
Reuse1.2.2 | 2.67 | 3.67 | 3.67 | + Questionnaire: “To what degree is it possible to apply the transformation rules the transformation specifies also in other scenarios?”
Reuse2.2.1 | Not applicable | Not applicable | Not applicable | Number of additional reused rules when moving from Copy to RuleX
Ana1.1.1 | 256.00 | 97.00 | 178.00 | + Lines of code [KGH12]
Ana1.1.2 | 1.00 | 1.00 | 1.00 | + Number of starts [KGH12]
Ana1.1.3 | 13.00 | 10.00 | 8.00 | + Number of rules [KGH12]
Ana1.1.4 | 1.00 | 1.00 | 3.00 | + Number of top-level rules [KGH12]
Ana1.1.5 | 13.08 | 4.50 | 2.90 | − Average size of domain pattern [KGH12]
Ana1.1.6 | 1.07 | 2.20 | 1.50 | + Average number of explicit internal scheduling calls [KGH12]
Ana1.2.1 | 3.33 | 4.67 | 5.67 | + Questionnaire: “How would you rate the analyzability of the transformation?”
Ana1.2.2 | 2.67 | 5.00 | 4.67 | + Questionnaire: “To what degree is it possible to diagnose the transformation for deficiencies?”
Ana2.2.1 | Not applicable | Not applicable | Not applicable | Number of changed rules when moving from Copy to RuleX
Not applicable | Not applicable
Modi1.1.1 | 1.07 | 2.20 | 0.75 | + Average fan-out [KGH12]
Modi1.1.2 | 3.00 | 3.67 | 2.00 | − Average rule dependency depth [KGH12]
Modi1.1.3 | 1.07 | 2.20 | 1.50 | + Average number of explicit internal scheduling calls [KGH12]
Modi1.2.1 | 4.33 | 5.00 | 5.00 | + Questionnaire: “How would you rate the modifiability of the transformation?”
Modi1.2.2 | 4.67 | 5.67 | 4.33 | + Questionnaire: “To what degree is it possible to alter the transformation without introducing defects?”
Modi2.2.1 | Not applicable | Not applicable | Not applicable | Average decrease in other quality properties when moving from Copy to RuleX
Cons1.1.1 | 256.00 | 97.00 | 178.00 | − Lines of code [KGH12]
Cons1.1.2 | 1.00 | 1.00 | 1.00 | − Number of starts [KGH12]
Cons1.1.3 | 13.00 | 10.00 | 8.00 | − Number of rules [KGH12]
Cons1.1.4 | 1.00 | 1.00 | 3.00 | − Number of top-level rules [KGH12]
Cons1.1.5 | 1.08 | 0.64 | 2.50 | + Average number of domains [KGH12]
Cons1.1.6 | 1.07 | 2.20 | 0.75 | + Average fan-out [KGH12]
Cons1.1.7 | 1.07 | 2.20 | 1.50 | + Average number of explicit internal scheduling calls [KGH12]
Cons1.2.1 | 3.67 | 5.67 | 5.67 | + Questionnaire: “How would you rate the consistency of the transformation?”
Cons1.2.2 | 4.67 | 5.00 | 5.33 | + Questionnaire: “To what degree is the transformation implemented in a uniform manner?”
Cons2.2.1 | Not applicable | Not applicable | Not applicable | Questionnaire: “Assume the implementation of RuleX is created on the basis of the implementation of Copy. Count and name newly introduced inconsistencies (if any) in the implementation of RuleX. Think of elements as variables, rules, etc.”
Appro1.1.1 | 256.00 | 97.00 | 178.00 | + Lines of code [KGH12]
Appro1.1.2 | 1.00 | 1.00 | 1.00 | + Number of starts [KGH12]
Appro1.1.3 | 13.00 | 10.00 | 8.00 | + Number of rules [KGH12]
Appro1.1.4 | 1.00 | 1.00 | 3.00 | + Number of top-level rules [KGH12]
Appro1.1.5 | 13.08 | 4.50 | 2.90 | − Average size of domain pattern [KGH12]
Appro1.1.6 | 1.07 | 2.20 | 1.50 | + Average number of explicit internal scheduling calls [KGH12]
Appro1.2.1 | 4.00 | 4.67 | 5.33 | + Questionnaire: “How would you rate the appropriateness recognizability (also known as understandability) of the transformation?”
Appro1.2.2 | 3.33 | 5.00 | 5.33 | + Questionnaire: “To what degree is it possible to recognize whether the transformation is appropriate for the transformation scenario?”
Appro2.2.1 | Not applicable | Not applicable | Not applicable | Number of additional/changed comment lines of code when moving from Copy to RuleX
Learn1.1.2 | 24 (48.00%) | 46 (39.32%) | 11 (73.33%) | + Number of applied language constructs over all scenarios per language [GFA09]
Learn1.1.3 | n/a | n/a | n/a | − Time until a scenario was implemented successfully [GFA09]
Learn1.2.1 | 3.33 | 5.33 | 4.00 | + Questionnaire: “How would you rate the learnability of the transformation language?”
Learn1.2.2 | 3.67 | 6.00 | 4.33 | + Questionnaire: “To what degree can the transformation language be taught to use it with effectiveness, efficiency, freedom from risk, and satisfaction?”
Learn2.2.1 | Not applicable | Not applicable | Not applicable | Number of newly introduced language constructs when moving from Copy to RuleX

Table C.3: Results of the Data Collection Phase for the SimpleUML to SimpleRDBMS Scenario


C.3.3 Copy

Table C.4 provides the measurement results for Copy of the “Medini QVT’s Shapes-Tutorial” case study.





Cf. Section C.2.1 | Cf. Section C.2.1
GM2.3 | Not applicable | Not applicable | Not applicable | Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
Cf. Section C.1.2
Not applicable | Not applicable
Not applicable | Not applicable
Reuse1.2.2 | MANUAL | MANUAL | MANUAL | + Questionnaire: “To what degree is it possible to apply the transformation rules the transformation specifies also in other scenarios?”
Reuse2.2.1 | Not applicable | Not applicable | Not applicable | Number of additional reused rules when moving from Copy to RuleX
Not applicable | Not applicable
Not applicable | Not applicable
Modi2.2.1 | Not applicable | Not applicable | Not applicable | Average decrease in other quality properties when moving from Copy to RuleX
Cons2.2.1 | Not applicable | Not applicable | Not applicable | Questionnaire: “Assume the implementation of RuleX is created on the basis of the implementation of Copy. Count and name newly introduced inconsistencies (if any) in the implementation of RuleX. Think of elements as variables, rules, etc.”
Appro2.2.1 | Not applicable | Not applicable | Not applicable | Number of additional/changed comment lines of code when moving from Copy to RuleX
Learn1.1.2 | 24 (48.00%) | 46 (39.32%) | 11 (73.33%) | + Number of applied language constructs over all scenarios per language [GFA09]
Learn1.1.3 | 55.00 min. | 5.00 min. | n/a | − Time until a scenario was implemented successfully [GFA09]
Learn2.2.1 | Not applicable | Not applicable | Not applicable | Number of newly introduced language constructs when moving from Copy to RuleX

Table C.4: Results of the Data Collection Phase for the Copy Scenario



C.3.4 Rule1

Table C.5 provides the measurement results for Rule1 of the “Medini QVT’s Shapes-Tutorial” case study.





Cf. Section C.2.1 | Cf. Section C.2.1
GM2.3 | Not measured | Not measured | Not measured | Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
Cf. Section C.1.2
Modu2.2.1 | 3.00 | 1.00 | 1.00 | Number of changed rules when moving from Copy to RuleX
Modu2.2.2 | 2.00 | 1.00 | 0.00 | Number of additional rules when moving from Copy to RuleX
Reuse2.2.1 | 0.03 | 0.50 | 0.00 | Number of additional reused rules when moving from Copy to RuleX
Ana2.2.1 | 3.00 | 1.00 | 1.00 | Number of changed rules when moving from Copy to RuleX
Ana2.2.2 | 2.00 | 1.00 | 0.00 | Number of additional rules when moving from Copy to RuleX
Modi2.2.1 | 0.37 | -0.10 | -0.84 | Average decrease in other quality properties when moving from Copy to RuleX
Cons2.2.1 | 0.00 | 0.00 | 0.00 | Questionnaire: “Assume the implementation of RuleX is created on the basis of the implementation of Copy. Count and name newly introduced inconsistencies (if any) in the implementation of RuleX. Think of elements as variables, rules, etc.”
Appro2.2.1 | 18.00 | 12.00 | 5.00 | Number of additional/changed comment lines of code when moving from Copy to RuleX
Learn1.1.2 | 24 (48.00%) | 46 (39.32%) | 11 (73.33%) | + Number of applied language constructs over all scenarios per language [GFA09]
Learn2.2.1 | 3.00 | 4.00 | 1.00 | Number of newly introduced language constructs when moving from Copy to RuleX

Table C.5: Results of the Data Collection Phase for the Rule1 Scenario


C.3.5 Rule2






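Cf. Section C.2.1 | Cf. Section C.2.1
GM2.3 | Not measured | Not measured | Not measured | Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
Cf. Section C.1.2
Modi2.2.1 | 0.24 | 0.28 | -0.59 | Average decrease in other quality properties when moving from Copy to RuleX
Learn1.1.2 | 24 (48.00%) | 46 (39.32%) | 11 (73.33%) | + Number of applied language constructs over all scenarios per language [GFA09]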


C.3.6 Rule3






Cf. Section C.2.1 | Cf. Section C.2.1
GM2.3 | Not measured | Not measured | Not measured | Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
Cf. Section C.1.2
Modi2.2.1 | -0.13 | -0.18 | -0.34 | Average decrease in other quality properties when moving from Copy to RuleX
Learn1.1.2 | 24 (48.00%) | 46 (39.32%) | 11 (73.33%) | + Number of applied language constructs over all scenarios per language [GFA09]

C.3.7 Rule4






Cf. Section C.2.1 | Cf. Section C.2.1
GM2.3 | Not measured | Not measured | Not measured | Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
Cf. Section C.1.2
Modi2.2.1 | -0.73 | 0.06 | -0.34 | Average decrease in other quality properties when moving from Copy to RuleX
Learn1.1.2 | 24 (48.00%) | 46 (39.32%) | 11 (73.33%) | + Number of applied language constructs over all scenarios per language [GFA09]


C.3.8 Rule5






Cf. Section C.2.1 | Cf. Section C.2.1
GM2.3 | Not measured | Not measured | Not measured | Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
Cf. Section C.1.2
Modi2.2.1 | -1.34 | 0.08 | 0.27 | Average decrease in other quality properties when moving from Copy to RuleX
Learn1.1.2 | 24 (48.00%) | 46 (39.32%) | 11 (73.33%) | + Number of applied language constructs over all scenarios per language [GFA09]

C.3.9 Rule6






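Cf. Section C.2.1 | Cf. Section C.2.1
GM2.3 | Not measured | Not measured | Not measured | Relative differences between ratio-scaled GM1 metrics when moving from Copy to RuleX
Cf. Section C.1.2
Reuse2.2.1 | 0.03 | 0.50 | -0.06 | Number of additional reused rules when moving from Copy to RuleX
Learn1.1.2 | 24 (48.00%) | 46 (39.32%) | 11 (73.33%) | + Number of applied language constructs over all scenarios per language [GFA09]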


C.3.10 Rule7






Cf. Section C.2.1

Cf. Section C.2.1

GM2.3 Not measured Not measured Not measured

Cf. Section C.1.2

Learn1.1.2 24 (48.00%) 46 (39.32%) 11 (73.33%)

C.3.11 Rule8






Cf. Section C.2.1

Cf. Section C.2.1

GM2.3 Not measured Not measured Not measured

Cf. Section C.1.2

Modi2.2.1 0.69 1.36 1.16 Average decrease in other quality properties when moving from Copy to RuleX

Learn1.1.2 24 (48.00%) 46 (39.32%) 11 (73.33%)


C.3.12 Rule9






Cf. Section C.2.1

Cf. Section C.2.1

GM2.3 Not measured Not measured Not measured

Cf. Section C.1.2

Modi2.2.1 -0.28 -0.03 -0.03 Average decrease in other quality properties when moving from Copy to RuleX

Learn1.1.2 24 (48.00%) 46 (39.32%) 11 (73.33%)

C.3.13 Rule10






Cf. Section C.2.1

Cf. Section C.2.1

GM2.3 Not measured Not measured Not measured

Cf. Section C.1.2

Learn1.1.2 24 (48.00%) 46 (39.32%) 11 (73.33%)


C.3.14 Rule11






Cf. Section C.2.1

Cf. Section C.2.1

GM2.3 Not measured Not measured Not measured

Cf. Section C.1.2

Learn1.1.2 24 (48.00%) 46 (39.32%) 11 (73.33%)

C.3.15 Rule12






Cf. Section C.2.1

Cf. Section C.2.1

GM2.3 Not measured Not measured Not measured

Cf. Section C.1.2

Learn1.1.2 24 (48.00%) 46 (39.32%) 11 (73.33%)

C.4 Measurement Diagrams


This section presents several diagrams created on the basis of the measurement results of Section C.3. Each diagram illustrates the measurements of one metric. The order of the diagrams follows the structure of the GQM plan, i.e., this section first provides diagrams of the general metrics corresponding to the quality question and the reason question. Afterwards, it presents the diagrams of metrics as derived for the different quality properties. The section presents all diagrams related to the evaluated questionnaire metrics at the end.

As some metrics are applied by several quality property refinements, the caption of each diagram explicitly names the respective metric IDs that relate to the illustrated measurements. For instance, the diagram of Figure C.6 shows the “average number of domains” measurement which relates to the metric IDs Modu1.1.1, Reuse1.1.1, and Cons1.1.5. Moreover, the caption annotates the metric IDs with the expected correlation regarding the respective quality property where appropriate (“+” for a positive or “−” for a negative correlation). For instance, Modu1.1.1 of Figure C.6 is expected to positively correlate with modularity and is, hence, annotated by a “+” (i.e., “+Modu1.1.1”).
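The annotation convention described above can be read mechanically. The following Python sketch is only an illustration and not part of the thesis tooling; the function name `parse_metric_ids` and the regular expression are assumptions made here. It extracts the metric IDs from a caption together with the expected correlation, treating IDs without a sign as unspecified.

```python
import re

# Annotated metric IDs such as "+Modu1.1.1" or "−Reuse1.1.3".
# Both the ASCII hyphen and the Unicode minus sign (U+2212) are accepted.
METRIC_ID = re.compile(r"([+\-\u2212]?)([A-Z][A-Za-z]*)(\d+(?:\.\d+)+)")


def parse_metric_ids(caption):
    """Return (metric ID, expected correlation) pairs found in a caption."""
    results = []
    for sign, prefix, number in METRIC_ID.findall(caption):
        if sign == "+":
            correlation = "positive"
        elif sign in ("-", "\u2212"):
            correlation = "negative"
        else:
            correlation = "unspecified"
        results.append((prefix + number, correlation))
    return results


# Metric IDs as they appear in the caption of Figure C.6.
caption = "(+Modu1.1.1, +Reuse1.1.1, and +Cons1.1.5)"
for metric_id, correlation in parse_metric_ids(caption):
    print(metric_id, correlation)
```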

Each diagram names its corresponding metric within its caption as well as in its diagram title (by using the metric description as provided by the GQM plan). The ordinate of each diagram shows the measured values of the respective metric. The abscissa shows in most cases the different scenarios used as case studies. For learnability, this differs as the measurements are partly scenario-independent. Therefore, the abscissa of learnability diagrams can also relate to the issue measured instead (pages, lines, possible or applied keywords, etc.).

In the following, this section lists these diagrams one after the other.

[Diagram: Number of Included Modules (classes/compilation units/libraries); series: Java, QVT-O, QVT-R]

Figure C.1: Measurements for the “Number of Included Modules” (GM2.6)

[Diagram: Number of Applied Reuse Mechanisms (inheritance/logical composition); series: Java, QVT-O, QVT-R]

Figure C.2: Measurements for the “Number of Applied Reuse Mechanisms” (GM2.7)


[Diagram: Average Number of When Clauses; series: Java, QVT-O, QVT-R]

Figure C.3: Measurements for the “Average Number of When Clauses” (GM2.8)

[Diagram: Number of Intermediate Structures; series: Java, QVT-O, QVT-R]

Figure C.4: Measurements for the “Number of Intermediate Structures” (GM2.9)

[Diagram: Average Distinct Phases per Rule; series: Java, QVT-O, QVT-R]

Figure C.5: Measurements for the “Average Distinct Phases per Rule” (GM2.10)

[Diagram: Average Number of Domains; series: Java, QVT-O, QVT-R]

Figure C.6: Measurements for the “Average Number of Domains” (+Modu1.1.1, +Reuse1.1.1, and +Cons1.1.5)


[Diagram: Average Fan-Out; series: Java, QVT-O, QVT-R]

Figure C.7: Measurements for the “Average Fan-Out” (+Modu1.1.2, +Reuse1.1.2, +Modi1.1.1, and +Cons1.1.6)

[Diagram: Average Rule Dependency Depth; series: Java, QVT-O, QVT-R]

Figure C.8: Measurements for the “Average Rule Dependency Depth” (−Modu1.1.3, −Reuse1.1.3, and −Modi1.1.2)

[Diagram: Average Number of Explicit Internal Scheduling Calls; series: Java, QVT-O, QVT-R]

Figure C.9: Measurements for the “Average Number of Expl. Inter. Sched. Calls” (+Modu1.1.4, +Reuse1.1.4, +Modi1.1.3, +Cons1.1.7, and +Appro1.1.6)

[Diagram: Number of Changed Rules (when Moving from Copy to RuleX); abscissa: Rule1 to Rule12; series: Java, QVT-O, QVT-R]

Figure C.10: Measurements for the “Number of Changed Rules when Moving from Copy to RuleX” (−Modu2.2.1 and −Ana2.2.1)


[Diagram: Number of Additional Rules (when Moving from Copy to RuleX); series: Java, QVT-O, QVT-R]

Figure C.11: Measurements for the “Number of Additional Rules when Moving from Copy to RuleX” (−Modu2.2.2 and −Ana2.2.2)

[Diagram: Number of Additional Reused Rules (when Moving from Copy to RuleX); series: Java, QVT-O, QVT-R]

Figure C.12: Measurements for the “Number of Additional Reused Rules when Moving from Copy to RuleX” (+Reuse2.2.1)

[Diagram: Lines of Code (removed blank and comment lines); series: Java, QVT-O, QVT-R]

Figure C.13: Measurements for the “Lines of Code” (−Cons1.1.1 and +Appro1.1.1)

[Diagram: Number of Starts; series: Java, QVT-O, QVT-R]

Figure C.14: Measurements for the “Number of Starts” (−Cons1.1.2 and +Appro1.1.2)


[Diagram: Number of Rules; series: Java, QVT-O, QVT-R]

Figure C.15: Measurements for the “Number of Rules” (−Cons1.1.3 and +Appro1.1.3)

[Diagram: Number of Top-Level Rules; series: Java, QVT-O, QVT-R]

Figure C.16: Measurements for the “Number of Top-Level Rules” (−Cons1.1.4 and +Appro1.1.4)

[Diagram: Average Size of Domain Pattern; series: Java, QVT-O, QVT-R]

Figure C.17: Measurements for the “Average Size of the Domain Pattern” (−Appro1.1.5)

[Diagram: Number of Additional/Changed Comment Lines of Code (when moving from Copy to RuleX); series: Java, QVT-O, QVT-R]

Figure C.18: Measurements for the “Number of Additional/Changed Comment Lines of Code when Moving from Copy to RuleX” (+Appro2.2.1)


[Diagram: Number of Language Constructs (counted as the number of possible keywords); abscissa: Possible, Applied (over all Scenarios); series: Java, QVT-O, QVT-R]

Figure C.19: Measurements for the “Number of Possible Language Constructs” (−Learn1.1.1) and the “Number of Applied Language Constructs over all Scenarios” (+Learn1.1.2)

[Diagram: Percentage of Applied Language Constructs Regarding Possible Language Constructs (counted as the number of possible keywords); abscissa: Applied (over all Scenarios); series: Java, QVT-O, QVT-R]

Figure C.20: Measurements for the “Percentage of Applied Language Constructs Regarding Possible Language Constructs” (+Additional to Learn1.1.2)

[Diagram: Time Until a Scenario was Implemented Successfully (in minutes; zero-values indicate unavailable data); series: Java, QVT-O, QVT-R]

Figure C.21: Measurements for the “Time Until a Scenario was Implemented Successfully” (−Learn1.1.3)

[Diagram: Number of Newly Introduced Language Constructs (when moving from Copy to RuleX); series: Java, QVT-O, QVT-R]

Figure C.22: Measurements for the “Number of Newly Introduced Language Constructs when Moving from Copy to RuleX” (+Learn2.2.1)


[Diagram: Size of Language Documentation (number of pages); abscissa: Pages; series: Java, QVT-O, QVT-R]

Figure C.23: Measurements for the “Size of Language Documentation (in Pages)” (Learn2.2.12)

[Diagram: Size of Language Documentation (number of lines); abscissa: Lines; series: Java, QVT-O, QVT-R]

Figure C.24: Measurements for the “Size of Language Documentation (in Lines)” (Learn2.2.13)

[Diagram: Size of Language Documentation (number of words); abscissa: Words; series: Java, QVT-O, QVT-R]

Figure C.25: Measurements for the “Size of Language Documentation (in Words)” (Learn2.2.14)

[Diagram: Size of Language Documentation (number of characters); abscissa: Characters; series: Java, QVT-O, QVT-R]

Figure C.26: Measurements for the “Size of Language Documentation (in Characters)” (Learn2.2.15)


[Diagram: Size of Language Documentation (number of figures); abscissa: Figures; series: Java, QVT-O, QVT-R]

Figure C.27: Measurements for the “Size of Language Documentation (in Figures)” (Learn2.2.16)

[Diagram: Evaluated Questionnaire for Modularity (Average Values); series: Java, QVT-O, QVT-R]

Figure C.28: Evaluated Questionnaire for “Modularity” (Average of +Modu1.2.1 and +Modu1.2.2)

[Diagram: Evaluated Questionnaire for Reusability (Average Values); series: Java, QVT-O, QVT-R]

Figure C.29: Evaluated Questionnaire for “Reusability” (Average of +Reuse1.2.1 and +Reuse1.2.2)

[Diagram: Evaluated Questionnaire for Analyzability (Average Values); series: Java, QVT-O, QVT-R]

Figure C.30: Evaluated Questionnaire for “Analyzability” (Average of +Ana1.2.1 and +Ana1.2.2)


[Diagram: Evaluated Questionnaire for Modifiability (Average Values); series: Java, QVT-O, QVT-R]

Figure C.31: Evaluated Questionnaire for “Modifiability” (Average of +Modi1.2.1 and +Modi1.2.2)

[Diagram: Evaluated Questionnaire for Consistency (Average Values); series: Java, QVT-O, QVT-R]

Figure C.32: Evaluated Questionnaire for “Consistency” (Average of +Cons1.2.1 and +Cons1.2.2)

[Diagram: Evaluated Questionnaire for Appropriateness Recognizability (Average Values); series: Java, QVT-O, QVT-R]

Figure C.33: Evaluated Questionnaire for “Appropriateness Recognizability” (Average of +Appro1.2.1 and +Appro1.2.2)

[Diagram: Evaluated Questionnaire for Learnability (Average Values); series: Java, QVT-O, QVT-R]

Figure C.34: Evaluated Questionnaire for “Learnability” (Average of +Learn1.2.1 and +Learn1.2.2)


[Diagram: Average Decrease in Other Quality Properties (when Moving from Copy to RuleX); series: Java, QVT-O, QVT-R]

Figure C.35: Evaluated Questionnaire for the “Average Decrease in Other Quality Properties when Moving from Copy to RuleX” (Modi2.2.1; based on sum of averaged values of other quality properties in questionnaire)

[Diagram: Evaluated Questionnaire for the “Number of Newly Introduced Inconsistencies” (when Moving from Copy to RuleX); series: Java, QVT-O, QVT-R]

Figure C.36: Evaluated Questionnaire for the “Number of Newly Introduced Inconsistencies when Moving from Copy to RuleX” (+Cons2.2.1)

C.5 Evaluation of Metric Measurements and Hypotheses per Question

This section provides (1) an investigation of the metric measurements and (2) an evaluation of the hypotheses associated with the questions of the GQM plan. This serves as a basis for answering the questions in Section 7.1. As it is out of the thesis’ scope, the general hypotheses and associated metric measurements of the reason question are not evaluated in detail.

This section is, therefore, structured as follows. Section C.5.1 provides details on the latter issue and explicitly names affected metrics and hypotheses. Afterwards, Section C.5.2 to Section C.5.15 provide detailed evaluations for each question stated.

C.5.1 General Hypotheses of the Reason Question

This thesis does not provide a detailed evaluation of the general hypotheses stated for the reason question (cf. Section 5.4.1). In general, the results for every quality property X have to be checked against each stated generic reason question hypothesis. However, due to the large amount of collected data, it is out of the thesis’ scope to fully evaluate these hypotheses as well as the corresponding metric measurements. The affected hypotheses are GH2.1 to GH2.8 and the affected metrics are GM2.3 and GM2.5 to GM2.10. The evaluation of these metric measurements and hypotheses is left as future work.

Note that, nonetheless, this thesis discusses some of the hypotheses and measurements related to the latter metrics whenever these can be applied to substantiate or falsify evaluation results. For instance, Section 7.1.6 investigates a statement by one questionnaire participant that relates analyzability to missing application conditions. This is similar to GH2.5 stating that Java’s missing support for application conditions generally comes with a lower quality. Section 7.1.6 provides evidence that this does not hold. In particular, Section 7.1.6 evaluates the measurement results for GM2.8 (which counts the number of application conditions) for this conclusion. Another example can be found in Section 7.1.2 which investigates the measurement results for GM2.9 (measuring the number of intermediate structures) to obtain insights regarding modularity.

C.5.2 ModuQ1: “What is the modularity of the implementations?”

Modu1.1.1: “Average number of domains”. Referring to the work of Kapova et al. [KGH12], this metric correlates positively with modularity. Under this assumption, it is possible to evaluate the modularity of the different scenarios within one language by a pair-wise comparison of the measured values (note that the metric is ordinal-scaled). Therefore, Figure C.6 illustrates that (1) in Java, the scenarios Copy, Rule11, and Rule12 have the lowest modularity and Rule5 and Rule8 have the highest modularity, (2) in QVT-O, UML2RDBMS, Rule7, Rule8, and Rule9 have a lower and especially Rule5 a higher modularity, and (3) in QVT-R, Rule8 has a low and UML2RDBMS a high modularity.

Furthermore, the modularity of QVT-R is most stable since most values lie closely around the 2.00 mark. As the different “Rule scenarios” only add few scenario features to the Copy scenario, this was expected. However, for QVT-O and Java, the respective values deviate more strongly from one another indicating lower overall modularity in QVT-O and Java.

Under the assumption that the measured values are comparable between the different languages, Figure C.6 further illustrates that QVT-R generally provides the best modularity. QVT-O performs generally better than Java except for UML2RDBMS, Rule7, Rule8, and Rule9.

Modu1.1.2: “Average fan-out”. Referring to Kapova et al. [KGH12], this metric also correlates positively with modularity. Under this assumption, Figure C.7 illustrates some fundamental differences compared to the Modu1.1.1 metric: (1) in Java, the values are most stable and lie around the 0.80 mark; the Rule8 scenario has the highest modularity, (2) in QVT-O, UML2RDBMS, Rule7, Rule8, and Rule9 have a higher and Rule1 as well as Rule6 a lower modularity; some values (for Copy, Rule2, Rule11, and Rule12) are exactly 0.00, and (3) in QVT-R, most values are stable around the 0.50 mark where especially Rule8 has a higher modularity.

Most interestingly, the observations for QVT-O are the complete opposite of the observations for the Modu1.1.1 measurements. Also for QVT-R, the observation “Rule8 has a low modularity” changes to “Rule8 has a high modularity”. The reason for the 0.00 value for some scenarios implemented in QVT-O is that these implementations consist only of one main operation which does not call any other operation. In fact, the modularity of these implementations may be considered very high as the transformations provide only one “discrete component” such that a change to this component does not influence any other component at all (cf. the modularity definition in Section 3.6.7).

Under the assumption that the measured values are comparable between the different languages, there are additional differences compared to the Modu1.1.1 metric: QVT-R does not generally provide the best modularity anymore. Instead, Java performs best in most cases; only for the UML2RDBMS, Rule7, Rule8, and Rule9 scenarios does QVT-O perform better than Java.

Modu1.1.3: “Average rule dependency depth”. The distribution of the measured values of this metric is close to the distribution of the values of Modu1.1.2 (see Figure C.8). However, their interpretation is the opposite as Kapova et al. [KGH12] identify a negative correlation of this metric’s values regarding modularity. Therefore, the interpretation results are close to the results of Modu1.1.1: (1) in Java, especially UML2RDBMS, Rule7, Rule8, and Rule9 have a lower modularity; the other values lie stable around the 1.80 mark and provide a better modularity, (2) in QVT-O, the values behave similar to the values in Java but, as for Modu1.1.2, the scenarios with only one main operation have 0.00 values, and (3) in QVT-R, most values lie around the 1.00 mark indicating a stable modularity; UML2RDBMS performs worst.

Under the assumption that the measured values are comparable between the different languages, QVT-R generally performs best regarding modularity. In most cases, (a) Java and QVT-O perform equally well or (b) QVT-O performs better than Java. Exceptions to this rule are (1) the cases where QVT-O has 0.00 values (as argued above for Modu1.1.2) and (2) the UML2RDBMS implementations where Java performs better. A possible explanation for the different values between Java and QVT-O is the additional infrastructure code for setting up a transformation needed within Java. This code may require additional methods within Java which have to be additionally invoked within a call-hierarchy.

Modu1.1.4: “Average number of explicit internal scheduling calls”. Figure C.9 illustrates that the measurements of this metric induce the same interpretations as for the measurements of the Modu1.1.2 metric (referring to Kapova et al. [KGH12], this metric also positively correlates with modularity). The measured values only differ slightly from the values for Modu1.1.2: (1) in Java, only Rule7, Rule9, and Rule11 have different values which indicates a better modularity for these scenarios than Modu1.1.2 does, respectively, (2) in QVT-O, there is no difference at all, and (3) in QVT-R, all values are slightly higher than the values measured for Modu1.1.2; UML2RDBMS and Rule11 have especially high values indicating a high modularity for these scenarios.

The reason for the close relationship between Modu1.1.2 and Modu1.1.4 is that both metrics measure nearly the same: Modu1.1.4 measures all explicit internal scheduling calls while Modu1.1.2 only measures the distinct calls. Therefore, the measurements of Modu1.1.4 are at least as high as the measurements for Modu1.1.2. The fact that the differences between those two metrics are mainly observable in QVT-R indicates that QVT-R reuses single transformation rules more often than other M2M languages do (hence, QVT-R provides a better reusability regarding this aspect); from the viewpoint of modularity, this indicates a higher cohesion within QVT-R than in other M2M languages and, thus, a higher modularity.
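The relationship between the two metrics can be illustrated with a small Python sketch. It is not the measurement tooling used for this thesis; the call graph, the rule names, and the helper functions are hypothetical and only serve to show why counting all explicit calls can never yield a lower value than counting distinct callees.

```python
def fan_out(calls):
    """Number of distinct rules a rule calls (in the spirit of Modu1.1.2)."""
    return len(set(calls))


def explicit_calls(calls):
    """Number of all explicit calls, repeats included (in the spirit of Modu1.1.4)."""
    return len(calls)


def average(values):
    """Arithmetic mean; 0.0 for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical transformation: rule name -> rules it explicitly calls.
call_graph = {
    "main": ["copyPackage", "copyClass", "copyClass"],
    "copyPackage": ["copyClass"],
    "copyClass": [],
}

avg_fan_out = average(fan_out(c) for c in call_graph.values())
avg_calls = average(explicit_calls(c) for c in call_graph.values())
print(avg_fan_out, avg_calls)

# A transformation with a single main operation that calls nothing
# yields 0.00 for both metrics, as observed for some QVT-O scenarios.
single_main = {"main": []}
print(average(fan_out(c) for c in single_main.values()))
```

In this sketch, the repeated call from `main` to `copyClass` raises the average number of explicit calls but not the average fan-out, which mirrors the reuse of single rules discussed above.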

Modu1.2.1/Modu1.2.2: “Evaluated questionnaire for modularity (quality question)”. Figure C.28 shows the questionnaire results regarding modularity; the values are positively correlated with modularity under the assumption that the participants evaluated modularity correctly. With this assumption, Figure C.28 illustrates: (1) in Java, Rule11 and Rule12 have the lowest modularity and Rule4 to Rule7 the highest modularity; the values approximately lie between 3.0 and 6.0, (2) in QVT-O, modularity is close to the modularity in Java; in approximately half of the cases modularity is lower than or equal to that in Java (e.g., for Copy and Rule11) and higher in the remaining cases (e.g., for UML2RDBMS and Rule8); the values approximately lie between 2.0 and 5.5, and (3) in QVT-R, modularity is higher than for other languages in most cases; the only exceptions are Rule5 where Java performs best and Rule9 where QVT-O performs best. The values for QVT-R lie between 5.0 and 6.5.

It is noticeable that the group of participants that evaluated the second set of scenarios (Copy and Rule4 to Rule8) generally evaluated modularity better than the other groups. Furthermore, these participants tended to generally evaluate Java best, followed by QVT-O, and QVT-R last. It is unlikely that the different scenarios are the reason for this: first, these scenarios do not differ too much from the scenarios the other participants evaluated and, second, the participants that evaluated the second set of scenarios provided unusually high values for Java and QVT-O as well as low values for QVT-R in the Copy scenario compared to the other participants (Copy was evaluated by every participant).

Therefore, it is probable that the different groups of participants had a different understanding of modularity in mind when evaluating the scenarios. This indicates that “modularity” cannot be assumed to be correctly understood by questionnaire participants. In fact, the standard deviation for modularity in the Copy scenario is, with 2.11 for Java and 3.16 for QVT-O, the highest among all quality properties, confirming this observation (sticking to the Copy scenario has the advantage that it has the most representative number of participants). Another observation is that the modularity of the QVT-O implementation of the Copy scenario is the lowest among the considered M2M languages even though the definition of modularity dictates a high modularity within QVT-O (as argued above for Modu1.1.2).

As a consequence, future empirical studies have to ensure that participants have a common understanding of modularity (this also holds for the other quality properties). For instance, participants could be trained before evaluating different scenarios or they could get additional explanations for the meaning of modularity (e.g., examples and clear definitions). Given these limitations for the evaluation of modularity, this thesis nonetheless tries to use the evaluation results for an interpretation. Therefore, the thesis assumes that the evaluations do not deviate too much from the real modularity of the scenarios.

General hypotheses for modularity (quality question; GH1.1 to GH1.3). The standard deviation of the average modularity values from the questionnaires as discussed above is approximately 0.7 indicating that the different languages have a “medium” difference regarding modularity (this neither falsifies nor confirms GH1.1). I classified the difference as “medium” as the standard deviation for modularity is not the largest standard deviation over all standard deviations considered for GH1.1 and it is over the 0.5 mark. I selected the latter mark by an educated guess; it needs to be investigated in future work.
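The classification described above can be sketched in a few lines of Python. This is one possible reading and not the procedure actually applied in this thesis: the use of the population standard deviation, the rule that only the largest deviation counts as “high”, and all numeric inputs below are assumptions chosen for illustration; only the 0.5 mark is taken from the text.

```python
from statistics import pstdev

THRESHOLD = 0.5  # the 0.5 mark from the text, itself an educated guess


def classify_difference(language_averages, other_deviations):
    """Classify the spread of per-language averages for one quality property.

    language_averages: average questionnaire value per language.
    other_deviations: standard deviations of the other quality properties.
    """
    # Assumption: population standard deviation over the languages.
    deviation = pstdev(language_averages)
    if deviation < THRESHOLD:
        return "low"
    # Assumption: only the largest deviation over all properties is "high".
    if deviation >= max(other_deviations):
        return "high"
    return "medium"


# Hypothetical per-language averages (Java, QVT-O, QVT-R) and
# hypothetical deviations of two other quality properties.
modularity_averages = [4.0, 4.1, 5.6]
other_deviations = [0.3, 0.9]
print(classify_difference(modularity_averages, other_deviations))
```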


Regarding GH1.2, the questionnaire results generally need to be classified into “low”, “medium”, and “high”. However, as the values of QVT-R are in most cases higher than for the other languages (and in the remaining cases close to the highest value), this classification is not needed: it holds that QVT-R dominates Java as well as QVT-O (cf. Section 5.4.1 for the definition of the “dominates” relation). Therefore, GH1.2 is falsified for the considered scenarios under the assumption that the questionnaire results for modularity are correct. That is, there are quality properties that inherently perform better within a concrete M2M approach/language/engine combination than in other combinations. Note that this result is limited and preliminary since the modularity measurements suffer from the problems discussed above and there may be scenarios not considered within this thesis that behave differently. However, it gives a first clear indication that, given the requirement that modularity of a transformation has the highest priority, a relational language like QVT-R should be used.

The third general hypothesis GH1.3 stating that different scenarios come with different modularity values can be confirmed based on the questionnaire results. This directly follows from the evaluation of these results as provided above.

C.5.3 ModuQ2: “What are the reasons for differences in modularity?”

Modu2.2.1: “Number of changed rules when moving from Copy to RuleX”. This metric is based on the definition of modularity. I expect that it negatively correlates with the modularity of the Copy scenario, i.e., a high value in a given scenario (e.g., Rule1) indicates that the Copy scenario has a low modularity. Furthermore, I expect that high values in a given scenario indicate high changes regarding modularity when compared to the Copy scenario and allow deriving the scenario feature responsible for this change.

Figure C.10 illustrates the measurements of this metric: (1) in Java, the values are 3.00 in most cases; some are 4.00 (Rule7, Rule8, Rule9, and Rule11), (2) in QVT-O, the value is always 1.00, and (3) in QVT-R, most values are 1.00 or 2.00; the only scenarios with higher values are Rule3, Rule8, Rule10, and Rule11. The reason for the 1.00 value for QVT-O is that the Copy scenario has only one operation (main). Therefore, QVT-O performs best regarding modularity when basing an evaluation on the absolute number of changed transformation rules. However, in relative terms QVT-O performs worst as 100% of the rules are changed.
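The difference between the absolute and the relative view can be made concrete with a short Python sketch. The helper `relative_change` is hypothetical, and the total of 12 rules assumed for the Java Copy implementation is a made-up figure for illustration only; the QVT-O numbers follow from the single main operation mentioned above.

```python
def relative_change(changed_rules, total_rules):
    """Share of the Copy implementation's rules that had to be changed."""
    if total_rules <= 0:
        raise ValueError("total_rules must be positive")
    return changed_rules / total_rules


# QVT-O: the Copy implementation has a single operation (main), which changes.
qvto = relative_change(changed_rules=1, total_rules=1)

# Java: 3 changed rules is the typical measured value; the total of
# 12 rules is an assumption made for this illustration.
java = relative_change(changed_rules=3, total_rules=12)

print(f"QVT-O: {qvto:.0%} of the rules changed")
print(f"Java:  {java:.0%} of the rules changed")
```

Under these assumptions, QVT-O has the lowest absolute value but the highest relative value, which is the effect described above.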

One interesting observation is that Rule8 has an especially high value in QVT-R indicating that there is a significant change of modularity between the Copy and Rule8 scenarios. However, when comparing the questionnaire results regarding modularity for these scenarios, there are only small differences. Consequently, the high amount of changes does not have the correlation with modularity that was expected. As this metric is also applied for analyzability, the observation may indicate a change of analyzability only. In fact, the questionnaire results indicate especially low analyzability values for QVT-R and Rule8 when compared to the Copy scenario.

Rule8 specifies a hierarchical clustering. The high values for this metric in QVT-R indicate that this scenario feature cannot easily be based on a generic set of copy transformation rules. An investigation of the Rule8 implementation in QVT-R confirms this hypothesis as several of the generic copy rules were completely removed. Taking further quality properties into account, also reusability, analyzability, modifiability, and appropriateness recognizability have especially low values for Rule8 in every M2M language considered. This indicates that the hierarchical clustering scenario feature itself causes a lower maintainability than other scenario features do. Consequently, future work needs to investigate how these scenarios can effectively and efficiently be implemented within the different M2M languages. Based on the average questionnaire results over the quality properties mentioned above, QVT-O performs best and QVT-R worst. This gives a tendency which approach could be best for implementing hierarchical clusterings and needs to be confirmed in future work, too.

Modu2.2.2: “Number of additional rules when moving from Copy to RuleX”. This metric has the same expected correlations with modularity and analyzability as Modu2.2.1 considered before. The reason for this is that both metrics are based on the definitions of modularity and analyzability, respectively.

Figure C.11 illustrates a distribution similar to that of Modu2.2.1. One main difference is that it provides a more fine-grained view on QVT-O as the measurements are not restricted by the fact that the Copy implementation has only one operation within QVT-O. The high values for Rule8 confirm the observations and hypotheses stated in the interpretation of Modu2.2.1.

General hypotheses for modularity (reason question; GH2.1 to GH2.8). See Section C.5.1.

Specific hypotheses for modularity (reason question; ModuH2). The hypothesis ModuH2 states that QVT-R’s implicit scheduling improves modularity (compared to other languages). The questionnaire results indicate that this hypothesis can be true: on average, the participants evaluated the modularity of Java 22.86% lower and that of QVT-O 22.37% lower when compared to QVT-R. Whether QVT-R’s implicit scheduling is really the reason for these differences in modularity could, however, not be determined. Therefore, this issue is left as future work.
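Percentages of this kind can be obtained by relating one average to a reference average. The Python sketch below shows one possible reading, assuming the difference of averages is divided by the reference value; whether the thesis averaged per-scenario differences instead is not stated here. The function name and the input values are hypothetical and are not the measured questionnaire averages.

```python
def percent_lower(value, reference):
    """Return how many percent `value` lies below `reference`."""
    if reference == 0:
        raise ValueError("reference must not be zero")
    return (reference - value) / reference * 100.0


# Hypothetical average questionnaire values, for illustration only.
reference_average = 6.0   # e.g., the language used as the baseline
compared_average = 4.5    # e.g., a language that was rated lower

print(f"{percent_lower(compared_average, reference_average):.2f}% lower")
```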

C.5.4 ReuseQ1: “What is the reusability of the implementations?”

Reuse1.1.1 to Reuse1.1.4. Analogously to Modu1.1.1 to Modu1.1.4.


Reuse1.2.1/Reuse1.2.2: “Evaluated questionnaire for reusability (quality ques-tion)”. Figure C.29 shows the questionnaire results regarding reusability; thevalues are positively correlated with reusability under the assumption that theparticipants evaluated reusability correctly. With this assumption, Figure C.29 il-lustrates that the Java and QVT-O values for reusability are similarly distributedto the corresponding values for modularity indicating a clear (positive) correla-tion between these two quality properties. Assuming the values for modularity andreusability are comparable, the participants evaluated reusability (1) in QVT-Oapproximately 10% lower than modularity (on average) and (2) in Java approxi-mately 7% lower than modularity (on average). This particularly indicates thatreusability does not only depend on modularity.

For QVT-R, the questionnaire results show a different behavior as for Java andQVT-O. The results for QVT-R do not directly indicate that modularity andreusability correlate with each other as the corresponding values are differentlydistributed. Furthermore, the QVT-R reusability values are approximately 27%lower than the values for modularity (on average). The values are generally dis-tributed around the 4.5 mark where Rule4 and Rule6 have especially high valuesand Rule8 and Rule12 low values (the values approximately lie between 2.5 and6.3).

The observations for QVT-R indicate that transformation rules, when seen as “modules”, are less reusable in QVT-R than is the case for Java and QVT-O. A reason for this is that Java and QVT-O support and allow more generic transformation rules. For instance, QVT-O’s copy operation supports copying elements independent of a concrete metamodel, and QVT-O’s abstract mapping definitions allow for specifying generic mapping operations that can be reused for several purposes (an example can be found in the Rule9 implementation in QVT-O). In QVT-R, on the other hand, relations generally depend on concrete metamodels. An exception is the “query” construct, which can be applied to specify generic transformation rules similar to those in QVT-O or Java. However, using this construct introduces imperative concepts to QVT-R as queries are not relations. Since it was not systematically planned by this thesis, an investigation of this issue is left as future work.

General hypotheses for reusability (quality question; GH1.1 to GH1.3). The standard deviation of the average reusability values from the questionnaires as discussed above is approximately 0.3, indicating that the different languages have a “low” difference regarding reusability. This indicates that GH1.1 is falsified regarding reusability. I classified the difference as “low” because the standard deviation for reusability is the second lowest standard deviation over all standard deviations considered for GH1.1 and it is under the 0.5 mark (as also applied for modularity).

Regarding GH1.2, the questionnaire results generally need to be classified into “low”, “medium”, and “high”. However, as the standard deviation for the reusability values is low (cf. GH1.1), fine-grained threshold values need to be specified for such a classification. As the questionnaire results suffer from a low number of participants assigned to different sets of scenarios, these threshold values cannot reasonably be determined. The evaluation of this hypothesis regarding reusability is, therefore, left as future work.

The third general hypothesis GH1.3, stating that different scenarios come with different reusability values, can be confirmed based on the questionnaire results. This directly follows from the evaluation of these results as provided above.

Specific hypotheses for reusability (quality question; ReuseH1). The hypothesis ReuseH1 states that reusability correlates with modularity. As the evaluation of the questionnaire shows, this hypothesis can be confirmed for Java and QVT-O. For QVT-R, the situation is less obvious and needs to be inspected in future work.

C.5.5 ReuseQ2: “What are the reasons for differences in reusability?”

Reuse2.2.1: “Number of additional reused rules when moving from Copy to RuleX”. This metric is based on the definition of reusability. I expect that it positively correlates with the reusability of the Copy scenario, i.e., a high value in a given scenario (e.g., Rule1) indicates that the Copy scenario has a high reusability. Furthermore, I expect that high values in a given scenario indicate large changes regarding reusability when compared to the Copy scenario and allow deriving the scenario feature responsible for this change.

Figure C.12 illustrates the measurements of this metric: Java and QVT-R have generally low values with approximately 0.13 and 0.16 reused rules per scenario, respectively. QVT-O comes with approximately 0.70 reused rules per scenario. This indicates that the reusability of the transformation rules applied in the Copy scenario is best in QVT-O and similar for Java and QVT-R. This is confirmed by the questionnaire results.

One interesting measurement is the negative value for QVT-R in the Rule6 scenario. This indicates that, in the QVT-R implementation, relations that are specified in the Copy scenario (a) are deleted and/or (b) are invoked less often by other relations via QVT-R’s internal scheduling mechanism. Both possibilities indicate that the set of transformation rules provided by the Copy scenario is too extensive for the Rule6 scenario such that some relations are, for instance, deleted for implementing Rule6.

Another observation is the high values for the Rule8 scenario. On the one hand, these values could indicate that the set of transformation rules provided by the Copy scenario allows for composing an implementation of the Rule8 scenario by reusing appropriate transformation rules. On the other hand, one reason for these values could also be that the implementation of Rule8 is completely different from the Copy scenario and new “reuse dependencies” were established. The questionnaire results indicate that the latter possibility is likely as Rule8 was evaluated with an especially low reusability (for every language considered). An investigation of the concrete implementations for Rule8 confirms this observation as, for instance, for QVT-R the relations are hardly based on the Copy scenario. These findings coincide with the evaluation of Modu2.2.1, which states that the Rule8 scenario (which describes a hierarchical clustering) itself requires further investigation to derive an optimal implementation strategy.

General hypotheses for reusability (reason question; GH2.1 to GH2.8). See Section C.5.1.

Specific hypotheses for reusability (reason question; ReuseH2). The hypothesis ReuseH2 states that QVT-R’s implicit scheduling improves reusability (compared to other languages). Analogously to the evaluation of the corresponding hypothesis for modularity, the questionnaire results indicate that this hypothesis can be true: on average, the participants evaluated the reusability of Java 13.01% lower and that of QVT-O 9.58% lower when compared to QVT-R. Whether QVT-R’s implicit scheduling really is the reason for these differences in reusability could, however, not be determined. Therefore, this issue is left as future work.

C.5.6 AnaQ1: “What is the analyzability of the implementations?”

Ana1.1.1 to Ana1.1.6. Analogously to Appro1.1.1 to Appro1.1.6.

Ana1.2.1/Ana1.2.2: “Evaluated questionnaire for analyzability (quality question)”. Figure C.30 shows the questionnaire results regarding analyzability; the values are positively correlated with analyzability under the assumption that the participants evaluated analyzability correctly. With this assumption, Figure C.30 illustrates that the Java, QVT-O, and QVT-R values for analyzability are distributed similarly to the corresponding values for appropriateness recognizability, indicating a clear (positive) correlation between these two quality properties.

There are only minor differences when comparing the analyzability and appropriateness recognizability values. This is confirmed by the following observation. Assuming the values for appropriateness recognizability and analyzability are comparable, the participants evaluated analyzability (1) in Java approximately 0.1% higher than appropriateness recognizability (on average), (2) in QVT-O approximately 3.7% lower than appropriateness recognizability (on average), and (3) in QVT-R approximately 2.8% lower than appropriateness recognizability (on average). As these values are very low (compared, for instance, to the comparison between modularity and reusability; cf. the evaluation of Reuse1.2.1 and Reuse1.2.2), there is a strong dependency between appropriateness recognizability and analyzability.



Due to the similar results regarding analyzability and appropriateness recognizability, Figure C.30 can be interpreted analogously to the questionnaire results for appropriateness recognizability (cf. Section C.5.12).

General hypotheses for analyzability (quality question; GH1.1 to GH1.3). Analogously to the evaluation of GH1.1 to GH1.3 for appropriateness recognizability (due to the similarities between analyzability and appropriateness recognizability).

Specific hypotheses for analyzability (quality question; AnaH1). The hypothesis AnaH1 states that analyzability correlates with modularity. As the evaluation of the questionnaire shows, this hypothesis can be confirmed.

C.5.7 AnaQ2: “What are the reasons for differences in analyzability?”

Ana2.2.1 and Ana2.2.2. Analogously to Modu2.2.1 and Modu2.2.2.

General hypotheses for analyzability (reason question; GH2.1 to GH2.8). See Section C.5.1.

Specific hypotheses for analyzability (reason question; AnaH2.1 and AnaH2.2). AnaH2.1 states that the number of additions and changes to the Copy scenario when implementing one of the “Rule scenarios” correlates negatively with analyzability. This can generally be confirmed based on the Modu2.2.1 and Modu2.2.2 metric measurements, which provide the number of changes and additions, respectively. The evaluation of the latter metrics also discusses the analyzability property (cf. Section C.5.3).

AnaH2.2 states that the scenario features “hierarchical” and “abstraction level” generally cause a lower analyzability than the “1:1 relations” and “structure” scenario features. One assumption for checking this hypothesis is that the analyzability for each of the “Rule scenarios” is lower than for the Copy scenario. However, this assumption does not hold for the measured values, indicating that the hypothesis does not hold as stated. Moreover, an inspection of the relevant scenarios does not indicate that the hypothesis is correct. For instance, Rule12 is characterized by a refactoring but suffers from low analyzability values when compared to the other scenarios. Rule4 and Rule5, in contrast, specify a refinement and an abstraction, respectively, but were assessed with a high analyzability (compared to the other scenarios).

Based on these results, AnaH2.2 needs to be preliminarily discarded. However, due to several sources of error regarding the questionnaire (number of participants, no concrete definition of analyzability in mind, every participant evaluated Copy first, etc.), AnaH2.2 should be revised in future work via a controlled experiment.


As I cannot find a concrete reason why the analyzability of the Copy scenario should be lower than that of one of the other scenarios, I further conclude that the questionnaire generally suffers from the problem that the result values can hardly be compared between different scenarios. I suppose that the reasons for these issues are the problems inherent to the questionnaire as discussed in Section 6.2.3.

C.5.8 ModiQ1: “What is the modifiability of the implementations?”

Modi1.1.1 to Modi1.1.3. Analogously to Modu1.1.2 to Modu1.1.4.

Modi1.2.1/Modi1.2.2: “Evaluated questionnaire for modifiability (quality question)”. Figure C.31 shows the questionnaire results regarding modifiability; the values are positively correlated with modifiability under the assumption that the participants evaluated modifiability correctly. With this assumption, Figure C.31 illustrates that the Java, QVT-O, and QVT-R values for modifiability are not distributed as similarly to the corresponding values for modularity as was expected.

The values only allow deriving a weak positive correlation between these two quality properties. For instance, the Copy as well as the Rule1, Rule2, Rule5, Rule7, and Rule9 scenarios provide the same order of languages when ordered by their modularity values as when ordered by their modifiability values. For the remaining scenarios, this is, however, not the case. For instance, with respect to the Rule8 scenario, QVT-R performs best regarding modularity and worst regarding modifiability.

These observations allow for the following interpretation. The results indicate that modularity generally affects modifiability positively. However, there are situations in which other factors outweigh the positive influence of modularity. One of these factors could be the analyzability of the implementation. For instance, although the Rule8 implementation has a high modularity in QVT-R, its analyzability is especially low. The latter issue could, therefore, outweigh the positive influence of modularity and cause a low modifiability for the QVT-R implementation of Rule8. In fact, a causal relation between these three quality properties is likely as, e.g., for being capable of modifying a transformation, a transformation engineer could first try to identify the appropriate part of the transformation (which is positively influenced by a high modularity). In case the identified part is hard to analyze, the transformation engineer finds it difficult to modify the transformation.
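The order comparison used above (whether a scenario yields the same order of languages for modularity and for modifiability) can be sketched as follows. This is only an illustration: the function names and all numeric values are hypothetical and are not taken from the questionnaire data.

```python
from typing import Dict, List


def language_order(values: Dict[str, float]) -> List[str]:
    """Return the languages sorted from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


def same_order(modularity: Dict[str, float], modifiability: Dict[str, float]) -> bool:
    """Check whether both quality properties rank the languages identically."""
    return language_order(modularity) == language_order(modifiability)


# Hypothetical values for a scenario in which both properties agree.
agreeing_modularity = {"Java": 4.0, "QVT-O": 4.5, "QVT-R": 5.5}
agreeing_modifiability = {"Java": 3.8, "QVT-O": 4.9, "QVT-R": 5.1}

# Hypothetical values mimicking the Rule8 observation: QVT-R is best
# regarding modularity but worst regarding modifiability.
rule8_like_modularity = {"Java": 4.0, "QVT-O": 4.5, "QVT-R": 5.5}
rule8_like_modifiability = {"Java": 4.2, "QVT-O": 4.0, "QVT-R": 3.0}

print(same_order(agreeing_modularity, agreeing_modifiability))
print(same_order(rule8_like_modularity, rule8_like_modifiability))
```

Counting the scenarios for which such a check succeeds gives a rough impression of how strongly the two properties agree; it is not a substitute for a rank correlation computed on the actual data.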

General hypotheses for modifiability (quality question; GH1.1 to GH1.3). Analogously to the evaluation of GH1.1 to GH1.3 for modularity (due to the similarities between modifiability and modularity).



Specific hypotheses for modifiability (quality question; ModiH1). The hypothesis ModiH1 states that modifiability correlates with modularity. As the evaluation of the questionnaire shows, this hypothesis can only partly be confirmed. On the one hand, the results indicate that modularity has a positive influence on modifiability. On the other hand, other factors can outweigh this influence. An example of such a factor is analyzability. Future work needs to address the influence of further factors.

C.5.9 ModiQ2: “What are the reasons for differences in modifiability?”

Modi2.2.1: “Average decrease in other quality properties when moving from Copy to RuleX”. This metric is based on the definition of modifiability. I expect that low values indicate that the Copy scenario is suitable as a basis for implementing the corresponding scenario while high values indicate the opposite.

Figure C.35 illustrates the measurements of this metric: (1) in Java, there is generally the lowest decrease with an average value of approximately -0.15 per scenario; Java performs especially well for the Rule4 to Rule6 scenarios, (2) in QVT-O, the highest decrease is observable with an average value of approximately 0.63 per scenario, and (3) in QVT-R, the decrease is relatively low with an average value of approximately 0.06 per scenario. Furthermore, Rule10 to Rule12 as well as especially Rule8 are characterized by high values within every language considered.
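To make the sign convention of this metric concrete, the following sketch computes an average decrease for one scenario. It assumes that the metric is the mean of (Copy value minus RuleX value) over the other quality properties; this reading is derived from the metric's name only, and the property values used are hypothetical.

```python
from typing import Dict


def average_decrease(copy_values: Dict[str, float], rule_values: Dict[str, float]) -> float:
    """Mean of (Copy value - RuleX value) over the shared quality properties.

    A positive result means the properties decreased on average when moving
    from Copy to RuleX; a negative result means they increased on average.
    """
    shared = copy_values.keys() & rule_values.keys()
    if not shared:
        raise ValueError("no shared quality properties to compare")
    return sum(copy_values[p] - rule_values[p] for p in shared) / len(shared)


# Hypothetical questionnaire averages for one language (not thesis data).
copy_scenario = {"modularity": 5.0, "analyzability": 4.0, "consistency": 6.0}
rule_scenario = {"modularity": 4.0, "analyzability": 4.5, "consistency": 5.0}

print(average_decrease(copy_scenario, rule_scenario))
```

Under this reading, a negative average such as the approximately -0.15 reported for Java corresponds to an average increase of the other quality properties.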

The different average values indicate that the general implementation structure as provided by the Copy scenario implementations in Java and QVT-R is generally reusable and easy to modify. In QVT-R, this statement relates to single rules that need to be modified to comply with changed requirements. In Java, this statement relates to the infrastructure code (e.g., for loading model instances) that can be reused for several other scenarios. One reason for the higher values in QVT-O can, therefore, be the fact that it needs less infrastructure code for the implementation of the Copy scenario. Furthermore, the use of the “copy” operation for the QVT-O implementation of the Copy scenario generally caused large evaluation differences when comparing the values for the Copy scenario with the other scenarios. Therefore, these observations do not allow for a general interpretation of the modifiability of QVT-O.

The observations for Rule8 and Rule10 to Rule12 indicate that it is difficult to implement these scenarios based on the implemented Copy scenario. Due to the size of the Copy scenario implementation in QVT-R, these observations particularly indicate that QVT-R could be unsuited for the implementation of these scenarios in general.

Concerning the scenario features, Rule8 specifies a “hierarchical clustering”, Rule10 a “refinement”, Rule11 a scenario with “two source domains”, and Rule12 a “refactoring”. Under the assumption that these scenario features caused the results, the results indicate that scenarios characterized by a high number of these features should not be implemented in QVT-R but in one of the other languages.

General hypotheses for modifiability (reason question; GH2.1 to GH2.8). See Section C.5.1.

Specific hypotheses for modifiability (reason question; ModiH2). The hypothesis ModiH2 states that QVT-R’s implicit scheduling improves modifiability (compared to other languages). The questionnaire results indicate that this hypothesis does not hold: on average, the participants evaluated the modifiability of Java and QVT-O approximately 5% higher when compared to QVT-R. Whether QVT-R’s implicit scheduling has no influence on modifiability could, however, not be determined. These results just show that the influence is not as high as was expected. Therefore, this issue is left as future work.

C.5.10 ConsQ1: “What is the consistency of the implementations?”

Cons1.1.1 to Cons1.1.7. Analogously to Appro1.1.1 to Appro1.1.4 as well as Modu1.1.1, Modu1.1.2, and Modu1.1.4, respectively.

Cons1.2.1/Cons1.2.2: “Evaluated questionnaire for consistency (quality question)”. Figure C.32 shows the questionnaire results regarding consistency; the values are positively correlated with consistency under the assumption that the participants evaluated consistency correctly. With this assumption, Figure C.32 illustrates: the values of every language considered are very similarly distributed over the scenarios, where QVT-R generally provides the highest consistency followed by QVT-O and Java. Accordingly, QVT-R has an average consistency of approximately 5.76, QVT-O of approximately 5.02 (12.87% lower than QVT-R), and Java of approximately 4.35 (24.46% lower than QVT-R).
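The “percent lower” figures above are relative differences with QVT-R as the reference. A minimal sketch of this computation, using the rounded averages stated in the text (the function name is an assumption made for illustration):

```python
def percent_lower(value: float, reference: float) -> float:
    """Return by how many percent `value` lies below `reference`."""
    return (reference - value) / reference * 100.0


# Rounded average consistency values as reported in the text.
averages = {"QVT-R": 5.76, "QVT-O": 5.02, "Java": 4.35}

for language in ("QVT-O", "Java"):
    difference = percent_lower(averages[language], averages["QVT-R"])
    print(f"{language}: {difference:.2f}% lower than QVT-R")
```

With the rounded averages, the sketch yields roughly 12.85% and 24.48%, which is close to the reported 12.87% and 24.46%; the small gap presumably stems from the percentages having been computed on unrounded averages.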

The only exceptions where QVT-R does not perform best regarding consistency are the Copy and Rule3 scenarios, where QVT-O performs best followed by QVT-R and Java. However, the difference between the QVT-R and QVT-O values is small compared to the differences between these languages for the other scenarios. Furthermore, the results for the Copy scenario are biased as the Copy scenario was always evaluated before the Rule1 to Rule2 scenarios (Section C.5.7 also identifies this issue as a problem inherent to the questionnaire).

Furthermore, it is noticeable that the questionnaire participants of the second group (who evaluated the Copy and the Rule4 to Rule8 scenarios) generally evaluated consistency higher than the other participants. If the reason for this is not inherent to the scenarios, the reason for this will be generally higher consistency assessments by the participants. This latter hypothesis is confirmed by the fact that the participants of the second group evaluated the consistency of the Copy scenario generally higher than the other groups. This is, again, a problem caused by the direct-rating method (cf. Section 6.2.3).

One consequence of the latter observation is that the consistency results can be compared between the different groups of participants only to a limited extent. However, a comparison of the scenarios evaluated by a single group shows that there are generally different consistency values between different scenarios (see, for instance, Figure C.32).

General hypotheses for consistency (quality question; GH1.1 to GH1.3). The standard deviation of the average consistency values from the questionnaires as discussed above is approximately 0.7, indicating that the different languages have a “medium” difference regarding consistency (this neither falsifies nor confirms GH1.1). I classified the difference as “medium” because the standard deviation for consistency is not the largest standard deviation over all standard deviations considered for GH1.1 and it is over the 0.5 mark (analogously to the evaluation of modularity).
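The reported standard deviation can be approximately reproduced from the three rounded language averages given for consistency (5.76, 5.02, and 4.35). The sketch below assumes that the sample standard deviation (divisor n - 1) was used, since this variant matches the reported value of approximately 0.7. The classification function is a hypothetical reading of the rule described above: the 0.5 mark is from the text, whereas treating the largest standard deviation as “high” is an assumption.

```python
from statistics import stdev


def classify_difference(sd: float, is_largest: bool, mark: float = 0.5) -> str:
    """Classify a standard deviation as "low", "medium", or "high".

    Assumed rule: below the mark is "low"; at or above the mark is "medium"
    unless it is the largest standard deviation considered, which is "high".
    """
    if sd < mark:
        return "low"
    return "high" if is_largest else "medium"


# Rounded average consistency values for QVT-R, QVT-O, and Java.
consistency_averages = [5.76, 5.02, 4.35]

sd = stdev(consistency_averages)  # sample standard deviation (n - 1)
print(f"standard deviation: {sd:.2f}")
print(classify_difference(sd, is_largest=False))
```

For comparison, the reusability standard deviation of approximately 0.3 falls below the 0.5 mark and is classified as “low” by the same function.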

Regarding GH1.2, the questionnaire results generally need to be classified into “low”, “medium”, and “high”. However, as the values of QVT-R are in most cases higher than for the other languages (and in the remaining cases close to the highest value), this classification is not needed: it holds that QVT-R dominates Java as well as QVT-O (cf. Section 5.4.1 for the definition of the “dominates” relation). Therefore, GH1.2 is falsified for the considered scenarios under the assumption that the questionnaire results for consistency are correct. That is, there are quality properties that inherently perform better within a concrete M2M approach/language/engine combination than in other combinations. This gives a first clear indication that, given the requirement that consistency of a transformation has the highest priority, a relational language like QVT-R should be used. Note that this observation is preliminary at this point as it is based on an investigation of the consistency values as assessed by the participants. To concretize this result, the reasons for this difference have to be considered, too (cf. Section C.5.11).

The third general hypothesis GH1.3, stating that different scenarios come with different consistency values, can be confirmed based on the questionnaire results. This directly follows from the evaluation of these results as provided above.

C.5.11 ConsQ2: “What are the reasons for differences in consistency?”

Cons2.2.1: “Number of newly introduced inconsistencies when moving from Copy to RuleX”. This metric is based on a “special task” within the questionnaire. Questionnaire participants had to find inconsistencies that were introduced in Rule1 to Rule12 assuming these scenarios were implemented based on the Copy scenario. The metric is based on the consistency definition and, given one of the rule scenarios, I expect a negative correlation between the value measured for this metric and the consistency of the scenario implementation.

Figure C.36 illustrates the results for this metric. The questionnaire participants generally found only few inconsistencies: in sum, two in QVT-O and three each in Java and QVT-R. A reason for these low values can be that I implemented all QVT-O and Java implementations and attached great importance to a high consistency between the implementations. Furthermore, the QVT-R implementations of Rule1 to Rule12 are generally close to the QVT-R implementation of the Copy scenario, which generally causes a high consistency.

Given the latter observations, the results indicate that consistency is generally high for every language. The reason for this is that, otherwise, more inconsistencies would have been found.

General hypotheses for consistency (reason question; GH2.1 to GH2.8). See Section C.5.1.

Specific hypotheses for consistency (reason question; ConsH2). ConsH2 states that the scenario features “hierarchical” and “abstraction level” generally cause a lower consistency than the “1:1 relations” and “structure” scenario features. Based on the questionnaire results, this hypothesis needs to be preliminarily discarded and revised in future work via a controlled experiment. The reasons for this are the same as for the hypothesis AnaH2.2 (cf. Section C.5.7).

C.5.12 ApproQ1: “What is the appropriateness recognizability of the implementations?”

Appro1.1.1: “Lines of code”. Figure C.13 illustrates the measurements for the lines of code metric, for which Kapova et al. [KGH12] identify a positive correlation with appropriateness recognizability. Under the assumption that this correlation is correct, the measured values allow the following interpretation: (1) in Java, UML2RDBMS has the best (followed by Rule8) and Copy the worst appropriateness recognizability; the other scenarios lie around the 140.00 mark, (2) in QVT-O, UML2RDBMS also performs best (but is followed by Rule5) and Copy performs worst; the other scenarios lie around the 25.00 mark, and (3) in QVT-R, again, UML2RDBMS performs best (followed by Rule7 and Rule8) and Copy as well as Rule1 to Rule4 perform worst; the other scenarios lie around the 125.00 mark. The values of QVT-R are, relatively seen, most stable over the considered scenarios; the values of Java and QVT-O show larger deviations, indicating a larger influence of the scenario and a lower influence of the language regarding appropriateness recognizability when compared to QVT-R.

Under the assumption that the measured values are comparable between the different languages, QVT-O generally performs worst (given a positive correlation regarding the lines of code metric): on average, Java has approximately 480% and QVT-R 425% more lines of code. Java performs best in most cases; only for the Copy, Rule11, and Rule12 scenarios does QVT-R perform better.

Appro1.1.2: “Number of starts”. Figure C.14 illustrates the measurements of this metric. Referring to Kapova et al. [KGH12], this metric has a positive correlation with appropriateness recognizability. Under this assumption, it follows that: (1) in Java and QVT-O, the appropriateness recognizability is stable over all scenarios as every value lies exactly at the 1.00 mark, and (2) in QVT-R, especially Copy, Rule1, Rule2, Rule4, Rule6, Rule9, and Rule12 have a high and UML2RDBMS, Rule8, Rule10, Rule11, and Rule12 a low appropriateness recognizability. The reason that every measured value for Java and QVT-O is 1.00 is that the implementations for both languages have exactly one entry point (the main method in Java and the main operation in QVT-O).

The fact that the value does not change within these languages indicates that this metric is not well suited for them. Therefore, it is also of low value to compare the measured values between the different languages, and the results may only be used for interpretations regarding QVT-R. One interesting measurement is the one for Rule8 as it has a value of 0.00. The reason for this is that every top relation of the implementation has a when clause. In fact, this indicates a low appropriateness recognizability as it is no longer straightforward to understand with which transformation rule a transformation execution starts.
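The Rule8 measurement suggests that, for QVT-R, a “start” is a top relation without a when clause. The following sketch counts starts under exactly this assumption; the precise metric definition by Kapova et al. [KGH12] may differ, and the `Relation` type as well as the relation names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Relation:
    """Simplified, hypothetical description of a QVT-R relation."""
    name: str
    is_top: bool
    has_when_clause: bool


def number_of_starts(relations: Sequence[Relation]) -> int:
    """Count top relations without a when clause (assumed metric reading)."""
    return sum(1 for r in relations if r.is_top and not r.has_when_clause)


# Hypothetical transformation in which every top relation has a when
# clause, mirroring the situation described for Rule8.
rule8_like = [
    Relation("ClusterToPackage", is_top=True, has_when_clause=True),
    Relation("ElementToClass", is_top=True, has_when_clause=True),
    Relation("AttributeToColumn", is_top=False, has_when_clause=False),
]

# Hypothetical transformation with a single unguarded top relation.
copy_like = [
    Relation("ModelToModel", is_top=True, has_when_clause=False),
    Relation("ElementToElement", is_top=True, has_when_clause=True),
]

print(number_of_starts(rule8_like))
print(number_of_starts(copy_like))
```

A count of zero, as for the first example, reflects that no relation is an obvious entry point for the transformation execution.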

Appro1.1.3: “Number of rules”. According to Kapova et al. [KGH12], this metric positively correlates with appropriateness recognizability. Under this assumption, Figure C.15 illustrates: (1) in Java, UML2RDBMS and Rule8 have a high and Copy, Rule11, and Rule12 have a low appropriateness recognizability, (2) in QVT-O, UML2RDBMS, Rule8, and Rule9 perform well and especially Copy, Rule2, Rule11, and Rule12 perform badly, and (3) in QVT-R, Rule5, Rule7, Rule8, Rule11, and Rule12 have the highest and UML2RDBMS the lowest appropriateness recognizability. The values of Java and QVT-O show larger deviations, indicating a larger influence of the scenario and a lower influence of the language regarding appropriateness recognizability when compared to QVT-R.

Under the assumption that the measured values are comparable between the different languages, QVT-O performs worst (given a positive correlation regarding the number of rules metric): on average, Java has approximately 268% and QVT-R 455% more rules. QVT-R performs best or at least as well as Java in most cases. For the UML2RDBMS scenario, the situation is different. In this scenario, Java performs best, followed by QVT-O and QVT-R.

Appro1.1.4: “Number of top-level rules”. The interpretation of this metric is similar to the one of Appro1.1.2. This metric is, therefore, also not suited for Java and QVT-O. For QVT-R, Figure C.16 illustrates: UML2RDBMS performs worst (followed by Rule10) and Rule5 as well as Rule12 perform best (followed by Rule6 and Rule11).

Appro1.1.5: “Average size of the domain pattern”. According to Kapova et al. [KGH12], this metric negatively correlates with appropriateness recognizability. Under this assumption, Figure C.17 illustrates: (1) in Java, especially Rule3, Rule4, and Rule10 perform best and UML2RDBMS worst (followed by Rule7), (2) in QVT-O, Copy, Rule11, and Rule12 perform best and UML2RDBMS, Rule3, Rule5, Rule8, and Rule10 worst, and (3) in QVT-R, most scenarios perform equally well as they lie around the 1.70 mark; only UML2RDBMS performs worse. Again, the latter observation indicates that the appropriateness recognizability of QVT-R is less influenced by the scenarios than is the case for the other languages.

Under the assumption that the measured values are comparable between the different languages, QVT-R generally performs best, followed by QVT-O and, finally, Java. Compared to QVT-O, the average “average size of the domain pattern” over all scenarios is approximately 130% higher in Java and 51% lower in QVT-R.

Appro1.1.6: “Average number of explicit internal scheduling calls”. Analogously to Modu1.1.4.

Appro1.2.1/Appro1.2.2: “Evaluated questionnaire for appropriateness recognizability (quality question)”. Figure C.33 shows the questionnaire results regarding appropriateness recognizability; the values are positively correlated with appropriateness recognizability under the assumption that the participants evaluated appropriateness recognizability correctly. With this assumption, Figure C.33 illustrates: (1) in Java, Rule4 to Rule6 perform best and Rule8 worst; the values lie around the 4.00 mark, (2) in QVT-O, Copy, Rule1 to Rule3, Rule6, and Rule9 perform especially well and Rule8 as well as Rule10 perform worst; the values lie around the 5.00 mark, and (3) in QVT-R, UML2RDBMS, Rule1, and Rule3 perform best and Copy, Rule5, Rule8, Rule10, and Rule11 perform worst; the values lie around the 4.00 mark. The evaluation of Modu2.2.1 already discusses the low values for Rule8.

When comparing the different languages, QVT-O generally performs best regarding appropriateness recognizability. The only exceptions are UML2RDBMS, where QVT-R performs better, and Rule10, where Java performs better. In general, QVT-R performs worst regarding appropriateness recognizability. Exceptions are the UML2RDBMS and Rule1 to Rule3 scenarios, where Java performs worst.

A striking observation is that QVT-O performs particularly well for the Copy scenario. This indicates that copy scenarios with several 1:1 relations perform especially well in QVT-O. QVT-R, on the other hand, suffers from a lower appropriateness recognizability inherent to the language itself. Exceptions are the scenarios characterized by a high number of 1:1 relations.



General hypotheses for appropriateness recognizability (quality question; GH1.1 to GH1.3). The standard deviation of the average appropriateness recognizability values from the questionnaires as discussed above is approximately 0.7, indicating that the different languages have a “medium” difference regarding appropriateness recognizability (this neither falsifies nor confirms GH1.1). I classified the difference as “medium” because the standard deviation for appropriateness recognizability is not the largest standard deviation over all standard deviations considered for GH1.1 and it is over the 0.5 mark (analogously to the evaluation of modularity).

Regarding GH1.2, the questionnaire results generally need to be classified into “low”, “medium”, and “high”. However, as the values of QVT-O are in most cases higher than for the other languages (and in the remaining cases close to the highest value), this classification is not needed: it holds that QVT-O dominates Java as well as QVT-R (cf. Section 5.4.1 for the definition of the “dominates” relation). Therefore, GH1.2 is falsified for the considered scenarios under the assumption that the questionnaire results for appropriateness recognizability are correct. Besides modularity, appropriateness recognizability is another quality property that inherently performs better within a concrete M2M approach/language/engine combination compared to other combinations. Again, note that this result is limited and preliminary but that it gives a first clear indication that, given the requirement that appropriateness recognizability of a transformation has the highest priority, an imperative language like QVT-O should be used.

The third general hypothesis GH1.3, stating that different scenarios come with different appropriateness recognizability values, can be confirmed based on the questionnaire results. This directly follows from the evaluation of these results as provided above.

C.5.13 ApproQ2: “What are the reasons for differences in appropriateness recognizability?”

Appro2.2.1: “Number of additional/changed comment lines of code when moving from Copy to RuleX”. This metric is based on the definition of appropriateness recognizability. I expect that high values in a given scenario indicate a higher appropriateness recognizability of the scenario than the Copy scenario has.

Figure C.18 illustrates the measurements of this metric: (1) in Java, the measurements indicate high appropriateness recognizability for Rule3, Rule5, Rule7, Rule8, Rule9, and Rule10 and low appropriateness recognizability for Rule1, Rule2, Rule6, and Rule12; the values lie between 15.00 and 90.00, (2) in QVT-O, all measurements are approximately 41% lower than the corresponding measurements in Java and are, therefore, similarly distributed, and (3) in QVT-R, most measurements lie around the 5.00 mark; similar to the other languages, Rule8 has an exceptionally high value (approximately 30.00).

Under the assumption that the measured values are comparable between the different languages, Java performs best followed by QVT-O (given a positive correlation between Appro2.2.1 and appropriateness recognizability): on average, QVT-O has approximately 41% and QVT-R 78% fewer additional/changed comment lines of code than Java.

When I implemented the scenarios, I was able to reuse most comments of the QVT-O code in Java as well. This was possible because the imperative paradigm shared by both languages allowed me to structure the corresponding implementations similarly. However, in Java I applied a different comment style to conform to JavaDoc. One consequence is, for instance, that separate lines are used for the parameter name and the parameter description of methods. Therefore, the measured values of QVT-O and Java are only partly comparable.

Another issue is the low number of comments within QVT-R. If I had implemented these scenarios, I would have provided comments for every QVT-R relation, resulting in higher measurement values. Therefore, a comparison is also limited in this case. This observation, however, allows the interpretation that the developers of the QVT-R transformations did not consider it valuable to add extensive documentation. A possible reason is that they assumed the implementations to be sufficiently self-explanatory.

The questionnaire results regarding appropriateness recognizability indicate that this assumption was wrong since the QVT-R implementations have the lowest appropriateness recognizability. Furthermore, the questionnaire results also contradict the expected correlation with this metric (e.g., Rule8 has low appropriateness recognizability values in the questionnaire although the measurements of this metric yield high values for Rule8).

I conclude that future work needs to revise this metric with respect to comparability. The causal relation between the number of comment lines and quality properties also needs to be inspected. For instance, one may measure the comment lines per transformation rule to derive the appropriateness recognizability of the transformation. In any case, it is important to base interpretations of this metric on a consistently applied commenting scheme such that comparisons between languages become reasonable.

General hypotheses for appropriateness recognizability (reason question; GH2.1 to GH2.8). See Section C.5.1.

Specific hypotheses for appropriateness recognizability (reason question; ApproH2). This hypothesis states that changes and additions to generic copy rules are documented and, thus, improve appropriateness recognizability in a scenario that is based on generic rule sets. The first part of checking this hypothesis (cf. the description of ApproH2 in Section 5.5.6) indicates that the hypothesis can hold: all measured values for Appro2.2.1 are positive.

The second part of checking the hypothesis requires that appropriateness recognizability is higher for every scenario based on a generic copy rule set. However, this does not hold in every case. For instance, the QVT-R implementations of Rule8 and Rule10 have lower values in the questionnaire than the Copy scenario. An inspection of the concrete implementations shows, however, that these two scenarios deviate strongly from the generic copy rule set as only few or no rules were reused. Therefore, these scenarios cannot be seen as “based on a generic copy rule set”, which indicates that the hypothesis ApproH2 holds for QVT-R.

On the other hand, the values for Java and for QVT-O do not allow confirming the hypothesis. Especially for QVT-O, the questionnaire values of the Copy scenario are generally higher than the values of the other scenarios. However, the QVT-O and Java implementations of the Copy scenario are not based on a generic copy rule set. Instead, the implementations apply a dedicated copy operation and method, respectively. Hence, ApproH2 cannot be checked accurately for these cases.

C.5.14 LearnQ1: “What is the learnability of the language/engine combinations?”

Learn1.1.1: “Number of possible language constructs”. According to Grossman et al. [GFA09], there is a negative correlation between this metric and learnability. If this assumption holds, Figure C.19 suggests the following: QVT-R has the best learnability, followed by Java and QVT-O.

Learn1.1.2: “Number of applied language constructs”. According to Grossman et al. [GFA09], there is a positive correlation between this metric and learnability. If this assumption holds, Figure C.19 suggests the following: QVT-O has the best learnability, followed by Java and QVT-R.

In addition to this metric, Figure C.20 illustrates the percentage of applied language constructs relative to the possible constructs. A high percentage may indicate the level to which a concrete language has already been learned and may, thus, have a positive correlation with learnability. Therefore, Figure C.20 suggests that QVT-R has the best learnability, followed by Java and QVT-O.
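The percentage shown in Figure C.20 relates the two preceding metrics to each other. A minimal sketch of this ratio, with hypothetical language names and counts (the measured values are only available in Figures C.19 and C.20), could look as follows:

```python
def applied_construct_ratio(applied: int, possible: int) -> float:
    """Percentage of possible language constructs that were actually applied."""
    if possible <= 0:
        raise ValueError("possible must be positive")
    if not 0 <= applied <= possible:
        raise ValueError("applied must lie between 0 and possible")
    return 100.0 * applied / possible


# Hypothetical counts for illustration only: (applied, possible) per language.
counts = {"LangA": (30, 60), "LangB": (45, 150)}
for language, (applied, possible) in counts.items():
    print(f"{language}: {applied_construct_ratio(applied, possible):.1f}%")
```

The example illustrates why the ranking by percentage can differ from the ranking by the absolute number of applied constructs: the hypothetical LangB applies more constructs than LangA but covers a smaller share of its language.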

Learn1.1.3: “Time until a scenario was implemented successfully”. Grossman et al. [GFA09] describe a negative correlation between this metric and learnability. If this holds for M2M transformation languages, Figure C.21 suggests that sometimes QVT-O and sometimes Java has the better learnability (for QVT-R, no data is available as the implementations were given).

However, when I implemented the scenarios, I always implemented each scenario in QVT-O first and in Java afterwards. Therefore, I already had a concrete solution idea in mind when I started implementing in Java and did not need to think about different solution possibilities. The reason that I could reuse the ideas from QVT-O in Java lies in the common (imperative) language paradigm of both languages. Furthermore, I was also able to reuse several comments from the QVT-O implementation in Java. These observations indicate that QVT-O provides a better learnability as the Java implementations would probably have taken longer if I had implemented the scenarios in Java first. As this is only a hypothesis, this issue may be inspected in future work via controlled experiments.

Learn1.2.1/Learn1.2.2: “Evaluated questionnaire for learnability (quality question)”. Figure C.34 shows the questionnaire results regarding learnability; the values are positively correlated with learnability under the assumption that the participants evaluated learnability correctly.

The question was asked for every scenario separately but it is, in general, scenario-independent. Therefore, it makes sense to first consider the average evaluation results over all scenarios: Java has an average of 4.26, QVT-O of 5.45, and QVT-R of 3.43. Hence, QVT-O performs best followed by Java (21.83% lower average) and QVT-R (36.93% lower average).¹
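The relative differences can be recomputed from the averages given above. The following sketch does so and, in addition, illustrates why such percentages must not be read as ratio-scaled statements about learnability: shifting all values by a constant changes them. The shift by 1.0 is a hypothetical relabelling of the answer scale chosen for illustration. Note that the rounded averages yield about 37.06% for QVT-R rather than 36.93%; presumably the reported percentages were computed from unrounded averages, which is an assumption on my side.

```python
averages = {"Java": 4.26, "QVT-O": 5.45, "QVT-R": 3.43}


def percent_lower(value: float, reference: float) -> float:
    """How many percent `value` lies below `reference`."""
    return 100.0 * (reference - value) / reference


best = max(averages, key=averages.get)
for language, value in averages.items():
    if language != best:
        diff = percent_lower(value, averages[best])
        print(f"{language}: {diff:.2f}% lower average than {best}")

# Shifting every value by the same constant (a hypothetical relabelling of the
# answer options) keeps all distances but changes the percentages.
shifted = {language: value - 1.0 for language, value in averages.items()}
shifted_java = percent_lower(shifted["Java"], shifted["QVT-O"])
print(f"Java after shifting the scale: {shifted_java:.2f}% lower average")
```

The distances between the averages are unaffected by the shift, which is what an interval scale preserves.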

In addition to these average values for learnability, Figure C.34 illustrates the learnability evaluations by the participants per scenario. The results are generally similar to the results for the average values. However, the results for the third group of participants (who evaluated Copy and Rule9 to Rule12) allow for an interesting observation: they gradually reduced the learnability of QVT-O and gradually increased the learnability of Java over Rule9 to Rule12. This indicates that once Java’s infrastructure code for implementing transformations is understood, the learnability of QVT-O and Java becomes similar. This view reflects learnability as a process of continuous learning which tends to be similar in imperative M2M languages. The observation that QVT-R has a lower learnability indicates that the declarative language paradigm is less intuitive for transformation engineers than the imperative paradigm.

General hypotheses for learnability (quality question; GH1.1 to GH1.3). The standard deviation of the average learnability values from the questionnaires as discussed above is approximately 1.0, indicating that the different languages have a “high” difference regarding learnability (this confirms GH1.1). I classified the difference as “high” because the standard deviation for learnability is the largest of all standard deviations considered for GH1.1.
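The reported standard deviation and its classification can be reproduced from the three averages given above. The sketch below assumes the sample standard deviation, which yields approximately 1.0 (the population standard deviation would yield about 0.83). The “low” class for values at or below the 0.5 mark is also an assumption, since the text only states the rules for “medium” and “high”.

```python
from statistics import stdev

# Average learnability values of Java, QVT-O, and QVT-R from the questionnaire.
learnability = [4.26, 5.45, 3.43]


def classify_difference(sd: float, largest_sd: float, threshold: float = 0.5) -> str:
    """Classify a standard deviation relative to the largest one considered.

    "high":   sd is the largest standard deviation considered for GH1.1
    "medium": sd is not the largest but lies above the threshold
    "low":    everything else (assumption, not stated in the text)
    """
    if sd >= largest_sd:
        return "high"
    if sd > threshold:
        return "medium"
    return "low"


sd_learnability = stdev(learnability)  # sample standard deviation
print(round(sd_learnability, 2))
print(classify_difference(sd_learnability, largest_sd=sd_learnability))
# 0.7 is the value reported for appropriateness recognizability.
print(classify_difference(0.7, largest_sd=sd_learnability))
```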

GH1.2 and GH1.3 target quality differences regarding concrete scenarios. However, as learnability of an M2M language is, in general, scenario-independent, these hypotheses cannot be applied and are, therefore, not considered for learnability.

¹ Note that a statement like “Java has a 21.83% lower learnability than QVT-O” would require ratio-scaled values. However, according to Clason and Dormody [CD94], values measured on a Likert scale can only be regarded as interval-scaled. Ratio-scaled values would require a true zero point, which does not exist for learnability.



C.5.15 LearnQ2: “What are the reasons for differences in learnability?”

Learn2.2.1: “Number of newly introduced language constructs when moving from Copy to RuleX”. According to Grossman et al. [GFA09], this metric is positively correlated with learnability. It shows that, even though learnability is scenario-independent in general, scenarios can be used for a measurement of learnability. The goal of this metric is to show which language features especially foster a good learnability by making explicit which scenarios apply new language features in comparison to the Copy scenario. Note that this implicitly assumes that there is a causal relation between learning effects and different language constructs.

Figure C.22 illustrates the measurements of this metric: (1) in Java, three new language constructs are introduced in most cases; only Rule7, Rule9, and Rule11 introduce two constructs and Rule8 introduces four, (2) in QVT-O, the number of introduced language constructs varies between four and eighteen; Rule5, Rule7, Rule8, and Rule9 have especially high values, and (3) in QVT-R, only Rule1, Rule2, Rule8, and Rule9 introduce one new language construct each; the other scenarios introduce none.
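Conceptually, this metric is a set difference between the constructs used in a Rule scenario and those used in the Copy scenario. The following sketch illustrates the idea; the construct names are hypothetical, and how constructs are extracted from an implementation (e.g., by counting keywords) is left open here.

```python
def newly_introduced(copy_constructs: set[str], rule_constructs: set[str]) -> set[str]:
    """Constructs used by a Rule scenario that the Copy scenario does not use."""
    return rule_constructs - copy_constructs


# Hypothetical construct sets; real ones would be extracted from the Copy and
# RuleX implementations written in one language.
copy_scenario = {"transformation", "mapping", "main"}
rule_x = {"transformation", "mapping", "main", "helper", "when", "forEach"}

new = newly_introduced(copy_scenario, rule_x)
print(len(new), sorted(new))
```

A consequence of this definition is that the result depends on the size of the baseline: the fewer constructs the Copy implementation uses, the more constructs count as newly introduced in every Rule scenario.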

This indicates, again, that QVT-O provides the best learnability as the learning effect of QVT-O is the highest based on the number of newly introduced language constructs. The four scenarios with the highest number of newly introduced constructs for QVT-O involve the scenario features abstraction, duality, hierarchical clustering, and multidirectional. This indicates that implementing scenarios with these features particularly fosters the learning effect of QVT-O. The other two languages are less influenced by different scenario features.

Note, however, that these are new hypotheses which have to be investigated in future work. For instance, the high values within QVT-O are also a result of the fact that the Copy implementation is very small and, hence, more newly introduced language constructs are needed when implementing one of the Rule scenarios. An additional factor is that QVT-O has the most language constructs (in terms of the number of keywords).

Learn2.2.12 to Learn2.2.16: “Size of language documentation”. The size of the considered language documentations shows no correlation with learnability (cf. Figure C.24 to Figure C.27).

One interesting observation is that even though the Java documentation (followed by the QVT-O documentation) has the largest size in terms of pages, lines, words, or characters (cf. Figure C.28 to Figure C.26), the size in terms of the number of figures behaves the other way round (cf. Figure C.27), i.e., QVT-R has the largest size followed by QVT-O and Java.

An inspection of the concrete figures provides the reason for this: QVT-R and QVT-O provide several illustrations for underlying metamodels. This shows that these languages were specifically designed within an MDSD context whereas Java was designed as a general-purpose language. However, this result is only based on the inspection of a Java specification which does not consider EMF. Therefore, EMF documentation and its differences to the QVT-O and QVT-R documentation should be investigated in future work.

General hypotheses for learnability (reason question; GH2.1 to GH2.8). GH2.2, GH2.3, and GH2.8 are not applicable for learnability since they are scenario-dependent. The other hypotheses are left as future work (cf. Section C.5.1).

Specific hypotheses for learnability (reason question; LearnH2.1 and LearnH2.2). The hypothesis LearnH2.1 states that scenarios with fewer 1:1 relations especially foster learning effects. This can only partly be confirmed. The reason is that the Learn2.2.1 measurement results for QVT-O indicate that especially Rule7 fosters a learning effect. However, Rule7 is characterized by a high degree of 1:1 relations of type “duality”. Therefore, hypothesis LearnH2.1 can be confirmed neither for the general case of 1:1 relations nor for the special case of duality. The hypothesis can, therefore, be refined to apply only to 1:1 relations of type “mapping”. The Learn2.2.1 measurement for Rule1, as an example of a scenario with a high degree of 1:1 mappings, indicates that this refined hypothesis could hold. Future work should confirm this result.

On the other hand, LearnH2.2, stating that the size of a language documentation does not significantly influence learnability, can be confirmed. The reason for this is that there is no correlation between the documentation size measurements and the questionnaire results for learnability (cf. the interpretations for Learn2.2.12 to Learn2.2.16).


Bibliography

[BBJ+08] Achim Baier, Steffen Becker, Martin Jung, Klaus Krogmann, Carsten Rottgers, Niels Streekmann, Karsten Thoms, and Steffen Zschaler. Handbuch der Software-Architektur, chapter Modellgetriebene Software-Entwicklung, pages 93–122. dPunkt.verlag Heidelberg, 2 edition, December 2008. 2, 8, 9, 38, 39, 40, 41, 42, 46

[BCR02] Victor R. Basili, Gianluigi Caldiera, and H. Dieter Rombach. The Goal Question Metric Approach. In John J. Marciniak, editor, Encyclopedia of Software Engineering, pages 578–583. John Wiley & Sons, 2 edition, 2002. 10

[BDE+06] Patrik Berander, Lars O. Damm, Jeanette Eriksson, Tony Gorschek, Kennet Henningsson, Per Jonsson, Simon Kaagstrom, Drazen Milicic, Frans Maartensson, Kari Ronkko, and Piotr Tomaszewski. Software quality attributes and trade-offs. Blekinge Institute of Technology, 2006. 63

[BDI09] Sergey Boyko, Radomil Dvorak, and Alexander Igdalov. The Art of Model Transformations with Operational QVT presentation slides. http://www.eclipse.org/m2m/qvto/doc/, 2009. Borland Software Corporation. EclipseCon 2009. last retrieved 2012-10-17. 37

[Bec08] Steffen Becker. Coupled Model Transformations for QoS Enabled Component-Based Software Design. PhD thesis, University of Oldenburg, Germany, January 2008. 8, 44

[Bie10] Matthias Biehl. Literature Study on Model Transformations. Technical Report ISRN/KTH/MMK/R-10/07-SE, Royal Institute of Technology, July 2010. 2, 4

[Boe78] Barry W. Boehm. Characteristics of software quality. TRW series of software technology. North-Holland Pub. Co., 1978. 12, 63

[Bos11] Steven Bosems. A Performance Analysis of Model Transformations and Tools. Master’s thesis, University of Twente, The Netherlands, March 2011. 64, 130

[BS99] Concha Bielza and Prakash P. Shenoy. A Comparison of Graphical Techniques for Asymmetric Decision Problems. Manage. Sci., 45(11):1552–1569, November 1999. 119, 120


[BS12] Julian Bradfield and Perdita Stevens. Recursive checkonly QVT-R transformations with general when and where clauses via the modal mu calculus. In Proceedings of the 15th international conference on Fundamental Approaches to Software Engineering, FASE’12, pages 194–208, Berlin, Heidelberg, 2012. Springer-Verlag. 106

[BW84] Victor R. Basili and David M. Weiss. A Methodology for Collecting Valid Software Engineering Data. Software Engineering, IEEE Transactions on, SE-10(6):728–738, November 1984. 10

[CD94] Dennis L. Clason and Thomas J. Dormody. Analyzing Data Measured by Individual Likert-Type Items. Journal of Agricultural Education, 35(4):31–35, 1994. 276

[CE00] Krzysztof Czarnecki and Ulrich W. Eisenecker. Generative Programming: Methods, Tools, and Applications. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 2000. 131

[CFM10] Andrea Ciancone, Antonio Filieri, and Raffaela Mirandola. MANTra: Towards Model Transformation Testing. In Fernando Brito e Abreu, Joao Pascoal Faria, and Ricardo Jorge Machado, editors, QUATIC, pages 97–105. IEEE Computer Society, 2010. 111

[CH03] Krzysztof Czarnecki and Simon Helsen. Classification of Model Transformation Approaches. In OOPSLA03 Workshop on Generative Techniques in the Context of Model-Driven Architecture, 2003. 16

[CH06] Krzysztof Czarnecki and Simon Helsen. Feature-Based Survey of Model Transformation Approaches. IBM Syst. J., 45(3):621–645, July 2006. 2, 9, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 29, 38, 40, 43, 55, 57, 128, 131

[CHE04] Krzysztof Czarnecki, Simon Helsen, and Ulrich W. Eisenecker. Staged Configuration Using Feature Models. In Robert L. Nord, editor, SPLC, volume 3154 of Lecture Notes in Computer Science, pages 266–283. Springer, 2004. 131

[CHE05] Krzysztof Czarnecki, Simon Helsen, and Ulrich W. Eisenecker. Formalizing Cardinality-based Feature Models and their Specialization. Software Process: Improvement and Practice, 10(1):7–29, 2005. 131

[Dvo08] Radomil Dvorak. Model Transformations with Operational QVT presentation slides. http://www.eclipse.org/m2m/qvto/doc/, 2008. Borland Software Corporation. EclipseCon 2008. last retrieved 2012-10-17. 36, 37


[Ecl08] Eclipse Modeling Project. QVT Operational Developer Guide. QVT Operational (Version 3.1.0) Eclipse Plugin Online Help, 2008. 37

[Ecl12a] Eclipse Modeling Project. ATL (Version 3.2.1). http://www.eclipse.org/atl/, 2012. last retrieved 2012-10-17. 34, 37, 130

[Ecl12b] Eclipse Modeling Project. Operational QVT (Version 3.1.0). http://www.eclipse.org/m2m/, 2012. last retrieved 2012-10-17. 36, 37

[Ecl12c] Eclipse Modeling Project. Operational QVT Wiki. http://wiki.eclipse.org/M2M/Operational_QVT_Language_%28QVTO%29, 2012. last retrieved 2012-10-17. 36, 37

[Ecl12d] Eclipse.org. Xtend (Version 2.3.1). http://www.eclipse.org/xtend/, 2012. last retrieved 2012-10-17. 130

[EW03] Franz Eisenfuhr and Martin Weber. Rationales Entscheiden, volume 4 of Springer-Lehrbuch. Springer, 2003. 99, 100

[FBB+99] Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts. Refactoring: Improving the Design of Existing Code. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1 edition, 1999. 38

[Fra12] France Telecom R&D. SmartQVT (Version 0.2.2). http://sourceforge.net/projects/smartqvt/; original web site http://smartqvt.elibel.tm.fr/ not available anymore (Access Denied), 2012. last retrieved 2012-10-17. 37

[GFA09] Tovi Grossman, George Fitzmaurice, and Ramtin Attar. A Survey of Software Learnability: Metrics, Methodologies and Guidelines. In Proceedings of the 27th international conference on Human factors in computing systems, CHI ’09, pages 649–658, New York, NY, USA, 2009. ACM. 88, 89, 90, 91, 165, 166, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 275, 277

[GGKH03] Tracy Gardner, Catherine Griffin, Jana Koehler, and Rainer Hauser. A review of OMG MOF 2.0 Query / Views / Transformations Submissions and Recommendations towards the final Standard, July 2003. 2, 9

[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995. 74

[GJSB05] James Gosling, Bill Joy, Guy Steele, and Gilad Bracha. Java(TM) Language Specification, The (3rd Edition) (Java (Addison-Wesley)). Addison-Wesley Professional, 2005. 88, 89, 91, 159, 160, 161


[GK08] Thomas Goldschmidt and Jens Kubler. Towards Evaluating Maintainability Within Model-Driven Environments. In Software Engineering (Workshops), pages 205–211, 2008. 64, 81

[GPT09] Pavle Guduric, Arno Puder, and Rainer Todtenhofer. A Comparison between Relational and Operational QVT Mappings. In Information Technology: New Generations, 2009. ITNG ’09. Sixth International Conference on, pages 266–271, April 2009. 73, 91, 129

[Gro09] Richard C. Gronback. Eclipse Modeling Project: A Domain-Specific Language Toolkit. The Eclipse Series. Addison-Wesley, 2009. 28

[GW08] Thomas Goldschmidt and Guido Wachsmuth. Refinement Transformation Support for QVT Relational Transformations. In Proceedings of the 3rd Workshop on Model Driven Software Engineering (MDSE 2008), 2008. 43

[HFBR08] Jens Happe, Holger Friedrich, Steffen Becker, and Ralf H. Reussner. A Pattern-Based Performance Completion for Message-Oriented Middleware. In Proceedings of the 7th international workshop on Software and performance, WOSP ’08, pages 165–176, New York, NY, USA, 2008. ACM. 44

[ikv07] ikv++ technologies ag. medini QVT User Guide. medini QVT (Version 1.7.0) Eclipse Plugin Online Help, 2007. 36

[ikv12] ikv++ technologies ag. medini QVT (Version 1.7.0). http://projects.ikv.de/qvt/, 2012. last retrieved 2012-10-17. 35

[ISH08] Maria-Eugenia Iacob, Maarten W. A. Steen, and Lex Heerink. Reusable model transformation patterns. In Proceedings of the 2008 12th Enterprise Distributed Object Computing Conference Workshops, EDOCW ’08, pages 1–10, Washington, DC, USA, 2008. IEEE Computer Society. 38, 39, 40, 41, 45

[ISO01] ISO/IEC Standard. Software Engineering – Product Quality – Part 1: Quality Model. ISO/IEC Standard 9126-1, ISO/IEC, 2001. 12, 63

[ISO03a] ISO/IEC Standard. Software Engineering – Product Quality – Part 2: External metrics. ISO/IEC Standard 9126-2, ISO/IEC, 2003. 12

[ISO03b] ISO/IEC Standard. Software Engineering – Product Quality – Part 3: Internal metrics. ISO/IEC Standard 9126-3, ISO/IEC, 2003. 12

[ISO05] ISO/IEC Standard. Software Engineering – Software product Quality Requirements and Evaluation (SQuaRE) – Guide to SQuaRE. ISO/IEC Standard 25000, ISO/IEC, 2005. 12


[ISO11] ISO/IEC Standard. Systems and software engineering – Systems and software Quality Requirements and Evaluation (SQuaRE) – System and software quality models. ISO/IEC Standard 25010, ISO/IEC, 2011. 12, 13, 15, 48, 63, 66

[JABK08] Frederic Jouault, Freddy Allilaire, Jean Bezivin, and Ivan Kurtev. ATL: A model transformation tool. Science of Computer Programming, 72(1-2):31–39, 2008. Special Issue on Second issue of experimental software and toolkits (EST). 34, 37

[KC05] Chang Hwan Peter Kim and Krzysztof Czarnecki. Synchronizing Cardinality-Based Feature Models and Their Specializations. In Alan Hartman and David Kreische, editors, ECMDA-FA, volume 3748 of Lecture Notes in Computer Science, pages 331–348. Springer, 2005. 131

[KCH+90] Kyo C. Kang, Sholom G. Cohen, James A. Hess, William E. Nowak, and A. Spencer Peterson. Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical report, Carnegie-Mellon University Software Engineering Institute, November 1990. 131

[KE08] Jorg Kiegeland and Hajo Eichler. medini qvt (ikv++ technologies ag) workshop slides. http://projects.ikv.de/qvt/downloads/22, February 2008. Enschede, Telematica Instituut. last retrieved 2012-10-17. 35, 36

[KGBH10] Lucia Kapova, Thomas Goldschmidt, Steffen Becker, and Jorg Henss. Evaluating Maintainability with Code Metrics for Model-to-Model Transformations. In George Heineman, Jan Kofron, and Frantisek Plasil, editors, Research into Practice - Reality and Gaps, volume 6093 of Lecture Notes in Computer Science, pages 151–166. Springer Berlin / Heidelberg, 2010. 10.1007/978-3-642-13821-8_12. 3, 43, 62, 63, 64, 97

[KGH12] Lucia Kapova, Thomas Goldschmidt, and Jorg Henss. A Metrics Suite for Evaluating the Maintainability of Declarative Model-to-Model Transformations. To be published, 2012. 63, 67, 76, 77, 78, 79, 80, 83, 84, 85, 86, 87, 88, 92, 93, 99, 100, 104, 105, 114, 163, 164, 165, 167, 168, 169, 170, 172, 173, 174, 175, 177, 178, 179, 180, 182, 183, 184, 185, 187, 188, 189, 190, 192, 193, 194, 195, 197, 198, 199, 200, 202, 203, 204, 205, 207, 208, 209, 210, 212, 213, 214, 215, 217, 218, 219, 220, 222, 223, 224, 225, 227, 228, 229, 230, 232, 233, 234, 235, 256, 257, 258, 270, 271, 272

[KKS07] Felix Klar, Alexander Konigs, and Andy Schurr. Model Transformation in the Large. In Ivica Crnkovic and Antonia Bertolino, editors, ESEC/SIGSOFT FSE, pages 285–294. ACM, 2007. 104


[Kra98] Reto Kramer. iContract - The Java(TM) Design by Contract(TM) tool. In Technology of Object-Oriented Languages, 1998. TOOLS 26. Proceedings, pages 295–307, August 1998. 55

[KS06] Alexander Konigs and Andy Schurr. Tool Integration with Triple Graph Grammars - A Survey. Electron. Notes Theor. Comput. Sci., 148(1):113–150, February 2006. 111, 130

[Lik32] Rensis Likert. A Technique for the Measurement of Attitudes. Archives of Psychology, 140:1–55, 1932. 99

[MRW77] Jim A. Mccall, Paul K. Richards, and Gene F. Walters. Factors in Software Quality, volume I, II, III. Rome Air Development Center Reports, 1977. 12

[NK64] Adam M. Neville and John B. Kennedy. Basic statistical methods for engineers and scientists. International Textbook Co., 1964. 68

[Nol09] Siegfried Nolte. QVT - Relations Language - Modellierung mit der Query Views Transformation. Xpert.press. Springer Berlin Heidelberg, 2009. 36, 43, 129

[Nol10] Siegfried Nolte. QVT - Operational Mappings - Modellierung mit der Query Views Transformation. Xpert.press. Springer Berlin Heidelberg, 2010. 37, 43, 129

[Obj06] Object Management Group (OMG). Model Driven Architecture - Specifications, 2006. 7

[Obj10] Object Management Group (OMG). Object Constraint Language (OCL) Specification (Version 2.2). Technical report, OMG, 2010. 9

[Obj11a] Object Management Group (OMG). Meta Object Facility (MOF) 2.0 Query/View/Transformation Specification (Version 1.1). Technical Report OMG Document Number: formal/2011-01-01, Object Management Group, http://www.omg.org/spec/QVT/1.1/, January 2011. 29, 30, 32, 33, 34, 35, 36, 37, 42, 56, 88, 91, 95, 159, 160, 161

[Obj11b] Object Management Group (OMG). OMG Meta Object Facility (MOF) Core Specification (Version 2.4.1). Technical Report OMG Document Number: formal/2011-08-07, Object Management Group, http://www.omg.org/spec/MOF/2.4.1/, August 2011. 9

[Obj11c] Object Management Group (OMG). OMG MOF 2 XMI Mapping Specification (Version 2.4.1). Technical report, Object Management Group, August 2011. 36


[Pre01] Lutz Prechelt. Kontrollierte Experimente in Der Softwaretechnik: Potenzial Und Methodik. Springer, 2001. 1, 10, 13

[SBPM09] Dave Steinberg, Frank Budinsky, Marcelo Paternostro, and Ed Merks. EMF: Eclipse Modeling Framework. The Eclipse Series. Addison-Wesley, 2009. 28

[Sch95] Andy Schurr. Specification of Graph Translators with Triple Graph Grammars. In Ernst Mayr, Gunther Schmidt, and Gottfried Tinhofer, editors, Graph-Theoretic Concepts in Computer Science, volume 903 of Lecture Notes in Computer Science, pages 151–163. Springer Berlin / Heidelberg, 1995. 10.1007/3-540-59071-4_45. 111, 130

[Seb12] Robert W. Sebesta. Concepts of Programming Languages. Addison Wesley, 10 edition, January 2012. 20

[SK09] Niels Streekmann and Steffen Kruse. MDSD Umfrage 2009. http://www.offis.de/f_e_bereiche/energie/publikationen/publikationen_detailansicht/info/mdsd-umfrage-2009.html, December 2009. last retrieved 2012-10-17. 28, 68, 130

[Sta73] Herbert Stachowiak. Allgemeine Modelltheorie. Springer Verlag, Wien, 1973. 8

[Ste11] Perdita Stevens. A simple game-theoretic approach to checkonly QVT Relations. Software and Systems Modeling, 5563:1–25, 2011. 10.1007/s10270-011-0198-8. 29, 35, 36

[Tat12] Tata Research Development and Design Centre (TRDDC). modelMorf (Beta 1). http://www.tcs-trddc.com/trddc_website/ModelMorf/ModelMorf.htm, 2012. last retrieved 2012-10-17. 36

[TJF+09] Massimo Tisi, Frederic Jouault, Piero Fraternali, Stefano Ceri, and Jean Bezivin. On the Use of Higher-Order Model Transformations. In Richard Paige, Alan Hartman, and Arend Rensink, editors, Model Driven Architecture - Foundations and Applications, volume 5562 of Lecture Notes in Computer Science, pages 18–33. Springer Berlin / Heidelberg, 2009. 10.1007/978-3-642-02674-4_3. 19, 20, 38, 39, 40

[vA11] Marinus Franciscus van Amstel. Assessing and Improving the Quality of Model Transformations. PhD thesis, Technische Universiteit Eindhoven, The Netherlands, January 2011. 63, 64, 67, 99, 100, 105

[vSB99] Rini van Solingen and Egon Berghout. The Goal/Question/Metric Method: A Practical Guide for Quality Improvement of Software Development. McGraw-Hill, 1999. 4, 10, 11, 13, 62, 101


[VSC06] Markus Volter, Thomas Stahl, and Krzysztof Czarnecki. Model-Driven Software Development: Technology, Engineering, Management. John Wiley & Sons, 2006. 1, 2, 7, 8, 9, 29

[WRH+12] Claes Wohlin, Per Runeson, Martin Host, Magnus C. Ohlsson, Bjorn Regnell, and Anders Wesslen. Planning. In Experimentation in Software Engineering, pages 89–116. Springer Berlin Heidelberg, 2012. 10.1007/978-3-642-29044-2_8. 120, 121, 122, 123, 124, 125

[Yin03] Robert K. Yin. Case Study Research: Design and Methods, volume 5 of Applied Social Research Methods Series. SAGE Publications, 3 edition, 2003. 103
