FASE 2015 - Map-based transparent persistence for very large models
TRANSCRIPT
MAP-BASED TRANSPARENT PERSISTENCE FOR VERY LARGE MODELS
Abel Gómez, Massimo Tisi, Gerson Sunyé and Jordi Cabot
OUTLINE
▌ The landscape in MDE
▌ Motivation: running example and current persistence approaches
▌ Towards a simple EMF-based persistence layer
▌ NEOEMF/MAP: A transparent persistence layer for EMF models
▌ Our experimental evaluation in a nutshell
▌ Conclusions and future work
© ATLANMOD - [email protected]
INTRODUCTION
Why another persistence solution?
THE LANDSCAPE IN MDE
▌ Models and code generation are at the center of software-engineering processes
▌ Modeling tools are built around modeling frameworks (EMF has become the de facto standard)
▌ The technologies at the core of modeling frameworks were designed to support simple modeling activities
▌ Since its publication, the XMI standard has been the preferred format for storing and sharing models and metamodels
▌ Clear limits arise when current technologies are applied to very large models (VLMs): XML is not the right technology for VLMs (verbosity, costly serialization/deserialization…)
▌ Some solutions exist, but problems in managing memory and persisting data are still under-studied in MDE
RUNNING EXAMPLE
[Figure: excerpt of the Java metamodel (nsURI: ’http://java’) and a sample instance]
MOTIVATION
▌ Within a modeling ecosystem, all tools that need to access or manipulate models have to pass through a single model management interface
▌ In some of these ecosystems (e.g. EMF) the model management interface is automatically generated from the metamodel
THE GENERATED MODEL MANAGEMENT INTERFACE
    // Creation of objects
    Package p1 = Factory.createPackage();
    ClassDeclaration c1 = Factory.createClassDeclaration();
    BodyDeclaration b1 = Factory.createBodyDeclaration();
    BodyDeclaration b2 = Factory.createBodyDeclaration();
    Modifier m1 = Factory.createModifier();
    Modifier m2 = Factory.createModifier();
    // Initialization of attributes
    p1.setName("package1");
    c1.setName("class1");
    b1.setName("bodyDecl1");
    b2.setName("bodyDecl2");
    m1.setVisibility(VisibilityKind.PUBLIC);
    m2.setVisibility(VisibilityKind.PUBLIC);
    // Initialization of references
    p1.getOwnedElements().add(c1);
    c1.getBodyDeclarations().add(b1);
    c1.getBodyDeclarations().add(b2);
    b1.setModifier(m1);
    b2.setModifier(m2);
MOTIVATION
▌ Without any specific memory-management solution, the model would need to be fully contained in memory for any access or modification
▌ Models that exceed the main memory would cause a significant performance drop or crash the application
STANDARD TECHNOLOGIES FOR PERSISTING MODELS IN EMF
▌ XML-based (XMI)
│ Pros: Readability, fast for small models
│ Cons: Needs to load/keep the whole model in memory
▌ Connected Data Objects (CDO)
│ Pros: on-demand loading, transactions, versioning, notifications
│ Cons: Only the relational mapping is regularly maintained, does not scale well with VLMs
NEW TRENDS IN PERSISTING MODELS IN EMF
▌ Morsa (document-oriented)
│ On-demand loading, incremental updates, fully compatible with the EMF API
│ Requires its own query language to get good performance
▌ MongoEMF (document-oriented)
│ Uses the standard EMF API
│ Behaves differently from the standard back-ends
▌ EMF fragments
│ Uses the standard proxy mechanism to partition models into small chunks
│ Requires modifications to the metamodels to get the benefits of partitions
▌ NeoEMF/Graph, a.k.a. Neo4EMF (graph-based)
│ Models are a set of highly interconnected elements → graphs are the most natural way to represent them
│ The generated API only performs one-step navigations → a significant performance gain is obtained only when using native queries on the underlying persistence back-end
MOTIVATION
▌ We need a transparent persistence layer able to automatically persist, load and unload model elements with no changes to the application code
NEOEMF/MAP DESIGN GOALS
Towards a simple EMF-based persistence layer
MODEL-PERSISTENCE LAYER
▌ NEOEMF/MAP must…
▌ Interoperability requirements
│ … be an exact replacement
│ … use a replaceable underlying engine
│ … allow different types of caching
▌ Performance requirements
│ … be memory friendly
│ … provide on-demand load capabilities
│ … free unused memory
│ … outperform current persistence layers using the standard API
MODEL-PERSISTENCE LAYER
[Figure: layered architecture, top to bottom]
│ Client code: model-based tools
│ Model Access API → ModelManager: EMF
│ Persistence API → PersistenceManager: XMI Serialization, CDO, NeoEMF/Graph, NeoEMF/Map (with a CachingStrategy)
│ Backend API → PersistenceBackend: XMI File, RelationalDB, GraphDB, MapDB
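The requirements of a replaceable underlying engine and pluggable caching can be illustrated with a small sketch. Everything in it is hypothetical: `Backend`, `InMemoryBackend` and `CachingBackend` are illustrative names rather than NeoEMF or EMF classes, a `HashMap` stands in for the real key-value engine, and the write-through cache is only one possible caching strategy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical backend contract; not part of EMF or NeoEMF.
interface Backend {
    Optional<Object> get(String key);
    void put(String key, Object value);
}

// Stand-in engine: a HashMap plays the role of the real key-value store.
final class InMemoryBackend implements Backend {
    private final Map<String, Object> data = new HashMap<>();
    int reads = 0; // counts how often the engine itself is queried

    @Override
    public Optional<Object> get(String key) {
        reads++;
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public void put(String key, Object value) {
        data.put(key, value);
    }
}

// One possible caching strategy: a write-through cache in front of any Backend.
final class CachingBackend implements Backend {
    private final Backend delegate;
    private final Map<String, Object> cache = new HashMap<>();

    CachingBackend(Backend delegate) {
        this.delegate = delegate;
    }

    @Override
    public Optional<Object> get(String key) {
        Object hit = cache.get(key);
        if (hit != null) {
            return Optional.of(hit); // served without touching the engine
        }
        Optional<Object> loaded = delegate.get(key);
        loaded.ifPresent(value -> cache.put(key, value));
        return loaded;
    }

    @Override
    public void put(String key, Object value) {
        cache.put(key, value);
        delegate.put(key, value);
    }
}
```

Because client code in this sketch depends only on `Backend`, swapping the engine or wrapping it in a different cache requires no change on the caller's side, which is the property the layered design aims for.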
NEOEMF/MAP
A TRANSPARENT PERSISTENCE LAYER FOR EMF MODELS
Memory management
Map-based data model
Model operations as map operations
MEMORY MANAGEMENT
▌ Decoupling dependencies among objects by assigning a unique identifier to all model objects allows:
▌ Lightweight on-demand loading
│ Each live model object has a lightweight delegate object that is in charge of loading the element data on demand and keeping track of the element’s state
▌ Efficient garbage collection in the JRE
│ No hard Java references are kept among model objects. Any model object not directly referenced by the application will be deallocated
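A minimal sketch of the identifier-based decoupling described above, under stated assumptions: `Store` and `LazyElement` are hypothetical names, a `HashMap` stands in for the persistent store, and state tracking is left out. The point it illustrates is that a delegate holds only an OID and a store handle, so references between elements are kept as identifiers rather than as Java object references.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the persistent store: (OID + feature name) -> value.
final class Store {
    private final Map<String, Object> properties = new HashMap<>();

    Object read(String oid, String feature) {
        return properties.get(oid + "#" + feature);
    }

    void write(String oid, String feature, Object value) {
        properties.put(oid + "#" + feature, value);
    }
}

// Lightweight delegate: holds an identifier and a store handle, never element
// data or direct references to other elements.
final class LazyElement {
    private final String oid;
    private final Store store;

    LazyElement(String oid, Store store) {
        this.oid = oid;
        this.store = store;
    }

    String oid() {
        return oid;
    }

    // Attribute values are fetched from the store on every access.
    Object get(String feature) {
        return store.read(oid, feature);
    }

    void set(String feature, Object value) {
        store.write(oid, feature, value);
    }

    // A reference is persisted as the target's OID, so this object keeps no
    // hard Java reference to the target.
    void setReference(String feature, LazyElement target) {
        store.write(oid, feature, target.oid());
    }

    // Resolving a reference builds a fresh delegate around the stored OID.
    LazyElement getReference(String feature) {
        Object targetOid = store.read(oid, feature);
        return targetOid == null ? null : new LazyElement((String) targetOid, store);
    }
}
```

Since no `LazyElement` points at another one, an element that the application no longer references is unreachable and can be collected, while its data remains available in the store.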
MAP-BASED DATA MODEL
▌ The unique identifier allows flattening the graph structure into a set of key-value mappings
▌ Operations on hash-maps have a constant cost
▌ Three different (hash-)maps are used to store models’ information:
│ Property map: keeps all objects’ data in a centralized place
│ Type map: tracks how objects interact with the meta-level (e.g. instance of)
│ Containment map: defines the models’ structure in terms of containment references
MAP-BASED DATA MODEL
▌ Property map
│ Key: OID + EStructuralFeature
│ Value: data
Key                              Value
{ ‘c1’, ‘name’ }                 ‘class1’
{ ‘c1’, ‘bodyDeclarations’ }     { ‘b1’, ‘b2’ }
MAP-BASED DATA MODEL
▌ Type map
│ Key: OID
│ Value: nsURI + EObject’s EClass
Key      Value
‘c1’     ⟨ nsUri=‘http://java’, class=‘ClassDeclaration’ ⟩
MAP-BASED DATA MODEL
▌ Containment map
│ Key: OID
│ Value: Container’s OID + EStructuralFeature (from parent to child)
Key      Value
‘c1’     ⟨ container=‘p1’, featureName=‘ownedElements’ ⟩
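The three maps can be sketched with plain Java collections, loaded with the `c1` entries from the tables above. This is an illustration only: `MapModel` and its field names are hypothetical, `HashMap` stands in for the persistent maps, and the key and value encodings (a two-element list, a two-element array) are assumptions made for brevity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain HashMaps stand in for the persistent maps; all names are illustrative.
final class MapModel {
    // Property map: (OID, feature name) -> value
    final Map<List<String>, Object> properties = new HashMap<>();
    // Type map: OID -> { nsURI, EClass name }
    final Map<String, String[]> types = new HashMap<>();
    // Containment map: OID -> { container OID, containing feature name }
    final Map<String, String[]> containers = new HashMap<>();

    // Lists compare by content, so a two-element list works as a composite key.
    static List<String> key(String oid, String feature) {
        return Arrays.asList(oid, feature);
    }

    // Loads the entries for 'c1' from the running example.
    static MapModel runningExample() {
        MapModel m = new MapModel();
        m.properties.put(key("c1", "name"), "class1");
        m.properties.put(key("c1", "bodyDeclarations"), Arrays.asList("b1", "b2"));
        m.types.put("c1", new String[] {"http://java", "ClassDeclaration"});
        m.containers.put("c1", new String[] {"p1", "ownedElements"});
        return m;
    }
}
```

With this layout, reading the name, the type or the container of `c1` is a single `get` on one map, with no traversal of other elements.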
MODEL OPERATIONS AS MAP OPERATIONS
                                       LOOKUPS         INSERTS
METHOD                                 MIN.   MAX.     MIN.   MAX.
OPERATIONS ON OBJECTS
getType                                1      1        0      0
getContainer                           1      1        0      0
getContainerFeature                    1      1        0      0
OPERATIONS ON PROPERTIES
get*                                   1      1        0      0
set*                                   0      3        1      3
isSet*                                 1      1        0      0
unset*                                 1      1        0      1
OPERATIONS ON MULTI-VALUED FEATURES
add                                    1      3        1      3
remove                                 1      2        1      2
clear                                  0      0        1      1
size                                   1      1        0      0
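One plausible reading of the bounds for `add` (1 to 3 lookups, 1 to 3 inserts) is sketched below. The slides do not say which individual map operations make up these counts, so the breakdown is an assumption: a plain reference costs one lookup and one insert on the property map, while a containment reference additionally reads and rewrites the containment entry and, when the child already had a container, the old container's list. `CountingModel` is a hypothetical class written only to count operations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: counts map operations for a hypothetical 'add'.
final class CountingModel {
    // Property map restricted to multi-valued references: "oid#feature" -> OIDs
    private final Map<String, List<String>> properties = new HashMap<>();
    // Containment map: OID -> { container OID, containing feature name }
    private final Map<String, String[]> containers = new HashMap<>();
    int lookups = 0;
    int inserts = 0;

    private List<String> readList(String key) {
        lookups++;
        List<String> stored = properties.get(key);
        return stored == null ? new ArrayList<>() : new ArrayList<>(stored);
    }

    private void writeList(String key, List<String> value) {
        inserts++;
        properties.put(key, value);
    }

    // Adds 'child' to a multi-valued feature of 'parent'.
    void add(String parent, String feature, String child, boolean containment) {
        List<String> values = readList(parent + "#" + feature);       // lookup 1
        values.add(child);
        writeList(parent + "#" + feature, values);                    // insert 1
        if (!containment) {
            return;
        }
        lookups++;                                                    // lookup 2
        String[] old = containers.get(child);
        if (old != null) {
            // Detach the child from its previous container.
            List<String> oldValues = readList(old[0] + "#" + old[1]); // lookup 3
            oldValues.remove(child);
            writeList(old[0] + "#" + old[1], oldValues);              // insert 2
        }
        inserts++;                                                    // insert 2 or 3
        containers.put(child, new String[] {parent, feature});
    }
}
```

Under these assumptions the cost of `add` depends only on whether the feature is a containment and whether the child is being moved, not on the size of the model, although rewriting a very long list on every `add` is itself expensive.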
EXPERIMENTAL EVALUATION
Conditions of the experiments
Results
Summary
EXPERIMENTAL EVALUATION
▌ Based on our joint experience with industrial partners:
│ We obtained three models from OSS using reverse engineering…
│ … that resemble models from real-world scenarios
│ We defined a set of queries (GraBaTs’09 and industry-like)
│ Only the standard EMF API is used → Queries are backend-agnostic
│ Three heap sizes: 8GB, 512MB and 256MB
#    MODEL                            SIZE IN XMI    ELEMENTS
1    org.eclipse.gmt.modisco.java     19.3MB         80,665
2    org.eclipse.jdt.core             420.6MB        1,557,007
3    org.eclipse.jdt.*                984.7MB        3,609,454
EXPERIMENTAL EVALUATION
▌ Selected back-ends:
│ NEOEMF/MAP (MapDB)
│ NEOEMF/GRAPH (Neo4j embedded)
│ CDO (H2 embedded)
▌ Discarded back-ends:
│ MongoEMF → does not strictly comply with the standard EMF behavior
│ EMF-fragments → requires manual modifications in the source models or metamodels
│ Morsa → only a small subset of the experiments ran successfully
Configuration details: Intel Core i7 3740QM (2.70GHz), 16 GB of DDR3 SDRAM (800MHz), Samsung SM841 SATA3 SSD (6Gb/s), Windows 7 Enterprise 64, JRE 1.7.0_40-b43, Eclipse 4.4.0, EMF 2.10.1. NeoEMF/Map uses MapDB 0.9.10, NeoEMF/Graph uses Neo4j 1.9.2, and CDO 4.3.1 runs on top of H2 1.3.168
EXPERIMENT I
Import model from XMI (8GB)
                Model 1    Model 2    Model 3
NeoEMF/Map      9 s        161 s      412 s
NeoEMF/Graph    41 s       1161 s     3767 s
CDO             12 s       120 s      301 s
EXPERIMENT II
Model traversal 8GB (incl. loading & unloading)
                Model 1    Model 2    Model 3
XMI             4 s        35 s       79 s
NeoEMF/Map      3 s        25 s       62 s
NeoEMF/Graph    16 s       201 s      708 s
CDO             14 s       133 s      309 s
EXPERIMENT II
Model traversal 512MB (incl. loading & unloading)
[Bar chart over Model 1, Model 2 and Model 3 for XMI, NeoEMF/Map, NeoEMF/Graph and CDO. Bar labels as extracted, in source order: 4, 3, 42, 366, 15, 235, 763, 13, 550, 548]
EXPERIMENT III
Model queries that do not traverse the model 8GB
                Model 1    Model 2    Model 3
NeoEMF/Map      0 s        0 s        0 s
NeoEMF/Graph    0 s        2 s        19 s
CDO             0 s        0 s        2 s
EXPERIMENT IV
GraBaTs’09 8GB
                Model 1    Model 2    Model 3
NeoEMF/Map      1 s        24 s       61 s
NeoEMF/Graph    11 s       188 s      717 s
CDO             9 s        48 s       367 s
Unused Methods 8GB
                Model 1    Model 2    Model 3
NeoEMF/Map      2 s        36 s       101 s
NeoEMF/Graph    17 s       359 s      1328 s
CDO             9 s        131 s      294 s
EXPERIMENT V
Model modification and saving 8GB
                Model 1    Model 2    Model 3
NeoEMF/Map      1 s        24 s       62 s
NeoEMF/Graph    11 s       191 s      677 s
CDO             9 s        118 s      296 s
EXPERIMENT V
Model modification and saving 256MB
[Bar chart over Model 1, Model 2 and Model 3 for NeoEMF/Map, NeoEMF/Graph and CDO. Bar labels as extracted, in source order: 1 s, 160 s, 472 s, 11 s, 224 s, 9 s, 723 s]
SUMMARY
▌ NeoEMF/Map performs better than any other solution when using the standard API
▌ NeoEMF/Map presents import times of the same order of magnitude as CDO, but it is about 33% slower for the largest model → NeoEMF/Map is affected by the overhead produced by modifications on big lists (>100,000 elements) that grow monotonically (caching is needed)
▌ The simple data model with low-cost operations implemented by NeoEMF/Map contrasts with the more complex data model implemented by NeoEMF/Graph (consistently slower by a factor between 7 and 9)
SUMMARY
▌ Traversal of a very large model is much faster (up to 9×) with NeoEMF/Map
▌ If load and unload times are considered, NeoEMF/Map also outperforms XMI
▌ The fast model-traversal ability of NeoEMF/Map is exploited by the pattern followed by most of the queries in the modernization domain
▌ Queries that traverse the model to apply and persist changes perform significantly better on NeoEMF/Map (5× faster on big models, 9× on small models).
CONCLUSIONS
▌ Map-based persistence layer to handle VLMs
▌ Comparison against relational-based and graph-based alternatives
▌ EMF as the implementation technology
▌ We used queries from some of our industrial partners in the model-driven modernization domain as experiments
▌ Typical model-access APIs, with fine-grained methods with one-step-navigation queries, do not benefit from complex relational or graph-based data structures.
▌ Low-level data structures, like hash-tables, with low and constant access times provide better results
FUTURE WORK
▌ Caching strategies:
│ Element unloading (which element is not needed anymore?)
│ Element prefetching (which element will be needed in the future?)
▌ Benefits of other backends depending on the specific application scenario:
│ Graph-based persistence solutions when some of our requirements can be dropped
│ Bypassing the model access API by translating the queries to high-performance native graph-database queries may provide great benefits