high-performance model queries
DESCRIPTION
Presentation at the Third Biannual Workshop on Eclipse Open Source Software and OMG Open Specifications, Co-located with EclipseCon 2012TRANSCRIPT
![Page 1: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/1.jpg)
Budapest University of Technology and EconomicsFault Tolerant Systems Research Group
Workshop on Eclipse Open Source Software and OMG Open SpecificationsEclipseCon 2012, Reston, VA, USA 2012. 03. 25.
High performance model queries and their novel applications
Benedek IzsóZoltán Szatmári
István Ráth
![Page 2: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/2.jpg)
The speaker’s background Research @ academia
oMetamodeling, DSMLs, model transformations, …• The VIATRA2 Model Transformation Framework (2004-)
oModel queries• The EMF-IncQuery project (2010-)
Consulting & Development @ industry o Eclipse consultant and trainer (2006-)o Architect, model-driven tools and projects (2008-)
![Page 3: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/3.jpg)
The problem Scalable modeling tools
o Issues encountered by several industrial partners using EMF-based tools
o Modeling scenarios can get really complex really quickly• Instance models of size 1-10M and beyond• Performance issues with model transformations and code
generators have an adverse effect on everyday development tasks
Scalabilityo Complex (meta)modelso Large instance modelso High performance model I/O
![Page 4: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/4.jpg)
Focus: model queries Model queries
o “a piece of code that retrieves a given set of the model” Ubiquitous
o Every model access/read is a (simple) queryo More complex: Views, content providerso Most complex: Model transformations, code generators
In interactive (user-driven) use cases, queries are far more common than model manipulationo query performance is crucial
Query performanceo The speed of content retrieval
• First vs consecutive vs throughput• Query result vs query contents
Essence of performance issueso slow queries o Considerable programming effort necessary to get complex queries right
![Page 5: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/5.jpg)
Technology landscape – what to build a modeling tool on?
Eclipse Modelingo EMF (&variants, CDO, …)o Proven tools: Plain Java queries, OCL (&variants), EMF Query (1-2)o New/experimental technologies: EMF-IncQuery, JBoss Drools, …
Triple stores / graph databaseso OWL/RDF, SPARQLo RacerPro, Pellet, Sesame, Jena, OpenVirtuoso, Allegro, StarDogo Modeling tools: not (yet) a traditional use caseo But…
• Huge, distributed knowledge bases• Highly optimized SPARQL engines• Expensive commercial tools
Relational DBo SQL, PL/SQL, In-memoryo Overhead: ORM layers
Further options(not covered yet)• Modeling
frameworks other than EMF
• Plain XML/XPath• OO DB• K-V (NoSQL)…
![Page 6: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/6.jpg)
Assessment through benchmarking
Railway system design DSML Benchmark: well-formedness checking in 4 phases, using in-memory
modelso Phase1: model load into memory (READ)o Phase2: rule validation (retrieve all invalid elements) (CHECK1)o Phase3: scattered, programmatic, in-memory model modification (introducing
new errors) (EDIT)o Phase4: revalidation (retrieve all invalid elements) (CHECK2)
Several test queries with different characteristicso Varying “difficulty”o Approx. same percentage of invalid elements present in the model for all
querieso Tool expected to return ALL invalid model elements with references/IDs
Measuredo Execution times for all phaseso Memory usage over the entire sequence
![Page 7: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/7.jpg)
Focus Model sizes: up to 2.5M elements
o Industrial partner: we would like to see 10M+ elements!• Note: 10M+ model elements ~ 50M+ triples
o This is not yet possible with in-memory tech Query complexity
o Typical well-formedness constraints similar to actual toolso Ranging from simple attribute checking to complex structural
checks (circle detection) Interesting use cases
o READ+CHECK1: “batch validator” (models are read into memory, rules are evaluated)
o EDIT+CHECK2: “on-the-fly revalidation in an editor” (list of invalid elements updated according to changes in the model)
![Page 8: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/8.jpg)
RESULTS
![Page 9: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/9.jpg)
Disclaimer This is still work in progress External audits (not complete yet)
o OCL: Ed Willinko Drools: Mark Proctor o Java: our industrial partner (Eclipse-based tool
development)o OWL/SPARQL: technology experts at an academic
partner Detailed results, analysis and the corpus will be
available onlineo This is just an appetizer
![Page 10: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/10.jpg)
Question 1 Q: How does model size affect query
performance?
![Page 11: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/11.jpg)
Batch validation
Read time is dominatingSPARQL tools are generally 1-2 Orders-of-Magnitude (OM) slower
than top EMF tools
Eclipse OCL does really well here, followed closely by IncQuery
![Page 12: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/12.jpg)
Revalidation on-the-fly
Incremental EMF engines typically 5-10x
faster, sub-100ms response times for up
to 1.5M elements
![Page 13: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/13.jpg)
A closer look to the top
Performance advantage of OCL-IA over OCL is not significant with
complex queries
![Page 14: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/14.jpg)
Memory usage
![Page 15: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/15.jpg)
Answer 1 Q: How does model size affect query
performance? A: Polynomial growth in response times.
o Batch validation: execution times grow according to a low-order polynomial (exponent depends on the technology)
o On-the-fly revalidation: similarly, but incremental engines achieve much lower exponents (can be even close to constant)
![Page 16: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/16.jpg)
Question 2 Q: How does the complexity of the query affect
performance?
![Page 17: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/17.jpg)
Re-validation on-the-fly
• 1-2 OM differences with OCL
• Max 0.5 OM difference with IQ
![Page 18: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/18.jpg)
Answer 2 Q: How does the complexity of the query affect
performance? A: Significantly!
o Metamodel and contents of the query together determine the overall performance• E.g. inverse relations and model navigation• Metamodel and queries should be optimized together
o RETE-based tools (Drools, IncQuery) can get around this• High performance regardless of metamodel & query structure• At the cost of increased memory overhead
– But (much) less than OCL-IA!
![Page 19: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/19.jpg)
Results Fastest batch validator: Eclipse OCL Fastest incremental validator: EMF-IncQuery Fast OWL tools (RDF reasoners): Sesame, Jena
o Significantly better than commercial SPARQL engineso But at least 1-2 OM slower than top EMF technologies
(In-memory) RDBMS are slowo Due to ORM overheado Omitted for simplicity, full report will be online
![Page 20: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/20.jpg)
How to build a scalable modeling tool? If everything fits into your workstations’ RAM
o 8GB RAM 1-2M model elementso You can (& should) (continue to) use EMF
• EMF tweaks (e.g. diet) have no significant impact• Instead, optimize queries and metamodels
o Use Eclipse OCL for batch validationEMF Query 1-2 has much less expressive power, not focused on in-memory models
o Use incremental engines for on-the-fly queries• EMF-IncQuery supports on-the-fly validation, view maintenance, queries for
frequently executed and performance sensitive model transformations, …• With RETE-based tools such as Drools or IncQuery, no manual
metamodel/query etc. optimization necessary• Cost: additional memory requirements
Elseo To be investigated
![Page 21: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/21.jpg)
EMF-INCQUERY
![Page 22: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/22.jpg)
Overview A model query engine
o Supports batch querieso Optimized for incremental queries!
Incremental evaluationo Based on the RETE algorithmo Compute once, update afterwardso Gain: Instant re-evaluationo Price: Uses some more memory
• Manageable with proper life cycles
![Page 23: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/23.jpg)
Benefits Makes on-the-fly well-formedness validation, view maintenance, …
feasible over really large instance models Simplifies writing really complex queries
o Graph pattern language• Capture local & global constraints• Compositions, reusable constructs• Arbitrary recursion, negation
o Highly reusable, you can even build query libraries Easy-to-integrate into existing apps
o works with any EMF DSLo Integration through standard EMF APIs
![Page 24: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/24.jpg)
New features: IncQuery v0.7 Tooling
o Xtext2-based toolingo New language features (w.i.p.)
• Unlimited recursion and transitive closure• Short attribute notation• Aggregate functions• Match/exceed the expressive power of OCL, while providing more flexible re-use
Runtimeo Generic queries
• Build and execute queries on-the-fly, using Java and IQ PLo RETE construction optimizations
• memory footprint significantly reducedo SCC-based incremental transitive closure
Extensions / add-onso Derived featureso Efficient complex event processing over modelso Design-space exploration
![Page 25: High-performance model queries](https://reader035.vdocuments.us/reader035/viewer/2022062617/54bfd0844a79595d628b4572/html5/thumbnails/25.jpg)
Final points EMF-IncQuery: proposal submission process
underwayo New sub-project under EMFT
Pointerso http://viatra.inf.mit.bme.hu/publications/trainbenchmarko http://viatra.inf.mit.bme.hu/incquery o http://viatra.inf.mit.bme.hu/incquery/examples o http://viatra.inf.mit.bme.hu/incquery/new
Thank you very much!