transmart 17.1 technical overview
27th of October 2016
Piotr Zakrzewski – The Hyve
TranSMART Pro 17.1 project Technical Overview
2
What does 17.1 mean for future development?
Improved ease of development
● Clean up of repositories (single repo)
● One step build
● Dependencies update
● REST API improvements
● Consolidation and extension of the star schema to better fit tranSMART and new data types
● Documentation
4
Repository Structure
Before you can deploy it here ...
5
Repository Structure
[Diagram: repositories including core-api, core-db, rest-api, R modules, transmart data, legacy db]
you need all of these ...
...and these...
6
Repository Structure
16.2:
- TranSMART 16.2 spans 10 core repositories
- Building & testing tranSMART requires a special setup (that resides in yet another repository)
17.1:
- Single repository with all core components necessary for building a working tranSMART WAR file
8
Versioning of Artifacts
16.2:
- Most components are versioned as SNAPSHOTs
- core-api, core-db, rest-api, transmartApp and all other core components need to match strictly in revision in order to work
17.1:
- Single repository: all changes to different components come in a single PR
9
Build Process
16.2:
- tranSMART 16.2 (Grails 2) uses Gant scripts for building
- git-repo used for fetching all repositories
- custom Groovy script (dependency manager) needed for dev setup
17.1:
- Gradle build system (comes with Grails 3)
- One step build (also with database setup)
- just git clone && ./gradlew build
10
Test Setup
16.2:
- Custom script matching branches during Travis run
- Different way to run tests locally and on Travis
- No reliable way to run tests for all components
- Tested on H2 in-memory database
17.1:
- ./gradlew test both locally and on Travis
- tested against Oracle and Postgres
- BDD Spock framework for testing
11
Gradle
- Default option for Grails 3.X
- Very versatile build system
- Also very popular (gained momentum due to adoption by Android)
- Especially suitable for multi-project, multi-language builds like tranSMART
13
Java 7 to Java 8
tranSMART is still running on Java 7, which has not been supported, even with security updates, since April 2015.
Java 7 has reached its end of life.
14
Groovy 2.4 and Grails 3
- Java 8 supports invokedynamic, which should increase performance of many Groovy dynamic calls
- Many workarounds for bugs in old Grails and Hibernate versions are no longer necessary
- The upgrade allowed us to adopt a better build system: Gradle
16
REST-API versioning
● TranSMART REST-api is used in production
● Several clients and third-party apps
● But development needs to continue …
17
REST-API versioning
- in 17.1 REST-api versioning is introduced
- Versioning is done on the URL level
- GET /studies becomes GET /v1/studies
- only minor influence on existing clients (change of base URL configuration to include version)
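For a client, the change amounts to adding the version prefix to the configured base URL. A minimal sketch in Python, assuming a made-up server address and a hypothetical helper named versioned_url (neither is part of tranSMART):

```python
from urllib.parse import urlsplit, urlunsplit


def versioned_url(base_url: str, resource: str, version: str = "v1") -> str:
    """Build a resource URL with the API version as a path segment.

    base_url is the unversioned server root; the host used below is made up.
    """
    parts = urlsplit(base_url)
    # Join the existing path, the version and the resource with single slashes.
    segments = [s for s in (parts.path.strip("/"), version, resource.strip("/")) if s]
    path = "/" + "/".join(segments)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# GET /studies becomes GET /v1/studies
print(versioned_url("https://transmart.example.org", "studies"))
```

Keeping the version in one configurable place means an existing client only has to change its base URL setting, which matches the "minor influence" noted on the slide.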
18
Current REST-API documentation
19
Open API (previously Swagger)
21
Db schema as of now (16.2)
22
Db schema as of now (16.2)
Some facts about the current schema:
- Study exists only as string ids sprinkled around the star schema (no table for study)
- Concepts and patients belong to a study (cannot be shared)
- Combination of patient-concept yields a single observation
23
Db schema of 17.1
24
Db schema of 17.1
Most important consequences of 17.1 changes:
- Concepts and patients can be shared between studies
- More straightforward cross-trial comparison (trial-visit dimension) and longitudinal data (start date) support
- Much redundancy and many inconsistencies removed
25
Hypercube
- Introduction of longitudinal data requires a whole different approach
- Modifiers used to store time point. Both relative and absolute allowed
- Each observation has effectively an additional dimension (hence the Hypercube)
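The idea can be sketched in Python as observations keyed by several dimensions rather than by patient and concept alone. The dimension names, the sample values and the select helper are illustrative assumptions, not the 17.1 schema or API.

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    patient: str
    concept: str
    study: str
    # Time point, either relative ("week 4") or absolute (an ISO date string).
    time_point: Optional[str]
    value: float


# Made-up sample data: the same patient and concept measured at two time points.
observations = [
    Observation("p1", "heart_rate", "STUDY_A", "baseline", 72.0),
    Observation("p1", "heart_rate", "STUDY_A", "week 4", 68.0),
    Observation("p2", "heart_rate", "STUDY_B", "2016-10-27", 80.0),
]


def select(cells, **constraints):
    """Return the observations whose dimensions match every given constraint."""
    return [
        obs for obs in cells
        if all(getattr(obs, dim) == wanted for dim, wanted in constraints.items())
    ]


# Fixing some dimensions slices the cube; the remaining dimensions stay free.
for obs in select(observations, patient="p1", concept="heart_rate"):
    print(obs.time_point, obs.value)
```

With only patient and concept as keys, the two measurements for p1 would collide; the extra time dimension is what keeps them apart.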
26
How to query a Hypercube?
27
Impact on backwards compatibility
- Old UI will work only with old data, new data (especially longitudinal) will not be supported
- Old UI will not make use of new cross-trial functionality
- Migration path will be provided between 16.2 and 17.1
28
The new UI, however, will support longitudinal data and other features
30
Documentation
- one of the project deliverables is documentation on the database schema
- REST-api documented with Open-API
- Documentation as part of git repository
31
Conclusion
17.1, aside from many new features, is also a major clean-up that will make future development easier
Backup slides
33
34
Arvados Keep
35
Performance Benchmarks
- Goal: safeguarding performance of REST-api
- Implemented as a Gradle task (single command)
- Should help developers spot drops in performance after new changes
- Reference setup on Amazon will be available to make benchmarks comparable
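The principle of comparing a measurement against a reference can be sketched as below. In 17.1 the benchmarks are a Gradle task; this Python is only an illustration of the idea, and the function names, the 20% tolerance and the stand-in workload are all assumptions.

```python
import time
from statistics import median


def benchmark(call, repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds of call() over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return median(timings)


def is_regression(measured: float, reference: float, tolerance: float = 0.2) -> bool:
    """Flag a run slower than the reference by more than the given tolerance."""
    return measured > reference * (1.0 + tolerance)


def fake_request():
    # Stand-in workload; a real benchmark would issue a REST call here.
    sum(range(10_000))


measured = benchmark(fake_request)
# The reference time would come from the shared reference setup.
print(is_regression(measured, reference=1.0))
```

Using the median of several runs reduces the effect of a single slow outlier, and a shared reference environment is what makes the reference number meaningful across developers.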
36
Other changes
- Multiple observations per concept-patient support
- Categorical variables no longer loaded per value (e.g. variable Treated being two variables: yes and no)
- Several new tables to accommodate new HDD data type (RNAseq measurement per transcript) and table to store generic links to external resources (files)
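The change to categorical variables can be illustrated with a small Python sketch. The patient ids, the "/" path separator and the to_single_variable helper are made up for illustration and do not reflect the actual loading code.

```python
# Per-value loading: "Treated" appears as two variables, one per value.
per_value = {
    ("p1", "Treated/yes"): "yes",
    ("p2", "Treated/no"): "no",
}


def to_single_variable(rows, separator="/"):
    """Collapse per-value variables into one categorical variable per patient."""
    merged = {}
    for (patient, path), value in rows.items():
        # Drop the trailing value segment to recover the variable name.
        variable, _, _ = path.rpartition(separator)
        merged[(patient, variable)] = value
    return merged


# One variable "Treated" whose value is "yes" or "no".
print(to_single_variable(per_value))
```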