PHUSE Connect 2020 / DH-08
PHUSE Connect - DH-08
Controlled and FAIR Data Access with JDBC and Arrow including Clinical Data
Ronald Steinhau, Entimo AG
2020-SEP-25
Vision
• Securely access clinical data from all development tools
  • Ensure data security (e.g. fine-grained access rights)
  • Support older (JDBC) and newer (Arrow) standards for data transport
  • Support your favorite development tools
    • R-Studio, Python (Jupyter, Python IDE, …)
    • Java/Scala/SAS
    • Spark, other Big Data tools…
• Fully leverage your infrastructure
  • Maximum transfer speeds
  • Streaming rather than copy & use
  • Caching for lasting performance in builds and pipelines
• Combine clinical data with any other data
• Architecture presentation
  • entimICE Data Access (EDA)
Extendable Data Access (EDA) - Architecture
[Architecture diagram. Labels: entimICE File-System; External File-Systems; External Databases; entimICE ClinRep-DB; entimICE Indexing; Data; EDA-Grabber; EDA-Service; Apache Arrow (Flight) Shared Proxy Service; Apache Ignite/H2 Shared Memory; Gandiva; GPUs; JDBC/ODBC (virtual DB); Remote JDBC (virtual DB); Web APIs (JSON); Web Data Grid (Server Side); EDA Web Frontend; EDA Data Grid via R-Shiny; Datatables; AG-Grid; Big-Data-Tools; Tools; R-Studio; Jupyter; R/Python; Spotfire; SAS; PL/SQL; Excel]
FAIR Data Access with EDA
• Findable
  • Use entimICE metadata, derived schema and index
  • Optionally integrate a data catalog
• Accessible
  • Use JDBC and Apache Arrow to securely access the data
  • Standard tools (R, Python) enriched by custom packages
• Interoperable
  • Join data with SQL from all sources (files, tables, streams)
• Reusable
  • entimICE metadata
  • Auto-derived schema information
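To make "auto-derived schema information" concrete, here is a minimal sketch of how column types could be inferred from a delimited file. The slides do not describe how EDA derives its schema, so the function names (`infer_type`, `derive_schema`), the three-type model and the sample data are all assumptions chosen for illustration.

```python
import csv
import io
from typing import Dict, List

# Hypothetical sample data; not taken from the presentation.
SAMPLE_CSV = "USUBJID,AGE,WEIGHT\nS-001,34,70.5\nS-002,51,\n"


def infer_type(values: List[str]) -> str:
    """Pick the narrowest of int, float, str that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"  # nothing to infer from, fall back to text
    for type_name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "str"


def derive_schema(csv_text: str) -> Dict[str, str]:
    """Map each column name to an inferred type name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: infer_type([row[name] for row in rows])
        for name in (reader.fieldnames or [])
    }


schema = derive_schema(SAMPLE_CSV)
```

A derived schema of this kind is what would let a generic SQL or data-frame client discover column names and types without manual configuration.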
Access by EDA Smart JDBC Driver (Virtual DB)
• Large JAR file for easy tool integration
  • Optional ODBC bridge support
• Operates as a virtual (in-memory) database
  • Respects access rights after login
  • Supports joins between any datasets
    • SAS datasets, databases, CSV, Web APIs
  • Full ANSI SQL standard supported
  • Supports all entimICE datasets as a data source
  • Adaptable to other clinical data sources
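The virtual-database idea on this slide (load heterogeneous sources into one in-memory SQL engine, then join them) can be sketched as follows. This is not the EDA driver: Python's built-in `sqlite3` module stands in for it, and the table names, column names and sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: adverse events delivered as CSV text.
ADVERSE_EVENTS_CSV = """USUBJID,AETERM,AESEV
S-001,Headache,MILD
S-002,Nausea,MODERATE
S-001,Fatigue,MILD
"""

# Hypothetical source 2: treatment arms, e.g. rows fetched from another database.
ARMS = [("S-001", "Placebo"), ("S-002", "Active")]


def build_virtual_db() -> sqlite3.Connection:
    """Load both sources into one in-memory database so they can be joined."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ae (usubjid TEXT, aeterm TEXT, aesev TEXT)")
    conn.execute("CREATE TABLE dm (usubjid TEXT, arm TEXT)")
    rows = csv.DictReader(io.StringIO(ADVERSE_EVENTS_CSV))
    conn.executemany("INSERT INTO ae VALUES (:USUBJID, :AETERM, :AESEV)", rows)
    conn.executemany("INSERT INTO dm VALUES (?, ?)", ARMS)
    return conn


def events_per_arm(conn: sqlite3.Connection):
    """Join the two sources with plain SQL and count events per treatment arm."""
    query = """
        SELECT dm.arm, COUNT(*) AS n_events
        FROM ae JOIN dm ON ae.usubjid = dm.usubjid
        GROUP BY dm.arm
        ORDER BY dm.arm
    """
    return conn.execute(query).fetchall()


conn = build_virtual_db()
result = events_per_arm(conn)
```

With a real JDBC driver the client would issue the same kind of `SELECT ... JOIN` statement; the difference is that the driver, rather than the client script, would resolve each table to its underlying source.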
Apache Arrow Architecture
[Diagram. Labels: entimICE; R-Studio; Jupyter; Gandiva]
Access by Apache Arrow
• Apache Arrow: a new (faster) data exchange standard
  • Version 1.0 since July 2020
  • Evolution of ideas from Parquet and other columnar formats
• Column-optimized data format
  • Organized in same-typed vectors (compact and fast)
  • Parallel vector transfer over the network available (gRPC)
  • Dictionary-encoded arrays (optional) for compression (code lists)
• Tool support
  • R-Studio → read/write/convert data frames to/from Arrow
  • Jupyter → transform into Python data structures
  • Convenience packages by entimICE (supporting the EDA Arrow server)
  • Big Data tools like Spark, Pandas, Dremio support Arrow
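Two of the format properties named on this slide, same-typed column vectors and dictionary encoding of code lists, can be illustrated in plain Python. This is a conceptual sketch only: Arrow itself stores columns in its own binary memory layout, and the helper names and sample records below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical row-oriented records with a code-list column (SEX).
ROWS = [
    {"USUBJID": "S-001", "SEX": "F"},
    {"USUBJID": "S-002", "SEX": "M"},
    {"USUBJID": "S-003", "SEX": "F"},
    {"USUBJID": "S-004", "SEX": "F"},
]


def rows_to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row records into one list per column (all rows share the same keys)."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, indices) such that dictionary[indices[i]] == values[i]."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary: List[str], indices: List[int]) -> List[str]:
    """Rebuild the original values from the dictionary and the index vector."""
    return [dictionary[i] for i in indices]


columns = rows_to_columns(ROWS)
sex_dictionary, sex_indices = dictionary_encode(columns["SEX"])
```

The saving comes from repeated values: each distinct code is stored once in the dictionary, and the column itself shrinks to a vector of small integers.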
EDA Productivity Enhancements
• Transparent in-memory caching
  • Multi-level caching (fast file, in-memory)
  • Controlled via Arrow/JDBC
  • No HDFS required
• Web Data Grid
  • Fast data evaluation (sort, filter, search) of large datasets
  • All filtering/sorting/searching server side (e.g. in cache)
• Data catalog integration hooks
  • Find data already covered by catalogs, indices and metadata
  • Customize EDA by catalog integration
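The multi-level caching bullet (fast file, in-memory) suggests a lookup order of memory first, then a local file, then the original source. The sketch below shows that pattern under stated assumptions: the class name `TwoLevelCache`, the JSON file format and the loader callback are hypothetical and do not describe how EDA implements its cache.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


class TwoLevelCache:
    """Look up a dataset in memory first, then in a file cache, then at the source."""

    def __init__(self, cache_dir: Path, loader: Callable[[str], List[dict]]):
        self._memory: Dict[str, List[dict]] = {}
        self._cache_dir = cache_dir
        self._loader = loader
        self.source_reads = 0  # counts how often the slow source was hit

    def get(self, name: str) -> List[dict]:
        if name in self._memory:  # level 1: in-memory
            return self._memory[name]
        # Assumes `name` is a simple, file-system-safe dataset name.
        path = self._cache_dir / f"{name}.json"
        if path.exists():  # level 2: fast local file
            data = json.loads(path.read_text())
        else:  # miss on both levels: go to the source
            data = self._loader(name)
            self.source_reads += 1
            path.write_text(json.dumps(data))
        self._memory[name] = data
        return data

    def drop_memory(self) -> None:
        """Simulate a restart: memory is lost, the file cache survives."""
        self._memory.clear()


def slow_source(name: str) -> List[dict]:
    """Stand-in for an expensive read from a repository or database."""
    return [{"dataset": name, "rows": 3}]


with tempfile.TemporaryDirectory() as tmp:
    cache = TwoLevelCache(Path(tmp), slow_source)
    first = cache.get("dm")    # read from the source, stored on both levels
    second = cache.get("dm")   # served from memory
    cache.drop_memory()
    third = cache.get("dm")    # served from the file cache
    reads = cache.source_reads
```

The file level is what gives "lasting" performance across builds and pipeline runs, because it survives a process restart while the in-memory level does not.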
Summary
• entimICE EDA architecture
  • Covers secure and fast access to clinical data
  • Supports most popular development tools
  • Caching provides full leverage of infrastructure
  • FAIR principles implemented
• Vision turned to reality!
Thank you for your attention!
For questions please contact: