Data-driven innovation @ Sirris
Elena Tsiporkova & Tom Tourwé
Who are we?
Collective centre of the technology industry
• Non-profit organisation
• Industry-owned
MISSION
Federation for the technology industry
“To increase the competitiveness of companies of the Agoria sectors through technological innovations”
We operate in 5 domains of technology
METALS
COMPOSITES
PLASTICS & HYBRIDS
COATINGS
NANOMATERIALS
ECO-MECHATRONICS
SENSORIZED FUTURE
MODEL BASED DESIGN
FACTORIES OF THE FUTURE
WORLD CLASS TECHNOLOGIES
ADDITIVE MANUFACTURING
SOFTWARE ENGINEERING
CLOUD COMPUTING
DATA INNOVATION
Research themes and expertise
• Focus on complex and rich data: few ‘typical’ big data applications
• Domain-agnostic: energy, manufacturing, mobility, web, HR, ...
Predictive analytics
Wearable intelligence
Recommender systems
Entity profiling
Scalable (online and offline) data processing
Proactive decision support
Context modeling & reasoning
Knowledge extraction
Data-related challenges in materials research
• Integrate and process datasets originating from multiple sources
  • multiple experimental conditions
  • multiscale (in space and time) heterogeneous datasets
• Standardize workflows for raw data pre-processing
  • segmentation/discretization, standardization, normalization, …
• Consider dimension reduction: the curse of dimensionality
  • exploding combination of features, more features than instances
  • increased computation time of data processing algorithms
• Facilitate data sharing and re-use
  • terminology uniformization, consistent data annotation, standards for metadata generation
• Standardize image characterization of materials, i.e. feature selection
  • enable uniform image interpretation by different experts
  • link features to a material's performance characteristics
• Include the hierarchical material structure in knowledge databases
  • graph databases (Neo4j) vs. relational databases
• Establish process-structure-property (PSP) linkages
  • initial material characterization through feature selection (imaging data)
  • extract trends on the evolution of selected features during a given manufacturing route (synthesis and processing time series data)
  • study how these trends affect properties/performance characteristics of interest for the material (high-throughput properties/performance data)
  • build hypotheses enabling extraction of more reliable and complete PSP linkages (using all available data)
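As an illustration of the pre-processing point above, here is a minimal sketch of two of the listed steps (normalization and discretization) wrapped in one fixed pipeline, so that every dataset is treated identically. The function names, the equal-width binning and the sample values are assumptions made for this example only; they are not taken from the slides.

```python
def normalize(values):
    """Min-max normalization onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant series: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, n_bins):
    """Equal-width binning of values that are already scaled to [0, 1]."""
    # The value 1.0 would fall into bin n_bins, so clamp it to the last bin.
    return [min(int(v * n_bins), n_bins - 1) for v in values]

def preprocess(values, n_bins=4):
    """One repeatable workflow: normalize first, then discretize."""
    scaled = normalize(values)
    return {"scaled": scaled, "bins": discretize(scaled, n_bins)}

# Hypothetical raw measurements (illustrative numbers only)
raw = [2.0, 4.0, 6.0, 8.0, 10.0]
result = preprocess(raw)
print(result["scaled"])
print(result["bins"])
```

Keeping the steps in a single `preprocess` function is one simple way to make the workflow reproducible across datasets and experts.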
Data integration
• Monitoring a particular real-life phenomenon (physical process, production line, operation of a machine, …) via multiple data capturing sources or in multiple experimental conditions provides diverse evidence about the phenomenon in the form of several different datasets
• The challenge is to combine these datasets in order to derive consistent and relevant information about the phenomenon under study
• However, combining data originating from different sources and measurements is not trivial, due to:
  • varying granularity, e.g. different sampling intervals
  • coverage that does not necessarily span exactly the same regions or time periods
  • differences in the accuracy and resolution limits of the different tools
  • varying rates of missing values, and measured vs. estimated values
  • metadata that is not comparable between the different datasets
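Three of the obstacles listed above (different sampling intervals, partial time coverage, missing values) can be made concrete with a small sketch. It assumes two hypothetical sensors and resamples both onto a common grid by averaging, keeping only the bins that both cover. Averaging is just one possible aggregation choice, and all names and values are illustrative.

```python
from collections import defaultdict
from statistics import mean

def resample_mean(series, interval):
    """Aggregate (timestamp_seconds, value) pairs into bins of `interval` seconds.

    Returns {bin_start: mean value}; bins without valid samples are absent.
    """
    bins = defaultdict(list)
    for t, value in series:
        if value is None:              # skip missing readings
            continue
        bins[(t // interval) * interval].append(value)
    return {start: mean(vals) for start, vals in sorted(bins.items())}

def align(series_a, series_b, interval):
    """Resample both series to a common grid and keep only the shared bins."""
    a = resample_mean(series_a, interval)
    b = resample_mean(series_b, interval)
    shared = sorted(a.keys() & b.keys())
    return [(t, a[t], b[t]) for t in shared]

# Hypothetical sensors: A samples every 60 s, B every 300 s and starts later
sensor_a = [(t, float(t // 60)) for t in range(0, 900, 60)]
sensor_a[3] = (180, None)              # one missing value
sensor_b = [(300, 10.0), (600, 20.0), (900, 30.0)]

aligned = align(sensor_a, sensor_b, interval=300)
for row in aligned:
    print(row)
```

Only the 300 s and 600 s bins survive, because sensor B has no data for the first bin and sensor A none for the last.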
Data integration approaches
• Data Mediation
  • Establish a conceptual mapping between occurrences of the same data entity in different datasets, e.g. via semantic technologies
• Data Fusion
  • Merging datasets after appropriate data transformation
• Post-Data-Analysis Aggregation
  • Complex integration procedures of data analysis results, e.g. integrative clustering
• Multiscale Hypermodelling
  • Integration of heterogeneous data, e.g. integrative/composite modelling
Example: Data Mediation
An RDF graph of a triple made up of two nodes (subject and object) and a link (predicate) connecting them
Semantic hierarchy and terminology for melting temperature of a Cu single crystal measured using differential thermal analysis.
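To make the triple structure concrete, the sketch below stores subject-predicate-object statements as plain Python tuples and resolves a lab-specific term to a shared vocabulary term. Every `ex:` and `lab2:` identifier is hypothetical and only loosely follows the melting-temperature example. `skos:exactMatch` is a real SKOS mapping property, but using it here is an assumption rather than something stated on the slide.

```python
# Each statement is a (subject, predicate, object) triple, as in an RDF graph.
# All "ex:" and "lab2:" identifiers are hypothetical.
triples = {
    ("ex:sample42", "rdf:type", "ex:CuSingleCrystal"),
    ("ex:measurement7", "ex:ofSample", "ex:sample42"),
    ("ex:measurement7", "ex:measuresProperty", "ex:MeltingTemperature"),
    ("ex:measurement7", "ex:usesMethod", "ex:DifferentialThermalAnalysis"),
    # A second lab uses its own term for the same property; this mapping
    # is the piece that data mediation contributes.
    ("lab2:Tm", "skos:exactMatch", "ex:MeltingTemperature"),
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def canonical(term):
    """Follow skos:exactMatch links until a term without a mapping is reached."""
    seen = set()
    while term not in seen:            # guard against mapping cycles
        seen.add(term)
        targets = objects(term, "skos:exactMatch")
        if not targets:
            break
        term = sorted(targets)[0]      # deterministic pick if several exist
    return term

print(canonical("lab2:Tm"))
print(objects("ex:measurement7", "ex:usesMethod"))
```

A real deployment would typically use an RDF library and a triple store; plain tuples are used here only to keep the example self-contained.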
Example: Data Fusion
Studying the underlying cell-cycle mechanisms in plants
• Analysing time series data originating from the experimental monitoring of synchronized cell cultures is widely employed for the identification of genes which are periodically regulated during the cell cycle
• The state-of-the-art experimental techniques for plants cannot produce a complete (all phases) cell-cycle coverage
• One possible solution is to merge multiple datasets, produced by different experimental techniques, in order to arrive at a better cell-cycle coverage:
  • Standardize (e.g. via z-score transformation) the different time series datasets
  • Perform time series alignment to determine the best pasting offset
  • Aggregate the overlapping parts of the time series
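The three steps above (standardize, align, aggregate the overlap) can be sketched as follows. This is a toy illustration, not the authors' procedure. It assumes the second series starts later than the first, uses the mean squared difference on the overlap as the alignment criterion, and averages the overlapping points. The two input series are invented so that they agree exactly on the overlap after standardization, which real data generally will not.

```python
from statistics import mean, pstdev

def zscore(series):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]

def best_offset(a, b, min_overlap=3):
    """Index in a at which b is best pasted.

    Tries every offset where the tail of a overlaps the head of b by at
    least min_overlap points and returns the offset with the smallest
    mean squared difference on that overlap.
    """
    best, best_err = None, float("inf")
    for offset in range(1, len(a) - min_overlap + 1):
        overlap = len(a) - offset
        if overlap > len(b):           # b must reach the end of a
            continue
        err = mean([(a[offset + i] - b[i]) ** 2 for i in range(overlap)])
        if err < best_err:
            best, best_err = offset, err
    return best

def paste(a, b, offset):
    """Merge a and b, averaging the points where they overlap."""
    overlap = len(a) - offset
    middle = [(a[offset + i] + b[i]) / 2 for i in range(overlap)]
    return a[:offset] + middle + b[overlap:]

# Hypothetical expression profiles of one gene from two techniques with
# different measurement scales; technique B covers later time points.
technique_a = [5, 7, 11, 9, 15, 13, 9, 7]
technique_b = [5, 4, 2, 1, 0, 2, 3, 1]

z_a, z_b = zscore(technique_a), zscore(technique_b)
offset = best_offset(z_a, z_b)
fused = paste(z_a, z_b, offset)
print(offset, len(fused))
```

Standardizing first removes the scale difference between the two techniques, so the alignment step compares shapes rather than raw magnitudes.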
Example: Post-Data-Analysis Aggregation
Analysis of solar energy production data
• Time series production data collected from multiple solar plants in different geographical locations can be used to study the production performance of components (inverters) originating from different manufacturers in order to derive relevant insights, e.g. partition inverters into groups according to their production behaviour
• The different time series datasets cannot be fused and analyzed as one dataset, since the data has been captured in different time zones and with different sampling intervals
• One possible solution is to apply a consensus clustering approach, i.e. cluster each dataset separately and integrate the clustering results
[Figure: Experiments #1, #2, …, #n are each clustered separately, giving Partitions #1, #2, …, #n with k1, k2, …, kn clusters; partitioning of the cluster centers then yields the overall clustering partition.]
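The scheme of clustering each dataset separately and then partitioning the resulting cluster centers can be sketched as below. Several things here are assumptions made for illustration: each inverter is summarized by a single scalar feature that is comparable across plants, the clustering is a tiny deterministic 1-D k-means, and the plant data are invented. The slides do not specify which clustering algorithm or features were used.

```python
def kmeans_1d(values, k, iters=20):
    """Tiny deterministic 1-D k-means; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    if k == 1:
        centers = [sum(values) / len(values)]
    else:
        # spread the initial centers evenly over the data range
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # assignment step: nearest center for every value
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # update step: each center moves to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels

def consensus(datasets, k_local, k_overall):
    """Cluster each dataset on its own, then cluster all cluster centers."""
    all_centers, local = [], []
    for values, k in zip(datasets, k_local):
        centers, labels = kmeans_1d(values, k)
        local.append((len(all_centers), labels))   # where this dataset's centers start
        all_centers.extend(centers)
    _, center_labels = kmeans_1d(all_centers, k_overall)
    # each item inherits the overall label of its local cluster center
    return [[center_labels[start + lab] for lab in labels] for start, labels in local]

# Hypothetical per-inverter performance scores for two plants
plant_a = [0.61, 0.63, 0.90, 0.92]
plant_b = [0.60, 0.70, 0.72, 0.91, 0.93]

overall = consensus([plant_a, plant_b], k_local=[2, 3], k_overall=2)
print(overall)
```

Because only the cluster centers are combined, the raw series never need a shared time base, which is the point of aggregating after the analysis rather than before it.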
Multiscale Hypermodelling
• Complex physical phenomena of a multiscale character
  • Multitude of interacting components, systems, processes
  • Knowledge that is fragmented and non-homogeneous in terms of detail
  • Rich scientific/engineering knowledge available at component level
  • Some understanding of the dynamics of the system
  • Little insight into the complex space-time scale interactions of multiple entities and the impact of external factors
• Hypermodels -> multi-layered integrative models
  • Decomposing a complex phenomenon, or a composite model of it, into its crucial components
  • Reconstructing the composite model in a well-designed, formal and reproducible way
  • Low-level (knowledge-driven) models encode detailed knowledge about the working of individual components of a larger system
  • High-level (data-driven) models consist of a data-driven inference of parameters and dependencies, correlation of events, detection of behaviour patterns
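A deliberately small sketch of the layering idea follows. The form of each component model is fixed by domain knowledge (here, a hypothetical linear gain), its parameter is inferred from data by least squares, and the composite model is rebuilt by chaining the components. The component names, the linear form and the observations are all invented for illustration; real hypermodels are far richer than this.

```python
def fit_gain(inputs, outputs):
    """Data-driven layer: least-squares estimate of g in output = g * input."""
    return sum(x * y for x, y in zip(inputs, outputs)) / sum(x * x for x in inputs)

def make_component(gain):
    """Knowledge-driven layer: the model form (linear) comes from domain knowledge."""
    return lambda x: gain * x

def compose(*components):
    """Rebuild the composite model by chaining component models in order."""
    def composite(x):
        for component in components:
            x = component(x)
        return x
    return composite

# Hypothetical per-component observations (input, output)
panel_gain = fit_gain([1.0, 2.0, 4.0], [0.2, 0.4, 0.8])
inverter_gain = fit_gain([1.0, 2.0, 3.0], [0.9, 1.8, 2.7])

system = compose(make_component(panel_gain), make_component(inverter_gain))
print(system(10.0))
```

Keeping the components separate means one of them can be re-fitted or replaced without touching the rest of the composite model.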
Industrial services
Making data innovations tangible by identifying the potential and developing proof-of-concepts
EluciDATA: general information
• Mission
  • stimulate data innovation (i.e. innovation by focused data exploitation) in industry
• Details
  • Started on September 1st, 2014
  • 4-year project, ~2.8 FTE, ~1.3M€ budget
  • 3 partners
  • VIS (« Vlaams InnovatieSamenwerkingsverband ») modality
  • Industrial user group: 30+ companies
Industrial services
• Feasibility studies: identify the potential of data innovation in an industrial context
• Demonstrators: build working proof-of-concepts around well-defined industrial use cases
• Master courses & in-residence training: facilitate knowledge transfer
• Innovation and research project incubator: initiate R&D projects with a complementary consortium of companies
• Student academy: recruit students and define topics aligned with your interests and challenges
Contact Info
• Dr. Elena Tsiporkova
  • [email protected]
  • +32 498 919 490
• Dr. Tom Tourwé
  • [email protected]
  • +32 498 919 473