Post on 26-Dec-2015
Leonid Kalinichenko, Sergey Stupnikov, Victor Zakharov, Vladimir Budzko, Vadim Korolev
Institute of Informatics Problems, Russian Academy of Sciences
Unifying mediation of knowledge, data and services in a subject domain for problem solving over heterogeneous information resources
Declaration of Intent Draft by IPI RAN SkTech.RC/IT/Madnick
Outline
State of the art in subject mediation reached at IPI RAN
Directions of research and development suggested for use in the proposal SkTech.RC/IT/Madnick
  Investigation of application driven approach for scientific problem solving in the subject mediator environment
  Heterogeneous multidialect mediator infrastructure for data, knowledge and services semantic integration
  Mediation of data bases with nontraditional data models
  Storage of very large volumes of data [Zakharov]
  Cyber security issues [Budzko, Korolev]
Self-certification
Coverage by the DoI of a content of the three themes (Scientific Dataspace, Data Quality and Big Data) declared by Prof Stuart Madnick
State of the art in subject mediation reached at IPI RAN
Basic principles
Subject mediation technology aims to fill the widening gap between the users (applications) and heterogeneous distributed information resources
  the definition of the problem domain (the mediator definition) is independent of the existing information resources
  a mediator is defined as a result of consolidated efforts of the respective scientific community
  user interfaces are independent of the multiple information resources involved
  information about new resources can be published at any time, independently of the mediators acting at that time
  a GLAV-based setting is used for integration of relevant information resources at the mediator
  integrated access to the information resources is provided in the process of problem solving
  mediators have a recursive structure
Canonical information model synthesis
[Figure: resource information models R1, R2, R3; canonical model with a kernel and extensions E1, E2, E3; three "refines" links]
Resources identification and integration
Identification of relevant resources
  metadata model (capabilities)
  ontological model (concepts and their relationships)
  canonical model (structure and behavior)
Integration of relevant resources in a mediator (registration)
  GLAV = Local As View (LAV) + Global As View (GAV)
  GAV: provides for reconciliation of various conflicts between resource and mediator specifications
  LAV: resource schemas are registered in the mediator as materialized views over virtual classes of the mediator
    stability of the application problem specification under any modifications of resources is provided
    scalability of mediators w.r.t. the number of resources is provided
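A minimal sketch of the two registration styles, under loud assumptions: the mediator class `Star`, the resources `catalog_a` and `catalog_b`, and every attribute name are hypothetical, and plain Python stands in for the SYNTHESIS language. The GAV part maps resource records into the mediator class and reconciles naming and unit conflicts; the LAV part describes each resource as a restricted view over the mediator class, which lets the mediator skip resources that cannot contribute to a query.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Star:
    """Hypothetical virtual class of the mediator schema."""
    name: str
    ra_deg: float
    mag: float

# Hypothetical resources with conflicting attribute names and units.
catalog_a = [{"id": "A1", "ra_hours": 1.0, "vmag": 9.5},
             {"id": "A2", "ra_hours": 2.0, "vmag": 14.2}]
catalog_b = [{"obj": "B1", "ra": 30.0, "m": 11.0}]
RESOURCES = {"catalog_a": catalog_a, "catalog_b": catalog_b}

# GAV side: mediator objects are defined in terms of resource records;
# naming and unit conflicts (hours vs. degrees) are reconciled here.
GAV = {
    "catalog_a": lambda r: Star(r["id"], r["ra_hours"] * 15.0, r["vmag"]),
    "catalog_b": lambda r: Star(r["obj"], r["ra"], r["m"]),
}

# LAV side: each resource is described as a view over the mediator class,
# here simply "Star restricted to a magnitude interval" (assumed values).
LAV = {"catalog_a": (0.0, 15.0), "catalog_b": (10.0, 12.0)}

def relevant_resources(mag_lo, mag_hi):
    """Resources whose LAV description overlaps the queried interval."""
    return sorted(name for name, (lo, hi) in LAV.items()
                  if lo <= mag_hi and mag_lo <= hi)

def query_stars(mag_lo, mag_hi):
    """Answer a query over the mediator class using both kinds of views."""
    result = []
    for name in relevant_resources(mag_lo, mag_hi):
        for record in RESOURCES[name]:
            star = GAV[name](record)
            if mag_lo <= star.mag <= mag_hi:
                result.append(star)
    return result
```

Registering a new resource in this sketch means adding one entry to each of `RESOURCES`, `GAV` and `LAV`; the query function and the `Star` class stay unchanged, which is the stability property the slide refers to.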
Subject mediation: results obtained at IPI RAN (I)
A prototype of the subject mediation infrastructure used for problem solving over multiple distributed information resources (specifically, in the astronomy problem domain) [slide 8]
Methods and tools for mapping and transformation of information models of heterogeneous resources intended for their unification in mediation middleware
  The Model Unifier prototype tool aimed at partial automation of heterogeneous information models unification has been implemented
  The first version is based on term-rewriting technology
  The second version, an Eclipse platform application based on model transformation languages, is under implementation [slide 9]
Methods for information resources semantic interoperability support in the context of an application problem domain
  Tools for identification of resources relevant to a problem on the basis of ontological descriptions of the problem domain
  Tools for registration of the relevant resources in the mediator
Subject mediation infrastructure (slide 8)
[Figure: architecture diagram. Labels: Application Problem Domain; Application Conceptualization; Mediator Specifications; Rule-based Mediator Programs; Conventional Application Programs; Semantic Mediation Middleware; Semantic Mappings; Canonical Information Model (SYNTHESIS Language); Resource Wrappers (Data Source Wrapper, Service Wrapper); Registry Wrapper; Computation and Information Resource Environments; Clouds (Cloud Services, Resource Registries); Information Grid (Grid Services, Resource Registries, databases, workstations, servers); Computational Grid (Grid Services, Resource Registries, mainframes, workstations, servers); Programming Facilities; Rule-based Programs; Broker; W3C-RIF]
Model Unifier architecture (slide 9)
Subject mediation: results obtained at IPI RAN (II)
Methods and tools for rewriting of non-recursive mediator programs into resource partial programs
  oriented on object schemas of resources and mediators and typed GLAV-views
A method for optimized planning of the execution of resource partial programs over a distributed environment
  takes into account capabilities of the resources
  assigns places of operation execution on the basis of estimative samples
Methods for dispersed organization of problem solving in the mediation environment
  An implementation of a problem in the mediation environment may be dispersed among programming systems, mediators, GLAV-views, wrappers and resources
  Methods and tools for representation, manipulation and estimation of efficiency of dispersed organization
  Algorithms for construction of efficient dispersed organization
An original approach for binding of programming languages with the declarative mediator rule language
  The approach combines static and dynamic binding, overcoming impedance mismatch and allowing dynamic result types
Directions of research and development
Application-driven approach for scientific problem solving
Application-driven approach for scientific problem solving
Approaches to the integrated representation of multiple information resources for problem solving:
  Resource-driven: an integrated representation of multiple resources is created independently of the problem
  Application-driven: a description of the subject domain of a problem class is created, into which the resources relevant to the problem are mapped
The application-driven approach assumes creation of a subject mediator that supports interaction between a user and resources
Experience of applying the application driven approach
The problem of secondary standards search for photometric calibration of optical components of gamma-ray bursts, formulated by the Institute of Space Research of RAS
The problem was formalized and implemented applying subject mediation:
  A glossary of the problem domain was manually extracted from the textual specification
  An ontology required for problem solving was constructed
  Data structures, methods and functions constituting the problem domain schema were defined
  Resources relevant to the problem were identified in the Astrogrid and VizieR information grids
    SDSS, USNO B-1, 2MASS, GSC, UCAC, VSX, ASAS, GCVS, NSVS
  Resources were registered in the mediator and corresponding GLAV-views were obtained
  The problem was formulated as a program consisting of a set of declarative rules over the mediator schema
The implemented mediator is used by an application that monitors in real time the e-mails informing about gamma-ray bursts. The application extracts standards located in the area of a burst and e-mails them to subscribers.
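A rough illustration of the last step only, selecting standards located in the area of a burst. The deck does not give the selection criteria, so everything below is assumed: the search is a fixed-radius cone around the burst position, the catalog rows and the radius are invented, and the angular separation is computed with the standard haversine formula.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).
    All inputs are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def standards_near(burst_ra, burst_dec, catalog, radius_deg):
    """Names of catalog entries within radius_deg of the burst position.
    catalog is an iterable of (name, ra_deg, dec_deg) tuples (hypothetical)."""
    return [name for name, ra, dec in catalog
            if angular_separation_deg(burst_ra, burst_dec, ra, dec) <= radius_deg]

# Invented sample data: three candidate standards around a burst at (10, 20).
catalog = [("S1", 10.1, 20.0), ("S2", 12.0, 20.0), ("S3", 10.0, 20.15)]
nearby = standards_near(10.0, 20.0, catalog, radius_deg=0.2)
```

In the mediator described above this filter would be expressed as a declarative rule over the mediator schema rather than as application code; the Python form is used here only to make the geometry explicit.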
Issues requiring further investigations
Semantic identification of resources relevant to a mediator
Construction of semantic source-to-target schema mappings in the presence of constraints reflecting the specificity of various data models
Development of mediator program rewriting algorithms in the presence of source and mediator constraints over the classes of objects
Directions of research and development
Heterogeneous multidialect mediator infrastructure for data, knowledge and services semantic integration
An approach for the infrastructure
Recently W3C adopted the Rule Interchange Format (RIF) standard oriented on interoperability of declarative programs
Objective
  integration of multilanguage knowledge representations and rule-based declarative programs, heterogeneous databases and services, built on the basis of unified languages and a multidialect mediation infrastructure
Idea
  Combining the RIF standard paradigm and the GLAV approach built on the extensible canonical information model
Modular mediator infrastructure
The multidialectal construction of the canonical model
  Mediators are represented as a functional composition of declarative specifications of modules
  Each module is based on its own dialect with an appropriate semantics
Mediator modules as peers:
  Rule-based modules become the mediator components alongside the GLAV-based modules
  Interoperability of the modules is based on P2P and W3C RIF techniques.
Combination of integration and interoperability
  The information resource integration can be provided in the scope of an individual mediator module
  The integration approaches in different modules can be different.
Rule-based specifications on different levels of the infrastructure
  Declarative programming over the mediators
  Various modules of a mediator
  Schema mapping for semantic integration of the information resources in the mediator
  etc.
Example of problem solving in the multidialect mediation infrastructure
A problem of finding an optimal assignment of applicants among universities
  A set of n applicants is to be assigned among m universities, where q_i is the quota of the i-th university
  Applicants (universities) rank the universities (the applicants) in the order of their preference
  The aim is to find an optimal assignment from the quotas of the universities and the two sets of orderings
  An assignment is unstable if there are two applicants α and β who are assigned to universities A and B, respectively, although β prefers A to B and A prefers β to α; otherwise an assignment is stable
  A stable assignment is called optimal if every applicant is at least as well off under it as under any other stable assignment
The program calculating the assignment is defined in DLV (ASP)
The required information resources are integrated in a subject mediator
OntoBroker communicates with the users and, applying its ontologies, formulates the queries to the mediator and, after collecting the required data, initiates a program in DLV
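The deck computes the assignment with a DLV (ASP) program, which is not reproduced here. To make the problem statement concrete, the sketch below swaps in a different technique, the classic applicant-proposing deferred-acceptance procedure of Gale and Shapley, which is known to produce a stable assignment that is optimal for applicants in the sense defined above. Function names and sample data are hypothetical, and every university is assumed to rank every applicant.

```python
def deferred_acceptance(applicant_prefs, university_prefs, quotas):
    """Applicant-proposing deferred acceptance with quotas.
    Returns {university: sorted list of admitted applicants}."""
    rank = {u: {a: i for i, a in enumerate(prefs)}
            for u, prefs in university_prefs.items()}
    next_choice = {a: 0 for a in applicant_prefs}
    held = {u: [] for u in university_prefs}
    free = list(applicant_prefs)
    while free:
        a = free.pop()
        prefs = applicant_prefs[a]
        if next_choice[a] >= len(prefs):
            continue  # rejected everywhere: a stays unassigned
        u = prefs[next_choice[a]]
        next_choice[a] += 1
        held[u].append(a)
        held[u].sort(key=lambda x: rank[u][x])
        if len(held[u]) > quotas[u]:
            free.append(held[u].pop())  # reject the least preferred applicant
    return {u: sorted(members) for u, members in held.items()}

def is_stable(assignment, applicant_prefs, university_prefs, quotas):
    """True if no applicant and university would both prefer each other."""
    assigned_to = {a: u for u, members in assignment.items() for a in members}
    for a, prefs in applicant_prefs.items():
        for u in prefs:
            if u == assigned_to.get(a):
                break  # remaining universities are less preferred by a
            pos = university_prefs[u].index
            members = assignment[u]
            if len(members) < quotas[u] or any(pos(a) < pos(m) for m in members):
                return False
    return True

# Invented sample data: four applicants, two universities, three places.
applicant_prefs = {"a": ["X", "Y"], "b": ["X", "Y"],
                   "c": ["Y", "X"], "d": ["X", "Y"]}
university_prefs = {"X": ["b", "a", "c", "d"], "Y": ["a", "c", "d", "b"]}
quotas = {"X": 1, "Y": 2}
result = deferred_acceptance(applicant_prefs, university_prefs, quotas)
```

In the infrastructure described on the slide, the preference lists and quotas would come from the subject mediator and the computation would run in DLV; the Python version only shows what a correct answer looks like for a small instance.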
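Optimal assignment problem infrastructure
[Figure: a Multi-Layered Broker exchanging RIF-BLD (via XML) with three components through pairs of translators: DLV (ASP facilities) via BLD → DLV and DLV → BLD; Synthesis Mediation Environment via BLD → Synthesis and Synthesis → BLD; OntoBroker via OB → BLD and BLD → OB. Further labels: Resources, Ontologies. Arrows are labeled Req. 1, 4; Req. 2, 3; Resp. 1, 4; Resp. 2, 3.]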
Requests
  1. OB2DLV: GetProgram(Loc, Name [Params])
  2. OB2SYNTH: GetSchema(Loc, Name [Params])
  3. OB2SYNTH: SendExec(Loc, Name, Prog [Pars])
  4. OB2DLV: SendExec(Loc, Name, Prog [Pars])
Responses
  1. DLV2OB: DLV Program (without IDB)
  2. SYNTH2OB: Synthesis Schema
  3. SYNTH2OB: Result of OB program execution
  4. DLV2OB: Result of DLV program execution
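The four request/response pairs can be read as an orchestration driven by OntoBroker. The sketch below only mirrors the route and operation names from the slide; the `Broker` class, the stub endpoints, the payload shapes, and the assumption that the requests are issued in the order 1 to 4 are all hypothetical and do not describe the actual broker API.

```python
from dataclasses import dataclass, field

# Stub endpoints standing in for DLV and the Synthesis mediator (hypothetical).
def dlv_get_program(loc, name):
    return {"kind": "DLV Program (without IDB)", "name": name}

def synth_get_schema(loc, name):
    return {"kind": "Synthesis Schema", "name": name}

def synth_send_exec(loc, name, prog):
    return {"kind": "Result of OB program execution", "prog": prog}

def dlv_send_exec(loc, name, prog):
    return {"kind": "Result of DLV program execution", "program": prog["name"]}

ROUTES = {
    ("OB2DLV", "GetProgram"): dlv_get_program,
    ("OB2SYNTH", "GetSchema"): synth_get_schema,
    ("OB2SYNTH", "SendExec"): synth_send_exec,
    ("OB2DLV", "SendExec"): dlv_send_exec,
}

@dataclass
class Broker:
    """Dispatches a request to the endpoint registered for (route, operation)."""
    trace: list = field(default_factory=list)

    def send(self, route, op, *args):
        self.trace.append(f"{route}: {op}")
        return ROUTES[(route, op)](*args)

def solve(broker, loc="host", name="assignment"):
    program = broker.send("OB2DLV", "GetProgram", loc, name)       # request 1
    schema = broker.send("OB2SYNTH", "GetSchema", loc, name)       # request 2
    query = f"query over {schema['name']}"                         # assumed step
    data = broker.send("OB2SYNTH", "SendExec", loc, name, query)   # request 3
    program_with_facts = {**program, "facts": data}                # assumed step
    return broker.send("OB2DLV", "SendExec", loc, name, program_with_facts)  # 4
```

Keeping the routing table separate from the endpoints reflects the role of the broker on the slide: each component talks only to the broker, and the dialect translations (BLD to and from DLV, Synthesis and OntoBroker) would sit behind the entries of `ROUTES`.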
Issues to be investigated and prototyped
Approaches for constructing the rule-based dialect mappings
Methods for justification of semantic preservation by the mappings
Approaches for modular representation of knowledge in the multidialect mediation environment
Approaches for providing interoperability of the mediator multidialect modules
Infrastructure design and prototyping
Solving real problems in the chosen scientific subject domains
Expansion of the experience into the Semantic Web area
Directions of research and development
Mediation of data bases with nontraditional data models
Non-traditional data models
NoSQL data models oriented on the support of extra large volumes of data applying a "key-value" technology for vertical storage
  Dynamo, BigTable, HBase, Cassandra, MongoDB, CouchDB.
Graph data models
  Neo4j, InfiniteGraph, DEX, InfoGrid, HyperGraphDB, Trinity, supporting flexible data structures.
Triple-based data model (expressible in RDF, RDFS)
  Virtuoso, OWLIM, 5Store, Bigdata.
OWL QL profile oriented on a support of ontological modeling over relational databases and expressed by data dependencies used together with Datalog
"Scientific" data models
  SciDB applying a multidimensional array data model
  Prof. Pentland Connection science-oriented data models
For most of these data models, standards still do not exist
Most of these data models and systems are oriented on "big data" support applying massive parallel techniques of the MapReduce kind
Planned research results
Information preserving methods of mapping and transformations of various classes of non-traditional data models into the canonical one
Mappings and transformations for specific data models, and adequate extensions of the canonical data model
Techniques for interpretation of canonical model DML in the DMLs of different classes of non-traditional data models and approaches for their implementation
Architectural decisions on implementation of the massive parallel techniques on the level of mediators, evaluation of performance growth that can be reached
Evaluation of suitability and efficiency of integration of non-traditional data models of different classes in the GLAV mediation infrastructure for various problem domains
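One common reading of "information preserving" is that the mapping into the canonical model has an inverse that restores the original data exactly. The sketch below illustrates that reading with a round trip between a key-value (wide-column style) store and a flat set of (key, attribute, value) facts. The fact set is only a stand-in for the canonical model, and the marker attribute `ROW` and all other names are hypothetical.

```python
ROW = "__row__"  # hypothetical marker attribute recording that a row exists

def kv_to_facts(store):
    """Map {row_key: {column: value}} to a set of (key, column, value) facts.
    Assumes hashable values and no real column named ROW."""
    facts = set()
    for key, columns in store.items():
        facts.add((key, ROW, None))  # keeps rows that have no columns
        for column, value in columns.items():
            facts.add((key, column, value))
    return facts

def facts_to_kv(facts):
    """Inverse mapping: rebuild the key-value store from the facts."""
    store = {}
    for key, column, value in facts:
        row = store.setdefault(key, {})
        if column != ROW:
            row[column] = value
    return store

def is_information_preserving(store):
    """Round-trip check: mapping there and back restores the store."""
    return facts_to_kv(kv_to_facts(store)) == store

# Invented sample data, including a row without columns.
store = {"s1": {"ra": 10.5, "mag": 9.1}, "s2": {}}
```

Without the `ROW` fact the empty row `s2` would vanish in the round trip, which is a small example of why the target model may need an extension before a mapping into it can be information preserving.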
Directions of research and development
Storage of very large volumes of data [Zakharov]
Storage of very large volumes of data [Zakharov]
The objective is to develop a novel distributed parallel fault-tolerant file system possessing the following capabilities:
  storage of data volumes of petabyte scale
  unlimited period of storage
  scalability
  efficient multiuser access support in different kinds of networks
  usage of different storage types (e.g., HDD and flash memory)
The experience of existing file system vendors should be taken into account:
  ReFS (Windows Server 8) by Microsoft
  VMFS by VMware
  Lustre
  ZFS by Sun Microsystems
  zFS (z/OS) by IBM
  OneFS by Isilon
Directions of research and development
Cyber security issues [Budzko, Korolev]
Cyber security issues [Budzko, Korolev]
Information integrity and availability support for large-scale data gathering & mining
Technical architectures security analysis (network protocols, architectures, operating systems, DBMSs, etc.)
Vulnerability analysis
Development of threat models
Protection from insiders in personal information data centers
Self-assessment
Self-assessment (I)
Relevance
  Semantic integration of resources in the context of an application
  Mediation of knowledge
  Mediation of non-traditional databases
  Semantic Web and Big Data orientation.
Novelty
  An intellectual executable level for declarative conceptual level specification of the problems in terms of the application domain for problem solving over diverse resources
  Methods for information preserving data model mappings and for their implementation
  Schema mapping and query rewriting methods in presence of constraints reflecting specificity of diverse data models, etc.
Breadth of scope
  Relevant to a broad area of application domains, technologies and research issues.
Self-assessment (II)
Challengability
  Hard theoretical and implementation problems need to be overcome
Entrepreneurship possibilities
  Areas of possible application are very diverse
  To reach a proper commercialization level serious investments are required
Educational potential
  Very broad, various courses can be proposed for master students
  Many challenging research topics for PhD research
Coverage of a content of the proposed themes
Scientific DataSpace
  Large-scale federated data architecture
  Semantic integration of heterogeneous information
  Context mediation
  Semantic web
Architecture for semantic mediation and integration of heterogeneous resources
Infrastructures: semantic layer for grids and clouds, P2P heterogeneous knowledge-based mediator infrastructures
Data model transformation, data model unification, declarative canonical model extension and synthesis
Justification of correctness of data model transformation; sets of dependencies (constraints) extending the canonical model core should be decidable and tractable
Information resources: semantic description, canonical modeling, wrappers, registries, metadata
Problem domains: conceptual description, ontologies, metadata, multidomains, context mediation
Semantic based information resource discovery
Semantic schema mapping for data exchange and integration
Data Quality
  Recognizing and resolving heterogeneous data semantics
  Effective integration of data from multiple and disparate data sources
Semantic schema mapping
Justification of correctness of data model (schemas and operations) transformation
Dispersed implementation of problems in subject mediation environment
Big Data
  Data extraction and gathering from the web
  Federated data systems
  Parallel infrastructures for high-performance big data manipulation and analysis
  Large-scale and novel "big data" applications
  Novel approaches to development of large-scale data warehouses
Mediation infrastructure including Grids and clouds
Non-traditional data models integration in the canonical data model
Parallel infrastructures at the mediation level
Distributed parallel fault-tolerant file system
International Cyber Security
  Secure information architectures
  Techniques for assessment of threats and vulnerabilities
Cyber security issues