
Catalogue of Technologies

2019


Contents

ISP RAS: 25 years of development and growth
ISP RAS: the main events of 2019

TECHNOLOGIES
Anxiety dynamic analyzer
AstraVer verification toolset
BinSide: binary code static analysis tool
Constructivity 4D: technology of indexing, searching and analysis of large spatial-temporal data
DigiTEF: digital twin platform
ISP Fuzzer: testing tool
Klever: the technology for C programs model checking
Lingvodoc: virtual laboratory for documenting endangered languages
Masiw: support for designing highly reliable software systems
MicroTESK test program generator
ISP Obfuscator
Protosphere: network traffic analyzer
ISP RAS software analysis platform based on QEMU
Retrascope: static analysis of HDL descriptions
SciNoon: exploratory search system for scientific groups
Svace static analyzer
Talisman: framework for social media analysis
Texterra: semantic analyzer
Trawl: binary code analysis platform
Solutions for creating service-oriented data centers

25 years of development and growth

Arutyun Avetisyan

Doctor of Physics and Mathematics, academician of the RAS, ISP RAS Director.

In 2019, ISP RAS celebrates its anniversary: on January 25th, 1994, academician Victor Ivannikov founded the Institute for System Programming, which has become a leading center of excellence in this area in Russia. The ecosystem of ISP RAS is based on a scientific school that was created in 1960-1970 at the Institute of Precise Mechanics and Computer Engineering (IPMCE) under the guidance of academician Sergey Lebedev. The business model of ISP RAS is the “triangle of knowledge” combining education, research and innovation.

Although this model is universal, its successful implementation depends substantially on a number of factors, such as the current economic situation. During the first five years after the Institute was established, all efforts were aimed at preserving the scientific school and the ability to educate highly qualified scientists. Despite the various problems of the 1990s and the 80% replacement of ISP RAS staff due to the massive “brain drain” from the country, our business model proved viable. We continued to work in several directions, developing compiler technologies, operating systems and databases, and tried to maintain and increase the flow of students from Moscow State University (MSU) and the Moscow Institute of Physics and Technology (MIPT). From the very beginning, we have widely used open source software as a basis for long-term development. All this helped us to find foreign customers and to sign our first contracts with large industrial partners, for example, with Nortel Networks Corporation, the Canadian telecommunications and data networking equipment manufacturer. Together we implemented a number of research projects, in particular in the field of formal verification of programs. We developed a successful model of international cooperation, which allowed us to receive industry feedback and financial support for fundamental research.

In the early 2000s we entered a period of stabilization. During this time the ISP RAS business model demonstrated not only sustainability, but also the ability to ensure rapid development. Our reputation allowed us to attract new business partners: Intel, HP, Dell. We began to develop new scientific areas: software vulnerabilities analysis, binary code analysis, natural language processing (NLP), and social media analysis. By 2009, the number of ISP RAS employees exceeded 150 people, the average salary was 47,000 rubles (compared to 22,000 in 2003), and the contract rate with commercial customers reached 73%. The ISP RAS ecosystem moved to a new level: in 2008-2009 the Institute entered a period of growth.

During this decade ISP RAS demonstrated a successful transfer of knowledge and technology beyond the Institute. Joint research with large companies was complemented by the introduction of our technologies, while the intellectual property rights remained with ISP RAS. In 2009 we started a long-term partnership and organized a joint laboratory with Samsung. One of the main results of our collaboration was the Svace analyzer developed at ISP RAS. It is now the main static analysis tool at Samsung and is used for finding defects in source code. At the same time the number of our Russian partners began to grow. We started long-term cooperation with companies such as VimpelCom, RusBITech and others.

We also began to organize a network of regional laboratories for system programming: in Yerevan (2008), Great Novgorod (2009), and Orel (2019). The idea is that such a distributed center of excellence makes it possible to share resources and knowledge and to solve large-scale tasks in the field of software development.

In 2015 ISP RAS opened a chair of system programming at the Higher School of Economics, which became the third alongside similar chairs at MSU and MIPT. The number of students has increased. In 2016 the Institute launched the tradition of annual Open Conferences with hundreds of participants. We have significantly expanded the list of our international partners. In particular, in 2019 we organized a joint laboratory with Huawei.

In 2019 our annual income exceeds 800,000,000 rubles, which is three times more than 10 years ago. Currently the share of contracts with commercial customers exceeds 88% (roughly half of them with Russian companies and half with foreign ones). The number of employees is constantly growing and is now about 300.

ISP RAS has intensified research in such areas as artificial intelligence and big data analysis. Cybersecurity remains one of the main directions of our work. We are currently transforming our system of technologies into a system of platforms aimed at ensuring the country’s technological independence: a secure software development lifecycle platform (includes the Svace and BinSide static analysis tools, the ISP Fuzzer and Anxiety dynamic analysis tools, etc.) and a text and social media analysis platform (includes the Talisman framework, the Texterra platform and the SciNoon system).

Our technologies are included in the Unified Register of Russian Programs and are deployed in large global companies (Samsung, Huawei), as well as in the Russian market (RusBITech, GosNIIAS, etc.). At the same time we try to diversify our risks, and we never rely on a single commercial customer.


In the field of cybersecurity, ISP RAS collaborates with FSTEC of Russia on the creation of specialized standards and methods. We also implement joint projects with educational and research centers: Israel Institute of Technology, ITRI (Taiwan), Belgrade University, etc. Our future plans include the development of interdisciplinary research (this year we began to work in the field of digital medicine), an increase in the number of students, and further platform development.

Over the past 25 years the Institute has successfully created an ecosystem that makes it possible to educate highly qualified specialists and generate innovations in the field of system programming. Our business model has worked successfully in different economic conditions, and our technological background, human resources, reputation and integration with industry allow us to look to the future with confidence.

This Catalogue provides some information on the main achievements of ISP RAS in 2019 and a detailed description of our innovative technologies.


The main events of 2019

In 2019 we began to transform our system of technologies into a system of platforms that can be customized according to various requirements. This decision was made in order to unite interacting technologies into specialized stacks that can provide efficient, productive and safe software operation.

This year we signed a number of cooperation agreements with various companies and organizations, including the Taiwan Industrial Technology Research Institute (ITRI), the University of Belgrade, and the National Research Nuclear University MEPhI, whose students will have the opportunity to do internships and write their graduation theses at ISP RAS. Also in 2019 the Institute created a joint laboratory with Huawei for research and development in various fields, including compiler technologies and operating system components.

One of the most important activities in 2019 was the development and implementation of an additional professional education program together with FSTEC of Russia. The two-week training courses, conducted by the Institute’s experts for 3 groups from May to October, covered all the technologies mentioned in the new FSTEC regulation norms for finding software vulnerabilities. This year we trained 56 specialists who are employees of certification bodies and testing laboratories accredited by FSTEC of Russia.

This year the Institute was chosen as the operator of two interconnected digital platforms created at the initiative of the Ministry of Science and Higher Education. The first is the Unified digital platform for scientific and scientific-technical interaction, organization and joint research performed remotely (including with foreign scientists). It will ensure the effective interaction of researchers and the work of virtual laboratories with access to a variety of services. The second is the Digital management system for collaborative research services, including unique scientific facilities and joint use centers. One of its main functions will be to provide on-demand computing resources, cloud storage and application packages.

ISP RAS also won a competition held by the Russian Foundation for Basic Research together with the Ministry of Science and Technology of Israel, and will now be involved in the implementation of a medical project with the Israel Institute of Technology (Technion). The project is focused on the development of new methods for the automatic recognition of electrocardiograms from a 12-channel electrocardiograph. ISP RAS specialists will develop and maintain a mobile application and a cloud service.

In 2019 the Institute organized two conferences with the support of IEEE (ISP RAS Open Conference and Ivannikov Memorial Workshop), held a round table discussion “System programming as a key direction of fighting cyber threats” at the International military-technical forum “Army-2019”, and participated in the organization of the OS DAY developer conference, as well as in a number of other events. ISP RAS employees held the contest “Best Open Source Diploma” in the final round of the XII International Olympiad “IT-planet”. In addition, ISP RAS together with MEPhI and a number of foreign research centers became the organizer of the international conference “Intelligent Technologies in Robotics” (ITR-2019).

Another important activity this year was the expansion of the Yerevan and Novgorod laboratories of system programming, opened by ISP RAS in 2008 and 2009 respectively, and the foundation of a similar lab in Orel. The labs started to work on big data analysis (in addition to R&D work in the field of program analysis). ISP RAS also signed an agreement with Novgorod State University concerning the expansion of scientific and educational activities in connection with the launch of the master’s program “Big Data Information Technologies.”

Also this year, together with the Yerevan lab’s scientists, we launched the transformation of higher education in the largest universities of Armenia. The courses at the Russian-Armenian University and at the Yerevan branch of MSU are now being synchronized with the courses taught at the Department of System Programming at the Faculty of Computational Mathematics and Cybernetics (MSU).

In 2019 ISP RAS received 5 software registration certificates and 1 invention patent (a method for verifying a formal automaton model of a software system behavior). Six programs were included in the Unified Register of Russian Programs:

— DigiTEF (No. 5377),
— Trawl (No. 5323),
— Talisman.Biography (No. 5547),
— Talisman.Flow (No. 6045),
— Asperitas (No. 5921),
— Fanlight (No. 6066).

On November 15th, 2019 Director of ISP RAS Arutyun Avetisyan became the first academician of the RAS in applied mathematics, computer science and cybersecurity (Department of Mathematical Sciences of the RAS). Two employees of the Institute won the 2019-2021 competition and will receive the President of Russia scholarship for young scientists and graduate students working in the area of “Strategic Information Technologies, including the creation of supercomputers and software development.” One of the employees received the Moscow Government Award for young scientists.


Technologies

ANXIETY DYNAMIC ANALYZER

Features and advantages

Anxiety is a framework for finding errors and potential vulnerabilities during software development, QA, and deployment phases. It is based on dynamic symbolic execution, which allows generating input data for the issues found without source code or debugging information present in the binary. Anxiety can be used for adhering to the GOST R 56939-2016 requirements.

Anxiety’s special feature is the combined approach to dynamic analysis, which involves the integration with static analyzers and fuzzing tools. The successful combination of technologies allows Anxiety to solve the same tasks as the leading global competitors (CA Veracode Dynamic Analysis, Synopsys Dynamic Application Security Testing and Rogue Wave CodeDynamics).

Anxiety provides:

— Easy development of analysis tools (checkers) based on the dynamic symbolic execution approach that are designed for finding certain types of errors;

— High analysis performance due to distributed and parallel analysis support, integration with a fuzzer, and support for filters on the input data stream and the analyzed functions;

— Integration with static analyzers of source or binary code for the implementation of directed analysis, which allows selective testing of the target program components. This analysis mode verifies the defects previously found by static analysis (in particular, division by zero, null pointer dereference, infinite loops, user assert violations, etc.);

— Integration with randomized testing tools (fuzzers) to increase the analysis performance. The integration solves the well-known fuzzing problem of passing a conditional that depends on comparison with a constant. Employing fuzzers allows covering the program code with input data sets much faster than when using dynamic symbolic execution alone;

— Modular infrastructure (tracer, checker and input data generator) that makes it possible to swap individual system components freely and to extend the system's functionality;

— Support of various sources for input data (such as files, network sockets, environment variables, standard input stream);

— Solving various analysis tasks that are convenient for a dynamic symbolic execution tool, such as the reachability problem for a given function or instruction;

— Can be used for adhering to the GOST R 56939-2016 requirements (when certifying software within Russia), as well as the new FSTEC regulation norms for finding software vulnerabilities.
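The constant-comparison problem mentioned in the list above can be illustrated with a small sketch. This is not Anxiety's implementation: the `target` function, the recorded trace format and the byte-patching `solve_from_trace` helper are hypothetical stand-ins for a binary tracer and an SMT solver, chosen so that the example runs with the Python standard library alone.

```python
import random
import struct

MAGIC = 0xDEADBEEF  # hypothetical constant guarding the interesting branch


def target(data: bytes, trace=None) -> bool:
    """Toy program under test: the guarded branch is taken only on MAGIC."""
    if len(data) < 4:
        return False
    value = struct.unpack_from("<I", data, 0)[0]
    if trace is not None:
        # Record the path constraint: bytes [0, 4) are compared with MAGIC.
        trace.append((0, 4, MAGIC))
    return value == MAGIC


def random_fuzz(tries: int, seed: int = 0) -> bool:
    """Blind random fuzzing: each try passes the check with probability 2**-32."""
    rng = random.Random(seed)
    for _ in range(tries):
        data = bytes(rng.getrandbits(8) for _ in range(8))
        if target(data):
            return True
    return False


def solve_from_trace(seed_input: bytes) -> bytes:
    """Run the target once, then rewrite the compared bytes to satisfy the constraint."""
    trace = []
    target(seed_input, trace)
    patched = bytearray(seed_input)
    for offset, size, constant in trace:
        patched[offset:offset + size] = constant.to_bytes(size, "little")
    return bytes(patched)


if __name__ == "__main__":
    print("solver-derived input passes:", target(solve_from_trace(b"\x00" * 8)))
```

A real dynamic symbolic execution engine would hand the recorded constraint to a solver such as Z3 or STP instead of patching bytes directly, but the division of labor is the same: the fuzzer covers the cheap paths quickly, and the constraint-based step produces the one input that random mutation is very unlikely to find.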


Who is the Anxiety target audience?

— Companies aimed at software development with a special focus on high reliability and security.
— Companies responsible for software audit or certification.

Anxiety deployment stories

Anxiety is used for testing packages included in the Astra Linux OS.

Supported environments and tools

Anxiety supports Windows (XP and later) and Debian Linux, as well as various SMT solvers (STP, Z3, MathSAT, etc.). It is based on the DynamoRIO dynamic instrumentation environment (instructions are processed by the Triton framework to support Windows) and the Valgrind dynamic binary instrumentation framework, which is extended with custom plugins for gathering execution traces and calculating basic block coverage.

[Figure: Anxiety workflow. Diagram labels: input data; tracer (Valgrind, DynamoRIO, PIN, Dyninst; marked "Linux only" or "Windows and Linux"); combined trace; trace splitter; dynamic symbolic execution; path constraint; new path constraint; dangerous operations; solvers (STP, Z3, CVC4, MathSAT; CVC and SMT-LIB2 formats); input data generator; new input data; input data and metric; coverage analysis; defects and input data.]

[Figure: Fuzzing. Diagram labels: binary code; fuzzer; input data; input data and branching; DSE; input data and addresses.]


ASTRAVER VERIFICATION TOOLSET


AstraVer Toolset is a deductive verification system for key software components. It allows developing and verifying security policy models as well as proving the correctness of software modules written in the C programming language. AstraVer is essential for ensuring the required trust levels from the ADV_SPM and ADV_FSP assurance families as defined in the ISO/IEC 15408 standard.

AstraVer Toolset is a set of tools designed for industrial use. It is based on many years of scientific research and combines two verification approaches: at the model level and at the code level. Parts of the AstraVer Toolset are similar to Microsoft VCC and Frama-C WP, but unlike those AstraVer is specifically designed to support the verification of key security components in the Linux kernel. AstraVer Toolset is free and open source: http://linuxtesting.org/astraver.

AstraVer provides:

— An integrated approach to verification, supporting the formalization of high-level requirements and analyzing the C source code behavior;

— Modeling and formalizing functional requirements, proving internal consistency and unreachability of insecure states;

— Verification of critical components written in C (formalization of requirements, correctness proof on all possible input values);

— Support for real industrial C code (GCC compiler extensions, arithmetic operations with bitwise precision, address arithmetic including container_of intrinsic, function pointers, casting);

— Adhering to the protection profile requirements (ISO/IEC 15408):
  — formal security policy modeling;
  — formal verification of internal consistency of a security policy model;
  — formal proof that the target system cannot reach an insecure state;
  — development of a formal or a semi-formal functional specification;
  — formal/semi-formal proof of correspondence between the security policy model and the functional specification;
  — formal/semi-formal proof of correspondence between different representations of target software like functional specification, design and source code.

— Ability to adjust the toolset for a specific customer to perform the verification of the C source code components.
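The "unreachability of insecure states" property from the list above can be illustrated on a toy model. AstraVer establishes such properties by deductive proof over a formal model; the sketch below instead enumerates every state of a deliberately tiny, hypothetical access-control model (the subjects, objects, levels and rules are all invented for illustration), which is only feasible because the model is finite and small.

```python
from collections import deque
from itertools import product

LEVELS = {"alice": 2, "bob": 1}      # hypothetical subject clearances
LABELS = {"report": 2, "memo": 1}    # hypothetical object labels
MODES = ("read", "write")


def allowed(subject, obj, mode):
    """Policy rule of the toy model: no read up, no write down."""
    if mode == "read":
        return LEVELS[subject] >= LABELS[obj]
    return LEVELS[subject] <= LABELS[obj]


def insecure(state):
    """A state is insecure if some subject reads an object above its level."""
    return any(mode == "read" and LEVELS[s] < LABELS[o] for s, o, mode in state)


def successors(state):
    """Revoke any access currently held, or grant any access the policy allows."""
    for access in product(LEVELS, LABELS, MODES):
        if access in state:
            yield state - {access}
        elif allowed(*access):
            yield state | {access}


def reachable_states(initial=frozenset()):
    """Breadth-first exploration of all states reachable from the initial one."""
    seen, queue = {initial}, deque([initial])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    states = reachable_states()
    print(len(states), "reachable states, insecure:",
          any(insecure(s) for s in states))
```

Exhaustive enumeration does not scale to realistic models, which is one reason a deductive approach is used in practice: a proof covers all states at once, including unbounded ones.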

Who is the AstraVer target audience?

— Companies developing critical systems, including software in the aviation, railway, medical and nuclear power industries;
— Companies that need certification of their software as guided by the ISO/IEC 15408 standard;
— Certification laboratories.

AstraVer deployment stories

AstraVer Toolset was used in the development of access control mechanisms for Astra Linux Special Edition (RPA RusBITech JSC). As a result, this Astra Linux edition has passed the certification for compliance with the information security requirements of FSTEC, which are set for operating systems defined in the 2A protection profile. Both the security policy model and the source code of the access control mechanisms were successfully verified using AstraVer Toolset. Verification work for new security model features is ongoing.

[Figure: AstraVer workflow. Diagram labels: manual development; automatic verification; deductive verification of security models; deductive verification of operating system components; functional security requirements; security policy model; formal functional specification; LSM-level requirements; LSM design; formal design of LSM; pre- and postconditions of LSM operations; specification of library functions; Linux kernel API; Linux Security Module; custom LSM; Linux kernel.]


BINSIDE: A BINARY CODE STATIC ANALYSIS TOOL


BinSide is a static program analysis tool for finding defects in binary code. It is useful for checking programs without source code, such as closed source 3rd party libraries, and for supplying dynamic analysis tools with the static information they require.

BinSide is a binary code analysis tool based on the BinNavi framework, which translates assembler code into a REIL representation. REIL allows analyzing binary code in a way that is independent of the target processor and OS. BinSide is integrated with the IDA Pro interactive disassembler, a widely used tool for reverse engineering.

BinSide provides:

— Easy extension:
  — individual error detectors are written as plugins that can be quickly added and changed;
  — the REIL representation of 17 instructions without side effects is used (each assembly instruction is translated into a set of REIL instructions);

— Plugins for the most critical error types (including format string vulnerabilities and pointer handling errors);

— Finding two buffer overflow types (a vulnerability is detected when the user controls the input buffer):
  — an overflow that happens when copying information from a larger buffer to a smaller one (for example, using dangerous functions such as strcpy, memcpy, etc.);
  — an overflow that happens when copying one buffer to another without checking the boundaries of the first buffer (for example, copying until the end of line character);

— Finding heap memory management errors including use after free and double free error types;

— A powerful flexible engine with the following main analysis types:
  — value and pointer analysis, tracking tainted data, static and dynamic memory models, as well as data flow and control graph analysis;
  — finding errors on all execution paths (including those not covered by testing or dynamic analysis);
  — marking functions in IDA Pro as having tainted data sources, or functions with inaccurate memory manipulation;
  — binaries are imported from IDA Pro, which allows analyzing non-standard or obfuscated binary files;

— A plugin for program patch static analysis;
— High processing speed (BinSide takes about 2000 seconds to analyze a binary file with more than 3000 functions);
— Converting results to the Svace analyzer format (in the presence of debug info) for displaying errors in a common web interface and providing navigation in source files.

Who is the BinSide target audience?

— Companies that need to check thoroughly the 3rd-party software they use with no access to its source code;
— Developers who need to increase dynamic analysis quality with the data collected by a static analysis.

System requirements

BinSide supports analysis of executable files and libraries for the x86, x64, ARM, PowerPC and MIPS architectures.
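The tainted-data tracking named among BinSide's analysis types can be sketched on a toy intermediate representation. The instruction tuples below are hypothetical and only loosely inspired by three-address forms such as REIL; they are not actual REIL syntax, and the straight-line propagation shown here ignores branches, memory and calls into unknown code, all of which a real engine has to model.

```python
def propagate_taint(instructions, tainted_sources, sinks):
    """Report calls to dangerous functions whose argument derives from tainted data.

    Each instruction is a hypothetical (op, src1, src2, dst) tuple.
    For "call", src1 is the callee name and src2 is the argument register.
    """
    tainted = set(tainted_sources)
    findings = []
    for index, (op, src1, src2, dst) in enumerate(instructions):
        if op == "call":
            if src1 in sinks and src2 in tainted:
                findings.append((index, src1, src2))
            continue
        if src1 in tainted or src2 in tainted:
            tainted.add(dst)        # result depends on attacker-controlled data
        else:
            tainted.discard(dst)    # register overwritten with clean data
    return findings


PROGRAM = [
    ("str", "user_input", None, "t0"),  # t0 := user_input
    ("add", "t0", "4", "t1"),           # t1 := t0 + 4
    ("str", "16", None, "t2"),          # t2 := 16 (a clean constant)
    ("call", "strcpy", "t1", None),     # argument derives from user input
    ("call", "strcpy", "t2", None),     # argument is clean
]

if __name__ == "__main__":
    for index, callee, register in propagate_taint(
            PROGRAM, {"user_input"}, {"strcpy"}):
        print(f"instruction {index}: tainted {register} reaches {callee}")
```

Because every instruction in a side-effect-free representation names its inputs and its single output explicitly, a propagation rule of this kind can be written once and reused across all supported processor architectures.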

[Figure: BinSide workflow. Diagram labels: binary code; IDA; assembler code; BinExport; PostgreSQL database; REIL; analysis; plugins; specification; potential defect.]



CONSTRUCTIVITY 4D: A TECHNOLOGY OF INDEXING, SEARCHING AND ANALYSIS OF LARGE SPATIAL-TEMPORAL DATA

Constructivity 4D is a technology for creating innovative software services that are capable of processing highly dynamic scenes and vast arrays of spatial and temporal data. It performs visual analysis of millions of objects with individual geometry and dynamic behavior. Constructivity is deployed within the Synchro system that is used for 4D modeling of extremely large construction sites.

Constructivity 4D is a production-level technology that brings together original methods of spatio-temporal indexing, search, and qualitative and quantitative data analysis. The developed methods account for the specifics of the objects’ geometric representation, their complex organization and the a priori known nature of their dynamic changes.

Constructivity 4D provides:

— Support for a well-developed set of operations:
  — Temporal operations implement the classical interval algebra introduced by Allen with respect to time stamps of discrete events and their intervals;
  — Metric operations allow determining the individual properties of geometric objects and the characteristics of their mutual arrangement. Diameter, area, volume, center of mass, planar projections, and distances between objects can be calculated for solid geometric objects;
  — Topological operations are intended to classify the relative location of objects and establish the facts of their coincidence, intersection, coverage, touch, overlap or collision. In comparison with the known topological models DE-9IM, RCC-8 and RCC-3D, the operations allow constructive implementation and are applicable to the analysis of complex objects;
  — Orientational operations generalize the known Frank’s and Freksa’s relative orientation calculi, cardinal direction calculi (CDC) and oriented point relation algebra (OPRA), and are applicable to the analysis of objects with extended boundaries.

— Efficient query execution and solutions to typical problems: in particular, queries for reconstructing a scene at a given point in time, retrieving objects in a given spatial region, finding nearest neighbors, determining static and dynamic collisions, and conflict-free routing in a global dynamic environment are resolved efficiently;

— A spatial-temporal indexing system including binary event trees, spatial decomposition trees, bounding volume trees, object cluster trees, space occupation trees;

— A hybrid computational strategy for determining collisions in scenes that combines methods for precise collision determination, collision localization methods using spatial decomposition, methods of hierarchies of bounding volumes, and temporal coherence methods;

— An object-oriented library implemented in C++ that includes an extensible set of classes, interfaces and related methods for specifying spatial-temporal data and executing typical queries;

— An original method for navigation in a global dynamic environment, based on extracting spatial, metric and topological information from the geometric representation of 3D scenes and using it in concert for path planning;

— Various options for extending the library so that it can be used both in the development of new software applications and in legacy applications.
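The temporal operations mentioned in the list above rest on Allen's interval algebra, which classifies any two intervals into exactly one of 13 relations. The sketch below is a minimal standalone classifier, not part of the Constructivity 4D library (which is written in C++); the function name and the tuple representation of intervals are assumptions made for illustration.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Intervals are (start, end) tuples with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    # Remaining cases: a starts after b starts and ends after b ends.
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    return "overlapped-by"


if __name__ == "__main__":
    excavation = (1, 5)    # hypothetical construction tasks, in project days
    foundation = (3, 8)
    print(allen_relation(excavation, foundation))
```

In a 4D scheduling context, a relation such as "overlaps" between two tasks that occupy the same spatial region is the kind of fact that a combined temporal and topological query would flag as a potential conflict.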

Who is the Constructivity 4D target audience?

The technology is used for creating application systems in vastly different fields, including but not limited to: computer graphics and animation, geoinformatics, scientific visualization, design and manufacturing automation, robotics, logistics, project management and scheduling.

Constructivity 4D deployment stories

The technology has been successfully deployed within the Synchro software system (https://www.synchroltd.com), which is designed for visual 4D modeling, planning and management of large-scale industrial projects in construction, infrastructure and other areas. Synchro is used in more than 300 companies in 36 countries.


DIGITEF: A DIGITAL TWIN PLATFORM


DigiTEF is a software platform based on OpenFOAM and other open source tools, as well as unique modules and libraries developed at ISP RAS. DigiTEF solves various applied problems of gas dynamics, aerodynamics, hydrodynamics, and acoustics. It is tailored for creating and working with highly sophisticated digital models of industrial devices. DigiTEF is included in the Unified Register of Russian Programs (No. 5377).

The platform delivers the same level of user experience as its competitors worldwide. Performance and accuracy evaluations of the DigiTEF core against ANSYS Fluent and Star CCM+ showed similar (and in some cases lower) computational costs with the same accuracy.

A community of engineers, researchers, and industrial project developers has formed around the DigiTEF platform.

DigiTEF means:

— open source code (which makes it possible to inspect and adapt the implemented algorithms);
— the same development pace as OpenFOAM+;
— automation tools for computation and model integration that allow integrated research of technical objects;
— the possibility of developing additional components according to specific requirements.

DigiTEF consists of two main blocks:

1. OpenDTE, the platform core based on OpenFOAM. It contains the basic algorithms, procedures, and functions, as well as a set of third-party libraries in C++. It is completely open and can be obtained at https://github.com/unicfdlab. OpenDTEF consists of the following components:

— tools for modeling compressible flows;
— settings setup for advanced cases based on swak4Foam;
— parameterization based on Python. This allows automating calculation cases as well as integrating the Salome, ParaView, and Code_Aster software systems into DigiTEF.

2. Modules developed at ISP RAS:

— Data analysis for visualizing and retrieving information. It is designed to analyze the results and build models of reduced dimension using data processing methods (FFT, POD, DMD, Hilbert transform);
— Compressible flows simulation based on quasi-gas dynamics (QGD) equations, which makes it possible to use the spatio-temporal averaging procedure to determine the main gas-dynamic quantities (density, velocity, temperature, and others);
— Incompressible flows simulation based on QGD equations. The module is applicable to problems in oceanology, convection, and subsonic flows;
— Incompressible and compressible flows simulation based on the hybrid PIMPLE and Kurganov-Tadmor algorithm;
— Subsonic turbulent flows simulation using the hybrid URANS / LES approach and low dissipative numerical schemes;
— Acoustic analysis. The module implements the Curle and Ffowcs Williams-Hawkings analogies.

Who are DigiTEF users?

DigiTEF is designed for use at facilities in resource-intensive industries. Using digital twin models increases engineering efficiency and reduces the cost and complexity of implementing industrial projects.

Deployment stories

DigiTEF is used in several projects in the fields of wind energy, aerospace, aviation, metallurgy, as well as in the oil and gas industry.

System requirements

Linux OS. Other operating systems that support the Oracle VirtualBox virtual machine may also be used (on Microsoft Windows 10, via the Bash shell). The performance loss due to virtualization does not exceed 5%.

Required RAM: 16 GB or higher.

DigiTEF supports parallel computing, which significantly speeds up its work. It also supports the use of high-performance computing systems (supercomputers and clusters) to accelerate the calculations. The maximum tested number of cores is 1536.

Workflow

[Figure: DigiTEF (Digital Test Facility) workflow, shown as a platform for integration and as a modeling tool. Diagram labels: input data; setting of a standard task; setting of a unique problem; unique module; digital model; specialized digital model; SALOME; BEM++; AMReX; Code_Aster; PANS/LES; PyFoam; Swak4Foam; HCS module; CAA module; QGD module.]


ISP FUZZER: A TESTING TOOL


ISP Fuzzer is a tool for performing dynamic program analysis based on the fuzzing approach. It can detect errors, backdoors, and vulnerabilities either with or without access to the program’s source code. ISP Fuzzer allows organizing a development process that adheres to the GOST R 56939-2016 requirements.

ISP Fuzzer is a dynamic analysis tool that is essential at the software development, testing, and deployment phases. It has the same level of user experience as its global competitors (Synopsys Codenomicon, beSTORM, Peach Fuzzer), but it is more convenient for Russian companies in the context of import substitution.

ISP Fuzzer provides:

— Fuzz testing through various input data sources (files, command line arguments, standard input stream, environment variables, network sockets);

— The ability to add custom mutational transformations (for generating new input data and increasing fuzz testing efficiency);

— Pre- and post-processing modules that transform input data before it is submitted to the analyzed software;

— Parallel analysis support both for a single machine and for distributed hardware;

— Support of custom plugins for sending data over the network (plugins make it possible to interact with client or server software and to send mutated data);

— Integration with the security development lifecycle tools developed at ISP RAS:
    — using the Anxiety dynamic analyzer to pass conditional branches that fuzz testing did not cover;
    — automatically using input data on which the BinSide static analysis tool has detected an error;
    — using an error trace generated by the Svace static analysis tool;

— Integration with the IDA Pro disassembler:
    — coverage export for the Lighthouse plugin, which displays covered basic blocks in the code;
    — display of the percentage of covered basic blocks;

— Analysis of client and server software that works with stateless or stateful protocols;

— Easy extension with new algorithms within the existing infrastructure; quick adaptation to new tasks;

— Support for meeting the GOST R 56939-2016 requirements (when certifying software within Russia), as well as the new FSTEC regulations on finding software vulnerabilities.
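As a rough illustration of the coverage-guided mutation loop that a fuzzer of this kind automates, the sketch below mutates inputs, keeps the ones that reach new coverage, and stores one input per distinct crash. It is not ISP Fuzzer's API: `flip_byte`, `insert_byte`, `fuzz` and the toy `target` are hypothetical names, and coverage is simulated by the target returning a set of branch labels instead of being measured by instrumentation.

```python
import random

def flip_byte(data: bytes, rng: random.Random) -> bytes:
    """Hypothetical mutation: overwrite one byte with a random value."""
    if not data:
        return bytes([rng.randrange(256)])
    i = rng.randrange(len(data))
    return data[:i] + bytes([rng.randrange(256)]) + data[i + 1:]

def insert_byte(data: bytes, rng: random.Random) -> bytes:
    """Hypothetical mutation: insert one random byte at a random position."""
    i = rng.randrange(len(data) + 1)
    return data[:i] + bytes([rng.randrange(256)]) + data[i:]

def target(data: bytes) -> set:
    """Toy program under test: returns covered branch labels, raises on a bug."""
    covered = {"entry"}
    if data[:1] == b"F":
        covered.add("magic1")
        if data[1:2] == b"Z":
            covered.add("magic2")
            raise ValueError("simulated crash")
    return covered

def fuzz(seeds, mutations, iterations, rng):
    """Mutate corpus entries; keep inputs that add coverage, record unique crashes."""
    corpus = list(seeds)
    seen_coverage = set()
    crashes = {}  # crash signature -> first input that triggered it
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = rng.choice(mutations)(parent, rng)
        try:
            covered = target(child)
        except Exception as exc:
            crashes.setdefault(f"{type(exc).__name__}: {exc}", child)
            continue
        if not covered <= seen_coverage:  # the child reached something new
            seen_coverage |= covered
            corpus.append(child)
    return corpus, crashes
```

A call such as `fuzz([b"AA"], [flip_byte, insert_byte], 2000, random.Random(1))` returns the grown corpus and the crash table. Passing the mutation functions as a list mirrors the idea of pluggable custom mutations: adding a new transformation means appending one more function.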

System requirements

Support for Linux and Windows OSes. ISP Fuzzer is able to perform fuzz testing of embedded devices (controllers, IoT devices) and also of Windows services and COM objects.

Who is the ISP Fuzzer target audience?

Companies aimed at software development with a special focus on high reliability and security.

ISP Fuzzer deployment stories

ISP Fuzzer is deployed in the RusBITech, Security Code, and Swemel companies. It is also used in a joint project with the Industrial Technology Research Institute of Taiwan (ITRI) to create a custom specialized fuzzer.

Fuzzer workflow

[Figure: Fuzzer workflow on Windows & Linux. Labels: Initial input; Input generator; Mutation plugins; Peach Pits; Input; Instrumentator; DynamoRIO; Coverage measurer; New input considering coverage; Crash; Classificator; Unique crashes filtration; Anxiety.]


THE TECHNOLOGY FOR C PROGRAMS MODEL CHECKING


Klever is a framework for checking models extracted from the source code of large software systems developed in the C programming language. Klever provides means for automatic checking of a variety of security, robustness, and performance requirements.

Klever is the result of scientific research and development in the field of automatic checking of models extracted non-interactively from source code. The framework implements a modular approach to the verification of software systems with hundreds of thousands or millions of lines of C code. Klever is an open-source project (forge.ispras.ru/projects/klever).

Klever provides:

— High-precision sound analysis of production software (revealing all possible violations of specified requirements and proving the correctness of a program under explicitly stated assumptions);

— Checking an extendable suite of requirements (memory safety and correct usage of specific APIs);

— Scalability (modular verification of a program makes it possible to apply the most rigorous methods of program analysis, such as model checking and symbolic execution);

— Comprehensive representation of found faults (in addition to the fault location, the framework shows an action trace required to reproduce the fault in a convenient web interface; it is also possible to automatically generate input data from such a trace);

— Adaptation of the technology to the customer’s needs, including adding new kinds of detected faults, fast development of specifications for the given program’s requirements, and, in some cases, environment model specifications and plug-ins;

— A convenient multi-user web interface for setting up and running verification processes and for expert analysis of verification results.

Who is the Klever target audience?

— Companies that develop safety and security-critical software.
— Certification centers.

Klever deployment stories

— The Klever technology was developed by the Linux Verification Center (http://linuxtesting.org), supported by the Linux Foundation and hosted by the Ivannikov Institute for System Programming of the Russian Academy of Sciences. Today Klever is used for verification of various operating systems.

— Klever was used for verification of Linux device drivers and kernel subsystems to demonstrate its capabilities. The number of faults found by Klever and confirmed by Linux kernel developers exceeded 350. The faults include buffer overruns, null pointer dereferences, use of uninitialized memory, repeated or incorrect memory deallocation, race conditions and deadlocks, leaks of specific Linux kernel resources, incorrect function calls depending on the context, and incorrect initialization of specific Linux kernel data structures.

System Requirements

Ubuntu 18.04, at least 4 CPU cores, 16 GB of memory, 100 GB of disk space.

Workflow

— Adaptation of the verification system to the target software system;
— Configuration and start of verification processes;
— Automatic verification;
— Expert analysis of verification results.


A VIRTUAL LABORATORY FOR DOCUMENTING ENDANGERED LANGUAGES

Features and Advantages

Lingvodoc is a system for collaborative multi-user documentation of endangered languages, creation of multi-layered dictionaries, and scientific work with the collected sound and text data. It is the result of a joint project with the Institute of Linguistics of the Russian Academy of Sciences and Tomsk State University. Lingvodoc has been under active development since 2012 and is available at lingvodoc.ispras.ru.

Lingvodoc is an open source cross-platform system based on innovative research (github.com/ispras/lingvodoc, github.com/ispras/lingvodoc-react).

Lingvodoc provides:

— Collaborative work on dictionaries (as opposed to the similar Starling project);

— Saving the full history of user actions;

— Working with audio-textual corpora and dictionaries simultaneously, based on integration with the ELAN system developed by the Max Planck Institute for Psycholinguistics (Netherlands);

— Creating and editing unidirectional and bidirectional connections between lexical entries within dictionaries as well as external connections between dictionaries;

— Recording, playing and storing sounds with markup (in WAV, MP3 and FLAC formats), as well as plotting vowel formants with subsequent data visualization;

— Advanced search that supports multiple parameters (as opposed to the similar TypeCraft project);

— Ability to search data on a map with automatic construction of isoglosses;

— Conflict-free bilateral delayed synchronization;

— Increased automation (compared to the similar Kielipankki project);

— Creation of dictionaries of any structure: typical two-layer dictionaries with a lexical entry layer and a paradigm layer, or multi-layer dictionaries. Importing dictionary structures is also supported;

— Using either ISP RAS cloud infrastructure resources (the system backend is now optimized for working with the VMEmperor architecture) or local resources with data isolation;

— Desktop and web-based versions;

— Open registration (confirmation required);

— Fast development for extending the system features, as well as easy adaptation to another scientific field.

Who is the Lingvodoc target audience?

Lingvodoc is designed primarily for language experts doing research on documenting endangered languages. It is, however, possible to adapt the technology for other purposes.

Lingvodoc deployment stories

Lingvodoc is currently used in collaborative projects with the Institute of Linguistics of the Russian Academy of Sciences and Tomsk State University.

Lingvodoc workflow

[Figure: Lingvodoc workflow. Labels: Language expert; Browser; Lingvodoc frontend web interface (javascript: react, apollo, redux, semantic UI, wavesurfer, leaflet); GraphQL HTTP Protocol; Lingvodoc backend (python 3.5+: pyramid, celery, dogpile, graphene, SQLAlchemy, C extensions); Programmer (python, ruby, C++, lua, C#, java, Apple Swift, scala, any language with HTTP support; using bash and curl; using browser add-ons such as Altair).]


SUPPORT FOR DESIGNING HIGHLY RELIABLE SOFTWARE SYSTEMS


MASIW is a toolset for developing highly reliable hardware and software systems for avionics, medicine, and other safety-critical areas. It is designed for engineers creating airborne hardware/software systems that are developed using the integrated modular avionics (IMA) approach. MASIW can be easily adapted for other application areas.

MASIW is a technology for optimizing the development and verification of complex hardware/software systems. It allows a preliminary quality assessment of the product before the first prototype is made, as well as fault tolerance analysis. This reduces the risk of errors and defects. MASIW is developed jointly with GosNIIAS.

Although the OSATE tool already existed when MASIW development started, MASIW is currently more functional in the areas of verification and static and dynamic analysis.

MASIW provides:

— Creation, editing and management of models based on the AADL modeling language:
    — creation and editing of models using the text and diagram editors;
    — support for team development with the ability to track and modify individual elements of a model;
    — support for reuse of third-party AADL models.

— Model analysis:
    — hardware+software system structure analysis: hardware resources sufficiency, interfaces consistency, etc.;
    — verification of the developed system for compliance with the requirements;
    — transmission characteristics analysis for AFDX networks: message latencies, port queue depth, etc.;
    — generation and analysis of fault trees (FTA) to determine probabilities of high-level fault events;
    — architecture-model based analysis of failures and their consequences, including generation of special descriptive tables;
    — simulation of the hardware+software system model with user reports generation, including software-in-the-loop execution of on-board partitions with an RTOS co-emulated with QEMU.

— Model synthesis:
    — distribution of software applications across computational modules, taking into account hardware resource limitations and additional restrictions regarding reliability and security;
    — processor schedule generation (in particular, for ARINC-653 compatible real-time operating systems).

— Configuration data generation:
    — development of specialized configuration data tools based on the provided software interface (API);
    — configuration data generation for the VxWorks653 RTOS and for the AFDX network equipment.

— The ability to extend the toolset by creating own modules.

MASIW Workflow

[Figure: MASIW workflow, automation of the IMA system design. Labels: Data from hardware or software vendors in the form of configuration files, AADL-models and the requirements for them; Repositories of AADL-models (SVN, Git, etc.); AADL-model libraries; The software-hardware system in the form of AADL-models; Model refinement; IMA system analyzers; AFDX network analyzers; PyCL Checker; REAL Checker; FTA; FMEA; Results of model analysis; Reports and documentation; Configuration data for software-hardware system (VxWorks653, JetOS, mupd5_default.xml).]
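The fault tree analysis mentioned among the model analysis features can be illustrated with a small calculation. The sketch below is not MASIW code: the `BasicEvent` and `Gate` classes and the example tree are hypothetical, and the formulas assume statistically independent basic events that each appear only once in the tree (AND gates multiply probabilities, OR gates combine them as one minus the product of complements).

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union

@dataclass
class BasicEvent:
    name: str
    probability: float  # probability of the basic fault event

@dataclass
class Gate:
    name: str
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]] = field(default_factory=list)

def top_event_probability(node) -> float:
    """Evaluate a fault tree bottom-up, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    probs = [top_event_probability(child) for child in node.inputs]
    if node.kind == "AND":   # all inputs must fail
        return prod(probs)
    if node.kind == "OR":    # at least one input fails
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate kind: {node.kind}")

# Hypothetical example: power is lost if both supplies fail or the bus fails.
power_loss = Gate("power loss", "OR", [
    Gate("both supplies fail", "AND", [
        BasicEvent("supply A fails", 0.01),
        BasicEvent("supply B fails", 0.01),
    ]),
    BasicEvent("power bus fails", 0.001),
])
```

Calling `top_event_probability(power_loss)` evaluates the tree from the leaves up. Real fault trees with shared events need cut-set based methods instead of this simple recursion.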


A Test Program Generator


MicroTESK is an industry-targeted framework for generating test programs in assembly language for functional verification of microprocessors. Based on formal specifications of microprocessor architectures, MicroTESK allows constructing test program generators automatically. MicroTESK supports a variety of architectures ranging from CISC/DSP to RISC and VLIW.

MicroTESK is a state-of-the-art production solution that includes the modeling framework (building models of microprocessors based on formal specifications) and the generation framework (building test programs based on test templates). MicroTESK delivers value similar to that of its global competitors (e.g., Genesys Pro and RAVEN) but outperforms them in usability and performance. It is distributed under the open-source Apache 2.0 license.

It is free to download from the ISP RAS website: forge.ispras.ru/projects/microtesk. The technology is also presented at www.microtesk.org.

MicroTESK provides:

— Using formal specifications as a source of knowledge about the microprocessor under verification:
    — architecture specification in the nML language (registers, memory, their addressing modes, instruction logic, text/binary instruction representation);
    — additional memory subsystem specifications in the mmuSL language (properties of memory buffers (TLB, L1, and L2), address translation logic, read/write operations logic);
    — a potential transition to formal verification and to automatic toolchain generation for the microprocessor under development (disassembler, emulator, etc.);

— Test program generation based on object-oriented test templates:
    — test templates in the Ruby language (so that the templates are human-readable and easy to support);
    — the possibility of using different generation techniques for instruction sequences and test data simultaneously (random generation, combinatorial generation, constraint-based generation, etc.);
    — generation framework scalability (the ability to develop complex test templates at low cost due to reuse).

— A wide range of supported microprocessor architectures:
    — support of many architecture-specific features (RISC, CISC, VLIW, DSP);
    — MicroTESK-based test program generators have been developed for such architectures as RISC-V, ARM, MIPS, PowerPC;
    — multicore architectures are supported.

— Fast adjustment to a new microprocessor architecture with minimal costs and automatic extraction of information about test situations (due to formal specifications);

— Convenient language for developing test templates that allows describing complex verification scenarios quickly.

System requirements

Windows or GNU/Linux-based OS, Java 8.

MicroTESK deployment stories

MicroTESK has been developed since 2007. It has been used in various Russian and international projects on developing modern industrial microprocessors, including production projects on verifying ARMv8, MIPS64, and RISC-V microprocessors.

Workflow

[Figure: MicroTESK: Test Program Generator workflow. Labels: Verification engineer; Specifications; Translator; Model; Test templates; Core; Generator; Simulator; Constraints; Extensions; Test programs.]
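To make the idea of combining generation techniques more concrete, the sketch below enumerates instruction sequences combinatorially and fills in operands randomly. It is only an illustration written in Python: real MicroTESK test templates are written in Ruby against its own API, and the function names, the opcode list and the register names here are all hypothetical.

```python
import itertools
import random

def instruction_sequences(opcodes, length):
    """Combinatorial generation: every ordered opcode sequence of the given length."""
    return itertools.product(opcodes, repeat=length)

def render(sequence, registers, rng):
    """Random generation of test data: pick operand registers for each instruction."""
    lines = []
    for op in sequence:
        rd, rs1, rs2 = (rng.choice(registers) for _ in range(3))
        lines.append(f"{op} {rd}, {rs1}, {rs2}")
    return "\n".join(lines)

def generate_programs(opcodes, registers, length, seed=0):
    """Combine both techniques; a fixed seed keeps the output reproducible."""
    rng = random.Random(seed)
    return [render(seq, registers, rng)
            for seq in instruction_sequences(opcodes, length)]

programs = generate_programs(["add", "sub", "xor"], ["x1", "x2", "x3"], length=2)
```

With three opcodes and sequences of length two, this yields nine short programs, one per opcode combination. A constraint-based step (for example, choosing operand values that force an overflow) would replace the random operand choice in `render`.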


ISP OBFUSCATOR


Obfuscator is a set of technologies to prevent mass exploitation of vulnerabilities resulting from errors or backdoors. If an attacker manages to compromise one device running a given piece of software, the remaining devices stay protected by the changes the tool has made to the software code.

Obfuscator protects the system from mass exploitation of vulnerabilities using various methods of code diversification and is able to compile the code of a full OS distribution.

ISP Obfuscator provides:

— Fine-tuning the balance between obfuscation level and performance (when protecting against reverse engineering). The slowdown ranges from 1.2 times at the minimum to 8 times at the maximum;

— Full automation (no changes to the program source code or efforts to integrate with the build system are required);

— Based on the GCC compiler, which allows correctly building the full OS code;

— The original control flow integrity technique (CFI), which successfully counteracts most of code reuse attacks (ROP, JOP, ret-to-libc, etc.). The implemented CFI support within the GCC compiler shows the average slowdown of about 2% on the SPEC CPU2006 test suite, which is noticeably lower than that of the traditional methods;

— Two diversification approaches:
    — Dynamic code diversification at program startup. It is used when the customer needs the same binary code deployed on all devices (for example, because of the certification procedure). This method allows shuffling up to 98% of code with a slight increase in size and a performance degradation of about 1.5%. The obfuscator provides the following advantages over similar products:
        — Shuffling with function granularity (as opposed to ASLR and Pagerando technologies that randomize only large blocks of code);
        — Shuffling functions throughout the whole OS code except the kernel, and avoiding conflicts with the antivirus software (compared to the Selfrando technology developed for the Tor Browser);
    — Static code diversification. During each separate compilation, a new executable file is created depending on the specified key. This approach has the following advantages:
        — the binary code size does not increase (which is particularly important for the Internet of things use case);
        — performance degradation is close to zero;
        — an extended set of diversifying transformations can be applied and more flexibly customized, as the required operations are performed within the compiler during build time, as opposed to the linker;
        — support of Control Flow Integrity (CFI).

— Conflict-free combination with other software protection tools (including the ASLR system mechanism).

Who is the Obfuscator target audience?

— Developers of specialized operating systems;
— Application software developers.

System requirements

Obfuscator is a universal product that can be adapted to many system requirements. The production version currently runs on a Linux-based OS (version 2.6 and higher) with Intel x86 / x86-64 architecture support.

Obfuscator deployment stories

ISP Obfuscator is deployed in the Zirkon OS, which is used by the Ministry of Foreign Affairs and the Border Guard Service of the Federal Security Service of Russia.

Obfuscator workflow

[Figure: Standard compilation versus static diversification. Standard compilation: Source code (with errors); GCC compiler and linker; Executable code; Exploit; Hacked. Static diversification: Source code (with errors); Static diversifying GCC and Linker; Seed 1, Seed 2, Seed 3; Executable code 1, Executable code 2, Executable code 3; Exploit; Hacked; Not Hacked; Not Hacked.]
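The seed-keyed static diversification described above can be pictured with a toy model in which the only transformation is the order of functions in the binary. This is a sketch under that assumption, not the tool's implementation: `diversify_layout` and the function table are hypothetical, and the real transformations are performed inside GCC.

```python
import random

def diversify_layout(functions, seed):
    """Return a seed-dependent mapping from function name to start offset.

    `functions` maps a function name to its code size in bytes.
    """
    order = sorted(functions)            # canonical order, independent of dict order
    random.Random(seed).shuffle(order)   # the seed plays the role of the build key
    layout, offset = {}, 0
    for name in order:
        layout[name] = offset
        offset += functions[name]
    return layout

# Hypothetical function table: name -> size in bytes.
sizes = {"main": 120, "parse": 300, "auth": 80, "send": 210}
build_1 = diversify_layout(sizes, seed=1)
build_2 = diversify_layout(sizes, seed=2)
```

The same seed always reproduces the same layout, so a build can be regenerated, while the total code size is unchanged. An exploit that hard-codes an address taken from one build will generally not find the same code at that address in a build made with another seed.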


[Figure: Dynamic diversification. Labels: Source code (with errors); GCC compiler and modified linker; Executable code; Data for diversification; Modified diversifying dynamic loader; Run of executable code 1; Exploit.]


A NETWORK TRAFFIC ANALYZER


Protosphere is a deep packet inspection (DPI) system. It is a component of intrusion and information leak protection systems. Protosphere detects inconsistencies between a protocol specification and a specific implementation. Thanks to the flexibility of its internal representation, support for new protocols (either open or closed) can be added quickly.

Protosphere is a system based on innovative research in the area of network traffic analysis. It combines the key features of similar tools (e.g. Wireshark, Microsoft Message Analyzer) with a universal data representation model that enables rapid expansion of analysis capabilities.

Protosphere provides:

— Advanced system core:
    — universal data representation model used when parsing network traffic;
    — processing of corrupted, reordered or duplicated packets; handling of packet loss; processing of asymmetric traffic;
    — compressed/encrypted data analysis;
    — arbitrary configuration tunnel support;
    — support for causality of network flows.

— Support for all stages of network trace analysis (each stage has a visualization component, and the components are synchronized between stages):
    — network connections localization in the network interaction graph and the network flow tree;
    — detailed view of the selected connections in the timeline diagram;
    — interactive visualization of the parsed network packets in the stream tree;
    — detection of discrepancies between a protocol implementation and the actual traffic in the diagnostic log;
    — arbitrary OSI-layer data extraction and analysis (L7+).

— Quick extension of supported protocols:
    — access to parsing results via API;
    — parsing errors localization;
    — debugging the module being developed on real-time traffic and network traces.

— Support for both online and offline analysis modes;

— Advanced GUI that provides a choice of the most convenient way to present the analysis results;

— Universal data representation model to accelerate customization:
    — support for new protocols;
    — extraction of data in a desired format;


    — configuring the analysis results format.

— Adjustment to network bandwidth and available computational resources to find a balance between accuracy of the analysis and the resources consumed.

Who is the Protosphere target audience?

— Testers of network protocol implementations, including those in embedded OS and network hardware;
— Developers of network security tools, such as firewalls and IDS/IPS;
— Manufacturers of network hardware that must be certified;
— Companies requiring real-time control and monitoring of network channels.

SUPPORTED PLATFORMS AND ARCHITECTURES

Architecture: Intel x86-64.
Platforms: Windows, Linux kernel based OS.

Protosphere Workflow

[Figure: Protosphere workflow. Labels: Protosphere source code; Core (module management, access to parsing results, parser failure diagnostics); Modules (recognizers, parsers); Network traffic analysis (online modules, online core); Network traces analysis (GUI, offline modules, offline core); Network traces; Interactive offline analysis; Parse errors; Support for new protocols; API; Data extraction modules; Third-party tools (statistics on network protocols, LAN monitoring, file extractor, DLP).]
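One of the core features listed for Protosphere is the processing of reordered or duplicated packets and the handling of packet loss. The sketch below shows the general idea for a byte stream whose segments carry sequence numbers. It is an illustrative simplification, not Protosphere's data model: `reassemble` is a hypothetical helper, and sequence number wraparound and conflicting retransmissions are ignored.

```python
def reassemble(segments):
    """Rebuild a byte stream from (sequence_number, payload) pairs in arrival order.

    Returns the stream and a list of (start, end) byte ranges that were never
    received, so that a parser can report the loss instead of misreading data.
    """
    unique = {}
    for seq, payload in segments:
        unique.setdefault(seq, payload)   # duplicated segment: keep the first copy

    stream = bytearray()
    gaps = []
    expected = None                       # next sequence number we hope to see
    for seq in sorted(unique):            # sorting undoes reordering
        payload = unique[seq]
        if expected is not None and seq > expected:
            gaps.append((expected, seq))  # bytes [expected, seq) were lost
        if expected is not None and seq < expected:
            payload = payload[expected - seq:]  # drop bytes already emitted
        stream += payload
        expected = max(expected or 0, seq + len(unique[seq]))
    return bytes(stream), gaps
```

For example, `reassemble([(0, b"GET "), (8, b"HTTP"), (4, b"/ab "), (4, b"/ab ")])` restores the original byte order and reports no gaps, while a missing middle segment shows up as an entry in the returned gap list.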


ISP RAS SOFTWARE ANALYSIS PLATFORM BASED ON QEMU


The ISP RAS Foundation Platform for creating program analysis systems is built on top of the open source QEMU emulator. The platform is essential for organizing multi-platform and cross-platform development.

QEMU supports emulation of more than 10 instruction set architectures (i386, AMD64, ARM and Thumb, MIPS, PowerPC, etc.), as well as guest debugging via the GDB Remote Serial Protocol (the clients can be IDA Pro, GDB, Eclipse IDE, etc.). QEMU supports a full system emulation mode that allows debugging low-level software such as a bootloader or an OS kernel. The QEMU source code is regularly checked by static code analysis tools, including Coverity and Svace, which makes malware analysis more secure.

QEMU with reverse debugging and introspection support is available on the ISP RAS GitHub page: github.com/ispras/swat. The developed QEMU tools are available at github.com/ispras/qdt and github.com/ispras/i3s.

ISP RAS QEMU Foundation Platform provides:

— A record and replay mechanism:
    — Non-deterministic events are recorded once, and the recording can then be replayed deterministically many times. The same VM execution is replayed every time, which makes finding bugs in multi-threaded applications (race conditions, deadlocks) easier;
    — GDB-compatible reverse debugging is implemented on top of the record and replay mechanism. VM snapshots and the ability to deterministically replay VM execution are used to reconstruct previous states;
    — Only the minimum required information is recorded. This allows longer recordings for debugging rarely occurring errors;
    — Low performance overhead caused by recording. This enables analysis of malware that interacts with the external environment in real time.

— VM introspection without any guest modifications. It lifts the gathered low-level information to a higher level (OS-level information):
    — Recovering the following OS-level information: system calls, access to shared libraries (.dll and .so), list of running processes, list of open files and loaded modules;
    — Supports all Linux distributions and firmware for various embedded devices;


— WinDbg server support in QEMU enabling kernel-mode debugging via the KDCOM protocol. There is no need to enable kernel debugging mode in the guest OS;

— Speeding up QEMU development:
    — Faster development of dynamic analysis tools that can analyze binary code for specific hardware;
    — Automatic TCG front-end generation from an ISA description. A tool for generating a TCG front-end template is implemented. The tool uses a C-like language for describing the semantics of machine instructions;
    — An automatic tool for preliminary testing of a TCG front-end in QEMU. The tool only requires GNU Binutils and a C compiler;
    — A tool for automating QEMU virtual device development;
    — VM generation tool. This tool can create VMs from both existing devices and new devices. To create a new QEMU board, the tool provides a GUI and a mechanism for describing the board in Python;
    — API in Python for debugging via the GDB Remote Serial Protocol. It is used to debug QEMU, the guest OS, or both at the same time.

— Convenience and user experience:
    — Easy QEMU extension due to open source code and the ISP RAS toolkit for speeding up development;
    — Binary code analysis without any guest OS modifications;
    — VM introspection mechanism that can be extended using plugins via a convenient API;
    — Can be easily adapted for specific use cases;
    — Support for the latest QEMU versions with the latest features, including support for the newest peripherals and CPUs.
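The record and replay idea from the feature list can be shown in miniature: if everything non-deterministic enters a system through one interface, logging those inputs once is enough to reproduce the whole execution. The sketch below is a conceptual toy, not QEMU code; `Recorder`, `Replayer` and `run_guest` are hypothetical names, and the "guest" is just a deterministic function of the events it receives.

```python
import random

class Recorder:
    """Delivers live non-deterministic events and logs each one."""
    def __init__(self, source):
        self.source = source
        self.log = []

    def next_event(self):
        value = self.source()
        self.log.append(value)
        return value

class Replayer:
    """Delivers previously logged events in the original order."""
    def __init__(self, log):
        self._events = iter(log)

    def next_event(self):
        return next(self._events)

def run_guest(events, steps):
    """Toy deterministic guest: its state depends only on the delivered events."""
    state, trace = 0, []
    for _ in range(steps):
        state = (state * 31 + events.next_event()) % 1_000_003
        trace.append(state)
    return trace

recorder = Recorder(lambda: random.randrange(256))  # stands in for timers, I/O, etc.
recorded_trace = run_guest(recorder, steps=50)
replayed_trace = run_guest(Replayer(recorder.log), steps=50)
```

Because only the external events are logged, and not the guest state, the recording stays small. Reverse debugging can then be layered on top by restoring an earlier snapshot and replaying forward to the point of interest.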

Who is the ISP RAS Foundation Platform target audience?

— Bootloader, driver, OS and other system software developers;
— DevOps teams, for reproducing software bugs, cross-platform development, and scalable cloud testing;
— Malware analysts;
— Software certification engineers.

Supported guest platforms

Emulation of the following ISAs: i386, x86-64, ARM, MIPS, PowerPC, and others.
Guest systems supported by the introspection mechanism: Windows 10 (x86_64), Linux 2.x-4.x (x86, x86_64, ARM, AArch64).

ISP RAS QEMU deployment stories

The QEMU community has accepted ISP RAS patches for the record and replay mechanism and included them in the open source QEMU version 3.1.

Workflow

[Figure: ISP RAS QEMU platform workflow. Labels: Target software; QEMU; CPU; devices; Extensions; Record and replay; Plugins; High-level information (processes, instructions, events); Debuggers and IDEs (GDB [+Eclipse], IDA Pro, WinDbg); QDT; I3S.]


STATIC ANALYSIS OF HDL DESCRIPTIONS


Retrascope is a functional verification toolkit for digital hardware designs. Retrascope provides automated engines for code analysis, formal model extraction and functional test generation. As input, the toolkit accepts digital hardware module descriptions written in the synthesizable subset of the Verilog and VHDL languages, as well as their behavioral specifications.

Retrascope is an open source toolkit for functional verification of digital hardware designs. The toolkit provides several methods for formal model extraction and analysis and for functional test generation. The component-based architecture of Retrascope allows users to develop hybrid techniques of formal model analysis by freely combining various analysis methods. The toolkit’s source code and distribution packages are available at forge.ispras.ru/projects/retrascope.

Retrascope provides:

— Formal models extraction from source code:
    — control flow graph;
    — guarded actions decision diagram;
    — high-level decision diagram;
    — extended finite state machine.

— Functional test generation:
    — random tests;
    — dead code detection;
    — typical read-write error detection;
    — user-defined property checking.

— Formal models analysis (model checking) for conformance to specifications written in:
    — PSL;
    — SystemVerilog Assertions.

— Graphical user interface based on Eclipse IDE (plus a command line interface):
    — running the tool with specified parameters;
    — model visualization (Zest, GraphML).

— Open source (Apache License Version 2.0);

— Extensibility at the source code level:
    — new models;
    — new engines for analysis and test generation.

— Open APIs allow using auxiliary standalone tools for analysis and verification without changing Retrascope source code:
    — SMT solvers via the SMT-LIB v2 language;
    — Model checkers via the SMV language.

Who is the Retrascope target audience?

— Companies working in the area of digital hardware development;
— Research groups in the field of digital hardware verification.

Retrascope deployment stories

Retrascope is at the research prototype stage and under active development. The tool has been applied successfully to several industrial modules and multiple open benchmarks.

System requirements

Software: Windows or GNU/Linux-based OS, Java Runtime Environment 8.

Retrascope workflow

[Figure: Retrascope workflow. Labels: HDL (VHDL, Verilog); Specifications (PSL, SVA); Parameters; Toolchain; Launchers; Engines (Engine 0, Engine 1, Engine n); Extractors; Transformers; Generators; Simulators; Printers; Internal representation; Models (GADD, HLDD, EFSM); Functional tests.]


EXPLORATORY SEARCH SYSTEM FOR SCIENTIFIC GROUPS


SciNoon is a system for collaborative exploration of scientific papers. It helps a group of researchers dive quickly into a new area of knowledge and find answers to their questions, and then track new research on the topic of interest with highly customizable alerts.

SciNoon is an innovative system designed to optimize long-term teamwork with scientific papers. The papers could be added both from search systems and from digital libraries (like Google Scholar, arxiv.org, SemanticScholar, PubMed) or be uploaded directly as PDF files. The key feature of SciNoon is 2D research maps specially designed for visualization of added papers which all team members have access to.

SciNoon provides:

— Shared workspace for collaborative processing of found scientific papers;

— Adaptive visualization of paper metadata details depending on the zoom level and other parameters;

— Deduplication and cleansing of the uploaded metadata in the internal database, which makes it possible to discover relations between papers, as well as between authors and papers;

— Classification of citation contexts into five classes depending on the citation role:
    — Background: a cited paper contains general information about the domain;
    — Use: a citing paper uses methods, data, and so on from the cited paper;
    — Compare: a citing paper points out differences from (or similarities with) the cited paper;
    — Extend: a citing paper continues the research from the cited paper;
    — Weak: a citing paper criticizes the cited one, pointing to its authors’ mistakes;

— Possibility to find relevant papers without keyword search, using the integrated recommendation subsystem;

— Adjustable list of research-specific questions that a team wants to find answers to. The papers are visualized on a research map depending on the answers to a selected question;


— Possibility to group similar articles and show these groups as clusters on a research map;

— Maintaining awareness of team members’ actions (via notifications) and possibility to quickly discuss recent findings;

— Spreadsheet view for all collected answers and possibility to export data to a CSV file;

— Staying up to date with a research topic by tracking newly collected relevant papers.
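The metadata deduplication mentioned in the feature list can be approximated by normalizing titles before comparing records that come from different sources. The sketch below is an assumption about how such a step might look, not SciNoon's actual algorithm; the record fields (`title`, `year`, `source`) and both function names are hypothetical.

```python
import re
import unicodedata

def normalize_title(title: str) -> str:
    """Lowercase, strip accents and collapse punctuation/whitespace runs."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()

def deduplicate(records):
    """Merge records that share a normalized title and publication year."""
    merged = {}
    for record in records:
        key = (normalize_title(record["title"]), record.get("year"))
        entry = merged.setdefault(
            key, {"title": record["title"], "year": record.get("year"), "sources": []}
        )
        entry["sources"].append(record["source"])
    return list(merged.values())

# Hypothetical records for the same paper found in two digital libraries.
papers = deduplicate([
    {"title": "Deep Learning: A Survey", "year": 2015, "source": "arxiv.org"},
    {"title": "deep  learning - a survey.", "year": 2015, "source": "PubMed"},
    {"title": "Graph Databases in Practice", "year": 2019, "source": "arxiv.org"},
])
```

Here the first two records collapse into one entry that remembers both sources. Exact matching on a normalized key is the simplest option; fuzzy matching or identifier-based matching (for example by DOI) would catch more variants at the cost of more false merges.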

Who is the SciNoon target audience?

— Research teams from R&D departments lacking a tool for collecting and quickly sharing papers on their scientific problem;
— Scientists who need a tool for team work with collaborators from all over the world;
— Scientific advisers and their students who need to collaborate during exploratory search on course projects.

SciNoon deployment stories

SciNoon is used at ISP RAS for research and for advising students.

SciNoon workflow

[Figure: SciNoon workflow. Blocks: PDF and/or paper metadata, data acquisition subsystem, PDF processing service, HTTP API, graph database, web interface, research map.]


SVACE STATIC ANALYZER


Svace is an essential tool of the secure software development life cycle and the main static analyzer used at Samsung Corp. It detects more than 50 critical error types as well as hundreds of coding issues. Svace supports C, C++, C#, and Java. Support for the Kotlin and Go programming languages is in progress and is planned for Q4 of 2020. Svace is included in the Unified Register of Russian Programs (No.4047).

Svace is an innovative technology based on years of research that continuously evolves to meet customers' needs. It combines the key qualities of foreign competitors (Coverity Scan Static Analysis, Fortify Static Code Analyzer, Klocwork Static Code Analysis) with a distinctive use of open industrial compilers, which provides the fullest possible support for new programming language standards.

Svace provides:

— High-quality deep analysis:
    — an accurate representation of the source code (due to integration with any build system);
    — full path coverage taking into account function calling contexts when searching for complex defects;
    — high percentage of true positives (60-90%).

— Scalability and high speed:
    — parallel analysis using up to 64 processor cores;
    — ability to analyze software with the code size of tens of millions of lines (analysis of the Android 6 OS having 8 million lines of code takes about 5 hours);
    — supporting incremental system analysis in addition to the full analysis mode (performs a quick re-analysis of the recently modified source code).

— Convenient warnings viewing interface:
    — detailed error description with code navigation;
    — review interface for marking true and false positives;
    — analysis results migration between runs with hiding any issues previously marked as false positives.

— Accelerated customization (configuring existing detectors as well as writing individual ones available exclusively to this customer; creating tailored user interfaces);

— Ultra fast adaptation to new environments and tools (adding new compilers within 1-2 weeks, in complex cases up to 2 months);

— Full compatibility with regulatory documents and requirements of regulators (FSTEC of the Russian Federation);

— Can be used for adhering to the GOST R 56939-2016 requirements (when certifying software within Russia), as well as the new FSTEC regulation norms for finding software vulnerabilities.
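The migration of review results between analysis runs, listed above, can be sketched as follows. This is not Svace's algorithm, which the catalogue does not describe. It is a minimal illustration under stated assumptions: each finding is identified by a hypothetical key of detector name, file path and function name, so that the verdict survives line shifts. The detector names and paths are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    checker: str    # hypothetical detector name
    path: str
    function: str
    line: int       # may shift between runs, so it is not part of the key

    def key(self) -> tuple[str, str, str]:
        """Identity that survives line shifts within a function."""
        return (self.checker, self.path, self.function)


def migrate(previous_verdicts: dict[tuple, str], new_run: list[Finding]) -> list[Finding]:
    """Return findings from the new run, hiding those earlier marked as false positives."""
    return [f for f in new_run if previous_verdicts.get(f.key()) != "false_positive"]


old_run = [
    Finding("NULL_DEREF", "src/io.c", "read_block", 42),
    Finding("BUFFER_OVERFLOW", "src/net.c", "parse_header", 10),
]
# A reviewer marked the first finding as a false positive after the old run.
verdicts = {old_run[0].key(): "false_positive", old_run[1].key(): "true_positive"}

new_run = [
    Finding("NULL_DEREF", "src/io.c", "read_block", 47),         # same issue, shifted lines
    Finding("BUFFER_OVERFLOW", "src/net.c", "parse_header", 10),
    Finding("MEMORY_LEAK", "src/io.c", "open_file", 5),          # new issue
]
visible = migrate(verdicts, new_run)
print([f.checker for f in visible])
```

A key this coarse would merge two distinct findings of the same detector inside one function, so a production tool would need a finer fingerprint. The sketch only shows why a line-independent identity is needed for verdicts to carry over.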


Who is Svace's target audience?

— Companies developing software with a special focus on high reliability and security;

— Companies that need to certify the developed software;

— Certification laboratories.

Supported platforms and architectures

— Host platforms for the analyzer: Linux kernel based OS (version 2.6 and later), Windows XP and later.

— Target architectures of the analyzed code: Intel x86/x86-64, ARM, ARM64, MIPS, MIPS64, PowerPC, Hexagon.

Supported compilers

For C/C++: GCC (GNU Compiler Collection), Clang (LLVM compiler), Microsoft Visual C++ Compiler, RealView/ARM Compilation Tools (ARMCC), Intel C++ Compiler, Wind River Diab Compiler, NEC/Renesas CA850, CC78K0(R) C Compilers, C/C++ Compiler for the Renesas M16C Series and R8C Family, Panasonic MN10300 Series C Compiler, C compiler for Toshiba TLCS-870 Family, Samsung CalmSHINE16 Compilation Tools, Texas Instruments TMS320C6* Optimizing Compiler.

For C#: Roslyn, Mono.

For Java: OpenJDK Javac Compiler, Eclipse ECJ compiler, Jack Compiler for Android.

Svace deployment stories

Svace has been the main static analyzer at Samsung Corp. since 2015. It is used to check the company’s own Android-based software as well as the Tizen OS source code. Tizen is used in smartphones, infotainment systems and Samsung home appliances. Since 2017, Svace has checked all changes submitted for review and inclusion in Tizen OS. Within Russia, Svace is deployed at RusBITech and JSC Sukhoi, among others.


Svace Architecture

Executions of the original compilers and other build tools are intercepted while the build is monitored.

The intermediate representation for analysis is created by Svace's own compilers, adapted from open industrial toolchains.

The analysis combines:

— lightweight analysis of abstract syntax trees;

— interprocedural analysis (context-sensitive and path-sensitive, with symbolic execution);

— tainted data analysis.

Warnings are presented with:

— syntax coloring and code navigation support;

— warning review support (assigning true/false positive status);

— comparison of analysis runs with suppression of old false positives.

[Figure: Svace architecture. Blocks: build system, interception, Svace, C / C++ / Java IR, C#, C# analysis, warnings.]
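The idea behind the tainted data analysis named in the architecture description can be shown on a toy example. The sketch below is a deliberately simplified forward propagation over a straight-line program. The statement format and the operation names (`read_input`, `sanitize`, `exec`) are assumptions made for illustration and have nothing to do with Svace's internal representation, which is interprocedural and path-sensitive.

```python
# Each statement is (target, operation, arguments); a toy straight-line program.
PROGRAM = [
    ("a", "read_input", []),        # taint source
    ("b", "const", ["'ls'"]),
    ("c", "concat", ["b", "a"]),    # taint flows from a to c
    ("d", "sanitize", ["c"]),       # sanitizer produces clean data
    (None, "exec", ["c"]),          # sink receives tainted data
    (None, "exec", ["d"]),          # sink receives sanitized data
]

SOURCES = {"read_input"}
SANITIZERS = {"sanitize"}
SINKS = {"exec"}


def find_tainted_sinks(program):
    """Return indices of sink statements that receive tainted arguments."""
    tainted = set()
    hits = []
    for index, (target, op, args) in enumerate(program):
        if op in SINKS and any(arg in tainted for arg in args):
            hits.append(index)
        if target is None:
            continue
        if op in SOURCES:
            tainted.add(target)
        elif op in SANITIZERS:
            tainted.discard(target)
        elif any(arg in tainted for arg in args):
            tainted.add(target)
        else:
            tainted.discard(target)  # overwritten with clean data
    return hits


print(find_tainted_sinks(PROGRAM))
```

Only the first `exec` call is reported, because its argument is derived from user input without passing through the sanitizer.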


TALISMAN: A FRAMEWORK FOR SOCIAL MEDIA ANALYSIS


Talisman is a data analysis framework designed for retrieving data about people, communities, and companies. It uses modern approaches from machine learning, computational linguistics, complex network analysis and big data processing. Talisman is capable of finding relations and their patterns by analyzing large graphs consisting of hundreds of millions of nodes.

Talisman is integrated with a platform for semantic extraction from text (Texterra) and the original ISP RAS technology for data mining. In terms of technological level, Talisman is comparable to the world’s leading competitors (Palantir Gotham and IBM Watson Content Analytics). Its advantage is the automation of routine analysis processes using recent scientific achievements, which reduces the resources required for manual analysis.

Talisman provides:

— A combination of essential features:
    — Semantic analysis utilizing the Texterra platform capabilities (sentiment analysis; working with concepts instead of just words, a unique ability for the Russian language; the ability to analyze users’ comments and identify implicit references to objects in discussions, etc.);
    — Analysis of large graphs consisting of hundreds of millions of nodes (including automatic construction of information distribution graphs with role definitions such as source, distributor, opinion leader, reader);
    — Automatic message grouping in stories (a map of all discussed stories in the information space, taking into account information flow between different resources);
    — Identification of true users’ attributes in social networks. Determination of gender, age (usually precise within a year), education, marital status, place of residence based on the profile analysis and user activity (this list of attributes can be easily extended);
    — Automatic retrieval of target audience parameters (aggregation by demographic attributes and dominant values identification);
    — Information validation tools (bot detection, spam filtering, and detecting possible manipulation of the audience opinion);


— Reports on monitored information within a few minutes after publication, supported by big data analysis technologies of the Apache Hadoop stack and by elastic system scalability using the Asperitas cloud technology from ISP RAS;

— Analysis of big data coming from any source: corporate, news, social networks (VK, Facebook, Twitter, Instagram, Odnoklassniki, YouTube, LinkedIn, etc.), blogs (LiveJournal), open channels of the Telegram messenger and Dark web resources. Talisman can be integrated both with the original ISP RAS data acquisition technology and with external collectors;

— Functioning either as a locally deployed system or in SaaS mode;

— Fast adaptation and functionality extension for various domains (information security, medicine, auditing, etc.).
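The role definitions mentioned in the list above (source, distributor, opinion leader, reader) can be made concrete with a toy repost graph. The catalogue does not say how Talisman defines these roles, so the rules below are assumptions chosen for illustration: a source is reposted but reposts nobody, a distributor does both, a reader only reposts, and an opinion leader is anyone reposted by at least `leader_threshold` users. The user names are invented.

```python
from collections import defaultdict


def assign_roles(edges, leader_threshold=3):
    """Label each user in a repost graph; edges are (original_author, reposter) pairs."""
    audience = defaultdict(set)   # user -> users who reposted them
    reposted = defaultdict(set)   # user -> users they reposted
    for author, reposter in edges:
        audience[author].add(reposter)
        reposted[reposter].add(author)

    roles = {}
    for user in set(audience) | set(reposted):
        if len(audience[user]) >= leader_threshold:
            roles[user] = "opinion leader"
        elif audience[user] and not reposted[user]:
            roles[user] = "source"
        elif audience[user]:
            roles[user] = "distributor"
        else:
            roles[user] = "reader"
    return roles


# Invented sample data: "news" publishes, "ann" amplifies it to three users.
edges = [
    ("news", "ann"),
    ("ann", "bob"),
    ("ann", "eve"),
    ("ann", "dan"),
    ("bob", "kim"),
]
roles = assign_roles(edges)
print(sorted(roles.items()))
```

On real data the threshold would have to scale with graph size, and reach would normally be measured over whole repost cascades rather than direct reposts only.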

Talisman application areas

— Interest groups detection based on social media analysis, e.g. target audiences (for marketing and political purposes), hotbeds of social tensions and groups addressing hot-spot issues;

— Public opinion retrieval for companies, people and products;

— Key trends identification and forecasting of online advertisement success;

— Staff management optimization including efficient recruitment, data verification, assistance in developing systems of incentives based on short-term and long-term interests, monitoring of leakage and private data disclosure;

— Reputation management (in particular, finding real causes of employee and customer complaints);

— Detecting information campaigns aimed at manipulating the opinions of target audiences, as well as identifying the target audience of such campaigns.

The Talisman framework includes two products that complement each other’s functionality:

1. Talisman.Flow

2. Talisman.Biography

Talisman.Flow is a system for preprocessing big data information flows coming from social media. It is a scalable microservice-based software framework built from open source components. Talisman.Flow increases development productivity for analysis applications by making it easy to combine several information flow processors. It is included in the Unified Register of Russian Programs (No.6045). More information at:

talisman.ispras.ru/talisman-se-поток/

Talisman.Biography is a system for big data analysis from social media. It performs automatic retrieval of staff form data from social media and other open sources. It is included in the Unified Register of Russian Programs (No.5547). More information at:

http://talisman.ispras.ru/talisman-se-биография/

Supported languages

Talisman supports the languages recognized by the Texterra analyzer, namely Russian and English.


Talisman workflow

[Figure: Talisman workflow. Input data: profiles of users and groups, friendship graphs, subscriptions, avatar(s), posts on the walls, community posts, reposts, likes, messages from forums (with authorship), mass media messages. Blocks: data collection and information extraction; raw data storage; content analysis (text, images, video, graphs), producing tags for each content unit; accounts analysis (friendship graph, actions graph), producing tags for each profile; aggregation and additional analysis; processed information storage; applications; interface (Web, API).]


TEXTERRA: A SEMANTIC ANALYZER


Texterra is a scalable platform for extracting semantics from text. It contains the complete fundamental set of technologies for creating multifunctional applications for text analysis. Texterra's approach to semantic analysis is based on concept identification. It is included in the Unified Register of Russian Programs (No.4048).

Texterra performs a unique analysis of Russian texts based on the identification of concepts instead of just words. It differs from foreign analogues in its primary focus on the Russian language. The analyzer is based on the results of basic research and can be integrated with the Elasticsearch search system, significantly expanding its capabilities. This combination of technologies allows the platform to compete with projects at the level of IBM Watson Natural Language Understanding.
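The difference between working with concepts and working with words can be shown with a toy lookup. This sketch is not Texterra's method or API. It assumes a tiny hand-made knowledge base that maps surface forms to hypothetical concept identifiers and uses a greedy longest-match search, so that two differently worded sentences resolve to the same concept.

```python
# Hypothetical miniature knowledge base: surface form -> concept identifier.
KNOWLEDGE_BASE = {
    "moscow state university": "concept:MSU",
    "msu": "concept:MSU",
    "isp ras": "concept:ISP_RAS",
    "institute for system programming": "concept:ISP_RAS",
}
MAX_PHRASE_LEN = max(len(form.split()) for form in KNOWLEDGE_BASE)


def identify_concepts(text: str) -> list[str]:
    """Greedy longest-match lookup of known surface forms in a token stream."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for length in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + length])
            if phrase in KNOWLEDGE_BASE:
                concepts.append(KNOWLEDGE_BASE[phrase])
                i += length
                break
        else:
            i += 1  # no known phrase starts at this token
    return concepts


first = identify_concepts("Students come from Moscow State University.")
second = identify_concepts("Students come from MSU.")
print(first == second, first)
```

A word-level comparison would treat the two sentences as sharing only "students come from", while the concept-level view recognizes that both mention the same entity.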

Texterra provides:

— High text processing speed (morphological analysis: 69 000 words per second, syntactic analysis: 39 100 words per second, coreference resolution: 10 100 words per second, full text analysis: approximately 13 600 words per second);

— Maximum attention to Russian language (unlike similar spaCy and UDPipe projects, as well as IBM Watson Natural Language Understanding, which does not support the analysis of emotions and concepts in Russian-language texts);

— Large knowledge base (more than 7 million concepts);

— Building the knowledge base without expert involvement (automatic construction and update using Wikipedia, MediaWiki, Linked Open Data, etc.);

— Scalability both in word processing speed and knowledge base size (using Apache Ignite and the Asperitas cloud technology developed at ISP RAS);

— High text analysis accuracy due to a number of key features:
    — Multi-level search by related concepts;
    — Adaptability to slang, hashtags (#) and errors in text;
    — Analysis of emotional coloring (with separation of attitude towards objects and their attributes);
    — Determining relationships between people and companies based on text information;
    — Finding out implicit object references in discussions;


— Fast adaptation and tailored solutions development;

— Support of two use cases:
    — as a software system deployed on a customer’s local server, accessible via either an HTTP REST-based or RMI protocol;
    — online at texterra.ispras.ru;

— Simple and fast support for specific domains and the ability to integrate new languages, backed by a modern machine learning approach.

Who is Texterra's target audience?

— Corporate software developers (e.g. chat bot developers);

— Developers of semantic search systems for certain domains (such as information security, medicine, auditing, etc.);

— Developers of arbitrary text processing applications.

Supported languages

Texterra analyzes Russian and English texts.

System requirements

— A system platform supported by Java 8;

— 16 GB of RAM or more for each supported language;

— A 64-bit operating system is recommended.

Texterra deployment stories

Texterra has been productized in joint projects with HP and Samsung (the project goals were to develop technology for analyzing corporate reports and for supporting smart TVs). Texterra currently underpins several ISP RAS innovative products, such as the Talisman social media analysis framework. Texterra is also used by several Russian government agencies.

Texterra workflow

Linguistic analysis module: language identification; morphological analysis with error correction; analysis of syntax and semantics.

Information extraction module: identification of mentioned concepts; relationships extraction; key concepts recognition.

Sentiment analysis module: analyzes the opinions of social media users, taking slang and hashtags into account.


TRAWL: A BINARY CODE ANALYSIS PLATFORM


Trawl is a unique production-level tool for analyzing various binary code features that supports multiple target processor architectures. It does not require debug information or source code. Trawl can be used to analyze all kinds of software ranging from boot loaders to OS kernels and user-level applications. It is included in the Unified Register of Russian Programs (No.5323).

Trawl is a large software system based on decades of experience accumulated by compiler developers and information security experts. Unlike similar research tools in the area of binary code analysis, Trawl is ready for production use.

Key features:

— unrestricted recovery of data and control flows at the machine code level;

— localization of individual algorithms in the code, with a formal representation of their structure and semantics;

— automation of manual analysis and completely automatic solutions for many everyday analysis tasks.

Trawl provides:

— Modular platform architecture (allows extending the set of supported processor architectures and implementing new functional features);

— Support for automating analysis via scripts and an open API (allows for integrating Trawl with other tools such as IDA Pro or Wireshark);

— In-depth analysis:
    — only the binary code is required for analysis;
    — the approach is based on dynamic analysis of full-system execution traces, possibly augmented with static memory snapshot analysis;
    — automatic lifting of intermediate representation performed prior to the main analysis phase;
    — static program structure recovery for analyzed system’s parts, with the support for performing the recovery based on multiple program runs;
    — precise data flow analysis that accounts for hardware specifics (pipeline, interrupts, virtual address translation, DMA);
    — interactive recovery of algorithm flowcharts, based on constructing information flow slices;
    — the approach implemented in Trawl is not vulnerable to most known anti-analysis techniques;


— High performance:
    — parallel analysis with high scalability on multicore workstations;
    — support for analyzing long user scenarios for the system being researched.

— Advanced GUI:
    — execution trace views that provide many kinds of search and navigation similar to conventional debuggers, with the added possibility of instantaneous navigation along data flows both forward and backward in time;
    — automatic markup of high-level trace structure: processes and threads, interrupt handlers, call stacks, dynamically loaded modules and symbols;
    — arguments and return values recovery for called functions;
    — external events synchronized with traces (network communications, user input and output) and hardware-related events.

Who is Trawl's target audience?

— Malware research laboratories;

— Developers of embedded software and OS components;

— Certification laboratories.

Supported platforms and architectures

— Host system requirements: Windows or Linux OS, a 64-bit x86 processor, at least 16 GiB of RAM.

— Supported target processor architectures: x86, x86-64, ARMv6, ARMv7.

— Supported target operating systems: Windows, Linux. Trawl also supports analyzing code for an unknown OS and code that runs outside an OS.

Trawl workflow

[Figure: Trawl workflow. Blocks: system being analyzed; controlled execution environment; deterministic replay; execution traces; network traces; preliminary representation level lifting; analysis tools (tracing, debugging, taint analysis); information flow research; flowchart construction; description of algorithms and data formats; model example; interactive work; API; external tools: IDA Pro, traffic analysis (Wireshark).]
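Navigation along data flows forward and backward in time, and the information flow slices used for flowchart recovery, can be illustrated on a toy execution trace. The trace format below is hypothetical and far simpler than a full-system trace: each step writes one location (a register or memory cell) and reads zero or more, and the list index equals the step number. Nothing here reflects Trawl's actual data model or API.

```python
# A toy execution trace: each step writes one location and reads zero or more.
TRACE = [
    {"step": 0, "writes": "r1", "reads": []},             # r1 = input
    {"step": 1, "writes": "r2", "reads": []},             # r2 = constant
    {"step": 2, "writes": "r3", "reads": ["r1"]},         # r3 = f(r1)
    {"step": 3, "writes": "r2", "reads": ["r2"]},         # r2 = r2 + 1
    {"step": 4, "writes": "mem[0x10]", "reads": ["r3"]},  # store r3
    {"step": 5, "writes": "r4", "reads": ["mem[0x10]", "r2"]},
]


def backward_slice(trace, step):
    """Collect steps whose results flow into the values read at the given step."""
    wanted = set(trace[step]["reads"])   # locations whose producers we still need
    in_slice = [step]
    for entry in reversed(trace[:step]):
        if entry["writes"] in wanted:
            in_slice.append(entry["step"])
            wanted.discard(entry["writes"])   # the most recent writer is found
            wanted.update(entry["reads"])     # now follow its inputs
    return sorted(in_slice)


def forward_slice(trace, step):
    """Collect later steps influenced by the value written at the given step."""
    tainted = {trace[step]["writes"]}
    in_slice = [step]
    for entry in trace[step + 1:]:
        if tainted & set(entry["reads"]):
            in_slice.append(entry["step"])
            tainted.add(entry["writes"])
        else:
            tainted.discard(entry["writes"])  # overwritten by unrelated data
    return in_slice


print(backward_slice(TRACE, 4))  # which steps produced the stored value
print(forward_slice(TRACE, 0))   # where the input value ends up
```

Because a recorded trace fixes the order of all reads and writes, both directions reduce to a single linear pass, which is what makes this kind of navigation cheap once the trace exists.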


SOLUTIONS FOR CREATING SERVICE-ORIENTED DATA CENTERS

The solution allows storing data and performing complex, resource-intensive calculations using both containers and virtual machines. It is designed especially for the deployment of cloud environments.

Key features

— Ability to build solutions for problems in specific domains (computational fluid dynamics, big data analysis, program analysis for vulnerabilities, etc.);

— Technological independence and on-premise solutions (the ability to recreate the infrastructure in an isolated environment with full control by means of open standards, free software and ISP RAS innovative research).

The solution framework currently has four technologies:

I. Asperitas cloud environment based on OpenStack, Kubernetes and Ceph

Asperitas was created in a joint project with Dell. It is designed for short-lived computations with a large amount of available resources. The cloud environment is deployed from local sources as a prepared virtual machine with all necessary tools. Asperitas is included in the Unified Register of Russian Programs (No.5921).

— Deployment is based on open state-of-the-art technologies, which are fundamental for building large private cloud systems;

— It provides users with all required functionality:
    — Virtual networks and compute clusters are managed by Keystone, Neutron and Nova (equivalent of Amazon EC2);
    — Block storage and scalable object storage are based on the Ceph distributed file system;
    — Container environment management is based on Kubernetes.

II. General-purpose deployment and configuration management tool

The tool performs software system lifecycle management. It provides the ability to deploy and implement various services on the PaaS level:

— Big data analysis services on ready-to-run Apache Spark, Apache Hadoop and Apache Ignite systems with an arbitrary number of computing nodes (starting a cluster takes about 5 minutes). The open source version with minimal functionality is located at github.com/ispras/spark-openstack;

— Artificial Intelligence research services using TensorFlow, Caffe and other software systems on modern hardware (servers with NVIDIA Tesla V100 on the SXM2 interface);

— High performance computing (HPC) services.

The configuration tool allows:

— Managing users and group permissions as a service with REST API endpoints;

— Recording user action history and software system states;

— Working with various cloud virtualization systems;

— Deploying complex distributed systems with all possible combinations of services (Spark, Hadoop, Ignite, Cassandra, Jupyter, shared remote FS, remote FS server, Nextcloud, Fanlight) on request;

— Taking into account the compatibility of services and their various versions;

— Using local software repositories for deploying services.

III. VMEmperor virtual machine management system

VMEmperor was developed at ISP RAS to satisfy internal demands and is publicly available (https://github.com/ispras/vmemperor). It is designed to manage virtual resources at the IaaS level. It has been running continuously on the XCP-ng / Citrix XenServer platform since 2012, providing users with easy access to virtual resources and their orchestration.

IV. Fanlight web laboratory organization platform

Fanlight was created as a result of ISP RAS participation in the University Cluster program and in the Open Cirrus international project (founded by Hewlett-Packard, Intel and Yahoo!). It is intended for deploying SaaS infrastructures for web-based computing labs using Docker Compose. Fanlight is built on virtual containers and operates on the basis of virtual desktops in the DaaS (Desktop as a Service) model. The platform is available to users at fanlight.ispras.ru and supports only applications developed for Linux kernel based OS. Fanlight is included in the Unified Register of Russian Programs (No.6066).

Fanlight provides:

— High performance cloud computing through the use of containers:
    — working comfortably with heavy CAD-CAE engineering applications that require hardware acceleration support for 3D graphics for complex visualization;
    — support for running MPI, OpenMP, CUDA applications by accessing HPC clusters, multi-core processors and NVIDIA graphics accelerators.

— Computing capabilities expansion at the PaaS level by employing hardware resources (HPC / BigData clusters, storage systems, servers with graphic accelerators);

— Customization for a given application area by integrating specialized design application packages. There are deployment success stories:
    — in the MCC CFD process: OpenFOAM, SALOME, ParaView, etc.;
    — in the Gas&Oil industry: tNavigator, Eclipse, Roxar, Tempest, etc.

— Allows using an arbitrary thin client (including mobile devices);

— Can be deployed on a server, computing farm, in the cloud (from IaaS level) or in a local data processing center.

Cloud solutions deployment stories

The computing cluster based on Asperitas is used to analyze information flows in the Talisman social media analysis technology and to support the operation of other ISP RAS technologies (e.g. analyzing Android OS using Svace). The following projects were also implemented: a joint project with Huawei (large graph analysis using big data processing technologies) and the Tizen OS lifecycle support infrastructure, which makes it possible to organize joint development of OS components and to automate regular builds and sample testing. In addition, a number of projects are being carried out jointly with the Ministry of Science of the Russian Federation.

The Fanlight platform was used in a number of joint projects for the deployment of web laboratories with the Russian Federal Nuclear Center of the All-Russian Scientific Research Institute of Experimental Physics, OOO RRS-Baltika, the Keldysh Institute of Applied Mathematics (development of technology to increase and efficiently use the hydrocarbon raw materials resource potential of the Union State), as well as the Laboratory of Continuum Mechanics (unicfd.ru).

VMEmperor has not been used in external commercial projects; however, it is widely used in ISP RAS internal projects.
