D7.1 System Integration and Validation
Page 1 of 62
This document is Public, and was produced under the RAPID project (EC contract 644312).
Project No: 644312
D7.1 System Integration and Validation
September 30, 2017
Abstract:
This deliverable describes the Cloud infrastructure software deployed in the RAPID testbed to enable the execution of the pilot applications that validate the implemented components. The document details the components deployed and their current configuration, as well as the deployment topology of the RAPID Cloud itself.
Document Manager
E. Garrido ATOS
Document Id N°: rapid_D7.1 Version: 1.0 Date: 11/10/2017
Filename: rapid_D7.1_v1.0.docx
Confidentiality
This document contains proprietary material of certain RAPID contractors, and may not be
reproduced, copied, or disclosed without appropriate permission. The commercial use of any
information contained in this document may require a license from the proprietor of that information.
Ref. Ares(2017)5075152 - 18/10/2017
The RAPID Consortium consists of the following partners:
Participant no. Participant organization name Short name Country
1 Foundation of Research and Technology Hellas FORTH Greece
2 Sapienza University of Rome UROME Italy
3 Atos Spain S.A. ATOS Spain
4 Queen's University Belfast QUB UK
5 Herta Security S.L. HERTA Spain
6 SingularLogic S.A. SILO Greece
7 University of Naples "Parthenope" UNP Italy
The information in this document is provided “as is” and no guarantee or warranty is given that the
information is fit for any particular purpose. The user thereof uses the information at its sole risk and
liability.
Revision history
Version Author Notes Date
0.1 E. Garrido(ATOS) Table of Contents 15/06/2017
0.2 E. Garrido(ATOS) Writing of sections 2, 2.2 & 4 13/07/2017
0.3 E. Garrido(ATOS) Writing of section 3 14/07/2017
0.4 E. Garrido(ATOS) Adding section 2.3 & 2.4 18/07/2017
0.5 F. Campos(ATOS) Format section 4, Writing of section 5 30/08/2017
0.5.1 S. Kosta (UROME) Comments about integration test 01/09/2017
0.5.2 L. López (ATOS) Reformatting section 1 and section 2, include description of new implemented architecture 11/09/2017
0.5.3 C. Hong (QUB) Comments about integration test 14/09/2017
0.5.4 T. Velivassaki (SILO) Detailed review of current draft. 19/09/2017
0.6 F. Campos (ATOS) New integrated version 20/09/2017
0.6.1 S. Kosta (UROME) Comments about section 3.Deployment 23/09/2017
0.6.2 F. Campos (ATOS) Writing about Scenarios structure. 26/09/2017
0.6.3 S. Kosta (UROME) Comments about test structure. 27/09/2017
0.6.4 T. Velivassaki (SILO) Detailed review of current draft. 28/09/2017
0.6.5 C. Hong (QUB) Writing about VMM and AS 28/09/2017
0.7 F. Campos(ATOS) New integrated version 29/09/2017
0.7.1 R. Montella (UNP) Writing about GPU 29/09/2017
0.8 F. Campos(ATOS) New integrated version ready for final review 02/10/2017
0.8.1 I. Spence (QUB) Internal review 03/10/2017
0.8.2 D. Deyannis (FORTH) Internal review 04/10/2017
0.9 F. Campos(ATOS) New version with internal review comments 04/10/2017
0.9.1 F. Campos(ATOS) Reformat Scenarios structure. 05/10/2017
0.9.2 S. Kosta (UROME) Reworking about integration test 07/10/2017
0.9.3 C. Hong (QUB) Reworking about integration test 08/10/2017
1.0 F. Campos(ATOS) Final version 10/10/2017
Contents

1. Introduction
   1.1. Glossary of Acronyms
2. Final Implemented Architecture
   2.1. Resize issue
   2.2. SLA parameters and fulfilment
   2.3. Monitoring
3. Deployment
4. Verification
   4.1. Complementary Individual Test
        4.1.1 SLAM
        4.1.2 VMM
        4.1.3 AC
        4.1.4 AS
   4.2. Integration Test
        4.2.1 D2D offloading
        4.2.2 Registration process and CPU Offloading
        4.2.3 Offloading task to a GPU
        4.2.4 Enhancing the VM characteristics
        4.2.5 Task parallelization
        4.2.6 Task forwarding
        4.2.7 Multiple clients
5. Validation
   5.1. Scenario: Run Generic APP in RAPID infrastructure for the first time
   5.2. Scenario: Run Generic APP in RAPID infrastructure using an existing VM
   5.3. Scenario: Run Generic APP in RAPID infrastructure by resizing existing VM
   5.4. Detailed Scenario Steps
        5.4.1 User registration process
        5.4.2 D2D offloading to CPU
        5.4.3 Offloading task to a GPU
        5.4.4 Cloud Services with DS not available
        5.4.5 Cloud Services with SLAM not available
        5.4.6 Cloud Services with VMM not available
        5.4.7 AS not available
        5.4.8 Task parallelization
        5.4.9 Task forwarding
        5.4.10 Multiple clients
        5.4.11 Overhead in VM CPU usage
        5.4.12 Overhead in VM RAM usage
        5.4.13 Overhead in VM DISK usage
        5.4.14 Overhead in VM CPU, RAM and DISK usage all together
6. Conclusions
References
List of Figures
Figure 1. The final RAPID architecture
Figure 2. Final Sequence Diagram (Original: D5.4, Figure 9)
Figure 3. JUnit SLA-Enforcement
Figure 4. JUnit SLA-Repository I
Figure 5. JUnit SLA-Repository II
Figure 6. JUnit SLA-Services I
Figure 7. JUnit SLA-Services II
Figure 8. JUnit SLA-Tools
Figure 9. JUnit SLA-socket-parent
Figure 10. Screenshot of the RAPID demo app with the number of VMs set to 4, so the task can be executed in a distributed way on multiple VMs.
Figure 11. Screenshot of the RAPID demo app with the forwarding flag enabled, so the task will throw an error on the main VM and the main VM will forward the task to a more powerful VM.
Figure 12. Screenshot of the RAPID demo running on one phone and one emulator at the same time, showing that RAPID supports multiple clients in a transparent way.

List of Tables
Table 1: Parameters of the @QoS annotations in RAPID
Table 2: Example of QoS information exchange between AC and SLAM, in JSON
Table 3: Predefined VM characteristics
Table 4: OpenStack command metrics
Table 5: Ports required to be opened in RAPID private cloud's external network
Table 6: Ports required to be opened in RAPID GPU infrastructure
Table 7: List of flavours created
Executive Summary
This document reports the integration and validation activities on the integrated prototype of the
RAPID framework, consolidating functionalities already available in the integrated Acceleration
Client and Acceleration Server prototypes. Specifically, the document presents the integration and
verification tests on the integrated RAPID framework, with respect to the RAPID system
requirements. The main outcomes of the integration and validation activities include:
- The final RAPID architecture, which has resulted from updates and modifications performed on the initial version during the development, integration and validation activities, exploiting feedback from all involved processes and components. The document presents the issues which arose during development and integration, as well as the architectural modifications and component updates made to tackle them.
- The RAPID validation at both component and system level, which includes tests verifying the functionalities of individual components, as well as integration tests among components, verifying the RAPID functionalities on the integrated RAPID framework. A detailed description and analysis of the results is included for every test performed.
The integrated RAPID framework opens up great opportunities in the Internet of Things and mobile computing, allowing heavy computations (or simply computations heavier than a client device can sustain) to be offloaded to more capable infrastructure, i.e. more powerful devices or even the cloud. The great potential of the RAPID framework will be further demonstrated during its evaluation against the three compute-intensive applications of the RAPID project, namely the Biosurveillance, Antivirus and Kinect Hand Tracking applications.
1. Introduction
Cloud computing can be considered a key driver for innovation, as depicted in the European Cloud Computing Strategy [1], adopted in 2012 and included in the Digital Single Market strategy [2]. Since then, the cloud computing market has been growing continuously and is expected to grow, according to Bain & Company [3], from $280B in 2015 to $390B in 2020, at a Compound Annual Growth Rate (CAGR) of 17%. According to Research and Markets [4], the rise of the Internet of Things is changing the way customers and businesses interact with the physical world, translating into even faster growth, from $16.3B in 2016 to an expected $185.9B by 2023. At the same time, the globalization of technology and the proliferation of smart devices are driving mobile data to 20% of total internet traffic by 2020, representing 5.5B global mobile users [5]. This is the context in which RAPID operates.
To this end, RAPID provides a flexible framework, allowing the offloading of heavy computational
tasks, of either CPU or GPU load, to more capable devices (i.e. with higher computational or energy
resources).
The present document, the first deliverable of WP7, reflects the final RAPID architecture, since during the development phase some updates were required which were not initially evident. Moreover, the interactions between the components are also included, as well as the description of the new functionalities, along with instructions for deploying the RAPID framework. The deliverable is structured as follows.
Section 2 presents the final implemented architecture, which is the evolution of the architecture initially presented. Section 3 includes details on the deployment of the framework. Section 4 lists the individual tests for validating and verifying the new functionalities of the individual components, as well as the integration tests and their results. Section 5 presents the different validation scenarios used to test the functionality of the platform. Finally, Section 6 contains the conclusions of the entire process.
1.1. Glossary of Acronyms
Acronym Definition
AC Acceleration Client
AC-RM Acceleration Client in Remote Machine
API Application Programming Interface
AS Acceleration Server
AS-RM Acceleration Server in Remote Machine
CAGR Compound Annual Growth Rate
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
D Deliverable
DAO Data Access Object
DB Database
DFE Dispatch and Fetch Engine
DS Directory Server
GPGPU General-Purpose computation on Graphics Processing Units
GPU Graphics Processing Unit
JSON JavaScript Object Notation
QoS Quality of Service
SLA Service Level Agreement
SLAM Service Level Agreement Manager
VM Virtual Machine
VMM Virtual Machine Manager
VPN Virtual Private Network
WP Work Package
2. Final Implemented Architecture
The RAPID architecture was designed at the early stages of the project, as reported in D3.1 Specifications of System Components and Programming Model [6]; however, during the development phase some issues arose that needed to be solved. These issues required updates to the communication between the components of the RAPID framework. The resulting architecture is shown in Figure 1.
Figure 1. The final RAPID architecture
In the figure above, the arrows represent the directional communication between components, pointing from the component that initiates the communication to the receiving end.
The basic components and their functionality remain the same. The RAPID framework is mainly composed of the Acceleration Client (AC) running on the client (source) device, the Acceleration Server (AS) running on a remote host, and the infrastructural components, namely the Directory Server (DS), the SLA Manager (SLAM) and the Virtual Machine Manager (VMM). The main modifications applied to the RAPID architecture are presented in detail in the following subsections.
Section 2.1 presents the issues which arose while attempting to change the size of the VMs, explaining the modifications made to the process. During testing, it was detected that the RESIZE action interrupted the process launched by the user when a machine reboot occurred as a result of a resizing
action requested by the SLAM from the VMM. The machine is therefore restarted only after approval from the AS, avoiding an additional communication between the VMM and AS components. Likewise, we detected the need to add delays between the CHANGE and CONFIRM commands, in order for them to be correctly interpreted by OpenStack.
Section 2.2 presents the three QoS parameters implemented in RAPID, namely cpu, mem and disk, used to exchange QoS information, as well as the JSON format to be used in each case. If a monitored metric does not comply with the SLA condition, a predetermined resource-increase action is executed.
Section 2.3 details the OpenStack metrics used by the monitoring services behind the RAPID messages, covering the monitoring of the CPU, memory and disk usage percentages.
2.1. Resize issue
The resize action occurs when there is a violation of the service level agreement and consists of incrementing a resource: CPU, memory or disk. To do this, the SLAM sends two messages, SLAM_CHANGE_VMFLV_VMM and SLAM_CONFIRM_VMFLV_VMM, as explained in Section 3.2 of D5.4 [7].
As explained in D5.4, Section 4.3, Figure 9 [7], when an SLA violation is detected, the SLAM sends the SLAM_CHANGE_VMFLV_VMM message to the VMM to adjust the size of the VM that has caused the violation. The VMM then executes a corresponding OpenStack4j API call to resize the VM. After the resize request, OpenStack requires a confirmation action from the user in order to finalize the resize process. To satisfy this requirement, the SLAM sends the SLAM_CONFIRM_VMFLV_VMM message to the VMM after sending SLAM_CHANGE_VMFLV_VMM.
During our integration tests, we made two modifications to this process. First, as shown in Figure 9 of D5.4, the VMM originally sent the AS_RM_MIGRATION_VM message to the AS after receiving the confirmation message (i.e., SLAM_CONFIRM_VMFLV_VMM) from the SLAM. The AS then notified the client that the resources currently provided by the cloud would be unavailable during the resize process. This design assumed that OpenStack would stop providing resources only while finalizing the resize process with the confirmation action. However, we found that the VM resources already become unavailable to the client as soon as the VMM issues the resize call to OpenStack. Therefore, the VMM was modified to send AS_RM_MIGRATION_VM to the AS after receiving SLAM_CHANGE_VMFLV_VMM from the SLAM. After an OK message is received from the AS, the VMM calls the proper OpenStack4j API to resize the VM. Second, as displayed in Figure 9 of D5.4, the SLAM originally sent the SLAM_CONFIRM_VMFLV_VMM message to the VMM right after sending SLAM_CHANGE_VMFLV_VMM. During testing, we found that the resize process in OpenStack normally takes more than 30 seconds; therefore, if the SLAM sends SLAM_CONFIRM_VMFLV_VMM to the VMM right after SLAM_CHANGE_VMFLV_VMM, it will get an ERROR message from the VMM. To address this issue, the SLAM was changed to send SLAM_CONFIRM_VMFLV_VMM periodically to the VMM until it gets an OK message back. Figure 2 is the updated version of D5.4, Figure 9, including these two changes.
Figure 2. Final Sequence Diagram (Original: D5.4, Figure 9)
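The periodic confirmation introduced by the second modification can be sketched as a simple retry loop. This is a minimal illustration, not RAPID's actual SLAM code: the Vmm interface below stands in for the real socket communication with the VMM.

```java
// Simplified sketch of the SLAM-side confirmation loop: the CONFIRM message
// is retried until the VMM answers OK instead of ERROR.
public class ConfirmRetrySketch {

    interface Vmm {
        String send(String message); // returns "OK" or "ERROR"
    }

    // Retries SLAM_CONFIRM_VMFLV_VMM every intervalMs until the VMM replies
    // OK, giving up after maxTries attempts.
    static boolean confirmResize(Vmm vmm, long intervalMs, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            if ("OK".equals(vmm.send("SLAM_CONFIRM_VMFLV_VMM"))) {
                return true;
            }
            try {
                Thread.sleep(intervalMs); // the resize often takes >30 s
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mock VMM that reports ERROR twice while the resize is in progress.
        final int[] calls = {0};
        Vmm vmm = msg -> (++calls[0] < 3) ? "ERROR" : "OK";
        System.out.println(confirmResize(vmm, 10, 10) + " after " + calls[0] + " attempts");
    }
}
```

In RAPID the interval would be chosen with the observed 30-second resize duration in mind, so that the SLAM neither floods the VMM nor delays the confirmation unnecessarily.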
2.2. SLA parameters and fulfilment
In D3.1 [6], Section 7.1 and Section 7.6, we described that a @QoS annotation would be implemented in order to guarantee a minimum set of resources and conditions that the developer deems necessary for the proper execution of the remotely offloaded code. The @QoS Java annotation is used in conjunction with @Remote to declare the set of minimum requirements to be fulfilled by the host environment.
In D3.1 [6], Section 7.6, it was already described that each @QoS annotation has the following parameters:
- terms: the metric or aspect to be taken into account;
- operators: the operators to be applied to the defined thresholds, such as eq (equal), gt (greater than), lt (less than), etc.;
- thresholds: the values (percentages) of the terms that have to be fulfilled.
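As an illustration of this mechanism, the sketch below shows how such annotations might be declared and read back at runtime. The element names mirror the parameters just listed, but the definitions (and the helper qosOf) are assumptions for illustration; RAPID's actual @QoS and @Remote code may differ.

```java
// Illustrative sketch of the @QoS/@Remote pattern described above.
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class QoSSketch {

    // Declares the SLA terms, operators and thresholds for an offloaded method.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface QoS {
        String[] terms();
        String[] operators();
        int[] thresholds(); // percentages
    }

    // Stand-in for RAPID's @Remote marker.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Remote {
    }

    // Example: offload only to a host whose CPU utilization stays below 80%.
    @Remote
    @QoS(terms = {"cpu_util"}, operators = {"lt"}, thresholds = {80})
    public static int heavyComputation(int n) {
        int acc = 0;
        for (int i = 1; i <= n; i++) acc += i * i; // placeholder workload
        return acc;
    }

    // Reads the annotation back at runtime, as the AC could do before offloading.
    static QoS qosOf(String methodName) {
        try {
            return QoSSketch.class.getMethod(methodName, int.class)
                    .getAnnotation(QoS.class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        QoS qos = qosOf("heavyComputation");
        System.out.println(qos.terms()[0] + " " + qos.operators()[0]
                + " " + qos.thresholds()[0]); // prints: cpu_util lt 80
    }
}
```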
A set of options has been implemented in RAPID in order to exchange the QoS to be used, as listed in
the following table:
Table 1: Parameters of the @QoS annotations in RAPID
Term Meaning Operator Threshold
cpu_util Maximum percentage of CPU that can be used. For example, cpu_util LT 60 means that at most 60% of the CPU should be busy; if usage rises above that, the number of cores of the machine should be increased. LT percentage
mem_util Maximum percentage of memory that can be used. For example, mem_util LT 60 means that at most 60% of the memory should be busy; if usage rises above that, the machine memory should be increased. LT percentage
disk_util Maximum percentage of disk that can be used. For example, disk_util LT 60 means that at most 60% of the disk should be occupied; if usage rises above that, the machine disk should be increased. LT percentage
The format used to exchange the QoS information between the AC and the SLAM is described in D5.3 [8]. This information is exchanged in JSON format; minor changes have been applied to the attributes described in D5.3: the “variable” is now called “term”, the “condition” is now “operator” and the “value” is now “threshold”. The QoS information represents the conditions under which correct behavior of the system can be assumed.
Table 2 presents a possible JSON snippet exchanged between the AC and the SLAM. In this example, we want the physical machine hosting the VM of interest to be using a maximum of 80% of the CPU and a maximum of 60% of the available RAM. If any condition of the QoS attributes is not fulfilled, then the resources have to be increased.
Table 2: Example of QoS information exchange between AC and SLAM, in JSON
Example QoS exchange between AC and SLAM
{
  "QoS":[
    { "term":"cpu_util", "operator":"lt", "threshold":80 },
    { "term":"mem_util", "operator":"lt", "threshold":60 }
  ]
}
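For illustration, the message of Table 2 can be assembled as follows. The helper names here are hypothetical; the real AC may rely on a JSON library instead of manual string building.

```java
// Illustrative sketch of assembling the AC-to-SLAM QoS message of Table 2.
import java.util.LinkedHashMap;
import java.util.Map;

public class QoSJsonSketch {

    // Serializes one {term, operator, threshold} triple as a JSON object.
    static String qosEntry(String term, String operator, int threshold) {
        return String.format(
                "{\"term\":\"%s\",\"operator\":\"%s\",\"threshold\":%d}",
                term, operator, threshold);
    }

    // Wraps the entries in the top-level {"QoS":[...]} envelope, using the
    // lt operator for every term, as in Table 1.
    static String qosMessage(Map<String, Integer> maxUtil) {
        StringBuilder sb = new StringBuilder("{\"QoS\":[");
        boolean first = true;
        for (Map.Entry<String, Integer> e : maxUtil.entrySet()) {
            if (!first) sb.append(',');
            sb.append(qosEntry(e.getKey(), "lt", e.getValue()));
            first = false;
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> limits = new LinkedHashMap<>();
        limits.put("cpu_util", 80); // at most 80% of the CPU busy
        limits.put("mem_util", 60); // at most 60% of the RAM busy
        System.out.println(qosMessage(limits));
    }
}
```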
QoS monitoring is performed every minute. To achieve this, the OpenStack Ceilometer polling interval has been decreased to 10 seconds, so that the SLA evaluation works with near real-time information [9].
When a QoS violation occurs, depending on the parameters that have been set (memory, cpu or disk), the machine may be resized. The machine cannot be enlarged without limit: the maximum CPU has been configured to 4 cores, the maximum RAM to 4 GB and the maximum disk to 40 GB.
Once a parameter cannot be increased any more, a log trace will be created with the appropriate information. In a production environment, an e-mail could be sent, giving information about the issue.
Table 3 below displays the possible values of CPU, RAM and disk within a flavour, for our predefined set of virtual machines.
Table 3: Predefined VM characteristics
Resource Value
CPU #cores 1, 2, 4
RAM in MB 1024, 2048, 4096
DISK in GB 20, 40
The system will start with the lowest flavour and will increase resources depending on the QoS term that is producing the violation. OpenStack does not allow changing a VM to a flavour in which two parameters change at once, so if two violations occur at the same time, the changes are prioritized: first CPU, then RAM, then disk size.
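The escalation policy just described can be sketched as follows, using the step values of Table 3. Method and constant names are illustrative, not RAPID's actual code: one resource is stepped up per resize, prioritising CPU over RAM over disk, within the configured caps.

```java
// Sketch of the single-step flavour escalation with CPU > RAM > DISK priority.
import java.util.Arrays;

public class FlavourEscalation {

    static final int[] CPU_STEPS  = {1, 2, 4};          // cores (Table 3)
    static final int[] RAM_STEPS  = {1024, 2048, 4096}; // MB
    static final int[] DISK_STEPS = {20, 40};           // GB

    // Returns the next allowed value, or the current one if already at the cap.
    static int next(int[] steps, int current) {
        int i = Arrays.binarySearch(steps, current);
        return (i >= 0 && i < steps.length - 1) ? steps[i + 1] : current;
    }

    // Returns {cpu, ram, disk} after one resize, applying the CPU > RAM > DISK
    // priority when several QoS terms are violated at once.
    static int[] escalate(int cpu, int ram, int disk,
                          boolean cpuViol, boolean ramViol, boolean diskViol) {
        if (cpuViol && next(CPU_STEPS, cpu) != cpu)
            return new int[]{next(CPU_STEPS, cpu), ram, disk};
        if (ramViol && next(RAM_STEPS, ram) != ram)
            return new int[]{cpu, next(RAM_STEPS, ram), disk};
        if (diskViol && next(DISK_STEPS, disk) != disk)
            return new int[]{cpu, ram, next(DISK_STEPS, disk)};
        return new int[]{cpu, ram, disk}; // at the cap: RAPID only logs here
    }

    public static void main(String[] args) {
        // CPU and RAM violated together: CPU wins, RAM waits for a later resize.
        System.out.println(Arrays.toString(
                escalate(1, 1024, 20, true, true, false))); // prints [2, 1024, 20]
    }
}
```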
2.3. Monitoring
In order to realize the QoS monitoring described in the previous section, the VMM has been adapted to return metrics collected by the underlying OpenStack. OpenStack's Ceilometer component has been used for this purpose and has been configured to pull data every 10 seconds, in order to achieve almost real-time monitoring, as explained in Section 2.2.
The SLAM monitoring has been configured to check the QoS, and therefore pull the metrics information from the VMM, every minute. The metrics retrieved from OpenStack are presented in Table 4.
Table 4: OpenStack command metrics
RAPID Message OS command Unit Note RAPID Rule
SLAM_GET_VMCPU_VMM cpu_util % Average CPU utilization % cpu
SLAM_GET_VMMEM_VMM memory MB Volume of RAM allocated to the instance % mem_use
SLAM_GET_VMMEM_VMM memory.usage MB Volume of RAM used by the instance from the amount of its allocated memory % mem_use
SLAM_GET_VMDISK_VMM disk.capacity B The amount of disk that the instance can see % disk_use
SLAM_GET_VMDISK_VMM disk.allocation B The amount of disk occupied by the instance on the host machine % disk_use
In a real environment, the client would be offloading several tasks, and the VM metrics would be retrieved from OpenStack over periods longer than one minute. The monitoring polling frequency has been selected so that the processing overhead is kept low, while VM size changes can still be reflected in OpenStack.
As commented in the previous section, QoS can be set within RAPID for CPU, memory and disk. Three metrics were presented in Section 2.2 (cpu_util, mem_util and disk_util) to obtain these three values. Moreover, three constants have been defined to retrieve the values, namely RapidMessages.SLAM_GET_VMCPU_VMM, RapidMessages.SLAM_GET_VMMEM_VMM and RapidMessages.SLAM_GET_VMDISK_VMM. With the first message, the percentage of CPU currently in use is received.
For memory and disk, two values are received: one expressing the maximum memory or disk capacity, and another the amount of memory or disk currently occupied. The corresponding percentage is calculated by the SLAM in order to evaluate the QoS.
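A minimal sketch of how the SLAM could derive the evaluated percentages from the two values returned for memory and disk in Table 4 (the actual SLAM implementation may differ):

```java
// Deriving mem_util and disk_util percentages from the paired Table 4 metrics.
public class UtilPercent {

    // memory (MB allocated) and memory.usage (MB used) -> mem_util %
    static double memUtil(double allocatedMb, double usedMb) {
        return 100.0 * usedMb / allocatedMb;
    }

    // disk.capacity (B visible) and disk.allocation (B occupied) -> disk_util %
    static double diskUtil(double capacityB, double allocatedB) {
        return 100.0 * allocatedB / capacityB;
    }

    public static void main(String[] args) {
        // A mem_util LT 60 QoS term would be violated at 75% usage:
        System.out.println(memUtil(2048, 1536)); // prints 75.0
        System.out.println(diskUtil(20e9, 5e9)); // prints 25.0
    }
}
```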
3. Deployment
Figure 1 presented the updated architecture of RAPID, including information about the devices/machines hosting the components.
The architecture includes the infrastructural components, i.e. the SLA Manager, the Directory Server and the VM Manager. All three components have been installed in the RAPID private cloud and have to be accessible from the other components (such as the Acceleration Client); therefore, they must have fixed IP addresses. All three components receive communication from outside the RAPID private cloud; therefore, they must have some ports open in the firewall, as shown in Table 5.
Table 5: Ports required to be opened in RAPID private cloud’s external network
Port number Reason
9002 Communication with the SLA Manager
9001 Communication with the Directory Server
9000 Communication with the VM Manager
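Assuming the RAPID private cloud manages its firewall through OpenStack security groups, rules for the ports of Table 5 could be created along these lines; the security group name rapid-infra is purely illustrative.

```shell
# Illustrative only: open the three infrastructure ports of Table 5 in an
# OpenStack security group (the group name "rapid-infra" is an assumption).
for port in 9000 9001 9002; do
  openstack security group rule create --protocol tcp \
    --dst-port "$port" --remote-ip 0.0.0.0/0 rapid-infra
done
```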
For the installation of the SLA Manager, please refer to D5.3 [8], Section 6.
For the installation of the Directory Server, please refer to D5.4 [7], Section 5.
For the installation of the VM Manager, please refer to D5.4 [7], Section 5.
The Acceleration Client runs on the low-power device or on any intermediate component that might request task offloading to a more powerful device. The Acceleration Client is the one that initiates the communication with the rest of the components; no specific socket is bound to the Acceleration Client component. More information about the Acceleration Client and its installation can be found in RAPID deliverable D4.5 [10].
The Acceleration Server runs on a virtual machine hosted by a physical machine in the RAPID private cloud infrastructure. This requires opening the appropriate port in the firewall in order to make it accessible to the Acceleration Client. The VM needs to communicate with RAPID's infrastructural components several times; therefore, they must be accessible from the created VM.
The AS is included in the image from which the VM is created, so no manual process is required to install it. However, the AS uses two ports to listen for connections, 4323 and 5323: the first is used for the plain (unencrypted) connection between the AC and the AS, and the second for the SSL connection. As such, the physical machine or the cloud running the VMs should be configured to allow connections towards these ports.
New virtual machines can be created and executed in the RAPID private cloud. UNP
provides access to its GPU infrastructure, where the GVirtuS backend is running for the purpose
of executing CUDA code, in order to perform the integration test.
Table 6: Ports required to be opened in RAPID GPU infrastructure
IP Port
193.205.230.23 9991
D7.1 System Integration and Validation
Page 16 of 62
This document is Public, and was produced under the RAPID project (EC contract 644312).
In case another environment has to be used to execute CUDA code, the GVirtuS backend must be running
on the physical machine. For information about installing the GVirtuS frontend, please refer to D6.3,
Section 4 [11].
4. Verification
Several integration verification tests have already been described in various RAPID deliverables. In
D5.1 [12], the integration tests between SLAM, AS and DS have already been described, while in
D5.2 [13] the interaction of the DS with other components including clients, VMM, VM and AS is
thoroughly presented. In D5.3 [8] the integration tests verifying the QoS support have been presented.
D4.5 [10] describes in detail how the AC could be tested against the RAPID framework. The main
RAPID offloading capabilities of Java methods, C/C++ functions and CUDA code were verified via
three representative applications for both Linux and Windows versions of the AC:
- The N-Queens puzzle, which is used to test the CPU code offloading. We also use this one to test the parallelization.
- A simple Hello World code, to test the offloading of CPU code that embeds native (C/C++) code.
- A simple matrix multiplication that is used to test the GPU CUDA code offloading.
These applications were also used to verify the functionalities of the integrated RAPID framework.
4.1. Complementary Individual Test
In this section, individual tests verifying the newly implemented functionalities are presented.
4.1.1 SLAM
The SLAM is the component that ensures the QoS of each client in RAPID, as detailed in D5.4 [7].
SLAM comprises two basic modules: the SLA-Core and the SLA-socket-parent.
4.1.1.1. SLA-Core
This component provides the REST interface for SLAM and follows these conventions:
- Every entity is created with a POST request to the collection URL (a collection is the set of URLs available in the SLA-Core).
- A query for an individual item is a GET request to the URL of the resource (collection URL + external id).
- Any other query is usually a GET request to the collection's URL, using the GET request parameters as the query parameters.
- Any unexpected error while processing the request returns a 5XX error code.
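The URL conventions above can be sketched as follows (the base URL, entity names and helper methods are hypothetical, not taken from the actual SLA-Core code):

```java
import java.util.Collections;
import java.util.Map;

// Illustrative only: builds URLs following the SLA-Core REST conventions
// described above. The base URL and entity names are hypothetical.
public class SlaCoreUrls {

    static final String BASE = "http://slam.example.org:9002";

    // POST target: the collection URL, e.g. /agreements
    public static String collectionUrl(String collection) {
        return BASE + "/" + collection;
    }

    // GET target for one item: collection URL + external id
    public static String itemUrl(String collection, String externalId) {
        return collectionUrl(collection) + "/" + externalId;
    }

    // Any other query: GET on the collection URL with query parameters
    public static String queryUrl(String collection, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(collectionUrl(collection));
        String sep = "?";
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = "&";
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(itemUrl("agreements", "agreement-07"));
        System.out.println(queryUrl("violations",
                Collections.singletonMap("agreementId", "agreement-07")));
    }
}
```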
SLA-Core is composed of the following entities, which are presented in the following subsections:
- SLA_enforcement
- SLA_repository
- SLA_services
- SLA_tools
4.1.1.1.1. SLA_enforcement
An enforcement job is the entity which starts the enforcement of the agreement guarantee terms. An
agreement can be enforced only if an enforcement job, linked with it, has been previously created and
started. An enforcement job is automatically created and started when an agreement is created, so there
is no need to create one separately in order to start an enforcement.
The enforcement is the entity by which it is evaluated whether the provider complies with an agreement,
i.e. whether the measured metrics for the variables in the guarantee terms fulfil the constraints, as shown
in Figure 3 (JUnit SLA-Enforcement).
Purpose Enforces an agreement and stores the results in the repository
Test Package eu.atos.sla.enforcement.EnforcementServiceTest
eu.atos.sla.enforcement.AgreementEnforcementTest
Test
Description
This class retrieves all the metrics prior to the start of the evaluation, at once if the metricsRetriever
implements the IMetricsRetrieverV2 interface. If not, it falls back to IMetricsRetriever, calling the monitoring
once per metric type.
The properties to set are:
- agreementEvaluator: in-memory evaluation of the agreement.
- metricsRetriever: IMetricsRetriever implementer that retrieves the new metrics to evaluate from the monitoring.
- constraintEvaluator: parses service levels and evaluates whether new metrics fulfil them.
- maxRetrievedResults: maximum number of values to retrieve for each metric. It has a default value of MAX_RETRIEVED_RESULTS.
Expected
Results
Compliance with the agreement is evaluated and the result is saved in the DB
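The retriever fallback described in the test can be sketched as follows (the interfaces and method signatures here are simplified stand-ins for the actual eu.atos.sla interfaces):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the fallback described above: retrieve all metrics at once when
// the retriever implements the V2 interface, otherwise query the monitoring
// once per metric type. Interface and method names are simplified.
public class RetrieverFallbackSketch {

    interface IMetricsRetriever {
        List<Double> getMetric(String metricType);
    }

    interface IMetricsRetrieverV2 extends IMetricsRetriever {
        Map<String, List<Double>> getAllMetrics(List<String> metricTypes);
    }

    public static Map<String, List<Double>> retrieve(IMetricsRetriever r, List<String> types) {
        if (r instanceof IMetricsRetrieverV2) {
            // single call to the monitoring system
            return ((IMetricsRetrieverV2) r).getAllMetrics(types);
        }
        // legacy path: one monitoring call per metric type
        Map<String, List<Double>> out = new HashMap<>();
        for (String t : types) {
            out.put(t, r.getMetric(t));
        }
        return out;
    }

    public static void main(String[] args) {
        IMetricsRetriever legacy = type -> Arrays.asList(type.equals("cpu_util") ? 42.0 : 10.0);
        Map<String, List<Double>> m = retrieve(legacy, Arrays.asList("cpu_util", "mem_util"));
        System.out.println(m.get("cpu_util"));
    }
}
```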
Purpose Test guarantee evaluation process
Test Package eu.atos.sla.evaluation.guarantee.SimpleBusinessValuesEvaluatorTest
eu.atos.sla.evaluation.guarantee.PoliciedServiceLevelEvaluatorTest
eu.atos.sla.evaluation.guarantee.GuaranteeTermEvaluatorTest
eu.atos.sla.evaluation.constraint.simple.OperatorTest
eu.atos.sla.evaluation.constraint.simple.SimpleConstraintParserTest
eu.atos.sla.evaluation.constraint.simple.SimpleValidatorsIterTest
Test
Description
A BusinessValuesEvaluator that raises a penalty if the existing number of violations matches the
count in the penalty definition and they occur within the time interval defined in the penalty definition.
Implements a ServiceLevelEvaluator that takes Policies into account.
In a policy, a service level not fulfilled by a metric is considered a breach. A policy specifies how many
breaches must occur in an interval of time to raise a violation.
If no policies are defined for the guarantee term, each breach is a violation. Otherwise, at most one violation
will be raised (if applicable) in each execution. Therefore, to avoid having breaches not considered as
violations, the policy interval should be greater than the evaluation interval.
The breaches management (load, store) is performed entirely in this class and can therefore be considered
a side effect. The advantage is that this way the interface for the upper levels is cleaner (GuaranteeTermEvaluator and AgreementEvaluator do not know about breaches).
A GuaranteeTermEvaluator performs the evaluation of a guarantee term, consisting of:
- A service level evaluation, assessing which metrics are violations.
- A business evaluation, assessing what penalties are derived from the raised violations.
Expected
Results
Compliance with the guarantee terms and policies.
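The breach-counting rule behind the policied service level evaluation could look roughly like this (a minimal sketch with our own method and parameter names; the real evaluator also loads and stores breaches):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the policy rule described above: a violation is raised only when
// at least `count` breaches occur within a time window of `intervalSec`
// seconds. Names are illustrative, not the actual SLA-Core data model.
public class PolicySketch {

    // breachTimes: timestamps (seconds) of breaches for one guarantee term,
    // sorted ascending. With count = 1 every breach raises a violation,
    // matching the no-policy case described in the text.
    public static boolean raisesViolation(List<Long> breachTimes, int count, long intervalSec) {
        for (int i = 0; i + count - 1 < breachTimes.size(); i++) {
            long span = breachTimes.get(i + count - 1) - breachTimes.get(i);
            if (span <= intervalSec) {
                return true; // `count` breaches fell inside one interval
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Long> breaches = Arrays.asList(0L, 30L, 50L, 400L);
        // policy: 3 breaches within 60 seconds raise a violation
        System.out.println(raisesViolation(breaches, 3, 60)); // true (0, 30, 50)
        System.out.println(raisesViolation(breaches, 4, 60)); // false
    }
}
```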
Purpose Evaluates an agreement, obtaining QoS violations and penalties.
Test Package eu.atos.sla.evaluation.AgreementEvaluatorTest
Test
Description
The process:
- Check which metrics do not fulfil the service levels (breaches).
- Check and raise the violations according to the found breaches and the policies (if any) of the service levels.
- Check and raise the compensations (business violations) that are derived from the raised violations.
The result is a map that contains, for each guarantee term, the list of violations and compensations that were detected.
There are two possible inputs:
- metrics (monitoring provides raw data)
- violations (smart monitoring that provides violations)
Expected
Results
Agreements are evaluated and the last status is saved in the DB.
Figure 3. JUnit SLA-Enforcement
4.1.1.1.2. SLA_repository
The SLA_repository is the component interacting with the MySQL database. An interface is provided to
save, update, delete and query the database entities with respect to violation, enforcement, guarantee,
provider, template and agreement, as presented in Figure 4 (JUnit SLA-Repository I) and Figure 5
(JUnit SLA-Repository II).
Purpose Test the interface with the DB repository
Test Package eu.atos.sla.service.jpa.PolicyDAOJpaTest
eu.atos.sla.service.jpa.BreachDAOJpaTest
eu.atos.sla.service.jpa.PenaltyDAOJpaTest
eu.atos.sla.service.jpa.ViolationDAOJpaTest
eu.atos.sla.service.jpa.EnforcementDAOJpaTest
eu.atos.sla.service.jpa.ProviderDAOJpaTest
eu.atos.sla.service.jpa.TemplateDAOJpaTest
eu.atos.sla.service.jpa.AgreementDAOJpaTest
eu.atos.sla.service.jpa.GuaranteeTermDAOJpaTest
eu.atos.sla.datamodel.PolicyTest
eu.atos.sla.datamodel.BreachTest
eu.atos.sla.datamodel.GuaranteeTest
eu.atos.sla.datamodel.AgreementTest
eu.atos.sla.datamodel.TemplateTest
Test
Description
A data model object storing information for the following objects: Policy, Breach, Guarantee, Agreement and Template.
A DAO interface to access the information of the following objects: Policy, Breach, Penalty, Violation, Enforcement, Provider, Template, Agreement, GuaranteeTerm.
Expected
Results
The objects described above support insert, update and delete operations in the DB repository.
Figure 4. JUnit SLA-Repository I
Figure 5. JUnit SLA-Repository II
4.1.1.1.3. SLA_services
The SLA_services is the component interacting with the REST services. It is composed of the
following entities, as shown in Figure 6 (JUnit SLA-Services I) and Figure 7 (JUnit SLA-Services II):
- Provider: used for the registration of the SLA service provider. The default value is "Rapid".
- Template/Agreement: both are generated from the QoS and contain the rules of the service level agreement. The monitoring process starts with the activation of the agreement.
- Violation: triggered upon detection of a violation of an agreement.
- Guarantee: the level of service guaranteed, or the rules that must be complied with.
Purpose Test the SLA services exposed through REST interfaces
Test Package eu.atos.sla.service.rest.business.TemplateRestServiceTest
eu.atos.sla.service.rest.business.AgreementRestServiceTest
eu.atos.sla.service.rest.business.ProviderRestServiceTest
eu.atos.sla.service.rest.ProviderRestTest
eu.atos.sla.service.rest.AgreementRestTest
eu.atos.sla.service.rest.ViolationRestTest
eu.atos.sla.service.rest.EnforcementJobRestTest
eu.atos.sla.service.rest.TemplateRestTest
eu.atos.sla.util.ModelConversionTest
eu.atos.sla.service.rest.helpers.AgreementHelperETest
Test
Description
A REST service that exposes all the stored information of the SLA core for the following services:
Template, Agreement, Provider, Violation and Enforcement.
A Model Converter translates objects between the data model and the service model and vice versa.
Regarding templates and agreements, it is intended to translate between WSAG and the data model, but
nothing prevents using a ModelConverter that translates from a different service model.
Expected
Results
The REST service supports create, update, delete, start and stop service operations.
Figure 6. JUnit SLA-Services I
Figure 7. JUnit SLA-Services II
4.1.1.1.4. SLA_tools
This component provides the tools to convert between message formats.
Functions have been built to facilitate conversion between the JSON and XML formats for the
"Agreement" and "Template" entities, as shown in Figure 8 (JUnit SLA-Tools).
Purpose Test tools to transform the entity model to XML or JSON
Test Package eu.atos.sla.parser.xml.TestXMLParser
eu.atos.sla.parser.json.TestJSONParser
Test
Description
Provides generic tools for converting between the formats required by the SLAM for the following
entities: Template and Agreement.
Expected
Results
Entity model objects are converted to XML or JSON
Figure 8. JUnit SLA-Tools
4.1.1.2. SLA-socket-parent
This component is responsible for receiving and sending requests via sockets, and interacts with the
VMM and the SLA-Core, as shown in Figure 9 (JUnit SLA-socket-parent):
Purpose Test socket interface to communicate with SLAM, VMM and DS components
Test Package eu.atos.sla.core.SLAMRegisterTest
eu.atos.sla.core.MainSLAMTest
eu.atos.sla.core.ThreadPoolServerTest
eu.atos.sla.core.WorkerRunnableTest
Test
Description
This tests the reception over the socket of the registration request from the AC, and the subsequent interaction between the SLAM and the VMM. It also tests the actions related to the increase of resources
when a violation of the guarantees occurs.
Expected
Results
The correct operation of the socket-based interfaces is verified
Figure 9. JUnit SLA-socket-parent
4.1.2 VMM
This component is responsible for the management of VMs. It performs operations such as creating,
resizing or monitoring the resource usage of a VM through an interface with the OpenStack API.
4.1.2.1. Notify resize action to AC
Purpose The purpose of this test is to send the AS_RM_MIGRATION_VM message to the AS to inform the AS that VM resources will be unavailable for the VM resize processing time.
External
Dependencies Ensure that the AS and the client are online. After the AS receives the
AS_RM_MIGRATION_VM message, it will communicate with the client.
Instead of using the real SLAM, the DummyComponent, which is included in the VMM for tests, is used to generate messages on behalf of the SLAM.
Test
Description
Step Description Component
Interaction
1 The DummyComponent sends the SLAM_CHANGE_VMFLV_VMM message to the
VMM to trigger this test.
DummyComponent-VMM
2 The VMM sends the AS_RM_MIGRATION_VM message to the AS.
VMM-AS
Expected
Results
It is expected that the VMM will receive an OK message from the AS. Otherwise, the resize
process will not continue.
4.1.3 AC
The Acceleration Client (AC) is a component that runs inside the mobile device and offloads its
remoteable tasks to cloud infrastructures in order to increase performance and decrease overall power
consumption.
4.1.3.1. Testing the Offloading Decision
Purpose The purpose of this test is to check that the AC correctly decides the execution location of a
method.
External
Dependencies Set up an ad-hoc local connection between a device and a VM. For this test, there is no
need to use the entire RAPID infrastructure. We run the VM on VirtualBox, and the
AC is embedded in an application running on a device connected locally to the VM.
The network quality between the device and the VM is controlled in order to observe what decision the AC makes when the connection degrades.
Test
Description
Step Description Component
Interaction
1 Run the same task multiple times (100 times) on the
device.
Application
2 The AC takes the decision to offload or to run the
task locally on the device.
AC-AS
3 Change the network quality between the AC and the
VM.
Wi-Fi
4 Repeat steps 1-3 until the network quality is bad
enough to simulate a very low-quality connection.
Expected
Results
When the network connection between the AC and the AS is good and the task is computationally intensive, it is expected that the AC will offload the task to the VM.
When the network connection degrades, the AC will decide to run the task locally on the phone, since the task transfer and result receiving times will be quite high.
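The expected behaviour can be illustrated with a toy decision rule (the real AC bases its decision on runtime profiling; the cost model, parameter names and threshold here are our own simplification):

```java
// Illustrative decision rule only: the real AC uses profiling data collected
// at runtime; the estimates used here are hypothetical.
public class OffloadDecisionSketch {

    // Offload when the estimated remote time (execution plus transferring the
    // input and receiving the result over the current connection) beats the
    // estimated local execution time.
    public static String decide(double localExecMs, double remoteExecMs,
                                double payloadKb, double throughputKbPerMs) {
        double transferMs = payloadKb / throughputKbPerMs;
        double remoteTotalMs = remoteExecMs + transferMs;
        return remoteTotalMs < localExecMs ? "OFFLOAD" : "LOCAL";
    }

    public static void main(String[] args) {
        // good Wi-Fi: transfer is cheap, so offloading a heavy task wins
        System.out.println(decide(2000, 200, 500, 5));   // OFFLOAD
        // degraded link: transfer dominates, so the task runs locally
        System.out.println(decide(2000, 200, 500, 0.1)); // LOCAL
    }
}
```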
4.1.4 AS
The Acceleration Server (AS) runs inside a VM and is responsible for executing the offloaded task.
4.1.4.1. Stop accepting new tasks while VM resize is occurring
Purpose The purpose of this test is to verify that, while a VM upgrade is being performed (see
Section 4.2.4), the AS does not accept new tasks.
External
Dependencies All RAPID components, DS, SLAM, VMM, should be up and running.
The client device establishes QoS metrics with the RAPID infrastructure.
A QoS parameter needs to be exceeded while running a task on the VM.
The SLAM should be able to monitor the resource utilization of the VMs.
Test
Description
A device offloads a task to the AS on the VM. The task starts consuming too many resources,
such as CPU. The SLAM detects a QoS violation and triggers the VM resize process (see Section 4.2.4).
Expected
Results
The VMM informs the AS that the VM will be shut down and resized. From that moment, the
AS should not accept more tasks from the client.
4.2. Integration Test
Different test cases have been designed in order to run the RAPID integrated tests. In the next
subsections, we describe the parts of the integrated test. They are not completely independent, as there is a
registration process, described in Section 4.2.2, that is needed afterwards in other tests. References are
included within the descriptions.
4.2.1 D2D offloading
Purpose The purpose of this test is to offload from a mobile device with very low power capabilities to a device with higher computational capabilities.
External
Dependencies The source mobile device should have low power capabilities.
The low power device should know the IP of the high-power device. To achieve this,
all mobile devices willing to participate in the D2D offloading send periodic HELLO messages in broadcast using User Datagram Protocol (UDP) packets. When sending
these messages, they also embed information about their resources. Devices that capture the messages can obtain the IP of the sender and its power capabilities.
Test
Description
Step Description Component
Interaction
1 Low power device tries to execute and requests the offload to the next device (More powerful
smartphone).
AC low power – AS higher power
2 The next device can in turn try to offload the execution again
(see the test in Section 4.2.2)
Expected
Results
Task execution in the high-power device
State OK
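The HELLO discovery messages described in the dependencies could use a payload like the following (the actual RAPID wire format is not specified here, so this "HELLO;ip;cores;ram" encoding is purely illustrative):

```java
// Hypothetical wire format for the periodic HELLO broadcast described above.
// A device broadcasts its IP and resource information; receivers parse the
// payload to learn the sender's power capabilities.
public class HelloMessageSketch {

    public static String encode(String ip, int cpuCores, int ramMb) {
        return "HELLO;" + ip + ";" + cpuCores + ";" + ramMb;
    }

    // Returns {ip, cores, ram} extracted from a received datagram payload,
    // or null if the packet is not a HELLO message.
    public static String[] parse(String payload) {
        String[] parts = payload.split(";");
        if (parts.length != 4 || !parts[0].equals("HELLO")) {
            return null;
        }
        return new String[] {parts[1], parts[2], parts[3]};
    }

    public static void main(String[] args) {
        String msg = encode("192.168.1.17", 8, 4096);
        String[] info = parse(msg);
        System.out.println("sender " + info[0] + " has " + info[1] + " cores, "
                + info[2] + " MB RAM");
    }
}
```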
4.2.2 Registration process and CPU Offloading
Purpose A client that wants to use the services provided by the RAPID system, installed in a public or
private cloud, has to register into the system.
External
Dependencies Connectivity must be ensured between AC and SLAM.
Test
Description
Step Description Component
Interaction
1 When SLAM gets the first call from the AC, it receives the QoS as described in Section 2.2. A new
SLA is created associated to the client and with the QoS specified.
AC - SLAM
2 A virtual machine is created for the AC. The parameter received in the QoS (gpu_requested) has
to be forwarded to the VMM so it can create the VM in a physical machine with a GPU
SLAM - VMM
3 The new VM created has to register itself in the DS VM - DS
4 AC receives the information of the VM (IP) created
for the client
VM-AC
5 AC offloads the tasks to the new VM (when
gpu_requested enabled test from Section 4.2.3 has to run in parallel)
AC-VM
6 Test from Section 4.2.4, Section 4.2.5 and Section 4.2.6 can be executed in parallel
Expected
Results
Task is executed in VM in CPU
State OK
4.2.3 Offloading task to a GPU
Purpose A client wants to use the services provided by RAPID; in this case the task will be executed on the GPU.
External
Dependencies Prior to this part of the test, the registration process described in Section
4.2.2 must have taken place. Also, a pending task in the client device must have been selected for offloading.
In order to execute this test, the matrix multiplication algorithm is used as test code.
Test
Description
Step Description Component
Interaction
1 AC-DFE offloads code to AS AC - AS
2 AS-GPU Bridger offloads the code to the GVirtuS
Backend
AS – GVirtuS
Backend
3 Once the code has been executed, results are returned. GVirtuS Backend -
AS
4 Execution results are returned. AS - AC
Expected
Results
Task is executed in remote VM with GPU.
State OK
4.2.4 Enhancing the VM characteristics
Purpose The virtual machine which will be used by the AC has to be created with the characteristics specified in the agreement.
A violation occurs when a task executed for the AC requires more resources than those established, making it necessary to increase the capacity of the VM.
The purpose of this test is to increase the VM resources (CPU, RAM, DISK) when an SLA violation occurs.
Specifically, the VM characteristics are enhanced as follows: the number of cores
is increased when the VM CPU usage is too high, the RAM is increased
when the RAM usage is too high, or the disk space is increased based on the disk usage.
QoS checks are performed periodically, currently, every minute. Due to the characteristics of
the tasks that could be potentially offloaded, a very low frequency of QoS checks, such as
once per hour, would not be useful, because the client might be using the machine for less
than that time. Similarly, a very high frequency, such as once per second, might cause
overload in the system.
External
Dependencies Prior to this part of the test, the registration process
described in Section 4.2.2 must have taken place.
Also, to increment the VM resources it is necessary that an agreement violation is produced.
Within the QoS we must have some values that we want to have guaranteed; Section 2.2 describes how these QoS are specified. The SLAM will be
monitoring the VM created for a specific client and will check that the expected QoS are fulfilled.
Test
Description
Step Description Component
Interaction
1 The SLAM requests the metrics from the VMM. The metrics may refer to CPU, memory or disk, depending on the term to
guarantee in the SLA.
SLAM – VMM
2 When a violation is detected the process of upgrading a VM
characteristic will be initiated.
SLAM
3 SLAM decides and requests from VMM to change the VM
flavour due to the violation
SLAM-VMM
4 The VMM informs the AS that a change in the VM has been
requested
VMM - AS
5 Prior to executing the change in the VM, the AS has to inform
the AC that a change will happen
AS - AC
6 Confirmation that the AC accepts the change AC - AS
7 The VMM is notified that the change is accepted AS - VMM
8 The VMM executes the change on the VM VMM - VM
9 VM is restarted, this implies that AS will restart and register
itself in DS
AS-DS
10 AC will retry periodically to re-register with the AS. If
the AS is not ready yet, the AC will simply run everything locally on the phone.
AC-AS
Expected
Results
VM CPU, RAM or DISK increased.
State OK
4.2.5 Task parallelization
Purpose Some methods can be executed on multiple VMs in parallel in a map-reduce architecture. To
be able to perform the parallelization, the developer should take care of handling the
partitioning of the computation and the reduction of the partial results, using concepts similar
to map-reduce. When a method is annotated as parallelizable, the AS will ask the DS for the
IPs of helper machines. The helper machines are VMs that have been previously created in
the same physical machine as the VM assigned to the client, in the best-case scenario, but can
also be located in other physical machines. The AS will offload the code to the helper
machines and the task execution will be performed in parallel. When all VMs have finished
their executions, they return the partial results to the initial VM, which executes the reduce
function implemented by the developer to create the final result.
External
Dependencies Prior to having this part of the test, the registration process described in Section 4.2.2
must have taken place. Also, a task must have been decided to be offloaded to a remote machine.
The present test can run in parallel with the test described in Section 4.2.3 and
Section 4.2.4. The algorithm from the N-Queens puzzle has been adapted to run with parallelization. The code must be specially prepared in order to take advantage of the
parallelization capabilities of RAPID.
Test
Description Step Description Component
Interaction
1 AC will offload a parallelizable task to the AS AC – AS1
2 AS-DFE will request a parallelization to DS AS1 - DS
3 DS returns the IPs of helper VMs (AS-DFE-H) DS – AS1
4 AS-DFE will 'forward' the task to different helpers AS1 – AS2-Helper
5 The code is executed and the result is returned to AS1 AS2 – Helper
- AS1
6 At the same time, we should have the SLAM running, and
checking the metrics from the helper machines
SLAM -
VMM
7 If a QoS violation occurs the SLAM has to request a new VM
to the VMM
SLAM -
VMM
8 A helper machine is created and registered to DS as a new
helper machine
VMM - DS
9 New helper machine should be used AS3-Helper
Expected
Results
Task is run with parallelization, respecting SLAs
State OK
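The partition/offload/reduce flow of this test can be sketched with threads standing in for helper VMs (the partial computation below is a trivial placeholder, not the actual N-Queens kernel):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the partition / parallel execution / reduce flow described above.
// Threads stand in for helper VMs; the offloaded method body is replaced by
// a trivial placeholder so the flow stays self-contained.
public class ParallelOffloadSketch {

    // Developer-provided partitioning: split the search space among helpers.
    static List<int[]> partition(int n, int helpers) {
        List<int[]> ranges = new ArrayList<>();
        int chunk = (n + helpers - 1) / helpers;
        for (int start = 0; start < n; start += chunk) {
            ranges.add(new int[] {start, Math.min(start + chunk, n)});
        }
        return ranges;
    }

    // Placeholder partial computation (would be the offloaded method body).
    static long partial(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) sum += i;
        return sum;
    }

    // Developer-provided reduce: combine partial results into the final one.
    public static long run(int n, int helpers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(helpers);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int[] r : partition(n, helpers)) {
                Callable<Long> task = () -> partial(r[0], r[1]); // "offload"
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get(); // reduce
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(1000, 4)); // same result as a sequential run
    }
}
```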
4.2.6 Task forwarding
Purpose Forwarding occurs when a task is executed and returns an error due to lack of resources in the VM, for example the lack of memory to execute the task. The error would not be controlled
by the application but by the RAPID system. The AS will forward the task to a helper machine with higher capabilities, which will be assigned to the AS by the DS. This helper
machine could be in the same physical machine as the current VM, or could be in a different one, even in a different cloud.
External
Dependencies Prior to having this part of the test, the registration process described in Section 4.2.2
must have taken place. Also, a task must have been decided to be offloaded to a remote machine.
The present test can run in parallel with the test described in Section 4.2.3 and Section 4.2.4.
Test
Description
Step Description Component
Interaction
1 An execution returned an error (at RAPID code level), so the AS decides to perform a forwarding
AC – AS1
2 AS1-DFE requests a forwarding from DS AS1 - DS
3 DS returns the IP of another VM (AS2-DFE-H). DS – AS1
4 AS1-DFE forwards the task to helper machine AS1- AS2-Helper
5 Code is executed and result is returned to AS1 AS2 - AS1
6 AS1 returns the result to the AC on the client. AS1 - AC
Expected
Results
A special version of the N-Queens puzzle will be created to simulate high resource utilization in order to force the forwarding. The first time the code is offloaded and executed,
it will return an error just to force the AS to forward the execution to another machine.
State On Going
4.2.7 Multiple clients
Purpose All previous tests are described with a single client executing its code. The purpose of this test is to verify that, in a real environment, the RAPID system is
able to handle several clients at the same time, running the tests described above (Registration and CPU offloading, Section 4.2.2; GPU offloading, Section 4.2.3; VM enhancing, Section 4.2.4; Task parallelization, Section 4.2.5;
Task forwarding, Section 4.2.6).
External
Dependencies RAPID infrastructural components, such as DS, SLAM, VMM, should be up and
running.
Test
Description
We use multiple client devices, smartphones or smartphone emulators, to perform this real-life test. The clients will be used independently to perform different operations.
Expected
Results
The RAPID system (all the components) should properly handle different situations with multiple clients. None of the components should misbehave or crash.
State OK
The integration tests have been successfully carried out, verifying the correct communication between
the components of the RAPID architecture.
5. Validation
Section 4 presented the seven tests performed to check that the main RAPID functionalities are
working properly.
In Section 5.4.13 we have increased the resource consumption and the duration of the tasks in order
to test the SLAM. The SLAM monitoring has been set to run every minute, as stated in Section 2.3;
therefore, the tasks must run for at least a minute.
Each test has been executed; some changes had to be implemented before being able to deliver the
final system. These changes have already been documented in Section 2.
The validation scenarios of the RAPID infrastructure are presented in detail in the following sections.
5.1. Scenario: Run Generic APP in RAPID infrastructure for the first time
Steps Description Reference §
1 Device configuration 5.4.1
2 Registration Process 5.4.2
5.2. Scenario: Run Generic APP in RAPID infrastructure using an existing
VM
Steps Description Reference §
1 Device configuration 5.4.1
2 Offloading task to a CPU 5.4.3
3 Offloading task to a GPU 5.4.4
4 With AC component not available 5.4.5
5 With DS component not available 5.4.6
6 With SLAM component not available 5.4.7
7 With VMM component not available 5.4.8
8 With AS component not available 5.4.9
9 Task parallelization 5.4.10
10 Task forwarding 5.4.11
11 Multiple clients 5.4.12
5.3. Scenario: Run Generic APP in RAPID infrastructure by resizing existing VM
When enhancing the characteristics of the VM, there were limitations in the flavours existing in
OpenStack. A set of flavours has been predefined in OpenStack with all 18 possible combinations
of 1, 2 or 4 cores, 1024, 2048 or 4096 MB RAM and 20 or 40 GB disk, as shown in Table 7. We
started with the minimal VM configuration (1 core, 1024 MB RAM and 20 GB disk) and, depending on
the violation, increased one characteristic or another of the VM.
When two or more violations occurred, preference was given to changing first the CPU, then the
RAM and finally the disk. Per OpenStack restrictions, only one parameter can be changed each time.
The way to perform this parameter change in OpenStack is to change from one flavour to another. A flavour
is a set of configurations, and only the change of a single characteristic is allowed at a time. If the CPU is
modified, then the memory and disk settings must remain the same; if the memory is modified,
then the CPU and disk must remain unchanged; and, finally, if the disk is modified, then the CPU and
memory must not change.
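The single-characteristic resize policy can be sketched as follows (method names are ours; the value ladders and the name pattern follow the flavour list in Table 7):

```java
// Sketch of the resize policy described above: exactly one characteristic is
// changed per resize, preferring CPU, then RAM, then disk, along the
// predefined value ladders (1/2/4 cores, 1024/2048/4096 MB, 20/40 GB).
public class FlavourResizeSketch {

    static final int[] CORES = {1, 2, 4};
    static final int[] RAM_MB = {1024, 2048, 4096};
    static final int[] DISK_GB = {20, 40};

    static int next(int[] ladder, int current) {
        for (int i = 0; i < ladder.length - 1; i++) {
            if (ladder[i] == current) return ladder[i + 1];
        }
        return current; // already at the maximum
    }

    // Violation flags for CPU, RAM and disk; preference order cpu > ram > disk.
    public static String nextFlavour(int cores, int ram, int disk,
                                     boolean cpuViol, boolean ramViol, boolean diskViol) {
        if (cpuViol && next(CORES, cores) != cores) {
            cores = next(CORES, cores);
        } else if (ramViol && next(RAM_MB, ram) != ram) {
            ram = next(RAM_MB, ram);
        } else if (diskViol && next(DISK_GB, disk) != disk) {
            disk = next(DISK_GB, disk);
        }
        return cores + "_" + ram + "_" + disk; // naming scheme from Table 7
    }

    public static void main(String[] args) {
        // CPU and RAM both violated: the CPU is changed first, RAM stays
        System.out.println(nextFlavour(1, 1024, 20, true, true, false)); // 2_1024_20
        // CPU already at the maximum: the RAM is changed instead
        System.out.println(nextFlavour(4, 1024, 20, true, true, false)); // 4_2048_20
    }
}
```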
QoS indicators will be considered by the SLAM for the monitoring, verification and action processes
when a SLA violation occurs. The QoS indicators in the scope of the validation tests are the following:
- cpu_util: QoS metric related to the VM CPU usage (%). If the CPU stays below a predefined
percentage threshold, quality levels are acceptable. For example, "cpu_util LT 60" in the database, or
{"term":"cpu_util","operator":"lt","threshold":60} in QoS format, expresses the condition that
CPU utilization must be kept below 60%.
- mem_util: QoS metric related to the percentage of memory (RAM) utilization (%). If the
memory is kept below a predefined percentage threshold, quality levels are acceptable.
For example, "mem_util LT 30" in the database, or {"term":"mem_util","operator":"lt","threshold":30}
in QoS format.
- disk_util: QoS metric related to the utilization of the hard disk (%). If the disk usage is kept below a
predefined percentage threshold, quality levels are acceptable. For example, "disk_util LT 60"
in the database, or {"term":"disk_util","operator":"lt","threshold":60} in QoS format.
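A guarantee term in the JSON QoS format above can be checked against a measured metric value in a few lines. The following sketch assumes only the fields shown in the examples ("term", "operator", "threshold") and the "lt" operator; the function itself is illustrative, not part of the SLAM code.

```python
import json

# Sketch of evaluating a QoS guarantee term (in the JSON format shown above)
# against a measured metric value. Only the "lt" operator from the examples
# is handled; the helper name is illustrative.

def violates(term_json, measured):
    """Return True if the measured value violates the QoS guarantee term."""
    term = json.loads(term_json)
    op = term["operator"].strip().lower()
    threshold = term["threshold"]
    if op == "lt":  # guarantee: the metric must stay below the threshold
        return not (measured < threshold)
    raise ValueError("unsupported operator: " + op)
```

For instance, `violates('{"term":"cpu_util","operator":" lt ","threshold":60}', 72.5)` reports a violation, while a measured value of 12.0 does not.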
Table 7: List of flavours created

Number of CPU cores   RAM (MB)   Disk (GB)   Flavour name
1                     1024       20          1_1024_20
1                     2048       20          1_2048_20
1                     4096       20          1_4096_20
1                     1024       40          1_1024_40
1                     2048       40          1_2048_40
1                     4096       40          1_4096_40
2                     1024       20          2_1024_20
2                     2048       20          2_2048_20
2                     4096       20          2_4096_20
2                     1024       40          2_1024_40
2                     2048       40          2_2048_40
2                     4096       40          2_4096_40
4                     1024       20          4_1024_20
4                     2048       20          4_2048_20
4                     4096       20          4_4096_20
4                     1024       40          4_1024_40
4                     2048       40          4_2048_40
4                     4096       40          4_4096_40
Steps   Description                                        Reference
1       Device configuration                               5.4.1
2       Overhead in CPU usage                              5.4.11
3       Overhead in RAM usage                              5.4.12
4       Overhead in disk usage                             5.4.13
5       Overhead in CPU, memory and disk usage together    5.4.14
5.4. Detailed Scenario Steps
5.4.1 User registration process
Purpose To use the RAPID infrastructure, the user must register a mobile device with an
app installed and configured to access the AC and Cloud Services. The registration process
consists of requesting the use of a new or existing VM to accelerate the execution of the
apps on the mobile device.
Requirements Have an app with connectivity to Cloud Services.
The AC is embedded into the app.
Expected
outcomes
A new VM is started and available to be used. The user is registered.
The app task is running in the VM.
Validation
scenario
Consists of launching a request to use a new VM that will provide a high level of resources for the execution of the app tasks.
Validation
checks
The app has connectivity with the Cloud resources.
Cloud resources are started and available.
Validation
steps
The steps are as follows:
1
We begin by sending a request to use the RAPID infrastructure from a mobile application called
"RAPID Offload", using the BlueStacks [22] emulator to emulate an Android mobile device and run the Android app.
2 We connect to the environment with the OpenVPN app for Android
3 The application exposes the option to use an existing VM or launch a new VM.
4 We select the option "connect to a new VM" and send our request by clicking the "Start" button; the
QoS indicators are sent immediately to the SLAM.
5.4.2 D2D offloading to CPU
Purpose The purpose of D2D offloading is to exploit two Android devices with different capabilities.
Requirements At least two Android devices with the RAPID system installed.
Expected
outcomes
Devices periodically broadcast HELLO messages containing information about
their resources. Devices receive HELLO messages from other devices and store the
information about their neighbouring devices. Whenever a local task must be executed on the device with fewer resources, the device
offloads the task to a more powerful neighbouring device.
Validation
scenario
Consists of two Android smartphones with different capabilities, where one is more
powerful than the other.
Validation
checks
Both devices send their HELLO messages.
Both devices receive the messages sent by the other device. The less powerful device is able to offload a task to the more resourceful one.
Validation
steps
We use two Android devices: a Sony Xperia Z5 (octa-core 4x1.5 GHz Cortex-A53 & 4x2.0 GHz Cortex-A57 CPU, 3 GB RAM, Android 7.0) and a first-generation Motorola Moto G (quad-core 1.2 GHz Cortex-A7 CPU, 1 GB RAM, Android 5.1).
We run the AS on the Sony Xperia Z5, since it is more powerful, and the Android demo application (which includes the AC) on the Motorola Moto G. Both phones are connected to the same Wi-Fi router, so they are on the same network.
We observe the logs of the phones and see that they send and receive the HELLO messages.
When we run a task on the Motorola, the execution is offloaded to the Sony device, which executes the task and sends the result back to the Motorola.
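The HELLO-based neighbour discovery exercised in this test can be sketched as follows. The message fields, the capability score, and the freshness window are illustrative assumptions rather than the actual D2D protocol of RAPID.

```python
import time

# Minimal sketch of HELLO-based neighbour discovery: each device keeps the
# resources advertised by its neighbours and offloads to the most powerful
# recently-seen one. The scoring rule and field names are assumptions.

class NeighbourTable:
    def __init__(self, own_score):
        self.own_score = own_score   # e.g. a cores/clock/RAM heuristic
        self.neighbours = {}         # device id -> (score, last_seen)

    def on_hello(self, device_id, score):
        """Record a received HELLO message from a neighbouring device."""
        self.neighbours[device_id] = (score, time.time())

    def offload_target(self, max_age=30.0):
        """Pick the strongest neighbour seen recently, if stronger than us."""
        now = time.time()
        fresh = {d: s for d, (s, t) in self.neighbours.items()
                 if now - t <= max_age}
        if not fresh:
            return None
        best = max(fresh, key=fresh.get)
        return best if fresh[best] > self.own_score else None
```

In the scenario above, the Moto G (low score) would pick the Xperia Z5 (high score) as its offload target, while the Xperia Z5 would find no stronger neighbour and run tasks locally.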
5.4.3 Offloading a task to a GPU
Purpose The purpose of device-to-infrastructure offloading is to execute the GPU (CUDA) code of an
application remotely on the RAPID infrastructure.
Requirements A device with the demo application containing the CUDA matrix multiplication installed.
Prior to this part of the validation, we must have completed the registration process described in Section 4.2.2.
The GVirtuS back-end should be up and running on UNP's server, deployed on a machine "close" to the ThinkAir accelerator
server.
Expected
outcomes
The CUDA calls of the matrix multiplication application are offloaded to the
GVirtuS back-end.
Validation
scenario
Consists of a device running the demo application, whose CUDA matrix
multiplication calls are offloaded to the GVirtuS back-end.
Validation
checks
The CUDA code is correctly offloaded to the GVirtuS backend
Validation
steps Test the connection with the GVirtuS back-end executing a CUDA device query
from the Android RAPID application.
Perform a single matrix multiplication test in order to check the overall behaviour.
Perform an “all-4” test.
Perform an “all-9” test.
Compare and contrast the test results (charts and tables) between different devices.
5.4.4 Cloud Services with DS not available
Purpose The purpose of this step is to perform a test when the Directory Server (DS) of the RAPID
infrastructure is not available.
Requirements Prior to this part of the validation, we must have completed the registration process
described in Section 4.2.2.
Expected
outcomes
The SLAM tries to connect to the DS and receives an error. The VMM tries to connect to the DS and receives an error. The VMM indicates that the
DS is not running and finishes its execution. The AC client tries to connect to the DS and receives an error.
Validation
scenario
SLAM case: The SLAM starts and tries to register itself in the DS. It will receive a Socket ERROR message if the DS is not running
VMM case: The VMM starts and tries to register with the DS. It will receive an ERROR message and stop working if the DS is not running. Otherwise, it will start normally.
AC case: A RAPID app starts and tries to register with the system. The DS is down or replies with an ERROR message.
Validation
checks
SLAM case: If the DS is not running, the SLAM returns an error and continues running.
VMM case: If the DS is not running, the VMM stops running.
AC case: The AC stops the registration process and runs all tasks locally. The AC tries to
connect to the DS periodically.
Validation
steps
SLAM case:
We kill the DS process.
The SLAM starts and tries to connect with the DS.
The registration process fails, and the SLAM continues to listen.
VMM case:
We kill the DS process.
The VMM starts and tries to connect with the DS.
The registration process fails, and the VMM terminates.
AC case:
We kill the DS process.
We use the RAPID demo app to test the behaviour of the AC.
The app starts and the AC tries to connect with the DS, without success.
The registration process fails, and all task executions are performed locally on the device.
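The AC fallback behaviour validated here, running tasks locally whenever the DS is unreachable, can be sketched as follows. The DS address, port, and the reachability probe are illustrative assumptions, not the actual AC implementation.

```python
import socket

# Sketch of the AC behaviour described above: if the DS cannot be reached,
# run the task locally on the device. Host, port, and probe are assumptions.

def ds_reachable(host, port, timeout=2.0):
    """Probe the DS with a plain TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def execute(task, ds_host="10.0.0.1", ds_port=9001, offload=None):
    """Offload if the DS is up and an offload path exists, else run locally."""
    if offload is not None and ds_reachable(ds_host, ds_port):
        return offload(task)
    return task()  # local execution on the device
```

In the real system the AC additionally retries the registration periodically, so later tasks can be offloaded again once the DS comes back.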
5.4.5 Cloud Services with SLAM not available
Purpose The purpose of this step is to perform a test when the SLAM component of the
RAPID infrastructure is unavailable.
Requirements Have an app with connectivity to Cloud Services.
The AC is embedded into the app.
Expected
outcomes
The VMM tries to connect to the SLAM and receives an error. The AC client tries to connect to the SLAM and receives an error.
Validation
scenario
VMM case: The VMM tries to register with the SLAM after it registers with the DS. If the SLAM is not running, it will receive an ERROR message and stop working.
Otherwise, it will start normally.
AC case: Consists of an app registering with the RAPID infrastructure, first with the DS and then with the SLAM.
Validation
checks
VMM case: If the SLAM is not running, the VMM stops running.
AC case:
The AC receives an ERROR from the DS, because there is no available SLAM.
The AC tries to register with the DS periodically until the SLAM becomes available.
Validation
steps VMM case:
1. We kill the SLAM component.
2. The VMM starts and tries to connect with the SLAM.
3. The registration process fails, and the VMM is terminated.
AC case:
1. We kill the SLAM component.
2. We use the RAPID demo application to test the behaviour of the AC.
3. The app starts and connects with the DS, but does not receive any SLAM address.
4. The registration process fails, and all tasks are executed locally on the device.
5.4.6 Cloud Services with VMM not available
Purpose The purpose of this step is to perform a test when the VMM component of the RAPID infrastructure is unavailable.
Requirements Prior to this part of the validation, we must have completed the registration process
described in Section 4.2.2.
Expected
outcomes
The SLAM tries to connect to the VMM and receives a socket error.
Validation
scenario
Consists of a VM request from the SLAM to the VMM.
Validation
checks
If the SLAM cannot connect to the VMM socket, it reports an error.
When the SLAM can connect to the VMM and a VM can be started, the SLAM shows this trace:
Validation
steps
1. Send a request from the SLAM to the VMM.
The trace RapidMessages.SLAM_START_VM_VMM status: 0 is expected.
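The SLAM-to-VMM exchange can be sketched as a simple socket request returning the VMM status, where status 0 corresponds to the successful SLAM_START_VM_VMM trace above. The one-byte message code and the integer wire format are assumptions for illustration, not the actual RAPID protocol.

```python
import socket
import struct

# Sketch of the SLAM -> VMM start-VM request described above. Status 0 in the
# reply mirrors the "SLAM_START_VM_VMM status: 0" trace; the message code and
# wire format are illustrative assumptions.

SLAM_START_VM_VMM = 11  # illustrative message code, not the real constant

def request_vm(vmm_host, vmm_port, timeout=5.0):
    """Send the start-VM code and return the VMM status (0 on success)."""
    with socket.create_connection((vmm_host, vmm_port), timeout=timeout) as s:
        s.sendall(struct.pack("!B", SLAM_START_VM_VMM))
        (status,) = struct.unpack("!i", s.recv(4))
        return status
```

A non-zero status (or a connection error, as in this test) would be reported by the SLAM as a failure to start the VM.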
5.4.7 AS not available
Purpose The purpose of this step is to perform a test when the AS component of the
RAPID infrastructure is unavailable.
Requirements Prior to this part of the validation, we must have completed the registration process described in Section 4.2.2.
Expected
outcomes
VMM case: The VMM tries to connect to the AS while resizing the VM at the request of
the SLAM, and receives an error.
AC case:
Scenario 1: AC tries to register with the AS but AS is not available/reachable.
Scenario 2: AC is already registered with the AS and tries to offload a task to the AS, but the AS is not available or reachable.
Validation
scenario
VMM case: We stop the target VM using the OpenStack dashboard. The VMM tries to
connect to the AS when the SLAM asks it to resize the VM, and sends an error message to the SLAM if the AS is not responding.
AC case:
Scenario 1: Consists of the demo app registered correctly with the DS, SLAM, and tries to register with the AS, but the AS is not reachable.
Scenario 2: Consists of the demo app registered correctly with the DS, SLAM, and
AS. The AS is then destroyed. The AC tries to offload a task to the AS.
Validation
checks
VMM case: If the AS is not running, the VMM returns an error message to the SLAM.
AC case:
Scenario 1: The AC cannot register with the AS, runs everything locally on the device. The AC tries to register again periodically, repeating the whole registration
process by registering again with the DS and the SLAM.
Scenario 2: The AC cannot offload a task to the AS. The AC will try to register to
the AS periodically (no need to register again with the DS and SLAM).
Validation
steps
VMM case:
We kill the AS process by using the OpenStack dashboard.
The VMM tries to connect with the AS upon the request of the SLAM.
The VMM cannot connect to the AS, and the VMM sends an error message to the SLAM.
AC case:
Scenario 1:
o We use the demo app, triggering the registration process.
o The AC registers with the DS and the SLAM. A VM is started and the AC gets the IP of the VM.
o At that moment, we destroy the VM, so the AC cannot register.
Scenario 2:
o We use the demo app, triggering the registration process.
o The AC registers with the DS, the SLAM, and with the AS.
o We destroy the AS after the AC is registered and connected with the AS.
o Then we force the AC to offload a task and see that the offloading fails and the execution is performed locally on the device.
5.4.8 Task parallelization
Purpose The purpose of this step is to perform a test with an application that can exploit distributed
computation on different VMs.
Requirements Device with the demo application installed.
The application should have been developed in a map-reduce style, with the developer handling the reduction of the partial results.
Prior to this part of the validation, we must have completed the registration process described in Section 4.2.2.
Expected
outcomes
The AS should be able to request helper VMs from the DS. The task is executed in parallel by multiple VMs.
Validation
scenario
Consists of using a device with the demo application installed. We can trigger the request for parallel execution by choosing, in the application's
user interface, the number of VMs we want to be utilized on the remote side.
Validation
checks
The AC correctly instructs the AS that the task can be parallelized.
The AS correctly requests VM helpers from the DS. The DS correctly allocates the helper VMs.
The task is correctly distributed among the VMs. Execution is performed on all VMs and the partial results are received by the AS on the
main VM. The AS correctly reduces the partial results using the function provided by the
developer. The AS correctly sends the result to the AC on the client device.
Validation
steps
We use the demo application to run a version of the N-Queens application that can exploit distributed parallel computing (see Figure 10).
We select the number of VMs we want to use. We start the execution of the task, forcing the AC to offload the computation.
The AC offloads the task to the AS and sends the number of VMs requested. The AS asks the DS to provide the IPs of the needed VM helpers.
The AS distributes the execution between the VM helpers. Each VM performs its own part of the computation. The main AS receives all partial results and combines them in one result.
The main AS sends the result back to the AC on the device.
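The map-reduce split validated above can be sketched with a toy N-Queens solver: the first-row column is partitioned among the helper VMs, each VM counts the solutions for its share, and the developer-provided reduction is a sum of the partial counts. The solver below is a plain permutation check, not the RAPID demo code, and the sharding rule is an illustrative assumption.

```python
from itertools import permutations

# Sketch of the map-reduce N-Queens split described above. Each "VM" gets a
# slice of the possible first-row columns; the reduction is a simple sum.

def count_solutions(n, first_cols):
    """Count N-Queens solutions whose first-row queen is in first_cols."""
    total = 0
    for perm in permutations(range(n)):      # perm[i] = column of row i
        if perm[0] not in first_cols:
            continue
        if all(abs(perm[i] - perm[j]) != j - i        # no diagonal conflicts
               for i in range(n) for j in range(i + 1, n)):
            total += 1
    return total

def parallel_nqueens(n, num_vms):
    # "map": each VM receives a disjoint slice of the first-row columns
    shards = [range(n)[vm::num_vms] for vm in range(num_vms)]
    partials = [count_solutions(n, set(shard)) for shard in shards]
    # "reduce": the developer-provided reduction is a sum of partial counts
    return sum(partials)
```

The result is independent of the number of VMs, which is exactly what the validation checks: distributing the shards changes only where the work runs, not the combined answer.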
Figure 10. Screenshot of the RAPID demo app with the number of VMs set to 4, so the
task can be executed in a distributed way on multiple VMs.
5.4.9 Task forwarding
Purpose The purpose of this test is to check that an AS can forward the computation of a task to a more powerful VM if the execution fails on the current device or VM.
Requirements Prior to this part of the validation, we must have completed the registration process described in
Section 4.2.2. The modified version of the N-Queens application, which triggers an error when running
on the VM so that the AS will forward the execution to a more powerful VM.
Expected
outcomes
The device offloads a task to the VM.
The AS on the VM runs the task, which throws an error indicating that there are not enough resources.
The AS then requests a more powerful device to offload the task. After the task is offloaded to the helper VM, it is executed there, and the result is returned to the main
VM and later to the device.
Validation
scenario
Consists of using a device with the demo application installed. We will use a modified version of the N-Queens application to artificially trigger an
error when the task is offloaded, so that the AS will forward to another VM.
Validation
checks
From the demo application, we can set a flag to indicate that we want to trigger a task forwarding.
The AC offloads the task.
The task throws an error.
The AS captures the error and requests the DS to allocate a more powerful VM. The DS allocates the helper VM.
The AS forwards the task to the helper VM. The helper VM executes the task and sends the result to the AS on the main VM.
The AS sends the result to the AC on the device.
Validation
steps
We use the demo application to run a version of the N-Queens application which
throws an error that triggers the forwarding on the remote side (see Figure 11). We enable the flag "Check to enforce forwarding" below the N-Queens application.
We start the execution of the task, forcing the AC to offload the computation. The AC offloads the task to the AS.
The AS executes the task, which throws the error. The AS captures the error and asks the DS to allocate a helper VM with more
resources. The helper VM executes the task and sends the result to the main VM.
The main AS receives the result and sends it to the AC on the device.
Figure 11. Screenshot of the RAPID demo app with the forwarding flag enabled, so the
task will throw an error on the main VM and the main VM will forward the task to a
more powerful VM.
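The forwarding logic validated above can be sketched as a catch-and-forward wrapper around the task execution. The exception type and the DS allocation call are illustrative assumptions, not the actual AS implementation.

```python
# Sketch of the forwarding behaviour described above: the AS runs the task,
# and if it fails with a resource error it asks the DS for a more powerful
# helper VM and forwards the task there. All names here are illustrative.

class NotEnoughResources(Exception):
    """Raised when the current VM lacks the resources for a task."""

def run_with_forwarding(task, allocate_helper):
    """Execute locally; on a resource error, forward the task to a helper VM."""
    try:
        return task()
    except NotEnoughResources:
        helper = allocate_helper()   # the DS allocates a more powerful VM
        return helper(task)          # the helper executes and returns the result
```

The helper VM is assumed to have enough resources, so the same task succeeds there; the main AS then relays the result back to the AC, as in the steps above.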
5.4.10 Multiple clients
Purpose The purpose of this test is to check that the RAPID platform can support multiple clients at the same time.
Requirements At least two devices running some RAPID-enabled application.
Both devices must have completed the registration process described in Section 4.2.2.
Expected
outcomes
All devices (clients) should be able to connect to the RAPID platform and perform
task offloading independently of each other.
Validation
scenario
Consists of using two devices with RAPID-enabled applications installed. We use the RAPID-enabled applications on both devices
Validation
checks
Applications on both devices are able to connect with the RAPID platform. RAPID allocates different VMs for each device.
Applications on different devices can offload CPU and GPGPU tasks independently and transparently from each other.
Validation
steps
We run the RAPID demo app on one physical phone (Huawei P9 Lite, Android 7.0) and on an emulator with Android 6.0 (see Figure 12).
We run different tests and verify that the devices are independent from each other and they correctly perform their tasks.
In the screenshots in Figure 12 we show the execution of the N-Queens with 7 queens on both devices.
Figure 12. Screenshot of the RAPID demo running on one phone and one emulator at
the same time, showing that RAPID supports multiple clients in a transparent way.
5.4.11 Overhead in VM CPU usage
Purpose The purpose of this step is to increase the VM resources when a CPU usage violation
occurs.
Requirements Prior to this part of the validation, we must have completed the registration process
described in Section 4.2.2. A set of valid flavours is predefined in OpenStack.
It is necessary to have a physical machine with enough resources (CPU, memory and disk) for the defined "flavours" described in Table 7: List of flavours created.
A QoS violation occurs. Flavours are available.
Expected
outcomes
VM characteristics enhanced
Validation
scenario
After user registration and provisioning of a new VM, the SLAM begins to monitor the
metrics contained in the QoS. The QoS indicates the type of metric to be monitored, which can be CPU, memory, disk or a combination of these.
The values of the metrics for each user/machine reach the SLAM at approximately one-minute intervals.
Each received value is compared to the agreed QoS threshold; if CPU usage rises above the established QoS, a QoS violation occurs, triggering a
CPU increase, provided the new CPU count does not exceed the maximum number of CPUs allowed, which in this case is 4.
Validation
checks
QoS active = 4%
# CPU current = 1, % CPU current = 4.055%
# CPU new = 2, % CPU new = 2%
Validation
steps
The steps are as follows:
1
The request to start a new VM, with the features taken from the agreement, is sent to the VMM,
with two QoS metrics: cpu_util LT 220 and mem_util LT 60.
D7.1 System Integration and Validation
Page 50 of 62
This document is Public, and was produced under the RAPID project (EC contract 644312).
2 Once the new VM is started, the scheduled process of monitoring metrics begins; in this test
we focus on cpu_util. For example: cpu_util = 0.10, mem_util = 31.
3 The application is connected and using the new VM; a process to increase the CPU load is sent to the new
VM by clicking the button "Solve NQueens".
At the moment, there is no violation of QoS because we have a Guarantee Term: GT_cpu_util LT 220.
For test purposes, we change the Guarantee Term to GT_cpu_util LT 4. Now, 4% is the maximum
allowed CPU usage.
4 As we can see in the log capture, metrics appear, and when the CPU load rises above 4% a QoS violation
occurs.
(*) cpu_value = 4.055
5 The violation of the cpu_util QoS leads us to request a CPU increase from the VMM.
The VMM is requested to change the number of CPUs from 1 to 2.
OpenStack requires a confirmation of the resize before making the change and restarting the VM.
The VMM is responsible for notifying the AS / AC that a restart is to be performed, once the VMM
has acknowledged the resize operation to the SLAM.
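The violation handling in steps 4 and 5 can be sketched as a per-sample check that compares cpu_util with the guarantee term and, on a violation, requests a larger core count from the VMM, up to the 4-core maximum of the flavours. The doubling policy and all names are illustrative assumptions.

```python
# Sketch of the per-sample violation handling described above: each minute the
# SLAM compares the reported cpu_util with the guarantee term (e.g. GT_cpu_util
# LT 4) and, on a violation, asks the VMM for a larger core count. The doubling
# policy and the function name are illustrative assumptions.

MAX_CORES = 4  # maximum core count among the predefined flavours (Table 7)

def handle_cpu_sample(cpu_util, threshold, current_cores):
    """Return the new core count to request from the VMM, or None."""
    if cpu_util < threshold:
        return None                # within the agreed QoS, no violation
    if current_cores >= MAX_CORES:
        return None                # violation, but no larger flavour exists
    return min(current_cores * 2, MAX_CORES)
```

With the values of this test, a sample of cpu_util = 4.055 against the threshold of 4 on a 1-core VM yields a request for 2 cores, matching the validation checks above.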
5.4.12 Overhead in VM RAM usage
Purpose The purpose of this step is to increase the VM resources when a RAM usage violation
occurs.
Requirements Prior to this part of the validation, we must have completed the registration process described in
Section 4.2.2. A set of valid flavours is predefined in OpenStack.
It is necessary to have a physical machine with enough resources (CPU, memory and disk) for the defined "flavours" described in Table 7: List of flavours created.
A QoS violation occurs. Flavours are available.
Expected
outcomes
VM characteristics enhanced
Validation
scenario
After user registration and provisioning of a new VM, the SLAM begins to monitor the metrics contained in the QoS. The QoS indicates the type of metric to be monitored,
which can be CPU, memory, disk or a combination of these. The values of the metrics for each user/machine reach the SLAM at approximately
one-minute intervals. Each received value is compared to the agreed QoS threshold; if RAM usage
rises above the established QoS, the QoS violation triggers a RAM increase, as long as the new RAM does not exceed the maximum
allowed, which in this case is 4096 MB.
At the moment, there is no violation of QoS because we have the Guarantee Term
GT_mem_util LT 45 and in the last metric mem_util is 32%.
(*) percent_mem_value = 32.421875
For test purposes, we change the Guarantee Term to GT_mem_util LT 30, so
30% is the maximum allowed memory usage.
Validation
checks
QoS current value = 30%
# RAM current = 1024, % RAM current = 32%
# RAM new = 2048, % RAM new = 15%
Validation
steps
The steps are as follows:
1
As we can see in the log capture, metrics continue to be received, and a QoS violation occurs when the memory
load rises above 30%.
2 The violation of the mem_util QoS leads us to request a memory increase from the
VMM.
The VMM is requested to change the memory size from 1024 MB to 2048 MB.
3 After making the change, OpenStack requires a confirmation of the resize before applying it and
restarting the VM.
The VMM is responsible for notifying the AS / AC that a restart is to be performed.
5.4.13 Overhead in VM DISK usage
Purpose The purpose of this step is to increase the VM resources when a disk usage violation
occurs.
Requirements Prior to this part of the validation, we must have completed the registration process described in Section 4.2.2.
A set of valid flavours is predefined in OpenStack. It is necessary to have a physical machine with enough resources (CPU, memory and
disk) for the defined "flavours" described in Table 7: List of flavours created.
Flavours are available.
Expected
outcomes
VM characteristics enhanced
Validation
scenario
After user registration and provisioning of a new VM, the SLAM begins to monitor the metrics contained in the QoS. The QoS indicates the type of metric to be monitored, which can be
CPU, memory, disk or a combination of these. The values of the metrics for each user/machine reach the SLAM at approximately
one-minute intervals. Each received value is compared to the agreed QoS threshold; if disk usage
rises above the established QoS, a QoS violation occurs, triggering a disk increase. The new disk value cannot exceed the maximum
allowed, which in this case is 40 GB.
At the moment, there is no violation of QoS because we have the Guarantee Term
GT_disk_util LT 40 and in the last metric disk_util is 0.02899%.
(*) percent_disk_value = 0.02899%
For test purposes, we change the Guarantee Term to GT_disk_util LT 0.027. Now, 0.027%
is the maximum allowed disk usage.
Validation
checks
QoS current value = 30%
# DISK current = 20 GB, % DISK current = 0.028%
# DISK new = 40 GB, % DISK new = 0.014%
Validation
steps
The steps are as follows:
1
As we can see in the log capture, metrics continue to be received, and when the disk load rises above
0.027% a QoS violation occurs.
2 The violation of the disk_util QoS leads us to request a disk increase from the VMM.
The VMM is requested to change the disk size from 20 GB to 40 GB.
The maximum value (40 GB) is reached.
3 After making the change, OpenStack requires a confirmation of the resize before applying it and
restarting the VM.
The VMM is responsible for notifying the AS / AC that a restart is to be performed.
5.4.14 Overhead in VM CPU, RAM and DISK usage all together
Purpose The purpose of this step is to increase the VM resources when CPU, RAM and disk
violations occur together.
Requirements Prior to this part of the validation, we must have completed the registration process described in
Section 4.2.2. A set of valid flavours is predefined in OpenStack.
It is necessary to have a physical machine with enough resources (CPU, memory and disk) for the defined "flavours" described in Table 7: List of flavours created.
A QoS violation occurs. Flavours are available.
Expected
outcomes
VM characteristics enhanced
Validation
scenario
After user registration and provisioning of a new VM, the SLAM begins to monitor the
metrics contained in the QoS. The QoS indicates the type of metric to be monitored, which can be CPU, memory, disk or a combination of these.
The values of the metrics for each user/machine reach the SLAM at approximately one-minute intervals.
Each received value is compared to the agreed QoS threshold; if usage rises above the established QoS, a QoS violation occurs, triggering an
increase of the CPU, memory and disk values, as long as the new CPU, RAM and disk do not exceed the maximum allowed, which in this case is 4 CPU
cores, 4096 MB RAM and 40 GB disk.
At the moment of configuring the test, the Guarantee Terms and the last observed metrics are:
Term: GT_cpu_util LT 1.6, with the last metric cpu_util at 2%
Term: GT_mem_util LT 31, with the last metric mem_util at 30%
Term: GT_disk_util LT 0.026, with the last metric disk_util at 0.02899%.
In this case, we perform the test with three QoS violations occurring simultaneously.
Validation
checks
QoS active = 1.6%; # CPU current = 2, % CPU current = 2%; # CPU new = 4, % CPU new = 0.08%
QoS current value = 31%; # MEM current = 1024, % MEM current = 32%; # MEM new = 2048, % MEM new = 16%
QoS current value = 0.026%; # DISK current = 20 GB, % DISK current = 0.028%; # DISK new = 40 GB, % DISK new = 0.013%
Validation
steps
The steps are as follows:
1
As we can see in the logs, metrics are received, and a QoS violation occurs when the CPU load rises above 1.6% (here 1.7%).
2 As we can see in the log capture, metrics continue to be received, and when the memory load rises above 31%
(here 32%) a QoS violation occurs.
3 As we can see in the log capture, metrics continue to be received, and when the disk load rises above 0.026%
(here 0.028%) a QoS violation occurs.
4 Finally, all violations are resolved.
6. Conclusions
The integration and validation tests that verify the connectivity of the components of the RAPID
architecture have been carried out. Each component starts and stops independently; however, the components
interact with each other to achieve the correct operation of RAPID.
In essence, the following components have been defined and tested: accelerators, Cloud Services, VMs and GPUs.
Everything starts with the Acceleration Client (AC), located inside the mobile app, which
provides the user with the front-end interface, first for registration with RAPID and then for the use of
highly available resources. This component is responsible for communicating with the Accelerator Server
(AS), which runs inside a VM.
In this communication, the DS interacts with all components of the architecture as a registration
component, recording ports and IP addresses. The VMM manages the creation and configuration of new virtual
machines. The SLAM component monitors the agreed service levels and, when any established
service-level threshold is violated, executes a corrective action, usually an increase of resources.
Finally, the GPU component provides a high level of processing for task-oriented workloads with a great
demand for graphics processing.
These components are orchestrated and communicate with each other, and all are essential for RAPID
operations. The absence or unavailability of any of them would render RAPID as a whole
unavailable. The RAPID infrastructure provides a fast, flexible and scalable solution.
It is fast in the sense that a simple registration with the platform is enough to begin using it.
It is flexible because, in case of a violation, the control mechanisms take automated actions to improve
the service level. It is scalable because it allows the VM resources to be increased as far as the
physical infrastructure allows.
GPGPU acceleration covers two operational scenarios: i) the application is offloaded to the accelerator server; the GVirtuS back-end is installed on a physical machine “close” to the one hosting the VMs, so the latency of CUDA API invocations is reduced by using intra-machine communication channels; ii) the application runs locally; the GVirtuS back-end is installed on a publicly accessible GPGPU accelerator server, and the application invokes the CUDA APIs without any VM mediation.
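The two scenarios differ essentially in where the front-end finds the GVirtuS back-end. The sketch below only illustrates that endpoint choice; the scenario names, hosts and port are illustrative assumptions (the real GVirtuS communicator is configured through its own configuration file, not through code like this).

```python
# Sketch of the endpoint choice behind the two GPGPU scenarios above.
# Hosts and port are placeholders, not real GVirtuS defaults.

def backend_endpoint(scenario):
    if scenario == "offloaded":
        # Scenario i): back-end on a physical machine "close" to the
        # VM host; an intra-machine channel keeps CUDA latency low.
        return ("127.0.0.1", 9999)
    if scenario == "local":
        # Scenario ii): application runs locally and reaches a publicly
        # accessible GPGPU accelerator server, with no VM mediation.
        return ("gpu.example.org", 9999)
    raise ValueError(f"unknown scenario: {scenario}")
```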
Regarding the SLAM/VMM, the dynamic resizing of resources depends on the flavors predefined in OpenStack: the resources of a VM can only be increased up to the flavors existing at that moment.
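This flavor-bounded growth can be sketched as stepping through an ordered list of predefined flavors. The flavor names and sizes below are illustrative, not the flavors actually deployed in the RAPID testbed.

```python
# Sketch of the constraint above: the VMM can only grow a VM to the
# next predefined OpenStack flavor, never to an arbitrary size.
# (name, vCPUs, RAM in MB) -- illustrative values only.

FLAVORS = [("small", 1, 2048), ("medium", 2, 4096), ("large", 4, 8192)]

def next_flavor(current):
    """Return the next larger flavor, or None if already at the largest."""
    names = [name for name, _, _ in FLAVORS]
    i = names.index(current)
    return FLAVORS[i + 1] if i + 1 < len(FLAVORS) else None
```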
OpenStack does not allow resources to be resized without restarting the VM, and a restart implies that any tasks the client had running would be lost. To solve this problem, we added a communication flow between the VMM, AS and AC so that the client confirms the resizing and does not lose the processes in progress.
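The essence of that confirmation flow is a guard on the restart: the resize only proceeds once the client has confirmed it. A minimal sketch, with illustrative names and states rather than the actual VMM/AS/AC protocol:

```python
# Sketch of the VMM/AS/AC confirmation flow: the VM is only restarted
# (and therefore resized) after the client confirms, so tasks that are
# still in progress are never silently lost.

def resize_vm(pending_tasks, client_confirms):
    """Decide whether the VMM may restart the VM to apply a resize."""
    if pending_tasks > 0 and not client_confirms:
        return "postponed"   # keep the VM running; tasks survive
    return "resized"         # safe to restart and apply the new flavor
```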
OpenStack introduces a delay in updating the actual resource-usage information when a VM has just been started or resized; it takes some minutes for the VMM to obtain correct usage information. Therefore, the SLAM monitoring process cannot enforce the QoS policy immediately after a VM has been started or resized.
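In practice this means the SLAM must observe a grace period after each start or resize before trusting the usage data. A sketch under that assumption; the 5-minute window is an illustrative value, not a RAPID constant.

```python
# Sketch of the grace period implied above: OpenStack takes some minutes
# to report correct usage after a VM start/resize, so QoS enforcement
# must wait until the monitoring data is trustworthy again.

GRACE_SECONDS = 300  # illustrative "some minutes" window

def can_enforce_qos(now, last_resize_time):
    """Only enforce the QoS policy once usage data is trustworthy."""
    return (now - last_resize_time) >= GRACE_SECONDS
```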
In future versions of RAPID it would be possible to provide high availability, autonomy and fault tolerance for each component, by allowing multiple instances to be launched on different physical machines.