combining genetic algoriths and constraint programming to support stress testing of task deadlines
TRANSCRIPT
ESEC/FSE 2015
Bergamo, 02/09/2015
Combining Genetic Algorithms and Constraint Programming
to Support Stress Testing of Task Deadlines
Stefano Di Alesio 1,2
Lionel Briand 2
Shiva Nejati 2
Arnaud Gotlieb 1
1 Certus Centre for Software V&V
Simula Research Laboratory
Norway
2 Interdisciplinary Centre for Reliability, Security and Trust (SnT)
University of Luxembourg
Luxembourg
1 2
We present GA+CP: a search strategy to identify
scenarios likely to violate task deadlines in RTES
Stefano Di Alesio โ 2/21
Problem Statement: Stress Testing of Task
Deadlines in Real Time Embedded Systems (RTES)
Key Results: Efficiency, Effectiveness,
Diversity and Scalability
Proposed Solution: Search for worst-case
schedules with a combined GA+CP strategy
Safety-critical RTES have to meet strict Performance
Requirements to be deemed safe for operation
Real-Time Embedded System (RTES)
Embedded SoftwareSensors Actuators
Operating
Environment
Computing Platform
Stefano Di Alesio โ 3/21
Systematic stress and performance testing is highly
recommended when certifying safety-critical RTES
IEC 61508 deems stress testing as
highly recommended for SIL 3-4
Stress Testing: โTesting in which a system is subjected to [โฆ]
harsh inputs [โฆ] with the intention of breaking itโ
โ Boris Beizer
Arrival times for aperiodic tasks Worst-case scenarios
Stefano Di Alesio โ 4/21
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
RTES usually have concurrent interdependent tasks
executed by a priority-driven preemptive scheduler
๐๐
๐๐๐๐
๐ ๐๐๐
๐๐
๐๐๐๐
๐ ๐๐๐
๐๐
๐๐๐๐
๐ ๐๐๐
๐๐๐
๐๐๐๐๐๐๐๐
๐ ๐๐๐๐๐๐๐
๐ = ๐
๐๐๐๐๐๐๐๐๐๐๐
๐๐๐๐๐๐๐
๐๐๐๐
๐๐๐๐
๐๐๐๐
Stefano Di Alesio โ 5/21
Tasks can trigger other tasks, or share
computational resources with them
Each task has a deadline (i.e., latest
finishing time) w.r.t. its arrival time
Some task properties depend on the
environment, others are design choices
Periodic Triggered Resource Aperiodic
0
1
2
3
4
5
6
7
8
9
Particular sequences of arrival times may
determine scenarios violating task deadlines
๐๐, ๐๐, ๐๐ arrive at ๐๐๐, ๐๐๐, ๐๐๐ and
must finish before ๐ ๐๐, ๐ ๐๐, ๐ ๐๐
๐๐ can miss its deadline ๐ ๐๐depending on when ๐๐๐ occurs!
0
1
2
3
4
5
6
7
8
9
๐๐
๐๐๐
๐ ๐๐
๐๐
๐๐๐
๐ ๐๐
๐๐
๐๐๐
๐ ๐๐
๐ = ๐
๐๐๐๐๐๐๐
๐๐
๐๐๐
๐ ๐๐
๐๐
๐๐๐
๐ ๐๐
๐๐
๐๐๐
๐ ๐๐
๐ = ๐
๐๐๐๐๐๐๐
A sequence of arrival times which is likely to violate
a task deadline characterizes a stress test case
Stefano Di Alesio โ 6/21
Periodic Triggered Aperiodic Periodic Triggered Aperiodic
Several techniques have been used for solving this
problem, but each has its own drawbacks
Formal Verification Testing
Scheduling
Theory
Model
Checking
Performance
Engineering
Genetic
Algorithms
Constraint
Programming
BackgroundQueuing
Theory
Fixed-point
Computation
Practice and
ToolsMetaheuristics
Artificial
Intelligence
Key
FeaturesTheorems
Symbolic
Execution
Dynamic
Analysis
Randomized
Search
Complete
Search
Drawbacks AssumptionsComplex
Modeling
Non
SystematicIneffective [1]
Inefficient [1],
Low diversity
Stefano Di Alesio โ 7/21
[1] Di Alesio, S., Nejati, S., Briand, L., and Gotlieb, A. (2013). Stress Testing of Task Deadlines: A Constraint Programming Approach. In
Software Reliability Engineering (ISSRE), 2013 IEEE 24th International Symposium on, pages 158โ167. IEEE.
GA is efficient and diverse: quickly generates test
cases involving different interactions between tasks
CP is effective: generates test cases that
are more likely to violate task deadlines
GA + CP
Stefano Di Alesio โ 8/21
[2] Nejati, S., Di Alesio, S., Sabetzadeh, M., Briand, L.: Modeling and Analysis of CPU Usage in Safety-critical Embedded Systems to
Support Stress Testing. In: Model Driven Engineering Languages and Systems (MODELS), pp. 759โ775. Springer (2012)
Automated Search
Optimization ProblemFind arrival times that maximize the chance of
violating task deadlines
SolutionsTask arrival times likely to violate
task deadlines
Deadline Misses
Analysis
System Platform
System Design Design and Platform Models Timing and concurrency information about
the RTES software and computing platform
INPUT
OUTPUT๐๐๐ = ๐๐๐๐๐ = ๐๐โฆ
Stress Test Cases
UML/MARTE Modeling [2]
We cast the generation of stress test cases as an
Optimization Problem over the task arrival times
Genetic Algorithms (GA) +
Constraint Programming (CP)
โขTriggering Relationship:
๐๐๐๐๐๐๐๐ ๐๐, ๐๐โขDependency Relationship:
๐ ๐๐๐๐๐ ๐๐๐ ๐๐, ๐๐ , ๐ ๐๐๐๐๐ ๐๐๐(๐๐, ๐๐)โข Impacting Relationship: ๐๐๐๐๐๐๐ ๐๐, ๐๐ , ๐๐๐๐๐๐๐ ๐๐, ๐๐ ,๐ฐ ๐๐ = {๐๐, ๐๐}
โขObservation Interval: ๐ป = ๐, ๐โขNumber of cores: ๐ = ๐โขSet of Tasks: ๐ฑ = {๐๐, ๐๐, ๐๐, ๐๐}โขPriority of Tasks: ๐๐๐๐๐๐๐๐ ๐๐ = ๐โขPeriod of Tasks: ๐๐๐๐๐๐ ๐๐ = ๐โขMin/Max Inter-arrival time of Tasks:
๐๐๐_๐๐ ๐๐ = ๐,๐๐๐_๐๐ ๐๐ = ๐๐โขDuration of Tasks: ๐ ๐๐๐๐๐๐๐ ๐๐ = ๐โขDeadline of Tasks: ๐ ๐๐๐ ๐๐๐๐ ๐๐ = ๐
0
1
2
3
4
5
6
7
8
9
๐๐๐๐๐๐๐
๐๐ ๐๐ ๐๐๐๐๐ ๐๐
๐๐๐๐
๐๐๐๐
๐๐๐๐
Static Properties depend on the RTES design (are
known), and express constraints on task execution
Stefano Di Alesio โ 9/21
Assumption 1: Time is discretized in time quanta
Assumption 2: The time for switching context between
tasks is negligible w.r.t. a time quantum
๐ = ๐
Aperiodic Triggered Resource Periodic Aperiodic
Independent PropertiesโขNumber of Task Executions: ๐๐๐๐_๐๐๐๐๐๐๐๐๐๐ ๐๐ = ๐,๐๐๐๐_๐๐๐๐๐๐๐๐๐๐ ๐๐ = ๐
โขArrival time of Aperiodic Task Exec.:
๐๐๐๐๐๐๐_๐๐๐๐ ๐๐, ๐ = ๐,๐๐๐๐๐๐๐_๐๐๐๐ ๐๐, ๐ = ๐
Dynamic Properties depend on the RTES runtime
behavior, and are not known prior to the analysis
Stefano Di Alesio โ 10/21
Dependent PropertiesโขActive time of Task Executions:
๐๐๐๐๐๐ ๐๐, ๐, ๐ = ๐, ๐๐๐๐๐๐ ๐๐, ๐, ๐ = ๐โขStart/End time of Task Executions:
๐๐๐๐๐ ๐๐, ๐ = ๐, ๐๐๐ ๐๐, ๐ = ๐โขPreempted Time Quanta in :
๐๐๐๐๐๐๐๐๐ ๐๐, ๐, ๐ = ๐ โ ๐ โ ๐ = ๐โขSystem Load: ๐๐๐๐ ๐ = ๐โขDeadline Miss of Task Executions:
๐ ๐๐๐ ๐๐๐๐_๐๐๐๐ ๐๐, ๐ = ๐ โ ๐ = โ๐
Independent Properties characterize stress test cases
Dependent Properties characterize the expected reaction of the
system to the events modeled by the Independent properties
0
1
2
3
4
5
6
7
8
9
๐๐๐๐๐๐๐
๐๐ ๐๐ ๐๐๐๐๐ ๐๐
๐๐๐๐
๐๐๐๐
๐๐๐๐
๐ = ๐
Aperiodic Triggered Resource Periodic Aperiodic
Stefano Di Alesio โ 11/21
Both GA and CP cast the search for arrival times that
violate task deadlines as an optimization problem
[3] L. Briand, Y. Labiche, and M. Shousha, โUsing Genetic Algorithms for Early Schedulability Analysis and Stress Testing in Real-Time
Systemsโ, Genetic Programming and Evolvable Machines, vol. 7 no. 2, pp. 145-170, 2006
[4] Di Alesio, S., Nejati, S., Briand, L., and Gotlieb, A. (2014). Worst-Case Scheduling of Software Tasks โ A Constraint Optimization
Model to Support Performance Testing. In Principles and Practice of Constraint Programming (CP 2014)
Properly rewards scenarios
with deadline misses [3]
Genetic Algorithms [3] Constraint Programming [4]
Static Properties of Tasks Chromosomes Properties Constants
Dynamic Properties of Tasks Chromosomes Genes Variables
OS Scheduler Behavior Chromosomes Evaluation Constraints
Deadline Misses Requirement Fitness Function Objective Function
Efficient and diverse,
but ineffective
Effective, but inefficient
and non-diverse
๐ญ๐ซ๐ด =
๐,๐
๐๐ ๐๐๐ ๐๐๐๐_๐๐๐๐(๐,๐)
Stefano Di Alesio โ 12/21
Therefore, we looked into a way to retain the
practical advantages of GA and CP in isolation
Search Space
๐๐
๐๐
๐๐
๐๐
๐๐
๐๐
๐๐
๐๐
๐๐
๐๐
๐๐
๐๐๐๐
๐๐
๐๐
๐๐
๐๐
๐๐๐โ
๐โ
1. GA-step: ๐๐, ๐๐, ๐๐ evolve into ๐๐, ๐๐, ๐๐2. CP-step: ๐๐, ๐๐, ๐๐ are optimized into ๐โ, ๐โ, ๐โ
= ๐โ
The key idea behind GA+CP is to run complete searches
with CP in the neighborhood of solutions found by GA
2-steps strategy
Stefano Di Alesio โ 13/21
GA+CP only looks in the neighborhood of tasks that
can have an impact on task deadlines in a solution
๐๐ and ๐๐ have an indirect impact on ๐๐because they have higher priority than ๐๐
๐ฑโ ๐ = ๐๐๐ฐ๐๐ ๐ = ๐๐ , ๐๐, ๐๐, ๐๐๐๐๐๐๐๐๐_๐๐๐๐ ๐๐, ๐ = ๐๐ โ ๐ซ โค ๐๐๐๐๐๐๐_๐๐๐๐(๐๐, ๐) โค ๐ +๐ซ๐ โ ๐ซ โค ๐๐๐๐๐๐๐_๐๐๐๐(๐๐, ๐) โค ๐ +๐ซ๐ โ ๐ซ โค ๐๐๐๐๐๐๐_๐๐๐๐(๐๐, ๐) โค ๐ +๐ซ๐ โ ๐ซ โค ๐๐๐๐๐๐๐_๐๐๐๐(๐๐, ๐) โค ๐ +๐ซ
๐๐ฎ๐จ = ๐ , ๐ , ๐ , ๐ , ๐ ๐๐ฎ๐จ+๐ช๐ท = ๐ , ๐ , ๐ , ๐ , ๐
๐๐ has a direct impact on ๐๐because it depends on ๐๐
Dependent tasks Dependent tasks
Stefano Di Alesio โ 14/21
We compared GA+CP with GA and CP in isolation
on 5 systems from safety-critical domains
RQ1 โ Efficiency: time needed
to generate test cases
RQ2 โ Effectiveness: revealing
power of worst-case scenarios
RQ3 โ Diversity: capability to exercise the
system w.r.t. different patterns (i.e., coverage)
DomainTasks
LogsizePeriodic Aperiodic
ICS: Ignition Control System Automotive 3 3 446.7
CCS: Cruise Control System Automotive 8 3 551.6
UAV: Unmanned Air Vehicle Avionics 12 4 671.5
GAP: Generic Avionics Platform Avionics 15 8 709.4
HPSS: Herschel-Planck Satellite System Aerospace 23 9 836.6
RQ4 โ Scalability: extent to which the
system size affects the efficiency
Stefano Di Alesio โ 15/21
MetricsโขComputation time ๐(๐) of a solution ๐ โ ๐ฟโขSum ๐(๐) of time quanta in deadline misses
๐โ ๐๐โ ๐ฟ๐
โ
โขNumber ๐(๐) of tasks that miss a deadline
๐โ ๐๐โ ๐ฟ๐
โ
โขNumber ๐(๐) of task execs. that miss a deadline
๐โ ๐๐โ ๐ฟ๐
โ
0
1
2
3
4
5
6
7
8
9
๐๐
๐๐๐
๐ ๐๐
๐๐
๐๐๐
๐ ๐๐
๐๐
๐๐๐
๐ ๐๐
๐ = ๐
๐๐๐๐๐๐๐
๐ฟ = ๐
๐ = ๐ , ๐ , ๐
๐ = ๐๐ = ๐๐ = ๐
AttributesโขRQ1 โ Efficiency ๐ผ: computation time of the best solution
๐ผ๐ = ๐ ๐๐โ ๐ผ๐ = ๐ ๐๐
โ ๐ผ๐ = ๐ ๐๐โ
โขRQ2 โ Effectiveness ๐ฟ: metric value of the best solution
๐ฟ๐ = ๐โ ๐ฟ๐ = ๐
โ ๐ฟ๐ = ๐โ
โขRQ3 โ Number ๐ต of best solutions
๐ต๐ = ๐ฟ๐โ ๐ต๐ = ๐ฟ๐
โ ๐ต๐ = ๐ฟ๐โ
โขRQ3 โ Diversity ๐น: extent to which solutions exercise the
system in different ways
We formalize aspects of practical interest related to
the Research Questions as metrics and attributesPeriodic Triggered Aperiodic
Stefano Di Alesio โ 16/21
High diversity entails that test cases thoroughly
exercise interactions between task executions
0
1
2
3
4
5
6
7
8
9
๐๐
0
1
2
3
4
5
6
7
8
9
๐๐
0
1
2
3
4
5
6
7
8
9
๐๐
0
1
2
3
4
5
6
7
8
9
๐๐
0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
5
6
7
8
9
๐๐๐๐
Diversity ๐น๐ w.r.t.
execution shift
Diversity ๐น๐ w.r.t.
execution pattern
Diversity ๐น๐ w.r.t.
number of executions
๐น๐ = |๐ โ ๐| + |๐ โ ๐| = ๐ ๐น๐ = |๐ โ ๐| + |๐ โ ๐| = ๐ ๐น๐ = ๐ โ ๐ = ๐
Stefano Di Alesio โ 17/21
In GA+CP, we instructed CP to terminate the local
search after two hours
The time taken by CP to terminate the local searches was not significant
with respect to the time taken by GA to generate its solutions
Stefano Di Alesio โ 18/21
GA+CP achieves trade-off between the efficiency
and diversity of GA, and the effectiveness of CP
Wilcoxon rank-sum test (GA+CP vs GA)
Wilcoxon signed-rank test (GA+CP vs CP)
๐ผ ๐ฎ๐จ + ๐ช๐ท โ ๐ผ ๐ฎ๐จ > ๐ผ(๐ช๐ท)๐ฟ ๐ฎ๐จ + ๐ช๐ท โ ๐ฟ ๐ช๐ท > ๐ฟ(๐ฎ๐จ)๐น ๐ฎ๐จ + ๐ช๐ท โ ๐น ๐ฎ๐จ > ๐น(๐ช๐ท)
Stefano Di Alesio โ 19/21
The effect the system size has over the efficiency of
GA+CP (RQ4 โ Scalability) is similar to that of GA
Our experiment was performed on 5 case studies
โ No quantitative scalability study
In our experiments, ๐ซ โ๐ป
๐๐๐proved to yield satisfactory results
โ But in larger case studies ๐ซ might also have to be larger
CP (โข) GA (โ ) GA+CP (โฒ)
ICS CCS UAV GAP HPSS ICS CCS UAV GAP HPSS ICS CCS UAV GAP HPSS
Stefano Di Alesio โ 20/21
Future directions include investigating test suite
reduction strategies and multiobjective optimization
Deadline Misses Response TimeCPU Usage
Multiobjective optimization allows to investigate scenarios
where multiple requirements are close to be violated
Find the minimal set of test cases that retain some relevant property
(e.g.: cover all the task executions predicted to miss a deadline)
Execution 1 Execution 2 Execution 3 Execution 4 Execution 5
Test Case 1 X X
Test Case 2 X X
Test Case 3 X X
Test Case 4 X X
In summary, GA+CP casts stress testing of task
deadlines misses as an optimization problem
GA explores the search space, and CP
exploits the solutions found by GA
The CP search is complete, but only along
โpromisingโ directions (impacting tasks)
Stefano Di Alesio โ 21/21
Questions?
This design yields trade-off in efficiency,
diversity (GA) and effectiveness (CP)