tarot2013 testing school - gilles perrouin presentation
TAROT 2013, 9th International Summer School on Training And Research On Testing, Volterra, Italy, 9-13 July 2013. These slides summarize Gilles Perrouin's presentation "Feature-based Testing of SPLs: Pairwise and Beyond".
Feature-based Testing of SPLs: Pairwise and Beyond
Gilles Perrouin
TAROT Summer School 2013
DISCLAIMERS
About me...
PhD (2007) jointly from U. Luxembourg & UNamur
Requirements engineering, Software Architecture and Development Methodologies for SPLs
Postdoc (2007-2009) in INRIA: MDSPLE and SPL Testing (2009)
Since 2010, UNamur, SPL Testing funded by FNRS (3yr grant since Oct. 2012)
I still have SO much to learn about software testing...
Acknowledgments...
Benoit Baudry, Sagar Sen, Jacques Klein, Yves Le Traon, Sebastian Oster
Arnaud Gotlieb, Aymeric Hervieu
Xavier Devroey, Maxime Cordy, Patrick Heymans, Pierre-Yves Schobbens, Axel Legay, Eun-Young Kang, Andreas Classen
Christopher Hénard, Mike Papadakis
How it all started....
[xkcd.com]
Product Derivation
Perrouin et al. "Reconciling automation and flexibility in product derivation." SPLC'08 pp 339-348, IEEE
[Figure: product derivation, from an FM configuration to a product (model)]
Research Question (2009)
How to design model fragments so that they compose well together?
Methodological hints are insufficient
Need for an automated approach to validate SPL models...
Testing view: Extract relevant configurations of the SPL and build them (composition = oracle)
Challenges
2^N
NASA JPL
270 Features
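To give a feel for the 2^N figure, here is a small arithmetic sketch. It assumes all N features are independent Boolean options, which is only an upper bound since FM constraints rule out many combinations. The function name is illustrative.

```python
def max_configurations(n_features: int) -> int:
    """Upper bound on the number of configurations for n independent Boolean features."""
    return 2 ** n_features


n = 270  # feature count quoted on the slide
total = max_configurations(n)
# Python integers are arbitrary precision, so the exact value can be inspected.
print(f"2^{n} has {len(str(total))} decimal digits")
```

Even if constraints remove most of these combinations, exhaustive testing stays out of reach.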
Testing the universe ???
Renault Vans: 10^21 possible vehicles...
Source: Astesana et al. "Constraint-based vehicle configuration: a case study." Tools with Artificial Intelligence (ICTAI), IEEE, 2010.
The Linux Kernel FM ≈ 7,000 features
Source: S. She, R. Lotufo, T. Berger, A. Wasowski, and K. Czarnecki, “Reverse engineering feature models,” in ICSE, 2011, pp. 461–470.
The General Motors PL comprises a few thousand features
Source: Flores et al., "Mega-scale product line engineering at General Motors." SPLC '12, pp. 259-268.
Specific Challenges
MDSPLE Context
Integration in early SPL lifecycle (requirements/design level)
Applicability for SPL Engineer
No a priori knowledge of the SPL
Difficult to pinpoint faulty assets in a symmetric composition approach (faulty interactions likely)...
Abstract models
Incremental testing [Uzu08,Loc12] unsuitable
Sampling-Based Top-down Testing
1) Select relevant configurations from the FM
2) Derive or retrieve the products realizing those configurations
3) Write/Select test cases associated to those products
4) Run test cases on some inputs
A simple (simplistic?) scenario of which Steps #1 and #2 kept us busy the last 4 years => focus of this talk
Agenda
FM-level Configuration Selection
T-wise SAT-based with Alloy
Similarity-Driven and prioritization with evolutionary algorithms
Tool Demo
Multi-objective
Going Beyond: Unifying Verification and Test for SPLs
On the (actual) trustability of FMs
CIT for SPLs
Pros
Addresses the feature interaction problem
Small test suites (compared to 10^X possible tests)
Cons (2009)
Poor support for constraints
Limited SPL tool support [Cohen2006,Cohen2007]: FMs ≠ Covering Arrays (input pb)
T-wise Coverage as a SAT problem
FM
Viewed as a set of constraints between boolean features
T-wise
Can be seen as a SAT problem: “Set of valid configurations that satisfy the conjunction of all t-tuples of features”
Rely on SAT solvers for SPL T-wise testing
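A minimal sketch of what t-wise coverage means on a toy FM. Assumptions: the three features and their constraints are hypothetical, and brute-force enumeration stands in for the SAT solver (only feasible for tiny models).

```python
from itertools import combinations, product

FEATURES = ["A", "B", "C"]  # hypothetical toy features


def is_valid(cfg):
    # Hypothetical constraints: A is mandatory, B and C exclude each other.
    return cfg["A"] and not (cfg["B"] and cfg["C"])


def valid_configurations():
    """Enumerate every Boolean assignment and keep the ones the FM accepts."""
    for values in product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, values))
        if is_valid(cfg):
            yield cfg


def t_tuples(cfg, t):
    """All t-tuples of (feature, value) assignments exhibited by one configuration."""
    return set(combinations(sorted(cfg.items()), t))


def valid_t_tuples(t):
    """t-tuples that appear in at least one valid configuration."""
    result = set()
    for cfg in valid_configurations():
        result |= t_tuples(cfg, t)
    return result


def coverage(suite, t):
    """Fraction of valid t-tuples covered by the configurations in suite."""
    target = valid_t_tuples(t)
    covered = set()
    for cfg in suite:
        covered |= t_tuples(cfg, t)
    return len(covered & target) / len(target)
```

A suite reaches 100% t-wise coverage when `coverage(suite, t)` returns 1.0; the selection problem is to find a small suite with that property.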
Related Issues
However, FMs are not SAT solvers’ inputs… (Usability perspective)
Need to devise a solution to automatically encode FMs and the T-wise selection problem
Scalability
SAT solving is NP-complete
We don’t know how to predict in advance the difficulty of a given problem => need to assess it experimentally
Pragmatic solutions have to be found to address concrete scalability issues
Approach overview
Use Alloy [Jac2006] as an intermediate representation between FM+T-wise and SAT solvers
Use MDE (EMF, Kermeta) to generate Alloy specs
Scalability: “divide-and-compose”
Split t-tuples into solvable sets
Generate Alloy commands and solve sets
Recompose solutions into a single configuration suite
Configurable Java-based toolset performing test selection and analysis of the selection strategies
Evaluating T-wise Generation
Generation time
Generated configurations size
T-tuple occurrence: how many times does a given T-tuple appear?
Number of duplicates (“divide-and-compose” strategies)
Similarity: how different are my configurations?
$Sim(tc_i, tc_j) = \frac{|Tc_{iv} \cap Tc_{jv}|}{|Tc_{iv} \cup Tc_{jv}|}$
$Tc_{iv}$: variant features [Benavides2010] of configuration i
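The similarity metric is a Jaccard index over the variant features of two configurations. A minimal sketch, assuming each configuration is represented as the set of its selected variant features (the representation and the empty-set convention are assumptions):

```python
def similarity(variant_i: set, variant_j: set) -> float:
    """Jaccard index between the variant-feature sets of two configurations."""
    union = variant_i | variant_j
    if not union:
        return 1.0  # assumption: two configurations with no variant features are identical
    return len(variant_i & variant_j) / len(union)


# Hypothetical feature names: one shared feature out of three distinct ones.
print(similarity({"soda", "tea"}, {"soda", "cancel"}))
```

A value of 1 means the two configurations select the same variant features; 0 means they share none.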
Initial Assessment
Computed those metrics on a case-study varying scope and generation time
Wrote paper (that brought me here)
G. Perrouin, S. Sen, J. Klein, B. Baudry, and Y. Le Traon. "Automated and scalable t-wise test case generation strategies for software product lines." ICST 2010 pp. 459-468, IEEE.
Experimentation
Comparing T-wise Approaches
Our experience raised some interest:
Oster, Sebastian, Florian Markert, and Philipp Ritter. "Automated incremental pairwise testing of software product lines." Software Product Lines: Going Beyond. Springer Berlin Heidelberg, 2010. 196-210.
Use a “home-made” CSP solver
Conflicting philosophies
Generality for Alloy-based
Specialization for CSP-based
Key Differences
FM Expressivity
                     CSP-Based   Alloy-Based
Cardinalities            -            +
Binary Constraints       +            +
N-ary Constraints        -            +
Scalability
‘A priori’: Flattening of the FM
‘A posteriori’: “divide-and-compose” strategies
Determinism
CSP-based always provides the same suite for a given FM
Alloy-based can produce very different test suites due to random tuple combinations and scope influence
Comparison Results
FMs from SPLOT [Mendonca2009]
T=2               CP       SH        AG              MT         ES
Features          19       35        61              88         287
Configurations    61       1E+06     3.30E+09        1.65E+13   2.26E+49
CTCR (%)          26       0         55              0          11
CSP (ms)          0        0         32              46         797
BinSplit (ms)     11812    11457     33954           >9h        >9h
IncGrowth (ms)    56494    1372094   13847835 (4h)   >9h        >9h
CSP 1000 times faster
Selected Configurations Sizes
              CP           SH           AG           MT           ES
              t=2   t=3    t=2   t=3    t=2   t=3    t=2   t=3    t=2   t=3
CSP           8     23     40    61     46    257    92    643    215   841
BinSplit      12    207    92    -      514   -      -     -      -     -
IncGrowth     15    133    28    -      74    -      -     -      -     -
Conclusions
CSP outperforms Alloy-based
Generation time and configurations size
Automated and scalable t-wise test case generation strategies for software product lines
Specialization > Generality
More details
Perrouin, Gilles, Sebastian Oster, Sagar Sen, Jacques Klein, Benoit Baudry, and Yves Le Traon. "Pairwise testing for software product lines: Comparison of two approaches." Software Quality Journal 20, no. 3-4 (2012): 605-643.
Lessons Learned
What went wrong
Alloy was not really meant for this: interactive model exploration ≠ batch tool chain running continuously
Exotic case study
Comparison is good
A testing tool is software too => should be tested ;)
Gives you insight into design decisions: e.g. flattening and expressivity
Choose your case studies wisely
Go for repositories when they exist
Publish your models/tools so that others can play with them
Meanwhile...
SPL-specific
Pacogen [Hervieu2011]
SPLCAT [Johansen2011,2012a,2012b]
Search-based [Ensan2012,Garvin2009]
Scalability greatly improved
From dozens to thousands of features (approx. 7,000) for t=2
Tool availability
Problem Solved ?
What about higher values of t (3,4,5,6)?
480 2-wise configurations for Linux FM: Where to start?
t-wise coverage remains essentially difficult to compute for large models...
We used similarity to evaluate generated configurations
H. Hemmati et al., “Achieving scalable model-based testing through test case diversity,” ACM TOSEM, vol. 22, no. 1, 2012.
Can we use similarity to mimic t-wise coverage ?
Intuition: dissimilar configurations cover more t-tuples than similar ones
Similarity-driven Selection
Use evolutionary algorithms to evolve a population of configurations
SBSE well-suited for large configuration spaces
Ensure the validity of generated configurations (via SAT4J)
Designed for Scalability
Fitness function based on distance: correlated with t-wise coverage but easier to compute
Flexibility
Tester decides generation time and # of configurations
Configurations are prioritized w.r.t. the fitness function: use a subset if resources are lacking
SPL Similarity Search Problem
(1+1) EA [Dro02] (non-local variant of HC)
Individual = set of configurations
Population = 1 individual :)
No crossover, mutation = change one gene at a time (configuration) depending on its fitness
Fitness function
$f : C^m \longrightarrow \mathbb{R}^+$
$(C_1, \ldots, C_m) \longmapsto \sum_{j > i \geq 1}^{m} d(C_i, C_j)$
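The fitness function sums the distance d over every pair of configurations in the individual. A minimal sketch, assuming d is the Jaccard distance (1 minus the similarity used earlier in the talk) and that a configuration is a frozenset of selected features; the slide itself only says that d is a distance.

```python
from itertools import combinations


def distance(ci: frozenset, cj: frozenset) -> float:
    """Jaccard distance between two configurations (assumed choice of d)."""
    union = ci | cj
    if not union:
        return 0.0
    return 1.0 - len(ci & cj) / len(union)


def fitness(configs) -> float:
    """Sum of d(C_i, C_j) over all pairs with j > i."""
    return sum(distance(ci, cj) for ci, cj in combinations(configs, 2))
```

Computing f needs m(m-1)/2 distance evaluations, which is far cheaper than enumerating t-tuples for large t.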
Configuration Selection Algorithm
1. Select a set of m unpredictable configurations from the SAT solver
Unpredictable: not biased by the solver's internal order of literals and clauses, which favours locally similar configurations
2. While elapsedTime < t
compute fitness function f
Prioritize configurations
Global distance: make sure that each product maximizes its distance to all the others
Local distance: pairwise distance
remove worst configuration => iterate on new ones until f improves
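The loop above can be sketched as follows. This is an illustration under stated assumptions, not the PLEDGE implementation: the toy FM and its constraint are hypothetical, brute-force enumeration plus random sampling replaces SAT4J, an iteration budget replaces the elapsed-time test, and only the global-distance prioritization is shown.

```python
import random
from itertools import combinations, product

FEATURES = ["soda", "tea", "cancel", "free"]  # hypothetical toy features


def is_valid(selected):
    # Hypothetical constraint: the machine sells at least one drink.
    return "soda" in selected or "tea" in selected


def all_valid_configurations():
    configs = []
    for values in product([False, True], repeat=len(FEATURES)):
        selected = frozenset(f for f, v in zip(FEATURES, values) if v)
        if is_valid(selected):
            configs.append(selected)
    return configs


def distance(ci, cj):
    union = ci | cj
    return 0.0 if not union else 1.0 - len(ci & cj) / len(union)


def fitness(suite):
    return sum(distance(ci, cj) for ci, cj in combinations(suite, 2))


def prioritize(suite):
    # Global distance: configurations farthest from all the others come first.
    return sorted(suite, key=lambda c: sum(distance(c, o) for o in suite), reverse=True)


def select(m, max_iterations, seed=0):
    rng = random.Random(seed)
    pool = all_valid_configurations()
    suite = rng.sample(pool, m)  # stands in for "unpredictable" solver output
    for _ in range(max_iterations):
        suite = prioritize(suite)
        candidate = rng.choice(pool)
        if candidate in suite:
            continue
        mutated = suite[:-1] + [candidate]  # replace the worst configuration
        if fitness(mutated) > fitness(suite):
            suite = mutated  # keep the change only if f improves
    return prioritize(suite)
```

Because the returned suite is ordered by priority, a tester short on resources can simply take its first k configurations.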
Mimicking t-wise ?
Linux FM
Better than (true) Random ?
120 FMs ∈ [11;1,000] features
100% coverage not guaranteed
Prioritization
120 FMs ∈ [11;1,000] features
More efficient for high values of t
Taming large FMs
eCos: 1% more coverage implies 2.1E+15 additional 6-tuples!
1,000 configurations may not be enough...
t=6 (1,000 confs)    0 runs     15,000 runs
eCos                 94.191%    95.343%
FreeBSD              76.236%    76.494%
Linux                89.411%    90.671%
Conclusions
Balanced coverage and flexibility
Let testers decide w.r.t. the resources they have
Prioritization helps focus on the configurations that cover the most
Does not replace 100% coverage CIT approaches for SPLs [Johansen2012b,Garvin2011] but complements them for intractable cases
Look at our TR
C. Henard, M. Papadakis, G. Perrouin, J. Klein, P. Heymans, Y. Le Traon. "Bypassing the combinatorial explosion: Using similarity to generate and prioritize t-wise test suites for large software product lines." arXiv preprint arXiv:1211.5451 (2012).
Prof Smith Says...
Approach not getting all interactions, CIT assumption does not hold any more: Fault finding ability?
I expect you to get back to work...
Using Mutation to Evaluate Similarity
Mutate FMs and use sets of configurations of various similarity levels to assess their mutant-killing ability
Mutation operators: feature negation, disjunction (or) -> conjunction (and) (2 new clauses)
         2                  10                 50
         Dis      Sim       Dis      Sim       Dis      Sim
eCos     60.35%   48.32%    77.83%   48.41%    83.49%   49.64%
Dissimilar suites yield better mutation score in all cases
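A toy sketch of the mutation-score idea. Assumptions: the FM, the two mutants and the kill criterion (a mutant is killed when it rejects at least one configuration of the suite that the original FM accepts) are illustrative, and the example is built by hand, so it shows the mechanics rather than evidence for the result above.

```python
def cfg(root, soda, tea):
    return {"root": root, "soda": soda, "tea": tea}


def original(c):
    # Hypothetical FM: root and (soda or tea)
    return c["root"] and (c["soda"] or c["tea"])


def mutant_or_to_and(c):
    # Disjunction replaced by conjunction.
    return c["root"] and (c["soda"] and c["tea"])


def mutant_negation(c):
    # Feature negation applied to soda.
    return c["root"] and ((not c["soda"]) or c["tea"])


MUTANTS = [mutant_or_to_and, mutant_negation]


def mutation_score(suite, mutants=MUTANTS):
    """Fraction of mutants that reject at least one configuration of the suite."""
    assert all(original(c) for c in suite), "suite must be valid for the original FM"
    killed = sum(1 for m in mutants if any(not m(c) for c in suite))
    return killed / len(mutants)


similar = [cfg(True, True, True), cfg(True, False, True)]
dissimilar = [cfg(True, True, False), cfg(True, False, True)]
print(mutation_score(similar), mutation_score(dissimilar))
```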
Using Mutation to Evaluate Similarity
Preliminary Work
Specialized mutation operators
Equivalent mutant discriminations (FM semantics)
More information
C. Henard, M. Papadakis, G. Perrouin, J. Klein, Y. Le Traon, Assessing Software Product Line Testing via Model-based Mutation: An Application to Similarity Testing, AMOST@ICST 2013
PLEDGE: A Product Line Editor and Test Generation Tool
C. Henard, M. Papadakis, G. Perrouin, J. Klein, Y. Le Traon, SPLC 2013 (Tool Demonstration Papers), ACM
http://research.henard.net/SPL/PLEDGE/
More Flexibility: Multi-Objective
Selecting configurations involves several objectives
Maximizing coverage
Minimizing # configurations
Minimizing the testing cost of each configuration
Use GAs + SAT: Search problem
Cost: value assigned to each variant feature
Coverage: pairwise
Objective function (F)
Weighted linear combination of coverage, cost and # configurations
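One possible shape for such an objective function, as a hedged sketch. The slide only states that F is a weighted linear combination of coverage, cost and number of configurations; the normalisation by `max_cost` and `max_configs`, the equal default weights and all function names are assumptions made for illustration.

```python
def suite_cost(suite, feature_cost):
    """Total cost of a suite: sum of the values assigned to each selected variant feature."""
    return sum(feature_cost.get(f, 0) for config in suite for f in config)


def objective(coverage, cost, n_configs, max_cost, max_configs,
              weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted linear combination to maximize; higher is better.

    coverage is the pairwise coverage in [0, 1]; cost and n_configs are
    minimized, so they enter as (1 - normalized value).
    """
    w_cov, w_cost, w_n = weights
    return (w_cov * coverage
            + w_cost * (1 - cost / max_cost)
            + w_n * (1 - n_configs / max_configs))


# Hypothetical costs for two variant features.
costs = {"soda": 2, "tea": 3}
suite = [{"soda", "tea"}, {"soda"}]
print(objective(0.9, suite_cost(suite, costs), len(suite), max_cost=20, max_configs=10))
```

Changing the weights lets a tester trade coverage against budget without touching the search itself.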
Comparison with Random
[Figure: F1, F2 and F3 for Random vs. Multi-objective; panels: Same Pairwise Coverage, Same # Configurations]
Conclusion on Multi-Objective
Statistical significance of F guiding search after only 500 generations
Currently being integrated in PLEDGE
Threat: Small FMs so far (<100 features)
More information
C. Henard, M. Papadakis, G. Perrouin, J. Klein, Y. Le Traon, Multi-objective Test Generation for Software Product Lines, SPLC2013, ACM
Going Beyond FMs
Unified Behavioural SPL QA
Salvador, 2 September 2012
Featured Transition Systems
[Figure: vending machine example, one transition system per variant, with its action labels]
Sells soda: pay, change, soda, serveSoda, open, take, close
Sells soda and tea: pay, change, soda, serveSoda, tea, serveTea, open, take, close
Can cancel purchase: pay, change, soda, serveSoda, cancel, return, open, take, close
Drinks are free: free, soda, serveSoda, take
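A minimal sketch of how the vending machine variants could be merged into a single FTS and projected back onto one product. The state numbers, the feature names (soda, tea, cancel, free) and the guards are assumptions made for illustration; the slides only show the action labels of each variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    source: int
    action: str
    target: int
    required: frozenset = frozenset()   # features that must be selected
    forbidden: frozenset = frozenset()  # features that must not be selected

    def enabled_in(self, product):
        """True if this transition exists in the given product."""
        return self.required <= product and not (self.forbidden & product)


def t(source, action, target, required=(), forbidden=()):
    """Small helper to build a guarded transition."""
    return Transition(source, action, target,
                      frozenset(required), frozenset(forbidden))


# Hypothetical encoding: state numbers and guards are assumed, not from the slides.
VENDING_FTS = [
    t(1, "pay", 2, forbidden=["free"]),
    t(2, "change", 3, forbidden=["free"]),
    t(1, "free", 3, required=["free"]),
    t(3, "soda", 5, required=["soda"]),
    t(5, "serveSoda", 7, required=["soda"]),
    t(3, "tea", 6, required=["tea"]),
    t(6, "serveTea", 7, required=["tea"]),
    t(3, "cancel", 4, required=["cancel"]),
    t(4, "return", 1, required=["cancel"]),
    t(7, "open", 8, forbidden=["free"]),
    t(8, "take", 9, forbidden=["free"]),
    t(9, "close", 1, forbidden=["free"]),
    t(7, "take", 1, required=["free"]),
]


def project(fts, product):
    """Keep only the transitions whose guard holds for the product."""
    product = frozenset(product)
    return [tr for tr in fts if tr.enabled_in(product)]


def actions(transitions):
    """Set of action labels used by a list of transitions."""
    return {tr.action for tr in transitions}


if __name__ == "__main__":
    for product in ({"soda"}, {"soda", "tea"}, {"soda", "cancel"}, {"soda", "free"}):
        print(sorted(product), sorted(actions(project(VENDING_FTS, product))))
```

Projecting onto a single product recovers an ordinary transition system, which is what product-by-product verification would analyse one at a time.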
FTS cont’d
Designed for Model-Checking
More efficient than product-by-product verification
Tool-support: SNIP [Classen2012], NuSMV [Classen2011]
Real-time [Cordy2012a], adaptive systems [Cordy2012b]
SPL of model-checkers: http://www.info.fundp.ac.be/fts/
SPL-dedicated
From product to set of products
From application to domain engineering
Goal: Combination with Testing
MC properties as test selection criteria
Verification of feature interactions
Combining Verification and Testing
Verification → Testing
LTL properties as testing criteria: “[](cancel => !serve W start)”
The model checker (MC) returns all configurations violating this property: if you cancel your order, you should not get your drink
Testing → Verification
FTS too large to be verified
Focus on some features or interactions → FTS’
Check the behavior of the configurations containing them
Multiple scenarios can be devised
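To make the property concrete, here is a small sketch that evaluates it on finite action traces. It is not how a model checker works; it only spells out the weak-until semantics. Assumptions: the formula is read as [](cancel => ((!serve) W start)), "serve" stands for any action whose name begins with "serve", "start" is an event marking a new purchase cycle, and a trace that ends before "start" is accepted as long as nothing was served.

```python
def no_serve_weak_until_start(trace, i):
    """(!serve) W start, evaluated from position i of a finite trace."""
    for action in trace[i:]:
        if action == "start":
            return True            # released: start occurred before any serve
        if action.startswith("serve"):
            return False           # a drink was served before start
    return True                    # never served: the weak until still holds


def property_holds(trace):
    """[](cancel => (!serve) W start) on a finite trace of action names."""
    return all(
        no_serve_weak_until_start(trace, i)
        for i, action in enumerate(trace)
        if action == "cancel"
    )


if __name__ == "__main__":
    ok = ["pay", "change", "cancel", "return", "start", "pay", "change", "soda", "serveSoda"]
    bad = ["pay", "change", "cancel", "serveSoda"]
    print(property_holds(ok), property_holds(bad))
```

A trace rejected by this check is a candidate test scenario for the configurations in which it is executable.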
Example: Pruning FTS with observers
Usability Issue: Not everyone loves TL ;)
Observer automata allow specifying properties easily
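The slides do not show the observer notation, so the following is only a hypothetical encoding: a three-state observer for "after cancel, no drink is served before the next start", written as a plain transition table. The state names, the "start" event and the rule that every action beginning with "serve" counts as a serve are assumptions.

```python
# Observer transition table: (state, event) -> next state.
# Events not listed leave the observer in its current state.
OBSERVER = {
    ("idle", "cancel"): "cancelled",
    ("cancelled", "start"): "idle",
    ("cancelled", "serve"): "error",
}


def classify(action):
    """Map concrete actions (serveSoda, serveTea, ...) to observer events."""
    return "serve" if action.startswith("serve") else action


def run_observer(trace, state="idle"):
    """Feed a trace of actions to the observer and return its final state."""
    for action in trace:
        state = OBSERVER.get((state, classify(action)), state)
        if state == "error":
            break                  # error is a sink state
    return state


if __name__ == "__main__":
    print(run_observer(["pay", "change", "cancel", "return", "start"]))
    print(run_observer(["pay", "change", "cancel", "serveTea"]))
```

The appeal of this style is that the property is drawn as a small automaton instead of being written as a temporal logic formula.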
Leveraging FTS
Not really a user-friendly language
No structuring mechanism
Higher-level models (fPromela, fSMV) still require MC expertise
Use of UML instead
Broaden the scope of these techniques to any SPL engineer
Abstraction: Hierarchical states, orthogonal regions
FTS as underlying formal semantics
Challenges
UML 2 FTS
Choice of relevant constructs
Symmetric vs Asymmetric composition
Flattening: Well-known problem but few usable solutions
Testability of FTS
Extended Actions, test criteria, FTS-ioco...
More information
X. Devroey, M. Cordy, G. Perrouin, E-Y Kang, P-Y Schobbens, P. Heymans, A. Legay, and B. Baudry. "A vision for behavioural model-driven validation of software product lines." ISOLA 2012, pp. 208-222. Springer.
Models & Reality...
“Essentially, all models are wrong, but some are useful.”
George E. P. Box, British mathematician (1919-2013)
Source of FMs
Repositories (SPLOT) contain mostly small to medium-size academic FMs
may not represent actual systems
SPLs are rarely built from scratch
A set of existing products evolves into an SPL; it is not derived directly from the FMs
FM Synthesis (or reverse-engineering)
Several approaches exist [She2011, Ach2011, Hasl2013...]
Tough problem: parsing code, limited expressiveness of target FM languages, heuristics…
=> FMs may not be fully representative of the systems you want to test
FMs not representative
?
Missing/Infeasible/Useless configurations…
Get the model right!
Some FMs have thousands of features…
How to fix them?
Test-and-Fix Loop
Detect discrepancies in two ways
Check if existing products conform to the FM (SCF, EWC)
Try to build products from configurations randomly extracted from the FM (GCF, ORF)
Fix them using a hill-climbing EA
Mutate the FMs to align them with real products (alter/insert/remove constraint)
Fitness function trying to minimize different kinds of errors
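The fixing step can be illustrated with a toy hill climber. This is a sketch under strong assumptions, not the approach evaluated in the talk: the FM is reduced to a list of clauses over feature names, the only error kind counted is "existing product rejected by the FM", and the three mutation operators are simplistic stand-ins for alter/insert/remove constraint. All function names are hypothetical.

```python
import random


def satisfies(config, clause):
    """A clause is a list of (feature, polarity) literals; one must match."""
    return any((feature in config) == positive for feature, positive in clause)


def conformance_errors(constraints, products):
    """Toy fitness: number of existing products rejected by the constraints."""
    return sum(
        1 for p in products
        if not all(satisfies(p, clause) for clause in constraints)
    )


def mutate(constraints, features, rng):
    """Apply one random operator: alter, insert or remove a constraint."""
    mutant = [list(clause) for clause in constraints]
    op = rng.choice(["alter", "insert", "remove"])
    if op == "remove" and mutant:
        mutant.pop(rng.randrange(len(mutant)))
    elif op == "alter" and mutant:
        clause = mutant[rng.randrange(len(mutant))]
        i = rng.randrange(len(clause))
        feature, positive = clause[i]
        clause[i] = (feature, not positive)      # flip one literal
    else:
        a, b = rng.sample(features, 2)           # insert a random binary clause
        mutant.append([(a, rng.random() < 0.5), (b, rng.random() < 0.5)])
    return mutant


def hill_climb(constraints, products, features, steps=200, seed=0):
    """Keep a mutant whenever it does not increase the error count."""
    rng = random.Random(seed)
    best, best_err = constraints, conformance_errors(constraints, products)
    for _ in range(steps):
        if best_err == 0:
            break
        candidate = mutate(best, features, rng)
        err = conformance_errors(candidate, products)
        if err <= best_err:
            best, best_err = candidate, err
    return best, best_err


if __name__ == "__main__":
    features = ["a", "b", "c"]
    fm = [[("a", True)], [("b", False), ("c", True)]]   # a mandatory; b implies c
    products = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
    print(conformance_errors(fm, products), hill_climb(fm, products, features)[1])
```

With this single error kind, deleting every constraint is a trivial optimum, which is one reason a fitness function has to balance several kinds of errors, as the slide says.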
Evaluation
Experimentation on the Linux kernel FM
Large reverse-engineered FM + manual edits
Easy build infrastructure (Kconfig + make)
Metric   FM     2K runs   3K runs   4K runs   5K runs
EWC      50     46        43        41        39
SCF      1000   885       556       498       455
ORF      2468   1646      1395      1236      1084
GCF      1000   1000      1000      1000      1000
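To read the table more easily, the relative change of each metric can be computed between the initial FM and the FM obtained after 5K runs. The numbers are copied from the table; reading the first column as the value for the initial FM and every value as an error count (lower is better) is an assumption.

```python
# Values per metric: initial FM, then after 2K, 3K, 4K and 5K runs.
RESULTS = {
    "EWC": [50, 46, 43, 41, 39],
    "SCF": [1000, 885, 556, 498, 455],
    "ORF": [2468, 1646, 1395, 1236, 1084],
    "GCF": [1000, 1000, 1000, 1000, 1000],
}


def relative_reduction(values):
    """Percentage decrease between the first and the last value."""
    return 100.0 * (values[0] - values[-1]) / values[0]


if __name__ == "__main__":
    for metric, values in RESULTS.items():
        print(f"{metric}: {relative_reduction(values):.1f}% fewer after 5K runs")
```

Under that reading, SCF and ORF drop by more than half, EWC by about a fifth, and GCF does not move at all, which matches the cautious "promising on some aspects" verdict.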
Outcome
Approach looks promising on some aspects (SCF/ORF)
But we cannot generalize from a single case…
Future work
Assessing the improvement on all configurations (not only a subset)
Mutation operators
A (bit) more information
C. Henard, M. Papadakis, G. Perrouin, J. Klein, Y. Le Traon, Towards automated testing and fixing of re-engineered feature models. NIER@ICSE2013, pp.1245-1248. IEEE Press.
Wrapping Up
Christo, Villa Borghese 1974, Photo: Harry Shunk
Summary
Looked at selecting configurations from FMs
Important subproblem of SPL Testing
CIT main inspiration source
FMs are not covering arrays…
Constraints matter to CIT tools (outside SPLs) and are attracting growing interest => opportunities to compare
Achievements
Scalability
Usability
Ready for industrial practice?
Summary cont’d
T-wise is “blind”
prioritization (weights, ordered suites...)
flexibility (time/budget constraints)
Complement with behavioural SPL Testing & QA
Tough challenge: collaboration between Verification and Testing communities required
MBT and SPL testing
Depending on the source of the FMs, our techniques may apply to variability-intensive systems (VIS), not only SPLs (e.g. Linux)
Their validity has to be challenged
Opportunities to work with (model) miners and program analysts
Take Home Message(s)
SPL Configuration Selection is Hard
Don’t be afraid...
But start simple to understand
Is inter(sub)-disciplinary
SPL & Testing culture: Helps finding useful tradeoffs
Diversity of techniques: SAT, Evolutionary, even model-checking ;)
Willingness to collaborate and communication skills needed, but interesting :)
A Personal Note
I do not have long testing experience, but
Enjoyed an open and friendly community (so far ;))
Naive questions yield non-trivial answers
Pragmatism may help to face exponentials
I hope to still have a tester hat in 4 years…
Questions
References
[Oster et al 2010] Sebastian Oster, Florian Markert, Philipp Ritter: Automated Incremental Pairwise Testing of Software Product Lines. SPLC 2010: 196-210
[Perrouin2008] Gilles Perrouin, Jacques Klein, Nicolas Guelfi, Jean-Marc Jézéquel: Reconciling Automation and Flexibility in Product Derivation. SPLC 2008: 339-348
[Perrouin2010] Gilles Perrouin, Sagar Sen, Jacques Klein, Benoit Baudry, Yves Le Traon: Automated and Scalable T-wise Test Case Generation Strategies for Software Product Lines. ICST 2010: 459-468
[Perrouin2012] Gilles Perrouin, Sebastian Oster, Sagar Sen, Jacques Klein, Benoit Baudry, Yves Le Traon: Pairwise testing for software product lines: comparison of two approaches. Software Quality Journal 20(3-4): 605-643 (2012)
[Uzu08] E. Uzuncaova, D. Garcia, S. Khurshid, and D. Batory, “Testing software product lines using incremental test generation,” in ISSRE. IEEE Computer Society, 2008, pp. 249–258.
[Cohen2006] M. B. Cohen, M. B. Dwyer, and J. Shi, “Coverage and adequacy in software product line testing,” in ROSATEA@ISSTA, 2006, pp. 53–63.
[Cohen2007] M. Cohen, M. Dwyer, and J. Shi, “Interaction testing of highly-configurable systems in the presence of constraints,” in ISSTA, 2007, pp. 129–139.
[Weißleder2010] Stephan Weißleder: Test models and coverage criteria for automatic model-based test generation with UML state machines. PhD Thesis, Humboldt University of Berlin 2010, pp. 1-259
[Utting2006] Utting, M., Legeard, B.: Practical model-based testing: a tools approach. Morgan Kaufmann, 2006
[Kuhn2004] Kuhn DR, Wallace DR, Gallo AM (2004) Software fault interactions and implications for software testing. IEEE Trans Softw Eng 30(6):418–421
[Batory2005] D. S. Batory, “Feature models, grammars, and propositional formulas,” in SPLC, 2005, pp. 7–20.
[Czarnecki2007] K. Czarnecki and A. Wasowski, “Feature diagrams and logics: There and back again,” in SPLC. Los Alamitos, CA, USA: IEEE Computer Society, 2007, pp. 23–34.
[Schobbens2007] P. Schobbens, P. Heymans, J. Trigaux, and Y. Bontemps, “Generic semantics of feature diagrams,” Computer Networks, vol. 51, no. 2, pp. 456–479, 2007.
[Benavides2010] Benavides D, Segura S, Ruiz-Cortés A (2010) Automated analysis of feature models 20 years later: A literature review. Information Systems 35(6):615 – 63
[Mendonca2009] Mendonca M, Branco M, Cowan D (2009) SPLOT: software product lines online tools. In: Proceeding of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications, ACM, pp 761–762
[Hervieu2011] Aymeric Hervieu, Benoit Baudry, Arnaud Gotlieb: PACOGEN: Automatic Generation of Pairwise Test Configurations from Feature Models. ISSRE 2011: 120-129
[Johansen2012a] Martin Fagereng Johansen, Øystein Haugen, Franck Fleurey, Anne Grete Eldegard, Torbjørn Syversen: Generating Better Partial Covering Arrays by Modeling Weights on Sub-product Lines. MoDELS 2012: 269-284
[Johansen2012b] Martin Fagereng Johansen, Øystein Haugen, Franck Fleurey: An algorithm for generating t-wise covering arrays from large feature models. SPLC (1) 2012: 46-55
[Johansen2011] Martin Fagereng Johansen, Øystein Haugen, Franck Fleurey: Properties of Realistic Feature Models Make Combinatorial Testing of Product Lines Feasible. MoDELS 2011: 638-652
[Classen2012] Classen, A.; Cordy, M.; Heymans, P.; Legay, A. and Schobbens, P-Y. Model checking software product lines with SNIP. In International Journal on Software Tools for Technology Transfer (STTT), Springer-Verlag, 14 (5): 589-612, 2012.
[Cordy2012a] Maxime Cordy, Pierre-Yves Schobbens, Patrick Heymans, Axel Legay: Behavioural modelling and verification of real-time software product lines. SPLC (1) 2012: 66-75
[Cordy2012b] Maxime Cordy, Andreas Classen, Patrick Heymans, Axel Legay, Pierre-Yves Schobbens. Model Checking Adaptive Software with Featured Transition Systems, in Assurance for Self-Adaptive Systems, Lecture Notes in Computer Science, to appear.
[Classen2008] Classen A, Heymans P, Schobbens P (2008) What’s in a feature: A requirements engineering perspective. In: Proceedings of the Theory and practice of software, 11th international conference on Fundamental approaches to software engineering, Springer-Verlag, pp 16–30
[Loc12] Malte Lochau, Ina Schaefer, Jochen Kamischke, and Sascha Lity. Incremental model-based testing of delta-oriented software product lines. Tests and Proofs, pages 67–82, 2012.
[Ensan2012] Ensan, Faezeh, Ebrahim Bagheri, and Dragan Gašević. "Evolutionary search-based test generation for software product line feature models." In Advanced Information Systems Engineering, pp. 613-628. Springer Berlin Heidelberg, 2012.
[Droste2002] S. Droste, T. Jansen, and I. Wegener, “On the analysis of the (1+ 1) evolutionary algorithm,” Theor. Comput. Sci., vol. 276, no. 1-2, pp. 51–81, Apr. 2002
[Garvin2009] Garvin BJ, Cohen MB, Dwyer MB (2009) An improved meta-heuristic search for constrained interaction testing. In: 1st international symposium on search based software engineering, pp 13–22, 2009
[Garvin2011] B. J. Garvin, M. B. Cohen, and M. B. Dwyer, “Evaluating improvements to a meta-heuristic search for constrained interaction testing,” Empirical Softw. Engg., vol. 16, no. 1, pp. 61–102, Feb. 2011.
[Classen2010] Classen, A., Heymans, P., Schobbens, P., Legay, A., Raskin, J.: Model checking lots of systems: efficient verification of temporal properties in software product lines. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1. pp. 335–344. ICSE ’10, ACM, New York, NY, USA (2010)
[Classen2011] Classen, A., Heymans, P., Schobbens, P., Legay, A.: Symbolic model checking of software product lines. In: Proceedings 33rd International Conference on Software Engineering (ICSE 2011). ACM Press, New York (2011)
[Ach2011] Acher, M., Cleve, A., Perrouin, G., Heymans, P., Vanbeneden, C., Collet, P., & Lahire, P. (2012, January). On extracting feature models from product descriptions. In Proceedings of the Sixth International Workshop on Variability Modeling of Software-Intensive Systems (pp. 45-54). ACM.
[She2011] She, S., Lotufo, R., Berger, T., Wasowski, A., & Czarnecki, K. (2011, May). Reverse engineering feature models. In Software Engineering (ICSE), 2011 33rd International Conference on (pp. 461-470). IEEE.
[Hasl2013] Haslinger, E. N., Lopez-Herrejon, R. E., & Egyed, A. (2013). On extracting feature models from sets of valid feature combinations. In Fundamental Approaches to Software Engineering (pp. 53-67). Springer Berlin Heidelberg.