![Page 1: Feature Maps: A Comprehensible Software Representation for ...Feature Maps: A Comprehensible Software Representation for Design Pattern Detection Hannes Thaller, Lukas Linsbauer, and](https://reader033.vdocuments.us/reader033/viewer/2022042622/5f95d6dce5142304e35f2c2b/html5/thumbnails/1.jpg)
Feature Maps: A Comprehensible Software Representation for Design Pattern Detection
Hannes Thaller, Lukas Linsbauer, and Alexander Egyed
24 December 2018
Mikhail Pravilov, 13.02.2019
Design Patterns (DP)
● Generalize the different adapted implementations of a solution
● May also circumvent deficiencies and inflexibilities in OO languages
● Examples:
○ Builder
○ Decorator
○ Visitor
Pattern semantics
● What the pattern does
● Why the pattern is needed
● Where it is useful
Design Pattern Detection (DPD)
Used for:
● Preliminary analysis in maintenance and testing scenarios
● Hinting at structures and dependencies and finding performance-critical regions
Problem in DPD
● Patterns are only a guideline for implementing a specific solution
● Each pattern can be implemented in various ways
Roles Mapping
● Primary role
● Secondary role
● Pattern mapping:
● Equivalence class:
● Unique mapping
ML essential elements
● Data
● Model
● Optimization procedure
● Evaluation
Data
● Independent and identically distributed (i.i.d.)
● The observations are mutually independent
● Collected in the same fashion
● Preprocessing
Model
● Convolutional Neural Networks (CNNs)
○ proven capabilities on computer vision problems
○ combine local correlations into high-level features
○ reasonable number of model parameters
● Random Forests
○ multiple randomly perturbed decision trees
○ smoother decision boundaries
Evaluation
● Cross-validation
● k folds
● Nearly unbiased estimator of generalization performance
● Accuracy, Precision, Recall
● Matthews Correlation Coefficient
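The metrics listed above can all be derived from the four confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative and do not come from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    # By convention it is set to 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc}


# Illustrative imbalanced case: 10 positives, 90 negatives,
# and a classifier that predicts "negative" for everything.
all_negative = binary_metrics(tp=0, fp=0, tn=90, fn=10)

# Illustrative perfect classifier on a balanced set.
perfect = binary_metrics(tp=5, fp=0, tn=5, fn=0)
```

In the imbalanced case accuracy stays high while MCC drops to zero, which illustrates why MCC is a sensible primary metric when negative candidates outnumber positive ones.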
Design Pattern Detection Pipeline
Feature Extraction
● Feature = micro-structure (MS) = a small design pattern with 1-2 roles
● Small size leaves little room for variance in implementation
● Readable
● MS detectors are predicate-based sub-graph filters
● Result: sub-graphs of the abstract semantic graph (ASG) annotated with MS roles
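To make the idea of a detector as a predicate-based sub-graph filter concrete, here is a minimal sketch over a toy graph. The graph schema, the predicate `extends_abstract`, and the role names `Inheritor` and `AbstractType` are hypothetical illustrations, not the paper's ASG format or its micro-structure catalogue.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    src: str
    label: str
    dst: str


# Toy graph: node name -> attributes (hypothetical schema).
NODES = {
    "Shape": {"kind": "class", "abstract": True},
    "Circle": {"kind": "class", "abstract": False},
    "Logger": {"kind": "class", "abstract": False},
}
EDGES = [Edge("Circle", "extends", "Shape")]


def detect(edges, nodes, predicate, roles):
    """Keep every edge that satisfies the predicate and annotate its
    endpoints with the two micro-structure roles."""
    src_role, dst_role = roles
    return [
        {src_role: e.src, dst_role: e.dst}
        for e in edges
        if predicate(e, nodes)
    ]


def extends_abstract(edge, nodes):
    """Predicate: a class extends an abstract class."""
    return edge.label == "extends" and nodes[edge.dst]["abstract"]


found = detect(EDGES, NODES, extends_abstract, ("Inheritor", "AbstractType"))
```

Each match is a two-node sub-graph whose nodes carry role labels, which is the shape of result the slide describes.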
Candidate Sampling
● Finds potential candidates
● Creates role mappings
● Huge search space
● Heuristic search:
a. sup → Component
b. sub → Composite
c. sib → Leaf
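The heuristic above can be sketched as follows, assuming that sup, sub, and sib stand for a superclass, one of its subclasses, and a sibling subclass; that reading and the toy class hierarchy are assumptions made for illustration, not details confirmed by the slides.

```python
from collections import defaultdict

# Toy inheritance relation: class -> direct superclass (None if no parent).
PARENT = {
    "Graphic": None,
    "Group": "Graphic",
    "Line": "Graphic",
    "Text": "Graphic",
    "Helper": None,
}


def sample_composite_candidates(parent_of):
    """Enumerate role mappings for the Composite pattern:
    superclass -> Component, subclass -> Composite, sibling -> Leaf."""
    children = defaultdict(list)
    for cls, sup in parent_of.items():
        if sup is not None:
            children[sup].append(cls)

    candidates = []
    for sup, subs in children.items():
        for sub in subs:
            for sib in subs:
                if sib != sub:
                    candidates.append(
                        {"Component": sup, "Composite": sub, "Leaf": sib}
                    )
    return candidates


candidates = sample_composite_candidates(PARENT)
```

Restricting candidates to classes related by inheritance avoids enumerating every ordered triple of classes, which is how a heuristic of this kind tames the search space.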
Feature Normalization: Approach
Feature Normalization: Issues
Design Pattern Inference
● Learning models
● Input: feature maps
● Output: probability that the candidate is a pattern instance
Design Pattern Detection Study
● Binary classification problem
● Multi-label classification problem
● Evaluated only the last two stages of the pipeline
Controlled Variables
7 Experiment Parameters (ExP):
1. Patterns = {Singleton, Template Method, Composite, Decorator}
2. Role Count = {1, 2, 3, 4}
3. Classification Model = {Random Forest, CNN}
4. Negative-Positive Candidate Ratio (NPCR) = {1, 2, 4, 6, 8, 10}
5. Data Augmentation = {0, 1, 5, 10}
6. Optimization Budget = {200}
7. Instance Independence = {project-fold cross validation}
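Project-fold cross validation keeps candidates from the same project out of both the training and the test set at once. The sketch below assumes one fold per project (leave-one-project-out); the exact fold construction used in the study is not stated on the slides, and the project labels are placeholders.

```python
def project_folds(projects):
    """Yield (held_out_project, train_indices, test_indices).

    Each fold holds out all samples of one project, so no project
    contributes to both the training and the test split.
    """
    for held_out in sorted(set(projects)):
        test_idx = [i for i, p in enumerate(projects) if p == held_out]
        train_idx = [i for i, p in enumerate(projects) if p != held_out]
        yield held_out, train_idx, test_idx


# Placeholder project label for each candidate sample.
projects = ["JHotDraw", "JHotDraw", "JUnit", "JRefactory", "JUnit"]
folds = list(project_folds(projects))
```

Splitting by project rather than by sample is what supports the instance-independence requirement: candidates drawn from one code base tend to share classes and idioms, so a random split would leak information between folds.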
Response Variables
● Accuracy
● Precision
● Recall
● F1 score
● MCC (primary)
Data Source
Procedures
1. 67 different micro-structures extracted
2. Possible candidates sampled
3. Feature map built for each candidate and pattern
4. Globally unique role identifiers assigned in [0; 161]
5. Controlled variables → response variables
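Steps 3 and 4 can be illustrated with a small sketch. It assumes one plausible layout: a fixed-size matrix with one row per global micro-structure role identifier (0 to 161) and one column per pattern role, where each cell counts how often the class mapped to that pattern role plays that micro-structure role. The layout, the function `build_feature_map`, and the sample annotations are assumptions for illustration; the paper's exact encoding may differ.

```python
NUM_MS_ROLES = 162  # identifiers 0..161, as stated in step 4


def build_feature_map(role_mapping, ms_annotations, pattern_roles):
    """Build a fixed-size matrix for one candidate.

    role_mapping:   pattern role -> class name (the candidate)
    ms_annotations: class name -> list of global MS role ids it plays
    pattern_roles:  ordered pattern roles, one matrix column each
    """
    fmap = [[0] * len(pattern_roles) for _ in range(NUM_MS_ROLES)]
    for col, role in enumerate(pattern_roles):
        cls = role_mapping[role]
        for ms_role_id in ms_annotations.get(cls, []):
            fmap[ms_role_id][col] += 1
    return fmap


# Hypothetical candidate and annotations.
pattern_roles = ["Component", "Composite", "Leaf"]
candidate = {"Component": "Graphic", "Composite": "Group", "Leaf": "Line"}
annotations = {"Graphic": [3, 3, 17], "Group": [3, 42], "Line": [161]}

feature_map = build_feature_map(candidate, annotations, pattern_roles)
```

Because the matrix shape depends only on the number of micro-structure roles and pattern roles, every candidate for a given pattern yields an input of the same size, which is what lets an image-oriented model such as a CNN consume it.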
Experiment Results
Experiment Results, CNN
● Average: Med = 0.646, IQR = [0.528, 0.772]
● Worst (Template Method): Med = 0.51, IQR = [0.43, 0.51]
● Best (Composite): Med = 0.79, IQR = [0.71, 0.85]
● Worst-case variance = 0.16
● MCC variance over NPCR = 0.064
Experiment Results, Random Forest
● Average: Med = 0.48, IQR = [0.29, 0.64]
● Worst (Decorator): Med = -0.35, IQR = [-0.38, -0.29]
● Best (Composite): Med = 0.79, IQR = [0.67, 0.83]
● Worst-case variance = 0.14
● MCC variance over NPCR = 0.16
Experiment Results, Data Imbalance
● Not applicable to Singleton
● CNN is more robust
Experiment Results, Independence
● Tests for independence regarding MCC
● Different permutation counts:
○ CNN close to significant: p < 0.057
○ RF is significant: p < 3.42 * 10^(-16)
● Significant effect concerning NPCR:
○ CNN significant: p < 2.85 * 10^(-8)
○ RF significant: p < 0.002
Summary
● Patterns with more roles are easier to detect
● Decline in performance with larger NPCR
● FMs fit well in the framework of CNNs
● Direct comparison with other results
Comparison
| Pattern | Zanoni et al., best accuracy | Authors, average accuracy |
|---|---|---|
| Singleton (NPCR = 1.66) | RF 0.93 | RF 0.73 (-20%), CNN 0.77 (-19%) |
| Template Method | - | ? |
| Composite (NPCR = 3) | ν-SVC (RBF) 0.81 | RF 0.90, CNN 0.93 |
| Decorator (NPCR = 1.48) | RF 0.82 | CNN 0.79 |
Threats to Validity, Internal
● P-MARt dataset:
○ Old projects
○ Misclassification
○ Overfitting:
■ RF: increasing tree number
■ CNN: dropout, kernel and activation regularization, and early stopping
Threats to Validity, External
● Multiple patterns with different numbers of roles
● Different NPCRs
● Only one pattern for each number of roles
Conclusion
● Feature Map: a flexible and comprehensible source code representation, useful beyond DPD
● Robust performance on imbalanced datasets
● Compact representation
● Future work: more design patterns, a bigger dataset, algorithms that work natively on graph representations
Links
● Article: https://arxiv.org/abs/1812.09873
● Dataset: http://www.ptidej.net/tools/designpatterns/index_html#2
● Source code is not available