recovering traceability links in requirements documents …horacek/ontologies-presentation.pdf ·...



RECOVERING TRACEABILITY LINKS IN REQUIREMENTS DOCUMENTS

Zeheng Li, Mingrui Chen, LiGuo Huang

Department of Computer Science & Engineering

Southern Methodist University

Dallas, TX 75275-0122

Vincent Ng

Human Language Technology Institute

University of Texas at Dallas

Richardson, TX 75083-0688

Presented By

Narendra Narisetti

Cuk,2552738

Introduction

• Software system development begins with the evaluation and refinement of requirements.

• These requirements are documented in natural language in "requirements documents".

• The requirements are then refined with additional design details and implementation information.

• Linking requirements in which one is a refinement of the other is called "requirements traceability".


Types of Requirements

• Specifically, requirements can be divided into two types:

1. High-level requirements (coarse-grained)

2. Low-level requirements (fine-grained)

• Requirements traceability links each high-level requirement with all the low-level requirements that refine it.

• The traceability mapping is many-to-many.


Example: Pine email system by Sultanov and Hayes

Figure 1: Sample of high- and low-level requirements


Drawbacks:

• Information that is irrelevant to establishing one link may be relevant to establishing another link involving the same requirement.

Example: The Description section of UC01 is irrelevant to linking with HR02 but relevant to linking with HR01.

• A link can exist between a pair of requirements even if they share no similar or overlapping content words.


Requirements Traceability Approaches:

• Approaches are classified into two types:

Manual approaches: Requirements traceability links are recovered manually by developers.

Automated approaches: These depend on information retrieval (IR) techniques to generate links automatically.


Automated approaches

• Traceability link prediction is treated as a binary classification task.

• The system measures the similarity between a high-level and a low-level requirement.

• A positive classification means the high-level and low-level requirements are linked.

• Information retrieval (IR) techniques are used for traceability link prediction.


Supervised Learning Methods

• Supervised methods are employed with two types of human-supplied knowledge:

i) Annotator rationales: These contain the information that the human annotator found relevant to establishing a link.

We use these rationales to create additional training instances for the learner.

ii) Hand-built ontology: It is defined by a domain expert and used to create additional training features for the learner. (see next slide)


Hand-built ontology of Pine


Why are ontology-based features useful for traceability links?

1. Only those verbs and nouns that appear in the training data are considered.

2. For link identification, only verbs and nouns deemed relevant by the domain expert are included in the ontology.

3. The clusters provide a robust generalization of the words/phrases.


Manual Vs Automated

Manual Approach

1. System analysts use requirements management tools to build a requirements traceability matrix (RTM).

2. Example tools: Rational DOORS, Rational RequisitePro, CASE.

3. It is human-intensive and therefore error-prone given a large set of requirements.

Automated Approach

1. Calculate textual similarity between requirements.

Ex: cosine coefficient, Jaccard coefficient

2. Representations include the tf-idf-based vector space model and Latent Dirichlet Allocation.

3. Depends on IR techniques.
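
As a small illustration of the textual similarity measures named above, here is a minimal sketch of the Jaccard coefficient over word sets. The whitespace tokenization and the two example sentences are assumptions made for illustration; they are not taken from the Pine dataset.

```python
def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard coefficient between the word sets of two requirement texts."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    if not union:
        # Two empty texts: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(words_a & words_b) / len(union)


# Hypothetical requirement texts (illustrative only).
high_level = "the system shall send email"
low_level = "the user composes and sends email"
score = jaccard(high_level, low_level)
```

Note that "send" and "sends" do not match here, which hints at why purely lexical overlap can miss links between related requirements.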


Datasets

For our evaluation, alongside the Pine email system we use a second dataset, "WorldVistA", an electronic health information system developed by the US Veterans Administration.

Table 1: Statistics on the Datasets


Manual ontology for WorldVistA


Manual ontology for WorldVistA


Baseline Systems

• The baseline systems employ different methods for traceability prediction:

Unsupervised baselines: Tf-idf, LDA

Supervised baselines: Word pairs, LDA-induced topic pairs


Unsupervised Baselines

a) The tf-idf baseline: If the cosine similarity between the tf-idf vectors of two documents is greater than a given threshold, the pair is classified as positive.

b) The LDA baseline: Each document is represented as a vector of length n whose entries are the probabilities that the document belongs to each of the n topics; cosine similarity is then applied as in the method above.

Note: Here LDA is trained to produce n topics.
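
The tf-idf baseline described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the natural-log idf weighting, the toy documents, and the threshold of 0.1 are all assumptions. The LDA baseline would apply the same cosine-and-threshold step to topic-probability vectors and is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_link(u, v, threshold):
    """Classify a requirement pair as linked if similarity exceeds the threshold."""
    return cosine(u, v) > threshold


# Toy documents (hypothetical): one high-level and two low-level requirements.
docs = [
    "user sends email message".split(),
    "system sends email to recipient".split(),
    "user prints address book".split(),
]
vecs = tfidf_vectors(docs)
linked_01 = predict_link(vecs[0], vecs[1], threshold=0.1)
linked_02 = predict_link(vecs[0], vecs[2], threshold=0.1)
```

In practice the threshold would be chosen on held-out data rather than fixed by hand.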


Supervised Baseline

• An instance is a pair of high-level and low-level requirements.

• An instance is positive if the two requirements are linked; otherwise it is negative.

• Instances can be represented using two types of features:

a) Word pairs: Each feature is a pair of words, one from the high-level and one from the low-level requirement, taken from the training instances.

b) LDA-induced topic pairs: Each feature is a pair of topics, and it is positive if the two topics are the most probable topics in the high-level and low-level requirements, respectively.

Note: Here LDA is trained with the additional parameter C to produce n topics.
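
To make the word-pair representation concrete, the sketch below builds the set of word pairs for one instance, pairing every distinct word of the high-level requirement with every distinct word of the low-level requirement. The function name and the toy tokens are assumptions for illustration.

```python
from itertools import product


def word_pair_features(high_tokens, low_tokens):
    """Binary word-pair features: one word from each requirement of the pair."""
    return set(product(set(high_tokens), set(low_tokens)))


# Hypothetical tokens for one (high-level, low-level) instance.
features = word_pair_features(["send", "email"], ["compose", "email"])
```

Each pair present in the set corresponds to a feature with value 1 for that instance; all other pairs seen in training would have value 0.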


Exploiting Rationales

Extension:

• To generate extra training instances, i.e., pseudo instances, we need to extend the baseline systems.

• We train a binary SVM classifier with a linear kernel on the training set, leaving all parameters at their default values except the C parameter.

Evaluation:

• We use five-fold cross-validation: three folds for training, one fold as the development set, and one fold for evaluation.

• The F-score on the development set gives the performance of the classifier.
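
The three/one/one fold arrangement can be sketched as follows. The round-robin assignment of instances to folds and the choice of the "next" fold as the development set are assumptions; the slides do not say how folds are formed or rotated.

```python
def five_fold_splits(instances):
    """Yield (train, dev, test) splits: three folds, one fold, one fold."""
    # Assumption: instances are dealt into folds round-robin.
    folds = [instances[i::5] for i in range(5)]
    for k in range(5):
        dev_index = (k + 1) % 5  # assumption: dev fold is the one after the test fold
        test = folds[k]
        dev = folds[dev_index]
        train = [
            item
            for j in range(5)
            if j not in (k, dev_index)
            for item in folds[j]
        ]
        yield train, dev, test


splits = list(five_fold_splits(list(range(10))))
```

Each instance ends up in the evaluation fold exactly once across the five rounds.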


Rationale in Traceability Prediction

• According to Zaidan et al., a rationale is a human-annotated text fragment that motivated an annotator to assign a particular label to a training document.

• In traceability prediction, rationales are identified only for positive instances.

• In traceability prediction, an instance is negative because of the absence of evidence that the two requirements involved should be linked, rather than the presence of evidence that they should not be linked.


Creating Negative Pseudo Instances

• Steps for creating negative pseudo instances:

i) Select a pair of linked requirements.

ii) Remove the rationales from the requirements; what remains is treated as negative.

iii) The remaining text fragments form pseudo instances, which are negative in nature.

iv) From each positive instance, three types of negative pseudo instances are possible:

a) Removing all rationales from the high-level requirement.

b) Removing all rationales from the low-level requirement.

c) Removing all rationales from both requirements.
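
The three variants listed above can be sketched as follows, assuming each requirement is stored as a list of text fragments and the rationales as a set of those fragments. That representation, the function names, and the example fragments are assumptions made for illustration.

```python
def remove_rationales(fragments, rationales):
    """Keep only the fragments that are not annotated as rationales."""
    return [f for f in fragments if f not in rationales]


def negative_pseudo_instances(high, low, high_rationales, low_rationales):
    """Build the three negative pseudo instances from one positive instance."""
    high_stripped = remove_rationales(high, high_rationales)
    low_stripped = remove_rationales(low, low_rationales)
    return [
        (high_stripped, low),           # a) rationales removed from high-level only
        (high, low_stripped),           # b) rationales removed from low-level only
        (high_stripped, low_stripped),  # c) rationales removed from both
    ]


# Hypothetical linked pair with one rationale fragment on each side.
high = ["user wants to send mail", "mail is plain text"]
low = ["system composes message", "system logs the event"]
negatives = negative_pseudo_instances(
    high, low,
    high_rationales={"user wants to send mail"},
    low_rationales={"system composes message"},
)
```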


Creating Positive Pseudo Instances

• Steps for creating positive pseudo instances:

i) Select a pair of linked requirements.

ii) Remove the text fragments in the pair that are not part of a rationale.

iii) The remaining text forms a positive pseudo instance.

iv) Add a constraint to the SVM learner so that it classifies pseudo instances with less confidence.
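
Steps i) to iii) above amount to keeping only the rationale fragments of a linked pair. A minimal sketch, again assuming requirements are lists of text fragments and rationales are sets of fragments (both assumptions, as are the example strings):

```python
def positive_pseudo_instance(high, low, high_rationales, low_rationales):
    """Keep only the rationale fragments of a linked requirement pair."""
    high_only = [f for f in high if f in high_rationales]
    low_only = [f for f in low if f in low_rationales]
    return high_only, low_only


# Hypothetical linked pair with one rationale fragment on each side.
linked_high = ["user wants to send mail", "mail is plain text"]
linked_low = ["system composes message", "system logs the event"]
pseudo_positive = positive_pseudo_instance(
    linked_high, linked_low,
    high_rationales={"user wants to send mail"},
    low_rationales={"system composes message"},
)
```

Step iv) is handled inside the learner rather than in this data-preparation step.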


Soft-margin SVM formulation

i) Positive instances:

ii) Positive pseudo instances:

iii) Negative pseudo instances:

• Xi = training example

• C = error penalty

• Vi, uij = positive/negative pseudo instances created from Xi

• Ci ∈ {-1, +1} = class label

• ξi = slack variable

• μ = margin size
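
The constraint formulas on this slide did not survive extraction. For orientation only, the first display below is the standard soft-margin SVM (bias term omitted), written with the symbols listed above. The second display is an assumed form, not recovered from the slide: it shows one way a pseudo instance could be given a smaller margin μ, in the spirit of "classifying pseudo instances with less confidence". The label symbol for the pseudo instance and the range of μ are likewise assumptions.

```latex
% Standard soft-margin SVM (bias term omitted for brevity)
\min_{w,\,\xi}\; \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_{i}
\quad\text{s.t.}\quad c_{i}\,(w\cdot x_{i}) \ge 1-\xi_{i},\qquad \xi_{i}\ge 0

% ASSUMED form for a pseudo instance v_i with label c_{v_i}:
% the required margin is scaled down from 1 to \mu
c_{v_i}\,(w\cdot v_{i}) \ge \mu\,(1-\xi_{v_i}),\qquad 0<\mu<1
```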


Exploiting an Ontology

• To generate additional features for the SVM learner, we employ a hand-built ontology that contains verb and noun clusters.

• Each training instance is represented by features

i) from the high-level and low-level requirements

ii) from the ontology.


Ontology Based Features

• Verb pairs, Noun pairs: focus on verbs/nouns that are relevant to traceability prediction

• Verb group pairs, Noun group pairs: replace verbs/nouns with cluster IDs; create binary features with cluster IDs; best performance

• Dependency pairs: combination of verb and noun connected by a dependency relation; use the Stanford dependency parser
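
The idea of replacing verbs with cluster IDs can be sketched as below. The cluster table is a hypothetical ontology fragment invented for illustration, not the Pine or WorldVistA ontology; noun group pairs would follow the same pattern with a noun table.

```python
# Hypothetical ontology fragment: verb -> cluster ID (not from the slides).
VERB_CLUSTERS = {
    "send": "V1", "forward": "V1", "reply": "V1",
    "delete": "V2", "remove": "V2",
}


def verb_group_pairs(high_verbs, low_verbs, clusters):
    """Pair cluster IDs of high-level verbs with cluster IDs of low-level verbs."""
    # Verbs that are not in the ontology are ignored.
    high_ids = {clusters[v] for v in high_verbs if v in clusters}
    low_ids = {clusters[v] for v in low_verbs if v in clusters}
    return {(h, l) for h in high_ids for l in low_ids}


pairs = verb_group_pairs(["send"], ["forward", "remove"], VERB_CLUSTERS)
```

Because "forward" and "reply" share a cluster, a pair seen in training with one of them yields the same feature as a test pair with the other, which is the generalization the cluster IDs are meant to provide.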


Learning the Ontology

Is it possible to learn an ontology rather than hand-building it?

Yes, with a three-step procedure:

Step 1: Verb/noun selection

Select verbs, nouns, and noun phrases from the training data such that each one

a) appears more than once,

b) contains at least three characters (this excludes short words such as "be" and "is"), and

c) appears in the high-level but not the low-level requirements, or vice versa.
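
The three selection criteria can be sketched as a simple filter. The sketch assumes part-of-speech filtering has already been done upstream, so the inputs are flat lists of candidate verbs and nouns from each requirement level; the function name and the toy tokens are hypothetical.

```python
from collections import Counter


def select_words(high_tokens, low_tokens):
    """Apply the three selection criteria to candidate verbs/nouns."""
    counts = Counter(high_tokens) + Counter(low_tokens)
    high_vocab, low_vocab = set(high_tokens), set(low_tokens)
    return {
        word
        for word, count in counts.items()
        if count > 1                                        # a) appears more than once
        and len(word) >= 3                                  # b) at least three characters
        and ((word in high_vocab) != (word in low_vocab))   # c) in exactly one level
    }


selected = select_words(
    ["send", "send", "is", "is", "email"],
    ["email", "forward", "forward", "reply"],
)
```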


Learning the Ontology

• Step 2: Verb/noun representation

a) Represent each verb by a set of nouns/NPs, using the Stanford dependency parser.

b) Similarly, represent each noun by the set of verbs collected in Step 1.

• Step 3: Clustering

a) Cluster the verbs and the nouns separately using the single-link algorithm.

b) This algorithm repeatedly merges the two most similar clusters according to a similarity measure and stops when it reaches the desired number of clusters.

This yields the induced clusters for the given datasets.
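
A minimal sketch of single-link agglomerative clustering follows. Each word is represented by a set of context words, as in Step 2. The slides do not name the similarity measure, so the Jaccard overlap used here is an assumption, as are the toy verb representations.

```python
def set_similarity(a, b):
    """Jaccard overlap between two context sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_link_clustering(vectors, n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    vectors maps each word to its set of context words.
    """
    clusters = [{word} for word in vectors]
    while len(clusters) > max(n_clusters, 1):
        best = None  # (similarity, i, j) of the best pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: cluster similarity is that of the closest members.
                sim = max(
                    set_similarity(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical verb representations: verb -> nouns it co-occurs with.
verb_contexts = {
    "send": {"email", "message"},
    "forward": {"email", "message", "attachment"},
    "delete": {"folder"},
    "remove": {"folder", "contact"},
}
verb_clusters = single_link_clustering(verb_contexts, n_clusters=2)
```

The same routine would be run a second time on the noun representations to obtain the noun clusters.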


Evaluation

• In the evaluation, we compare the F-scores of different methods, which depend on the combination of noun clustering, verb clustering, and the C value.

• The F-score depends on two terms:

i) Recall (R): the percentage of links in the gold standard that are recovered by our system.

ii) Precision (P): the percentage of links recovered by our system that are correct.

• The F-score is the harmonic mean of recall and precision.
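
The definitions above translate directly into code. In this sketch a link is a (high-level ID, low-level ID) tuple; the requirement IDs in the example are hypothetical and not taken from the gold standard.

```python
def precision_recall_f(predicted, gold):
    """Precision, recall, and F-score (harmonic mean) over sets of links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical gold-standard and predicted links.
gold_links = {("HR01", "UC01"), ("HR01", "UC02"), ("HR02", "UC03")}
predicted_links = {("HR01", "UC01"), ("HR02", "UC03"), ("HR02", "UC01"), ("HR03", "UC02")}
p, r, f = precision_recall_f(predicted_links, gold_links)
```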


Result of Supervised Systems


Conclusion

• Traceability prediction is a crucial task that can be improved with annotator rationales and an ontology.

• The supervised techniques that exploit rationales and the ontology reduce relative error by 11.1-19.7% compared to the baseline techniques.

• The F-score obtained with induced clusters is competitive with that obtained with manual clusters.

• The results might change depending on the datasets.
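
For readers unfamiliar with the term, relative error reduction is commonly computed as below, assuming the error of a system is taken to be one minus its F-score. The numbers in the comment are hypothetical and only show the arithmetic; they are not results from the slides.

```latex
\text{relative error reduction}
  = \frac{(1-F_{\text{base}})-(1-F_{\text{new}})}{1-F_{\text{base}}}
  = \frac{F_{\text{new}}-F_{\text{base}}}{1-F_{\text{base}}}

% Hypothetical numbers, not results from the slides:
% F_base = 0.50, F_new = 0.60  =>  (0.60 - 0.50) / (1 - 0.50) = 0.20, i.e. 20%
```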
