Automating the Formalization of Product Comparison Matrices Guillaume Bécan, Nicolas Sannier, Mathieu Acher, Olivier Barais, Arnaud Blouin, Benoit Baudry


Post on 25-May-2015


Automating the Formalization of Product Comparison Matrices ASE 2014


Page 1: Automating the Formalization of Product Comparison Matrices

Automating the Formalization of

Product Comparison Matrices

Guillaume Bécan, Nicolas Sannier, Mathieu Acher,

Olivier Barais, Arnaud Blouin, Benoit Baudry

Page 2: Automating the Formalization of Product Comparison Matrices

Product lines everywhere


Page 3: Automating the Formalization of Product Comparison Matrices

Product Comparison Matrices (PCMs)


Page 4: Automating the Formalization of Product Comparison Matrices

Services on top of PCMs


Edit

Compare

Visualize

Filter

Rank

Merge

Configure

Multi-objective optimization

Page 5: Automating the Formalization of Product Comparison Matrices

Problem


Edit

Compare

Visualize

Information is:

• Uncontrolled

• Heterogeneous

• Ambiguous

[Sannier et al, ASE 2013]

| [[Acer Inc.|Acer]]
| [[Acer beTouch E110|beTouch E110]]
| {{dts|format=dmy|2010|2|15}}
| 1.5
| [[320x240|320x240 QVGA]]
| {{convert|2.8|in|mm|abbr=on}}
| Touch, accelerometer
|
* [[GSM]]/GPRS/[[Enhanced Data Rates for GSM Evolution|EDGE]]
* [[Universal Mobile Telecommunications System|UMTS]] 850 1900
* CSD

Page 6: Automating the Formalization of Product Comparison Matrices

Problem


Common language

Transformation

Edit

Compare

Visualize

• How to formalize data contained in natural language PCMs?

• How to automate the formalization of PCMs?

• What tools and services can be built on top of this formalization?

Page 7: Automating the Formalization of Product Comparison Matrices

Contributions


1. Design of a metamodel for product comparison matrices

2. Automated techniques for formalizing raw data into a formalized product comparison matrix model

3. Evaluation on 30,000+ cells from Wikipedia

Page 8: Automating the Formalization of Product Comparison Matrices

Metamodeling driven by (lots of) data


Designed for data (lots of examples + personal experience)

Designed for applications (edit, compare, visualize…)

Objectives:

• A metamodel that can contain every PCM of Wikipedia

• A metamodel for building services on top of these PCMs

Categorization of patterns (ASE 2013)

Refinement of the patterns

Realization of the metamodel (2 intensive weeks)

Formalizing some examples to adjust the metamodel

Driven by statistics and manual review of lots of PCMs

New concept

Statistics

Brainstorming

Working on the metamodel since February 2013

300+ PCMs – 300,000 cells

Numerous domains

Manual review of 50 PCMs (thousands of cells)

Statistics on all PCMs

Analysis of Wikipedia syntax for tables

Automated transformation of all PCMs to PCM models

Page 9: Automating the Formalization of Product Comparison Matrices

PCM metamodel


Page 10: Automating the Formalization of Product Comparison Matrices

PCM metamodel


Structure of a PCM

Page 11: Automating the Formalization of Product Comparison Matrices

PCM metamodel


Feature/Product oriented

Page 12: Automating the Formalization of Product Comparison Matrices

PCM metamodel

Formalized interpretation of a cell

Data types: Boolean, Integer, Real

Special values: Unknown, Empty, Inconsistent, Partial

raw string → formalized integer
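
The cell interpretation described on this slide can be sketched as a small data model. This is only an illustration under assumptions: the names `Special`, `Value` and `Cell` are hypothetical and not taken from the PCM metamodel itself, and modelling the special values as a flat enumeration is a simplification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Special(Enum):
    """Special values listed on the slide (flat enum is an assumption)."""
    UNKNOWN = "Unknown"
    EMPTY = "Empty"
    INCONSISTENT = "Inconsistent"
    PARTIAL = "Partial"


# A formalized value is one of the slide's data types
# (Boolean, Integer, Real) or one of the special values.
Value = Union[bool, int, float, Special]


@dataclass(frozen=True)
class Cell:
    raw: str      # text exactly as written by the contributor
    value: Value  # formalized interpretation of that text


# A raw string "100" formalized as an integer, and an empty cell.
cell = Cell(raw="100", value=100)
missing = Cell(raw="", value=Special.EMPTY)
```

Keeping the raw text next to the formalized value means an interpretation can be reviewed or corrected later without losing what the contributor wrote.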

Page 13: Automating the Formalization of Product Comparison Matrices

Contributions


1. Design of a metamodel for product comparison matrices

2. Automated techniques for formalizing raw data into a formalized product comparison matrix model

3. Evaluation on 30,000+ cells from Wikipedia

Page 14: Automating the Formalization of Product Comparison Matrices

Approach


Parsing: transform a PCM artefact into a PCM model

[Figure: formalization pipeline. PCM → parsing → PCM model → preprocessing → PCM model → extracting information → PCM model → exploiting → services. The PCM models conform to the PCM metamodel.]

| [[Acer Inc.|Acer]]
| [[Acer beTouch E110|beTouch E110]]
| {{dts|format=dmy|2010|2|15}}
| 1.5
| [[320x240|320x240 QVGA]]
| {{convert|2.8|in|mm|abbr=on}}
| Touch, accelerometer
|
* [[GSM]]/GPRS/[[Enhanced Data Rates for GSM Evolution|EDGE]]
* [[Universal Mobile Telecommunications System|UMTS]] 850 1900
* CSD

Enable the development of a generic formalization process
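
To give a feel for what parsing such wikitext cells involves, here is a minimal sketch that handles only the two constructs visible in the excerpt: `[[target|label]]` links and non-nested `{{name|args}}` templates. The helper names `link_text` and `template_parts` are hypothetical, and the regular expressions are a deliberate simplification, not the parser used in the talk and not a full MediaWiki parser.

```python
import re

# [[target|label]] or [[target]]: group 1 is the target, group 2 the label.
LINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")
# {{name|arg1|arg2|...}} without nested templates.
TEMPLATE = re.compile(r"\{\{([^{}]*)\}\}")


def link_text(wikitext: str) -> str:
    """Replace each wiki link by its label, or by its target if unlabeled."""
    def display(match):
        label = match.group(2)
        return label if label is not None else match.group(1)
    return LINK.sub(display, wikitext)


def template_parts(wikitext: str):
    """Return (name, args) of the first template found, or None."""
    match = TEMPLATE.search(wikitext)
    if match is None:
        return None
    name, *args = match.group(1).split("|")
    return name, args


maker = link_text("[[Acer Inc.|Acer]]")
networks = link_text("[[GSM]]/GPRS/[[Enhanced Data Rates for GSM Evolution|EDGE]]")
screen = template_parts("{{convert|2.8|in|mm|abbr=on}}")
```

Here `maker` is the visible label of the link, `networks` is the cell text with links flattened, and `screen` separates the template name from its arguments so that a later step can interpret the number and unit.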

Page 15: Automating the Formalization of Product Comparison Matrices

Approach


Preprocessing:

Contributors cannot be trusted: missing cells, headers everywhere

We have to normalize the matrix and identify headers

Default strategy: first line and first column are headers
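
The normalization and the default header strategy can be sketched as follows. This assumes the matrix is already available as a list of rows of strings; `normalize` and `split_headers` are hypothetical names, the sample row "Other phone" is invented for the example, and treating the first line as features and the first column as products is an assumption about orientation.

```python
from typing import List, Tuple

Matrix = List[List[str]]


def normalize(rows: Matrix, filler: str = "") -> Matrix:
    """Pad short rows so that every row has the same number of cells."""
    width = max((len(row) for row in rows), default=0)
    return [row + [filler] * (width - len(row)) for row in rows]


def split_headers(rows: Matrix) -> Tuple[List[str], List[str], Matrix]:
    """Default strategy: first line and first column are headers.

    Expects a normalized matrix (all rows of equal length).
    """
    if not rows or not rows[0]:
        return [], [], []
    column_headers = rows[0][1:]
    row_headers = [row[0] for row in rows[1:]]
    body = [row[1:] for row in rows[1:]]
    return column_headers, row_headers, body


raw = [
    ["Model", "Screen", "Input"],
    ["beTouch E110", "2.8 in"],           # contributor left the last cell out
    ["Other phone", "3.2 in", "Touch"],   # invented row for illustration
]
features, products, cells = split_headers(normalize(raw))
```

Padding first keeps the header split trivial: once every row has the same width, the body is a plain rectangle of cells.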

[Figure: formalization pipeline (PCM → parsing → preprocessing → extracting information → exploiting → services)]

Page 16: Automating the Formalization of Product Comparison Matrices

Approach


Extracting information:

• Identify features and products

• Interpret cells based on a set of syntactic rules (regex)

[Figure: formalization pipeline (PCM → parsing → preprocessing → extracting information → exploiting → services)]

List of rules:

"\d+" => Integer

match Integer(100)

Same process as the metamodel for creating the rules
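
A rule-based interpreter in the spirit of the `"\d+" => Integer` rule might look like the sketch below. Only the integer rule comes from the slide; the real-number and yes/no rules, the names `RULES` and `interpret`, and the fallback to the raw string are assumptions made for illustration.

```python
import re
from typing import Callable, List, Tuple, Union

Value = Union[bool, int, float, str]
Rule = Tuple[re.Pattern, Callable[[re.Match], Value]]

# Ordered list of (pattern, constructor); the first full match wins.
RULES: List[Rule] = [
    (re.compile(r"\d+"), lambda m: int(m.group())),            # rule from the slide
    (re.compile(r"\d+\.\d+"), lambda m: float(m.group())),     # assumed rule
    (re.compile(r"yes|no", re.IGNORECASE),
     lambda m: m.group().lower() == "yes"),                    # assumed rule
]


def interpret(raw: str) -> Value:
    """Apply the first rule whose pattern matches the whole cell text."""
    text = raw.strip()
    for pattern, build in RULES:
        match = pattern.fullmatch(text)
        if match:
            return build(match)
    return text  # no rule matched: keep the text uninterpreted


as_int = interpret("100")
as_real = interpret("1.5")
as_text = interpret("Touch, accelerometer")
```

Using `fullmatch` rather than `search` keeps a cell such as "UMTS 850 1900" from being read as an integer just because it contains digits.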

Page 17: Automating the Formalization of Product Comparison Matrices

Contributions


1. Design of a metamodel for product comparison matrices

2. Automated techniques for formalizing raw data into a formalized product comparison matrix model

3. Evaluation on 30,000+ cells from Wikipedia

Page 18: Automating the Formalization of Product Comparison Matrices

Evaluation


Experimental settings:

• 75 PCMs from Wikipedia

• Headers specified manually

• Automated extraction of information

[Figure: formalization pipeline (PCM → parsing → preprocessing → extracting information → exploiting → services), annotated with RQ1, RQ2 and RQ3]

Page 19: Automating the Formalization of Product Comparison Matrices

Evaluation


Task: check interpretation of each cell (30,000+)

• Validate

• Correct it with existing concept

• Correct it with a new concept

• I don’t know / there is no interpretation

20 evaluators

Online editor

Page 20: Automating the Formalization of Product Comparison Matrices

Evaluation


Metrics:

• Number of valid cells

• Number of cells corrected with concepts from the metamodel

• Number of cells corrected with new concepts

• List of new concepts
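
As a rough illustration of how such metrics could be derived from the evaluators' answers, the sketch below tallies one verdict per cell and reports percentages. The verdict labels, the `summarize` function and the toy sample are all hypothetical; how the study actually aggregated its answers is not stated on the slides.

```python
from collections import Counter
from typing import Dict, Iterable

# One label per evaluated cell, mirroring the four choices given to evaluators.
VERDICTS = ("valid", "corrected_existing", "corrected_new", "unknown")


def summarize(verdicts: Iterable[str]) -> Dict[str, float]:
    """Return the share of each verdict as a percentage of all cells."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        return {verdict: 0.0 for verdict in VERDICTS}
    return {verdict: round(100.0 * counts[verdict] / total, 2)
            for verdict in VERDICTS}


# Toy sample of ten cells, not data from the evaluation.
sample = ["valid"] * 8 + ["corrected_existing"] + ["corrected_new"]
shares = summarize(sample)
```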

Page 21: Automating the Formalization of Product Comparison Matrices

Evaluation


RQ1: To what extent can PCMs be formalized?

93.11% of the cells are valid

2.61% are corrected with concepts from the metamodel

4.28% are invalid and the evaluators proposed a new concept

• Dates

• Dimensions and units

• Versions

Solution:

• Add corresponding data types to the metamodel

• Create new rules for interpreting cells

Page 22: Automating the Formalization of Product Comparison Matrices

Evaluation


RQ2: To what extent can the formalization be automated?

93.11% of the cells are correctly formalized

Formalization errors may arise from 4 main areas:

• Overlapping concepts (e.g. what does an empty cell mean?)

• Missing concepts (e.g. dates, versions…)

• Missing interpretation rules

• Bad rules

Page 23: Automating the Formalization of Product Comparison Matrices

Evaluation


RQ3: What services can be built on top of formalized PCMs?

Editing and formalizing PCMs

Warnings during editing (inconsistent cells)

Filtering capabilities

Translate PCMs to variability models

The metamodel provides

• Feature/product oriented perspective

• Clear semantics
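
Once cells carry typed values, services such as filtering and editing warnings become short functions. The sketch below assumes a PCM held as a nested dictionary (product → feature → value); `filter_products` and `type_mismatches` are hypothetical names, the data is invented, and reading "inconsistent" as "value of an unexpected type for that feature" is an assumption made for this example.

```python
from typing import Callable, Dict, List

# product name -> feature name -> formalized value
PCM = Dict[str, Dict[str, object]]


def filter_products(pcm: PCM, feature: str,
                    keep: Callable[[object], bool]) -> List[str]:
    """Return the products whose value for `feature` satisfies `keep`."""
    return [name for name, values in pcm.items()
            if feature in values and keep(values[feature])]


def type_mismatches(pcm: PCM, feature: str, expected: type) -> List[str]:
    """Return the products whose value for `feature` is not of type `expected`."""
    return [name for name, values in pcm.items()
            if feature in values and not isinstance(values[feature], expected)]


# Invented toy data, not taken from the talk.
pcm: PCM = {
    "Phone A": {"Screen (in)": 2.8, "Touch": True},
    "Phone B": {"Screen (in)": 3.5, "Touch": False},
    "Phone C": {"Screen (in)": "large", "Touch": True},
}

big_screens = filter_products(
    pcm, "Screen (in)", lambda v: isinstance(v, float) and v >= 3.0)
warnings = type_mismatches(pcm, "Screen (in)", float)
```

The same predicate-based filter would not be reliable on raw strings, which is the practical benefit of the clear semantics listed above.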

Page 24: Automating the Formalization of Product Comparison Matrices

Results of the evaluation


We now have a common language for PCMs

• validated by humans

• validated by transformation

• validated by the editor

A large proportion of the formalization can be automated

BUT a human is necessary

Good news: the editor can help formalize the data

Page 25: Automating the Formalization of Product Comparison Matrices

Future work


Universal editor

Support large datasets

Community of PCM contributors

Synchronization with Wikipedia

Page 26: Automating the Formalization of Product Comparison Matrices

Questions?


[Figure: formalization pipeline (PCM → parsing → preprocessing → extracting information → exploiting → services)]