Automating the Formalization of Product Comparison Matrices

Post on 25-May-2015

Automating the Formalization of Product Comparison Matrices ASE 2014

Automating the Formalization of Product Comparison Matrices

Guillaume Bécan, Nicolas Sannier, Mathieu Acher, Olivier Barais, Arnaud Blouin, Benoit Baudry

Product lines everywhere

Automating the Formalization of Product Comparison Matrices - 2

Product Comparison Matrices (PCMs)

Services on top of PCMs

• Edit

• Compare

• Visualize

• Filter

• Rank

• Merge

• Configure

• Multi-objective optimization

Problem

[Diagram labels: Edit, Compare, Visualize]

Information is:

• Uncontrolled

• Heterogeneous

• Ambiguous

[Sannier et al, ASE 2013]

| [[Acer Inc.|Acer]]
| [[Acer beTouch E110|beTouch E110]]
| {{dts|format=dmy|2010|2|15}}
| 1.5
| [[320x240|320x240 QVGA]]
| {{convert|2.8|in|mm|abbr=on}}
| Touch, accelerometer
|
* [[GSM]]/GPRS/[[Enhanced Data Rates for GSM Evolution|EDGE]]
* [[Universal Mobile Telecommunications System|UMTS]] 850 1900
* CSD

Problem

[Diagram labels: Transformation, Common language, Edit, Compare, Visualize]

• How to formalize data contained in natural language PCMs?

• How to automate the formalization of PCMs?

• What tools and services can be built on top of this formalization?

Contributions

1. Design of a metamodel for product comparison matrices

2. Automated techniques for formalizing raw data into a formalized product comparison matrix model

3. Evaluation on 30,000+ cells from Wikipedia

Metamodeling driven by (lots of) data

Designed for data (lots of examples + personal experience)

Designed for applications (edit, compare, visualize…)

Objectives:

• A metamodel that can contain every PCM of Wikipedia

• A metamodel for building services on top of these PCMs

Categorization of patterns (ASE 2013)

Refinement of the patterns

Realization of the metamodel (2 intensive weeks)

Formalizing some examples to adjust the metamodel

Driven by statistics and manual review of lots of PCMs

[Diagram labels: New concept, Statistics, Brainstorming]

Working on the metamodel since February 2013

300+ PCMs – 300,000 cells

Numerous domains

Manual review of 50 PCMs (thousands of cells)

Statistics on all PCMs

Analysis of Wikipedia syntax for tables

Automated transformation of all PCMs to PCM models

PCM metamodel

Structure of a PCM

Feature/Product oriented

Formalized interpretation of a cell

Data types: Boolean, Integer, Real

Special values: Unknown, Empty, Inconsistent, Partial
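The cell interpretation listed above can be pictured as a small set of value types. The following is only a minimal sketch, assuming one class per data type and a single class for the special values; the names `BooleanValue`, `IntegerValue`, `RealValue`, `Special` and `Cell` are hypothetical and not taken from the authors' metamodel.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical value types mirroring the slide: Boolean, Integer, Real.
@dataclass(frozen=True)
class BooleanValue:
    value: bool

@dataclass(frozen=True)
class IntegerValue:
    value: int

@dataclass(frozen=True)
class RealValue:
    value: float

# Special values named on the slide; modelling them as one class is an assumption.
SPECIAL_KINDS = ("Unknown", "Empty", "Inconsistent", "Partial")

@dataclass(frozen=True)
class Special:
    kind: str

    def __post_init__(self):
        if self.kind not in SPECIAL_KINDS:
            raise ValueError(f"unsupported special value: {self.kind}")

Interpretation = Union[BooleanValue, IntegerValue, RealValue, Special]

@dataclass
class Cell:
    raw: str                        # text as written by the contributor
    interpretation: Interpretation  # formalized meaning of that text

# The raw cell "1.5" from the sample row, read as a real number.
cell = Cell(raw="1.5", interpretation=RealValue(1.5))
empty = Cell(raw="", interpretation=Special("Empty"))
```

Keeping the raw text next to its interpretation lets an editor show both and lets a reviewer correct the interpretation without losing the original content.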

[Figure: a raw string and its formalized integer interpretation]

Approach

Parsing: transform a PCM artefact into a PCM model

[Pipeline diagram: PCM → parsing → PCM model → preprocessing → PCM model → extracting information → PCM model → exploiting → services; the PCM models conform to the PCM metamodel]

| [[Acer Inc.|Acer]]
| [[Acer beTouch E110|beTouch E110]]
| {{dts|format=dmy|2010|2|15}}
| 1.5
| [[320x240|320x240 QVGA]]
| {{convert|2.8|in|mm|abbr=on}}
| Touch, accelerometer
|
* [[GSM]]/GPRS/[[Enhanced Data Rates for GSM Evolution|EDGE]]
* [[Universal Mobile Telecommunications System|UMTS]] 850 1900
* CSD

Enable the development of a generic formalization process
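To make the parsing step concrete, here is a rough sketch of how the cells of one wikitext row, like the sample above, might be pulled out as plain strings. It is not the authors' parser: the helper names `strip_links` and `parse_row` are hypothetical, only `[[target|label]]` links are handled, and templates such as `{{dts|...}}` are deliberately left untouched for a later step.

```python
import re

# [[target|label]] -> label, [[target]] -> target
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")

def strip_links(wikitext: str) -> str:
    """Replace wiki links by their visible label."""
    return LINK.sub(r"\1", wikitext)

def parse_row(lines):
    """Turn the lines of one table row into a list of cell strings.

    A line starting with '|' opens a new cell; any other line (for
    example a '*' bullet) is treated as a continuation of the last cell.
    """
    cells = []
    for line in lines:
        if line.startswith("|"):
            cells.append(strip_links(line[1:].strip()))
        elif cells:
            cells[-1] = (cells[-1] + "\n" + strip_links(line.strip())).strip()
    return cells

row = parse_row([
    "| [[Acer Inc.|Acer]]",
    "| 1.5",
    "|",
    "* [[GSM]]/GPRS",
    "* CSD",
])
```

Once every input format is reduced to the same cell strings, the later steps no longer depend on Wikipedia syntax, which is what makes a generic formalization process possible.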

Approach

Preprocessing:

Contributors cannot be trusted: missing cells, headers everywhere

We have to normalize the matrix and identify headers

Default strategy: first line and first column are headers
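The default strategy can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the matrix is a list of rows of strings, missing cells are padded with empty strings, and products are assumed to be the rows. The function names and the example data are hypothetical.

```python
def normalize(rows, filler=""):
    """Pad short rows so that every row has the same number of cells."""
    width = max((len(r) for r in rows), default=0)
    return [list(r) + [filler] * (width - len(r)) for r in rows]

def split_headers(matrix):
    """Default strategy: first line and first column are headers."""
    features = matrix[0][1:]                    # first line, minus the corner cell
    products = [row[0] for row in matrix[1:]]   # first column
    body = [row[1:] for row in matrix[1:]]      # remaining cells
    return features, products, body

# Made-up matrix with one missing cell in the second row.
raw = [
    ["Model", "Released", "Screen"],
    ["Phone A", "2010"],
    ["Phone B", "2009", "3.5"],
]
features, products, body = split_headers(normalize(raw))
```

A matrix with headers in other positions would need a different header strategy; only the default one is shown here.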

Approach

Extracting information:

• Identify features and products

• Interpret cells based on a set of syntactic rules (regex)

List of rules:

"\d+" => Integer (for example, the cell "100" matches and is interpreted as Integer(100))

Same process as the metamodel for creating the rules
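A rule-based interpreter of this kind might look like the sketch below. Only the `\d+` to Integer rule comes from the slide; the Real and Boolean rules, the tuple representation, and the choice to return `None` when no rule matches are assumptions made for illustration.

```python
import re

# Ordered list of (pattern, builder). The first rule whose pattern
# matches the whole cell wins.
RULES = [
    (re.compile(r"\d+"), lambda m: ("Integer", int(m.group()))),           # from the slide
    (re.compile(r"\d+\.\d+"), lambda m: ("Real", float(m.group()))),       # assumed rule
    (re.compile(r"yes|no", re.IGNORECASE),
     lambda m: ("Boolean", m.group().lower() == "yes")),                   # assumed rule
]

def interpret(raw):
    """Return (type name, value) for a cell, or None if no rule applies."""
    text = raw.strip()
    if not text:
        return ("Empty", None)
    for pattern, build in RULES:
        match = pattern.fullmatch(text)
        if match:
            return build(match)
    return None

result = interpret("100")
```

Using `fullmatch` rather than `search` keeps a rule from firing on a cell that merely contains a number, such as "850 1900".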

Evaluation

Experimental settings:

• 75 PCMs from Wikipedia

• Headers specified manually

• Automated extraction of information

[Pipeline diagram (parsing, preprocessing, extracting information, exploiting) annotated with RQ1, RQ2 and RQ3]

Evaluation

Task: check interpretation of each cell (30,000+)

• Validate

• Correct it with an existing concept

• Correct it with a new concept

• I don’t know / there is no interpretation

20 evaluators

Online editor

Evaluation

Metrics:

• Number of valid cells

• Number of cells corrected with concepts from the metamodel

• Number of cells corrected with new concepts

• List of new concepts
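As an illustration, these metrics could be computed from the evaluators' answers roughly as follows. The verdict labels (`"valid"`, `"corrected"`, `"new_concept"`, `"unknown"`) and the pair format are hypothetical, and the example data is made up; it is not the evaluation data.

```python
from collections import Counter

def evaluation_metrics(verdicts):
    """verdicts: list of (label, concept) pairs, one per checked cell."""
    total = len(verdicts)
    counts = Counter(label for label, _ in verdicts)

    def share(label):
        # Percentage of all checked cells that received this label.
        return round(100.0 * counts[label] / total, 2) if total else 0.0

    new_concepts = sorted(
        {concept for label, concept in verdicts if label == "new_concept" and concept}
    )
    return {
        "valid_pct": share("valid"),
        "corrected_pct": share("corrected"),
        "new_concept_pct": share("new_concept"),
        "new_concepts": new_concepts,
    }

# Made-up verdicts for four cells.
sample = [
    ("valid", None),
    ("valid", None),
    ("corrected", "Integer"),
    ("new_concept", "Date"),
]
summary = evaluation_metrics(sample)
```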

Evaluation

RQ1: To what extent can PCMs be formalized?

93.11% of the cells are valid

2.61% are corrected with concepts from the metamodel

4.28% are invalid and the evaluators proposed a new concept

• Dates

• Dimensions and units

• Versions

Solution:

• Add corresponding data types to the metamodel

• Create new rules for interpreting cells
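New rules for the three missing concepts might be sketched like this. The patterns are assumptions written against the sample cells shown earlier (`{{dts|format=dmy|2010|2|15}}` and `{{convert|2.8|in|mm|abbr=on}}`); the three numeric `dts` fields are assumed to be year, month and day, and a version is assumed to have at least three dot-separated numbers so that it is not confused with a real number such as 1.5.

```python
import re
from datetime import date

DTS = re.compile(r"\{\{dts\|(?:format=\w+\|)?(\d{4})\|(\d{1,2})\|(\d{1,2})\}\}")
CONVERT = re.compile(r"\{\{convert\|(\d+(?:\.\d+)?)\|(\w+)(?:\|[^}]*)?\}\}")
VERSION = re.compile(r"\d+(?:\.\d+){2,}")

def interpret_extended(raw):
    """Try the date, dimension and version rules; None if none applies."""
    text = raw.strip()
    m = DTS.fullmatch(text)
    if m:
        year, month, day = (int(g) for g in m.groups())
        return ("Date", date(year, month, day))
    m = CONVERT.fullmatch(text)
    if m:
        # Keep the value together with its source unit.
        return ("Dimension", (float(m.group(1)), m.group(2)))
    if VERSION.fullmatch(text):
        return ("Version", tuple(int(part) for part in text.split(".")))
    return None

released = interpret_extended("{{dts|format=dmy|2010|2|15}}")
screen = interpret_extended("{{convert|2.8|in|mm|abbr=on}}")
```

The version pattern shows why overlapping concepts are a source of errors: a cell such as "1.5" could be a real number or a two-part version, and syntax alone cannot decide.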

Evaluation

RQ2: To what extent can the formalization be automated?

93.11% of the cells are correctly formalized

Formalization errors may arise from 4 main areas:

• Overlapping concepts (e.g. what does an empty cell mean?)

• Missing concepts (e.g. dates, versions…)

• Missing interpretation rules

• Bad rules

Evaluation

RQ3: What services can be built on top of formalized PCMs?

Editing and formalizing PCMs

Warnings during editing (inconsistent cells)

Filtering capabilities

Translate PCMs to variability models

The metamodel provides

• Feature/product oriented perspective

• Clear semantics
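Two of these services, filtering and inconsistency warnings, become short functions once cells carry typed values. The sketch below assumes a formalized PCM stored as a dictionary from product to feature to value; that layout, the function names, the feature name and the data are all hypothetical, and flagging a cell whose type differs from the majority of its column is only one possible reading of "inconsistent".

```python
from collections import Counter

def filter_products(pcm, feature, predicate):
    """Products whose numeric value for `feature` satisfies `predicate`."""
    selected = []
    for product, features in pcm.items():
        value = features.get(feature)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and predicate(value):
            selected.append(product)
    return selected

def inconsistent_cells(pcm, feature):
    """Products whose value type differs from the majority type of the column."""
    types = Counter(
        type(f[feature]).__name__ for f in pcm.values() if f.get(feature) is not None
    )
    if not types:
        return []
    majority = types.most_common(1)[0][0]
    return [
        product for product, f in pcm.items()
        if f.get(feature) is not None and type(f[feature]).__name__ != majority
    ]

# Made-up formalized PCM.
pcm = {
    "Phone A": {"Screen size (in)": 2.8},
    "Phone B": {"Screen size (in)": 3.5},
    "Phone C": {"Screen size (in)": "large"},
}
big_screens = filter_products(pcm, "Screen size (in)", lambda v: v >= 3)
warnings = inconsistent_cells(pcm, "Screen size (in)")
```

Neither function would be reliable on the raw strings, which is the practical benefit of the clear semantics the metamodel provides.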

Results of the evaluation

We now have a common language for PCMs

• validated by humans

• validated by transformation

• validated by the editor

A large proportion of the formalization can be automated

BUT a human is still necessary

Good news: the editor can help formalize the data

Future work

Universal editor

Support large datasets

Community of PCM contributors

Synchronization with Wikipedia

Questions?
