PROPOSAL AND VALIDATION OF A FEASIBILITY MODEL
FOR INFORMATION MINING PROJECTS
Pablo Pytel, Paola Britos & Ramón García-Martínez
AGENDA
Problem Description
Proposed Solution
Validation
o Proof of Concept
o Comparison with real projects
Conclusions
Problem Description:
Information Mining Projects
Software Engineering
o Methods
o Techniques
o Tools
Methodologies:
o CRISP-DM
o P3TQ
o SEMMA
85% [2000] and 60% [2005] of projects failed to achieve their goals
The main problems (and associated risks)
are not identified in the initial stages
Feasibility Model
Feasibility Model for Information Mining Projects:
13 characteristics to be evaluated:
o Categories: Data, Business Problem, Project, Project Team
o Dimensions: Plausibility, Adequacy, Success
Procedure:
o Step 1: Determining the value of each project feature
o Step 2: Converting feature values into fuzzy intervals
o Step 3: Calculating the value of each dimension
o Step 4: Calculating the overall project feasibility
o Step 5: Interpreting the results
Validation – Proof of Concept:
Project Objective: Detecting evidence of causality between general satisfaction and internet.
o Step 1: Determining the value of each project feature
o Step 2: Converting feature values into fuzzy intervals
Feature values (Step 1) and their fuzzy intervals from the Conversion Table (Step 2):

Category           ID   Value     Fuzzy Interval
Data               P1   All       (7.8; 8.8; 10; 10)
Data               P2   Regular   (3.4; 4.4; 5.6; 6.6)
Data               A1   All       (7.8; 8.8; 10; 10)
Data               A2   Much      (5.6; 6.6; 7.8; 8.8)
Data               A3   Regular   (3.4; 4.4; 5.6; 6.6)
Data               E1   Little    (1.2; 2.2; 3.4; 4.4)
Business Problem   P3   All       (7.8; 8.8; 10; 10)
Business Problem   A4   Much      (5.6; 6.6; 7.8; 8.8)
Business Problem   A5   Regular   (3.4; 4.4; 5.6; 6.6)
Project            E2   Much      (5.6; 6.6; 7.8; 8.8)
Project            E3   Regular   (3.4; 4.4; 5.6; 6.6)
Project Team       P4   All       (7.8; 8.8; 10; 10)
Project Team       E4   Much      (5.6; 6.6; 7.8; 8.8)
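The conversion in Step 2 can be sketched as a simple lookup. The mapping below contains only the four linguistic values that occur in this example, with the intervals shown for them; any further values of the scale are not listed here. Reading each 4-tuple as a trapezoid (a; b; c; d) with full membership between b and c is an assumption, and the names `FUZZY_INTERVALS`, `to_fuzzy_interval` and `membership` are hypothetical.

```python
# Linguistic value -> fuzzy interval (a, b, c, d), copied from the example.
# Only the values that appear in the example are included.
FUZZY_INTERVALS = {
    "All": (7.8, 8.8, 10.0, 10.0),
    "Much": (5.6, 6.6, 7.8, 8.8),
    "Regular": (3.4, 4.4, 5.6, 6.6),
    "Little": (1.2, 2.2, 3.4, 4.4),
}

# Step 1 output: value assigned to each of the 13 project features.
PROJECT_FEATURES = {
    "P1": "All", "P2": "Regular", "A1": "All", "A2": "Much",
    "A3": "Regular", "E1": "Little", "P3": "All", "A4": "Much",
    "A5": "Regular", "E2": "Much", "E3": "Regular", "P4": "All",
    "E4": "Much",
}


def to_fuzzy_interval(value):
    """Return the fuzzy interval for a linguistic value (Step 2)."""
    try:
        return FUZZY_INTERVALS[value]
    except KeyError:
        raise ValueError(f"no fuzzy interval known for value {value!r}")


def membership(x, interval):
    """Degree of membership of x, ASSUMING a trapezoidal shape (a, b, c, d)."""
    a, b, c, d = interval
    if b <= x <= c:
        return 1.0                   # plateau of the trapezoid
    if a < x < b:
        return (x - a) / (b - a)     # rising edge
    if c < x < d:
        return (d - x) / (d - c)     # falling edge
    return 0.0                       # outside the support


# Step 2 applied to every feature of the example project.
project_intervals = {
    feature_id: to_fuzzy_interval(value)
    for feature_id, value in PROJECT_FEATURES.items()
}
```

For instance, `project_intervals["E1"]` yields the interval for "Little", and `membership(7.0, FUZZY_INTERVALS["Much"])` falls on the plateau of "Much" under the trapezoidal reading.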
Validation – Proof of Concept: (2)
o Step 3: Calculating the value of each dimension
o Step 4: Calculating the overall project feasibility
o Step 5: Interpreting the results

Dimension                     Value   Interpretation
Plausibility                  7.60    Accepted
Adequacy                      6.27    Accepted
Success                       5.25    Accepted (in the limit)
Overall Project Feasibility   6.47    Feasible
Validation – Comparison with real projects: (3)
Statistical Analysis
[Figure: panels for Success and Overall Project Feasibility]
Validation – Comparison with real projects: (4)
Statistical Analysis
[Figure: panels for Plausibility, Adequacy, Success and Overall Project Feasibility]
Validation – Comparison with real projects: (5)
Wilcoxon signed-rank test:
Hypotheses:
H0: there are no meaningful differences between the researchers' values and the model's values (i.e. they are equivalent).
H1: the researchers' values and the model's values are not equivalent.

Dimension             Sum Ranks + (W+)   Sum Ranks - (W-)
Plausibility          97                 228
Adequacy              227                98
Success               175                150
Overall Feasibility   181                144

level of significance = 0.01
quantity of non-zero pairs = 25
critical value = 68

Check against Critical Value (smaller of W+ and W- for each dimension):
Plausibility          97 > 68    H0 accepted
Adequacy              98 > 68    H0 accepted
Success               150 > 68   H0 accepted
Overall Feasibility   144 > 68   H0 accepted
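The critical-value check can be reproduced with a few lines of Python. This is a minimal sketch that uses only the rank sums, the number of non-zero pairs and the critical value reported on the slide; it does not recompute the ranks, since the underlying paired values are not shown. The function name `check_h0` is hypothetical. As a sanity check, it also verifies that W+ and W- add up to n(n+1)/2, which is 325 for n = 25.

```python
N_PAIRS = 25          # quantity of non-zero pairs (from the slide)
CRITICAL_VALUE = 68   # critical value at significance level 0.01 (from the slide)

# Dimension -> (W+, W-), as reported on the slide.
RANK_SUMS = {
    "Plausibility": (97, 228),
    "Adequacy": (227, 98),
    "Success": (175, 150),
    "Overall Feasibility": (181, 144),
}


def check_h0(w_plus, w_minus, n_pairs, critical_value):
    """Return (test statistic, True if H0 is kept) for one dimension."""
    # The two rank sums must add up to 1 + 2 + ... + n.
    expected_total = n_pairs * (n_pairs + 1) // 2
    if w_plus + w_minus != expected_total:
        raise ValueError(
            f"rank sums add up to {w_plus + w_minus}, expected {expected_total}"
        )
    statistic = min(w_plus, w_minus)
    # H0 is kept when the smaller rank sum exceeds the critical value.
    return statistic, statistic > critical_value


results = {
    dimension: check_h0(w_plus, w_minus, N_PAIRS, CRITICAL_VALUE)
    for dimension, (w_plus, w_minus) in RANK_SUMS.items()
}

for dimension, (statistic, h0_kept) in results.items():
    verdict = "H0 accepted" if h0_kept else "H0 rejected"
    print(f"{dimension}: {statistic} > {CRITICAL_VALUE} -> {verdict}")
```

With the values above, the smaller rank sum exceeds 68 for all four dimensions, matching the check on the slide.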
Conclusions:
A model to determine whether a data mining project is feasible or not at an early stage is proposed
From the application of the model to real projects:
Statistical Analysis:
o the model tends to be more conservative than the experts
o standard deviation range and average values are almost the same
Wilcoxon signed-rank test:
o the proposed model is equivalent to the appraisal performed by the experts