Development in the Ferda project, December 2006, Martin Ralbovský
Development in the Ferda project
December 2006
Martin Ralbovský
Content
  History
  Changes in the 2.0 version, improved GUHA abilities
  Background knowledge and ontologies
  Further academic development
Ferda project history I
  Ferda – successor of the LISp-Miner data mining system, visual and modular environment
  Software project at MFF UK
  KEG 10.11.2005
    Introduction of the system
    Description of parts of the working environment
    Implementation principles
  Znalosti 2006 article
  KEG 4.5.2006
    State of development in May 06
    Master theses themes discussed
Ferda project history II
  Development since May 06
    “Experimental GUHA Procedures” by Tomáš Kuchař completed
    “Usage of Domain Knowledge for Applications of GUHA Procedures” by Martin Ralbovský completed
    Further development + testing
Available versions of Ferda
  Version 1.0 (1.1) - approved MFF project version (+ improvements)
    Copy of the LISp-Miner system in terms of GUHA abilities (almost)
    Dependent on the LISp-Miner hypotheses generation engine
  Version 2.0 based on the master thesis of Tomáš Kuchař
    Ferda no longer dependent on LISp-Miner system
    Improved GUHA abilities (datasource, definition of relevant questions…)
Improved GUHA abilities theoretically I
  Definition of a large set of relevant questions (original):
    If A is an attribute and $\alpha$ is a non-empty subset of the categories of A, then $A(\alpha)$ is a basic boolean attribute
    Each basic boolean attribute is a boolean attribute
    If $\varphi$ and $\psi$ are boolean attributes, then $\varphi \wedge \psi$, $\varphi \vee \psi$ and $\neg\varphi$ are boolean attributes
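The inductive definition above (basic boolean attributes, closed under conjunction, disjunction and negation) can be sketched as a small expression tree that is evaluated on one data row. This is only an illustrative sketch and not Ferda code: the class names, the `holds` function and the example attributes are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping, Union


@dataclass(frozen=True)
class Basic:
    """Basic boolean attribute A(alpha): true when the row's value of A lies in alpha."""
    attribute: str
    alpha: frozenset

    def __post_init__(self):
        if not self.alpha:
            raise ValueError("alpha must be a non-empty subset of categories")


@dataclass(frozen=True)
class Not:
    operand: "BoolAttr"


@dataclass(frozen=True)
class And:
    left: "BoolAttr"
    right: "BoolAttr"


@dataclass(frozen=True)
class Or:
    left: "BoolAttr"
    right: "BoolAttr"


BoolAttr = Union[Basic, Not, And, Or]


def holds(phi: BoolAttr, row: Mapping[str, object]) -> bool:
    """Evaluate a boolean attribute on a single row (a mapping attribute -> category)."""
    if isinstance(phi, Basic):
        return row[phi.attribute] in phi.alpha
    if isinstance(phi, Not):
        return not holds(phi.operand, row)
    if isinstance(phi, And):
        return holds(phi.left, row) and holds(phi.right, row)
    if isinstance(phi, Or):
        return holds(phi.left, row) or holds(phi.right, row)
    raise TypeError(f"not a boolean attribute: {phi!r}")


# Hypothetical example: education(university) AND NOT beverage(beer, none)
phi = And(
    Basic("education", frozenset({"university"})),
    Not(Basic("beverage", frozenset({"beer", "none"}))),
)
row = {"education": "university", "beverage": "wine"}
print(holds(phi, row))  # prints True
```

Because the connectives can be nested freely, the same structure covers formulas that use conjunction, disjunction and negation multiple times.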
Improved GUHA abilities theoretically II
  Definition of a large set of relevant questions in LISp-Miner (and Ferda 1.0)
    Literal ~ basic boolean attribute or its negation
    Literal can be basic or remaining
      basic – in each partial cedent there has to be at least one basic literal
      remaining – the opposite
    Partial cedent ~ conjunction of literals
    Cedent ~ conjunction of partial cedents
Improved GUHA abilities theoretically III
  Definition of a large set of relevant questions in Ferda 2.0
    Ferda 2.0 fully supports the original definition, user can use conjunction, disjunction and negation multiple times
    Basic boolean attribute can be
      Basic – the same meaning
      Forced – must be present in every relevant question
      Auxiliary – conjunction and disjunction cannot be formed only with auxiliary boolean attributes (there must be a basic or forced attribute).
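One possible reading of the basic / forced / auxiliary distinction, shown for conjunctions only, is a filter over candidate combinations of basic boolean attributes. The function name, the string labels and the example attributes below are assumptions made for illustration; the slides do not describe how Ferda 2.0 actually enumerates relevant questions.

```python
from itertools import combinations


def relevant_conjunctions(kinds, max_length):
    """Enumerate conjunctions (as tuples of attribute names) allowed by the kinds.

    kinds maps an attribute name to 'basic', 'forced' or 'auxiliary'.
    A conjunction is kept when it contains every forced attribute and
    is not built from auxiliary attributes only.
    """
    forced = {name for name, kind in kinds.items() if kind == "forced"}
    names = sorted(kinds)
    result = []
    for length in range(1, max_length + 1):
        for combo in combinations(names, length):
            if not forced <= set(combo):
                continue  # a forced attribute is missing
            if all(kinds[name] == "auxiliary" for name in combo):
                continue  # auxiliary attributes alone are not allowed
            result.append(combo)
    return result


# Hypothetical attributes
kinds = {"Age": "forced", "Sex": "basic", "District": "auxiliary"}
print(relevant_conjunctions(kinds, 2))
# prints [('Age',), ('Age', 'District'), ('Age', 'Sex')]
```

When no attribute is forced, the second check is what rules out conjunctions such as a pair of auxiliary attributes.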
Improved GUHA abilities practically: 4FT – Ferda 1.0
Improved GUHA abilities practically: 4FT – Ferda 2.0
Improved GUHA abilities practically: KL – Ferda 1.0
Improved GUHA abilities practically: KL – Ferda 2.0
Ferda 2.0 versus LISp-Miner
  We compare only the hypotheses generation engines, not the whole systems
  Running time of procedures
    4FT approximately equal
    KL faster in Ferda 2.0
    CF faster in Ferda 2.0
    SD procedures much faster in LISp-Miner (no jump optimizations)
  Some quantifiers not implemented in Ferda 2.0 (but are easy to implement)
  LISp-Miner better tested
Background knowledge I – introduction
  Background knowledge is a vague term for knowledge from the domain experts to aid in KDD.
  No central definition or theory, different authors use it differently.
  The definition for GUHA mining: a set of various verbal rules that are accepted in a specific domain as common knowledge.
  Background knowledge can be used as an effective means of communication between the knowledge expert and the data miner.
  Usage of background knowledge in GUHA is described in the master thesis of Martin Ralbovský (and elsewhere)
Background knowledge II - examples
  Sociomedical domain:
    If education increases, wine consumption increases as well
    Patients with greater responsibility in work tend to drive to work by car
  Beer marketing domain:
    Younger consumers prefer draught beer
    Older consumers prefer beer in bottles
    More expensive brands sell better during holidays
Background knowledge III – preferred usage
  [Diagram: communication between the domain expert and the data miner]
  Domain expert: knowledge about the domain
  Data miner: data mining techniques and interpretation knowledge
  Specification of interesting facts to the domain expert
  Rules can be transformed into mining tasks
  Tasks results
  Soundness of DM techniques
Background knowledge IV – in Ferda
  Formalization of background knowledge rules sound for GUHA purposes created
  Implemented modules of the Ferda system (version 1.1) to validate background knowledge rules
  Experiments carried out to find presence of background knowledge rules in the data with the GUHA procedures 4FT and KL
  So far rather disappointing results
Background knowledge V - experiment
  Presumptions:
    Background knowledge rules are somehow stored in the data
    Data collection and attribute creation without mistakes
  Question: Can the rules be found in data with “our” techniques?
  Experiment: 8 background knowledge rules tested with the 4FT and KL
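A rule such as "if education increases, wine consumption increases as well" relates two ordinal attributes, so it can be checked on a K x L contingency table with a rank-correlation measure. The sketch below computes plain Kendall's tau-b from such a table. It is an assumption for illustration only: the slides mention "modifications of Kendall" without defining them, and the tables used here are made up.

```python
from math import sqrt


def kendall_tau_b(table):
    """Kendall's tau-b for a contingency table given as a list of rows.

    Rows and columns are assumed to be ordered from the lowest to the
    highest category of the two ordinal attributes.
    """
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for l in range(cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    row_squares = sum(sum(row) ** 2 for row in table)
    col_squares = sum(sum(table[i][j] for i in range(rows)) ** 2 for j in range(cols))
    denominator = sqrt((n * n - row_squares) * (n * n - col_squares))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one attribute has a single category")
    return 2 * (concordant - discordant) / denominator


# Made-up tables: perfectly increasing, perfectly decreasing, no association
print(kendall_tau_b([[5, 0], [0, 5]]))
print(kendall_tau_b([[0, 5], [5, 0]]))
print(kendall_tau_b([[5, 5], [5, 5]]))
```

Values near 1 support an "increases with" rule, values near -1 support the opposite direction, and values near 0 indicate no monotone association.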
Background knowledge VI - results
  Founded Implication with default values (base = 0.05, p = 0.95) – 1/8 rules approved
  Above Average with default values (base = 0.05, p = 1.2) – 1/8 rules approved
  Modifications of Kendall – 2/6 rules approved
  Furthermore quantifiers showed strange results (4/8 FI results with p below 0.4)
  How good are our quantifiers???
  Bigger experiments are planned for the future
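The two 4FT quantifiers named above can be sketched on a four-fold table with frequencies a, b, c, d (antecedent and succedent both true, antecedent only, succedent only, neither). The formulas below are assumptions: base is read as a relative frequency a/n, and Above Average is written in ratio form, so the definitions implemented in Ferda or LISp-Miner may differ in detail. The example frequencies are made up.

```python
def founded_implication(a, b, c, d, p=0.95, base=0.05):
    """True when a/(a+b) >= p and a/n >= base (assumed form)."""
    n = a + b + c + d
    if n == 0 or a + b == 0:
        return False
    return a / (a + b) >= p and a / n >= base


def above_average(a, b, c, d, p=1.2, base=0.05):
    """True when a/(a+b) >= p * (a+c)/n and a/n >= base (assumed form)."""
    n = a + b + c + d
    if n == 0 or a + b == 0:
        return False
    return a / (a + b) >= p * (a + c) / n and a / n >= base


# Made-up four-fold tables
print(founded_implication(96, 4, 100, 800))  # prints True
print(above_average(96, 4, 100, 800))        # prints True
print(founded_implication(50, 50, 400, 500)) # prints False
```

Under this reading, a rule whose confidence a/(a+b) is below 0.4 is far from the default threshold of 0.95, which matches the kind of weak result reported on the slide.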
Ontologies I – introduction
  In the past attempts to enhance GUHA mining with domain ontologies (also presented on KEG)
    Data understanding
    Attribute creation
    Decomposition of tasks
    Task creation
  Ralbovský’s master thesis first work to examine automatic processing of domain ontologies
    Deep analysis, however no tools implemented
Ontologies II – problems
  Technical problems… not so bad
  Conceptual problems
    Ontologies express knowledge on very general level
    For GUHA mining, we need specific knowledge that usually is not present in ontologies
    Example: for attribute creation we need
      Maximum and minimum values
      Extreme values
      Significant values dividing the domain
      Typical values (for nominal domains)
  Solution: probably specific ontologies for GUHA mining
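To show why such domain values matter for attribute creation, the sketch below builds interval categories from a minimum, a maximum and a list of significant dividing values, then maps a raw value to its category. The function names and the age cut points are hypothetical; this is not a description of how Ferda creates attributes.

```python
from bisect import bisect_right


def make_intervals(minimum, maximum, dividing_values):
    """Split [minimum, maximum] at the dividing values into consecutive intervals."""
    if minimum >= maximum:
        raise ValueError("minimum must be smaller than maximum")
    cuts = sorted(set(dividing_values))
    if any(not (minimum < cut < maximum) for cut in cuts):
        raise ValueError("dividing values must lie strictly between minimum and maximum")
    bounds = [minimum, *cuts, maximum]
    return list(zip(bounds[:-1], bounds[1:]))


def category_of(value, intervals):
    """Index of the interval holding value.

    Intervals are left-closed; the last one also includes the maximum.
    """
    minimum, maximum = intervals[0][0], intervals[-1][1]
    if not (minimum <= value <= maximum):
        raise ValueError("value outside the domain")
    lows = [low for low, _ in intervals]
    return bisect_right(lows, value) - 1


# Hypothetical domain knowledge for an age attribute
intervals = make_intervals(0, 120, [18, 65])
print(intervals)                   # prints [(0, 18), (18, 65), (65, 120)]
print(category_of(17, intervals))  # prints 0
print(category_of(18, intervals))  # prints 1
```

Without the minimum, maximum and dividing values supplied from domain knowledge, the intervals would have to be guessed from the data alone.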
Further academic development I
  Alexander Kuzmin – “Relational GUHA procedures” master thesis
    Implementation of relational 4FT miner (and possibly others)
    Ferda 2.0, spring 2007
  Daniel Kupka – “User support for 4ft-Miner procedure for data mining” master thesis
    Help scenarios depending on the settings of 4FT task
    Complex and modular system
    Ferda 2.0, spring 2007
Further academic development II
  Martin Zeman – “Using ontologies in GUHA procedures”
    Definition of GUHA ontologies
    Tools for ontology support
    Ferda 2.0, autumn 2006
  Michal Kováč – “User oriented language for solving KDD tasks”
    Only Michal knows what this is about
    Ferda 2.0, autumn 2006
Thank you for your attention.