MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments
Diego Esteves et al.
Department of Computer Science, AKSW, University of Leipzig
17 Sep 2015 - SEMANTiCS
Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 1 / 30
Outline
1 Introduction: Problem, Motivation, Challenges, State of the Art
2 MEX: The Inspiration, The Architecture, Examples
3 Conclusion and Future Work
Motivation - The Problem
The Problem
How should we represent results of machine learning experiments in a common, comprehensive and interoperable format?
Motivation - Example 1: Collaborative Project
Three universities (A, B and C) are collaborating on a research project.
How can they achieve a high level of interoperability?
A uses the Weka toolkit [1].
B uses DL-Learner [2].
C uses the Accord Framework [3].
[1] http://www.cs.waikato.ac.nz/ml/weka/
[2] http://dl-learner.org/
[3] http://accord-framework.net/
Motivation - Example 2: hands on...
A complex script-based scenario
You are working on your research about stock market predictions and want to store the data for further analysis.
E.g., a script which takes 2 days to run a multi-level machine learning algorithm.
Motivation - Example 3: Reading or Reviewing a paper
You are a reviewer or scientist...
Sometimes it is hard to understand the solution proposed in a research paper.
The ACL POS Tagging website (State of the art) exemplifies a good use case for MEX on the web [1].
Furthermore, in both cases the task/reading is error-prone and time-consuming.
[1] http://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art)
Motivation - Solution
Machine-readable data
Motivation - Solution
Existing Standards:
Comma-Separated Values (CSV)
eXtensible Markup Language (XML)
JavaScript Object Notation (JSON)
Value-Object (VO)
Data-Transfer-Objects (DTO)
Database Management System (DBMS)
Motivation - 3 drawbacks
1 The lack of schema definition: you always have to define the schema yourself and share your model afterwards.
2 A DBMS is technology-dependent and does not provide reasoning and inference capabilities.
3 The lack of semantic information.
Motivation - Problem: an example
Motivation - The Problem
The Optimal Scenario
How should we represent results of machine learning experiments in a common (1), comprehensive but not complex (2), lightweight (3), interoperable (4) and flexible (5) format, taking into consideration a low effort-level (6) for implementation?
State of the Art
Related Work
State of the Art - Platforms for e-science workflows
Name                            Description
MyExperiment [DeRoure2009]      A collaborative environment where scientists can publish their workflows and experiment plans
Wings [Gil2011]                 A semantic approach to creating very large scientific workflows
OpenTOX [Tcheremenskaia2012]    An interoperable predictive toxicology framework
OpenML [Vanschoren2014]         A frictionless, collaborative environment for exploring machine learning
State of the Art - Ontologies
Name                       Description
Expose [Vanschoren2010]    Data mining experiments used in conjunction with Experiment Databases
OntoDM [Panov2013]         Data mining investigations
DMOP [Keet2015]            Data Mining OPtimization Ontology: it supports informed decision-making at various choice points of the data mining process
MEX.aksw.org
The abstraction - What we want to describe
Machine Learning Definition by T. Mitchell
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E” – Tom Mitchell
ML Concepts                MEX Classes
experience E               mexcore:ExecutionCollection
task T                     mexalgo:Algorithm
performance measure P      mexperf:ExecutionPerformance
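As a small illustration of the table above, the mapping can be written down as plain data. This is only a sketch: the class name MitchellToMex and its helper methods are hypothetical and not part of any MEX library; only the three prefixed class names come from the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MitchellToMex {

    /** Returns the slide's concept-to-class mapping, in slide order. */
    public static Map<String, String> mapping() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("experience E", "mexcore:ExecutionCollection");
        m.put("task T", "mexalgo:Algorithm");
        m.put("performance measure P", "mexperf:ExecutionPerformance");
        return m;
    }

    /** Extracts the vocabulary prefix (the part before ':') of a prefixed name. */
    public static String layerOf(String prefixedName) {
        return prefixedName.substring(0, prefixedName.indexOf(':'));
    }

    public static void main(String[] args) {
        // Print each ML concept with its MEX class and the layer that defines it.
        for (Map.Entry<String, String> e : mapping().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()
                    + " (layer: " + layerOf(e.getValue()) + ")");
        }
    }
}
```

Each of Mitchell's three concepts lands in a different layer (mexcore, mexalgo, mexperf), which is the split the deck's three vocabularies follow.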
MEX - 3 Vocabularies
MEX Core: formalizes the key entities for representing the basic steps of machine learning executions
MEX Algorithm: represents the context of machine learning algorithms and their associated characteristics
MEX Performance: provides the basic entities for representing the experimental results of executions of machine learning algorithms
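To picture how the three vocabularies sit side by side in one description, the sketch below prints a few Turtle type statements. Several things here are assumptions for illustration only: the namespace IRIs for mexcore, mexalgo and mexperf are guesses that should be checked against mex.aksw.org, the ex: namespace and the instance names (ex:executions, ex:svm, ex:svmAccuracy) are made up, and the class MexLayersTurtle is hypothetical. Only the three class names are taken from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MexLayersTurtle {

    // ASSUMPTION: these namespace IRIs are illustrative guesses, not confirmed by the slides.
    static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("mexcore", "http://mex.aksw.org/mex-core#");
        PREFIXES.put("mexalgo", "http://mex.aksw.org/mex-algo#");
        PREFIXES.put("mexperf", "http://mex.aksw.org/mex-perf#");
        PREFIXES.put("ex", "http://example.org/run/"); // hypothetical instance namespace
    }

    /** Builds one Turtle statement typing a subject with a class ("a" means rdf:type). */
    public static String typeTriple(String subject, String mexClass) {
        return subject + " a " + mexClass + " .";
    }

    /** Serializes the prefix declarations followed by one typed resource per layer. */
    public static String toTurtle() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> p : PREFIXES.entrySet()) {
            sb.append("@prefix ").append(p.getKey())
              .append(": <").append(p.getValue()).append("> .\n");
        }
        sb.append('\n');
        // Hypothetical instances, one for each MEX layer.
        sb.append(typeTriple("ex:executions", "mexcore:ExecutionCollection")).append('\n');
        sb.append(typeTriple("ex:svm", "mexalgo:Algorithm")).append('\n');
        sb.append(typeTriple("ex:svmAccuracy", "mexperf:ExecutionPerformance")).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toTurtle());
    }
}
```

The sketch deliberately stops at type statements: the properties that link the three layers are shown only as diagrams in the deck, so none are invented here.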
MEX Vocabulary (:mexalgo + :mexcore + :mexperf) and Related Ontologies
[Chart: MEX (7+14+10=31) shown alongside ONTO-DM, Expose and DMOP; chart values: 402, 778, 858, 757]
MEX - Interlinking the 3 layers: mexalgo, mexcore and mexperf
:mexalgo
:mexcore
:mexperf
MEX - ACL POS Tagging website metadata
Next chapter ;-)
RDF? Ontology? Jena? Dublin Core...? SPARQL? OWL? PROV-O, what?
public static void main(String[] args) {
    // Create the experiment description and set its author.
    MyMEX_10 mex = new MyMEX_10();
    mex.setAuthorName("D Esteves");
    String eid = "E001S001";

    // Describe one configuration: features, implementation, algorithms and parameters.
    mex.addConf(eid).setDescription("hello world experiment");
    mex.Conf(eid).addFeature("min;max;op;close");
    mex.Conf(eid).Implementation().set(enumImplementation.Weka);
    mex.Conf(eid).addAlgorithm(enumAlgorithm.SupportVectorMachines);
    mex.Conf(eid).addAlgorithm(enumAlgorithm.NaiveBayes);
    mex.Conf(eid).Algorithm(enumAlgorithm.SupportVectorMachines).addParameter("C", "10^3");
    mex.Conf(eid).Algorithm(enumAlgorithm.SupportVectorMachines).addParameter("alpha", "0.2");
    ...
    /* your code here */
    ...
    // Log the measured performance (calls kept as written on the slide).
    String exid = mex.Conf(eid).addExecutionOverall.addPerformance(enumMeasures.ACCURACY, 0.96);
    // exid is reused here; the slide declared it a second time, which does not compile.
    exid = mex.Conf(eid).ExecutionOverall(exid).addPerformance(enumMeasures.TPR, 0.78);
    ...
    // Serialize the collected metadata.
    MEXSerializer_10.getInstance().parse(mex);
}
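The listing above depends on the MEX logging library and elides parts of the program, so it cannot run on its own. As a rough, self-contained sketch of the same "log your configuration, log your results, then serialize" pattern, the stand-in below uses a hypothetical class MiniExperimentLog. It is not the MEX API, its method names are invented for illustration, and it writes plain text rather than RDF.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an experiment logger; NOT the MEX API. */
public class MiniExperimentLog {
    private final String id;
    private String description = "";
    // algorithm name -> (parameter name -> parameter value), in insertion order
    private final Map<String, Map<String, String>> algorithms = new LinkedHashMap<>();
    // measure name -> value, in insertion order
    private final Map<String, Double> performance = new LinkedHashMap<>();

    public MiniExperimentLog(String id) {
        this.id = id;
    }

    public MiniExperimentLog setDescription(String text) {
        this.description = text;
        return this;
    }

    public MiniExperimentLog addAlgorithm(String name) {
        algorithms.putIfAbsent(name, new LinkedHashMap<>());
        return this;
    }

    /** Registers the algorithm if needed, then stores the parameter. */
    public MiniExperimentLog addParameter(String algorithm, String key, String value) {
        algorithms.computeIfAbsent(algorithm, k -> new LinkedHashMap<>()).put(key, value);
        return this;
    }

    public MiniExperimentLog addPerformance(String measure, double value) {
        performance.put(measure, value);
        return this;
    }

    /** Renders everything logged so far as indented plain text. */
    public String serialize() {
        StringBuilder sb = new StringBuilder();
        sb.append("configuration ").append(id).append(": ").append(description).append('\n');
        algorithms.forEach((name, params) ->
                sb.append("  algorithm ").append(name).append(' ').append(params).append('\n'));
        performance.forEach((measure, value) ->
                sb.append("  measure ").append(measure).append(" = ").append(value).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values mirror the slide's example configuration and results.
        MiniExperimentLog log = new MiniExperimentLog("E001S001")
                .setDescription("hello world experiment")
                .addAlgorithm("SupportVectorMachines")
                .addAlgorithm("NaiveBayes")
                .addParameter("SupportVectorMachines", "C", "10^3")
                .addParameter("SupportVectorMachines", "alpha", "0.2")
                .addPerformance("ACCURACY", 0.96)
                .addPerformance("TPR", 0.78);
        System.out.print(log.serialize());
    }
}
```

The experiment script only records inputs and outputs through a handful of calls, and the knowledge of the output format stays inside serialize(). In the real library that role would be played by the serializer that emits the vocabulary terms.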
Conclusion
Requirement         Argumentation
lightweight         7: this is the minimal number of classes you need for representing a basic execution. 31: this is the number of the most important entities in the 3 layers
flexible            Single or Overall Executions. Choose your inputs/outputs
low effort-level    MEX provides APIs which encapsulate the semantic knowledge. So you can avoid extra implementation-effort and just log your inputs and outputs
Conclusion
Requirement         Argumentation
common              The concepts behind vocabularies allow us to achieve a high level of abstraction, generalization and formalization of concepts
interoperable       Vocabularies are the current best choice for representing real-world entities
comprehensive       Classification, regression and clustering problems are covered
Conclusion
1 Produces Provenance Metadata.
2 Allows Querying Results.
3 Defines an Interoperable Format for Sharing Machine Learning Experiments.
4 Benefits Meta-Learning [Vilalta2002] Approaches.
5 Tends to minimize the misinterpretation probability rate on persuasive and informative aspects [Gillen2006].
6 MEX is flexible and lightweight.
7 Experiment Databases [Blockeel2007][Vanschoren2012] need an interchange format for experiments.
8 MEX provides APIs which facilitate the file generation process.
9 Benchmark Systems [Usbeck2014] can benefit from a standard format.
10 Generate your LaTeX table automatically.
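To make the last point concrete, here is a minimal sketch of turning logged performance values into a LaTeX table. The class LatexResultsTable and its method are hypothetical and not taken from the MEX tooling. The two values are the ones used in the deck's code example, and the input is a plain in-memory map, whereas real tooling would presumably read the values from the serialized experiment metadata.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that renders measure/value pairs as a LaTeX tabular. */
public class LatexResultsTable {

    /**
     * Builds a two-column tabular (measure name, value rounded to two decimals).
     * Measure names are emitted verbatim, so names containing LaTeX special
     * characters such as '_' would need escaping first.
     */
    public static String toLatex(Map<String, Double> measures) {
        StringBuilder sb = new StringBuilder();
        sb.append("\\begin{tabular}{lr}\n");
        sb.append("\\hline\n");
        sb.append("Measure & Value \\\\\n");
        sb.append("\\hline\n");
        for (Map.Entry<String, Double> e : measures.entrySet()) {
            sb.append(e.getKey()).append(" & ")
              .append(String.format(Locale.ROOT, "%.2f", e.getValue()))
              .append(" \\\\\n");
        }
        sb.append("\\hline\n");
        sb.append("\\end{tabular}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> measures = new LinkedHashMap<>();
        measures.put("ACCURACY", 0.96);
        measures.put("TPR", 0.78);
        System.out.print(toLatex(measures));
    }
}
```

Locale.ROOT keeps the decimal separator a dot regardless of the machine's locale, which matters when the generated table is shared between collaborators.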
MEX
Thank you so much for your attention!
mex.aksw.org