fusepool machine learning framework

Fusepool Machine Learning Framework
June 25th, Brussels

Fusepool

Structured Content

Visualization

Enable personalized software

Outline

Introduction to adaptive interfaces
Source refinement
Document labeling
Link prediction
Adaptive layout

Simple Machine Learning: Listen-Update-Predict (LUP)

LUP in detail for document labelling

Predictive Query: Predictive queries

Adaptive interfaces

Guillaume Bouchard (Xerox)

Customization/Contextualization of interfaces

Known and accepted by big internet companies

Not easy to implement for SMEs

Annotation tools

● To manage large knowledge bases, there is a need for efficient interfaces for annotators

● Web2.0 companies are investigating these tools

● Mixed initiative
o A learning algorithm + human interface

● Remark: a user can be an annotator for some time

Supervised automation: Introduction

Challenge
● LOD provides a huge amount of data
● Hard to organize

Goal
● Streamline KB cleaning and management through implicit and explicit feedback

Specifications
● Easy tagging of documents
● Near real-time prediction

Adaptive components in Fusepool

Document category prediction

Entity labeling

Source refinement (re-ranking based on previous user clicks)

Adaptive Layout

Simple Machine Learning: Listen-Update-Predict (LUP)

Motivation

● Adaptive systems
● Many systems use machine learning algorithms as internal components
● The interaction between raw data, annotations, algorithms and predictions is not simple:
• Data: large and distributed (the 3 Vs: Velocity, Variety, Volume)
• Algorithms: multiple possible algorithms for the same task, slow training/inference
• Visualization: must carry the uncertainty about data, annotations and predictions
● Common problems:
• Confusion between predictions and data
• Models not automatically updated (manually « re-train » models)
• No simple way to test new algorithms
• Annotations not shared across models in the same system
• Too few annotations in specific domains (no principled way to gather new annotations)

Prior art

• Patterns (and Anti-Patterns) for Developing Machine Learning Systems. SysML 2008
  https://www.usenix.org/legacy/event/sysml08/tech/rios_talk.pdf
• The Agent Learning Pattern: Implementing ML algorithms in multiagent systems
  http://www.cs.cmu.edu/~alberto/papers/LearningPatternSugarLoaf.pdf
• Gestalt, a general-purpose integrated development environment designed for the application of machine learning. Kayur Patel (University of Washington)
  http://www.acm.org/uist/archive/adjunct/2010/pdf/doctoral_consortium/p355.pdf
• Scikit-learn. Three complementary interfaces: Estimator, Predictor, Transformer
  http://hal.inria.fr/docs/00/85/65/11/PDF/paper.pdf
• Infer.NET: Probabilistic programming. Compilation of machine learning code
  http://research.microsoft.com/en-us/um/people/cmbishop/downloads/bishop-mbml-2012.pdf
• Never-Ending Language Learning (NELL). The closest to our work but focused on language
  www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf

Never Ending Language Learning

● Intelligent computer agent

● Runs forever. Every day:

1. extract, or read, information from the web

2. learn to perform this task better

● Carlson, Betteridge, Kisiel, Settles, Hruschka and Mitchell (2010) give the design principles for such an agent

Machine learning process

LUPI Module overview

Listen: gets notified when new annotations arrive

Update: processes annotations and updates the learning models

Predict: exposes a prediction service available to other components

Investigate: actively asks for new annotations

LUP modules are monitored by Fusepool main platform

LUP Module Implementation

● LUPEngine is a Java interface
● Location: com.xerox.services.LUPEngine
o + getGraphListener(...);
o + graphChanged(...);
o + updateModels(...);
o + predict(...);
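To make the shape of such an engine concrete, here is a minimal standalone sketch. It leaves out the graph-listener plumbing so that it compiles without the platform. Only the method names updateModels and predict come from the slide; the parameter and return types are assumptions borrowed from the code fragments later in this deck, and SimpleLupEngine, DictionaryEngine and the "text" parameter are hypothetical names used for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo: feeds one annotation, then asks for a prediction. */
class LupEngineDemo {
    public static void main(String[] args) {
        DictionaryEngine engine = new DictionaryEngine();

        HashMap<String, String> annotation = new HashMap<>();
        annotation.put("newWord", "tissue");
        engine.updateModels(annotation);

        HashMap<String, String> query = new HashMap<>();
        query.put("text", "sodium levels in tissue");
        System.out.println(engine.predict(query));
    }
}

/** Simplified stand-in for the LUPEngine interface; signatures are assumptions. */
interface SimpleLupEngine {
    void updateModels(HashMap<String, String> params);

    String predict(HashMap<String, String> params);
}

/** Toy engine whose "model" is just a set of known words. */
class DictionaryEngine implements SimpleLupEngine {
    private final Set<String> dictionary = new HashSet<>();

    @Override
    public void updateModels(HashMap<String, String> params) {
        String newWord = params.get("newWord");
        if (newWord != null) {
            dictionary.add(newWord.toLowerCase());
        }
    }

    @Override
    public String predict(HashMap<String, String> params) {
        // Returns the known words found in the text, joined with "##".
        StringBuilder found = new StringBuilder();
        String text = params.getOrDefault("text", "");
        for (String token : text.toLowerCase().split("\\s+")) {
            if (dictionary.contains(token)) {
                if (found.length() > 0) {
                    found.append("##");
                }
                found.append(token);
            }
        }
        return found.toString();
    }
}
```

Keeping update and predict behind one small interface is what lets the platform swap algorithms without touching the callers.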

Supervised automation: Follow the LUP

Listen
● Users give labels to documents in the GUI
● Labels stored in annotation store

Update
● Optimize the model with latest annotations
● Warm start machine learning algorithms

Predict
● Real time prediction based on updated model
● Visible in the GUI
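Warm starting means that each update resumes from the parameters left by the previous one instead of retraining from scratch. The following is a minimal sketch of that idea, assuming a binary label and a logistic-regression model trained by stochastic gradient steps. The class name, learning rate and feature encoding are illustrative assumptions and are not taken from the Fusepool code.

```java
import java.util.Arrays;

/** Hypothetical demo: two successive batches of annotations update the same model. */
class WarmStartDemo {
    public static void main(String[] args) {
        WarmStartClassifier clf = new WarmStartClassifier(2, 0.5);
        double[][] x = {{1.0, 0.0}, {0.0, 1.0}};
        int[] y = {1, 0};
        clf.update(x, y, 50); // first batch of annotations
        clf.update(x, y, 50); // later batch: resumes from the current weights
        System.out.println(clf.predict(new double[] {1.0, 0.0}));
    }
}

class WarmStartClassifier {
    private final double[] weights; // kept across calls: this is the warm start
    private final double learningRate;

    WarmStartClassifier(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    /** Probability that the label is 1 under the current weights. */
    double predict(double[] features) {
        double z = 0.0;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Runs gradient steps over the new annotations, starting from the existing weights. */
    void update(double[][] features, int[] labels, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int n = 0; n < features.length; n++) {
                double error = labels[n] - predict(features[n]);
                for (int i = 0; i < weights.length; i++) {
                    weights[i] += learningRate * error * features[n][i];
                }
            }
        }
    }

    /** Defensive copy, so callers can inspect the model without mutating it. */
    double[] weightsCopy() {
        return Arrays.copyOf(weights, weights.length);
    }
}
```

Because the weights are never reset, a small batch of new labels costs only a few gradient steps, which is what makes near real-time prediction plausible.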

Supervised automation: Architecture

[Figure: architecture diagram with two panels, Components and Process]

Supervised automation: Xerox web services

Update and prediction using REST interface

Scaling up prediction to huge datasets

Listen

private class MyListener implements GraphListener {

    /**
     * Listener method: called when matching modifications are detected on
     * the Annostore. This method triggers the learning process, using
     * the updateModels(HashMap<String,String> params) method.
     */
    public void graphChanged(List<GraphEvent> list) {
        annostore = tcManager.getMGraph(ANNOTATION_GRAPH_NAME);
        for (GraphEvent e : list) {
            log.info("New #MyKindOfAnnotation !");
            HashMap<String,String> params = new HashMap<String, String>();

            // 1.) Accessing the target of the annotation
            Iterator<Triple> it = annostore.filter(e.getTriple().getSubject(),
                    new UriRef("http://www.w3.org/ns/oa#hasTarget"), null);
            if (!it.hasNext()) {
                continue; // annotation without a target: nothing to learn from
            }

            // 2.) Accessing the content as text of the target,
            //     e.g. the new word to insert into the dictionary
            Resource target = it.next().getObject();
            it = annostore.filter((NonLiteral) target,
                    new UriRef("http://www.w3.org/2011/content#chars"), null);
            if (!it.hasNext()) {
                continue; // target without text content
            }
            String newWord = it.next().getObject().toString();

            params.put("newWord", newWord);
            updateModels(params);
        }
    }
}

Update

/**
 * This method updates the learning models.
 */
public void updateModels(HashMap<String, String> params) {
    String newWord = params.get("newWord");
    log.info("Adding " + newWord + " to dictionary");
    myDictionnary.add(newWord);
}

Predict

HashMap<String,String> params = new HashMap<String,String>();
String docURI = "<http://fusepool.info/doc/pmc/2751467>";

// We build the parameters to give to the L3.4 via the predictionHub
params.put("docURI", docURI);

// We call the LUP34.predict(...) method via the predictionHub.predict(...) method
String predictedLabels = predictionHub.predict("LUP34", params);

// We dump the result of the prediction
log.info(predictedLabels);
// "tissue__0.713##sodium__0.09135##English__0.016"
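The predicted labels come back as a single string, so a caller needs to split it before displaying scores. A minimal sketch follows, assuming that "##" separates entries and "__" separates a label from its score. Both conventions are inferred from the sample output above only, and PredictionParser is a hypothetical helper, not part of the Fusepool API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical demo: parses the sample output shown on the slide. */
class PredictionParserDemo {
    public static void main(String[] args) {
        String raw = "tissue__0.713##sodium__0.09135##English__0.016";
        System.out.println(PredictionParser.parse(raw));
    }
}

class PredictionParser {
    /** Turns "label__score##label__score" into an ordered label-to-score map. */
    static Map<String, Double> parse(String predictedLabels) {
        Map<String, Double> scores = new LinkedHashMap<>();
        if (predictedLabels == null || predictedLabels.isEmpty()) {
            return scores;
        }
        for (String entry : predictedLabels.split("##")) {
            // Use the last "__" so that labels containing "__" still parse.
            int sep = entry.lastIndexOf("__");
            if (sep < 0) {
                continue; // skip malformed entries
            }
            String label = entry.substring(0, sep);
            double score = Double.parseDouble(entry.substring(sep + 2));
            scores.put(label, score);
        }
        return scores;
    }
}
```

A LinkedHashMap keeps the order in which the service returned the labels, which in the sample is by decreasing score.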

Supervised automation: Multi-task learning services

● Better prediction based on multi-task algorithm with label embedding

● Efficient learning algorithms
o Alternating optimization
o Stochastic Gradient Descent
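The slide does not say which quantities are alternated, so the sketch below only illustrates the general technique on a stand-in problem: a rank-one factorization of a small matrix, where each step fixes one factor and solves a least-squares problem for the other. RankOneFactorizer and its parameters are hypothetical, and this is not the multi-task model used in the project.

```java
import java.util.Arrays;

/** Hypothetical demo: factorizes a small rank-one matrix. */
class AlternatingDemo {
    public static void main(String[] args) {
        double[][] m = {{3.0, 4.0}, {6.0, 8.0}};
        double[][] factors = RankOneFactorizer.fit(m, 20);
        System.out.println(Arrays.toString(factors[0]) + " " + Arrays.toString(factors[1]));
    }
}

class RankOneFactorizer {
    /**
     * Returns {u, v} such that m[i][j] is approximated by u[i] * v[j].
     * Assumes m is non-zero and not orthogonal to the all-ones start vector.
     */
    static double[][] fit(double[][] m, int iterations) {
        int rows = m.length;
        int cols = m[0].length;
        double[] u = new double[rows];
        double[] v = new double[cols];
        Arrays.fill(v, 1.0); // arbitrary non-zero starting point

        for (int it = 0; it < iterations; it++) {
            // Step 1: v is fixed, closed-form least-squares solution for u.
            double vv = dot(v, v);
            for (int i = 0; i < rows; i++) {
                double s = 0.0;
                for (int j = 0; j < cols; j++) {
                    s += m[i][j] * v[j];
                }
                u[i] = s / vv;
            }
            // Step 2: u is fixed, closed-form least-squares solution for v.
            double uu = dot(u, u);
            for (int j = 0; j < cols; j++) {
                double s = 0.0;
                for (int i = 0; i < rows; i++) {
                    s += m[i][j] * u[i];
                }
                v[j] = s / uu;
            }
        }
        return new double[][] {u, v};
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }
}
```

Each half-step has a closed-form solution, which is the usual reason to prefer alternating updates over a joint optimization when the objective is bilinear.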

● Efficient storage based on Cassandra

Supervised automation: Sequence diagram

1. The GUI inserts annotations

2. The Listener calls the LUP3.4 Module

3. The LUP calls the REST API

4. Then the information flows back when doing prediction
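The prediction side of this flow can be mimicked with an in-memory registry. This is a minimal sketch, assuming the hub simply maps a module name to a prediction function. SimplePredictionHub and register are hypothetical names; only the predict(name, params) call shape mirrors the Predict slide, and the REST call of step 3 is replaced by a local function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical demo: registers one module and routes a prediction request to it. */
class PredictionHubDemo {
    public static void main(String[] args) {
        SimplePredictionHub hub = new SimplePredictionHub();
        // Stand-in for the module that would call the REST API in step 3.
        hub.register("LUP34", p -> "label-for-" + p.get("docURI"));

        HashMap<String, String> params = new HashMap<>();
        params.put("docURI", "<http://fusepool.info/doc/pmc/2751467>");
        System.out.println(hub.predict("LUP34", params));
    }
}

class SimplePredictionHub {
    private final Map<String, Function<HashMap<String, String>, String>> modules = new HashMap<>();

    /** Makes a module's prediction function reachable under the given name. */
    void register(String name, Function<HashMap<String, String>, String> predictor) {
        modules.put(name, predictor);
    }

    /** Dispatches the request to the named module and returns its answer. */
    String predict(String name, HashMap<String, String> params) {
        Function<HashMap<String, String>, String> predictor = modules.get(name);
        if (predictor == null) {
            throw new IllegalArgumentException("Unknown LUP module: " + name);
        }
        return predictor.apply(params);
    }
}
```

Routing by name keeps GUI code independent of where a model runs, whether in the same process or behind a web service.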

Supervised automation: Properly tested interface

Corpus       20 Newsgroups          WebKB                  Cade
Tolerance    1      2      3        1      2      3        1      2
Rank = 20    0.152  0.074  0.05     0.15   0.055  0.035    0.348  0.222
Rank = 50    0.16   0.072  0.052    0.2    0.085  0.04     0.386  0.266
Rank = 100   0.256  0.166  0.126    0.335  0.18   0.11     0.134  0.072

Predictive queries

Motivation for predictive queries

Most prediction problems can be expressed as a query on “missing” information.

SELECT ?n WHERE
    <?d, hasLabel, “WellWritten”>
    <?p, isAuthor, ?d>
    <?p, hasName, ?n>
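The query above reads as a join over three triple patterns, where the hasLabel triple may be predicted instead of observed. Here is a minimal sketch of such an evaluation over an in-memory list of scored triples. The scoring rule (product of the three triple probabilities, maximum over bindings) is an assumption made for illustration, and ScoredTriple and PredictiveQuery are hypothetical names; the deck does not specify how answers are scored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo: one predicted label, two observed facts. */
class PredictiveQueryDemo {
    public static void main(String[] args) {
        List<ScoredTriple> kb = new ArrayList<>();
        kb.add(new ScoredTriple("doc1", "hasLabel", "WellWritten", 0.8)); // predicted
        kb.add(new ScoredTriple("alice", "isAuthor", "doc1", 1.0));       // observed
        kb.add(new ScoredTriple("alice", "hasName", "Alice", 1.0));       // observed
        System.out.println(PredictiveQuery.authorsOfWellWritten(kb));
    }
}

/** A triple with the probability that it holds (1.0 for observed facts). */
class ScoredTriple {
    final String subject;
    final String predicate;
    final String object;
    final double probability;

    ScoredTriple(String subject, String predicate, String object, double probability) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.probability = probability;
    }
}

class PredictiveQuery {
    /** Returns each name ?n with the best score over all matching bindings of ?d and ?p. */
    static Map<String, Double> authorsOfWellWritten(List<ScoredTriple> kb) {
        Map<String, Double> result = new HashMap<>();
        for (ScoredTriple label : kb) {
            if (!label.predicate.equals("hasLabel") || !label.object.equals("WellWritten")) {
                continue;
            }
            for (ScoredTriple author : kb) {
                if (!author.predicate.equals("isAuthor") || !author.object.equals(label.subject)) {
                    continue;
                }
                for (ScoredTriple name : kb) {
                    if (!name.predicate.equals("hasName") || !name.subject.equals(author.subject)) {
                        continue;
                    }
                    // Assumed scoring rule: triples treated as independent.
                    double score = label.probability * author.probability * name.probability;
                    result.merge(name.object, score, Math::max);
                }
            }
        }
        return result;
    }
}
```

The nested loops are only for readability; a real store would index triples by predicate and subject.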

Semantic Search API: Predictive SPARQL

Core idea: learn a model on the KB. Now we can query missing data!

● SPARQL is a standard query language for semantic data
● Predictive SPARQL: generalization to probabilistic models

Semantic Search API: Predictive SPARQL example

Semantic Search API: Predictive model

● Use of tensor factorization methods

● Tensor=generalization of matrices

● Scalable probabilistic models

● Based on the RESCAL approximation:

$T_{ikj} \approx e_i^{\top} R_k e_j$

where:
o $e_i$ and $e_j$ are entities
o $R_k$ is the relational matrix
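The approximation above is a bilinear form, so scoring one candidate triple is a small double loop. A minimal sketch, assuming the entities are represented by latent vectors of equal dimension; the Rescal class and the numeric values are made up for illustration.

```java
/** Hypothetical demo: scores one (entity i, relation k, entity j) triple. */
class RescalDemo {
    public static void main(String[] args) {
        double[] ei = {1.0, 0.0};
        double[] ej = {0.0, 1.0};
        double[][] rk = {{0.1, 0.9}, {0.2, 0.3}};
        System.out.println(Rescal.score(ei, rk, ej));
    }
}

class Rescal {
    /** Computes the bilinear form e_i^T R_k e_j. */
    static double score(double[] ei, double[][] rk, double[] ej) {
        double total = 0.0;
        for (int a = 0; a < ei.length; a++) {
            for (int b = 0; b < ej.length; b++) {
                total += ei[a] * rk[a][b] * ej[b];
            }
        }
        return total;
    }
}
```

With latent vectors of dimension d, one score costs on the order of d squared operations, independent of the number of entities in the KB.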

Predictive SPARQL example

Conclusion

Main achievements

● LUP: Listen-Update-Predict is a design pattern that provides software engineering best practices

● Predictive SPARQL: A framework for predictive queries on RDF data

Future of Fusepool

Xerox is using Fusepool for exploring and organizing its customer KB
