Fusepool Machine Learning Framework

Post on 26-Jun-2015


Fusepool Machine Learning Framework

June 25th, Brussels

Fusepool

Structured Content

Visualization

Enable personalized software

Outline

Introduction to adaptive interfaces
o Source refinement
o Document labeling
o Link prediction
o Adaptive layout

Simple Machine Learning: Listen-Update-Predict (LUP)

LUP in detail for document labelling

Predictive queries

Adaptive interfaces

Guillaume Bouchard (Xerox)

Customization/Contextualization of interfaces

Known and accepted by big internet companies

Not easy to implement for SMEs

Annotation tools

● To manage large knowledge bases, there is a need for efficient interfaces for annotators

● Web 2.0 companies are investigating these tools

● Mixed initiative
o A learning algorithm + human interface

● Remark: a user can be an annotator for some time

Supervised automation: Introduction

Challenge
● Linked Open Data (LOD) provides a huge amount of data
● Hard to organize

Goal
● Streamline KB cleaning and management through implicit and explicit feedback

Specifications
● Easy tagging of documents
● Near real-time prediction

Adaptive components in Fusepool

Document category prediction

Entity labeling

Source refinement (re-ranking based on previous user clicks)

Adaptive Layout

Simple Machine Learning: Listen-Update-Predict (LUP)

Motivation

● Adaptive systems
● Many systems use machine learning algorithms as internal components
● The interaction between raw data, annotations, algorithms and predictions is not simple:
• Data: large and distributed (the 3 Vs: Velocity, Variety, Volume)
• Algorithms: multiple possible algorithms for the same task, slow training/inference
• Visualization: must carry the uncertainty about data, annotations and predictions
● Common problems:
• Confusion between predictions and data
• Models not automatically updated (they have to be re-trained manually)
• No simple way to test new algorithms
• Annotations not shared across models in the same system
• Too few annotations in specific domains (no principled way to gather new annotations)

Prior art

• Patterns (and Anti-Patterns) for Developing Machine Learning Systems. SysML 2008
o https://www.usenix.org/legacy/event/sysml08/tech/rios_talk.pdf
• The Agent Learning Pattern: Implementing ML algorithms in multiagent systems
o http://www.cs.cmu.edu/~alberto/papers/LearningPatternSugarLoaf.pdf
• Gestalt, a general-purpose integrated development environment designed for the application of machine learning
o Kayur Patel (University of Washington)
o http://www.acm.org/uist/archive/adjunct/2010/pdf/doctoral_consortium/p355.pdf
• Scikit-learn. Three complementary interfaces: Estimator, Predictor, Transformer
o http://hal.inria.fr/docs/00/85/65/11/PDF/paper.pdf
• Infer.NET: Probabilistic programming. Compilation of machine learning code
o http://research.microsoft.com/en-us/um/people/cmbishop/downloads/bishop-mbml-2012.pdf
• Never-Ending Language Learning (NELL). The closest to our work, but focused on language
o www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf

Never-Ending Language Learning

● Intelligent computer agent

● Runs forever. Every day:

1. extract, or read, information from the web

2. learn to perform this task better

● Carlson, Betteridge, Kisiel, Settles, Hruschka and Mitchell (2010) give the design principles for such an agent

Machine learning process

LUPI Module overview

Listen: Gets notified when new annotations arrive

Update: Processes annotations and updates the learning models

Predict: Exposes a prediction service available to other components

Investigate: Actively asks for new annotations

LUP modules are monitored by the Fusepool main platform

LUP Module Implementation

● LUPEngine is a Java interface
● Location: com.xerox.services.LUPEngine
o + getGraphListener(...);
o + graphChanged(...);
o + updateModels(...);
o + predict(...);


Supervised automation: Follow the LUP

Listen
● Users give labels to documents in the GUI
● Labels are stored in the annotation store

Update
● Optimize the model with the latest annotations
● Warm start of the machine learning algorithms

Predict
● Real-time prediction based on the updated model
● Visible in the GUI
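The three steps above can be illustrated with a small, self-contained sketch. It is not the Fusepool LUPEngine API: the class `ToyLupLabeler`, its method names and the word-count model are hypothetical stand-ins chosen to keep the example short. The model state is kept between annotations, so each update starts from the previous model instead of retraining from scratch, which is a very simple form of warm start.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy LUP module (illustration only, not the Fusepool API). */
class ToyLupLabeler {

    // Model state: per label, how often each word was seen in labeled documents.
    // It survives between updates, so every update builds on the previous model.
    private final Map<String, Map<String, Integer>> wordCountsByLabel = new HashMap<>();

    /** Listen: called when a user labels a document in the GUI. */
    void onAnnotation(String documentText, String label) {
        updateModel(documentText, label);
    }

    /** Update: fold a single new annotation into the existing counts. */
    private void updateModel(String documentText, String label) {
        Map<String, Integer> counts =
                wordCountsByLabel.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : documentText.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    /** Predict: best-overlapping label, or null if nothing has been annotated yet. */
    String predict(String documentText) {
        String bestLabel = null;
        int bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> entry : wordCountsByLabel.entrySet()) {
            int score = 0;
            for (String word : documentText.toLowerCase().split("\\s+")) {
                score += entry.getValue().getOrDefault(word, 0);
            }
            if (score > bestScore) {
                bestScore = score;
                bestLabel = entry.getKey();
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        ToyLupLabeler lup = new ToyLupLabeler();
        lup.onAnnotation("sodium channel in muscle tissue", "biology");
        lup.onAnnotation("stochastic gradient descent for learning", "machine-learning");
        System.out.println(lup.predict("tissue sample with sodium"));
    }
}
```

A real module would replace the word counts with the learning algorithm of its choice; the point of the sketch is only the separation between receiving annotations, updating the model incrementally, and serving predictions from the current model.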

Supervised automation: Architecture

Components Process

Supervised automation: Xerox web services

Update and prediction using REST interface

Scaling up prediction to huge datasets

Listen

    // Assumes java.util.HashMap, java.util.Iterator and java.util.List are imported,
    // together with the RDF types used below (GraphListener, GraphEvent, Triple,
    // Resource, NonLiteral, UriRef).
    private class MyListener implements GraphListener {

        /**
         * Listener method: called when matching modifications are detected on
         * the annotation store. It triggers the learning process through
         * updateModels(HashMap<String, String> params).
         */
        @Override
        public void graphChanged(List<GraphEvent> list) {
            annostore = tcManager.getMGraph(ANNOTATION_GRAPH_NAME);
            for (GraphEvent e : list) {
                log.info("New #MyKindOfAnnotation !");
                HashMap<String, String> params = new HashMap<String, String>();

                // 1.) Access the target of the annotation
                Iterator<Triple> it = annostore.filter(e.getTriple().getSubject(),
                        new UriRef("http://www.w3.org/ns/oa#hasTarget"), null);
                if (!it.hasNext()) {
                    continue; // annotation without a target: nothing to learn from
                }

                // 2.) Access the text content of the target,
                //     e.g. the new word to insert into the dictionary
                Resource target = it.next().getObject();
                it = annostore.filter((NonLiteral) target,
                        new UriRef("http://www.w3.org/2011/content#chars"), null);
                if (!it.hasNext()) {
                    continue; // target without text content
                }
                String newWord = it.next().getObject().toString();
                params.put("newWord", newWord);
                updateModels(params);
            }
        }
    }

Update

    /**
     * Updates the learning models with the content of a new annotation.
     */
    public void updateModels(HashMap<String, String> params) {
        String newWord = params.get("newWord");
        if (newWord == null) {
            return; // nothing to add
        }
        log.info("Adding " + newWord + " to dictionary");
        myDictionnary.add(newWord);
    }

Predict

    HashMap<String, String> params = new HashMap<String, String>();
    String docURI = "<http://fusepool.info/doc/pmc/2751467>";

    // Build the parameters passed to the LUP3.4 module via the predictionHub
    params.put("docURI", docURI);

    // Call LUP34.predict(...) through predictionHub.predict(...)
    String predictedLabels = predictionHub.predict("LUP34", params);

    // Dump the result of the prediction, for example:
    // "tissue__0.713##sodium__0.09135##English__0.016"
    log.info(predictedLabels);
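The prediction comes back as a single string. A minimal sketch of how a client might turn it into label/score pairs is given below, assuming that `##` separates entries and `__` separates a label from its score, as the sample output suggests. The class `PredictionParser` is hypothetical and not part of Fusepool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses strings such as "tissue__0.713##sodium__0.09135". */
class PredictionParser {

    /** Returns label -> score, in the order the labels appear in the input. */
    static Map<String, Double> parse(String predictedLabels) {
        Map<String, Double> scores = new LinkedHashMap<>();
        if (predictedLabels == null || predictedLabels.isEmpty()) {
            return scores;
        }
        for (String entry : predictedLabels.split("##")) {
            // Use the last "__" so that labels containing "__" still parse
            int separator = entry.lastIndexOf("__");
            if (separator < 0) {
                continue; // skip malformed entries (assumption: be lenient)
            }
            String label = entry.substring(0, separator);
            double score = Double.parseDouble(entry.substring(separator + 2));
            scores.put(label, score);
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Double> scores =
                parse("tissue__0.713##sodium__0.09135##English__0.016");
        scores.forEach((label, score) -> System.out.println(label + " -> " + score));
    }
}
```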

Supervised automation: Multi-task learning services

● Better prediction based on a multi-task algorithm with label embedding

● Efficient learning algorithms
o Alternating optimization
o Stochastic Gradient Descent

● Efficient storage based on Cassandra

Supervised automation: Sequence diagram

1. The GUI inserts annotations

2. The Listener calls the LUP3.4 Module

3. The LUP calls the REST API

4. Then the information flows back when doing prediction

Supervised automation: Properly tested interface

Corpus        20 Newsgroups           WebKB                   Cade
Tolerance     1       2       3       1       2       3       1       2
Rank = 20     0.152   0.074   0.05    0.15    0.055   0.035   0.348   0.222
Rank = 50     0.16    0.072   0.052   0.2     0.085   0.04    0.386   0.266
Rank = 100    0.256   0.166   0.126   0.335   0.18    0.11    0.134   0.072

Predictive queries


Motivation for predictive queries

Most prediction problems can be expressed as a query on “missing” information.

SELECT ?n WHERE
<?d, hasLabel, “WellWritten”>
<?p, isAuthor, ?d>
<?p, hasName, ?n>

Semantic Search API: Predictive SPARQL

Core idea: learn a model on the KB. Now we can query missing data!

● SPARQL is a standard query language for semantic data
● Predictive SPARQL: generalization to probabilistic models

Semantic Search API: Predictive SPARQL example

Semantic Search API: Predictive model

● Use of tensor factorization methods

● Tensor=generalization of matrices

● Scalable probabilistic models

● Based on the RESCAL approximation:

$T_{ikj} \approx e_i^{\top} R_k \, e_j$

where:
o $e_i$ and $e_j$ are the latent vectors of entities $i$ and $j$
o $R_k$ is the matrix of relation $k$
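To make the approximation above concrete, here is a minimal sketch that evaluates the bilinear score for one (entity, relation, entity) triple with plain arrays. The class name `RescalScore` and the numbers are made up for illustration; learning the entity vectors and relation matrices is not shown.

```java
/** Hypothetical illustration of the RESCAL bilinear score e_i^T R_k e_j. */
class RescalScore {

    /** Computes sum over a, b of ei[a] * rk[a][b] * ej[b]. */
    static double score(double[] ei, double[][] rk, double[] ej) {
        double total = 0.0;
        for (int a = 0; a < ei.length; a++) {
            for (int b = 0; b < ej.length; b++) {
                total += ei[a] * rk[a][b] * ej[b];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[] author = {1.0, 0.0};   // made-up latent vector of entity i
        double[] document = {0.0, 1.0}; // made-up latent vector of entity j
        double[][] isAuthor = {         // made-up matrix of relation k
            {0.5, 2.0},
            {0.0, 1.0}
        };
        // The relation matrix need not be symmetric, so the score depends
        // on which entity is the subject and which is the object.
        System.out.println(score(author, isAuthor, document));
        System.out.println(score(document, isAuthor, author));
    }
}
```

A high score for a triple that is absent from the KB is what allows a predictive query to return "missing" facts.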

Predictive SPARQL example

Conclusion


Main achievements

● LUP: Listen-Update-Predict is a design pattern that provides software engineering best practices

● Predictive SPARQL: A framework for predictive queries on RDF data

Future of Fusepool

Xerox is using Fusepool for exploring and organizing its customer KB
