autocompletion for mashups presented by: ido schreier writers: neoklis polyzotis ohad greenshpan...
Post on 21-Dec-2015
Autocompletion for Mashups
Presented by: Ido Schreier
Writers: Neoklis Polyzotis, Ohad Greenshpan, Tova Milo
Copyright 2009 VLDB Endowment (article link)
Agenda
Introduction
"MatchUp": assists in creating Mashups
The Algorithm
Implementation
Experiments & Performance
Summary
Appendixes: 1. Computing Importance. 2. Future plans.
A Web Mashup
A Mashup is a technology for integrating data, services, and applications available on the Web into a single application.
A Web Mashup (cont')
A collection of Web APIs.
[Figure: a Program sits on top of the Operating System (File System, Display, Sound, Network); a Mashup sits on top of the Web (Google Maps, RSS feeds, Song Lyrics, Weather forecasting).]
Mashup Samples
Can be found at www.ProgrammableWeb.com, which will be our reference DB.
Statistics: 4551 Mashups, 1573 APIs.
Some Samples… Mashups
Why Mashup?
1. Quick delivery of applications.
2. Reuse of existing (successful) resources.
3. Ability to quickly change applications for new situations.
Why Mashup? (cont’)
4. Gain valuable insights through information remix.
5. Innovate and create value through community contribution.
Mashup Development
(1) Choose some relevant components from the Components Repository.
(2) Decide which should be connected and learn their specs.
(3) Glue.
The Problem
Given a large number of components, it is hard to select the right components and the appropriate connections between them.
Inexperienced developers.
Time.
The Solution
A system that assists developers by recommending possible compositions for their chosen components.
"MatchUp": "Collective Wisdom". Iterative creation of the Mashup.
"MatchUp" method: AutoCompletion.
In other fields: file locations in Unix, e-mail programs, source code editors.
Data-model Components
The atomic Mashlets
The basic unit in a Mashup. Implements specific functionality. E.g.:
News RSS feeds.
Visual functionality.
Draws a map.
Extracts coordinates of a place.
The compound Mashlet: Glue Pattern (GP)
A logical component that combines several atomic Mashlets or other GPs.
Every GP is a Mashup. E.g.:
Glue the previously mentioned Mashlets: Map + coordinates of a place + RSS News feeds = display News on a map.
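The Mashlet/GP nesting described above can be sketched as a small recursive data structure. This is only an illustration: the class names `Mashlet` and `GluePattern` and the helper `atomic_mashlets` are hypothetical names chosen here, not part of MatchUp.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Mashlet:
    """An atomic Mashlet (hypothetical model: identified by name only)."""
    name: str


@dataclass(frozen=True)
class GluePattern:
    """A compound Mashlet that glues atomic Mashlets or other GPs."""
    name: str
    parts: Tuple[Union["Mashlet", "GluePattern"], ...]


def atomic_mashlets(gp: GluePattern) -> Set[Mashlet]:
    """Collect every atomic Mashlet in gp, flattening nested GPs."""
    found: Set[Mashlet] = set()
    for part in gp.parts:
        if isinstance(part, GluePattern):
            found |= atomic_mashlets(part)
        else:
            found.add(part)
    return found


# The example from the slide: Map + coordinates + RSS News = News on a map.
map_view = Mashlet("Map")
geocoder = Mashlet("Coordinates of a place")
news = Mashlet("RSS News feeds")
news_on_map = GluePattern("News on a map", (map_view, geocoder, news))
```

Because a GP may contain other GPs, the flattening has to be recursive; a GP that wraps `news_on_map` plus one more Mashlet would report four atomic Mashlets.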
Our problem domain
M = {atomic Mashlets available on the Web}
GP = {GPs available on the Web}
U = {collection of Mashlets chosen by the user}
[Figure labels: M, GP, DB, U]
Given a set U, "MatchUp" gives the user recommendations of k possible combinations.
Will choose GPs from GP.
May ignore/replace Mashlets from U.
May add Mashlets from DB.
“MatchUp”: Autocompletion for Mashups.
The Mashlets
Support an interface of variables (I/O) and methods that are visible to other Mashlets.
Internal data.
Rules (logic):
Determine the output for a given input.
May be implemented as queries, in a high-level programming language, or as Web services.
Inheritance.
Inheritance of Mashlets
Mashlets may be similar to others with common functionality.
They can be divided into a small group of types: Chat, Sports, Travel, Photos, etc.
                       Atomic Mashlets (APIs)   Mashups (GPs)
# in total             1573                     4551
# of categories        51                       20
# in Maps category     99                       2134
# in Music category    60                       310
Inheritance of Mashlets (cont')
Mashlet m2 inherits from Mashlet m1 if: {m1 interface} ⊆ {m2 interface}
Inheritance distance metric: quantifies the price of using m1 instead of m2.
Dist(m2→m1) ∈ [0,1)
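A minimal sketch of the inheritance test, assuming a Mashlet's interface can be modeled as a set of names and that inheritance means m1's interface is contained in m2's. The slides only give the range of Dist, so the formula in `dist` below is a hypothetical stand-in chosen to land in [0,1); it is not the metric MatchUp uses.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Mashlet:
    name: str
    interface: FrozenSet[str]  # names of the visible variables and methods


def inherits(m2: Mashlet, m1: Mashlet) -> bool:
    """True if m2 inherits from m1: m1's interface is contained in m2's."""
    return m1.interface <= m2.interface


def dist(m2: Mashlet, m1: Mashlet) -> float:
    """Hypothetical Dist(m2 -> m1): share of m2's interface lost by using m1.

    Assumption for illustration only; the result lies in [0, 1) whenever
    m1 has a non-empty interface and m2 inherits from m1.
    """
    if not m1.interface or not inherits(m2, m1):
        raise ValueError(f"{m2.name} does not inherit from {m1.name}")
    return 1.0 - len(m1.interface) / len(m2.interface)


generic_map = Mashlet("Map", frozenset({"draw"}))
zoomable_map = Mashlet("ZoomableMap", frozenset({"draw", "zoom"}))
```

With these two hypothetical Mashlets, `zoomable_map` inherits from `generic_map` but not the other way around, and a Mashlet is at distance 0 from itself.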
GPs significance measurement
A GP g is called "a candidate completion" if it can link a non-empty subset of U.
Each g is mapped to a point in D dimensions, D = |DB|+1:
g → Pg = (Pg[0], Pg[m1], …, Pg[m|DB|]), Pg[i] ∈ [0,1], i = 0..|DB|
Some definitions
Given GP g and Mashlet m:
Components(g) = {all Mashlets in g}
g(m) = m', if m' ∈ Components(g) and m' is the closest generalization of m, i.e. Dist(m→m') is minimal.
g(m) = ∅, if none exists.
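The lookup g(m) can be sketched as a minimum search over Components(g). The function name `closest_generalization`, the table `DIST`, and the convention that the distance function returns `None` when m' does not generalize m are assumptions made for this illustration.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def closest_generalization(
    m: str,
    components: Iterable[str],
    dist: Callable[[str, str], Optional[float]],
) -> Optional[str]:
    """Return g(m): the component m' with minimal dist(m, m'), or None.

    dist(m, m') is expected to return a value in [0, 1) when m' generalizes
    m, and None when it does not.
    """
    best: Optional[str] = None
    best_d: Optional[float] = None
    for candidate in components:
        d = dist(m, candidate)
        if d is not None and (best_d is None or d < best_d):
            best, best_d = candidate, d
    return best


# Hypothetical distance table, for illustration only.
DIST: Dict[Tuple[str, str], float] = {
    ("GoogleMap", "Map"): 0.5,
    ("GoogleMap", "Widget"): 0.8,
}


def table_dist(m: str, m_prime: str) -> Optional[float]:
    """Look up Dist(m -> m'); a Mashlet is at distance 0 from itself."""
    if m == m_prime:
        return 0.0
    return DIST.get((m, m_prime))
```

Returning `None` plays the role of "g(m) does not exist"; callers can then decide how to penalize the missing link.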
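Pg = (Pg[0], Pg[m1], …, Pg[m|U|], Pg[m|U|+1], …, Pg[m|DB|])
Pg[0]: g's importance relative to the other GPs in GP.
Imp(g) = static importance of GP g (will be discussed later)
IMP(M) = {Imp(g) | g ∈ GP}
m1, …, m|U|: the user's Mashlets.
Pg[m] takes one of the values Dist(m→m'), 1, or 0.
PIdeal = (0, 0, 0, 0, 0, …, 0)
[Figure: Pg and PIdeal plotted in a space whose axes are m1, m2, and GP static importance.]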
Scoring function - S(Pg)
Inversely related to the distance between Pg and PIdeal.
Monotonicity: for g, g' ∈ GP, if every coordinate of Pg is lower than or equal to the corresponding coordinate of Pg', then S(g') ≤ S(g).
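One function with both properties is a normalized Euclidean distance to the origin, subtracted from 1. The slides do not give the formula for S, so `score` below is an assumed example of a function that fits the description, not the function MatchUp uses.

```python
import math
from typing import Sequence


def score(point: Sequence[float]) -> float:
    """Hypothetical S(Pg): 1 minus the normalized distance to PIdeal.

    PIdeal is the origin and every coordinate is assumed to lie in [0, 1],
    so the result also lies in [0, 1]. Raising any coordinate can only
    increase the distance, which makes the score monotone.
    """
    if not point:
        raise ValueError("point must have at least one coordinate")
    distance = math.sqrt(sum(x * x for x in point))
    return 1.0 - distance / math.sqrt(len(point))
```

The ideal point scores 1, the all-ones point scores 0, and a point that is coordinate-wise no worse than another never scores lower than it.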
Algorithm internal data
<g,w> ∈ Lm ⇒ g ∈ GP, g: m→m', w = Dist(m→m')
L0 entries: <g, Pg[0]>
Sorted lists of <gp,score> pairs (L0: GP popularity; L1 … L|DB|: Mashlets):
L0: <g1,0> <g2,0.2> <g3,0.4> <g4,0.4> <g5,0.4> <g6,0.4> <g7,1>
L1: <g7,0.1> <g4,0.2> <g6,0.2> <g1,0.3> <g5,0.4> <g2,0.5> <g3,0.7>
L2: <g4,0> <g3,0.2> <g1,0.5> <g2,0.5> <g7,0.5> <g5,0.8>
L|DB|: <g1,0.1> <g2,0.6> <g7,0.6> <g6,0.7> <g4,0.8> <g5,0.8> <g3,0.9>
[Figure: the sorted lists L0, L1, L2, …, L|DB| of <gp,score> pairs, repeated from the previous slide.]
Algorithm stops when: |PQueue| = k && S(PQueue(k)) >= S(g')
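The stopping rule can be illustrated with a small threshold-algorithm (TA) style sketch over the first three example lists. Several things here are assumptions made for illustration: the `score` function (the slides only require it to be monotone), treating a GP that is missing from a list as having coordinate 1, and building the threshold point g' from the last weight read in each list.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Entry = Tuple[str, float]  # (gp, weight); lower weight is better


def score(point: Sequence[float]) -> float:
    """Hypothetical monotone score: 1 minus normalized distance to origin."""
    return 1.0 - math.sqrt(sum(x * x for x in point)) / math.sqrt(len(point))


def top_k(lists: List[List[Entry]], k: int, missing: float = 1.0) -> List[Entry]:
    """Scan all lists in parallel; stop once the k-th score >= S(g')."""
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen: Dict[str, float] = {}
    for depth in range(max(len(lst) for lst in lists)):
        last_seen = []
        for lst in lists:
            if depth < len(lst):
                gp, weight = lst[depth]
                last_seen.append(weight)
                if gp not in seen:
                    # Random access: fetch gp's coordinate in every list.
                    seen[gp] = score([t.get(gp, missing) for t in lookup])
            else:
                # Exhausted list: any unseen GP is missing from it.
                last_seen.append(missing)
        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(best) == k and best[-1][1] >= score(last_seen):
            return best
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# L0, L1, L2 from the slide, each sorted by ascending weight.
LISTS = [
    [("g1", 0), ("g2", 0.2), ("g3", 0.4), ("g4", 0.4), ("g5", 0.4), ("g6", 0.4), ("g7", 1)],
    [("g7", 0.1), ("g4", 0.2), ("g6", 0.2), ("g1", 0.3), ("g5", 0.4), ("g2", 0.5), ("g3", 0.7)],
    [("g4", 0), ("g3", 0.2), ("g1", 0.5), ("g2", 0.5), ("g7", 0.5), ("g5", 0.8)],
]
```

With these lists and k = 2, the scan stops after reading three entries per list and returns g4 followed by g1, without ever touching g5. The result matches a brute-force ranking of all seven GPs under the same assumed score.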
Problem with the algorithm
The number of lists the algorithm accesses is very large.
Most of the Mashlet lists are unrelated to the user's selection.
The average Mashup contains fewer than 5 Mashlets!
The refined Algorithm
Iterates only over L0 to L|U|.
Not 100% correct when using the same definition for S(g'). Why?
Enough for our problem when we redefine the general threshold S(g'): g' doesn't connect any irrelevant Mashlets.
Correctness proofs
Lemma: Let S(g') be the threshold at the end of one iteration. Let g be a candidate GP that has not yet been examined by AC*. Then S(g) ≤ S(g').
[Figure: the same sorted lists of <gp,score> pairs, now labeled L0, L1, L2, L3.]
Lemma’s proof
The Theorem
Algorithm AC* returns a correct solution.
A proof by contradiction:
g was supposed to be chosen ⇒ S(gk) < S(g)
AC* didn't find g ⇒ S(g) ≤ S(g')
AC* stopped ⇒ S(g') ≤ S(gk)
Hence S(gk) < S(g) ≤ S(g') ≤ S(gk), a contradiction!
MatchUp’s Implementation
Can be integrated into any Mashup editor.
Tested on the IBM Mashup Center platform:
InfoSphere Mashup Hub: creates XML data feeds (the Mashlets).
Lotus Mashups: a visual layer for assembling data feeds (the GPs).
MatchUp’s Implementation (cont’)
Extensions to the current DB:
Inheritance information.
A relational DB for the lists, GP scores, etc.
Written in Java. Wrapped as a Web service.
Inputs: a list of Mashlets U; an integer k.
Output: a list of the top-k possible completions for U.
Stage #1- Checking Performance
Number of Mashlets: 1-40000.
1:3.5 ratio between |M| and |GP|.
4000 GPs at ProgrammableWeb.com.
GP structure
GP complexity: c.
At ProgrammableWeb.com, 2 ≤ c ≤ 5.
Experiments & Performance (cont')
Inheritance depth
Split into sets.
Maximal inheritance depth: d.
Doesn't affect performance.
Mashlet importance
Uniform distribution for the base function.
a, b, c don't affect performance.
User input: 2 ≤ |U| ≤ 20, 3 ≤ k ≤ 20
Stage #2: Results’ Quality
10 users used the system to build a travel-related Mashup.
k=10
Did the users adopt the recommendations?
Could they find better completions?
The users ranked the given completions.
About the same as MatchUp.
Reflection of personal "taste".
What is better: omitting a Mashlet, or adding a redundant one?
Summary
What's a Mashup?
"MatchUp" helps developers create Mashups.
Autocompletion mechanism.
Can be attached to any Mashup editor.
Takes advantage of previous work.
A TA algorithm, based on ranking functions.
Efficient and effective.
Appendix 1. Computing Importance
The static importance of a GP g, Imp(g), and of a Mashlet m, Imp(m).
A base function for each m and g: number of downloads; explicit rating system.
A PageRank-style importance: importance by inheritance; importance by Mashlet-GP connections.
3 weight parameters: a+b+c=1. In "MatchUp", a=b=c=1/3.