Similarity Search for Web Services

Xin (Luna) Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang
University of Washington


Page 1: Similarity Search for Web Services Xin (Luna) Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang University of Washington

Similarity Search for Web Services

Xin (Luna) Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang

University of Washington

Page 2:

Web Service Search

- Web services are getting popular within organizations and on the web.
- The growing number of web services raises the problem of web-service search.
- First-generation web-service search engines do keyword search on web-service descriptions:
  BindingPoint, Grand Central, Web Service List, Salcentral, Web Service of the Day, Remote Methods, etc.

Page 3:

Keyword Search does not Capture the Underlying Semantics

zip

Page 4:

Keyword Search does not Capture the Underlying Semantics

50

Page 5:

Keyword Search does not Capture the Underlying Semantics

zipcode

Page 6:

Keyword Search does not Capture the Underlying Semantics

18

Page 7:

Keyword Search does not Accurately Specify Users’ Information Needs

Page 8:

Keyword Search does not Accurately Specify Users’ Information Needs

Page 9:

Users Need to Drill Down to Find the Desired Operations

Choose a web service

Page 10:

Users Need to Drill Down to Find the Desired Operations

Choose an operation

Page 11:

Users Need to Drill Down to Find the Desired Operations

Enter the input parameters

Page 12:

Users Need to Drill Down to Find the Desired Operations

Results: output

Page 13:

How to Improve Web Service Search?

- Offer users more flexibility by providing similar operations
- Base the similarity comparison on the underlying semantics

Page 14:

1) Provide Similar WS Operations

Op1: GetTemperature
  Input: Zip, Authorization
  Output: Return

Op2: WeatherFetcher
  Input: PostCode
  Output: TemperatureF, WindChill, Humidity

Similar operations: select the most appropriate one.

Page 15:

2) Provide Operations with Similar Inputs/Outputs

Op1: GetTemperature
  Input: Zip, Authorization
  Output: Return

Op2: WeatherFetcher
  Input: PostCode
  Output: TemperatureF, WindChill, Humidity

Op3: LocalTimeByZipcode
  Input: Zipcode
  Output: LocalTimeByZipCodeResult

Op4: ZipCodeToCityState
  Input: ZipCode
  Output: City, State

Similar inputs: aggregate the results of the operations.

Page 16:

3) Provide Composable WS Operations

Op1: GetTemperature
  Input: Zip, Authorization
  Output: Return

Op2: WeatherFetcher
  Input: PostCode
  Output: TemperatureF, WindChill, Humidity

Op3: LocalTimeByZipcode
  Input: Zipcode
  Output: LocalTimeByZipCodeResult

Op4: ZipCodeToCityState
  Input: ZipCode
  Output: City, State

Op5: CityStateToZipCode
  Input: City, State
  Output: ZipCode

The input of Op2 is similar to the output of Op5: compose web-service operations.

Page 17:

Searching with Woogle

Similar Operations, Inputs, Outputs

Composable with Input, Output

Page 18:

Searching with Woogle

A sample list of similar operations

Jump from operation to operation

Page 19:

Elementary Problems

- Two elementary problems:
  - Operation matching: given a web-service operation, return a list of similar operations.
  - Input/output matching: given the input/output of a web-service operation, return a list of web-service operations with similar inputs/outputs.
- Goal:
  - High recall: return potentially similar operations.
  - Good ranking: rank closer operations higher.

Page 20:

Can We Apply Previous Work?

- Software component matching
  - Requires knowledge of the implementation, but we only know the interface
- Schema matching
  - Similarity at a different granularity
  - Web services are more loosely related
- Text document matching
  - TF/IDF: term frequency analysis
  - E.g. Google

Page 21:

Why Does Text Matching Not Apply?

- Web page: often long text
  Web service: very brief description
  => Lack of information

Page 22:

Web Services Have Very Brief Descriptions

Page 23:

Why Does Text Matching Not Apply?

- Web page: often long text
  Web service: very brief description
  => Lack of information
- Web page: mainly plain text
  Web service: more complex structure
  => Finding term frequency is not enough

Page 24:

Operations Have More Complex Structures

Op1: GetTemperature
  Input: Zip, Authorization
  Output: Return

Op2: WeatherFetcher
  Input: PostCode
  Output: TemperatureF, WindChill, Humidity

Op3: LocalTimeByZipcode
  Input: Zipcode
  Output: LocalTimeByZipCodeResult

Op4: ZipCodeToCityState
  Input: ZipCode
  Output: City, State

Op5: CityStateToZipCode
  Input: City, State
  Output: ZipCode

Op4 and Op5: similar use of words, but opposite functionality.
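To make the Op4/Op5 observation concrete, here is a minimal sketch of a structure-blind comparison. It pools the terms of the operation name, inputs, and outputs into a single bag of words and compares the bags by cosine similarity. The camel-case tokenizer and the dictionary layout are assumptions made for illustration; they are not taken from the slides.

```python
import math
import re
from collections import Counter

def split_terms(name):
    """Split a camel-case identifier into lowercase terms (illustrative tokenizer)."""
    return [t.lower() for t in re.findall(r"[A-Z][a-z]*|[a-z]+", name)]

def bag_of_words(op):
    """Pool the terms of the name, inputs, and outputs, ignoring where they came from."""
    terms = split_terms(op["name"])
    for param in op["inputs"] + op["outputs"]:
        terms.extend(split_terms(param))
    return Counter(terms)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

# Op4 and Op5 from the slide
op4 = {"name": "ZipCodeToCityState", "inputs": ["ZipCode"], "outputs": ["City", "State"]}
op5 = {"name": "CityStateToZipCode", "inputs": ["City", "State"], "outputs": ["ZipCode"]}

similarity = cosine(bag_of_words(op4), bag_of_words(op5))
print(similarity)  # the two bags are identical, so this is the maximum score
```

Because both operations use exactly the same words, a comparison that ignores the input/output structure cannot tell them apart, even though one operation inverts the other.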

Page 25:

Our Solution, Part 1: Exploit Structure

Web Service Corpus
- Web service description
- Operation name and description
- Input parameter names
- Output parameter names
=> Operation Similarity
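One way to read "exploit structure" is to compare like with like (names with names, inputs with inputs, outputs with outputs) and then combine the per-field scores. The sketch below does that with Jaccard overlap and a weighted sum. The weights, the use of Jaccard, and the omission of the description fields are all assumptions for illustration; the slides do not state how the sources of evidence are combined.

```python
import re

def split_terms(name):
    """Split a camel-case identifier into lowercase terms (illustrative tokenizer)."""
    return [t.lower() for t in re.findall(r"[A-Z][a-z]*|[a-z]+", name)]

def term_set(names):
    """Set of all terms appearing in a list of identifiers."""
    return {t for n in names for t in split_terms(n)}

def jaccard(a, b):
    """Jaccard overlap of two term sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical weights, chosen only for this example
WEIGHTS = {"name": 0.2, "inputs": 0.4, "outputs": 0.4}

def operation_similarity(op1, op2, weights=WEIGHTS):
    """Weighted sum of per-field similarities, so inputs are only compared with inputs."""
    parts = {
        "name": jaccard(term_set([op1["name"]]), term_set([op2["name"]])),
        "inputs": jaccard(term_set(op1["inputs"]), term_set(op2["inputs"])),
        "outputs": jaccard(term_set(op1["outputs"]), term_set(op2["outputs"])),
    }
    return sum(weights[field] * parts[field] for field in weights)

op4 = {"name": "ZipCodeToCityState", "inputs": ["ZipCode"], "outputs": ["City", "State"]}
op5 = {"name": "CityStateToZipCode", "inputs": ["City", "State"], "outputs": ["ZipCode"]}

print(operation_similarity(op4, op5))  # only the name field contributes
print(operation_similarity(op4, op4))  # an operation compared with itself
```

With the fields kept separate, ZipCodeToCityState and CityStateToZipCode no longer look identical: their inputs and outputs share no terms, so only the name overlap contributes to the score.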

Page 26:

Why Does Text Matching Not Apply?

- Web page: often long text
  Web service: very brief description
  => Lack of information
- Web page: mainly plain text
  Web service: more complex structure
  => Finding term frequency is not enough
- Operation and parameter names are highly varied
  => Finding word usage patterns is hard

Page 27:

Parameter Names Are Highly Varied

Op1: GetTemperature
  Input: Zip, Authorization
  Output: Return

Op2: WeatherFetcher
  Input: PostCode
  Output: TemperatureF, WindChill, Humidity

Op3: LocalTimeByZipcode
  Input: Zipcode
  Output: LocalTimeByZipCodeResult

Op4: ZipCodeToCityState
  Input: ZipCode
  Output: City, State

Op5: CityStateToZipCode
  Input: City, State
  Output: ZipCode

Page 28:

Our Solution, Part 2: Cluster Parameters into Concepts

Web Service Corpus
- Web service description
- Operation name and description
- Input parameter names & concepts
- Output parameter names & concepts
=> Operation Similarity

Input parameter names, output parameter names => Concepts

Page 29:

Outline

- Overview
- Clustering parameter names
- Experimental evaluation
- Conclusions and ongoing work

Page 30:

Clustering Parameter Names

- Heuristic: parameter terms tend to express the same concept if they occur together often.
- Strategy: cluster parameter terms into concepts based on their co-occurrences.
- Given terms p and q, the similarity from p to q is
  $\mathrm{Sim}(p \to q) = P(q \mid p)$
  - Directional: e.g. $\mathrm{Sim}(zip \to code) > \mathrm{Sim}(code \to zip)$
    (ZipCode vs. TeamCode, ProxyCode, BarCode, etc.)
- Term p is close to q if $\mathrm{Sim}(p \to q) > \mathrm{Threshold}$, e.g. city is close to state.
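The directional similarity can be estimated from co-occurrence counts. The sketch below assumes that the unit of co-occurrence is the set of terms in one input or output, and that P(q|p) is estimated as the fraction of the sets containing p that also contain q. The toy corpus and the threshold value are invented for illustration.

```python
from collections import Counter
from itertools import permutations

def directional_similarity(occurrences):
    """Estimate Sim(p -> q) = P(q | p) for every ordered pair that co-occurs."""
    term_count = Counter()
    pair_count = Counter()
    for terms in occurrences:
        terms = set(terms)
        term_count.update(terms)                   # sets containing p
        pair_count.update(permutations(terms, 2))  # sets containing both p and q
    return {(p, q): n / term_count[p] for (p, q), n in pair_count.items()}

def is_close(sim, p, q, threshold=0.8):
    """p is close to q when Sim(p -> q) exceeds the threshold (0.8 is a made-up value)."""
    return sim.get((p, q), 0.0) > threshold

# Toy corpus: "zip" always appears with "code", but "code" appears with many other terms
corpus = [{"zip", "code"}] * 3 + [{"team", "code"}, {"proxy", "code"}, {"bar", "code"}]
sim = directional_similarity(corpus)

print(sim[("zip", "code")], sim[("code", "zip")])
print(is_close(sim, "zip", "code"), is_close(sim, "code", "zip"))
```

In this corpus every occurrence of zip comes with code, while only half of the occurrences of code come with zip, which reproduces the asymmetry in the ZipCode vs. TeamCode, ProxyCode, BarCode example.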

Page 31:

Criteria for an Ideal Clustering

- High cohesion and low correlation
  - cohesion measures the intra-cluster term similarity
  - correlation measures the inter-cluster term similarity
  - cohesion/correlation score:
    $\text{score} = \dfrac{\mathrm{avg}(\text{cohesion})}{\mathrm{avg}(\text{correlation})}$
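The slide gives the score only as a ratio of averages, so the sketch below has to pick concrete definitions. It assumes that the cohesion of a cluster is the fraction of its ordered term pairs that are close, that a single-term cluster has cohesion 1, and that the correlation of two clusters is the fraction of ordered cross-cluster pairs that are close. These definitions, and the toy `close` relation, are one plausible reading and are not stated on the slide.

```python
from itertools import combinations, permutations

def cohesion(cluster, close):
    """Fraction of ordered term pairs inside the cluster that are close (assumed definition)."""
    n = len(cluster)
    if n < 2:
        return 1.0  # assumption: a single-term cluster is fully cohesive
    hits = sum((p, q) in close for p, q in permutations(cluster, 2))
    return hits / (n * (n - 1))

def correlation(a, b, close):
    """Fraction of ordered cross-cluster pairs that are close (assumed definition)."""
    hits = sum(((p, q) in close) + ((q, p) in close) for p in a for q in b)
    return hits / (2 * len(a) * len(b))

def cohesion_correlation_score(clusters, close):
    """avg(cohesion) / avg(correlation); infinite when no cross-cluster pair is close."""
    avg_cohesion = sum(cohesion(c, close) for c in clusters) / len(clusters)
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return float("inf")
    avg_correlation = sum(correlation(a, b, close) for a, b in pairs) / len(pairs)
    return avg_cohesion / avg_correlation if avg_correlation else float("inf")

# Toy data: `close` holds the ordered pairs (p, q) for which p is close to q
close = {("city", "state"), ("state", "city"), ("zip", "code"), ("zip", "city")}
clusters = [{"city", "state"}, {"zip", "code"}]

score = cohesion_correlation_score(clusters, close)
print(score)
```

A higher score means terms are tightly related within clusters and loosely related across them, which is the stated criterion of high cohesion and low correlation.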

Page 32:

Clustering Algorithm (I)

- Algorithm: a series of refinements of classic agglomerative clustering
- Basic agglomerative clustering: merge clusters I and J if term i in I is close to term j in J
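A minimal sketch of the basic merge rule follows. It starts from single-term clusters and keeps merging any two clusters that contain a close pair. Treating closeness in either direction as sufficient, and the toy `close` relation, are assumptions made for this example.

```python
def basic_agglomerative(terms, close):
    """Merge clusters I and J whenever some term in I is close to some term in J."""
    clusters = [{t} for t in terms]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                linked = any(
                    (p, q) in close or (q, p) in close  # assumption: either direction counts
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if linked:
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters

# Toy closeness relation: ordered pairs (p, q) meaning p is close to q
close = {("temperature", "windchill"), ("city", "state"), ("zip", "temperature")}
terms = ["temperature", "windchill", "zip", "city", "state"]

clusters = basic_agglomerative(terms, close)
print(sorted(sorted(c) for c in clusters))
```

In this toy run a single close pair is enough to pull zip into the weather cluster, which shows how permissive the basic rule is.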

Page 33:

Clustering Algorithm (II)

- Problem:
  {temperature, windchill} + {zip} => {temperature, windchill, zip}
- Solution:
  - Cohesion condition: each term in the result cluster is close to most (e.g. half) of the other terms in the cluster
  - Refined algorithm: merge clusters I and J only if the result cluster satisfies the cohesion condition
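The cohesion condition can be sketched as a check on the would-be merged cluster. The code below assumes that "most (e.g. half)" means "at least the given fraction" and that closeness is the directional relation from the term being checked to the other terms. The function names and the toy data are hypothetical.

```python
from itertools import permutations

def satisfies_cohesion(cluster, close, fraction=0.5):
    """True if every term is close to at least `fraction` of the other terms."""
    for term in cluster:
        others = cluster - {term}
        if not others:
            continue
        near = sum((term, other) in close for other in others)
        if near < fraction * len(others):
            return False
    return True

def merge_if_cohesive(a, b, close, fraction=0.5):
    """Return the merged cluster, or None when the merge would violate the condition."""
    merged = a | b
    return merged if satisfies_cohesion(merged, close, fraction) else None

# Toy closeness: the three weather terms are mutually close; zip is close to temperature only
weather = {"temperature", "windchill", "humidity"}
close = set(permutations(weather, 2)) | {("zip", "temperature")}

rejected = merge_if_cohesive(weather, {"zip"}, close)
accepted = merge_if_cohesive({"temperature", "windchill"}, {"humidity"}, close)
print(rejected, accepted)
```

Here zip is close to only one of the three weather terms, so the merge is refused, while humidity joins the weather cluster without trouble.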

Page 34:

Clustering Algorithm (III)

- Problem:
  {code, zip} + {city, state, street} => {code} + {zip, city, state, street}
- Solution: split before merge

[Figure: clusters I and J are split into I', I-I' and J', J-J' before merging]

Page 35:

Clustering Algorithm (IV)

- Problem:
  {city, state, street} + {zip, code} => {city, state, street, zip, code}
- Solution:
  - Noise terms: terms for which most (e.g. half) of the occurrences are not accompanied by other terms in the concept
  - After a pass of splitting and merging, remove noise terms.
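The noise-term test can be sketched directly from its definition. The code below assumes that an occurrence is the set of terms in one input or output, and that "most (e.g. half)" means "at least the given fraction". The function name and the toy data are hypothetical.

```python
def noise_terms(concept, occurrences, fraction=0.5):
    """Terms whose occurrences mostly lack any other term of the same concept."""
    if len(concept) < 2:
        return set()  # a single-term concept has no companions to check against
    noise = set()
    for term in concept:
        companions = concept - {term}
        containing = [set(occ) for occ in occurrences if term in occ]
        if not containing:
            continue
        unaccompanied = sum(1 for occ in containing if not (occ & companions))
        if unaccompanied >= fraction * len(containing):
            noise.add(term)
    return noise

# Toy data: "code" mostly appears without "city" or "state"
concept = {"city", "state", "code"}
occurrences = [
    {"city", "state"},
    {"city", "state"},
    {"city", "state", "code"},
    {"code", "team"},
    {"code", "bar"},
    {"code", "proxy"},
]

noise = noise_terms(concept, occurrences)
print(noise)
cleaned = concept - noise
```

Three of the four occurrences of code have no other term from the concept next to them, so code is flagged as noise and dropped, while city and state stay.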

Page 36:

Clustering Algorithm (V)

- Problems:
  - The cohesion condition is too strict for large concepts
  - The terms taken off during splitting lose the chance to merge with other terms
- Solution: run the algorithm iteratively

do {
    refined agglomerative clustering (a set of splitting-and-merging);
    remove noise terms;
    replace each term with its concept;
} until (no more merges)

Page 37:

Outline

- Overview
- Clustering parameter names
- Experimental evaluation
- Conclusions and ongoing work

Page 38:

Experiment Data and Clustering Results

- Data set:
  - 790 web services (431 are active)
  - 1574 distinct operations
  - 3148 inputs/outputs
- Clustering results:
  - 1599 parameter terms
  - 623 concepts
    - 441 single-term concepts (54 frequent terms and 387 infrequent terms)
    - 182 multi-term concepts (59 concepts with more than 5 terms)

Page 39:

Example Clusters

- (temperature, heatindex, icon, chance, precipe, uv, like, temprature, dew, feel, weather, wind, humid, visible, pressure, condition, windchill, dewpoint, moonset, sunrise, moonrise, sunset, heat, precipit, extend, forecast, china, local, update)
- (entere, enter, pitcher, situation, overall, hit, double, strike, stolen, ball, rb, homerun, triple, caught, steal, pct, op, slug, player, bat, season, stats, position, experience, throw, players, draft, experier, birth, modifier)
- (state, city)
- (zip)
- (code)

Page 40:

Example Clusters (same clusters as on page 39, with "temprature" highlighted in the first cluster)

Page 41:

Example Clusters (same clusters as on page 39, with "china" highlighted in the first cluster)

Page 42:

Measuring Top-K Precision

- Benchmark
  - 25 web-service operations
  - From several domains
  - With different input/output sizes and description sizes
  - Manually label whether the top hits are similar
- Measure
  - Top-k precision: precision for the top-k hits
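Top-k precision is the fraction of the first k returned hits that were labeled similar. In the sketch below, the ranked list and the labels are invented for illustration, and dividing by the number of hits actually returned (when fewer than k come back) is an assumption.

```python
def top_k_precision(ranked, relevant, k):
    """Fraction of the first k hits that are labeled relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for hit in top if hit in relevant) / len(top)

# Hypothetical ranked answer for the query operation GetTemperature, with manual labels
ranked = ["WeatherFetcher", "LocalTimeByZipcode", "GetWeatherByCityState", "ZipCodeToCityState"]
relevant = {"WeatherFetcher", "GetWeatherByCityState"}

print(top_k_precision(ranked, relevant, 2))
print(top_k_precision(ranked, relevant, 4))
```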

Page 43:

Top-k Precision for Operation Matching

[Figure: top-k precision chart. Callouts: Woogle; Text matching on descriptions; Ignore structure]

Page 44:

Top-k Precision for Input/output Matching

Page 45:

Measuring Precision and Recall

- Benchmark:
  - 8 web-service operations and 15 inputs/outputs
  - From 6 domains
  - With different popularity
  - Inputs/outputs convey different numbers of concepts, and concepts have varied popularity
  - Manually label similar operations and inputs/outputs.
- Measure: R-P (Recall-Precision) curve
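A recall-precision curve can be traced by walking down the ranked list and recording a (recall, precision) point each time a relevant item is found. The sketch below uses that common convention, which the slides do not spell out. The ranked list and labels are placeholders.

```python
def recall_precision_points(ranked, relevant):
    """(recall, precision) after each relevant hit in the ranked list."""
    if not relevant:
        return []
    points = []
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points

# Placeholder data: "a" and "b" are retrieved, "c" is relevant but never returned
ranked = ["a", "x", "b", "y"]
relevant = {"a", "b", "c"}

for recall, precision in recall_precision_points(ranked, relevant):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Because one relevant item is never returned, the curve stops at a recall of two thirds instead of reaching 1.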

Page 46:

Impact of Multiple Sources of Evidence in Operation Matching

[Figure: recall-precision curves; x-axis: Recall (0 to 1), y-axis: Precision (0 to 1). Series: Func, Comb, ParOnly, Woogle. Callouts: Woogle; Woogle without clustering; Ignore structure; Text matching on descriptions]

Page 47:

Impact of Parameter Clustering in Input/output Matching

[Figure: recall-precision curves; x-axis: Recall (0 to 1), y-axis: Precision (0 to 1). Series: ParIO, ConIO, Woogle. Callouts: Woogle; Compare only concepts; Compare only parameter names]

Page 48:

Conclusions

- Defined primitives for web-service search
- Algorithms for similarity search on web-service operations
  - Exploit structure information
  - Cluster parameter names into concepts based on their co-occurrences
- Experiments show that the algorithm obtains high recall and precision.

Page 49:

Ongoing Work I – Template search on Operations

Input: city state
Output: weather
Description: forecast in the next nine days

Page 50:

Ongoing Work I – Template search on Operations

GetWeatherByCityState

Page 51:

Ongoing Work II – Composition search on Operations

See compositions

Page 52:

Ongoing Work II – Composition search on Operations

getZIPInfoByAddress + GetNineDayForecastInfo

Page 53:

Ongoing Work III – Automatic Web Service Invocation

city="Seattle" state="WA"

Page 54:

Similarity Search for Web Services

@VLDB 2004

Xin (Luna) Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang

University of Washington

www.cs.washington.edu/woogle

Page 55:

Ongoing Work I – Template search on Operations

Italian CAP
Location Information
Holiday Information

Get Weather Forecast