Worldwide Lexicon

Brian McConnell

May, 2002

WWL – Brian McConnell

Worldwide Lexicon Intro

• Automatic discovery of dictionary, semantic net and translation servers throughout the net

• Creates standard client/server interface for communicating with servers

• Creates a distributed human computing grid (allows servers to poll idle users to enter data and score recent submissions)

• “GNUtella for language services”

What WWL Does

• Creates a SOAP-based interface for locating and communicating with language services

• Creates mechanism for discovering WWL servers on the fly

• Allows any application to talk to language servers with a few lines of code

• Allows existing dictionaries and MT systems to expose their data via WWL

• Creates something similar to SETI@Home, except it taps idle users to contribute knowledge

• Creates a web services API for language services

What WWL Does Not Do

• Does not create a global, centrally managed dictionary (WWL is a P2P network of dictionaries and language servers)

• WWL does not provide machine translation services (although WWL can be used to talk to existing MT servers)

• WWL does not compete with existing dictionaries or translation services. It makes existing systems more accessible to applications and their users.

• WWL does not specify how dictionary and MT servers implement their internal processes

Some Example Applications

• Browser and text editor plug-ins

• Extended dictionaries for machine translation systems

• Human assisted document translation

• Lexicon@Home client (polls users to enter data when they’re not busy)

• Multilingual chat clients (poll WWL data sources as needed to assist with translations)

• Real-time translation (via Jabber or SMS)

• Teaching aids

• User supported dictionaries and translation memories

Worldwide Lexicon Protocol

• Built upon the Simple Object Access Protocol

• Applications communicate via a small set of SOAP methods

• HTTP CGI interface also used for data entry and user peer review

• Goal: allow developers to locate and query any WWL data source with a few lines of code.

Protocol Overview

• Three types of methods:

  • WWL server discovery and network status methods

  • WWL client/server query methods

  • Utility functions

• About a dozen methods overall

System Overview

• Four basic types of nodes:

  • Supernodes (directory servers)

  • WWL servers (dictionaries, MT servers, semantic nets)

  • Gateways (allow non-WWL servers to present a WWL front end)

  • Client apps (plug-ins, IM clients, Lexicon@Home, etc.)

WWL Server Discovery

• Client app contacts a WWL supernode

• Invokes WWLFindServers() to fetch list of active servers and gateways that can process client’s request

• Supernode replies with a list of WWL servers, as well as information about each server’s capabilities

• WWL servers and gateways announce themselves to supernodes at startup via the WWLRegister() and WWLServerStatus() methods
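
As a rough illustration of the discovery call, the sketch below builds a SOAP 1.1 envelope for WWLFindServers() using only the Python standard library. The method name comes from the slide; the `urn:example:wwl` namespace and the three parameter names are placeholders invented for this example, since the actual signature is defined by the WWL protocol spec.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
WWL_NS = "urn:example:wwl"  # hypothetical namespace, not taken from the WWL spec


def build_find_servers_request(source_lang: str, target_lang: str, service: str) -> bytes:
    """Build a SOAP 1.1 envelope that calls WWLFindServers.

    The parameter names below are assumptions made for illustration only.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{WWL_NS}}}WWLFindServers")
    params = (
        ("sourceLanguage", source_lang),
        ("targetLanguage", target_lang),
        ("service", service),
    )
    for name, value in params:
        ET.SubElement(call, name).text = value
    # Serialize to UTF-8 bytes, ready to be sent as an HTTP POST body.
    return ET.tostring(envelope, encoding="utf-8")
```

A client would POST the returned bytes to a supernode and parse the list of servers from the SOAP response.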

WWL Supernodes

• Track current status of WWL servers and their peers (servers send registration and status messages)

• Client apps use supernodes to locate WWL servers and gateways on the fly (e.g. locate Spanish-French full-text translation server)

• Supernodes also provide quality control (known WWL servers are listed first)

• Anyone can host a supernode (similar to GNUtella directory servers)
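
The supernode's role can be sketched as a small in-memory directory: servers register, and lookups return matching servers with known ones listed first. This is a toy model under stated assumptions, not the WWL implementation; the `ServerEntry` fields, the `known` flag, and the method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ServerEntry:
    url: str
    languages: FrozenSet[str]  # language codes the server can handle
    known: bool = False        # hypothetical flag for servers the supernode already trusts


class Supernode:
    """Toy directory that tracks registered servers in memory."""

    def __init__(self) -> None:
        self._servers: Dict[str, ServerEntry] = {}

    def register(self, entry: ServerEntry) -> None:
        # A repeated registration from the same URL replaces the old entry.
        self._servers[entry.url] = entry

    def find_servers(self, source: str, target: str) -> List[ServerEntry]:
        matches = [
            e for e in self._servers.values()
            if {source, target} <= e.languages
        ]
        # sorted() is stable and False sorts before True,
        # so known servers come first and registration order is kept otherwise.
        return sorted(matches, key=lambda e: not e.known)
```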

WWL Gateways

• Translate WWL/SOAP method calls into other formats

• Can be used to talk to DICT dictionary servers

• Can be used to talk to proprietary systems

• Can do screen scraping (e.g. send query to web based MT server via CGI, scrape results from HTML response)

• Can even cache and index static wordlists, making them appear as WWL data sources to any WWL client
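
One piece of a DICT gateway might look like the sketch below, which turns a word lookup into a DICT protocol `DEFINE` command line. The `DEFINE database word` form, the `*` wildcard for "all databases", and CRLF termination follow the DICT protocol (RFC 2229); the function name and the idea that the word arrives as a plain string are assumptions made for this example.

```python
def to_dict_define(word: str, database: str = "*") -> bytes:
    """Format a DICT 'DEFINE' command for the given word.

    database="*" asks the DICT server to search all of its databases.
    """
    # Quote the word so that phrases containing spaces stay a single argument;
    # backslash-escape any backslashes or double quotes inside it.
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'DEFINE {database} "{escaped}"\r\n'.encode("utf-8")
```

The gateway would write these bytes to the DICT server's socket, read the reply, and wrap the definitions in a WWL/SOAP response.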

Client/Server Communication

• Three SOAP methods allow clients to submit queries to WWL servers via standard interface.

• WWL servers reply via SOAP; results are returned to the client app in an XML data structure

• WWL interface can co-exist with other interfaces (DICT, web/cgi, WAP, etc)

Typical Client Session

• Contacts WWL supernode(s) to fetch list of active WWL servers according to language, services required

• Contacts top-ranked WWL server to perform query (e.g. translate phrase from Spanish to French)

• If query fails, contacts other WWL servers to perform query
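
The fallback behaviour in this session can be sketched as a simple loop over the ranked server list. The sketch assumes the list has already been fetched from a supernode and that a failed query surfaces as an `OSError` (which covers connection errors and timeouts in Python); `send_query` is a hypothetical callable standing in for the real SOAP call.

```python
from typing import Callable, Iterable, Optional


def query_with_fallback(
    ranked_servers: Iterable[str],
    send_query: Callable[[str], str],
) -> Optional[str]:
    """Try each server in ranked order and return the first successful reply.

    Returns None if every server fails.
    """
    for server in ranked_servers:
        try:
            return send_query(server)
        except OSError:
            # This server failed or is unreachable; move on to the next one.
            continue
    return None
```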

Application Development

• WWL defines a client/server interface

• Client and server apps can be developed and tested independently

• System is complex, but individual components are simple

• Perfect fit for open source development model

Server Apps & Projects

• Updating existing dictionaries and machine translation servers for WWL and Lexicon@Home

• Building gateway servers that emulate WWL while talking to non-WWL servers (DICT, HTTP, etc)

• Document translation servers based on Lexicon@Home concept

Client Applications

• Browser/text editor plug-ins

• WWL chat clients

• Lexicon@Home clients

• Teaching aids

Updating Existing Servers

• As simple as adding a few scripts to respond to SOAP calls (reply via SOAP versus HTML)

• SOAP/WWL interface co-exists with other front ends

• WWL server can be read-only, or can allow user data entry through Lexicon@Home initiative

• Allows hundreds of existing dictionaries, encyclopedias and machine translation servers to participate in WWL with minimal effort

Example: WWL Chat Client

• Listens to incoming and outgoing messages

• When user enables translation, IM client uses WWL to contact machine translation servers as needed

• When user enables dictionary features, IM client assists user in translating words and phrases when composing messages (ideal for users who know a language but are not fluent)

Lexicon@Home

• Distributed human computing

• Users download small client program that polls WWL server(s) for jobs when user is not busy

• When WWL server has a job, it instructs the Lexicon@Home client to direct the user's browser to a CGI form (the data entry form is generated by the WWL server)

• User enters requested information (definition, translation, score for other user’s submission)

• Each user does a small amount of work; with a large population, the system learns at a rapid pace
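
A single polling step of such a client could be sketched as follows. The three callables are hypothetical stand-ins: how idleness is detected and how the server hands out jobs are not described in the slides, and the job is assumed here to be the URL of the server-generated form.

```python
from typing import Callable, Optional


def poll_once(
    user_is_idle: Callable[[], bool],
    fetch_job_url: Callable[[], Optional[str]],
    open_in_browser: Callable[[str], object],
) -> bool:
    """Run one polling step; return True if a form was opened for the user."""
    if not user_is_idle():
        return False  # never interrupt a busy user
    job_url = fetch_job_url()
    if job_url is None:
        return False  # the server has no work right now
    # The data entry form itself is generated by the WWL server.
    open_in_browser(job_url)
    return True
```

In a real client, `open_in_browser` could be the standard library's `webbrowser.open`, and `poll_once` would be called on a timer.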

Quality Control

• Editorial oversight (WWL servers can require some or all user submissions to be reviewed by editors and trusted users via private CGI form)

• Randomized peer review (WWL server asks some Lexicon@Home users to score submissions from their peers)

• Hybrid system that combines randomized peer review with editorial oversight (editors focus on submissions with ambiguous scores or from unknown users).
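
The hybrid approach can be illustrated with a small routing function: clear peer scores decide automatically, while ambiguous scores and unknown users go to an editor. The 0-to-1 score scale, the thresholds, and the three outcome labels are assumptions chosen for this sketch; the slides do not specify them.

```python
from statistics import mean
from typing import Sequence


def route_submission(
    peer_scores: Sequence[float],
    author_is_known: bool,
    low: float = 0.3,   # hypothetical threshold: at or below this, reject
    high: float = 0.7,  # hypothetical threshold: at or above this, accept
) -> str:
    """Return 'accept', 'reject', or 'editor' for one user submission.

    Scores are assumed to lie between 0 (bad) and 1 (good).
    """
    # Editors focus on unknown users and on submissions nobody has scored yet.
    if not author_is_known or not peer_scores:
        return "editor"
    average = mean(peer_scores)
    if average >= high:
        return "accept"
    if average <= low:
        return "reject"
    # Anything in between is ambiguous and needs editorial review.
    return "editor"
```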

Project Timeline

• WWL protocol spec is available at www.worldwidelexicon.org

• Work to develop first-generation apps (supernodes, retrofits of existing dictionary servers) is underway

• Work to develop Lexicon@Home client is in progress

• Looking for developers to contribute to project

Development Priorities

• Stable supernode server

• Source libraries for use by existing dictionary and translation servers

• WWL gateway servers (to talk to non-WWL sites)

• Lexicon@Home client

• Simple client apps (browser plug-in, IM client that links to MT servers)
