No application is an island: Using topes to transform strings during data transfer
Atipol Asavametha, Prashanth Ayyavu, Christopher Scaffidi
School of Electrical Engineering and Computer Science
Oregon State University
2
Problem: Data heterogeneity among software components
• Software components
– Created by autonomous stakeholders
– Differing data formats
– May switch to new formats without prior notice
• Programmers
– Need to move data between elements automatically
• End users
– Need to move data between elements manually
problem approach evaluation
3
Example: Exchanging person names
John Smith (today)
Smith, John (tomorrow) – unexpected format! Unanticipated need for “glue code” to reformat
Lincolnshire MCC (tomorrow) – questionable! Need to validate data, maybe trigger fail-over
Similar issues for data from users, external datasets, or the web.
4
Other examples of data format heterogeneity
• Room numbers
– NSH 3103 vs Newell Simon Hall 3103
• Stocks
– GOOG vs Google vs Google Corporation
• Address lines
– 101 Main St. vs 101 MAIN STREET vs 101 Main Str.
• Phone numbers
– 888-800-2030 vs +1 888 800 2030 vs (888) 800-2030
• State names
– California vs CA vs Calif.
5
Insight: Exchange kinds of data (rather than particular formats)
John Smith | 303-202-3030 | 101 Main St. | Pittsburgh, PA
Doe, Jane | +1 717 292 3030 | 88 Brooke Lane | PITTSBURGH Pennsylvania
RAY TILL | (404) 555-1203 | 2 PITT ST | PGH, Penna.
MR. ART COR | 282.303.4040 | 15 RED RUN RD. | pittsburgh PA
JOHN SMITH | (303) 202-3030 | 101 MAIN ST | Pittsburgh, PA
6
Insight: Exchange kinds of data (rather than particular formats)
• Three loci for reformatting…
– Before transmitting (from source component)
– After receiving (at receiving component)
– Or along the way (in the connector itself)
(Each component could be a database, web site, XML web service, desktop application, …)
7
Use topes to reformat!
• A tope = a platform-independent abstraction describing how to recognize and transform strings in one category of data
• Greek word for “place,” because each corresponds to a data category with a natural place in the problem domain
• Examples:
– Tope for person names
– Tope for university names (and abbreviations)
– Tope for North American phone numbers
– Tope for Oregon State University phone numbers
8
A tope is a graph. Node = format, edge = transformation
Notional representation for an OSU room number tope…
Formal building name & room number: Kelley Engineering Center 1148
Colloquial building name & room number: Kelley 1148
Building abbreviation & room number: KEC 1148
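The room-number graph can be sketched in code roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the format names (`formal`, `colloquial`, `abbrev`), the `BUILDINGS` lookup table, and the `transform` helper are all hypothetical, and only the one building shown on the slide is covered. The direct edge between the formal and abbreviated formats is left out on purpose, to show a transformation being composed from two edges.

```python
from collections import deque

# Hypothetical lookup table: one row per building, one column per format.
BUILDINGS = [
    {"formal": "Kelley Engineering Center", "colloquial": "Kelley", "abbrev": "KEC"},
]


def make_edge(src, dst):
    """Build a transformation that rewrites the building part from src to dst."""
    def trf(value):
        building, room = value.rsplit(" ", 1)
        for row in BUILDINGS:
            if row[src] == building:
                return f"{row[dst]} {room}"
        raise ValueError(f"unknown building: {building!r}")
    return trf


# Edges of the graph: (source format, target format) -> transformation.
EDGES = {
    ("formal", "colloquial"): make_edge("formal", "colloquial"),
    ("colloquial", "formal"): make_edge("colloquial", "formal"),
    ("colloquial", "abbrev"): make_edge("colloquial", "abbrev"),
    ("abbrev", "colloquial"): make_edge("abbrev", "colloquial"),
}


def transform(value, src, dst):
    """Breadth-first search for a chain of edges from src to dst, applying each edge."""
    queue = deque([(src, value)])
    seen = {src}
    while queue:
        fmt, current = queue.popleft()
        if fmt == dst:
            return current
        for (a, b), trf in EDGES.items():
            if a == fmt and b not in seen:
                seen.add(b)
                queue.append((b, trf(current)))
    raise ValueError(f"no path from {src} to {dst}")
```

For example, `transform("Kelley Engineering Center 1148", "formal", "abbrev")` walks formal, colloquial, abbrev and yields `"KEC 1148"`.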
9
A tope is a conceptual abstraction. A tope implementation is code.
• Each tope implementation has executable functions:
– 1 $\mathrm{isa}: \mathrm{string} \to [0,1]$ function per format, for recognizing instances of the format (a fuzzy set)
– 0 or more $\mathrm{trf}: \mathrm{string} \to \mathrm{string}$ functions linking formats, for transforming values from one format to another
• Validation function: $\phi(\mathrm{str}) = \max_f \mathrm{isa}_f(\mathrm{str})$, where $f$ ranges over the tope’s formats
– Valid when $\phi(\mathrm{str}) = 1$
– Invalid when $\phi(\mathrm{str}) = 0$
– Questionable when $0 < \phi(\mathrm{str}) < 1$
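A minimal sketch of the recognizer and validation functions, using the phone-number formats from the earlier slide. The `PATTERNS` table, the function names, and the 0.5 score are assumptions made for illustration. The "questionable" rule relies on the fact that North American area codes start with a digit from 2 to 9, so a string that has the right shape but a leading 0 or 1 gets an intermediate score.

```python
import re

# Hypothetical formats for a North American phone number tope.
PATTERNS = {
    "dashed": r"(\d{3})-(\d{3})-(\d{4})",
    "international": r"\+1 (\d{3}) (\d{3}) (\d{4})",
    "parenthesized": r"\((\d{3})\) (\d{3})-(\d{4})",
}


def isa(fmt, s):
    """Score in [0, 1] for how well s matches the given format."""
    m = re.fullmatch(PATTERNS[fmt], s)
    if m is None:
        return 0.0
    area = m.group(1)
    # Right shape but implausible area code: questionable (0.5 is arbitrary).
    return 1.0 if area[0] in "23456789" else 0.5


def validate(s):
    """Validation score: the maximum isa score over all of the tope's formats."""
    return max(isa(fmt, s) for fmt in PATTERNS)


def classify(s):
    """Map the validation score onto the three outcomes named on the slide."""
    score = validate(s)
    if score == 1.0:
        return "valid"
    if score == 0.0:
        return "invalid"
    return "questionable"
```

With these assumptions, `classify("(888) 800-2030")` is `"valid"`, `classify("088-800-2030")` is `"questionable"`, and `classify("hello")` is `"invalid"`. A caller could accept valid strings, reject invalid ones, and ask for confirmation on questionable ones.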
10
But will it really work?
• For a range of different kinds of components, e.g.…
– Web service → application
– Application → web service
– Web site → web site
– Desktop application → web site
– … and other combinations?
• How to specify which tope functions to invoke?
• How much work will it be, in practice?
11
Case study propositions
• Most of the difficulties encountered will result from technologies other than topes.
• Topes will be able to perform the string transformations needed in a variety of situations.
• Topes will be useful at all three loci (before/during/after data transfer), though not necessarily in every combination of locus and architectural style.
• Using topes will simplify the code required to perform string transformations.
12
Case #1: Enhanced Windows clipboard
13
Case #2: Enhanced web macro tool
• go to “http://people.oregonstate.edu/~ayyavup/form.html”
• enter “Prashanth Ayyavu” into the “Full name” textbox
• copy the “Full name” textbox
• go to “http://some.other.website.com/myform.html”
• paste in “DAVID JAMES” format from “person name” into the “your name” textbox
(The CoScripter web macro tool already had copy/paste functionality; we just added the clauses for reformatting.)
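The paste step names its target format by example (“DAVID JAMES”) rather than by a format identifier. One way this could work is sketched below; it is not CoScripter's or the topes library's actual code. The `FORMATS` table, its three formats, and the function names are hypothetical, and the recognizers handle only simple two-part names.

```python
import re

# Hypothetical formats for a "person name" tope.
# Each entry: format name -> (recognizer, parser to (first, last), renderer).
FORMATS = {
    "First Last": (
        re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+"),
        lambda s: tuple(s.split(" ")),
        lambda first, last: f"{first} {last}",
    ),
    "FIRST LAST": (
        re.compile(r"[A-Z]+ [A-Z]+"),
        lambda s: tuple(p.capitalize() for p in s.split(" ")),
        lambda first, last: f"{first} {last}".upper(),
    ),
    "Last, First": (
        re.compile(r"[A-Z][a-z]+, [A-Z][a-z]+"),
        lambda s: tuple(reversed(s.split(", "))),
        lambda first, last: f"{last}, {first}",
    ),
}


def detect(s):
    """Return the name of the first format whose recognizer accepts s."""
    for name, (pattern, _, _) in FORMATS.items():
        if pattern.fullmatch(s):
            return name
    raise ValueError(f"no known person-name format matches {s!r}")


def paste_in_format_of(example, value):
    """Reformat value into whichever format the example string is written in."""
    _, _, render = FORMATS[detect(example)]
    _, parse, _ = FORMATS[detect(value)]
    first, last = parse(value)
    return render(first, last)
```

Under these assumptions, `paste_in_format_of("DAVID JAMES", "Prashanth Ayyavu")` gives `"PRASHANTH AYYAVU"`, which is the value the macro would paste into the “your name” textbox.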
14
Case #3: Web service library
XML
<!-- topesheet = http://eecs.oregonstate.edu/mytopes.txt -->
<mydoc><whatever>
<tel>233-222-3040</tel><date>11-Jan-96</date>
<tel>(203)484-2030</tel><date>12/30/2007</date>
</whatever></mydoc>
TopeSheet
xpath:/mydoc/whatever/date { tope:url(http://www.w3c.org/topes/date_EN.xml); }
xpath:/mydoc/whatever/tel { tope:url(http://myserver.com/custom_tel.xml); }
Client Code
ItemLoader loader = ItemLoader.FromXml(xml);
ItemSet items = loader.Load("xpath:/*/tel");
List<String> values = items.FormatAs("+1 404 505 6060");
// overloaded methods let you override the topes and/or validate the data
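To make the flow concrete, here is a rough stand-in written with Python's standard library instead of the authors' `ItemLoader` API, whose internals are not shown on the slide. The `PHONE_FORMATS` table and the `parts` and `format_as` helpers are hypothetical. The sketch also skips the TopeSheet lookup and recognizes inputs simply by stripping punctuation, which a real tope would not do.

```python
import re
import xml.etree.ElementTree as ET

XML = """<mydoc><whatever>
<tel>233-222-3040</tel><date>11-Jan-96</date>
<tel>(203)484-2030</tel><date>12/30/2007</date>
</whatever></mydoc>"""

# Hypothetical phone formats: recognizer for the example string -> output template.
PHONE_FORMATS = [
    (re.compile(r"\d{3}-\d{3}-\d{4}"), "{0}-{1}-{2}"),
    (re.compile(r"\(\d{3}\) ?\d{3}-\d{4}"), "({0}) {1}-{2}"),
    (re.compile(r"\+1 \d{3} \d{3} \d{4}"), "+1 {0} {1} {2}"),
]


def parts(s):
    """Split a phone string into (area, exchange, line), dropping punctuation."""
    digits = re.sub(r"\D", "", s)
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit phone number: {s!r}")
    return digits[:3], digits[3:6], digits[6:]


def format_as(values, example):
    """Render every value in the format that the example string is written in."""
    for pattern, template in PHONE_FORMATS:
        if pattern.fullmatch(example):
            return [template.format(*parts(v)) for v in values]
    raise ValueError(f"no known phone format matches {example!r}")


# Select the <tel> elements (the root element here is <mydoc>).
root = ET.fromstring(XML)
tels = [e.text for e in root.findall("./whatever/tel")]
values = format_as(tels, "+1 404 505 6060")
```

Here `values` ends up as `["+1 233 222 3040", "+1 203 484 2030"]`: both `<tel>` entries, originally in two different formats, come out in the format of the example string.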
15
Summary of findings
Finding | 1. Clipboard | 2. Web macros | 3. Web services
Main sources of difficulty | Windows API | Reading the CoScripter code; interfacing to our topes library | Web services becoming unavailable
Topes can handle the kinds of strings | Yes | Yes | Yes
Topes useful at all three loci | Connector | CoScripter component (acts as connector between websites) | Sender or receiver of data
Topes simplify reformatting code | Yes | No… needed interface code | Yes
16
Conclusion
• Software elements can use varying formats
– No explicit references to format identifiers
– No need for ontology consensus
• Topes are reusable for data in…
– XML nodes
– Database tuples
– HTML tags
– Webform fields
– Spreadsheet cells
– …and more
• Main challenge is interfacing to library across languages