tool support for data validation by end-user programmers
Post on 31-Dec-2015
Tool Support for Data Validation by End-User Programmers
Christopher Scaffidi
Brad Myers, Mary Shaw
Carnegie Mellon University
Target audience: End-user programmers
• In 2012, in American workplaces:
  – 90 million computer end users
  – 55 million of whom will create:
    • Spreadsheets
    • Databases
    • Web applications
Introduction Topes Demonstration Conclusion
An input validation problem observed during contextual inquiry
• Valid? “EDSH 225”
• Questionable? “EDXH 225”
• Valid but wrong format? “Smith 225”
• Or obviously invalid? “Robotics Institute”
Underlying problem: abstraction mismatch
• Tools support strings, ints, floats, sometimes dates.
• Problem domain involves higher-level categories:
  – University names
  – Person names
  – CMU phone numbers
  – CMU room numbers
• These data categories are:
  – Short human-readable strings
  – Multi-format
  – Sometimes ambiguous (non-binary scale of validity)
  – Often particular to certain groups of people
Limitations of existing approaches
• Types do not support questionable values
• Grammars do not, either, nor can they reformat
• Information extraction algorithms rely on grammatical cues that are absent during validation
• Cues, Forms/3, -calculus, Slate, pollution markers, etc. infer numerical constraints but not constraints on strings, and none of them is platform-independent
New Approach: Topes
• A tope = a platform-independent abstraction describing how to recognize and transform strings in one category of data
• Greek word for “place,” because each corresponds to a data category with a natural place in the problem domain
A tope is a graph. Node = format, edge = transformation
Notional representation for a CMU room number tope…
• Formal building name & room number: “Elliot Dunlap Smith Hall 225”
• Colloquial building name & room number: “Smith 225”
• Building abbreviation & room number: “EDSH 225”
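The graph view can be sketched in a few lines of Python. This is an illustration only, not the Topes SDK: the names `BUILDINGS`, `EDGES`, and `transform` are hypothetical, and the choice of which formats are linked by an edge is an assumption (the slide shows only the three nodes). Formats are nodes, each transformation is a function on an edge, and a value is reformatted by following a path of edges.

```python
from collections import deque

# Hypothetical lookup table: abbreviation -> (formal name, colloquial name).
# Only the Smith Hall entry appears on the slide.
BUILDINGS = {"EDSH": ("Elliot Dunlap Smith Hall", "Smith")}


def abbreviation_to_formal(s: str) -> str:
    abbrev, room = s.rsplit(" ", 1)
    return f"{BUILDINGS[abbrev][0]} {room}"


def formal_to_colloquial(s: str) -> str:
    name, room = s.rsplit(" ", 1)
    for formal, colloquial in BUILDINGS.values():
        if name == formal:
            return f"{colloquial} {room}"
    raise ValueError(f"unknown building: {name}")


# Edges of the graph (assumed): (source format, target format) -> function.
EDGES = {
    ("abbreviation", "formal"): abbreviation_to_formal,
    ("formal", "colloquial"): formal_to_colloquial,
}


def transform(value: str, src: str, dst: str) -> str:
    """Breadth-first search from src to dst, applying each edge's function."""
    queue = deque([(src, value)])
    seen = {src}
    while queue:
        fmt, current = queue.popleft()
        if fmt == dst:
            return current
        for (a, b), trf in EDGES.items():
            if a == fmt and b not in seen:
                seen.add(b)
                queue.append((b, trf(current)))
    raise ValueError(f"no transformation path from {src} to {dst}")
```

With these assumed edges, `transform("EDSH 225", "abbreviation", "colloquial")` passes through the formal format on its way to the colloquial one.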
A tope is a conceptual abstraction. A tope implementation is code.
• Each tope implementation has executable functions:
  – 1 isa: string → [0,1] function per format, for recognizing instances of the format (a fuzzy set)
  – 0 or more trf: string → string functions linking formats, for transforming values from one format to another
• Validation function: $\phi(\mathrm{str}) = \max_f \mathrm{isa}_f(\mathrm{str})$, where $f$ ranges over the tope’s formats
  – Valid when $\phi(\mathrm{str}) = 1$
  – Invalid when $\phi(\mathrm{str}) = 0$
  – Questionable when $0 < \phi(\mathrm{str}) < 1$
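A minimal sketch of the validation function, assuming two hand-written isa functions for the room number example. The regular expressions, the score 0.5 for an unrecognized building, and the reference sets are all assumptions made for illustration; they are not taken from the Topes tools.

```python
import re

# Hypothetical reference lists; only "EDSH" and "Smith" come from the slides.
KNOWN_ABBREVIATIONS = {"EDSH"}
KNOWN_COLLOQUIAL_NAMES = {"Smith"}


def isa_abbreviation(s: str) -> float:
    """Score for the 'building abbreviation & room number' format."""
    m = re.fullmatch(r"([A-Z]{2,4}) (\d{3,4})", s)
    if m is None:
        return 0.0
    # Right shape but unknown building: questionable rather than invalid.
    return 1.0 if m.group(1) in KNOWN_ABBREVIATIONS else 0.5


def isa_colloquial(s: str) -> float:
    """Score for the 'colloquial building name & room number' format."""
    m = re.fullmatch(r"([A-Z][a-z]+) (\d{3,4})", s)
    if m is None:
        return 0.0
    return 1.0 if m.group(1) in KNOWN_COLLOQUIAL_NAMES else 0.5


ISA_FUNCTIONS = [isa_abbreviation, isa_colloquial]


def validation_score(s: str) -> float:
    """Maximum isa score over all of the tope's formats."""
    return max(isa(s) for isa in ISA_FUNCTIONS)


def classify(s: str) -> str:
    score = validation_score(s)
    if score == 1.0:
        return "valid"
    if score == 0.0:
        return "invalid"
    return "questionable"
```

Under these assumptions, `classify` labels “EDSH 225” and “Smith 225” as valid, “EDXH 225” as questionable, and “Robotics Institute” as invalid, which matches the cases from the contextual inquiry slide.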
Today’s demonstration (using our latest version)
• Create phone number tope
  – Infer boilerplate from examples
  – What are formats, parts, and constraints?
  – Label parts, add/fix constraints, test in tool
  – Validate spreadsheet data
  – Transform spreadsheet data
• Reuse phone number tope
  – Create web application
  – Attach tope-based validator, configure, execute
  – Valid / invalid / questionable / valid-but-misformatted
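The four outcomes in the last bullet can be sketched as follows. This is a hypothetical validator, not the demonstrated tool: the two phone number formats, the soft constraint on the area code, and the names `FORMATS`, `RENDER`, and `check` are assumptions chosen for illustration.

```python
import re

# Two assumed phone number formats, each capturing the same three parts.
FORMATS = {
    "dashed": re.compile(r"(\d{3})-(\d{3})-(\d{4})"),
    "paren": re.compile(r"\((\d{3})\) (\d{3})-(\d{4})"),
}

# How to write the three parts back out in each format.
RENDER = {
    "dashed": "{0}-{1}-{2}",
    "paren": "({0}) {1}-{2}",
}


def isa(fmt: str, s: str) -> float:
    m = FORMATS[fmt].fullmatch(s)
    if m is None:
        return 0.0
    # Illustrative soft constraint: an area code starting with 0 or 1
    # lowers the score instead of rejecting the value outright.
    return 0.5 if m.group(1)[0] in "01" else 1.0


def check(s: str, target: str = "dashed") -> tuple[str, str]:
    """Return (outcome, value), reformatting valid values into `target`."""
    scores = {fmt: isa(fmt, s) for fmt in FORMATS}
    best = max(scores, key=scores.get)
    score = scores[best]
    if score == 0.0:
        return ("invalid", s)
    if score < 1.0:
        return ("questionable", s)
    if best != target:
        parts = FORMATS[best].fullmatch(s).groups()
        return ("valid-but-misformatted", RENDER[target].format(*parts))
    return ("valid", s)
```

A web form could accept the "valid" case silently, rewrite the "valid-but-misformatted" case, ask the user to confirm the "questionable" case, and reject the "invalid" case.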
Contributions highlighted today
• A model for data...
  – Short, human-readable strings
  – Ambiguous categories
  – Multiple formats
• Implementation features:
  – Inference of customizable formats from examples
  – Soft constraints
  – Human-readable error messages
  – Validation code is reusable across platforms
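To give a feel for inferring formats from examples, here is a deliberately naive sketch. It is not the inference algorithm used by the Topes tools, which the slides do not describe; it is a simple stand-in that generalizes each example into runs of digits, runs of letters, and literal punctuation, then groups examples that share a pattern. The function names are hypothetical.

```python
import re
from itertools import groupby


def char_class(c: str) -> str:
    """Map a character to 'D' (digit), 'A' (letter), or itself (literal)."""
    if c.isdigit():
        return "D"
    if c.isalpha():
        return "A"
    return c


def infer_pattern(example: str) -> str:
    """Generalize one example string into a regular expression."""
    parts = []
    for cls, run in groupby(example, key=char_class):
        n = len(list(run))
        if cls == "D":
            parts.append(rf"\d{{{n}}}")
        elif cls == "A":
            parts.append(rf"[A-Za-z]{{{n}}}")
        else:
            parts.append(re.escape(cls) * n)
    return "".join(parts)


def infer_formats(examples: list[str]) -> dict[str, list[str]]:
    """Group examples by inferred pattern; each group is a candidate format."""
    formats: dict[str, list[str]] = {}
    for example in examples:
        formats.setdefault(infer_pattern(example), []).append(example)
    return formats
```

A user would then review each candidate format, label its parts, and loosen or tighten its constraints, which is the "customizable" part of the bullet above.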
Other contributions not highlighted today
• Validating with topes (quantitatively) improves…
  – Accuracy of validation
  – Reusability of validation code
  – Subsequent duplicate identification
• Additional tool features:
  – Inter-tope reference (i.e., “topes in topes”)
  – Whitelists
  – Various additional auto-transformation features
  – Overriding auto-transformation with JavaScript
Validation and Tool Maturity
• Expressiveness
  – Have implemented dozens of topes
• Usability
  – Fast creation of accurate formats by users in study
• Usefulness
  – Integrated with Excel, Visual Studio, and an XML library
  – Integrated by IBM & Univ. Nebraska into other tools
Thank You…
• To Jeff Magee, Betty Cheng, Barbara Ryder, Margaret Burnett, and others at ICSE 2007 for early feedback
• To NSF for funding
• To ICSE 2008 for this opportunity to present
Available for download
http://www.cs.cmu.edu/~cscaffid/software.shtml
Or Google for
"Topes SDK"