Tool Support for Data Validation by End-User Programmers
Christopher Scaffidi
Brad Myers, Mary Shaw
Carnegie Mellon University
Target audience: End-user programmers
• In 2012, in American workplaces
  – 90 million computer end users
  – 55 million of whom will create
    • Spreadsheets
    • Databases
    • Web applications
Introduction Topes Demonstration Conclusion
An input validation problem observed during contextual inquiry
Valid? “EDSH 225”
Questionable? “EDXH 225”
Valid but wrong format? “Smith 225”
Or obviously invalid? “Robotics Institute”
Underlying problem: abstraction mismatch
• Tools support strings, ints, floats, sometimes dates.
• Problem domain involves higher-level categories:
  – University names
  – Person names
  – CMU phone numbers
  – CMU room numbers
• These data categories are:
  – Short human-readable strings
  – Multi-format
  – Sometimes ambiguous (non-binary scale of validity)
  – Often particular to certain groups of people
Limitations of existing approaches
• Types do not support questionable values
• Grammars do not, either, nor can they reformat
• Information extraction algorithms rely on grammatical cues that are absent during validation
• Cues, Forms/3, -calculus, Slate, pollution markers, etc., infer numerical constraints but not constraints on strings, and they are not platform-independent
New Approach: Topes
• A tope = a platform-independent abstraction describing how to recognize and transform strings in one category of data
• Greek word for “place,” because each corresponds to a data category with a natural place in the problem domain
A tope is a graph. Node = format, edge = transformation
Notional representation for a CMU room number tope…
• Formal building name & room number: Elliot Dunlap Smith Hall 225
• Colloquial building name & room number: Smith 225
• Building abbreviation & room number: EDSH 225
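The graph view can be sketched in a few lines of Python. This is only an illustration, not the Topes SDK: the three format names come from the slide, while the `BUILDINGS` table, the choice of which edges exist, and the helper names `make_trf` and `transform` are assumptions made for the example. A value is moved between formats by following a chain of edges found with a breadth-first search.

```python
from collections import deque

# Hypothetical lookup data: only the one building shown on the slide.
BUILDINGS = [
    {"formal": "Elliot Dunlap Smith Hall", "colloquial": "Smith", "abbreviation": "EDSH"},
]

def make_trf(src, dst):
    """Build one edge: rewrite the building part from format src to format dst."""
    def trf(value):
        name, room = value.rsplit(" ", 1)  # split off the trailing room number
        for building in BUILDINGS:
            if building[src] == name:
                return f"{building[dst]} {room}"
        raise ValueError(f"unknown building name: {name!r}")
    return trf

# Edges of the graph (assumed: the slide does not say which pairs are linked).
EDGES = {
    ("abbreviation", "formal"): make_trf("abbreviation", "formal"),
    ("formal", "abbreviation"): make_trf("formal", "abbreviation"),
    ("colloquial", "formal"): make_trf("colloquial", "formal"),
    ("formal", "colloquial"): make_trf("formal", "colloquial"),
}

def transform(value, src, dst):
    """Follow the shortest chain of edges from format src to format dst."""
    queue = deque([(src, value)])
    seen = {src}
    while queue:
        fmt, current = queue.popleft()
        if fmt == dst:
            return current
        for (a, b), trf in EDGES.items():
            if a == fmt and b not in seen:
                seen.add(b)
                queue.append((b, trf(current)))
    raise ValueError(f"no transformation path from {src} to {dst}")

print(transform("EDSH 225", "abbreviation", "colloquial"))
```

Here the abbreviation and colloquial formats are not linked directly, so the value is routed through the formal format.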
A tope is a conceptual abstraction. A tope implementation is code.
• Each tope implementation has executable functions:
  – 1 isa: string → [0,1] function per format, for recognizing instances of the format (a fuzzy set)
  – 0 or more trf: string → string functions linking formats, for transforming values from one format to another
• Validation function: φ(str) = max_f isa_f(str), where f ranges over the tope’s formats
  – Valid when φ(str) = 1
  – Invalid when φ(str) = 0
  – Questionable when 0 < φ(str) < 1
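A minimal sketch of the validation function, using the room-number strings from the earlier slide. The regular expressions, the lists of known building names, and the intermediate score of 0.5 are all assumptions chosen for illustration; the slides only require that each isa function return a value in [0, 1] and that the overall score be the maximum over formats.

```python
import re

# Hypothetical reference data; a real tope would carry a fuller list.
KNOWN_ABBREVIATIONS = {"EDSH"}
KNOWN_COLLOQUIAL = {"Smith"}

def isa_abbreviation(s: str) -> float:
    """Score s against the 'building abbreviation & room number' format."""
    m = re.fullmatch(r"([A-Z]{2,5}) (\d{3,4})", s)
    if m is None:
        return 0.0  # wrong shape entirely
    # Right shape but unknown building: questionable rather than invalid.
    return 1.0 if m.group(1) in KNOWN_ABBREVIATIONS else 0.5

def isa_colloquial(s: str) -> float:
    """Score s against the 'colloquial building name & room number' format."""
    m = re.fullmatch(r"([A-Z][a-z]+) (\d{3,4})", s)
    if m is None:
        return 0.0
    return 1.0 if m.group(1) in KNOWN_COLLOQUIAL else 0.5

ISA_FUNCTIONS = [isa_abbreviation, isa_colloquial]

def phi(s: str) -> float:
    """Validation function: the best score over all of the tope's formats."""
    return max(isa(s) for isa in ISA_FUNCTIONS)

def classify(s: str) -> str:
    score = phi(s)
    if score == 1.0:
        return "valid"
    if score == 0.0:
        return "invalid"
    return "questionable"

for example in ["EDSH 225", "EDXH 225", "Smith 225", "Robotics Institute"]:
    print(example, "->", classify(example))
```

Taking the maximum means a string only needs to fit one format well to be accepted, which is what lets a multi-format category be validated by a single function.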
Today’s demonstration (using our latest version)
• Create phone number tope
  – Infer boilerplate from examples
  – What are formats, parts, and constraints?
  – Label parts, add/fix constraints, test in tool
  – Validate spreadsheet data
  – Transform spreadsheet data
• Reuse phone number tope
  – Create web application
  – Attach tope-based validator, configure, execute
  – Valid / invalid / questionable / valid-but-misformatted
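The four outcomes listed for the web application can be illustrated with a small phone-number validator. This is a hedged sketch rather than the tool's actual behavior: the two formats, the preferred target format, the soft constraint on area codes, and the names `isa`, `to_dashed`, and `check` are assumptions introduced for the example.

```python
import re

# Two assumed formats for the phone number tope.
FORMATS = {
    "dashed": re.compile(r"(\d{3})-(\d{3})-(\d{4})"),
    "parenthesized": re.compile(r"\((\d{3})\) (\d{3})-(\d{4})"),
}
TARGET_FORMAT = "dashed"     # format the application is assumed to want
USUAL_AREA_CODES = {"412"}   # hypothetical soft constraint: usually true, not always

def isa(fmt: str, s: str) -> float:
    """Return 0.0 if s does not fit fmt, 0.5 if it only breaks the soft constraint, else 1.0."""
    m = FORMATS[fmt].fullmatch(s)
    if m is None:
        return 0.0
    return 1.0 if m.group(1) in USUAL_AREA_CODES else 0.5

def to_dashed(s: str) -> str:
    """Transformation into the target format: keep the ten digits, re-punctuate."""
    digits = re.sub(r"\D", "", s)
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

def check(s: str):
    """Return (outcome, value); value is reformatted only when that is safe."""
    scores = {fmt: isa(fmt, s) for fmt in FORMATS}
    best_fmt = max(scores, key=scores.get)
    best = scores[best_fmt]
    if best == 0.0:
        return ("invalid", s)
    if best < 1.0:
        return ("questionable", s)  # let the user confirm or correct it
    if best_fmt != TARGET_FORMAT:
        return ("valid-but-misformatted", to_dashed(s))
    return ("valid", s)

for value in ["412-555-0100", "(412) 555-0100", "999-555-0100", "call me"]:
    print(value, "->", check(value))
```

Because `check` depends only on strings, the same function could in principle sit behind a spreadsheet column or a web form field, which is the kind of reuse the demonstration describes.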
Contributions highlighted today
• A model for data...
  – Short, human-readable strings
  – Ambiguous categories
  – Multiple formats
• Implementation features:
  – Inference of customizable formats from examples
  – Soft constraints
  – Human-readable error messages
  – Validation code is reusable across platforms
Other contributions not highlighted today
• Validating with topes (quantitatively) improves…
  – Accuracy of validation
  – Reusability of validation code
  – Subsequent duplicate identification
• Additional tool features:
  – Inter-tope reference (i.e., “topes in topes”)
  – Whitelists
  – Various additional auto-transformation features
  – Overriding auto-transformation with JavaScript
Validation and Tool Maturity
• Expressiveness
  – Have implemented dozens of topes
• Usability
  – Fast creation of accurate formats by users in study
• Usefulness
  – Integrated with Excel, Visual Studio, and an XML library
  – Integrated by IBM & Univ. Nebraska into other tools
Thank You…
• To Jeff Magee, Betty Cheng, Barbara Ryder, Margaret Burnett, and others at ICSE 2007 for early feedback
• To NSF for funding
• To ICSE 2008 for this opportunity to present
Available for download
http://www.cs.cmu.edu/~cscaffid/software.shtml
Or Google for
"Topes SDK"