Tool Support for Data Validation by End-User Programmers

Christopher Scaffidi, Brad Myers, Mary Shaw
Carnegie Mellon University

Post on 31-Dec-2015

Page 1: Tool Support for Data Validation by End-User Programmers

Tool Support for Data Validation by End-User Programmers

Christopher Scaffidi

Brad Myers, Mary Shaw

Carnegie Mellon University

Page 2: Tool Support for Data Validation by End-User Programmers

Target audience: End-user programmers

• In 2012, in American workplaces:
  – 90 million computer end users
  – 55 million of whom will create:
    • Spreadsheets
    • Databases
    • Web applications

Introduction Topes Demonstration Conclusion

Page 3: Tool Support for Data Validation by End-User Programmers

An input validation problem observed during contextual inquiry

Valid? “EDSH 225”

Questionable? “EDXH 225”

Valid but wrong format? “Smith 225”

Or obviously invalid? “Robotics Institute”

Page 4: Tool Support for Data Validation by End-User Programmers

Underlying problem: abstraction mismatch

• Tools support strings, ints, floats, sometimes dates.

• Problem domain involves higher-level categories:
  – University names
  – Person names
  – CMU phone numbers
  – CMU room numbers

• These data categories are:
  – Short human-readable strings
  – Multi-format
  – Sometimes ambiguous (non-binary scale of validity)
  – Often particular to certain groups of people

Page 5: Tool Support for Data Validation by End-User Programmers

Limitations of existing approaches

• Types do not support questionable values

• Grammars do not, either, nor can they reformat

• Information extraction algorithms rely on grammatical cues that are absent during validation

• Cues, Forms/3, -calculus, Slate, pollution markers, etc, infer numerical constraints but not constraints on strings, nor are they platform-independent

Page 6: Tool Support for Data Validation by End-User Programmers

New Approach: Topes

• A tope = a platform-independent abstraction describing how to recognize and transform strings in one category of data

• Greek word for “place,” because each corresponds to a data category with a natural place in the problem domain

Page 7: Tool Support for Data Validation by End-User Programmers

A tope is a graph. Node = format, edge = transformation

Notional representation for a CMU room number tope (three format nodes, each shown with an example):

• Formal building name & room number: “Elliot Dunlap Smith Hall 225”

• Colloquial building name & room number: “Smith 225”

• Building abbreviation & room number: “EDSH 225”
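The graph idea can be sketched in a few lines of Python. This is only an illustration, not the Topes SDK: the format names, the lookup tables, and the choice of which formats are directly linked are all assumptions (the slide's edges did not survive in this transcript). Formats are nodes, transformation functions are edges, and a value is converted by following a chain of edges.

```python
from collections import deque

# Hypothetical names for the three formats in the CMU room number example.
FORMAL, COLLOQUIAL, ABBREV = "formal", "colloquial", "abbreviation"

# Illustrative lookup tables; only Smith Hall appears in the talk.
FORMAL_TO_COLLOQUIAL = {"Elliot Dunlap Smith Hall": "Smith"}
COLLOQUIAL_TO_ABBREV = {"Smith": "EDSH"}


def _invert(table):
    """Reverse a lookup table so an edge can be traversed the other way."""
    return {value: key for key, value in table.items()}


def _swap_building(table):
    """Build a transformation that rewrites the building and keeps the room number."""
    def transform(value):
        building, room = value.rsplit(" ", 1)
        return f"{table[building]} {room}"
    return transform


# Edges of the graph: (source format, target format) -> transformation.
# Which pairs are linked is an assumption made for this sketch.
EDGES = {
    (FORMAL, COLLOQUIAL): _swap_building(FORMAL_TO_COLLOQUIAL),
    (COLLOQUIAL, FORMAL): _swap_building(_invert(FORMAL_TO_COLLOQUIAL)),
    (COLLOQUIAL, ABBREV): _swap_building(COLLOQUIAL_TO_ABBREV),
    (ABBREV, COLLOQUIAL): _swap_building(_invert(COLLOQUIAL_TO_ABBREV)),
}


def convert(value, source, target):
    """Breadth-first search for the shortest chain of transformations."""
    queue = deque([(source, value)])
    seen = {source}
    while queue:
        fmt, current = queue.popleft()
        if fmt == target:
            return current
        for (src, dst), transform in EDGES.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, transform(current)))
    raise ValueError(f"no transformation path from {source} to {target}")
```

With these tables, `convert("Elliot Dunlap Smith Hall 225", FORMAL, ABBREV)` goes through the colloquial format on its way to the abbreviation, so no direct formal-to-abbreviation edge is needed.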

Page 8: Tool Support for Data Validation by End-User Programmers

A tope is a conceptual abstraction. A tope implementation is code.

• Each tope implementation has executable functions:
  – 1 isa: string → [0,1] function per format, for recognizing instances of the format (a fuzzy set)
  – 0 or more trf: string → string functions linking formats, for transforming values from one format to another

• Validation function: φ(str) = max_f isa_f(str), where f ranges over the tope’s formats
  – Valid when φ(str) = 1
  – Invalid when φ(str) = 0
  – Questionable when 0 < φ(str) < 1
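As a minimal sketch of the functions described on this slide, the Python below defines one isa function per format and a validation function that takes the maximum isa score. The regular expressions, the sets of known buildings, and the score 0.5 for "right shape, unknown building" are assumptions chosen for illustration; they are not taken from the authors' implementation.

```python
import re

# Hypothetical reference data; only these buildings appear in the talk.
KNOWN_ABBREVIATIONS = {"EDSH"}
KNOWN_COLLOQUIAL_NAMES = {"Smith"}


def isa_abbreviation(s: str) -> float:
    """isa for 'building abbreviation & room number', e.g. 'EDSH 225'."""
    m = re.fullmatch(r"([A-Z]{2,4}) (\d{3,4})", s)
    if m is None:
        return 0.0
    # Soft constraint: correct shape but unknown building is questionable.
    return 1.0 if m.group(1) in KNOWN_ABBREVIATIONS else 0.5


def isa_colloquial(s: str) -> float:
    """isa for 'colloquial building name & room number', e.g. 'Smith 225'."""
    m = re.fullmatch(r"([A-Z][a-z]+) (\d{3,4})", s)
    if m is None:
        return 0.0
    return 1.0 if m.group(1) in KNOWN_COLLOQUIAL_NAMES else 0.5


ISA_FUNCTIONS = [isa_abbreviation, isa_colloquial]


def validate(s: str) -> float:
    """Validation score: the maximum isa score over all formats."""
    return max(isa(s) for isa in ISA_FUNCTIONS)


def verdict(s: str) -> str:
    """Map the score onto the three outcomes named on the slide."""
    score = validate(s)
    if score == 1.0:
        return "valid"
    if score == 0.0:
        return "invalid"
    return "questionable"
```

Applied to the room numbers from the contextual inquiry, `verdict` returns "valid" for “EDSH 225” and “Smith 225”, "questionable" for “EDXH 225”, and "invalid" for “Robotics Institute”.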

Page 9: Tool Support for Data Validation by End-User Programmers

Today’s demonstration (using our latest version)

• Create phone number tope
  – Infer boilerplate from examples
  – What are formats, parts, and constraints?
  – Label parts, add/fix constraints, test in tool
  – Validate spreadsheet data
  – Transform spreadsheet data

• Reuse phone number tope
  – Create web application
  – Attach tope-based validator, configure, execute
  – Valid / invalid / questionable / valid-but-misformatted
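The four outcomes at the end of this list can be illustrated with a small phone number validator. The sketch below is hypothetical throughout: the two formats, the choice of the dashed format as the application's target, and the soft constraint on the area code are assumptions for illustration, and none of the names come from the demonstrated tool.

```python
import re
from enum import Enum


class Outcome(Enum):
    VALID = "valid"
    INVALID = "invalid"
    QUESTIONABLE = "questionable"
    MISFORMATTED = "valid but misformatted"


# Two assumed formats; each captures area code, exchange, and line number.
FORMATS = {
    "dashed": re.compile(r"(\d{3})-(\d{3})-(\d{4})"),
    "parenthesized": re.compile(r"\((\d{3})\) (\d{3})-(\d{4})"),
}
TARGET = "dashed"  # the format this hypothetical application wants to store


def isa(fmt: str, s: str) -> float:
    """Score how well s matches one format, on a 0..1 scale."""
    m = FORMATS[fmt].fullmatch(s)
    if m is None:
        return 0.0
    # Assumed soft constraint: an area code starting with 0 or 1 is
    # treated as suspicious rather than rejected outright.
    return 0.5 if m.group(1)[0] in "01" else 1.0


def to_target(s: str) -> str:
    """Transform a value in any known format into the dashed format."""
    for pattern in FORMATS.values():
        m = pattern.fullmatch(s)
        if m:
            return "{}-{}-{}".format(*m.groups())
    raise ValueError(f"not a recognized phone number: {s!r}")


def check(s: str):
    """Return (outcome, suggested value or None)."""
    scores = {fmt: isa(fmt, s) for fmt in FORMATS}
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return Outcome.INVALID, None
    if scores[best] < 1.0:
        return Outcome.QUESTIONABLE, None
    if best != TARGET:
        return Outcome.MISFORMATTED, to_target(s)
    return Outcome.VALID, s
```

Separating the score from the target format is what makes the fourth outcome possible: “(412) 555-0123” is a perfectly good phone number, so the validator can offer the reformatted value instead of rejecting the input.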

Page 10: Tool Support for Data Validation by End-User Programmers

Contributions highlighted today

• A model for data...
  – Short, human-readable strings
  – Ambiguous categories
  – Multiple formats

• Implementation features:
  – Inference of customizable formats from examples
  – Soft constraints
  – Human-readable error messages
  – Validation code is reusable across platforms
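To give a feel for what inferring formats from examples could involve, here is a deliberately naive sketch. It is not the inference algorithm used by the tool, which this transcript does not describe; it simply generalizes each example into a pattern of digit runs, letter runs, and literal separators, and keeps one pattern per distinct shape. The function names are made up for this example.

```python
import re
from itertools import groupby


def _kind(ch: str) -> str:
    """Classify a character as a digit, a letter, or literal punctuation."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "literal"


def infer_pattern(example: str) -> str:
    """Generalize one example into a regular expression."""
    parts = []
    for kind, run in groupby(example, key=_kind):
        text = "".join(run)
        if kind == "digit":
            parts.append(r"\d{%d}" % len(text))
        elif kind == "alpha":
            parts.append("[A-Za-z]{%d}" % len(text))
        else:
            # Separators such as "-" or ") " are kept as literal text.
            parts.append(re.escape(text))
    return "".join(parts)


def infer_formats(examples):
    """Return one pattern per distinct shape, in order of first appearance."""
    patterns = []
    for example in examples:
        pattern = infer_pattern(example)
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns
```

Given the examples “412-555-0123”, “(412) 555-0123”, and “724-555-0199”, this yields two candidate formats. A user would still need to review them, which is where customizing the inferred formats comes in.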

Page 11: Tool Support for Data Validation by End-User Programmers

Other contributions not highlighted today

• Validating with topes (quantitatively) improves…
  – Accuracy of validation
  – Reusability of validation code
  – Subsequent duplicate identification

• Additional tool features:
  – Inter-tope reference (i.e., “topes in topes”)
  – Whitelists
  – Various additional auto-transformation features
  – Overriding auto-transformation with JavaScript

Page 12: Tool Support for Data Validation by End-User Programmers

Validation and Tool Maturity

• Expressiveness
  – Have implemented dozens of topes

• Usability
  – Fast creation of accurate formats by users in study

• Usefulness
  – Integrated with Excel, Visual Studio, and an XML library
  – Integrated by IBM & Univ. Nebraska into other tools

Page 13: Tool Support for Data Validation by End-User Programmers

Thank You…

• To Jeff Magee, Betty Cheng, Barbara Ryder, Margaret Burnett, and others at ICSE 2007 for early feedback

• To NSF for funding

• To ICSE 2008 for this opportunity to present

Page 14: Tool Support for Data Validation by End-User Programmers

Available for download

http://www.cs.cmu.edu/~cscaffid/software.shtml

Or Google for

"Topes SDK"
