![Page 1: Fast, Accurate Creation of Data Validation Formats by End-User Developers Christopher Scaffidi Brad Myers, Mary Shaw Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022081520/56649d385503460f94a10b4a/html5/thumbnails/1.jpg)
Fast, Accurate Creation of Data Validation Formats by
End-User Developers
Christopher Scaffidi, Brad Myers, Mary Shaw
Carnegie Mellon University
Contextual inquiry: What challenges do end users face?
Observed 3 administrative assistants, 4 managers, and 3 webmasters/graphic designers (1-3 hrs, each)
Background Toped Evaluation New Opportunities
One person’s task: validate web forms, but he didn’t know JavaScript / regexps
Is the input valid? “EDSH 225”
Is the input questionable? “GATE 225”
Or is it obviously invalid? “412-555-5444”
Hurricane Katrina “Person Locator” site: Many inputs unvalidated
Spreadsheets contain lots of typos: inconsistent formatting & invalid strings
• Above: part of an actual spreadsheet on our university’s web site
• Plenty of invalid strings in users’ spreadsheets during contextual inquiry
• For thousands of other examples: EUSES Spreadsheet Corpus
Needed: a usable mechanism for implementing validation
Coming Up…
• Background
  – Formative pilot study
  – Related work
• Toped
• Evaluations
  – Usability
  – Expressiveness
• New opportunities
Formative pilot study
• Motivation: Exploring the “gulf of execution” for data
  – User has to figure out how to map intentions to the features provided by a computer system
  – Poor “closeness of mapping” impedes system use
  => Before designing the system, probe the concepts and terminology familiar to users
• Asked 4 administrative assistants to verbally describe two kinds of data
  – American mailing addresses
  – University project numbers
Formative pilot study (continued)
• Participants identified and named the parts of data
  – E.g.: street address, city, state, zip code
  – They hierarchically refined parts until sub-parts became small enough that they lacked names
• At that point, they described parts with constraints
  – Constraints were sometimes “soft”: not always true
  – They used adverbs of frequency to indicate softness
    • E.g.: “usually” or “sometimes”
• Implications
  – Users describe data in terms of constrained parts
  – Valid data sometimes violate certain constraints
Alternate approaches: limited support for expressing constraints on structured strings
• Grammars based on sequences of characters
  – Context-free grammars (CFGs)
    • Grammex
    • Apple data detectors (CFGs + regexps)
  – Regular expressions (regexps)
    • SWYN regexp editor
• Lapis patterns: constrained structured strings
  – Intentionally designed to support outlier finding
  – Example pattern:
    @PhoneNumber is Number equal to /\d\d\d/ then "-" then Number equal to /\d\d\d\d/ ignoring nothing
Toped: A form fill-in UI to mediate between users and grammars
1. Name
2. Describe
3. Test
4. Save
The system generates an augmented CFG from the format description
A part that almost always has 1-8 lowercase letters:
#WORD : #CHLIST : COUNT(#CH)>=1 && COUNT(#CH)<=8 {90}
#CHLIST : #CH | #CH #CHLIST
#CH : a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
• More compact than a pure CFG
• More expressive than a pure CFG
  – Some constraints are impossible to represent as CFG
  – Some constraints need to be soft
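One way to read the #WORD rule above is as a structural part (the #CHLIST and #CH productions) plus one soft constraint with strength 90. The Python sketch below illustrates that split; it is not Toped's implementation. The names `SoftConstraint`, `parses_as_chlist` and `check_word` are hypothetical, and the sketch assumes that a structural failure means "no parse" while a soft-constraint violation only marks the string as questionable.

```python
import string
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SoftConstraint:
    """Hypothetical record for a constraint attached to a part."""
    description: str
    strength: int                     # 0-100, e.g. the {90} in the grammar
    holds: Callable[[str], bool]      # True when the text satisfies it


LOWERCASE = set(string.ascii_lowercase)


def parses_as_chlist(text: str) -> bool:
    """#CHLIST : #CH | #CH #CHLIST, i.e. one or more lowercase letters."""
    return len(text) >= 1 and all(ch in LOWERCASE for ch in text)


# COUNT(#CH)>=1 && COUNT(#CH)<=8 {90}
WORD_CONSTRAINTS = [
    SoftConstraint("has 1-8 letters", 90, lambda s: 1 <= len(s) <= 8),
]


def check_word(text: str) -> Tuple[bool, List[SoftConstraint]]:
    """Return (parsed?, violated soft constraints) for a candidate #WORD."""
    if not parses_as_chlist(text):
        return False, []              # structural failure: no parse at all
    violated = [c for c in WORD_CONSTRAINTS if not c.holds(text)]
    return True, violated
```

Under these assumptions, `check_word("hello")` parses with no violations, `check_word("abcdefghij")` parses but violates the length constraint, and `check_word("Hello")` does not parse at all.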
Testing strings against grammars
• Downgrade a parse if it violates constraints
  – Penalty = 1 - (strength of constraint)/100
  – Multiply penalties
  – Propagate penalties up parse tree
  – Choose best parse (i.e.: parse with least penalties)
• Show error messages
  – Track violated constraints, concatenate into message
    • If parse fails completely, show portions of format description that were used to generate unsatisfied CFG productions.
  – End-user development tools may offer user option of overriding some errors, depending on penalties.
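The penalty rule above can be sketched in Python as follows. This is one interpretation, not Toped's actual code. It assumes that each violated constraint contributes a multiplicative factor of 1 - strength/100, that a parse with no violations scores 1.0, and that the "parse with least penalties" is the candidate with the highest resulting product. `ParseNode`, `score` and `best_parse` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParseNode:
    """Hypothetical parse-tree node for one part of a format."""
    label: str
    violated_strengths: List[int] = field(default_factory=list)
    children: List["ParseNode"] = field(default_factory=list)


def penalty(strength: int) -> float:
    """Penalty = 1 - (strength of constraint)/100."""
    return 1 - strength / 100


def score(node: ParseNode) -> float:
    """Multiply penalties at this node and propagate them up from children."""
    result = 1.0
    for strength in node.violated_strengths:
        result *= penalty(strength)
    for child in node.children:
        result *= score(child)
    return result


def best_parse(candidates: List[ParseNode]) -> ParseNode:
    """Choose the candidate parse with the highest score (least penalized)."""
    return max(candidates, key=score)
```

With this reading, violating a constraint of strength 90 scales the score by about 0.1, so a tool could treat low but non-zero scores as "questionable" inputs that the user may override.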
Showing error messages after testing strings against the generated CFGs
Usability: Does Toped help users to implement string validation?
• Between-subjects lab experiment
  – Direct comparison system: Lapis
  – (We also compare results to those of SWYN study – see paper)
• Recruited 17 participants (9 Toped, 8 Lapis)
  – Approx. half were administrative assistants, approx. half were master’s students (mostly information systems), distributed roughly equally across tools
  – 1 participant misinterpreted instructions (=> 8 & 8)
Usability: Does Toped help users to implement string validation? (continued)
• Study structure
  – Background questionnaire
  – Tutorial (30 min)
  – 3 tasks (20 min)
  – User satisfaction questionnaire
• Detail of a task:
  – Validate 1 kind of data
    • phone numbers, mailing addresses, company names
  – User goal: For each kind, find typos in 25 strings
    • Randomly drawn from EUSES spreadsheet corpus
    • And we also retained 25 strings for further accuracy tests
Usability: Users were nearly 2 times as fast and found 3 times as many typos

| Measure | Toped | Lapis | Relative improvement | Significant? (Mann-Whitney) |
|---|---|---|---|---|
| Tasks completed | 2.79 | 1.75 | 60% | p<0.01 |
| Typos identified, on 75 visible strings | 16.50 | 5.75 | 187% | p<0.01 |
| Typos identified, on all 150 strings | 31.25 | 9.50 | 229% | p<0.01 |
| F1 accuracy measure, on 75 visible strings | 0.74 | 0.51 | 45% | No |
| F1 accuracy measure, on all 150 strings | 0.68 | 0.46 | 48% | No |
| User satisfaction | 3.78 | 3.06 | 24% | p=0.02 |

Toped also compares favorably to SWYN regexp editor – see paper
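The "relative improvement" column is consistent with (Toped - Lapis) / Lapis, and F1 is conventionally the harmonic mean of precision and recall. The Python sketch below shows both calculations. The function names are hypothetical, and the slides do not say exactly how true and false positives were counted in the study, so the `f1` arguments are an assumption about that setup.

```python
def relative_improvement(toped: float, lapis: float) -> float:
    """Improvement of the Toped mean over the Lapis mean, as a fraction."""
    return (toped - lapis) / lapis


def f1(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall.

    Assumption: a true positive is a real typo that the user's format
    flagged; the slides do not spell out the exact counting rules.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

For example, `relative_improvement(16.50, 5.75)` is about 1.87, matching the 187% reported for typos identified on the 75 visible strings. Applying the same formula to the rounded "Tasks completed" means gives about 59% rather than 60%, presumably because the slide's figure was computed from unrounded values.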
Expressiveness: Does Toped provide adequate primitives for validating real data?
• Logged data typed by 4 users into browser (3 weeks)
  – For each text string, we recorded:
    • A label for the text field (e.g.: “Phone”)
    • A regexp summarizing the string (e.g.: \d\d\d-\d\d\d-\d\d\d\d)
• Examined data, wrote scripts to cluster strings
  – 94% of the 5897 strings were in 19 clusters
  – Each cluster had 1-2 formats
• Used Toped to create formats
  – Omitted 5 clusters that were for “general text”, usernames or passwords (so we could post format descriptions online)
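The logging step above recorded a regexp summary of each string rather than the string itself. The Python sketch below shows how such a summary might be produced and used for clustering. Only the digit case (`\d`) is confirmed by the slide's example. The letter class `[A-Za-z]`, the grouping by identical (label, summary) pairs, and the names `summarize` and `cluster` are assumptions for illustration, not the authors' scripts.

```python
import string
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Characters that need a backslash to be matched literally in a regexp.
REGEX_METACHARS = set(".^$*+?{}[]\\|()")


def summarize(text: str) -> str:
    """Summarize a string character by character as a regexp."""
    pieces = []
    for ch in text:
        if ch in string.digits:
            pieces.append(r"\d")          # as in the slide's phone example
        elif ch in string.ascii_letters:
            pieces.append("[A-Za-z]")     # assumption: letters share one class
        elif ch in REGEX_METACHARS:
            pieces.append("\\" + ch)      # keep punctuation, escaped
        else:
            pieces.append(ch)             # other characters stay literal
    return "".join(pieces)


def cluster(logged: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group (field label, text) pairs by label and regexp summary."""
    clusters = defaultdict(list)
    for label, text in logged:
        clusters[(label, summarize(text))].append(text)
    return dict(clusters)
```

Under these assumptions, `summarize("412-555-5444")` yields the same summary as the slide's example, so two phone numbers typed into a field labeled “Phone” land in the same cluster.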
Expressiveness: Does Toped provide adequate primitives for validating real data? (continued)
• Overall, successful
  – We were able to create formats for each kind of data
  – The formats identified many probable typos
• Ideas for improvements
  – Ways to reuse constraints from format to format
  – Primitives for kinds of parts: Numeric, word-like, …
Data Description Editor
Toped+: an improved editor
Contributions and New Opportunities
• Toped – UI to mediate between users & grammars
  – Enables users to work faster & more effectively
  – Adequately expressive for validating many kinds of data
  – Provided a start for new line of similar editor tools
• New Opportunities (aka “Future Work”)
  – Extending Toped+ to automatically reformat data [IUI’09]
  – Providing a repository for sharing formats (in-progress)
  – Developing new ways to make use of ability to identify strings that violate soft constraints
Thank You…
• To Margaret Burnett, Brad Myers, Valentina Grigoreanu, Mary Beth Rosson, Mary Shaw and others in the EUSES Consortium for feedback over the years
• To NSF for funding
• To ISEUD 2009 for this opportunity to present
Toped+: key improvements vs Toped in terms of Cognitive Dimensions
• Better closeness of mapping
  – Constraints “belong” to parts in all formats
• Higher juxtaposability
  – Easy to view & compare multiple formats
• Lower error-proneness
  – Helps prevent senseless combinations of constraints
• Lower viscosity
  – Drag-and-drop / copy-and-paste speeds up edits
• Improved progressive evaluation
  – User can test each part individually