
6.UAP REPORT
CarSinch

Andrés López-Pineda
May 5th, 2012


Introduction

This paper describes work on a project named CarSinch, previously designed and implemented by Rob Miller, Akansha Kumar, and Hieu Tran. CarSinch addresses the problem of web searching by the driver of a car, which is inherently risky and inefficient. CarSinch is intended to be a safe alternative to manual interaction with a phone interface for answering questions, for those occasions when there isn't a passenger available to find the answer for the driver. It allows users to speak their query and get an answer quickly and without much manual interaction, and it uses audio input and feedback to let the user keep their focus on the road.

There have been other approaches to this problem, such as ChaCha and Apple's Siri, but CarSinch is an implementation focused on the experience of doing this task while driving, with an interface that is intended to be built into the dashboard of a car. In addition, it uses Amazon's Mechanical Turk [1] on the backend to provide answers. Mechanical Turk is an online system designed to allow the hiring of labor for small tasks, for appropriately small amounts of money. The workers, or "Turkers," are commonly asked to perform simple human computation tasks, so CarSinch uses them to answer the driver's question appropriately. This allows CarSinch to provide more flexible answers than ChaCha, which has a static textual response, and also allows for human verification, as opposed to Siri, which relies on AI.
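The paper does not show how questions reach Mechanical Turk, but a system like this would typically post each question as a HIT whose body is an ExternalQuestion pointing at a Turker-facing page. The sketch below builds that XML payload; the task URL and query-string parameter are illustrative assumptions, not details from the paper.

```python
# Sketch: wrapping a Turker-facing task page in Mechanical Turk's
# ExternalQuestion XML schema. The example URL (and its qid parameter)
# is hypothetical, not taken from the paper.

def build_external_question(task_url: str, frame_height: int = 600) -> str:
    """Wrap a task URL in the MTurk ExternalQuestion XML structure."""
    schema = ("http://mechanicalturk.amazonaws.com/"
              "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd")
    return (
        f'<ExternalQuestion xmlns="{schema}">'
        f"<ExternalURL>{task_url}</ExternalURL>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        "</ExternalQuestion>"
    )

question_xml = build_external_question("https://example.com/searcher?qid=42")
```

This payload would then be submitted through the MTurk API together with a title, reward (the paper pays 15¢ to 50¢ per task), and assignment count.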

Additionally, CarSinch provides feedback to the user that their question is being answered. It provides audio feedback as different parts of the process are completed, such as when Turkers start searching, when they find an answer, and when they compile additional, contextual information to support their initial answer. This allows users to keep their mind focused on the road, instead of worrying that the system is down or that their question isn't getting answered.

The design of the system is split into two parts, one for the interface used by drivers on a tablet and one for the interface used by Turkers. The driver uses the tablet interface to ask a question, which is transmitted to our servers and submitted as a question for Turkers to answer. When one of them has completed an answer, the driver's interface is updated to show the resulting data. Since the driver and Turker have different inputs, outputs, and requirements for their tasks, the system requires two separate designs. Most sections of this paper are similarly split, so each interface can be described within its appropriate context.
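The server-side mediation between the two interfaces can be pictured as a small broker: the driver submits a question, Turkers claim and answer pending questions, and the driver's tablet polls until an answer arrives. This is a minimal in-memory sketch under assumed names (`QuestionBroker`, `submit`, `poll`); the real system would persist questions and push updates rather than poll.

```python
import itertools

class QuestionBroker:
    """Minimal in-memory sketch of the server mediating drivers and Turkers."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._questions = {}  # qid -> {"text": ..., "answer": None}

    def submit(self, text):
        """Driver side: record a spoken question, return its id."""
        qid = next(self._ids)
        self._questions[qid] = {"text": text, "answer": None}
        return qid

    def pending(self):
        """Turker side: list unanswered questions."""
        return [(qid, q["text"]) for qid, q in self._questions.items()
                if q["answer"] is None]

    def answer(self, qid, answer):
        """Turker side: attach a completed answer."""
        self._questions[qid]["answer"] = answer

    def poll(self, qid):
        """Driver side: None until a Turker has answered."""
        return self._questions[qid]["answer"]
```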

Several evaluations were done on the two separate designs. The driver's interface was primarily tested for safety and efficiency, and the Turker's interface was primarily tested for latency and correctness. This allowed us to ensure that the driver's end-to-end experience was similar to asking a passenger a question, which is what the system is trying to mimic.


Background

CarSinch is a system that allows a mobile user to use an Android phone to speak a question and obtain an answer, which is provided by Turkers (workers paid through Amazon's Mechanical Turk [1]). It was previously worked on by Rob Miller, Rajeev Nayak, Akansha Kumar, and Hieu Tran. The main goal of this system was to remove effort from the user so they didn't have to do the searching themselves, and could concentrate on other tasks while their answer was generated, as described in Sinch: Searching Intelligently on a Mobile Device [7].

However, even after the previous designers' work, there was still room to improve CarSinch to make it more appropriate for realistic driving conditions. Drivers cannot be distracted by mobile devices without endangering their safety. There have been several papers describing different suggested limits on the length and number of steps constituting a task a user can accomplish safely while driving, but the most salient one was Visual-Manual NHTSA Driver Distraction Guidelines, published by the National Highway Traffic Safety Administration (NHTSA) [2]. The paper states that 17% of crashes involve distracted driving, and that any distraction of more than 2 seconds leads to an exponentially increasing amount of risk. Tasks that cannot be completed in 6 steps also greatly increase the amount of risk assumed by the driver. Unfortunately, that paper focuses on visual-manual interfaces for accomplishing tasks not directly related to driving. CarSinch is much closer to an auditory-vocal interface, so although the guidelines were not entirely appropriate, the design principles behind CarSinch partly derive from their suggestions.

The tests for this paper use common queries made by mobile users published in A Diary Study of Mobile Information Needs [4].

Design Principles

This section describes the assumptions made prior to starting work, as well as the core concepts for the systems that were developed while investigating the space.

DRIVER SIDE

The driver side design focused on how to provide an interface to the system through in-car dashboards, which would have a tablet-sized device installed. This allows for large UI elements, and lots of whitespace that can direct the user's focus to the important sections of the interface.

The primary concern for any interface in a car is safety, so CarSinch was designed to follow the guidelines for in-car devices, primarily those suggested by the NHTSA paper [2]. That


paper suggests reducing all tasks to 6 steps or fewer, and ensuring that each step requires less than 2 seconds of attention away from the road, in order to maximize the driver's safety.

The CarSinch system takes input, and gives feedback, through audio, with very little manual interaction needed. This decreases the number of manual steps needed to answer a user's question. Additionally, the design removes much of the on-screen information from previous implementations, so it is easily glanceable, using large font sizes and minimal amounts of text. This decreases the amount of time needed for each step of the process, since users can get all the information in a short glance.

Despite safety concerns, the system still needs to be useful. This is accomplished by providing users with a direct answer to their question as quickly as possible, and then giving them contextual information shortly afterwards. This design provides as much feedback at each step as possible without causing distraction, which lets users know that progress is being made.

Finally, the system ensures that information is lossless. That is, users need to be able to retrieve previously asked questions, and information shouldn't disappear or fluctuate over time. This ensures that users can focus on driving until they are available to absorb the information in the answer, regardless of the amount of time elapsed since the question was asked.

TURKER SIDE

The main goal when using Turkers for CarSinch is to return quick and accurate information to the user. Since those qualities involve trade-offs, the design separates the Turker data-gathering process into several parts in order to maximize the system's usefulness for the user (i.e., the driver). This allows the user to receive an answer to the question quickly, with follow-up answers and contextual information provided later, after the user has consumed the first answer but before any more questions are asked.
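The staged process above can be sketched as a small state machine: each question moves through a sequence of phases, and every transition triggers the audio cue the driver hears. The stage names loosely follow the milestones the paper describes (searching, answering, adding context); the callback hook and class name are assumptions for illustration.

```python
# Sketch of the staged Turker pipeline. Stage names are paraphrased from
# the paper's milestones; the on_progress callback (e.g. an audio cue to
# the driver) is an assumed integration point.
STAGES = ["submitted", "searching", "answered", "context_added"]

class TurkerPipeline:
    """Tracks which phase of the Turker workflow a question is in."""
    def __init__(self, on_progress):
        self.on_progress = on_progress  # called with each new stage name
        self.stage = STAGES[0]

    def advance(self):
        """Move to the next stage and notify the driver-side interface."""
        i = STAGES.index(self.stage)
        if i + 1 < len(STAGES):
            self.stage = STAGES[i + 1]
            self.on_progress(self.stage)
        return self.stage
```

Keeping the stages explicit is what lets the driver interface speak "searching started" and "answer found" without waiting for the full contextual result.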

Implementation

This section describes the implementation of the two sides of the project. It also includes images of the designs and describes how they fit the design principles outlined previously.


DRIVER SIDE

The majority of the work on this project was on the Android-side UI. Previously, the Sinch interface was optimized for a phone user with the ability to use their hands. As described in our design principles, CarSinch has more constraints, and is intended for use on a tablet.

The tablet UI relies primarily on audio input and feedback. There is a button that allows for recording of a query, as well as a button that uses text-to-speech to speak the answer aloud, as shown in figure 1. The microphone button also includes a level meter, so the user knows that the device can hear them; the system can use those levels to detect that the user is done speaking, and submit the question then.
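Detecting "done speaking" from microphone levels is usually a simple silence-timeout check: once the user has spoken, a sustained run of quiet frames ends the recording. The paper doesn't give the algorithm, so this is a sketch with illustrative threshold and frame-count values.

```python
def end_of_speech(levels, threshold=0.1, silent_frames=10):
    """Return the frame index at which the speaker is judged done, or None.

    `levels` is a sequence of per-frame amplitudes in [0.0, 1.0]. The
    threshold and silent-frame count are illustrative assumptions, not
    values from the paper.
    """
    quiet = 0
    spoke = False
    for i, level in enumerate(levels):
        if level >= threshold:
            spoke = True   # the user has started speaking
            quiet = 0      # any sound resets the silence run
        elif spoke:
            quiet += 1
            if quiet >= silent_frames:
                return i   # sustained silence after speech: submit now
    return None
```

A real implementation would run this over RMS energy of short audio windows and would also drive the on-screen level meter from the same values.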

The answer screen (left image in figure 1) provides feedback as the Turkers complete parts of their tasks. The search query used is put into the top field, and the answer fills in the bubble below. After receiving results from the Turkers, the interface automatically speaks them aloud so the user doesn't need to take their eyes off the road to know that progress is being made.

Figure 1: Two screens from the Android UI implementation

This design requires that users swipe to retrieve previous answers. Although this decreases efficiency in retrieving old information, it allowed the design to avoid a list view, which was previously required, as shown in figure 2. However, this introduced the problem of distinguishing new questions from followup questions. For example, if the user first asked "What is the cheapest data plan?", then received an answer that wasn't specific enough, a followup question would be "What about the cheapest one for AT&T?" This is a different type of question than a new question, such as "Who is Bob Marley?" Through user tests, this concern was dismissed, since users realized that the same button on every page can ask both followups and new questions.
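The swipe history combined with the lossless-information principle amounts to an append-only store with a cursor: answers are never mutated or dropped, and swiping steps the cursor back through them. A minimal sketch, with assumed names:

```python
class AnswerHistory:
    """Append-only history sketch: past answers never disappear or change
    (the lossless principle), and a swipe gesture steps back through them."""
    def __init__(self):
        self._entries = []  # (question, answer) pairs, oldest first
        self._cursor = -1   # index of the entry currently on screen

    def record(self, question, answer):
        self._entries.append((question, answer))
        self._cursor = len(self._entries) - 1  # jump to the newest answer

    def swipe_back(self):
        """Show the previous answer; stops at the oldest entry."""
        if self._cursor > 0:
            self._cursor -= 1
        return self._entries[self._cursor]

    def current(self):
        return self._entries[self._cursor]
```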


Figure 2: First iteration of Android UI

Initially, there was more visual feedback on the pages Turkers were visiting, as shown in figure 3, but this conflicted with the design principle of lossless information, increasing the number of glances needed by the user, and thus increasing risk, so it was removed from the final design.

Figure 3: Two intermediate pages showing visual progress

TURKER SIDE

Since the Turkers need to go through several different phases to complete a query for the driver, the design included several different sites. Initial designs had three phases: searcher, designer, and verifier. The verifier was cut for this project, but it is something that could be worked on in the future.

The searcher stage was essentially unchanged from the original Sinch, as described in Sinch [3]. The Turker is asked to listen to a spoken query, run a Google search to find an answer, and submit the answer in a textbox. The system uses an embedded iframe for the Google


search, as shown in figure 4, so that it can save the sites visited by the Turker, making them accessible in later stages of the system.

Figure 4: The unchanged searching Turker interface

For the designer stage, Turkers produced the UI that the driver would see on the dashboard. On this page, the Turker is asked to put more context around the direct answer provided by the searcher in the previous stage. Although previous work on CarSinch had templates that could be used to specify what type of data Turkers could input for a certain question type, this was too constraining for our design. Additionally, feedback on previous tests indicated that Turkers' tasks would be more interesting (and thus get better results) if they had more freedom arranging the data on the screen. Therefore our design allowed for a more freeform design process.

Initially, the system allowed Turkers to design an Android UI, but this constrained the answer to only support specific Android devices. Instead, the final design abstracted this so that the Turker was creating HTML, which could be shown on any device (and at any size). Thus the design included an HTML UI designer called TinyMCE [8], preloaded with the searcher's answer, as shown in figure 5. In order to assist the designer Turkers, the system also preloaded a relevant website (saved from the searcher interface) below the HTML editor, so the worker could copy-and-paste data in easily.
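The designer page described above can be sketched as a simple server-side template: a textarea preloaded with the searcher's answer (which TinyMCE replaces with a rich editor in the browser), plus an iframe showing the saved context site underneath. The element id and the `tinymce.init` call style are assumptions for illustration; the paper doesn't show the page source.

```python
import html

def designer_page(searcher_answer: str, context_url: str) -> str:
    """Assemble a designer-HIT page: an editor preloaded with the searcher's
    answer, with the saved context website framed underneath. Element ids
    and the TinyMCE init style are illustrative, not from the paper."""
    return (
        f'<textarea id="answer-editor">{html.escape(searcher_answer)}'
        "</textarea>\n"
        "<script>tinymce.init({selector: '#answer-editor'});</script>\n"
        f'<iframe src="{html.escape(context_url)}" width="100%" height="400">'
        "</iframe>"
    )
```

Escaping the preloaded answer matters here: searcher text like "AT&T" must not break the generated HTML.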


Figure 5: HTML Designer and context website loaded below for the question “What is the cheapest AT&T data plan?”

In order to give the designers some context, they were given instructions and examples above the editing interface, as shown in figure 6.

Figure 6: Instructions and Example for Designer Turkers

An example of a completed design using the system is shown in figure 7, answering the question "When is the Coldplay concert tomorrow?"


Figure 7: Finished Turker Designer Page

Test Designs and Results

This section describes the user tests, which were designed to see if the system's interfaces were effective, efficient, and safe. It also enumerates and analyzes the results.

DRIVER SIDE

For the Android side of the system, the tests measured the efficiency, safety, and utility of the interface. There were three different tests, each of which is described below. Each one was intended to test certain aspects of the system that were being iterated on. For each test, there were four trials. Each trial had a user who was a part of my living group. They were evenly distributed between freshmen and seniors, and none of them had seen Sinch or CarSinch previously. Each test uses a low-quality driving game, but the NHTSA paper [2] asserts that driving simulators give very similar results to a real car on a test track for in-car device interface tests.

TEST 1

The first test measured the usefulness of the list view shown in figure 2. The user was placed in front of a laptop, and an Android tablet was placed to the right of the screen (where a dashboard UI would be in a car in the US). The laptop was loaded with the New York Times Gauging Your Distraction game [5]. This is a game where you are required to press numbers on the keyboard to go through gates at a regular interval, simulating the attention needed for driving. Users were asked to ignore the secondary part of the game, which involved texting on a simulated phone, since we wanted to test our UI instead.


Once the user was given time to practice driving and investigate the tablet UI, they were given a pre-chosen sample question and asked to find an answer to it. Users proceeded through three "stages" of the test: recording the question, waiting for an answer, and reporting it. All three stages were done while continuing to drive through gates. The number of gates passed through and crashed into was recorded separately for each stage.
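The per-stage bookkeeping described above is straightforward to sketch: tally passed and crashed gates under each stage label and derive a miss rate. This is an illustrative helper, not tooling from the study; stage names follow the paper.

```python
from collections import defaultdict

class TrialLog:
    """Records gates passed and crashed per stage of one driving trial.
    Stage names follow the paper: recording, waiting, reporting."""
    def __init__(self):
        self.counts = defaultdict(lambda: {"passed": 0, "crashed": 0})

    def gate(self, stage, crashed):
        """Record one gate outcome for the given stage."""
        self.counts[stage]["crashed" if crashed else "passed"] += 1

    def miss_rate(self, stage):
        """Fraction of gates crashed in a stage (0.0 if none recorded)."""
        c = self.counts[stage]
        total = c["passed"] + c["crashed"]
        return c["crashed"] / total if total else 0.0
```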

After the test was finished, users were asked to clarify the way they used the interface and give a subjective opinion on the experience.

TEST 2

This test was very similar to the previous test, except it was intended to test the safety and usefulness of the intermediate progress shown in figure 3. Additionally, the first answer provided was wrong, to test how users asked followup questions. The user was expected to notice a wrong answer and re-ask the question, at which point the correct answer was given. This was an attempt to see if users were confused by the ability to ask both followups and new questions.

TEST 3

The final test was made to measure the efficiency and latency of the entire system, end-to-end. It was a full Wizard of Oz style test, where the user came up with questions before the study began and was able to ask those questions instead of sample ones. I played the role of the Turker (both searcher and designer). Because of this, I was unable to measure the number of gates users went through or crashed into, but I talked to each user extensively afterwards to get an idea of how often and for how long they glanced away from the driving game, to estimate the effect of the test on their safety.

RESULTS

For the tablet tests, we received both subjective and objective data. In the first two tests, we kept track of the number of gates users passed through in each "stage" (corresponding to before, during, and after asking the question). For the first test, the results were mainly positive for safety, but not for usefulness. The majority of users missed at most one gate; however, they didn't use many of the features of the application. They mostly just asked the question, then got the direct answer visually when it arrived. This means they had to glance back and forth several times, and they got distracted by the contextual information when they only wanted to read the most direct answer, which is why they missed gates. The users subjectively complained that the screen was too busy in general. After this feedback, our design moved away from using a list view.


For the second test, every user missed around 2 gates per stage, but each stage took less time in general. This means that they were able to understand the interface more easily and could use it to get the data they wanted in a shorter amount of time, but there was an increase in risk, primarily because of the intermediate results being shown on screen. Other than the results flashing by on-screen, the interface did well at moving further into the auditory-vocal space instead of the manual-visual space, which we know to carry less risk per the NHTSA guidelines [2].

The final test was a more realistic version of the final system. Allowing users to ask their own questions meant that it followed the real use case more closely. Unfortunately, I didn't get objective data for this test because it was hard to record that data in addition to running the Wizard of Oz test. However, I did end the user test by asking the user several subjective questions, and attempted to estimate objective data by asking how often they glanced away from the driving simulation, and for how long. When asked to come up with questions, the users came up with trivia (e.g. "When was Star Wars first released?" and "When was Maroon 5's first album?") as well as local questions (e.g. "Where is the nearest gas station to MIT?" and "What is the speed limit on 494?"). This fit the assumptions from the Mobile Needs paper [4]. Based on the estimated objective data, the system continued to move further into the auditory-vocal space, since the users reported that they glanced at the screen much less, and if they did look, it was for about 0.5 seconds, much less than the suggested 2.0-second limit.

TURKER SIDE

From Sinch [3] we already had results for the searcher phase, so we focused on our new designer interface.

Each time the designer interface changed, we ran a series of tests where Turkers were provided with a sample question (alternating between "When is the Coldplay concert tomorrow?" and "What is the cheapest AT&T data plan?") and were asked to create a UI answering the question as quickly as possible, using the interface provided (which changed for each test). As shown in figure 4 and figure 8, the test included directions and examples showing what an answer should look like. The only differences between the tests came from the changes made when iterating on the designer interface. The objective data measured was the time elapsed in designing the answer compared to simply entering text, and we also got subjective feedback, which helped give context to some of the time lengths. There were four total tests. For the first two tests, the 4 participating Turkers were paid 15¢, because they were only required to move data around inside of a Google document. For the third test, the 5 Turkers were paid 50¢, since they were required to search for data items as well as organize


them inside a Google document. The final test had 5 workers, each of whom was paid 15¢, and they were required to copy data from a pre-loaded website into the HTML editor.

Figure 8: An example given to designer Turkers to show what contextual information is important

RESULTS

Our results were fairly consistent for the first two tests on the designer page for the Turkers, but did not achieve the results we wanted. On average, Turkers took about 6 minutes to complete the assignment. The results from the searchers, measured in a previous project, showed an average of around a minute. Although the searcher's results will be given to users at an acceptable speed, we would still like to get contextual results to them quicker. Additionally, several Turkers assumed that all the data available was to be used. Some of them ended up putting too much detail into their design, which made it harder for our main user to glance at the data and understand it, as well as increasing the time it took to return the answer. Figure 9 shows an example of this, as well as an example of a result that was more along the lines of what we were hoping for.


Figure 9: Sample results from the first and second user tests; the left one is too busy; the right one is well done, except for the wrong name (it should be Albert Einstein)

Several Turkers mentioned that they wished editing was easier. This suggested that perhaps the interface was too complicated and didn’t give enough affordances for simply copying data into the design.

The final test was an attempt to see if using TinyMCE and adding the searcher's answer automatically to the design would decrease the time and effort it took to design an answer. Unfortunately, none of the answers used any additional data. This meant that Turkers either thought that the direct answer was the only necessary data, or that the directions and examples didn't give a clear impression of what should be done.

Future Work

Ideally, Turkers would spend less time working on results for our users. I think that incorporating a retention model (as used in Adrenaline [6]) would help, as well as further simplifying our designer page. However, most drivers were satisfied with the direct answer, and were happy with the additional information received, even if it took slightly longer than desired. Another way to improve this experience is to show more feedback on what the Turkers are doing without making it distracting to the user. We tried both extremes of this


idea, but there should be some solution in the middle that informs users that their question is still being worked on, without distracting them too much. Ideally, there would also be a third stage for the Turkers, which would validate the information submitted, so that the user would be assured of getting accurate data.

Acknowledgements

I would like to thank Rob Miller for his guidance on this project, as well as Hieu Tran, Akansha Kumar, and Rajeev Nayak for their previous work on the project. Additionally, I would like to thank all of my test subjects for their assistance in designing this system.

References

[1] "Amazon's Mechanical Turk." Web. <www.mturk.com>

[2] Strickland, David, et al. "Visual-Manual NHTSA Driver Distraction Guidelines for In-Vehicle Electronic Devices." Feb. 16th, 2012.

[3] Tran, Hieu. “Sinch: A Delegated Search Service 6.UAP Final Report.” May 15th, 2011.

[4] Sohn, Timothy, et al. "A Diary Study of Mobile Information Needs." April 5th, 2008.

[5] “Gauging Your Distraction.” Web. <http://www.nytimes.com/interactive/2009/07/19/technology/20090719-driving-game.html>

[6] Bernstein, Michael, et al. "Crowds in Two Seconds: Enabling Realtime Crowd-Powered Interfaces." Oct. 16th, 2011.

[7] Nayak, Rajeev. “Sinch: Searching Intelligently on a Mobile Device.” August 20th, 2010.

[8] “TinyMCE.” Web. <www.tinymce.com>

