Name That Cluster - Text vs. Graphics

Shuang Wu, REU-DIMACS, 2010

Mentor: James Abello

• Project description

• Our research project
  Input: time data recorded from the ‘Name That Cluster’ web page.
  Output: statistical results on participants’ behavior when using the three interfaces.

• Collected Statistics

• Conclusions

For a pre-computed collection of search engine queries, users select for each query one out of three interfaces: Textual, Graphical and Hybrid.

The evaluation process consists of exploring the clusters associated with each query, naming the corresponding clusters, and selecting Cluster Ratings (ClusterFitRatings and Cluster Name Ratings).

The ClusterFitRatings are on a scale from -1 to 4 and Name Ratings are on a scale from -1 to 2. Note: -1 means that participants didn’t give a rating.

The collected statistics are: Exploration Times, Naming Times, Cluster Rating Times, Name Rating Times, ClusterFitRatings, and Name Ratings.

The raw data collected online is:

• Userid

• QueryString: the evaluated query

• ClusterNum: the evaluated cluster in that query

• Name: name/description/summary given to the cluster

• Timestamp: server date/time at which the evaluation was written to the database

440 clusters were evaluated in the Textual interface, 338 in the Graphical interface, and 378 in the Hybrid interface.

In the following analysis we used the Exploration time and the Evaluation time, where Evaluation time = Naming time + ClusterFitRating time + Name Rating time.

Notation: Ex(T),Ex(G),Ex(H) denote Exploration time per interface; T,G,H denote Evaluation time per interface; NT(T), NT(G), NT(H) denote Naming time per interface.
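As a minimal sketch of the Evaluation time definition, the record below stores the per-cluster times and derives the sum. The class name, field names, and the use of seconds as the unit are assumptions made for illustration; the slides do not specify how the timing data was stored.

```python
from dataclasses import dataclass


@dataclass
class ClusterTiming:
    """Timing record for one evaluated cluster (hypothetical layout, times in seconds)."""
    interface: str              # "T", "G", or "H"
    exploration: float          # Exploration time
    naming: float               # Naming time
    cluster_fit_rating: float   # time spent choosing the ClusterFitRating
    name_rating: float          # time spent choosing the Name Rating

    @property
    def evaluation(self) -> float:
        # Evaluation time = Naming time + ClusterFitRating time + Name Rating time
        return self.naming + self.cluster_fit_rating + self.name_rating


# Made-up values, only to show the arithmetic.
record = ClusterTiming("T", exploration=12.0, naming=20.0,
                       cluster_fit_rating=3.5, name_rating=2.5)
print(record.evaluation)  # 26.0
```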

Dealing with the outliers

Note: We treated data points more than 3.5 standard deviations from the mean as outliers.
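The outlier rule can be sketched as follows. This is an illustration under assumptions: it uses the sample standard deviation and removes the flagged values, while the slides do not say which standard deviation was used or what was done with the outliers afterwards. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def remove_outliers(values, threshold=3.5):
    """Drop values lying more than `threshold` sample standard deviations from the mean."""
    if len(values) < 2:
        return list(values)      # a standard deviation needs at least two values
    m = mean(values)
    s = stdev(values)            # sample standard deviation (an assumption)
    if s == 0:
        return list(values)      # all values identical, so nothing is an outlier
    return [v for v in values if abs(v - m) <= threshold * s]


# Made-up times: nineteen ordinary values and one extreme value.
times = [10.0] * 19 + [1000.0]
cleaned = remove_outliers(times)
print(len(times), len(cleaned))
```

With few data points, no value can lie 3.5 sample standard deviations from the mean, so a rule like this only has an effect on reasonably large samples.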

TWO SAMPLE T-TEST

• Test for the difference in means of two samples.

• Null Hypothesis: there is no difference between the two means. vs. Alternative Hypothesis: the mean of the first sample is larger/smaller than the mean of the second sample.

• Reject the Null Hypothesis if the P-value is less than .05.

ANOVA F-TEST

• Test for the difference in means for three or more samples.

• Null Hypothesis: all means are equal. vs. Alternative Hypothesis: at least one of the means is different.

• Reject the Null Hypothesis if the P-value is less than .05.
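The test statistics behind the two tests can be sketched with the standard library alone. This is an illustration, not the analysis actually run for the project: it assumes the equal-variance (pooled) form of the two-sample t-test, and the function names and sample values are hypothetical. Turning the statistics into P-values requires the t or F distribution; SciPy's `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` would be one way to get both in a single call.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def anova_f_statistic(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Made-up times for the three interfaces.
textual = [1.0, 2.0, 3.0]
graphical = [2.0, 3.0, 4.0]
hybrid = [3.0, 4.0, 5.0]

t = pooled_t_statistic(textual, graphical)
f = anova_f_statistic(textual, graphical, hybrid)
print(t, f)
```

For exactly two groups the F statistic equals the square of the pooled t statistic, which is a convenient sanity check on both functions.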

After a series of T-tests and ANOVA F-tests we got the following results.

Exploration time: There is no difference in the mean Exploration time among the interfaces.

Naming time: The Textual interface has the largest mean Naming time.

Evaluation time: The Graphical interface has a larger mean Evaluation time than the Textual and Hybrid interfaces.

We wanted to see if there was a relationship between ClusterFitRatings and Evaluation times or Naming times for the cluster collection.

We also wanted to see if there was a relationship between Name Ratings and Evaluation times or Naming times for the cluster collection.

Based on the results from the preceding four pages and a regression test (a test for a linear relationship between a response variable and an explanatory variable), we made the following observations.

When participants gave ClusterFitRating = 4, their mean Evaluation time and Naming time were shorter than for the other ClusterFitRatings, in all interfaces.

In all interfaces, users either had shorter mean Evaluation and Naming times when they gave a Name Rating of -1 or 2 than when they gave other ratings, or there was no significant time difference.

There are linear correlations between ClusterFitRatings and Name Ratings in all interfaces.
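The kind of linear relationship examined in the observations above can be sketched with a Pearson correlation coefficient and a least-squares line. The function names and sample values are hypothetical, and the sketch stops short of the significance test on the slope that a full regression test would include.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up data: explanatory variable = rating, response variable = time.
ratings = [0.0, 1.0, 2.0, 3.0, 4.0]
times = [50.0, 44.0, 41.0, 35.0, 30.0]

r = pearson_r(ratings, times)
slope, intercept = least_squares_line(ratings, times)
print(r, slope, intercept)
```

A negative slope here would correspond to shorter times at higher ratings.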

We wanted to see if there was a per query variation of task time among the three interfaces.

In order to do this, we grouped the queries that were evaluated with the three interfaces by different users. There were 16 queries that were evaluated by different users with the 3 interfaces. For each such query we tested for a difference in task time among the interfaces.

Based on the results from the last two pages, we made the following observations for the user triples of these 16 queries.

The Textual interface has a shorter mean Evaluation time and Exploration time than the Graphical and Hybrid interfaces.

The difference in mean Evaluation time and Exploration time between the Textual and Graphical interfaces is larger than the difference between the Textual and Hybrid interfaces or between the Graphical and Hybrid interfaces.

To see if there is an interface with the shortest Exploration time and Evaluation time for these 16 qualified queries, we found the minimum number of triples over all such queries, in order to best deal with the leftovers (the data remaining after grouping in triples).

After considering the number of triples per query and the outliers in these data, we set this minimum number to five.

This is part of the table with five randomly selected triples from each query.
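The grouping into triples and the random selection of five triples per query might look like the sketch below. It rests on several assumptions: the tuple layout, the function name, and the pairing rule (shuffle each interface's evaluations and pair them by position) are all hypothetical, and the sketch does not check that the three users in a triple are distinct.

```python
import random
from collections import defaultdict


def build_triples(records, per_query=5, seed=0):
    """Group evaluations into (Textual, Graphical, Hybrid) triples per query.

    `records` holds (query, interface, user_id, time) tuples with interface
    in {"T", "G", "H"}. Returns {query: list of `per_query` random triples}.
    """
    rng = random.Random(seed)
    by_query = defaultdict(lambda: {"T": [], "G": [], "H": []})
    for query, interface, user_id, time in records:
        by_query[query][interface].append((user_id, time))

    selected = {}
    for query, groups in by_query.items():
        # The number of full triples is limited by the least-used interface;
        # evaluations beyond that count are the leftovers.
        n_triples = min(len(groups[i]) for i in "TGH")
        if n_triples < per_query:
            continue  # too few triples for this query to qualify
        for i in "TGH":
            rng.shuffle(groups[i])
        triples = list(zip(groups["T"][:n_triples],
                           groups["G"][:n_triples],
                           groups["H"][:n_triples]))
        selected[query] = rng.sample(triples, per_query)
    return selected


# Made-up records: "q1" has six evaluations per interface, "q2" only three.
records = [("q1", i, f"u-{i}-{k}", float(k)) for i in "TGH" for k in range(6)]
records += [("q2", i, f"v-{i}-{k}", float(k)) for i in "TGH" for k in range(3)]
sample = build_triples(records)
print(sorted(sample), len(sample["q1"]))
```

Fixing the seed keeps the random selection reproducible, which helps when the selected triples feed later tests.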

After a series of T-tests, ANOVA F-tests and regression tests, we got the following results for this tripled set of 16 queries:

There is no difference in the means of Exploration times and Evaluation times for each interface.

There exist linear correlations between Exploration times and Evaluation times in the Graphical and the Hybrid interfaces, but not in the Textual interface.

The Textual interface has the largest mean Naming time.

The Graphical interface has the largest mean Evaluation time.

Participants who gave the highest ClusterFitRating had shorter mean Evaluation and Naming times in all interfaces.

There exists a linear correlation between ClusterFitRatings and Name Ratings in all interfaces, and a linear correlation between Exploration times and Evaluation times in the Graphical and Hybrid interfaces.

Name That Cluster online survey, http://gem1.rutgers.edu/userstudy/login.php

Abello, J., Schulz, H., Gaudin, B., and Tominski, C. (2007). Name That Cluster - Text vs. Graphics. IEEE InfoVis Conference, Sacramento, November 2007.

Ramsey, Fred L. The Statistical Sleuth: A Course in Methods of Data Analysis. Duxbury/Thomson Learning, 2002.


THE END
