Page 1: USING A 3D CYLINDRICAL INTERFACE FOR IMAGE BROWSING …vidosearch.com/publications/3DPhotoRing_preprint.pdf · TO IMPROVE VISUAL SEARCH PERFORMANCE ... known as a Storyboard [1]

USING A 3D CYLINDRICAL INTERFACE FOR IMAGE BROWSING TO IMPROVE VISUAL SEARCH PERFORMANCE

Klaus Schoeffmann, David Ahlström

Klagenfurt University, Austria
{Klaus.Schoeffmann, David.Ahlstroem}@aau.at

ABSTRACT

In this paper we evaluate a 3D cylindrical interface that arranges image thumbnails by visual similarity for the purpose of image browsing. Through a user study we compare the performance of this interface to the performance of a common scrollable 2D list of thumbnails in a grid arrangement. Our evaluation shows that the 3D Cylinder interface enables significantly faster visual search and is the preferred search interface for the majority of tested users.

1. INTRODUCTION

Image browsing is the process of interactively skimming through a collection of images in order to explore the collection or to find a particular image. Image browsing tools usually employ a grid-like arrangement of thumbnails, also known as a Storyboard [1]. Well-known examples using this approach include online photo sharing sites, such as Flickr, photo management tools, such as Google Picasa, and file managers in operating systems. With these browsing tools the list of thumbnails is usually sorted by some kind of metadata, e.g., filename or creation date. However, a user who has a specific image in mind would rather search by visual attributes than by metadata. This is particularly true if metadata are incomplete or invalid. Previous work has shown that similarity-based arrangement of images can help to improve visual search performance in terms of search time [2, 3, 4]. Unfortunately, these proposed approaches are bound to specific visualization layouts. For example, Schaefer [4] proposes a 3D globe layout and Rodden et al. [2] use a clustered visualization based on multi-dimensional scaling [5]. Both approaches share the problem of overlapping images, an issue that is solved by hierarchical refinement, which is problematic as it destroys the browsing context.

In this paper we evaluate an image browsing interface using a 3D cylinder with image thumbnails arranged by visual similarity, as shown in Figure 1. Through a user study with known-item search tasks we show that this interface enables significantly faster visual search than a 2D list of images sorted by the same visual similarity.

Fig. 1. Cylindrical 3D interface for image browsing.

2. RELATED WORK

Although several 3D interfaces for video retrieval (with ranked result lists) can be found in the literature (e.g., [6, 7, 8]), only a few 3D interfaces have been proposed for image browsing.

A 3D globe interface for image browsing has been proposed by Schaefer [4]. This interface uses an interactive 3D sphere layout where image thumbnails are arranged according to the Hue and Value attributes of the HSV color space. Schaefer argues that this arrangement is very intuitive for humans. We share this opinion and we also use the HSV color space as a basis for the color-based similarity arrangement used in our interface. The sphere can be rotated horizontally and vertically, and image clusters can be inspected in greater detail with a hierarchical refinement method. Unfortunately, no user study has been performed that demonstrates the ease of use in either browsing or visual search tasks.

A cylindrical 3D visualization has already been proposed by Christmann et al. [9], who have discussed two different perspectives that show excerpts of a 3D cylinder: an inner view and an outer view. They speculate that the inner view should provide a more intuitive and richer experience than rotating the cylinder seen from an outside perspective. Christmann and colleagues further hypothesize that the outer view, which distorts larger items more than smaller items, will result in better visual search efficiency than the inner view, where the smaller items are distorted more than the larger items. However, their evaluation shows no difference in terms of search efficiency. Although no direct comparison was made with a standard 2D interface, participants frequently reported positive attitudes and possible advantages with the 3D visualization. However, to the best of our knowledge, no one has shown, through user tests with statistical evaluations, that a 3D storyboard significantly outperforms a 2D storyboard in the context of visual image search.

This is a preprint of the paper to be published by IEEE in the Proceedings of WIAMIS 2012. (c) IEEE

3. INTERFACES

We evaluate the following two image browsing interfaces:

The Grid interface, a color-sorted and scrollable storyboard that arranges thumbnails in a grid with three columns in row-major order (bottom left part of Figure 2(a)).

The Cylinder interface, a color-sorted and rotatable 3D cylinder that arranges thumbnails in a horizontal circle with three rows in column-major order (depicted in Figure 1).

Both interfaces use roughly the same screen real estate and show image thumbnails sorted by the color sorting algorithm proposed in [10]. This algorithm sorts images in two stages. First, images are grouped by their dominant Hue attribute into a maximum of sixteen areas (each group corresponds to 22.5 degrees in the Hue circle of the HSV color space). Second, the images in each group are sorted by an HSV color histogram using Euclidean distance. Two additional groups are used for bright and dark images at the beginning and the end of the list. This produces an intuitively sorted list of images (cf. Figure 2), as the user survey in [10] has shown.
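The two-stage sorting described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from [10]: the brightness and darkness thresholds, the field names (`dominant_rgb`, `hist`), and the choice to order each group by Euclidean distance to the group's mean histogram are all assumptions made for the example, since the text does not specify them.

```python
import colorsys
import math

NUM_HUE_GROUPS = 16  # 360 / 16 = 22.5 degrees per group (from the text)

# Hypothetical thresholds for the two extra groups; not taken from the paper.
BRIGHT_MIN_VALUE = 0.9
BRIGHT_MAX_SATURATION = 0.1
DARK_MAX_VALUE = 0.15


def hue_group(hue_degrees):
    """Map a hue angle in degrees to one of 16 groups of 22.5 degrees each."""
    return int(hue_degrees % 360 // 22.5)


def group_key(dominant_rgb):
    """Return 0 for bright images, 1..16 for hue groups, 17 for dark images."""
    r, g, b = dominant_rgb
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if v <= DARK_MAX_VALUE:
        return NUM_HUE_GROUPS + 1  # dark images go to the end of the list
    if v >= BRIGHT_MIN_VALUE and s <= BRIGHT_MAX_SATURATION:
        return 0  # bright images go to the beginning of the list
    return 1 + hue_group(h * 360.0)


def euclidean(a, b):
    """Euclidean distance between two equally long histograms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def color_sort(images):
    """Stage 1: group by dominant hue. Stage 2: sort each group by histogram.

    `images` is a list of dicts with the (hypothetical) keys 'name',
    'dominant_rgb' (floats in 0..1) and 'hist' (a list of floats).
    """
    groups = {}
    for img in images:
        groups.setdefault(group_key(img["dominant_rgb"]), []).append(img)

    ordered = []
    for key in sorted(groups):
        members = groups[key]
        # Assumption: the reference point is the group's mean histogram.
        mean = [sum(col) / len(members) for col in zip(*(m["hist"] for m in members))]
        members.sort(key=lambda m: euclidean(m["hist"], mean))
        ordered.extend(members)
    return ordered
```

With these assumed thresholds, a pure white image lands in the leading bright group, a black image in the trailing dark group, and saturated colors in between in hue order.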

The Grid interface provides scrolling by mouse-wheel as well as by a scrollbar at the right-hand side. The Cylinder interface provides scrolling (i.e., rotation of the cylinder) by the mouse-wheel. This interface uses real 3D graphics to align image thumbnails in a horizontal ring that can be rotated. Our implementation of the Cylinder interface uses a gap to indicate the beginning and the end of the list (between dark and bright images, as shown in the lower left part of Figure 1). Each ‘screen’ consists of six triangles with the thumbnail as texture, such that the image thumbnails are smoothly bent. The Cylinder interface provides several advantageous characteristics: (a) through the 3D perspective it can show more images at a glance than a common 2D list; this enables a user to keep many images in view while concentrating on specific images in the front, (b) the cylinder can be intuitively rotated in order to bring images from the back to the front for inspection in greater detail, (c) quick navigation to different areas in the cylinder is possible by a right mouse click, which instantly turns the clicked area to the front of the cylinder. More details about this interface can be found in [11].
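To make the ring layout concrete, the sketch below computes where a thumbnail at a given list position could sit on a cylinder with three rows, column-major order, and a gap between the end and the beginning of the list. The radius, row height, and gap width are illustrative assumptions, and `cylinder_position` is a hypothetical helper, not part of the software described in [11].

```python
import math

ROWS = 3  # three rows, as described in the text


def cylinder_position(index, total, radius=1.0, row_height=0.3, gap_degrees=20.0):
    """Return (x, y, z, angle_degrees) for the thumbnail at list position `index`.

    Column-major order: three consecutive images fill one column from top to
    bottom, then the next column starts. `radius`, `row_height` and
    `gap_degrees` are assumed values chosen only for illustration.
    """
    columns = math.ceil(total / ROWS)
    column, row = divmod(index, ROWS)

    # Spread the columns over the ring, leaving a gap between list end and start.
    step = (360.0 - gap_degrees) / columns
    angle = gap_degrees / 2.0 + (column + 0.5) * step

    theta = math.radians(angle)
    x = radius * math.sin(theta)
    z = radius * math.cos(theta)
    # Row 0 is the top row; the middle row sits at y = 0.
    y = (ROWS - 1) / 2.0 * row_height - row * row_height
    return x, y, z, angle
```

For a list of 150 images this yields 50 columns; rotating the cylinder then amounts to adding a common offset to every angle before rendering.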

4. USER EXPERIMENT

To compare the two interfaces we conducted a user experiment with a within-subjects design where participants performed a series of image search tasks in both the color-sorted grid storyboard and the color-sorted cylinder.

4.1. Participants and Apparatus

Twelve right-handed volunteers (six female) aged 27 to 38 years (mean 30.5, SD 3.8) participated in the experiment. All were experienced computer users and their self-estimated computer usage per week ranged from 10 to 60 hours (mean 42.8, SD 16.1). All participants had participated in earlier similar user experiments and were thus familiar with both the color sorting and the two interfaces. The experiment was conducted on a Dell Precision M4400 laptop (running Windows 7) with its 15.4-inch display set at a resolution of 1440×900 pixels. A laser mouse (Dell LaserStream) was used as input device. The experiment software was coded in C# .NET 4.0 and used the Microsoft XNA Framework 4.0.

4.2. Materials and Task

Each participant was required to complete 120 image search trials, 60 trials with each of the two interfaces. Each trial consisted of two phases: an inspection phase and a search phase, as shown in Figure 2(a). In the inspection phase participants were shown the image they were required to find in the search phase. Participants were told that they could inspect the prompted image as long as needed and that after inspection they had to search for the image in a display of several images without recourse to the prompted image. A click inside the prompted image ended the inspection phase, displayed the search interface (the Grid or the Cylinder) and started timing. Timing ended when an image in the search interface was clicked. After that, the software returned to the inspection phase and loaded the next trial by prompting a new image. Trials where the wrong image was selected were logged as errors and were re-queued at a random position among the unfinished trials.

The Grid interface used 50 rows of 3 images (each 160×100 pixels), and seven rows of images fit on the screen. The Grid could be navigated through a red scrollbar or by using the mouse-wheel. The navigation mechanisms and image layout used for the Cylinder are described in Section 3.

[Figure 2: panel (a) shows the inspection phase and the search phase for the Grid and the Cylinder; panel (b) marks target groups 1 to 10 on the Cylinder.]

Fig. 2. (a) Screenshots of the experimental software during the two trial phases. (b) Limits of the ten target groups within the Cylinder (not visible in the experimental software).

We used 120 predefined lists of 150 images to populate the search interfaces, one list of images for each trial. The image lists were created by randomly drawing from a pool of 1,100 unique images taken from random key-frames of the IACC.1 TRECVID 2010 repository [12]. All images in a list were unique (but an image could appear in more than one list). Each list was sorted according to our color sorting algorithm and the lists were divided into two sets of 60 lists of images, set A and set B. We then divided each list into ten equally sized target groups (with the first 15 images in a list belonging to target group 1, the next 15 images to target group 2, and so forth) and selected one image in each list to serve as the target image in a trial. The target images were selected in such a way that (1) no image served as target in more than one trial, and (2) across all 60 lists in set A and in set B the target images were evenly distributed between the 10 target groups (the exact position inside a target group was randomly selected). The experimental software loaded images sequentially from a list into the search interface using row-major order for the Grid interface and column-major order for the Cylinder (Figure 2(b) visualizes the limits of each target group in the Cylinder). Thus, after completing the 60 trials with one interface a participant had searched for images in all areas of the search interface, i.e., each of the ten target groups had been used to host the target image in six trials. The order in which the image lists were used throughout a sequence of 60 trials was random.
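The target-group scheme described above (ten groups of 15 consecutive list positions, six targets per group across a set of 60 lists) can be sketched as follows. The function names and the fixed random seed are hypothetical, and the sketch only balances positions across groups; it does not enforce the additional constraint that no image serves as a target in more than one trial.

```python
import random

LIST_LENGTH = 150                    # images per list (from the text)
GROUPS = 10                          # target groups per list (from the text)
GROUP_SIZE = LIST_LENGTH // GROUPS   # 15 images per target group


def target_group(position):
    """Return the 1-based target group of a 0-based list position."""
    return position // GROUP_SIZE + 1


def pick_target_positions(num_lists=60, seed=0):
    """Pick one target position per list so that every target group hosts
    the same number of targets (num_lists / GROUPS) across the whole set.
    """
    rng = random.Random(seed)
    # Six copies of each group number for 60 lists, in random list order.
    groups = [g for g in range(1, GROUPS + 1) for _ in range(num_lists // GROUPS)]
    rng.shuffle(groups)
    # The exact position inside the chosen group is random.
    return [(g - 1) * GROUP_SIZE + rng.randrange(GROUP_SIZE) for g in groups]
```

Calling `pick_target_positions()` once per image set would give 60 positions in which each of the ten target groups appears exactly six times.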

The 12 participants were divided into two groups of six. The first group started with the Grid interface, the other group with the Cylinder interface. Half of each group were presented with images from image set A when using the first interface and images from set B for their second interface. The other half used set B for their first interface and set A for their second interface.
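The counterbalancing just described has four conditions (two interface orders times two image-set orders), with three participants each. A small sketch of one way to produce such a plan is given below; the round-robin assignment and the `counterbalance` helper are assumptions, since the text does not say how individual participants were assigned to conditions.

```python
from itertools import product

INTERFACE_ORDERS = [("Grid", "Cylinder"), ("Cylinder", "Grid")]
SET_ORDERS = [("A", "B"), ("B", "A")]


def counterbalance(participants):
    """Assign each participant an ordered list of (interface, image set) pairs.

    Participants are distributed round-robin over the four conditions
    (an assumed assignment rule, used here only for illustration).
    """
    conditions = list(product(INTERFACE_ORDERS, SET_ORDERS))
    plan = {}
    for i, participant in enumerate(participants):
        interfaces, image_sets = conditions[i % len(conditions)]
        plan[participant] = list(zip(interfaces, image_sets))
    return plan
```

For twelve participants this gives six who start with the Grid and six who start with the Cylinder, and within each of those groups three start with set A and three with set B.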

The number of error-free trials collected in the experiment can be computed as follows: 12 participants × 2 interfaces (Grid, Cylinder) × 10 target groups × 6 trials = 1440 trials. A test session lasted approximately 40 minutes. Each participant had a 5-minute break between the two interfaces and completed 10 practice trials before starting the logged trials with a new interface.

4.3. Results

Data from a total of 1554 trials was collected. In 114 trials (7.3%) a wrong image was selected. These erroneous trials were roughly equally distributed between the two interfaces and the ten target groups.
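As a quick consistency check of the reported counts, the following lines recompute the number of error-free trials, the total number of trials, and the error rate from the figures given in the text. The variable names are chosen for this example only.

```python
participants = 12
interfaces = 2
target_groups = 10
trials_per_group = 6

# Error-free trials required by the design.
error_free = participants * interfaces * target_groups * trials_per_group

# Erroneous trials were re-queued, so they add to the total collected.
errors = 114
total = error_free + errors
error_rate = errors / total

print(error_free, total, f"{error_rate:.1%}")  # prints: 1440 1554 7.3%
```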

Since the time measures in the error-free data set exhibited positively skewed distributions we applied a logarithmic transformation (which resulted in distributions close to normal) to the original measurements before analyzing the results. A 2 × 10 RM-ANOVA (repeated measures analysis of variance) with factors interface (Grid, Cylinder) and target group (1 to 10) showed significant main effects for interface (F(1,11) = 6.78, p < 0.05) and target group (F(9,99) = 63.38, p < 0.0001). Figure 3(a) shows the geometric means for the two interfaces (i.e., the antilog of the mean of the log-transformed data). With 4.8 seconds, the Cylinder was overall 12.7% faster than the Grid (5.5 seconds). However, this difference has to be seen in combination with a significant interface × target group interaction effect (F(9,99) = 5.68, p < 0.0001), indicating that the magnitude of the difference depends on target group. In Figure 3(b) we see that target groups 5, 6, 7, 8, and 9 were the contributors to the overall difference.
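The geometric mean used above is the antilog of the mean of the log-transformed times. The short sketch below shows that computation and the relative speed-up derived from the two reported means; the function names are hypothetical and the sample input in the usage note is made up for illustration, not taken from the study data.

```python
import math


def geometric_mean(times):
    """Antilog of the mean of the natural-log-transformed trial times."""
    logs = [math.log(t) for t in times]
    return math.exp(sum(logs) / len(logs))


def percent_faster(slow, fast):
    """Relative reduction in time of `fast` compared with `slow`, in percent."""
    return (slow - fast) / slow * 100.0
```

For example, `geometric_mean([2.0, 8.0])` is 4.0 (up to floating-point rounding) rather than the arithmetic mean of 5.0, which is why the measure is less sensitive to a few very long trials. Applied to the reported means, `percent_faster(5.5, 4.8)` gives about 12.7.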

[Figure 3: panel (a) plots trial time (sec, axis 0 to 10) for Grid and Cylinder; panel (b) plots trial time (sec, axis 0 to 10) against target group (1 to 10) for Grid and Cylinder.]

Fig. 3. (a) Geometric mean trial times across target groups for each interface. (b) Geometric mean trial times for each interface × target group combination. (Error bars indicate 95% confidence intervals.)

The results presented in Figure 3(b) are most easily explained when considering how the two interfaces were displayed at the beginning of a trial, as depicted in Figure 2(a) and (b). Since the initial state of both interfaces displays the first target group completely and prominently on the screen, extremely fast selections are possible. Selections in target group 2 cost more due to scrolling operations in the Grid and due to rotation operations or the inspection of slightly smaller and distorted images in the Cylinder. For the Grid, we see that trial time continues to increase, almost linearly across target groups 1 to 8, as more scrolling is needed to access images in more distant target groups. For the Cylinder, on the other hand, we see how the occluded images in target group 3 cause trial time to peak and that trial time settles at a constant level thereafter for target groups 4 to 8. When the target image was positioned in any of these target groups, or in target group 9, participants could capitalize on the cylindrical design. As visible in Figure 2(b), the initial view of the Cylinder enables fast access to a target image in the background (initially distant target groups) through a direct left-click, without rotation (or extensive scrolling, as necessary with the Grid). Alternatively, a possible target image is easily brought to the foreground for inspection at larger size by a right-click in its surroundings. Furthermore, since the different color clusters are visible in the initial view of the Cylinder, valuable initial guidance to “hot” areas is provided.

The abrupt decrease in trial time after target group 8 with the Grid clearly reflects the importance of such guidance: only with dark target images could participants confidently narrow down the search area, knowing that dark images were positioned at the end of the Grid, i.e., in target groups 9 and 10. The locations of images of other colors were harder to approximate since the Grid did not provide an orienting overview where color clusters are visible. Target group 9 was also fast with the Cylinder. Having seen a dark image in the inspection phase of a trial, participants anticipated the target location in the leftmost part of the screen (target groups 9 and 10) and could thus save time by not needing to scan the Cylinder for the location of the desired color cluster, as was necessary for target images of other colors. Finally, the occlusion in target group 10 cost extra time, as in target group 3.

5. CONCLUSIONS

We have evaluated an interface for image browsing that takes advantage of 3D graphics in order to show more information at a glance than a common 2D image list. This interface provides an intuitive arrangement of image thumbnails with minor occlusion and only slight perspective distortion. It gives the user the possibility to keep a large number of images in view while inspecting images in the front at high detail. Moreover, the interaction with the cylindrical arrangement by rotation maps well to the affordances of the mouse-wheel. We have evaluated this interface in a direct comparison to a common 2D storyboard. We used color-sorted image collections, as it has been shown in the past that color sorting can improve image browsing in terms of search time. In our evaluation we have shown that the proposed 3D interface can further improve visual search time and significantly outperform a color-sorted 2D storyboard. In future work we will continue evaluations with larger image data sets (i.e., with more than 150 images).

Acknowledgment

This work was supported by Lakeside Labs GmbH, Klagenfurt, Austria and funding from the European Regional Development Fund and the Carinthian Economic Promotion Fund (KWF) under grant KWF - 20214 22573 33955.

6. REFERENCES

[1] F. Arman, R. Depommier, A. Hsu, and M.Y. Chiu, “Content-based browsing of video sequences,” in Proc. of ACM Conf. on Multimedia, 1994, pp. 97–103.

[2] K. Rodden, W. Basalaj, D. Sinclair, and K. Wood, “Does organisation by similarity assist image browsing?,” in Proc. of the ACM Conf. on Human factors in computing systems, 2001, pp. 190–197.

[3] G.P. Nguyen and M. Worring, “Interactive access to large image collections using similarity-based visualization,” J. of Visual Lang. and Computing, vol. 19, no. 2, pp. 203–224, 2008.

[4] G. Schaefer, “A next generation browsing environment for large image repositories,” Multimedia Tools and Applications, vol. 47, pp. 105–120, 2010.

[5] W. Basalaj, “Incremental multidimensional scaling method for database visualization,” in Proc. of Visual Data Expl. and Analy. VI, 1999, vol. 3643, pp. 149–158.

[6] Y.T. Zheng, S.Y. Neo, X. Chen, and T.S. Chua, “Visiongo: towards true interactivity,” in Proc. of the ACM Conf. on Image and Video Retrieval, 2009, pp. 51:1–51:1.

[7] C. Snoek, M. Worring, D. Koelma, and A. Smeulders, “Learned lexicon-driven interactive video retrieval,” Image and Video Retrieval, pp. 11–20, 2006.

[8] P. Chiu, J. Huang, M. Back, N. Diakopoulos, J. Doherty, W. Polak, and X. Sun, “mTable: browsing photos and videos on a tabletop system,” in Proc. of the ACM Conf. on Multimedia, 2008, pp. 1107–1108.

[9] O. Christmann, N. Carbonell, and S. Richir, “Visual search in dynamic 3D visualisations of unstructured picture collections,” Interacting with Computers, vol. 22, no. 5, pp. 399–416, 2010.

[10] K. Schoeffmann and D. Ahlström, “An evaluation of color sorting for image browsing,” International Journal of Multimedia Data Engineering and Management (IJMDEM), vol. 3, no. 1, pp. 49–62, 2012.

[11] K. Schoeffmann and D. Ahlström, “3D storyboards for interactive visual search,” in Proceedings of the IEEE International Conference on Multimedia and Expo, 2012, IEEE.

[12] A.F. Smeaton, P. Over, and W. Kraaij, “Evaluation campaigns and TRECVid,” in Proc. of the ACM Workshop on Multimedia Information Retrieval, 2006, pp. 321–330.