

Detection of Imperceptible On-Screen Markers with Unsynchronized Cameras

Luiz Sampaio*1 Yoshio Yamada*2 Goshiro Yamamoto*3

Takafumi Taketomi*4 Christian Sandor*5 Hirokazu Kato*6

Abstract – We present a method for detecting on-screen markers to identify physical displays and to estimate camera poses relative to these displays. It is difficult for humans to perceive the variation of two alternating colors when these colors switch at a frequency higher than the flicker fusion threshold. On the other hand, the variation is detectable by off-the-shelf rolling shutter cameras with certain frame-rate settings. By leveraging this difference between human visual characteristics and camera characteristics, imperceptible markers can be embedded into a display screen. In this paper, we introduce our technique in mobile-device-based user interfaces for manipulating digital content on large distributed displays. Furthermore, we show that our prototype system enables an augmented reality application on a mobile device using the marker-embedded displays. Lastly, we discuss potential scenarios and applications.

Keywords: information embedding, visual performance, sensing behavior of camera, multi-display environment.

1 Introduction

Public displays and digital billboards have become common in most urban centers over the last years. However, most public displays still lack interactivity, remaining simply electronic versions of their predecessors. Those that allow some kind of interaction are not designed for simultaneous access, blocking all other interested people while one person is using them. With the proliferation of mobile devices, it is clear that an opportunity to expand this interaction remains poorly explored: they can work as an extension of the public display, allowing each user to have their own private version of the remote content.

The first step to achieve this goal is the identification of the remote display by the user's mobile device. Other researchers have used visible tags inside and outside the screen, but these markers are obtrusive and degrade the content visualization. Recent works explored the detection of visible salient features (edges) in the displayed image [1]; however, this is impractical when identifying one among various displays showing the same content, or when the image has only few or even no points of interest.

*1–*6 Nara Institute of Science and Technology

Fig. 1 The difference in perception between the human eye and a camera.

The main contribution of this paper is a method for detecting visual codes embedded in textured images, using off-the-shelf cameras, while remaining imperceptible to the human eye. Unlike other works, our method does not require synchronization between the screen frames and the camera capture, which makes it especially interesting for unmanaged environments, such as public places. Additionally, as the computation of the target color is done separately for each pixel, the visual codes can be inserted in any region of the image, regardless of the type of texture the image contains.


The Virtual Reality Society of Japan, Technical Committee on Mixed Reality, Vol. 19, No. 1, 2015

2 Related Work

The idea of exploiting the difference between how camera sensors and the Human Visual System perceive colors is not new, and it has been explored for different purposes over the years. VRCodes [1] employs the rolling-shutter camera built into smartphones to detect binary codes embedded in active displays. By switching complementary hues at 60 Hz, the human eye sees just a flat gray-colored screen, while the mobile camera sees the decomposed aliasing effects of the changing hues. While the use of a rolling-shutter camera seems to be a strict constraint, our approach proved to be efficient with both rolling-shutter and global-shutter cameras. Moreover, there is no information regarding the use of this technology on the whole screen while displaying content simultaneously: when a newspaper headline is shown, the binary code is applied only to the top of the screen.

Grundhofer et al. [2] propose a method for embedding binary codes integrated with the projected image. The embedded code is dynamically adjusted to guarantee its imperceptibility and to adapt it to the actual camera pose. However, to achieve the marker tracking, the camera, the projector, and an LED illumination module must be interconnected with a synchronization unit, which makes this approach not applicable for our target audience.

While our approach is based on the detection of imperceptible markers, most other researchers have focused on the detection of visible feature points on the screen. Touch Projector [3] and Virtual Projection [4] rely on matching features detected in the camera image against the image displayed on the remote screen. A drawback of this technique is that it is content-dependent, that is, the system is not able to distinguish different screens showing the same content. Visual SyncAR [5] applies a digital watermark to videos that, when captured by the smartphone's camera, can show computer graphics animations synchronized with the video. However, in order to compute the camera position, the four corners of the screen must be within the camera image, which can be difficult to achieve with the large public displays we are aiming at.

Alternative ways to track the camera position have also been tested. Berge et al. [6] described the application of Polhemus Patriot Wireless trackers, using the magnetic field created between the transmitter and the receiver to determine the position relative to the display.

3 Design and Detection of On-Screen Markers

3.1 Imperceptible On-Screen Markers

The flicker fusion threshold is the frequency at which an intermittent light appears to be steady to a human observer. In the same way, complementary colors, when alternated faster than this frequency, are perceived as a combination of both. Since a higher switching frequency tolerates a larger color difference, at 120 Hz it is possible to use colors with a higher amplitude and still keep the same effect. In our system, as most standard displays run with a refresh rate of 60 Hz, we work with lower-contrast colors.
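The idea above can be sketched in a few lines of NumPy: a target color is split into two frames whose temporal average equals the original, so that alternating them above the flicker fusion threshold is perceived as the unmodified color. The function name and the delta value are illustrative, not taken from the paper.

```python
import numpy as np

def split_complementary(target, delta):
    """Split a target RGB image into two frames whose temporal average
    equals the target. Alternated above the flicker fusion threshold,
    the pair is perceived as the original color; `delta` sets the
    contrast (kept small for 60 Hz displays, larger is viable at
    120 Hz). Note that clipping near 0 or 255 would bias the average."""
    t = target.astype(np.int16)
    frame_a = np.clip(t + delta, 0, 255).astype(np.uint8)
    frame_b = np.clip(t - delta, 0, 255).astype(np.uint8)
    return frame_a, frame_b

# Mid-gray split with a small offset: the two frames average back to gray.
gray = np.full((4, 4, 3), 128, dtype=np.uint8)
a, b = split_complementary(gray, delta=8)
```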

3.2 Detection and Tracking

The concept of a "beat" is well known in acoustics, where it is defined as the interference that occurs between two sound waves of slightly different frequencies. Based on this idea, we designed our approach to capture the video frames without relying on signal synchronization. Considering the available displays, the standard refresh rate is 60 frames per second. By setting the camera to 45 frames per second, we found that 3 frames are necessary in order to extract the marker.
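The 60 Hz / 45 fps "beat" can be verified with a few lines of exact arithmetic. The rates match those stated above; the script itself is only an illustration of why one round consists of 3 frames.

```python
from fractions import Fraction

DISPLAY_HZ = 60   # standard display refresh rate
CAMERA_FPS = 45   # camera capture rate used in our setup

# Phase of each camera exposure within the display's frame period.
# Since 60/45 = 4/3, every 3 camera frames span exactly 4 display
# frames, so the phase pattern repeats with period 3: one "round"
# of 3 frames samples all distinct display/camera alignments.
period = Fraction(1, DISPLAY_HZ)
phases = [Fraction(n, CAMERA_FPS) % period for n in range(6)]
print(phases[:3])                  # three distinct phases
print(phases[:3] == phases[3:6])   # prints True: cycle length is 3
```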

4 Experimental Results

4.1 Implementation

Our system is organized in a distributed architecture, as shown in Fig. 2. The client application runs on a Sony Vaio Tap 11 Tablet PC. As we need to capture the video at 45 fps, it is so far not possible to use the built-in camera. Instead, we use a PlayStation Eye, which offers a wide range of modes for controlling exposure, gain, and so forth. The main function of the client application is to continuously capture "rounds" of 3 frames and extract the binary pattern by subtracting the color difference of one frame from another.
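The per-round extraction step can be sketched as frame differencing. The threshold value and the use of the pairwise maximum of absolute differences are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def extract_pattern(frames, threshold=15):
    """Recover the embedded binary pattern from one "round" of 3
    captured frames. Static scene content cancels out in the frame
    differences; only pixels whose color alternates between frames
    (the embedded marker) survive the threshold."""
    f0, f1, f2 = (f.astype(np.int16) for f in frames)
    diff = np.maximum(np.abs(f0 - f1), np.abs(f1 - f2)).max(axis=-1)
    return diff > threshold

# Synthetic round: a flat image with one pixel alternating by +/-10.
base = np.full((4, 4, 3), 100, dtype=np.uint8)
hi, lo = base.copy(), base.copy()
hi[1, 1] += 10
lo[1, 1] -= 10
pattern = extract_pattern([hi, lo, hi])
```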

The binary image is then sent to the server application, which runs on an Intel Core i7 3.7 GHz with 32 GB of RAM. This application receives the binary image from the client, performs the pattern recognition using Locally Likely Arrangement Hashing (LLAH), and, from the computed homography matrix, performs the camera pose estimation. Additionally, this application manages the running instances of the client applications and the pattern generators (which run on the displays). Once the marker is detected and the projection matrix is computed, it is sent back to the client application for rendering augmented content in the correct position. In our case, we show annotations regarding the image that is displayed on the screen.

Fig. 2 System configuration.
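Given the dot correspondences recovered by LLAH, the homography can be estimated with the standard direct linear transform (DLT). This NumPy sketch assumes the matched point pairs are already available; it is not the paper's actual implementation.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (Nx2 arrays,
    N >= 4) via the direct linear transform. Each correspondence
    contributes two rows to a homogeneous system A h = 0, whose
    null space (last right singular vector) gives H up to scale."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Points related by a known scale-and-translate recover that transform.
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = src * 2 + np.array([3.0, 5.0])
H = homography_dlt(src, dst)
expected = np.array([[2.0, 0, 3], [0, 2, 5], [0, 0, 1]])
```

In the full pipeline, the camera pose would then be decomposed from this homography together with the camera intrinsics.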

The "pattern generator" is the application that runs on the remote displays; for that, we use an Intel Core i7 3.20 GHz with 8 GB of RAM. To embed the marker in the image we want to show, we generate a Random Dot Pattern [7] beforehand. Then, the system computes the complementary colors on the target image in accordance with the marker's position. As a result, we get two images, each of them with the embedded markers in one of the complementary colors. When these two images are flickered at 60 fps, the human eye can perceive only the resulting color. This application also receives events from the server application. For instance, when the user swipes a finger on the tablet screen, the coordinates are sent to the pattern generator through the server application, which results in a line being drawn on the remote screen. For displays, we used a 50” Panasonic TH-50PF Plasma Screen, but we also conducted tests with diverse models of standard PC screens.
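The embedding step described above can be sketched as follows: a random dot mask is generated (cf. the Random Dot Markers of [7]) and the pixels under each dot are offset by +delta in one frame and -delta in the other. The dot count, radius, and delta are illustrative parameters, not the paper's actual values.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def embed_dot_marker(image, num_dots, radius, delta):
    """Embed a random dot pattern into `image` by offsetting the
    pixels under each dot by +delta in one output frame and -delta
    in the other. Alternated above the flicker fusion threshold,
    the two frames are perceived as the unmodified image."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w), dtype=bool)
    centers = rng.integers(radius, [h - radius, w - radius],
                           size=(num_dots, 2))
    for cy, cx in centers:
        mask |= (ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2
    img = image.astype(np.int16)
    offset = np.where(mask[..., None], delta, 0)
    frame_a = np.clip(img + offset, 0, 255).astype(np.uint8)
    frame_b = np.clip(img - offset, 0, 255).astype(np.uint8)
    return frame_a, frame_b, mask

photo = np.full((32, 32, 3), 128, dtype=np.uint8)  # stand-in for the photo
frame_a, frame_b, mask = embed_dot_marker(photo, num_dots=5,
                                          radius=3, delta=8)
```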

4.2 Results

Figure 4 shows the result of embedding a dot marker in a picture of Tokyo. On the remote display, no dots can be seen, because we are switching the images with the dots in complementary colors at a frequency faster than the human eye can perceive. On the tablet display, we highlighted the dot pattern for illustration purposes. As the light from the environment and the brightness of the remote display can affect the detection of the pattern, we created a slider to control the camera exposure manually.

Fig. 3 Client application showing the dot pattern overlaid on the camera image.

Fig. 4 Example: annotations based on context.

From this basic framework, we can develop many applications. Figure 4 shows annotations about the buildings in the image. In a real-world application, the annotations could be presented according to the user's language, or according to some specific setting. We believe this kind of application can be very useful in environments such as museums, airports, shopping malls, and any other place that provides tailored information to a large and diverse audience.

Due to the subtraction of features across three consecutive frames, when we move the camera quickly the resulting image presents an excessive amount of noise, which makes the marker detection more challenging. We are currently studying ways to identify and separate the blobs generated by noise from those that are actually part of the markers.
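One simple candidate for separating noise blobs, sketched here purely for illustration (the paper states this problem is still under study), is to label connected components and discard blobs whose area falls outside the expected size range of a marker dot. The area thresholds are hypothetical.

```python
import numpy as np

def filter_blobs(binary, min_area, max_area):
    """Label 4-connected blobs in a binary image and keep only those
    whose area lies in [min_area, max_area]; motion noise tends to
    produce blobs outside the size range of genuine marker dots.
    Pure-Python flood fill, adequate only for small images."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    keep = np.zeros((h, w), dtype=bool)
    label = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and labels[sy, sx] == 0:
                label += 1
                labels[sy, sx] = label
                stack, blob = [(sy, sx)], []
                while stack:
                    y, x = stack.pop()
                    blob.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = label
                            stack.append((ny, nx))
                if min_area <= len(blob) <= max_area:
                    for y, x in blob:
                        keep[y, x] = True
    return keep

# One 3x3 marker dot plus an isolated noise pixel.
binary = np.zeros((10, 10), dtype=bool)
binary[2:5, 2:5] = True
binary[8, 8] = True
kept = filter_blobs(binary, min_area=4, max_area=20)
```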

For the same reason, our technology is not yet feasible for displaying dynamic content, such as videos and animations. In this case, as most of the colors and contours tend to change drastically at every frame, the marker detection becomes impossible.

5 Conclusion and Future Work

With the proliferation of public displays, and considering that mobile devices are nowadays ubiquitous, there is a huge opportunity to develop new applications and studies in human-computer interaction. However, most current approaches for tracking the remote display are obtrusive or strongly dependent on the content shown on the screen. In this paper, we presented an innovative method for detecting visual codes embedded in textured images, using commodity cameras with a 45 fps capture rate. The codes are embedded in the images by computing complementary colors which, due to the effect of metamerism, are perceived as just one resulting color when switched at a speed higher than the fusion frequency of the human eye.

As future work, we want to provide a deeper understanding of the Human Visual System, as well as a more detailed user study for a real evaluation of our technique. Tests with newer off-the-shelf displays with a refresh rate of 120 Hz will allow us to use higher-contrast colors, which can make the marker detection easier while still remaining unobtrusive to the human eye. Additionally, we intend to adapt this technology to work with movies. Then, a new range of applications can be developed on top of our framework.

References

[1] G. Woo, A. Lippman, R. Raskar: VRCodes: Unobtrusive and Active Visual Codes for Interaction by Exploiting Rolling Shutter. ISMAR 2012, pp. 59–64, 2012.

[2] A. Grundhofer, M. Seeger, F. Hantsch, O. Bimber: Dynamic Adaptation of Projected Imperceptible Codes. ISMAR 2007, pp. 1–10, 2007.

[3] S. Boring, D. Baur, A. Butz, S. Gustafson, P. Baudisch: Touch Projector: Mobile Interaction Through Video. SIGCHI, pp. 2287–2296, 2010.

[4] D. Baur, S. Boring, S. Feiner: Virtual Projection: Exploring Optical Projection as a Metaphor for Multi-Device Interaction. SIGCHI, pp. 1693–1702, 2012.

[5] S. Yamamoto, H. Tanaka, S. Ando, A. Katayama, K. Tsutsuguchi: Visual SyncAR: Augmented Reality which Synchronizes Video and Overlaid Information. The Journal of the Institute of Image Electronics Engineers of Japan, Vol. 43, pp. 397–403, 2014.

[6] L. P. Berge, M. Serrano, G. Perelman, E. Dubois: Exploring Smartphone-based Interaction with Overview+Detail Interfaces on 3D Public Displays. MobileHCI, pp. 125–134, 2014.

[7] H. Uchiyama, H. Saito: Random Dot Markers. IEEE VR, pp. 35–38, 2011.