Projector-Camera System for Flexible Interactive Projections

Andreas Ammer

TRITA-NA-E05041




Numerisk analys och datalogi, KTH, 100 44 Stockholm
Department of Numerical Analysis and Computer Science, Royal Institute of Technology, SE-100 44 Stockholm, Sweden


Master’s Thesis in Computer Science (20 credits) within the First Degree Programme in Mathematics and Computer Science, Stockholm University, 2005

Supervisor at Nada was Lars Bretzner

Examiner was Yngve Sundblad


Projector-Camera System for Flexible Interactive Projections

Abstract

This report describes a Master’s thesis project to create a projector-camera system. A projector-camera system is a fairly new interface for human-computer interaction where the user can interact with projections. The report starts off by examining some previous work and then goes on to study different solutions and approaches to the problems one faces when constructing a projector-camera system. After that I give a more detailed description of how the system was built and which important decisions I made. Finally, the report goes through some simple prototypes and lists the experiences gained when testing the system on ourselves as well as on some outside test users.

Projektor-kamerasystem för flexibla interaktiva projektioner

Sammanfattning

This report describes a Master’s thesis project in computer science whose goal was to create a projector-camera system. A projector-camera system is a fairly new interface within human-computer interaction where a user can interact with projections. The report begins by examining previous work in the area and then goes on to study some different solutions and approaches to the problems one encounters when creating a projector-camera system. It then describes in more detail how the system was built and which important decisions were made. Finally, a couple of simple prototypes are reviewed, along with the experiences observed when the system was tested on experienced as well as outside users.


Contents

1 Introduction
1.1 Motivation and Goals
1.1.1 Goals
2 Related Work
2.1 Projector-Camera Systems
2.2 Related Systems
2.3 Future Applications
3 Overall System Design
3.1 System Overview
3.2 Camera and Projector Setup
3.3 Tasks and Communication
3.3.1 Communication Protocol
3.3.2 Camera Image Processing
3.3.3 Projection Controller
3.4 Active Areas
3.5 Feedback
3.6 Example
4 Camera Image Processing
4.1 Fingertip Finding Algorithms
4.1.1 Motion Detection with a Static Reference Image
4.1.2 Motion Detection with Frequently Updated Reference Images
4.1.3 Color Matching
4.1.4 Template Matching
4.1.5 Conclusion
5 Detailed Implementation Information
5.1 System Pipeline
5.2 Active Areas
5.3 Difference Image Algorithm
5.4 Template Matching Algorithm
6 Projection Controlling
6.1 Overview
6.2 Interaction Delay and Visual Feedback
6.3 Communication
6.4 Slides
6.4.1 UniversalSlide
6.4.2 CustomSlide
7 Prototype Examples and Observations
7.1 Prototypes
7.1.1 Simple Slideshow
7.1.2 Example: FrogSlide
7.1.3 Example: TicTacToeSlide
7.2 Example System Run
7.3 Observations
8 Conclusions and Future Work
9 Summary
References
Appendix: slideshow.xml


1 Introduction

As computers get more advanced and the average user gets more used to interacting with them, research on creating better and more user-friendly interfaces is growing, along with research to make old interfaces more user friendly. This Master’s thesis studies a fairly new type of interface, analyzes different approaches to it and implements a simple prototype. The interface in question is called a projector-camera system. As the name implies, the interface consists of two important hardware components: a digital video camera and a projector. The projector is used to display information relevant to the user on a suitable surface. The camera then watches the user for any reactions or interactions (see figure 1).

Depending on what the user does, the projected display changes according to what the system interprets the user wanted. What the system actually looks for can differ. One example is to watch how close the user is to the projected display and update the display as the user gets closer. Another is to watch for user interactions within the display, as the user uses hands and fingers to interact with different active elements in the display. Since this is such a new invention, there is no standard or well-known way to implement it.

The report is organized as follows. First we summarize previous work in this area of research and look at some similar systems (Chapter 2). After that we give a brief overview of the hardware and software setup for this project (Chapter 3), and then study each part in detail, going through every implementation-related decision carefully (Chapters 4–6). Finally we end the report with a summary of what was done and what important lessons we learned.

Figure 1. A projector-camera system.


1.1 Motivation and Goals

Why is a projector-camera system needed? Other user interfaces such as touch-screen monitors or interactive computer terminals already exist and provide a solid base to choose from. The advantage of a projector is that the image can be displayed in a number of different locations and sizes. A projection can be several meters wide or just a few decimeters, whereas the size of a monitor is limited and not adjustable. Cabling is also simpler with a projector, as a projector is often placed where it cannot be seen. Another big advantage is that when the projector is turned off, the projection area is empty and free to use for other things, whereas a monitor that is turned off is just a waste of space.

A projector-camera system could for instance be used on a restaurant table to show the menu; once the customer has decided what to eat and drink, the projection would terminate and the table would be free to eat off of. Another kind of application is an information kiosk where a user gathers useful information. In this case the information can be displayed nearly anywhere on a vacant surface. The information could include directions to different parts of a large shopping mall, or details about certain articles, such as available sizes and stock.

1.1.1 Goals

The goal of this project is first of all to get a good understanding of the problems and characteristics of a projector-camera system. We want to create a fully functional prototype with one or two example implementations, such as an information kiosk and a simple game. Both the projector and the camera will be of standard configuration; no special hardware components will be used. An important goal is that once the system is finished, the process of creating new applications such as an information kiosk should be simple enough for anyone to carry out. The goal is not to create one single prototype, but a system that can handle many different prototypes specified by a user. We will not include a usability study in this report, but the possibility of doing one in the future is open, since the system will be flexible enough to create new prototypes to test.

Many problems will come up in the process of creating this system. Some of the big problems to be solved are:

• We will need to find an appropriate application to implement, as a way to motivate the construction of this system.
• Depending on the application, we need an efficient way to identify user interaction.
• We need to make sure the interface is simple enough for an inexperienced user.


2 Related Work

Most of the previous work in this area is theoretical research done in connection with evaluations or discussions of other types of human-computer interfaces. In this chapter we will study some of the existing practical implementations of projector-camera systems.

2.1 Projector-Camera Systems

Scientists at Siemens, such as C. Maggioni, have written several articles in this area, the early ones being strictly theoretical since the hardware that existed at the time could not support a real-time system. One of the first articles in this area [Maggioni, Christoph. 1995] discusses the use of hands and gestures as a big part of human-to-human interaction, and how they might also be used in human-computer interaction. As computers became more powerful, the theoretical work could increasingly be implemented as practical working applications.

A man who has been leading in inventing and creating implementations of projector-camera systems is the Frenchman François Bérard. His work in this area of research includes “The Magic Table” [Bérard. 2003], where users interact with a projection using colored magnets. In another project from Bérard, described in the article “Bare-Hand Human-Computer Interaction” [Bérard, Hardenberg. 2001], they successfully displayed several software applications, such as an internet browser where the user could click on links and scroll down the page. In this article Bérard describes an advanced algorithm to locate the user’s fingers as well as the user’s hands, and includes a wide range of different commands. For instance, scrolling was performed by spreading all fingers and moving the hand up and down, while clicking a link was done using only one stretched finger.

Figure 2. Left: the Everywhere Display projector system of IBM. Right: the round table with three items highlighted.


Another team of researchers, Claudio Pinhanez et al. at IBM, has implemented a number of working prototypes covering a wide range of ideas. Basically, they all work to give information to the user in different situations. In one article they describe four working implementations, including the Retail Store Application (RSA) [Pinhanez, Kjeldsen. 2003a]. The RSA displays information on a projection area next to a stack of clothes in a store and changes the display depending on how close the user is and which type of clothes the user is currently looking at. The information in this case was the different sizes available and how many articles were in stock. A similar idea was also tried on a round table, where displays changed along the side of the table as the customer walked around it, showing detailed information about the item closest to the customer (see figure 2, right). The round table is described more closely in an article [Pinhanez, Kjeldsen. 2003b] that also discusses the hardware setup on a more detailed level. The most striking feature of the systems developed by the IBM team is that the camera and the projector are mounted on a steerable motorized arm (see figure 2, left). This means they can control where the projection is displayed, and with software they are able to correct image distortion in real time. The system can project images on nearly any surface close enough to the projector. The IBM team has also created a system, more like Bérard’s Bare-Hand, where the user interacts with a display by pointing and clicking in it [Pinhanez, Kjeldsen. 2002]. Their system was sensitive to movement, whereas Bérard’s system was able to find hands and fingers that were not moving. This meant that to register a ‘click’ from the user some sort of movement was required, and the solution was a forward motion followed by a retracting motion.

Siemens’ researchers have more than theoretical results to show. In chapter 2 of the book [Cipolla, Pentland. 1998], Maggioni and Kämmerer discuss several different implementations and ideas. They created an Information Station where a user could gather useful information about various products by interacting with the projection using his hands.

2.2 Related Systems

Many applications similar to standard projector-camera systems exist, but with a demand for special hardware or with a different setup. A team of Swiss and American scientists has designed a projector-camera system [Starner, Leibe. 2003] with one significant difference in the hardware setup. The camera looks at the infrared (IR) spectrum; in effect it looks for heat signatures, where a regular camera just looks for any moving object. This solution has the advantage that you know almost for sure that it is a person interacting with the system and not a pen or other finger-shaped object. During their tests, items like coffee cups and heated food appeared but were easily filtered away, since these objects look nothing like a hand or a finger. The big disadvantage is that the hardware is very specific, whereas the regular projector-camera system could in theory use any camera and projector available.

The PDS (Portable Display System) created by Stanislav Borkowski et al. [Borkowski, Riff, Crowley. 2003] is a projector-camera system where the projector and camera are mounted on a motorized rack and the user can direct the projection using a small rectangular piece of cardboard (see figure 3). The camera finds the cardboard and follows it, telling the projector to do the same. It is a system similar to the one IBM used, but where IBM used theirs to give information, this system is only for changing the projection area without shutting down and restarting/recalibrating the system. It also lacks any interactive projections, which is what we will implement.

A project that is very relevant to this Master’s thesis, even though it is not a projector-camera system, is the work by Zhengyou Zhang at Microsoft Research [Zhang. 2003]. In the paper, Zhang goes through a number of applications that are controlled with finger and hand gestures captured by a camera (see figure 4). This project does not, however, use a projector.

In the area of recognizing fingers and hands, Alexandra L.N. Wong et al. have constructed a hand scanner that can identify a unique user by looking at the hand [Wong. 2002]. They have managed to create a system that can very accurately identify all parts of a user’s hand. They do this, however, by using a specially designed high-resolution hand scanner. Although the technique is interesting, the required input image is of extremely high resolution, nothing like what a standard digital video camera of today can produce. And since we are working with live video, this technique is too slow on today’s computers.

Figure 3. The PDS system. To the left is the hand-held cardboard, and to the right a projection has successfully been projected upon it.

Figure 4. One of the interactive implementations of Zhengyou Zhang. The user can interact with the displayed image by using her hands.

Mark Ashdown and Peter Robinson are two researchers at the University of Cambridge. They have created a personal workspace they call the Personal Projected Display [Ashdown, Robinson. 2003]. A regular desk is used to project a large desktop workspace where the user can interact with the display using an electronic pen (see figure 5). Two projectors are used: one to display the major part of the workspace and another, with higher resolution, to view documents and other text in more detail. The electronic pen proved to be an intuitive and easy-to-learn interface that even the most inexperienced user could master reasonably fast.

For more accurate location of the fingertips, two scientists at Brigham Young University created a system with two cameras [Fails, Olsen. 2002], giving the system the ability to locate the user’s hand in a 3D environment. This project did not include a projector, but the use of two cameras for a better ability to locate the hand is certainly an interesting implementation feature.

Figure 5. The Personal Projected Display. In practical use to the left and the hardware setup sketch to the right.


2.3 Future Applications

Research in this area is ongoing and develops further as new hardware and faster, smaller computers become available to scientists. Cameras are now small enough to be mounted in personal cell phones at low cost. Scientists predict that soon projectors too will be small and cheap enough to install in cell phones. This means every cell phone could be a portable projector-camera system, and the applications they could support are limited only by the imagination of the scientists. One research team is working on a virtual keyboard [Virtual Devices Inc. 2004], where the phone projects a keyboard on any available surface and the camera looks for fingertips in the area (see figure 6).

Figure 6. A virtual keyboard built into a PDA makes typing easier as long as there is a large enough planar area around.


3 Overall System Design

In this chapter we give a brief overview of the system and of the first decisions regarding software design.

3.1 System Overview

The projector-camera system consists of two parts. One part controls the projector and what images to show; we call this the Projection Controller or PC. The other part controls the camera, taking care of the images the camera captures; this part is called Camera Image Processing or CIP (see figure 7). In theory the system could be constructed as one big application that handles everything, but as the two parts are significantly different in functionality and implementation, splitting them up makes each implementation easier. This also makes it easier to increase the efficiency of the system, as each component does only what it is good at.

All user interaction will be based on fingers; algorithms for finding hands, faces or other body parts could be implemented, but we chose fingers, more specifically fingertips, because the fingertip is the most intuitive pointing tool humans have. Which fingertip finding algorithm to use is explained later, in chapter 4.

Figure 7. Simple layout sketch of the system. PC stands for Projection Controller and CIP for Camera Image Processing.


3.2 Camera and Projector Setup

When positioning the camera and projector there are several things to take into consideration. In this Master’s thesis we will use one regular camera and one regular projector. Previous work has used many different hardware setups, for instance two cameras for better depth perception, an infrared camera, or multiple projectors for better projections. The goal of this project is to use components as generic as possible, thereby not limiting usability to specific hardware. Our setup requires that the camera and projector are aligned, so that the camera observes exactly the image the projector is showing. Software to correct the camera view or the projection exists and has been used in several similar projects, for instance the PDS mentioned earlier [Borkowski, Riff, Crowley. 2003]. We decided not to use any such software, partly because research in that area is already well covered, and partly because we want to limit the work to be done in this project.

3.3 Tasks and Communication

The communication between the Projection Controller and the Camera Image Processing will take place over sockets. We will use a simple client-server protocol.

3.3.1 Communication Protocol

We decided to make the CIP the server side. This has advantages such as allowing several clients to connect from different places for access to information about the state and current settings of the camera. The Projection Controller side only contains information about the different images that can be projected and is more suitable to act as the client.

Most of the data sent between the two parts will be numbers, which makes it simple to implement a protocol. However, we want the system to be able to handle more advanced commands for possible future modifications, so we decided that a text-based communication protocol would be more generic and open for future expansion.

Data sent from the camera side to the Projection Controller will only consist of coordinates where a finger was found, while data sent from the Projection Controller to the camera side will contain a number of commands. The Projection Controller will also send coordinates whenever the active areas in the projection have been updated. The Projection Controller thus acts as a controller for the camera application, using commands in the form of strings.
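As an illustration, the text-based protocol described above could be encoded like this. The message names ("FINGER", "AREAS") and the exact field layout are our assumptions; the thesis specifies only that coordinates and string commands are exchanged over a socket.

```python
# Hypothetical text encoding for the CIP <-> Projection Controller protocol.

def encode_fingertip(x, y):
    """CIP -> PC: the best fingertip match found in the current camera frame."""
    return f"FINGER {x} {y}\n"

def encode_active_areas(areas):
    """PC -> CIP: rectangles (x, y, w, h) the CIP should restrict its search to."""
    flat = " ".join(f"{x} {y} {w} {h}" for x, y, w, h in areas)
    return f"AREAS {len(areas)} {flat}\n"

def parse_message(line):
    """Parse one protocol line into a (kind, payload) pair."""
    parts = line.split()
    if parts[0] == "FINGER":
        return "FINGER", (int(parts[1]), int(parts[2]))
    if parts[0] == "AREAS":
        n = int(parts[1])
        vals = list(map(int, parts[2:]))
        return "AREAS", [tuple(vals[i * 4:i * 4 + 4]) for i in range(n)]
    raise ValueError(f"unknown message: {line!r}")
```

Because every message is one line of plain text, new commands can be added later without changing the framing, which is the flexibility argument made above.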

3.3.2 Camera Image Processing

The CIP takes every image captured by the camera and searches it for fingertips. Searching for patterns such as fingertips in images is computationally demanding. To reduce the work, the search is limited to the specified active areas; this means we can disregard parts of the image as unimportant and need not waste computing power on them. The result of the algorithm is a position representing the best match for a fingertip found in the image. The coordinates of this position are sent to the Projection Controller, where they are examined and acted upon.


3.3.3 Projection Controller

The task of the Projection Controller is to store all information about the images shown by the projector. This includes data such as where the active areas are within each image, what happens when a specific active area is activated, and, if an active area is linked to another image, that link as well. All this information can be stored in a file for easy access and the ability to save a specific system setup. Coordinates where fingertips have been identified will continuously be received from the camera side. The Projection Controller will examine whether these coordinates are within an active area, and if they lead to the activation of an area, the appropriate action will be taken. Each time a new image is projected, the coordinates of the active areas need to be sent to the camera side for more efficient image processing.

3.4 Active Areas

The image processing goes through each image read from the camera feed looking for user input, in our case fingertips. To simplify detection and reduce computational load, we decided to define special active areas in the projection where the user interaction takes place. This means we can eliminate parts of the image when looking for fingertips, which increases the speed of the system. It also means we can display moving elements, such as movies or animations, without interfering with the fingertip finding algorithm, by placing the moving elements outside the active areas. Where these active areas are stored is another decision we need to make. Initially, all information about where the active areas are resides on the Projection Controller side of the application, since that is where the projection is defined. Either the active areas can be sent to the camera side during runtime as they change, or they can all be sent in an initial phase after they have been read and stored. A problem with sending all of them initially arises if the projection contains dynamic elements, i.e. images that are generated during runtime with new active areas. To have a system that can handle both static and dynamic images, we will implement continuous reporting of active areas.
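A minimal sketch of the active-area idea: rectangles define where interaction may occur, and the fingertip search visits only pixels inside them. The rectangle representation and function names are illustrative, not taken from the thesis code.

```python
# Active areas as axis-aligned rectangles; the search skips everything else.
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height

def inside(rect: Rect, px: int, py: int) -> bool:
    """True if pixel (px, py) lies within the rectangle."""
    x, y, w, h = rect
    return x <= px < x + w and y <= py < y + h

def pixels_to_search(width: int, height: int, areas: List[Rect]):
    """Yield only the image coordinates covered by some active area."""
    for py in range(height):
        for px in range(width):
            if any(inside(a, px, py) for a in areas):
                yield (px, py)
```

With small active areas on a large camera frame, the number of pixels the fingertip algorithm must examine drops accordingly, which is the speed gain argued for above.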

3.5 Feedback

One of the big problems with some existing projector-camera systems is the lack of appropriate feedback to the user. The researchers at IBM could conclude this after testing their projector-camera systems on inexperienced users [Kjeldsen, Pinhanez. 2003c]. The only thing a user can see and get information from in a projector-camera system is the projection, and a projection has no feeling or physical feedback. Lacking a physical form, we still have four senses to work with: sight, hearing, smell and taste. We quickly rule out smell and taste, since they are practically impossible to implement with today’s hardware, even though implementing feedback for these two would be both interesting and a challenge. This leaves us with hearing and sight.

The way we wanted the active areas to work was with a certain time delay: if a user holds his/her finger over an active area for a number of seconds, that area becomes active. To add audio feedback to such a function would mean playing sounds for several seconds at a time. Since every button has a time delay, we could use a sound that intensifies until it finally registers the “click”. The sound could instead be played only when the area is activated, but then there would be no feedback during the actual “click”, which means the feedback would still be poor. Playing sounds for several seconds every time the user interacts with the system would probably be more annoying than helpful in the long run, depending on the scenario and the application. Another problem with audio feedback is where to place the speakers; if the sound comes from behind the user, for instance, it would probably be more confusing than helpful. The best way to implement audio feedback would be in combination with visual feedback.

With only visual feedback left, we need to find some intuitive way of implementing it. Visual indication of user interaction can be done in a number of ways; all we need is a good one. We imagine some kind of indicator that fills up whenever an active area is being activated and correspondingly empties if the activation stops. The design and the position of this indicator is another issue. If there are many active areas, there could either be one indicator shared by all of them, or every active area could have its own. The important thing is to make sure the indicator does not interfere with any active area, since changes in the display will show up as potential fingertips in the fingertip finding algorithm. Consistency is important when creating a good user interface.

If we create a prototype using just one indicator per image, no matter how many active areas the image has, it would be best if all images in the system were built the same way. However, if we created another prototype we could use another indicator placed somewhere else, and if we created a projector-camera system for experienced users only, we could mix different indicators. This shows that depending on what kind of prototype we create and which users it aims for, the design of the images and indicators may differ a lot. We decided to implement a few different indicators and different positionings of them, to show how it can be done rather than how it should be done. We want a system that is usable even for the most inexperienced user, and we adjust the design accordingly.

Figure 8. Left: the problem with IBM’s system was the lack of feedback. Right: one of our prototypes, where the circle filling up with red indicates that the user’s action is succeeding.
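The dwell-based activation with a filling indicator, as in figure 8, could be modeled roughly as follows. The two-second dwell time and the symmetric drain rate are assumptions for illustration, not values from the thesis.

```python
# Dwell-time activation: the indicator fills while a fingertip stays in the
# area and drains when it leaves; reaching 1.0 counts as a "click".

class DwellIndicator:
    def __init__(self, dwell_seconds=2.0):
        self.dwell = dwell_seconds
        self.fill = 0.0  # 0.0 = empty indicator, 1.0 = area activated

    def update(self, finger_in_area: bool, dt: float) -> bool:
        """Advance the indicator by dt seconds; return True on activation."""
        if finger_in_area:
            self.fill = min(1.0, self.fill + dt / self.dwell)
        else:
            self.fill = max(0.0, self.fill - dt / self.dwell)
        return self.fill >= 1.0
```

The `fill` value doubles as the visual feedback: the projection can draw a circle filled to that fraction, so the user sees continuous progress toward the "click" rather than a sudden state change.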


3.6 Example

The example we will go through (see figures 9–11) is called “the Frog slide”. It consists of a single frame with two buttons and one small image of a frog, making three active areas in total. The two buttons are fixed and do not move when activated; one is named “Reset”, the other “Exit”. Activating “Reset” makes the frog go to the center of the display, while activating “Exit” makes the system shut down. Pointing at the frog makes it jump to a random position somewhere in the display.

Figure 9. Left. The frog slide with the frog in the initial position. Right. The user points at the frog.

Figure 10. Left. The frog has now changed position. Right. The user is activating the Reset button.

Figure 11. Left. The frog has been returned to its initial position. Right. The user is activating the Exit button.
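The Frog slide’s behavior can be summarized in a small sketch. The class and area names are ours, not the thesis code, and the display size is an assumed example.

```python
# Illustrative logic of the Frog slide: three active areas and their actions.
import random

class FrogSlide:
    def __init__(self, width=640, height=480, seed=None):
        self.w, self.h = width, height
        self.rng = random.Random(seed)  # seedable for reproducible jumps
        self.frog = (width // 2, height // 2)  # initial (centered) position
        self.running = True

    def activate(self, area: str):
        """Handle the activation of one of the three active areas."""
        if area == "frog":
            # Pointing at the frog makes it jump somewhere random.
            self.frog = (self.rng.randrange(self.w), self.rng.randrange(self.h))
        elif area == "reset":
            self.frog = (self.w // 2, self.h // 2)
        elif area == "exit":
            self.running = False
```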


4 Camera Image Processing

We have chosen fingertips as the user’s way to interact with the system and will concentrate on algorithms that suit this kind of problem. To give a good overview of some existing algorithms, this chapter goes through the relevant methods, ending with a conclusion and analysis.

4.1 Fingertip Finding Algorithms

An intuitive way for a human to interact with any computerized user interface is with his/her hands. The function of a fingertip finding algorithm is to locate and identify any interaction from the user. The most significant features of our hands are the fingers, which is what the algorithms listed below try to locate. Most such algorithms consist of four basic steps:

1. Fetch the latest image read by the camera.
2. Motion detection: calculate a difference image to locate interference or interactions.
3. Shape and/or color matching, to see if the interference in step 2 actually was a finger.
4. Store the coordinates of any fingertips found.

We will now look at three different algorithms for motion detection and then discuss some different approaches to template matching.

4.1.1 Motion Detection with a Static Reference Image

The first image difference algorithm we will look at uses a static reference image (see figures 12–14). When the system starts up, the algorithm captures and stores the first image displayed by the projector – the reference image. If the number of possible projections is known from the start, every one of them can be captured and stored during initialization. When the system is up and running, the latest image captured by the camera is compared with the reference image. This is done pixel by pixel in all three base channels (red, green and blue) of the images, as described in equation (1). The resulting difference image is a binary (black and white only) image where any interference strong enough to pass the given threshold is marked black, while white indicates that no significant difference was found. For a more detailed description of the algorithm the reader is referred to the chapter “Detailed Implementation Information”.

Static reference image is an algorithm best suited for static projections such as menu systems or information displays, where the background does not change while the system is running. A big advantage of the static reference image is that it can register fingers that are held still, which makes it easier to recognize user input. On the other hand, a big drawback is that it cannot handle large changes in environment and lighting, or images that are created by the projector while the system is running, since the reference images are then impossible to store beforehand. For example, in a tic-tac-toe game where the next possible image depends on where the user chooses to

I_diff = (I_cur(R) − I_ref(R)) + (I_cur(G) − I_ref(G)) + (I_cur(B) − I_ref(B))    (1)


set his/her mark, it is impossible to store every possible reference image. It is important to be able to store the images before the system is up and running, because if reference images were taken later on, a user might have her hands within the projection area, and the reference image would then contain a hand or a finger. If that happens, the difference calculation will always report a fingertip where the accidental imprint occurred.
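As an illustrative sketch of the per-pixel test in equation (1) – the thesis implementation is written in C++ with Halcon, so the function and parameter names here are assumptions:

```python
# Hedged sketch: per-pixel RGB difference against a static reference image,
# thresholded into a binary image. True marks black (significant difference),
# False marks white (no significant difference).

def difference_image(current, reference, threshold):
    """current/reference: lists of rows of (r, g, b) tuples."""
    result = []
    for cur_row, ref_row in zip(current, reference):
        row = []
        for (cr, cg, cb), (rr, rg, rb) in zip(cur_row, ref_row):
            # equation (1): sum of the per-channel differences
            d = abs(cr - rr) + abs(cg - rg) + abs(cb - rb)
            row.append(d > threshold)
        result.append(row)
    return result
```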

Figure 12. Live feed from the camera.

Figure 13. Reference image taken in the initial state.

Figure 14. Difference image resulting from the difference algorithm.


4.1.2 Motion Detection with Frequently Updated Reference Images

Where the static reference image algorithm was better suited for static projections in stable environments, movement detection works under almost any conditions. Instead of taking the reference image in the initial state of the system, this algorithm treats every image as a reference image for the images that follow. That way, slow changes in the display area are disregarded.

The difference is calculated just as in the static reference image algorithm, pixel by pixel in all three channels (RGB) of the image, with a binary (black and white) image as the result, see equation (2). But where the static reference algorithm uses a fixed image as reference, this algorithm uses one that is n frames old, I(t−n). Black indicates a difference strong enough to pass the threshold and white indicates that no significant difference was found.

The big advantage of this algorithm is that, because the reference image is always updated, the background and environment as well as the projection can change, and the algorithm will still find the difference between two consecutive images.

Where this method solves the problem of changes in the display, it also creates a problem in finding a good way of identifying user interaction. With a static reference image the user can just hold the hand or a finger over a specific area, and after a certain time some action can be taken. This is not the case with movement detection, because as soon as the user holds the hand still it becomes invisible to the algorithm; in effect, the calculated difference image of a user holding the hand still will be completely white. Any action from the user must now instead be based upon movement, for instance waving or “rubbing” a specific spot with a finger. The researchers at IBM [Pinhanez, Kjeldsen. 2003c] used such a movement-based algorithm and registered a click as a movement forward followed by a movement backwards. This is not the most intuitive way for a user to issue commands, since a forward motion followed by a backward one could just as well mean the user realized the action she was going for was the wrong one.
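A minimal sketch of this update scheme (illustrative Python; the buffer handling and names are assumptions, not the thesis code):

```python
# Keeps the last n frames so that each new frame is compared with the one
# n frames back, per equation (2). True = black (movement detected).
from collections import deque

def make_motion_detector(n, threshold):
    history = deque(maxlen=n)  # holds frames t-n .. t-1
    def detect(frame):
        # until n frames are buffered, compare the frame with itself (no motion)
        reference = history[0] if len(history) == n else frame
        history.append(frame)
        return [[abs(cp[0] - rp[0]) + abs(cp[1] - rp[1]) + abs(cp[2] - rp[2])
                 > threshold
                 for cp, rp in zip(cur_row, ref_row)]
                for cur_row, ref_row in zip(frame, reference)]
    return detect
```

Note that, exactly as the text describes, a hand held perfectly still produces an all-white result after n frames, since the frame and its reference become identical.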

4.1.3 Color Matching

Color matching is a method to locate objects of a specific color in an image. It can be the color of the skin of the hand, which only works if the projection area is separated from the area for user interaction, as in the previous Master’s thesis by Gabriele Bodda [Bodda. 2003], or the color of some specific object, like the magnets used by Bérard in The Magic Table [Bérard. 2003]. The resulting difference image is still most often represented as a binary image. The fact that the projector’s display distorts the colors of all objects in the projection area makes it hard to implement this algorithm in a regular projector-camera system, but it is possible under some circumstances. Restricting the colors being projected, or using very clear colors in the projected images, could be a solution.

I_diff = (I_t(R) − I_(t−n)(R)) + (I_t(G) − I_(t−n)(G)) + (I_t(B) − I_(t−n)(B))    (2)


4.1.4 Template Matching

In most fingertip finding algorithms, shape matching of some sort is a must. Template matching is a matching algorithm that searches for a specific pattern, a template, in the difference image (see figure 16). After the difference between the reference image and the live feed image has been calculated, a binary black and white image has been generated, and in that image all potential fingers and hands need to be located. Depending on what kind of algorithm was used to create the difference image and what precision is needed, the choice of template can vary greatly. If a static reference image algorithm was used prior to the template pixel match algorithm, a bigger and more finger-like template is appropriate, whereas if a movement detection algorithm was used, a smaller and thinner template works better, as the difference between two consecutive images has different characteristics. All pixel matching algorithms basically work in the same way: the smaller template image is compared to all sub-regions of the larger difference image. The number of sub-regions of the same size as the template is calculated according to equation (3). For instance, a 10*10 pixel difference image with a 5*5 template image will require 25 matchings.

The easiest way to do a template match is to go through both pictures pixel by pixel and award one point for each matching pixel (see equation 4). For every position [row, col] the algorithm produces an error describing the difference between that part of the image and the template. To make the algorithm normalized we divide by the number of pixels in the template; that way the overall scale of the template and image part has no effect on the error.

Figure 15. The fingertip template of Bérard, where d1 is the diameter of the little finger and d2 the diameter of the thumb. [Hardenberg, Bérard. 2001]

#regions = (Width_diff − Width_template) * (Height_diff − Height_template)    (3)
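A sketch of equations (3) and (4) combined into an exhaustive search (illustrative Python with assumed names; the thesis code is C++):

```python
# Slides the template over every sub-region counted by equation (3) and
# computes the normalized error of equation (4); returns the best position.

def best_match(diff, template):
    """diff, template: 2-D lists of 0/1 pixels. Returns (error, row, col)."""
    th, tw = len(template), len(template[0])
    ih, iw = len(diff), len(diff[0])
    area = th * tw
    best = None
    for row in range(ih - th):       # equation (3): (H_diff - H_templ) rows
        for col in range(iw - tw):   # times (W_diff - W_templ) columns
            # equation (4): mean absolute pixel difference under the template
            err = sum(abs(diff[row + u][col + v] - template[u][v])
                      for u in range(th) for v in range(tw)) / area
            if best is None or err < best[0]:
                best = (err, row, col)
    return best
```

For the 10*10 image with a 5*5 template from the text, the two loops visit exactly (10 − 5) * (10 − 5) = 25 positions.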


One type of template is described by Bérard in one of his articles [Hardenberg, Bérard. 2001] (see figure 15). Another is the one used by Gabriele Bodda in the previous Master’s thesis from this institution [Bodda. 2003]. He uses a template that looks more like a finger than a fingertip, as is the case with most templates. This makes the algorithm better at finding actual fingertips, but at the cost of rotating the template at each sub-region. Combining both the rotation and the joint pixel test into one algorithm could mean problems with the speed at which the system can handle data.

When determining which type of template and pixel matching algorithm to use, it is important to consider what kind of hardware the application will run on. Using a simple round template together with a simple pixel match algorithm means the system does not have to be state of the art, since the computational load will be lighter. But it also means that images can be misclassified, as differences not looking like fingertips at all can still score highly in the pixel match algorithm.

Finally, one last important factor to take into consideration when choosing a template is its size. For instance, if the distance between the camera and the area where interaction takes place is great, fingers will appear smaller and the template needs to be scaled down; similarly, it needs to be scaled up if the distance is very short. Other size factors to consider are whether the system should work well for children as well as for grownups, and whether the system can change projection surface while running, like the Everywhere Display of the IBM researchers [Pinhanez, Kjeldsen. 2003].

Figure 16. Three different templates. (a) A round template for static reference image algorithms. (b) Another template for static reference image algorithms; this one needs to be rotated. (c) A template for movement-based algorithms.

error[row, col] = ( Σ_{u,v} ABS( Image[row − u, col − v] − Template[u, v] ) ) / AREA(Template)    (4)


In these cases it can be a good idea to be able to change template while the system is running, or to have a template of regular size and a pixel matching algorithm that is not too precise.

4.1.5 Conclusion

When choosing a difference image algorithm for this Master’s thesis we weighed the positive and negative sides of static reference image and motion detection against each other and decided on the static reference image. We want to make a system where the user interaction is intuitive and where good, useful feedback can be given, and we felt that those two goals were the hardest to meet using motion detection. A static reference image is also suitable when the content of the projected images and the projection area are known from the start, which is what we had in mind for our implementation.

We will implement two prototypes of the projector-camera system: one with static menu elements which a user can go through for information or control over some appliance, and one with a more dynamic behavior where the slides are altered during runtime. For the latter a motion detection algorithm might be preferred, but we felt that consistency was important; if a new algorithm were introduced, the way for a user to interact with the system would have to change too. Instead we make the dynamic imagery work with the static reference image algorithm by disregarding certain areas when running the finger finding algorithm.

Choosing a template is very specific to how the system should behave and what kind of hardware is available. Since the hardware in this case is sufficient, we instead considered how the system should act under certain circumstances. The system should be easy to use for everyone, children as well as adults; we therefore chose a round template of regular size. To that we added a simple pixel-matching algorithm that awards points for each matching pixel, with a low threshold value so that fingers from small to large will get the system to respond. This might produce a number of misclassified images, but hopefully not enough to interfere with the stability and functionality of the system, something that will show at a later stage.


5 Detailed Implementation Information

In this chapter we describe on a more detailed level how we implemented the different parts of the camera image processing. We have chosen to write the image processing in the C++ programming language because it is traditionally considered more efficient for computationally expensive algorithms. The image processing library we chose to use, Halcon, also has full support for C++.

5.1 System Pipeline

The system works in a 6-step pipeline (see figure 17):

1. Initializing phase that is only done once.
2. If needed, new and updated active areas are sent.
3. The system reads a new image from the camera.
4. A difference image is calculated using a difference image algorithm.
5. Template matching generates coordinates.
6. Information is sent through the socket. Back to step 2.

Steps 3–5 are executed on the camera image processing side of the application. What we need to establish is what data to send in step 6. If the template matching was unable to find any coordinates, no data needs to be sent at all; sending is only necessary when user input has been registered. Most of the time the system will be idle and no information will be sent, as no fingertips will be found.
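The loop can be sketched as follows (illustrative Python; the real CIP side is C++, and the step functions here are injected stand-ins, not the thesis code):

```python
# Steps 3-6 of the pipeline: grab a frame, compute the difference image, run
# template matching, and send data over the socket only when a hit was found.

def run_pipeline(grab_frame, compute_diff, find_fingertip, send, iterations):
    for _ in range(iterations):
        frame = grab_frame()        # step 3: read a new camera image
        diff = compute_diff(frame)  # step 4: difference image algorithm
        hit = find_fingertip(diff)  # step 5: template matching
        if hit is not None:         # step 6: idle unless user input registered
            send(hit)
```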

Figure 17. System pipeline overview.


5.2 Active Areas

The active areas are designed to make the fingertip-finding algorithm faster and to reduce possible mismatches. Eliminating parts of the projection, and thereby parts of the camera image that need to be searched for fingertips, brings two major advantages. First, the fingertip-finding algorithm is the most CPU-demanding process in the application, and by greatly reducing the amount of data to search through, the program will run a lot faster. Second, within the areas that have been eliminated, moving objects and animations can now be shown without interfering with the reference image based difference algorithm.

Active areas are defined as rectangles: a coordinate, the upper left corner, together with a width and a height (see figure 18). These rectangles are stored in the projection controller for each image in the system. Every time an active area gets activated by user interaction, the projection controller sends new active areas, and while it sends them it also changes the projected image.
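Such a rectangle can be sketched like this (illustrative Python; the real classes live in the Java projection controller and the C++ CIP, and the names here are assumptions):

```python
# An active area as an upper-left corner plus width and height, with the hit
# test used to restrict where the fingertip search runs.
from dataclasses import dataclass

@dataclass
class ActiveArea:
    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)
```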

5.3 Difference Image Algorithm

We chose to use a difference image algorithm based upon a static reference image, with the addition of the ability to handle multiple reference images, taken in the initial state of the application. For each new image that is captured, the difference between it and the corresponding reference image is calculated. This is done by first splitting the images into the three color sub-channels, red, green and blue, and then doing a pixel-by-pixel subtraction in each of the three channels. Using a specified threshold, three binary difference images are generated where a black pixel marks a difference strong enough to pass the threshold value. The three images are then merged into one picture, again using a pixel-by-pixel match where only matching black pixels are saved.
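One reading of this per-channel variant, sketched in illustrative Python (the implementation is C++/Halcon, and the merge-as-AND interpretation is an assumption):

```python
# Each RGB channel is thresholded separately; the three binary images are then
# merged so that only pixels black in all three channels stay black (True).

def channel_difference(current, reference, threshold):
    """current/reference: lists of rows of (r, g, b) tuples."""
    merged = []
    for cur_row, ref_row in zip(current, reference):
        merged.append([
            all(abs(c - r) > threshold for c, r in zip(cur_px, ref_px))
            for cur_px, ref_px in zip(cur_row, ref_row)
        ])
    return merged
```

Compared with summing the channels as in equation (1), requiring all three channels to pass the threshold is stricter and rejects single-channel noise.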

Figure 18. Active areas marked in the image.


For each image
    For all color channels
        For each pixel
            If pixel_img − pixel_ref > T
                pixel_diff = black
            Else
                pixel_diff = white
Diff = diff_r + diff_g + diff_b
For each pixel in Diff
    Remove single and small clusters of pixels
Return Diff

To eliminate insignificant differences we use connected components, a method that only saves large collections of black pixels, while smaller collections consisting of one to a few pixels are discarded. This elimination is done using 8-neighbourhood, an algorithm that for each pixel in the image determines the number of neighboring pixels of the same color.

8-neighbourhood means the algorithm looks at all the surrounding pixels, and if none of them is a match the current pixel is removed. After eliminating single pixels, another algorithm removes larger groups of pixels that are still too small to be matches. It counts the number of connected pixels, and if they are fewer than a specified amount, the whole group is eliminated. The number of pixels required to keep a group must be adapted to the size of the template image currently in use. The resulting picture is then passed on to the fingertip finding algorithm.
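A sketch of this cluster removal (illustrative Python; the thesis relies on Halcon’s region operators, so this hand-rolled flood fill only shows the idea):

```python
# Flood fill with 8-neighbourhood, erasing components below min_size pixels.

def remove_small_clusters(binary, min_size):
    """binary: 2-D list of booleans (True = black); modified in place."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    for sr in range(h):
        for sc in range(w):
            if not binary[sr][sc] or seen[sr][sc]:
                continue
            # collect one 8-connected component of black pixels
            stack, component = [(sr, sc)], []
            seen[sr][sc] = True
            while stack:
                r, c = stack.pop()
                component.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < h and 0 <= nc < w
                                and binary[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            stack.append((nr, nc))
            if len(component) < min_size:
                for r, c in component:
                    binary[r][c] = False  # erase the too-small cluster
    return binary
```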

5.4 Template Matching Algorithm

Pseudo code. Simple code for the difference image algorithm.

Figure 19. The best match marked with a grey rectangle.


To get an overall efficient fingertip-finding algorithm we chose the round template image. This will mean some misclassified images, where the algorithm finds fingertips that are not really there. In every difference image the algorithm goes through the active areas, doing a template match and reporting the best score with its corresponding coordinate. The best match function generates an error for each pixel within an active area, describing how good the template match at that pixel was (see equation 4). After the algorithm has run, the collected errors describe the obtained error for each coordinate within the active area, and the coordinate with the lowest error is the one returned (see figure 19). If the error is within some pre-defined limits of the error a fingertip would give, the coordinate is sent to the Projection Controller.

When running the fingertip-finding algorithm, the distance between the camera and the projected image is very important. The closer the camera is to the projection, the larger the fingers will appear in the captured image. If the camera and projector should be movable between different distances while the system is up and running, a number of different template images will be needed; alternatively, one original template image can be scaled to fit the current distance, or the system could generate the templates by itself during runtime. Finding a size that fits a certain distance is not very hard once one template has been identified for one specific distance (see figure 20). Mapping templates to specific distances can be done before the system is used. The user then only needs to specify the distance, and the system will know from earlier testing which template to use, through a simple formula of a constant k and the distance (see equation 5). The constant needs to be identified by testing, since it can differ between cameras.

Figure 20. Three different template images for different distances between camera and projection surface. Which template to use is calculated from the distance between camera and projection area.

I_templ = f(d) = k * 1/d    (5)
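In code, equation (5) amounts to the following (illustrative Python; the constant and the measured diameter are made-up example values, not the thesis calibration):

```python
# Template diameter as a function of the camera distance d, per equation (5).

def template_size(distance, k):
    return max(1, round(k / distance))  # never smaller than one pixel

# Hypothetical calibration: a fingertip measuring 20 pixels at 1.0 m gives
# k = 20 * 1.0 = 20.0, so at 2.0 m the template should be about 10 pixels.
```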


6 Projection Controlling

Controlling what the projector should show, and when to show it, is an important part of the system functionality. In this chapter we look at the structure of the projection controller and explain some of the implementation decisions.

When creating a graphical interface there are lots of programming languages and libraries to choose from. We decided to work with Java since it is a well-known language with two good graphical libraries, AWT and Swing, of which we chose Swing. We wanted to make the program generic and modularized, to make future expansions or modifications as easy as possible. We also wanted it to be possible to define images and menu systems without programming skills: to create a new slide show, the user only needs to specify certain parameters in an XML file. Both of these requirements are fairly easy to fulfill using Java.

6.1 Overview

For a graphical overview of the classes, see the diagram on the next page. The main class, which dispatches data to the correct slide and makes sure the correct slide is being shown, is SlideShowViewer. A SlideShowViewer has several components: an interface for communicating with the camera <CamProj>, a collection of slides <Slide> and an interface for reading XML files <XMLInput>. It also implements an interface, CameraListener, which is the class through which the camera communicates with the SlideShowViewer. SlideShowViewer should inherit from JFrame since we decided to use Swing; to add the functionality of a window controller we added the class SlideWindow and let SlideShowViewer inherit from it instead of extending JFrame directly.

The Slide is a JPanel, following the Swing pattern. Slide is also abstract; its purpose is to define how a standard slide works. When a user creates slides, she can use the built-in script for creating slide shows using XML, or, if a more complex slide with dynamic components is to be created, the user can implement the <CustomSlide>. The CustomSlide is an empty shell for creating slides with dynamic behavior, with built-in support in XMLInput, which means you can link the self-made slide right into the slide show with all the static and other dynamic slides.

To keep track of all the text, images and active areas we created a collection of small classes. TextItem, ImageItem and ActiveItem all inherit the same properties from the super class Item. An item holds its own position and properties and has the ability to draw itself in the JPanel that is the current slide.


Diagram. Class diagram giving an overview of the class dependencies.


6.2 Interaction Delay and Visual Feedback

Active areas are parts of the projection that can be activated and hold a function; they are the buttons of the system. To activate an area, the user holds a finger over it until the system has registered enough successful coordinates for the area to become active. This delay is important to avoid incorrect activations while the user moves the hand over several different areas. A successful activation requires user interaction over a certain amount of time, and it is important to make this delay just long enough to keep the system easy to use. Too long a delay is annoying for the user; too short a delay is not acceptable either, as it leads to misclassified activations.

It is important to give the user feedback when interacting with the system. From earlier discussions we reached the conclusion to use visual feedback, as this is mainly a visual interface. To visualize the feedback in the projection we use figures such as circles and rectangles that gradually fill up with some color as the user activates an area. Every active area has a time limit for how long it has to receive input before it gets activated; this time is equal to the system delay time.

The placement of the feedback figure can vary: either every active area has its own figure, or one figure is used for all areas in the image. The important thing when placing feedback figures is not to place them too close to, or inside, an active area, since the gradually filling figures would show up as differences in the fingertip finding algorithm and could lead to misclassified fingertips.

The idea of bars and circles gradually filling up is something we worked out during this project. We knew the feedback was extremely important, as it is the basic feature this kind of system normally lacks. We could have used symbols like an hourglass or a clock, but felt that gradually filling figures were a more intuitive way of reporting progress, as the user sees the exact progress at each moment. An additional aspect of gradually filling figures is that when the interaction stops, the figure starts to empty out, alerting the user that the interaction stopped, which is important. With good and intuitive feedback the user will feel in control of the system, not that the system is controlling him/her.
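The dwell-and-drain behaviour can be sketched as follows (illustrative Python; the real controller is written in Java with Swing, and the class and method names are assumptions):

```python
# The fill level rises while a fingertip is reported inside the area and
# drains while it is not; the area activates when the level reaches the
# dwell time, and fill_fraction() drives the gradually filling figure.

class DwellButton:
    def __init__(self, dwell_time):
        self.dwell_time = dwell_time
        self.level = 0.0

    def update(self, pointed_at, dt):
        """Advance by dt seconds; returns True on the tick the area activates."""
        if pointed_at:
            self.level = min(self.dwell_time, self.level + dt)
        else:
            self.level = max(0.0, self.level - dt)  # feedback empties out
        return pointed_at and self.level >= self.dwell_time

    def fill_fraction(self):
        return self.level / self.dwell_time
```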

6.3 Communication

We established that we want a text based protocol for communicating with the Camera Image Processor (CIP). To handle this communication we have implemented a package on the Java side. The package consists of two classes, CameraListener and CamProj. CamProj is a public class that handles all outgoing information, while CameraListener is an interface that reports a CameraEvent every time data arrives from the CIP.

In the initialization of the system it is important to synchronize the communication between the Projection Controller (PC) and the CIP; this is when all the reference images are taken. The PC has to make sure that the correct image is displayed by the projector and then tell the CIP to capture and store it; if there is more than one reference image to be taken, it displays the next image and tells the CIP to capture and store that one too. After all reference images have been taken and reported as stored by the CIP, the system can start up. It does this by sending the active areas of the first image and then displays it on the


projector, then goes into idle mode, just waiting for the CIP to start reporting hits from user input.

When the system is in running mode, the only information being sent is hits reported from the CIP. If an active area gets activated, the PC updates the display and then sends the new array of active area coordinates. For systems that should be able to change projection area during run-time, we built in support for changing the template image, along with some other commands that can be useful when creating a new system, such as taking new reference images and resetting the system.
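As a sketch of what such a text protocol could look like, with entirely hypothetical message names (the thesis does not spell out the wire format, so both `AREAS` and `HIT` are illustrative assumptions):

```python
# Hypothetical encoding of the two message types exchanged over the socket:
# the PC sends the active areas, the CIP reports fingertip hits.

def encode_active_areas(areas):
    """areas: list of (x, y, width, height) tuples -> one protocol line."""
    fields = ["AREAS"] + ["%d,%d,%d,%d" % a for a in areas]
    return " ".join(fields)

def decode_hit(line):
    """Parse a 'HIT row col' line from the CIP into integer coordinates."""
    kind, row, col = line.split()
    assert kind == "HIT"
    return int(row), int(col)
```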

6.4 Slides

The images and all their attributes are stored as Slides. Slide is an abstract class containing all data belonging to a slide, such as its active areas and images. Since we chose Swing for this project, Slide extends JPanel, making it easy to paint where and when we want to. Defining a slide show in our application is done in XML. The XML parser understands two types of slides, UniversalSlide and CustomSlide. The first is a standard static slide used to create menu systems where slides refer to each other; CustomSlide is a way for users to implement slides with different behavior.

A slide has two attributes that can be specified: name and background color. The name is used to refer to a slide when linking it with other slides in a slide show.

6.4.1 UniversalSlide

The UniversalSlide is a static slide that keeps the same active areas throughout the runtime of the system. A UniversalSlide consists of three kinds of elements: images, text and active areas. The active areas are the clickable areas, each with a reference to another slide, while text and image elements consist of just that, raw text and the address of an image. Universal slides are linked together using active areas: every active area specifies the slide that the slide show will change to once that area has been activated.

6.4.2 CustomSlide

A CustomSlide is a slide that the user constructs herself, when some specific behavior is needed that cannot be achieved with a UniversalSlide. For instance, a simple tic-tac-toe game could be built by linking several static UniversalSlides together, but a far better solution is to create a slide that changes its appearance, and thereby its active areas, during runtime. For this project we created two simple custom slides to illustrate what a dynamic layout can do. A custom slide has no way of specifying active areas unless they are manually coded into the system. Therefore a custom slide has an extra attribute: the name of the slide to go to when the exit function of the custom slide is activated.


Figure 21. A simple game where the user makes a small frog jump across the projection.

7 Prototype Examples and Observations

In this chapter we describe a few example implementations we did to illustrate how to use our system.

7.1 Prototypes

To test the functionality of the system we implemented a number of simple prototypes. We will look at two simple dynamic slides and one very simple static example.

7.1.1 Simple SlideShow

A slide show is a number of slides specified in an XML file, consisting of UniversalSlides and CustomSlides. To define a UniversalSlide, a name is required to keep track of it; after that the user just adds images, text and active areas wherever she wants. A CustomSlide has three mandatory fields: the class name, needed to create an instance of the class, the name of the slide, and the name of the slide it refers to. An example of an XML file specifying a slide show can be found in Appendix 1.

7.1.2 Example: FrogSlide

When creating the first simple prototypes of the system we wanted a simple dynamic slide that can change appearance during runtime. The FrogSlide (see figure 21) is a simple game where a picture of a small frog is displayed; when the user points at the frog it jumps in a random direction over the image. When the frog jumps to its new position, the active areas of the image need to be updated and then sent to the CIP, so that the fingertip finding algorithm does not search in the wrong areas.
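The jump logic can be sketched as follows (illustrative Python; the names and the rectangle handling are assumptions, not the thesis code, which implements slides in Java):

```python
# Picks a new random position for the frog inside the display while rejecting
# positions whose rectangle would overlap one of the fixed button rectangles.
import random

def jump(frog_size, display, buttons, rng=None):
    """frog_size=(fw, fh), display=(w, h), buttons=[(x, y, w, h), ...]."""
    rng = rng or random.Random()
    w, h = display
    fw, fh = frog_size
    while True:
        x = rng.randrange(0, w - fw)
        y = rng.randrange(0, h - fh)
        overlaps = any(x < bx + bw and bx < x + fw and
                       y < by + bh and by < y + fh
                       for bx, by, bw, bh in buttons)
        if not overlaps:
            return x, y
```

After each jump, the frog's new rectangle would be sent to the CIP as the updated active area, as described above.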


7.1.3 Example: TicTacToeSlide

The TicTacToeSlide is a simple tic-tac-toe game that shows the strength of the system when it comes to defining active areas (see figure 22). Once a marker has been set, that field in the slide is no longer an active area, so the marker, which might easily be mistaken by the system for a fingertip, is no longer examined by the fingertip-finding algorithm. Activating the reset button makes all areas active again and removes all markers. Once a marker is placed it stays there, as we have no mechanism for moving markers around. Another possible implementation could therefore be a four-in-a-row game, where markers never move once placed anyway.
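The bookkeeping behind this can be sketched as follows. The class and method names are illustrative assumptions; the point is only that a marked field leaves the set of active areas reported to the CIP, and that reset restores all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the tic-tac-toe board bookkeeping: once a field
// is marked it is removed from the list of active areas, so the CIP no
// longer searches it for fingertips (and the marker cannot be mistaken
// for one). "Reset" simply makes every field active again.
public class TicTacToeBoard {
    private final boolean[] marked = new boolean[9];   // 3x3 fields

    // Mark a field; returns false if it was already taken.
    public boolean mark(int field) {
        if (marked[field]) return false;
        marked[field] = true;
        return true;
    }

    // The fields that should still be reported to the CIP as active.
    public List<Integer> activeFields() {
        List<Integer> active = new ArrayList<>();
        for (int i = 0; i < 9; i++)
            if (!marked[i]) active.add(i);
        return active;
    }

    public void reset() {
        Arrays.fill(marked, false);
    }
}
```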

7.2 Example System Run

For a graphical overview, see the flowchart on the next page. We will walk through the example from chapter 3 to see more clearly how the system works in practice. The example, named "The Frog-slide", consists of a single frame with two clickable buttons and one small image of a frog: three active areas in all. The two buttons are fixed and do not move when activated; one is named "Reset", the other "Exit". Activating "Reset" makes the frog go to the center of the display, while activating "Exit" shuts the system down. Clicking the frog, however, makes it jump to a random position somewhere in the display, although it can never land where the two buttons are. This example would run as follows.
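The dispatch step of this run can be sketched in code. All names and geometry here are illustrative assumptions (the buttons are placed along the top edge so the frog can avoid them by staying below), not the thesis's actual layout:

```java
import java.awt.Rectangle;
import java.util.Random;

// Hypothetical sketch of how the projection controller might dispatch a
// coordinate reported by the CIP in the frog example: "Reset" recenters
// the frog, "Exit" would shut the system down, and a hit on the frog
// itself makes it jump.
public class FrogDispatch {
    static final int WIDTH = 1280, HEIGHT = 1024;
    static final Rectangle RESET = new Rectangle(0, 0, 200, 100);
    static final Rectangle EXIT = new Rectangle(1080, 0, 200, 100);
    static final Rectangle CENTER = new Rectangle(580, 462, 120, 100);
    private final Random random = new Random();
    Rectangle frog = new Rectangle(CENTER);

    // Returns a label describing what the reported coordinate activated.
    public String dispatch(int x, int y) {
        if (RESET.contains(x, y)) { frog = new Rectangle(CENTER); return "reset"; }
        if (EXIT.contains(x, y)) return "exit";
        if (frog.contains(x, y)) {
            // Jump to a random position; both buttons sit along the top
            // edge, so keeping the frog below them avoids any overlap.
            int nx = random.nextInt(WIDTH - frog.width);
            int ny = 100 + random.nextInt(HEIGHT - 100 - frog.height);
            frog = new Rectangle(nx, ny, frog.width, frog.height);
            return "jump";
        }
        return "none";
    }
}
```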

Figure 22. A game of tic-tac-toe, the four fields marked with circles are no longer active to prevent mistaken fingertip matches.

[Flowchart: graphical overview of the example system run]

7.3 Observations

As we tested these prototypes we could make some observations about the performance of the system. We did not have time to do an extensive usability evaluation, but a few inexperienced testers got to try the system without any prior knowledge.

A problem that showed up during these tests was that user interaction over certain colors and over dark areas would not show up in the difference-image algorithm. This needs to be considered when designing a slide: if a part of the slide is very dark, no light falls on the user's finger, making it invisible to the system. One way to avoid this problem is to make sure that light sources other than the projector exist, making the user more visible to the camera. Another is to make sure all active areas lie within bright areas of the slide.
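This failure mode follows directly from how a difference image works; a minimal sketch (grayscale pixel arrays and the threshold value are illustrative assumptions, not the thesis implementation):

```java
// Minimal sketch of the difference-image idea: pixels where the live
// camera frame differs little from the reference image are ignored, so a
// finger over a dark slide region (where hardly any projector light
// falls on it) produces differences below the threshold and stays
// invisible to the system.
public class DifferenceImage {
    public static boolean[] changedPixels(int[] reference, int[] frame, int threshold) {
        boolean[] changed = new boolean[reference.length];
        for (int i = 0; i < reference.length; i++)
            changed[i] = Math.abs(frame[i] - reference[i]) > threshold;
        return changed;
    }
}
```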

On the positive side, the number of misclassified images, images where a finger was found even though there was no actual user interaction, was very low. Had we gotten many misclassified images, the feedback system would not have worked as planned. As it was, the feedback system worked just as we wanted it to, and with a time delay of about 2 seconds everybody who tested the system found it easy to understand.

The frame rate we managed to get is around 10–15 frames per second, which means the projection controller receives 10–15 reported coordinates every second. This count is also what controls the feedback delay: an active area requires a number of correct coordinates rather than a time-based interval. No optimization has been done at this point, so if we manage to make the algorithm work faster and reach a frame rate of 20–25, the number of coordinates required to activate an active area must be increased for the time delay to stay the same. This has been taken into consideration: every active area has a time-factor parameter when specified in the XML file. The parameter with the largest effect on the frame rate was the resolution of the camera image; the larger the image, the slower the algorithm. Other parameters that affect the frame rate are the size of the template image, the size and number of active areas, and the precision of the template match.
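The arithmetic behind the time factor can be made explicit with a small helper (the class and method names are illustrative):

```java
// Sketch of the frame-rate/time-factor relationship: activation is
// counted in reported coordinates, not seconds, so the coordinate count
// needed for a fixed real-time delay scales with the frame rate.
public class TimeFactor {
    // Coordinates required for a given delay at a given frame rate.
    public static int requiredCoordinates(double framesPerSecond, double delaySeconds) {
        return (int) Math.round(framesPerSecond * delaySeconds);
    }
}
```

At 10 frames per second, a 2-second delay works out to 20 coordinates, which matches the default TimeFactor of 20 in the appendix; at 25 frames per second the same delay would require 50.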


8 Conclusions and Future Work

While working on this project we came across problems both expected and unexpected. In this chapter we discuss what could have been done to avoid them, or why they cannot be avoided.

When starting this project we tried to identify some of the problems we could run into. The problems we identified then were all related to what kind of application we would create and how users would interact with it. As it turned out, we created a system with static reference images, making the interaction a little easier for the user than if we had gone with a dynamic system based on motion detection. Activation of functions could then be implemented by setting a time limit on an area and letting an indicator show how long the user had to hold the finger in position for the function to become active.

The problem of finding an appropriate application to implement has been placed on the user. We have created a system where a user can easily create slide shows and, with some programming knowledge, even create their own unique slides.

Another factor to take into consideration when it comes to lighting is to avoid running the projection in an environment with changing light. Once the reference image is taken, the system works best under the same lighting as when that image was taken. If the change in light becomes too big, the difference algorithm interprets the whole image as a difference and the system stops functioning. One way to solve this problem, as well as the problems of dynamic slides, would be to implement a safe way to take new reference images. The difficulty with taking new reference images during runtime is that we have no control over whether objects such as hands and fingers are interfering.
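One conceivable safety check, sketched here under the assumption of grayscale frames and illustrative thresholds: only accept a new reference image when two frames grabbed a short moment apart agree almost everywhere, i.e. nothing is moving in front of the projection. (A hand held perfectly still would of course pass this test, so this is only a partial safeguard.)

```java
// Hedged sketch of a "safe to re-capture the reference image" test:
// compare two frames taken a short moment apart and require that only a
// small fraction of pixels changed, indicating a still scene.
public class ReferenceCapture {
    public static boolean sceneIsStill(int[] frameA, int[] frameB,
                                       int pixelThreshold, double maxChangedFraction) {
        int changed = 0;
        for (int i = 0; i < frameA.length; i++)
            if (Math.abs(frameA[i] - frameB[i]) > pixelThreshold) changed++;
        return (double) changed / frameA.length <= maxChangedFraction;
    }
}
```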

Something we have not taken into consideration during this project is what happens if the projection and the view of the camera are not perfectly aligned. This is partly because software that solves similar problems already exists, and we felt it was not an important feature to add. For future implementations this is something that could be added to make the system more flexible; with this technique even more projection surfaces could be made accessible.


9 Summary

When creating our projector-camera system we made several choices and observations. We will now briefly go through them to sum things up.

Our goal when starting this project was that the hardware should not be too specific: any projector with a video input and any camera with a video output should work. After testing the system on different hardware and in different environments, with differently colored backgrounds and lighting, we can conclude that this has been achieved.

We decided to divide the application into two parts: one controlling the projections and one handling the image processing. We called these two parts the Projection Controller (PC) and the Camera Image Processing (CIP). The CIP was written in C++ and acts as the server in the socket communication protocol we set up; the PC, written in Java, connects to the CIP and initiates contact.
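The client/server split can be illustrated with a minimal loopback example. Both ends run in one process here, and the line-based "x y" coordinate message is an illustrative stand-in for the actual protocol, which the thesis does not fully specify in this chapter:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Hedged sketch of the PC/CIP split: the CIP side acts as a TCP server
// and the PC connects as a client and reads reported coordinates.
public class SocketSketch {
    public static String roundTrip(String coordinate) {
        try (ServerSocket server = new ServerSocket(0)) {     // CIP side
            Thread cip = new Thread(() -> {
                try (Socket s = server.accept();
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    out.println(coordinate);                  // report a fingertip coordinate
                } catch (IOException ignored) { }
            });
            cip.start();
            try (Socket pc = new Socket("localhost", server.getLocalPort()); // PC side
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(pc.getInputStream()))) {
                return in.readLine();                         // PC receives the coordinate
            }
        } catch (Exception e) {
            return "";
        }
    }
}
```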

The big question when implementing a way of identifying user interaction was what method to use. Before we could make that decision we needed to decide what kind of interaction to support, and we chose simple pointing gestures. We therefore needed an algorithm that could easily find fingertips. We chose a method with a static reference image; its advantage is that user interaction is always easy to spot, even when there is little movement. The downside is that it runs into problems if the projected image changes during runtime. This problem was solved by introducing active areas: the system only looks for user interaction within certain areas of the image, and as long as the projection remains the same within such an area, the interaction detection keeps working. Fitting a proper template image to a specific distance is something we had to determine through testing; now, when the system starts, all it needs to know is how far the camera is from the projection and the correct template is loaded automatically.

An important aspect when creating a projector-camera system is the lack of physical feedback. Since the only part of the system visible to the user is the projection, we needed a visual feedback system. Making every clickable area sensitive to interaction and setting a time limit made it possible to use figures that gradually fill up, indicating the progress of performing a click. These figures also empty if the interaction stops or fails. The timing of this feedback was the most important aspect of the feedback system, and after some testing we found that a delay of about 2 seconds was the optimal solution.
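The fill-and-empty behavior can be sketched as a per-area counter. The class and field names are illustrative; the time factor mirrors the TimeFactor parameter from the XML format in the appendix:

```java
// Sketch of the time-based visual feedback: a per-area counter fills
// while the fingertip stays inside the area and drains when it leaves,
// and the area activates once the counter reaches its time factor.
public class ClickIndicator {
    private final int timeFactor;   // coordinates needed to activate
    private int count = 0;

    public ClickIndicator(int timeFactor) { this.timeFactor = timeFactor; }

    // Called once per reported coordinate; returns true on activation.
    public boolean update(boolean fingerInside) {
        if (fingerInside) count++;
        else count = Math.max(0, count - 1);   // the figure empties again
        return count >= timeFactor;
    }

    // Fill ratio for drawing the indicator figure (0.0 to 1.0).
    public double fill() { return Math.min(1.0, (double) count / timeFactor); }
}
```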

Finally, we wanted the system to be easy to use for people who want to create their own slide shows. To achieve this we implemented an XML parser that creates slide shows from simple tags that the user specifies. Users with some XML experience should have no problem creating slide shows of their own. We created several slide shows in XML during the testing of the system, and a simple example is included in this Master's thesis (see Appendix).

After trying out the system on different users and in different conditions we can conclude that it works well as long as the lighting does not change too much during runtime.



Appendix

SlideShow.xml

<!-- Fields for SlideShow:
     <Width>: required - specifies the width of the projection in pixels.
     <Height>: required - specifies the height of the projection in pixels.
     <StartingSlide>: required - specifies what slide to start the slide show with.
     <Template>: required - 1-4. 1 is for large distances, 4 is for when the camera is closer. -->
<SlideShow>
  <Width>1280</Width>
  <Height>1024</Height>
  <StartingSlide>One</StartingSlide>
  <Template>2</Template>

  <!-- Fields for Slide:
       <Background>: optional - background color for the slide, default is white.
       <Name>: required - the name of the slide.
       <TextItem>: optional - adds a text item to the slide.
       <ImageItem>: optional - adds an image item to the slide.
       <ActiveItem>: optional - adds an active item to the slide. -->
  <Slide>
    <Name>One</Name>
    <Background>red</Background>

    <!-- TextItem has three fields:
         <X>: required - the x-coordinate of the text.
         <Y>: required - the y-coordinate of the text.
         <Text>: required - the actual text to be printed. -->
    <TextItem>
      <X>120</X>
      <Y>120</Y>
      <Text>Hello Welcome, this is the first slide.</Text>
    </TextItem>

    <!-- ImageItem has three fields:
         <X>: required - the x-coordinate of the image.
         <Y>: required - the y-coordinate of the image.
         <Url>: required - the address of the image. -->
    <ImageItem>
      <X>100</X>
      <Y>400</Y>
      <Url>/home/images/image1.jpg</Url>
    </ImageItem>


    <!-- ActiveItem has nine fields:
         <X>: required - the x-coordinate of the active area.
         <Y>: required - the y-coordinate of the active area.
         <Width>: required - the width of the active area.
         <Height>: required - the height of the active area.
         <IndicatorType>: optional - 1-5, the type of indicator used. Default 1.
         <IndicatorColor>: optional - the color of the indicator. Default blue.
         <Visual>: optional - 1 or 0, whether the active item is visible or not. Default 1.
         <Next>: required - specifies what slide to go to when this area is activated.
         <TimeFactor>: optional - 1-100, the time to activate the item. Default 20. -->
    <ActiveItem>
      <X>200</X>
      <Y>200</Y>
      <Width>240</Width>
      <Height>240</Height>
      <IndicatorType>5</IndicatorType>
      <IndicatorColor>blue</IndicatorColor>
      <Visual>1</Visual>
      <Next>Two</Next>
      <TimeFactor>20</TimeFactor>
    </ActiveItem>
  </Slide>

  <Slide>
    <Name>Two</Name>
    <Background>red</Background>
    <TextItem>
      <X>120</X>
      <Y>120</Y>
      <Text>You made it to the second slide.</Text>
    </TextItem>
    <ActiveItem>
      <X>200</X>
      <Y>200</Y>
      <Width>240</Width>
      <Height>240</Height>
      <IndicatorType>5</IndicatorType>
      <IndicatorColor>blue</IndicatorColor>
      <Visual>1</Visual>
      <Next>One</Next>
      <TimeFactor>25</TimeFactor>
    </ActiveItem>
    <ActiveItem>
      <X>400</X>
      <Y>400</Y>
      <Width>240</Width>
      <Height>240</Height>
      <IndicatorType>3</IndicatorType>


      <IndicatorColor>blue</IndicatorColor>
      <Visual>1</Visual>
      <Next>Three</Next>
      <TimeFactor>25</TimeFactor>
    </ActiveItem>
  </Slide>

  <!-- Fields for CustomSlide:
       <ClassName>: required - the name of the class to instantiate.
       <Name>: required - the name of the slide.
       <Next>: required - specifies what slide to go to when the exit function is activated. -->
  <CustomSlide>
    <ClassName>demo.FrogSlide.java</ClassName>
    <Name>Three</Name>
    <Next>One</Next>
  </CustomSlide>
</SlideShow>