Wiimote Infrared Detection by Gregory Peaker
Computer vision technology is used worldwide to reproduce human visual perception in a
computer. This field is cross-disciplinary because it harnesses techniques from mathematics, biology,
and other disciplines to emulate our eyes and brain. The most essential component is obtaining images
and then processing them into information understood by a computer. A computer can process images
and video and, to a small extent, understand the content. Much of computer vision is analogous to signal
processing, where sampling, quantization, transformations, and similar operations are applied to images and video.
This makes computer vision an interesting exercise in developing artificial intelligence, and it has many
applications to society.
Most of the applications I’ve seen today are meant to aid human behavior. For instance, the
DARPA Grand Challenge aims to develop fully autonomous vehicles to further technology that can
provide driver assistance. DARPA vehicles use a range of sensors, including 2D imaging, 3D terrain
mapping, and laser range finding, to develop information that is later understood by the computer. I
find the integration of disparate technologies particularly interesting in developing autonomous driving.
As humans, we most often rely on one sensory input for our actions (for instance, our walking
movement is based on vision – not on hearing, touching, smelling, or tasting). DARPA competitors make
the task of integrating different sensor inputs into a serialized flow of information for computer
processing seem like child’s play. I am curious to see the novel technologies people will develop
tomorrow from techniques pioneered in the DARPA Grand Challenge and similar competitions.
Another useful application of computer vision is aiding sports broadcast and analysis. I am an
avid tennis player and enthusiast, and I am always amazed to see computer vision applied to the
game. Multiple high-speed cameras and computer vision are used to track the tennis ball, and 3D
visualization allows me to view games in more detail and receive better sports broadcaster analysis. I
am most interested in these technologies’ applications to human-computer interaction. I am consistently
surrounded by technology; I wake up in the morning to a digital alarm clock. I then check my email by
dividing time between a smartphone and cutlery. This pattern of pressing and typing on devices repeats
all day – until I go to sleep. I think tomorrow’s world will see computer vision applied to changing how
we interact with computers and data. Human gesture tracking can be used to play video games and
to interact with objects on the screen. My project applies technology found in the Nintendo Wii game
console – which lets people play video games and manipulate on-screen information – to
demonstrate what computer vision can do for tomorrow’s human-computer interaction.
This class has taught me general histogram analysis algorithms and several morphological
operators. I leave this class with very useful knowledge of color space models, including RGB and HSV.
We have examined image matrices in spatial and polar form and used dilation and erosion in various
combinations. Opening and closing morphological operations allow for edge/border detection, and this
can commonly be seen in our digital cameras. We have analyzed and enhanced images using
information about a specific pixel and its surrounding area, as well as statistical analysis across a whole image.
Image histograms show the distribution of pixel intensities in an image. We have applied statistical
analysis, for instance mean and standard deviation, in our machine problems. Edge detection is used to
find boundaries between objects; this is done by finding places in the image where pixel intensities
change quickly, which is where derivatives are used. The Hough transformation is used to find lines by
modeling the parametric representation of a line. Peak values in a Hough transformation reveal possible
lines in an image.
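
As a concrete example of the histogram and statistics analysis described above, here is a minimal C# sketch (my own illustration, not course code), assuming an 8-bit grayscale image stored as a byte[,] array:

    using System;

    static class HistogramDemo
    {
        // Build the intensity histogram and compute mean and standard deviation.
        public static void Analyze(byte[,] image)
        {
            int[] histogram = new int[256];
            long sum = 0;
            foreach (byte p in image) { histogram[p]++; sum += p; }

            int n = image.Length;                 // total pixel count
            double mean = (double)sum / n;

            double variance = 0;
            foreach (byte p in image) variance += (p - mean) * (p - mean);
            variance /= n;

            Console.WriteLine($"mean = {mean:F2}, std dev = {Math.Sqrt(variance):F2}");
        }
    }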
This class has provided a foundation that enables me to apply my computer science acumen to a
diverse range of problems facing today’s computer vision. Through the weekly machine problems I
learned how to successfully simplify, plan, and complete projects in a timely manner. I am now more
interested in the synergy of computer vision and human-computer interaction – and, most importantly, I
have honed my engineering and analytical skills.
I would have enjoyed learning about machine learning in computer vision. Machine learning
applies to my final project, and I think the success and results of this project would have been greatly
improved by using machine learning in my gesture recognition algorithm. I would also like to study
the importance of correlation in statistics. I would like to have learned more about image
enhancement techniques – understanding the techniques and theories used in automatic color
adjustment, exposure adjustment, and image de-blurring. We touched upon this idea with histogram
equalization, but I would like to have delved further in another class. Understanding linear, median,
and adaptive filtering would come in handy for the occasional Photoshop project.
My project deals with a new – and, in my opinion, very interesting – application of computer
vision in the IR spectrum. I would like to have seen more real-world applications of computer vision and
then gained knowledge of the algorithms and how they work. It would be interesting to learn the algorithms
used in infrared multi-touch products like the Microsoft Surface. Learning more about face detection,
red-eye removal, and other algorithms seen in digital cameras would be fascinating. Another area I would
like to study more is the integration of 3D vision systems using lasers, range sensors, tomographic
sensors, RADAR/LADAR sensors, and other imaging technologies. This type of integration can be seen in
projects like DARPA’s autonomous vehicle challenge.
Project Description
The goal of this project is to create a very inexpensive computer vision system that allows
interaction between a computer and a human (human-computer interaction). Additionally, this is to be
done in approximately one month’s time. The interaction is performed by moving one hand or finger in
a two-dimensional space and determining its best fit to one of twelve possible gestures/moves. Each
gesture controls a specific function within a computer application, thus allowing a human to interact with
a computer through the air. Information is obtained using an infrared camera. Moreover, I want to learn
about infrared and gain working experience in this spectrum.
Problems with similar systems today are large computing requirements and ineffectiveness when
background lighting or skin tone changes within the visible light spectrum. Additionally, interacting with
computers is difficult because we are limited to movements of the mouse and keyboard. Humans do not
interact with real-world objects the same way we interact with computers. For instance, to move a
physical object to its desired location, we open our hand, move the hand around the object, close our
hand, move our hand again, and then open the hand. Using computers, we move our hand to the mouse,
open then close our hand around the mouse, find the cursor position on the screen, move the mouse
and corresponding cursor to our desired object, click a button, move the mouse, then release the mouse
button when the object is in its desired location.
My Design
Thinking about the previous problem statement, I realized that the large computing
requirement and the ineffectiveness across working environments are caused by problems with cameras in
the visible light spectrum. I was then determined not to use the visible light spectrum, leaving the
radio frequency, microwave, infrared, ultraviolet, and other spectra as viable alternatives. Further
deduction showed several existing computer vision systems using the infrared spectrum (most notably
the Nintendo Wii game console and Microsoft’s multi-touch Surface). It was then determined that an
infrared detector and emitter are necessary. I placed a budgetary constraint on the equipment of less
than $50 – this level matched my goal for an inexpensive system.
A detector (for instance, a camera) and an emitter are required for human-computer interaction.
The budgetary requirement and the need to work in the infrared spectrum generated two viable options.
First is the $40 Wii remote, which has a built-in 1024x768 infrared camera. Second is a $20-$50 webcam
that is modifiable into an infrared camera (1). The second option was quickly eliminated because of the
need for custom hardware modification (this would take too much time and I do not have the necessary
skills). The Wii remote was the remaining option for infrared detection. I decided to construct my own
infrared emitter, since it only requires an infrared LED, a resistor, and a battery pack.
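
For reference, the resistor value for such a circuit follows from Ohm’s law. As a worked example under assumed part values (a 3 V battery pack and an IR LED with a 1.5 V forward drop driven at 60 mA; the paper does not give the actual values):

    R = (V_battery - V_LED) / I = (3.0 V - 1.5 V) / 0.06 A = 25 ohms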
Gesture recognition was decided as the best method to demonstrate the viability of human-
computer interaction using computer vision in the infrared spectrum. I used the
Interlink VP6600 presentation remote to model the best computer functions
for this demonstration (2) (Figure 1). This remote controls media and
PowerPoint functions on its host computer. This comes out to eleven unique
functions (Play/Pause, Previous ‘music’ track, Next ‘music’ track, Stop,
Volume Up, Volume Down, Volume Mute/Unmute, Up, Down, Left, and Right
– the last four controlling PowerPoint). Further research into gesture
recognition showed that most systems use machine learning techniques such as neural
networks. I decided that those systems were outside of my skill set
and that I needed to create a simpler gesture recognition system. This restricted
me to straight-line gestures, and I was able to come up with
twelve uniquely recognizable gestures (Left, Right, Up, Down, Up-Left, Up-
Right, Down-Left, Down-Right, Right-Up, Right-Down, Left-Up, and Left-Down – Figure 2). Using this
framework of a Wii remote with a built-in infrared camera, an infrared LED, and a desired set of gestures, I
set out to build the following system. Below I discuss the Wii remote, how it connects to and is
understood by the computer, and then how my gesture recognition algorithm works.

Figure 1: Interlink VP6600
Figure 2: Twelve uniquely recognizable gestures
The Wii Remote (Wiimote)
The Wiimote is the primary input device for
the Nintendo Wii gaming console. It is a one-handed
remote control that supports very intuitive motion
sensing. The remote is well designed for
manipulating objects and characters on the screen,
and this design has appealed to non-gamers. The
manipulation combines the built-in accelerometer
and front-facing optical sensor, and the remote also has 11
input buttons (as seen in Figure 3). The device
measures 5.8” long, 1.4” wide, and 1.2” tall. The
built-in Bluetooth works up to thirty feet and the
optical sensor works up to fifteen feet. A set of four LEDs
indicates the player number assigned by the Wii console and the remaining battery in quartiles. These features have
made the Wii remote a popular hacking project; for instance, the accelerometer is used as virtual
drumsticks in the Virtual Drum Kit program, and the infrared capability can often be seen as a replacement
for mouse input (3).

Figure 3: The Wii Remote
The Wiimote uses a standard Bluetooth wireless link (4). A Broadcom Bluetooth System-on-
a-Chip processes the eleven available button inputs and sends optical sensor data to the host
Bluetooth device. The standard Bluetooth Human Interface Device (HID) profile is implemented – the
same standard any Bluetooth keyboard or mouse uses. A Bluetooth host uses the Bluetooth Service
Discovery Protocol (SDP) to receive the vendor and product ID – all Wiimotes share the same IDs. This allows
any application to query the operating system’s Bluetooth stack for all available devices, and the IDs
uniquely identify the Wiimote among all other Bluetooth devices. Communication is full duplex
at up to 100 Hz between the computer and the Wiimote, with all discrete
packets equaling 22 bytes. Any button press or release event triggers a new packet; moreover, no
encryption or authentication is used. Most features of the Wiimote have been fully reverse-engineered;
areas that have not been completed include advanced functionality of
the IR camera and the built-in speaker.
A 3-axis linear accelerometer is housed near the center of the
remote. The accelerometer uses tiny masses attached to silicon springs,
and the movement of these springs causes voltage differences that are
measured and used to determine the force applied to each mass. One
determines acceleration using the simple physics formula F = m*a, and
this device is able to measure +/- 3g with 10% sensitivity. The
microscopic design of this accelerometer intrinsically makes precise
mass production of the device difficult; however, the Wiimote performs software calibration when it is
first started and stores the result in memory. These values are used to derive the Wiimote’s acceleration and tilt-
rotation values.

Figure 4: The coordinate system used for accelerometers
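
As an illustration of how such stored calibration is typically applied, here is a minimal C# sketch (the calibration layout is my assumption, not a documented fact from this paper) converting a raw accelerometer reading into units of g:

    // zeroG is the raw reading at 0 g, oneG the raw reading at +1 g,
    // both assumed to be captured during the Wiimote's startup calibration.
    static double RawToG(byte raw, byte zeroG, byte oneG)
    {
        return (raw - zeroG) / (double)(oneG - zeroG);
    }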
During the Nintendo Wii’s development, researchers found the accelerometer to be inaccurate for
cursor positioning. The engineers came up with the idea of adding an infrared image sensor with two
stationary IR beacons. These IR beacons are housed in the Sensor
Bar, which can be located above or below the TV. Each IR
beacon consists of 5 IR LEDs: the farthest LED is
pointed slightly away from the center, the LED closest
to the center is pointed slightly towards the middle,
and the other three LEDs point straight ahead – most likely
maximizing the Wiimote’s field of view. This gives the
Wiimote about fifteen feet of range. Triangulation using the fixed
distance between the IR beacons determines the rotation and
distance from the TV. The infrared sensor is a 1024x768 monochrome camera with an integrated IR-pass
filter. Similar to the Bluetooth chip, the camera is a System-
on-a-Chip design with a built-in processor capable
of tracking up to four moving objects emitting IR
light. Due to Bluetooth bandwidth constraints, the
Wiimote is unable to send raw images back and
relies on the built-in object tracking to send
coordinate pairs (x, y) and intensity values for up to
four moving objects.

Figure 5: The Nintendo Wii Sensor Bar and highlighted IR LEDs.
Figure 6: IR sensor data is used to determine where a cursor should be on the screen.
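
To illustrate how the fixed beacon spacing yields a distance estimate, here is a minimal C# sketch (my own, not from the paper) using a pinhole-camera, small-angle model; the field-of-view and sensor-bar width constants are assumptions:

    using System;

    static class SensorBarDistance
    {
        // x1, x2: horizontal pixel coordinates of the two tracked beacons.
        public static double EstimateDistance(double x1, double x2)
        {
            const double SensorBarWidthM = 0.20;                 // assumed beacon spacing, meters
            const double CameraFovRad = 40.0 * Math.PI / 180.0;  // assumed horizontal FOV
            const double ImageWidthPx = 1024.0;

            // Pixels per radian near the image center.
            double pxPerRad = ImageWidthPx / CameraFovRad;
            double separationRad = Math.Abs(x2 - x1) / pxPerRad;

            // distance ~ real width / angle subtended (small-angle approximation)
            return SensorBarWidthM / separationRad;
        }
    }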
Wiimote & Bluetooth The Wiimote is paired with the windows using BluetoothThe Wiimote can be found in the Bluetooth Stack.
Windows API Windows has built-in API calls for bi-directional communication with devices in the Bluetooth Stack.
P/Invoke and C# P/Invoke calls Windows APIC# uses P/Invoke
Wiimote C# API
Using .NET and C#, a reverse-engineered API for the Wiimote on Windows has been developed.
There are additional APIs for Linux and Mac, but I will focus my attention on a single .NET API. It
can be included in any program as a Dynamic Link Library (DLL). The steps for using this API are very simple:
First, pair the Wiimote with the computer’s Bluetooth stack and install generic keyboard/mouse drivers.
Second, initialize the DLL and it will automatically search for, find, connect to, and start retrieving data from
the Wiimote.
A button in the battery compartment places the Wiimote into pairing mode. Once found and
paired with a computer, it is identified as a Human Interface Device (HID) compliant device, and generic
Win32/Win64 drivers are installed. The P/Invoke feature of .NET allows C# to send and receive
information through Windows API functions –
meaning the Wiimote is paired with the Bluetooth
stack, Windows API functions are able to access
the Bluetooth stack, and P/Invoke in C# accesses
the Windows API functions. The API searches
for and then connects to the Wiimote in this
order:
1. Attain references to the GUID and HID classes in Windows.
2. Receive a list of all HID devices on the computer.
3. Loop through the list and receive detailed information on each device.
4. Select all devices that match the Wiimote’s Vendor ID and Product ID (this allows multiple
Wiimotes to connect simultaneously; a sketch of this ID check follows the list).
5. Create a FileStream object for each device.
6. Disconnect from the classes used in steps 1 and 2.
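
As an illustration of step 4, here is a minimal C# sketch (my own, not the API’s actual code) of matching a candidate device against the Wiimote’s IDs; the constants are the values commonly reported for the Nintendo vendor and Wiimote product IDs, but treat them as assumptions:

    static bool IsWiimote(int vendorId, int productId)
    {
        const int NintendoVendorId = 0x057E;  // assumed: commonly reported Nintendo vendor ID
        const int WiimoteProductId = 0x0306;  // assumed: commonly reported Wiimote product ID
        return vendorId == NintendoVendorId && productId == WiimoteProductId;
    }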
The FileStream object allows bi-directional communication between the computer and the Wiimote.
All data is sent and received as discrete packets and saved in a 22-byte buffer; the API allows either events
or polling when receiving data. The API parses packets and wraps this information in a
class with members defining the states of each button and doubles containing the coordinates (x, y) and
intensity of all four tracked objects. An event forces the update function to be called every time
the 22-byte buffer is full, and polling lets the application query the API for the Wiimote’s last packet.
Packets can be sent to the Wiimote at any time; polling or event querying is not necessary.
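
To make the event and polling styles concrete, here is a minimal sketch; the Wiimote class below is a hypothetical stand-in (member names in the actual reverse-engineered API may differ):

    using System;

    // Hypothetical stand-in for the reverse-engineered Wiimote API.
    class WiimoteSketch
    {
        public class State { public int IrX, IrY; }   // one tracked IR point

        public event Action<State> StateChanged;      // raised per parsed 22-byte packet
        public State LastState { get; private set; } = new State();

        public void Connect() { /* search the HID list, open a FileStream, ... */ }
    }

    class Demo
    {
        static void Main()
        {
            var wiimote = new WiimoteSketch();

            // Event style: the handler runs every time a full packet is parsed.
            wiimote.StateChanged += s => Console.WriteLine($"IR: ({s.IrX}, {s.IrY})");
            wiimote.Connect();

            // Polling style: query the most recently parsed packet on demand.
            var last = wiimote.LastState;
            Console.WriteLine($"Last IR: ({last.IrX}, {last.IrY})");
        }
    }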
Gesture Recognition
Gesture recognition is becoming a commonly used tool to navigate the host of applications on
computers. It is possible to have an unlimited number of gestures; for instance, a gesture could be a
drawing of a computer or a human. One would not want to have too many gestures, because it becomes
exponentially harder to remember them all and easy to confuse how to draw one gesture versus another.
I use a simple IR LED, resistor, and battery pack circuit to create an IR emitter. I tape this emitter to
my pointer finger, thus demonstrating pointing/gesture functionality with a finger. My IR LED gesture
recognition algorithm has a simple, effective, and elegant design.
1. A ticker at 1 Hz keeps a record of the coordinates (x, y) of the first tracked IR object for the last two
ticks.
a. If all three ticks are within a certain threshold of one another, gesture recognition
starts (similar to a button press or gesture press; a sketch of this dwell check follows the list).
2. Using the event notification of the C# Wiimote API, a list of coordinates for the first tracked IR
object is stored in an ArrayList object.
3. Since the ticker still keeps records every second (from step 1), the gesture ends when the last two
ticks are within a certain threshold of one another (similar to step 1a). At gesture end, a
function determines which of the twelve possible gestures has been performed.
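
Here is a minimal C# sketch of the dwell detection in steps 1 and 3 (my own illustration, not the paper’s code; the threshold value is an assumption):

    using System;

    class DwellDetector
    {
        const int Threshold = 10;                       // assumed max movement, camera units
        readonly (int X, int Y)[] ticks = new (int, int)[3];
        int count;

        // Called once per second with the current position of IR object 0.
        // Returns true when the point has barely moved across three ticks,
        // i.e. the "button press" that starts a gesture.
        public bool Tick(int x, int y)
        {
            ticks[count % 3] = (x, y);
            count++;
            if (count < 3) return false;
            foreach (var a in ticks)
                foreach (var b in ticks)
                    if (Math.Abs(a.X - b.X) > Threshold || Math.Abs(a.Y - b.Y) > Threshold)
                        return false;
            return true;
        }
    }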
The gesture recognition function performs the following:
Compare final (x, y) to initial (x, y)
Obtain variables x_diff and y_diff
If x_diff > threshold, define as East or West
If y_diff > threshold, define as North or South
If neither > threshold, no motion
If one > threshold, basic North, South, East, or West direction
If both > threshold, see if one direction is significantly greater than the other
o Interpret this as only the larger one
o (e.g., many units east but only 12 units north was probably intended to be east only)
If both > threshold and neither is significantly greater than the other
o Create a line between the start and end points of the gesture array
o Classify every (x, y) element between start and end as above or below the line
o Use this info to determine, for example, whether the gesture is South-East or East-South
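
The following minimal C# sketch implements the pseudocode above (my own illustration, not the paper’s actual code; the threshold and dominance values are assumptions):

    using System;
    using System.Collections.Generic;

    static class GestureClassifier
    {
        public struct Pt { public int X, Y; public Pt(int x, int y) { X = x; Y = y; } }

        const int Threshold = 50;       // assumed minimum movement, camera units
        const double Dominance = 3.0;   // assumed ratio for "significantly greater"

        public static string Classify(List<Pt> path)
        {
            Pt s = path[0], e = path[path.Count - 1];
            int dx = e.X - s.X, dy = e.Y - s.Y;
            string h = dx > 0 ? "Right" : "Left";
            string v = dy > 0 ? "Down" : "Up";   // assumes camera y grows downward

            bool hMoved = Math.Abs(dx) > Threshold, vMoved = Math.Abs(dy) > Threshold;
            if (!hMoved && !vMoved) return "None";
            if (hMoved && !vMoved) return h;
            if (vMoved && !hMoved) return v;

            // Both moved: keep only the dominant axis if one is much larger.
            if (Math.Abs(dx) > Dominance * Math.Abs(dy)) return h;
            if (Math.Abs(dy) > Dominance * Math.Abs(dx)) return v;

            // Otherwise count points on each side of the start-end line to decide
            // the order, e.g. Right-Down versus Down-Right.
            int above = 0, below = 0;
            foreach (Pt p in path)
            {
                long cross = (long)(e.X - s.X) * (p.Y - s.Y) - (long)(e.Y - s.Y) * (p.X - s.X);
                if (cross > 0) above++; else if (cross < 0) below++;
            }
            // Which side maps to which ordering is an assumption for illustration.
            return above >= below ? h + "-" + v : v + "-" + h;
        }
    }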
My Approach & Design
Using off-the-shelf and inexpensive components – the Wiimote, an infrared emitter (LED, resistor,
battery pack), and free, open source software – I am able to get the Wiimote to detect and track IR
spots and then transmit this information to a computer application. The Wiimote is connected through
Bluetooth, and data is sent full duplex at 100 Hz. The Wiimote API parses incoming data packets and
creates an object storing information from the most recently received packet. I do not use the button press, acceleration, or IR
intensity information; I focus only on coordinates (x, y) from
the first tracked object. The gesture recognition algorithm
detects when a gesture is started, records
movement at the same rate data is transmitted from the remote
to the computer (typically 100 Hz), and recognizes when the
gesture ends. This information is fed into a function that returns
one of twelve direction combinations. The gestures are
mapped to keyboard presses according to Table 1.
Table 1: Gesture-to-keyboard mapping

Gesture       Keyboard Action
Down          Down
Up            Up
Left          Left
Right         Right
Down-Left     Previous Track
Up-Left       Volume Down
Up-Right      Volume Up
Down-Right    Next Track
Left-Down     Mute/Unmute
Left-Up       Mute/Unmute
Right-Up      Pause
Right-Down    Stop
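
Here is a minimal sketch (my own illustration, not the paper’s code) of performing the Table 1 mapping via P/Invoke, in the same spirit as the API’s use of P/Invoke; the virtual-key codes are standard Win32 values, and treating “Pause” as the play/pause media key is an assumption:

    using System;
    using System.Collections.Generic;
    using System.Runtime.InteropServices;

    static class GestureKeys
    {
        [DllImport("user32.dll")]
        static extern void keybd_event(byte vk, byte scan, uint flags, UIntPtr extra);

        const uint KEYEVENTF_KEYUP = 0x0002;

        // Win32 virtual-key codes for the actions in Table 1.
        static readonly Dictionary<string, byte> Map = new Dictionary<string, byte>
        {
            ["Down"] = 0x28, ["Up"] = 0x26, ["Left"] = 0x25, ["Right"] = 0x27,
            ["Down-Left"] = 0xB1,  // VK_MEDIA_PREV_TRACK
            ["Up-Left"] = 0xAE,    // VK_VOLUME_DOWN
            ["Up-Right"] = 0xAF,   // VK_VOLUME_UP
            ["Down-Right"] = 0xB0, // VK_MEDIA_NEXT_TRACK
            ["Left-Down"] = 0xAD,  // VK_VOLUME_MUTE
            ["Left-Up"] = 0xAD,    // VK_VOLUME_MUTE
            ["Right-Up"] = 0xB3,   // VK_MEDIA_PLAY_PAUSE (assumed for "Pause")
            ["Right-Down"] = 0xB2, // VK_MEDIA_STOP
        };

        public static void Send(string gesture)
        {
            if (!Map.TryGetValue(gesture, out byte vk)) return;
            keybd_event(vk, 0, 0, UIntPtr.Zero);               // key down
            keybd_event(vk, 0, KEYEVENTF_KEYUP, UIntPtr.Zero); // key up
        }
    }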