Wiimote Infrared Detection by Gregory Peaker
Computer vision technology is used worldwide to reproduce human visual perception in a
computer. This field is cross-disciplinary because it harnesses techniques from mathematics, biology,
and other disciplines to emulate our eyes and brain. The most essential component is obtaining images
and then processing them into information understood by a computer. A computer can process images
and video and, to a small extent, understand the content. Much of computer vision is analogous to signal
processing, where sampling, quantization, transformations, and similar operations are applied to images and video.
This makes computer vision an interesting exercise in developing artificial intelligence, and it has many
applications to society.
Most of the applications I’ve seen today are meant to aid human behavior. For instance, the
DARPA Grand Challenge aims to develop fully autonomous vehicles to further technology that can
provide driver assistance. DARPA vehicles use a range of sensors, including 2D imaging, 3D terrain
mapping, and laser range finding, to develop information that is later understood by the computer. I
find the integration of disparate technologies particularly interesting in developing autonomous driving.
As humans, we most often rely on one sensory input for our actions (for instance, our walking
movement is based on vision – not on hearing, touching, smelling, or tasting). DARPA competitors make
the task of integrating different sensor inputs into a serialized flow of information for computer
processing seem like child’s play. I am curious to see the novel technologies people will develop
tomorrow from techniques pioneered in the DARPA Grand Challenge and similar competitions.
Another useful application of computer vision is aiding sports broadcast and analysis. I am an
avid tennis player and enthusiast, and I am always amazed to see computer vision applied to the
game. Multiple high-speed cameras and computer vision are used to track the tennis ball, and 3D
visualization allows me to view games in more detail and receive better sports broadcaster analysis. I
am most interested in these technologies’ applications to human-computer interaction. I am consistently
surrounded by technology; I wake up in the morning to a digital alarm clock. I then check my email by
dividing time between a smartphone and cutlery. This pattern of pressing and typing on devices repeats
all day – until I go to sleep. I think tomorrow’s world will see computer vision applied to changing how
we interact with computers and data. Human gesture tracking can be used to play video games and
to interact with objects on the screen. My project applies technology found in the Nintendo Wii game
console – which lets people play video games and manipulate on-screen information – to
demonstrate what computer vision can do for tomorrow’s human-computer interaction.
This class has taught me general histogram analysis algorithms and several morphological
operators. I leave this class with very useful knowledge of color space models, including RGB and HSV.
We have examined image matrices in spatial and polar form and used dilation and erosion in various
combinations. Opening and closing morphological operations allow for edge/border detection, and this
can commonly be seen in our digital cameras. We have analyzed and enhanced images using
information about a specific pixel and its surrounding area, as well as statistical analysis across a whole image.
Image histograms show the distribution of pixel intensities in an image. We have applied statistical
analysis, for instance mean and standard deviation, in our machine problems. Edge detection is used to
find boundaries between objects; this is done by finding places in the image where pixel intensities
change quickly, which is where derivatives are used. The Hough transformation is used to find lines by
modeling the parametric representation of a line. Peak values in a Hough transformation reveal possible
lines in an image.
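
As a concrete example of the histogram and statistics analysis described above, here is a minimal C# sketch (my own illustration, not course code), assuming an 8-bit grayscale image stored as a byte[,] array:

    using System;

    static class HistogramDemo
    {
        // Build the intensity histogram and compute mean and standard deviation.
        public static void Analyze(byte[,] image)
        {
            int[] histogram = new int[256];
            long sum = 0;
            foreach (byte p in image) { histogram[p]++; sum += p; }

            int n = image.Length;                 // total pixel count
            double mean = (double)sum / n;

            double variance = 0;
            foreach (byte p in image) variance += (p - mean) * (p - mean);
            variance /= n;

            Console.WriteLine($"mean = {mean:F2}, std dev = {Math.Sqrt(variance):F2}");
        }
    }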
This class has provided a foundation that enables me to apply my computer science acumen to a
diverse range of problems facing today’s computer vision. Through the weekly machine problems I
learned how to successfully simplify, plan, and complete projects in a timely manner. I am now more
interested in the synergy of computer vision and human-computer interaction – and, most importantly, I
have honed my engineering and analytical skills.
I would have enjoyed learning about machine learning in computer vision. Machine learning
applies to my final project, and I think the success and results of this project would have been greatly
improved by using machine learning in my gesture recognition algorithm. I would also like to study
the importance of correlation in statistics. I would like to have learned more about image
enhancement techniques – understanding the techniques and theories used in automatic color
adjustment, exposure adjustment, and image de-blurring. We touched upon this idea with histogram
equalization, but I would like to have delved further in another class. Understanding linear, median,
and adaptive filtering would come in handy for the occasional Photoshop project.
My project deals with a new – and, in my opinion, very interesting – application of computer
vision in the IR spectrum. I would like to have seen more real-world applications of computer vision and
then gained knowledge of the algorithms and how they work. It would be interesting to learn the algorithms
used in infrared multi-touch products like the Microsoft Surface. Learning more about face detection,
red-eye removal, and other algorithms seen in digital cameras would be fascinating. Another area I would
like to study more is the integration of 3D vision systems using lasers, range sensors, tomographic
sensors, RADAR/LADAR sensors, and other imaging technologies. This type of integration can be seen in
projects like DARPA’s autonomous vehicle challenge.
Project Description
The goal of this project is to create a very inexpensive computer vision system that allows
interaction between a computer and a human (human-computer interaction). Additionally, this is to be
done in approximately one month’s time. The interaction is performed by moving one hand or finger in
a two-dimensional space and determining its best fit to one of twelve possible gestures/moves. Each
gesture controls a specific function within a computer application, thus allowing a human to interact with
a computer through the air. Information is obtained using an infrared camera. Moreover, I want to learn
about infrared and gain working experience in this spectrum.
Problems with similar systems today are large computing requirements and ineffectiveness when
background lighting or skin tone changes within the visible light spectrum. Additionally, interacting with
computers is difficult because we are limited to movements of the mouse and keyboard. Humans do not
interact with real-world objects the same way we interact with computers. For instance, to move a
physical object to its desired location, we open our hand, move the hand around the object, close our
hand, move our hand again, and then open the hand. Using computers, we move our hand to the mouse,
open then close our hand around the mouse, find the cursor position on the screen, move the mouse
and corresponding cursor to our desired object, click a button, move the mouse, then release the mouse
button when the object is in its desired location.
My Design
Thinking about the previous problem statement, I realized that the large computing
requirement and the ineffectiveness across working environments are caused by problems with cameras in
the visible light spectrum. I was then determined not to use the visible light spectrum, leaving the
radio frequency, microwave, infrared, ultraviolet, and other spectra as viable alternatives. Further
deduction showed several existing computer vision systems using the infrared spectrum (most notably
the Nintendo Wii game console and Microsoft’s multi-touch Surface). It was then determined that an
infrared detector and emitter are necessary. I placed a budgetary constraint on the equipment of less
than $50 – this level matched my goal for an inexpensive system.
A detector (for instance, a camera) and an emitter are required for human-computer interaction.
The budgetary requirement and the need to work in the infrared spectrum generated two viable options.
First is the $40 Wii remote, which has a built-in 1024x768 infrared camera. Second is a $20-$50 webcam
that is modifiable into an infrared camera (1). The second option was quickly eliminated because of the
need for custom hardware modification (this would take too much time and I do not have the necessary
skills). The Wii remote was the remaining option for infrared detection. I decided to construct my own
infrared emitter, since it only requires an infrared LED, a resistor, and a battery pack.
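
For reference, the resistor value for such a circuit follows from Ohm’s law. As a worked example under assumed part values (a 3 V battery pack and an IR LED with a 1.5 V forward drop driven at 60 mA; the paper does not give the actual values):

    R = (V_battery - V_LED) / I = (3.0 V - 1.5 V) / 0.06 A = 25 ohms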
Gesture recognition was decided as the best method to demonstrate the viability of human-
computer interaction using computer vision in the infrared spectrum. I used the
Interlink VP6600 presentation remote to model the best computer functions
for this demonstration (2) (Figure 1). This remote controls media and
PowerPoint functions on its host computer. This comes out to eleven unique
functions (Play/Pause, Previous ‘music’ track, Next ‘music’ track, Stop,
Volume Up, Volume Down, Volume Mute/Unmute, Up, Down, Left, and Right
– the last four controlling PowerPoint). Further research into gesture
recognition showed that most systems use machine learning techniques such as neural
networks. I decided that those systems were outside of my skill set
and that I needed to create a simpler gesture recognition system. This restricted
me to straight-line gestures, and I was able to come up with
twelve uniquely recognizable gestures (Left, Right, Up, Down, Up-Left, Up-
Right, Down-Left, Down-Right, Right-Up, Right-Down, Left-Up, and Left-Down – Figure 2). Using this
framework of a Wii remote with a built-in infrared camera, an infrared LED, and a desired set of gestures, I
set out to build the following system. Below I discuss the Wii remote, how it connects to and is
understood by the computer, and then how my gesture recognition algorithm works.

Figure 1: Interlink VP6600
Figure 2: Twelve uniquely recognizable gestures
The Wii Remote (Wiimote)
The Wiimote is the primary input device for
the Nintendo Wii gaming console. It is a one-handed
remote control that supports very intuitive motion
sensing. The remote is well designed for
manipulating objects and characters on the screen,
and this design has appealed to non-gamers. The
manipulation combines the built-in accelerometer
and front-facing optical sensor, and the remote also has 11
input buttons (as seen in Figure 3). The device
measures 5.8” long, 1.4” wide, and 1.2” tall. The
built-in Bluetooth works up to thirty feet and the
optical sensor works up to fifteen feet. A set of four LEDs
indicates the player number assigned by the Wii console and the remaining battery in quartiles. These features have
made the Wii remote a popular hacking project; for instance, the accelerometer is used as virtual
drumsticks in the Virtual Drum Kit program, and the infrared capability can often be seen as a replacement
for mouse input (3).

Figure 3: The Wii Remote
The Wiimote uses a standard Bluetooth wireless link (4). A Broadcom Bluetooth System-on-
a-Chip processes the eleven available button inputs and sends optical sensor data to the host
Bluetooth device. The standard Bluetooth Human Interface Device (HID) profile is implemented – the
same standard any Bluetooth keyboard or mouse uses. A Bluetooth host uses the Bluetooth Service
Discovery Protocol (SDP) to receive the vendor and product ID – all Wiimotes share the same IDs. This allows
any application to query the operating system’s Bluetooth stack for all available devices, and the IDs
uniquely identify the Wiimote among all other Bluetooth devices. Communication is full duplex
at up to 100 Hz between the computer and the Wiimote, with all discrete
packets equaling 22 bytes. Any button press or release event triggers a new packet; moreover, no
encryption or authentication is used. Most features of the Wiimote have been fully reverse-engineered;
areas that have not been completed include advanced functionality of
the IR camera and the built-in speaker.
A 3-axis linear accelerometer is housed near the center of the
remote. The accelerometer uses tiny masses attached to silicon springs,
and the movement of these springs causes voltage differences that are
measured and used to determine the force applied to each mass. One
determines acceleration using the simple physics formula F = m*a, and
this device is able to measure +/- 3g with 10% sensitivity. The
microscopic design of this accelerometer intrinsically makes precise
mass production of the device difficult; however, the Wiimote performs software calibration when it is
first started and stores the result in memory. These values are used to derive the Wiimote’s acceleration and tilt-
rotation values.

Figure 4: The coordinate system used for accelerometers
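
As an illustration of how such stored calibration is typically applied, here is a minimal C# sketch (the calibration layout is my assumption, not a documented fact from this paper) converting a raw accelerometer reading into units of g:

    // zeroG is the raw reading at 0 g, oneG the raw reading at +1 g,
    // both assumed to be captured during the Wiimote's startup calibration.
    static double RawToG(byte raw, byte zeroG, byte oneG)
    {
        return (raw - zeroG) / (double)(oneG - zeroG);
    }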
During the Nintendo Wii’s development, researchers found the accelerometer to be inaccurate for
cursor positioning. The engineers came up with the idea of adding an infrared image sensor with two
stationary IR beacons. These IR beacons are housed in the Sensor
Bar, which can be located above or below the TV. Each IR
beacon consists of 5 IR LEDs: the farthest LED is
pointed slightly away from the center, the LED closest
to the center is pointed slightly towards the middle,
and the other three LEDs point straight ahead – most likely
maximizing the Wiimote’s field of view. This gives the
Wiimote about fifteen feet of range. Triangulation using the fixed
distance between the IR beacons determines the rotation and
distance from the TV. The infrared sensor is a 1024x768 monochrome camera with an integrated IR-pass
filter. Similar to the Bluetooth chip, the camera is a System-
on-a-Chip design with a built-in processor capable
of tracking up to four moving objects emitting IR
light. Due to Bluetooth bandwidth constraints, the
Wiimote is unable to send raw images back and
relies on the built-in object tracking to send
coordinate pairs (x, y) and intensity values for up to
four moving objects.

Figure 5: The Nintendo Wii Sensor Bar and highlighted IR LEDs.
Figure 6: IR sensor data is used to determine where a cursor should be on the screen.
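
To illustrate how the fixed beacon spacing yields a distance estimate, here is a minimal C# sketch (my own, not from the paper) using a pinhole-camera, small-angle model; the field-of-view and sensor-bar width constants are assumptions:

    using System;

    static class SensorBarDistance
    {
        // x1, x2: horizontal pixel coordinates of the two tracked beacons.
        public static double EstimateDistance(double x1, double x2)
        {
            const double SensorBarWidthM = 0.20;                 // assumed beacon spacing, meters
            const double CameraFovRad = 40.0 * Math.PI / 180.0;  // assumed horizontal FOV
            const double ImageWidthPx = 1024.0;

            // Pixels per radian near the image center.
            double pxPerRad = ImageWidthPx / CameraFovRad;
            double separationRad = Math.Abs(x2 - x1) / pxPerRad;

            // distance ~ real width / angle subtended (small-angle approximation)
            return SensorBarWidthM / separationRad;
        }
    }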
Wiimote & Bluetooth The Wiimote is paired with the windows using BluetoothThe Wiimote can be found in the Bluetooth Stack.
Windows API Windows has built-in API calls for bi-directional communication with devices in the Bluetooth Stack.
P/Invoke and C# P/Invoke calls Windows APIC# uses P/Invoke
Wiimote C# API
Using .NET and C#, a reverse-engineered API for the Wiimote on Windows has been developed.
There are additional APIs for Linux and Mac, but I will focus my attention on a single .NET API. It
can be included in any program as a Dynamic Link Library (DLL). The steps for using this API are very simple:
First, pair the Wiimote with the computer’s Bluetooth stack and install generic keyboard/mouse drivers.
Second, initialize the DLL and it will automatically search for, find, connect to, and start retrieving data from
the Wiimote.
A button in the battery compartment places the Wiimote into pairing mode. Once found and
paired with a computer, it is identified as a Human Interface Device (HID) compliant device, and generic
Win32/Win64 drivers are installed. The P/Invoke feature of .NET allows C# to send and receive
information through Windows API functions –
meaning the Wiimote is paired with the Bluetooth
stack, Windows API functions are able to access
the Bluetooth stack, and P/Invoke in C# accesses
the Windows API functions. The API searches
for and then connects to the Wiimote in this
order:
1. Attain references to the GUID and HID classes in Windows.
2. Receive a list of all HID devices on the computer.
3. Loop through the list and receive detailed information on each device.
4. Select all devices that match the Wiimote’s Vendor ID and Product ID (this allows multiple
Wiimotes to connect simultaneously; a sketch of this ID check follows the list).
5. Create a FileStream object for each device.
6. Disconnect from the classes used in steps 1 and 2.
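
As an illustration of step 4, here is a minimal C# sketch (my own, not the API’s actual code) of matching a candidate device against the Wiimote’s IDs; the constants are the values commonly reported for the Nintendo vendor and Wiimote product IDs, but treat them as assumptions:

    static bool IsWiimote(int vendorId, int productId)
    {
        const int NintendoVendorId = 0x057E;  // assumed: commonly reported Nintendo vendor ID
        const int WiimoteProductId = 0x0306;  // assumed: commonly reported Wiimote product ID
        return vendorId == NintendoVendorId && productId == WiimoteProductId;
    }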
The FileStream object allows bi-directional communication between the computer and the Wiimote.
All data is sent and received as discrete packets and saved in a 22-byte buffer; the API allows either events
or polling when receiving data. The API parses packets and wraps this information in a
class with members defining the states of each button and doubles containing the coordinates (x, y) and
intensity of all four tracked objects. An event forces the update function to be called every time
the 22-byte buffer is full, and polling lets the application query the API for the Wiimote’s last packet.
Packets can be sent to the Wiimote at any time; polling or event querying is not necessary.
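
To make the event and polling styles concrete, here is a minimal sketch; the Wiimote class below is a hypothetical stand-in (member names in the actual reverse-engineered API may differ):

    using System;

    // Hypothetical stand-in for the reverse-engineered Wiimote API.
    class WiimoteSketch
    {
        public class State { public int IrX, IrY; }   // one tracked IR point

        public event Action<State> StateChanged;      // raised per parsed 22-byte packet
        public State LastState { get; private set; } = new State();

        public void Connect() { /* search the HID list, open a FileStream, ... */ }
    }

    class Demo
    {
        static void Main()
        {
            var wiimote = new WiimoteSketch();

            // Event style: the handler runs every time a full packet is parsed.
            wiimote.StateChanged += s => Console.WriteLine($"IR: ({s.IrX}, {s.IrY})");
            wiimote.Connect();

            // Polling style: query the most recently parsed packet on demand.
            var last = wiimote.LastState;
            Console.WriteLine($"Last IR: ({last.IrX}, {last.IrY})");
        }
    }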
Gesture Recognition
Gesture recognition is becoming a commonly used tool to navigate the host of applications on
computers. It is possible to have an unlimited number of gestures; for instance, a gesture could be a
drawing of a computer or a human. One would not want to have too many gestures, because it becomes
exponentially harder to remember them all and easy to confuse how to draw one gesture versus another.
I use a simple IR LED, resistor, and battery pack circuit to create an IR emitter. I tape this emitter to
my pointer finger, thus demonstrating pointing/gesture functionality with a finger. My IR LED gesture
recognition algorithm has a simple, effective, and elegant design.
1. A ticker at 1 Hz keeps a record of the coordinates (x, y) of the first tracked IR object for the last two
ticks.
a. If all three ticks are within a certain threshold of one another, gesture recognition
starts (similar to a button press or gesture press; a sketch of this dwell check follows the list).
2. Using the event notification of the C# Wiimote API, a list of coordinates for the first tracked IR
object is stored in an ArrayList object.
3. Since the ticker still keeps records every second (from step 1), the gesture ends when the last two
ticks are within a certain threshold of one another (similar to step 1a). At gesture end, a
function determines which of the twelve possible gestures has been performed.
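
Here is a minimal C# sketch of the dwell detection in steps 1 and 3 (my own illustration, not the paper’s code; the threshold value is an assumption):

    using System;

    class DwellDetector
    {
        const int Threshold = 10;                       // assumed max movement, camera units
        readonly (int X, int Y)[] ticks = new (int, int)[3];
        int count;

        // Called once per second with the current position of IR object 0.
        // Returns true when the point has barely moved across three ticks,
        // i.e. the "button press" that starts a gesture.
        public bool Tick(int x, int y)
        {
            ticks[count % 3] = (x, y);
            count++;
            if (count < 3) return false;
            foreach (var a in ticks)
                foreach (var b in ticks)
                    if (Math.Abs(a.X - b.X) > Threshold || Math.Abs(a.Y - b.Y) > Threshold)
                        return false;
            return true;
        }
    }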
The gesture recognition function performs the following:
Compare final (x, y) to initial (x, y)
Obtain variables x_diff and y_diff
If x_diff > threshold, define as East or West
If y_diff > threshold, define as North or South
If neither > threshold, no motion
If one > threshold, basic North, South, East, or West direction
If both > threshold, see if one direction is significantly greater than the other
o Interpret this as only the larger one
o (e.g., many units east but only 12 units north was probably intended to be east only)
If both > threshold and neither is significantly greater than the other
o Create a line between the start and end points of the gesture array
o Classify every (x, y) element between start and end as above or below the line
o Use this info to determine, for example, whether the gesture is South-East or East-South
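
The following minimal C# sketch implements the pseudocode above (my own illustration, not the paper’s actual code; the threshold and dominance values are assumptions):

    using System;
    using System.Collections.Generic;

    static class GestureClassifier
    {
        public struct Pt { public int X, Y; public Pt(int x, int y) { X = x; Y = y; } }

        const int Threshold = 50;       // assumed minimum movement, camera units
        const double Dominance = 3.0;   // assumed ratio for "significantly greater"

        public static string Classify(List<Pt> path)
        {
            Pt s = path[0], e = path[path.Count - 1];
            int dx = e.X - s.X, dy = e.Y - s.Y;
            string h = dx > 0 ? "Right" : "Left";
            string v = dy > 0 ? "Down" : "Up";   // assumes camera y grows downward

            bool hMoved = Math.Abs(dx) > Threshold, vMoved = Math.Abs(dy) > Threshold;
            if (!hMoved && !vMoved) return "None";
            if (hMoved && !vMoved) return h;
            if (vMoved && !hMoved) return v;

            // Both moved: keep only the dominant axis if one is much larger.
            if (Math.Abs(dx) > Dominance * Math.Abs(dy)) return h;
            if (Math.Abs(dy) > Dominance * Math.Abs(dx)) return v;

            // Otherwise count points on each side of the start-end line to decide
            // the order, e.g. Right-Down versus Down-Right.
            int above = 0, below = 0;
            foreach (Pt p in path)
            {
                long cross = (long)(e.X - s.X) * (p.Y - s.Y) - (long)(e.Y - s.Y) * (p.X - s.X);
                if (cross > 0) above++; else if (cross < 0) below++;
            }
            // Which side maps to which ordering is an assumption for illustration.
            return above >= below ? h + "-" + v : v + "-" + h;
        }
    }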
My Approach & Design
Using off-the-shelf and inexpensive components – the Wiimote, an infrared emitter (LED, resistor,
battery pack), and free, open source software – I am able to get the Wiimote to detect and track IR
spots and then transmit this information to a computer application. The Wiimote is connected through
Bluetooth, and data is sent full duplex at 100 Hz. The Wiimote API parses incoming data packets and
creates an object storing information from the most recently received packet. I do not use the button press, acceleration, or IR
intensity information; I focus only on coordinates (x, y) from
the first tracked object. The gesture recognition algorithm
detects when a gesture is started, records
movement at the same rate data is transmitted from the remote
to the computer (typically 100 Hz), and recognizes when the
gesture ends. This information is fed into a function that returns
one of twelve direction combinations. The gestures are
mapped to keyboard presses according to Table 1.
Table 1: Gesture-to-keyboard mapping

Gesture       Keyboard Action
Down          Down
Up            Up
Left          Left
Right         Right
Down-Left     Previous Track
Up-Left       Volume Down
Up-Right      Volume Up
Down-Right    Next Track
Left-Down     Mute/Unmute
Left-Up       Mute/Unmute
Right-Up      Pause
Right-Down    Stop
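
Here is a minimal sketch (my own illustration, not the paper’s code) of performing the Table 1 mapping via P/Invoke, in the same spirit as the API’s use of P/Invoke; the virtual-key codes are standard Win32 values, and treating “Pause” as the play/pause media key is an assumption:

    using System;
    using System.Collections.Generic;
    using System.Runtime.InteropServices;

    static class GestureKeys
    {
        [DllImport("user32.dll")]
        static extern void keybd_event(byte vk, byte scan, uint flags, UIntPtr extra);

        const uint KEYEVENTF_KEYUP = 0x0002;

        // Win32 virtual-key codes for the actions in Table 1.
        static readonly Dictionary<string, byte> Map = new Dictionary<string, byte>
        {
            ["Down"] = 0x28, ["Up"] = 0x26, ["Left"] = 0x25, ["Right"] = 0x27,
            ["Down-Left"] = 0xB1,  // VK_MEDIA_PREV_TRACK
            ["Up-Left"] = 0xAE,    // VK_VOLUME_DOWN
            ["Up-Right"] = 0xAF,   // VK_VOLUME_UP
            ["Down-Right"] = 0xB0, // VK_MEDIA_NEXT_TRACK
            ["Left-Down"] = 0xAD,  // VK_VOLUME_MUTE
            ["Left-Up"] = 0xAD,    // VK_VOLUME_MUTE
            ["Right-Up"] = 0xB3,   // VK_MEDIA_PLAY_PAUSE (assumed for "Pause")
            ["Right-Down"] = 0xB2, // VK_MEDIA_STOP
        };

        public static void Send(string gesture)
        {
            if (!Map.TryGetValue(gesture, out byte vk)) return;
            keybd_event(vk, 0, 0, UIntPtr.Zero);               // key down
            keybd_event(vk, 0, KEYEVENTF_KEYUP, UIntPtr.Zero); // key up
        }
    }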