spie proceedings [spie spie defense, security, and sensing - baltimore, maryland, usa (monday 29...

Optical image processing and pattern recognition algorithms for optimal optical data retrieval

Brian Walker (b), Thomas Lu (a, 1), Sean Stuart (c), George Reyes (a), and Tien-Hsin Chao (a)

(a) Jet Propulsion Lab/Caltech, Pasadena, CA, USA;

(b) Georgia Inst. of Tech., Atlanta, GA, USA;

(c) Santa Monica College, Santa Monica, CA, USA

ABSTRACT

Automatic pattern recognition algorithms are implemented to correct distortion and remove noise introduced by the optical medium in multi-channel optical communication systems. The post-processing involves filtering and correlation to locate every optical data element accurately. Localized thresholding and neural network training methods are used to digitize the analog optical images into digital data pages. The goal is to minimize the bit-error-rate (BER) in the optical data transmission and receiving process. Theoretical analysis and experimental tests have been carried out to demonstrate the improved optical data retrieval accuracy.

Keywords: multi-channel optical communication, data retrieval, channel alignment, analog-to-digital conversion, bit-error-rate, thresholding, pattern recognition, and neural networks.

1. INTRODUCTION

Since the invention of the fiber optical communication in 1964 [1] and the CD-ROM in 1968 [2], optical transmission,

recording and retrieving of digital data have been active research areas. The presence or absence of light can be used to

represent the binary ones and zeros of digital communication. In contrast to radio frequency communication, light based

free space optical data transmission (FSODT) and communication are difficult to intercept without detection, and are

resistant to electromagnetic interference [3-4]. Optical media are reliable and can offer large storage

capacity, as in DVDs and Blu-ray Discs. Recent developments in high speed imaging devices have made large

capacity, high-speed, multi-channel digital optical data transmission potentially an affordable, accurate, secure and fast

method of optical data transmission and retrieval. In 2006, the ESA’s Advanced Research and Technology Mission

Satellite (ARTEMIS) successfully established free space optical communication with an aircraft [5]. At NASA’s

Goddard Space Flight Center, a technology development mission called the Laser Communications Relay Demonstration

(LCRD) is planning to test FSODT with data rates of 10 to 100 times that found in standard radio frequency

communication [6]. Commercial ground based communication networks are also taking advantage of FSODT to solve

the ‘last mile problem’. One technical concern of optical communication is that there are always distortion and noise

problems associated with the optical transmitter, receiver and medium. In order to achieve highly accurate optical data

transmission while maintaining high data rates, optical channel alignment and data processing become critically

important components. To this end, we present a method for the parallel transmission of data in either an optical

recording medium, or an unguided medium, as well as methods for automatic channel alignment and correcting noise in

a planar array transmission scheme.

1 e-mail: [email protected] , Tel: (818) 354-9513, Fax: (818) 393-4272

Optical Pattern Recognition XXIV, edited by David Casasent, Tien-Hsin Chao, Proc. of SPIE Vol. 8748, 87480L · © 2013 SPIE · CCC code: 0277-786X/13/$18 · doi: 10.1117/12.2018264


2. OPTICAL TRANSMISSION ARCHITECTURE

2.1 Transmission of data in planar array

Transmission of optical data in an optical recording medium or an unguided medium like air can be accomplished in

different ways, but for our purposes we will employ a uniform illumination source and a method for modulating that

source. This can be accomplished using a liquid crystal display (LCD) or a digital micromirror device (DMD) as a spatial

light modulator (SLM). The data is placed in a rectangular matrix array - or a page - and displayed on the SLM. Due to

limitations on the size of the SLM, if the data to be transmitted has more information than can fit in a single page,

additional pages can be added until the entire data set is transmitted. After interacting with the SLM, the data page passes

through the transmission medium and a system of optical components to align the array onto a receiver. These optical

components and the optical medium may introduce some noise into the signal, which will need to be corrected at a later

stage in the process.
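
As a minimal sketch of the paging step described above: the helper name `paginate` is hypothetical, and padding the final partial page with zeros is an assumption, since the text does not say how a short last page is filled.

```python
def paginate(bits, rows, cols, pad=0):
    """Split a flat bit sequence into pages of rows x cols bits each."""
    page_size = rows * cols
    pages = []
    for start in range(0, len(bits), page_size):
        chunk = list(bits[start:start + page_size])
        # Assumption: fill an incomplete final page with a pad value.
        chunk += [pad] * (page_size - len(chunk))
        # Reshape the flat chunk into a list of rows for display on the SLM.
        pages.append([chunk[r * cols:(r + 1) * cols] for r in range(rows)])
    return pages


# Seven bits do not fit in one 2x2 page, so a second page is added.
pages = paginate([1, 0, 1, 1, 0, 1, 0], rows=2, cols=2)
```

Each returned page is a rectangular array that could be displayed on the SLM in turn.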

2.2 Retrieval of data

An optical receiver uses a detector array such as a charge-coupled device (CCD) or a complementary metal-oxide

semiconductors (CMOS). When a page arrives at the receiver it is necessary to determine where each transmitted data

channel can be found. In the ideal case there is a one-to-one mapping from a data bit on the transmitter to the receiver.

Additionally each bit on the receiver has no covariance with any other bit, and the intensity is uniform across the entire

transmission space. In this configuration, a global intensity threshold can be used to find ones and zeros.

In real-world applications, there are distortions from the optical systems such that no obvious one-to-one mapping from

transmitter to receiver exists. Each channel may be corrupted by noise introduced by the transmission medium, the

transmitter and the receiver. This introduces the possibility that some channels may be lost in the transmission, and

contribute to the bit error rate (BER). In order to minimize the BER, an optimal mapping from receiver to transmitter is

required.

2.3 Calibration of transmitter and receiver

Calibration of the transmitter and receiver is the process of determining the optimal mapping between these two devices.

An ideal calibration accounts for all stable noise sources present in the transmission. A stable noise source is defined to

be noise that is consistent from page to page. These stable sources may take several forms, such as translation, distortion,

blockage, non-uniformity of illumination, and non-uniformity of amplifier gain. Bit translation is when data is recovered

at the receiver, but not in the intended location. This type of noise can be corrected by changing the logical mapping

between transmitter channel and receiver channel as long as the translation is uniform for the entire array. Distortion is

the phenomenon of non-uniform translation. Distortion may cause two or more channels to occupy the same location

on the receiver, a single channel to spread across multiple receiver locations, or some combination of these

phenomena, as illustrated in Figure 1.

Channel blockage is when a channel is completely lost in transmission, and no amount of receiver configuration can

recover the data. In this case, it is necessary to adjust the transmission method such that data is not sent through that

channel location. Non-uniformity of illumination is when regions of a page are brighter or darker than the rest of the

page. If this is consistent from page to page, it is possible to create a map of pixel intensities to threshold each section of

the page. Non-uniformity of amplifier gain is a feature of the receiver, and may be constant over time due to

imperfections in the fabrication method. This type of noise will make adjacent pixels return different values even if they

are exposed to identical illumination. An extreme version of this noise type is a dead pixel, which is either always high

or always low, regardless of illumination.




(a) (b) (c)

Figure 1. Illustration of the alignment of a JFET sensor array with some common distortion modes: (a) Channel (Bit)

center points are properly aligned; (b) Bit center points have some exaggerated barrel distortion; (c) Bit center points

are completely misaligned due to pincushion distortion. [7]

2.4 Post-Processing of recovered data

In addition to stable noise sources, there are transient noise sources that may vary from page to page. Transient noise

sources may take the form of salt & pepper noise and intensity variations. In contrast to dead pixels, salt & pepper noise

is not constant from page to page and may be due to electromagnetic interference, amplifier noise, or read errors. This

type of noise is unpredictable but is usually limited in amplitude. Intensity variations may come from the light source

and can either be global, where the entire page is darker or lighter than expected, or local, where different regions of the

page are brighter or darker than the rest of the page.

Salt & pepper noise is usually confined to a small number of pixels, and can thus be filtered out. Intensity variations are

harder to compensate for, because the physical scale of the intensity variation is not known in advance. If the variation is

large enough in magnitude, it can cause errors in the recovered data. If it is not possible to stabilize the light source, it is

then necessary to employ a method for measuring the local intensity variation on each page.

3. POST-PROCESSING ALGORITHM DESIGN

3.1 Transmission of laser light through a Liquid Crystal Display

In multi-channel optical communication, a laser is used as the illumination source. The light is spread out using a

divergent lens so that the intensity profile is nearly flat when it interacts with the SLM. As the laser light passes through

the SLM, the light is modulated by the binary data array. This light is then directed onto a receiver.

3.2 Collecting data using a complementary metal-oxide semiconductor camera

A CMOS camera was chosen to collect the data because such cameras can record data at a high rate, have nearly uniform pixel

gain, and are easy to acquire. The sensor was placed in such a way that the modulated laser light would impact the sensor

normal to its surface, and that it collected as little scattered light as possible.



3.3 Calibration through a Bin allocation method

In this system, each channel of data to be transmitted is represented by a collection of pixels, hereafter referred to as a

bin. The most significant noise source is alignment. If the data is transmitted on one bin, but recovered on a different bin,

the data will be misaligned, or parts may be dropped altogether. Additionally, a study of the other noise sources will be

made all the more difficult without the assumption that bins are consistent from transmitter to receiver. To that end, the

first problem to be addressed during calibration is the assignment of bin locations on the receiver.

The space in which the bins are to be assigned is the active area of the receiver. In this case, the space is the pixels on a

CMOS camera. Some reasonable assumptions of the system are: If the transmitted bin is approximately 4 pixels tall, by 4

pixels wide, it will be approximately the same size on the receiver. The transmission is done in such a way that it does

not cover the entire active area of the receiver. There will be some horizontal and vertical translation between the

transmitted data and the received data. There will be some finite rotation of the data, although it may be very small.

Finally, there will be some barrel or pincushion distortion.

3.4 Augmented reconstruction of communication channels

One way of solving these problems is to invest time and effort in minimizing each noise source through alignment and

light compensation using optics. This is called the standard reconstruction method. Another way to solve this problem is

to dynamically assign bin locations on the receiver through pattern recognition algorithms, which is referred to as

augmented reconstruction. Though this method requires additional processing time, it will only be done infrequently as

the alignment is mostly static. Thus the dynamic assignment can be done once at the beginning of operation, and then

periodically when the bit error rate (BER) is measured to be above a given threshold. An example of each reconstruction

method is shown in Figure 2.

In order to determine the location of each bin on the receiver, a checkerboard pattern of alternating zeros and ones is

transmitted, as shown in Figure 3(a). This is done so that each bin location is easily identified. Once received, the data is

correlated with a match filter of the same size as the bins. This creates another image, wherein each local maximum

appears in approximately the center of a single bin, as shown in Figure 3(b). Amplifier and spike noise will produce some

local maxima that do not correspond to a bin location, but as long as the spatial extent of this noise is small

compared to the bin size they can be removed by thresholding.
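
The correlation step can be sketched as follows. This is an illustration only: the paper does not give the filter coefficients, so a uniform block of ones the same size as a bin is assumed, and `correlate_box` is a hypothetical name.

```python
def correlate_box(image, size):
    """Valid-mode correlation of a 2-D list with a size x size filter of ones.

    Entry (r, c) of the result is the sum of the window whose top-left pixel
    is (r, c), so peaks are offset from bin centers by about size / 2.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - size + 1):
        out_row = []
        for c in range(cols - size + 1):
            total = 0
            for dr in range(size):
                for dc in range(size):
                    total += image[r + dr][c + dc]
            out_row.append(total)
        out.append(out_row)
    return out


# A 6x6 image with one bright 2x2 bin and one isolated spike pixel.
image = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1
image[0][0] = 1  # spike noise, much smaller than a bin

response = correlate_box(image, 2)
```

In this toy case the bin produces a response of 4 while the spike produces at most 1, so a threshold between those values would discard the spike.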

Next, the algorithm finds each of these local maxima and creates a Boolean array of the same size as the input data, as

shown in Figure 4. This Boolean array, which will be called an allocation mask, is logical true where local maxima are

found and false everywhere else. The resulting locations are a good approximation of the 50% of bin locations that are bright in the checkerboard.

Imperfections in the data may cause some bin locations to be duplicated, or lost. In order to resolve these conflicts,

another algorithm must be used to ensure the uniformity of the allocation mask.
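
A minimal sketch of building the allocation mask from the filtered image is given below. The 3x3 comparison neighborhood, the strict-maximum test, and the name `allocation_mask` are assumptions made for illustration; a flat plateau would yield no maximum under this rule.

```python
def allocation_mask(response, threshold):
    """Boolean mask that is True at strict 3x3 local maxima above threshold."""
    rows, cols = len(response), len(response[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            value = response[r][c]
            if value <= threshold:
                continue  # too weak to be a bin center
            neighbors = [
                response[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                mask[r][c] = True
    return mask


filtered = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 4, 2, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 0, 0],
]
mask = allocation_mask(filtered, threshold=2)
```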

The first problem to be addressed is the case where a bin generates more than one center point. This can be caused by

dark spots in the middle of the bin, which may be spike noise, dust, or amplifier noise. Assuming that the majority of the

bins do not suffer from this condition, it can be corrected by comparing the distance from each point to its nearest

neighbors. In this example, each real center point is approximately 4 pixels away from its nearest real neighbor in both

the row and column direction. Any point which has another point much closer than this is considered to be in conflict for

the correct bin location. All conflict bins are then measured against their neighbors in a window that is of a size such that

if the bin is correctly placed, only 5 true centers will be found in the window, as shown in Figure 5. In this way, any bins

which are closest to the correct position will have the most neighbors at the correct distance. Any point which has more

incorrectly spaced neighbors than correctly spaced is removed from the allocation mask.
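
The conflict test described above might be sketched as follows. The tolerance `tol`, the square window of half-width `pitch + tol`, and the function name are assumptions; the paper does not state the exact window size or what counts as "much closer".

```python
def remove_duplicate_centers(points, pitch, tol=1):
    """Drop conflicting candidates whose neighbors are mostly at the wrong spacing."""
    half = pitch + tol  # half-width of the square neighbor window (assumed)

    def offsets(p):
        # Absolute (row, col) offsets from p to every other point in the window.
        found = []
        for q in points:
            dr, dc = abs(q[0] - p[0]), abs(q[1] - p[1])
            if q != p and dr <= half and dc <= half:
                found.append((dr, dc))
        return found

    kept = []
    for p in points:
        near = offsets(p)
        # "In conflict": some other point is much closer than one pitch.
        in_conflict = any(max(dr, dc) < pitch - tol for dr, dc in near)
        # Correctly spaced neighbors sit about one pitch away in row and column.
        good = sum(abs(dr - pitch) <= tol and abs(dc - pitch) <= tol
                   for dr, dc in near)
        bad = len(near) - good
        if in_conflict and bad > good:
            continue  # more wrongly spaced neighbors than correct ones
        kept.append(p)
    return kept


# A true center at (8, 8) with four neighbors, plus a spurious point at (10, 8).
points = [(8, 8), (4, 4), (4, 12), (12, 4), (12, 12), (10, 8)]
cleaned = remove_duplicate_centers(points, pitch=4)
```

Here the true center keeps four correctly spaced neighbors against one bad one, while the spurious point has none, so only the spurious point is removed.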



[Figure 2 panels: Transmitted Data; Received Data; Standard Reconstruction; Augmented Reconstruction (via the Bin Finding Algorithm). Key: OM = Optical Medium; SR = Standard Reconstruction; AR = Augmented Reconstruction.]

Figure 2. The encoded data (upper left) is formed into a page and transferred through the optical medium. The top center image is a

page recovered from the optical sensor. Bin locations and intensity values have changed. The standard reconstruction method using

optical alignment generates the image in the top right; In the bottom right, the pattern recognition algorithm is used to find bin

locations, resulting in a more accurate image.

(a) (b)

Figure 3. Example of recovered data before and after the application of a match filter. Figure 3a shows some spike noise, which

would appear as local maxima in a search of the image. The match filter minimizes the appearance of false positives for bin locations.




Figure 4. Above is an example of the local maxima, in white, overlaid on the filtered version of the recovered data. The bins are not

uniform in size, thus the local maxima may appear to be off center for different bins.

(a) (b) (c)

Figure 5. Example of neighbors of a bin: (a) the image is an example of four appropriately spaced neighbors to the center bin. (b)

Shows a neighbor too close to the center bin, and (c) is missing a neighbor.

3.5 Recovery of missing bins

Next, it is necessary to identify the missing bins. In this step, each bin in the mask is evaluated in a window of the same

size as previously described. Assuming that the previous step was successful in removing duplicate center bins, there are

4 possible situations arising in the window: The window contains only 2 bins if it is centered on one of the four corners

or on a bin which is missing three neighbors. The window will contain 3 points if it is a bin on the edge of the allocation

matrix, or if the bin is missing 2 neighbors. The window will contain 4 points if it is an interior bin missing a single

neighbor. Finally, the window will contain 5 points if the bin is on the interior with all neighbors intact. Of these cases,

only the one with 4 points guarantees that the missing point needs to be added to the allocation mask.

Assuming that the number of bins missing is a small fraction of the entire grid, it is possible to reconstruct an

approximate grid by choosing to add bins in the following manner: If the window contains 4 points, place a point where

the missing neighbor should be. Applied to the entire allocation mask, this will generate 4 points approximately where

each missing bin should be, one from each diagonal neighbor. Each point can be thought of as a vote from its adjacent



neighbor, and the middle point between these votes is the most likely location for the missing bin. Another pass through

the allocation mask can collapse these votes into a single bin by taking the first moment in a window which is large

enough to contain all votes, but small enough to not include any adjacent neighbors. If there are regions of the allocation

grid where many adjacent points are missing from the allocation mask, repetition of this method will eventually fill that

region with points that approximate the actual bin locations.
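
The voting scheme can be sketched as below, under stated assumptions: neighbors are taken at the four diagonal offsets of one pitch, a neighbor counts as present if a point lies within `tol` pixels of the expected position, and votes within `2 * tol` of each other are merged by their centroid (first moment). All names are hypothetical.

```python
def recover_missing_bins(points, pitch, tol=1):
    """Estimate missing bin centers from votes cast by their diagonal neighbors."""
    diagonals = [(-pitch, -pitch), (-pitch, pitch), (pitch, -pitch), (pitch, pitch)]

    def has_point_near(r, c):
        return any(abs(q[0] - r) <= tol and abs(q[1] - c) <= tol for q in points)

    votes = []
    for r, c in points:
        missing = [(r + dr, c + dc) for dr, dc in diagonals
                   if not has_point_near(r + dr, c + dc)]
        # Window holds 4 points (this bin plus 3 neighbors): vote for the 4th.
        if len(missing) == 1:
            votes.append(missing[0])

    # Collapse nearby votes into one location by taking their centroid.
    clusters = []
    for v in votes:
        for cluster in clusters:
            first = cluster[0]
            if abs(first[0] - v[0]) <= 2 * tol and abs(first[1] - v[1]) <= 2 * tol:
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return [(sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            for cl in clusters]


pitch = 4
# Bright bins of a 5x5 checkerboard, one center per bright bin.
points = [(pitch * i, pitch * j) for i in range(5) for j in range(5)
          if (i + j) % 2 == 0]
points.remove((8, 8))                   # simulate a lost bin
points[points.index((4, 4))] = (5, 4)   # one neighbor is slightly displaced

recovered = recover_missing_bins(points, pitch)
```

The four diagonal neighbors cast votes at (9, 8), (8, 8), (8, 8), and (8, 8), and the centroid of those votes is the estimated location of the missing bin.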

3.6 Generation of bin allocation mask

Once the allocation mask has passed the preceding steps, it is possible to approximate the center points for the dark bins

by taking a linear interpolation between the directly adjacent bins. Finally, the allocation mask needs to be stored in such

a way that the collected data can be arranged in a rectangular binary grid. To accomplish this, a known point must be

found in the image, and every other point taken as a reference to this point, as illustrated in Figure 6. The most obvious

point to start from is the upper left corner, because it is easy to describe mathematically. To find the upper left corner, let

X be a vector of the column indices of each bin in the allocation mask, and let Y be a vector of the row indices. The bin

whose indices $(X_i, Y_i)$ minimize $\sqrt{X_i^2 + Y_i^2}$ is the bin which is closest to the upper left corner

of the allocation mask. Using this as a starting point, it is possible to move from point to point through the allocation

mask, looking for the bin which is closest to the expected location of the next bin. This can be seen in the following

figure.

Figure 6. A simulated bin allocation mask: Bins are indicated as white squares. The upper left bin is circled, and the ideal spacing

for its nearest neighbors is indicated in gray.

Assuming the data is transmitted as an array with height Y and width X, the algorithm need only start from the upper left

corner and record the indices of this point. Then, depending on the method of search, begin indexing downward Y times

or to the right X times. This is done assuming that the next row or column will be recorded in a similar manner. Upon

completion, the bin finding algorithm returns a data structure with the row and column indices for the center points of

each bin. These bins and their surrounding pixels will be the basis for a thresholding method which will determine if a

binary one or zero is transmitted in a later stage. It is important to note that this method yields accurate center points for

bins which are at least 3 pixels by 3 pixels in size. If the transmitted data is smaller than that, it is difficult to approximate

a center point and another method is required.
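
The corner search and the point-to-point walk can be sketched as follows. This is one possible reading of the procedure, assuming a row-by-row search with a known nominal pitch; `nearest`, `upper_left_bin`, and `order_bins` are hypothetical names.

```python
import math


def nearest(centers, target):
    """Center closest (Euclidean distance) to the expected location."""
    return min(centers,
               key=lambda p: math.hypot(p[0] - target[0], p[1] - target[1]))


def upper_left_bin(centers):
    """Center (row, col) that minimizes sqrt(col^2 + row^2)."""
    return min(centers, key=lambda p: math.hypot(p[1], p[0]))


def order_bins(centers, n_rows, n_cols, pitch):
    """Arrange centers into an n_rows x n_cols grid, walking from the corner."""
    grid = []
    row_start = upper_left_bin(centers)
    for i in range(n_rows):
        if i > 0:
            # Step down one pitch from the start of the previous row.
            row_start = nearest(centers, (row_start[0] + pitch, row_start[1]))
        row = [row_start]
        for _ in range(n_cols - 1):
            last = row[-1]
            # Step right one pitch and snap to the closest detected center.
            row.append(nearest(centers, (last[0], last[1] + pitch)))
        grid.append(row)
    return grid


# Six slightly jittered centers, given in arbitrary order.
centers = [(14, 14), (10, 11), (15, 18), (10, 15), (14, 10), (11, 19)]
grid = order_bins(centers, n_rows=2, n_cols=3, pitch=4)
```

Snapping each step to the nearest detected center lets the walk tolerate small displacements without accumulating error from the nominal pitch.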

3.7 Post-processing using global intensity thresholding

Once the calibration has been completed, the system is ready to transmit real data. Comparisons between the transmitted

data and the recovered data will yield information on the transient noise in the system. An effective post processing



routine will compensate for these transient noise sources and recover the original data in its entirety. If the image has

mostly uniform illumination, it is possible to determine a single value to threshold the entire image. This is called global

thresholding. To determine the value to threshold, the average value of the bright bits is added to the average value of the

dark bits and divided by two. This method has a code rate of 1.0 and requires less processing time than other methods.

The downside to this method is the uniformity assumption. If the illumination is not uniform, some bits will be given the

incorrect value.
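
A minimal sketch of global thresholding, assuming the bright and dark averages are taken from bins whose transmitted values are known (for example, a calibration page); the function names are hypothetical.

```python
def global_threshold(bin_values, known_bits):
    """Midpoint between the mean bright-bin and mean dark-bin intensities."""
    bright = [v for v, b in zip(bin_values, known_bits) if b == 1]
    dark = [v for v, b in zip(bin_values, known_bits) if b == 0]
    return (sum(bright) / len(bright) + sum(dark) / len(dark)) / 2


def apply_threshold(bin_values, threshold):
    """A bin brighter than the threshold is a one, otherwise a zero."""
    return [1 if v > threshold else 0 for v in bin_values]


# Bright bins average 190 and dark bins average 50, so the threshold is 120.
threshold = global_threshold([200, 40, 180, 60], [1, 0, 1, 0])
decoded = apply_threshold([130, 90, 210, 10], threshold)
```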

3.8 Post-processing using local intensity thresholding

If the illumination is not uniform, it is possible to threshold the image in smaller regions. To determine the threshold

value for this method, a single dark bit and a single light bit, which we will call thresholding bits, are placed at regular locations in

the transmitted data. The frequency of these thresholding bits is determined by the spatial scale of the intensity

variations. The smaller the spatial scale, the more frequently the bits need to be placed in the transmitted data. An

appropriate spacing allows for the assumption that the illumination is constant in that neighborhood. If the assumption

holds, any bin brighter than the midpoint between the dark bin and the light bin is a binary one, and any bin that fails this

test is a binary zero.
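
The local test can be sketched as below. The representation of a neighborhood as a flat list of bin intensities, with the positions of the two thresholding bits passed as indices, is an assumption made for illustration.

```python
def local_threshold(neighborhood, dark_idx, light_idx):
    """Decode the data bins of one neighborhood using its own thresholding bits."""
    midpoint = (neighborhood[dark_idx] + neighborhood[light_idx]) / 2
    return [1 if value > midpoint else 0
            for i, value in enumerate(neighborhood)
            if i not in (dark_idx, light_idx)]


# Two neighborhoods carrying the same data (1, 0) under different illumination.
# In each list, index 0 is the dark reference and index 1 the light reference.
bright_region = [20, 200, 180, 30]
dim_region = [5, 60, 50, 10]

bits_bright = local_threshold(bright_region, dark_idx=0, light_idx=1)
bits_dim = local_threshold(dim_region, dark_idx=0, light_idx=1)
```

A single global threshold near 110 would decode the bin of intensity 50 in the dim region as a zero, whereas the local midpoint of 32.5 recovers it as a one.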

3.9 Post-processing using neural network estimated thresholding

It is also possible to threshold the data using an artificial neural network. Neural networks are excellent at pattern

matching and classification [8-9]. If given the appropriate data and training parameters, the neural network can threshold

the transmitted data without any thresholding bits. This allows for the maximum code rate, but comes at a cost: a

neural network can be trained to almost arbitrary accuracy on a single set of data, but overtraining can lead to a loss of

generality. This loss of generality can be neglected if the transmission conditions are approximately constant between

training sets. We will compare these thresholding strategies in the next section.

4. EXPERIMENTAL RESULTS

The bin allocation method previously discussed was used to find bin locations for a set of twenty two data pages. Each

page is a set of alternating binary ones and zeros, 4 pixels by 4 pixels, which resembles a checkerboard pattern. These

pages were transmitted under various alignments and illuminations. For each page, the bin locations are found and four

(4) methods of thresholding are tested to determine their bit error rate (BER). Each bin is approximated as the mean of a

three by three square centered on the bin coordinates. The global thresholding method uses a single value to determine

which bins are binary ones or zeros. This threshold value was chosen to be halfway between the average values for the

binary ones and zeros. In the local thresholding method, two bins with known identity are used to threshold a cluster of

data adjacent to these bins. Three neighborhood sizes were tested: One of size eight-by-eight (8x8), one of size four-by-

four (4x4), and one of size two-by-two (2x2), as shown in Figure 7.
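
The bin value used by all thresholding methods, the mean of a three-by-three square centered on the bin coordinates, can be sketched as follows. The sketch assumes the center is at least one pixel away from the image border; `bin_value` is a hypothetical name.

```python
def bin_value(image, row, col):
    """Mean of the 3x3 pixel square centered on (row, col)."""
    window = [image[r][c]
              for r in range(row - 1, row + 2)
              for c in range(col - 1, col + 2)]
    return sum(window) / len(window)


# A uniform patch of intensity 100 with one hot pixel at the center.
image = [[100] * 5 for _ in range(5)]
image[2][2] = 190

value = bin_value(image, 2, 2)  # (8 * 100 + 190) / 9
```

Averaging over nine pixels reduces the influence of a single noisy pixel on the bin value.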

4.1 Global vs local thresholding

Bit error rates are recorded for each of the four methods by comparing the results to a binary grid, which matches the

transmitted data. Without exception, the global thresholding method has a higher bit error rate than any of the local

thresholding methods. As expected, the bit error rate decreases as the neighborhood size decreases; however, this

improvement comes at a cost in code rate. The eight-by-eight (8x8) neighborhood has a code rate of .97, the four-by-four

(4x4) a code rate of .88, and the two-by-two (2x2) a code rate of .50. All methods were able to maintain a BER of less

than one percent. Data relevant to this experiment can be found in Figures 8 and 9.
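
The quoted code rates are consistent with two thresholding bins per n-by-n neighborhood, giving a rate of (n^2 - 2) / n^2. The sketch below works through that arithmetic and the BER comparison; the function names are hypothetical, and the two-bin overhead is inferred from the reported figures.

```python
def code_rate(n, overhead_bins=2):
    """Fraction of bins in an n x n neighborhood that carry data."""
    return (n * n - overhead_bins) / (n * n)


def bit_error_rate(sent, received):
    """Fraction of bit positions where the received value differs from the sent value."""
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


rates = {n: code_rate(n) for n in (8, 4, 2)}
# 8x8: 62/64 = 0.96875 (about .97)
# 4x4: 14/16 = 0.875   (about .88)
# 2x2:  2/4  = 0.5

ber = bit_error_rate([1, 0, 1, 0], [1, 0, 0, 0])  # one error in four bits
```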



[Figure 8 plot: x-axis "Test Image" (0 to 20); legend: global, 2x2, 4x4, 8x8.]

(a) (b) (c)

Figure 7. The three local neighborhoods for thresholding: (a) a neighborhood of size eight by eight (8x8), the bits used to threshold

the rest of the neighborhood are boxed near the center; (b) a neighborhood of size four by four (4x4), with thresholding bins boxed;

(c) a neighborhood of size two by two (2x2), the top two bins are used to threshold the bottom two bins.

Figure 8. The Bit Error Rate for twenty two pages using four different thresholding methods.



[Figure 9 plot: "Bit Error Rates for Four Threshold Methods"; categories: 1 Global, 2 8x8, 3 4x4, 4 2x2.]

Figure 9. A box plot summarizing the results of the Bit Error Rate tests on global, 2x2, 4x4, and 8x8 thresholding methods.

The data show that local thresholding reduces the BER. The mean BER for the four-by-four

neighborhood is less than half the mean BER for the global thresholding method. Also, there is no significant

improvement in BER when comparing the four-by-four method to the two-by-two. This is a strong indication that the

intensity is roughly constant on the scale of the four by four neighborhood. The minimum BER is found to be nearly

identical for each method, which can be interpreted as the error floor for this type of thresholding.

4.2 Neural network thresholding

In order to test the artificial neural network training method, it is necessary to choose a data set that has consistent

transmission characteristics. Of the original twenty two images, only thirteen were determined to be similar enough to

test this method. For inputs to the neural network, the following values are measured for each bin: The mean, the

standard deviation, and the maximum and minimum values. Additional inputs are taken from the data page after

correlation with a match filter: The mean, the standard deviation, and the minimum and maximum values. Finally, the

global mean of the image is included.
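
The nine inputs listed above could be assembled per bin as in the sketch below. The use of the population standard deviation and the ordering of the features are assumptions; the paper does not specify either, and the function names are hypothetical.

```python
import statistics


def bin_stats(pixels):
    """Mean, standard deviation, maximum, and minimum of one bin's pixels."""
    # Assumption: population standard deviation over the bin's pixels.
    return [statistics.fmean(pixels), statistics.pstdev(pixels),
            max(pixels), min(pixels)]


def feature_vector(raw_pixels, filtered_pixels, global_mean):
    """Nine inputs: four raw statistics, four filtered statistics, global mean."""
    return bin_stats(raw_pixels) + bin_stats(filtered_pixels) + [global_mean]


features = feature_vector(
    raw_pixels=[10, 20, 30, 40],           # pixels of one bin in the raw page
    filtered_pixels=[100, 100, 100, 100],  # same bin after the match filter
    global_mean=50.0,
)
```

One such vector per bin, paired with the known transmitted bit, would form a training example for the network.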

Figure 10 shows the experimental results of the neural net (nnet) as compared to the other thresholding methods. As

expected, the neural network is able to match and exceed the performance of the other methods in the majority of the

pages. In two cases the neural network did worse than the global threshold method, but this can be attributed to

overtraining.

5. CONCLUSIONS AND DISCUSSIONS

We have developed post-processing algorithms for distortion and noise correction in the multi-channel optical

communication system. Bit error rates for several configurations of the test system indicate that these methods for data

transmission are successful but can be improved. Introduction of forward error correction encoding schemes could reduce the bit error rate to current telecommunication standards [10]. Additional thresholding methods, such as a look-up table or artificial neural network, could reduce the raw BER and thereby the processing requirements for encoding.

[Figure 10 plot: x-axis "Test Image"; legend: nnet, 2x2, 4x4, 8x8.]

Figure 10. Bit error rates for neural network (nnet), global, 2x2, 4x4, and 8x8 local thresholding techniques.

The dynamic allocation method, if developed further, could be used to create communication links between mobile

agents or mobile networks between multiple agents. For instance, cars on a highway could have forward and rear-facing

FSODT interfaces which would allow them to communicate with each other. This could be used to convey information

about traffic patterns, intention to stop or change speed, or inter-car voice/video communication. If the illumination and

light-blocking device is replaced with an LED display, mobile phones could use this technology to communicate secure

information to stationary terminals such as an ATM or controlled access door.

Static configurations of this method could be used to set up high speed wireless data transfer networks. If the transmitter

were to employ a feedback system to optimize the alignment, a commercially available digital micro-mirror device, and

a high-speed imaging device, data rates in the neighborhood of 320 Mb/s could be achieved with very short set up time.



ACKNOWLEDGEMENTS

This research was carried out at the Jet Propulsion Laboratory (JPL), California Institute of Technology under a contract

with the National Aeronautics and Space Administration (NASA).

REFERENCES

1. V. Alwayn, Optical Network Design and Implementation, Cisco Press, 2004.

2. Sherman, Chris, ed. The CD-ROM Handbook. Second ed. New York: McGraw-Hill, 1994.

3. Majumdar, A. K., “Free-Space Laser Communications, Principles and Advances” Springer New York, 2008.

4. FCC Online Table of Frequency Allocations, 2013.

5. http://telecom.esa.int/telecom/www/object/index.cfm?fobjectid=27945

6. http://www.nasa.gov/centers/goddard/news/releases/2012/12-074.html

7. http://www.petapixel.com/2013/02/12/what-a-dslrs-cmos-sensor-looks-like-under-a-microscope/

8. T. Lu, C. L. Hughlett, H. Zhou, T-H. Chao, J. C. Hanan, “Neural network post-processing of grayscale optical

correlator,” Proc. SPIE 5908, Optical Information Processing III, 2005.

9. T. Lu, D. Mintzer, “Hybrid neural networks for nonlinear pattern recognition”, Optical Pattern Recognition, ed. by

F. T. S. Yu & S. Jutamulia, Cambridge University Press, 1998.

10. N. Letzepis and A. Grant, “Bit error rate estimation for turbo decoding,” in Proc. IEEE Int. Symp. Inf. Theory, p.

437, 2003.


