Image Processing Capabilities of ARM-based
Micro-controllers for Visual Sensor Networks
M. Amac Guvensan, A. Gokhan Yavuz, Z. Cihan Taysi, M. Elif Karsligil, Esra Celik
Department of Computer Engineering, Yildiz Technical University
Istanbul, Turkey
amac, gokhan, cihan, elif, [email protected]
Abstract—The last decade witnessed the rapid development of Wireless Sensor Networks (WSNs). More recently, the availability of inexpensive hardware such as CMOS cameras and microphones that are able to ubiquitously capture multimedia content from the environment has fostered the development of Wireless Multimedia Sensor Networks (WMSNs) [1]. Nodes in such networks require a significant amount of processing power to interpret the collected sensor data. Most of the currently available wireless multimedia sensor nodes are equipped with ARM7 core micro-controllers [2]. On the other hand, the ARM9 core is a viable alternative, which delivers deterministic high performance and flexibility for demanding and cost-sensitive embedded applications. Thus, we evaluated the performance of the ARM7 core against the ARM9 core in terms of processing power and energy consumption. Our test results showed that the architectural improvements of the ARM9 core alone resulted in a 30% speed-up in execution time, and that the ARM9 core in general performed 9 to 11 times faster than the ARM7 core.
Keywords—Convolution, Visual Sensor Networks, ARM-based micro-controllers, Performance Evaluation
I. INTRODUCTION
Wireless Multimedia Sensor Networks (WMSNs) receive much attention with the rapid development and progress in sensor technologies, embedded computing, and the availability of inexpensive CMOS cameras and microphones [2]. WMSNs are able to obtain audio/visual information from an observed area, in contrast to traditional WSNs where sensor nodes were restricted to collecting basic scalar data. Thus, they have recently enabled many WMSN applications such as surveillance sensor networks, law-enforcement reports, traffic control systems, advanced health care delivery, automated assistance to the elderly, telemedicine, and industrial process control [1].
Interpreting the observed data accurately and in a timely manner is the main focus of these applications. Depending on the requirements of the application, this processing might occur either at the more powerful base station or within the network on the battery/CPU-constrained sensor nodes. WMSNs are expected to process huge amounts of data during their operation [3]. Transmitting this raw data to a base station without in-network processing causes sensor nodes to deplete their batteries quickly. Hence, substantial energy can be saved by avoiding the transmission of redundant information [4]. Besides, sending raw data, especially medium/high resolution video images, is a serious problem due to communication link restrictions [5]. Fortunately, new generation micro-controllers encourage researchers to implement signal processing algorithms such as encoding, compression, and object detection on the nodes. In-situ processing brings many advantages to overcome the challenges in WMSNs. Therefore, researchers aim at transferring only meaningful information from sensor nodes to base stations.
Objects and/or regions can be described as the most important meaningful information in a picture/frame. Sensor nodes in a visual sensor network are responsible for the detection, recognition and tracking of targeted objects [6]. To fulfill these tasks, several image processing techniques have recently been implemented on sensor nodes. Real-time image processing, especially real-time video processing, on existing nodes is still hard to implement due to their CPU/memory constraints. A review of currently available wireless multimedia sensor nodes [2][7] shows that ARM micro-controllers are very common among them. Due to their low cost, small size and simplicity, they have become very popular and dominant in the market. ARM processors are used extensively in consumer electronics, including PDAs, mobile phones, music/media players, and hand-held game consoles. ARM [8] offers a wide range of micro-controller cores, such as ARM7, ARM9, and ARM11, with different functionalities and capabilities. ARM7 is one of the powerful micro-controllers for embedded computing. However, its performance still falls behind the requirements of real-time video processing, especially for high resolution images. On the other hand, ARM9 and ARM11 introduce new capabilities for signal processing algorithms. However, ARM11 includes a wide range of peripherals, such as an LCD controller, an MPEG encoder, and a graphics accelerator, since it is specifically designed for multimedia applications. Due to its high cost and peripheral overhead, it is not a suitable candidate for visual sensor nodes. Thus, it is not included in this study.
In this paper, we compare the performance of the ARM9 core with that of the ARM7 core and show the necessity of ARM9-based micro-controllers for embedded computing in visual sensor networks. A wide range of image/video processing algorithms, from preprocessing to encoding, are made up of a sequence of convolutions, which transform images using a set of kernel matrices [9]. Thus, we believe that how well a micro-controller performs the compute-intensive convolution process is indicative of its suitability for embedded signal processing applications.
The rest of this paper is organized as follows. Section
2 discusses available studies on performance evaluation for embedded systems. In Section 3, we categorize the basic convolution algorithms and explain their importance for multimedia sensor applications. Section 4 presents the architectural details and differences of ARM7-based and ARM9-based micro-controllers. In Section 5, we present the results of the performance evaluation of the ARM7 and ARM9 cores.
II. RELATED WORK
Performance optimization of image processing algorithms on embedded systems can be achieved either in hardware or in software. In [10], the authors propose an embedded video image recognition system based on a Single Instruction Multiple Data (SIMD) micro-controller and memory arrays. Although their platform runs four times faster than a general purpose processor, it is not suitable for use in a wireless sensor network due to its very high production cost and energy consumption. In [11], it is shown that using a high performance micro-controller is more efficient when performing complex operations like image processing. Thus, the authors propose a dual-processor architecture consisting of an Intel PXA255 micro-controller and a TI MSP430F1611 micro-controller. The PXA255 is used only for image processing, whereas the MSP430F1611 is used for general sensing purposes. The proposed system also adopts a dual-radio solution based on low power 802.15.4 and 802.11g interfaces.
In [12], an embedded smart camera system is proposed for traffic surveillance. The authors employ both hardware and software optimizations to improve the system performance. Hardware optimizations include the use of Direct Memory Access (DMA) and the integration of a DSP. Software optimizations are limited to the removal of external memory accesses and the replacement of floating point operations with fixed-point operations. Stationary vehicle detection was selected as an example application. The prototype system is capable of processing 1.5 frames per second (fps) at full PAL resolution.
In [13], the authors propose a low cost embedded color vision system consisting of a RISC-based micro-controller operating at 75 MHz and a CMOS camera. The proposed system performs color blob tracking by comparing the RGB values of each pixel with lower and upper bounds. The system is capable of tracking color blobs at 16 fps at a resolution of 80x143.
A parallel image processing architecture is proposed in [14] for embedded vision systems. The system is based on an FPGA and operates at a clock rate of 50 MHz. It is capable of performing pre-processing functions such as filtering, correlation and transformation. The system benefits from 16 identical processing elements, each of which is considered a small DSP intended for image processing.
In [15], the authors demonstrate the feasibility of a distributed surveillance system by developing a prototype implementation. The authors design the system architecture based on commercial off-the-shelf elements to accelerate the development phase and to lower the production cost. The system has three main parts: the sensing unit, the processing unit, and the communication unit. The sensing unit is a CMOS camera that is able to deliver 30 fps at VGA resolution and is connected to the processing unit via a FIFO memory. The processing unit consists of 10 TI TMS320C64x DSPs that can deliver an aggregate performance of 80 GIPS. A peripheral component interconnect (PCI) bus couples the DSPs and connects them to the communication unit. The communication unit has a wired Ethernet interface, a wireless GSM interface, and an Intel XScale IXP425 processor to manage communications.
In [9], the authors evaluate the performance of the AD Blackfin BF561 fixed-point, dual-core DSP for image processing algorithms. They use assembly optimizations and a stream image processing technique to reduce the runtime of the algorithms.
Most of the works discussed above propose special architectures to satisfy the requirements of visual sensor networks. Although these architectures perform well, they are not applicable to visual sensor networks due to their high manufacturing cost and energy consumption. Thus, we believe that general purpose micro-controllers are more appropriate for the task.
III. CONVOLUTION IN IMAGE PROCESSING
Convolution is a fundamental mathematical operation in image processing. Many image processing algorithms employ convolution, mainly to enhance the image and/or to extract features from the image [6]. The basic idea is that an image is scanned with a window of predefined size and shape. The output pixel value is the weighted sum of the input pixels within the window, where the weights are the values of the filter assigned to every pixel of the window itself. The window consisting of the weights is called the convolution kernel.
An image represented by the 2D matrix F := (f(x, y))_{m×n} can be transformed into another image, represented by the matrix g(x, y), by convolving F with a kernel, denoted by H, as given in Equation 1.
g(x, y) = f(x, y) \ast h(x, y)   (1)

This process can be expressed explicitly as in Equation 2:

g(x, y) = \sum_{j=-n/2}^{n/2} \sum_{i=-n/2}^{n/2} h[j + (n-1),\; i + (n-1)] \times f[x - j,\; y - i]   (2)
Convolution is applied to images for many different purposes, including noise removal, smoothing, blurring, sharpening and the determination of edges [16]. Convolution-based image processing algorithms can be categorized into three groups:
1) Enhancing Operations
2) Derivative-based Operations
3) Morphological Operations
Fig. 1. Convolution-based Operations
A. Enhancing Operations
Image enhancing operations are utilized to prepare images for further processing such as segmentation and feature extraction. Applying these types of filters mainly aims at reducing noise and at smoothing, blurring, or sharpening the image. Existing image filters can be grouped into several categories, such as linear and non-linear, low-pass and high-pass, rectangular and circular. As an example, the two most popular filters, a low-pass and a high-pass filter, are given in Equation 3.
LpassF = \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \qquad HpassF = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}   (3)
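As an illustrative sketch (not the benchmark code itself), the two kernels of Equation 3 can be declared in C as follows. Since neither the ARM7TDMI-S nor the ARM920T has a hardware floating-point unit, the 1/9 normalization of the low-pass filter is kept as a separate integer divisor rather than as fractional weights; the identifier names are ours, chosen for illustration.

/* The Equation 3 kernels as integer arrays. The low-pass result is the
 * weighted sum divided by lowpass_divisor; the high-pass kernel needs
 * no scaling. */
static const int lowpass_kernel[3][3] = {
    { 1, 1, 1 },
    { 1, 1, 1 },
    { 1, 1, 1 },
};
static const int lowpass_divisor = 9;

static const int highpass_kernel[3][3] = {
    { -1, -1, -1 },
    { -1,  8, -1 },
    { -1, -1, -1 },
};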
B. Derivative-based Operations
Derivative-based operations are applied to detect the three basic types of discontinuities in a digital image: points, lines and edges [16]. There are two types of operations: first-order derivatives and second-order derivatives. The most common first-order derivative-based operation is edge detection. To find edges in a 2D image, a horizontal and a vertical kernel are scanned across the image to calculate both the gradient magnitude and the gradient direction. The most preferred gradient filters among the first-order derivatives are the Roberts, Prewitt, Sobel and Gaussian kernels. Equation 4 gives the Sobel filter pair as an example.
Sobel_x = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \qquad Sobel_y = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}   (4)
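To make the gradient computation concrete, the following C sketch (our own illustration, with hypothetical names gx and gy for the responses of the two kernels of Equation 4 at a pixel) combines the two responses into a magnitude and a direction. On FPU-less cores such as the ARM7TDMI-S and ARM920T, the integer approximation |gx| + |gy| is commonly substituted for the square root.

#include <math.h>

/* Combine the horizontal and vertical kernel responses at one pixel into
 * a gradient magnitude and direction (in radians). */
static void gradient(int gx, int gy, float *magnitude, float *direction)
{
    *magnitude = sqrtf((float)(gx * gx + gy * gy));
    *direction = atan2f((float)gy, (float)gx);
}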
Another way of finding edges is the Laplacian, which is an example of a second-order derivative of a 2D function.
C. Morphological Operations
Morphological operations in image processing aim at extracting image components that are useful in the representation and description of region shapes, such as boundaries, skeletons, and the convex hull. There are four main types of morphological operations: dilation, erosion, opening and closing. Dilation causes objects to dilate or grow in size, whereas erosion causes objects to shrink. The amount and the way that they grow or shrink depend upon the choice of the structuring element, i.e. the convolution kernel. Opening and closing, on the other hand, are combinations of dilation and erosion. In opening, first erosion and then dilation is applied to an image. Opening generally smooths the contours of an image, breaks narrow isthmuses, and eliminates thin protrusions. In closing, erosion and dilation are applied in the opposite order compared to opening. Although closing also tends to smooth sections of contours, as opposed to opening, it generally fuses narrow breaks and long thin gulfs, eliminates small holes, and fills gaps in the contours.
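The following C sketch (illustrative only; the names and the 3×3 structuring element are our assumptions) shows erosion and dilation at a single interior pixel of a binary image, from which opening and closing follow by chaining the two passes in the appropriate order.

/* img is a binary image (values 0 or 1) of size width x height,
 * and (x, y) is an interior pixel. */
static unsigned char erode_3x3(const unsigned char *img, int width,
                               int x, int y)
{
    /* Erosion: the output is 1 only if every pixel under the element is 1. */
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
            if (img[(y + dy) * width + (x + dx)] == 0)
                return 0;
    return 1;
}

static unsigned char dilate_3x3(const unsigned char *img, int width,
                                int x, int y)
{
    /* Dilation: the output is 1 if any pixel under the element is 1. */
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
            if (img[(y + dy) * width + (x + dx)] != 0)
                return 1;
    return 0;
}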
D. Complexity Analysis of the Convolution Process
2D convolution is computationally intensive. For an N×N image area and an M×M kernel, the time complexity of sequential convolution is O(N²M²); for example, convolving a 640×480 image with a 7×7 kernel already requires roughly 15 million multiply-accumulate steps. The convolution process consists of a series of load-multiply-store operations. Thus, it can be regarded as a Multiply-and-Accumulate (MAC) operation, and micro-controllers with such DSP features usually perform the convolution faster. The pseudo-code for the convolution is given in Algorithm 1.
Algorithm 1 The pseudo-code of the 2D Convolution
CONVOLUTION(*srcImg, width, height, convKernel, kernelSize, *dstImg)
  for i = kernelSize/2 to height − (kernelSize/2) do
    for j = kernelSize/2 to width − (kernelSize/2) do
      tmp = 0
      for k = −kernelSize/2 to kernelSize/2 do
        for l = −kernelSize/2 to kernelSize/2 do
          tmp = tmp + convKernel[k][l] × srcImg[i+k][j+l]
        end for
      end for
      dstImg[i][j] = tmp
    end for
  end for
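A plain C rendering of Algorithm 1 is sketched below. It assumes an 8-bit grayscale source image and an integer kernel stored row-major; the kernel is indexed from its top-left corner, and no scaling or clamping of the output is applied. The function and parameter names are illustrative, not the exact benchmark code used in Section V.

#include <stdint.h>

void convolve2d(const uint8_t *src, int width, int height,
                const int *kernel, int ksize, int32_t *dst)
{
    int half = ksize / 2;

    for (int i = half; i < height - half; i++) {
        for (int j = half; j < width - half; j++) {
            int32_t acc = 0;                      /* multiply-accumulate */
            for (int k = -half; k <= half; k++) {
                for (int l = -half; l <= half; l++) {
                    acc += kernel[(k + half) * ksize + (l + half)] *
                           src[(i + k) * width + (j + l)];
                }
            }
            dst[i * width + j] = acc;
        }
    }
}

With the 3×3 low-pass kernel of Equation 3, each accumulated value would additionally be divided by 9 before being stored.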
IV. ARM9 VS ARM7
The ARM9 core provides several key improvements over the ARM7 core in terms of clock frequency, cycle count, pipeline design, and extra peripherals. In the following subsections we describe these improvements in detail.
A. Clock Frequency
The increased clock frequency of the ARM9 core comes from its 5-stage pipeline design, compared to the 3-stage pipeline design of the ARM7 core. Increasing the number of pipeline stages increases the amount of parallelism in the design, thus reducing the amount of logic which must be evaluated within a single clock period. With a 5-stage pipeline design, the processing of each instruction is spread across five or more clock cycles, so up to five instructions can be worked on during any one clock cycle. The maximum clock frequency of the ARM9 core is generally in the range of 1.8 to 2.2 times the clock frequency of the ARM7 core [8].
B. Cycle Count
Cycle count improvements give increased performance, in-
dependent of the clock frequency. The amount of improvement
depends on the mix of instructions in the code being executed,
which is affected by the nature of the program, and for high
level languages, by the compiler used.
1) Loads and stores: The most significant improvement in the instruction cycle count, moving from the ARM7 core to the ARM9 core, is the performance of load and store instructions. Reducing the number of cycles for loads and stores gives a significant improvement in program execution time, since almost 30% of executed instructions are loads and/or stores. The reduction of cycles for loads and stores is achieved by two fundamental micro-architectural differences between the designs of the cores.
• The ARM9 core has separate instruction and data mem-
ory interfaces, allowing the CPU to simultaneously fetch
an instruction and read or write a data item. This is called
a modified-Harvard architecture. On the other hand, the
ARM7 core has a single memory interface.
• The 5-stage pipeline introduces separate Memory and
Write Back stages. These are used to access memory for
loads or stores, and to write results back to the register
file.
Table I summarizes the cycles required to execute various load and store instructions. The table shows that all store instructions take one cycle less on the ARM9 core than on the ARM7 core. It also shows that load instructions generally take two fewer cycles on the ARM9 core, provided there are no interlocks. For instance, by these counts a sequence of two loads and one store would take 3 + 3 + 2 = 8 cycles on the ARM7 core, but only 1 + 1 + 1 = 3 cycles on the ARM9 core in the absence of interlocks.
TABLE I
LOAD AND STORE CYCLE COUNTS

                                     ARM7 Core            ARM9 Core
Instruction Type                     Execute  Interlock   Execute  Interlock
LDR (load one word)                     3        0           1      0 or 1
LDM of n registers (load words)        n+2       0           n      0 or 1
LDRH / LDRB / LDRSB / LDRSH
(halfword, byte, signed byte,
signed halfword loads)                  3        0           1      0 to 2
STR (store one word)                    2        0           1        0
STM (store multiple words)             n+1       0           n        0
2) Interlocks: Pipeline interlocks occur when the data required for an instruction is not available due to the incomplete execution of an earlier instruction. When an interlock occurs, the hardware stalls the execution of the instruction until the data is ready. This provides complete binary compatibility with earlier ARM processor designs; however, it increases the execution time of the code sequence by a number of interlock cycles. Compilers and assembly programmers can in many cases reduce the number of interlock cycles by re-arranging the order of instructions or by using other techniques.
ARM compilers implement code scheduling optimizations to reduce the number of interlock cycles. It is often possible to find a useful instruction to move between two consecutive loads, but not always. This means that the average number of cycles to execute an LDR is between 1 and 2. The exact number depends on the code being compiled and the sophistication of the compiler.
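As an illustration of the scheduling idea at the C level (our own sketch, not from the paper; an optimizing compiler typically performs this reordering itself), the inner product of a kernel row can be written so that the next pixel is loaded while the previously loaded value is being multiplied, giving the compiler room to separate each load from its use:

static int dot_row(const unsigned char *pixels, const int *weights, int n)
{
    int acc = 0;
    int value = pixels[0];              /* prime the first load */

    for (int i = 0; i < n - 1; i++) {
        int next = pixels[i + 1];       /* issue the next load early...    */
        acc += weights[i] * value;      /* ...while the current value is
                                           consumed, hiding the interlock  */
        value = next;
    }
    acc += weights[n - 1] * value;
    return acc;
}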
3) Branches: Although the ARM9 core has a deeper pipeline, the number of cycles for a branch instruction is equal on the ARM9 and ARM7 cores. This is because the pipelines have the same number of stages up to the end of the Execute stage. Thus, branches are implemented in the same way on both cores. The cycle counts for branches are given in Table II.
TABLE II
BRANCH CYCLE COUNTS

Condition Code Check    ARM7    ARM9
Pass                      3       3
Fail                      1       1
The ARM9 core does not implement branch prediction,
because branches on these CPUs are fairly inexpensive in
terms of lost opportunity to execute other instructions.
C. Pipeline
The ARM7 core implements the 3-stage pipeline design shown in Figure 2. In a single cycle, the Execute stage can read operands from the register bank, pass them through the shifter and the Arithmetic and Logic Unit (ALU), and write the results back to the register bank.
Data reads from and writes to the memory system are also performed in the Execute stage. To do this, the instruction stays in the Execute stage of the pipeline for multiple cycles.
Fig. 2. 3-stage pipeline of ARM7 core
The ARM9 core implements the 5-stage pipeline design shown in Figure 3. It also uses a Harvard architecture, so that data accesses do not have to compete with instruction fetches for the use of a single bus. Result forwarding is also implemented, so that results from the ALU and data loaded from memory are fed back immediately to be used by the following instructions. This avoids having to wait for results to be written back to and read from the register bank.
In this pipeline design, dedicated pipeline stages have been added for memory access and for writing results back to the register bank. Also, register read has been moved back into the Decode stage. These changes allow for higher clock frequencies by reducing the maximum amount of logic which must operate in a single clock cycle.
Fig. 3. 5-stage pipeline of ARM9 core
Loads from the memory system and stores to the memory
system are also performed in the Execute stage. To do this
the instruction stays in the Execute stage of the pipeline for
multiple cycles.
LDR uses the Execute stage for only one cycle, allowing
other instructions to use the Execute stage in the following
cycles, unless there are interlocks. This means LDR is a single
cycle instruction.
V. PERFORMANCE EVALUATION
Nowadays, several companies offer micro-controllers with ARM cores. Thus, a huge number of ARM7 and ARM9-based micro-controllers are available off-the-shelf with a large variety of peripheral configurations. Our primary focus is to analyze the suitability of the cores for visual sensor networks in terms of processing power and energy consumption. Thus, for our purposes, the peripheral configuration is not crucial to micro-controller selection, and therefore any micro-controller with an ARM7 or ARM9 core could be used for the task.
For the performance evaluation of the ARM7 core, we used a development board from Olimex [17] built around Atmel's AT91SAM7S256 [18] micro-controller, which contains an ARM7TDMI-S core. The AT91SAM7S256 operates at a maximum speed of 55 MHz and features 128 KB of flash memory and 64 KB of SRAM. For the ARM9 core, we used a single board computer (SBC) from FriendlyARM [19], which has Samsung's ARM920T-based S3C2440A micro-controller. This SBC is capable of operating at up to 400 MHz and is equipped with 64 MB of external SDRAM.
We performed the tests on the boards without any operating system and with all unnecessary peripherals of the micro-controllers turned off. By doing so we were able to measure the pure performance of the cores. To eliminate the possible negative effects of different development tools, we used the Sourcery CodeBench development tool with the gcc-4.5.1 compiler from Code Sourcery on both platforms.
On both cores, we applied the convolution process to images of different resolutions. In order to demonstrate the performance differences between the two cores, we conducted our tests with regard to three main criteria: image resolution, kernel size, and code optimization. Resolutions of 160×120, 320×240, and 640×480 were selected for low, medium, and high resolution images, respectively. Kernel sizes were chosen as 3×3, 5×5, and 7×7. Both unoptimized and optimized code were generated for each combination of resolution and kernel size. For code optimization, the compiler's -O2 optimization mode was used. These optimizations mainly include loop optimizations, jump threading, common subexpression elimination and instruction scheduling. Figure 4, Figure 5, and Figure 6 show the time required for the convolution process on both the ARM7 and ARM9 cores. Considering the relative operating frequencies of the two cores, a linear speedup of 8.33 would be the theoretical limit. However, our tests showed that the ARM920T performed 9 to 11 times faster than the ARM7TDMI-S. This extra speedup is to be credited to the ARM920T's architectural design enhancements. Furthermore, we observed that the extra speedup was as high as 30% for unoptimized code, whereas with compiler optimizations enabled it dropped to between 7% and 12%. As an example, with a kernel size of 3×3 and an image resolution of 160×120, the optimized convolution code runs 90% faster on the ARM920T even if the ARM7TDMI-S were to operate at the same clock frequency as the ARM920T.
The ARM920T outperforms the ARM7TDMI-S not only in terms of speedup but also in terms of energy consumption. Both the ARM7TDMI-S and ARM920T cores consume 0.25 mW per MHz. Although at full speed the ARM920T consumes 8.33 times more power than the ARM7TDMI-S, it completes the same process 9 to 11 times faster. Thus, the ARM920T consumes 7%-32% less energy for the same operation. Table III gives a detailed comparison of the cores regarding their energy consumption.
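As a rough cross-check of these figures (a sketch using the clock rates and the 0.25 mW/MHz figure stated above, and assuming a 10x speedup in the middle of the measured 9-11x range):

P_{ARM7} = 48\,\mathrm{MHz} \times 0.25\,\mathrm{mW/MHz} = 12\,\mathrm{mW}
P_{ARM9} = 400\,\mathrm{MHz} \times 0.25\,\mathrm{mW/MHz} = 100\,\mathrm{mW}
\frac{E_{ARM9}}{E_{ARM7}} = \frac{P_{ARM9} \cdot t/10}{P_{ARM7} \cdot t} = \frac{10}{12} \approx 0.83

Under this assumption the ARM920T would spend roughly 17% less energy on the same convolution; the measured per-case values are listed in Table III.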
The primary bottleneck of the ARM7TDMI-S core is that the convolution process itself takes too long to complete, especially for medium and high resolution images. This bottleneck is the primary inhibiting factor for real-time video processing on ARM7 cores. On the other hand, the results of the ARM920T core encourage us to use it on video sensor nodes.
TABLE III
PERFORMANCE COMPARISON OF ARM7TDMI-S AND ARM920T CORES
IN TERMS OF ENERGY CONSUMPTION (mJ)

              ARM7TDMI-S (1)           ARM920T (2)
Resolution    3×3    5×5    7×7        3×3    5×5    7×7
Low           0.76   1.82   3.32       0.4    1.7    3.0
Medium        3.02   7.46   13.87      1.6    6.8    12.4
High          12.26  30.47  56.84      6.3    27.8   50.9

(1) ARM7TDMI-S operates at 48 MHz and consumes 0.25 mW/MHz.
(2) ARM920T operates at 400 MHz and consumes 0.25 mW/MHz.
VI. CONCLUSION
The performance evaluation results show that most multimedia sensor nodes are not equipped with enough processing power for real-time video processing of medium/high resolution images. One of the most favored micro-controller cores, the ARM7TDMI-S, performed far behind the ARM920T in terms of processing power and energy consumption. The architectural design enhancements of the ARM920T alone speed up the convolution process by up to 30% compared to the ARM7TDMI-S. Together with its higher clock frequency, the ARM920T performs 9 to 11 times faster than the ARM7TDMI-S. Besides, the ARM920T consumes 10%-50% less energy for the same type of operations. These results encourage us to utilize ARM9 cores on multimedia sensor nodes for real-time video processing.
REFERENCES
[1] I. F. Akyildiz, T. Melodia, and K. R. Chowdhury, "A Survey on Wireless Multimedia Sensor Networks," Computer Networks (Elsevier), vol. 51, no. 4, pp. 921-960, Mar. 2007.
[2] I. T. Almalkawi, M. Guerrero Zapata, J. N. Al-Karaki, and J. Morillo-Pozo, "Wireless multimedia sensor networks: Current trends and future directions," Sensors, vol. 10, no. 7, pp. 6662-6717, 2010.
[3] J. Molina, J. M. Mora-Merchan, J. Barbancho, and C. Leon, "Multimedia data processing and delivery in wireless sensor networks," InTech, 2010.
[4] L. W. Chew, L.-M. Ang, and K. P. Seng, "Survey of image compression algorithms in wireless sensor networks," in Information Technology, 2008. ITSim 2008. International Symposium on, vol. 4, Aug. 2008, pp. 1-9.
[5] S. Soro and W. Heinzelman, "A survey of visual sensor networks," Advances in Multimedia (Hindawi), vol. 2009, 2009.
(a) Time elapsed during the convolution process for different resolutions with a 3×3 kernel (Unoptimized Code)
(b) Time elapsed during the convolution process for different resolutions with a 3×3 kernel (Optimized Code)
Fig. 4. Performance Comparison of ARM7TDMI-S and ARM920T in terms of processing power
(a) Time elapsed during the convolution process for different resolutions with a 5×5 kernel (Unoptimized Code)
(b) Time elapsed during the convolution process for different resolutions with a 5×5 kernel (Optimized Code)
Fig. 5. Performance Comparison of ARM7TDMI-S and ARM920T in terms of processing power
(a) Time elapsed during the convolution process for different resolutions with a 7×7 kernel (Unoptimized Code)
(b) Time elapsed during the convolution process for different resolutions with a 7×7 kernel (Optimized Code)
Fig. 6. Performance Comparison of ARM7TDMI-S and ARM920T in terms of processing power
[6] R. Zilan, J. M. Barcelo-Ordinas, and B. Tavli, "Image recognition traffic patterns for wireless multimedia sensor networks," in Wireless Systems and Mobility in Next Generation Internet. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 49-59.
[7] I. Akyildiz, T. Melodia, and K. Chowdhury, "Wireless multimedia sensor networks: Applications and testbeds," Proceedings of the IEEE, vol. 96, no. 10, pp. 1588-1605, Oct. 2008.
[8] "ARM Website," http://www.arm.com/.
[9] M. G. Benjamin and D. Kaeli, "Stream image processing on a dual-core embedded system," in Proceedings of the 7th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, ser. SAMOS'07, 2007, pp. 149-158.
[10] S. Kyo, S. Okazaki, and T. Arai, "An integrated memory array processor architecture for embedded image recognition systems," in Proceedings of the 32nd Annual International Symposium on Computer Architecture, ser. ISCA '05, 2005, pp. 134-145.
[11] D. McIntire, K. Ho, B. Yip, A. Singh, W. Wu, and W. J. Kaiser, "The low power energy aware processing (LEAP) embedded networked sensor system," in Proceedings of the 5th International Conference on Information Processing in Sensor Networks, ser. IPSN '06, 2006, pp. 449-457.
[12] M. Bramberger, J. Brunner, B. Rinner, and H. Schwabach, "Real-time video analysis on an embedded smart camera for traffic surveillance," in Real-Time and Embedded Technology and Applications Symposium, 2004. Proceedings. RTAS 2004. 10th IEEE, May 2004, pp. 174-181.
[13] A. Rowe, C. Rosenberg, and I. Nourbakhsh, "A low cost embedded color vision system," in Intelligent Robots and Systems, 2002. IEEE/RSJ International Conference on, vol. 1, 2002, pp. 208-213.
[14] S. McBader and P. Lee, "An FPGA implementation of a flexible, parallel image processing architecture suitable for embedded vision systems," in Proceedings of the 17th International Symposium on Parallel and Distributed Processing, ser. IPDPS '03, 2003.
[15] M. Bramberger, A. Doblander, A. Maier, B. Rinner, and H. Schwabach, "Distributed embedded smart cameras for surveillance applications," Computer, vol. 39, no. 2, pp. 68-75, Feb. 2006.
[16] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Prentice Hall, 2002.
[17] "SAM7-P256 Development Board for AT91SAM7S256 ARM7TDMI-S Micro-controller," http://www.olimex.com/dev/index.html.
[18] "AT91SAM7S256 Datasheet," http://www.atmel.com/.
[19] "FriendlyARM - ARM based Development Boards and Modules," http://www.friendlyarm.net/.