robust video transmission over wireless networks using ...robust video transmission over wireless...
TRANSCRIPT
Robust Video Transmission over Wireless
Networks Using Cross Layer Approach
Vinod B Durdi, P. T. Kulkarni, and K. L. Sudha Dayananda Sagar College of Engineering Bangalore, India
Email: [email protected], [email protected], [email protected]
Abstract—Video coding is controlled by coding algorithms
and limited in a frame or group of frames within a sequence,
without considering channel status condition. Besides the
conventional coding algorithm, we need to develop methods
to take care of resource allocation in cooperative wireless a
sensor networks (CWSNs). The most challenging part of the
CWSNs is the quality of service i.e. guaranteed multimedia
content delivery. This paper mainly focuses on the
generation of various frames from the video stream and
their robust transmission over wireless network using cross-
layer approach.
Index Terms—CWSNs, energy efficiency, video coding, I P
B frames, cross-layer
I. INTRODUCTION
With the increased popularity of cooperative wireless
sensor networks (CWSN) in the military, commercial and
home applications, quality driven secured transmission of
multimedia data such as image or video signal has
become a critical challenge. In order to provide secured
quality of transmission, some sensitive data need to be
analyzed and secured before transmission in CWSN. The
joint optimization of multimedia Quality content
protection and communication energy efficiency has to be
addressed.
In video signal temporal and spatial correlation
parameter plays significant role in the reduction of the
transmission energy along with the widely used
compression techniques MPEG 1, MPEG 3, H.261,
H.263; H.264 etc where energy is critical parameter.
For text or binary data protection various algorithms
such as DES, RSA, IDEA, AES etc have been proposed
and widely used.
But these algorithms cannot be used in the video signal
as it is, since it comprise of large data. In [1] S. Lian et. al
described the video encryption algorithms in which a
complete encryption algorithm encrypts raw video
directly or encrypts compressed data, where as partial
encryption algorithm combines encryption process with
the compression process and encrypts video partially or
selectively. These algorithms use the DCT coefficients,
motion vector, and variable length code (VLC) to provide
the authentication.
Manuscript received Feburary 19, 2013; revised April 10, 2013.
However, issues of improving the energy efficiency in
cooperative wireless sensor network and joint design of
energy-distortion-encryption schemes are still not
considered. Delivery of reliable, high quality, multimedia
data in CWSN is still at early stage of research.
To cope with this an Unequal Error Protection (UEP)
technique has been proposed for image data by
considering the image position and value diversity in
early research [2]. This research can be extended to the
video signal by considering the temporal correlation and
dependency of video frames by cross layer approach
between the application and physical layer so that the
critical issue of wireless sensor network can be addressed.
Rest of the paper is organized as follows. Section II
briefly describes the I P B frame and the contents of I P B
frame. In section III, the sensitive contents of the I P B
frame and its scope in future work is discussed.
Simulation model is described in section IV. Results and
discussions are presented in section V. Finally, the
conclusions are drawn in section VI.
II. I P B FRAME CONTENT
There are three kinds of frames in the compressed
video stream namely I frame, B frame, and P frame.
These frames are freely arranged by the encoder in the
form: I P B B P B B as shown in the Fig. 1.
Figure 1. The structure video frames
Intra-predicted (I) frames are encoded block by block
without any temporal prediction regard to the previous or
future frames. The encoder takes the DCT of the 8X8
blocks. These blocks are transformed into its frequency
based representation. Since the most of the information is
available at the lower order the encoder eliminates the
higher order terms by quantization process. I frames
encoding techniques use the JPEG encoding scheme. The
motion estimation reference of the encoded I frame is
used for other forward and bidirectional predicted frames.
97
Journal of Industrial and Intelligent Information Vol. 1, No. 2, June 2013
©2013 Engineering and Technology Publishingdoi: 10.12720/jiii.1.2.97-101
Forward-predicted (P) frames are encoded using
motion prediction from the most recent previous I or P
frames in the sequence. Each macro block is matched
with a similar region in I or P frame. The difference
macro block together with the motion vector is encoded
and transmitted.
Bidirectional predicted (B) frames are encoded using
interpolated motion prediction between the previous I or
P frames and the next I or P frame in the sequence.
Similar to the P frames the motion estimation and
encoding procedure are same for B frames.
The frame pattern consists of one I frame followed by
P or B frames called as Group of Picture (GOP). The
number of P and B fames till the next I frames refers to
the frame pattern length. The P frames has no encoding
dependency if the GOP ends with the P frame. Similarly
if the GOP ends with the B frame then encoding of that B
frame requires access to the first frame of the next GOP
in the sequence.
Access point of the image data is provided by the I
frame codes as they don’t depend on any other frames. I
frame also the beginning of the decoder. P frames are
predicted based on I and P frames which are compressed
effectively. Similarly the B frames are compressed
greatly and cannot provide reference to other frames. As I
frames are beginning of the decoder which is more
important than any other frames. Also the encryption I
frames greatly influence the P and B frames. Compared
to B/P frames I frames have longer frame lengths and
residue DCT efficiency to be encrypted.
In order to protect the video information both the
texture information and motion information must be
encrypted. According to Advanced Video coding (AVC)
the produced data stream includes the synchronization,
intra macro block information and inter macro block
information.
The synchronization information refers to the
sequence_picture, slice_syntax. Whereas intra macro
block information includes the macro block type, coded
block pattern, intra prediction mode and residue data.
Similarly the inter macro block includes the macro block
type, coded block pattern, inter prediction mode, motion
vector difference and residue data. In this the key
problem is the selection of these parameters to be
protected so that the critical issue of CWSN is dealt
which is explained in next section.
III. SENSITIVE CONTENT OF FRAME AND ISSUES IN
CWSN
According to references [3] and [4] it is not suitable to
encrypt the synchronization information. Similarly from
references [5] and [6] it is not secured to encrypt only the
residue data and not secured to encrypt only the intra
prediction mode. Therefore both the intra prediction
mode and residue data must be encrypted which will be
considered for wireless sensor networks in our simulation
mode.
To provide the joint optimization of multimedia quality,
content protection and communication energy efficiency
is a challenging task in CWSN. An Unequal Error
Protection (UEP) technique was proposed for image data
by considering the image position and value diversity. In
references [7] and [8] the image authentication was
proposed for single image. These methods can be
extended to the video signal by considering the temporal
correlation and dependency of the video frames. For
improving the efficiency of encryption and the network
resource allocation in wireless sensor network the motion
estimation/compensation and residue coding from the
video compression technique is considered. The method
is proposed which will not compromise the security
information of wireless sensor network by considering
the crucial structure. Rather than reducing the key length
or algorithm complexity the encryption cost is lowered by
the reduced encryption workload.
The first critical issue in CWSN is to minimize the
extra encryption dependency on certain important
codewords since the dependency is obtained during
compression and the selective encryption. To check the
dependency of video frames a frame level priority list is
calculated to generate the encryption priority. For cost
efficiency consideration only important codewords in the
high priority video frames are encrypted when the limited
encryption budget is available even though it is desirable
to encrypt all the codewords of I and P frames. A similar
resource allocation problem suggested in references [9]
and [10] can be modeled in CWSN for transmission of
single video frame to determine the energy distortion
performance. In a closed form the both the frame delivery
energy consumption and frame level packet loss rate are
related to resource allocation parameter. Since it is hard
to focus on many of resource allocation parameter with
the limited computational resource at wireless sensor
nodes, the transmitting power is identified as the
dominating resource allocation parameter at the physical
layer.
For a single video frame with length L, the total energy
consumption can be expressed as
E= (Pt+Pr) ((L+Loverhead) /2Bw) (1)
and the transmit power at a certain desirable bit rate and
certain channel condition is estimated as follows from
references [9]-[11]
Pt= (2N0Bw)/(A) (erfc-1
(2e)) 2
(2)
Based on these methods the simulation model has been
proposed for joint optimization of multimedia quality,
content protection and communication energy efficiency
in cooperative wireless sensor network.
IV. SIMULATION MODEL
In the simulation model of CWSN the video signal is
given as the input to the H.264 encoder. At the encoder
the video signal is compressed to generate the I P B
frames. This compressed data is fed as the input to the
prioritize block. Based on the priority the security and the
energy levels are assigned at the application layer as well
as at the physical layer on cross layer approach so that the
important information could not be lost.
98
Journal of Industrial and Intelligent Information Vol. 1, No. 2, June 2013
©2013 Engineering and Technology Publishing
In our work we simulated the generation of the I P B
frames and shown in the result how the two decoded
frames are correlated with each other which can be used
for reduced data transmission. The priority assignment
and the cross layer approach are considered in the future
part of research. Next section explains formats taken in
the simulation.
Video frame format can be compressed by the video
compression standard. It is known that prior to
compression and transmission the captured video is
converted to intermediate formats. There are many
intermediate formats are available such as common
intermediate formats (CIF), 4CIF, QCIF, SQCIF etc. The
choice of the format depends on the particular application.
Since our research is related to robust video transmission
over wireless networks QCIF format is chosen as it is
popular for mobile multimedia applications.
The reasons for choosing the H.264 for simulation are
as follows:
Compared to other MPEG 1, 2, 4, H-261 and H.263
the H.264 has got higher robustness, coding efficiency,
and syntax specifications. In addition to this, the H.264
will support wide range of application like video
broadcasting, video streaming, video conferencing, D,
cinema etc The H.264 is supported by most of the
networks.
V. RESULTS AND DISCUSSIONS
Some simulations were conducted to evaluate the
performance of the proposed scheme. We performed
simulation on the well known video sequence
Foreman_qcif with 176X144 QCIF format. The video
signal is encoded with the H.264 coding technique at the
frame rate of 24 frames per second, with bit rate of
100kbps. The baseline profile is used for the encoding
purpose.
The Fig. 2 is showing the original foreman QCIF video
signal. With this video signal H.264 encoding is
performed for compression purpose.
Figure 2. The original video signal
The output of the H.264 has generated many I P frames
of which only few images of the frames are included in
the paper since it is difficult to include all the images of
the frames. The Fig. 3, Fig. 4, Fig. 5 and Fig. 6 shows the
first, second, fortieth and fiftieth frames respectively after
decoding the encoded data.
From the Fig. 3 and Fig. 4, it is observed that the video
sequences contain high temporal redundancy (i e the
successive frames within the sequence are very similar).
The movement in the scene is due to the differences
between successive frames and this can often be closely
approximated as linear movement. This linear movement
can be coded.
Figure 3. First frame of the original signal
Figure 4. The second frame of original signal
Figure 5. The fortieth frame of the original video signal
Figure 6. Fiftieth frame of the original video signal
Subtracting the gray converted frame first in Fig. 3 and
the gray converted second frame in Fig. 4 pixel by pixel
resulted the Fig. 7. In the figure gray represents a
difference of zero and the white and black represents a
positive and negative difference respectively in the figure.
Most of the difference frame is gray (i e. the difference
between pixel value is zero). The nonzero areas are
around the moving figures in the scene.
Figure 7. The difference between the first frame and second frame
99
Journal of Industrial and Intelligent Information Vol. 1, No. 2, June 2013
©2013 Engineering and Technology Publishing
It is also observed from the Fig. 3 i.e. first frame and
Fig. 6 i.e. the fiftieth frame video sequence contain the
low temporal reference (i e frames within the sequence
are less similar). Subtracting the gray converted first
frame in Fig. 3 and the gray converted second frame in
Fig. 6 pixel by pixel resulted the Fig. 8.
Figure 8. The difference between the first frame and fiftieth frame
From the Fig. 7 and Fig. 8 it is observed that the
successive frames within video sequence are similar
compared with the other frames in the video sequence.
Rather than encoding the frame itself the difference
between current frame and the previous frame shown in
Fig. 7 is encoded so that the amount of information
transmitted will drastically reduced.
Figure 9. The decoded first frame at the receiver
Motion prediction further achieves the compression.
The pixel in current frame in each block is compared with
the previous frame in the same block. The offset between
the two blocks is known as motion vector. Motion vector
along with encoded error between the current block and
the similar block in the previous frame is transmitted for
the block.
At the receiver the original block shown in Fig. 10 can
be recreated by decoding the error block in the Fig. 7 and
adding this to the offset block in the previous
reconstructed frame shown in Fig. 9.
Figure 10. The decoded second frame at the receiver
VI. CONCLUSION
From this simulation result, we observed that rather
than encoding the frame itself, the difference between
current frame and the previous frame is encoded so that
the amount of information transmitted will be drastically
reduced. The frames can be properly analyzed to assign
the security codes. Since energy is the crucial parameter
in the cooperative wireless sensor networks, the resource
allocated may be controlled using cross layer design
based on the priority.
ACKNOWLEDGMENT
Authors wish to thank their colleagues for their
valuable input and many discussions on the topic.
REFERENCES
[1] S. Lian, Z. Liu, Z. Ren, and H. Wang, “Secure advanced video
coding based on selective encryption algorithms,” IEEE Trans.
Consumer Electronics, vol. 52, no. 2, pp. 621–629, May 2006. [2] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “Energy
constrained distortion reduction optimization for wavelet-based
coded image transmission in wireless sensor networks,” IEEE Trans. Multimedia, vol. 10, no. 6, pp. 1169–1180, Oct. 2008.
[3] C. Shi, S. Wang, and B. Bhargava, “MPEG video encryption in real-time using secret key cryptography,” in Proc. Int. Conf.
Parallel and Distributed Processing Techniques and Applications
Las Vegas, Nevada, 1999, pp. 2822-2828. [4] F. Chiaraluce, L. Ciccarelli, E. Gambi, P. Pierleoni, and M.
Reginelli, “A new chaotic algorithm for video encryption,” IEEE Trans. on Consumer Electronics, vol. 48, no. 4, pp. 838-844, Nov
2002.
[5] L. Tang, “Methods for encrypting and decrypting MPEG video data efficiently,” in Proc. 4th ACM International Conf
96Multimedia’, Boston, Nov. 1996, pp. 219-229. [6] C. Shi and B. Bhargava, “A fast MPEG video encryption
algorithm,” in Proc. 6th ACM International Conf 98Multimedia,
Bristol, UK, Sept. 1998, pp. 81-88,. [7] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “An
adaptive approach for image encryption and secure transmission over multirate wireless sensor networks,” in Wireless
Communications and Mobile Computing, New York, Wiley, vol. 9,
no. 3, Mar.2009, pp. 383–393.
[8] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “Energy-
constrained quality optimization for secure image transmission in wireless sensor networks,” Advances in Multimedia. New York:
Hindawi, vol. 2007, no. 1, Jan. 2007, pp. 1169-1180.
[9] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “Matching stream authentication and resource allocation to
multimedia codec dependency with position-value partitioning in
wireless multimedia sensor networks,” in Proc. IEEE Tranc.
Multimedia Covers the Breadth of Research in Multimedia
Technology, Dec. 2009. [10] C. Schurgers, O. Aberthorae, and M. B. Srivastava, “Modulation
scaling for energy aware communication systems,” in Proc. Int. Symposium on Low Power Electronics and Design, Aug. 2001, pp,
96-99.
[11] W. Stallings, Data and Computer Communications, 7th ed. Upper Saddle River, NJ: Prentice-Hall, 2000, pp. 85–86.
Vinod B Durdi received BE (Electronics and Communication) from Karnataka University,
Dharwad, in 2000. He obtained M.Tech (Digital Communication) from Visvesvaraya Technological
University (VTU), Belgaum, in 2003. He is
presently pursuing Ph.D form Visvesvaraya Technological University (VTU), Belgaum, India.
His research interest includes video processing,
wireless networks, network security and chaos
communication. He has authored and coauthored the various research
papers in major national and international conference proceeding. He is
100
Journal of Industrial and Intelligent Information Vol. 1, No. 2, June 2013
©2013 Engineering and Technology Publishing
currently Associate Professor in Dayananda Sagar College of Engineering, Bangalore, India.
Prahlad Tirumalrao Kulkarni received the B.E. in electronics and communication engineering
from Karnataka University. He obtained M.Tech.,
and Ph.D. in electronics and electrical communication engineering from Indian Institute
of Technology, Kharagpur, in 1988 and 1998,
respectively. His research interests include Wireless networks, Scheduling, Routing,
Reliability, Optical networks. He is currently Principal of Pune Institute of Computer Technology. He has served as Technical session chair in
International conferences abroad. He also served as visiting Professor in
CNIT, Scuola Superiore Sant’ Anna, Pisa, Italy and Chonbuk National University, S. Korea. He is senior member IEEE.
K.L.Sudha, presently working as professor department, Dayananda Sagar College of
Engineering, Bangalore, India has 16 years of teaching experience in Engineering Colleges. She
obtained her Bachelor’s degree in electronics
engineering from Mysore University and Masters from Bangalore University. She got Ph.D for her
work on “Detection of FH CDMA signals in time varying channel” from Osmania University, Hyderabad. She has
published more than 15 research papers in national/ international
journals and conferences. Her research interests are in Wireless communication, Coding theory, image processing and chaotic theory.
101
Journal of Industrial and Intelligent Information Vol. 1, No. 2, June 2013
©2013 Engineering and Technology Publishing