robust video transmission over wireless networks using ...robust video transmission over wireless...

Robust Video Transmission over Wireless

Networks Using Cross Layer Approach

Vinod B Durdi, P. T. Kulkarni, and K. L. Sudha Dayananda Sagar College of Engineering Bangalore, India

Email: [email protected], [email protected], [email protected]

Abstract—Video coding is controlled by coding algorithms

and limited in a frame or group of frames within a sequence,

without considering channel status condition. Besides the

conventional coding algorithm, we need to develop methods

to take care of resource allocation in cooperative wireless a

sensor networks (CWSNs). The most challenging part of the

CWSNs is the quality of service i.e. guaranteed multimedia

content delivery. This paper mainly focuses on the

generation of various frames from the video stream and

their robust transmission over wireless network using cross-

layer approach.

Index Terms—CWSNs, energy efficiency, video coding, I P

B frames, cross-layer

I. INTRODUCTION

With the increased popularity of cooperative wireless

sensor networks (CWSN) in the military, commercial and

home applications, quality driven secured transmission of

multimedia data such as image or video signal has

become a critical challenge. In order to provide secured

quality of transmission, some sensitive data need to be

analyzed and secured before transmission in CWSN. The

joint optimization of multimedia Quality content

protection and communication energy efficiency has to be

addressed.

In video signal temporal and spatial correlation

parameter plays significant role in the reduction of the

transmission energy along with the widely used

compression techniques MPEG 1, MPEG 3, H.261,

H.263; H.264 etc where energy is critical parameter.

For text or binary data protection various algorithms

such as DES, RSA, IDEA, AES etc have been proposed

and widely used.

But these algorithms cannot be used in the video signal

as it is, since it comprise of large data. In [1] S. Lian et. al

described the video encryption algorithms in which a

complete encryption algorithm encrypts raw video

directly or encrypts compressed data, where as partial

encryption algorithm combines encryption process with

the compression process and encrypts video partially or

selectively. These algorithms use the DCT coefficients,

motion vector, and variable length code (VLC) to provide

the authentication.

Manuscript received Feburary 19, 2013; revised April 10, 2013.

However, issues of improving the energy efficiency in

cooperative wireless sensor network and joint design of

energy-distortion-encryption schemes are still not

considered. Delivery of reliable, high quality, multimedia

data in CWSN is still at early stage of research.

To cope with this an Unequal Error Protection (UEP)

technique has been proposed for image data by

considering the image position and value diversity in

early research [2]. This research can be extended to the

video signal by considering the temporal correlation and

dependency of video frames by cross layer approach

between the application and physical layer so that the

critical issue of wireless sensor network can be addressed.

Rest of the paper is organized as follows. Section II

briefly describes the I P B frame and the contents of I P B

frame. In section III, the sensitive contents of the I P B

frame and its scope in future work is discussed.

Simulation model is described in section IV. Results and

discussions are presented in section V. Finally, the

conclusions are drawn in section VI.

II. I P B FRAME CONTENT

There are three kinds of frames in the compressed

video stream namely I frame, B frame, and P frame.

These frames are freely arranged by the encoder in the

form: I P B B P B B as shown in the Fig. 1.

Figure 1. The structure video frames

Intra-predicted (I) frames are encoded block by block

without any temporal prediction regard to the previous or

future frames. The encoder takes the DCT of the 8X8

blocks. These blocks are transformed into its frequency

based representation. Since the most of the information is

available at the lower order the encoder eliminates the

higher order terms by quantization process. I frames

encoding techniques use the JPEG encoding scheme. The

motion estimation reference of the encoded I frame is

used for other forward and bidirectional predicted frames.

97

Journal of Industrial and Intelligent Information Vol. 1, No. 2, June 2013

©2013 Engineering and Technology Publishingdoi: 10.12720/jiii.1.2.97-101

Forward-predicted (P) frames are encoded using

motion prediction from the most recent previous I or P

frames in the sequence. Each macro block is matched

with a similar region in I or P frame. The difference

macro block together with the motion vector is encoded

and transmitted.

Bidirectional predicted (B) frames are encoded using

interpolated motion prediction between the previous I or

P frames and the next I or P frame in the sequence.

Similar to the P frames the motion estimation and

encoding procedure are same for B frames.

The frame pattern consists of one I frame followed by

P or B frames called as Group of Picture (GOP). The

number of P and B fames till the next I frames refers to

the frame pattern length. The P frames has no encoding

dependency if the GOP ends with the P frame. Similarly

if the GOP ends with the B frame then encoding of that B

frame requires access to the first frame of the next GOP

in the sequence.

Access point of the image data is provided by the I

frame codes as they don’t depend on any other frames. I

frame also the beginning of the decoder. P frames are

predicted based on I and P frames which are compressed

effectively. Similarly the B frames are compressed

greatly and cannot provide reference to other frames. As I

frames are beginning of the decoder which is more

important than any other frames. Also the encryption I

frames greatly influence the P and B frames. Compared

to B/P frames I frames have longer frame lengths and

residue DCT efficiency to be encrypted.

In order to protect the video information both the

texture information and motion information must be

encrypted. According to Advanced Video coding (AVC)

the produced data stream includes the synchronization,

intra macro block information and inter macro block

information.

The synchronization information refers to the

sequence_picture, slice_syntax. Whereas intra macro

block information includes the macro block type, coded

block pattern, intra prediction mode and residue data.

Similarly the inter macro block includes the macro block

type, coded block pattern, inter prediction mode, motion

vector difference and residue data. In this the key

problem is the selection of these parameters to be

protected so that the critical issue of CWSN is dealt

which is explained in next section.

III. SENSITIVE CONTENT OF FRAME AND ISSUES IN

CWSN

According to references [3] and [4] it is not suitable to

encrypt the synchronization information. Similarly from

references [5] and [6] it is not secured to encrypt only the

residue data and not secured to encrypt only the intra

prediction mode. Therefore both the intra prediction

mode and residue data must be encrypted which will be

considered for wireless sensor networks in our simulation

mode.

To provide the joint optimization of multimedia quality,

content protection and communication energy efficiency

is a challenging task in CWSN. An Unequal Error

Protection (UEP) technique was proposed for image data

by considering the image position and value diversity. In

references [7] and [8] the image authentication was

proposed for single image. These methods can be

extended to the video signal by considering the temporal

correlation and dependency of the video frames. For

improving the efficiency of encryption and the network

resource allocation in wireless sensor network the motion

estimation/compensation and residue coding from the

video compression technique is considered. The method

is proposed which will not compromise the security

information of wireless sensor network by considering

the crucial structure. Rather than reducing the key length

or algorithm complexity the encryption cost is lowered by

the reduced encryption workload.

The first critical issue in CWSN is to minimize the

extra encryption dependency on certain important

codewords since the dependency is obtained during

compression and the selective encryption. To check the

dependency of video frames a frame level priority list is

calculated to generate the encryption priority. For cost

efficiency consideration only important codewords in the

high priority video frames are encrypted when the limited

encryption budget is available even though it is desirable

to encrypt all the codewords of I and P frames. A similar

resource allocation problem suggested in references [9]

and [10] can be modeled in CWSN for transmission of

single video frame to determine the energy distortion

performance. In a closed form the both the frame delivery

energy consumption and frame level packet loss rate are

related to resource allocation parameter. Since it is hard

to focus on many of resource allocation parameter with

the limited computational resource at wireless sensor

nodes, the transmitting power is identified as the

dominating resource allocation parameter at the physical

layer.

For a single video frame with length L, the total energy

consumption can be expressed as

E= (Pt+Pr) ((L+Loverhead) /2Bw) (1)

and the transmit power at a certain desirable bit rate and

certain channel condition is estimated as follows from

references [9]-[11]

Pt= (2N0Bw)/(A) (erfc-1

(2e)) 2

(2)

Based on these methods the simulation model has been

proposed for joint optimization of multimedia quality,

content protection and communication energy efficiency

in cooperative wireless sensor network.

IV. SIMULATION MODEL

In the simulation model of CWSN the video signal is

given as the input to the H.264 encoder. At the encoder

the video signal is compressed to generate the I P B

frames. This compressed data is fed as the input to the

prioritize block. Based on the priority the security and the

energy levels are assigned at the application layer as well

as at the physical layer on cross layer approach so that the

important information could not be lost.

98


©2013 Engineering and Technology Publishing

In our work we simulated the generation of the I P B

frames and shown in the result how the two decoded

frames are correlated with each other which can be used

for reduced data transmission. The priority assignment

and the cross layer approach are considered in the future

part of research. Next section explains formats taken in

the simulation.

Video frame format can be compressed by the video

compression standard. It is known that prior to

compression and transmission the captured video is

converted to intermediate formats. There are many

intermediate formats are available such as common

intermediate formats (CIF), 4CIF, QCIF, SQCIF etc. The

choice of the format depends on the particular application.

Since our research is related to robust video transmission

over wireless networks QCIF format is chosen as it is

popular for mobile multimedia applications.

The reasons for choosing the H.264 for simulation are

as follows:

Compared to other MPEG 1, 2, 4, H-261 and H.263

the H.264 has got higher robustness, coding efficiency,

and syntax specifications. In addition to this, the H.264

will support wide range of application like video

broadcasting, video streaming, video conferencing, D,

cinema etc The H.264 is supported by most of the

networks.

V. RESULTS AND DISCUSSIONS

Some simulations were conducted to evaluate the

performance of the proposed scheme. We performed

simulation on the well known video sequence

Foreman_qcif with 176X144 QCIF format. The video

signal is encoded with the H.264 coding technique at the

frame rate of 24 frames per second, with bit rate of

100kbps. The baseline profile is used for the encoding

purpose.

The Fig. 2 is showing the original foreman QCIF video

signal. With this video signal H.264 encoding is

performed for compression purpose.

Figure 2. The original video signal

The output of the H.264 has generated many I P frames

of which only few images of the frames are included in

the paper since it is difficult to include all the images of

the frames. The Fig. 3, Fig. 4, Fig. 5 and Fig. 6 shows the

first, second, fortieth and fiftieth frames respectively after

decoding the encoded data.

From the Fig. 3 and Fig. 4, it is observed that the video

sequences contain high temporal redundancy (i e the

successive frames within the sequence are very similar).

The movement in the scene is due to the differences

between successive frames and this can often be closely

approximated as linear movement. This linear movement

can be coded.

Figure 3. First frame of the original signal

Figure 4. The second frame of original signal

Figure 5. The fortieth frame of the original video signal

Figure 6. Fiftieth frame of the original video signal

Subtracting the gray converted frame first in Fig. 3 and

the gray converted second frame in Fig. 4 pixel by pixel

resulted the Fig. 7. In the figure gray represents a

difference of zero and the white and black represents a

positive and negative difference respectively in the figure.

Most of the difference frame is gray (i e. the difference

between pixel value is zero). The nonzero areas are

around the moving figures in the scene.

Figure 7. The difference between the first frame and second frame

99



It is also observed from the Fig. 3 i.e. first frame and

Fig. 6 i.e. the fiftieth frame video sequence contain the

low temporal reference (i e frames within the sequence

are less similar). Subtracting the gray converted first

frame in Fig. 3 and the gray converted second frame in

Fig. 6 pixel by pixel resulted the Fig. 8.

Figure 8. The difference between the first frame and fiftieth frame

From the Fig. 7 and Fig. 8 it is observed that the

successive frames within video sequence are similar

compared with the other frames in the video sequence.

Rather than encoding the frame itself the difference

between current frame and the previous frame shown in

Fig. 7 is encoded so that the amount of information

transmitted will drastically reduced.

Figure 9. The decoded first frame at the receiver

Motion prediction further achieves the compression.

The pixel in current frame in each block is compared with

the previous frame in the same block. The offset between

the two blocks is known as motion vector. Motion vector

along with encoded error between the current block and

the similar block in the previous frame is transmitted for

the block.

At the receiver the original block shown in Fig. 10 can

be recreated by decoding the error block in the Fig. 7 and

adding this to the offset block in the previous

reconstructed frame shown in Fig. 9.

Figure 10. The decoded second frame at the receiver

VI. CONCLUSION

From this simulation result, we observed that rather

than encoding the frame itself, the difference between

current frame and the previous frame is encoded so that

the amount of information transmitted will be drastically

reduced. The frames can be properly analyzed to assign

the security codes. Since energy is the crucial parameter

in the cooperative wireless sensor networks, the resource

allocated may be controlled using cross layer design

based on the priority.

ACKNOWLEDGMENT

Authors wish to thank their colleagues for their

valuable input and many discussions on the topic.

REFERENCES

[1] S. Lian, Z. Liu, Z. Ren, and H. Wang, “Secure advanced video

coding based on selective encryption algorithms,” IEEE Trans.

Consumer Electronics, vol. 52, no. 2, pp. 621–629, May 2006. [2] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “Energy

constrained distortion reduction optimization for wavelet-based

coded image transmission in wireless sensor networks,” IEEE Trans. Multimedia, vol. 10, no. 6, pp. 1169–1180, Oct. 2008.

[3] C. Shi, S. Wang, and B. Bhargava, “MPEG video encryption in real-time using secret key cryptography,” in Proc. Int. Conf.

Parallel and Distributed Processing Techniques and Applications

Las Vegas, Nevada, 1999, pp. 2822-2828. [4] F. Chiaraluce, L. Ciccarelli, E. Gambi, P. Pierleoni, and M.

Reginelli, “A new chaotic algorithm for video encryption,” IEEE Trans. on Consumer Electronics, vol. 48, no. 4, pp. 838-844, Nov

2002.

[5] L. Tang, “Methods for encrypting and decrypting MPEG video data efficiently,” in Proc. 4th ACM International Conf

96Multimedia’, Boston, Nov. 1996, pp. 219-229. [6] C. Shi and B. Bhargava, “A fast MPEG video encryption

algorithm,” in Proc. 6th ACM International Conf 98Multimedia,

Bristol, UK, Sept. 1998, pp. 81-88,. [7] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “An

adaptive approach for image encryption and secure transmission over multirate wireless sensor networks,” in Wireless

Communications and Mobile Computing, New York, Wiley, vol. 9,

no. 3, Mar.2009， pp. 383–393.

[8] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “Energy-

constrained quality optimization for secure image transmission in wireless sensor networks,” Advances in Multimedia. New York:

Hindawi, vol. 2007, no. 1, Jan. 2007, pp. 1169-1180.

[9] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “Matching stream authentication and resource allocation to

multimedia codec dependency with position-value partitioning in

wireless multimedia sensor networks,” in Proc. IEEE Tranc.

Multimedia Covers the Breadth of Research in Multimedia

Technology, Dec. 2009. [10] C. Schurgers, O. Aberthorae, and M. B. Srivastava, “Modulation

scaling for energy aware communication systems,” in Proc. Int. Symposium on Low Power Electronics and Design, Aug. 2001, pp,

96-99.

[11] W. Stallings, Data and Computer Communications, 7th ed. Upper Saddle River, NJ: Prentice-Hall, 2000, pp. 85–86.

Vinod B Durdi received BE (Electronics and Communication) from Karnataka University,

Dharwad, in 2000. He obtained M.Tech (Digital Communication) from Visvesvaraya Technological

University (VTU), Belgaum, in 2003. He is

presently pursuing Ph.D form Visvesvaraya Technological University (VTU), Belgaum, India.

His research interest includes video processing,

wireless networks, network security and chaos

communication. He has authored and coauthored the various research

papers in major national and international conference proceeding. He is

100



currently Associate Professor in Dayananda Sagar College of Engineering, Bangalore, India.

Prahlad Tirumalrao Kulkarni received the B.E. in electronics and communication engineering

from Karnataka University. He obtained M.Tech.,

and Ph.D. in electronics and electrical communication engineering from Indian Institute

of Technology, Kharagpur, in 1988 and 1998,

respectively. His research interests include Wireless networks, Scheduling, Routing,

Reliability, Optical networks. He is currently Principal of Pune Institute of Computer Technology. He has served as Technical session chair in

International conferences abroad. He also served as visiting Professor in

CNIT, Scuola Superiore Sant’ Anna, Pisa, Italy and Chonbuk National University, S. Korea. He is senior member IEEE.

K.L.Sudha, presently working as professor department, Dayananda Sagar College of

Engineering, Bangalore, India has 16 years of teaching experience in Engineering Colleges. She

obtained her Bachelor’s degree in electronics

engineering from Mysore University and Masters from Bangalore University. She got Ph.D for her

work on “Detection of FH CDMA signals in time varying channel” from Osmania University, Hyderabad. She has

published more than 15 research papers in national/ international

journals and conferences. Her research interests are in Wireless communication, Coding theory, image processing and chaotic theory.

101



robust video transmission over wireless networks using ...robust video transmission over wireless...

Documents