
[IEEE 2009 Picture Coding Symposium (PCS) - Chicago, IL, USA (2009.05.6-2009.05.8)] 2009 Picture Coding Symposium - Practical Distributed Video Coding over visual sensors

PRACTICAL DISTRIBUTED VIDEO CODING OVER VISUAL SENSORS

Rami Halloush, Kiran Misra, and Hayder Radha Department of Electrical and Computer Engineering, Michigan State University

E-mail: {hallous1, misrakir, radha}@egr.msu.edu

ABSTRACT

In this paper we describe a practical implementation of a distributed video codec deployed on a real visual sensor platform, viz. the MicaZ/Cyclops platform. The codec supports two encoding schemes: one employs the Discrete Cosine Transform (DCT) and the other operates on raw pixels. The DCT scheme is more costly in terms of computational power consumption; on the other hand, it achieves more compression and hence lower transmission power consumption. At the same time, the DCT scheme does not necessarily achieve the minimal overall power consumption (computation plus transmission). In this paper we show that the choice between the two schemes (DCT or pixel based) depends on the tolerable distortion and power consumption. Results show that for one range of achievable video quality values the DCT scheme demonstrates lower overall power consumption, while for another range of video quality the pixel scheme results in lower overall power consumption.

Keywords: Distributed Video Coding (DVC), Wyner-Ziv, DCT.

1. INTRODUCTION

Visual sensor networks consist of resource-constrained nodes (motes). Motes are usually battery operated; hence, to achieve an acceptable mote lifetime, the overall power consumption needs to be minimized. There are two main sources of power consumption in a mote: (a) computational processing and (b) data transmission. To reduce transmission power consumption, sensor data has to be compressed; meanwhile, the compression engine has to be of low computational complexity. Distributed Video Coding (DVC) [1, 2, 3, 4, 5] is well suited for this type of application. With DVC the burden of computation is shifted to the decoder side, leaving the source node(s) with lighter computational duties. At the same time, fairly good compression performance is achieved with DVC.

In DVC there is a tradeoff between computation and transmission power consumption. A computationally intensive scheme, such as a DCT-based one, is costly in terms of computational power but achieves significant compression and hence transmission power savings. On the other hand, a less complex encoding scheme, such as a pixel-based codec, is less costly in terms of computational power but more costly in terms of transmission power. Interestingly, as we will see in this paper, although the DCT scheme is efficient in terms of transmission power consumption (compared to the pixel-based scheme), it may not be more efficient in terms of overall power consumption (computation plus transmission) beyond a particular achieved level of video quality. In other words, the pixel-based scheme can be more efficient than DCT in terms of overall power consumption at higher video quality. Consequently, for minimal power consumption, the choice between the two schemes (DCT and pixel based) depends on the target quality. In visual sensor applications, power consumption is dominated by transmission [6]. For example, on a MicaZ mote [10], one cycle of processing costs 3 nJ whereas sending one bit costs 252 nJ [7]. Consequently, the scheme with lower transmission power consumption (DCT) might seem the more convenient choice for such applications. In this paper we show that this is not always the case. We analyze the two sources of power consumption (transmission and computation) when different video quality levels are targeted, and show that for some quality levels the overall power saving of DCT is significant, whereas at other target quality levels the pixel-based scheme achieves the larger power savings. For the power analysis we use a practical implementation of a distributed video codec to compare the two schemes (DCT and pixel based). Our codec supports different operating modes with different underlying computation and transmission power costs.
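To make this tradeoff concrete, the following back-of-the-envelope sketch (our own illustration, using only the 3 nJ/cycle and 252 nJ/bit figures quoted above) computes the break-even point: saving one transmitted bit pays for 252/3 = 84 cycles of extra computation.

```python
# Energy accounting sketch for a MicaZ-class mote, using the per-cycle and
# per-bit costs quoted in the text (3 nJ/cycle, 252 nJ/bit).
E_CYCLE_NJ = 3.0   # energy per CPU cycle, in nanojoules
E_BIT_NJ = 252.0   # energy per transmitted bit, in nanojoules

def overall_energy_nj(tx_bits: int, cpu_cycles: int) -> float:
    """Overall energy = computation energy + transmission energy."""
    return cpu_cycles * E_CYCLE_NJ + tx_bits * E_BIT_NJ

# Break-even: extra cycles an encoder may spend per transmitted bit it saves.
break_even_cycles = E_BIT_NJ / E_CYCLE_NJ
print(break_even_cycles)  # 84.0
```

Whether the DCT pays off thus depends on whether its extra cycles per frame exceed roughly 84 times the number of bits it saves, which is exactly the quality-dependent comparison made in Sec. 4.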
In particular, it operates on raw pixels or on DCT coefficients. In both cases, quantizers of different step sizes can be used. Furthermore, quantized values can be represented via Gray codes or regular binary numbers. We deploy our codec on a real visual sensor platform, namely the Cyclops/MicaZ [10, 11] module, where we evaluate the transmission and computation power consumption under the different operating modes. This results in a power-distortion evaluation of the different schemes. The rest of the paper is organized as follows. In Sec. 2 we describe our DVC codec. In Sec. 3 we describe our approach for evaluating computation and transmission power consumption. In Sec. 4 we evaluate the performance of the different encoding schemes in terms of PSNR scores and power consumption. Finally, in Sec. 5 we outline the key conclusions of this study.

2. DVC CODEC

To encode a video sequence, even-numbered frames are used as side information frames (key frames), while odd-numbered frames are Wyner-Ziv encoded [2]. The following is a description of our Wyner-Ziv pixel-based and DCT-based codecs.

2.1 PIXEL DVC

In this scheme, a Wyner-Ziv frame is merely a frame of pixels. To encode a Wyner-Ziv frame, a uniform quantizer with M = 2^n intervals is used to quantize the pixel values. Our encoder is flexible in that a quantized pixel can have either a regular binary representation or the corresponding Gray code; in either case, the resulting frame of quantized pixels is decomposed into n bit-planes. Each bit-plane is fed to a Low-Density Parity-Check Accumulate (LDPCA) [8] coder. As LDPCA is rate adaptive, each bit-plane is encoded into a syndrome whose length depends on the statistical dependence between the bit-plane being encoded (X_i) and the bit-plane of the same significance in the key frame (Y_i), where i ranges from 1 to 8 and represents the significance of the bit-plane. In our system this dependence is evaluated at the encoder simply by computing the conditional entropy

H(X_i | Y_i)    (1)
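As an illustrative sketch (our own code, not the paper's implementation), the rate estimate in (1) can be computed empirically by decomposing the quantized Wyner-Ziv and key frames into bit-planes and measuring the conditional entropy of each co-located pair:

```python
import numpy as np

def bitplanes(frame: np.ndarray, n: int = 8):
    """Decompose a frame of n-bit quantizer indices into n binary bit-planes,
    from least significant (i = 1) to most significant (i = n)."""
    return [((frame >> (i - 1)) & 1).astype(np.uint8) for i in range(1, n + 1)]

def conditional_entropy(x: np.ndarray, y: np.ndarray) -> float:
    """Empirical H(X|Y), in bits per symbol, for two binary arrays (Eq. 1)."""
    x, y = x.ravel(), y.ravel()
    h = 0.0
    for yv in (0, 1):
        mask = y == yv
        p_y = mask.mean()
        if p_y == 0:
            continue
        for xv in (0, 1):
            p_xy = ((x == xv) & mask).mean()
            if p_xy > 0:
                h -= p_xy * np.log2(p_xy / p_y)
    return h

def rate_estimates(wz_frame: np.ndarray, key_frame: np.ndarray, n: int = 8):
    """Per-bit-plane rate estimate between a Wyner-Ziv frame and its key frame."""
    return [conditional_entropy(xi, yi)
            for xi, yi in zip(bitplanes(wz_frame, n), bitplanes(key_frame, n))]
```

Highly correlated bit-planes give an entropy near zero (a short syndrome suffices), while independent bit-planes approach one bit per symbol.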

Equation 1 implies that the encoder has to keep key frames stored in memory for future use. The resulting syndrome is packetized and sent to the decoder. When the decoder receives a syndrome, it determines the rate of the LDPCA code used by observing the size of the syndrome, and it uses the bipartite graph [8] corresponding to that code for decoding. The bits of the side information bit-plane are fed to the variable nodes of the graph, and the syndrome bits are fed to the check nodes. An iterative belief propagation algorithm causes the bits at the variable nodes to flip back and forth until they satisfy the parity checks dictated by the syndrome bits; at that point, the bits at the variable nodes are declared to be the decoded bit-plane. After decoding the n bit-planes pertaining to a quantized pixel frame, they are combined to form an approximation of the encoded frame (an approximation because LDPCA decoding may incur a certain bit error rate). The last step is to reconstruct the original, non-quantized pixel values. We use minimum mean-square error estimation [2]

p' = E[p | q']    (2)

where p' is the reconstructed pixel value, p is the original pixel value, q' is the observed quantized pixel value, and E[·|·] is the conditional expectation. The reconstruction process requires a joint p.m.f. (probability mass function) of p and q'. This is obtained from a training set of pixel frames quantized using the different quantizers.
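A minimal sketch of this reconstruction step (our own illustration; the paper's actual training set and quantizers are not available) builds a lookup table of E[p | q] from training pixels and applies Eq. 2 by table lookup:

```python
import numpy as np

def train_reconstruction_table(training_pixels: np.ndarray, n: int) -> np.ndarray:
    """Estimate E[p | q] for a uniform quantizer with M = 2**n intervals
    over 8-bit pixels, from a training set of pixel values."""
    step = 256 // (1 << n)
    q = training_pixels // step
    table = np.empty(1 << n)
    for idx in range(1 << n):
        vals = training_pixels[q == idx]
        # Fall back to the interval midpoint when a training bin is empty.
        table[idx] = vals.mean() if vals.size else idx * step + (step - 1) / 2
    return table

def reconstruct(q_observed: np.ndarray, table: np.ndarray) -> np.ndarray:
    """p' = E[p | q'] (Eq. 2), implemented as a table lookup."""
    return table[q_observed]
```

With uniformly distributed training pixels, each table entry reduces to the interval mean, i.e. the usual midpoint reconstruction; skewed training data shifts the entries accordingly.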

2.2 DCT DVC

In this scheme, a Wyner-Ziv frame is a frame of quantized DCT coefficients. First, a pixel frame is transformed using the 4×4 integer DCT specified by the H.264 standard [9]. We choose this transform because it is implemented entirely with integer arithmetic, which saves computation power. The transform coefficients are then quantized with a quantizer of M = 2^n intervals as described in [9]. Quantized coefficients can be represented by Gray codes or regular binary numbers; in both cases the resulting frame of quantized coefficients is decomposed into n bit-planes, which are encoded and decoded in the same manner as in the pixel case. Decoded bit-planes are combined, and the approximated quantizer indices are rescaled and inverse transformed as specified in [9].
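For concreteness, the forward core transform can be sketched as follows (our own code; per the H.264 transform description [9], the normalization scaling is folded into the quantization step, so the core is pure integer arithmetic):

```python
import numpy as np

# Forward core matrix Cf of the 4x4 integer transform in H.264 [9].
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int64)

def forward_4x4(block: np.ndarray) -> np.ndarray:
    """Core transform W = Cf . X . Cf^T, using integer arithmetic only.
    The scaling factors are absorbed into the quantizer, as in [9]."""
    x = block.astype(np.int64)
    return CF @ x @ CF.T
```

A flat block transforms to a single DC coefficient: a 4×4 block of ones yields W[0,0] = 16 with every other coefficient zero.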

3. POWER COMPUTATION

Measuring power consumption requires implementing a real codec on a real platform. Our encoder is deployed on a Cyclops camera module [11] that sits on top of a MicaZ [10] mote. It is worth mentioning that the Cyclops is responsible for capturing and encoding frames, whereas the MicaZ is responsible for communicating the encoded frames to the decoder. The MicaZ has a battery module that supplies both itself and the Cyclops with power. Since power consumption affects the mote's battery voltage level, our approach to evaluating power consumption is based on monitoring the battery's voltage level during the course of the encoding and transmission process.

Transmission power evaluation: The camera is turned off, and a 64×64 frame that is permanently stored in memory is transmitted to the decoder as a set of packets (152 packets, each consisting of a 12-byte header and a 27-byte payload). Prior to sending the frame's last packet, the battery voltage level is sampled and embedded in the payload of that packet. The receiver assembles the received packets into one frame and extracts the voltage level related to that frame. The experiment starts at a voltage level of 3154 mV, and the frame is repeatedly transmitted until the voltage level reaches 2770 mV. The receiver keeps track of all received frames and the corresponding voltage levels over this range. When the experiment terminates, the average cost is computed by dividing the voltage decrease (384 mV) by the total number of transmitted bytes logged at the receiver side. Figure 1 shows the voltage decay as more frames are sent. As can be noted in the figure, the experiment terminates after 4960 frames have been sent, which means the average cost is 77.42 µV/frame, or equivalently 13.06 nV/byte.
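The arithmetic behind these averages (all numbers taken from the experiment described above) can be checked in a few lines:

```python
# Per-frame and per-byte transmission cost from the experiment above.
voltage_drop_uV = (3154 - 2770) * 1000   # 384 mV expressed in microvolts
frames_sent = 4960
bytes_per_frame = 152 * (12 + 27)        # 152 packets x (12 B header + 27 B payload)

cost_per_frame_uV = voltage_drop_uV / frames_sent
cost_per_byte_nV = cost_per_frame_uV * 1000 / bytes_per_frame

print(round(cost_per_frame_uV, 2))  # 77.42 (microvolts per frame)
print(round(cost_per_byte_nV, 2))   # 13.06 (nanovolts per byte)
```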

Camera power evaluation: The above experiment is repeated, but instead of repeatedly sending a permanently stored frame, a live frame is captured by the camera and sent. Note that this experiment evaluates the average cost of capturing and transmitting a frame, so by subtracting off the transmission cost evaluated in the previous experiment we obtain the camera cost. Figure 1 shows the decay of the voltage level as more frames are captured and transmitted. The camera cost is evaluated as 8.87 µV/frame.

DCT power evaluation: The camera power experiment is extended so that, after a frame is captured, it is DCT transformed before being sent. This experiment evaluates the average cost of capturing, transforming, and transmitting a frame, so subtracting off the cost of frame capture and transmission evaluated in the previous experiment yields the average DCT cost. Figure 2 shows the result of this experiment; the average DCT cost is evaluated as 8.81 µV/frame.

DVC power evaluation: The camera power experiment is extended so that the captured frame is DVC encoded (decomposed into bit-planes that are LDPCA encoded). The cost result of the camera power experiment is subtracted from the result of this experiment to yield the average DVC cost. Figure 2 shows the result of this experiment; the average DVC cost is evaluated as 21.27 µV/frame. Note that the DVC cost is essentially the same when Gray codes are used, as Gray coding adds only a small overhead to the bit-plane decomposition step. The power costs are listed in Table 1.

4. EVALUATION AND RESULTS

To evaluate the power cost of the DCT and pixel-based encoding schemes, we collect a video sequence of 100 frames captured by the Cyclops camera. Frames are grayscale of size 64×64. Temporally consecutive frames are highly correlated, which is a vital condition for efficient DVC. Our codec is used to Wyner-Ziv encode the odd-numbered frames, whereas the even-numbered frames are used as side information. Each point in Figure 3 shows the number of bytes that result from encoding the 50 Wyner-Ziv frames using both the DCT and pixel encoding schemes with different quantizers (quantizers with 256, 128, 64, 32, and 16 quantization intervals are used; heavier quantization results in fewer bytes and lower PSNR values in Figure 3), with and without Gray codes. Note from the figure that when Gray coding is not used, DCT results in more data compression than the pixel scheme; the same holds when Gray coding is used.

Figure 1. Battery voltage decay due to (a) video capture by the camera along with transmission and (b) transmission alone.

Table 1. Cost of transmission and computation operations.

Operation      Voltage Decrease
Camera         8.87 µV/frame
Transmission   0.5093 µV/packet
DCT            8.81 µV/frame
DVC            21.27 µV/frame

Figure 2. Battery voltage decay due to (a) video capture by the camera along with transmission and (b) the DCT and DVC algorithms.

Since we are more interested in comparing the two schemes in terms of overall power consumption at different video quality levels, we use the data in Table 1 to map the points in Figure 3 to those in Figure 4. The overall power consumption at each point has three components: (1) the cost of capturing and applying DVC to the 50 Wyner-Ziv frames, which is common to the DCT and pixel-based schemes; (2) the cost of transmitting the encoded frames, which differs across the two schemes and depends on the compression performance of each; and (3) the cost of applying the DCT, which is incurred only in the DCT case. In Figure 4, for the case of no Gray coding, we note that for lower video quality values (below 34 dB) DCT demonstrates lower overall power consumption: its saving in transmission power is significant enough that the overall power (transmission plus computation) remains below that of the pixel case. On the other hand, Figure 4 shows that for PSNR values greater than 34 dB, the pixel-based scheme demonstrates lower overall power consumption; there, the transmission power saving of the DCT is not large enough to keep its overall power consumption below that of the pixel case. The same reasoning applies for the case of Gray coding.
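The mapping from encoded size (Figure 3) to overall power cost (Figure 4) can be sketched from Table 1 as follows. This is our own illustration: the 27-byte payload and the ceiling-based packet count are assumptions carried over from the transmission experiment in Sec. 3, not values stated for this mapping.

```python
import math

# Per-operation battery-voltage costs from Table 1.
COST_CAMERA_UV = 8.87   # microvolts per frame
COST_DCT_UV = 8.81      # microvolts per frame
COST_DVC_UV = 21.27     # microvolts per frame
COST_TX_UV = 0.5093     # microvolts per packet
PAYLOAD_BYTES = 27      # assumed payload per packet (Sec. 3)

def overall_cost_uV(encoded_bytes: int, use_dct: bool, n_frames: int = 50) -> float:
    """Overall cost = capture + DVC encoding (+ DCT, if used) + transmission."""
    packets = math.ceil(encoded_bytes / PAYLOAD_BYTES)
    cost = n_frames * (COST_CAMERA_UV + COST_DVC_UV)
    if use_dct:
        cost += n_frames * COST_DCT_UV
    return cost + packets * COST_TX_UV
```

In this model the DCT scheme wins only when its compression saves more transmission cost than its fixed per-frame DCT overhead, which is the crossover behavior observed around 34 dB in Figure 4.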

5. CONCLUSION

In this paper we showed that although the DCT scheme is efficient in saving transmission power, it may not be the best option for overall power savings. The results presented in this paper indicate that at higher video quality, the transmission power saving of the DCT is not large enough to keep it more efficient than the pixel-based scheme. We evaluated the power-distortion performance of our system by implementing a DVC coder and deploying it on the Cyclops/MicaZ visual sensor platform.

REFERENCES

[1] S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes (DISCUS): Design and construction," in Proc. IEEE Data Compression Conference, Snowbird, UT, Mar. 1999, pp. 158-167.

[2] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, "Distributed Video Coding," Proceedings of the IEEE (Special Issue on Advances in Video Coding and Delivery), Jan. 2005.

[3] K. Misra, S. Karande, and H. Radha, "Multi-Hypothesis Based Distributed Video Coding using LDPC Codes," in Proc. Allerton Conference on Communication, Control, and Computing, Sept. 2005.

[4] I. H. Tseng and A. Ortega, "Rate-distortion Analysis and Bit Allocation Strategy for Motion Estimation at the Decoder Using Maximum Likelihood Technique in Distributed Video Coding," in Proc. IEEE International Conference on Image Processing (ICIP07), San Antonio, TX, Sep. 2007.

[5] A. Liveris, Z. Xiong, and C. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Communications Letters, vol. 6, no. 10, pp. 440-442, 2002.

[6] G. Xing, C. Lu, Y. Zhang, Q. Huang, and R. Pless, "Minimum power configuration in wireless sensor networks," in Proc. 6th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2005.

[7] J. Polastre, R. Szewczyk, and D. Culler, "Telos: Enabling Ultra-Low Power Wireless Research," in Proc. Fourth International Conference on Information Processing in Sensor Networks (IPSN): Special Track on Platform Tools and Design Methods for Network Embedded Sensors, 2005.

[8] D. Varodayan, A. Aaron, and B. Girod, "Rate-Adaptive Codes for Distributed Source Coding," in Proc. Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, 2005.

[9] "H.264/MPEG-4 Part 10: Transform and Quantization," white paper at http://www.vcodex.com

[10] Crossbow MicaZ Mote Specifications, www.xbow.com.

[11] M. Rahimi, R. Baer, O. Iroezi, J. Garcia, J. Warrior, D. Estrin, and M. Srivastava, "Cyclops: in situ image sensing and interpretation in wireless sensor networks," in Proc. ACM Conference on Embedded Networked Sensor Systems (SenSys), San Diego, CA, Nov. 2005.


Figure 3. Number of bytes that result from encoding the Wyner-Ziv frames using different schemes and the corresponding distortion.


Figure 4. Power-distortion performance of different encoding schemes.