evaluation of coap implementations for live streaming using … · 2019-04-15 · evaluation of...

Evaluation of CoAP Implementations for Live Streaming using CoAP-Observe

Arne Wall, Hannes Raddatz, Bala Vikram Reddy Gopu, Dirk Timmermann

University of Rostock, Institute of Applied Microelectronics and Computer Engineering

18051 Rostock, Germany, Tel.: +49 381 498-7271

Email: [email protected]

Abstract: CoAP (Constrained Application Protocol) enables embedded devices to offer RESTful Web Services and to exchange messages with small binary headers. Besides the transmission of sensor data and control information between smart home devices, CoAP can also be used for streaming data. CoAP implementations are available in Java, which allows execution on heterogeneous devices and operating systems thanks to the Java Virtual Machine and Java Runtime Environment. Because live streaming applications demand low-latency communication at the application layer, we evaluate two different CoAP Java implementations and compare the timing fluctuations of their packet processing with a CoAP C implementation. We identify the Java Garbage Collector as the cause of large fluctuations in the packet processing, which makes the Java implementations poorly suited for a live streaming scenario. We therefore recommend native code applications because of their better timing results.

I. INTRODUCTION

The Constrained Application Protocol (CoAP) [1] has gained increasing popularity in the field of Machine-to-Machine (M2M) communication between resource-constrained devices. It enables devices to send application data with a small message overhead, due to the binary application header format and a lightweight congestion control for UDP packets over IPv4 and IPv6 networks. The application data contains, e.g., sensor data or control messages for actuators. Besides small data sizes, CoAP can be used for streaming application data at a high data rate. There are two principles to deliver streams: the CoAP block-wise transfer and the CoAP-Observe [2] mechanism. One concept using the block-wise transfer, presented in [3], is evaluated with an ns-3 simulation regarding the suitability of the congestion control mechanism. The authors of [4] use a test bed to show that the CoAP congestion control mechanism is suitable for streaming block-wise data over lossy networks. In [5], another streaming approach based on CoAP-Observe is introduced. However, no practical evaluation of this concept with different CoAP implementations exists so far. In this paper we implement this streaming mechanism using three different CoAP implementations written in Java and C and evaluate them in terms of timing behaviour.

II. TECHNOLOGICAL BASIS

A. CoAP

CoAP is a protocol for M2M communication based on the server/client principle. The server offers a RESTful (Representational State Transfer) Web Service that can be invoked by CoAP clients. Every resource of the Web Service can be requested with the CRUD operations GET, PUT, POST and DELETE, which provides compatibility with RESTful APIs that use HTTP. CoAP recommends limiting the application-layer payload to 1024 bytes so that a message fits into a single IP packet without fragmentation, e.g., on Ethernet (1500 bytes MTU). There are two basic mechanisms enabling the transmission of high data rates using CoAP. One is the block-wise transfer, which allows the server to send large data chunks packed into multiple CoAP messages that require one IP packet each. The other is the asynchronous notification of a client through the CoAP-Observe mechanism: the client subscribes to resources and receives notifications about changes of, e.g., a temperature sensor. Besides low-rate sensor data, it is possible to send application data at a higher rate, such as video streams.
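To make the small binary header and the payload limit more concrete, the following Python sketch packs the fixed 4-byte CoAP header described in RFC 7252 and checks whether a message still fits into one Ethernet frame. The function names, the 8-byte token and the 40-byte IPv6 header used in the size check are illustrative assumptions and are not taken from any of the evaluated stacks.

```python
import struct

COAP_VERSION = 1
TYPE_CON, TYPE_NON, TYPE_ACK, TYPE_RST = 0, 1, 2, 3
CODE_GET = 0x01  # method code 0.01


def pack_header(msg_type: int, code: int, message_id: int, token: bytes = b"") -> bytes:
    """Pack the fixed 4-byte CoAP header followed by the token (0 to 8 bytes)."""
    if len(token) > 8:
        raise ValueError("token length must be between 0 and 8 bytes")
    # First byte: 2 bits version, 2 bits type, 4 bits token length.
    first = (COAP_VERSION << 6) | (msg_type << 4) | len(token)
    return struct.pack("!BBH", first, code, message_id) + token


def fits_in_mtu(coap_len: int, mtu: int = 1500, ip_header: int = 40, udp_header: int = 8) -> bool:
    """Check that a CoAP message of coap_len bytes avoids IP fragmentation.

    The IPv6 and UDP header sizes are assumptions for this sketch.
    """
    return coap_len + ip_header + udp_header <= mtu


if __name__ == "__main__":
    header = pack_header(TYPE_CON, CODE_GET, 0x1234, token=bytes(8))
    # Header + token + payload marker (1 byte) + 1024 bytes of payload, options ignored.
    total = len(header) + 1 + 1024
    print(total, fits_in_mtu(total))
```

Even with a full-length token, a 1024-byte payload leaves several hundred bytes of headroom below a 1500-byte MTU, which is why one chunk maps to one IP packet.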

B. Streaming using CoAP Block-wise Transfer

In [4] the authors present DASCo, a modification of the streaming protocol MPEG DASH [6]. To receive a video, an MPEG DASH client requests metadata about the available video formats and qualities from the server. Next, the client requests video segments of a few seconds length via HTTP GET on a resource that is extracted from the metadata. MPEG DASH allows a client to request a different video quality depending on the quality of the transmission channel. Usually, Video on Demand services are requested to deliver low-quality video segments first in order to optimize the user experience, i.e., the stream starts seamlessly after it is requested via a Web GUI. DASCo replaces HTTP with CoAP, which entails replacing TCP with UDP. The authors show that a comparable or better performance can be achieved when a modified retransmission timeout is used in the application-layer congestion control of CoAP.

C. Streaming using CoAP-Observe

The principle of sending asynchronous notifications is exploited by the mechanism published in the patent [5]. It relies on the CoAP-Observe mechanism, which enables clients to subscribe to asynchronous events of a resource. In the proposed scheme the client observes a video resource, which triggers the server to send the chunks of the video stream to the subscriber. We implemented this mechanism using different CoAP stacks. In this paper, we analyze the timing behaviour of the CoAP Java implementations Californium [7] and jCoAP [8] and of the C implementation libcoap [9] on a test bed of different embedded devices.

978-1-5386-4980-0/19/$31.00 ©2019 IEEE
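The Observe-based streaming idea can be sketched as follows: the server cuts the stream into chunks of at most 1024 bytes and sends each chunk as a notification carrying an increasing Observe sequence number (option number 6 in RFC 7641). This is a minimal sketch under stated assumptions, not the code used in the experiments: sending notifications as non-confirmable messages, the message ID scheme and all function names are hypothetical choices for illustration.

```python
import struct
from typing import Iterator

OBSERVE_OPTION = 6    # option number of Observe (RFC 7641)
CODE_CONTENT = 0x45   # response code 2.05 Content
TYPE_NON = 1          # assumption: notifications are non-confirmable
MAX_CHUNK = 1024      # payload bytes per notification


def encode_uint(value: int) -> bytes:
    """Minimal-length big-endian encoding; zero becomes the empty string."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def notification(token: bytes, message_id: int, seq: int, chunk: bytes) -> bytes:
    """Build one notification: header, token, Observe option, marker, payload."""
    observe = encode_uint(seq & 0xFFFFFF)  # Observe values are limited to 24 bits
    header = struct.pack(
        "!BBH", (1 << 6) | (TYPE_NON << 4) | len(token), CODE_CONTENT, message_id & 0xFFFF
    )
    # Option delta 6 and a length below 13 fit into a single option header byte.
    option = bytes([(OBSERVE_OPTION << 4) | len(observe)]) + observe
    return header + token + option + b"\xff" + chunk


def stream(data: bytes, token: bytes, first_mid: int = 0) -> Iterator[bytes]:
    """Yield one notification per chunk of the input data."""
    for seq, offset in enumerate(range(0, len(data), MAX_CHUNK)):
        yield notification(token, first_mid + seq, seq, data[offset:offset + MAX_CHUNK])


if __name__ == "__main__":
    for message in stream(b"a" * 2500, token=b"\x01\x02"):
        print(len(message), message[:8].hex())
```

A client would match the token to its observe request and use the sequence number to detect reordered notifications.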

III. TEST SETUP

In our test setup (Figure 1) we make use of two devices running a server and a client application. The client sends an observe request to a resource representing the live stream. The server immediately starts to send chunks to the client. The data source is implemented as a separate C application that periodically produces data of a fixed size. The test data is sent to the server application via a pipe. In all test cases we implemented server and client with the same CoAP implementation (Californium, jCoAP or libcoap). To obtain meaningful experimental results we execute the CoAP implementations on different hardware platforms: the Raspberry Pi 1 [10] as a single-core system, the ZedBoard [11] as a dual-core system and the Raspberry Pi 3 as a quad-core system. Both devices, server and client, are connected via a wired 1 Gbit/s Ethernet switch, because we do not yet investigate the transmission over an unreliable channel. We evaluate the processing time of each CoAP implementation on the different hardware. JRE 1.8 build 171 was used for all Java applications on all devices to get comparable results. We take five time stamps for each message: t1 indicates that data from standard input (coming from the test data source) is available. t2 is the point in time when the server has created the complete CoAP message, right before sending it via the UDP socket. t3 is taken after returning from the socket. On the client side, t4 is taken when raw data is received by the UDP socket. t5 is measured after parsing the CoAP message and extracting the payload.
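The five time stamps translate into per-message durations. The sketch below shows one plausible mapping, assuming that the build time is t2 - t1, the socket time is t3 - t2 and the read time is t5 - t4; the class and property names are hypothetical. The difference t4 - t3 is deliberately not computed, because it would compare clocks of two different devices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamps:
    """Time stamps of one message in microseconds."""
    t1: int  # server: data available on standard input
    t2: int  # server: CoAP message completely built
    t3: int  # server: send call on the UDP socket returned
    t4: int  # client: raw datagram received
    t5: int  # client: message parsed and payload extracted

    @property
    def build_time(self) -> int:
        return self.t2 - self.t1

    @property
    def socket_time(self) -> int:
        return self.t3 - self.t2

    @property
    def read_time(self) -> int:
        return self.t5 - self.t4

    @property
    def processing_time(self) -> int:
        # Sum of building and parsing, both measured on a single device each.
        return self.build_time + self.read_time


if __name__ == "__main__":
    sample = Stamps(t1=0, t2=514, t3=560, t4=10_000, t5=10_188)  # made-up values
    print(sample.build_time, sample.socket_time, sample.read_time, sample.processing_time)
```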

[Figure 1: test data source (t1) -> stdin -> CoAP server (t2, t3) -> 1 Gbit Ethernet -> CoAP client (t4, t5)]

Fig. 1: Test Setup

IV. EVALUATION

The first test case describes a test data stream with a fixed data rate and a fixed chunk size. For it, we measured the creation time of CoAP messages on the server and the parsing time of the client application with 10,000 messages each. A second experiment was used to estimate the latency and its fluctuation, which is influenced by the garbage collector of the JVM.
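For orientation, the payload data rate of such a stream follows directly from chunk size and transmission interval. The short sketch below assumes a chunk size of 1024 bytes (the payload limit discussed earlier, not a value stated for this test case); the function name is hypothetical.

```python
def stream_rate_kbit_s(chunk_bytes: int, interval_ms: float) -> float:
    """Payload data rate in kbit/s; one bit per millisecond equals one kbit/s."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return chunk_bytes * 8 / interval_ms


if __name__ == "__main__":
    for interval in (100, 10, 1):
        print(f"{interval} ms interval: {stream_rate_kbit_s(1024, interval):.2f} kbit/s")
```

Under this assumption the three intervals used in the evaluation correspond to roughly 82 kbit/s, 819 kbit/s and 8.2 Mbit/s of payload.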

A. Packet Processing at Fixed Data Rate and Chunk Size

First, we illustrate the time needed to build a packet (Figure 2), to return from the socket (Figure 4) and to read a packet on the client side (Figure 5) for jCoAP and Californium (Cf). We send test data at fixed transmission intervals of 100 ms and 10 ms on all platforms. Figure 2 compares the packet build time of jCoAP and Cf on the three platforms at transmission intervals of 100 ms and 10 ms. For both intervals and both implementations, the performance of the Raspberry Pi 3 and the ZedBoard is comparable. The Raspberry Pi 1 has a much lower performance than the other platforms. This results in an increased packet build time on the server side when an application with a higher throughput demand is executed. Building a packet with jCoAP on the Pi 1 takes approximately 514 µs at a 100 ms interval compared to 1655 µs at a 10 ms transmission interval. The same increase in computation time can be observed when Cf is used to send packets: the build time increases from 763 µs at a 100 ms transmission interval to approximately 240 ms at a 10 ms interval. The large execution time of 240 ms is omitted from Figure 2 for better readability. Such nonlinear effects occur on all platforms above a certain threshold of the data rate to be sent.

[Figure 2: bar chart "Packet Build Time"; y-axis: Time [µs], 0 to 1800; platforms: Pi1, Pi3, ZedBoard; series: jCoAP 100ms, jCoAP 10ms, Californium 100ms, Californium 10ms]

Fig. 2: Build Time for each Packet created by the Server at 100 ms and 10 ms Interval

[Figure 3: line plot; x-axis: Message Number, 0 to 8000; y-axis: Build Time of a Packet in microseconds, 0 to 250000]

Fig. 3: Increasing Packet Build Time of Californium at 1 ms Interval on Raspberry Pi 3

Figure 3 shows, as an example, the increasing packet build time on a Raspberry Pi 3 running the Cf implementation. Above a certain data rate the machine reaches its maximum load, which shows up as an increasing packet build time. Over the course of 10,000 test messages sent every 1 ms, the build time continuously increases to 250 ms. We can observe that a higher rate results in a slightly decreased processing time, but only if the data rate stays below this threshold. Another observation is that the average execution time to build a packet is smaller with jCoAP than with Cf.
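A continuously growing delay is what a simple single-server queue predicts once the time to handle one message exceeds the interval at which new messages arrive. The sketch below uses this textbook model (Lindley's recursion with constant arrival interval and constant service time); it is not the authors' analysis, and the numbers in the example are hypothetical rather than measured.

```python
from typing import List


def queueing_delays(service_us: int, interval_us: int, count: int) -> List[int]:
    """Waiting time of each message before its processing starts.

    Each message arrives interval_us after the previous one and needs
    service_us of processing. Any excess accumulates as backlog.
    """
    delays = []
    backlog = 0
    for _ in range(count):
        delays.append(backlog)
        backlog = max(0, backlog + service_us - interval_us)
    return delays


if __name__ == "__main__":
    # Below the threshold: no backlog builds up.
    print(queueing_delays(service_us=500, interval_us=1000, count=5))
    # Slightly above the threshold: the delay grows linearly with the message number.
    print(queueing_delays(service_us=1025, interval_us=1000, count=5))
```

In this model the delay stays at zero as long as the service time is below the interval and grows without bound as soon as it is above, which matches the threshold behaviour described in the text qualitatively.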

[Figure 4: bar chart "Socket Time"; y-axis: Time [µs], 0 to 400; platforms: Pi1, Pi3, ZedBoard; series: jCoAP 100ms, jCoAP 10ms, Californium 100ms, Californium 10ms]

Fig. 4: Time to Return from Server Socket at 100 ms and 10 ms Interval

Figure 4 shows that returning from the socket is quicker than building and parsing a packet. Both jCoAP and Cf use an internal thread that reads messages from a queue and sends them over the socket. Therefore, the socket time in Figure 4 is the theoretical lower limit for the data transmission interval.
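The queue-plus-sender-thread pattern mentioned above can be illustrated with a few lines of Python. This is a generic sketch of the pattern and not code from jCoAP or Californium; `run_sender` and the `send` callback are hypothetical names, and a plain list stands in for the UDP socket.

```python
import queue
import threading
from typing import Callable, Iterable


def run_sender(messages: Iterable[bytes], send: Callable[[bytes], None]) -> None:
    """Hand messages to a dedicated sender thread through a FIFO queue."""
    outbox: "queue.Queue" = queue.Queue()

    def worker() -> None:
        while True:
            item = outbox.get()
            if item is None:  # sentinel: no more messages
                break
            send(item)        # only this thread touches the socket

    thread = threading.Thread(target=worker)
    thread.start()
    for message in messages:
        outbox.put(message)   # returns without waiting for the send to finish
    outbox.put(None)
    thread.join()


if __name__ == "__main__":
    sent = []
    run_sender([b"chunk-1", b"chunk-2"], sent.append)
    print(sent)
```

Because the producer only enqueues, message building and socket I/O overlap, and the sender thread becomes the limiting stage once messages are produced faster than they can be sent.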

[Figure 5: bar chart "Packet Read Time"; y-axis: Time [µs], 0 to 600; platforms: Pi1, Pi3, ZedBoard; series: jCoAP 100ms, jCoAP 10ms, Californium 100ms, Californium 10ms]

Fig. 5: Time to Read a Packet on the Client at 100 ms and 10 ms Interval

Figure 5 shows the time to read a packet on the client side. Due to the overload situation at a 10 ms transmission interval there is no valid result for the Pi 1. We can observe that Cf processes packets more slowly than jCoAP, independent of the transmission rate. A statistical evaluation shows that all timings of Cf are slower than the timings of jCoAP at a confidence level above 99.9%. Comparing the creation and parsing times of a single packet, parsing is significantly faster than creating a message.

Besides the applications whose timings are determined by the underlying JVM, we investigated the C implementation libcoap. Running native code is much faster than the JVM-based variants.

[Figure 6: bar chart "Packet Build Time"; y-axis: Time [µs], 0 to 18; platforms: Pi1, Pi3, ZedBoard; series: Libcoap 100ms, Libcoap 10ms]

Fig. 6: Build Time for each Packet created by the libcoap Server at 100 ms and 10 ms Interval

Figure 6 shows that operating at a higher transmission rate results in a slightly lower packet build time. We assume that this effect occurs because code that is executed at a higher rate is more likely to remain in the cache. This assumption needs further investigation. Furthermore, we can observe that the Pi 3 and the ZedBoard create CoAP messages 3-5 times faster than the Pi 1.

[Figure 7: bar chart "Packet Read Time"; y-axis: Time [µs], 0 to 60; platforms: Pi1, Pi3, ZedBoard; series: Libcoap 100ms, Libcoap 10ms]

Fig. 7: Time to Read a Packet by the libcoap Client at 100 ms and 10 ms Interval

Figure 7 shows the average time to read a message. The same speed difference between the platforms and test cases can be observed. With libcoap, parsing a single message takes longer than building it, which is the opposite of the behavior of the Java applications.

Table I shows the sum of the building and parsing time of a single message. All values for the Java implementations on all platforms are in the range of a few hundred µs; only Cf on the Pi 1 is at 1332 µs. The same streaming experiments were carried out with the CoAP C implementation libcoap. Its results are at least ten times faster than those of the Java implementations. On a Pi 1 it takes 69 µs to create and parse a message; on a Pi 3 and a ZedBoard it takes 17 µs and 18 µs, respectively. Table II lists the overall timings for an application that sends a message every 10 ms. On the Pi 1 the processing time of jCoAP increases. As discussed for Figure 2, the overload situation results in very large computation times if Cf is used on the Pi 1. Native code (libcoap) on a Pi 1 is executed much faster, at 60 µs overall processing time, and no limitations occur at this data rate. The Pi 3 and the ZedBoard can also handle a transmission interval of 10 ms without any nonlinear increase of the processing time. Using libcoap on those platforms, we measured the same processing time of 15 µs.

TABLE I: Overall Processing Time [µs] of Server and Client at 100 ms Interval

Implementation   Raspberry Pi 1   Raspberry Pi 3   ZedBoard
jCoAP                       702              201        197
Cf                         1332              377        360
libcoap                      69               17         18

TABLE II: Overall Processing Time [µs] of Server and Client at 10 ms Interval

Implementation   Raspberry Pi 1   Raspberry Pi 3   ZedBoard
jCoAP                      1846              177        174
Cf                       258929              451        318
libcoap                      60               15         15
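The speed difference between the Java implementations and libcoap can be recomputed from the values in Tables I and II. The sketch below simply divides the overall processing times; the dictionary layout and the function name are illustrative choices.

```python
from typing import Dict

PLATFORMS = ("Raspberry Pi 1", "Raspberry Pi 3", "ZedBoard")

# Overall processing time in microseconds, copied from Tables I and II.
PROCESSING_US = {
    "100 ms": {"jCoAP": (702, 201, 197), "Cf": (1332, 377, 360), "libcoap": (69, 17, 18)},
    "10 ms": {"jCoAP": (1846, 177, 174), "Cf": (258929, 451, 318), "libcoap": (60, 15, 15)},
}


def slowdown(interval: str, implementation: str, baseline: str = "libcoap") -> Dict[str, float]:
    """Factor by which an implementation is slower than the baseline, per platform."""
    rows = PROCESSING_US[interval]
    return {
        platform: value / reference
        for platform, value, reference in zip(PLATFORMS, rows[implementation], rows[baseline])
    }


if __name__ == "__main__":
    for interval in PROCESSING_US:
        for implementation in ("jCoAP", "Cf"):
            factors = slowdown(interval, implementation)
            print(interval, implementation, {p: round(f, 1) for p, f in factors.items()})
```

At the 100 ms interval the factors range from about 10 (jCoAP) to about 22 (Cf on the Pi 3).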

B. Fluctuation of Latency

Besides the average timings we evaluated the jitter. With the Java implementations, some messages are delayed by a large factor. For comparable results all applications (implemented with jCoAP and Cf) are executed on two ZedBoards.

[Figure 8: line plot; x-axis: Message Number, 0 to 8000; y-axis: Packet Build Time (Server) [µs], logarithmic scale from 10^2 to 10^4]

Fig. 8: Fluctuation of Packet Build Time on ZedBoard using Californium

Figure 8 shows the build time of the messages sent by the server, measured for 9000 messages. There are many spikes, which indicate large time values for single messages. These spikes do not occur for bursts of messages. After 1500 messages the processing time drops, which can be explained by the run-time compiler optimizing the code after 1500 calls. This behavior can also be observed in all other Java measurements.

[Figure 9: line plot; x-axis: Message Number, 0 to 10000; y-axis: Packet Build Time (Server) [µs], logarithmic scale from 10^2 to 10^4]

Fig. 9: Fluctuation of Packet Build Time on ZedBoard using jCoAP

On the client side there are fewer spikes regardless of the implementation (Figures 10, 11). Comparing both implementations with each other, we observe that jCoAP shows fewer spikes (Figures 8, 9). The high timing values for single messages can be explained by the garbage collector of the underlying JVM. Two types of garbage collection are known [12], which result in different delays.
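One simple way to count such spikes in a measurement series is to compare every sample with a multiple of the median. The sketch below is only an illustration: the factor of 10 and the optional warm-up cut-off are arbitrary assumptions, the function name is hypothetical, and attributing a spike to the garbage collector would additionally require JVM garbage collection logs.

```python
from statistics import median
from typing import List, Sequence


def find_spikes(times_us: Sequence[float], factor: float = 10.0, warmup: int = 0) -> List[int]:
    """Return indices of samples larger than factor times the median.

    The first `warmup` samples are ignored, e.g. to skip the phase before
    the run-time compiler has optimized the code.
    """
    steady = times_us[warmup:]
    if not steady:
        return []
    baseline = median(steady)
    return [warmup + i for i, t in enumerate(steady) if t > factor * baseline]


if __name__ == "__main__":
    samples = [900, 800, 200, 210, 190, 5000, 205]  # made-up build times in µs
    print(find_spikes(samples, warmup=2))
```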

[Figure 10: line plot; x-axis: Message Number, 0 to 8000; y-axis: Packet Parsing Time (Client) [µs], logarithmic scale from 10^2 to 10^4]

Fig. 10: Fluctuation of Packet Read Time on ZedBoard using Californium

[Figure 11: line plot; x-axis: Message Number, 0 to 10000; y-axis: Packet Parsing Time (Client) [µs], logarithmic scale from 10^2 to 10^4]

Fig. 11: Fluctuation of Packet Read Time on ZedBoard using jCoAP

We also measured the round trip time between two Pi 3 running jCoAP to estimate the latency and show the influence of the garbage collector (Figure 12). The CoAP client sends requests of 1024 bytes to the server, which responds with an echo message. After the client receives an echo, it sends a new request. The average round trip time is 1.7 ms, but single messages exceed this value by more than a factor of 10. Out of the set of 50,000 messages, 650 messages took longer than 3 ms, 56 messages took longer than 10 ms and 8 messages had a round trip time of more than 100 ms. For streaming applications with low-latency demands, such as audio and video streaming, this sporadic high latency has to be absorbed by a buffer in front of the decoding unit. Buffering several hundred milliseconds of data leads to unacceptable delays for live streaming applications.

[Figure 12: line plot; x-axis: Message Number, 0 to 50000; y-axis: RTT in microseconds, 0 to 600000]

Fig. 12: Round Trip Time between Californium Echo-Server and Cf Client on Raspberry Pi 3
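The threshold counts reported for the round trip time measurement, and the buffer depth implied by the worst case, can be derived from a list of RTT samples as sketched below. The function names are hypothetical, the sample values are made up, and sizing the buffer as the difference between the largest and smallest delay is a simplifying assumption.

```python
from typing import Dict, Sequence


def count_exceeding(rtts_ms: Sequence[float], thresholds_ms: Sequence[float] = (3, 10, 100)) -> Dict[float, int]:
    """Number of samples above each threshold."""
    return {limit: sum(1 for rtt in rtts_ms if rtt > limit) for limit in thresholds_ms}


def jitter_buffer_ms(rtts_ms: Sequence[float]) -> float:
    """Buffer depth needed so that even the slowest message arrives in time."""
    return max(rtts_ms) - min(rtts_ms)


if __name__ == "__main__":
    samples = [1.5, 1.7, 1.9, 4.0, 12.0, 120.0]  # made-up RTT values in ms
    print(count_exceeding(samples))
    print(jitter_buffer_ms(samples))
```

Applied to the reported counts, 650 of 50,000 messages above 3 ms correspond to 1.3 % of all messages, while the 8 messages above 100 ms (0.016 %) determine the buffer depth.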

V. RELATED WORK

In [8], there is a performance analysis of jCoAP that considers the round trip time of a CoAP client requesting an echo resource of a CoAP server. The authors vary the size of the echo resource to determine the influence of the payload size. In contrast to our work, which sends asynchronous notifications of 1024 bytes, they test payload sizes from 16 to 160 bytes via GET requests/responses. Furthermore, the authors measured each payload size 100 times. In our streaming scenario we measured over 10,000 messages, so that the run-time compiler of Java optimizes the code and the packet processing time drops significantly after 1500 messages. The analysis in [8] also confirmed a faster packet processing of jCoAP compared to Californium. In addition, a high latency caused by the Java Garbage Collector was observed on a non-real-time JVM. The use of a real-time JVM can reduce these high time values [8].

A deeper investigation of the interruptions of Java applications induced by the Garbage Collector is done in [12]. The authors measured the RTT of a server-client application based on a medical DPWS (Devices Profile for Web Services) stack that was executed on an Intel Galileo Board and on Raspberry Pi versions 1 to 3. They also observed two different kinds of garbage collection, the small and the large garbage collection. In addition, an RTT measurement over time showed a significant drop in RTT after 1500 messages, once the run-time optimization of the JVM had been executed. Going beyond our work, the authors evaluated the "-Xcomp" flag of the JVM to force compilation at start time. Its use is not recommended because of highly increased start times ranging from 6 s to 13 s.

VI. CONCLUSION

The use of Java as a programming language and execution environment enables the deployment of applications to heterogeneous devices. JVMs are available for a range of architectures and operating systems. Developers of distributed applications, e.g., for smart homes, can use CoAP RESTful servers and clients that are implemented as open source software. We analyzed the performance of the CoAP Java implementation Californium to create a point-to-point streaming application using the asynchronous notification mechanism of CoAP. Due to high fluctuations of the packet processing within the Java application, large message queues have to buffer messages. The queues have to buffer up to 1 s of incoming streaming data, which results in an unacceptable latency for live streaming applications. The memory management of the JVM has to clean up the heap because memory is allocated for every single message, which is also time-consuming. Similar qualitative results were observed using the CoAP Java implementation jCoAP, although its packet processing was faster than that of Californium. In contrast to the Java implementations, we also measured the packet processing times of the C implementation libcoap. Running native code leads to much lower packet processing times (by a factor of 10) on all platforms. Furthermore, the fluctuations of these timings are much smaller even on a non-real-time Linux kernel, and the achieved throughput is much higher compared to the Java applications (approx. tens of Mbit/s vs. hundreds of kbit/s). For timing-sensitive applications like audio/video streaming we therefore recommend native code implementations.

ACKNOWLEDGEMENT

We would like to thank the German Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR) within the Federal Office for Building and Regional Planning for their support in this project. This work is partially funded by BBSR within the scope of the project "Zukunft Bau".

REFERENCES

[1] Z. Shelby, K. Hartke, and C. Bormann, “The Constrained Application Protocol (CoAP),” RFC 7252, Jun. 2014. [Online]. Available: https://rfc-editor.org/rfc/rfc7252.txt

[2] K. Hartke, “Observing Resources in the Constrained Application Protocol (CoAP),” RFC 7641, Sep. 2015. [Online]. Available: https://rfc-editor.org/rfc/rfc7641.txt

[3] G. Choi, D. Kim, and I. Yeom, “Efficient streaming over CoAP,” in IEEE 2016 International Conference on Information Networking (ICOIN). IEEE, 2016.

[4] P. Krawiec, M. Sosnowski, J. Mongay Batalla, C. X. Mavromoustakis, and G. Mastorakis, “DASCo: dynamic adaptive streaming over CoAP,” Multimedia Tools and Applications, vol. 77, no. 4, pp. 4641–4660, Feb. 2018. [Online]. Available: https://doi.org/10.1007/s11042-017-4854-z

[5] O. Diaz, S. Loreto, and H. Back, “Method and server for sending a data stream to a client and method and client for receiving a data stream from a server,” Apr. 11, 2017, US Patent 9,621,617. [Online]. Available: https://www.google.com/patents/US9621617

[6] I. Sodagar, “The MPEG-DASH standard for multimedia streaming over the internet,” IEEE MultiMedia, vol. 18, no. 4, pp. 62–67, Apr. 2011.

[7] “Web Page: Californium Repository.” [Online]. Available: https://github.com/eclipse/californium

[8] B. Konieczek, M. Rethfeldt, F. Golatowski, and D. Timmermann, “Real-time communication for the internet of things using jCoAP,” in 2015 IEEE 18th International Symposium on Real-Time Distributed Computing, Apr. 2015, pp. 134–141.

[9] “Web Page: Libcoap Repository.” [Online]. Available: https://github.com/obgm/libcoap

[10] Raspberry Pi Foundation, “Webpage: Raspberry Pi Documentation.” [Online]. Available: http://www.raspberrypi.org/documentation/hardware/README.md

[11] Digilent Inc., “Board Photo (ZedBoard),” 2013. [Online]. Available: http://www.zedboard.org/content/board-photo

[12] M. Kasparick, B. Beichler, B. Konieczek, A. Besting, M. Rethfeldt, F. Golatowski, and D. Timmermann, “Measuring latencies of IEEE 11073 compliant service-oriented medical device stacks,” in IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society, Oct. 2017, pp. 8640–8647.
