a collaborative and sustainable edge-cloud architecture for … · 2020-03-25 · tracking...

10
2377-3782 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSUSC.2019.2955317, IEEE Transactions on Sustainable Computing IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. X, NO. Y, ZZZZ 1 A Collaborative and Sustainable Edge-Cloud Architecture for Object Tracking with Convolutional Siamese Networks Haifeng Gu, Member, IEEE , Zishuai Ge, E Cao, Mingsong Chen, Senior Member, IEEE , Tongquan Wei, Member, IEEE , Xin Fu, Member, IEEE , and Shiyan Hu, Senior Member, IEEE Abstract—Convolutional Neural Networks (CNNs) are becoming popular in Internet-of-Things (IoT) based object tracking areas, e.g., autonomous driving, commercial surveillance, and intelligent traffic management. However, due to limited processing power of embedded devices and network bandwidth, how to simultaneously guarantee fast object tracking with high accuracy and low energy consumption is still a major challenge, which makes IoT-based vision applications unreliable and unsustainable. To address this problem, this paper proposes a collaborative edge-cloud architecture that resorts to cloud for object tracking performance enhancement. By properly offloading computations to cloud and periodically checking tracking status of edge devices through convolutional Siamese networks, our novel edge-cloud architecture enables interactive collaborations between edge devices and cloud servers in order to quickly and accurately rectify tracking errors. Comprehensive experimental results on well-known video object tracking benchmarks show that our architecture can not only significantly improve the performance of object tracking, but also can save the energy consumption of edge devices. Index Terms—Object Tracking; Sustainability; Convolutional Siamese Network; Edge Computing; Collaborative Architecture. 1 INTRODUCTION The proliferation of smart Internet of Things (IoT) applications in areas such as autonomous driving, commercial surveillance, intel- ligent traffic management, augmented reality, robotics stimulates increasing demands on high performance object tracking in terms of accuracy, response time and energy efficiency [1], [2], [3]. This is mainly because: i) emerging applications such as self-driving vehicles impose extremely stringent tracking accuracy and real- time response requirements since they are safety-critical [4]; and ii) IoT-based object tracking applications are typically based on battery-powered terminal devices, which require energy-efficient image processing for prolonging uninterrupted surveillance time [5]. However, due to limited processing power and battery capacity of terminal devices close to video cameras, it is difficult to fulfill the above performance requirements in practice based on existing embedded architectures. Edge-cloud architectures [6], [7] can extend IoT processing capabilities by offloading partial computation to remote cloud servers. Therefore, they are becoming an emerging paradigm that enables efficient design of object tracking applications [8], [9], [10], [11]. 
However, commercial cloud computing services such as Amazon EC2, Microsoft Azure, Alibaba Aliyun are mainly hosted by regular data centers, which do not offer specific real- time object tracking services. Therefore, how to guarantee fast Haifeng Gu, Zishuai Ge, Mingsong Chen and Tongquan Wei are with the Shanghai Key Lab of Trustworthy Computing at East China Normal University, Shanghai, 200062, China (email: {hfgu, zsge, mschen}@sei.ecnu.edu.cn,[email protected]). Mingsong Chen is also with the Shanghai Institute of Intelligent Science and Technology, Tongji University. Mingsong Chen is the corresponding author. Xin Fu is with the Department of Electrical and Computer Engineering at University of Houston, Houston TX 77204, USA ([email protected]). Shiyan Hu is with the School of Computer Science and Electronic En- gineering at University of Essex. Wivenhoe Park, Colchester CO4 3SQ, United Kindom (email: [email protected]). and accurate object tracking using edge devices with limited computing capability and battery power is becoming a major challenge during the edge-cloud based object tracking design. Since Convolutional Neural Networks (CNNs) provide near human level accuracy, they have been widely used in object detection and tracking [14], [15], [16], [17], [18]. For example, as a special kind of CNNs, convolutional Siamese networks [19], [20], [21] have been successfully adopted in modern tracking benchmarks by accurately locating an exemplar image within a large search image. However, to achieve required accuracy, CNN- based object tracking involves a large quantity of computations. Due to limited computing power of edge devices, the speed of trackers hosted within edge devices will be severely compro- mised. Moreover, more CNN computations require more energy consumption. This will strongly affect the sustainability of CNN- based trackers, especially for mobile edge trackers which are driven by batteries. Although both the size of CNN features the depth of convolutional layers can be reduced to enable real-time tracking, the accuracy of edge trackers cannot be guaranteed. In an edge-cloud architecture, edge devices are geologically located closer to video cameras than cloud servers [22]. Therefore, edge devices can promptly react to the changes of tracked objects, though the accuracy cannot be guaranteed due to limited comput- ing power. On the other hand, cloud servers have strong computing power. However, although high object tracking accuracy can be achieved for investigated video frames, the inevitable network latency will prevent real-time reaction to target moving objects [23], which may cause catastrophic results for safety-critical applications. To make the tradeoff between accuracy and real- time response, this paper proposes a novel and efficient edge-cloud architecture for object tracking based on the collaboration between edge devices and cloud servers. This paper makes following three major contributions: 1) Based on Kernelized Correlation Filters (KCFs) [24] and Pearson Correlation Coefficients (PCCs), a novel Authorized licensed use limited to: East China Normal University. Downloaded on March 25,2020 at 15:16:04 UTC from IEEE Xplore. Restrictions apply.

Upload: others

Post on 28-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Collaborative and Sustainable Edge-Cloud Architecture for … · 2020-03-25 · tracking workloads to cloud servers, edge devices in our approach seek help from cloud servers only

2377-3782 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSUSC.2019.2955317, IEEETransactions on Sustainable Computing

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. X, NO. Y, ZZZZ 1

A Collaborative and Sustainable Edge-Cloud Architecturefor Object Tracking with Convolutional Siamese Networks

Haifeng Gu, Member, IEEE , Zishuai Ge, E Cao, Mingsong Chen, Senior Member, IEEE ,Tongquan Wei, Member, IEEE , Xin Fu, Member, IEEE , and Shiyan Hu, Senior Member, IEEE

Abstract—Convolutional Neural Networks (CNNs) are becoming popular in Internet-of-Things (IoT) based object tracking areas, e.g.,autonomous driving, commercial surveillance, and intelligent traffic management. However, due to limited processing power ofembedded devices and network bandwidth, how to simultaneously guarantee fast object tracking with high accuracy and low energyconsumption is still a major challenge, which makes IoT-based vision applications unreliable and unsustainable. To address thisproblem, this paper proposes a collaborative edge-cloud architecture that resorts to cloud for object tracking performanceenhancement. By properly offloading computations to cloud and periodically checking tracking status of edge devices throughconvolutional Siamese networks, our novel edge-cloud architecture enables interactive collaborations between edge devices and cloudservers in order to quickly and accurately rectify tracking errors. Comprehensive experimental results on well-known video objecttracking benchmarks show that our architecture can not only significantly improve the performance of object tracking, but also can savethe energy consumption of edge devices.

Index Terms—Object Tracking; Sustainability; Convolutional Siamese Network; Edge Computing; Collaborative Architecture.

F

1 INTRODUCTION

The proliferation of smart Internet of Things (IoT) applications inareas such as autonomous driving, commercial surveillance, intel-ligent traffic management, augmented reality, robotics stimulatesincreasing demands on high performance object tracking in termsof accuracy, response time and energy efficiency [1], [2], [3]. Thisis mainly because: i) emerging applications such as self-drivingvehicles impose extremely stringent tracking accuracy and real-time response requirements since they are safety-critical [4]; andii) IoT-based object tracking applications are typically based onbattery-powered terminal devices, which require energy-efficientimage processing for prolonging uninterrupted surveillance time[5]. However, due to limited processing power and battery capacityof terminal devices close to video cameras, it is difficult to fulfillthe above performance requirements in practice based on existingembedded architectures.

Edge-cloud architectures [6], [7] can extend IoT processingcapabilities by offloading partial computation to remote cloudservers. Therefore, they are becoming an emerging paradigm thatenables efficient design of object tracking applications [8], [9],[10], [11]. However, commercial cloud computing services suchas Amazon EC2, Microsoft Azure, Alibaba Aliyun are mainlyhosted by regular data centers, which do not offer specific real-time object tracking services. Therefore, how to guarantee fast

• Haifeng Gu, Zishuai Ge, Mingsong Chen and Tongquan Wei arewith the Shanghai Key Lab of Trustworthy Computing at East ChinaNormal University, Shanghai, 200062, China (email: {hfgu, zsge,mschen}@sei.ecnu.edu.cn,[email protected]). Mingsong Chen is alsowith the Shanghai Institute of Intelligent Science and Technology, TongjiUniversity. Mingsong Chen is the corresponding author.

• Xin Fu is with the Department of Electrical and Computer Engineering atUniversity of Houston, Houston TX 77204, USA ([email protected]).

• Shiyan Hu is with the School of Computer Science and Electronic En-gineering at University of Essex. Wivenhoe Park, Colchester CO4 3SQ,United Kindom (email: [email protected]).

and accurate object tracking using edge devices with limitedcomputing capability and battery power is becoming a majorchallenge during the edge-cloud based object tracking design.

Since Convolutional Neural Networks (CNNs) provide nearhuman level accuracy, they have been widely used in objectdetection and tracking [14], [15], [16], [17], [18]. For example,as a special kind of CNNs, convolutional Siamese networks [19],[20], [21] have been successfully adopted in modern trackingbenchmarks by accurately locating an exemplar image within alarge search image. However, to achieve required accuracy, CNN-based object tracking involves a large quantity of computations.Due to limited computing power of edge devices, the speed oftrackers hosted within edge devices will be severely compro-mised. Moreover, more CNN computations require more energyconsumption. This will strongly affect the sustainability of CNN-based trackers, especially for mobile edge trackers which aredriven by batteries. Although both the size of CNN features thedepth of convolutional layers can be reduced to enable real-timetracking, the accuracy of edge trackers cannot be guaranteed.

In an edge-cloud architecture, edge devices are geologicallylocated closer to video cameras than cloud servers [22]. Therefore,edge devices can promptly react to the changes of tracked objects,though the accuracy cannot be guaranteed due to limited comput-ing power. On the other hand, cloud servers have strong computingpower. However, although high object tracking accuracy can beachieved for investigated video frames, the inevitable networklatency will prevent real-time reaction to target moving objects[23], which may cause catastrophic results for safety-criticalapplications. To make the tradeoff between accuracy and real-time response, this paper proposes a novel and efficient edge-cloudarchitecture for object tracking based on the collaboration betweenedge devices and cloud servers. This paper makes following threemajor contributions:

1) Based on Kernelized Correlation Filters (KCFs) [24]and Pearson Correlation Coefficients (PCCs), a novel

Authorized licensed use limited to: East China Normal University. Downloaded on March 25,2020 at 15:16:04 UTC from IEEE Xplore. Restrictions apply.

Page 2: A Collaborative and Sustainable Edge-Cloud Architecture for … · 2020-03-25 · tracking workloads to cloud servers, edge devices in our approach seek help from cloud servers only

2377-3782 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSUSC.2019.2955317, IEEETransactions on Sustainable Computing

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. X, NO. Y, ZZZZ 2

lightweight object tracking approach is proposed for edgedevices that enables coarse-grained similarity evaluationof tracked objects.

2) By adopting convolutional Siamese networks, we developa rectification module coupled with an evaluation modulethat can be deployed on cloud servers for accurate andfine-grained tracking of target objects.

3) We propose a collaborative mechanism between edgedevices and cloud servers for rectifying tracking errorsin an interactive manner.

Unlike traditional edge-cloud architectures that offload all thetracking workloads to cloud servers, edge devices in our approachseek help from cloud servers only as needed, thus requiring muchless network communication while both tracking accuracy andspeed can be guaranteed.

The rest of this paper is organized as follows. Section 2introduces the related work on CNN-based object detection andtracking, edge-cloud based architectures for IoT, and convolutionalSiamese networks. Section 3 details our collaborative edge-cloudarchitecture based on convolutional Siamese networks. Section 4presents the implementation details of our collaborative edge-cloud architecture for object tracking. Section 5 presents exper-imental results on two well-known benchmarks. Finally, Section 6concludes the paper.

2 RELATED WORK

By leveraging a large quantity of training data to improve pre-diction accuracy, convolutional neural networks have becomeone of the hottest techniques in the domain of object tracking[14], [15], [16], [17]. For example, Ma et al. [30] exploited richfeature hierarchies of deep CNNs trained on object recognitiondatasets to improve tracking accuracy and robustness. In [15],Nam and Han proposed a novel tracking algorithm based on CNNstrained in a multi-domain learning framework. By learning domainindependent representations from pre-training domain-specific in-formation through online learning, their approach can achievehigh-quality tracking results. Based on Deep Neural Networks(DNNs), Price et al. [31] introduced a novel method for continuousand accurate multi-robot cooperative detection and tracking. Byleveraging cooperation among robots in a team , their approach iscapable of harnessing the power of DNN-based detectors for real-time applications. In [2], Held et al. proposed a method for offlinetraining of neural networks that can track novel objects at 100 FPS.In [19], Bertinetto et al. presented the fully-convolutional Siamesenetworks that can be used for object tracking at frame-rates beyondreal-time. Although the above approaches are promising, most ofthem focus on improving tracking accuracy or speed. Few of themconsider the neural network execution on devices with limitedcomputing power.

The performance of IoT applications would be significantlyimproved by adopting edge architectures [6], [9]. For example,Mudassar et al. [8] proposed a framework that integrates taskinformation produced from a computationally expensive algorithmat the host to guide the data collection and transmission fromresource constrained edge devices. Although based on collabora-tion the quality of information processing can be enhanced, mostcomputation work in their approach is offloaded to cloud servers,which may strongly affect response time for tracking. In [13],Blanco-Filgueira et al. proposed a low-power and real-time CNN-based multiple object visual tracking method. However, they only

consider the efficient edge implementation on NVIDIA JetsonTX2 platforms. They did not consider the interactions betweenedge devices and cloud servers. In [10], Xu et al. presented asmart surveillance architecture. By leveraging the advantages ofthe edge computing paradigm, their approach can achieve thegoal of on-site and real-time human object tracking. However,their approach did not adopt CNNs for more accurate tracking.Moreover, they did not consider the collaborations between edgedevices and cloud servers for object tracking. In [12], Zhao etal. introduced an edge computing system for object trackingfor resource-constrained devices. By partitioning computation-intensive tasks such as inference onto IoT devices and edgeservers, the power consumption of IoT devices can be reduced.However, this approach requires the execution of both IoT devicesand edge servers. The communication delay caused by dynamicnetwork environment can easily affect the response time of IoTdevices.

Convolutional Siamese networks have been widely used insimilarity learning. For example, Wang and Tzanetakis [26] uti-lized CNNs in a Siamese architecture to learn audio features forcharacterizing singing style. In [27], Daudt et al. proposed twoSiamese extensions of fully convolutional networks that achievethe best results on two open change detection datasets using bothRGB and multispectral images. In [28], Liu et al. presented asolution to the top-down reidentification problem that uses theSiamese architecture in conjunction with CNNs. Their approachcan achieve one-shot, top-down re-identification by learning un-seen classes of persons in real-time. Due to the promising per-formance in similarity learning, convolutional Siamese networksare becoming popular in object tracking. For example, Bertinettoet al. [19] equipped a basic tracking algorithm with a novelfully convolutional Siamese network trained end-to-end on theILSVRC15 dataset for object detection in videos. Their tracker canoperate at frame-rates and achieve state-of-the-art performance inmultiple benchmarks. Cen and Jung [25] proposed to use fullyconvolutional Siamese fusion networks for object tracking. Intheir approach, they constructed a Siamese network based onVGGNet and performed layer fusion for object tracking to effec-tively capture semantic information. To address the problems ofbackground clutters and scale changes, Chen et al. [29] devised amulti-granularity hierarchical attention Siamese network tracker tofurther enhance the tracking stability without sacrificing real-timespeed. Although all the above approaches based on convolutionalSiamese networks are promising in identify similarities betweenimages, they are not designed for IoT-based applications.

To the best of our knowledge, our collaborative edge-cloudarchitecture is the first attempt that adopts convolutional Siamesenetworks on cloud to correct the tracking errors made by edgedevices, leading to better tracking performance and lower energyconsumption.

3 OUR COLLABORATIVE EDGE-CLOUD ARCHI-TECTURE FOR OBJECT TRACKING

Figure 1 shows an overview of our proposed edge-cloud archi-tecture. The architecture mainly consists of two parts: edge partand cloud part. For each edge device, we create one VirtualMachine (VM)1 on cloud for the rectification of tracking errors.At the beginning of tracking, an edge device reads some input

1. We also call this kind of VMs as cloud servers.

Authorized licensed use limited to: East China Normal University. Downloaded on March 25,2020 at 15:16:04 UTC from IEEE Xplore. Restrictions apply.

Page 3: A Collaborative and Sustainable Edge-Cloud Architecture for … · 2020-03-25 · tracking workloads to cloud servers, edge devices in our approach seek help from cloud servers only

2377-3782 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSUSC.2019.2955317, IEEETransactions on Sustainable Computing

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. X, NO. Y, ZZZZ 3

configuration file to initialize the position (i.e., bounding box) ofa specified target object. This position information will also besent to its corresponding VM for tracking initialization. Note thatthe initially labeled target object will be used as the first groundtruth for the similarity evaluation in both edge devices and cloudservers. During object tracking, our edge tracking module usesKCF algorithm to figure out the target object position for eachframe of captured videos. The obtained bounding box of a trackedobject then will be sent to the edge evaluation module, whichchecks the deviation of the tracked object from its ground truth.To quantify such deviation, our approach adopts PCCs to denotethe similarity between the images of tracked objects cropped fromvideo frames and their ground truths. Note that our edge evaluationmodule uses color histogram features to represent image features.Once the PCC value of current tracking result is less than aspecified threshold, the edge evaluation module will generate asearch region from corresponding frame and send it to its cloudserver for tracking rectification.

Edge Tracker

Tracking Module

Bounding Box

Evaluation Module

SearchRegion

Bounding Box

Rectified Bounding Box

KCF Tracker

Color Histogram Feature

Pearson Correlation Coefficient

Cloud Server

Rectification Module

Fully-Convolutional Siamese Network

Convolutional Neural Network

Pearson Correlation Coefficient

Cloud Tracker

Video

Boun

ding

Box

Frame

.conf

Initial Bounding

Box

Object Feature Maps

Evaluation Module

Fig. 1. An overview of our edge-cloud architecture.

When receiving an edge rectification request, our cloud rec-tification module will search for a more accurate position forthe target object based on the given search region. Such rectifiedinformation in the form of a bounding box will then be sent back tothe edge tracker. Meanwhile, the ground truths of both edge trackerand cloud tracker will be updated by this newly tracked object bycloud. In our approach, the rectification module is implementedbased on the convolutional Siamese network proposed in [19]. Themodule has two collaboration modes, i.e., passive mode and activemode. Besides processing passive rectification requests sent byedge devices, our cloud tracker also performs active rectificationsperiodically. Unlike edge evaluation modules that perform errorevaluation for each video frame, our cloud evaluation modulesperiodically and proactively check tracking errors made by edgedevices. For the cloud evaluation module, we use PCCs to denotethe correlations between cropped images and ground truths, andsave their values in object feature maps. Similar to the passiverectification, if the evaluated PCC result is less than a giventhreshold, a rectification response will be sent to the correspondingedge tracker, and the ground truths of both edge and cloud trackerswill be updated accordingly. The following subsections will detailthe major modules and workflow of our architecture.

3.1 Design of Edge TrackersTo enable quick object tracking as well as self-evaluation, our edgetracker mainly consists of two modules: i) the tracking module

based on KCF algorithm, and ii) the evaluation module thatcalculates PCCs based on the color histogram features of trackedobjects and the ground truth. At the beginning of tracking, the edgetracker needs to figure out the ground truth by reading the labeledbounding box information for a target object in the first framefrom a user-provided configuration file. Such information shouldbe also shared to corresponding cloud tracker for its initialization.By creating a single-scale KCF tracker based on the ground truth,we can conduct the object tracking for the following frames.During the tracking of each frame, we need to evaluate the trackingquality. In our approach, we compare the color histogram featuresof the tracked object and the ground truth, and calculate the PCCto indicate their similarity. Generally, the larger the PCC is, thebetter the tracking quality we can achieve. Along the trackingof consecutive frames, the accumulated errors will become large.Without any rectification mechanism, the target object can easilyget lost. Therefore, during the evaluation if the PCC does notmeet designers’ expectation, proper requests will be sent to cloudservers for some rectification feedback. As aforementioned, wehave two collaboration modes (i.e. active mode and passive mode)for rectifications. Since cloud servers cannot obtain the first-handvideo frames captured by cameras, it cannot be used to triggerthe collaboration based on specific events. Instead, all the activeand passive rectifications are initiated by edge trackers. Note thatevery rectification response made by cloud will trigger the groundtruth update regardless of collaboration modes.

Our approach supports two communication modes (i.e., syn-chronous mode and asynchronous mode) for the collaboration be-tween edge devices and cloud servers. The synchronous commu-nication mode requires edge trackers to wait for rectification feed-back from cloud trackers without doing any work. In this mode,the incoming new frames will be stored in edge device buffers.Therefore, we cannot track the latest captured frames in a real-timemanner. However, in this case the tracking accuracy can be easilyguaranteed. As an alternative, the asynchronous communicationmode does not block the edge devices for rectification generation.Instead, the edge devices and cloud servers can execute in parallel.Consequently, the edge tracker can make prompt response to thelatest capture frames, though the accuracy can be deteriorated forabrupt scenes. Note that when the rectifications made by cloudtrackers come to edge devices in the asynchronous mode, theycannot be used directly, since these rectifications are not for thecurrent investigating frame. In this case, we calculate the positionoffset between the rectification request made by some edge trackerand the rectification response made by its corresponding cloudserver. Our approach then uses this offset to update the position ofcurrently tracked objects.

3.2 Design of Cloud TrackersDue to adequate computing power of cloud servers, the capa-bilities of edge devices can be significantly extended. In ourapproach, by using pre-trained convolutional Siamese networks,the overall tracking accuracy can be significantly improved basedon our proposed cloud rectification and evaluation modules. Notethat the convolutional Siamese networks are pre-trained using theILSVRC15 dataset [19], which may not best fit for other datasets.To further improve tracking performance, proper incrementalretraining for convolutional Siamese networks are required beforethe tracking of objects from different domains.

At the beginning of tracking, the ground truth of each cloudtracker is created based on the knowledge from some given

Authorized licensed use limited to: East China Normal University. Downloaded on March 25,2020 at 15:16:04 UTC from IEEE Xplore. Restrictions apply.

Page 4: A Collaborative and Sustainable Edge-Cloud Architecture for … · 2020-03-25 · tracking workloads to cloud servers, edge devices in our approach seek help from cloud servers only

2377-3782 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSUSC.2019.2955317, IEEETransactions on Sustainable Computing

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. X, NO. Y, ZZZZ 4

configuration file shared by its corresponding edge tracker. Afterthat, cloud trackers will receive search regions as well as boundingboxes for the target object tracking under both passive and activecollaboration modes. For the passive collaboration with edgetrackers, only the rectification module is invoked to obtain a newaccurate position for the target object. For the active collaboration,the cloud tracker will first check the similarity between the groundtruth and the tracked object. If the deviation is larger than a giventhreshold, the rectification module will be invoked to search for anaccurate position for the target object.

3.2.1 Cloud Rectification ModuleTo enable accurate tracking of target objects, we use the con-volutional Siamese network proposed in [19] , which is a fully-convolutional network that only has convolutional layers andpooling layers without any fully-connected layers. By feeding thenetwork with a search region, the convolutional Siamese networkcan construct a score map, where each element denotes thesimilarity between a given ground truth and its mapped counterpartin the search region.

𝑋

𝑌

127×127×3

255×255×3

CNN

CNN

𝑋′

𝑌′

Slide WindowBased Correlation

Calculation

6×6×128

22×22×128

𝑆

17×17×1

Initial Object Feature Maps

Score Map

Search Region Feature Maps

Fig. 2. Architecture of cloud rectification module.

Figure 2 presents the architecture of our convolutional Siamesenetwork-based rectification module. It has two inputs, i.e., a largerimage Y (with a size of 255×255×3) indicating the search region,and a smaller image X (with a size of 127×127×3) indicating theground truth. Based on pre-trained convolution network (denotedby CNN without any fully-connected layers), we can get thefeature maps X ′ for X and Y ′ for Y , respectively. Note that tofacilitate the presentation the two CNN components shown inFigure 2 refer to the same convolution network. Since Y ′ is largerthan X ′, by comparing X ′ with each slide window of Y ′ (with asize of 6× 6× 128), we can get the score map (with a size of17× 17× 1) for the search region Y against the ground truth X .The coordinates of the highest-score element of the score map willthen be mapped back to the search region to denote the trackedobject. Such information can be used for the rectification.

3.2.2 Cloud Evaluation ModuleIn order to alleviate the problem of insufficient computing powerand inaccurate object tracking for edge devices, we introduce thecloud evaluation module to periodically check whether a trackedobject deviates from its corresponding target object. An edgedevice periodically sends the tracking result to its correspondingcloud server for similarity evaluation and position rectification.Similar to the edge evaluation module, our cloud evaluationmodule also uses PCC to reflect tracking errors. If the PCC isequal to or less than the specified threshold, it means that the erroris unacceptable and the tracked object needs to be rectified.

Figure 3 presents the architecture of our cloud evaluationmodule. It is important to note that in our approach the cloud

Result

127×127×3

I

R

127×127×3

CNN

CNN

FC

FC

PCCCalculation

𝑒

𝑅′

𝐼′

Update

1024×1

1024×1

𝐼′′

𝑅′′

Initial Object

6×6×128

6×6×128

Feature Maps

Feature Maps

Feature Vector

Feature Vector

Correlation Coefficient

Fig. 3. Architecture of cloud-based evaluation module.

rectification module and the cloud evaluation module share thesame convolutional network (denoted by CNN in both Figure 2and Figure 3) . Unlike the rectification module that takes thesearch region (with a size of 255×255×3) as an input, the inputResult is the tracked object by corresponding edge tracker withthe same size of ground truth. By calculating feature vectors usingfully connected layers (denoted by FC), we can calculate PCCsbased on obtained feature vectors for the purpose of similaritycomparison.

3.2.3 Retraining of Convolutional Siamese NetworksTo fit different datasets, our approach adopts transfer learning toretrain pre-defined neural networks. In our approach, the initialconvolutional Siamese network is obtained from the pre-trainedmodel presented in [19], and then this model is incrementallyretrained by partial videos of the newly investigated datasets. Inour approach, the retraining of convolutional Siamese networksconsists of three major phases: extraction of training data, re-training of convolutional layers, and retraining of fully-connectedlayers.

Based on the pre-trained model, for each new dataset, ourapproach selects a rich set of positive and negative samplesto retrain the convolutional Siamese network. To enhance thecapabilities of convolutional layers in feature extraction for largerimages, during the retraining of convolutional layers, the size ofthe cropped search regions in our approach is two times largerthan the size of target objects. Furthermore, to fully exploit thepotentials of fully-connected layers, during the retraining of fully-connected layers, the size of the cropped search regions in ourapproach is three times larger than the size of target objects. If thetarget object position is located at the center of a given searchregion, the search region will be treated as a positive sample,otherwise the search region will be treated as a negative sample.

For the convolutional layers, we use the loss function intro-duced in [19]. Formula (1) shows the loss function of our fully-convolutional layers, where D is a score map, y[u] is a true label fora position u∈D in the score map, and l(y[u],v[u]) is the individualloss of each position u. Finally, the loss function is calculated asthe mean of individual losses. Note that the score map size in ourapproach is 17×17. Each element of the score map is calculatedby the convolutional layer, indicating the similarity between theground truth and a corresponding tracking candidate.

L(y,v) =1|D| ∑u∈D

l(y[u],v[u]). (1)

After the convolutional layer retraining, we need to performthe retraining for the fully-connected layer as well. The input

Authorized licensed use limited to: East China Normal University. Downloaded on March 25,2020 at 15:16:04 UTC from IEEE Xplore. Restrictions apply.

Page 5: A Collaborative and Sustainable Edge-Cloud Architecture for … · 2020-03-25 · tracking workloads to cloud servers, edge devices in our approach seek help from cloud servers only

2377-3782 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSUSC.2019.2955317, IEEETransactions on Sustainable Computing

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. X, NO. Y, ZZZZ 5

of the fully-connected layer is feature maps (with a size of6×6×128) obtained from the retrained convolutional layers. Theoutput of the fully-connected layer is a feature vector (with asize of 1024×1). Since the feature vectors are used to investigatethe similarities between the ground truth and tracked objects, wepropose to define the loss function for the fully-connected layerusing PCCs. Formula (2) presents the defined loss function forthe fully-connected layer, where y(i) denotes the output vector ofthe fully-connected layer, y(0) denotes the feature vector of anexemplar, and y indicates the ground-truth label. We calculate thePCC (i.e., py(i)y(0) ) for the pair y(i) and y(0). If the feature vectors ofy(i) and y(0) are the same, y will equal to 1, otherwise y will equalto -1.

L(y,y(i),y(0)) = log(1+ exp(−y× py(i)y(0))). (2)

To prevent the over-fitting on fully-connected layers, weadopt the random dropout method during the retraining of fully-connected layers. In other words, in each round of training, someneurons in a fully-connected layer are randomly invalidated. Theeffect of this process is equivalent to retraining a smaller fully-connected layer. In this way, the retrained fully-connected layercan be considered as a combination of all these small fully-connected layers.

4 IMPLEMENTATION OF OUR COLLABORATIVEARCHITECTURE

Algorithm 1 details the implementation of our edge trackers.In this algorithm, lines 2-5 conduct the initialization of objecttracking. Line 2 fetches the first video frame and saves it inf irstFrame. Base on the initial bounding box information pro-vided by the configuration file, line 3 creates a KCF tracker forobject tracking. Note that the target object is bounded withininitBBox of f irstFrame. Line 4 calculates the color histogramfeature for the target object. Line 5 resets f rameCnt, which isused to trigger the periodical active rectifications by correspondingcloud tracker. Moreover, line 5 initializes sent and BOX , whichare used to save temporary information during asynchronouscollaborations. Lines 6-42 iteratively conduct the object trackingfor each frame until the video stops. Line 8 figures out thebounding box for a new frame, and line 9 crops the search regionsearchRegion based on this bounding box. Line 10 calculates thecolor histogram feature for the sub-figure bounded by bBox inthe current frame. Line 11 obtains the PCC between the groundtruth and the tracked object bounded by bBox. If the PCC valueis larger than the given threshold θ, lines 13-33 will be invoked todetermine whether an active rectification needs to be performed.When the check in line 13 holds, an active rectification will beperformed by lines 14-32 based on collaborations between edgeand cloud trackers. In our implementation, we use the functionSendToCloud to communicate with corresponding cloud tracker,where the third parameter of this function denotes the collabo-ration modes (i.e., 1 denotes passive mode and 0 denotes activemode). For different collaboration modes, we use different func-tions (i.e., the blocking function SyncRecvFromClou() and non-blocking function AsyncRecvFromCloud()) to receive rectificationresults from cloud trackers. In asynchronous mode, if the currentbounding box is inaccurate (indicated by line 18), the edge trackerwill update its ground truth using the rectification data sent byits cloud collaborator (line 19). Note that for the asynchronouscollaboration our approach allows non-blocking execution of edge

trackers. Therefore, when receiving valid rectification informationfrom cloud trackers, the edge tracker will update its currentbounding box and ground truth accordingly (lines 28-30). If thePCC value is no larger than the given threshold θ and there isno unfinished asynchronous rectification, lines 36-39 will send thepassive rectification request to its corresponding cloud tracker andupdate the ground truth. In this case, our approach only adopts thesynchronous collaboration.

Algorithm 1: Implementation of Edge Trackers

Input: i) video, a video for tracking;ii) initBBox, initial bounding box of object;iii) θ, threshold for cloud evaluation;iv) period, cloud evaluation cycle;v) mode, edge-cloud synchronization mode;

Edge(video, initBBox, θ, period, mode) beginReadFrame(video, f irstFrame);kc f = KCF( f irstFrame, initBBox);ob jCHF = ColorHistFeature( f irstFrame, initBBox);f rameCnt = 0, sent = f alse, BOX = NULL;while ReadFrame(video, f rame) do

f rameCnt++;bBox = kc f .Update( f rame);sReg = SearchRegion( f rame, bBox);resultCHF = ColorHistFeature( f rame, bBox);evalE = PCC(resultCHF , ob jCHF);if evalE > θ then

if f rameCnt % period == 0 thenif mode == sync then

SendToCloud(sReg, bBox, 0);f rameCnt = 0;bBox = SyncRecvFromCloud();if bBox ! = NULL then

kc f = KCF( f rame, bBox);end

elseif sent == f alse then

SendToCloud(sReg, bBox, 0);sent = true, BOX = bBox;

endb = AsyncRecvFromCloud();if b ! = NULL then

b′ = AdjustBox(bBox, b-BOX);kc f = KCF( f rame, b′);frameCnt = 0, sent = f alse;

endend

endelse

if !sent thenSendToCloud(sReg, bBox, 1);bBox = SyncRecvFromCloud();kc f = KCF( f rame, bBox);f rameCnt = 0;

endend

endend

Algorithm 2 details the implementation of cloud trackers.

Authorized licensed use limited to: East China Normal University. Downloaded on March 25,2020 at 15:16:04 UTC from IEEE Xplore. Restrictions apply.

Page 6: A Collaborative and Sustainable Edge-Cloud Architecture for … · 2020-03-25 · tracking workloads to cloud servers, edge devices in our approach seek help from cloud servers only

2377-3782 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSUSC.2019.2955317, IEEETransactions on Sustainable Computing

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. X, NO. Y, ZZZZ 6

Based on the first video frame and initial bounding box for sometarget object, line 2 figures out the feature maps, and line 3achieves the corresponding feature vector of the target object.Lines 4-17 iteratively deal with the incoming rectification requestsfrom edge devices. In line 5, cloud trackers construct feature mapsbased on the search regions and bounding boxes sent by their edgecollaborators. The flag r f lag indicates the collaboration modes. Ifr f lag equals 0, the cloud tracker will conduct the active rectifi-cation. In this case, lines 7-8 invoke the cloud evaluation moduleto check the similarity between ground truths and tracked objectscaptured by corresponding edge trackers. If a tracked object hasa large deviation from its ground truth or the collaboration modeis passive, lines 11-13 will search for a more accurate position forthe target object based on convolutional Siamese networks, sendthe rectification results to its edge counterpart, and update theground truth in terms of feature vectors accordingly. Otherwise,line 15 will just respond the querying edge counterpart withoutany rectification information.

Algorithm 2: Implementation of Cloud Trackers

Input: i) f irstFrame, the first video frame;ii) initBBox, initial bounding box of object;iii) θ, threshold for cloud evaluation;

Cloud( f irstFrame, initBBox, θ) beginob jFMs = CNNFMaps( f irstFrame, initBBox);ob jFV = FullyConnect(ob jFMs);while RecvfromEdges(sReg, bBox, r f lag) do

resultFMs = CNNFMaps(sReg, bBox);if r f lag == 0 then

resultFV = FullyConnect(resultFMs);evalC = PCC(resultFV , ob jFV );

endif evalC <= θ || r f lag == 1 then

recBBox = SiameseNet(ob jFV , sReg);SendToEdge(recBBox);ob jFV = resultFV ;

elseSendToEdge(NULL);

endend

end

To illustrate the necessity of our approach, Figure 4 comparesthe non-collaborative tracking method with our collaborative ap-proach using asynchronous rectifications. From Figure 4(b), wecan find that at the tth frame the KCF tracker cannot accuratelytrack the boat as initialized in Figure 4(a). Due to the limitedcapability of KCF trackers and accumulated tracking errors, fromFigure 4(c) we can observe that the deviation between the targetobject and its tracking result becomes larger. However, if we resortto the asynchronous rectification provided by our edge-cloudarchitecture at this frame, our approach can send a rectificationrequest to its cloud tracker. After k-frame tracking using the KCFtracker, the rectification information sent back by the cloud trackercan be used to improve tracking accuracy. As an example shown inFigure 4(d), the tracking results indicated by the green boundingbox can accurately capture the target object.

5 EXPERIMENTS

To evaluate the effectiveness of our approach, we constructedan experimental edge-cloud platform. In the experiment, edgedevices are developed on top of NVIDIA Jetson TX2 boards,which are equipped with four 2.0GHz ARM Cortex-A57 coresand 4GB memory. For each edge device, we deployed one single-scale KCF tracker implemented using the opencv-contrib-pythonextension package (version 3.4.1). Note that the tracker does notuse the on-board GPU of Jetson platforms. To enable collabo-rative tracking, each edge device is connected to one VM thatis equipped with eight 4.0GHz cores, 16GB memory and oneNVIDIA GeForce GT-640 GPU. For each VM, we deployed oneconvoltional Siamese network-based rectification module and onefully-connected layer-based evaluation module implemented usingthe TensorFlow-1.4 framework. Our experiment was conductedwithin a WIFI environment, where the average network bandwidthbetween VMs and edge devices is 14.2 Mbits per second (mea-sured using the tool ipert). All the edge devices and cloud serversrun the Ubuntu operating system (version 16.04).

Let Bg be the bounding box of some ground truth, and Bo bethe target object bounding box generated by a tracker. We usedIntersection Over Union (IOU) to denote the accuracy of objecttracking, where

IOU =Area(Bg)∩Area(Bo)

Area(Bg)∪Area(Bo).

Meanwhile, we used the number of Frames processed Per Second(FPS) to indicate the response time of object tracking. We investi-gated two datasets in the experiment, i.e., DAC 2018 competitiondataset [32] and VOT 2016 competition dataset [33]. Note thatDAC dataset is mainly captured by unmanned aerial vehicles,while VOT 2016 dataset consists of a large number of videoscollected from various websites. Since each dataset consists ofa large number of videos, to facilitate the experiment we onlyconsider ten videos randomly selected from each dataset.

5.1 Results of Tracking Performance Evaluation

To evaluate our proposed approach in terms of tracking accuracy,we investigated four tracking strategies as follows: i) edge-onlyindicates the one that adopts edge-based KCF trackers for trackingwithout any cloud-based rectification; ii) cloud-only indicates thatcloud servers make tracking decisions for each video frame;iii) sync denotes the synchronous collaboration; and iv) asyncdenotes the asynchronous collaboration. Note that the implemen-tation of KCF trackers (i.e., edge-only) are extracted from theOpenCV library (version 2.4.13.4) [34], and cloud-only is basedon a TensorFlow version of fully convolutional Siamese networks(i.e., SiamFC-TensorFlow) implemented in [35]. To enable faircomparison, in cloud-only we modified and deployed SiamFC-TensorFlow on cloud servers rather than edge devices, whichenables communication between edge devices and cloud services.

Figure 5 compares the tracking results of two frames randomlyexcerpted from the DAC dataset using these four strategies. Forboth sync and async strategies, we set the evaluation cycle (i.e.,the number of intermediate frames in between two rectificationrequests) to 8 frames. From these two images, we can observethat cloud-only can achieve the best accuracy in these two figures.This is because the convultional Siamese network-based trackersdeployed on cloud servers are more accurate than the KCF-basedtrackers deployed on edge devices. Meanwhile, we can find that

Authorized licensed use limited to: East China Normal University. Downloaded on March 25,2020 at 15:16:04 UTC from IEEE Xplore. Restrictions apply.

Page 7: A Collaborative and Sustainable Edge-Cloud Architecture for … · 2020-03-25 · tracking workloads to cloud servers, edge devices in our approach seek help from cloud servers only

2377-3782 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSUSC.2019.2955317, IEEETransactions on Sustainable Computing

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. X, NO. Y, ZZZZ 7

(a) Initial frame (b) The tth frame w/o rectification

(c) The (t + k)th frame w/o rectification (d) The (t + k)th frame w/ rectification

Fig. 4. A comparison of tracking results without and with cloud-based rectification. Green box = tracking result, red box = ground truth.

our collaborative tracking methods (sync and async) are moreaccurate than the edge-only approach. Here, the small trackingerrors of both methods are mainly due to the rectifications madeby cloud servers. For the whole experiment, we set the thresholdof Pearson correlation coefficient to 0.5. Similar to [19], for eachsearch image (i.e., video frame) we feed convolutional Siamesenetworks on cloud with its three resized versions, where scalefactors are 0.957, 1.0 and 1.045, respectively. We do not considersuch resized versions for KCF trackers on edge devices.

Since edge devices periodically send rectification requests tocloud servers, the cloud evaluation cycle plays an important rolein both tracking speed and accuracy. Figure 6 presents the impactsof evaluation cycles for both DAC and VOT datasets. Note thatthe statistics presented in the two subfigures denote the averageIOU and FPS values for the ten selected videos from each dataset,respectively. By comparing both sync and async methods, fromFigure 6(a) we can find that the DAC dataset can achieve betterIOU than the VOT dataset. This is mainly because the trackingobjects in DAC dataset have fewer occlusions and shape changes.We can also observe that for both datasets sync can achieve betterIOU than async, since our synchronous collaboration methodcan guarantee more accurate tracking for each frame than itsasynchronous counterpart. When the evaluation cycles becomelarger, the IOU of trackers becomes worse due to the accumulatedtracking errors and abrupt shape and speed changes of targetobjects. As an example for DAC dataset, when the evaluation cycleis increased from 10 frames to 11 frames, the IOU of sync reducesdrastically. Although increasing the lengths of evaluation cyclesmay deteriorate the tracking IOU, on the other hand the trackingspeed can be significantly improved as shown in Figure 6(b). Wecan observe that, when the evaluation cycle increases from 1 frameto 7 frames, the FPS of sync can achieve an improvement of morethan twice.

(a) The 160th frame

(b) The 240th frame

Fig. 5. Tracking results using different tracking strategies. We use thered, blue, green and yellow colors to denote the edge-only, sync, asyncand cloud-only, respectively.

To check the accuracy of edge trackers under collaborationswith cloud trackers, Figure 7 presents the tracking IOU of oneDAC video and one VOT video from the ten selected videos,respectively. We set the evaluation cycle to 7 frames and comparedthe tracking accuracy using the four tracking strategies. The x-axesof subfigures indicate the indices of consecutive video frames.We can find that for both cases edge-only obtains the worst

Authorized licensed use limited to: East China Normal University. Downloaded on March 25,2020 at 15:16:04 UTC from IEEE Xplore. Restrictions apply.

Page 8: A Collaborative and Sustainable Edge-Cloud Architecture for … · 2020-03-25 · tracking workloads to cloud servers, edge devices in our approach seek help from cloud servers only

2377-3782 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSUSC.2019.2955317, IEEETransactions on Sustainable Computing

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. X, NO. Y, ZZZZ 8

2 4 6 8 10 12Evaluation Cycles

0.1

0.2

0.3

0.4

0.5

0.6

0.7

IOU

Async_DACSync_DACAsync_VOTSync_VOT

(a) Impacts on tracking accuracy

2 4 6 8 10 12Evaluation Cycles

05

101520253035

FPS

Async_DACSync_DACAsync_VOTSync_VOT

(b) Impacts on tracking speed

Fig. 6. Impacts of evaluation cycles on tracking accuracy and speed.

performance, while cloud-only achieves the best performance. Forexample, for the DAC video the KCF tracker loses its target objectwithin 200 frames. However, cloud-only still can obtain high IOUafter the tracking of more than 500 frames.

0 100 200 300 400 500# of Frames

0.0

0.2

0.4

0.6

0.8

1.0

IOU

Edge-only Cloud-onlySync Async

(a) Tracking results of a DAC video

0 100 200 300 400 500# of Frames

0.0

0.2

0.4

0.6

0.8

1.0

IOU

Edge-only Cloud-onlySync Async

(b) Tracking results of a VOT video

Fig. 7. Impacts of rectification on tracking accuracy.

Although syc and async do not have better IOU than cloud-only, they can still obtain high IOU with the satisfying processingtime. Table 1 shows the tracking time (in the form of x/y) for500 frames of the two videos with different tracking strategies,where x denotes the overall tracking time (include y) and y denotes

the network communication cost. Note that the reuslts in Table 1include the commucation time between VMs and edge devices.From this table we can find that cloud-only is not suitable for real-time tracking, since its processing time is much higher than theother three strategies. When response time has a higher prioritythan accuracy, we suggest to use async for object tracking. Notethat in the case of VOT, sync appears more stable than async,since there exist more abrupt shape and speed changes within theinvestigated VOT video.

TABLE 1Tracking time (in seconds) for the strategies in Figure 7

Strategies Edge-only Async Sync Cloud-onlyTrack. Time for DAC 14.3/0 46.7/- 62.9/17.8 406.5/115.2Track. Time for VOT 14.6/0 46.6/- 60.6/15.5 393.7/97.3

5.2 Results of Energy Evaluation for Edge DevicesDue to the increasing popularity of battery-driven edge devices,it is required that object tracking applications should deal withcomputation in an energy-efficient manner. Figure 8 shows theimpacts of evaluation cycles on both energy consumption andtracking time for the selected DAC and VOT videos.

2 4 6 8 10 12Evaluation Cycles

51015202530354045

Edge

Ene

rgy

(KJ)

Async_DACSync_DACAsync_VOTSync_VOT

(a) Energy consumption

2 4 6 8 10 12Evaluation Cycles

010203040506070

Tim

e (m

in.)

Async_DACSync_DACAsync_VOTSync_VOT

(b) Tracking time

Fig. 8. Energy consumption and tracking time of edge devices.

We can find that asynchronous rectification method can sig-nificantly reduce the overall energy consumption and trackingtime compared to the synchronous rectification method. This isbecause the asynchronous approach allows the parallel executionof both edge devices and cloud servers. The shortened trackingtime will in turn reduce the energy consumption accordingly.Moreover, we can observe that evaluation cycles strongly affectthe energy consumption. Since larger evaluation cycles requireless rectification requests, the smaller network transmission timewill lead to lower overall energy consumption. In other words,


when battery power is low, properly tuning the evaluation cycle size enables more frames to be processed, though the tradeoff between energy consumption and tracking accuracy needs to be made in advance.
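To make the synchronous/asynchronous rectification modes compared above concrete, here is a minimal sketch; all names (edge_tracker_update, cloud_rectify, CYCLE) are hypothetical stand-ins for the actual components, and Python threading merely stands in for the real edge-cloud messaging:

```python
import queue
import threading

CYCLE = 6  # evaluation cycle: consult the cloud every CYCLE frames (tunable)

def edge_tracker_update(frame, box):
    """Hypothetical fast local tracker step (e.g., a KCF update)."""
    return box

def cloud_rectify(frame, box):
    """Hypothetical cloud-side check: the Siamese network relocates the
    exemplar in the frame and returns a corrected bounding box."""
    return box

def track(frames, box, mode="async"):
    corrections = queue.Queue()
    for i, frame in enumerate(frames):
        box = edge_tracker_update(frame, box)
        if i % CYCLE == 0:
            if mode == "sync":
                # Synchronous: block until the cloud answers (accurate, slow).
                box = cloud_rectify(frame, box)
            else:
                # Asynchronous: fire the request and keep tracking;
                # edge and cloud now run in parallel.
                threading.Thread(
                    target=lambda f=frame, b=box: corrections.put(cloud_rectify(f, b)),
                    daemon=True,
                ).start()
        # Apply any cloud correction that has arrived in the meantime.
        while mode == "async" and not corrections.empty():
            box = corrections.get()
    return box
```

In the sync branch the edge tracker idles during every round trip, whereas in the async branch it keeps consuming frames while the correction is in flight, which is why Figure 8 shows shorter tracking times and lower energy for async.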

6 CONCLUSIONS

With the prosperity of IoT-based visual applications, ensuring real-time object tracking is becoming a major bottleneck. To address this problem, we presented a novel edge-cloud architecture that enables real-time, high-accuracy object tracking based on the collaboration between edge devices and cloud servers. By incorporating convolutional Siamese networks into our proposed rectification modules deployed on cloud servers, edge devices can resort to the cloud for more accurate object positions as needed. The experimental results on two well-known benchmarks demonstrate that our approach not only improves the overall object tracking performance but also drastically reduces the energy consumption of edge devices.

ACKNOWLEDGMENTS

This work was partially supported by grants from the National Key Research and Development Program of China (No. 2018YFB2101300), the Natural Science Foundation of China (No. 61872147), and National Science Foundation grants CCF-1900904, CCF-1619243, and CCF-1537085 (CAREER).



Haifeng Gu (S'18) received the B.E. degree from the Department of Computer Science and Technology, Sichuan Normal University, Chengdu, China, in 2013. He is currently working toward the Ph.D. degree in the Department of Embedded Software and System, East China Normal University, Shanghai, China. His research interests include edge computing, hardware/software co-validation, symbolic execution, statistical model checking, and software testing.

Zishuai Ge received the B.E. degree from the Department of Biomedical Engineering, China Medical University, Shenyang, China, in 2016. He is currently working toward the M.E. degree in the Department of Embedded Software and System, East China Normal University, Shanghai, China. His research interests include edge computing, computer vision, and embedded systems.

E Cao received the B.E. degree from the Life Information Science and Instrument Engineering Institute, Hangzhou Dianzi University, Hangzhou, China, in 2018. He is currently a master's student in the Institute of Computer Science and Software Engineering, East China Normal University. His research interests are in the areas of cloud computing, parallel and distributed systems, design automation of embedded systems, and software engineering.

Mingsong Chen (M'11–SM'17) received the B.S. and M.E. degrees from the Department of Computer Science and Technology, Nanjing University, Nanjing, China, in 2003 and 2006, respectively, and the Ph.D. degree in Computer Engineering from the University of Florida, Gainesville, in 2010. He is currently a Professor with the Computer Science and Software Engineering Institute at East China Normal University. His research interests are in the areas of cloud computing, design automation of cyber-physical systems, parallel and distributed systems, and formal verification techniques. He is an Associate Editor of IET Computers & Digital Techniques and the Journal of Circuits, Systems and Computers.

Tongquan Wei (S'06–M'11) received his Ph.D. degree in Electrical Engineering from Michigan Technological University in 2009. He is currently an Associate Professor in the Department of Computer Science and Technology at East China Normal University. His research interests are in the areas of green and reliable embedded computing, cyber-physical systems, parallel and distributed systems, and cloud computing. He has served as a Regional Editor for the Journal of Circuits, Systems, and Computers since 2012. He has also served as a Guest Editor for several special sections of IEEE TII and ACM TECS.

Xin Fu (S'05–M'10) received the Ph.D. degree in Computer Engineering from the University of Florida, Gainesville, in 2009. She was an NSF Computing Innovation Fellow with the Computer Science Department, the University of Illinois at Urbana-Champaign, Urbana, from 2009 to 2010. From 2010 to 2014, she was an Assistant Professor at the Department of Electrical Engineering and Computer Science, the University of Kansas, Lawrence. Currently, she is an Associate Professor at the Electrical and Computer Engineering Department, the University of Houston, Houston. Her research interests include computer architecture, high-performance computing, hardware reliability and variability, energy-efficient computing, and mobile computing. Dr. Fu is a recipient of the 2014 NSF Faculty Early CAREER Award and the 2012 Kansas NSF EPSCoR First Award.

Shiyan Hu (SM'10) received his Ph.D. in Computer Engineering from Texas A&M University in 2008. He is Professor and Chair in Cyber-Physical Systems at the University of Essex. He was an Associate Professor and Director of the Center for Cyber-Physical Systems at Michigan Tech and a Visiting Associate Professor at Stanford University. His research interests include Cyber-Physical Systems (CPS), CPS Security, Smart Energy CPS, Data Analytics, and Computer-Aided Design of VLSI Circuits, where he has published more than 100 refereed papers. He is an ACM Distinguished Speaker, an IEEE Systems Council Distinguished Lecturer, an IEEE Computer Society Distinguished Visitor, and a recipient of the 2017 IEEE Computer Society TCSC Middle Career Researcher Award, the 2014 National Science Foundation (NSF) CAREER Award, and the 2009 ACM SIGDA Richard Newton DAC Scholarship. His publications have received several distinctions, including the 2018 IEEE Systems Journal Best Paper Award, the 2017 Keynote Paper in IEEE Transactions on Computer-Aided Design, and the Front Cover of IEEE Transactions on Nanobioscience in March 2014. He is the Chair of the IEEE Technical Committee on Cyber-Physical Systems and the Editor-in-Chief of IET Cyber-Physical Systems: Theory & Applications. He serves as an Associate Editor for IEEE Transactions on Computer-Aided Design, IEEE Transactions on Industrial Informatics, IEEE Transactions on Circuits and Systems, ACM Transactions on Design Automation of Electronic Systems, and ACM Transactions on Cyber-Physical Systems. He has served as a Guest Editor for eight IEEE/ACM journals, including Proceedings of the IEEE and IEEE Transactions on Computers, and has held chair positions in numerous IEEE/ACM conferences. He is a Fellow of the IET.
