
Proximal and Social-aware Device-to-Device Communication via Audio Detection on Cloud

Jakob Mass ([email protected])
Satish Narayana Srirama ([email protected])
Huber Flores ([email protected])
Chii Chang ([email protected])

Institute of Computer Science, Mobile & Cloud Lab
University of Tartu
J. Liivi 2, Tartu, Estonia

ABSTRACT
Device-to-Device (D2D) communication is a potential strategy to release the mobile network from unnecessary data transfer, accelerate the responsiveness of end-to-end apps, and decentralize the provisioning of traditional services. D2D coordination is a critical challenge, which cannot be overcome without the explicit intervention of the user, as D2D communication represents a threat to the user's privacy. However, social attributes can be leveraged to equip the devices with trusted mechanisms that can automate D2D communication. In this paper, we design and build a mobile cloud system that relies on audio data obtained from the user's environment to determine whether a set of devices are located in proximity. The audio analysis is performed on the cloud using classical machine learning principles, and the cloud instance (server) also informs the devices about the coordination plan to establish D2D communication. The framework is evaluated using a smartphone app for sharing files, and the evaluation shows that the approach is feasible in practice.

Categories and Subject Descriptors
C.1.3 [Computer Systems Organization]: Other Architecture Styles—Cellular architecture (e.g., mobile); C.2.4 [Computer-Communication Networks]: Distributed Systems—Client/Server

Keywords
D2D, Mobile cloud, Audio analysis, Community sensing, Clustering, Bluetooth

1. INTRODUCTION

In the last five years, the number of smartphones and mobile apps has risen sharply.


A growing number of people are constantly connected to the Internet through their mobiles and have access to a plethora of different services and software. Service provisioning models are being adapted to fit mobility patterns, which allow the user to maintain a continuous presence in the network, e.g., WhatsApp, LINE, Dropbox, etc. These circumstances are resulting in collaborative environments, which allow users to access, create and share information with little effort.

However, as a result of this surge in ubiquitous services, mobile networks are facing an increased amount of data transfer, leading to network congestion, which directly impacts user experience. This creates both challenges and opportunities in minimizing data transfer while still allowing users to participate actively, without compromising the interactivity and responsiveness of their smartphone apps.

Fortunately, mobility introduces the proximity attribute into mobile systems, which can be exploited in opportunistic ways to release the network from excessive data transfer [10]. Smartphones are equipped with mechanisms that enable them to establish device-to-device (D2D) communication, e.g., Bluetooth and Wi-Fi Direct. D2D communication is preferable when end-to-end points need to exchange information and are in close proximity. D2D communication accelerates the responsiveness of apps and reduces unnecessary data transfer over the mobile networks. Moreover, D2D communication can improve service provisioning and foster new ways to decentralize and share information, e.g., Mobile Social Network in Proximity (MSNP) [4].

Establishing D2D communication is a complex task, as it requires explicit intervention of the user. D2D communication is a potential threat to the user's privacy, which means that the user must be aware of with whom he/she is sharing information and what information is being transferred back and forth. As a result, D2D mechanisms provide authentication methods, e.g., passwords, PIN codes, etc. Since these mechanisms introduce extra effort for the user, D2D communication is not very popular. Consequently, mobile users prefer to rely on the cellular network for exchanging data.

However, social attributes can be leveraged to equip the devices with trusted mechanisms that can automate D2D communication [13]. For instance, a mobile can continuously monitor the devices with which it most frequently shares temporal locations, social ties, or friendships.


However, given the constrained capabilities of smartphones in terms of battery life and processing, these kinds of strategies become undesirable for the mobile. On the other hand, mobile devices can rely on the cloud to delegate tasks that require resource-intensive processing [6].

This paper proposes a mobile cloud strategy that enables smartphones in proximity to establish automatic D2D communication. We design and build a framework that uses audio data collected by integrated microphones as a social attribute, e.g., voice, obtained from the user's environment to determine whether devices are likely to be in the same context, e.g., attendees of a meeting, conference, or lecture. The audio analysis happens in the cloud using machine learning techniques. In this process, mobile devices capture timestamped audio that is sent to the cloud for classification as a sequence of amplitude levels. Once devices in proximity are detected, the cloud sends back to the devices a D2D communication plan, such that the devices can establish communication. The evaluation of our framework shows that the approach is feasible in practice.

The rest of the paper is organized as follows. Section 2 describes the complete approach. Section 3 discusses the client side of the framework, along with the details of establishing D2D communication via Bluetooth. Section 4 discusses the server side of the system, along with a detailed discussion of the audio analysis via clustering algorithms. Section 5 provides the evaluation of the framework. Section 6 presents the state of the art and related work. Finally, Section 7 concludes the paper with future directions.

2. APPROACH

The platform is built following a client-server model. Each client is provided with a service, which periodically creates a time series of audio amplitude data of a predetermined length, utilizing the integrated microphone of the device. In addition, all clients begin and end the capture of audio in synchronization, meaning that the time series recordings are started and stopped on every device at the same time. The sequence of amplitude levels is regularly transmitted to a server, at specific intervals. Along with the time series, other details such as the device's Bluetooth MAC address are also provided.

When the server receives a set of clients' data, it runs a clustering algorithm on the time series in order to group them based on audio similarity. The server then provides each member of a specific group with the information needed to create a Bluetooth network with the other members in that same group.

After receiving this information, the clients in each group create Bluetooth connections with other members of the same group, with no input from the mobile user. Once the wireless network has been established in the group, users can share files and text messages with other users within the group.

The client side of the platform is implemented as an Android application. Communication with the server is established using either Wi-Fi or 3G/4G connections, and messages to the server are sent as HTTP requests. The server side is implemented using Java Servlets technology; the server itself runs on the open-source Apache Tomcat 7 web server and is deployed on the Amazon cloud. The data analysis on the server uses the Java-ML machine learning library, which includes an improved version of the Dynamic Time Warping distance measure, FastDTW [18]. The details are provided in the following sections.

3. THE CLIENT

The client continuously records a series of maximum heard amplitudes and transmits these to a server using HTTP POST requests. For each POST request sent, the client receives information about the group that the client belongs to.

3.1 Application Startup and Initialization

As the client Android application is started, the software first runs a few checks to determine whether the device is configured properly for running this application. This includes determining whether an Internet connection is available and whether the Bluetooth radio is enabled.

After this, the difference between the device's internal clock and a remote Network Time Protocol (NTP) time server clock is estimated. The same NTP server is used by every client in the platform. This allows different devices to reference a common clock and use it to start recordings in unison. The difference between the device's local time and the server's time is stored, and the NTP time server is not used during the rest of the client software's lifetime.
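A minimal sketch of this one-time offset estimation is given below. The NTP host name and the use of the Apache Commons Net NTP client are assumptions made for illustration; the paper only states that a common NTP server is queried once at startup.

import java.net.InetAddress;

import org.apache.commons.net.ntp.NTPUDPClient;
import org.apache.commons.net.ntp.TimeInfo;

/** Estimates the offset between the local clock and the shared NTP server. */
public final class ClockSync {
    public static long estimateOffsetMillis(String ntpHost) throws Exception {
        NTPUDPClient client = new NTPUDPClient();
        client.setDefaultTimeout(3000);                 // do not block forever on a bad network
        try {
            TimeInfo info = client.getTime(InetAddress.getByName(ntpHost));
            info.computeDetails();                      // fills in offset and delay estimates
            Long offset = info.getOffset();             // adjustment needed to match the NTP clock, in ms
            return offset == null ? 0L : offset;
        } finally {
            client.close();
        }
    }

    /** Common clock used by every client to start recordings in unison. */
    public static long commonTimeMillis(long storedOffset) {
        return System.currentTimeMillis() + storedOffset;
    }
}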

At this point, the application can enter its main state: producing sequences of audio data and transmitting them to the platform's cloud service.

3.2 Sequence Production and Transmission

The client periodically transmits data to a server for processing. This data is formatted as a JSON string, in which the following information is included:

• the client device’s Bluetooth MAC address;

• a user-created string depicting the user’s (nick)name;

• a time series of successively recorded amplitude values;

• an identifier code of the Bluetooth group the device most recently belonged to;

• a timestamp depicting the time at which the recording of the time series began.

The creation of the amplitude time series is explained in more detail in the next subsection. Once the time series creation is complete, each JSON object is transmitted immediately to the server using an HTTP POST request.
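For illustration, such a request payload could look as follows. The key names are hypothetical (the paper lists the fields but not their JSON keys), and the amplitude array is shortened; a real series holds 50 values (see 3.3.2).

{
  "mac": "D0:51:62:93:E8:CE",
  "nickname": "Steven",
  "amplitudes": [121, 119, 542, 873, 240],
  "groupid": "14cef550-2526-11e4-8c21-0800200c9a66",
  "timestamp": 1416900000000
}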

3.3 Using Integrated Microphones

3.3.1 Audio Features and Time Series

Audio representations are most often categorized into two groups: a) time domain characterizations, which use a time-amplitude representation, or b) frequency domain characterizations, which use a frequency-magnitude representation. In the time-amplitude representation, which is used by the framework, a signal is depicted as amplitude varying in time.

In the time domain, features such as average energy, which indicates the loudness of the signal; volume distribution, which is the variation of a signal's energy level; and silence ratio, which describes how large a part of the signal is silent, can be extracted.


In the proposed approach, a time series is created from time-amplitude samples gathered from the physical device's microphone. A time series is a data structure made up of a sequence of values, usually depicting sequential measurements over time, often recorded at equal intervals. Time series are popular in applications such as stock market analysis, economic and sales forecasting, observation of natural phenomena, etc. [11].

3.3.2 Microphones in Android

The Android platform allows applications to use the microphone via the MediaRecorder API. MediaRecorder allows specifying the audio encoder, output format and audio source used (often, phones have two microphones: one in the front, used for calls, and one in the back, for video capture).

The work behind the creation of a single sequence is as follows. The getMaxAmplitude() method of MediaRecorder returns the maximum amplitude heard since the last call to the method. This method is periodically called in a finite loop. A set interval determines the time between method calls, and the values are stored into an array during this loop. The resulting time series contains numeric data about the amplitudes heard during the loop; no information about audio frequencies is stored. The process of recording is reflected to the user in the form of a progress bar.

Our approach uses the following constants when creating the audio time series. The number of samples gathered per series is 50. The sampling interval is 140 ms. Thus, one sequence represents a 50 × 140 ms = 7000 ms time period. These numbers are reasonable, as in normal human speech, acoustic transformations (syllable changes, letter expressions) happen within 100-200 ms [17] and sentences in speech generally last a few seconds.
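The sampling loop can be sketched as follows. This is only a minimal illustration of the MediaRecorder-based recording described above: the output format, encoder and the /dev/null output file are assumptions, and error handling as well as the progress bar update are omitted.

import android.media.MediaRecorder;

/** Records one amplitude time series: 50 samples taken every 140 ms (7 s total). */
public final class AmplitudeSeriesRecorder {
    private static final int SAMPLES = 50;
    private static final long INTERVAL_MS = 140;

    public int[] recordSeries() throws Exception {
        MediaRecorder recorder = new MediaRecorder();
        recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
        recorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
        recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
        recorder.setOutputFile("/dev/null");   // the audio itself is discarded, only amplitudes are kept
        recorder.prepare();
        recorder.start();

        int[] series = new int[SAMPLES];
        recorder.getMaxAmplitude();            // first call resets the running maximum
        for (int i = 0; i < SAMPLES; i++) {
            Thread.sleep(INTERVAL_MS);
            // maximum absolute amplitude heard since the previous call
            series[i] = recorder.getMaxAmplitude();
        }
        recorder.stop();
        recorder.release();
        return series;
    }
}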

3.4 Handling Server Instructions

The client receives a response for each POST request it sent to the server. The server determines which group of devices the client belongs to, if any. The response, also a JSON object, contains the nicknames of other devices belonging to the same group and the MAC address of the device to which the client should connect. Additionally, the MAC addresses of the devices the client should accept incoming connections from are included. An example JSON object is provided in Figure 1.

{"acceptfrom" : "D0:51:62:93:E8:CE","connectto" : "CC:FA:00:16:2B:9A","groupid": "14cef550-2526-11e4-8c21-0800200c9a66""group" : ["Steven","Johnny", "Mike", "Andy"]}

Figure 1: An example of an instructions JSON object

The establishment of the social network over Bluetooth is further detailed in Section 3.5. As each response contains a list of devices, the client can keep the list of group members up to date and display this information to the user dynamically. If a user leaves the group, his device will stop appearing in the member lists of others still in the group.

3.5 Social-aware D2D via Bluetooth

Once the group to which the client belongs is identified, the social network can be established among the devices over a Bluetooth network.

3.5.1 Bluetooth in Android

To use Bluetooth programmatically, the Android API provides RFCOMM (Radio Frequency Communication) sockets. RFCOMM is the most common socket type for Bluetooth [8]. To create a connection between two devices, one side must take the role of a server and create a listening socket, a BluetoothServerSocket object. The other side, the client, needs to create a BluetoothSocket. The BluetoothServerSocket listens for connections and, once a client connects successfully, it returns a BluetoothSocket object. The BluetoothServerSocket can listen for connections in two ways: using an insecure RFCOMM socket or a secure RFCOMM socket. The insecure method requires no user input (such as PIN entry) to create the actual connection; however, it can be vulnerable to man-in-the-middle attacks [7].

After a connection is established, IO streams can be used to transmit data via the usual Java conventions, by calling getInputStream() and getOutputStream().
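A minimal sketch of the two roles is shown below. The service name string is illustrative, and the way the UUID is derived in the platform is explained in Section 3.5.2.

import android.bluetooth.BluetoothAdapter;
import android.bluetooth.BluetoothDevice;
import android.bluetooth.BluetoothServerSocket;
import android.bluetooth.BluetoothSocket;
import java.util.UUID;

/** Sketch of the two sides of an insecure RFCOMM link. */
public final class RfcommLink {
    private static final String SERVICE_NAME = "d2d-audio-group"; // illustrative

    /** Server role: listen and accept one incoming connection. */
    public static BluetoothSocket listen(BluetoothAdapter adapter, UUID uuid) throws Exception {
        BluetoothServerSocket serverSocket =
                adapter.listenUsingInsecureRfcommWithServiceRecord(SERVICE_NAME, uuid);
        BluetoothSocket socket = serverSocket.accept(); // blocks until a peer connects
        serverSocket.close();                           // one connection per listening socket in this sketch
        return socket;
    }

    /** Client role: connect to the peer whose MAC address the server provided. */
    public static BluetoothSocket connect(BluetoothAdapter adapter, String mac, UUID uuid) throws Exception {
        BluetoothDevice peer = adapter.getRemoteDevice(mac);
        BluetoothSocket socket = peer.createInsecureRfcommSocketToServiceRecord(uuid);
        socket.connect();                               // blocks until connected or throws
        return socket;
    }
}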

3.5.2 Topology / Structure

In the Bluetooth standard, a single device (acting as master) can be directly linked to up to 7 other devices (slaves). This structure is called a piconet, and to overcome the limit of linking together up to 8 devices, various structures and methods for combining piconets exist. For this approach, we have chosen the Ring of Masters (ROM) structure [12]. ROM is a ring-based topology in which master devices form a ring. As each master also has slaves linked to it, masters thus act as bridges between the different piconets in the ring.

While the original ROM description involves a structure formation protocol including initial internode communication and role assignment, in our case there is no need for this, as role assignment is handled by a central server which also does the clustering.

ROM fits the requirements of our approach, as the topology is easy to maintain as devices leave or join the network. If free slave slots are available, joining is trivial. If no free slave slots are available, the ring is grown by adding new masters into the ring, thus creating more free slave slots. The number of hops when routing traffic is under five for networks of 30 nodes [12], which is a satisfying result. Taking into account the typical use case for our platform, support for more than a couple of dozen nodes is not necessary.

When programmatically creating Bluetooth sockets, the methods createInsecureRfcommSocketToServiceRecord(UUID uuid) and listenUsingInsecureRfcommWithServiceRecord(String name, UUID uuid) take a Universally Unique Identifier (UUID) as an argument.

In our approach, each client creates a UUID using its own Bluetooth MAC address and a string which is common to the entire platform. This means that if some device D has created a listening RFCOMM socket with a UUID generated using its own MAC address, then another device E will be able to reconstruct the same UUID and use it to connect to device D, because E has been provided the MAC address of D. Because these UUIDs must match in order for the connection to be accepted, a level of security is provided.
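One way to derive such a UUID is sketched below. The platform string value and the use of a name-based (type 3) UUID are assumptions; the paper only states that the MAC address and a common string are combined.

import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Derives a per-device UUID from its Bluetooth MAC address and a platform-wide
 *  string, so that any peer which knows the MAC can rebuild the same UUID. */
public final class GroupUuids {
    private static final String PLATFORM_STRING = "mum14-d2d-audio"; // illustrative

    public static UUID forDevice(String bluetoothMac) {
        byte[] name = (bluetoothMac + PLATFORM_STRING).getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(name); // deterministic: same MAC + string gives the same UUID
    }
}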

3.6 File Sharing


Figure 2: A Ring Of Masters structure example

Once the social network has been established, members may share files within the group. The client application supports sending files to all network participants or to a single, specific recipient.

To allow a file to reach the intended destination(s), a custom class (BTMessage) is used, which contains the following fields that help transmit the file across the network.

1. The initial sender of the file

2. The intended destination of the file

3. The MAC address of the last transmitter

4. A byte array of the actual contents of the file

In the general scenario, the initial sender will create a BTMessage object and send it to all of its open BluetoothSockets (see 3.5.1). As the BTMessage is received by the immediate neighbours in the network, a given receiver will first check if it was the intended recipient of this BTMessage. If so, then the byte array inside the BTMessage will be constructed into an actual file and saved onto the device's internal storage. A request is then made to view the file using Android Intents [9].

The file is not stored if the current device is not the recipient. In this case (or if the recipient is the entire group), the next step is to rebroadcast the message.

To do this, the client goes through all of its open BluetoothSockets and passes on the BTMessage to any socket which does not correspond to the MAC in the third field of the BTMessage (as listed above). This ensures that we will not retransmit the BTMessage to a client that already sent us the message.
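A sketch of the message class and this rebroadcast rule is given below. The field names, the use of Java serialization and the map of open sockets are assumptions made for illustration; the paper only lists the four fields carried by the message.

import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;

import android.bluetooth.BluetoothSocket;

/** Message wrapper used to carry a file across the Bluetooth ring. */
public class BTMessage implements Serializable {
    public String originMac;       // 1. initial sender of the file
    public String destinationMac;  // 2. intended destination (a marker value could denote the whole group)
    public String lastSenderMac;   // 3. MAC address of the last transmitter
    public byte[] payload;         // 4. raw contents of the file

    /** Forward the message to every open socket except the one it arrived from. */
    public static void rebroadcast(BTMessage msg, String ownMac,
                                   Map<String, BluetoothSocket> openSockets) throws Exception {
        String receivedFrom = msg.lastSenderMac;
        msg.lastSenderMac = ownMac;              // this device becomes the last transmitter
        for (Map.Entry<String, BluetoothSocket> e : openSockets.entrySet()) {
            if (e.getKey().equals(receivedFrom)) {
                continue;                        // never echo the message back to its sender
            }
            ObjectOutputStream out = new ObjectOutputStream(e.getValue().getOutputStream());
            out.writeObject(msg);
            out.flush();
        }
    }
}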

4. THE SERVER

The server gathers data from clients, and once a single 'round' of recordings has been received, the DBSCAN algorithm [5] is run on the data. The groups formed as a result of the clustering are used as input to generate personal instructions for each client on how to join the network of devices which were clustered together. This process is repeated continuously.

4.1 Initialization

As the server is started, some global objects are first created, which will be accessible to different threads. The first object is a queue for incoming data (henceforth called the data queue). As clients post audio sequences to the server, the sequences are stored in this queue to be accessed later.

The second object is a hash map (hash table), which is used to store instructions relevant to individual clients (hereafter referred to as the instructions map). The keys in the hash map are the MAC addresses of the clients. The values corresponding to the keys are JSON objects which contain Bluetooth connectivity instructions and information relevant to the group the client has been clustered into.

Before the servlets which handle HTTP requests are enabled, the server is also synchronized with the same NTP time server clock as the clients. A new thread, called the WorkThread, is also started. The WorkThread is responsible for periodically creating another type of thread, the ClustererThread, and running instances of it. The ClustererThread is described in Section 4.3.4.

4.2 Request Handling

As already mentioned, the server is implemented as an Apache Tomcat web server with Java Servlets. When a POST request is received, the JSON object described inside the request body is added to the data queue. Later, the instructions map is accessed using the Bluetooth MAC address of the request author as the key, determining whether there are any instructions currently corresponding to it. If instructions matching this key are found, they are added to the response of the POST request. Otherwise, the response will contain no instructions.
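The handler can be sketched as follows. The shared structures are simplified to static fields, and the "mac" key as well as the use of org.json are assumptions made for illustration.

import java.io.BufferedReader;
import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONObject;

/** Queues the posted audio data and answers with any pending instructions. */
public class AudioDataServlet extends HttpServlet {
    static final Queue<JSONObject> DATA_QUEUE = new ConcurrentLinkedQueue<>();
    static final ConcurrentHashMap<String, JSONObject> INSTRUCTIONS_MAP = new ConcurrentHashMap<>();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        StringBuilder body = new StringBuilder();
        BufferedReader reader = req.getReader();
        for (String line; (line = reader.readLine()) != null; ) {
            body.append(line);
        }
        resp.setContentType("application/json");
        try {
            JSONObject posted = new JSONObject(body.toString());
            DATA_QUEUE.add(posted);                              // consumed later by the ClustererThread
            JSONObject instructions = INSTRUCTIONS_MAP.get(posted.getString("mac"));
            resp.getWriter().write(instructions != null ? instructions.toString() : "{}");
        } catch (Exception e) {
            resp.sendError(HttpServletResponse.SC_BAD_REQUEST);  // malformed JSON body
        }
    }
}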

4.3 D2D Clustering

4.3.1 Clustering

Clustering is the process of dividing a set of data into clusters, using some measure of similarity to decide which data points should belong to the same cluster. There are five main types of clustering methods: partitioning, hierarchical, density-based, grid-based and model-based clustering [11].

Partitioning algorithms partition a database D of n objects into k clusters, where k is an input parameter. This means that we need to know the number of existing clusters in advance. In addition, the partitioning approach regards the clusters as Voronoi cells of a Voronoi diagram, meaning that cluster shapes are restricted to being convex.

Hierarchical methods use trees to determine clusters, either building clusters from the leaves by merging them up to some point or by moving top-down along the tree, splitting the data. This is especially useful for summarizing or visualizing data [2].

Grid-based methods divide the data space into a grid of cells. Clustering operations are then done directly on this grid, allowing for fast processing times.

Model-based methods form mathematical models to describe each cluster and try to fit the data to the given models. These methods have the potential of considering noise and outliers in the data.

In density-based clustering methods, clusters are created from regions in the dataset in which data points are spaced more densely. Clusters are separated from each other by regions where the density is lower.


The main idea is to keep growing a given cluster as long as the density in the region exceeds some given threshold. The advantages of density-based clustering methods are the ability to form arbitrarily shaped clusters and to disregard noise and outliers. A typical density-based algorithm is DBSCAN.

4.3.2 Distance Measures

Clustering algorithms require a distance measure to perform the classification. A well-known distance measure is the Euclidean distance. The Euclidean distance between two points is defined as the square root of the sum of the squares of the differences between the corresponding coordinates of the points. In two-dimensional Euclidean geometry, the Euclidean distance d between points a = (a_x, a_y) and b = (b_x, b_y) is defined as

d(a, b) = √((a_x − b_x)² + (a_y − b_y)²).

For time series, the Euclidean distance is defined as the sum of the squared distances from each n-th point in one time series to the n-th point in the other. However, the Euclidean distance is only fitting for time series which are perfectly aligned. The distance measure fails to recognize two highly similar time series if one is slightly shifted along the time axis.
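A direct implementation of this point-wise distance for two equally long series is shown below, only to make the definition concrete; the framework itself uses FastDTW, described next.

/** Point-wise (Euclidean-style) distance between two equally long time series:
 *  the sum of squared differences of corresponding samples, as defined above. */
public final class EuclideanSeriesDistance {
    public static double distance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("series must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return sum;
    }
}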

4.3.3 Dynamic Time Warping

Dynamic time warping (DTW) is a distance measure which was proposed to tackle this shortcoming for time series. DTW is able to take into account global and local shifts in time.

For two time series, X and Y,

X = x_1, x_2, ..., x_n
Y = y_1, y_2, ..., y_m

an n-by-m cost matrix is constructed, where the (i, j)-th element of the matrix contains the distance between the two points x_i and y_j. Most often, the Euclidean distance is used. Next, an optimal alignment between X and Y is found (an alignment which has minimal overall cost according to the cost matrix).

However, the space of possible alignments between X and Y is large, thus DTW has a time and space complexity of O(N²). A modification of DTW, called FastDTW, was created to improve upon the complexity of DTW. Using a multilevel approach, which initially finds optimal alignments between time series at a coarser level and then refines the resolution, a linear time and space complexity was achieved [18]. This makes FastDTW a fitting time series distance measure for situations where the processing time is limited.

Several machine learning libraries exist which include different clustering algorithms and distance measures. In our framework we used Java-ML, a library which focuses on providing developers with the opportunity to use machine learning in their own software. Java-ML provides simple, basic and easy-to-understand interfaces which developers can integrate into their code. Java-ML also includes a number of similarity measures in addition to FastDTW, which was used in our approach [1].
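To illustrate how a time-series distance plugs into Java-ML, the sketch below implements the library's DistanceMeasure interface with a plain O(nm) DTW. It is a stand-in written only for this example; the framework itself relies on the FastDTW implementation bundled with Java-ML [18].

import net.sf.javaml.core.Instance;
import net.sf.javaml.distance.DistanceMeasure;

/** DTW-style cost exposed through Java-ML's DistanceMeasure interface. */
public class DtwDistance implements DistanceMeasure {

    @Override
    public double measure(Instance a, Instance b) {
        int n = a.noAttributes(), m = b.noAttributes();
        double[][] cost = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                cost[i][j] = Double.POSITIVE_INFINITY;
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(a.value(i - 1) - b.value(j - 1));
                // extend the cheapest of the three possible alignment steps
                cost[i][j] = d + Math.min(cost[i - 1][j - 1],
                                 Math.min(cost[i - 1][j], cost[i][j - 1]));
            }
        }
        return cost[n][m];
    }

    @Override
    public boolean compare(double x, double y) {
        return x < y; // a smaller warping cost means the series are more similar
    }
}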

4.3.4 Implementation

In our approach, the WorkThread (mentioned in 4.1) schedules the creation and execution of ClustererThread instances. A new ClustererThread is executed every m time units, and the thread's execution is always started n time units after the clients start recording audio. This is illustrated in Figure 3, where the cluster() method is periodically called.

Figure 3: A timeline of interactions between the client and server

Two things must be ensured when choosing when to call the clustering method (parameter n). First, only data from one recording session must be clustered. Second, the clustering must be finished by the time the client makes the next POST request and expects a response containing the results of the clustering. For the example in Figure 3, this can be ensured by requiring that

t0 + n > t0 + r + d1, where d1 is the POST request network delay,
t0 + n + d2 < t0 + m + r, where d2 is the POST response network delay.

More generally, we require that

n > r + d1,
n + d2 < m + r.

The following actions are performed by the ClustererThread. The server's data queue is emptied, the audio time series are preprocessed (explained in Subsection 4.3.6) and a dataset object is formed out of all the data. The dataset is then provided as input to the clustering algorithm itself.
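The periodic scheduling itself can be sketched as follows; the executor-based mechanism and the parameter names are assumptions made for illustration, as the paper specifies only the timing constraints above.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Runs a ClustererThread-style task every m time units, starting n time units
 *  after the first recording round begins (see the constraints above). */
public final class WorkScheduler {
    public static ScheduledExecutorService start(Runnable clusteringRound,
                                                 long nMillis, long mMillis) {
        ScheduledExecutorService workThread = Executors.newSingleThreadScheduledExecutor();
        // first run after n ms, then repeat every m ms
        workThread.scheduleAtFixedRate(clusteringRound, nMillis, mMillis, TimeUnit.MILLISECONDS);
        return workThread;
    }
}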

4.3.5 DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [5] is a clustering algorithm which requires only two input parameters. It can detect the number of clusters automatically.

As explained in [23], DBSCAN uses the following concept for clusters: "a cluster is defined as containing at least a minimum number of points, every pair of points of which either lies within a user-specified distance (ε) of each other or is connected by a series of points in the cluster that each lie within a distance of ε of the next point in the chain."

The algorithm can use any distance function for two points, meaning that for some given application, a fitting distance measure can be used [5]. DBSCAN is also included in the Java-ML library.


DBSCAN is a fitting choice for the purpose of clustering audio amplitude data with the intent of forming social groups on-the-go, as it can determine the number of groups automatically and allows using a distance measure which is fitting for audio data.

The Java-ML implementation of DBSCAN uses three parameters: a distance measure; the minimum number of points from which a cluster can be formed; and the ε-value. In our approach, DBSCAN is configured to run with the following parameter values:

1. FastDTW is given as the distance measure to use.

2. The minimum number of points for a single cluster is 2.

We wish to support work groups of two persons and up, thus the choice of this parameter's value is clear.

3. The ε-value is set to 15.5.

In the course of the practical work on our approach, we came to fix ε at this value, as it showed the best results in comparison to other values during initial tests. A more detailed methodology for determining an appropriate ε-value is described in [5].

When the clustering algorithm finishes its work, it returns a collection of datasets, each representing a group of clients that are in proximity.
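Putting the pieces together, the clustering step might be configured as in the sketch below. The DBSCAN class name and its constructor argument order follow Java-ML 0.1.x and should be checked against the library version in use; DtwDistance refers to the stand-in measure sketched in 4.3.3 (the paper uses the bundled FastDTW).

import java.util.List;

import net.sf.javaml.clustering.Clusterer;
import net.sf.javaml.clustering.DensityBasedSpatialClustering;
import net.sf.javaml.core.Dataset;
import net.sf.javaml.core.DefaultDataset;
import net.sf.javaml.core.DenseInstance;

/** Clusters one round of (already preprocessed) amplitude series. */
public final class AudioClusterer {
    public static Dataset[] cluster(List<double[]> series) {
        Dataset data = new DefaultDataset();
        for (double[] s : series) {
            data.add(new DenseInstance(s));          // one instance per device
        }
        // epsilon = 15.5, minimum points per cluster = 2, DTW-style distance;
        // the constructor argument order may differ between Java-ML versions
        Clusterer dbscan = new DensityBasedSpatialClustering(15.5, 2, new DtwDistance());
        return dbscan.cluster(data);                 // each returned Dataset is one proximity group
    }
}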

4.3.6 Preprocessing

Before clustering the data, it is desirable to process the dataset in such a way that the features on which the distance measures are based are better distinguished.

With time-domain audio data, one of the fundamental steps is feature scaling: ensuring that the values of each time series are of the same magnitude. A simple example illustrating this is the case where two microphones are recording audio which comes from the same source, but one device is significantly closer to the audio source than the other. This results in the two recordings having different average energy (amplitude) levels, even though the shapes of the time-amplitude curves look similar. In Figure 4, this is the case when comparing the 2nd and 4th rows of the 1st column. The signals have a similar pattern, yet the magnitude differs greatly.

Figure 4: Left column: original data; center column: mean-normalization; right column: max-normalization

In our approach, we overcome the above-mentioned magnitude differences by taking the mean of a given amplitude series and dividing each value in the series by that mean value. Alternative methods were also tried. One simple approach would be to rescale the values using the maximum value of the series, giving a range of values within [0, 1]. Initial tests, however, showed a higher error rate compared to using the mean. Using the mean gives more weight to peaks in the signal if the rest of the signal is not very noisy. Emphasizing peaks is beneficial, as peaks (high-amplitude sounds) are more likely to be heard on all devices at the same time.
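A minimal sketch of this mean-scaling step:

/** Mean-normalization applied before clustering: each amplitude series is divided
 *  by its own mean, so recordings of the same audio made at different distances
 *  end up on a comparable scale while peaks keep extra weight. */
public final class Preprocessing {
    public static double[] meanNormalize(double[] series) {
        double sum = 0.0;
        for (double v : series) {
            sum += v;
        }
        double mean = sum / series.length;
        if (mean == 0.0) {
            return series.clone();       // an all-silent series is left unchanged
        }
        double[] scaled = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            scaled[i] = series[i] / mean;
        }
        return scaled;
    }
}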

In addition to feature scaling, signal smoothing was also tried. Using a window function (the Hamming function) and Short-Time Average Energy analysis, the data was smoothed [17]. However, tests showed no benefits compared to using mean scaling alone.

4.4 D2D Coordination Plan

Each time a clustering finishes, the resulting clusters are processed to generate pairing instructions for the client devices. These instructions must allow the forming of the Bluetooth network described in 3.5.

For each cluster, the most frequently occurring group ID is determined (each device has provided the identifier of the group it last belonged to). The server keeps a record of previous networks that have been formed. The most common ID in a cluster is used to retrieve the previously formed network and process it for any changes. By using the most frequently occurring group ID, it is most likely that the fewest connections will have to be reconfigured.

When processing the existing network, roles are assigned and reassigned so as to preserve the Ring of Masters structure. If the most frequent group ID does not exist, the server creates a new network.

Once the network has been processed or created, custom instructions for each device are created based on the network's state. The contents of the instructions were discussed in 3.4.

5. EVALUATION

Several tests were conducted to measure the effectiveness of this approach. The tests were done using Android devices of different types, namely several smartphones from different manufacturers and one tablet device. The exact device models are listed in Table 1.

Device                        Amount
LG Nexus 5                         5
Sony Ericsson Xperia Acro S        1
Samsung Galaxy S2                  1
Asus Eee Pad TF101                 1

Table 1: Devices used for testing

The test scenario consisted of splitting the devices into two groups and positioning the groups in separate rooms. Each room had an audio source of a person reading a book. In both rooms, the devices were positioned close to the audio source (1-2 meters).


The client application was started on all the phones, and more than 60 iterations of clustering live audio time series were then run concurrently.

5.1 Measures Used

To measure the precision of the approach, the following method is used. Per clustering instance, for every device which is placed in the wrong cluster or not placed in a cluster at all, 1 is added to the clustering error rate. Then, we get the error rate per device by dividing the clustering error rate by the total number of devices clustered. For example, if we had devices A, B, C in room 1 and devices D, E, F in room 2, and the system creates the clusters [A, C, D] and [E, F], then the error rate per device would be 0.33 (B is left unclustered and D is misplaced, giving 2 errors over 6 devices).

5.2 Conducting the Tests

5.2.1 Round One

In the first round, five devices were used, thus creating groups of two and three. One room had an actual human reading a book out loud; in the other room, a speaker was playing back an audiobook. 65 clusterings were run, and the average error rate per device was 0.178. A noteworthy phenomenon in the first round, however, was that among the instances where just one device was missing or misplaced, in 8 out of these 9 cases the device causing the error was the Samsung Galaxy S2.

In the first round, out of 65 clusterings, 43 had no mistakes, meaning 66.15% of the clusterings were entirely accurate.

5.2.2 Round Two

In the second round, the Samsung Galaxy S2 was removed and three additional LG Nexus 5 devices were added. This time, a live person was reading a book in both rooms.

Out of 70 clusterings, 53 had no mistakes; thus 75.71% of the clusterings were entirely accurate. The average error rate per device in this case was 0.104.

However, 11 of the erroneous results were from clustering instances where the clustering failed to work at all, that is, all devices were put in a single cluster. When discarding these instances where the clustering failed entirely, the accuracy is considerably higher, 89.8%.

6. RELATED WORK

Several works exist which aim to derive the context of a single client via sensors [15, 22]. This works by gathering sensor data from the client and then running that data through a classifier, which usually runs on a server. However, the potential of using microphones in mobile applications has been explored less than other sensors such as GPS receivers, cameras or gyroscopes.

Regarding mobile social networks in proximity and ad hoc mobile social networks, significant research has been performed using different methods [4, 16, 19]. CroudSTag [19] is one such application, which helps in forming a social group with people identified in a set of media files, using face recognition cloud services. Hewlett-Packard Labs presented an implementation which is most similar to ours and addressed the same goal as this work. Their project used silence signatures to match similar audio signals. To create these silence signatures, the audio signal is quantized into silence and non-silence through the use of an adaptive silence threshold [21].

Similarly, Spartacus [20] uses an acoustic technique based

on the Doppler effect to enable users to accurately initiate an interaction with a particular target device in their proximity through a pointing gesture. The application runs continuously in the background, removing the need for manual user pre-configuration for interactions. Within a 3-meter distance, Spartacus achieves 90% device selection accuracy on average. Their solution also uses Android and passive, periodic audio sensing. However, in their work, the communication itself is initiated by the user. After the software detects which device the user wishes to connect to, the connection is established automatically.

Another solution is SurroundSense [3], which uses the microphone in conjunction with other sensors to form ambience fingerprints which describe the location or context the user is in. The client side of the SurroundSense framework records sensed values from sensors, pre-processes them at the client side and then sends the preprocessed data to a server. The server side segregates the different types of data and deals with each one according to a module assigned to it. For example, the server might perform color clustering on image data. After the server has processed the different types of sensor data, an "ambience fingerprint" is formed, which is then forwarded to a matching module, which matches it to already known fingerprints for localization. The authors note that audio information provides benefits such as recognizing walls, which, in addition to working as sound barriers, are something that humans often associate with barriers of context.

Similarly, the CenceMe application [14] is a system which uses various sensors of the Nokia N95 phone to detect activities and the context of users (e.g., walking, having a conversation, sitting in a vehicle). Because analyzing continuous streams of sensor data can be computationally expensive, their solution "splits" some of this work. Some classification is done by the client and some by the server. More complex classifications, such as ones that involve sensor data from multiple clients, are done on the back-end server. This so-called split-level design offers benefits such as allowing users to create custom markers for activities or contexts that their phone has classified only locally. This allows users to create classified states beyond the ones that the framework initially proposes. In addition, because some classification is done in-phone, the data being sent to the server is more lightweight.

7. CONCLUSIONS AND FUTURE WORK

This work presented an approach for using mobile device microphones and cluster analysis to create a collaborative social network on-the-go. The goal was to create a dynamic Bluetooth network of persons in close proximity without requiring the users to set up the connection. Such an automatic D2D communication framework can foster new ways to decentralize and share information, such as Mobile Social Network in Proximity (MSNP), without congesting the cellular network.

A complete implementation of this approach was described, involving an Android client application equipped with file sharing capabilities and a Java Servlets-based web server. The server side of the implementation uses Dynamic Time Warping and Density-Based Spatial Clustering of Applications with Noise to analyze audio data.


The presented implementation automatically creates Bluetooth connections between devices in proximity without additional steps required from the user other than initiating the client application.

To support this implementation, a number of similar and related works were researched, involving sensor usage and mobile wireless connectivity. Different categories of clustering algorithms were examined, and density-based clustering was chosen as a fitting method for the given goal. The accuracy of the clustering, which is the key factor in this approach, was also tested. When grouping devices based on microphone data, test results showed an accuracy of up to 75.71%, and in certain cases up to 89.8%.

To improve this approach, we propose several possible future developments. Firstly, a survey of audio preprocessing methods could be carried out to acquire better knowledge about methods used to extract more features out of audio data. The preprocessing used in this approach is relatively light. Secondly, it would be desirable to receive instructions from the server as soon as they are created, instead of receiving them when making future requests.

8. ACKNOWLEDGMENTS

This work is supported by the European Regional Development Fund through EXCS, Estonian Science Foundation grant PUT360 and Target Funding theme SF0180008s12.

9. REFERENCES

[1] T. Abeel, Y. V. de Peer, and Y. Saeys. Java-ML: A machine learning library. Journal of Machine Learning Research, 10:931–934, 2009.

[2] A. Amini, T. Wah, and H. Saboohi. On density-based data streams clustering algorithms: A survey. Journal of Computer Science and Technology, 29(1):116–141, 2014.

[3] M. Azizyan, I. Constandache, and R. Roy Choudhury. SurroundSense: Mobile phone localization via ambience fingerprinting. In Proceedings of the 15th Annual International Conference on Mobile Computing and Networking, MobiCom '09, pages 261–272, New York, NY, USA, 2009. ACM.

[4] C. Chang, S. N. Srirama, and S. Ling. Towards an adaptive mediation framework for mobile social network in proximity. Pervasive and Mobile Computing, 12:179–196, 2014.

[5] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. Pages 226–231. AAAI Press, 1996.

[6] H. Flores and S. N. Srirama. Mobile cloud middleware. Journal of Systems and Software, 92:82–94, 2014.

[7] Google Inc. BluetoothAdapter. Android API. https://developer.android.com/reference/android/bluetooth/BluetoothAdapter.html#listenUsingInsecureRfcommWithServiceRecord(java.lang.String,java.util.UUID). Accessed: 31.03.2014.

[8] Google Inc. BluetoothSocket. Android API. https://developer.android.com/reference/android/bluetooth/BluetoothSocket.html. Accessed: 31.03.2014.

[9] Google Inc. Intents and intent filters. http://developer.android.com/guide/components/intents-filters.html. Accessed: 13.05.2014.

[10] B. Han, P. Hui, V. A. Kumar, M. V. Marathe, J. Shao, and A. Srinivasan. Mobile data offloading through opportunistic communications and social participation. IEEE Transactions on Mobile Computing, 11(5):821–834, 2012.

[11] J. Han and M. Kamber. Data Mining: Concepts and Techniques, Second Edition. Morgan Kaufmann, 2006.

[12] T. Hassan, A. Kayssi, and A. Chehab. Ring of Masters (ROM): A new ring structure for Bluetooth scatternets with dynamic routing and adaptive scheduling schemes. Pervasive and Mobile Computing, 4(4):546–561, 2008.

[13] Y. Li, T. Wu, P. Hui, D. Jin, and S. Chen. Social-aware D2D communications: Qualitative insights and quantitative analysis. IEEE Communications Magazine, 52(6):150–158, 2014.

[14] E. Miluzzo, N. D. Lane, K. Fodor, R. Peterson, H. Lu, M. Musolesi, S. B. Eisenman, X. Zheng, and A. T. Campbell. Sensing meets mobile social networks: The design, implementation and evaluation of the CenceMe application. In Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems, SenSys '08, pages 337–350, New York, NY, USA, 2008. ACM.

[15] C. Paniagua, H. Flores, and S. N. Srirama. Mobile sensor data classification for human activity recognition using MapReduce on cloud. Procedia Computer Science, 10:585–592, 2012.

[16] A.-K. Pietilainen, E. Oliver, J. LeBrun, G. Varghese, and C. Diot. MobiClique: Middleware for mobile social networking. In Proceedings of the 2nd ACM Workshop on Online Social Networks, pages 49–54. ACM, 2009.

[17] G. D. Poli and L. Mion. Algorithms for Sound and Music Computing. 2006. Chapter 5: From audio to content.

[18] S. Salvador and P. Chan. FastDTW: Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis, 11(5):561–580, 2007.

[19] S. N. Srirama, C. Paniagua, and H. Flores. Social group formation with mobile cloud services. Service Oriented Computing and Applications, 6(4):351–362, 2012.

[20] Z. Sun, A. Purohit, R. Bose, and P. Zhang. Spartacus: Spatially-aware interaction for mobile devices through energy-efficient audio sensing. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '13, pages 263–276, New York, NY, USA, 2013. ACM.

[21] W.-T. Tan, M. Baker, B. Lee, and R. Samadani. Sensing device co-location through patterns of silence. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '13, pages 473–474, New York, NY, USA, 2013. ACM.

[22] S. Wang, J. Yang, N. Chen, X. Chen, and Q. Zhang. Human activity recognition with user-free accelerometers in the sensor networks. In International Conference on Neural Networks and Brain, ICNN&B '05, volume 2, pages 1212–1217. IEEE, 2005.

[23] I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. 2011.
