1 paco: a system-level abstraction for on-loading ... · 1 paco: a system-level abstraction for...

1

PACO: A System-Level Abstraction forOn-Loading Contextual Data to Mobile Devices

Nathaniel Wendt, Student Member, IEEE, Christine Julien, Senior Member, IEEE

Abstract—Spatiotemporal context is crucial in modern mobile applications that utilize increasing amounts of context to better predictevents and user behaviors, requiring rich records of users’ or devices’ spatiotemporal histories. Maintaining these rich historiesrequires frequent sampling and indexed storage of spatiotemporal data that pushes the limits of resource-constrained mobile devices.Today’s apps offload processing and storing contextual information, but this increases response time, often relies on the user’s dataconnection, and runs the very real risk of revealing sensitive information. In this paper we motivate the feasibility of on-loading largeamounts of context and introduce PACO (Programming Abstraction for Contextual On-loading), an architecture for on-loading data thatoptimizes for location and time while allowing flexibility in storing additional context. The PACO API’s innovations enable on-loadingvery dense traces of information, even given devices’ resource constraints. Using real-world traces and our implementation for Android,we demonstrate that PACO can support expressive application queries entirely on-device. Our quantitative evaluation assesses PACO’senergy consumption, execution time, and spatiotemporal query accuracy. Further, PACO facilitates unified contextual reasoning acrossmultiple applications and also supports user-controlled release of contextual data to other devices or the cloud; we demonstrate theseassets through a proof-of-concept case study.

F

1 INTRODUCTION

S PATIOTEMPORAL data is increasingly essential to mobileapplications, both those that we already use on a daily

basis and those envisioned as part of a future Internetof Things. These applications rely on understanding therelationships between users and their surroundings and theways in which those relationships change over space andtime. Essential to mapping this relationship is a basic un-derstanding of the user’s spatiotemporal history—basicallya timestamped trace of the user’s device through space.

Implementing applications that use such rich spatiotem-poral context information is possible given today’s technicalcapabilities [1]. Today’s mobile devices already largely col-lect this information, whether by applications that collectspatiotemporal traces explicitly, such as fitness and lifestyleapps1, by applications that collect (potentially spotty) spa-tiotemporal traces implicitly, for example via social networkcheckins, or, more insidiously, by service providers them-selves. Furthermore, it has been shown that when usersare presented with the frequency that applications requestcontextual information and sensor data, they are likely toadjust or revoke the permissions [2]. However, in most ofthese examples, the user’s spatiotemporal information isoff-loaded to the cloud. This not only introduces communi-cation and resource (e.g., energy) costs because it requiresmaintaining a persistent connection to the Internet, but,more importantly, it releases highly sensitive informationto the control of a third party who may not be completelytrustworthy or who may be compelled to share the data.This storage pattern can also artificially limit the granularityof data collected based on a desire to limit the overhead ofoffloading the data. Further, these approaches are “siloed”

• N. Wendt and C. Julien are with the Department of Electrical andComputer Engineering, The University of Texas at Austin.E-mail: {nathanielwendt, c.julien}@utexas.edu

1. http://www.mapmyrun.com, https://www.moves-app.com

such that each consumer of spatiotemporal data collects itsown traces, meaning that many efforts may be redundant.

The few approaches that do not offload all of this dataare not intelligent in their ability to reduce the storagefootprint [3] (e.g., they simply statically decrease samplingrates, or, worse, completely disable location services) andthey target individual applications in a siloed fashion. Otherapproaches have created special purpose operating systemsand devices that silo private information entirely2.

We instead propose a fundamental shift in howspatiotemporal-aware mobile applications are developed,by on-loading storing and querying contextual data, stillmaking expressive spatiotemporal data available to appli-cations but in a user-controlled way on-device. That is,we propose that responsibility and ownership of intenselypersonal spatiotemporal data be brought back onto theuser’s device and under the explicit and individual con-trol of the user. This requires making it possible to storeand query a device’s spatiotemporal data using only thedevice’s resources without any external infrastructure orcommunication. This on-loading is feasible given today’smobile device capabilities and will only become more so ascapabilities improve. The fundamental principle underlyingour approach is that contextual data should exist in a rawform only on the device where it is collected and that deci-sions about how and when to share detailed contextual datashould be retained by the owner of the data. Our approach,termed PACO, for Programming Abstraction for ContextualOnloading, makes it possible for an off-the-shelf commoditymobile device to maintain a compact yet comprehensivestore of its own (spatiotemporal) contextual history, directlyaccessible to applications on the device. Note that while weborrow concepts from spatial and moving object databases,PACO only stores the history of a single user. In our vision

2. https://www.silentcircle.com/products-and-solutions/devices/

arX

iv:1

703.

0350

4v1

[cs

.OH

] 1

0 M

ar 2

017

http://www.mapmyrun.com

https://www.moves-app.com

https://www.silentcircle.com/products-and-solutions/devices/

2

of a future, PACO is integrated with the OS as a systemservice and provides applications running on the OS theonly available window into location services. Specifically, inthis vision, applications would no longer have unfetteredaccess to the device’s precise location information; insteadall information about a device’s spatiotemporal trajectorieswould be available only through the PACO interfaces. Be-cause of this, it is important that PACO provide a sufficientlyexpressive API to not over-limit application capabilities.

In addition to providing direct user control, this on-loaded contextual data store also enables unifying con-textual data across applications. Rather than maintainingmultiple application-specific contextual “silos” (as is thecase today, although typically using the cloud), we create asingle contextual data abstraction that can be shared acrossapplications.

Supporting individual users and devices is just onepotential use of rich contextual data. Applications may alsobenefit tremendously from sharing snapshots of contextualdata with other nearby devices, either via direct (device-to-device) sharing or through hyper-localized cloudlet sup-port [4]. While the humans about whom this data is gener-ated are unlikely to be willing to share the raw data publicly,they are often willing to share with other nearby, evenunknown, users [5]. Using basically the same abstractionsthat PACO uses to reduce the storage footprint, PACO alsocreates expressive options for sharing a spatiotemporal his-tory that releases varying amounts of personal information.Our approach supports sharing contextual data, allowingusers to retain control over how, when, and what is shared.

Of course, while on-loading spatiotemporal data storageand querying has many benefits, there are also potentialdrawbacks. The storage space and energy consumption onthe mobile device are very important since we target users’mobile phones, which must support many varied dailytasks. Further, PACO achieves on-loading of spatiotemporaldata by intelligently determining what data is most impor-tant to maintain; because of the algorithms involved, theuser must explicitly trade-off the fidelity or granularity ofthe stored data for the costs incurred in storing it. Therefore,as part of our evaluation of PACO, we study both the costsand quality of the resulting on-loaded data store and theirpotential impacts on applications.

Contribution. In summary, this paper’s primary con-tribution is the introduction of the PACO abstractions andassociated system service implementation. At a basic level,PACO completely onloads a user’s spatiotemporal data stor-age to the user’s personal device. As a result of onloading,concerns of mobile device resource constraints must betaken into account. To make this tractable, PACO uses thesemantics of the spatiotemporal data itself, along with high-level guidance from the user, to determine what to storein order to maintain sufficient spatiotemporal fidelity. Thisbrings two primary benefits. First, the user retains individ-ual, direct, physical control over the stored spatiotemporalinformation—no third party service “owns” or “controls”this highly personal information. Second, the user’s ownapplications on the personal device can share the samespatiotemporal context, breaking down the walls of tradi-tional on-device application silos. PACO relies on existingstructures (i.e., the R-Tree and the k-d tree, described in

Section 3) and uses them in ways such that their behaviorand interaction is optimized for both energy and storageconcerns when used to capture highly detailed spatiotem-poral information on a mobile device.

To demonstrate how PACO can act as an abstractedlocation services to better protect the user, we present PACOaccess profiles, which enable users to specify how and whento share filtered, fuzzed, or otherwise lossy representationsof the user’s own spatiotemporal context information. Foreach application, the user can specify an access profilecontrolling the query parameters, effectively controlling thelossiness of the released data.

The remainder of this paper is organized as follows.We begin in the next section by providing a high-levelsystem model that overviews the concrete contributionsembedded in PACO. Section 3 uses this model to statePACO’s contributions in the context of related efforts, ex-plaining why this existing work fails to achieve the PACOvision. We also highlight some key pieces of backgroundinformation necessary to understand PACO’s abstractions.Section 4 presents the details of PACO and the PACO API.We present evaluations of PACO in Sections 5 and 6; theformer provides quantitative assessments of PACO, whilethe latter demonstrates PACO’s application-level capabilitiesthrough use cases. Section 7 discusses extension of PACOand addresses some of its limitations, and Section 8 con-cludes.

2 A MOTIVATING SYSTEM MODEL

This paper describes the PACO API and its supportinginfrastructure, which together enable on-loading spatiotem-poral data and rich application-level queries over that data.PACO fits in a larger system context that makes contextualdata accessible not only to local applications but also tonearby devices, opportunistically connected via device-to-device communication, and to the larger world throughcloud off-loading. While this paper does not focus primarilyon the elements of the PACO API used for this sharing of theon-loaded spatiotemporal information, the broader systemmodel includes these capabilities.

Fig. 1 shows PACO in this larger context. At the baselayer of this architecture, we rely on existing implementa-tions that can support a spatiotemporal database (“InternalStructure” in the figure, which we implement using anR-Tree [6]) that can store frequent updates of time- andlocation-stamped observations of context. Elements are in-dexed within the internal structure based on their locationcoordinates and time stamps; each element in the internalstructure is also mapped to one or more elements in theContext Index, indicating the semantic information associ-ated with the referenced space and time. For example, anobservation might be described as: “at 5:10pm the user isrunning 2 miles starting from his house (actual gps coor-dinates)” or “at 6:00am the user purchased a coffee at thecafe and gave the cafe a 5 star rating”. For each data item,the spatiotemporal data forms the base to which additionalcontext (e.g., that the current activity is running or that theuser purchased coffee or gave a particular rating) can beadded. Observations are captured by applications in user-space (e.g., via the Apps and Sensors components in Fig. 1).

3

Capturing context may require explicit user interaction, e.g.,a user inputs a review of a cafe, while other context can beautomatically captured in the background through sensoractivity, e.g., the ambient nose level at a cafe.

Fig. 1. System architecture; dotted region is Paco’s contributions

The PACO API provides access to the stored data in amanner tailored to mobile applications. At the most basiclevel, the user’s personal applications on the device canfreely query the stored spatiotemporal data, retrieving anyview of the raw data desired. Practically, users will likelywant to exercise control over release of their raw contextualdata even to on-device apps as well as for device-to-device(D2D) exchange and cloud offloading. For this reason, PACOuses access profiles to further filter or otherwise alter orconstrain the contextual information released from PACO.These access profiles are user- or device-defined and cantake into account the user’s current situation (e.g., physicalenvironment), his social network, etc. These access profilescan govern the granularity of the information released (e.g.,releasing only aggregate “coverage” views of large areas) orthe spaces and times about which information is released(e.g., releasing fairly detailed information about the user tohis colleagues in his work environment or releasing anyinformation that is more than a week old). These accessprofiles can be applied whether the data is being sharedin the local environment (e.g., through D2D exchanges) orto offload contextual data to the cloud, for example offload-ing sufficiently dated information or lossy summaries thatobfuscate the user’s exact contextual data.

In this paper, we focus on the PACO API as it relatesto on-loading spatiotemporal contextual storage and sub-sequent on-device application queries; these elements aredepicted within the dashed rectangle in Fig. 1. We definethe PACO data structure, based on the existing R-Tree spa-tiotemporal index. We create the Smart Insert method andthe query interface that defines the PACOAPI, and we showhow Access Profiles can be used to restrict the release ofpotentially private contextual data. We do not implementcommunication mechanisms to enable D2D sharing andcloud offloading; we could easily connect PACO to existingmechanisms in these spaces [7]. We do not include the

Context Index in this investigation; for the remainder ofthis paper, we treat the structure as storing just the spaceand time of the observations, what we assume to be the“heavy-lifting” and most general purpose pieces of contex-tual indexing. Since our spatiotemporal index is relational,it supports additional context indexing through joins onselection queries. Including the contextual indices as well asthe associated contextual ontology is left for future work,and is also addressed in much existing work on contextontologies [8], [9].

3 RELATED WORK AND BACKGROUND

The ability to store, query, and reason about spatiotempo-ral information is increasingly essential to mobile appli-cations [10], [11], [12]. Increasing concerns about locationprivacy [13] have motivated researchers to explore privacyprimitives associated with location sharing [14], [15], witha focus on determining what location information to shareand how to share it in a way that shields the user. Fur-thermore, on-loading, or moving storage and processing ofdata onto the device, has become popular for improvinguser experience in mobile computing environments [16],[17]. The contributions in this paper couple the idea of on-loading with the indexing of spatiotemporal data to supportvaluable contextual data storage of any type. In order toon-load large amounts of context, a proper adaptation ofexisting spatiotemporal indexing methods is necessary.

Complementary work in mobile context-awareness en-ables context processing on-device. ACE [18] is a contextreasoning framework that executes on top of sensors andthe data streams those sensors generate. ACE attempts to re-duce the overhead of context sensing by making inferencesabout what context streams to collect using application-levelinformation across multiple applications. ACE’s inferencemechanisms only work for boolean context types (e.g., “isthe user running”) and not continuous ones like a trace ofthe user’s spatiotemporal movement. Further, ACE assumesthat the applications access only instantaneous context infor-mation and does not support historical introspection acrossstored context information. In contrast, a data store likePACO could be layered underneath an inference engine likeACE providing a richer store of context information onwhich to make application decisions.

SeeMon [19] was an early software enabler of on-devicecontinuous context sensing. Similar approaches to adjustingsensing tasks based on application needs and a device’scurrent situation have resulted in a wealth of similar ap-proaches for continuous context sensing [20], [21], [22]. Asan exemplar, SeeMon intelligently adjusts how context is ac-quired, specifically by monitoring context changes instead ofcontinuously sampling and delivering raw values. Similarly,work related to efficient trajectory sensing dictates how andwhen to activate various on-board sensors to accuratelydetermine the device’s location while incurring low energyoverheads [3]. CoMon [23] extends these ideas beyond asingle device and creates a framework by which co-locateddevices can collaboratively sense the ambient environment,selecting the best sensors to task across a network of nearbydevices. Like ACE, these approaches for intelligent and effi-cient continuous sensing deliver the instantaneous context;

4

in addition, they focus on selecting the “best” set of sensorsand sensing parameters to do that. These approaches donot create an on-device contextual history. As with ACE,PACO directly complements these efforts; as (spatiotempo-ral) context information is continuously generated by theseapproaches, it can then be stored efficiently and queried inthe future using PACO’s API.

In contrast to the wealth of approaches enabling effi-cient context sensing on commodity smartphones, existingapproaches for storing that information focus exclusively onoff-loading the data to the cloud and enabling efficient andexpressive access to it there. That is, existing approacheseither assume the sensed context information is shippedoff of the device to be stored elsewhere or they assumethat the device’s applications simply discard any contextinformation that is not immediately consumed. This storagegap is exactly PACO’s target. We on-load the storage ofacquired context information, making it available to thedevice’s applications for future queries while reducing theenergy and communication costs of offloading and givingthe user direct control over the potentially personal andprivate spatiotemporal information.

As a starting point for exploring efficient and effec-tive on-device storage of spatiotemporal data, it is usefulto examine the approaches used in server-based systems.Storing spatial data has a rich history in image processing,geographic information systems (GIS), and robotics. Grid-based approaches, which divide space into regions andinsert data into the grid square representative of the data’slocation, are the most straightforward [24]. Clever statisticalapproaches can optimize queries over this data. This type ofapproach is not well suited for data that is dynamic or datasets with “hot spots,” i.e., spatial areas with a high numberof data points. In both cases, selecting an optimal grid sizeis difficult, and grid bounds may not be known a priori.

The widely used R-tree [6] maintains a balanced struc-ture by representing objects within minimum boundingrectangles and then creating a hierarchical (tree) representa-tion that relates the rectangles to one another. In an R-tree, arectangle at a node in the tree completely contains any rect-angles of any of the node’s descendents in the tree; the tree isstructured to support queries through simple computationsof intersection and containment. An R-tree is especiallyuseful for storing objects encompassing some area, whichmaps naturally to the abstraction of minimum boundingrectangles. The most complexity in using an R-tree involvesdeveloping algorithms to minimize or eliminate boundingrectangle overlap [25], thus avoiding worst-case query per-formance. Existing R-tree optimizations reduce this overlapby reinserting points [25] or duplicating objects [24]. Becausethe R-tree is designed for on-disk storage, we use it as thefoundation of PACO’s on-device spatiotemporal database,which must be persisted on the device to be shared amongapplications and to extend its lifetime through applicationrestarts and power cycling of the device3.

While it is quite beneficial for on-disk storage, the R-treeis not as efficient when used for in-memory computations.The k-d tree [26], on the other hand, is very well-suited for

3. We use the R*tree, which encompasses the discussed dynamicmaintenance optimizations

in-memory computations, especially when all of the dataitems are known ahead of time, and algorithms can beemployed to directly construct a balanced tree. A k-d treecan be thought of as a k-ary search tree that recursivelypartitions data points along k coordinate axes alternatingbetween the k dimensions. PACO leverages these existingstorage structures by employing an R-tree as the base datastructure stored on disk, complemented by a k-d tree ofthree dimensions (i.e., k = 3) as a utility structure to supportsub-queries computed in memory.

We are not the first to consider storing temporal informa-tion alongside spatial information; this is a natural extensionfor both the k-d tree and R-tree. More generally, combiningspatial and temporal information is particularly prevalentin moving object databases [27]. Many optimizations havebeen explored for improving queries across moving objectssuch as with the binary string prefix-matching used in [28].Temporal aspects may capture a data item’s relevance to thestate of the database (termed transaction time) or relevanceto the real physical world (termed valid time) [29]. We focuson the latter. In contrast to existing work in more generalpurpose moving object databases, PACO focuses on a datastore whose data points all represent just a single movingobject: the user who owns the mobile device collects thedata.

In support of efficient storage and retrieval of spatiotem-poral data for mobile devices, methods can be looselycategorized into those that “index past positions,” “indexcurrent positions,” and “index current and future posi-tions” [30]. To date, these approaches build almost exclu-sively on R-trees, and they are thus complementary toour work. However, existing approaches have developedcentralized, heavyweight indexes that are not well suitedto the lightweight, flexible, and personalizable implemen-tations demanded by mobile devices. Again, this need fora lightweight index that can reside entirely on a user’spersonal mobile device is PACO’s goal.

PACO aims to enable efficient real-time querying ofspatiotemporal information by applications on the user’sdevice. Therefore, in designing PACO, it is essential thatthe data store does not grow too large both because of thelimitations of the device and the need to support quickresponse to applications’ queries for spatiotemporal data.To support on-line location-awareness, researchers haveinvestigated approximate query processing [31], enablinglossiness in data storage [32], [33] by storing line segmentscomprising a trajectory. In contrast, CDR [34] uses an on-line trajectory reduction that relies on the moving objectsthemselves (e.g. mobile embedded sensors) to temporarilystore data. These approaches are related to some methodsthat we use in PACO to intentionally discard redundantdata items, however, these approaches continue to rely ona central server or other devices capable of handling sig-nificant off-loaded storage and computation. Further, theseapproaches do not associate the application context with thespatiotemporal data, thus they lack the ability to use thisdata to respond to complex application-level queries.

5

4 PACO AND THE PACO API

As shown in Fig. 1, PACO relies on an internal structureto store spatiotemporal data; because the PACO API shieldsthe application developer from the specifics of this struc-ture, any spatiotemporal data structure can be used thatconforms to the appropriate interface. We first overview therequirements of such a data structure and discuss our use ofthe R-tree to fulfill these requirements. As shown in Fig. 2,the internal structure provides basic query operations acrosstime and space. Upon this structure, we build the PACO APIas a suite of application layers that effectively extend theinternal structure’s interface, resulting in the complete PACOabstraction for on-loading a device’s spatiotemporal infor-mation locally and using that local data storage to answerspatiotemporal queries for applications on the device. Recallthat PACO is intended to replace traditional location servicesand serve as the primary access point for spatiotemporally-indexed contextual data. Note that in some cases single rawlocation data points may need to be exposed, such as formapping applications that display your current location. Weomit this from our discussion of the PACO API in favor ofthe more interesting methods that PACO provides.

Fig. 2. Application layers and internal structure of the PACO API

4.1 Required Internal Data Structure Interface

The internal structure is responsible for the low-level ef-ficient storage of spatiotemporal data. It must support in-serting spatiotemporal data objects in the form (x, y, t, id),where x and y are spatial coordinates, t is a temporal value,and id is an identifier for the data object that allows PACOto directly associate the entries in the spatiotemporal datastore with the appropriate application-level context entries.We only use two spatial dimensions for our initial study, butPACO can be easily extended to support more dimensions.For the remainder of this paper we omit discussion of idand instead focus on the space and time portions of theindexing; we expect that these will be integral to most futuremobile applications and are reasonable targets for storageand query optimizations.

The structure must also support the ability return a listof points matching a range of spatiotemporal values; forexample a query may ask for all data points with an areacomprising a popular running route from 5-8pm. This is acommon feature of data structures supporting spatial andtemporal indexing and is essential to the effectiveness of

PACO’s abstractions; as such, optimizing efficiency of thisquery is crucial.

In addition, the underlying data structure should sup-port an additional two query operations that will be used inPACO to support expressive application queries. One is thenearest neighbor operation, which takes a piece of candidatespatiotemporal data and finds the closest matching dataitem stored in the structure. For instance, applications mayask for the stored data item nearest an historical landmark,closest to given time, or closest to a particular place andtime. Finally, the internal structure must allow for sequentialretrieval of time-ordered data items between two querypoints. This query is of the form Q{p1, p2}, where p1 andp2 are spatiotemporal points previously inserted into thestructure, and the query returns the sequential list of databetween these two points. An example sequence query mayask for a list of points between a runner’s starting andending location.

4.2 The PACO API

Building on the internal data structure, we introduce thePACO API, which establishes a programming interface foranswering important spatial and temporal questions and forcontrolling the contents of the spatiotemporal data store (seethe outermost layer in Fig. 2). This API is tunable throughconfiguration parameters that can be adjusted to match thecurrent available resources, for different users, and for aparticular user’s situation or changing environment. Whileadditional operations may be added to PACO in the future,we demonstrate in Section 6 the effectiveness of those pro-vided in this work to address existing and future applicationuses. A basic building block for the PACO API is PACO’snotion of probability of knowledge, which we discuss first.

4.2.1 Probability of Knowledge (PoK)Fundamental to PACO is the creation of a model to representthe amount of knowledge stored in the internal structureabout a particular place and time. Points in the PACO datastore have the form (x, y, t, z) where x and y are spatialdimensions (typically longitude and latitude, respectively),t is the timestamp, and z is an abstraction for any additionalcontext. In representing additional context as a single di-mension z is a significant oversimplification, our structurefully supports expressive extensions including incorporat-ing a full context ontology [9]. We omit discussion of thesedetails as well as computations on z because the processis more of an exercise in developing the semantics of thecontext ontology and not fundamental to spatiotemporaldata storage and querying. In contrast, the aim of this paperis to perform the “heavy lifting” related to the most commoncontext aspects, i.e., space and time.

The mobile applications that motivate our work oftenwant to ask questions of the form: “How well does the datastored in the structure relate to a given reference point?”or “What does the structure knows about a point in spaceor time?” For example, a contextual chat application mightask whether a user’s historical context indicates purchasingitems at a specific store at 5pm in order to connect messageswith similarly profiled users. For a matching data point,x and y may be the GPS coordinates of the store, t the

6

timestamp for 5pm on the current day, and z additionalcontext that indicates shopping (e.g., a recorded purchase).Given some reference point r = (rx, ry, rt, rz), we mightask what influence a particular stored data point, p, has onr. In other words, we want to determine the probabilitythat a point in the structure shares common or relatedinformation with the reference point based on their regionsof influence. We define this influence on a per-dimensionbasis, using a spatial influence function Is(∆x,∆y) and atemporal influence function, It(∆t) where ∆x,∆y are thespatial distances from p and ∆t is the temporal distancefrom p. We say that p influences any reference point r withinthis region.

Fig. 3. Influence function fora point p along a particulardimension (i indicates one ofthe three dimensions)

Fig. 3 depicts an influ-ence function for some di-mension i. Note that the thisinfluence function is simi-lar to modeling a probabilitydensity function over spaceand time, similar to the be-havior modeling approachused in [35].

Using these influencefunctions, we define theprobability of knowledge (PoK)as the amount of influence apoint p has on a reference point r as:

∆x = |px − rx|; ∆y = |py − ry|; ∆t = |pt − rt|PoK(p, r) = (Is(∆x,∆y) ∗Ws) ∗ (It(∆t) ∗Wt) (1)

where Ws and Wt are two parameters termedSPACEWEIGHT and TIMEWEIGHT, respectively.Applications can tune these weights to enable PoKcomputations to have more or less relative emphasis onspatial or temporal dimensions. In the remainder of thispaper, we fix both values at 1.

In our study in Section 5, we use linear decay functions.Specifically, we fix the amplitude of the influence functionsat 1 and use the value for which the influence function is 0(x-intercept) or i-RANGE (where “i” is either SPACE or TIME)to adjust the slope of the influence function. The RANGEvalues can be conceptually thought of as the physical dis-tance and length of time over which a data point shouldexert influence on reference points. Note that by using linearinfluence functions, the decay of PoK behaves quadratically(since the two influences are multiplied in the resulting PoKcalculation).

4.2.2 PoK WindowPotentially more useful than PoK information about a singlepoint is information about the data structure’s aggregateknowledge over a spatiotemporal area (i.e., a PoK window).For example, rather than querying for purchases made atthe store at 5pm, an application may query for purchasingknowledge about a reasonably sized area around the storeany time in the afternoon. Given such a spatiotemporalregion (visualized by a rectangular cube in three dimen-sions, as shown in Fig. 4), a PoK window query returnsthe PoK for the region as opposed to the PoK of a singlepoint; the region’s PoK depends on the spatial and temporal

influences of the set of “nearby” data items stored in theinternal structure.

Fig. 4. A PoK window computa-tion highlighting one sub-cube

The computationof a PoK window isan integral over theinfluence functionsacross the structure’sdimensions. In PACO,we estimate the valueby summing the PoKof small sub-cubes thatapproximate the influencefunctions (essentiallyby performing smallermore tractable windowqueries), as shown inFig. 4. The size of thegrid sub-cubes trades off computational complexity for thegranularity of accuracy of the resulting PoK for the window.To specify the size of the grid sub-cubes, we define a valueGRIDFACTOR that is the number of sub-cubes to includeper i-RANGE value along any ith dimension. For example,given a SPACERANGE of 100 meters, a TIMERANGE of 30minutes, and a GRIDFACTOR of 2, the sub-cube sizes are50m × 50m × 15min (2 cubes per i-RANGE). The valueof the GRIDFACTOR parameter affects both the accuracyand efficiency of the resulting window query; we brieflyevaluate these tradeoffs in Section 5.

To compute the PoK of a window, PACO performs arange query on the internal data structure for each sub-cube. The returned data items are then used to compute thePoK for the sub-cube. Finally, the PoKs for all of the sub-cubes in the window are aggregated to generate the PoKfor the window. Since there are often many sub-cubes forwhich to calculate PoK values, and, for each sub-cube, PACOmust execute a smaller range query, we perform an initialrange query using the entire window’s bounds to generatea reference set of data points for the smaller sub-queries.The initial range query retrieves all of the candidate pointsstored in the internal structure that are within the windowfor the PoK computation. We can further restrict the set ofcandidate points by joining queries that account for filtersrelative to the additional context information, i.e., z. Wethen execute each sub-cube’s smaller window query onlyon this reference set of candidate points rather than on theentire internal structure (e.g., in our experiments, reportedin Section 5, this pre-query eliminated, on average, 97.5% ofthe data points from consideration in the larger of our twodata sets). PACO stores this reference set of points in a k-dtree that it then uses when calculating each sub-cube’s PoK.Our implementation of this approach leverages the benefitsof k-d trees for in-memory computations and R-trees formaintaining a balanced on-disk database of the entire setof points in the structure. Our evaluation of PoK windowqueries in Section 5 includes exploring the effectiveness ofusing a k-d tree rather than a simple array of points as thereference structure.

To calculate each sub-cube’s PoK, we need to resolve theinfluence of potentially multiple data items. For example,Fig. 5 shows (in only two dimensions) the (overlapping)ranges of influence of three data items C1, C2, and C3.

7

Computing the PoK for a region that contains all or partof any of the intersection areas in Fig. 5 involves resolvingthe shared influence between the relevant data points.

Fig. 5. Inclusion-exclusion and spatialinfluence

Given a sub-cube, identified bya point at its center(the target point), weretrieve all of thenearby neighborsof the target point(the candidate points)using the internalstructure’s rangequery. We computethe structure’s PoK ofthe target point usingthe inclusion-exclusionprinciple to account for “double counting.” This becomesmore complicated as an increasing number of candidatepoints and dimensions are considered. In general, thefollowing expression captures the PoK at a reference pointr, where 1 ≤ i ≤ n and Ki is the influence from each of then nearby candidate points. P(Ki) reflects the distributionfor point i after accounting for both the spatial and temporaldecays. Keep in mind that this could be extended to includeitems from the application-level context (i.e. z).

PoK(r) = P

(n⋃

i=1

Ki

)(2)

P

(n⋃

i=1

i

)=

n∑i=1

PoK (i)−∑i<j

PoK (i ∩ j)

+∑

i<j<k

PoK (i ∩ j ∩ k)

− . . . + (−1)n−1

P

(n⋂

i=1

i

) (3)

PoK values are between 0 and 1, inclusive, and the setintersection operator in Equation 3 generates the combinedprobability for the considered candidate points. Using Equa-tions 2 and 3, PACO computes the PoK for each sub-cube. We then compute To as the sum of the PoK valuesfor all of the sub-cubes and TP as the maximum possiblePoK (equivalent to every sub-cube having a PoK of 1, or,simply, the number of sub-cubes). We return the PoK forthe window as the fraction To/Tp. For realistic data, ourevaluation has show that values of > 60% indicate highlevels of coverage.

Note that Equation 2 effectively generates all combina-tions of the n PoK values. To mitigate state explosion, wesort the candidate points and select the first TRIMTHRESHnumber of points with the most significance and use thesepoints to compute the window’s PoK. In our evaluation, wefixed TRIMTHRESH at 10 because it empirically allows for agood number of points to be considered without generatingtoo many sets of combinations.

PoK window queries are flexible and, like their underly-ing range queries, can be performed over any combinationof the dimensions. For example, a query might ask for the

PoK over a spatial-only region such as on a college campusor a temporal-only region such as from 9am to 5pm ona particular day. This distinction defines only the boundsfor the initial query on the internal structure for which thereference structure is constructed. The PoK calculation foreach sub-cube is unaffected as it still computes the influencevalues from all three dimensions. Similarly, the PoK compu-tations could be extended to account for dimensions of theapplication-level context (i.e., z).

4.2.3 Smart Insert

Maintaining a data structure that contains detailed histor-ical space-time indexes of contextual data generates largestructures that may be expensive to maintain and query.Many data items are redundant in space, time, or both, andmaintaining such redundant data items does not add muchadditional contextual information of use to applications.Most users stay in a relatively confined location throughouta day at work, so updating space-time information on afixed (e.g., one second) schedule is often overkill. On theother hand, many mobile location services only updateobservations if the mobile object has moved a significantdistance. This does not account for the temporal relevance ofdata objects, nor does it account for motion paths where anactively mobile user is constantly looping back on already“covered” areas.

In PACO, we introduce smart insert, which uses applica-tion tunable guidance to ask “how well is this informationalready represented?” before inserting a data item. Smartinsert accomplishes this check by performing a small PoKwindow query around the candidate point. The result ofthis window query sample is compared to an application-defined parameter, INSTHRESH. Since both the value of thewindow query and the value of INSTHRESH are PoK’s, theyrepresent the combined spatial and temporal relevance ofthe structure relative to the point that is to be inserted.For example, an INSTHRESH value of 80 means that, ifPoK(p) ≥ 0.8 for some new point p, then p should notbe inserted. Lower thresholds result in a greater reductionin the number of data items inserted, trading some degreeof accuracy for storage and computational efficiency asdepicted in Fig 6, which gives heat maps of the PoK valuesof the resulting structure, given the ground truth in the leftmost figure (i.e., all data samples are inserted, regardless oftheir contribution to knowledge) vs. smart insert using anINSTHRESH value of 80% (in the center) and an INSTHRESHvalue of 20% (on the right). We provide a more controlledand detailed study of the impacts of these parameters onsmart insert and the resulting available knowledge in thedata structure in Section 5.

Since this operation uses the PoK window operationdescribed above, its performance is partially dependent onthe application parameters defined for coverage window.Using the PoK computations defined above, this considersonly space and time information when determining whethera data item is redundant; again, PoK computations canbe extended to include additional dimensions of contextinformation, also increasing the expressiveness of the de-termination of redundancy for smart insert.

8

Fig. 6. Effect of smart insert on spatiotemporal trace data.

4.2.4 Find Path

PACO also enables queries about a mobile user’s trajectoriesthrough space and time. PACO’s find path operation takestwo points (x1, y1, t1), (x2, y2, t2). Under the hood, the oper-ation first finds the nearest neighbor of each point in PACO’sinternal data structure, then uses the internal structure’sGetSequence method to return the path between thesetwo nearest neighbor points. To illustrate, first considera running/exercise application. A simple example of findpath is to query for two known data points, such as therunner’s home and a familiar trail he ran two days ago, forwhich find path can return the sequential list of data points.A more interesting example of find path is to query for twonon-exact data points. For example, a user who cannot recallhis exact running path could query around a trail head anda nearby lake three weeks ago. While the user may havenot passed through those exact points or at that exact time,find path will determine the nearest two points and returnthe sequential list of items best matching those points.Weinclude the Find Path operation in PACO to complement theother operations in motivating the use cases. Find Path isnot dependent on configuration parameters, and as such,we omit an evaluation in favor of more interesting methodswithin PACO.

A challenge for many of the operations in PACO is set-ting the parameters to achieve desired application behavior.PACO’s programming interface is intentionally designed tomake it easy for application developers to use defaults forthese parameters or to substitute alternative values on acase-by-case basis. A summary of the parameters as well asconcrete values we considered in the evaluation in the nextsection are summarized in Table 1. “cabs” and “peds” referto the two data sets we use, which will be introduced in thebeginning of the next section. In Table 1, Device Tunablerefers to whether a particular device will adjust the givenparameter at run time (and under what conditions). AppTunable refers to the likelihood or frequency with which anindividual application will adjust the given parameter (e.g.,for a particular query or sequence of queries).

4.3 PACO Access Profiles

Our initial intention is to store spatiotemporal data ondevice to make it available to applications running on thatdevice. In this sense, the above API is one that is assumed

to be “open” to any application running on the same de-vice as PACO. An obvious next use of PACO is to enableindividual users to control whether and how their personalspatiotemporal contextual data is shared with applicationsand services on the device and with other individuals orservices executing off of the device. To enable this control,PACO defines access profiles that use the PACO API directly toconstrain and filter the spatiotemporal data (and ultimatelyits derived PoK information) in making it accessible.

In PACO, users employ the PACO API’s parameters tocontrol the accessibility of the spatiotemporal data storedwithin PACO. To provide examples of this process, Table 2gives a qualitative comparison of different access profilesthat could be employed by users or devices to control releaseof their spatiotemporal information. The open profile allowscomplete access to the PACO API and is suitable for trustedapplications (i.e., likely those resident on the local device).The guarded profile outlines a safe middle-ground, for in-stance only allowing the find path operation for entries morethan 24 hours in the past (preventing the user’s exact recentmovements from being tracked); restricting the grid cubesize (effectively blurring the granularity of spatiotemporalinformation released); and providing a minimum windowsize (only allowing window query sizes five times largerthan the RANGE in any dimension). This guarded pro-file serves as an example for device-to-device interactions(based on the presumption that users are willing to sharewith other co-located users more than they are willing toshare publicly on the Internet [5]), the guarded. At the mostconservative end of these examples, the restricted profileserves well for offloading to a public server or sharing withunknown users by only allowing lossy data representations,ensuring that the data is sufficiently obfuscated to makethe user comfortable in releasing it. One could imagineswitching between profiles not just for the three differentmodalities of applications shown in Fig. 1 but also based onsocial relationships with the users of the connected devices,the time of the data, the location of the exchange, the user’smood, etc.

4.4 Android PACO ImplementationWe implemented the PACO components depicted withinthe dashed area in Fig. 1 on Android4. The current imple-mentation of PACO is an Android service that serves as asufficient prototype for evaluation. It is our vision that PACObe implemented at the system level to be more nativelyshared across applications. Our prototype implementationis, however, more suitable to evaluation on commoditydevices, especially in a way that can be replicated by others.We use a 3D R-tree as the internal data structure to storeeach (x, y, t) point; for simplicity of evaluation, our currentprototype does not associate the z values with the spa-tiotemporal samples, but this is a straightforward extension.The default SQLite build contained within Android doesnot include the needed R-tree module, so we compiled ourown version of SQLite and accessed it through JNI in ourbenchmark test runner. SQLite’s R-tree module requires in-sertion of regions, (xmin, xmax, ymin, ymax, tmin, tmax); for

4. The source code for this implementation is available at https://github.com/nathanielwendt/LSTAndroid.

https://github.com/nathanielwendt/LSTAndroid

https://github.com/nathanielwendt/LSTAndroid

9

TABLE 1PACO parameter values

Parameter Description Study Values Device Tunable AppTunable

SPACERANGE range of a point’s spatial influence 50m (peds); 1000m (cabs) No InfrequentlyTIMERANGE range of a point’s temporal influence 5min (peds); 60h (cabs) No InfrequentlyGRIDFACTOR grid squares per RANGE value 1/2, 1, 2 Subject to energy constraints LikelyINSTHRESH value deemed significant PoK about

a given area0.1 - 1.0 Subject to energy constraints

and device mobilityInfrequently

TABLE 2Qualitative Comparison of Access Profiles

Profile GridFactor Min. Window Size Find Pathopen 1/2 - 2 None Yesguarded 1/2 - 1 5X RANGE > 24hrestricted 1/2 20X RANGE No

a given (x, y, t) point, we duplicate data so that for eachdimension i: imin = imax. We also adapted the k-d treemodule from the java machine learning library5 for useas our in-memory reference tree. Lastly, to support storingGPS data, we developed a set of GPS library functions formanipulating spatial distance over latitude and longitudewhile taking into account curvature of the earth, etc.6

5 BENCHMARKING

We performed a series of benchmarks on PACO to better un-derstand its feasibility and efficiency. To our knowledge, weare the first to explore on-loading spatiotemporal data at thisscale and density; for this reason, we evaluate PACO acrossvarious configurations. We also compare PACO’s internalstructure to a single table database approach (which we callStdTable). For both internal structure options, we compareusing the k-d tree reference tree for window computes (asdiscussed in Section 4) versus not using it. Recall that the k-din-memory reference tree optimizes the otherwise expensiverepeated window query that computes over sub-cubes. Allevaluations use mobile trace data from CRAWDAD: (1) aset of 92 traces of 500-2000 on-foot data points, collected ata university in South Korea [36] (“Peds”), which is fairlysparse, and (2) a set of vehicular traces of taxicabs in SanFrancisco with 500 taxicabs over 30 days [37] (“Cabs”),which is fairly dense with distinct highly populated areas.

We used Moto G 1032 Android Devices (Quad-core 1.2GHz Cortex-A7, Qualcomm Snapdragon 400 chipset, and1GB RAM). Determining execution times in Java can beunreliable due to the JVM’s JIT compiler and variations insystem behavior. We mitigate these concerns using indus-try guidelines7. We perform all benchmarks on quiescentdevices and run warmup samples on the JIT compilerbefore measuring execution times. Energy benchmarkinguses Qualcomm’s Trepn Profiler8, and power measurements

5. http://java-ml.sourceforge.net/6. Equations: http://www.movable-type.co.uk/scripts/latlong.html7. http://www.ibm.com/developerworks/java/library/

j-benchmark1/index.html8. https://developer.qualcomm.com/mobile-development/

increase-app-performance/trepn-profiler

are reported from a recorded baseline established per test.Despite these measures, execution times and energy levelsshould not be taken as absolute but rather as reasonableestimates of performance and relative measures across pa-rameter settings.

Our evaluation is framed with the following goals:(1) maintain fast, responsive query execution for on-devicespatiotemporal application queries; (2) reduce the size ofstored contextual data to manageable levels (both for sup-porting the first goal and for reducing PACO’s memoryfootprint on resource-constrained devices); and (3) minimizeenergy consumption of PACO’s operations. In this section,we take the elements of PACO’s API in turn and evaluatethem for these three goals. We also investigate some alter-native internal operations for their impact on these goals.

5.1 Smart Insert

Recall that PACO’s smart insert operation trades accuracy ofinformation for structure size. With decreasing values of IN-STHRESH, PACO becomes increasingly selective in insertingnew data points. Intuitively, the effect is a decrease in re-sulting PoK values since there are fewer points to influenceregion and point queries (i.e., there is less knowledge storedin the structure). Fig. 7 shows the various relationshipsbetween INSTHRESH, the size of the internal structure, andthe resulting PoK values. A decrease in PoK value indicatesa decrease in stored data accuracy from the ground truthof storing all of the points. On this graph, the varyingthickness of the lines represents the size of the data set(the thickest line belonging to the largest data set). Theleft axis shows the average PoK achieved as a function ofreducing the INSTHRESH. The right axis shows the degreeof size reduction of the overall data structure as a functionof reducing the INSTHRESH. As an example, for the largestdata set, PACO achieves an almost 80% reduction in the sizeof the data store, albeit at a decreased PoK of about 0.5.

Larger data sets (such as the trace of 80,000 points)tend to exhibit larger fractional size reductions while stillmaintaining similar changes in PoK. We empirically choosea default INSTHRESH value of 0.8 as a reasonable com-promise that maintains a high fidelity of the ground truthspatiotemporal information (as measured by the change inPoK) relative to the space savings (which ultimately alsoresult in more efficient querying). At this value, most tracesmaintain PoK within 80% of the original value with size re-ductions of an average of over 48%. Obviously, these curvesare directly representative of the particular traces evaluated,and as such, some data sets such as 5k do not follow thegeneral trend as they vary in spacing and density of points

http://java-ml.sourceforge.net/

http://www.ibm.com/developerworks/java/library/j-benchmark1/index.html

http://www.ibm.com/developerworks/java/library/j-benchmark1/index.html

https://developer.qualcomm.com/mobile-development/increase-app-performance/trepn-profiler

https://developer.qualcomm.com/mobile-development/increase-app-performance/trepn-profiler

10

Fig. 7. Effect of smart insert on PoK and structure size.

in the trace. The aim of this portion of the evaluation is toexamine real world data sets as a first effort in empiricallyestablishing guidelines for setting configuration parameters.

5.2 PoK Window

We now turn to examining the PoK window query. Forthis evaluation, we used fixed window query sizes of10km×10km×7days for cabs and 500m×500m×10hoursfor peds and ran a large number of window queries foreach. We choose larger spatial and temporal ranges for thecabs data since queries about vehicular data is likely tocover greater distances and the data stored will be moresparse. We evaluate using the R-tree and the StdTable(a single table for storing the spatiotemporal data, whichrepresents a conventional baseline) as alternatives for theinternal structure and whether or not the k-d tree wasused for the queries over sub-cubes (utility structure). Wealso evaluate the final PACO option, which uses the R-tree as the internal structure, the k-d tree for the sub-cubequeries, and a smart insert INSTHRESH parameter of 0.8. Weevaluate the average execution time and consumed energy,shown in Fig. 8, across all traces and windows for the 5possible structure combinations. The raw time values seemhigh (relative to desired application responsiveness rates)because the evaluation includes very large window sizesthat can encompass a large number of data points. Thesewindows push the limits of the structure; more practicalapplication queries can expect to use smaller windows withfewer than 1000 data points.

The R-tree and StdTable without the k-d tree for inmemory computation perform the worst (averaging 7 sec-onds and 1.5 mWh per query). The structures that includethe k-d reference tree perform significantly better. The R-tree’s benefits over the table approach are not as apparentsince the sub-cube computation outweighs the initial rangequery on the internal structure. The R-tree’s efficiency ismore evident in very large but sparse data sets (i.e., thosewith many data points but only a small number of those

Fig. 8. Internal structure and reference tree comparison.

points within a given query window). The fifth combinationuses the R-tree with the k-d tree but with a smart insertvalue of 0.8 (as opposed to 1.0 for the other structures). Aswas previously demonstrated, an INSTHRESH value of 0.8maintains reasonable PoK values while drastically reducingthe size of the structure; this is borne out here as well, wherereducing the INSTHRESH has a significant impact on boththe query time and energy consumption.

The resources demanded for computing a PoK windowquery are directly related to the number of data items in thewindow. Since the window query is a common operation,it is important that it is effective across a large variation ofwindow sizes. Fig. 9 shows the energy consumed by a PoKwindow query for various window sizes. We show the beststructure combination, R-tree with a k-d reference tree, bothwith and without smart insert. For energy consumption,we set an upper limit of 0.5 mWh as a target for per-query energy consumption, hence the horizontal line in thefigure. The Samsung Galaxy S5 has a 2800 mAh batteryrated for 3.85 volts. Estimation yields a total capacity of10,780 mWhs for the total battery, which allows for 21,560window queries at this upper limit. We would like to keepmost queries below this upper limit, and as shown in Fig. 9,most realistic queries consume about 0.05 mWh of energy,increasing the number of possible queries in a single chargeto over 200,000. Essentially, a user could perform 1,000 PoKwindow queries per day with less than a 0.5% effect onbattery life.

We similarly evaluated execution time across variouswindow sizes and the data trends very similarly to Fig. 9.We set a realistic target time of 2 seconds since applicationsshould try to keep response times around 1 second [38].Most windows we evaluated maintained window queryexecution times below this target, with PoK window queriesrunning for about 0.2 seconds on average. Also similar toour energy evaluation, the smart insert INSTHRESH valuedecreased execution times considerably to assist in meetingthe 2 second target.

5.3 Grid Factor

As important as INSTHRESH is to smart insert, GRIDFACTORis to computing the window query. GRIDFACTOR deter-mines the number of grid cubes that are used in a windowquery computation. Larger number of cubes are desirablefrom an accuracy standpoint but can add significant com-putational overhead.

11

Fig. 9. Consumed energy vs. window size. Note log axes.

For dense query windows (i.e., those with 500+ datapoints to consider) we found that a GRIDFACTOR of 2averages over 37 seconds in execution time, which is unrea-sonable for mobile applications. For windows with fewerpoints, a GRIDFACTOR of 2 is more reasonable (resulting inexecution times of 2 seconds). Ultimately, a GRIDFACTORvalue of 1 is best for default use, with the option to increasethe GRIDFACTOR for small windows or queries where thedata is expected to be sparse; one can even consider de-creasing the accuracy of the window query result by settingthe GRIDFACTOR to a value of 0.5 when applications requireonly a lossy representation of a window.

5.4 Cloud Offload

We investigate general power consumption of using PACOin comparison to offloading to a server to demonstrate thatour entire approach of on-loading is feasible and reason-ably comparable to existing cloud-based silo approaches.In particular, we compare using PACO on-device to entirelyoffloading all of the device’s spatiotemporal data to a cloud,where we assume queries can be processed quickly. Specif-ically, implementing the same storage features found inPACO in the cloud in not necessary since we assume thatcloud resources can be scaled to serve any desired minimalresponse time. To mimic minimal server response times, ourevaluation sets the service instance to wait for 100ms beforereturning a response to simulate an optimistically realisticcloud computation. To generate results for the overall en-ergy consumption of insertions and window queries on ourcloud instance, we sent a large number of generic HTTPrequests and computed the average cost per request. HTTPrequests are sent each time a new location is sampled.

To generate the data for this experiment, we performeda one hour “walkabout” in which we walked around aWiFi enabled university campus while continuously pollingAndroid location services on the device. As shown in Ta-ble 3, inserting the spatiotemporal data points into PACOwas about 14% more efficient than offloading the same datato the cloud, consuming 9.86 mWhs of energy vs. 11.47

TABLE 3System storage and power usage

Evaluation PACO On-load CloudWALKABOUT INSERT (30 MIN) 9.86 mWh 11.47 mWhTRACE INSERT (PER POINT) 5.87 µWh 7.9 µWhPOK WINDOW (PER WINDOW) 25.7 µWh 7.9 µWh20,000 DATA STORAGE 1.24Mb N/A

mWhs in the cloud case. To further isolate these resultsfrom other effects of running the PACO service while thedevice was in use, we also compared a controlled insertionof the same data from a captured trace, which resulted in asimilar relative result between on-loading and off-loading.The gist of this result, however, is that in a full 16 hour dayof continuous location polling, PACO would only consumeabout 316 mWhs of energy to store a complete trajectory ofthe user’s movement. With modern device battery capacitiesof over 10,000 mWhs, this is very feasible.

We also compared PACO’s average cost per point inser-tion and per PoK window query with cloud off-loading ofthe same activities. As also shown in Table 3, PACO is moreenergy efficient for data insertion but about three times lessefficient for PoK window queries. This is intuitive since thecloud instance has potentially limitless resources to answerexpensive queries but has an overhead associated with eachrequest. We limited the queries to windows smaller than1000 data points to better represent realistic applicationqueries. Because it is expected that insertion will be muchmore common than window queries, the main take awayhere is that, while PACO is not designed to save energy, its(overall) energy performance is reasonable in comparisonto a conventional approach that relies entirely on a cloudbackend for spatiotemporal storage. The last item in Table 3outlines the storage requirements for 20,000 spatiotemporaldata points, for which the PACO representation required1.24Mb. Devices could insert a new data point every 5seconds for a week and only use about 7.5 Mb of storage,which is very reasonable given today’s device storages of16-32Gb. Note that this leaves plenty of storage for theadditional z context data we also plan to include.

6 USE CASES

In this section, we provide two case studies for demon-strating applications’ use of the PACO service. The first ofthese examples takes an existing application and refactorsit to remove its siloed use of spatiotemporal information toreplace it with the use of PACO’s abstractions instead. Thesecond shows novel application behavior that is enabled bythe existing of PACO. In both uses cases, the device employsPACO’s smart insert method to collect spatiotemporal con-text information, and the device (user) itself manages theinsertion thresholds. To provide more application richness,these example applications also bring back the z componentof application-level context.

6.1 RunningThe first example considers a running application thatcollects location samples for a specified period of time(explicitly started and ended by the user) to collect and

12

report on path information, pacing, and route times. Manysuch applications exist, including the MapMyRun9 familyof applications, the Nike+ app10, or even apps like SpotifyRunning11 that use additional forms of context information(e.g., running tempo) to deliver media content. In all of theseapplications, context collection is a dedicated part of theapp. If a user wants to employ all three of these applicationssimultaneously (which is reasonable, given that they allprovide different capabilities, connections, and statistics),each device maintains its own store of spatiotemporal andcontext information, and each of these applications pushesthat personal context information to a cloud service for pro-cessing and storage. Much of the context-related activitiesof the applications are redundant, as is the cloud offloading,and the latter also releases potentially private information.

In this PACO use case, we therefore show how onewould refactor such an application to make use of PACO’sspatiotemporal context abstractions. To start with, becausePACO continuously collects spatiotemporal context informa-tion, the user no longer has to remember to open each of theapps and explicitly select “begin run.” Instead, the devicepassively and unobtrusively records the context information(based on its settings) for later querying by the applications.That is, based on the context collected by PACO on thedevice, the application could recreate a run post hoc basedon the information stored in the PACO structure. Fig. 10shows code that would be launched as startup of a runningstatistics app (e.g., like mapmyrun) to examine route andpacing information for a previously performed run.

//on app startupContextWindow window =

ContextWindow.Builder().setTimeWindow(lastAppUse, now).setContextMobility(‘‘Running’’).build();

Paco.setGridFactor(0.5);double pok = Paco.windowPoK(window);if(pok > 0.1){

List<ContextPoint> points = Paco.findPath(window);// display map view and/or share running profile

}

Fig. 10. Querying for a activity in a running app.

As shown in Fig. 10, when the user opens the app, it canaccess relevant context pieces from PACO’s historical record.The app first creates a context window with rough precision.PACO’s configurability allows for a lossy representation ofPoK for this query since the exact value is not important,and a general indication above some base threshold (0.1)suffices. This allows for a fast qualifying query before theapp queries for the exact path information. Once the latter isobtained, the path can be displayed to the user or offloadedto the cloud for use across devices or for archiving. The usercan also allow this application access to specific contextualdata. For instance, the application may be only allowed tooffload context that the user has explicitly provisioned, inthis case spatiotemporal data with a mobility profile. Insteadof offloading the user’s entire trajectory, however, the appcan offload exactly only the data from the running activity

9. https://www.mapmyrun.com10. http://www.nike.com/us/en us/c/nike-plus/running-app-gps11. https://www.spotify.com/running

and then only at a level of abstraction matching a statedaccess profile.

Given that, for example, accelerometer data is part ofthe z context collected by the device and associated withthe spatiotemporal samples, a more sophisticated runningstatistics app can give the user historical information aboutrunning tempo throughout the entire path. Similarly, anexecuting app like a tempo-sensitive music playlist coulduse the information from PACO as it is collected to adjustthe songs selected as part of the playlist. That is, PACOsupports both applications’ needs for instantaneous contextinformation and the ability to query context informationretrospectively.

6.2 ContextChatTo demonstrate PACO’s ability to support next-generationapplications by enabling new uses of context, we turn toa ContextChat example, which connects messages betweenusers not only co-located in time and space but across othercontextual attributes, for instance based on their exercisehistory, their dining patterns, or the spending profiles. Todemonstrate the types of interactions that such an applica-tion might have with PACO, we consider a use case in whichthe application aims to send chat messages between driverswho have similar driving experiences. In this example, theselection of similar drivers is based on their spatiotem-porally captured paths while engaged in the higher levelcontext activity of “DRIVING.” However, using even moreexpressive notions of context (i.e., z), the selection of similardrivers could also be based on other context measures, forinstance, of a driver’s level of aggressiveness.

Fig. 11 demonstrates how this application would capturethe similarity between the drivers (i.e., as measured by PoK)within given space and time bounds. For demonstrationpurposes, this example uses a static query, but more ad-vanced queries could query over dynamic properties deter-mined over user experience.

SpaceWindow spaceWin = SpaceWindow.bound(currentLoc, 100);//100 meter bound

ContextWindow window = ContextWindow.Builder().setTimeWindow(now - Time.10MINS, now).setSpaceWindow(spaceWin).setContextMobility(ContextMobility.Driving);

Paco.setGridFactor(2.0);double pok = Paco.windowPoK(window);//retrieve messagesList<Message> messages =

CloudBackend.GetMessages(userId, window, pok);

//send a messageCloudBackend.SendMessage(userId, window, pok, message);

Fig. 11. Sending and receiving messages in the ContexChat app.

First, the app creates a space window for a 100 meterperimeter around the user’s current location. Using thisspace window, relevant time, and driving context, the appcreates a contextual query window. The precision is setto fine since the actual PoK value is important. The PoKvalue and the query are sent to a backend chat service thatretrieves any messages sent by other drivers that match the

https://www.mapmyrun.com

http://www.nike.com/us/en_us/c/nike-plus/running-app-gps

https://www.spotify.com/running

13

given context. No specific contextual data is sent to thesiloed cloud backend, simply a region and a configurablylossy PoK value representing coverage for a contextualquery. This information is abstracted but is sufficient for abackend to index and query to send messages to appropriateusers. While this example uses a centralized cloud backendto exchange the chat messages, PACO is also compatiblewith D2D uses. Devices could share context windows, re-questing data queries from nearby users directly to gaininformation about their surroundings, what they are doing,and where they are heading.

7 DISCUSSION

We have outlined how PACO provides novel types of spa-tiotemporal queries while intelligently limiting the footprintof the structure on resource constrained mobile devicesthrough the smart insert operation. While checking theusefulness of the data to PACO before inserting it is agood first step, extension of PACO could use a context-aware location sampling technique such as only samplingdetailed location when an accelerometer indicates signifi-cant movement or by inferring expensive location samplingfrom cheaper sensors such as in ACE [18]. Such a step,performed before even considering whether a sample isa candidate for insertion in the PACO data store, has thepotential to provide energy savings on resource constrainedmobile devices. Furthermore, PACO enables control for thelossiness of the stored structure as well as lossiness in queryresults. The lossiness of the query results can be dynami-cally tuned per-application, but the lossiness of the storedstructure must match the greatest accuracy requirement ofall the applications. Going forward, PACO might supportearly subscription of accuracy requirements to be able totune the lossiness of the stored structure. On the other hand,because PACO brings all of the spatiotemporal collectionunder one roof, each spatiotemporal data point is collectedonly once. This is in contrast to conventional approachesin which multiple applications may be collecting the samespatiotemporal data independently, which is also potentiallywasteful.

The work presented in this paper is a necessary first stepin giving mobile users control over vast amounts of per-sonal contextual data that is collected about them withoutunnecessarily limiting the expressiveness of the queries asexposed to applications. Future work will look to developa contextual ontology to couple with PACO’s current spa-tiotemporal indexing, further leverage the internal R-Treestructure to provide faster data region summaries, and tobuild a system of sharing summaries with other devicesor determining appropriate cloud offloading to supplementPACO.

8 CONCLUSIONS

The key contribution of this paper is to make it possible toentirely on-load storage and querying of users’ potentiallyhighly personal (and private) spatiotemporal context infor-mation. To our knowledge, we are the first to enable suchon-loading while providing a detailed historical view ofspatiotemporal context to applications. Motivated by real-world applications, we introduced PACO, a programming

abstraction that enables efficient on-loading of contextualdata storage and querying by indexing crucial spatiotem-poral components. We placed PACO in a broader systemcontext, demonstrating its ability to support D2D exchangeas well as cloud offloading while maintaining user controlover potentially sensitive data. We evaluated PACO overa variety of parameters and compared it to cloud basedalternatives, and ultimately demonstrated the efficiency andfeasibility of including PACO on modern devices. Not onlydoes PACO give users control of their own spatiotemporallytagged context information, but, by making spatiotemporalcontext storage and querying a system service, PACO breaksdown the walls between today’s existing siloed applications,sharing collected context information among all applica-tions on the device.

As demonstrated in this paper, our implementation ofPACO enables on-loading contextual storage, giving usersdirect control over potentially private information and en-abling low-latency responses to spatiotemporally indexedcontextual queries for a wide variety of applications.

ACKNOWLEDGMENTS

This material is based upon work supported by the NationalScience Foundation under Grant No. CNS-1218232. Anyopinions, findings, and conclusions or recommendationsexpressed in this material are those of the author(s) anddo not necessarily reflect the views of the National ScienceFoundation.

REFERENCES

[1] J. Michel and C. Julien, “From human mobility to data mobility:Leveraging spatiotemporal history in device-to-device informa-tion diffusion,” in Proceedings of the IEEE International Conferenceon Mobile Data Management, June 2016.

[2] H. Almuhimedi, F. Schaub, N. Sadeh, I. Adjerid, A. Acquisti,J. Gluck, L. F. Cranor, and Y. Agarwal, “Your location has beenshared 5,398 times!: A field study on mobile app privacy nudg-ing,” in Proceedings of the 33rd annual ACM conference on humanfactors in computing systems. ACM, 2015, pp. 787–796.

[3] M. Kjaergaard, S. Bhattacharya, H. Blunck, and P. Nurmi, “Energy-efficient trajectory tracking for mobile devices,” in Proceedingsof 9th International Conference on Mobile Systems, Applications, andServices, June 2011, pp. 307–320.

[4] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The casefor VM-based cloudlets in mobile computing,” IEEE Journal ofPervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.

[5] Q. Jones, S. Grandhi, S. Karam, S. Whittaker, C. Zhou, andL. Terveen, “Geographic place and communication informationpreferences,” Computer Supported Cooperative Work, vol. 17, no. 2-3,pp. 137–167, April 2008.

[6] A. Guttman, “R-trees: A dynamic index structure for spatialsearching,” in Proceedings of the 1984 SIGMOD International Con-ference on Management of Data, 1984.

[7] T. Kalbarczyk and C. Julien, “XD (Exchange-Deliver): A middle-ware for developing device-to-device applications,” in Proceedingsof the ACM/IEEE International Conference on Mobile Software Engi-neering and Systems, May 2016.

[8] C. Bettini, O. Brdiczka, K. Henricksen, J. Indulska, D. Nicklas,A. Ranganathan, and D. Riboni, “A survey of context modellingand reasoning techniques,” Pervasive and Mobile Computing, vol. 6,no. 2, pp. 161–180, April 2010.

[9] X. Wang, D. Zhang, T. Gu, and H. Pung, “Ontology based contextmodeling and reasoning using OWL,” in Proceedings of the 2nd

IEEE Annual Conference on Pervasive Computing and CommunicationsWorkshops, 2004, pp. 18–22.

14

[10] J. Biagioni, T. Gerlich, T. Merrifield, and J. Eriksson, “EasyTracker:Automatic transit tracking, mapping, and arrival time predictionusing smartphones,” in Proceedings of the 9th ACM Conference onEmbedded Networked Sensor Systems, 2011, pp. 68–81.

[11] U. Blanke, T. Franke, G. Troster, and P. Lukowicz, “Capturingcrowd dynamics at large scale events using participatory gps-localization,” in Proceedings of the 2014 IEEE 9th InternationalConference on Intelligent Sensors, Sensor Networks, and InformationProcessing, April 2014, pp. 1–7.

[12] T. Yan, D. Chu, D. Ganesan, A. Kansal, and J. Liu, “Fast applaunching for mobile devices using predictive user context,” inProceedings of the 10th International Conference on Mobile Systems,Applications, and Services, June 2012, pp. 113–126.

[13] N. Sadeh, J. Hong, L. Cranor, I. Fette, P. Kelley, M. Prabaker, andJ. Rao, “Understanding and capturing people’s privacy policies ina mobile social networking application,” Personal and UbiquitousComputing Journal, vol. 13, no. 9, pp. 401–412, 2009.

[14] S. Guha, M. Jain, and V. Padmanabhan, “Koi: A location-privacyplatform for smartphone apps,” in Proceedings of the 9th USENIXConference on Networked Systems Design and Implementation, 2012.

[15] E. Toch, J. Cranshaw, P. Hankes-Drielsma, J. Springfield, P. Kelley,L. Cranor, J. Hong, and N. Sadeh, “Locaccino: A privacy-centriclocation sharing application,” in Proceedings of the 12th ACMInternational Conference Adjunct Papers on Ubiquitous Computing,2010, pp. 381–382.

[16] S. Han and M. Philipose, “The case for onloading continuoushigh-datarate perception to the phone,” in Proceedings of the 14th

Workshop on Hot Topics in Operating Systems, 2013.[17] N. Vallina-Rodriguez, V. Erramilli, Y. Grunenberger, L. Gyarmati,

N. Laoutaris, R. Stanojevic, and K. Papagiannaki, “When Davidhelps Goliath: The case for 3G onloading,” in Proceedings of the11th ACM Workshop on Hot Topics in Networks, 2012, pp. 85–90.

[18] S. Nath, “ACE: Exploiting correlation for energy-efficient andcontinuous context sensing,” in Proceedings of the 10th InternationalConference on Mobile Systems, Applications, and Services, June 2012,pp. 29–42.

[19] S. Kang, J. Lee, H. Jang, H. Lee, Y. Lee, S. Park, T. Park, andJ. Song, “SeeMon: Scalable and energy-efficient context monitoringframework for sensor-rich mobile environments,” in Proceedings ofthe 6th International Conference on Mobile Systems, Applications, andServices, June 2008, pp. 267–280.

[20] A. Kansal, S. Saponas, A. Brush, T. Mytkiwocz, and R. Ziola,“The latency, accuracy, and battery (LAB) abstraction: Programmerproductivity and energy efficiency for mobile context sensing,” inProceedings of the 2013 ACM SIGPLAN International Conference onObject Oriented Programming Systems, Languages, and Applications,October–November 2013, pp. 661–676.

[21] K. Rachuri, M. Musolesi, and C. Mascolo, “Energy-accuracy trade-offs of sensor sampling in smart phone based sensing systems,”Mobile Context Awareness, pp. 65–76, 2012.

[22] Z. Zhuang, K.-H. Kim, and J. Singh, “Improving energy efficiencyof location sensing on smartphones,” in Proceedings of the 8th

International Conference on Mobile Systems, Applications, and Services,June 2010, pp. 315–330.

[23] Y. Lee, Y. Ju, C. Min, S. Kang, I. Hwang, and J. Song, “CoMon:Cooperative ambience monitoring platform with continuity andbenefit awareness,” in Proceedings of the 10th International Confer-ence on Mobile Systems, Applications, and Services, June 2012, pp.43–56.

[24] V. Gaede and O. Gunther, “Multidimensional access methods,”ACM Computing Surveys, vol. 30, no. 2, pp. 170–231, 1998.

[25] N. Beckmann, H.-P. Kriegel, R. S. B., and Seeger, “The R*-tree: anefficient and robust access method for points and rectangles,” inProceedings of the 1990 ACM SIGMOD International Conference onManagement of Data, 1990, pp. 322–331.

[26] J. Bentley, “Multidimensional binary search trees used for asso-ciative searching,” Communications of the ACM, vol. 18, no. 9, pp.509–517, 1975.

[27] M. Erwig, R. Gu, M. Schneider, and M. Vazirgiannis, “Spatio-temporal data types: An approach to modeling and queryingmoving objects in databases,” GeoInformatica, vol. 3, no. 3, pp. 269–296, 1999.

[28] R. Ganti, M. Srivatsa, D. Agrawal, P. Zerfos, and J. Ortiz, “Mp-trie: Fast spatial queries on moving objects,” in Proceedings ofthe Industrial Track of the 17th International Middleware Conference.ACM, 2016, p. 1.

[29] A. Tansel, “Temporal databases,” in Wiley Encyclopedia of ComputerScience and Engineering, 2008, pp. 1–7.

[30] M. Mokbel, T. Ghanem, and W. Aref, “Spatio-temporal accessmethods,” IEEE Data Engineering Bulletin, vol. 26, no. 2, pp. 40–49, 2003.

[31] J. Sun, D. Papadias, Y. Tao, and B. Liu, “Querying about thepast, the present, and the future in spatio-temporal databases,” inProceedings of the 20th International Conference on Data Engineering,March–April 2004, pp. 202–213.

[32] H. Cao, O. Wolfson, and G. Trajcevski, “Spatio-temporal datareduction with deterministic error bounds,” VLDB Journal, vol. 15,no. 3, pp. 211–228, 2006.

[33] P. Cudre-Mauroux, E. Wu, and S. Madden, “TrajStore: An adaptivestorage system for very large trajectory data sets,” in Proceedingsof the 26th IEEE International Conference on Data Engineering, March2010, pp. 109–120.

[34] R. Lange, F. Durr, and K. Rothermel, “Online trajectory data reduc-tion using connection preserving dead reckoning,” in Proceedingsof the 5th Annual International Conference on Mobile and UbiquitousSystems: Computing, Networking, and Services, 2008.

[35] H. G. Kayacik, M. Just, L. Baillie, D. Aspinall, and N. Micallef,“Data driven authentication: On the effectiveness of user be-haviour modelling with mobile device sensors,” arXiv preprintarXiv:1410.7743, 2014.

[36] I. Rhee, M. Shin, S. Hong, K. Lee, S. Kim, and S. Chong,“NCSU - KAIST mobility models,” http://www.crawdad.org/ncsu/mobilitymodels/.

[37] M. Piorkowski, N. Sarafijanovic-Djukic, and M. Grossglauser,“Epfl mobility dataset,” http://www.crawdad.org/epfl/mobility/.

[38] J. Nielsen, Usability engineering. Elsevier, 1994.

Nathaniel Wendt. Nathaniel is a graduate stu-dent at the University of Texas at Austin re-searching applications of mobile context forpreserving privacy, enabling better monitoringand learning in mobile platforms and the IoT.Nathaniel primarily focuses on middleware thatleverages mobile context to enable simplifiedand more intelligent application programming inpervasive computing environments.

Christine Julien. Dr. Julien is a professor inthe Department of Electrical and Computer En-gineering at the University of Texas at Austin.Her research focuses on the intersection of soft-ware engineering and dynamic, unpredictablenetworked environments and develops models,abstractions, tools, and middleware whose goalsare to ease the software engineering burdenassociated with building applications for perva-sive and mobile computing environments. Dr.Julien’s research has been supported by the

National Science Foundation (NSF), the Air Force Office of ScientificResearch (AFOSR), the Department of Defense, Freescale Semicon-ductors, Google, and Samsung, and the results have appeared in manypeer reviewed journal and conference papers. Dr. Julien graduated witha D.Sc. in 2004 from Washington University in Saint Louis and earnedthe M.S. degree in 2003 and the B.S. with majors in Computer Scienceand Biology in 2000 (both also from Wash. U.).

http://www.crawdad.org/ncsu/mobilitymodels/

http://www.crawdad.org/ncsu/mobilitymodels/

http://www.crawdad.org/epfl/mobility/

http://www.crawdad.org/epfl/mobility/

1 paco: a system-level abstraction for on-loading ... · 1 paco: a system-level abstraction for...

Documents