university of bath...90 gupta et al.’s electrisense [16] – which uses electromagnetic...

Citation for published version:Lovett, T, Lee, J, Gabe-Thomas, E, Natarajan, S, Brown, M, Padget, J & Coley, D 2016, 'Designing sensor setsfor capturing energy events in buildings', Building and Environment, vol. 110, pp. 11-22.https://doi.org/10.1016/j.buildenv.2016.09.004

DOI:10.1016/j.buildenv.2016.09.004

Publication date:2016

Document VersionPeer reviewed version

Link to publication

University of Bath

Alternative formatsIf you require this document in an alternative format, please contact:[email protected]

General rightsCopyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright ownersand it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

Take down policyIf you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediatelyand investigate your claim.

Download date: 30. May. 2021

https://doi.org/10.1016/j.buildenv.2016.09.004https://doi.org/10.1016/j.buildenv.2016.09.004https://researchportal.bath.ac.uk/en/publications/designing-sensor-sets-for-capturing-energy-events-in-buildings(b41cfc3f-d301-420f-b1b9-bc69a252fc60).html

Designing Sensor Sets for Capturing Energy Events inBuildings

Tom Lovetta, JeeHang Leea,∗, Elizabeth Gabe-Thomasb, Sukumar Natarajana,Matthew Brownc, Julian Padgetc, David Coleya

aDepartment of Architecture and Civil EngineeringUniversity of Bath, Bath, UK, BA2 7AY

bDepartment of PsychologyUniversity of Bath, Bath, UK, BA2 7AY

cDepartment of Computer ScienceUniversity of Bath, Bath, UK, BA2 7AY

Abstract

There is a growing desire to measure the operational performance of buildings – often

many buildings simultaneously – but the cost of sensors and complexity of deployment

is a significant constraint. In this paper, we present an approach to minimising the cost

of sensing by recognising that researchers are often not interested in the raw data itself

but rather some inferred performance metric (e.g. high CO2 levels may indicate poor

ventilation). We cast the problem as one of constrained optimisation – specifically, as

a bounded knapsack problem (BKP) – to choose the best sensors for the set given each

sensor’s predictive value and cost. Training data is obtained from a field study com-

prising a wide range of possible sensors from which a minimum set can be extracted.

We validate the method using reliable self-reported event diaries as a measure of ac-

tual performance. Results show that the method produces sensors sets that are good

predictors of performance and the optimal sets vary substantially with the constraint

parameters. Furthermore, valuable yet expensive sensors are often not chosen in the

optimal set due to strong co-incidence of sensor signals. For example, light level and

sound level often increase at the same time. The overall implication of the work is

∗Corresponding authorEmail addresses: [email protected] (Tom Lovett), [email protected] (JeeHang

Lee), [email protected] (Elizabeth Gabe-Thomas), [email protected](Sukumar Natarajan), [email protected] (Matthew Brown), [email protected] (JulianPadget), [email protected] (David Coley)

Preprint submitted to Journal of Building and Environment September 1, 2016

that a large number of co-incident low-cost sensors can be used to build up a picture

of building performance, without significantly compromising information content, and

this could have major benefits for the smart metering industry.

Keywords: Energy use, sensing, intelligence, interaction, ENLITEN

1. Introduction

The reduction of energy use in buildings has become a major challenge for re-

searchers across multiple fields. The UK government has committed to 80% reduc-

tions in carbon emissions by 2020 [1], and a large proportion of these emissions stem

from the operation and use of buildings [2]. Building energy efficiency aside, it is the5

occupants and their energy-related behaviour within the buildings that are a critical and

complex factor in overall energy use [3].

To tackle the problem of energy usage reduction in buildings, researchers have used

sensing technology to capture and analyse buildings’ energy use so that efficiency can

be improved and methods of lowering energy demand can be explored, e.g. through10

changing occupants’ energy-related behaviour. The first step in enabling behavioural

change is the gathering and sensing of pertinent data. As such, key questions emerge

about how best to approach energy sensing: what sensors should we use? How many

do we need? How intrusive and costly is the installation? Direct energy sensing with

electricity and gas sensors is commonplace [4, 5, 6], but direct sensing alone does not15

account for total energy use, nor does it allow for non-trivial analyses of the often

individualistic causal factors involved in energy consumption.

It is therefore important to look at the more abstract notion of energy events within

buildings. Rather than monitoring how often a kettle is used, it may be more useful

to monitor the events that involve kettle use, e.g. making breakfast, which could also20

comprise of other energy-consuming activities, e.g. using the hob or opening a window.

In order to be able to infer these events accurately, we need to capture the right data,

which means that we need to deploy the right sensors in the right locations around the

building. This alone is a non-trivial problem due to various factors such as health and

safety, aesthetics (the best functional position for a sensor may not be the ideal position25

2

aesthetically), power supply reach and – if the sensors are part of a sensor network –

connectivity and range.

On top of this, there are cost installation issues. System designers often have fixed

budgets and, if meaningful data is to be gathered, a sizeable number of buildings may

need to be considered for sensor installation.30

There are two key questions here: first, what are the “right” sensors for capturing

energy events in a building, and how do we measure their value? Second, given this

measure, what is the best sensor set for capturing such events in a building given certain

constraints, e.g. budgetary and deployment constraints?

There are two key contributions in this paper:35

• A method for assigning a value to a sensor in terms of its utility in capturing

human activities that involve energy consumption in a building.

• A method for the selection of maximal value sensor sets subject to practical

constraints such as budget and sensor quantities.

We compute a value metric for a given sensor in the context of a given deployment40

based upon a data set collected from a field study of domestic buildings in the UK. The

study starts from the premise that by “over-sensing” a building, it becomes possible to

identify the subset of readings, and thereby the sensors, that are necessary to capture

the energy and occupant events that characterise the building’s use. We encode a sensor

value from an aggregate measure of feature value, as output by random forest feature45

selection methods. We then combine these values with monetary cost and model the

resulting integer linear programming problem as a knapsack problem which, although

NP hard, can be solved in pseudo-linear time (O(nW )). We present some example

sensor sets from our field study as budgetary and limit parameters vary, and illustrate

how predictive certain sensors are – notably CO2 and light level sensors – from others.50

The outputs from this analysis allow the designers of energy sensing systems to

determine the predictive values for each sensor in a candidate design set, and to choose

sensor sets of maximal predictive value given budgetary and deployment constraints.

The rest of the paper is organised as follows: first, we review and contrast prior

work in building energy sensing and sensor selection. Then we outline our high level55

3

approach to sensor selection using random forests to estimate sensor value, and a

bounded knapsack algorithm to choose the sensor sets based upon a range of con-

straints. We then describe our field study in UK homes before finally discussing the

implications and limitations of the presented work.

2. Related Work60

2.1. Capturing Energy Events in Buildings

The use of technology to sense, infer and predict energy use in buildings has be-

come increasingly popular as demand for energy efficiency rises. As such, it is a broad

field, with different disciplines focusing on many areas of energy use in buildings, from

appliance and HVAC usage [7, 8] to occupants’ behaviour [9] and responses to energy65

feedback [10, 11].

There is a recognised strong correspondence between the actions of building oc-

cupants and energy use [9]. As a consequence, there has been a focus on occupant

activity recognition in relation to monitoring energy consumption and improving en-

ergy efficiency. Much indoor activity recognition is concerned with the inference of70

general activities, e.g. whether the occupants are sleeping, but our objective is cap-

turing particular activities that consume energy. Prior work in this area ranges from

direct sensing to higher level inference and automation [12]. In [13], Milenkovic and

Amft focus on energy activities in an office space. By using a hidden Markov model

(HMM) which received inputs from passive infra-red (PIR) motion sensor, they were75

able to predict desk-based work to a high degree of accuracy, with simulation results

predicting ≈ 20% energy savings if control systems used these data. Similarly, PIR

sensors are used for improving energy management through occupancy classification

by Agarwal et al. [12] who, through simulation, show that potential energy savings of

up to 15% may be achieved by integrating occupancy detection into building energy80

management systems.

Depsite the correlation between energy use and occupant action [9], much of the lit-

erature focuses on occupancy detection with hardly any consideration of the occupants’

effect on energy use. Studies into occupancy detection do tend to cite energy efficiency

4

as a motivating application, but concentrate on the performance of the occupancy detec-85

tor [14]. In [15], Patel et al. use HVAC air pressure sensors to infer occupancy as well

as door and window opening/closing events. Notable domestic sensing work that fo-

cuses more on energy use rather than occupancy includes Cohn et al.’s GasSense [4] –

which uses the sound of domestic gas relief valves to measure gas events in the home –

Gupta et al.’s ElectriSense [16] – which uses electromagnetic interference (EMI) sig-90

natures to monitor appliance electricity use – and Froehlich et al.’s HydroSense [17],

which classifies water usage events through pressure changes.

Our work focuses on more than just occupancy detection; rather we concentrate

on sensing energy events i.e. human activity involving energy consumption. Similar

studies tend to focus on atomic energy events, e.g. what appliances are being used [5],95

but we consider more abstract events such as “preparing food”, which can incorporate

multiple atomic events; often concurrently. This aligns with the idea that occupant

behaviours have strong relationships with energy efficiency [3].

Attempting to recognise more abstract events comprised of multiple directly de-

tectable events is an approach that has been used previously by Wilke et al. [18] to100

model real-time occupancy in buildings. Our work is similar, in that we too use the

Multinational Time Use Survey to determine interesting events [19]. Again, however,

our focus is on events that consume energy, rather than those that determine occupancy.

There is an increasing industrial demand for energy sensing and occupant behaviour

learning with a view to saving energy. Commercial systems such as NEST1 – which105

uses a variety of environmental sensors – are popular, although they do require occu-

pant training and the intelligent features have suffered from usability issues [6].

Finally, energy sensing in buildings is typically performed through direct sensing

of electricity use through whole-building and plug-load electrical sensing [7], disag-

gregation of appliance use from electrical sensing traces [5, 16] and direct gas use110

sensing [4]. However, comparatively few studies have considered deploying environ-

mental sensors to infer energy use; mainly because these sensors are not designed for

direct energy measurement. There is potential predictive value in using environmental

1http://www.nest.com

5

http://www.nest.com

sensors in conjunction with direct energy sensors, and our work in this paper concen-

trates on measuring this predictive value prior to selecting the appropriate sensors for115

the application.

2.2. Sensor Selection Approaches

The goal of sensor selection is to choose from an existing set of sensor inputs in

order to maximise some objective function or parameter [20, 21]. Part of our contribu-

tion in this paper is the derivation of sensor “value” in terms of its utility in capturing120

energy events. In contrast to other sensor selection approaches, we are concerned with

the more practical problem of sensor selection a priori, i.e. choosing the sensor set

design prior to deployment in an application given the practical constraints in doing so,

rather than choosing the best measurement from a pool of existing sensors.

The closest work to the study in this paper is Zhang et al.’s study of feature selection125

for occupancy classification in office spaces [22]. Here, the authors explore the relative

information gain – or uncertainty coefficients – as a value measure for a small range

of sensors using intermittent ground truth gathered in an office environment. We use

a different measure of sensor value in a domestic environment, but our results broadly

support Zhang et al.’s, which show that sound and CO2 sensors appear to be the most130

effective at detection; albeit for energy events in ours, and occupancy events in theirs.

By incorporating sensor costs, however, we show that these sensors are not always the

best ones to choose for maximising sensor value given a set of constraints. In brief, Response to

R3

Response to

R3the proposal in [22] pursues “the best sensor sets” for the occupancy detection in an

office setting, based only on sensor values, and without consideration of the constraints135

(e.g. each sensor cost which affects the total cost of the sensor sets). Our approach

instead attempts to consider the combination of sensor values, constraints, and sensor

redundancy 2 in order to find out the best sensor set that meets the cost constraints.

For example, a CO2 sensor could be the most significant for the desired detection (for

both energy event and occupancy), but the unit cost is relatively high. If we have a cost140

constraint on the final choice, our approach could suggest other alternatives, by consid-

2See Section 3.5 for more details.

6

ering both the cost and the efficiency, such that when the cost of a CO2 sensor exceeds

the available budget, a cheaper sensor or sensor set (e.g. temperature and PIR sensors)

could be suggested as an alternative proposal that satisfies the cost constraint as well

as efficiency of energy event detection.145

Our use of knapsack algorithms, or integer linear programming in general, is not

new, although the application to sensor selection for energy event capture, as far as

we are aware, is. The use of knapsack algorithms has previously been applied to the

domain of sensing, typically for time-dependent resource usage. In [21], Joshi and

Boyd use convex optimisation to develop a heuristic approach that approximates sen-150

sor subsets for minimising the error of parameter estimation. Godrich et al. directly use

the knapsack problem to formulate optimal configurations for radar architectures [23],

and Bian et al.[20] use a more general form of linear programming to select a subset of

sensors from a theoretical global set based upon maximum utility. Here, utility is some-

what abstract, although the authors do give an example of expected variance reduction155

in average sensor measurements, i.e. the usefulness of a sensor is its accuracy.

In summary, our work seeks to aid in the a priori choice of sensors for capturing

energy events in buildings. By combining work in sensor selection problems with the

field of energy sensing, we present an original design approach.

3. Sensor Selection and Study Design160

In this section, we describe our method for designing sensor sets to capture energy

events in buildings. We first define the problem statement in greater detail, before

describing the general design approach of assigning a value measure to each sensor and

choosing sets using a BKP algorithm. We also outline our means of measuring sensor

redundancy, i.e. the amount each sensor can be predicted from others, and detail the165

methodology of our field study.

3.1. Problem Statement

Our general problem statement is: what is the best sensor set design for capturing

energy events in a particular building? The “best” sensor set needs a more concrete

7

definition however, and the best set is unlikely to be the same across all building types.170

The best set for a single building with stringent accuracy requirements will not be the

best set for a large deployment with limited budgetary requirements. Rather than define

a global “best set”, we give an approach to determining the best set given contextual

parameters, e.g. budget and scale of deployment.

Thus, we refine the problem statement to be: what then is the best sensor set design175

for capturing energy events in a particular building for a given set of parameters? In

this case, the best set is one that maximises the information required for event capture,

whilst meeting the cost requirements of deployment. We can set this up as a con-

strained optimisation problem, which requires that each sensor have a measure of cost

and value, and solves the maximum value achievable given the cost constraints. With180

these cost and value measures, the constrained optimisation problem becomes a form

of the famous knapsack problem [24], which can be solved in pseudo-linear time using

dynamic programming.

Thus, the key problem is not so much the optimisation process, but the determina-

tion of sensor cost and value. Cost may typically be simply defined as the financial185

cost (but see §3.4 for discussion of other factors), so it is sensor value that is the key

measure to define. In the next section, we formally outline the constrained optimisation

problem, before detailing our approach for calculating sensor values.

3.2. Constrained Optimisation: The Knapsack Problem

The knapsack problem is a simple integer linear program that seeks to find the190

optimal combination of n distinct items that maximises the total value of a weight-

constrained knapsack, given that each item has a value and a weight. More formally,

given n distinct items, where each item i has a corresponding value vi, number of

copies xi and weight wi, and an overall weight constraint W , the knapsack problem

seeks to:195

maximise:n∑

i=1

vixi (1)

8

subject to:n∑

i=1

wixi ≤W

xi ∈ {0, . . . , ci}

(2)

where ci is an upper bound on the number of copies of each item. ci could be viewed

as a sensor quantity limit, e.g. a stock limit. The above problem is a bounded knapsack

problem (BKP), which does not restrict the items in the knapsack to one copy each; as

is the case for the 0-1 knapsack problem (KP). The BKP can be solved by reduction

to a KP, allowing a dynamic programming solution in O(nW logW ) [24] or O(nW )200

[25].

Thus, we can apply the knapsack problem to the problem of designing sensor sets Response to

R1

Response to

R1for energy event capture in buildings. Instead of items, we have sensors with a measure

of predictive value for capturing energy events, and instead of weights, we have a mea-

sure of cost.Within this context: (i) n distinct items correspond to a number of distinct205

sensors that our study considers (e.g. Temperature, Humidity, Light, Sound, CO2 etc),

(ii) each item i corresponds to each sensor with vi denoting the predictive value of

the sensor i, (iii) number of copies xi is a quantity of the sensor i, (iv) the weight wi

corresponds to the (financial) cost of the sensor i, and (v) the overall weight constraint

W corresponds to the budget i.e cost requirements of deployment. Our final knapsack210

is the chosen sensor set that maximises Equation 1 with the number of sensors (xi) and

their predictive value (vi), and the weight constraint (W ) is a budget over some cost

measure which is specified in Equation 2.

Thus, for each sensor, we need to determine:

• Value (vi): A measure of each sensor’s value, in terms of its response to energy215

events in the building.

• Cost (wi): An applicable measure of cost, or “weight” in the knapsack problem.

Given values for vi and wi, a quatity xi of the sensor i can be installed at xi locations

in the domestic buildings.

9

3.3. Defining Sensor Values220

Determining a measure of value for a sensor is context-dependent and potentially

non-trivial. In our case, a more valuable sensor provides better information about en-

ergy events in a building than a less valuable one. For each sensor, we define a number

of features of its raw measurement, and view the problem as a feature selection prob-

lem, i.e. what sensor features are better predictors of energy events in buildings. We225

can then aggregate each feature’s value measure into an overall value measure for the

sensor.

3.3.1. Feature Extraction

Before undertaking feature selection, we must define and calculate the sensor fea-

tures that we wish to measure through feature extraction. This is done because we be-230

lieve that some feature such as first-order difference in sensor measurements or moving

average of sensor measurements will be more strongly predictive of energy events than

the raw measurement alone. The definition of a feature is a free choice for the designer,

and there is no limit to the type or number of features that can be chosen for feature

extraction. Again, this is likely to be context dependent, and we define the features for235

our field study in §3.7.

3.3.2. Feature Selection: Random Forest

To perform feature selection, we use a random forest process on the extracted fea-

tures. A random forest is an ensemble method that combines a set of decision tree

classifiers, each of which is comprised of a random sample of input variables (in our240

case, extracted features). For brevity, we refer the reader to Breiman’s description

of the random forest method for a detailed overview [26]. We use random forests to

measure the value of each extracted sensor feature using the average decrease in node

impurities from splitting the decision trees on that feature.

For this, we use the Gini impurity measure, i.e. the greater the decrease in the Gini245

impurity for the feature variable – averaged over the forest – the more important the

feature variable. Thus, to measure the value of each sensor, we uses the mean Gini

impurity decrease over the features attributable to each sensor, since the inclusion or

10

exclusion of a sensor adds or removes its entire feature set. Moreover, sensor values

are unlikely to be independent, and the mean Gini decrease provides a way to average250

the incremental effect of each sensor in the candidate set. Thus, we use mean Gini

decrease over the sensor’s feature set as the sensor’s value measure in the knapsack

problem.

3.4. Defining Sensor Costs

As with the choice of value measure, the choice of cost measure is likely to be255

context-dependent. An obvious choice is the financial cost of each sensor, but more

complex cost functions could be designed that incorporate, for example, sensor energy

costs, installation effort or sensor reliabilities. In addition to budgetary constraints,

logical constraints can be introduced that restrict the chosen sensor set to particular

subsets of the overall power set (all 2n possible choices of sensor set from n sensors).260

3.5. Sensor Redundancy

Once a sensor set is found according to defined sensor costs and values, design

decisions surrounding the pruning of sensors may be aided through measuring sensor

redundancy, i.e. how much information about a sensor’s value can be predicted from

the others in the set? In [22], Zhang et al. use an information theoretic approach to

select features for occupancy detection using environmental sensors in an office, and

we use a similar approach here for energy events 3. Using the entropy function from Response to

R1 and R3

Response to

R1 and R3information theory for each sensor output:

H(X) =∑x∈X

p(x) log

(1

p(x)

)(3)

where H(X) is an entropy, x is a random variable, and p(x) is the probability of X = Response to

R1

Response to

R1x. In this work, the random variable x corresponds to a sensor feature measured and

derived from one sensor.

3Zhang et al. use information theory to study the correlation between occupancy levels and features

extracted from various environmental measurements [22]. Conversely, we use information theory for the

purpose of identifying the correlation between each pair of sensors in the set i.e. how much information

about the feature extracted from measurements of one sensor can be predicted by that of the other.

11

I(X;Y ) is the mutual information content of variables X and Y :265

I(X;Y ) =∑y∈Y

∑x∈X

p(x, y) log

(p(x, y)

p(x)p(y)

)(4)

where x and y are random variables, p(x) and p(y) are the probability of X = x and Response to

R1

Response to

R1Y = y, respectively, and p(x, y) is the conditional probability of x given y. As noted

previously, the random variable x ∈ X corresponds to the feature of one sensor and the

random variable y ∈ Y corresponds to that of the other. Thus, I(X;Y ), is the mutual

information content (MIC) between two co-present sensors, X and Y , and is a measure270

of the quantity of common information that can be derived from them.

Then, calculate the uncertainty coefficient:

CXY =I(X;Y )

H(Y )(5)

That is, the proportion of bits about sensor feature Y that can be predicted from sensor

feature X .

We calculate the uncertainty coefficient CXY over all sensor feature pairs X × Y275

to explore possible redundancy in sensor set selections. That is, once a sensor set is

chosen, one can use the uncertainty coefficient measures to remove further sensors from

the set if needs be. For example, let us assume that the uncertainty coefficient between Response to

R1

Response to

R1the temperature and the CO2 sensor is high. This suggests that there is high probability

that the information measured and derived from the CO2 sensor can be predicted by280

the temperature sensor, which is to say that the CO2 sensor is redundant in the presence

of a temperature sensor. If the cost of CO2 sensor is much greater than a temperature

sensor, then the sensor set designer may be able to choose a temperature sensor instead

of a CO2 sensor for energy event capture. This is not to suggest that a temperature

sensor would replace the CO2 sensor for physical measurements, but rather that when285

the intention is to detect changes in signals as a proxy for occupancy then either sensor

is likely to provide the same signal at the same strength. This could be done before the

sensor set is chosen, but it would be prudent to observe the sensor’s influence in the

chosen set before considering pruning it based upon redundancy. In our field study, we

12

present the uncertainty coefficients for all co-present sensors (sensors in the same room290

of each home) to illustrate the redundancy in our buildings’ sensors.

3.6. Field Study

In order to demonstrate how a sensor set for capturing energy events can be chosen,

we present the results of a field study in a set of domestic buildings in the UK. We

recruited 4 homes to be studied for the duration of 7 consecutive days in August 2013.295

Details of the homes including house type, number of bedrooms and number of oc- Response to

R2

Response to

R2cupants are summarised in Table 1. The experimental settings including the sensor

placement in each home are shown in Figure 14. Within certain rooms in each home –

each room common to each home – we installed the following sensors:

• Kitchen: Temperature, light, humidity, PIR, CO2 and sound level sensors.300

• Living Room and study: Temperature, light, humidity, PIR, and sound level

sensors.

• Main bedroom and secondary bedroom: Temperature, PIR and sound level

sensors.

Temperature was recorded in ◦C, light in lux, CO2 in ppm, motion in {0, 1} and305

sound level in dB. Each of the room’s sensors were connected to a single Arduino Uno

4Note that the layouts are not exactly the same as the participants’ dwellings, but equivalent to the build-

ing details they provided. We use these examples to show the context for the sensors for the field study.

Home Type Bedrooms Floors Occupants

A Terraced 3 3 2

B Semi-detached 3 2 3

C Detached 4 2 3

D Terraced 4 3 2

Table 1: Descriptions of the homes used in the field study.

13

board, (5 boards per home) which was housed in an acrylic plastic box shown in Fig-

ure 2. The sensors were placed on surfaces such as bookshelves and kitchen counters,

and each of them sampled data at a rate of once per minute. The data were sent to us

remotely over the home’s WiFi connection and simultaneously logged locally to an SD310

card in order to reduce risk of data loss.

To capture a record of ground truth events in each home, we asked the primary

occupant to record energy-related events around the home throughout the week in a

diary study. This was considered appropriate over ethnographic methods as it allows

examination of temporal sequences across an extended time period in a practical and315

accessible manner [27].

To define the energy events, we used Oxford University’s Multinational Time Use

Study (MTUS) data [19], selecting domestic event codes that classify energy-consuming

events around the home, similar to a method used by Wilke et al. [18] to predict build-

ing occupant activities.320

The primary occupant was presented with the list of events in Table 2 as guidance

on the type of events to capture. Then, throughout the duration of the study, the occu-

pant was asked to log as many of them as possible in Google’s calendar application so

that we could capture the event description, its location, i.e. room, and the start and end

times of the event. The occupants were given no restriction on the event description325

text, i.e. although the MTUS data was used as a guide to the type of events to capture,

participants were free to use their own label descriptors.

In addition to these events, participants were asked to record known periods where

homes were unoccupied and no energy events were undertaken. This allowed to us

encode ground truth for each room into a variable with three levels:330

• Known energy event: the occupant logged an energy event.

• Known absence of event: the occupant logged an absence of energy events.

• Nothing recorded: the occupant did not log anything, i.e. ground truth is un-

known.

We dismissed data during which the occupants did not log anything, i.e. the ground335

14

truth was unknown. Although this reduces the size of the dataset for analysis, it is

a manifestation of using self-report methods to capture ground truth. Participants are

unlikely to capture everything, but their behaviour is perhaps more “natural” than if

other methods, e.g. ethnography, were used. The diary study also minimises the risk

of retrospective bias common to other self-report methodology as the recorded events340

were objective and concrete by nature [28]. Furthermore, the neutrality of the events

recorded should minimise social desirability bias.

Ethnographic methods are also time consuming for both the researcher and the

participant, which may compromise on both study validity and the duration of the data

capture. We attempted to minimise participant burden further through the presentation345

of clearly defined event classes (Table 2) [28].

3.7. Extracted Features

For each of the sensors, we calculated the following features:

• Raw value at timestep k: yk

• First order difference: ∆(yk) = yk+1 − yk350

• Second order difference: ∆2(yk) = ∆(yk+1)−∆(yk)

• Simple moving average, over a m minute window:

ȳk =1

m

k∑i=k−m+1

yi (6)

These are similar features to those used by Zhang et al. in [22] in their study of sensor

feature selection for office space occupancy detection.

3.8. Section Summary

In summary, we have outlined our method of sensor selection using a BKP algo-355

rithm. We have also described our measure of a sensor’s value in terms of its utility in

capturing energy events in buildings, as generated from random forests. Furthermore,

we have defined a measure of redundancy within a sensor set using the uncertainty co-

efficient. Following the methodology of our field study, the next section presents the

results of applying the techniques in this section to the data obtained from the study.360

15

4. Results

This section presents the results from our field study and uses the procedure out-

lined in § 3 to identify the best sensor set. We first show the sensor values calculated

using the random forest approach, along with the observed sensor redundancies as

measured by the uncertainty coefficient in Equation 5. We then examine various exam-365

ple sensor sets output from the BKP algorithm using these sensor values and a list of

illustrative costs.

4.1. Configuration Parameters

All sensor data were captured at a sample rate of once per minute. For the random

forest process, our study dataset is split .7 training data, .15 validation data and .15 test370

data. Each forest consists of 500 trees, with 4 variables randomly sampled per split;

no replacement. We used the R package “randomForest” [29] to run the random forest

process with the aforementioned parameters. This package uses Breiman’s approach

[26].

For the BKP, we use Pferschy’s O(nW ) BKP algorithm described in [25]. For375

the probability distributions p(x) in Equation 3, we use implicit probability estimators

from the dataset frequencies. For the moving average feature, we set m – the moving

average window – to 20 minutes for each sensor. The sensor values for the BKP are

set to the mean Gini decrease measures for each sensor. For the sensor costs, we

use the approximate financial cost of the sensors in our study setup, which includes380

the cost of each sensor itself plus a portion of the hardware required to acquire data

from it remotely, e.g. CPUs and WiFi hardware. We must stress that this measure is

illustrative for the purposes of demonstrating our sensor selection process, and should

not be viewed as a standalone measure (unlike the sensor value measure) – the costs are

financially realistic at the time of writing, but obviously varies across manufacturers,385

suppliers, time and market. The costs for the sensors are as follows: 215 for CO2, 20

for humidity, 16 for light, 115 for sound and 17 for temperature.

Figure 3 shows a plot of the raw sensor data from Home 1’s kitchen over the du- Response to

R2

Response to

R2ration of the study. Figure 4 is an example of an energy event recorded on Aug 06,

16

2013 by Home 1, retrieved from Google’s calendar application. These participant-390

recorded energy events are encoded into three ‘event’ states seen at the bottom row,

entitled ‘Event’, in Figure 3: (i) 1 corresponds to a participant-recorded energy event

in this particular room, a kitchen (e.g. Food in Kitchen, Coffee in Kitchen, Cooking

in Kitchen, Dishwasher in Kitchen), (ii) 0 corresponds to a participant-recorded “non-

event” and (iii) NA corresponds to no record. Here is an example. As seen in Figure 4,395

5 energy events are recorded by Home 1’s participant. Since there is no PIR event ob-

served in the morning, only 4 events in the afternoon are encoded as 1 representing an

energy event in the kitchen on Aug 06, 2013.

The study participants logged 392 events in total over the 7 days (A = 119, B = 59,

C = 77, D = 137).400

4.2. Sensor Values

Figure 5 shows the top 10 ranked feature set as output from the random forest

process using the mean Gini impurity decrease as a value measure. Figure 6 shows the

mean Gini impurity decrease for each sensor, averaged over the sensor’s features.

Figure 7 shows the uncertainty coefficients of each sensor’s raw measurement rel-405

ative to the others, i.e. the approximate proportion of bits that can be predicted about

sensor j from sensor i. Note, this is only calculated using sensors that are co-present,

i.e. sensors that are located on the same Arduino board in the same room of each study

home.

4.3. Optimal Sensor Sets410

Figure 8 shows a set of example sensor sets output by the BKP algorithm for given

weight constraints (W ) and upper bounds on the sensor quantities ci. The values are

the mean Gini impurity decrease measures in Figure 6, and the costs are described in

Section 4.1 above.

5. Discussion415

This section discusses the results and their implications and limitations for energy

sensing in buildings. The two key outputs from our work are (i) a quantitative measure

17

of sensor “value” as a predictor of energy events; and (ii) an approach for designing

sensor sets for energy sensing in buildings based upon values and a measure of cost.

5.1. Implications420

The first implication of this work relates to the utilisation of environmental sen-

sors as predictors of energy events in buildings. The sensors in our study are designed

to measure a particular environmental property, e.g. temperature, rather than direct

energy use – something that devices such as current clamps attached to electricity me-

ters and plug power monitors do. The sensor values show that temperature, humidity,425

light, CO2, sound and motion sensors are useful predictors of energy use, though their

predictive values do vary both across sensors and between homes.

By combining these values with costs – in our case, financial costs – it is interesting

to note that some of the more valuable sensors, e.g. CO2 and sound, are not often

included in the design sets output by the BKP solver (see Figure 8). Clearly this is430

because the building’s sensing value can be maximised by using multiple low-cost,

less valuable sensors rather than fewer high-cost, more valuable ones.

Other interesting results include the comparatively low Gini measure for the PIR

motion sensor. Although, from Figure 3, motion appears to visually correspond to en-

ergy events, it is an event-based sensor and even its moving average value is not an435

outstanding predictive feature. There are also issues relating to stationary people not

triggering the sensor, and the argument that a motion sensor is not a presence sen-

sor [13]. A probabilistic input such as a pre-learned HMM may be more suitable to

increase this value. Despite this however, the PIR sensor tends to be chosen for mid-

budget sensor sets due to its low cost.440

From Figure 7, we see that CO2 shares an almost uniform amount of information

with the other displayed sensors and that light level shares the largest (in mean value).

The CO2 result broadly agrees with the value from the office study in [22], although

humidity is lower. A higher coefficient implies redundancy in the sensor information,

which could be used by the designer to prune sensors from the set if necessary. The445

uniform – and relatively high – uncertainty coefficient for the CO2 sensor, coupled with

its typically large financial cost stands in contrast to its large though variable sensor

18

value (see Figure 6).

This work has further implications for designers of energy sensor systems. By

choosing the sensors a priori, deployment costs can be saved by lowering sensor re-450

dundancy, though it is probably wise to test a larger set in a pilot study as we have done

here. Although our sensor values can be taken as a measure of predictive value, this

value is likely to be context specific, i.e. our field study was conducted in domestic

buildings, and we recommend that designers replicate our approach in order to obtain

customised sensor values. However, the values presented in the results can be used as455

a guideline to the predictive power of the sensors in a domestic context.

There is also an interesting argument for using a KP solver rather than a BKP one

(as we have used in this paper) for the sensor set specification. The BKP solver allows

multiple copies of each sensor to be included in the final building set; therefore the

physical sensor units, e.g. the Arduino or Raspberry Pi extension boards, may vary in460

their design in order to accommodate multiple sensors in different locations. By using

a KP solver, a single, consistent sensor unit can be designed that only allows one copy

of each item in the output set. The advantage here lies in the parsimony of general

design, but it does restrict the amount of energy information that could be extracted

from a building compared with a BKP set. Thus, there is a design trade-off between465

simplicity and value that the designer should make. It is relatively trivial to run a KP

solver using the process in this paper, so the output sets can be compared without much

further work.

Scalability is another key implication of our work. As sensors vary in cost and bud-

gets are typically fixed, designers and researchers may face the problem of choosing470

a large sensor set for a small number of buildings, or a sparser sensor set for a larger

number of buildings. Using our approach, these constraints can be fixed – see the ex-

amples in Figure 8 – to suit the design requirements. Likewise, if there are sensors that

are essential to the application requirements, they can be removed from the candidate

set and the BKP may be run on the remaining set.475

Finally, our approach can be generalised beyond domestic buildings. Although our

field study was conducted within the home, there is no restriction to this, but we do

suggest that new sensor values be derived for environments other than domestic ones.

19

Furthermore, various value and cost functions may be used. In this paper, we have

used the Gini impurity measure for value, and approximate financial cost for the cost480

measure. Again, there are no restrictions to the measures used – particularly for cost –

as functions could be designed to combine, for example, financial cost with energy or

installation disruption costs.

5.2. Limitations

The main limitations of our work relate to the context of sensing, the range of485

sensors and the study size. As discussed in the previous section, the context of sensing

is important and the results obtained here are more applicable to, though not restricted

to, domestic buildings. Furthermore, our range of sensors could be extended, as well

as the features chosen for analysis in the random forest process.

Indeed, there are many parameters to explore in the feature extraction step. In490

addition to the choice of features, parameters such as moving average type or time

window can be varied.

Other means of defining sensor values could also be derived. We chose to use the

random forests approach due to its robustness and frequent use in the feature selection

problem [26], but other approaches incorporating dimensionality reduction, e.g. prin-495

cipal components analysis (PCA), or regression models, e.g. generalised linear models

(GLMs) or partial least squares analysis, could be used instead.

As we previously mentioned, financial cost is used in this paper as an illustrative

cost measure, but other costs could be defined that incorporate, for example, installation

effort or sensor energy use. Furthermore, the BKP algorithm is very sensitive to the500

cost and value measures, thus a robust measure of each would be useful for future work.

Our range of sensors was relatively small, and large projects are likely to consider

a greater range than the environmental ones used in our study. However, this does not

detract from the generalisability of our approach: random forests and the BKP solver

can handle larger inputs.505

Finally, our study was also comparatively small due to the constraints of fine-

grained ground truth collection. As discussed in § 3, ground truth collection is la-

borious, and alternatives to the diary study are likely to compromise on data validity

20

[28]. Using a larger dataset gathered from more homes would reduce the uncertainty in

the value measures in Figure 6; in particular, even though CO2 and sound sensors have510

the larger mean Gini decreases, they also have the largest variances observed in the

study data, thus more data would reduce this variance to give more accurate empirical

measures of sensor value.

5.3. ENLITEN Deployment

We have used the process outlined in this paper to design a sensor set on the EN-515

LITEN project [30], which aims to sense a wide range of energy-related, environmen-

tal and occupancy properties for domestic energy reduction. Along with direct energy

sensors such as current clamps, gas meters and plug-load monitors, we have used the

design process in this paper to create cheap, wireless sensor units from Raspberry Pi

computers. At the time of writing, Raspberry Pis are inexpensive computing devices520

with standard hardware interfaces such as USB and Ethernet. They run a small operat-

ing system, and also contain a general purpose hardware interface.

Figure 9 shows a Raspberry Pi computer with our custom board containing the

sensors output from the BKP solver: temperature, humidity, light and motion. We

are deploying three of these sensor units per home (with a target deployment of 200525

homes), with another temperature-only unit for monitoring radiator and boiler temper-

atures. All sensor units report their data in real time over WiFi through the occupants’

broadband connection or a mobile data connection.

6. Conclusion and Future Work

In this paper, we have presented a process for designing sensor sets to capture530

energy events in buildings. The key contributions lie in the use of random forests

to produce a measure of sensor value a priori, and the implementation of a bounded

knapsack problem (BKP) solver that chooses an optimum sensor set given a set of costs

and values. Through a field study in 4 UK homes, we have illustrated how random

forests can be used to output a measure of predictive value using the Gini impurity535

measure, and how this measure – when combined with an appropriate cost measure,

21

e.g. financial cost – can be used to generate sensor sets given designer constraints.

Through this, we have also shown that more valuable but expensive sensors such as

CO2 are often not included in the sets due to their high cost. Furthermore, we have

shown that CO2 and light sensors are particularly predictable, with a mean predictable540

proportion for both of ≥ 0.4 bits from the other sensors used in our study of domestic

buildings (temperature, humidity and sound level).

For future work, we suggest replicating our field study in other building types,

e.g. industrial buildings, and comparing further measures of cost beyond the purely

financial. As we are currently deploying our sensor sets in the ENLITEN project,545

a large part of our future work involves validating how well the sensors perform as

inputs to building energy and occupancy model. Another potential research is a com-

parison of real data collected by our sensor sets and simulation results e.g. to analyse

the uncertainly coefficient matrix for co-present sensors with pre-simulation for the en-

vironment parameters, which allows the evaluation of the approach we present e.g. the550

information theoretic approach in measuring sensor redundancy.

Acknowledgments

This work is funded by EPSRC grant reference EP/K002724/1. All data created

during this research are openly available from the University of Bath data archive at .

References555

[1] Climate Change Act, retrieved from http://www.legislation.gov.

uk/ukpga/2008. Dec 2013 (2008).

[2] Committee on Climate Change. Meeting Carbon Budgets – ensuring a low-

carbon recovery, retrieved from http://www.theccc.org.uk/reports.

Dec 2013 (June 2010).560

[3] H. Allcott, S. Mullainathan, Behavioral science and energy policy, Science

327 (5970) (2010) 1204–1205.

22

http://www.legislation.gov.uk/ukpga/2008http://www.legislation.gov.uk/ukpga/2008http://www.legislation.gov.uk/ukpga/2008http://www.theccc.org.uk/reports

[4] G. Cohn, S. Gupta, J. Froehlich, E. Larson, S. N. Patel, GasSense: Appliance-

level, single-point sensing of gas activity in the home, in: Proc. Pervasive ’10,

Springer, 2010, pp. 265–282.565

[5] J. Froehlich, E. Larson, S. Gupta, G. Cohn, M. Reynolds, S. Patel, Disaggregated

end-use energy sensing for the smart grid, Pervasive Computing, IEEE 10 (1)

(2011) 28–39.

[6] R. Yang, M. W. Newman, Learning from a learning thermostat: lessons for intel-

ligent systems for the home, in: Proc. Ubicomp ’13, ACM, 2013, pp. 93–102.570

[7] J. Kolter, M. Johnson, REDD: A public data set for energy disaggregation re-

search, in: Workshop on Data Mining Applications in Sustainability (SIGKDD),

San Diego, CA, 2011.

[8] O. Parson, S. Ghosh, M. Weal, A. Rogers, Nonintrusive load monitoring using

prior models of general appliance types, in: Proc. AAAI ’12, 2012.575

[9] Z. Yu, B. Fung, F. Haghighat, H. Yoshino, E. Morofsky, A systematic procedure to

study the influence of occupant behavior on building energy consumption, Energy

and Buildings 43 (6) (2011) 1409–1417.

[10] E. Costanza, S. Ramchurn, N. Jennings, Understanding Domestic Energy Con-

sumption through Interactive Visualisation: a Field Study, in: Proc. Ubicomp580

’12, 2012, pp. 216–225.

[11] S. Darby, The Effectiveness of Feedback on Energy Consumption, Working Pa-

per, Oxford Environmental Change Institue (2006).

[12] Y. Agarwal, B. Balaji, R. Gupta, J. Lyles, M. Wei, T. Weng, Occupancy-driven

energy management for smart building automation, in: Proc. 2nd ACM Workshop585

on Embedded Sensing Systems for Energy-Efficiency in Buildings, ACM, 2010,

pp. 1–6.

[13] M. Milenkovic, O. Amft, An opportunistic activity-sensing approach to save en-

ergy in office buildings, in: Proc. e-Energy ’13, ACM, 2013, pp. 247–258.

23

[14] V. L. Erickson, Y. Lin, A. Kamthe, R. Brahme, A. Surana, A. E. Cerpa, M. D.590

Sohn, S. Narayanan, Energy efficient building environment control strategies us-

ing real-time occupancy measurements, in: Proc. 1st ACM Workshop on Embed-

ded Sensing Systems for Energy-Efficiency in Buildings, ACM, 2009, pp. 19–24.

[15] S. N. Patel, M. S. Reynolds, G. D. Abowd, Detecting human movement by differ-

ential air pressure sensing in HVAC system ductwork: An exploration in infras-595

tructure mediated sensing, in: Proc. Pervasive ’08, Springer, 2008, pp. 1–18.

[16] S. Gupta, M. S. Reynolds, S. N. Patel, ElectriSense: single-point sensing using

EMI for electrical event detection and classification in the home, in: Proc. Ubi-

comp ’10, ACM, 2010, pp. 139–148.

[17] J. E. Froehlich, E. Larson, T. Campbell, C. Haggerty, J. Fogarty, S. N. Patel,600

HydroSense: infrastructure-mediated single-point sensing of whole-home water

activity, in: Proc. Ubicomp ’09, ACM, 2009, pp. 235–244.

[18] U. Wilke, F. Haldi, J.-L. Scartezzini, D. Robinson, A bottom-up stochastic model

to predict building occupants’ time-dependent activities, Building and Environ-

ment 60 (2013) 254–264.605

[19] K. Fisher, J. Gershuny, Multinational Time Use Study. Chapter 3: Activity Codes,

retrieved from http://www.timeuse.org/sites/ctur/files/858/

mtus-user-guide-chapter-3.pdf. Dec 2013 (July 2013).

[20] F. Bian, D. Kempe, R. Govindan, Utility based sensor selection, in: Proc. IPSN

’06, ACM, 2006, pp. 11–18.610

[21] S. Joshi, S. Boyd, Sensor selection via convex optimization, Signal Processing,

IEEE Transactions on 57 (2) (2009) 451–462.

[22] R. Zhang, K. P. Lam, Y.-S. Chiou, B. Dong, Information-theoretic environment

features selection for occupancy detection in open office spaces, Building Simu-

lation 5 (2) (2012) 179–188.615

24

http://www.timeuse.org/sites/ctur/files/858/mtus-user-guide-chapter-3.pdfhttp://www.timeuse.org/sites/ctur/files/858/mtus-user-guide-chapter-3.pdfhttp://www.timeuse.org/sites/ctur/files/858/mtus-user-guide-chapter-3.pdf

[23] H. Godrich, A. P. Petropulu, H. V. Poor, Sensor selection in distributed multiple-

radar architectures for localization: A knapsack problem formulation, Signal Pro-

cessing, IEEE Transactions on 60 (1) (2012) 247–260.

[24] S. Martello, P. Toth, Knapsack problems: algorithms and computer implementa-

tions, John Wiley & Sons, Inc., 1990.620

[25] U. Pferschy, Dynamic programming revisited: improving knapsack algorithms,

Computing 63 (4) (1999) 419–430.

[26] L. Breiman, Random Forests, Machine Learning 45 (1) (2001) 5–32.

[27] K. Ellegård, J. Palm, Visualizing energy consumption activities as a tool for mak-

ing everyday life more sustainable, Applied Energy 88 (5) (2011) 1920–1926.625

[28] N. Bolger, A. Davis, E. Rafaeli, Diary methods: Capturing life as it is lived,

Annual Review of Psychology 54 (1) (2003) 579–616.

[29] A. Liaw, M. Wiener, Classification and Regression by randomForest, R News

2 (3) (2002) 18–22.

[30] T. Lovett, E. Gabe-Thomas, S. Natarajan, E. O’Neill, J. Padget, ‘Just enough’630

sensing to ENLITEN: a preliminary demonstration of sensing strategy for

the’energy literacy through an intelligent home energy advisor’ (ENLITEN)

project, in: Proc. e-Energy ’13, ACM, 2013, pp. 279–280.

25

Figure 1: Experimental Settings – The placement of sensors at each home: (a) Home A – 3 Bedrooms, 3

Floors, (b) Home B – 3 Bedrooms, 2 Floors, (c) Home C – 4 Bedrooms, 2 Floors and (d) Home D – 4

Bedrooms, 3 Floors in Table 1

26

Figure 2: Sensor box in situ., showing PIR, temperature, CO2, light, sound and humidity sensors.

Category Example(s) MTUS code [19]

Wash Bath or shower Selfcare

Windows Opening or closing windows and external doors for extended periods of time –

Eat/Drink Eating meals, e.g. breakfast Eatdrink

Food preparation/cooking Preparing meals Foodprep

Wash dishes Using a dishwasher Foodprep

Cleaning Vacuuming Cleanetc

Laundry Using a washing machine, tumble dryer or iron Cleanetc

Sport/exercise Using a treadmill Sportex

Receive friends Hosting a party Leisure

Music listening Listing to radio or stereo TVradio

Watch TV Watching TV, DVD or web-streamed content TVradio

Play computer games Using a games console Compgame

Use computer Using PC or laptop for work Compint

Unoccupied Empty home with no activity –

Table 2: Energy events logged by study participants, with categories, example events and corresponding

MTUS codes.

27

22

24

26

30405060708090

750100012501500

05

10152025

40455055

0.000.250.500.751.00

0.000.250.500.751.00

Temp (C

)S

ound (dB)

CO

2 (ppm)

Light (lux/100)H

umidity (%

)P

IRE

vent

Aug 06 Aug 08 Aug 10 Aug 12Day

valu

e

Figure 3: Time series for Home 1’s kitchen over the week-long study. ‘Event’ is encoded as one of three

states: event (1), non-event (0) and no record (NA).

28

Figure 4: Example - Home 1’s participant-recorded energy event in a day (Aug 06, 2013)

29

0

50

100

maD

igital_Inputs

Hum

idity

maH

umidity

maTem

perature

Sound_level

Light_level

CO2_concentration

maLight_level

maC

O2_concentration

maS

ound_level

Feature

MeanD

ecreaseG

ini

Figure 5: The top ten sensor features using the mean Gini impurity decrease from the random forest process;

the larger the better. ma = moving average, .95 CIs shown (non-parametric bootstrap; 1000 replicates).

30

0

10

20

30

40

CO2 Digital Humidity Light Sound TemperatureSensor

MeanDecreaseGini

Figure 6: Mean Gini impurity decrease over all features for each sensor. .95 CIs shown (non-parametric

bootstrap; 1000 replicates).

31

0.39 0.4 0.43 0.39

0.27 0.26 0.3 0.39

0.43 0.42 0.27 0.39

0.3 0.46 0.26 0.4

0.3 0.48 0.27 0.39

CO2_concentration

Humidity

Light_level

Sound_level

Temperature

Temperature

Sound_level

Light_level

Humidity

CO2_concentration

0.30

0.35

0.40

0.45

UC

Figure 7: The uncertainly coefficient matrix for co-present sensors. This shows an estimated proportion of

bits that can be predicted about sensor j (columns) from sensor i (rows). Note, this function is not necessarily

symmetric, and we have omitted the on-diagonal elements and PIR sensor for scale clarity (all < 0.1).

32

c: 1 c: 2 c: 3 c: 5 c: 10

012345

0.0

2.5

5.0

7.5

10.0

0.0

2.5

5.0

7.5

10.0

0.0

2.5

5.0

7.5

10.0

W: 100

W: 200

W: 500

W: 1000

C H L P S T C H L P S T C H L P S T C H L P S T C H L P S Tsensor

quan

tity

Figure 8: Example sensor sets as output by the BKP algorithm using cost and value data described in the

text. C is CO2, H is humidity, L is light, P is PIR, S is sound and T is temperature.

33

Figure 9: A Raspberry Pi computer showing the cutaway sensor board that is currently being deployed on

the ENLITEN project.

34

IntroductionRelated WorkCapturing Energy Events in BuildingsSensor Selection Approaches

Sensor Selection and Study DesignProblem StatementConstrained Optimisation: The Knapsack ProblemDefining Sensor ValuesFeature ExtractionFeature Selection: Random Forest

Defining Sensor CostsSensor RedundancyField StudyExtracted FeaturesSection Summary

ResultsConfiguration ParametersSensor ValuesOptimal Sensor Sets

DiscussionImplicationsLimitationsENLITEN Deployment

Conclusion and Future Work

university of bath...90 gupta et al.’s electrisense [16] – which uses electromagnetic...

Documents