The Personalization of Mobile Health Interventions
JULIAN ANDRES RAMOS ROJAS
Thesis committee:
Anind K. Dey (Co-Chair), Information School, University of Washington
Mayank Goel (Co-Chair), Human-Computer Interaction Institute, Carnegie Mellon University
Carissa Low, Department of Medicine, University of Pittsburgh
Tanzeem Choudhury, Department of Information Science, Cornell University
Robert Kraut, Human-Computer Interaction Institute, Carnegie Mellon University
A thesis proposal submitted in fulfillment of the requirements for the degree of
Doctor of Philosophy
Human-Computer Interaction Institute
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA
26 November 2019
Abstract
Personalized medicine is the adjustment of medical treatment by taking into account
people’s unique demographics, genetic makeup, and lifestyle. This approach, however, relies
on domain knowledge that is often limited and forces medical practitioners to explore multiple
treatments with a patient until finding an appropriate one. During this process, patients
are on their own: They have to remember the specifics of the treatment, and they need to
identify when and what treatment to put into practice. To overcome these challenges, I
envision equipping the most popular computing device, the mobile phone, with the means
to personalize and provide health interventions. This personalized mobile health approach
would give anyone with a phone access to health interventions, and it would be especially
impactful for populations that lack access to basic health services.
At the core of this proposal, I investigate methods for the personalization of mobile health
interventions using artificial intelligence (AI), smartphones and wearables, and the patient’s
feedback. In my work so far, I have explored two fundamental challenges: when to intervene
(identifying intervention points) and what treatment to use (treatment selection). I approached
these challenges by integrating human-computer interaction work in interruptibility (i.e.,
receptivity) and contextual bandits, an AI method for solving sequential decision-making
problems. This work was applied to a sleep intervention and compared to standard clinical
treatment. The results show that my integrated approach is as good or better than clinical
treatment, and for a stratum of the study’s sample, the results are clinically meaningful.
For my remaining thesis work, I propose to investigate methods for how to predict the
short-term effect of a treatment (models of effects), and how to predict patient adherence
to treatment (models of behavior). Mobile health researchers have identified the proposed
work as crucial for the advancement of the field. Behavior models are necessary for reducing
intervention burden and increasing adherence to the intervention. Models of effects can
inform the direction and strength of treatments. My hypothesis is that both models could be used
to compute the expected value of treatment effect. This expected value could be used to select
the best treatment: one that takes into account both the effect of and adherence to treatment. I plan to
use these models to augment the treatment selection previously used in my SleepU system,
which will then be deployed to college students in a sleep intervention. This model-based
approach for a mobile health intervention will be compared against my completed work,
which does not use an explicit model of treatment effect and adherence, and against a
survey-based approach, where treatment is selected from the patient's own preferences and
forecast of treatment effects. Additionally, I will measure each patient's adherence gains from using
this model-based approach. The overall results from this work will inform the development
and deployment of effective and efficient personalized mobile health interventions in the real
world.
Contents

Abstract
Acknowledgements
Contents
List of Figures
Chapter 1 Introduction
1.1 The value of dynamic mobile health interventions
1.2 The elements of a mobile health intervention
1.2.1 Distal outcomes
1.2.2 Proximal outcomes
1.2.3 Decision points
1.2.4 Intervention points
1.2.5 Available treatments
1.2.6 Tailoring variables
1.2.7 Treatment selection
1.3 Challenges in the personalization of mobile health interventions
1.3.1 Identifying intervention points using mobile-receptivity (completed)
1.3.2 Treatment selection and receptivity (completed)
1.3.3 Development of a personalized model of effects (proposed)
1.3.4 Development of models of behavior (proposed)
1.3.5 A models-based approach to select initial treatment (proposed)
Chapter 2 Identifying intervention points using mobile-receptivity (completed)
2.1 Mobile-receptivity and interruptibility
2.1.1 Detecting interruptibility
2.1.2 Features
2.2 Mobile-receptivity detection
2.2.1 Data collection
2.2.2 Pre-processing
2.2.3 Classifier and performance evaluation
Chapter 3 Treatment selection and receptivity (completed)
3.1 A framework for the personalization of mobile health interventions
3.1.1 Sleep interventions
3.1.2 Related mobile health interventions
3.2 PECAM Components
3.2.1 Sensor input
3.2.2 Communication module
3.2.3 Decision-making module: Defining the selection of a health recommendation as a reinforcement learning problem
3.2.4 Framework connection to behavior change theories
3.3 Deployment and testing
3.4 Method
3.4.1 Study design
3.4.2 Participants
3.4.3 Interventions
3.4.4 Measures
3.4.5 Analysis plan
3.5 Results
3.5.1 H1) The combination of a mobile-receptivity detector and a decision-making module produces better sleep outcomes than a traditional sleep hygiene appointment intervention
3.5.2 H2) Delivering sleep recommendations at mobile-receptivity states increases their operationalization
3.5.3 H3) The SleepU app increased motivation
3.5.4 Summary of results
Chapter 4 Development of a personalized model of effects (proposed)
4.1 Related work
4.2 Proposed work
4.3 Evaluation
4.4 Envisioned results
Chapter 5 Development of models of behavior (proposed)
5.1 Related work
5.2 Evaluation
5.3 Envisioned results
5.4 Alternative plan
Chapter 6 A models-based approach to select initial treatment (proposed)
6.1 Related work
6.2 Simulation
6.3 Study
6.3.1 Study protocol
6.3.2 Power analysis
6.4 Envisioned results
Chapter 7 General timeline for the proposal
References
List of Figures

1.1 Traditional vs mobile health intervention cycle
3.1 The PECAM Framework
3.2 Communication module strategy selection process
3.3 Fogg Behavior Model example for a sleep recommendation
3.4 SleepU walkthrough and screenshots
3.5 Study design
3.6 Sleep duration changes over the semester
3.7 Actionability rates for all participants and sub-groups
3.8 Actionability of different notification delivery mechanisms
CHAPTER 1
Introduction
Personalized medicine, or precision medicine, is the tailoring of health interventions to take
into account genes, environment, and lifestyle. A national precision medicine initiative was
introduced in 2015 by United States President Barack Obama (Collins and Varmus, 2015) and
later renamed the All of Us program (Sankar and Parker, 2017), a project that is currently
active in the United States. The value of personalized health interventions comes from
improved health care outcomes from trying only treatments that are most likely to succeed;
this approach not only reduces the time to achieve improved clinical outcomes, but it also
decreases costs and improves patients' overall quality of care by minimizing side effects
(Jameson and Longo, 2015). However, precision medicine is still a nascent field on its own,
and it requires further advancement of medical techniques for characterizing patients, larger
biological databases, and enhanced mobile health technology. Mobile health technology has
emerged as a promising path to personalized medicine, not only as a way to monitor patients
24/7 and collect previously unreachable data, but also to support real-time interaction with
the patient that could potentially improve engagement and empowerment.
Personalization has traditionally been a process in which both the patient and physician are
involved: The clinician first provides a treatment based on experience, patient’s preference and
goal of treatment; after acquiring evidence (Ashley, 2015) of success or failure in achieving
the desired outcomes, the clinician proceeds to adjust or completely change the treatment. The
need for personalization comes from two main sources that are not necessarily exclusive: gaps
in medical knowledge and gaps in personal knowledge. Medical knowledge may be insufficient
to understand adherence or treatment effect for an individual. Gaps in personal knowledge
mean an individual may not be aware of her own preferences, treatment adherence (how the
patient will comply with
treatment), or side effects of treatment (e.g., being unaware of allergies); this means that even
when the science is precise, lack of knowledge still plays a role, and it is a challenge that may
only be solved through trial and error. This manual personalization process is inefficient: it
can take a long time to find a treatment that works, the multiple visits to the physician carry
a monetary cost, and meanwhile the patient has to go through unwarranted treatment that
could have side effects and may lead to the patient giving up on treatment.
Another problem is that patients are on their own when it comes to self-monitoring and
self-managing their treatment, two crucial components of self-efficacy: an individual’s belief
in their innate ability to achieve goals (e.g., take medication on time, exercise more, etc.).
Without self-efficacy, behavior change is not viable. Mobile health (mHealth) researchers have
shown the feasibility of using Artificial Intelligence (AI) methods and mobile sensors (Paredes
et al., 2013; Rabbi, Aung, Zhang, & Choudhury, 2014; Rahman, Czerwinski, Gilad-Bachrach,
& Johns, 2015; Sano, Johns, & Czerwinski, n.d.) to personalize health interventions. There has
also been work on the design of tools for patients that support self-management (e.g., of
blood glucose levels (Desai, Levine, Albers, & Mamykina, 2017)). In my thesis, I propose
to further advance the field by studying and testing ways to personalize the elements of
mobile health interventions. Personalization, tailoring and individualization will be used
interchangeably in this proposal as they refer to the same concept in this line of work.
1.1 The value of dynamic mobile health interventions
Just-In-Time Adaptive Interventions in mobile health (Nahum-Shani et al., 2018), referred to
in short as mobile health interventions in this proposal, are interventions that are delivered via
a mobile device and tailored in a dynamic fashion, i.e., changes to the health intervention
are based on sensor data or user feedback and performed multiple times over the duration
of the intervention.
Mobile health interventions are a type of dynamic computer-tailored health interventions
where dynamic means the intervention is adjusted at multiple times during the duration of
the intervention. In comparison, traditional computer tailored interventions are not dynamic
(static): usually, tailoring is done at most once, at the beginning of treatment. Dynamic
computer-tailored health interventions have increased efficacy (Krebs, Prochaska, & Rossi,
n.d.) in comparison to static health interventions. Besides the value provided by being more
efficacious than a static health intervention, mobile health interventions have the added benefit
that they can accompany the patient at all times: A mobile health intervention can both reach
(push) or be reached by (pull) the patient at any time and place (Smith et al., 2016). Ultimately
one of the most promising roles of a mobile health intervention is to support the patient at
the time and place where treatment is put into practice, and this is a role that even the best
medical care cannot provide.
Mobile health interventions are defined by components that are not present in traditional
health interventions, owing to the intrinsic capabilities of mobile computing devices that make
health interventions readily available anytime and anywhere. Some of these elements have
been identified in the literature (Nahum-Shani et al., 2018), while other elements are extended
(e.g., available treatments, tailoring variables, treatment selection) or first defined (intervention
points, initial treatment) in this proposal to better match the nature of mobile health
interventions.
To better illustrate some of the elements, figure 1.1 shows a general mobile health intervention
cycle compared to a traditional health intervention. The following are the elements of a
mobile health intervention considered throughout this proposal:
1.2 The elements of a mobile health intervention
1.2.1 Distal outcomes
These are defined as the set of outcomes that are the ultimate goal of the intervention (Nahum-
Shani et al., 2018). This is also referred to as the primary clinical outcome. For example,
in drug rehabilitation, the distal outcome is the elimination of drug use; in sleep hygiene,
it is the improvement of sleep health factors. Distal outcomes are very important to health
interventions however they are usually difficult to use for day-to-day treatment adjustment:
FIGURE 1.1: This diagram shows a basic health intervention cycle and each stakeholder. The patient first gets a diagnosis, afterwards the initial treatment follows, and then treatment adjustments occur some time after. To select the initial treatment, the doctor needs to take into account the patient's demographics, genomics, lifestyle, and other factors. After the initial treatment, the patient goes back to the doctor and, depending on the health state, the treatment may be adjusted. In a mobile health intervention, the process is the same, but every decision is taken autonomously. Also, treatment adjustment does not have to occur at a fixed point in time; it can happen within days or hours depending on the disease. This new model of health, however, has three main challenges: 1) How to select the initial treatment? 2) When to deliver the treatment? 3) How to select a treatment? These challenges are explained thoroughly in section 1.3.
There is usually a long time between the administration of treatment and the observation of
change. Distal outcomes alone are not sufficient to measure the intermediate success of a
health intervention; however, they are crucial for its design. Distal outcomes are usually
domain specific.
1.2.2 Proximal outcomes
These are any outcomes that could potentially lead to the desired distal outcome as mediating
or direct factors affecting the distal outcome (Nahum-Shani et al., 2018). Typical examples of
a proximal outcome are mediators of behavior change like motivation (“BJ Fogg’s Behavior
Model,” 2016; Michie, van Stralen, & West, 2011) and self-efficacy (Bandura, 1976). Proximal
outcomes apply not only to behavioral interventions but also to pharmacological treatments
that rely on basic behaviors of the patient, like taking pills at specified times; in this case,
adherence to treatment is a crucial factor: Patients’ failure to adhere to medication regimens
causes 33% to 69% of hospitalizations and accounts for $100 billion in annual health care
costs (Osterberg & Blaschke, 2004). Proximal outcomes are not domain specific, but they are
adapted to each intervention. As an example, adherence for a pharmacological treatment
can be measured by counting how many times a patient takes a pill on time, while in a sleep
intervention, it could be measured by the number of times the participant fills out a sleep
diary. In both cases the construct is the same, but the measure is specific to the intervention.
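To make the construct-versus-measure distinction concrete, the sketch below computes the same adherence rate from two intervention-specific event logs. The function and the toy data are illustrative assumptions, not taken from any system in this proposal:

```python
from datetime import date

def adherence_rate(expected_events, observed_events):
    """Fraction of expected adherence events that were observed.

    The construct (adherence) is the same across interventions;
    only the definition of an event changes.
    """
    if not expected_events:
        return 0.0
    observed = set(observed_events)
    return sum(1 for e in expected_events if e in observed) / len(expected_events)

# Pharmacological treatment: an event is a pill taken on a scheduled day.
pill_schedule = [date(2019, 11, d) for d in range(1, 8)]      # 7 doses expected
pills_taken   = [date(2019, 11, d) for d in (1, 2, 3, 5, 6)]  # 5 observed

# Sleep intervention: an event is a completed daily sleep-diary entry.
diary_schedule = [date(2019, 11, d) for d in range(1, 8)]
diary_entries  = [date(2019, 11, d) for d in (1, 2, 4, 5, 6, 7)]

print(adherence_rate(pill_schedule, pills_taken))    # prints 0.7142857142857143
print(adherence_rate(diary_schedule, diary_entries)) # prints 0.8571428571428571
```

The same function serves both interventions; only the event logs differ, mirroring how the construct is operationalized per intervention.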
1.2.3 Decision points
These are the points in time, or more generally the contexts (e.g., location, time of day, mood),
where a health intervention is adjusted (Nahum-Shani et al., 2018). Such adjustment could
be based on a combination of sensor input, patient feedback, computational feedback (i.e.,
estimates of future outcomes from a model) or even physician’s feedback. These decision
points may or may not be of importance depending on the application, and the computing of a
decision can be decoupled from its delivery. As an example, for a sleep intervention using
sleep-related outcomes, decision points could occur every day after waking up, or they could
be computed right before the moment of delivery. Assuming the sleep treatment depends
only on the previous night of sleep, there is no difference between computing a decision right
before treatment is delivered or as soon as the night of sleep data is available (after waking
up). In contrast, in an intervention for increasing physical activity based on steps, right before
delivering an intervention, an estimate of the current number of steps is necessary in order to
suggest the number of steps left to meet a pre-defined goal. In general, interventions where
the target of the intervention involves an ever changing process (like a step count) will require
a decision point close to delivery.
1.2.4 Intervention points
These are the points in time or more generally context where a health intervention is delivered
to the patient. An important differentiator of intervention points is whether they are vulnerable
or opportunistic states (Nahum-Shani et al., 2018). Vulnerable states are those leading to
undesirable or dangerous outcomes; as an example, a stressful situation could be a vulnerable
state for a person going through drug rehabilitation since such an event could lead to relapse.
Opportunistic states are contexts used to improve health outcomes without a necessary
connection between the health outcome and treatment. As an example, the same individual
going through rehabilitation may benefit from sporadic and randomly timed reminders to
engage in positive social interactions and exercise. A key construct to find the best intervention
points is receptivity: “an individual’s transient ability and/or willingness to receive, process
and utilize just-in-time support”. This construct, rooted in the dual process model for
supportive communication, states that (Burleson, n.d.) supportive communication (e.g., a
sleep recommendation) can result in positive changes in behavior when the recipient is
motivated to process and enact the message. The identification of receptivity is crucial
for finding opportunistic intervention points. Although there has not been work looking
at the detection of receptive states from sensor streams or data in general, researchers in
human-computer interaction (HCI) have a well-established body of work on the similar
concepts of interruptibility and engagement. There are multiple definitions of interruptibility,
but for this proposal I refer to interruptibility as the idea that people have moments during
the day when they are available to be interrupted: At such times, an interruption has a low
enough cost that it is acceptable (Ho & Intille, 2004; Okoshi et al., n.d.).
Interruptibility has been studied around computer use and more recently mobile phone use,
and as such all of this body of work is centered on finding interruptible states when an
individual is interacting with a computer or a mobile phone. More recently, HCI researchers
have looked at engagement detection (Pielot et al., 2017), an extension of interruptibility
detection, where the goal is to detect not only when an individual can be interrupted but also
when the individual further engages with the content of the interruption. An easy way to
differentiate the two follows: When an individual receives an SMS and does not even look at
it, the individual is not interruptible; when the individual glances at the SMS, the individual
is interruptible; lastly, when the individual looks at the SMS, opens it and even replies to
the sender or further engages in a task related to it, the individual has been engaged. In this
work, we use engagement detection as a proxy for detecting receptivity, however we make the
distinction that detecting a state of engagement may not always result in the detection of a
receptive state given that receptivity is more involved and depends on variables intrinsic to
the individual, like ability and motivation to engage with a stimulus. All of these concepts
are related in the following way: interruptibility precedes engagement, and engagement
precedes receptivity. Interruptibility is necessary but not sufficient for engagement; likewise,
engagement is necessary but not sufficient for receptivity, and receptivity implies an individual
is interruptible and engaged. Despite the importance of receptivity, and its related constructs of
engagement and interruptibility, there has not yet been any work using detection of receptivity
to trigger the delivery of a health intervention. However, some researchers have already
started including receptivity in their study protocols for future studies (Kramer et al., 2019).
1.2.4.1 Initial treatment
In this proposal, I further refine the definition of intervention points to include the initial
treatment. The initial treatment refers to the state in which the intervention starts and is first
delivered to an individual. There are two possible options for how an intervention could
start: 1) The intervention could start with a treatment picked at random among the available
possibilities. This is the less ideal case; however, it is realistic in situations where there is
not enough knowledge about the patient to perform any kind of personalization. This could
also be an option for interventions that are trying to fulfill research and clinical goals: if the
initial treatment is uniformly randomized, this stage constitutes a micro-randomized trial (Klasnja et
al., n.d.), and the data generated from it could be used for causal inference. At later
decision points the intervention could move away from a uniform probability distribution;
however the data generated from that point forward cannot be used for causal inference
because treatment is not provided in a random fashion and instead is focused on the clinical
goal. 2) The intervention could start with a treatment picked using variables that help
identify the subset of treatments that have a higher chance of succeeding at achieving the
target outcome of the intervention. This treatment selection can be performed by means of
expert knowledge where a physician could look for specific demographic variables or other
signs. This treatment selection could also be performed using computational models that
can estimate from clinical health records or biological databases, possible outcomes based
on demographics or genetic makeup. Another possibility is to use a mixed approach where
physicians rely on computational models and their own knowledge to determine the best
course of treatment.
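The two starting options above can be sketched as follows. The treatment labels and the success model are hypothetical stand-ins; option 2 assumes some estimator of treatment success trained on, e.g., clinical health records:

```python
import random

TREATMENTS = ["A", "B", "C"]

def initial_treatment_uniform(rng=random):
    """Option 1: uniform randomization over available treatments.

    If decision points keep a uniform distribution, the resulting data
    supports causal inference (a micro-randomized trial).
    """
    return rng.choice(TREATMENTS)

def initial_treatment_model_based(baseline, success_model):
    """Option 2: pick the treatment a model scores highest for this
    patient's baseline variables (demographics, records, etc.).

    `success_model(baseline, treatment)` is a hypothetical estimator.
    """
    return max(TREATMENTS, key=lambda t: success_model(baseline, t))

# Toy stand-in for a learned estimator: here, patients over 40
# are assumed to respond best to treatment "B".
def toy_model(baseline, treatment):
    return {"A": 0.3,
            "B": 0.5 if baseline["age"] > 40 else 0.2,
            "C": 0.4}[treatment]

print(initial_treatment_model_based({"age": 55}, toy_model))  # prints B
print(initial_treatment_model_based({"age": 25}, toy_model))  # prints C
```

A mixed approach, where a physician overrides or confirms the model's choice, would simply wrap `initial_treatment_model_based` with a human review step.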
1.2.5 Available treatments
These are referred to as intervention options in the literature and are the different types
of treatment that are available for delivery at any given point. Here, I add "Available" to
highlight the changing nature of the patient's context, and how that context ultimately
changes her ability to put health treatments into practice. Nahum-Shani et al. (2018) further
define, as part of the available treatments, the medium of delivery (e.g., SMS, email, phone
call), the type (advice, feedback), and even the quantity of the treatment (e.g., the dosage of a
medication or the number of times a health recommendation is provided).
1.2.6 Tailoring variables
Traditionally, tailoring variables have focused on the patient receiving the intervention,
and as such, these variables provide information related to the individual that helps decide
when and what intervention to provide (Nahum-Shani et al., 2018). However, it is very
important to note that, from a mobile health intervention point of view, intervention options
must depend on both the context of the individual receiving the intervention and the
computational resources available (e.g., battery level, available data, internet connection).
The context of the individual can define the content of the intervention; as an example,
reminding a person to exercise when they are ready to go to bed is not only counter-intuitive,
it is also frustrating. Similarly, taking into account computational resources should limit the
suggested recommendations to those that have enough support from the data collected on
that particular individual, or to tasks that can actually be completed with the resources at
hand; if a task relies on having an internet connection and connectivity is not available, the
system should automatically provide other tasks that are feasible under the current
circumstances. Tailoring variables are domain and system specific.
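As a minimal sketch of how tailoring variables could gate the available treatments, the code below filters a hypothetical treatment list by the patient's context and the device's resources. All field names and treatments are illustrative assumptions, not an actual system API:

```python
def available_treatments(all_treatments, context):
    """Filter intervention options by the patient's context and the
    device's computational resources (tailoring variables)."""
    options = []
    for t in all_treatments:
        if t.get("needs_internet") and not context["internet"]:
            continue  # e.g., a video-guided task with no connectivity
        if t.get("min_battery", 0) > context["battery"]:
            continue  # avoid sensing-heavy tasks on a low battery
        if context["about_to_sleep"] and t.get("type") == "exercise":
            continue  # counter-intuitive and frustrating at bedtime
        options.append(t)
    return options

treatments = [
    {"name": "guided breathing video", "needs_internet": True},
    {"name": "go for a short walk", "type": "exercise"},
    {"name": "dim the lights", "type": "sleep_hygiene"},
]
ctx = {"internet": False, "battery": 20, "about_to_sleep": True}
print([t["name"] for t in available_treatments(treatments, ctx)])
# prints ['dim the lights']
```

The treatment selector described next would then choose only among the options this filter returns.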
1.2.7 Treatment selection
Treatment selection or decision rules (Nahum-Shani et al., 2018) are the underlying
mechanism that uses the tailoring variables to select intervention options. The decision rules
pick the intervention treatment (intervention options) based on the variables being tracked
during the intervention (tailoring variables). More broadly, these rules are not necessarily
static and can adapt to evidence of treatment effect or patient feedback in order to increase
treatment efficacy, engagement, or any other proximal or distal intervention outcomes. This
is a key difference from traditional approaches to treatment selection: in the context of
mobile health interventions, treatment selection is not static, and treatments are updated on
a data-driven basis. An example of this approach is MyBehavior (Rabbi et al., 2014), a system
that uses a stochastic method to determine the best intervention to provide based on sensor
data and personal preferences.
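To make the idea of an adaptive, data-driven decision rule concrete, here is a minimal epsilon-greedy sketch in the spirit of a contextual bandit: it tracks the mean observed reward (e.g., a proximal outcome such as whether the recommendation was followed) per context and treatment, and mostly exploits the best-known option while still exploring. This is an illustrative simplification, not the algorithm used in MyBehavior or in my SleepU system:

```python
import random
from collections import defaultdict

class EpsilonGreedySelector:
    """Adaptive decision rule over a discretized context.

    With probability epsilon it explores a random treatment; otherwise
    it exploits the treatment with the highest mean reward so far.
    """

    def __init__(self, treatments, epsilon=0.1):
        self.treatments = treatments
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # (context, treatment) -> n
        self.means = defaultdict(float)   # (context, treatment) -> mean reward

    def select(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.treatments)            # explore
        return max(self.treatments,
                   key=lambda t: self.means[(context, t)])   # exploit

    def update(self, context, treatment, reward):
        """Incrementally update the running mean reward."""
        key = (context, treatment)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]

selector = EpsilonGreedySelector(["A", "B", "C"])
selector.update("evening", "B", reward=1.0)  # patient followed treatment B
selector.update("evening", "A", reward=0.0)  # patient ignored treatment A
```

Over repeated decision points, the running means accumulate evidence of which treatment works for which context, which is the data-driven updating described above.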
As shown in this section, the elements of a mobile health intervention presented here are not
fundamentally different from those of a traditional health intervention; however, the nature
of a mobile health intervention provides new challenges and opportunities for improved
health care. The first such difference is in the initial treatment selection: in a traditional
health intervention, the physician uses her expertise and medical knowledge to decide, while
in a mobile health intervention this initial treatment could be chosen in a data-driven fashion.
Another difference is that in a mobile health intervention, intervention points do not need to
be fixed, and they can be tailored to specifics that are not bound by the availability of a
physician, time of day, or even geographic location. Instead, a mobile health intervention
could intervene at any time as needed. Last, a mobile health intervention could decide
treatment at any intervention point in an objective manner by using available data. In the
next section, all of these challenges and their possible solutions are illustrated.
1.3 Challenges in the personalization of mobile health
interventions
Mobile health researchers have identified several aspects necessary to achieve full personalization
of health interventions. These challenges arise naturally and are rooted in the different
elements of a mobile health intervention. I first explore two fundamental challenges: when
to intervene (identifying intervention points) and how to intervene (treatment selection). After
solving the above, the next challenge is to get individuals to install and try a mobile health
intervention app: in 2018, mobile phone users uninstalled 28% of the health apps installed on
their phones (of Apps, 2018). One possible way to minimize the uninstall rate, in the context
of a mobile health intervention, is to focus on the initial treatments. As will be described
in chapter 4, the initial treatments could be improved by using a prior estimated from the
integrated model of behavior and effect that can then be fed to a contextual bandit, which can
then pick, from the beginning, the sleep recommendations that are most likely to be followed and
to have a positive outcome on sleep. Moving forward, I will refer to this problem as the initial
treatment challenge: how to select treatments at the beginning of the study that are more
likely to keep the patient engaged, the intervention's burden low, and the distal health outcomes
at a satisfactory level. I plan to solve the initial treatment challenge by first estimating a model
of the short-term effect of treatment, i.e., a model that can estimate the direction and strength of a
treatment's effect on a health outcome. Second, I want to estimate a model of behavior: a model able
to estimate how well a given treatment will fit the lifestyle and preferences of a patient, and
thus provide the likelihood that the patient will comply with treatment. And last, I want to
use the effects and behavior models to compute the long-term effect of treatment, which will
take into account both the patient's estimated preference and the strength of the effect.
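One way to make this combination concrete is the following notation (my own shorthand, not taken from the proposal text): with daily-life contexts $s$, a candidate treatment $a$, a compliance probability supplied by the behavior model, and a short-term effect $\tau(a, s)$ supplied by the effects model, the long-term effect is the compliance-weighted effect:

```latex
% Illustrative notation only: s ranges over daily-life contexts, a is a
% candidate treatment, P(comply | a, s) comes from the behavior model, and
% tau(a, s) is the short-term effect estimated by the effects model.
\mathbb{E}\left[\text{long-term effect} \mid a\right]
  = \sum_{s} P(s)\, P(\mathrm{comply} \mid a, s)\, \tau(a, s)
```

Ranking treatments by this quantity favors options that are both effective and likely to be followed.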
In this section, I provide a brief description of each of the challenges as well as the completed
and proposed work related to them. All of the completed and proposed work in this thesis
generalizes to many different health interventions, but due to time and space constraints
the work is focused on the automation of sleep hygiene, a well-known sleep health
intervention. Details about the definition, importance, and treatment of sleep are provided in
section 3.1.1. I now provide a general breakdown of the challenges:
1.3.1 Identifying intervention points using mobile-receptivity
(completed)
The first challenge is the identification of an intervention point. Given the nature of mobile
health interventions, this requires finding the best possible context for the delivery of treatment;
context is not limited to time and can include location, weather, current activity, cognitive
state, or constructs specific to mobile health interventions such as mobile-receptivity (defined
in section 2.1), among others. Identifying intervention points is crucial for the success of a
health intervention. This challenge has not yet been explored in human-computer interaction,
and most intervention work has been limited to passive approaches where the intervention
treatments appear as part of the home screen of the smartphone or when the user decides
to look for them. In already completed work, I show how intervention points can be identified
by estimating mobile-receptivity, a measurable construct of receptivity (Nahum-Shani et
al., 2018), through a machine learning classifier built from smartphone sensor data. This
mobile-receptivity detector was used in the context of a sleep health intervention to identify
the best delivery times for sleep recommendations. The classifier's performance at identifying
receptive times in general is shown in section 2.2. The effect of using receptivity in a sleep
intervention is shown in chapter 3.
1.3.2 Treatment selection and receptivity (completed)
After the identification of a time for treatment, selection of treatment is the next challenge.
Selecting a treatment in the context of a mobile health intervention is a challenging process:
From a very small amount of data, the method chosen for treatment selection should be
able to pick those treatments that will result in the highest increase in the distal outcome.
Although there is work on the topic looking at personalized (Rabbi Mashfiqui et al. 2015)
and cohort-driven (Daskalova Nediyana et al. 2018) treatment selection, that work is mostly
focused on the reinforcement of positive behaviors. In this proposal, I present a method
that generalizes a multi-armed bandit method, in a computationally tractable fashion, to
include contextual data for the selection of health recommendations and works in tandem
with a mobile-receptivity detector that recognizes the best times for delivery of treatment. In
comparison to previous work (Rabbi Mashfiqui et al. 2015; Daskalova Nediyana et al. 2018),
this method recommends new treatments to participants and also may reinforce existing ones.
This novel approach was implemented and tested in the context of a sleep health intervention.
Results of this intervention as well as details about the system can be found in 3.
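To illustrate the general family of methods involved (not the exact algorithm used in this proposal), the sketch below implements a small contextual bandit: an epsilon-greedy policy over per-arm ridge-regression reward models. The arm names, dimensions, and reward structure are toy placeholders.

```python
import numpy as np

# Minimal contextual-bandit sketch: epsilon-greedy exploration over a
# ridge-regression reward model per arm. An illustration of the general
# idea only, not the method described in this proposal.
rng = np.random.default_rng(0)

class ContextualBandit:
    def __init__(self, n_arms, dim, eps=0.1, lam=1.0):
        self.eps = eps
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]  # Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # reward sums

    def select(self, x):
        if rng.random() < self.eps:                  # explore at random
            return int(rng.integers(len(self.A)))
        scores = [np.linalg.solve(A, b) @ x for A, b in zip(self.A, self.b)]
        return int(np.argmax(scores))                # exploit best estimate

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)                # accumulate evidence
        self.b[arm] += reward * x

# Toy usage: arm 1 is best whenever the first context feature is high.
bandit = ContextualBandit(n_arms=3, dim=2)
for _ in range(500):
    x = rng.random(2)
    arm = bandit.select(x)
    reward = float(arm == (1 if x[0] > 0.5 else 0))
    bandit.update(arm, x, reward)
```

The key property for mobile health is that the context vector lets the same arm be valued differently in different situations, which a plain multi-armed bandit cannot do.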
1.3.3 Development of a personalized model of effects (proposed)
The challenge of treatment selection can be overcome without an initial model of effects, i.e., a
model capable of estimating the strength and direction of a treatment's effect on a target outcome.
In that case, the model is estimated over the course of the mobile health intervention and is used by
the treatment selection method. However, that approach is slow and puts a high strain on the
patient by forcing the exploration of treatments that may be onerous, painful, or inefficient.
Having a personalized model of effects for each patient can potentially save time, keep the
patient engaged, and improve the overall efficiency and efficacy of a mobile health intervention.
Despite all the advantages of using an effects model, its estimation and use in the context of
a mobile health intervention has remained elusive as of the writing of this proposal.
For my thesis, I propose to estimate such models for the intervention options of a sleep
hygiene intervention. The estimation of the different sleep recommendations' effects on sleep
health is very challenging and will require the use of, and comparison across, techniques such as
hierarchical linear models, probabilistic graphical models such as hidden Markov models, and
structural equation modelling. The precise estimation of such effects is very challenging and
likely infeasible. However, approximate estimates, or estimates that can provide the direction
of a treatment's effect or a ranking among the available treatments, are suitable approaches for making
this model feasible. I foresee the comparison and implementation of this approach as the
main contribution of this part of the proposed work, as described in section 4.
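To give a feel for what estimating such an effect involves, the sketch below is a minimal numpy stand-in for the hierarchical-model idea: it absorbs each participant's baseline by demeaning within person before regressing the outcome on the treatment indicator. The data is simulated, and the variable names and the 0.5 effect size are illustrative only.

```python
import numpy as np

# Minimal stand-in for a hierarchical model of treatment effect: estimate the
# direction and strength of a treatment on a sleep outcome while absorbing
# each participant's baseline (within-person demeaned estimator).
# All data below is simulated; names and effect sizes are placeholders.
rng = np.random.default_rng(1)
n_people, n_days, true_effect = 30, 14, 0.5

effects = []
for _ in range(n_people):
    baseline = 7 + rng.normal(0, 0.5)             # person-specific intercept
    t = rng.integers(0, 2, n_days)                # daily treatment indicator
    y = baseline + true_effect * t + rng.normal(0, 0.3, n_days)
    # Demean within the person so their baseline cancels, then regress.
    t_c, y_c = t - t.mean(), y - y.mean()
    if t_c @ t_c == 0:                            # no within-person variation
        continue
    effects.append((t_c @ y_c) / (t_c @ t_c))     # per-person OLS slope

estimate = float(np.mean(effects))                # pooled effect estimate
print(round(estimate, 2))
```

A full hierarchical linear model would additionally pool these per-person slopes toward a population mean; the point here is only that person-level baselines must be accounted for before a treatment's effect is interpretable.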
1.3.4 Development of models of behavior (proposed)
Mobile health researchers have identified the value of models of patient behavior as a way to
inform a mobile health intervention (Hekler et al., n.d.; Nilsen et al., 2016; Riley et al., n.d.;
Tewari & Murphy, 2016). Models capable of estimating people's preference or likelihood
for following a specific sequence of situations and actions have been used to estimate the
behavioral differences among different types of drivers, the routes a cab driver may prefer while
navigating a city, or even how people will move around an office environment or parking
structure. In my thesis, I propose to use a modified version of those models to estimate the
likelihood of people's behavior in the future, and to use these models together with a model
of the effects of treatment to estimate the expectation of treatment outcomes. The contribution
of this work lies in the adaptation, implementation, and comparison of inverse reinforcement
learning models for mobile health interventions; further details are described in section 5.
1.3.5 A model-based approach to selecting the initial treatment (proposed)
The integration of models that take into account not only the effect of an intervention but also
other aspects of the individual, such as preferences and routine behavior, is fundamental
(Nilsen et al., 2016) to guarantee personalization. Using models that take into account
patient preference and the effects of treatment to select treatment can possibly achieve
better outcomes than a mechanism that selects a treatment but ignores those factors (Nilsen et
al., 2016). In this proposal, I want to investigate methods for merging models of effects and
behavior with the goal of decreasing the burden and increasing the efficacy of the initial treatment.
As a main approach, I plan to use a model of behavior to estimate the probability of daily
life situations together with the decisions taken by an individual in relation to a sleep health
intervention. Such probability estimates over a span of days or weeks provide a simulation of
an individual's behavior. This simulation can then be combined with the effects of treatment
to compute the expectation of treatment, and the expectation can be computed for all of the
available treatments. The expectation of treatment is an estimate of the long-term effects of
the intervention. The simulation results could also be used to estimate a confidence interval
for each of the intervention treatments. The expectations computed from the simulations
could further inform day-to-day treatment selection by trying to maximize long-term effects,
or they could inform the selection of the initial treatment (Tewari & Murphy, 2016). The
contribution of this part of the proposal lies in the implementation and testing of a method for
merging the models of behavior and treatment effects. Furthermore, this method will be used
to pick the initial treatments in the context of a sleep health intervention deployed to college
students in the spring of 2020. Details about the study are provided in section 6.
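The simulation step can be sketched in a few lines: simulate days of compliance under each treatment, weight the treatment's short-term effect by the simulated compliance, and rank treatments by the resulting expectation with a simulation-based interval. The compliance probabilities, effect sizes, and treatment names below are made-up placeholders, not estimates from any study.

```python
import numpy as np

# Sketch of the proposed model-merging step: simulate intervention periods,
# weight each treatment's short-term effect by simulated compliance, and rank
# treatments by expected long-term effect. All numbers are placeholders.
rng = np.random.default_rng(7)

treatments = {
    # name: (P(comply) from a behavior model, short-term effect from an
    # effects model, in hypothetical minutes of extra sleep per compliant day)
    "no_screens_before_bed": (0.4, 25.0),
    "regular_bedtime":       (0.7, 15.0),
    "no_late_caffeine":      (0.9, 8.0),
}

def simulate(p_comply, effect, n_days=28, n_runs=2000):
    """Mean cumulative effect and a 95% interval over simulated periods."""
    complied = rng.random((n_runs, n_days)) < p_comply
    totals = (complied * effect).sum(axis=1)
    return totals.mean(), np.percentile(totals, [2.5, 97.5])

for name, (p, eff) in treatments.items():
    mean, (lo, hi) = simulate(p, eff)
    print(f"{name}: {mean:.0f} min over 4 weeks (95% interval {lo:.0f}-{hi:.0f})")
```

Note how the ranking can differ from a pure effects ranking: under these placeholder numbers, the treatment with the largest per-day effect is not the one with the largest expected long-term effect, because compliance weights it down.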
CHAPTER 2
Identifying intervention points using mobile-receptivity (completed)
Intervention points can be broadly defined as contexts (time, location, etc.) where treatment
must be delivered. Following the definition provided by Nahum-Shani et al. (2018) for
intervention points, this work is focused on the identification of opportunistic states, defined as
contexts where the patient is not in a vulnerable state but is in a state where she has the "ability
or willingness to receive, process and utilize just-in-time support". Receptivity identification
is crucial for the success of mobile health interventions (Nahum-Shani et al., 2018), but it
may be impossible to measure, since it requires the sensing of constructs like willingness
or contextual ability. Although there has not been any work looking at the detection of
receptive states from sensor streams, researchers in human-computer interaction (HCI) have
a well-established body of work on a very close concept: interruptibility. This chapter
summarizes the most prominent and recent work on interruptibility detection from mobile
phone sensors. This body of work inspires the definition of mobile-receptivity, as shown in
section 2.1: a construct very close to receptivity, adapted for mobile health interventions and
constrained to be measurable through mobile phone sensors or similar technologies. Using
this definition, a mobile-receptivity detector was implemented and tested. The detector
is a machine learning model trained using four weeks of mobile phone data from 37 people.
The performance of the receptivity detector is provided at the end of this chapter. The mobile-
receptivity detector was used in a randomized clinical trial as a trigger for the delivery of a
sleep health intervention presented in chapter 3. Details about the mobile-receptivity detector
implementation are provided in section 2.2.
2.1 Mobile-receptivity and interruptibility
Interruptibility is closely related to receptivity (Nahum-Shani Inbal et al. 2014), however
there is not a single definition of interruptibility and instead it has been studied under different
terms:
• Interruptibility (Okoshi et al., 2016; Ho and Intille, 2005): the idea that people have
moments during the day when they are available to be interrupted. At such times, an
interruption carries a low enough cost to be acceptable.
• Attention (Pielot et al., 2014; Pielot et al., 2015): The idea that people are busy and
have moments of attention that they can direct towards something other than their
current task.
• Boredom (Pielot et al., 2015): the idea that people intentionally seek information
and ways to entertain themselves.
• Engagement (Pielot et al., 2017) with the information presented: users not only
attend to a notification but click on it to find out more about it. Engagement detection
is a step forward in the direction of receptivity detection, and it is well differentiated
from interruptibility work, which has mostly focused on finding a moment where
the user is reachable by a notification or another type of alert (Pielot et al., 2017).
Instead, engagement detection aims to estimate user states in which they are likely to
engage with the content provided.
All of these concepts are related in the following way: interruptibility precedes engagement,
and engagement precedes receptivity. Interruptibility is necessary but not sufficient for
engagement; likewise, engagement is necessary but not sufficient for receptivity; and receptivity
implies that an individual is interruptible and engaged. Despite the importance of receptivity
and its related constructs of engagement and interruptibility, there has not yet been any
work looking at the detection of receptivity to trigger the delivery of a health intervention.
However, some researchers have considered including receptivity in future studies (Kramer
et al., 2019) as a fundamental part of mobile health interventions. In this work, we bridge
interruptibility and receptivity under a new term, mobile-receptivity: a state in which an
individual has the cognitive ability to stop their current task to read and make sense of a
notification related to a health treatment in the context of a mobile health intervention. In
practice, this can be measured by observing when the user clicks on and reads through
a push notification from a mobile phone application. Although mobile-receptivity is more
constrained than interruptibility, much of the related work and many of the lessons learned in
building models of interruptibility can be used for building models of mobile-receptivity.
2.1.1 Detecting interruptibility
Although interruptibility itself is not sufficient for identifying mobile-receptivity states, many
of the methods and features used are useful for detecting mobile-receptivity. The preferred
method for building models of interruptibility is by using machine learning classifiers. Re-
searchers have used different classifiers to build successful interruptibility detectors, however
the preferred classifiers are decision trees and random forests (Pielot et al., 2014; Ho and
Intille, 2005; Pielot et al., 2017; Katevas et al., 2017; Okoshi et al., 2016; Dingler and
Pielot, 2015). The performance of models of interruptibility has been measured mainly in
two different ways: leave a subset of users out at random or cross-validation in which data
is randomized without taking into account time or user independence. The later evaluation,
is the most prevalent in the literature and accounts for the best results. This is expected
due to cross-validation’s over-optimistic results in time series data where the independence
assumption is broken, and as a result, work that splits the data according to users, has a
lower, but more realistic, performance results to those expected in a real world deployment. A
majority of the work in this domain report accuracy, precision and recall. Engagement work
(Pielot et al., 2017) shows the lowest performance however this is expected since engagement
is only a small subset of interruptible situations and a much more difficult event for detection.
detection work:
| Paper | Method | Evaluation | A | P | R | F1 | Feature selection |
| Didn't You See My Message? (Pielot et al., 2014) | Random forests | Random cross-validation | 0.68 | – | – | – | Wrapper (accuracy) |
| Using Context-Aware Computing (Ho and Intille, 2005) | Decision tree | Data split into train and test | 0.91 | – | – | – | – |
| Beyond Interruptibility (Pielot et al., 2017) | XGBoost | Cross-validation randomizing over random groups of people | 0.89 | 0.218 | 0.540 | 0.31 | Feature selection |
| People's Interruptibility In-the-Wild (Tsubouchi et al., 2017) | Linear regression | Live evaluation of the model; the performance metric was reduced user response time: 49% (54 to 27 minutes) | – | – | – | – | – |
| Continual Prediction of Notification (Katevas et al., 2017) | RNNs, XGBoost | Cross-validation including grid search for XGBoost | AUC 0.7 | 0.8 | 0.5 | 0.61 | Feature selection |
| Towards Attention-Aware (Okoshi et al., 2016) | Random forests | – | 0.82 | 0.82 | 0.82 | 0.82 | – |
| I'll Be There for You (Dingler et al., 2013) | Random forests | – | 0.79 | 0.77 | 0.82 | 0.79 | – |
| When Attention Is Not Scarce (Pielot et al., 2015) | – | Random cross-validation | 0.83 | – | – | – | – |
| InterruptMe (Pejovic et al., 2014) | AdaBoost | Random cross-validation | 0.73 | 0.36 | 0.48 | 0.41 | – |
| Using Decision-Theoretic (Rosenthal et al., 2011) | Logistic regression | – | 0.9 | – | – | – | – |

TABLE 2.1: All the reviewed articles, including method, evaluation, and performance results. A (accuracy), P (precision), R (recall).
2.1.2 Features
In terms of the data used to build the classifiers, there is an ever increasing number of features
used for detecting interruptibility in the literature. The number of features used has varied
from 4 to more than 300 and there is not a general agreement on what features should be used.
However (Pielot et al., 2017) presents an all-encompassing categorization of the different
features used that is informative and allows for flexibility in implementation. Furthermore, all
of the features used in other works fall into one of the categories described by (Pielot et al.,
2017) and so it is recommended to use them in any interruptibility:
• Communication activity: Computer-mediated communication. This group includes
features that show how often a user is using the phone to communicate with others
by, e.g., sending or receiving messages, or making or replying to phone calls. For
instance, a user that just got distracted by an incoming phone call might not be open
to further interruptions. Examples of Communication Activity features are: number
of SMS messages received in the past hour, time since the last incoming phone call,
or category of the app that created the last notification.
• Context: Features related to the situation of the mobile phone user, i.e., his or her
environmental context. The context of use often determines whether it is appropriate
or safe to interact with the mobile phone. For instance, being at home during the
weekend may indicate opportune moments for interruption, whereas being at work
during the morning may indicate the opposite. Examples of Context features are:
time of day, estimated current distance from home, recent levels of motion activity,
or average ambient noise level during the last five minutes.
• Phone status: Features related to the status of the mobile phone. For instance, a
device with screen status ‘unlocked’ indicates that the user is currently using the
phone, thus a notification might be interrupting a concurrent task. Examples of
Phone Status features are: the current ringer mode, the charging state of the battery,
or current screen status (off, on, unlocked).
• Usage patterns: the type and intensity of usage of the phone. For instance, a user
engaged in playing a game or watching a video may be less open to an interruption,
whereas surfing the Internet might provide a better moment. Examples of
Phone Usage features are: number of apps launched in the 10 minutes prior to the
notification, average data usage of the current day, battery drain levels in the last
hour, number of device unlocks, screen orientation changes, or number of photos
taken during the day.
Demographics is another category; however, it has mainly covered age and gender, and no other
variables have been studied. The importance of the features by category was studied by Pielot
et al. (2017); in that work, the ranking from best to worst features for predicting interruptibility
was: Context (1), Communication (2), Usage Patterns (2), Demographics (3), Usage Patterns
(3). A feature analysis was also performed by Pielot et al. (2014); using the
same categorization as in (Pielot et al., 2017), the ranking becomes: Communication (1),
Context (2), Demographics (3), Usage Patterns (4). These results show that both
Communication and Context are consistently the most important categories.
2.2 Mobile-receptivity detection
The main goal of mobile-receptivity detection is to detect receptivity states when people
are nearby or interacting with their phone, and to use this state to remind the individual about
actionable health treatments. It is worth noting that although there are no applications of
mobile-receptivity detectors in mobile health interventions, interruptibility classifiers have
already been used outside the lab setting to increase news readership in Japan (Okoshi
et al., 2018). For my thesis, I built a mobile-receptivity classifier using most of the findings
from previous work (Pielot et al., 2014; Ho and Intille, 2005; Pielot et al., 2017;
Okoshi et al., 2016; Katevas et al., 2017; Pielot et al., 2015).
The classifier uses most of the features identified in (Pielot et al., 2017) (communication
activity, context, phone status, and usage patterns) and was trained using data from the baseline
phase (4 weeks) of the sleep intervention study described in more detail in chapter 3. Below,
we provide a detailed description of how the mobile-receptivity classifier works and how it was
evaluated.
2.2.1 Data collection
Data for building the mobile-receptivity detector was collected during the baseline phase (i.e.,
the first 4 weeks, without any intervention) of the sleep health intervention study described in
chapter 3. The app, which did not interact in any way with the participant, collected smartphone
sensor data while running in the background. The app collected a total of 88 different features,
summarized as: Communication activity (e.g., number of SMS received, time since last phone
call, etc.); Context (e.g., light, proximity, activity from Google's activity recognition API);
Phone status (e.g., battery level, time since unlocked, number of times locked in the day, etc.);
and Usage patterns (e.g., number of apps interacted with, number of UI events, etc.). The app
computed and stored the features every second as long as the phone was not asleep.
2.2.2 Pre-processing
Pre-processing during training of the classifier was kept simple to ease implementation and
avoid computing overhead for its future live use as part of a sleep intervention. The first
pre-processing step was to use a sliding window of 5 minutes and to compute features like the
mean, max, min, and standard deviation over each window. After that, values were normalized
using a min-max scaler, using pre-stored min and max values to keep consistency between
classifier training and the live deployment.
As in (Pielot et al., 2017), labels for mobile-receptivity states are obtained when the phone
user not only checks a notification but further engages with it by clicking on it.
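The windowing and scaling steps above can be sketched as follows. This is an illustration of the described pipeline on a synthetic stream, not the study's actual code; the placeholder labels stand in for the click-through events used in the real system.

```python
import numpy as np

# Sketch of the pre-processing described above: summary statistics over
# 5-minute windows of per-second features, then min-max scaling with
# pre-stored bounds so that training and the live phone deployment normalize
# identically. The sensor stream here is synthetic.
WINDOW = 5 * 60                       # seconds per window
rng = np.random.default_rng(3)

def window_stats(signal, window=WINDOW):
    """Mean/max/min/std over non-overlapping windows of a 1-D signal."""
    n = len(signal) // window
    w = np.asarray(signal[: n * window]).reshape(n, window)
    return np.column_stack([w.mean(1), w.max(1), w.min(1), w.std(1)])

def minmax_scale(x, stored_min, stored_max):
    """Scale with bounds saved at training time and reused live."""
    return (x - stored_min) / (stored_max - stored_min)

stream = rng.random(3 * WINDOW)       # stand-in per-second sensor feature
feats = window_stats(stream)          # one row of statistics per window
scaled = minmax_scale(feats, feats.min(0), feats.max(0))
# Labels: a window counts as mobile-receptive when the user clicked through
# a notification delivered inside it (placeholder labels below).
labels = np.array([0, 1, 0])
```

Storing the min and max per feature at training time matters because the live phone sees one window at a time and cannot recompute dataset-wide bounds.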
2.2.3 Classifier and Performance evaluation
We used a multilayer perceptron (MLP) from the scikit-learn library (Pedregosa et al., 2011)
for our mobile-receptivity classifier. Although state-of-the-art models mostly use decision
trees, for our implementation we needed the flexibility of a classifier capable of learning
from batches of data (online learning), allowing us to train a classifier as soon as data arrives
from each participant instead of waiting for all participants to finish their baseline phase. This
functionality is available for the MLP but not for random forests or decision trees. After the
model was trained, it was translated into Android Java using sklearn-porter (Morawiec).
The performance was evaluated using leave-one-out validation stratified by participant and
is shown in Table 2.2. The mobile-receptivity classifier has better performance (88%
accuracy, F1 score = 0.54) than the state-of-the-art engagement classifier (Pielot et al., 2017:
precision = 0.2, recall = 0.5, F1 score = 0.3).
| Accuracy | Precision | Recall | F1 score |
| 0.88 | 0.44 | 0.74 | 0.54 |

TABLE 2.2: Performance of the mobile-receptivity detector.
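The batch-by-batch training property that motivated the MLP choice can be sketched with scikit-learn's `partial_fit`, which trees and random forests do not offer. The data below is a synthetic stand-in for the windowed phone features, and the batch sizes are arbitrary.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Why an MLP rather than the trees favored in prior work: MLPClassifier
# supports partial_fit, so the model can be updated from each participant's
# data as it arrives, without retraining from scratch. The features and
# labels here are synthetic placeholders.
rng = np.random.default_rng(0)
clf = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)

classes = np.array([0, 1])        # 1 = mobile-receptive window
for _ in range(5):                # one batch per (simulated) participant
    X = rng.random((200, 8))
    y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0.5).astype(int)
    clf.partial_fit(X, y, classes=classes)  # incremental, online update

X_new = rng.random((10, 8))
print(clf.predict(X_new))
```

A trained scikit-learn model can then be exported for on-phone use, as the sklearn-porter step above describes.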
CHAPTER 3
Treatment selection and receptivity (completed)
In this proposal, the process of personalization is defined as the solution of two different but
interrelated problems: detecting a mobile-receptivity state for the delivery of the intervention, and
selecting treatment based on health outcome and compliance. To solve these challenges, this
proposal presents PECAM, a Personalized and Context-Aware Mobile health intervention
framework. This chapter first introduces PECAM and its components, then covers sleep and
sleep-intervention work in HCI, and ends with the results from a sleep intervention
using PECAM, delivered in the spring of 2019 to 30 college students.
3.1 A framework for the personalization of mobile health
interventions
PECAM models a health intervention as a reinforcement learning problem, incorporating the
way the patient interacts with her phone through notifications. Under PECAM, as shown in
figure 3.1, we have a health intervention delivered through a phone that uses a communication
module to decide the context in which the different treatments should be delivered, and a decision-
making module that decides which health treatment to deliver. The patient interacts with the
phone through notifications, and she can decide to accept (i.e., read and enact the health
treatment, although not necessarily immediately), dismiss (i.e., she considers it irrelevant in
the current context), or ignore (i.e., the patient is engaged in a task and did not pay attention to
the notification at all). After the patient decides what to do with the health treatment provided,
she goes about her everyday life, represented as the environment (i.e., all of the external factors
that could have an effect on the patient's decision, motivation, and ability or constraints with
respect to the health treatment provided).
[Figure 3.1: diagram of the PECAM framework, connecting the environment, the patient, and the phone's sensors, communication, and decision-making modules through accept/dismiss/ignore responses to recommendations.]

FIGURE 3.1: The PECAM framework. PECAM models the way a health intervention can be delivered through a phone to a patient, taking into account the way the patient interacts with the phone. The starting point is the phone, where multiple sensor streams, including onboard sensors and external ones like a wearable, reach the phone for pre-processing and other purposes. On the phone, a communication module uses the sensor streams to decide the right context in which to deliver a health treatment. In tandem, a decision-making module selects, using sensor data and the patient's feedback, the health recommendation (i.e., treatment) that is most likely to be followed and that has the best impact on the health outcome of interest, as measured through some subset of the available sensor streams. A health recommendation is then delivered in the form of a notification to the participant, who may decide to read, dismiss, or ignore the recommendation. The consequences of the patient's decision have an impact that is measurable through the sensors module. The data derived from the sensors module is then used by the communication module and the decision-making module.
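The PECAM loop can be rendered schematically in code: the communication module gates delivery on receptivity, the decision-making module picks a treatment, and the patient's accept/dismiss/ignore response feeds back into selection. Every component below is a toy stub with made-up probabilities, intended only to show the control flow.

```python
import random

# Schematic of the PECAM loop: receptivity gating, treatment selection, and
# feedback from the patient's response. All components are toy stubs.
random.seed(0)
TREATMENTS = ["regular_bedtime", "no_screens_before_bed", "no_late_caffeine"]
value = {t: 0.0 for t in TREATMENTS}   # running reward estimate per treatment
counts = {t: 0 for t in TREATMENTS}

def receptive(sensor_snapshot):        # communication module (stub)
    return sensor_snapshot["screen_on"] and not sensor_snapshot["in_call"]

def choose():                          # decision-making module (stub)
    # Greedy pick with a little random tie-breaking noise.
    return max(TREATMENTS, key=lambda t: value[t] + random.random() * 0.1)

for step in range(100):                # one iteration per delivery window
    sensors = {"screen_on": random.random() < 0.5,
               "in_call": random.random() < 0.1}
    if not receptive(sensors):
        continue                       # wait for a mobile-receptive moment
    t = choose()
    response = random.choices(["accept", "dismiss", "ignore"],
                              weights=[0.5, 0.3, 0.2])[0]
    reward = {"accept": 1.0, "dismiss": -0.5, "ignore": 0.0}[response]
    counts[t] += 1
    value[t] += (reward - value[t]) / counts[t]   # incremental mean update
```

In the real framework, the receptivity stub corresponds to the mobile-receptivity detector of chapter 2, and the value update corresponds to the bandit-style treatment selection described earlier.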
For the remainder of this proposal, sleep hygiene (Posner and Gehrman, 2011) is used as the
domain in which most of the ideas and methods presented are tested; however, the general
framework of this project can be applied to other domains like weight management, stress
management, and physical activity, among other health interventions. The next section
briefly describes the importance of sleep and related work in HCI.
3.1.1 Sleep interventions
Sleep in humans is defined as a natural state of unconsciousness where responses to external
stimuli are reduced. Sleep is reversible and occurs at regular intervals that are independent
of many other physiological processes. Sleep has a fundamental role for many essential
processes in the human body that regulate learning (Stickgold et al., 2001; Yang et al., 2014),
memory (Rasch and Born, 2013; Stickgold et al., 2001), weight (Nagai et al., 2013), mood
(Walker, 2009) and cardiovascular health (Wolk et al., 2005) among other processes. Sleep
is multidimensional; there is no single factor that captures overall sleep quality. Instead,
sleep is defined using the following sleep health factors (Buysse, 2014): Sleep duration,
the total amount of sleep obtained in a 24-hour period; Sleep efficiency, the ease of falling
asleep and returning to sleep, calculated as the percent of the total time spent in bed that is
spent asleep; Timing, the time of occurrence of sleep within a 24-hour day; Alertness, the
ability to maintain attentive wakefulness; and Quality, the subjective assessment of sleep.
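Most of these factors are direct measurements or ratios; sleep efficiency, for example, is just the percentage of time in bed actually spent asleep. The minute values below are purely illustrative.

```python
# Sleep efficiency as defined above: the percent of the total time spent in
# bed that is spent asleep. Times are in minutes; values are illustrative.
def sleep_efficiency(minutes_asleep, minutes_in_bed):
    """Percent of time in bed spent asleep."""
    return 100.0 * minutes_asleep / minutes_in_bed

# 7 hours asleep out of 8 hours in bed:
print(sleep_efficiency(420, 480))  # -> 87.5
```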
The ideal sleep hygiene intervention has two components: sleep hygiene education and
sleep hygiene recommendations. The education component refers to teaching individuals
about the importance of sleep and its relation to general health. The recommendations are
a set of practices meant to improve sleep. A sleep hygiene intervention usually
starts with the education component, after which the sleep hygiene recommendations are introduced.
Sleep hygiene recommendations are usually taught by an expert clinician, who first performs
a sleep assessment to determine the individual's most salient sleep problem and then
creates a personalized plan of treatment: a set of recommendations
aligned with the patient's goals, preferences, and desired outcomes. This personalized
treatment plan, however, is not static; the individual usually starts by trying a small set of sleep
recommendations. After some time, usually weeks, depending on the outcomes of this first
plan, the clinician may suggest alternative recommendations in a follow-up visit. This process
is repeated until the desired outcomes are achieved. Oftentimes, however, health services
provide only a limited number of follow-ups, or none at all, and these follow-ups are usually
weeks apart. In the meantime, the individual may be wasting time and effort trying out sleep
recommendations that do not work for her, which could result in her dropping the sleep
26 3 TREATMENT SELECTION AND RECEPTIVITY (COMPLETED)
intervention altogether. In summary, personalization is challenging, time-consuming, and
error-prone, and can take weeks to months due to modifications to treatment and the
limited availability of clinicians, if it even succeeds at all. It is worth noting that there is high
variability in the delivery of sleep hygiene interventions; for example, at some colleges and
universities, both the education and recommendation components are delivered in a classroom
setting, but in such a format there is no personalization of treatment or follow-up.
In the best case, sleep hygiene is provided over multiple individual sessions by an
experienced clinician.
One of the earliest works in HCI related to sleep interventions is ShutEye (Bauer et al., 2012),
a smartphone application that shows sleep hygiene recommendations at appropriate times in
the background of the home screen of a user's smartphone. ShutEye modified the home-screen
background to display activities that were encouraged or discouraged based only
on the time of day and sleep hygiene recommendations, without sensing any sleep-related
parameters. Although the study was exploratory, 8 out of 12 participants showed a decrease in
subjective sleepiness scores.
Horsch et al. (2017) demonstrate that reminders increased adherence to the automated
parts of a CBT-I-based intervention. The intervention was delivered through a smartphone
application containing a sleep diary, a relaxation exercise, sleep overview graphs, and
reminders (set by the participant) to use the sleep diary and perform the relaxation exercises.
Daskalova et al. present SleepCoacher (Daskalova et al., 2016), a framework for self-
experimentation with sleep recommendations. The system uses the phone as a sensor for
sleep parameters (sleep duration, time to bed, time out of bed, awakenings, etc.). Sleep
measurements are collected over a baseline period of five days, and correlations are then
estimated between observed sleep-related behaviors (time to bed, sleep environment, etc.) and
sleep-related outcomes (awakenings, sleep duration, efficiency). SleepCoacher then selects the
behavior-outcome pair with the highest correlation, finds a corresponding template
generated by sleep experts, and asks the participant to follow this behavior for 5 days,
3.1 A FRAMEWORK FOR THE PERSONALIZATION OF MOBILE HEALTH INTERVENTIONS 27
followed by 5 days of no intervention, then another 5 days of the same recommendation. The
final study lasted 3 weeks with 17 participants. This intervention provides only one
recommendation to each participant. Given its highest-correlation selection algorithm,
SleepCoacher operates by reinforcing the participant behavior that shows the highest
correlation with a positive sleep outcome. In terms of outcomes as an intervention, 2 of the
17 participants showed improvements (Hedge's g >= 0.5) in their respective target variables
(frequency of awakenings, self-reported restfulness, and time to fall asleep). In a different
project, Daskalova et al. demonstrate a cohort-based approach to sleep health
intervention (Daskalova et al., 2018). This method provides sleep recommendations for a
new patient by looking at data from people with similar demographics. Once a cohort is
identified for a new patient, the sleep-related measure that is most dissimilar from the
cohort's is chosen as the sleep target. Then, the sleep recommendation with the highest
positive effect on the selected sleep target is provided to the participant. Their results
show that cohort-based recommendations resulted in an increase of 17 minutes in sleep
duration, but this result was not statistically significant.
In summary, sleep interventions in HCI are still at an exploratory stage; however, they are very
promising. Most of these interventions are based on or extend sleep hygiene
recommendations (Daskalova et al., 2016; ?), and the use of daily reminders has shown
promising results at increasing adherence to the intervention (Horsch et al., 2017).
3.1.2 Related mobile health interventions
Mobile health researchers have shown the feasibility of using artificial intelligence (AI)
methods and mobile sensors (Rabbi et al., 2016; Paredes et al., 2014; Sano et al., 2017;
Rahman et al., 2016) to personalize health interventions. Paredes et al. (2014)
present a stress intervention that uses a contextual bandit with the Upper Confidence Bound
method to provide stress recommendations through a mobile phone. Their results show
a nearly significant decrease in perceived stress for participants in the ML condition, as
well as an effect on coping mechanisms.
Yom-Tov et al. (2017) present a system that uses a contextual bandit to personalize
the type of message sent to encourage physical activity. The goal of the study was to
increase physical activity to improve the health of type 2 diabetes patients. The results show a
positive effect of the system on increasing physical activity and reducing glucose levels.
The method works as follows: first, a pseudo-random policy is used to collect data for a couple
of months. After that, a policy is estimated and used in the study. The policy itself is a linear
regression model using features that summarize the state and features that capture the actions
as indicator functions. All of these features are used to predict the effect of actions given the
patient state. To select an action, Boltzmann sampling is performed over the different actions
and model outputs. The stochastic nature of the method allows for variability in the
treatment provided; the estimated best treatment is not always the one delivered.
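The Boltzmann sampling step that Yom-Tov et al. describe amounts to a softmax over per-action model outputs. A minimal sketch, with hypothetical predicted rewards and an assumed temperature parameter:

```python
import math
import random

def boltzmann_sample(predicted_rewards, temperature=1.0, rng=random):
    """Sample an action index with probability proportional to
    exp(predicted_reward / temperature), i.e., a softmax over model outputs."""
    # Subtract the max before exponentiating for numerical stability
    m = max(predicted_rewards)
    weights = [math.exp((r - m) / temperature) for r in predicted_rewards]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical per-message predicted effects from a regression model
preds = [0.2, 1.1, 0.7]
action = boltzmann_sample(preds, temperature=0.5)
```

Lower temperatures concentrate probability on the highest-scoring action; higher temperatures keep the exploration (and treatment variability) the authors note.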
Rabbi et al. introduced MyBehavior (Rabbi et al., 2016), a mobile application that auto-
matically generates recommendations for a healthy lifestyle. MyBehavior uses participant-
provided preferences together with location, activity, and food intake logs to suggest recom-
mendations for reducing calorie intake and increasing calorie expenditure. MyBehavior was
tested in a multiple-baseline (Dallery et al., 2013) design study consisting of a 3-week baseline
period, then 2, 3, or 4 weeks of the control condition, followed by 7-9 weeks of the treatment
condition. The study was conducted with 16 participants who, prior to the study, were ready
to act (n=7) or already acting (n=9) towards healthier behavior change. MyBehavior delivers
recommendations through an on-screen widget that also shows real-time updates of calorie
intake and expenditure, and chronological summaries of physical activities and food intake.
During the baseline condition, participants do not receive any recommendations; however,
they have access to all the tracking information from the app. During the control condition,
participants receive random recommendations from a set of 42 pre-defined recommendations.
During the treatment condition, participants receive recommendations that are adapted to
participant preferences and outcomes. MyBehavior generates the recommendations using two
separate EXP3 (Auer et al., 2002) multi-armed bandits (one for food and another for exercise)
and a Pareto frontier method (Roberts et al., ). Together, these two methods find the
recommendations with the best outcomes and the highest participant preference. When
using the MyBehavior app, participants followed 1.2 more recommendations (p<0.0005),
walked 10.1 more minutes (p<0.005), burned 42.1 more calories in non-walking exercises
(p<0.05), and consumed 56.1 fewer calories (p<0.05) each day. Rabbi et al. followed
MyBehavior with MyBehaviorCBP (Rabbi et al., 2018), which uses a very similar method
to provide suggestions for pain management. For a thorough review of MyBehavior, see
(Aung et al., 2017).
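A Pareto-frontier filter of the kind MyBehavior combines with its bandits can be sketched as follows. The recommendations and their (outcome, preference) scores are hypothetical, not taken from the MyBehavior paper.

```python
def pareto_frontier(items):
    """Keep items not dominated on (outcome, preference): an item is dominated
    if some other item is >= on both dimensions and > on at least one."""
    frontier = []
    for name, outcome, pref in items:
        dominated = any(
            (o2 >= outcome and p2 >= pref) and (o2 > outcome or p2 > pref)
            for _, o2, p2 in items
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical recommendations scored by estimated outcome and user preference
recs = [
    ("walk after lunch", 0.8, 0.9),
    ("take the stairs",  0.6, 0.7),  # dominated by "walk after lunch"
    ("gym session",      0.9, 0.4),
]
print(pareto_frontier(recs))  # ['walk after lunch', 'gym session']
```

The frontier keeps every recommendation for which no alternative is at least as good on both estimated outcome and stated preference, which matches the intuition of jointly optimizing the two criteria described above.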
Liao et al. (2019) present a general method for the estimation of vulnerable times
from historical data. At the time of this proposal, their results are derived from simulations;
however, the authors plan to use this method in a real-world deployment of a physical activity
intervention for hypertension. Their results are very encouraging and show the value of
methods for delivering interventions at vulnerable times.
Overall, all of these systems and methods (Rabbi et al., 2016; Yom-Tov et al., 2017; Paredes et
al., 2014; Liao et al., 2019) produce very positive results; however, most of them (Rabbi et al.,
2016; Yom-Tov et al., 2017; Paredes et al., 2014), with the exception of Liao's (Liao et al.,
2019), lack a mechanism for proactively delivering health recommendations at opportunistic
or vulnerable times, and instead rely entirely on the user's willingness, or a predefined
time, to receive recommendations. This lack of a delivery mechanism limits the effect of the
intervention to participants who are actively engaged with it.
As a consequence, in this proposal, health recommendations are pushed to participants
more proactively by displaying sleep recommendations that are relevant to the time
of day, at times when we detect that the patient is in a receptivity context. Treatment is
further personalized using contextual bandits, which can better tailor the sleep
recommendations to different contexts.
3.2 PECAM Components
3.2.1 Sensor input
The PECAM framework uses two kinds of sensors to support the functions of the communication
module and the decision-making module: phone sensors and external sensors. The phone provides
several physical sensors and several virtual sensors that are used by the communication module to
estimate mobile-receptivity. An example of a physical sensor is the accelerometer, which
is used to estimate general activities such as walking, jogging, being still, and riding in a
vehicle through the Google activity recognition API. Examples of virtual sensors are estimates
of how many touches per second the user produces, the number of calls placed in the last hour,
the number of text messages received in the last hour, etc. External sensors are any sensors not
on the phone, such as a wearable device or a digital scale. For SleepU, a Fitbit was used as the
external sensor; it uses its own accelerometers and gyroscopes to estimate basic sleep
stages such as asleep, asleep movement, and awakenings. The Fitbit API sleep estimates
were used as input for the decision-making module. This specific implementation
of the framework uses the phone and wearable as the sources of sensor streams, but it could
be expanded to other sensors, including bed and next-to-the-bed sensors, to more accurately
estimate sleep stages.
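Virtual sensors like those listed above can be derived from timestamped event logs. A minimal sketch, with illustrative event data:

```python
from datetime import datetime, timedelta

def count_in_last_hour(event_times, now):
    """Count events (e.g., calls placed or texts received) in the past hour."""
    cutoff = now - timedelta(hours=1)
    return sum(1 for t in event_times if cutoff < t <= now)

now = datetime(2019, 2, 1, 14, 0)
calls = [datetime(2019, 2, 1, 13, 5), datetime(2019, 2, 1, 13, 50),
         datetime(2019, 2, 1, 11, 30)]
print(count_in_last_hour(calls, now))  # 2
```

The same windowing pattern applies to the other virtual sensors mentioned, such as touches per second over a short interaction window.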
3.2.2 Communication Module
The communication module is in charge of deciding when to deliver a health intervention in
the form of a notification to the user. The communication module’s main goal is to detect
mobile-receptivity states and use them to remind people of actionable health recommendations
for the current time of day, as chosen by the decision-making module. For SleepU, a
mobile-receptivity classifier was built as described in section 2.2.
Although the main goal is to show health recommendations to the user during mobile-
receptivity states, those states are limited to the times when the user is interacting with and
next to the phone. This means that there are times when the user could be in a mobile-
receptivity state but this cannot be detected. To overcome this challenge, the communication
module is stochastic: at every hour of the day, as shown in figure 3.2, it decides at random
whether to use the mobile-receptivity classifier for the next hour or to pick a random time
during the next hour to interrupt the participant. The probability that the mobile-receptivity
classifier is used decreases over each time period (i.e., morning, afternoon, evening), so
that a random time will always be picked in the last hour if a recommendation for that time
period has not been seen by the user. To avoid overwhelming the user, the communication
module only sends a notification once per hour during each time period, and only if the user
has not already viewed a recommendation for that time period. Before 9am, notifications are
only sent when the user is classified as being in a mobile-receptivity state, to avoid disrupting
the user's sleep.
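The hour-by-hour strategy choice described above can be sketched as follows. The linear decay of the classifier's probability is an assumption made for illustration; the proposal only states that the probability decreases over each period, with certainty at the period's start and end.

```python
import random

def choose_strategy(hour_index, hours_in_period, rng=random):
    """Each hour, pick between the mobile-receptivity classifier and a random
    interrupt time. P(classifier) decays (here, linearly; an assumption) from
    1 at the start of the period to 0 in its last hour, so an unseen
    recommendation is guaranteed a random delivery time by the period's end."""
    p_classifier = 1.0 - hour_index / (hours_in_period - 1)
    return "classifier" if rng.random() < p_classifier else "random"

# First hour of a 6-hour period always uses the classifier...
assert choose_strategy(0, 6) == "classifier"
# ...and the last hour always falls back to a random delivery time.
assert choose_strategy(5, 6) == "random"
```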
FIGURE 3.2: Communication module strategy selection process. Since the mobile-receptivity detector may not work at all times, every hour the communication module decides at random whether to use the mobile-receptivity detector or a random time during the next hour to push the health recommendation. The probabilities of picking either strategy change over time: the mobile-receptivity detector has the highest probability at the beginning of the period, and the random strategy has the highest probability by the end of the period. Notice how the probabilities are one at the beginning and end; this guarantees that at the beginning the mobile-receptivity detector is used and, at the end, if the patient has not yet seen a recommendation, it will be displayed for sure at a random time.
3.2.3 Decision-making module: Defining the selection of a health
recommendation as a reinforcement learning problem
For PECAM, personalization is defined as the selection of an appropriate health recommendation
based on two different factors: health outcome and compliance. For SleepU, the sleep
health outcomes taken into account are sleep duration and efficiency. Compliance is defined
as whether or not the participant followed a sleep recommendation. The selection of a health
recommendation is defined as a reinforcement learning problem (?). Reinforcement learning
problems concern sequential decision making: an agent interacts with an environment by
taking actions, and the agent's goal is to maximize some reward obtained after taking each
action and over a period of time. In a mobile health intervention, the agent is the app providing
the intervention, the available actions for the app are the different health treatments that the
app can provide to the patient, and the reward is a measurement of the health outcome of
interest.
In the context of the sleep recommendations problem, the agent is the SleepU app, the
available actions for the app are the different sleep recommendations that can be shown to the
user, and the reward is defined as the harmonic mean of sleep duration and sleep efficiency.
Compliance is used to control updates to the estimates of possible rewards for each action;
when a recommendation is followed, an update occurs, otherwise there is no update since
there is no new information for making an update. In summary, the SleepU app is selecting
and displaying sleep recommendations to a participant while trying to maximize the following
day’s sleep duration and efficiency of the participant. For completeness, it is assumed the
following about the sleep recommendation problem, although this generalizes to many other
health recommendation problems:
(1) The probability distribution of the actions' rewards is unknown: this means that one
cannot easily assume a known probability distribution (e.g., Gaussian) for how
the reward (i.e., the health outcome) is distributed for each of the actions. This is
also referred to as an unknown data generation model (Bubeck et al., 2012).
(2) The change in sleep duration and efficiency has low variance: although differences
in sleep duration and efficiency are expected after a participant follows a sleep
recommendation, large changes from one day to the next are not. For instance, a
participant with a sleep efficiency of 70% is not going to jump to 99% by following
any given recommendation for one day. This assumption is highly domain-dependent.
For example, in a physical activity intervention with the goal of increasing daily
steps, the average number of daily steps taken could change drastically; however,
if the goal is instead an average weekly number of steps, the changes may not be
as drastic.
(3) Selecting a health recommendation is a non-stationary problem: although there
is likely to be a single recommendation that produces the best health outcomes
for a participant at any given time, this recommendation is likely to change. This
is a problem that has been identified as a common challenge for mobile health
interventions (?).
3.2.3.1 Contextual bandit
Contextual bandits (Lattimore and Szepesvári, 2019) are a generalization of the bandit
algorithms capable of dealing with context. Contextual bandits are typically used in web
advertising, where the goal is to maximize click-through rate by deciding, for example, on the
location and topic of an ad given a particular set of contextual features like age, time of day, and
season. For the implementation of PECAM presented in this work, we chose to use contextual
bandits as opposed to other methods like Q-Learning or SARSA, because contextual bandits
are more sample-efficient (i.e., learn with a smaller data set). However, applications with
access to big data sets or a large pool of participants could use more sophisticated methods.
In order to make our sleep recommendation problem computationally tractable, we divided
the context based on time into three non-overlapping periods: morning (6:01am to
12pm), afternoon (12:01pm to 6pm), and evening (6:01pm to 6am). For each period we
use a different EXP3 (Auer et al., 2002) multi-armed bandit. As defined in (Lattimore and
Szepesvári, 2019), this particular usage of multiple multi-armed bandits for different contexts
corresponds to a contextual bandit. For other health interventions, a similar approach could
be taken: the contextual factor with the most weight in the intervention can be divided, with
an individual MAB handling each context separately.
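The time-based context split can be sketched as a small dispatcher that routes each decision to the bandit for the current period. The bandit objects here are placeholders.

```python
def period_of(hour_min):
    """Map a (hour, minute) time of day to the period whose bandit is used.
    Periods follow the split above: morning 6:01am-12pm, afternoon
    12:01pm-6pm, evening 6:01pm-6am (wrapping past midnight)."""
    hour, minute = hour_min
    minutes = hour * 60 + minute
    if 6 * 60 < minutes <= 12 * 60:
        return "morning"
    if 12 * 60 < minutes <= 18 * 60:
        return "afternoon"
    return "evening"

# Placeholders standing in for three independent EXP3 bandit instances
bandits = {"morning": "EXP3-m", "afternoon": "EXP3-a", "evening": "EXP3-e"}

assert period_of((9, 30)) == "morning"
assert period_of((18, 0)) == "afternoon"   # 6:00pm is the afternoon boundary
assert period_of((2, 0)) == "evening"      # evening wraps past midnight
```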
EXP3 works by selecting a recommendation at random from a multinomial distribution.
The EXP3 algorithm is described in algorithm 1. There are many different multi-armed
bandit methods, such as the Upper Confidence Bound and Thompson sampling; however,
EXP3 provides the best theoretical guarantees given the assumptions of the sleep
recommendation problem. EXP3 assumes the environment is adversarial; in such an en-
vironment, whenever the bandit picks a specific sleep recommendation, the environment
can foresee the decision rule and pick a different sleep recommendation as the best at any
given time. Although the actual environment for the sleep recommendation problem may
not be adversarial, working under that assumption prepares the bandit for the worst possible
conditions; as such, EXP3 is guaranteed to only make a finite number of mistakes and to
adapt to a non-stationary environment. EXP3 makes no assumptions about the data generation
model (Bubeck et al., 2012), whereas the Upper Confidence Bound (UCB) approach breaks
under low-variance problems (Kuleshov and Precup, 2014). Lastly, EXP3 has already been
used in related work (e.g., (Rabbi et al., 2016)) with successful results.
In the context of the sleep recommendations, EXP3 starts with a uniform probability over
each of the recommendations. When a recommendation has a positive sleep outcome (high
efficiency and/or high sleep duration), the probability of that recommendation is increased
slightly while all the other recommendations' probabilities are decreased. To make the
problem computationally feasible, three different multi-armed bandits (MABs) are used:
one for each period of the day (morning, afternoon, and evening). A short version of the sleep
recommendations handled by each of the MABs is shown in table 3.1. To decide in which
period of the day each recommendation should appear, we worked together with a CMU sleep
clinician. This took into account that planning is an important part of some activities. For
example, for the recommendation "Avoid exercising 4 hours before bedtime", the goal is
to help the student move exercise to the morning or afternoon, not to drop exercise
altogether. Therefore, the best time to remind them about it is in the morning. More details
Initialization: w_n^(0) = 1, for n = 1, ..., N
for t = 1, ..., T do
    β = sqrt(log(k) / (k · t))
    Select recommendation i: φ^(t) = Σ_{n=1..N} w_n^(t); i ~ Multinomial(w^(t-1) / φ^(t-1))
    Compute sleep score: s^(t) = 1(i ∈ r^(t-1)) · H(sleepD^(t-1), sleepE^(t-1))
    Update: w_n^(t) = w_n^(t-1) · exp(−β · ℓ(s^(t)) / p_n^(t))
end
Algorithm 1: EXP3 algorithm adapted for the sleep recommendations problem, where w_n^(t) is the weight for recommendation n at time t, p_n^(t) = w_n^(t) / φ^(t) is the probability of selecting a recommendation, sleepD is the sleep duration in hours capped at 7 and divided by 7, sleepE is the sleep efficiency, H(·) is the harmonic mean, and 1(i ∈ r^(t-1)) is one if the recommendation i pushed by the app to the participant is reported as followed (i ∈ r^(t-1)).
on how these recommendations were selected and displayed are provided in the study design
section (3.4.1). In other domains, other MABs may be more suitable; for example, in a
health intervention where the treatment must be optimized very quickly (i.e., the number of
opportunities to try out treatments is limited) and non-stationarity is not an issue, the UCB
MAB may be a better option.
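As a sketch, algorithm 1 can be implemented in Python as follows. This is an illustrative re-implementation, not the SleepU code: following the standard EXP3 formulation, only the followed recommendation's weight is updated with an importance-weighted loss, and the loss ℓ is taken here to be one minus the sleep score, an assumption since the proposal leaves ℓ unspecified.

```python
import math
import random

def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if a + b > 0 else 0.0

class EXP3:
    """Minimal EXP3 bandit for the sleep-recommendation setting (a sketch)."""

    def __init__(self, n_arms, rng=random):
        self.weights = [1.0] * n_arms  # uniform initialization: w_n = 1
        self.t = 0
        self.rng = rng

    def probabilities(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def select(self):
        """Sample a recommendation from the multinomial over the weights."""
        self.t += 1
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, arm, sleep_duration_h, sleep_efficiency, followed):
        if not followed:
            return  # no new information: skip the update, as in the text
        k = len(self.weights)
        beta = math.sqrt(math.log(k) / (k * max(self.t, 1)))
        # Sleep score: harmonic mean of capped/scaled duration and efficiency
        score = harmonic_mean(min(sleep_duration_h, 7) / 7, sleep_efficiency)
        loss = 1.0 - score  # assumed loss function (the proposal leaves ℓ open)
        p = self.probabilities()[arm]
        self.weights[arm] *= math.exp(-beta * loss / p)

bandit = EXP3(n_arms=5, rng=random.Random(0))
arm = bandit.select()
bandit.update(arm, sleep_duration_h=6.0, sleep_efficiency=0.85, followed=True)
```

In a deployment, `select` would run once per period each day and `update` once the next day's diary and Fitbit data arrive, mirroring the daily update cycle described for SleepU.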
3.2.4 Framework connection to behavior change theories
The design choices behind PECAM are mainly based on self-efficacy theory (Bandura, 1977),
the Fogg Behavior Model (Fogg, 2009), and the COM-B framework (Michie et al., 2011), and
closely follow the design guidelines for just-in-time adaptive interventions (Nahum-Shani
et al., 2017). Self-efficacy theory posits that behavior change can only be achieved once the
individual has a perception of success towards the execution of a task. In this proposal, I
posit that achieving high self-efficacy is context-dependent: even if an individual has high
self-efficacy for a given task, the task can only be executed under very specific circumstances,
so ultimately success is dictated by the individual's ability and context. As an example,
an individual may be able and willing to stop drinking coffee to improve sleep outcomes,
TABLE 3.1: Sleep Hygiene recommendations used in the SleepU app
MAB        Sleep Recommendation

Morning    Keep record of your sleep with a diary (this app's diary counts!)
           Avoid exercising 4 hours before bedtime
           Always keep the daytime routine

Afternoon  Go to bed and wake up at the same time every day
           Avoid caffeine 6 hours before bedtime
           Avoid alcohol 6 hours before bedtime
           Avoid naps
           Avoid heavy meals before bedtime

Evening    Sleep only when sleepy
           Get out of bed when not asleep in 20 mins and calm down until sleepy
           Use bed only for sleep and sex
           Perform a sleep routine
           Take a bath 1-2 hours before bedtime
           Avoid watching the clock
           Make the bed environment conducive to sleep
however, due to habit, this individual may only remember to avoid coffee once inside a coffee
shop, at which point surrendering to habit is easier than restraint. In such a case, a reminder
that arrives early enough to allow the individual to avoid this particular habit could succeed
in helping.
Reminders driven by context and receptivity are also motivated by the Fogg Behavior Model
(FBM). This model posits that behavior is composed of three different factors: motivation,
ability and triggers. Under the FBM, for any individual to succeed at behavior change, she
needs to be motivated, needs to have the ability to perform the behavior and needs a trigger
to perform this behavior. Take, as an example in the context of a sleep intervention, the
recommendation to "avoid drinking coffee 6 hours before bedtime"; an individual's ability
to perform this recommendation varies over the course of the day, as shown in figure 3.3,
FIGURE 3.3: Fogg Behavior Model adaptation of the recommendation "avoid caffeine 6 hours before bedtime". The horizontal blue and red rectangles show how the ability to enact a recommendation depends on the time of day.
where the morning and afternoon are among the best times to provide this recommendation,
while an evening reminder cannot result in behavior change since the window of opportunity
for succeeding has already passed. In the SleepU app, the FBM trigger is a notification
delivered to the user’s phone. COM-B (Michie et al., 2011), a behavior change framework,
relates several causal factors (e.g., capability, opportunity and motivation) for the performance
of volitional behavior, including the influence of extrinsic factors. COM-B was derived
from an exhaustive literature review and the summarization of nineteen different behavior
change frameworks. In comparison to FBM, COM-B considers the role of motivation at a
broader level in the performance of behavior mediated by ability and opportunity (triggers
under the FBM). However, COM-B goes further and suggests that motivation, capability
and opportunity are also influenced by the performance of the behavior. This implies that
motivation can increase as a patient engages more with a behavior resulting in a positive
health outcome.
3.3 Deployment and testing
Using the PECAM framework, the SleepU app was implemented: an Android application
that uses a Fitbit wearable, user feedback, and phone sensor data to personalize a sleep
hygiene intervention. The app was designed to only give sleep recommendations to the user,
while other functionality, like tracking and visualization of sleep or other behaviors, was not
part of the app. By avoiding this other functionality, any effect of the app on sleep-related
outcomes can be more directly attributed to the sleep recommendations and delivery method
provided by the SleepU app. Moreover, it has been shown that tracking of behaviors can be
detrimental in domains like weight loss interventions (Jakicic et al., 2016).
What follows is a walk-through of the SleepU app. At installation, the app asks the user to
connect her Fitbit account and grant the necessary permissions to automatically access
sleep-related data. The next day at 9am, the app pushes a notification asking the user
to fill out a standard sleep diary (figure 3.4:1) (i.e., time to bed and wake-up time). If the
user starts interacting with her phone before 9am and the communication module detects a
mobile-receptivity state, the app pushes the sleep diary notification at that time.
After the user fills out the sleep diary, the app immediately uses the Fitbit data and diary
responses to update the probability distributions of the recommendations and selects the sleep
recommendations for the day. The probability estimates for EXP3 are updated daily using
the harmonic mean of sleep duration and efficiency, together with whether the participant
followed the recommendations provided. After the updates, SleepU pushes the morning sleep
recommendation. SleepU provides one sleep recommendation in each of 3 time periods:
morning, afternoon, and evening, selecting which recommendation to show using
the EXP3 multi-armed bandit (MAB). Notifications for a time period stop once the user
has seen a sleep recommendation for that period. SleepU can tell that a recommendation
has been read because the notification does not display the recommendation text directly, as
shown in figure 3.4:2, but instead says: "I have a new sleep recommendation for you!". This
mechanism forces the user to click on the notification to read the recommendation. When the
notification is clicked, the SleepU app opens and displays the sleep recommendation. In
summary, SleepU pushes at least 3 notifications a day (one for each period) and a maximum
of one notification per hour between 9am and 12am. Participants in the study were free to
mute or ignore the notifications.
After the first day, while filling out the sleep diary, the user is also asked about the sleep
recommendations shown the previous day and whether she followed any of them. Sleep
FIGURE 3.4: Different screenshots from the SleepU app. Left to right: 1) SleepU diary entry; the user gets a reminder at 9am to fill out the diary, and if they checked their phone earlier than that, the receptivity classifier could trigger a notification to fill out the sleep diary. 2) The app pushing a notification to the user about a new sleep recommendation available; the actual recommendation text is omitted in the notification. 3) A sleep recommendation, viewable after the notification is clicked on. 4) Main screen of the app, which gives the user access to the sleep recommendations selected for her for the current day, with the other sleep recommendations hidden.
recommendations that were followed then result in an update to the probability estimates of
their respective MAB; this updates the probabilities of all recommendations for that MAB.
The recommendations in the SleepU app (figure 3.4:3) are a slight modification, for improved
readability, of the sleep hygiene recommendations offered by sleep clinicians (Centre for
Clinical Interventions, ), each with a single illustration related to the recommendation. The
home screen of the app (figure 3.4:4) provides access to all the sleep recommendations
already provided for the current day.
The SleepU app has four different mechanisms for triggering the delivery of a recommendation:
user, random, mobile-receptivity, and diary. User-triggered recommendations refer to
when the user checks the app's recommendations of her own volition, without
filling out the sleep diary or receiving a notification from the app. In this scenario, the
participant goes to the phone on her own and looks at any of the 3 sleep recommendations
available for the day (morning, afternoon, and evening) from the app's home screen.
Random-triggered recommendations are those scheduled and shown in a notification at a
random time by the SleepU app, as explained in section 3.2.2. Mobile-receptivity-triggered
recommendations are those shown as a notification to the user after the mobile-receptivity
detector identifies a receptive state. Lastly, diary-triggered recommendations are those
checked right after filling out the sleep diary; in this case, too, the participant could check
any of the morning, afternoon, or evening recommendations available.
3.4 Method
3.4.1 Study design
We conducted a 12-weeks long, within-subjects randomized clinical study with 37 college
students from Carnegie Mellon University (CMU). The study design is shown in figure 3.5.
After screening, participants in the study were assigned at random to one of two groups: the app-first group or the sleep-appointment-first group. Randomization also took into account
group balance by gender. All participants were exposed to three study phases (each approximately 4 weeks long) in an order that depended on their group assignment: Baseline, during which only data collection took place; App-intervention, during which students were asked to install the SleepU app on their phones; and Sleep-appointment, during which students were asked to attend a sleep health appointment where a standard sleep hygiene intervention was delivered by a sleep clinician. The one-time sleep-appointment was provided by CMU's university health center and is part of their standard university wellness program, provided at no cost to students. The two groups were created to counterbalance
any possible order effect. Due to limited availability of the sleep-appointment, participants
starting the study late, cancellations of the sleep appointment, and the semester calendar, the
study duration varied slightly among participants.
3.4.2 Participants
Participants were recruited using flyers and Facebook posts in university groups at the beginning of January 2019. To be eligible, participants had to meet two sets of requirements: demographic and health-related. Demographic
FIGURE 3.5: Study design. All study phases lasted 4 weeks with the exception of screening. The Qs indicate times in the study when the participants filled out a battery of questionnaires, as explained in section 3.4.4.
requirements were: 18 to 25 years old and with an active undergraduate student status at
CMU. Participants were screened for ongoing problematic substance use (i.e., drugs, alcohol, or nicotine) and sleep disorders (i.e., apnea, narcolepsy, chronic insomnia). Participants with a substance use problem or a sleep disorder were not accepted into the study. These exclusion criteria were necessary because participants with these issues need specialized sleep treatment; a standard sleep hygiene approach would not work for them and could worsen ongoing sleep problems like insomnia. Procedures were approved by our university's institutional review board, and all participants provided informed consent. All participants
were provided at no cost with a Fitbit Flex2, a wrist-worn wearable with sensors that measures
steps and sleep (awake vs. asleep). Participants were compensated US$10 for
each week of data logged in the study, and, as an extra incentive, those filling out 80% or more
of the diaries were allowed to keep the Fitbit Flex2. The participants were not compensated
for using the SleepU app’s sleep intervention functionality (e.g., checking or following sleep
recommendations).
3.4.3 Interventions
3.4.3.1 Sleep-appointment
A sleep-appointment intervention was scheduled with the university health center after a
participant joined the study, but the appointment occurred at the beginning of this intervention
period. During this 45-minute to 1-hour appointment, a sleep clinician covered basics
about sleep, performed a sleep assessment using the Pittsburgh Sleep Quality Index (PSQI)
(Buysse et al., 1989), and went over relevant sleep hygiene recommendations. The sleep
clinicians at our university follow recommendations from the Australian Centre for Clinical
Interventions (Centre for Clinical Interventions). After the sleep-appointment, as part of our
study, Fitbit data and sleep diary questionnaires were recorded for 4 weeks.
3.4.3.2 App-intervention
Participants received a link from the research coordinator with instructions on how to install the SleepU app on their phones. After installation, participants had the app on their phones for 4 weeks and uninstalled it afterwards. Specific details about the way the app
works and the delivery of the sleep recommendations can be found in section ??.
3.4.4 Measures
After screening, participants who joined the study were asked to fill out a battery of questionnaires related to sleep health and related proximal outcomes after each phase of the study. The questionnaires include mechanistic proximal outcomes and measures of psycho-social or physiological processes that are thought to mediate health behavior change, as suggested by Klasnja and Veeraraghavan (2018). The questionnaires used were: the
Pittsburgh Sleep Quality Index (PSQI) (Buysse et al., 1989), Sleep Practices and Attitudes
(Grandner et al., 2014), Sleep beliefs scale (Adan et al., 2006), Perceived stress scale (Cohen
et al., 1994), Morningness - Eveningness questionnaire (Horne and Östberg, 1976) and a
Readiness to change motivation towards healthy sleep related behaviors questionnaire (i.e.,
motivation questionnaire). We created the motivation questionnaire from a readiness ruler,
a questionnaire that measures the patient’s health stage as defined in the transtheoretical
model of behavior change (Prochaska and Velicer, 1997). The readiness ruler has been used
for smoking cessation (Biener and Abrams, 1991) and alcohol rehabilitation interventions
(Heather et al., 2008). Our modification consisted of adjusting the text content for sleep hygiene recommendations and decreasing the number of options from 10 to 8 for improved readability on the mobile phone on which participants filled out the questionnaire.
In our motivation questionnaire, we asked participants to rate their readiness for each of the 14
different sleep recommendations listed in table 3.1, excluding the sleep diary recommendation
since we directly compensated participants for the diary entries. The readiness levels used
a scale from 1 to 7: 1) Not ready at all, 3) Thinking about it, 5) Planning and making a
commitment, 7) Actively/Already doing it, and a Does not apply to me option (e.g., the coffee
recommendation for participants that do not drink coffee).
Additionally, participants were asked to fill out a standard sleep diary every day during the 12-week duration of the study. During the baseline and sleep-appointment phases, participants
received an email with a link to a website form every morning. For the app-intervention
phase, participants received the sleep diary prompt on the SleepU app (via a notification)
with three extra questions asking whether the participant followed any of the three sleep
recommendations generated by SleepU the previous day.
Sleep duration and efficiency were collected continuously during the 12 weeks of the study
except when the Fitbit was being charged. Participants in the study were instructed to wear
the Fitbit Flex2 at all times including while taking a shower and while sleeping, with the
exception of recharging. The device does not collect any data while recharging or when the
user does not wear it.
3.4.5 Analysis plan
The main hypotheses we tested are:
(1) H1: The combination of a mobile-receptivity detector and a decision-making mod-
ule results in better sleep duration and efficiency than a traditional sleep hygiene
intervention.
(2) H2: Delivering sleep recommendations at mobile-receptivity states results in higher operationalization than recommendations users put into practice on their own or via alternative mechanisms.
(3) H3: The SleepU app increased sleep-related motivation.
For H1) The combination of a mobile-receptivity detector and a decision-making module
produces better sleep outcomes than a traditional sleep hygiene intervention, we compared the
app-intervention against the baseline and the standard sleep-appointment intervention. The
SleepU app and the sleep-appointment have the same base information (e.g., the same sleep
recommendations), however SleepU learns over time to select and remind the participant only
about recommendations that result in an increased sleep duration or efficiency and during
detected mobile-receptivity states. This means that since the content of both interventions is
the same, any difference in their outcomes should only come from the differences between
SleepU and the sleep-appointment. For this comparison, we performed a regression to analyze
sleep outcome variance across the phases, taking into account interaction effects between
study phase and group. In addition to evaluating outcomes on the entire study sample, for the
post-hoc analysis, we looked at smaller groups based on their sleep duration and motivation
at baseline. Using the baseline Fitbit’s sleep duration data, two groups were formed by
separating our study population into short-sleepers (<7 hours per day) and long-sleepers (>=7
hours per day); this same post-hoc analysis was used in a recent pilot study of sleep hygiene
(Levenson et al., 2016). This grouping is further supported by expert consensus and national
guidelines that state adults should sleep for more than 7 hours, and that individuals already
sleeping for 7 hours are generally not expected to increase their sleep duration. Similarly, we
created two more groups based on motivation to put sleep recommendations into practice.
According to self-efficacy theory and the transtheoretical model of behavior change (Prochaska and Velicer, 1997), we expect participants with a readiness-to-change level of 5 or higher (e.g., action or maintenance) to be very motivated to put sleep recommendations into practice, while we expect those at levels below 5 (e.g., pre-contemplation, contemplation, preparation) to be less motivated and hence to show a smaller change, or no change, in sleep-related outcomes.
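Under these definitions, the two post-hoc splits reduce to simple thresholds; a sketch (the function and label names are hypothetical):

```python
# Post-hoc group assignment: <7 hours of baseline sleep marks a short-sleeper,
# and a baseline readiness level of 5 or higher marks a motivated participant.
def assign_groups(baseline_hours, readiness):
    sleeper = "short-sleeper" if baseline_hours < 7 else "long-sleeper"
    motivation = "motivated" if readiness >= 5 else "less-motivated"
    return sleeper, motivation

print(assign_groups(6.3, 5))  # ('short-sleeper', 'motivated')
print(assign_groups(7.8, 3))  # ('long-sleeper', 'less-motivated')
```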
For H2) Delivering sleep recommendations at mobile-receptivity states increases their opera-
tionalization, we looked at how participants interacted with the different recommendations
during the app-intervention phase of the study and whether they put them into practice the
next day or not. To test this hypothesis, we computed the total actionability rate of each type of triggering mechanism (i.e., user, random, mobile-receptivity, diary), where we define actionability as the total number of recommendations followed of each type divided by the total number of recommendations seen of that type. Actionability was first computed for
each participant and type and then the median was used as the summary for all participants in
the study. The median was used instead of the mean due to the small sample size and the data
not following a well-defined probability distribution.
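This per-participant-then-median computation can be sketched as follows (the data tuples and field names are illustrative):

```python
from statistics import median

# Actionability = recommendations followed / recommendations seen, computed
# per participant and trigger type; the median across participants is then
# used as the summary for each trigger type.
def actionability(records):
    # records: (participant_id, trigger_type, seen, followed) tuples.
    rates = {}
    for pid, trigger, seen, followed in records:
        rates.setdefault(trigger, {})[pid] = followed / seen
    return {t: median(by_pid.values()) for t, by_pid in rates.items()}

records = [
    ("p1", "receptivity", 4, 3),
    ("p2", "receptivity", 4, 3),
    ("p1", "user", 6, 3),
    ("p2", "user", 8, 4),
]
print(actionability(records))  # {'receptivity': 0.75, 'user': 0.5}
```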
For H3) The SleepU app increased motivation, we looked at the differences in scores for the
motivation questionnaires administered at the end of each phase of the study. To understand
the effect of phase and group, we applied a two-way ANOVA.
To analyze the Fitbit sleep data (sleep duration and efficiency) across study phases, sleep data
was evaluated for normality using a Shapiro-Wilk normality test and for homogeneous vari-
ance using a Fligner-Killeen test. For normally distributed data with homogeneous variance,
a standard ANOVA was used; otherwise, the Aligned Rank Transform (ART) for nonparametric factorial ANOVAs was used (Wobbrock et al., 2011). ANOVAs were followed by pairwise comparisons using paired t-tests. ARTs were followed by pairwise comparisons using Wilcoxon signed-rank tests. The appropriate effect size estimate (r or Cohen's d) was used in each case.
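The test-selection logic above can be sketched as a decision rule; the α = 0.05 cutoff is an assumption, and the actual analysis used scipy and R's ARTool rather than this toy function:

```python
# Decision rule: Shapiro-Wilk (normality) and Fligner-Killeen (homogeneous
# variance) p-values select the parametric or nonparametric pipeline.
def choose_tests(shapiro_p, fligner_p, alpha=0.05):
    if shapiro_p > alpha and fligner_p > alpha:
        # Normal, homogeneous variance: ANOVA, paired t-tests, Cohen's d.
        return ("ANOVA", "paired t-test", "Cohen's d")
    # Otherwise: Aligned Rank Transform, Wilcoxon signed-rank, effect size r.
    return ("ART", "Wilcoxon signed-rank", "r")

print(choose_tests(0.30, 0.40)[0])  # ANOVA
print(choose_tests(0.01, 0.40)[0])  # ART
```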
Following advice from the American Statistical Association (Wasserstein et al., 2016; Wasserstein et al., 2019), results are presented comprehensively, including both successes and failures. For hypothesis tests, also as suggested, we report the p-values and explicitly avoid using the term "statistically significant" (Thiese et al., 2016; Kim and Bang, 2016); instead, we trust that researchers can make their own judgments. We also present both adjusted and unadjusted p-values as suggested by Rothman (1990), recognizing that for planned comparisons, p-value adjustment is not necessary (Saville, 1990).
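The adjusted p-values reported alongside the raw ones in tables 3.3 and 3.4 appear consistent with a Bonferroni correction over the three pairwise comparisons, although the text does not name the method; a hedged sketch of that correction:

```python
# Bonferroni family-wise correction: multiply each raw p-value by the number
# of comparisons in the family, capping at 1.0. Which correction the study
# actually applied is an assumption here.
def bonferroni_adjust(pvals):
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

# The three pairwise sleep-duration comparisons for the full sample.
raw = [0.016, 0.084, 0.51]
print([round(p, 3) for p in bonferroni_adjust(raw)])  # [0.048, 0.252, 1.0]
```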
We performed our data analysis using Python 3.6 and the libraries numpy (Oliphant, 2006), scipy (Jones et al., 2001), matplotlib (Hunter, 2007), seaborn (Waskom et al., 2017), and pandas (McKinney and others, 2010). Hypothesis testing was performed using R 3.6.1 and the ARTool package (Kay and Wobbrock, 2016). Both exploratory data analysis and hypothesis
testing were conducted in Jupyter notebooks.
3.5 Results
After screening, 37 participants were invited to join the study. Of those, 30 participants (22
Female, 7 Male, 1 Undisclosed) finished the study. 17 participants were in the sleep-appointment-first group (3 Male) and 13 in the app-first group (4 Male). The average length of the study was
84.4 days (min=69, max=96). We used the Fitbit data for hypothesis testing of changes in
sleep duration and efficiency. We excluded the Fitbit data from 4 participants due to large
amounts of missing data during some of the study phases. The resulting dataset has a total of
26 participants and it was used to test H1 and H2 and in their related post-hoc analyses. For
readiness to change motivation (H3), we used all of the 30 participants’ responses available
because only questionnaire data was necessary. A breakdown of the number of participants
and post-hoc analysis groups is shown in table 3.2. A summary of all of the results related to
sleep duration and efficiency (H1) is shown in table 3.3 and results related to motivation (H3)
are shown in table 3.4. The sleep progression during the semester for the sleep-appointment-first and app-intervention-first groups is shown in figure 3.6.
FIGURE 3.6: Sleep duration changes over the semester. From left to right: 1) Sleep duration for the sleep-appointment-first and app-intervention-first groups; S.A corresponds to the Sleep-appointment phase and A.I corresponds to the App-intervention phase. 2) Sleep duration for motivated vs. less-motivated participants. 3) Sleep duration for short vs. long sleepers.
                 H1 and H2                              H3
                 Short-sleepers  Long-sleepers  Total   Short-sleepers  Long-sleepers  Total
Motivated        7               6              13      7               6              13
Less-motivated   3               10             13      3               14             17
Total            10              16             26      10              20             30
TABLE 3.2: Participant distribution for the different analyses.
3.5.1 H1) The combination of a mobile-receptivity detector and a
decision-making module produces better sleep outcomes than a
traditional sleep hygiene appointment intervention
On average, sleep duration for the participants while in the app-intervention was maintained
(small increase of 4.2 minutes) from their baseline sleep (p = 0.51, padj = 1.0, r=0.09).
In comparison to the sleep-appointment, participants slept 19.2 more minutes while in the
app-intervention (p = 0.016, padj = 0.049, r = 0.32). Sleep efficiency across all participants
in the different phases did not change; for the different sub-samples, the differences were minor (2%), which for 7 hours of sleep amounts to only an 8.4-minute difference. These changes in efficiency are not clinically meaningful and as such are not discussed any further.
Sleep duration for short-sleepers when experiencing the app-intervention (SleepU) increased
by 36 minutes from their baseline duration (p = 0.043, padj = 0.13, d = 1.12) and it was also
24 minutes longer than when experiencing the sleep-appointment intervention (p = 0.068,
padj = 0.2, d = 0.8). Sleep duration for motivated participants experiencing the app-
intervention increased 19.8 minutes in comparison to their baseline (p = 0.09, padj = 0.27,
d = 0.37) and was also 22.2 minutes longer than when experiencing the sleep-appointment
intervention (p = 0.03, padj = 0.09, d = 0.48). Similar results were achieved in a recent pilot
study by sleep researchers (Levenson et al., 2016) in a similar population of short-sleepers
applying a traditional sleep hygiene intervention plus a social comparison component.
Sleep duration for long-sleepers experiencing the app-intervention decreased by 16 minutes
in comparison to their baseline phase (p = 0.43,padj = 1.0,r = 0.14), but this was still
15 minutes higher in comparison to when they were experiencing the sleep-appointment
intervention (p = 0.10, padj = 0.31, r = 0.29). Similarly, sleep duration for less-motivated
participants during the app-intervention decreased by 15.6 minutes in comparison to their
baseline phase (p = 0.59, padj = 1.0, r = 0.11) and was 15 minutes higher in comparison to
when they experienced the sleep-appointment intervention (p = 0.15, padj = 0.44, d = 0.29).
Based on these results, we can confirm H1, that the combination of the mobile-receptivity
detector and decision-making module in the SleepU app resulted in better sleep outcomes
than the traditional sleep hygiene appointment intervention.
3.5.2 H2) Delivering sleep recommendations at mobile-receptivity
states increases their operationalization
The median actionability rate of each type of recommendation across all participants is shown
in figure 3.7. A Friedman test did not reveal any large statistical differences across the four
types for all participants (p = 0.16) or the post-hoc groups (short-sleepers, long-sleepers,
motivated and less-motivated). However, sleep recommendations delivered via the mobile-receptivity detector had a median actionability of 75% across all participants, in comparison to 50% for the other mechanisms. For some groups, like short-sleepers, actionability was as high
as 86%. While the results shown in figure 3.7 are promising, we cannot confirm H2, that
mobile-receptivity increased operationalization of the sleep recommendations.
3.5.3 H3) The SleepU app increased motivation
There was a change in average motivation for all participants from 4.6 in the baseline phase
to 5.17 when experiencing the app-intervention (p = 0.02, padj = 0.059, r=0.55). There
was also a change for all participants from 4.6 in baseline to 5.11 when experiencing the
sleep-appointment intervention (p = 0.057, padj = 0.171, r=0.45). There was no difference
in motivation between the app-intervention and the sleep-appointment intervention (p =
0.705,padj = 1.0,r=0.07). Motivation was also measured before (4.5) and after (4.6) the
baseline phase but there was not a difference between the two (p = 0.65, r=0.13).
FIGURE 3.7: Actionability rates for all participants, short- and long-sleepers, and motivated and less-motivated participants, by type of notification trigger mechanism (diary, random, mobile-receptivity, and user). The numbers at the bottom are the average total number of notifications for each type and group.
Short-sleepers’ average motivation changed from 4.9 in the baseline phase to 5.38 when
experiencing the app-intervention (p = 0.28, padj = 0.83, r=0.59). There was also a change in
motivation for short-sleepers from 4.9 in baseline to 5.2 in the sleep-appointment intervention
(p = 0.41, padj = 1.0, r=0.39).
Long-sleepers’ average motivation changed from 4.5 in the baseline phase to 5.0 when
experiencing the app-intervention (p = 0.076, padj = 0.23, r=0.53). Also, there was a change
from 4.5 in baseline to 5.0 in the sleep-appointment intervention (p = 0.08, padj = 0.25,
r=0.49).
Motivated students' average motivation changed slightly from 5.4 in the baseline phase to 5.6 in the app-intervention (p = 0.21, padj = 0.65, r=0.52). Also, there was a change from 5.4 in
baseline to 5.7 in the sleep-appointment intervention (p = 0.093, padj = 0.28, r=0.40).
For the less-motivated students, average motivation changed from 4.0 in the baseline phase to
4.79 in the app-intervention (p = 0.0056, padj = 0.017, r=0.57). Also, there was a change
from 4.0 in baseline to 4.64 in the sleep-appointment intervention (p = 0.0093, padj = 0.028,
r=0.49).
Based on these results, we can confirm H3 that SleepU increased motivation over the baseline
phase, but to the same degree as the sleep-appointment intervention.
3.5.4 Summary of results
In general, the results support our main hypothesis (H1) that the combination of a mobile-
receptivity detector and our decision-making module results in better sleep outcomes than a
traditional sleep hygiene intervention. However, the impact was only seen on sleep duration
and not on sleep efficiency. For the different sub-groups, our combined method always resulted
in a higher sleep duration in comparison to the sleep-appointment intervention.
For H2, the results are encouraging but do not support our hypothesis. Although we measured actionability for all of the different recommendation trigger mechanisms, the results are observational rather than causal; evaluating this effect would require a second study in which we deliver the sleep recommendations using a single type of notification mechanism for a few days each to compare. Unfortunately, this was not feasible during our study.
For H3, the results not only demonstrate that our SleepU app improved motivation, but also
that this improvement is comparable to that produced by the sleep-appointment intervention.
Additionally, we found that motivation during the screening phase and by the end of the
baseline phase did not change by a large amount. This result shows that filling out the sleep
diary daily and being involved in a sleep study (without receiving any intervention) does
not have a visible effect on participant motivation and hence we would not expect to see a
behavioral change either.
The results demonstrate the value of using a mobile-receptivity detector and a contextual bandit to detect contexts in which to intervene and to select treatments, compared against a standard sleep intervention delivered by an experienced clinician. Our results demonstrate that our system overall (H1) is as good as or better than an in-person, individual, one-hour sleep intervention. This result shows that it is possible
Outcome    | Method | Sample         | Baseline | Sleep-A. | App-I  | Baseline vs. App-I             | Baseline vs. Sleep-A.          | Sleep-A. vs. App-I
Duration   | ART    | All            | 7.24     | 6.99     | 7.31   | p = 0.51, padj = 1.0, r = 0.09 | p = 0.084, padj = 0.25, r = 0.24 | p = 0.016, padj = 0.049, r = 0.32
Efficiency | ART    | All            | 94.31%   | 94.25%   | 94.34% | phase (p = 0.96); group (p = 0.73); phase-group interaction (p = 0.45)
Duration   | Anova  | Motivated      | 7.17     | 7.13     | 7.50   | p = 0.09, padj = 0.27, d = 0.37 | p = 0.69, padj = 1.0, d = 0.04 | p = 0.03, padj = 0.09, d = 0.48
Efficiency | Anova  | Motivated      | 94.58%   | 93.95%   | 93.99% | p = 0.07, padj = 0.21, d = 0.26 | p = 0.09, padj = 0.28, d = 0.28 | p = 0.92, padj = 1.0, d = 0.01
Duration   | ART    | Less-motivated | 7.374    | 6.86     | 7.115  | p = 0.59, padj = 1.0, r = 0.11 | p = 0.08, padj = 0.24, r = 0.34 | p = 0.15, padj = 0.44, r = 0.29
Efficiency | Anova  | Less-motivated | 94.04%   | 94.55%   | 94.69% | phase (p = 0.291); group (p = 0.679); phase-group interaction (p = 0.833)
Duration   | Anova  | Short-sleepers | 6.3      | 6.5      | 6.9    | p = 0.043, padj = 0.13, d = 1.12 | p = 0.334, padj = 1.0, d = 0.2 | p = 0.068, padj = 0.2, d = 0.8
Efficiency | Anova  | Short-sleepers | 94.82%   | 94.11%   | 94.39% | phase (p = 0.03); group (p = 0.978); phase-group interaction (p = 0.02)
Duration   | ART    | Long-sleepers  | 7.82     | 7.30     | 7.55   | p = 0.43, padj = 1.0, r = 0.14 | p = 0.004, padj = 0.01, r = 0.48 | p = 0.10, padj = 0.31, r = 0.29
Efficiency | Anova  | Long-sleepers  | 93.99%   | 94.33%   | 94.31% | phase (p = 0.672); group (p = 0.628); phase-group interaction (p = 0.875)
TABLE 3.3: Hypothesis testing results related to sleep duration and efficiency. Sleep-A: Sleep-appointment; App-I: App-intervention. For Efficiency rows reported with phase, group, and interaction entries, omnibus effects are given instead of pairwise comparisons.
and effective to scale a behavioral intervention like sleep hygiene in the form of an Android application. Due to the pervasiveness of mobile phones (median ownership of 45% in developing economies and 75% in developed economies (Taylor Kyle, 2019)), this opens up the possibility of delivering this kind of intervention with ease to large groups of people. The only limiting, but not insurmountable, factor is the use of a Fitbit for tracking sleep; the user could alternatively log sleep data manually in the app every day. Despite the promising results, we can only partially attribute this outcome to the inclusion of the mobile-receptivity
Outcome    | Method | Sample         | Baseline | Sleep-A. | App-I | Baseline vs. App-I              | Baseline vs. Sleep-A.            | Sleep-A. vs. App-I
Motivation | ART    | All            | 4.662    | 5.116    | 5.17  | p = 0.02, padj = 0.059, r = 0.55 | p = 0.057, padj = 0.171, r = 0.45 | p = 0.705, padj = 1.0, r = 0.07
Motivation | ART    | Short-sleepers | 4.942    | 5.256    | 5.388 | p = 0.28, padj = 0.83, r = 0.59 | p = 0.41, padj = 1.0, r = 0.39   | p = 0.77, padj = 1.0, r = 0.03
Motivation | ART    | Long-sleepers  | 4.522    | 5.047    | 5.062 | p = 0.076, padj = 0.23, r = 0.53 | p = 0.08, padj = 0.25, r = 0.49  | p = 0.956, padj = 1.0, r = 0.04
Motivation | ART    | Less-motivated | 4.662    | 5.116    | 5.17  | p = 0.28, padj = 0.83, r = 0.59 | p = 0.41, padj = 1.0, r = 0.39   | p = 0.77, padj = 1.0, r = 0.03
Motivation | ART    | Motivated      | 4.082    | 4.648    | 4.799 | p = 0.0056, padj = 0.017, r = 0.57 | p = 0.0093, padj = 0.028, r = 0.49 | p = 0.3388, padj = 1.0, r = 0.19
TABLE 3.4: Hypothesis testing results related to motivation. Sleep-A: Sleep-appointment; App-I: App-intervention.
detector (H2). We also corroborated that the system is as persuasive as an experienced clinician in helping users feel motivated to follow sleep recommendations (H3).
When experiencing the app-intervention, students had a small increase in sleep duration of
4 minutes, which means they mostly maintained an already healthy sleep duration from the
baseline period. In contrast, when experiencing the sleep-appointment intervention, students
lost 15 minutes (p = 0.084, padj = 0.25) of sleep duration in comparison to their baseline.
We further investigated this sleep duration loss or maintenance among students and we found
that less-motivated students and long-sleepers already had an average healthy daily sleep
duration of 7.3 hours or higher during the baseline. Both groups had a decrease in sleep
duration during the sleep-appointment and app intervention phases as shown in figure 3.6,
however this decrease was smaller for the app-intervention. This pattern of losing sleep as the
academic semester advances for college students was also found by the StudentLife project
(Wang et al., 2014), and may result from an ever-increasing workload during the academic term that forces students to sacrifice sleep in order to finish homework or prepare for exams.
Students in the motivated group had a borderline healthy sleep duration (7.1 hours) at baseline.
These students gained 19.8 minutes of sleep during the app-intervention in comparison to their
baseline and 22.2 minutes in comparison to the sleep-appointment. These students maintained
their baseline sleep duration during the sleep-appointment phase (average decrease of only
2.4 minutes). These results show that sleep duration gains depend on whether there is actual room for improvement. The maintenance of sleep duration in long-sleepers and less-motivated students shows that the outcome of a sleep intervention may not always be an increase in sleep duration, or in any other sleep-related outcome like efficiency, but rather the maintenance of baseline values. The results also show that, even under these circumstances, SleepU helps students who are naturally moving from a higher to a lower sleep duration due to external pressure (e.g., increasing stress over the academic semester) by minimizing their losses in sleep duration.
Short-sleepers in our study had the most to gain in comparison to all the other students. They
started with an average sleep duration of 6.3 hours in the baseline phase, and saw an increase
of 12 minutes while in the sleep-appointment and 36 minutes while in the app-intervention.
In this case, we can see that these students had the most to gain from a sleep intervention,
and their gains in sleep duration were 3 times higher with our personalization method than with a standard sleep-appointment. Moreover, based on our results for H1 and H2, the improvement appears to come from our unique approach to personalization. It is worth
noting that this 36-minute increase in sleep duration for short-sleepers is clinically significant. In a 2013 study with pre-hypertension patients (Haack et al., 2013), it was shown that an increase of 36 minutes in sleep duration results in a significant decrease in blood pressure. The participants in that study were also short-sleepers, and that study lasted 6 weeks.
There was a change in average motivation for all participants from 4.6 at baseline to 5.17 in the app-intervention and 5.11 in the sleep-appointment. This change means that participants as
a whole moved from a stage of contemplation to an action stage for both treatments. This
finding not only shows that SleepU is comparable to a sleep clinician in its persuasiveness, it
also demonstrates that even though motivation was similar across both the sleep-appointment
and the app-intervention, sleep duration did not remain the same and instead, after the sleep-
appointment, participants’ sleep duration was usually lower. In other words, despite having
the same interest and possibly having the same intentions, only during the app-intervention
phase of the study were the students able to succeed at improving their sleep. It is not
surprising that students would only show a positive behavior change while using the SleepU
app. The app was a frequent and contextual reminder of different things to do to improve sleep.
Once the app was no longer being used (as they started the sleep-appointment intervention),
their prioritization of sleep health remained the same, however their success at improving or
maintaining their sleep was lower.
Using mobile-receptivity-triggered notifications holds a lot of promise, even though their actionability was not statistically distinguishable from that of the other triggers. As shown in figure 3.7,
mobile-receptivity has the highest median actionability for all participants and across all
sub-groups investigated. For short-sleepers, the actionability of mobile-receptivity-triggered
notifications has a median of 86% while the next highest actionability is for user-triggered
notifications, which has a median of about 50%; this is a substantial difference despite the lack
of statistical power. Similarly, for motivated participants, mobile-receptivity-triggered notifications have a median actionability of 80%, while for user-triggered notifications it is again about 50%. Further evaluation of actionability over different days of the study
(figure 3.8) shows that the actionability for mobile-receptivity was increasing over the course
of the intervention. This is evident from the distribution over actionability "moving" from
the lower left (low actionability early in the phase) to the top right (increased actionability
late in the phase) for both motivated participants and short-sleepers. This indicates that the
study length was not long enough to reach the highest level of actionability possible by this
mechanism.
The results from our study are in line with those of a similar study by sleep researchers
(Levenson et al., 2016) where a sleep health intervention plus a social comparison component
was applied. However, that intervention (Levenson et al., 2016) relied on expert sleep
clinicians personalizing the sleep intervention for each participant in the study. In comparison, SleepU is cheaper and more scalable, since the app runs all necessary computing locally
FIGURE 3.8: Actionability of mobile-receptivity-generated sleep recommendations by day in the app-intervention phase for motivated and short-sleeper participants. This plot shows how, by the end of the intervention phase, actionability was increasing over time for both groups (more density in the top right of each plot). The lack of actionability early in the phase is a natural consequence of the mechanism inside the communications module, which does not use the mobile-receptivity detector when the user checks the recommendations on her own.
on the user's phone and does not require a sleep clinician. Although our instantiation used
a Fitbit to track sleep, a different instantiation could instead rely solely on user self-reports
of sleep duration, or on one of the growing number of mobile-phone-based assessments of
sleep duration (e.g., SleepScore).
In terms of limitations, all of the effects found in this study are short-term; long-term effects
could not be evaluated given the study's length and design. Future work will address this
issue by increasing the time over which students interact with SleepU. The scope of the
study also limits our results to the college student population, although some of our findings
may generalize to populations with similar constraints and behaviors (e.g., high school
students). Another limitation of the current work is that there was a monetary incentive to
install SleepU and fill out daily sleep diaries. We did not see any effect of filling out sleep
diaries on participants' motivation; however, just getting individuals to install and try a mobile
phone app can be challenging. In 2018, mobile phone users uninstalled 28% of the health
apps installed on their phones (of Apps, 2018). One possible way to minimize the uninstall
rate is to focus on the user's first interactions with the app and, in the specific case of a
mobile health intervention, on the first treatment. As will be described in chapter 4, the first
treatments could be improved by using a prior that helps the contextual bandit pick, from the
beginning, the sleep recommendations that are most likely to be followed and to have a positive
outcome on sleep.
CHAPTER 4
Development of a personalized model of effects (proposed)
A model of effects is a mathematical model that can estimate the direction and/or strength
of the effect of a treatment on a health outcome. These models are usually some form of
generalized linear regression and are estimated to measure the average long-term effect of
treatment after a randomized clinical trial. However, such an effects model is usually not
personalized; it is only capable of giving population- or study-sample-level estimates. For this
part of the proposed work, I will investigate methods for estimating a personalized model of
effects: a model capable of estimating, from a small amount of behavioral data (days or weeks),
the effects of the treatments of a health intervention for an individual for whom treatment-related
data does not yet exist. These models can be used to inform the selection of the initial
treatment, as will be shown in chapter 6.
Mobile health researchers have recognized the selection of a good initial treatment (a good
initial policy (Tewari and Murphy, 2017)) as an important problem in the development of
mobile health interventions. Bad initial treatments can have a negative impact on health and
user engagement. Tewari and Murphy (2017) further argue that although this selection could
be informed by expert knowledge, such knowledge can be very difficult to capture accurately,
since experts may not be able to take context into account.
A model of effects could be used beyond the selection of the initial treatment; however, for the
specific work proposed in chapter 6, the informed prior built from this part of the project is
enough to inform the initial treatment and any future intervention point. This is possible
because, in this proposal, I am using a contextual bandit method that starts with a uniform
distribution as its prior; this uniform prior can be replaced by the prior estimated from the
model of effects. Over time, the contextual bandit will personalize (update) this prior based
on the outcomes of the health intervention.
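As a concrete sketch of this mechanism (illustrative only: the arm count, Bernoulli reward model, and prior values below are assumptions for exposition, not the actual SleepU implementation), a Thompson-sampling bandit can start from either uniform Beta(1, 1) priors or priors shaped by a model of effects:

```python
import numpy as np

rng = np.random.default_rng(0)

class ThompsonBandit:
    """Bernoulli Thompson-sampling bandit over a set of sleep recommendations.

    Each arm keeps a Beta(alpha, beta) posterior over the probability that the
    recommendation is followed and improves sleep. A uniform prior is Beta(1, 1);
    an informed prior concentrates mass near the effect estimated by a model
    of effects.
    """

    def __init__(self, priors):
        # priors: list of (alpha, beta) pairs, one per recommendation
        self.params = [list(p) for p in priors]

    def select(self):
        # Sample one success probability per arm; pick the highest draw
        samples = [rng.beta(a, b) for a, b in self.params]
        return int(np.argmax(samples))

    def update(self, arm, followed):
        # Bayesian update from the patient's feedback
        if followed:
            self.params[arm][0] += 1
        else:
            self.params[arm][1] += 1

# Uniform prior: every recommendation assumed equally good
uniform = ThompsonBandit([(1, 1)] * 3)

# Hypothetical informed prior: a model of effects estimates follow
# probabilities of 0.7, 0.4, 0.2; each is encoded as a Beta distribution
# with that mean and a pseudo-count of 10 prior observations
informed = ThompsonBandit([(7, 3), (4, 6), (2, 8)])
```

The `update` call is exactly the "personalize (update) this prior" step described above: with each observed outcome, the posterior drifts from the seeded prior toward the individual patient's true response.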
4.1 Related work
Mobile health researchers have usually relied on models of effects that are estimated with the
intervention itself (Rabbi et al., 2016; Daskalova et al., 2016; Paredes et al., 2014; Yom-Tov
et al., 2017) (i.e., there is no model of effects at the beginning of the intervention). More
recent approaches have looked at cohort-based modeling (Daskalova et al., 2018), a method
very similar to collaborative filtering (Aggarwal and others, 2016; Breese et al., 1998). In
cohort-based modeling, treatment for a new patient in a sleep intervention is selected based
on its effectivity for similar patients: for a new patient, behavioral data is used to build a
sleep health profile, and this profile is used to select a cohort from a pool of patients. Then,
the patient's sleep health aspects are compared to the cohort's, and the worst one is selected
as the target of the sleep intervention. Treatment is then selected for the individual by looking
at the cohort's best treatment for that target.
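The cohort step can be sketched as follows (a minimal reading of the idea, assuming Euclidean distance over profile features and a matrix of observed per-patient treatment outcomes; the feature set and distance metric are my assumptions, not those of Daskalova et al.):

```python
import numpy as np

def select_cohort(new_profile, pool_profiles, k=5):
    """Pick the k patients whose sleep-health profiles are closest
    (Euclidean distance) to the new patient's baseline profile."""
    d = np.linalg.norm(pool_profiles - new_profile, axis=1)
    return np.argsort(d)[:k]

def pick_treatment(new_profile, pool_profiles, pool_outcomes, k=5):
    """pool_outcomes[i, t]: observed effect of treatment t on patient i.
    Return the treatment with the best average outcome in the cohort."""
    cohort = select_cohort(new_profile, pool_profiles, k)
    return int(np.argmax(pool_outcomes[cohort].mean(axis=0)))
```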
4.2 Proposed work
Estimating a model of effects is challenging: any personalized model of effects, due to the
small amount of data available, is likely to overfit (i.e., fail to generalize to observations
that are not in the training data set). In general, the main constraints for a personalized model
of effects are that the method has to learn from a small number of observations and yet
should be able to generalize. Although this sounds like an impossible task, this constraint
does not mean that the data, or even the model, must be limited to the patient's own.
For this part of the proposal, two main approaches will be compared in terms of predictive
power and computing complexity:
(1) Cohort-based approach: Under this approach, formulated by Daskalova et al. (2018),
a profile is created using relevant demographic and behavioral variables, and using
those variables a cohort is selected from a pool of available data. Using the cohort's
data, a model of effects can be estimated.
(2) Model of effects including demographics: In this approach, demographic and
behavioral variables are directly included in the model, and a single model is used to
estimate effects for any participant.
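The second approach can be sketched with a single shared linear model (a hypothetical sketch: the feature layout, synthetic data, and least-squares fit are my assumptions for illustration, not the proposed pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_treat = 200, 3

# Hypothetical feature layout: demographics (e.g., age, chronotype score),
# baseline behavior (sleep duration), and a one-hot treatment indicator
demo = rng.normal(size=(n, 2))
behav = rng.normal(size=(n, 1))
treat = np.eye(n_treat)[rng.integers(0, n_treat, size=n)]
X = np.hstack([demo, behav, treat])

# Synthetic outcome: treatment 0 helps sleep, treatment 2 hurts it
y = (treat @ np.array([0.5, 0.0, -0.3])
     + 0.1 * demo[:, 0]
     + rng.normal(scale=0.1, size=n))

# One shared model estimates effects for any (possibly new) participant
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_effects(demo_row, behav_row):
    """Predicted outcome under each treatment for one person."""
    rows = [np.hstack([demo_row, behav_row, np.eye(n_treat)[t]])
            for t in range(n_treat)]
    return np.array(rows) @ w
```

Because demographics are features rather than grouping keys, a brand-new participant gets treatment-effect predictions without any treatment-related data of their own.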
4.3 Evaluation
In order to understand the predictive power of each of the approaches listed above, they will
be used to predict the effect of sleep recommendations on sleep duration. I will be using
the dataset already collected during the sleep intervention study described in chapter 3. The
evaluation metrics are the standard performance metrics used in machine learning to evaluate
classifiers (accuracy, precision, recall, and F1 score), measures of fit like R², and measures of
fit that penalize model complexity like the Akaike Information Criterion (AIC) (Akaike, 1998)
and the Bayesian Information Criterion (BIC) (Schwarz and others, 1978). The evaluation will
replicate the conditions under which these models would be deployed: there is an available
pool of data, and using demographics and baseline data, a subset of that pool is used to estimate
effects models.
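For the fit-based metrics, a minimal sketch (assuming a Gaussian likelihood for the AIC/BIC log-likelihood term; in practice a standard library would supply the classification metrics):

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

def aic_bic(y, yhat, k):
    """Gaussian-likelihood AIC and BIC for a model with k parameters.

    AIC = 2k - 2 ln(L); BIC = k ln(n) - 2 ln(L), where ln(L) is the
    maximized log-likelihood under a Gaussian error model.
    """
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    log_lik = -n / 2 * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * k - 2 * log_lik, k * np.log(n) - 2 * log_lik
```

Because BIC's penalty grows with log(n) while AIC's is constant, BIC will favor the simpler of two candidate effects models more aggressively as the data pool grows.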
The available dataset has, in total, 12 weeks of data from 26 participants. It includes Fitbit
data such as sleep-related measurements (duration, efficiency) and physical activity data
(steps, intensity of physical exercise). Daily self-reported data includes caffeine consumption,
cognitive activities before bed, and sleep disruptions. For 4 of the 12 weeks, the participants
self-reported whether they followed the sleep recommendations provided by our SleepU app,
so we have data related to the effectivity of specific sleep recommendations.
4.4 Envisioned results
From this work, I foresee the creation of specific machine learning pipelines that allow
creating models of effects for new participants from a very small number of observations.
As a secondary contribution, I foresee a comparison of the cohort-based approach vs. the
personalized population model.
CHAPTER 5
Development of models of behavior (proposed)
In order to capture patients' behavior from sensor data, I plan to use a Markov decision
process and methods from inverse reinforcement learning (IRL), such as maximum causal
entropy, to estimate the probabilities and costs. IRL is a general method that models the
state-action tuples of an agent and its environment with the goal of discovering a reward
function and the underlying policy (i.e., the decision-making process) that the agent uses to
interact with the world (Ng et al., 2000). IRL models can inform a mobile health intervention
by estimating the preference for a treatment under specific contexts: an IRL model can
estimate the likelihood of an observed state (context) and action (treatment).
Traditional approaches, such as generalized regression models, are not well equipped
to deal with decision-making or to consider the effect of context on behavior. They also
overlook additional constraints, such as how behavior relates to the optimization of general
behavioral and contextual preferences and outcomes. These behavioral preferences are
strongly related to self-perceptions of ability and context, which have a strong influence on
behavior change.
Other approaches, like a general regression model, do not take into account the dynamics
of the environment (i.e., how a patient transitions between contexts). Approaches that do
take dynamics into account often fit human behavior data poorly by overly constraining the
shape of the probability distributions (e.g., Hidden Markov Models trained with Expectation
Maximization). IRL models, in contrast, can model dynamics and have estimation methods
available (e.g., maximum causal entropy) that are better suited to modeling human behavior
(e.g., by allowing probability distributions with higher entropy).
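To illustrate the kind of computation involved (a simplified sketch: full maximum-causal-entropy IRL alternates a backup like this with gradient updates on the reward parameters, and the tiny two-state MDP in the usage below is hypothetical), soft value iteration yields the stochastic policy that an IRL model uses to score state-action (context-treatment) pairs:

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, iters=200):
    """Maximum-entropy (soft) value iteration.

    P[a, s, s']: transition probabilities; r[s]: state reward.
    Returns a stochastic policy pi[a, s] proportional to exp(Q(a, s) - V(s)),
    i.e., the likelihood the model assigns to taking action a in state s.
    """
    n_a, n_s, _ = P.shape
    V = np.zeros(n_s)
    for _ in range(iters):
        # Q[a, s] = r[s] + gamma * E_{s'}[V(s')]
        Q = r[None, :] + gamma * np.einsum("ast,t->as", P, V)
        # Soft (log-sum-exp) Bellman backup instead of a hard max
        V = np.log(np.exp(Q).sum(axis=0))
    pi = np.exp(Q - V[None, :])
    return pi

# Toy MDP: action 0 always leads to state 0 (reward 0),
# action 1 always leads to state 1 (reward 1)
P = np.zeros((2, 2, 2))
P[0, :, 0] = 1.0
P[1, :, 1] = 1.0
pi = soft_value_iteration(P, np.array([0.0, 1.0]))
```

In every state the resulting policy prefers, but does not deterministically choose, the action leading to the rewarding state; that retained entropy is exactly what makes the model a better fit for noisy human behavior than a hard-max policy.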
5.1 Related work
In previous work, IRL has been able to capture the underlying policy behind everyday
activities such as driving a cab (Ziebart et al., 2008), people's walking trajectories around an
office (Kitani et al., 2012), and the driving behaviors of aggressive vs. non-aggressive drivers
(Banovic et al., 2017), and even to uncover and transfer the policy of an expert aerobatic
RC-helicopter pilot (Abbeel et al., 2010). In the context of a mobile health intervention, IRL
modeling can be used to uncover people's policy: how people make decisions in the real world
and how those decisions are affected by their current physiological state and the state of the
environment.
For this proposal, I am particularly interested in how IRL models can improve the effectiveness
of health interventions by informing how a patient would respond behaviorally to specific
treatments. As an example, in a mobile health intervention, a system could recommend that a
patient exercise more at specific times of day and places; however, such a recommendation,
if ill-informed, may cause the participant to get hurt or give up on trying because the treatment
is too strenuous or does not accommodate the patient's lifestyle. Instead, using an IRL-based
approach, it could be estimated whether particular treatments are likely to be followed by the
patient. Based on that, personalization of the health intervention is not only achievable, but
more likely to succeed. To investigate the use of IRL in mobile health interventions, I will be
using the data already collected from the sleep intervention study in chapter 3.
One of the first research questions I will investigate is whether we need individual models
for each participant, models for subsets of participants, or a population model. I have explored
similar questions in the past (Gjoreski et al., 2015; Hong et al., 2012; Hong et al., 2015)
in the context of activity recognition. In that work, I found that neither the population-level
nor the individual-level models were useful, and instead a middle-ground approach produced
the best results. Although this hints toward a middle-ground approach in the domain of
modeling behaviors, there is a key difference with respect to activity recognition and, in
general, the topic needs more research.
5.2 Evaluation
For this part of the project, I will compare individual, middle-ground, and population-level
behavior modeling. The evaluation metrics are those usually used in machine learning to
evaluate classifiers (accuracy, precision, recall, and F1 score), measures of fit like R², and
measures of fit that penalize model complexity like the Akaike Information Criterion (AIC)
(Akaike, 1998) and the Bayesian Information Criterion (BIC) (Schwarz and others, 1978).
Once behavioral models are established, a collaborative filtering approach will be used to
find the best behavioral models for new participants.
5.3 Envisioned results
From this part of the proposed work, I foresee the creation of an inverse reinforcement
learning approach that leverages available behavioral data to estimate a new behavioral model
for a new participant.
5.4 Alternative plan
Although IRL methods are well known and have been used in complex domains, their performance
with a very limited amount of data is unknown. For that reason, if this approach does
not work well enough, I plan to use collaborative filtering and content-based filtering as
alternatives. Collaborative filtering (Breese et al., 1998) is an approach for recommender
systems: systems that try to predict the preference of a person for a particular item. A
collaborative filtering system starts with a usually incomplete set of preferences for a user;
those preferences are then used to find other users in a database with similar preferences, and
new items are suggested from the pool of items preferred by those similar users. This is very
similar to the cohort-based approach described in chapter 4.
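A user-based variant can be sketched as follows (a minimal sketch, assuming cosine similarity over rating vectors and a 0-means-unrated convention; real systems handle missing ratings and weighting more carefully):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def recommend(prefs, user, k=2):
    """prefs: (users x items) rating matrix, 0 = unrated.

    Find the k users most similar to `user`, then suggest the unrated
    item with the highest mean rating among those neighbors.
    """
    sims = np.array([cosine(prefs[user], prefs[u]) if u != user else -np.inf
                     for u in range(len(prefs))])
    neighbors = np.argsort(sims)[::-1][:k]
    scores = prefs[neighbors].mean(axis=0)
    scores[prefs[user] > 0] = -np.inf   # only recommend new items
    return int(np.argmax(scores))
```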
Another applicable method is content-based filtering (Aggarwal and others, 2016). In this
approach, items are described through a series of properties. Those properties are used as
input to a machine learning classifier, which can then estimate which properties of those items
are desirable to the user and, additionally, may be able to predict whether a new item is
desirable in general. In the context of a sleep intervention, I could create several dimensions
for each sleep recommendation, such as difficulty of performance, applicable time of day,
and perceived value. Once those dimensions are created, a classifier could be built to predict
whether a participant will like a particular sleep recommendation.
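A toy version of such a classifier (illustrative only: the property vectors, example recommendations, and the nearest-centroid rule are my assumptions; any standard classifier could take their place):

```python
import numpy as np

# Hypothetical property vectors for sleep recommendations:
# [difficulty, applicable_in_evening, perceived_value]
items = np.array([
    [0.2, 1.0, 0.8],   # e.g., "dim the lights before bed"
    [0.9, 0.0, 0.6],   # e.g., "exercise in the morning"
    [0.3, 1.0, 0.7],   # e.g., "no caffeine in the afternoon"
])
liked = np.array([1, 0, 1])  # the participant's past feedback

def predict_like(item):
    """Nearest-centroid classifier: does a new recommendation's property
    vector sit closer to the liked items or to the disliked ones?"""
    c_like = items[liked == 1].mean(axis=0)
    c_dislike = items[liked == 0].mean(axis=0)
    return np.linalg.norm(item - c_like) < np.linalg.norm(item - c_dislike)
```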
CHAPTER 6
A models-based approach to select initial treatment (proposed)
After identifying the best techniques for estimating models of effects and behavior, I will
use those models in the context of a sleep health intervention to select the initial treatment;
this models-based approach will also provide the priors for the contextual bandits. Priors are
probability estimates, one for each of the sleep recommendations available in the study, that
inform how likely a recommendation is to be followed and to improve sleep duration and
efficiency. The goal of using a prior is to speed up learning, reduce variance, and decrease
the impact of noise.
The general idea is to replicate and reuse the same system from the former sleep intervention
study, with the difference that this time the contextual bandit will not start selecting
treatments using a uniform probability distribution; instead, the probabilities will be informed
by the estimated models of behavior and effects. Another key difference is that the study is
going to be 8 weeks long in total, with 6 weeks of intervention (the previous version had only
4 weeks of intervention). There will be only minimal changes to the app so that the results
from this new study can be compared to the earlier study.
6.1 Related work
Selecting an initial treatment or a starting policy is a general problem that is also found in the
field of contextual bandits. Zhang et al. (2019), for example, explore the problem of combining
expert advice (supervised labels) with the feedback acquired by a contextual bandit to solve
the starting-policy problem. In that work, the expert advice and the contextual bandit are
assumed to be misaligned, e.g., the preferences of the expert and those experienced by the
contextual bandit may not be the same. In the context of a mobile health intervention, expert
advice could be health interventions provided by a clinician, while the contextual bandit
feedback is provided by the patient. If the clinician did not assess the patient's preferences
well enough, there will be misalignment between the two, and the intervention will not be as
effective. Yet another example is estimating the effects of an intervention using a model of
effects and using that model's estimates as expert advice; here again, if the preferences of the
patient are not taken into account, misalignment will occur and generate adverse outcomes.
Mobile health researchers have mostly used uniform priors (Paredes et al., 2014; Rabbi et al.,
2016; Yom-Tov et al., 2017), which is equivalent to assuming that all treatments are equally
good. Liao et al. (2019) are among the first to consider, for future studies, using an informative
prior from a previous study to initialize the reinforcement learning algorithm.
6.2 Simulation
Although the main goal of this work is to test this approach in a real deployment, the first step
is to simulate and estimate empirical bounds on the possible outcomes. There will be two
main approaches for simulation:
(1) Abstract simulation. In this simulation, the goal is to gain an empirical understanding
of the impact of priors varying from very close to very far from the real probability
values, by seeding those different priors to the contextual bandit. For this simulation,
no real sleep data will be used; instead, idealized data generated from normal
distributions that approximate the ranges and values observed in the actual study
will be used.
(2) Sleep-personas simulation. In this simulation, the goal is to create several personas
from the data collected in our sleep study and then use those profiles to again
simulate what happens with priors varying from very close to very far from the real
probability values. The personas will vary from very predictable (i.e., low variance
and low entropy) to unpredictable (high variance and high entropy). These
estimates will provide bounds on how the quality of the priors will affect different
types of people in a real deployment.
In general, the outcomes of the simulations are estimates of the consequences of having
priors ranging from close to the patient's underlying preferences and treatment effects to
completely uninformative. The measures used to evaluate the performance of the proposed
approaches are borrowed from the literature on transfer learning in reinforcement learning
(Taylor and Stone, 2009), namely: Jumpstart (gains in treatment outcomes during the first
week of intervention), Speed (time to achieve the long-term treatment outcome), and
Generalization (improvement gains in the treatment outcome in the long term). The results
from this simulation will inform how to use and implement the priors in a real-world
deployment.
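The abstract simulation can be sketched as follows (a hypothetical sketch: the Bernoulli reward model, the prior strength, the "first 20 steps" proxy for a first week, and the specific good/bad priors are assumptions for illustration, not the planned simulation parameters):

```python
import numpy as np

rng = np.random.default_rng(42)

def run_bandit(true_p, priors, horizon=200):
    """Thompson sampling with seeded Beta priors; returns per-step reward."""
    params = [list(p) for p in priors]
    rewards = np.empty(horizon)
    for t in range(horizon):
        arm = int(np.argmax([rng.beta(a, b) for a, b in params]))
        r = rng.random() < true_p[arm]
        params[arm][0 if r else 1] += 1
        rewards[t] = r
    return rewards

def beta_prior(p, strength=20):
    """Encode assumed follow probabilities as Beta pseudo-counts."""
    return [(p_i * strength, (1 - p_i) * strength) for p_i in p]

true_p = np.array([0.7, 0.4, 0.2])

# Jumpstart proxy: mean reward over the first 20 steps, averaged over runs,
# for a prior close to the truth vs. one far from it (here, reversed)
good = np.mean([run_bandit(true_p, beta_prior(true_p))[:20].mean()
                for _ in range(50)])
bad = np.mean([run_bandit(true_p, beta_prior(true_p[::-1]))[:20].mean()
               for _ in range(50)])
```

Sweeping the seeded prior from `true_p` toward its reverse traces out exactly the close-to-far curve the simulations are meant to bound.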
6.3 Study
The main goal of this study is the measurement and comparison of two different approaches
for selecting the initial treatment in the context of a mobile health intervention for sleep.
The first approach is to select the initial treatment using a model of behavior and a model of
effects. The second approach is to ask participants in the study, via a survey, to estimate their
preferences and forecast the effects of each of the treatments in the intervention; these
estimates will then be used to select the initial treatment. The main reason to include this
survey approach is twofold: first, although model-driven approaches are attractive, their
application is limited to settings where there is access to a data scientist or an expert capable
of estimating such models, whereas surveys are available to anyone. Second, even if the
model-driven approach is better, it is important to understand how much better it can be and
whether that justifies using a method that is computationally and economically more
expensive than a simple survey.
The study will follow a between-subjects design. Both groups will interact with the same
system as in the earlier study (chapter 3); the only difference between them will be the
method used for estimating the initial treatment and the priors for the contextual bandits.
The main research questions are the following:
(1) How do the model-based and the survey approaches compare?
(2) How does an approach using an informed prior compare against the uniform priors
of the former study?
(3) Does giving better initial treatments improve adherence to subsequent treatments?
For research questions 1 and 2, the measurements that will be compared are sleep duration,
sleep efficiency, motivation to improve sleep, jumpstart, speed, and generalization. For
research question 3, adherence will be measured by looking at how many of the sleep
recommendations are seen and followed in the app. Motivation will also be measured in this
case and compared to the motivation levels at the same stage of the former study.
In terms of the questionnaires, I will expand the measures to collect health outcomes
that may be affected by improved (or worsened) sleep, such as general mood, attention,
memory, and stress. This is an important step since sleep is fundamental to many biological
processes. The main goal here is to collect data that can inform about measurable
consequences of changes in sleep duration.
For screening, I will be using the same criteria as in the former study which excluded
participants with sleep disorders or problematic substance use.
6.3.1 Study protocol
The study will follow a between-subjects design. After screening, participants are assigned
at random to one of two groups: the survey approach vs. the models-based approach. The
study length is 8 weeks, with two weeks of baseline and 6 weeks of the sleep intervention.
For the entire duration of the study, participants will wear a Fitbit device at all times (we are
still deciding between a Fitbit Inspire and an Inspire HR). For the duration of the study,
participants will answer a daily questionnaire asking whether they performed any of the
sleep recommendations. These questions will be asked even during the baseline period.
6.4 ENVISIONED RESULTS 69
6.3.2 Power analysis
In order to estimate the sample size required for this study, I followed (Sakpal, 2010):

n = (Z_{α/2} + Z_β)² · 2 · std² / (u1 − u2)²
  = (1.96 + 0.84)² · 2 · (0.3)² / (0.7 − 0.5)²
  = 35.28

where u1 = 0.7 is the average actionability of the model-based approach during the first two
weeks, u2 = 0.5 is the average actionability using a uniform prior, std = 0.3 is the standard
deviation, Z_{α/2} = 1.96 is the Z-score for a significance level of 0.05, Z_β = 0.84 is the
Z-score for a power of 0.8, and n is the sample size for each group. According to this estimate,
the sample size per group should be 36 people, for a total study sample of 72 participants.
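The computation is easy to check (the helper function name is mine; the formula and values are those of the power analysis above):

```python
from math import ceil

def sample_size_per_group(u1, u2, std, z_alpha2=1.96, z_beta=0.84):
    """Two-sample size formula from Sakpal (2010):
    n = (Z_{alpha/2} + Z_beta)^2 * 2 * std^2 / (u1 - u2)^2
    """
    return (z_alpha2 + z_beta) ** 2 * 2 * std ** 2 / (u1 - u2) ** 2

n = sample_size_per_group(0.7, 0.5, 0.3)   # 35.28
n_per_group = ceil(n)                      # round up to 36 per group
```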
6.4 Envisioned results
For the simulation part, I expect to find that priors that are offset from the real probability
estimates are very damaging and may increase the time it takes to find the optimal treatment
by an amount proportional to the difference in magnitude between the real and the estimated
priors. Informative priors, on the other hand, can truly accelerate the process, likewise by an
amount proportional to the difference between the prior and the real probabilities.
For the study part of the proposed work, I expect to find that the model-based approach is
significantly superior to a uniform prior and only slightly better than the survey approach. My
main hypothesis is that the survey approach may be enough in non-critical interventions.
My main reasoning behind this hypothesis is that although people may not have perfect
information about their preferences and forecasts of treatment effects, they can still provide a
relatively good estimate that is better than a uniform prior.
CHAPTER 7
General timeline for the proposal
The following timeline includes all of the activities described in chapters 4, 5, and 6:

Activity | Start date | End date | Description
Models of effects and behavior | November 20th | January 20th | Estimating and testing performance of behavior and effects models from already collected data.
Android app upgrades | November 20th | February 1st | Improvements and updates to the app.
IRB | December 8th | January 8th | IRB submission and changes.
Ideal priors simulation with contextual bandits | November 20th | December 2nd |
Persona priors simulation with contextual bandits | January 20th | February 4th |
Study recruitment | January 15th | February 15th | The study will be deployed simultaneously at the University of Pittsburgh, Carlow, and Carnegie Mellon University.
Study start | February 20th | April 9th | End of semester: U. Pittsburgh April 20th, CMU May 19th, Carlow April 24th.
Data wrangling | May 1st | June 1st | Initial data cleaning, transformation, and visualization.
Hypothesis testing | June 2nd | August 2nd |
Paper writing | July 1st | End of September |
Thesis writing | August 1st | End of October |
Thesis defense | Early December | |

TABLE 7.1: Timeline
References
[Abbeel et al.2010] Pieter Abbeel, Adam Coates, and Andrew Y Ng. 2010. Autonomous helicopter aerobatics through apprenticeship learning. The International Journal of Robotics Research, 29(13):1608–1639.
[Adan et al.2006] Ana Adan, Marco Fabbri, Vincenzo Natale, and Gemma Prat. 2006. Sleep beliefs scale (SBS) and circadian typology. Journal of Sleep Research, 15(2):125–132.
[Aggarwal and others2016] Charu C Aggarwal et al. 2016. Recommender systems. Springer.
[Akaike1998] Hirotogu Akaike. 1998. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pages 199–213. Springer.
[Auer et al.2002] Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. 2002. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77.
[Aung et al.2017] Min Hane Aung, Mark Matthews, and Tanzeem Choudhury. 2017. Sensing behavioral symptoms of mental health and delivering personalized interventions using mobile technologies. Depression and Anxiety, 34(7):603–609.
[Bandura1977] Albert Bandura. 1977. Self-efficacy: toward a unifying theory of behavioral change. Psychological Review, 84(2):191.
[Banovic et al.2017] Nikola Banovic, Anqi Wang, Yanfeng Jin, Christie Chang, Julian Ramos, Anind Dey, and Jennifer Mankoff. 2017. Leveraging human routine models to detect and generate human behaviors. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 6683–6694. ACM.
[Bauer et al.2012] Jared Bauer, Sunny Consolvo, Benjamin Greenstein, Jonathan Schooler, Eric Wu, Nathaniel F. Watson, and Julie Kientz. 2012. ShutEye: Encouraging awareness of healthy sleep recommendations with a mobile, peripheral display. In Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems - CHI '12, page 1401, New York, New York, USA. ACM Press.
[Biener and Abrams1991] Lois Biener and David B Abrams. 1991. The contemplation ladder: validation of a measure of readiness to consider smoking cessation. Health Psychology, 10(5):360.
[Breese et al.1998] John S Breese, David Heckerman, and Carl Kadie. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann Publishers Inc.
[Bubeck et al.2012] Sébastien Bubeck, Nicolo Cesa-Bianchi, et al. 2012. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends® in Machine Learning, 5(1):1–122.
[Buysse et al.1989] Daniel J Buysse, Charles F Reynolds III, Timothy H Monk, Susan R Berman, and David J Kupfer. 1989. The Pittsburgh sleep quality index: a new instrument for psychiatric practice and research. Psychiatry Research, 28(2):193–213.
[Buysse2014] Daniel J Buysse. 2014. Sleep health: can we define it? does it matter? Sleep, 37(1):9–17.
[Centre for Clinical Interventions] Centre for Clinical Interventions, Australia. Sleep hygiene.
[Cohen et al.1994] Sheldon Cohen, T Kamarck, R Mermelstein, et al. 1994. Perceived stress scale. Measuring Stress: A Guide for Health and Social Scientists, 10.
[Collins and Varmus2015] Francis S Collins and Harold Varmus. 2015. A new initiative on precision medicine. New England Journal of Medicine, 372(9):793–795.
[Dallery et al.2013] Jesse Dallery, Rachel N Cassidy, and Bethany R Raiff. 2013. Single-case experimental designs to evaluate novel technology-based health interventions. Journal of Medical Internet Research, 15(2):e22, feb.
[Daskalova et al.2016] Nediyana Daskalova, Danaë Metaxa-Kakavouli, Adrienne Tran, Nicole Nugent, Julie Boergers, John McGeary, and Jeff Huang. 2016. SleepCoacher: A personalized automated self-experimentation system for sleep recommendations. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, pages 347–358. ACM.
[Daskalova et al.2018] Nediyana Daskalova, Bongshin Lee, Jeff Huang, Chester Ni, and Jessica Lundin. 2018. Investigating the effectiveness of cohort-based sleep recommendations. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(3):101.
[Dingler and Pielot2015] Tilman Dingler and Martin Pielot. 2015. I'll be there for you: Quantifying attentiveness towards mobile messaging. In Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services, pages 1–5. ACM.
[Fogg2009] BJ Fogg. 2009. A behavior model for persuasive design. In Proceedings of the 4th International Conference on Persuasive Technology - Persuasive '09, page 1.
[Gjoreski et al.2015] Hristijan Gjoreski, Simon Kozina, Matjaz Gams, Mitja Lustrek, Juan Antonio Álvarez-García, Jin-Hyuk Hong, Julian Ramos, Anind K Dey, Maurizio Bocca, and Neal Patwari. 2015. Competitive live evaluations of activity-recognition systems. IEEE Pervasive Computing, 14(1):70–77.
[Grandner et al.2014] Michael A Grandner, Nicholas Jackson, Nalaka S Gooneratne, and Nirav P Patel. 2014. The development of a questionnaire to assess sleep-related practices, beliefs, and attitudes. Behavioral Sleep Medicine, 12(2):123–142.
[Haack et al.2013] Monika Haack, Jorge Serrador, Daniel Cohen, Norah Simpson, Hans Meier-Ewert, and Janet M Mullington. 2013. Increasing sleep duration to lower beat-to-beat blood pressure: a pilot study. Journal of Sleep Research, 22(3):295–304.
[Heather et al.2008] Nick Heather, David Smailes, and Paul Cassidy. 2008. Development of a readiness ruler for use with alcohol brief interventions. Drug and Alcohol Dependence, 98(3):235–240.
[Ho and Intille2005] Joyce Ho and Stephen S Intille. 2005. Using context-aware computing to reduce the perceived burden of interruptions from mobile devices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 909–918. ACM.
[Hong et al.2012] Jin-Hyuk Hong, Julian Ramos, Choonsung Shin, and Anind K Dey. 2012. An activity recognition system for ambient assisted living environments. In International Competition on Evaluating AAL Systems Through Competitive Benchmarking, pages 148–158. Springer.
[Hong et al.2015] Jin-Hyuk Hong, Julian Ramos, and Anind K Dey. 2015. Toward personalized activity recognition systems with a semipopulation approach. IEEE Transactions on Human-Machine Systems, 46(1):101–112.
[Horne and Östberg1976] Jim A Horne and Olov Östberg. 1976. A self-assessment questionnaire to determine morningness-eveningness in human circadian rhythms. International Journal of Chronobiology.
[Horsch et al.2017] Corine Horsch, Sandor Spruit, Jaap Lancee, Rogier van Eijk, Robbert Jan Beun, Mark Neerincx, and Willem-Paul Brinkman. 2017. Reminders make people adhere better to a self-help sleep intervention. Health and Technology, 7(2-3):173–188.
[Hunter2007] John D Hunter. 2007. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95.
[Jakicic et al.2016] John M Jakicic, Kelliann K Davis, Renee J Rogers, Wendy C King, Marsha D Marcus, Diane Helsel, Amy D Rickman, Abdus S Wahed, and Steven H Belle. 2016. Effect of wearable technology combined with a lifestyle intervention on long-term weight loss: the IDEA randomized clinical trial. JAMA, 316(11):1161–1171.
[Jameson and Longo2015] J Larry Jameson and Dan L Longo. 2015. Precision medicine—personalized, problematic, and promising. Obstetrical & gynecological survey, 70(10):612–614.
[Jones et al.2001] Eric Jones, Travis Oliphant, Pearu Peterson, et al. 2001–. SciPy: Open source scientific tools for Python. [Online; accessed ].
[Katevas et al.2017] Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, and Joan Serrà. 2017. Continual prediction of notification attendance with classical and deep network approaches. arXiv preprint arXiv:1712.07120.
[Kay and Wobbrock2016] Matthew Kay and J Wobbrock. 2016. ARTool: aligned rank transform for nonparametric factorial ANOVAs. R package version 0.10, 2.
[Kim and Bang2016] Jeehyoung Kim and Heejung Bang. 2016. Three common misuses of p values. Dental hypotheses, 7(3):73.
[Kitani et al.2012] Kris M Kitani, Brian D Ziebart, James Andrew Bagnell, and Martial Hebert. 2012. Activity forecasting. In European Conference on Computer Vision, pages 201–214. Springer.
[Klasnja and Veeraraghavan2018] Predrag Klasnja and Eric B Veeraraghavan. 2018. Rethinking evaluations of mhealth systems for behavior change. GetMobile: Mobile Computing and Communications, 22(2):11–14.
[Kramer et al.2019] Jan-Niklas Kramer, Florian Künzler, Varun Mishra, Bastien Presset, David Kotz, Shawna Smith, Urte Scholz, and Tobias Kowatsch. 2019. Investigating intervention components and exploring states of receptivity for a smartphone app to promote physical activity: protocol of a microrandomized trial. JMIR research protocols, 8(1):e11540.
[Kuleshov and Precup2014] Volodymyr Kuleshov and Doina Precup. 2014. Algorithms for multi-armed bandit problems. arXiv preprint arXiv:1402.6028.
[Lattimore and Szepesvári2019] Tor Lattimore and Csaba Szepesvári. 2019. Bandit algorithms.
[Levenson et al.2016] Jessica C Levenson, Elizabeth Miller, Bethany L Hafer, Mary F Reidell, Daniel J Buysse, and Peter L Franzen. 2016. Pilot study of a sleep health promotion program for college students. Sleep health, 2(2):167–174.
[Liao et al.2019] Peng Liao, Kristjan Greenewald, Predrag Klasnja, and Susan Murphy. 2019. Personalized HeartSteps: A reinforcement learning algorithm for optimizing physical activity. arXiv preprint arXiv:1909.03539.
[McKinney and others2010] Wes McKinney et al. 2010. Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference, volume 445, pages 51–56. Austin, TX.
[Michie et al.2011] Susan Michie, Maartje M Van Stralen, and Robert West. 2011. The behaviour change wheel: a new method for characterising and designing behaviour change interventions. Implementation science, 6(1):42.
[Morawiec] Darius Morawiec. sklearn-porter: Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
[Nagai et al.2013] Masato Nagai, Yasutake Tomata, Takashi Watanabe, Masako Kakizaki, and Ichiro Tsuji. 2013. Association between sleep duration, weight gain, and obesity for long period. Sleep Medicine, 14(2):206–210.
[Nahum-Shani et al.2017] Inbal Nahum-Shani, Shawna N Smith, Bonnie J Spring, Linda M Collins, Katie Witkiewitz, Ambuj Tewari, and Susan A Murphy. 2017. Just-in-time adaptive interventions (JITAIs) in mobile health: key components and design principles for ongoing health behavior support. Annals of Behavioral Medicine, 52(6):446–462.
[Ng et al.2000] Andrew Y Ng, Stuart J Russell, et al. 2000. Algorithms for inverse reinforcement learning. In ICML, volume 1, page 2.
[of Apps2018] Business of Apps. 2018. Mobile app uninstall rate after 30 days.
[Okoshi et al.2016] Tadashi Okoshi, Hiroki Nozaki, Jin Nakazawa, Hideyuki Tokuda, Julian Ramos, and Anind K Dey. 2016. Towards attention-aware adaptive notification on smart phones. Pervasive and Mobile Computing, 26:17–34.
[Oliphant2006] Travis E Oliphant. 2006. A guide to NumPy, volume 1. Trelgol Publishing, USA.
[Paredes et al.2014] Pablo Paredes, Ran Gilad-Bachrach, Mary Czerwinski, Asta Roseway, Kael Rowan, and Javier Hernandez. 2014. PopTherapy: Coping with stress through pop-culture. In Proceedings of the 8th International Conference on Pervasive Computing Technologies for Healthcare, pages 109–117. ICST (Institute for Computer Sciences, Social-Informatics and …).
[Pedregosa et al.2011] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
[Pielot et al.2014] Martin Pielot, Rodrigo De Oliveira, Haewoon Kwak, and Nuria Oliver. 2014. Didn't you see my message? Predicting attentiveness to mobile instant messages. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 3319–3328. ACM.
[Pielot et al.2015] Martin Pielot, Tilman Dingler, Jose San Pedro, and Nuria Oliver. 2015. When attention is not scarce - detecting boredom from mobile phone usage. In Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing, pages 825–836. ACM.
[Pielot et al.2017] Martin Pielot, Bruno Cardoso, Kleomenis Katevas, Joan Serrà, Aleksandar Matic, and Nuria Oliver. 2017. Beyond interruptibility: Predicting opportune moments to engage mobile phone users. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):91.
[Posner and Gehrman2011] Donn Posner and Philip R. Gehrman. 2011. Sleep Hygiene. Academic Press, January.
[Prochaska and Velicer1997] James O Prochaska and Wayne F Velicer. 1997. The transtheoretical model of health behavior change. American Journal of Health Promotion, 12(1):38–48.
[Rabbi et al.2016] Mashfiqui Rabbi, Min Hane Aung, Mi Zhang, and Tanzeem Choudhury. 2016. MyBehavior: Automatic Personalized Health Feedback from User Behaviors and Preferences using Smartphones. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing - UbiComp '15, pages 707–718, New York, New York, USA. ACM Press.
[Rabbi et al.2018] Mashfiqui Rabbi, Min SH Aung, Geri Gay, M Cary Reid, and Tanzeem Choudhury. 2018. Feasibility and acceptability of mobile phone–based auto-personalized physical activity recommendations for chronic pain self-management: Pilot study on adults. Journal of medical Internet research, 20(10):e10147.
[Rahman et al.2016] Tauhidur Rahman, Mary Czerwinski, Ran Gilad-Bachrach, and Paul Johns. 2016. Predicting about-to-eat moments for just-in-time eating intervention. In Proceedings of the 6th International Conference on Digital Health Conference, pages 141–150. ACM.
[Rasch and Born2013] Björn Rasch and Jan Born. 2013. About Sleep's Role in Memory. Physiological Reviews, 93(2):681–766.
[Roberts et al.] Mary Catherine Roberts, Avery St Dizier, and Joshua Vaughan. Multiobjective optimization: Portfolio optimization based on goal programming methods.
[Rothman1990] Kenneth J Rothman. 1990. No adjustments are needed for multiple comparisons. Epidemiology, pages 43–46.
[Sakpal2010] Tushar Sakpal. 2010. Sample size estimation in clinical trial. Perspectives in clinical research, 1(2):67–67.
[Sankar and Parker2017] Pamela L Sankar and Lisa S Parker. 2017. The precision medicine initiative's All of Us research program: an agenda for research on its ethical, legal, and social issues. Genetics in Medicine, 19(7):743.
[Sano et al.2017] Akane Sano, Paul Johns, and Mary Czerwinski. 2017. Designing opportune stress intervention delivery timing using multi-modal data. In 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), pages 346–353. IEEE.
[Saville1990] Dave J Saville. 1990. Multiple comparison procedures: the practical solution. The American Statistician, 44(2):174–180.
[Schwarz and others1978] Gideon Schwarz et al. 1978. Estimating the dimension of a model. The annals of statistics, 6(2):461–464.
[Stickgold et al.2001] R. Stickgold, J. A. Hobson, R. Fosse, and M. Fosse. 2001. Sleep, learning, and dreams: Off-line memory reprocessing. Science, 294(5544):1052–1057.
[Taylor and Stone2009] Matthew E Taylor and Peter Stone. 2009. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10(Jul):1633–1685.
[Taylor Kyle2019] Kyle Taylor and Laura Silver. 2019. Smartphone ownership is growing rapidly around the world, but not always equally.
[Tewari and Murphy2017] Ambuj Tewari and Susan A Murphy. 2017. From ads to interventions: Contextual bandits in mobile health. In Mobile Health, pages 495–517. Springer.
[Thiese et al.2016] Matthew S Thiese, Brenden Ronna, and Ulrike Ott. 2016. P value interpretations and considerations. Journal of thoracic disease, 8(9):E928.
[Walker2009] Matthew P. Walker. 2009. The role of sleep in cognition and emotion. Annals of the New York Academy of Sciences, 1156:168–197.
[Wang et al.2014] Rui Wang, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T Campbell. 2014. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing, pages 3–14. ACM.
[Waskom et al.2017] Michael Waskom, Olga Botvinnik, Drew O'Kane, Paul Hobson, Saulius Lukauskas, David C Gemperline, Tom Augspurger, Yaroslav Halchenko, John B. Cole, Jordi Warmenhoven, Julian de Ruiter, Cameron Pye, Stephan Hoyer, Jake Vanderplas, Santi Villalba, Gero Kunter, Eric Quintero, Pete Bachant, Marcel Martin, Kyle Meyer, Alistair Miles, Yoav Ram, Tal Yarkoni, Mike Lee Williams, Constantine Evans, Clark Fitzgerald, Brian, Chris Fonnesbeck, Antony Lee, and Adel Qalieh. 2017. mwaskom/seaborn: v0.8.1 (September 2017), September.
[Wasserstein et al.2016] Ronald L Wasserstein, Nicole A Lazar, et al. 2016. The ASA's statement on p-values: context, process, and purpose. The American Statistician, 70(2):129–133.
[Wasserstein et al.2019] Ronald L Wasserstein, Allen L Schirm, and Nicole A Lazar. 2019. Moving to a world beyond "p < 0.05".
[Wobbrock et al.2011] Jacob O Wobbrock, Leah Findlater, Darren Gergle, and James J Higgins. 2011. The aligned rank transform for nonparametric factorial analyses using only ANOVA procedures. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 143–146. ACM.
[Wolk et al.2005] Robert Wolk, Apoor S. Gami, Arturo Garcia-Touchard, Virend K. Somers, and S. H. Rahimtoola. 2005. Sleep and cardiovascular disease. Current Problems in Cardiology, 30(12):625–662.
[Yang et al.2014] Guang Yang, Cora Sau Wan Lai, Joseph Cichon, Lei Ma, Wei Li, and Wen-Biao Gan. 2014. Sleep promotes branch-specific formation of dendritic spines after learning. Science, 344(6188):1173–1178.
[Yom-Tov et al.2017] Elad Yom-Tov, Guy Feraru, Mark Kozdoba, Shie Mannor, Moshe Tennenholtz, and Irit Hochberg. 2017. Encouraging physical activity in patients with diabetes: intervention using a reinforcement learning system. Journal of medical Internet research, 19(10):e338.
[Zhang et al.2019] Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, and Sahand N Negahban. 2019. Warm-starting contextual bandits: robustly combining supervised and bandit feedback. arXiv preprint arXiv:1901.00301.
[Ziebart et al.2008] Brian D Ziebart, Andrew L Maas, Anind K Dey, and J Andrew Bagnell. 2008. Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior. In Proceedings of the 10th international conference on Ubiquitous computing, pages 322–331. ACM.