THE UNIVERSITY OF CHICAGO
RETROSPECTIVE FILE MANAGEMENT FOR PRIVACY AND SECURITY IN
CLOUD STORAGE SERVICES
A DISSERTATION SUBMITTED TO
THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES
IN CANDIDACY FOR THE DEGREE OF
MASTER’S DEGREE
DEPARTMENT OF COMPUTER SCIENCE
BY
MARIA HYUN
CHICAGO, ILLINOIS
NOVEMBER 2017
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
ABSTRACT
1 INTRODUCTION
  1.1 Motivations
  1.2 Our Work
  1.3 Findings
2 RELATED WORK
  2.1 Cloud Storage
  2.2 Privacy and Security Concerns for Cloud Storage
  2.3 Retrospective Privacy of Social Media
  2.4 User Conceptualization of File Sharing
  2.5 Personal Information Management
3 METHODOLOGY
  3.1 Cloud Storage Services
  3.2 Data Collection and Ethics
  3.3 Recruitment and Inclusion Criteria
  3.4 File Selection
  3.5 Survey Structure
    3.5.1 Generic Questions
    3.5.2 File-Specific Questions
    3.5.3 Features and Demographics
4 DATA ANALYSIS
  4.1 Aggregation and Basic Statistics
  4.2 Qualitative Coding
  4.3 Regression Model
5 RESULTS
  5.1 Participant Demographics and Account Usage
  5.2 Account Archeology
  5.3 File Recognition
  5.4 File Management
  5.5 File Sharing
  5.6 File Co-ownership
  5.7 File Automation
6 DISCUSSION AND LIMITATIONS
  6.1 Discussion
  6.2 Limitations
7 CONCLUSION AND FUTURE WORK
  7.1 Conclusion
  7.2 Future Work
REFERENCES
A SURVEY INSTRUMENT
  A.1 General question
  A.2 Content specific question
  A.3 Features and Demographics
LIST OF FIGURES
3.1 An overview of the survey procedures from the perspective of a participant
3.2 Screenshot of file-specific questions
5.1 Number of files based on creation date
5.2 Number of files based on last modification date
5.3 Comparison of file ownership and remembrance
5.4 File recollection and management decisions
5.5 Comparison of file deletion and file ownership levels
5.6 Comparison of file encryption and participant technical background
5.7 Future access and file management decision
5.8 Participant management decisions for additional copies
5.9 The effect of security perception on file management decisions
5.10 Ability to access the files and file management decisions
5.11 Participant preferences for sharing decisions
5.12 Sharing type and sharing decisions
5.13 Original shared status and sharing decision
5.14 Cloud storage and file co-ownership
5.15 The effect of file ownership on co-ownership
5.16 Original shared status and file versioning
5.17 Sharing method and file versioning
5.18 Comparison of auto-archiving and delay tolerance
LIST OF TABLES
3.1 Categories for selecting files in our stratified sample
5.1 Participant demographics
5.2 Descriptive statistics of participant accounts
5.3 Factors correlated with file recognition
5.4 Factors correlated with file remembrance
5.5 Factors correlated with preferences for file deletion
5.6 Factors correlated with preferences for file encryption
5.7 Factors correlated with wanting to stop sharing
ACKNOWLEDGMENTS
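Do not be anxious about anything, but in everything, by prayer and petition,
with thanksgiving, present your requests to God.
And the peace of God, which transcends all understanding,
will guard your hearts and your minds in Christ Jesus.
(Philippians 4:6-7)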
First, I would like to express my sincere gratitude to my advisor, Prof. Ur, for the con-
tinuous support of my study, for his patience, motivation, and immense knowledge. His
guidance helped me in research and writing this thesis. I could not have imagined having a
better advisor and mentor for my MS thesis.
Besides my advisor, I would like to thank Prof. Kanich for his insightful comments and
encouragement, but also for his hard questions which incentivized me to widen my research
and account for various perspectives. My sincere thanks also go to Taha. It was always my
pleasure to work with you. Also, I thank Miranda, who helped me to organize and edit my
thesis.
Last but not least, I would like to thank my family: my parents, my sister, Young-Eun,
and Modu, Dalbi, Bana for supporting me spiritually throughout writing this thesis and my
life in general.
ABSTRACT
Users have accumulated years of personal data in cloud storage, creating potential privacy
and security risks. This agglomeration includes files retained or shared with others simply
out of momentum, rather than intention. We presented 100 online survey participants with a
stratified sample of 10 files currently stored in their own Dropbox or Google Drive accounts.
We asked about the origin of each file, whether the participant remembered that file was
stored there, and, when applicable, about that file’s sharing status. We also recorded partic-
ipants’ preferences moving forward for keeping, deleting, or encrypting those files, as well as
adjusting sharing settings. Participants had forgotten that half of the files they saw were in
the cloud. Participants recalled that 50.6% of the files they saw in the study were stored in
the cloud. Participants did not recognize 13.5% of the files they saw. Participants recognized
the remaining 35.9%, but had forgotten that the file was stored in the cloud. Moreover, out
of the ten files we asked about, the median number of files the subject remembered storing
to their cloud account was five. Overall, 83% of participants wanted to delete at least one
file they saw, while 13% wanted to unshare at least one file. 81% of participants responded
that it was important to keep at least one of the ten presented files safe from unauthorized
access, yet they had forgotten that file was stored in the cloud. Our combined results suggest
directions for retrospective cloud data management.
CHAPTER 1
INTRODUCTION
1.1 Motivations
As cloud platforms for storage and backup have matured, many users have implicitly become
long-term users of these platforms. These users have years of their personal data stored in the
cloud, yet they have likely forgotten about the existence of most of this data. This state of
affairs has two troubling consequences. First, the agglomeration of a user’s personal data in
one location presents attackers with a very attractive single target. If an attacker successfully
impersonates the user (e.g., by guessing his or her password), the attacker can potentially
access all of the user’s data. Second, maintaining this large amount of data such that all of
it is accessible to the user on a moment’s notice is a tremendous waste of resources.
However, many of these concerns could be mitigated if users had an active role in man-
aging their data and better understood which files were stored in their cloud. Although
researchers have analyzed user perceptions and system limitations, there has been little re-
search from a user-centered perspective about what data users have stored in the cloud and
forgotten about, as well as what they would like to do with that data. Thus, we take the first
steps toward filling that gap. We investigated cloud storage usage, including why participants
originally stored files in the cloud, to determine optimal file management decisions.
1.2 Our Work
This thesis focuses on our user study to characterize the data participants have stored in
their cloud accounts. We also investigated three types of remediations for retrospective data
management: deleting old data, automatically encrypting old data, and moving old data to
low-energy archives. In our study, participants were first given a series of file management
options and instructed to identify their preferred outcome for each file and the relevant
parameters (i.e., service, file type, access permission, size of file).
We conducted a 100-participant online survey using Amazon’s Mechanical Turk. To
ground this survey concretely in a participant’s own stored data, we focused the survey
questions on ten files selected from a participant’s very own Dropbox or Google Drive in
a stratified sample. We used the APIs for Dropbox and Google Drive to show participants
these files and to characterize their account more broadly.
Our survey consisted of three parts. The first part was composed of generic questions
related to account information, such as account age and the main reason for using cloud
storage. Second, we asked questions related to the ten different files selected from the user’s
account. We investigated whether participants knew what the file was, whether they remem-
bered that it was stored in the cloud, and gauged whether they wanted to keep the file
as-is, or if they wanted to either delete or encrypt it. If the file was shared with other users,
either by name or via a shared link, we also asked about the origin of this sharing, as well
as whether sharing the file was still desired. Finally, we asked about user demographics and
general preferences related to the possibility of automated retrospective file management.
1.3 Findings
Our participants used either Google Drive or Dropbox for storing and sharing a nontrivial
number of files, and they had varied goals in using these services. The median number of
files stored in each participant’s cloud was 444.5. 71% used cloud storage for collaboration,
83% for sharing, 92% for archival purposes, and 5% for other reasons.
Overall, we found that the cloud storage accounts of these participants contained a mass
of data that was indeed forgotten, but not gone. Participants recalled that 50.6% of the files
they saw in the study were stored in the cloud. Participants did not recognize 13.5% of the
files they saw. Participants recognized the remaining 35.9%, but had forgotten that the file
was stored in the cloud. Moreover, out of the ten files we asked about, the median number
of files the subject remembered storing to their cloud account was five. The likelihood a
participant remembered a file was stored in the cloud varied significantly based on a number
of other factors, including file type, the participant’s access to the file (owner, editor, viewer),
file size, and when the file was last modified.
Participants’ responses to our questions about managing files in their cloud storage and
the sharing settings of those files revealed a latent need for retrospective data management.
83% of participants wanted to delete at least one file of the ten presented, and 13% wanted
to unshare at least one previously shared file.
Our study is the first to focus on cloud-user needs for retrospective file management by
grounding questions in a sample of the files stored in participants’ own cloud storage accounts.
81% of participants responded that it was important to keep at least one of the ten presented
files safe from unauthorized access, yet they had forgotten that file was stored in the cloud.
Such latent risks are exactly those that users have difficulty effectively understanding or
managing.
Moreover, using mixed-effects logistic regression, we investigated possible predictive fac-
tors for these file management preferences. Beyond a small number of factors, like the par-
ticipant’s access to the file, these models did not capture much of the rationale underlying
the decisions of participants. However, our study is the first step toward designing interfaces
and mechanisms for enabling retrospective file management in the cloud. Further research
into both understanding user perceptions of these archives and new methods of effectively
managing them can empower users to better deal with privacy and security threats.
CHAPTER 2
RELATED WORK
Here we summarize the history of cloud storage and associated privacy and security concerns
that have emerged. We then describe work that has been done to improve retrospective
privacy in social media and personal information management in email and other archives.
2.1 Cloud Storage
The advent of cloud storage was based on the reality of increasing amounts of data and
decreasing costs for storage. Cloud storage allows “ubiquitous, convenient, on-demand network
access” to its users at a low cost [42]. Moreover, cloud storage provides broad network
access by supporting both thick and thin client platforms [42]. Data availability is also ensured
because cloud storage companies protect against failures [20,36]. As a result, cloud storage has
gained significant popularity. Consumer cloud services have developed primarily over the last
decade. Box announced online file sharing for personal use in 2005 and Dropbox followed
soon after. Eventually big companies such as Microsoft and Google started their services in
2012 [20], and some researchers project that the global market for personal cloud storage
will reach $71.3 billion USD by 2020 [25].
2.2 Privacy and Security Concerns for Cloud Storage
Despite its benefits, cloud storage has many implications for privacy and security. Careful
analysis of the architecture and workloads of such systems highlights vulnerabilities in their
usage and impact on users [20, 26, 63]. Computer experts have found security issues in the
implementations of cloud storage. Hu et al. evaluated cloud storage options from Mozy,
Carbonite, Dropbox, and CrashPlan, and found that no company offered any guarantees for
data integrity and availability, nor did they assume any liability for security breaches or data
loss [31]. Moreover, most free services do not offer data encryption, forcing data safety to
become the responsibility of the user. Although some solutions have been proposed to allow
users to take advantage of the cloud without compromising privacy and autonomy [52],
personal cloud storage is still vulnerable to many attacks. When personal information is
at risk, as in the 2014 case of Dropbox’s link disclosure vulnerability [28], users are left
vulnerable. While legal protections on data stored in the cloud dictate that users do have a
reasonable expectation of security and privacy [33], the question remains: how do providers
implement user-centered data management?
These issues are exacerbated because users do not fully understand how their data is
managed. It is not uncommon for private information to be uploaded to the cloud uninten-
tionally. Clark et al. discovered the majority of cloud users did not know that their private
photos were uploaded to their cloud storage [17].
Moreover, users still express distrust in the cloud. In Ion et al.’s cross-cultural study
of cloud usage, most participants perceived cloud storage to be less secure than local stor-
age [32]. This would explain why users are reluctant to store sensitive data in the cloud [2,
16,43,49,54].
Many of these concerns can be mitigated if users have a more active role in managing their
data and better understand which files are stored in their cloud. Although researchers have
analyzed user perceptions and system limitations, there has been little research from a user-
centered perspective about what data users have stored in the cloud and forgotten about and
what they would like to do with such data. We investigated cloud storage usage, including
why participants originally stored files in the cloud, to determine optimal file management
decisions.
2.3 Retrospective Privacy of Social Media
While surprisingly little work has investigated retrospective data management for cloud stor-
age, researchers have examined analogous questions concerning social media. Safeguarding
privacy in social media is especially complex because users make dynamic privacy decisions
based on context [50]. Nevertheless, social media is a useful point of comparison
for cloud storage because both support content that can either be shared publicly or kept private.
According to current research, temporality mediates whether users perceive content
to be public or private. It can also explain the changing relevance of posts over time [3,9,62].
The passage of time plays an important role in predicting the behaviors of users. Zhao et
al. found that social network services have two regions, personal and public, and that
people gradually moved their posts from the public region to the private one by reevaluating and
reselecting content over time [62]. Ayalon et al. showed that the willingness to
share and the relevance of posts dropped significantly as time passed [3].
However, temporality cannot always predict what a user’s preferences will be in the
future [9]. Bauer et al. examined longitudinal aspects of Facebook privacy and expected
that content would become less valuable over time and thus users would want that content
to “fade away.” However, they found much more complex preferences. Participants wanted
roughly one-third of posts to indeed fade away after a month, but they surprisingly wanted
another one-third of posts to become more widely accessible one month later. Through the
qualitative responses of these participants, Bauer et al. found that these changes were caused
by a number of factors ranging from life events to nostalgia [9].
A study of retrospective privacy on Twitter demonstrates the limitations of current con-
tent control mechanisms in social media. Even if users withdraw tweets (e.g., by deleting
them), retweets may provide residual evidence and may even highlight when deleted tweets
are missing [44]. Cloud storage can create similar problems for users. They may not be fully
aware of the consequences of changing file-sharing settings. In this study, we investigated
the optimal choices that should be offered to cloud storage users and how to minimize the
negative consequences that could result from sharing files.
2.4 User Conceptualization of File Sharing
One of the advantages of cloud storage is the ability to share files with other users of the
service. However, there is no clear characterization of sharing for cloud storage [27, 45, 64],
and users have trouble understanding the functionality of cloud providers because they have
inaccurate conceptual models of the cloud [40]. In addition to unclear sharing characterization,
insufficient visibility of collaborator activities is one of the major problems of cloud
sharing [55,56,60].
Sharing privileges pose many issues within the cloud file sharing paradigm. Local files
usually have a single owner and others are only given editor and viewer privileges. In the
cloud, however, owner privileges can be assigned to others or to multiple people. This distinc-
tion is not always intuitive to users and requires more explanation [15]. Nuances of sharing
are central to how users understand the concept of cloud storage [29]. Users often refrain
from making decisions about shared-ownership files, even when they can and should, mainly
because they defer to the authority of the original creator [47, 61]. Also, reluctance to delete
files results in clutter, frustrates shared users [46], and creates problems for file management.
Moreover, it is challenging and confusing for most users to understand the implications of
deleting from a shared folder [48]. This can be exacerbated in shared repositories, and users
must develop a variety of management structures and strategies [41].
2.5 Personal Information Management
Research on personal information management (PIM) began in the 1980s to help users better
store, organize, and retrieve collections of data [7, 10,11,14,39]. Researchers have suggested
PIM interfaces for web activities [21,35], email [4, 5, 10, 51,58,59], and local files [6, 8].
There are several technical and usability limitations of existing PIM systems, and users
struggle to manage volumes of information constantly increasing over time [12, 13]. People
naturally want to organize their information, regardless of data type or storage location, and
building software that is aware of these expectations will greatly decrease costs, errors, and
frustrations [53]. PIM must be adequately supported by current technologies.
The critics of PIM have focused on current research trends. These researchers have
adopted a cognitive approach to their PIM tools. One focus has been on understanding
memory issues, and these researchers found that memory problems hinder PIM [19, 23, 34].
Furthermore, many groups have designed systems to support known characteristics of
memory [1, 18, 21, 24, 34, 35, 38, 57]. Elsweiler et al. tried to understand the memory lapses
related to PIM. Their study focused on retrospective lapse, defined as forgetting details of
past events or previously acquired information, and found that this caused problems for
actions performed in the present. They promoted reducing cognitive overload and maintaining
several organizational systems that minimized cognitive effort [23].
Other researchers adopted machine learning techniques to reduce user effort expended on
PIM [4, 53]. Ayodele et al. suggested an intelligent email assistant manager, which applied
semantic content learning tools to capture email conversation threads in order to group
them according to contextual relevancies [4]. Stumpf and Herlocker suggested TaskTracer,
which used machine learning techniques and past activities to reduce physical and cognitive
costs and errors to increase productivity [53]. TaskTracer monitored user interactions with a
computer, collected detailed records of user activities and resources accessed, used machine
learning techniques to detect task switches, and was thereby able to predict what the current
task was.
Lastly, little work has focused on PIM for the unique complications of consumer cloud
storage. Cloudsweeper, a cloud-based email protection system for PIM, let users remove or
“lock up” sensitive, unexpected, and rarely used information. While it effectively protected
some sensitive files [51], Cloudsweeper’s methods do not map directly to cloud storage. Thus,
we tried to determine participant preferences for their file management system to provide
better insight for PIM in cloud storage through a retrospective file management system.
CHAPTER 3
METHODOLOGY
[Figure 3.1 flowchart. Steps: Obtain User Consent; OAuth Flow; Collect Data From API; Generic Questions; Display Selected File and File-Specific Questions (repeated for 10 selected files); Features and Demographics Questions.]
Figure 3.1: An overview of the survey procedures from the perspective of a participant.
To map out the needs and opportunities for helping users manage forgotten files in their
cloud storage accounts, our procedure combined programmatic access to the stored files
with a dynamic online survey. Due to their popularity and API availability, we chose to
implement our survey instrument for both Google Drive and Dropbox. The survey has three
main sections: (1) a set of generic questions regarding the use of cloud storage, (2) detailed
questions about a stratified sample of 10 files that each participant had in their actual Google
Drive or Dropbox account, and (3) a final section in which we asked about the potential for
automating file management and collected participant demographics. Figure 3.1 summarizes
our survey flow. Each step is detailed in the following sections.
3.1 Cloud Storage Services
While Dropbox has existed since 2007, Google Drive was only introduced in 2012. Both ser-
vices offer free and paid tiers. Dropbox offers 2GB of free storage, while Google Drive provides
15GB. Google’s free 15GB, however, are shared between all Google services, including Gmail
and Google Photos.
While the services offered by Google Drive and Dropbox are similar in the grand scheme,
some small differences impacted our study design. Dropbox and Google Drive provide sharing
in two distinct ways: the first one is sharing files via email, which is done on an individual
basis. The second method of sharing is via a link, where anyone with a link has access to
the file. Additionally, sharing can be transitive: a file shared from user A to user B can then
be shared from user B to user C, depending upon the permissions granted by user A. How
sharing works differs slightly between services: a Dropbox user sharing an individual file can
only give others viewing access; granting edit access requires the entire folder containing
the file to be shared. On the other hand, Google Drive allows its users to grant view and
edit access for both files and folders. Furthermore, for link sharing, Dropbox users with free
accounts are limited to sharing links with view access only, whereas Google Drive links can
grant view or edit access. When asking specifically about shared files in our survey, we
did not consider Dropbox files shared via a link because they do not enable collaboration.
3.2 Data Collection and Ethics
An essential part of our study involved showing participants files in their own cloud storage
accounts and asking questions to gauge their receptiveness to different data management
options. We first presented users with a consent form explaining what API access we needed
and what information we would retain on our servers. After participants consented to the
study, we requested authorized access to the service using OAuth2, which allows our applica-
tion to programmatically access the files stored within the account. This mechanism allowed
us to be granted temporary access to these accounts without having to ask users for their
passwords. This access can be revoked by the user at any time.
After obtaining participant authorization, we used the official APIs provided by Dropbox
and Google Drive to collect the data. Specifically, we used the Dropbox API v2 and Google
Drive API v3. Because the number of files per account varied widely, and we needed the
full list of files in the account to perform a stratified sample, we optimized API calls to
ensure that the collection process was robust and relatively quick. As shown in Figure 3.1,
we programmatically collected this data while the participant completed the generic portion
of our survey.
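Both APIs return file listings in pages: Dropbox responses carry a cursor and a `has_more` flag, and Google Drive responses carry a `nextPageToken`. As a minimal sketch of how a full file list might be gathered, assuming a paged listing endpoint, the loop below follows cursors until the listing is exhausted. The `fetch_page` callable and the `fake_fetch` stand-in are hypothetical names for illustration, not the collection code used in the study, and no real API client is called.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A page is (entries, next_cursor); next_cursor is None when the listing is complete.
Page = Tuple[List[Dict[str, str]], Optional[str]]


def list_all_files(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, str]]:
    """Yield every file-metadata record by following cursors until exhausted."""
    cursor: Optional[str] = None
    while True:
        entries, cursor = fetch_page(cursor)
        yield from entries
        if cursor is None:
            break


# Hypothetical stand-in for a real API client: three files served two per page.
_FAKE_ACCOUNT = [{"name": "a.txt"}, {"name": "b.jpg"}, {"name": "c.pdf"}]


def fake_fetch(cursor: Optional[str]) -> Page:
    start = int(cursor) if cursor else 0
    end = start + 2
    next_cursor = str(end) if end < len(_FAKE_ACCOUNT) else None
    return _FAKE_ACCOUNT[start:end], next_cursor
```

Passing the page-fetching function in as a parameter keeps the loop independent of either service, so the same code could wrap a Dropbox or a Google Drive client.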
Throughout this process, our primary concern was to maintain the privacy of all partici-
pants and to collect data in an ethical manner. We used multiple techniques to protect user
safety. The survey was hosted on an HTTPS domain with a valid certificate. We provided
participants with our detailed privacy policy, including our contact details. For both cloud
services, we limited the OAuth2 permission scope and requested only basic account
information along with the file/folder metadata needed for our survey. In terms of data storage, we
only stored the information we needed, including one-way hashes for any unique identifiers to
prevent retaining personally identifiable information (PII). Furthermore, information such
as file names and the names of other users who shared files with the participants was only
displayed in-browser via direct API calls and was not retained on our servers.
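As a minimal sketch of storing one-way hashes in place of unique identifiers, the example below uses a keyed hash (HMAC-SHA-256) from the Python standard library. The choice of HMAC, the server-side key, and the names `STUDY_KEY` and `pseudonymize` are assumptions made for illustration; the specific hash function used in the study is not stated here.

```python
import hashlib
import hmac
import secrets

# Assumption: a per-study secret key that stays on the server and is never published.
STUDY_KEY = secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes = STUDY_KEY) -> str:
    """Return a keyed SHA-256 digest of an account or file identifier.

    The same identifier always maps to the same digest under one key, so records
    can still be linked, but the original identifier is not stored.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
```

A keyed hash is used in this sketch because identifiers such as email addresses are guessable; without the key, an outside party cannot confirm a guess by hashing it and comparing digests.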
3.3 Recruitment and Inclusion Criteria
We recruited participants on Amazon’s Mechanical Turk. We limited participants to North
America and also required them to have a previous approval rating of 95%+. As our goal
was to investigate temporal file management and sharing decisions for cloud storage,
we performed a preliminary screening of the survey participants using metadata from their
accounts and verified that they met our criteria for inclusion, which we also presented to
prospective participants in our Mechanical Turk HIT description. Our criteria included the
following stipulations:
• More than 50 total files in the cloud storage account
• At least one file that was older than 30 days
• At least one shared folder on Dropbox or at least ten shared files on Google Drive
These filters ensured that the participants’ accounts were sufficiently well used for us to
ask about various use cases. We had additional sanity checks to ensure that participants
could not attempt to trivially meet our requirements without using their own legitimate
Index  Selected File Description
1      Largest shared file of any type
2      Largest unshared file of any type
3      Shared media file of size greater than 250KB
4      Unshared media files of size greater than 250KB
5      Recently modified shared document
6      Recently modified unshared document
7      Old modified shared document
8      Old modified unshared document
9      Any shared file where participant is an editor
10     Any file shared via link (Google Drive Only)
Table 3.1: Categories for selecting files in our stratified sample.
account.
We recruited participants through two classes of HITs. In the first, we asked participants
to select the service they used more often for cloud storage, which resulted in 17 Dropbox users
and 67 Google Drive users. To be able to compare across services, we posted additional
Dropbox-only HITs, which resulted in an additional 16 Dropbox users.
3.4 File Selection
To gauge the various factors that might affect the file management choices of participants,
we asked each participant about ten different files from their cloud storage account. While
random sampling of files would allow us to make statistical inferences about the entire
contents of the cloud storage account, our focus was instead on collecting perceptions about
as broad a set of files and use cases as possible. We therefore conducted a stratified sampling
strategy, which is outlined in Table 3.1. Within each of these ten categories, we randomly
selected one file from all files that met the specified criteria. If no files in the user’s account
matched a category (or if we had already asked about the only such file), we selected a
random file from the account in its place.
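The selection procedure described above, including the fallback to a random file, can be sketched as follows. This is an assumed reconstruction for illustration: the file dictionaries, the predicate functions, and the `select_stratified` name are hypothetical, and excluding already-selected files from the fallback pool is an assumption the text does not spell out.

```python
import random
from typing import Callable, Dict, List, Optional

File = Dict[str, object]
Predicate = Callable[[File], bool]


def select_stratified(files: List[File],
                      categories: List[Predicate],
                      rng: Optional[random.Random] = None) -> List[File]:
    """Pick one random file per category, falling back to any unused file."""
    rng = rng or random.Random()
    chosen: List[File] = []
    chosen_ids = set()
    for matches in categories:
        # Candidates: files that satisfy the category and were not asked about yet.
        pool = [f for f in files if f["id"] not in chosen_ids and matches(f)]
        if not pool:
            # Fallback: a random file from the account (assumed to exclude repeats).
            pool = [f for f in files if f["id"] not in chosen_ids]
        if not pool:
            break  # account has fewer files than categories
        pick = rng.choice(pool)
        chosen.append(pick)
        chosen_ids.add(pick["id"])
    return chosen


# Two illustrative categories; 250_000 bytes stands in for the 250KB threshold.
example_categories: List[Predicate] = [
    lambda f: bool(f["shared"]) and f["kind"] == "media" and f["size"] > 250_000,
    lambda f: not f["shared"] and f["kind"] == "media" and f["size"] > 250_000,
]
```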
The first two categories (#1 and #2) were used to gauge perceptions of file size and
sharing. We selected each of the largest shared and unshared files present in a participant’s
cloud storage. Categories #3–#8 select files by varying file types, recency of edits, and
sharing status. Finally, to investigate how sharing modality affects answers, we varied the
sharing modality for categories #9 and #10. Because one cannot share a file for editing via
link, for Dropbox users, category #10 was replaced with a file that satisfies category #3
instead. Category #10 also asked Google Drive users about link-based sharing practices.
This stratified file selection enabled us to study various metrics across individual file types.
After performing this study with 100 participants, we collected information about 1,000 files
total. Due to an error, our survey software did not record three of these 1,000 responses. We,
therefore, report results for 997 files.
3.5 Survey Structure
Our online survey consisted of three main sections. The first and third sections covered
generic questions about cloud storage usage and demographics. The second section, which
was repeated ten times, asked a series of questions about each of the ten files selected in the
stratified sample. The questionnaire used for the survey can be found in the Appendix.
3.5.1 Generic Questions
The first set of questions targeted account information and usage trends. Specifically, we
asked about 1) account history, 2) account type, 3) reasons for using cloud storage, 4) device
usage and storage patterns, and 5) account management.
We asked participants when they originally created their cloud storage accounts. We then
inquired whether they had a free or a paid version, as this may affect expectations or use cases.
The next part of the generic survey queried whether the participant used that account for
work (or school), as well as for personal purposes. We further asked whether the account
was used for collaboration, sharing, file backup, or a combination of factors.
One benefit of cloud storage is that access is not limited to an individual machine. To
investigate how participants accessed their accounts, a subset of the generic survey questions
asked about how frequently this storage was accessed, and whether that access was through
the service’s website, desktop, or mobile application.
We then asked participants how often they replicated their cloud storage files on local
computers, and what proportions of their local files were also backed up in the cloud. We
defined a local file as any file stored on a user’s computer and accessible without an Internet
connection, and a cloud file as any file accessible only with an Internet connection. We
speculated that responses to this set of questions would provide insight into the overall file
management strategies of participants.
Since cloud storage provides a finite capacity, we asked participants how often they run
out of storage space on their cloud accounts. Along the same lines, we also asked how
frequently they organize their cloud storage by deleting unnecessary files, moving files to
different folders, or performing similar clean-up tasks. Finally, we presented the participants
with a comprehensive list of popular cloud services and asked about the ones they had
used. Our list included Amazon Cloud Drive, Apple iCloud, Box, Dropbox, Google Drive,
Microsoft OneDrive, and SpiderOak One.
3.5.2 File-Specific Questions
We proceeded to the file-specific questions once we selected the ten files. The questions were
repeated for each specific file (ten times). Before participants began answering the questions
about each file, we had them view the file via a preview link provided by the respective cloud
service API. When participants clicked the file name on the screen, the actual file opened in
a new tab. This step was mandatory, and the next button was disabled until that link was viewed.
Figure 3.2 shows a screenshot of what a participant saw at the beginning of each set of
file-specific questions.
Figure 3.2: What participants see at the beginning of a file-specific question. Clicking the view button triggers a new browser tab with a file preview provided by the cloud storage service.
The first set of questions asked to what extent the participants remembered storing the
file. We defined two levels of recall: recognition and remembrance. Recognition refers to the
individual knowing what the file is after looking at it. Remembrance indicates that the user
remembered that the file resided in their cloud storage account prior to taking the survey.1
For files the participant recognized, we asked when and why they originally stored the file
and when they would most likely access it in the future.
We also presented participants with three hypothetical file management decisions for
each file: keep the file as-is, delete the file, and encrypt the file. We described the benefits
and disadvantages of each decision. For instance, leaving the file as is provides instant access
but leaves the file vulnerable in the case of an account compromise. Deleting files eliminates
sensitive personal information but is irreversible, making the file inaccessible in the future.
Encryption protects a user’s data from attackers yet would entail the overhead of managing
an encryption key or password.
1. We also asked about a third level of recall, that of remembering whether the file was still retained anywhere, including local or offline storage. It was highly correlated with remembrance, and thus we excluded it from further analysis (ρ = 0.91, p < 0.001).
Participants chose their preferred management decision from these three choices for each
file. They also explained their decision in a follow-up free-response question. Lastly, we also
asked whether the participants would want to automatically apply the same decision to other
files on their account.
If a file we showed the participant was shared with others, we asked a set of questions
regarding how and why that file was shared. First, we randomly selected a set of members it
was shared with, up to three, and asked participants if they knew the person and had been in
contact with them in the past year. We also asked participants if they wanted to change the
sharing preferences of the file and why or why not. To understand how users conceptualize
file changes on those that were shared but not edited recently, we also asked if they would
like their copy of a shared file to reflect the changes others made to the file, and vice versa.
Finally, for Google Drive participants, we asked a subset of similar questions about files
shared via link. These questions did not list the name of any participants with whom the
file was shared, but still aimed toward capturing the same set of concepts.
3.5.3 Features and Demographics
The final section of our survey included questions pertinent to additional features that could
possibly be added to cloud storage services in the future. Specifically, we asked about auto-
matic file management. That is, we asked whether auto-deletion, auto-archiving, and auto-
encryption would be useful for the participant. We also inquired about what kinds of files or
folders they would want to apply these automatic decisions to, if any. Finally, we collected
optional demographic information about our participants, including gender, age, occupation,
and if they had a degree or job in computer science or a related technical field.
CHAPTER 4
DATA ANALYSIS
4.1 Aggregation and Basic Statistics
Besides survey responses, we collected non-sensitive, non-personally identifiable metadata
from participant cloud storage accounts. Specifically, we calculated basic descriptive account
statistics, such as the number of bytes stored in the account, number of files, and percent
of files shared in each account. We then aggregated all this file metadata with our survey
analysis for further interpretation.
4.2 Qualitative Coding
We used a standard coding process to analyze free text responses. First, a researcher created
a codebook based on the text responses. This codebook included labels for each response
with definitions. After the first researcher finished creating the codebook, that researcher
and another researcher read through the same survey responses and assigned a code to
each using the codebook. After calibrating a small number of responses, both researchers
independently coded all participant answers and calculated a Cohen’s kappa coefficient to
determine agreement on the coding. The codebook contained between three
and fifteen categories per question, and Cohen’s kappa between the two coders was at least
0.61 for each question. After both researchers finished their coding, they also calibrated their
coding results by discussing each text response and its assigned codes.
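For readers unfamiliar with the agreement statistic used above, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance from each coder's label frequencies. The following is a minimal sketch of the standard formula; the `cohens_kappa` function name and the example labels are illustrative and are not taken from the study's codebook.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who labeled the same responses."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty set of responses")
    n = len(coder_a)
    # Observed agreement: fraction of responses given the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four responses, purely for illustration.
kappa = cohens_kappa(["backup", "backup", "work", "work"],
                     ["backup", "work", "work", "work"])
```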
4.3 Regression Model
We ran a series of mixed-effects logistic regressions to understand what file-level metadata,
information about a given cloud storage account, and participant demographics correlated
with participant ability to recognize or remember files, and the decisions they made concerning
managing the file and its sharing settings. We chose to use a mixed model because
each participant contributed ten files to our data set; our mixed-effects logistic regression
therefore included a participant-specific random factor to account for this non-independence of data.
We included the following account-specific independent variables in each of our regression
models:
• Service (Dropbox or Google Drive)
• Age of the cloud storage account (years)
• Whether the account was used for work purposes
• Whether the account was used for personal purposes
We included the following file-specific factors:
• File type (document, image, spreadsheet, video, or other)
• File access permissions (owner, editor, or viewer)
• Number of days (log10) since the file was last modified
• Size of the file (log10)
• Whether the file was shared, either with specific users or using a shared link
Because we hypothesized that usage patterns and management decisions might differ
between Dropbox and Google Drive, we included terms to capture the interaction between
the service and all five file-specific factors.
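To make the numeric terms and the service interactions concrete, the sketch below builds part of one file's predictor row. It is an assumption-laden illustration: the `file_features` function and its key names are hypothetical, the log10(days+1) and log10(bytes) units follow the regression tables, clamping empty files to one byte is an added assumption, and the categorical file-type and access-permission terms are omitted for brevity.

```python
import math
from typing import Dict


def file_features(service: str, days_since_modified: int,
                  size_bytes: int, shared: bool) -> Dict[str, float]:
    """Numeric predictors for one file, with Google Drive as the baseline service."""
    is_dropbox = 1.0 if service == "dropbox" else 0.0
    days_log = math.log10(days_since_modified + 1)
    # Assumption: clamp empty files to 1 byte so that log10 is defined.
    size_log = math.log10(max(size_bytes, 1))
    shared_flag = 1.0 if shared else 0.0
    return {
        "dropbox": is_dropbox,
        "days_log10": days_log,
        "size_log10": size_log,
        "shared": shared_flag,
        # Interaction terms: the service indicator multiplied by each file factor.
        "dropbox:days_log10": is_dropbox * days_log,
        "dropbox:size_log10": is_dropbox * size_log,
        "dropbox:shared": is_dropbox * shared_flag,
    }
```

With this encoding, an interaction coefficient describes how much the effect of a file factor differs for Dropbox relative to Google Drive.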
We included the following participant-specific factors:
• Participant’s age
• Participant’s technical background (degree or job in computer science or related fields)
We also ran a mixed-effects logistic regression to identify correlations between these
factors and whether or not participants preferred to keep sharing that file with up to three
different individuals with whom that file was shared (sharing recipients). The dependent
variable was ordinal and captured preferences to keep sharing (1), whether it did not matter
if the file was shared (2), or to stop sharing (3). We removed the sharing status independent
variable from the regression because we only modeled shared files in this regression. However,
we added an independent variable for participant responses about how recently they had been
in touch with the sharing recipient (within the past year, over a year ago, or they did not
know who that person was). We treated both the participant and the file as random factors
in our mixed model. Because shared files were only a fraction of our data set, we did not
include interaction terms in our model. In the body of the paper, we report the p-values for
factors that were significant in our models.
CHAPTER 5
RESULTS
5.1 Participant Demographics and Account Usage
Here we present an overview of the results of our survey as well as a statistical analysis of the
file recollection and management decisions of users as a function of various factors related
to the user and files in question.
                              Dropbox  GDrive
Total # Participants          33       67
Gender       Male             21       37
             Female           11       30
             Not answered     1        0
Age          <20              1        0
             20-34            18       47
             35-49            8        18
             50+              5        2
             Not answered     1        0
Technical    Yes              11       19
Background   No               21       48
             Not answered     1        0
Table 5.1: Participant demographics.
Our participant pool contained a total of 100 individuals, of which 58% were male and
41% were female. The remaining 1% did not declare their gender. The ages of the individuals
varied from 19 to 68 with a mean of 32 years. We classified participant technical background
based on whether the individual had a degree or a job in a computer science-related field. Our
survey indicated that 69% of our participants did not have a strong technical background.
Table 5.1 provides the demographic details of our participants.
From the collected responses, we evaluated the overall cloud storage services that our
participants used. While our survey data included 33 Dropbox and 67 Google Drive users,
33% of our participants also used Microsoft OneDrive and 24% had an Apple iCloud account.
Other commonly used services included Amazon Cloud Drive and Box. Most participants
used at least two different cloud storage services: only 17% of participants used one cloud
storage service, 36% used two different cloud storage services, and the remaining 47%
used three or more.
Information regarding the usage of these accounts is presented in Table 5.2. While both
Dropbox and Google Drive services have attracted significant numbers of new users in recent
years [30,37], our participants had been using these services for quite some time: 85% of the
Google Drive and 94% of Dropbox accounts were more than 3 years old. Also, our survey
results indicated that the median age of our participants’ accounts (for both cloud storage
services) was 4.9 years.1
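Account age, which the accompanying footnote says was derived from the oldest file(s) in each account, could be computed along the following lines. This is a sketch under assumptions: the `account_age_years` helper is hypothetical, the source does not say which timestamp (creation or modification) was used, and the dates below are invented.

```python
from datetime import datetime, timezone
from statistics import median
from typing import Iterable


def account_age_years(file_timestamps: Iterable[datetime], as_of: datetime) -> float:
    """Approximate account age as the age of the oldest file, in years."""
    oldest = min(file_timestamps)
    return (as_of - oldest).days / 365.25


# Invented timestamps for two hypothetical accounts.
survey_date = datetime(2017, 1, 1, tzinfo=timezone.utc)
ages = [
    account_age_years([datetime(2013, 1, 1, tzinfo=timezone.utc),
                       datetime(2016, 6, 1, tzinfo=timezone.utc)], survey_date),
    account_age_years([datetime(2015, 1, 1, tzinfo=timezone.utc)], survey_date),
]
median_age = median(ages)
```

Because files can be deleted over time, an estimate based on the oldest surviving file is a lower bound on the true age of the account.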
Participants also varied in how they used these accounts. More than 80%
of the participants used accounts for both work/school and personal reasons, which can lead
to an intermingling of files stored for different purposes with different sensitivities. 48% of
participants used their accounts for either work/school or personal purposes at least once
a week and 86% of participants used them at least once a month. However, it was relatively rare
for the cloud to completely supplant local file storage: 88% of individuals retained at least a
subset of their cloud files on some local storage medium.
Participants frequently used synced folders, the official website, and smartphones to
access their cloud storage. 12% of participants accessed their cloud storage through synced
folders at least once a day, while 15% of participants accessed their cloud storage daily
through the service’s website. However, smartphone use was quite limited: 19% of participants
never used their smartphones to access their cloud storage, and only 9% of participants
used their smartphones to access the storage medium on a daily basis.
1. We calculated account age based on the oldest file(s) in a participant’s cloud storage account.
To investigate the organization pattern of the cloud storage accounts, we asked our participants
how often they ran out of storage space. Although Google Drive and Dropbox only
offer limited cloud storage capacity, our participants did not run out of storage space frequently.
45.5% of Dropbox users and 83.6% of Google Drive users never ran out of storage
space. However, storing capacity also correlated with storage shortage. 9.1% of our Dropbox
users and 4.5% of Google Drive users almost always ran out of storage space.
Even though half of our participants answered that they never ran out of storage space,
they organized their cloud storage on a yearly basis. 33% of participants organized their
cloud storage at least once a year, but less than once a month. 33% of participants organized
it less than once a year, but occasionally. However, 17% of participants never took the time
to organize their cloud storage.
Property             Service  Min     Median  Max     Mean (SD)
Account Age (Years)  DB       0.4     4.9     8.2     4.7 (2.2)
                     GD       0.05    4.9     5.3     4.1 (1.5)
Account Size (GB)    DB       0.12    2.0     54.1    3.9 (9.1)
                     GD       0.002   1.2     63.3    3.4 (8.4)
# of Files           DB       53      514     66.6K   3.5K (11.4K)
                     GD       59      424     22.1K   1.8K (3.6K)
Avg. File (MB)       DB       0.015   3.1     26.7    6.0 (7.3)
                     GD       0.15    7.3     131.1   14.6 (21.5)
Largest File (MB)    DB       3.1     295.9   4000    571 (820)
                     GD       8.4     506.9   9600    1100 (1660)
Shared Files (%)     DB       0.02    21.5    100     38.6 (38.5)
                     GD       0.3     44.0    99.7    46.7 (34.4)
Table 5.2: Descriptive statistics of the Dropbox (DB) and Google Drive (GD) accounts of participants.
5.2 Account Archeology
The statistics we collected about each cloud storage service reflected the underlying characteristics
of each storage system. Google Drive storage was shared across Google services, including Gmail
and Google Photos, while Dropbox only provided cloud storage services. Google Drive’s
median account size was smaller than that of Dropbox, which suggests that Google Drive users
used their cloud storage as one part of their broader Google account. Moreover, Google Drive was more
sharing-oriented than Dropbox: the median share of files shared with others was 44% for Google Drive
accounts and 21.5% for Dropbox accounts (Table 5.2).
Moreover, most files in the cloud storage accounts of participants were modified within the
past three years, but participants also kept files that were older than three years. Figure 5.1
and Figure 5.2 show file creation and modification dates.
Figure 5.1: [Based on creation date, Google Drive only.] Participants kept their files more than 3 years. Some participants kept their files for 9 years. [Plot: # of files by weeks.]
Figure 5.2: [Based on modification date, Dropbox and Google Drive.] Our sample showed that participants only continued to modify files for three years. [Plot: # of files by weeks.]
5.3 File Recognition
First, we asked participants whether they recognized a file by directly asking: “(After looking
at this file), do you know what it is?” We found that the vast majority of the files we asked
about were recognized: Only 9.7% of Dropbox files and 15.5% of Google Drive files were not
recognized.
Figure 5.3: Comparison of file ownership and remembrance. File ownership had a significant positive correlation with remembering that the file was stored in the cloud (χ2(8, N = 862) = 32.24, p < .001). [Stacked bars, scale 0 to 100, for Owner, Editor, and Viewer. Legend: Strongly agree, Agree, Neutral, Disagree, Strongly disagree.]
As described in the methodology, we ran a mixed-effects logistic regression to investigate
what factors specific to the file, account, or participant correlated with whether participants
recognized the files they were shown. Table 5.3 includes the results of this logistic regression.
Compared to the “other” file type,2 participants were more likely to recognize documents
(p < .001) and images (p = .027). Unsurprisingly, compared to files for which they were
the owner, participants were less likely to recognize files owned by others and for which
they only had editor (p = .001) or viewer (p = .011) permissions. We observed a significant
interaction effect in which participants were more likely to recognize files for which they had
editor permissions if they used Dropbox, rather than Google Drive (p = .018), but the cloud
storage service otherwise did not significantly impact file recognition. We did not observe
any significant correlations between whether the participant recognized the file and any of
the other file metadata factors or participant-specific factors we collected.
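The z values and p-values in Table 5.3 are consistent with a Wald test on each coefficient, and a coefficient can be read as an odds ratio by exponentiating it. The sketch below illustrates both steps using the published coefficient and standard error for image files; the source does not name the test it used or report odds ratios, so the `wald_test` helper and this interpretation are assumptions.

```python
import math
from typing import Tuple


def wald_test(coefficient: float, std_error: float) -> Tuple[float, float, float]:
    """Return (z, two-sided p, odds ratio) for one logistic regression term."""
    z = coefficient / std_error
    # Two-sided p-value under a standard normal: P(|Z| >= |z|).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    odds_ratio = math.exp(coefficient)
    return z, p, odds_ratio


# Coefficient and standard error for "File Type: Image" from Table 5.3.
z_image, p_image, or_image = wald_test(0.940, 0.424)
```

Up to rounding of the published inputs, the result should land close to the reported z = 2.215 and p = 0.027. Under this reading, the odds of recognizing an image are roughly two and a half times the odds for a file of the "other" type, holding the remaining factors fixed.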
In addition to asking whether a participant recognized a file, we asked whether they
remembered that they retained it in cloud storage. Compared to recognizing the file, partici-
pants remembered retaining far fewer files. Users did not remember that 39.39% of Dropbox
files and 34.18% of Google Drive files were retained in cloud storage. While our non-random
sampling approach is not representative of all files stored within these accounts, this result
suggests that even though recalling the act of saving a file is not hard, with such large and
long-lived accounts it is difficult to keep track of what has been retained.
2. We categorized file type based on five different file extensions: document, image, spreadsheet, video, and other. The “other” category is a baseline of file type.
Using logistic regression (Table 5.4), we found that compared to files in the “other”
category, participants were more likely to remember video files (p = .025), yet less likely to
remember image files (p < .001). Unsurprisingly, participants were less likely to remember
files if they had only editor (p = .013) or viewer (p < .001) permissions, as opposed to being
the owner of the file. Participants were also more likely to remember a file the more recently
it had been modified (p < .001) or the larger its file size (p < .001). They were also more
likely to remember shared files than unshared files (p < .001). Participants were less likely to
remember a file if their cloud storage account was older (p < .001), although they were more
likely to remember a file if they, the participant, were older in age (p < .001). Participants
were less likely to remember files if they used their account for work purposes (p < .001) and
more likely to remember files if they used their account for personal purposes (p < .001).
To investigate the utility of these stored files, we asked participants to self-report when
they last accessed each file.3 We found that most files that we asked about were not recently
accessed. 28.76% of Dropbox files and 43.15% of Google Drive files were last accessed between
one month and one year ago. 41.18% of Dropbox files and 40.93% of Google Drive files
were last accessed between one and five years ago. Regarding potential future utility, our
participants answered that 30.13% of Google Drive files and 23.03% of Dropbox files would
most likely never be accessed again. While copious cheap or free storage makes such “write
only” archives tenable, if a user is to store sensitive data here without expecting it to provide
future benefit, the risks of such an archive clearly outweigh the rewards.
3. Last access time was not available via the API, and only modification date was available. However, modification date was not limited to survey participants. The last modification date also recorded when other owners or editors modified the file.
Table 5.3: The results of a mixed-effects logistic regression to identify what factors were correlated with recognizing what the file was (baseline: not recognized). Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.
Factor                                     Baseline / Units     Coefficient  Std. Error  z value   p
Service: Dropbox                           Google Drive         -0.727       2.219       -0.328    0.743
File Type: Document                        Other                1.997        0.488       4.090     <.001
File Type: Image                           Other                0.940        0.424       2.215     0.027
File Type: Spreadsheet                     Other                1.209        0.896       1.349     0.177
File Type: Video                           Other                1.483        1.070       1.386     0.166
Access: Editor                             Owner                -2.143       0.653       -3.281    0.001
Access: Viewer                             Owner                -1.690       0.664       -2.546    0.011
Days Since Modified                        log10(days+1)        -0.308       0.285       -1.082    0.279
File Size                                  log10(bytes)         0.264        0.142       1.862     0.062
Shared                                     Not shared           -0.058       0.541       -0.107    0.915
Account Age                                Years                0.274        0.166       1.654     0.098
Participant Tech. Background               No                   0.350        0.429       0.816     0.414
Participant Age                            Years                0.016        0.021       0.772     0.440
Account for Work Purposes                  No                   0.491        0.427       1.150     0.250
Account for Personal Purposes              No                   -0.147       0.528       -0.278    0.781
Service: Dropbox * File Type: Document     Google Drive, Other  0.050        0.866       0.058     0.954
Service: Dropbox * File Type: Image        Google Drive, Other  0.425        0.733       0.580     0.562
Service: Dropbox * File Type: Spreadsheet  Google Drive, Other  0.152        1.499       0.102     0.919
Service: Dropbox * File Type: Video        Google Drive, Other  0.187        1.702       0.110     0.912
Service: Dropbox * Access Type: Editor     Google Drive, Owner  2.983        1.258       2.371     0.018
Service: Dropbox * Access Type: Viewer     Google Drive, Owner  1.038        1.688       0.615     0.539
Service: Dropbox * Days Since Modified     Google Drive, N/A    -0.538       0.591       -0.911    0.362
Service: Dropbox * File Size               Google Drive, N/A    0.276        0.213       1.297     0.195
Service: Dropbox * Shared                  Google Drive, Not    0.754        0.883       0.853     0.393
Participants used their cloud storage for various reasons. We also investigated the reasons why participants
originally stored the files using the cloud. 21.0% of files were stored for backup, 15.9% for
work, 12.3% for access advantages. Even though it was not the main reason for storing the
files, 3.4% of participants used their cloud storage for keeping personal memories, such as
family pictures and love letters with their spouses. P45, a Google Drive user, answered that
he kept the files, “Just because they are my father’s feet. haha I know it sounds weird but
the day he is gone, I want to remember everything :( ”
Lastly, we analyzed recognition and remembrance by participants. Most participants had
at least one file that they did not recognize or remember. 59% of participants had at least
one file that they did not recognize and 81% of participants had at least one file that they
did not remember. The average number of files that participants did not recognize was 1.35.
The average number of files that participants did not remember was 3.3. Our participants
only remembered five out of ten files, and only 10% of participants fully recognized and
remembered their survey files.
Table 5.4: The results of a mixed-effects logistic regression to identify what factors were correlated with remembering that the file was stored in the cloud, which was recorded on a five-point Likert scale coded as an integer from -2 (the participant strongly disagrees that they remembered the file was stored in the cloud) to 2 (the participant strongly agrees that they remembered the file was stored in the cloud). Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.
Factor                                     Baseline / Units     Coefficient  Std. Error  z value   p
Service: Dropbox                           Google Drive         -1.444       1.111       -1.300    0.193
File Type: Document                        Other                0.102        0.220       0.465     0.642
File Type: Image                           Other                -0.150       0.002       -88.197   <.001
File Type: Spreadsheet                     Other                -0.063       0.599       -0.104    0.917
File Type: Video                           Other                1.467        0.655       2.239     0.025
Access: Editor                             Owner                -1.067       0.428       -2.493    0.013
Access: Viewer                             Owner                -3.09        0.455       -6.789    <.001
Days Since Modified                        log10(days+1)        -0.658       0.002       -385.835  <.001
File Size                                  log10(bytes)         0.058        0.002       34.251    <.001
Shared                                     Not shared           0.375        0.002       220.280   <.001
Account Age                                Years                -0.126       0.002       -73.893   <.001
Participant Tech. Background               No                   -0.459       0.433       -1.061    0.289
Participant Age                            Years                0.011        0.002       6.491     <.001
Account for Work Purposes                  No                   -0.382       0.002       -212.773  <.001
Account for Personal Purposes              No                   0.726        0.002       404.017   <.001
Service: Dropbox * File Type: Document     Google Drive, Other  0.025        0.483       0.051     0.959
Service: Dropbox * File Type: Image        Google Drive, Other  0.106        0.403       0.263     0.793
Service: Dropbox * File Type: Spreadsheet  Google Drive, Other  0.682        0.988       0.690     0.490
Service: Dropbox * File Type: Video        Google Drive, Other  -2.348       0.917       -2.560    0.010
Service: Dropbox * Access Type: Editor     Google Drive, Owner  1.549        0.688       2.250     0.024
Service: Dropbox * Access Type: Viewer     Google Drive, Owner  4.084        1.187       3.441     <.001
Service: Dropbox * Days Since Modified     Google Drive, N/A    -0.294       0.259       -1.133    0.257
Service: Dropbox * File Size               Google Drive, N/A    0.294        0.095       3.086     0.002
Service: Dropbox * Shared                  Google Drive, Not    -0.493       0.428       -1.154    0.249
5.4 File Management
To assess file management needs, we asked our participants what file management decision
they wanted to perform for each file: encrypt the file in place, delete it, or keep it as-is.
Participants wanted to keep 57.93% of the files they saw in our survey as-is, delete 35.24% of
files, and encrypt 6.83% of files. Recollection of files was highly correlated with file manage-
ment decisions. Participants were more likely to delete files if they did not recognize them.
They typically kept the file as-is if they recognized and remembered it. Also, participants
were more likely to encrypt the file if they recalled it (Figure 5.4).
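The χ2 statistic reported with Figure 5.4 comes from a test of independence on a contingency table of recall level by management decision. The sketch below shows how such a statistic and its degrees of freedom are computed. The counts are invented for illustration and are not the study's data, and the `chi_square_independence` helper is a hypothetical name.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent (assumed > 0).
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Invented counts. Rows: not recognized, recognized but not remembered,
# recognized and remembered. Columns: keep as-is, encrypt, delete.
made_up_counts = [
    [20, 2, 60],
    [90, 10, 100],
    [300, 40, 80],
]
stat, dof = chi_square_independence(made_up_counts)
```

A three-by-three table yields four degrees of freedom, which matches the χ2(4, N = 997) form of the reported result. Converting the statistic to a p-value requires the chi-square distribution, for example through a library routine such as SciPy's `chi2_contingency`.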
Figure 5.4: Participant management decisions on files across the possible combinations of file recognition and remembrance. Our statistics suggest these correlations are significant (χ2(4, N = 997) = 260.26, p < .001). [Stacked bars, scale 0 to 100, for: Not recognized; Recognized but not remembered; Recognized and remembered. Legend: Keep as-is, Encrypt, Delete.]
Participants were more likely to prefer deleting files if they only had editor (p = .008)
or viewer (p < .001) permissions, as opposed to being the owner of the file. This effect,
however, was far more muted for files on Dropbox than for those on Google Drive. There
was a significant negative interaction between the service and the access permissions in
predicting preferences for file deletion. We did not observe any other significant main effects
for predicting which files a participant would express a preference for deleting, nor which
files participants were more likely to delete rather than keep as-is (Table 5.5).
As with our regression model identifying which file-based, account-based, and participant-based
features correlated with preferences for deleting a file, we observed few significant
correlations between these factors and participant preferences to encrypt a file. We observed
that participants with a technical background, relative to participants without such a back-
ground, were more likely to choose to encrypt a file (p = .036). In addition, participants who
used their cloud storage account for work purposes were less likely to choose to encrypt a
file (p = .013). We did not observe any other significant correlations (Table 5.6).
We asked participants why they made each file management decision. Participants had
multiple reasons for wanting to keep files as-is. 21.1% of keep as-is decisions were based on the
fact that participants might need the file in the future. P5, a Google Drive user, mentioned,
“I might need it if I am ever audited, and I don’t know how long I need to keep tax-related
for” for a tax-related file they were storing. For 19.1% of keep as-is decisions, participants
suggested that they did not care about the file because it did not contain any private or
sensitive information and they wanted to keep it. For instance, one participant described,
“There is nothing about the file that I would be concerned about during a data breach”.
17.7% of the responses emphasized saving files for backup purposes, and 13.9% mentioned
that files should be kept as-is because participants wanted to access the files remotely and
across multiple devices.
For 69.2% of delete decisions, participants mentioned that the files were no longer useful.
When questioned about one of the images we displayed, P27 said, “I don’t need it anymore
and that folder is full of junk photos.” For 12.0% of delete decisions, participants did not
remember the file in question, and for 10% of delete decisions they wanted to free up
space. Another common reason for deleting files was that the content was personal and
participants wanted to prevent unauthorized access. One participant, concerned about a
personal photo, said, “It’s a personal photo of my wife and I don’t want anyone else to see it”.
Encryption was not as common as deletion. For 30.6% of encrypt decisions, participants
answered that the file contained private information. 27.8% of encryption decisions were
for security purposes. The responses suggested participants encrypted files that contained
sensitive information. P44’s recorded response said “It is a financial document that I would
not want to be public”. We also found instances where users wanted to encrypt pictures and
videos.
Whether participants expected to access a file in the future also affected file management
decisions. Participants kept as-is 79% of the files that they said they would access in the
future, yet they wanted to delete 10.7% of such files. We therefore investigated why
participants might want to delete files even if they thought they would access them in the
future. Participants had second copies of 27.4% of the files that they wanted to delete.
However, participants also answered that they wanted to delete 60% of those files because
they did not need the files, and they elected to delete 13.3% of those files to save space. In
contrast, participants wanted to delete 64.6% of the files that they would not need to access
in the future (Figure 5.7).

Figure 5.5: Comparison of deletion and file ownership levels. The ownership level significantly correlated with the decision to delete a file (χ2(2, N = 928) = 13.81, p = .001). [Stacked bar chart, scale 0 to 100. Rows: Owner, Editor, Viewer. Legend: Do not delete, Delete.]

Figure 5.6: Comparison of file encryption and participant technical background. If the participant had a technical background, they were more likely to encrypt the file (χ2(1, N = 645) = 8.14, p = .004). [Stacked bar chart, scale 0 to 100. Rows: Technical, Non-technical. Legend: Do not encrypt, Encrypt.]

Figure 5.7: Our participants were more likely to delete files when they expected they would never need to access the files in the future (χ2(2, N = 997) = 272.84, p < .001). [Stacked bar chart, scale 0 to 100. Rows: Access, Do not access. Legend: Keep as-is, Encrypt, Delete.]
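The p-values that accompany the chi-square statistics in the figure captions can be checked from the statistic and its degrees of freedom alone. The sketch below is an independent cross-check, not the analysis code used in the study. It relies on two standard closed forms for the upper tail of the chi-square distribution (one for even degrees of freedom, one for a single degree of freedom); the function names are illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2m the tail is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    partial_sum = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * partial_sum

def chi2_sf_df1(x):
    """Upper-tail probability for a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Figure 5.5: chi-square(2) = 13.81, reported p = .001
p_fig_5_5 = chi2_sf_even_df(13.81, 2)
# Figure 5.6: chi-square(1) = 8.14, reported p = .004
p_fig_5_6 = chi2_sf_df1(8.14)

print(f"Figure 5.5: p = {p_fig_5_5:.4f}")
print(f"Figure 5.6: p = {p_fig_5_6:.4f}")
```

Both values agree with the reported p-values after rounding to three decimal places. For odd degrees of freedom above one, a statistics library would be the more practical choice.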
In terms of managing multiple copies of files, we expected participants to be more
likely to delete their files when they had a second copy. However, our results showed that
participants were more likely to keep their files when they had extra copies. This suggests
that participants kept multiple copies of files they considered important (Figure 5.8).

Figure 5.8: Our participants were more likely to keep files as-is when they had a second copy. However, when they did not have a second copy, they were more likely to delete the file (χ2(4, N = 997) = 130.85, p < .001). [Stacked bar chart, scale 0 to 100. Rows (Second Copy): Yes, No, Do not know. Legend: Keep as-is, Encrypt, Delete.]
Lastly, we also investigated how security perceptions affected file management decisions.
Our participants thought that never losing the ability to access a file was more important
than security. 26.3% of participants answered that keeping the file safe from unauthorized
access was important, while 40.3% of participants thought never losing the ability to access
a file was important. Participants who considered security important were more likely to
choose file encryption (Figure 5.9), while participants who thought the ability to access a
file was important were more likely to keep a file as-is. However, if the ability to access a
file was not important to participants, they were more likely to delete the file (Figure 5.10).
We suspect that participants who do not want to lose the ability to access a file also assume
that other people can easily access the file. Participants also may not want to spend time on
encryption: they were likely to delete files whether or not the files contained personal
information.
Figure 5.9: Our participants were more likely to encrypt files when they wanted to prevent unauthorized file access (χ2(2, N = 997) = 205.55, p < .001). [Stacked bar chart, scale 0 to 100. Rows (Keep the file safe from unauthorized access): Important, Not important. Legend: Keep as-is, Encrypt, Delete.]

Figure 5.10: Our participants were more likely to keep the files as-is when they thought never losing the ability to access a file was important (χ2(2, N = 997) = 310.42, p < .001). [Stacked bar chart, scale 0 to 100. Rows (Never lose the ability to access the file): Important, Not important. Legend: Keep as-is, Encrypt, Delete.]
Table 5.5: The results of a mixed-effects logistic regression to identify what factors were correlated with expressing a preference to delete the file shown, as opposed to keeping the file as-is. Files the participant wanted to encrypt are excluded from this model. Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.

Factor | Baseline / Units | Coefficient | Std. Error | z value | p
Service: Dropbox | Google Drive | -2.342 | 1.556 | -1.505 | 0.132
File Type: Document | Other | -0.375 | 0.335 | -1.121 | 0.262
File Type: Image | Other | -0.226 | 0.337 | -0.671 | 0.502
File Type: Spreadsheet | Other | 1.233 | 0.696 | 1.772 | 0.076
File Type: Video | Other | -1.143 | 0.683 | -1.673 | 0.094
Access: Editor | Owner | 1.379 | 0.518 | 2.665 | 0.008
Access: Viewer | Owner | 2.054 | 0.535 | 3.838 | <.001
Days Since Modified | log10(days+1) | 0.077 | 0.189 | 0.407 | 0.684
File Size | log10(bytes) | -0.149 | 0.105 | -1.419 | 0.156
Shared | Not shared | -0.361 | 0.359 | -1.006 | 0.314
Account Age | Years | -0.186 | 0.142 | -1.316 | 0.188
Participant Tech. Background | No | -0.129 | 0.362 | -0.355 | 0.722
Participant Age | Years | -0.018 | 0.017 | -1.012 | 0.312
Account for Work Purposes | No | -0.045 | 0.361 | -0.123 | 0.902
Account for Personal Purposes | No | -0.420 | 0.435 | -0.964 | 0.335
Service: Dropbox * File Type: Document | Google Drive, Other | 1.024 | 0.631 | 1.623 | 0.105
Service: Dropbox * File Type: Image | Google Drive, Other | 1.208 | 0.602 | 2.007 | 0.045
Service: Dropbox * File Type: Spreadsheet | Google Drive, Other | -0.158 | 1.278 | -0.123 | 0.902
Service: Dropbox * File Type: Video | Google Drive, Other | 1.350 | 1.073 | 1.258 | 0.208
Service: Dropbox * Access Type: Editor | Google Drive, Owner | -2.924 | 0.864 | -3.385 | <.001
Service: Dropbox * Access Type: Viewer | Google Drive, Owner | -3.322 | 1.624 | -2.045 | 0.041
Service: Dropbox * Days Since Modified | Google Drive, N/A | 0.351 | 0.355 | 0.989 | 0.322
Service: Dropbox * File Size | Google Drive, N/A | 0.020 | 0.160 | 0.127 | 0.899
Service: Dropbox * Shared | Google Drive, Not | 0.859 | 0.611 | 1.406 | 0.160
Table 5.6: The results of a mixed-effects logistic regression to identify what factors were correlated with expressing a preference to encrypt the file shown, as opposed to keeping the file as-is. Files the participant wanted to delete are excluded from this model. Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.

Factor | Baseline / Units | Coefficient | Std. Error | z value | p
Service: Dropbox | Google Drive | -4.400 | 2.808 | -1.567 | 0.117
File Type: Document | Other | -0.348 | 0.645 | -0.539 | 0.590
File Type: Image | Other | -0.424 | 0.680 | -0.624 | 0.533
File Type: Spreadsheet | Other | -24.054 | 420.899 | -0.057 | 0.954
File Type: Video | Other | -1.527 | 1.340 | -1.139 | 0.255
Access: Editor | Owner | 0.255 | 0.982 | 0.260 | 0.795
Access: Viewer | Owner | -23.491 | 280.600 | -0.084 | 0.933
Days Since Modified | log10(days+1) | -0.034 | 0.341 | -0.099 | 0.921
File Size | log10(bytes) | -0.256 | 0.189 | -1.355 | 0.176
Shared | Not shared | -0.086 | 0.634 | -0.135 | 0.892
Account Age | Years | 0.105 | 0.234 | 0.451 | 0.652
Participant Tech. Background | No | 1.177 | 0.562 | 2.095 | 0.036
Participant Age | Years | -0.032 | 0.029 | -1.089 | 0.276
Account for Work Purposes | No | -1.37 | 0.549 | -2.486 | 0.013
Account for Personal Purposes | No | -0.594 | 0.624 | -0.952 | 0.341
Service: Dropbox * File Type: Document | Google Drive, Other | 1.098 | 1.228 | 0.894 | 0.371
Service: Dropbox * File Type: Image | Google Drive, Other | 1.375 | 1.108 | 1.241 | 0.215
Service: Dropbox * File Type: Spreadsheet | Google Drive, Other | 28.213 | 420.896 | 0.067 | 0.947
Service: Dropbox * File Type: Video | Google Drive, Other | 1.583 | 1.987 | 0.797 | 0.426
Service: Dropbox * Access Type: Editor | Google Drive, Owner | 0.156 | 1.442 | 0.108 | 0.914
Service: Dropbox * Access Type: Viewer | Google Drive, Owner | 24.414 | 280.597 | 0.087 | 0.931
Service: Dropbox * Days Since Modified | Google Drive, N/A | -0.086 | 0.571 | -0.150 | 0.881
Service: Dropbox * File Size | Google Drive, N/A | 0.517 | 0.290 | 1.783 | 0.075
Service: Dropbox * Shared | Google Drive, Not | 0.189 | 1.056 | 0.179 | 0.858
5.5 File Sharing
Besides asking about file retention for each file, we also asked whether participants wanted
to maintain sharing relationships. We asked this question for 212 files, yielding 447
file-recipient pairs. Each shared file had between 1 and 19 recipients. If a file was shared
with only 1–3 individuals, we asked questions pertaining to each of them. However, if a file
was shared with four or more individuals, we randomly selected three. As a result, we
focused on 80 files that were shared with one person, 29 files that were shared with two
people, and 103 files that were shared with three or more people. Most participants wanted
to keep the same sharing decision for every recipient of a file: only 20.1% of files shared with
two people had different sharing decisions across recipients, as did 13.5% of files shared
with three people.
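The selection rule and the resulting number of file-recipient pairs can be reproduced with a few lines of code. This is a minimal sketch rather than the study software: the function name and the use of a seeded random generator are assumptions made for illustration, and the file counts are the ones reported in this section, where files shared with four or more people contribute three sampled recipients each.

```python
import random

MAX_RECIPIENTS_ASKED = 3  # at most three recipients were asked about per file

def select_recipients(recipients, rng):
    """Ask about every recipient if there are at most three, else sample three."""
    if len(recipients) <= MAX_RECIPIENTS_ASKED:
        return list(recipients)
    return rng.sample(list(recipients), MAX_RECIPIENTS_ASKED)

# Number of files by how many recipients we asked about (from the text).
files_by_recipients_asked = {1: 80, 2: 29, 3: 103}

total_files = sum(files_by_recipients_asked.values())
total_pairs = sum(n * count for n, count in files_by_recipients_asked.items())

print(total_files, total_pairs)

# Hypothetical file shared with five people: three are chosen at random.
rng = random.Random(0)
chosen = select_recipients(["a", "b", "c", "d", "e"], rng)
```

The totals match the 212 files and 447 file-recipient pairs reported above (80 × 1 + 29 × 2 + 103 × 3 = 447).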
Participants wanted to keep sharing 40.7% of these file-recipient pairs, stop sharing 11.4%
of these file-recipient pairs, and did not care about 47.9% of the file-recipient pairs.
In our regression of participant preferences about whether or not to continue sharing files
that were shared with one or more other users by name, rather than through a shared link,
we found that a handful of factors correlated with participant preferences, and the regression
results are shown in Table 5.7. Unsurprisingly, participants tended toward continuing to share
files when they had communicated in the past year with the person with whom the file was
shared (p < .001). Dropbox participants were more likely to want to keep sharing files than
Google Drive participants (p = .011). Furthermore, the use of accounts for work purposes
had a nuanced correlation with sharing preferences (p = .044).
Whether participants were in touch (had communicated with the sharing recipient in the
last year) was highly correlated with participants wanting to keep sharing files. Participants
in touch with the recipient definitely wanted to keep sharing with the recipient for 58% of
file-recipient pairs. In contrast, they definitely wanted to keep sharing for only 20% of file-
recipient pairs when they were out of touch (had not communicated in the past year) and
12% of files in cases when they did not know who the recipient was. Participants definitely
wanted to stop sharing files for 5% of pairs when they were in touch with the recipient, 25%
of pairs when they were out of touch, and 21% of pairs when they did not know who the
recipient was.
While the proportion of files participants definitely wanted to stop sharing with a partic-
ular person was similar for Dropbox (12%) and Google Drive (15%), the difference was in the
strength of the preference to keep sharing. Dropbox participants definitely wanted to keep
sharing 59% of file-recipient pairs, whereas Google Drive participants definitely wanted to
keep sharing only 22% of file-recipient pairs. For the majority of Google Drive
pairs (63%), participants did not care whether or not the file was still shared, whereas the
same was true for only 29% of Dropbox pairs. Participants who used their accounts for work
purposes did not care whether or not files continued to be shared for 56% of file-recipient
pairs. The same was true for only 33% of pairs when participants did not use their accounts
for work purposes. Participants who did not use their accounts for work purposes wanted to
both definitely keep sharing and stop sharing files at higher rates than participants who did
use their accounts for work purposes.
When participants were asked why they originally shared a file, the most common reason was
work (38.9%). Another major reason to share files was to provide access, which accounted
for 17.0% of the responses. Participants who mentioned they would like to continue sharing
the file stated similar reasons as to why they shared the file in the first place. Participants
answered that they wanted to provide access for 44.1% of keep sharing decisions. 17.6%
of keep sharing decisions were based on the fact that participants wanted to keep sharing
files they collaborated on with other individuals. Participants said that there was no obvious
reason to stop sharing the file for 3% of keep sharing decisions. As an example, P25 mentioned
“There is no reason to stop. They don’t need access to it for anything important, but its not
necessary to stop sharing.”
On the other hand, participants gave different reasons for deciding to stop sharing files.
Participants decided to stop sharing files 41.6% of the time because they could not
remember the recipient or were no longer in communication. For 41.7% of stop sharing
decisions, participants answered that the task pertinent to the file had been completed.
One participant who wanted to stop sharing said: “Because I don’t remember sharing it with
them in the first place.”

Figure 5.11: Participant preferences for definitely continuing to share files (keep sharing), not caring whether or not the file continues to be shared (neutral), or to definitely stop sharing a file (stop sharing) across file-recipient pairs based on whether the participant said they were in touch with the recipient (had communicated in the past year), out of touch with the recipient (had not communicated in the past year), or did not know the recipient (don’t know them). [Stacked bar chart, scale 0 to 100. Rows (Recipient): In touch, Out of touch, Don’t know them. Legend: Keep sharing, Neutral, Stop sharing.]

Figure 5.12: Our participants were more likely to keep sharing a file if they had shared it via e-mail. However, they were more likely to stop sharing or they did not care about the sharing status if they had used a link to share the file (χ2(2, N = 562) = 17.25, p < .001). [Stacked bar chart, scale 0 to 100. Rows (Sharing Method): Via e-mail, Via link. Legend: Keep sharing, Neutral, Stop sharing.]
Figure 5.13: If a participant originally shared the file, they were more likely to want to stop sharing it. However, if the participant was the recipient, they were less likely to want to stop sharing (χ2(6, N = 447) = 54.16, p < .001). [Stacked bar chart, scale 0 to 100. Rows (Original Sharer): Shared with all of them, Shared with no one, Don’t know. Legend: Keep sharing, Neutral, Stop sharing.]
Table 5.7: The results of a mixed-effects logistic regression to identify what factors were correlated with expressing a preference to stop sharing the file shown. In particular the dependent variable is an ordinal variable reflecting a preference to keep sharing (1), whether the sharing setting does not matter (2), or to stop sharing (3). Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.

Factor | Baseline / Units | Coefficient | Std. Error | z value | p
Service: Dropbox | Google Drive | -5.159 | 2.030 | -2.541 | 0.011
File Type: Document | Other | -0.400 | 1.552 | -0.258 | 0.797
File Type: Image | Other | 0.059 | 1.505 | 0.039 | 0.969
File Type: Spreadsheet | Other | 1.202 | 2.109 | 0.570 | 0.569
File Type: Video | Other | -4.871 | 2.618 | -1.861 | 0.063
Access: Editor | Owner | 0.748 | 1.217 | 0.615 | 0.539
Access: Viewer | Owner | 1.475 | 3.650 | 0.404 | 0.686
Days Since Modified | log10(days+1) | 2.055 | 1.112 | 1.849 | 0.065
File Size | log10(bytes) | 0.748 | 0.460 | 1.625 | 0.104
Account Age | Years | -0.237 | 0.823 | -0.288 | 0.773
Participant Tech. Background | No | -1.341 | 1.845 | -0.727 | 0.467
Participant Age | Years | -0.092 | 0.110 | -0.836 | 0.403
Account for Work Purposes | No | -3.837 | 1.903 | -2.016 | 0.044
Account for Personal Purposes | No | 3.069 | 1.930 | 1.590 | 0.112
Relationship to Sharing Recipient | Have communicated in past year | 3.104 | 0.740 | 4.193 | <.001
5.6 File Co-ownership
One potential way to handle shared files long after their original use is simply to provide
each participant with their own independent copy, which then diverges as any edits are made.
To explore this option, we asked two questions related to file co-ownership: whether
participants would prefer the edits of others to be reflected in their files, or would prefer
not to receive those edits and keep their own copy. We asked participants to indicate on a
five-point Likert scale how strongly they agreed with these statements: “If anyone other than me
changes (modifies or deletes) the file, my copy of the file should also reflect their changes
(Others’ changes → My copy),” and “If I change (modify or delete) this file, other people’s
copies of the file should also reflect my changes (My changes → Others copies).”
For 60.8% of Dropbox files and 27.6% of Google Drive files, our participants preferred to
receive edits, and conversely for 51.2% of Dropbox files and 39.1% of Google Drive files our
participants preferred that their own edits be reflected in someone else’s copy of the shared
file. We suspect that this difference stems from the sharing characteristics of each cloud
storage service. For Dropbox users, sharing an individual file can only give others viewer
access; granting editor access requires making a “team” inside the cloud storage service and
sharing the entire folder containing the file. On the other hand, Google Drive allows
its users to grant view and edit access for both files and folders. Therefore, the relationship
between individuals who share files with each other may be stronger in Dropbox than in
Google Drive.
This decision was also impacted by whether the participant was an owner or editor of
the file. For files owned by the participant, they preferred that their files reflect external
changes 39.2% of the time and that their changes should be applied to other copies 51.6% of
the time. For files with editing rather than ownership permissions, the participants preferred
that changes be reflected in their copy 53.7% of the time and that their changes be applied
to other copies 43.6% of the time.
Figure 5.14: Dropbox users were more likely to have the same version of shared files than Google Drive users (Others’ changes → My copy : χ2(4, N = 327) = 49.3, p < .001, My changes → Others copies : χ2(4, N = 327) = 9.88, p = .04). [Two stacked bar charts, scale 0 to 100. Panels: Others’ changes → My copy; My changes → Others copies. Rows (Cloud storage): Dropbox, Google Drive. Legend: Strongly agree, Agree, Neutral, Disagree, Strongly disagree.]

Figure 5.15: If a participant owned the file in question, they were less likely to accept the changes of others and were more likely to want their changes to be reflected in the shared copies (Others’ changes → My copy : χ2(8, N = 327) = 23.07, p = .003, My changes → Others copies : χ2(8, N = 327) = 14.43, p = .07). [Two stacked bar charts, scale 0 to 100. Panels: Others’ changes → My copy; My changes → Others copies. Rows (Ownership): Owner, Editor, Viewer. Legend: Strongly agree, Agree, Neutral, Disagree, Strongly disagree.]

Figure 5.16: If a participant was the original sharer of the file in question, they were less likely to accept the changes of others and more likely to want their changes reflected in the shared copies (Others’ changes → My copy : χ2(12, N = 327) = 33.64, p < .001, My changes → Others copies : χ2(12, N = 327) = 43.41, p < .001). [Two stacked bar charts, scale 0 to 100. Panels: Others’ changes → My copy; My changes → Others copies. Rows (Shared with): All of them, None, Don’t know. Legend: Strongly agree, Agree, Neutral, Disagree, Strongly disagree.]
We also asked whether the participant had originally shared the file. If the participant
was the original sharer of the file, they preferred that changes be reflected in their copy for
32.48% of files and that their changes be applied to other copies for 53.5% of files. However,
if the participant was the recipient of the file, they preferred that changes be reflected in
their copy for 44.09% of files and that their changes be applied to other copies for 28.34%
of files.
The sharing method also affected these decisions. Participants preferred to receive edits
for 47.17% of files shared via e-mail and 21.74% of files shared via link. They preferred that
their own edits be reflected in someone else’s copy of the shared file for 46.22% of files
shared via e-mail and 38.26% of files shared via link.
Figure 5.17: Participants preferred to keep the same version of files for e-mail based rather than link based shared files (Others’ changes → My copy : χ2(4, N = 337) = 23.88, p < .001, My changes → Others copies : χ2(4, N = 337) = 3.40, p = .49). [Two stacked bar charts, scale 0 to 100. Panels: Others’ changes → My copy; My changes → Others copies. Rows (Sharing Method): E-mail, Link. Legend: Strongly agree, Agree, Neutral, Disagree, Strongly disagree.]
We asked participants why their copy of the file should also reflect the changes of oth-
ers. 57.4% answered that they want to receive updates. P99 emphasized the importance of
updates: “modifications should absolutely be shown. better to have record of what has hap-
pened with a file than not to have record of it.” 11.9% answered that accepting the changes
of others was a natural part and even the point of collaboration. P18 replied, “It was a group
assignment, so it should be up to date with everyone’s information.” Trusting the work of
others was one reason why participants wanted to accept collaborator edits. P55 said, “I trust
that the changes they make are appropriate.” However, participants had different reasons
for not wanting to receive updates for other files. 40.3% indicated that they did not
care about updates. P58 answered, “I don’t need to see their changes or care about them.”
Also, 22.9% answered that they wanted to keep the original version of files. P75 replied, “I
want to remember it the way it was, rather than someone changing it.”
Also, we asked why other people’s copies of the file should also reflect a participant’s
changes. 51.0% answered that they wanted to receive updates. P33 mentioned that “Changes
to that file indicate changes to our budget, so I’d like to keep that up to date.” 10.4% answered
other cloud service members needed to accept their changes because they collaborated with
each other. P88 replied that “All other collaborators within a file or project should know
who is editing our project, including myself.” 9.4% answered that they were the owner of the
file in question. 6.3% answered that they trusted the changes of others. P55 answered that
“The other person would be OK with me making any changes to this file.” However, the
participants who did not want to affect the file copies of others gave different answers.
22.8% said that they did not care about updates. P16 answered that “I don’t care what
happens to the files at this point.” 13.2% answered that they were not the owner of the file
in question. 11.4% stated that shared members could keep their own working copies. 9.6%
answered that receiving a participant’s updates should be the decision of another shared file
user. P17 answered that “I think others should make their own decision.”
5.7 File Automation
We proposed three file automation features that could be added in the future: deleting,
encrypting, and moving data to low-energy archives. To investigate user preferences, we
asked participants to indicate how much they agreed with each statement: “It
would be helpful if I could specify that a file on my cloud storage should be automatically
encrypted, deleted, or moved to an archive.” We then classified participants as
“supporting” file management automation if they selected “strongly agree” or “agree,” and
as “not supporting” it if they selected “neutral,” “disagree,” or “strongly disagree.”
Participants had a more positive attitude toward auto-encryption (72%) than toward
auto-deletion (32%) or auto-archiving (37%).
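The dichotomization described above can be written down explicitly. The following is a minimal sketch of that rule; the function names and the example responses are hypothetical and are not data from the study.

```python
# Responses counted as "support"; everything else counts as "did not support".
SUPPORTING_RESPONSES = {"strongly agree", "agree"}

def supports(response):
    """Return True if a five-point Likert response counts as support."""
    return response.strip().lower() in SUPPORTING_RESPONSES

def support_rate(responses):
    """Fraction of responses that count as support."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(supports(r) for r in responses) / len(responses)

# Hypothetical responses for one automation feature.
example = ["Strongly agree", "Agree", "Neutral", "Disagree", "Agree"]
rate = support_rate(example)
print(f"{rate:.0%} support")
```

Note that this rule treats “neutral” as non-support, so the reported support percentages are conservative with respect to undecided participants.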
Participants who supported auto-encryption answered that it offered better security (37.5%),
and it was easy to use (26.4%). P34 mentioned “that [he] wouldn’t have to implement [his]
own encryption solution.” However, 13.9% answered that having auto-encryption would be
useful whether they would use it or not. P65 stated, “I don’t know when I would need this
feature, but it would always be nice to know it was there when I do.” To learn which factors
should be considered for auto-encryption, we also asked participants how to identify the
files or folders in their account that should be automated. 36.1% of participants answered that a
“pre-defined rule” should be used for auto-encryption. P40 wanted a radio button that could
be used for auto-encryption: “I’d want to tag it with a radio button (in the shape of a lock,
maybe?) mouseover of course would explain that it would encrypt the file. You could also
have a section in settings that scans the filetype and automatically encrypts based on what
filetype you specify (such as, *.mp3, *.wav, *.pdf).” Moreover, 20.8% of participants replied
that “specific file type” should be considered for auto-encryption. P7 said, “Maybe by asking
first if you would like this “type” of file to be automatically encrypted let’s say the file type
was a “Package” file. Or it was a JPG file, or something like that and then it could just
automatically assume you’d want that package file to be encrypted.”
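A file-type rule of the kind P40 and P7 describe could look like the following. This is only an illustrative sketch of a “pre-defined rule”: neither Dropbox nor Google Drive is claimed to offer such a setting, the function and constant names are hypothetical, and the patterns are simply the examples P40 mentioned.

```python
from fnmatch import fnmatchcase

# Hypothetical user-defined rule set; the patterns echo P40's examples.
ENCRYPT_PATTERNS = ("*.mp3", "*.wav", "*.pdf")

def should_auto_encrypt(filename, patterns=ENCRYPT_PATTERNS):
    """Return True if the file name matches any user-specified pattern."""
    name = filename.lower()  # compare case-insensitively
    return any(fnmatchcase(name, pattern) for pattern in patterns)

for name in ("TaxReturn.PDF", "holiday.jpg", "memo.wav"):
    print(name, should_auto_encrypt(name))
```

A rule based only on the file extension is easy to explain to users, but as the regression results in Table 5.6 suggest, file type alone was not a significant predictor of encryption preferences, so such a rule would likely need to be user-configurable.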
Participants who supported auto-deletion answered that auto-deletion would delete junk
(43.7%) and free up space (18.8%). P34 said, “Sometimes I put things in my Dropbox
temporarily and then after I use the, I forget they’re there. I would like to set some of these
files to delete automatically after a set amount of time.” However, there was no significant
factor that could be applied for auto-deletion.
Many of our participants expressed concern about unintentional deletion. 37.5% of par-
ticipants answered that they wanted to decide whether they deleted the files or not. P24
answered, “I want to decide if I want to get rid of certain things.” Moreover, 25.0% of par-
ticipants worried about accidental deletion. P16 mentioned that “you might accidentally put
something in there you need.”
28.9% of participants who supported auto-archiving answered that auto-archiving would
be helpful to “save energy.” P1 stated, “I would like to be able to help save energy with things
I haven’t used in a long time but might need later.” 28.9% answered that auto-archiving
would “save space.” P95 mentioned that auto-archiving frees up space while remaining reversible:
“This automatically free up space but it is also non-permanent. I still have the photos.” To
learn which factors should be considered for auto-archiving, we asked participants how to
identify the files or folders in their account that should be automated. 47.3% of participants answered
that “time” should be a factor for things to be considered for auto-archiving. P17 answered,
“If it could be identified by year. Everything before a certain period of time could be archived.
After 2 years, send it to archive.”
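The time-based rule that P17 suggests can be expressed as a simple age threshold. The sketch below is hypothetical: it assumes that the last-modified timestamp is the relevant notion of “time” and approximates two years as 730 days, neither of which the participants specified.

```python
from datetime import datetime, timedelta

# Hypothetical threshold following P17's "after 2 years" suggestion.
ARCHIVE_AFTER = timedelta(days=2 * 365)

def should_archive(last_modified, now, threshold=ARCHIVE_AFTER):
    """Return True if the file has not been modified within the threshold."""
    return now - last_modified > threshold

now = datetime(2017, 11, 1)
print(should_archive(datetime(2014, 6, 1), now))   # untouched for over two years
print(should_archive(datetime(2017, 3, 15), now))  # modified this year
```

Because archiving is reversible, a rule like this carries less risk than the analogous auto-deletion rule, which may explain why participants were more comfortable naming time as a criterion here.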
Participants who had technical backgrounds were more likely to support file automation.
83.3% of participants who had technical backgrounds answered that auto-encryption
was useful, while 68.0% of non-technical participants supported auto-encryption. Among
participants who had technical backgrounds, 36.7% supported auto-deletion and 46.7%
supported auto-archiving. Among non-technical participants, 30.3% supported auto-deletion
and 33.3% supported auto-archiving.
Figure 5.18: Comparison of auto-archiving and delay tolerance. Participants who support auto-archiving tolerate a longer delay before a file is retrieved (χ2(3, N = 534) = 38.31, p < .001). [Stacked bar chart, scale 0 to 100. Rows (Auto-archiving): Support, Do not support. Legend (Delay Tolerance): No delay, Minutes, Hours, Days.]

Attitudes toward auto-archiving were also related to how long participants were willing to
wait to retrieve a file. If participants thought that auto-archiving was useful, they could
accept a longer delay, but only up to a few minutes (57.6%).
CHAPTER 6
DISCUSSION AND LIMITATIONS
6.1 Discussion
Our participants had forgotten that a high proportion of the files they saw in our study were
stored in the cloud, yet many participants wanted to delete or encrypt at least one of those
files. Further, participants did not even recognize 13.5% of the files they saw, and wanted
to delete or encrypt 83.6% of these unrecognized files. These combined results highlight the
need for retrospective file management mechanisms in the cloud.
Some retrospection tools already exist in other domains. For instance, Facebook has
an “on this day” feature to highlight an old post, though this mechanism is focused on
resharing. Whereas Facebook’s feature is meant to drive reminiscence and engagement, our
results suggest that cloud users also need such retrospective mechanisms to remind them of
forgotten files, particularly those likely to arouse privacy concerns.
While automated retrospective file management mechanisms would be helpful, we did not
find many significant predictors in our regression models. Basic file metadata and information
about the participant alone were not enough to predict file management decisions.
Content clustering, closer interaction with users during the discovery process, and deeper
analyses of file contents, not just metadata, might enable better predictions on the way to
automated file management.
6.2 Limitations
A core limitation of our study is that we report on a convenience sample. Our participants
may not represent the typical user of cloud storage services, particularly since Mechanical
Turk workers tend to be more technically oriented than the population at large. Furthermore,
prospective participants with particularly sensitive files stored in the cloud might be reluctant
to participate since they needed to give our software OAuth permissions to access their files.
That said, even among individuals who were willing to participate, we observed many files
participants would want to delete or encrypt.
Our study focused on Dropbox and Google Drive, which are only two of the many cloud
storage services available, albeit the two most popular. We had an unequal distribution of
Dropbox and Google Drive participants in our sample. A more comparably sized sample of
the two services would provide a more accurate point of comparison.
In this research, we did not include online document-creation services in our analysis.
However, these online documents facilitate collaboration by allowing co-editing, a feature
that distinguishes them from files that are created locally and then shared. For example,
documents created within Google Drive do not consume local storage space, but they can be
frequently modified. An additional comparison of files generated by these web-based editing
tools would have helped us develop more comparable insights across the two cloud storage
platforms.
CHAPTER 7
CONCLUSION AND FUTURE WORK
7.1 Conclusion
By investigating participant perspectives on a stratified sample of files stored in their own
Google Drive or Dropbox account, we built a better understanding of the contents of cloud
storage accounts, identifying latent needs for retrospective file management tools. We used a
stratified sample to measure a broad cross-section of files users retain in their cloud storage
accounts, rather than focusing on the files most likely to arouse security and privacy concerns
(e.g., files named “taxreturn2017.pdf” or that contain saved passwords). Even so, we found
that 83% of participants wanted to permanently delete at least one file from this sample
of ten. This result highlights the disconnect between the desired file management decisions
of our participants and the high overhead of retrospectively managing thousands of files
in a cloud storage account. Thus, our results highlight the need for retrospective privacy
mechanisms that empower users to manage the risks latent in their file archives without
expending unreasonable effort.
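The idea of drawing a stratified sample of ten files from an account can be illustrated with a short sketch. This is not the selection procedure used in the study (that procedure is described in the methodology chapter); the strata used here (file type), the round-robin allocation, and all file names are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(files, stratum_of, k, seed=None):
    """Pick up to k files, cycling through strata so each one is represented."""
    rng = random.Random(seed)

    # Group files by stratum and shuffle within each group.
    groups = defaultdict(list)
    for f in files:
        groups[stratum_of(f)].append(f)
    for members in groups.values():
        rng.shuffle(members)

    # Round-robin over strata until k files are chosen or all are exhausted.
    chosen = []
    strata = sorted(groups)
    while len(chosen) < k and any(groups[s] for s in strata):
        for s in strata:
            if groups[s] and len(chosen) < k:
                chosen.append(groups[s].pop())
    return chosen


# Hypothetical account listing: (file name, file-type stratum).
account = [(f"photo{i}.jpg", "image") for i in range(300)]
account += [(f"doc{i}.pdf", "document") for i in range(100)]
account += [(f"sheet{i}.xlsx", "spreadsheet") for i in range(5)]

sample = stratified_sample(account, stratum_of=lambda f: f[1], k=10, seed=0)
```

With a simple random sample, the five hypothetical spreadsheets would rarely appear among ten files drawn from several hundred; cycling through the strata ensures that rare file categories are still shown to the participant.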
7.2 Future Work
According to our research, the average number of files stored in each participant’s cloud
account was 444.5, and nearly half (46.6%) of files had been forgotten after they were stored
in the cloud. Managing these files is thus demanding work for users. We believe our
user-oriented research can help people manage their files more easily, and that our
user-centered approach will be helpful for developing a predictive model. Because it is
infeasible to ask users to retrospectively revisit all of their previous files, responses from
this survey can be used to build a predictive model for which files might be safely deleted,
automatically encrypted, or moved to cold storage. Predictive models could combine techniques
from machine learning with
insights drawn from human-computer interaction work concerning user security and privacy
personas [22]. Based on this latter body of work, we expect that users can naturally be
categorized by a small set of different approaches to data management (e.g., those who
favor deletion, those who hoard files in cold storage, etc.). A predictive model could combine
a deep understanding of a user’s preferred mode of archive management with the specific
management decisions already made for certain files. After the user makes a few
representative file management decisions, these more advanced methods might be able to
partially automate file management.
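As a rough illustration of how a handful of decisions made by a user might be propagated to other files, the following sketch suggests a decision for an unlabeled file by majority vote among the most similar files the user has already labeled. It is a minimal nearest-neighbor sketch and not a model we built or evaluated; the metadata fields (`type`, `shared`, `age`), the example files, and the function names are all hypothetical. Since our regression models found basic metadata alone to be a weak predictor, a practical system would likely need richer features such as file contents or a user persona.

```python
from collections import Counter


def distance(a, b):
    """Count the shared metadata fields on which two files differ."""
    return sum(a[key] != b[key] for key in a.keys() & b.keys())


def suggest(file_meta, labeled, k=3):
    """Suggest a decision by majority vote among the k most similar labeled files."""
    nearest = sorted(labeled, key=lambda item: distance(file_meta, item[0]))[:k]
    votes = Counter(decision for _, decision in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical decisions a user has already made: (metadata, decision).
labeled = [
    ({"type": "image", "shared": False, "age": "old"}, "delete"),
    ({"type": "image", "shared": False, "age": "recent"}, "keep"),
    ({"type": "pdf", "shared": False, "age": "old"}, "encrypt"),
    ({"type": "pdf", "shared": True, "age": "old"}, "encrypt"),
    ({"type": "image", "shared": True, "age": "old"}, "delete"),
]

# An old, shared PDF the user has not yet reviewed.
suggestion = suggest({"type": "pdf", "shared": True, "age": "old"}, labeled)
```

In a real tool, such a suggestion would be presented to the user for confirmation rather than applied automatically, since deletion is irreversible.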
Moreover, we did not fully investigate file sharing practices, even though file sharing
foregrounds several security and privacy issues. Rader reveals that users almost exclusively
touch files they have created themselves and are particularly reluctant to delete files that
could be useful to someone else in the future. Such behavior can result in clutter and
frustrate users [46]. Even though only a few participants wanted to stop sharing, keeping
every shared file in cloud storage eventually creates file management problems. In our study,
the median percentage of shared files was 34.9%. Thus, to develop a better file management
system, we need to conceptualize user preferences concerning sharing decisions and file
versioning.
Lastly, we expect that studying online document creation services would be greatly
helpful for understanding current cloud storage practices. Although we did not include
online document creation tools in our research, they have many implications for future
research. Many people currently use online document creation tools, yet, to our knowledge,
this area has not been studied. We need to investigate these tools to gain better insight
into the current state of cloud storage.
REFERENCES
[1] John Robert Anderson. 1985. Cognitive psychology and its implications. A series of books in psychology. (1985).
[2] Ibrahim Arpaci, Kerem Kilicer, and Salih Bardakci. 2015. Effects of security and privacy concerns on educational use of cloud services. Computers in Human Behavior 45 (2015), 93–98.
[3] Oshrat Ayalon and Eran Toch. 2013. Retrospective privacy: Managing longitudinal privacy in online social networks. In Proc. 9th Symposium on Usable Privacy and Security. ACM, 4.
[4] Taiwo Ayodele, Galyna Akmayeva, and Charles A Shoniregun. 2012. Machine learning approach towards email management. In Internet Security (WorldCIS), 2012 World Congress on. IEEE, 106–109.
[5] Olle Balter. 1997. Strategies for organizing email messages. HCI 1997 (1997), 21–38.
[6] Deborah Barreau and Bonnie A Nardi. 1995. Finding and reminding: file organization from the desktop. ACM SigChi Bulletin 27, 3 (1995), 39–43.
[7] Deborah K Barreau. 1995a. Context as a factor in personal information management systems. Journal of the American Society for Information Science 46, 5 (1995), 327.
[8] Deborah K Barreau. 1995b. Context as a factor in personal information management systems. Journal of the American Society for Information Science 46, 5 (1995), 327.
[9] Lujo Bauer, Lorrie Faith Cranor, Saranga Komanduri, Michelle L Mazurek, Michael K Reiter, Manya Sleeper, and Blase Ur. 2013. The post anachronism: The temporal dimension of Facebook privacy. In Proc. 12th ACM Workshop on privacy in the electronic society. ACM, 1–12.
[10] Victoria Bellotti, Nicolas Ducheneaut, Mark Howard, Ian Smith, and Christine Neuwirth. 2002. Innovation in extremis: evolving an application for the critical work of email and information management. In Proc. 4th conference on Designing interactive systems: processes, practices, methods, and techniques. ACM, 181–192.
[11] Ofer Bergman, Richard Boardman, Jacek Gwizdka, and William Jones. 2004. Personal information management. In Proc. CHI. ACM, 1598–1599.
[12] Richard Boardman and M Angela Sasse. 2004. Stuff goes into the computer and doesn’t come out: a cross-tool study of personal information management. In Proc. CHI. ACM, 583–590.
[13] Richard Boardman, Robert Spence, and M Angela Sasse. 2003. Too many hierarchies? The daily struggle for control of the workspace. In Proc. HCI international, Vol. 1. 616–620.
[14] Richard Peter Boardman. 2004. Improving tool support for personal information management. Ph.D. Dissertation. University of London.
[15] Robert Capra, Emily Vardell, and Kathy Brennan. 2014. File synchronization and sharing: User practices and challenges. Proc. ASIS&T 51, 1 (2014).
[16] Richard Chow, Philippe Golle, Markus Jakobsson, Elaine Shi, Jessica Staddon, Ryusuke Masuoka, and Jesus Molina. 2009. Controlling data in the cloud: outsourcing computation without outsourcing control. In Proc. 2009 ACM workshop on Cloud computing security. ACM, 85–90.
[17] Jason W Clark, Peter Snyder, Damon McCoy, and Chris Kanich. 2015. I Saw Images I Didn’t Even Know I Had: Understanding User Perceptions of Cloud Storage Privacy. In Proc. CHI. ACM, 1641–1644.
[18] Edward Cutrell, Daniel Robbins, Susan Dumais, and Raman Sarin. 2006. Fast, flexible filtering with phlat. In Proc. CHI. ACM, 261–270.
[19] Mary Czerwinski and Eric Horvitz. 2002. An investigation of memory for daily computing events. People and Computers (2002), 229–246.
[20] Idilio Drago, Marco Mellia, Maurizio M Munafo, Anna Sperotto, Ramin Sadre, and Aiko Pras. 2012. Inside dropbox: understanding personal cloud storage services. In Proc. 2012 ACM conference on Internet measurement conference. ACM, 481–494.
[21] Susan Dumais, Edward Cutrell, Jonathan J Cadiz, Gavin Jancke, Raman Sarin, and Daniel C Robbins. 2016. Stuff I’ve seen: a system for personal information retrieval and re-use. In ACM SIGIR Forum, Vol. 49. ACM, 28–35.
[22] Janna Lynn Dupree, Richard Devries, Daniel M. Berry, and Edward Lank. 2016. Privacy Personas: Clustering Users via Attitudes and Behaviors toward Security Practices. In Proc. CHI.
[23] David Elsweiler, Ian Ruthven, and Christopher Jones. 2007. Towards memory supporting personal information management tools. Journal of the Association for Information Science and Technology 58, 7 (2007), 924–946.
[24] Eric Freeman and David Gelernter. 1996. Lifestreams: A storage model for personal data. ACM SIGMOD Record 25, 1 (1996), 80–86.
[25] Global Industry Analysts. Inc. Accessed 2017. personal cloud-a global strategic business report. http://www.strategyr.com/MarketResearch/PersonalC loudMarketT rends.asp. (Accessed 2017).
[26] Glauber Goncalves, Idilio Drago, Ana Paula Couto Da Silva, Alex Borges Vieira, and Jussara M Almeida. 2014. Modeling the dropbox client behavior. In Proc. ICC.
[27] Raul Gracia-Tinedo, Pedro García-Lopez, Alberto Gomez, and Anastasio Illana. 2016. Understanding data sharing in private personal clouds. In Cloud Computing (CLOUD), 2016 IEEE 9th International Conference on. IEEE, 392–399.
[28] Graham Cluley. Accessed 2017. Dropbox users leak tax returns, mortgage applications and more. https://www.grahamcluley.com/dropbox-box-leak/. (Accessed 2017).
[29] Jane Gruning and Sian Lindley. 2016. Things We Own Together: Sharing Possessions at Home. In Proc. CHI. ACM, 1176–1186.
[30] Drew Houston and Arash Ferdowsi. 2016. Celebrating half a billion users. https://blogs.dropbox.com/dropbox/2016/03/500-million/. (2016).
[31] Wenjin Hu, Tao Yang, and Jeanna N Matthews. 2010. The good, the bad and the ugly of consumer cloud storage. ACM SIGOPS Operating Systems Review 44, 3 (2010), 110–115.
[32] Iulia Ion, Niharika Sachdeva, Ponnurangam Kumaraguru, and Srdjan Capkun. 2011. Home is safer than the cloud!: privacy concerns for consumer cloud storage. In Proc. 7th Symposium on Usable Privacy and Security. ACM, 13.
[33] Eric Johnson. 2017. Lost in the Cloud: Cloud Storage, Privacy, and Suggestions for Protecting Users’ Data. Stan. L. Rev. 69 (2017), 867.
[34] William Jones, Charles F Munat, Harry Bruce, and Austin Foxley. 2005. The universal labeler: Plan the project and let your information follow. Proc. Association for Information Science and Technology 42, 1 (2005).
[35] Victor Kaptelinin. 2003. UMEA: translating interaction histories into project contexts. In Proc. CHI. ACM, 353–360.
[36] Beom Heyn Kim, Wei Huang, and David Lie. 2012. Unity: secure and durable personal cloud storage. In Proc. 2012 ACM Workshop on Cloud computing security. ACM, 31–36.
[37] Felix Kollmar. 2017. Cloud Storage Report 2017. https://blog.cloudrail.com/cloud-storage-report-2017/. (2017).
[38] Aparna Krishnan and Steve Jones. 2005. TimeSpace: activity-based temporal visualisation of personal information spaces. Personal and Ubiquitous Computing 9, 1 (2005), 46–65.
[39] Mark W Lansdale. 1988. The psychology of personal information management. Applied ergonomics 19, 1 (1988), 55–66.
[40] Cathy Marshall and John C Tang. 2012. That syncing feeling: early user experiences with the cloud. In Proc. Designing Interactive Systems Conference. ACM, 544–553.
[41] Charlotte Massey, Thomas Lennig, and Steve Whittaker. 2014. Cloudy forecast: an exploration of the factors underlying shared repository use. In Proc. CHI.
[42] Peter Mell, Tim Grance, and others. 2011. The NIST definition of cloud computing. (2011).
[43] Adriana Mijuskovic and Mexhid Ferati. 2015. User awareness of existing privacy and security risks when storing data in the cloud. In Proc. International Conference on e-Learning, Vol. 15. 268–273.
[44] Mainack Mondal, Johnnatan Messias, Saptarshi Ghosh, Krishna P Gummadi, and Aniket Kate. 2017. Longitudinal Privacy Management in Social Media: The Need for Better Controls. IEEE Internet Computing 21, 3 (2017), 48–55.
[45] Michael Nebeling, Matthias Geel, Oleksiy Syrotkin, and Moira C Norrie. 2015. MUBox: Multi-User Aware Personal Cloud Storage. In Proc. CHI.
[46] Emilee Rader. 2009. Yours, mine and (not) ours: social influences on group information repositories. In Proc. CHI. ACM, 2095–2098.
[47] Emilee Rader. 2010. The effect of audience design on labeling, organizing, and finding shared files. In Proc. CHI. ACM, 777–786.
[48] Kopo Marvin Ramokapane, Awais Rashid, and Jose Such. 2017. “I feel stupid I can’t delete...”: a study of users’ cloud deletion practices and coping strategies. In Proc. 13th Symposium on Usable Privacy and Security. ACM.
[49] Esther Schindler. Accessed 2017. Cloud development survey. Evans Data Corporation, Strategic Reports, July 2010. https://evansdata.com/reports/viewRelease.php?reportID=27. (Accessed 2017).
[50] Manya Sleeper, William Melicher, Hana Habib, Lujo Bauer, Lorrie Faith Cranor, and Michelle L Mazurek. 2016. Sharing personal content online: Exploring channel choice and multi-channel behaviors. In Proc. CHI.
[51] Peter Snyder and Chris Kanich. 2013. Cloudsweeper: enabling data-centric document management for secure cloud archives. In Proc. 2013 ACM workshop on Cloud computing security. ACM, 47–54.
[52] Luke Stark and Matt Tierney. 2014. Lockbox: mobility, privacy and values in cloud storage. Ethics and Information Technology 16, 1 (2014), 1–13.
[53] Simone Stumpf and Jon Herlocker. 2006. Tasktracer: Enhancing personal information management through machine learning. 2nd Invitational Workshop on Personal Information Management at SIGIR 2006 (2006), 105.
[54] Nabil Ahmed Sultan. 2011. Reaching for the ‘cloud’: How SMEs can manage. International journal of information management 31, 3 (2011), 272–278.
[55] Amy Voida, Judith S Olson, and Gary M Olson. 2013. Turbulence in the clouds: challenges of cloud-based information work. In Proc. CHI. ACM, 2273–2282.
[56] Stephen Voida, W Keith Edwards, Mark W Newman, Rebecca E Grinter, and Nicolas Ducheneaut. 2006. Share and share alike: exploring the user interface affordances of file sharing. In Proc. CHI. ACM, 221–230.
[57] James Wen. 2003. Post-valued recall web pages: User disorientation hits the big time. It & Society 1, 3 (2003), 184–194.
[58] Steve Whittaker, Victoria Bellotti, and Jacek Gwizdka. 2006. Email in personal information management. Commun. ACM 49, 1 (2006), 68–73.
[59] Steve Whittaker and Candace Sidner. 1996. Email overload: exploring personal information management of email. In Proc. CHI. ACM, 276–283.
[60] Jinsheng Xu, Jinghua Zhang, T Harvey, and J Young. 2008. A survey of asynchronous collaboration tools. Information Technology Journal 7, 8 (2008), 1182–1187.
[61] Hong Zhang and Michael Twidale. 2012. Mine, yours and ours: using shared folders in personal information management. Personal Information Management (2012).
[62] Xuan Zhao, Niloufar Salehi, Sasha Naranjit, Sara Alwaalan, Stephen Voida, and Dan Cosley. 2013. The many faces of Facebook: Experiencing social media as performance, exhibition, and personal archive. In Proc. CHI. ACM, 1–10.
[63] Diao Zhe, Wang Qinghong, Su Naizheng, and Zhang Yuhan. 2017. Study on Data Security Policy Based on Cloud Storage. In Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), 2017 IEEE 3rd International Conference on. IEEE, 145–149.
[64] Jianying Zhou. 2014. On the security of cloud data storage and sharing. In Proc. 2nd international workshop on Security in cloud computing. ACM, 1–2.
APPENDIX A
SURVEY INSTRUMENT
A.1 General question
G1 For approximately how long have you had the Cloud Storage account you are using for this study?
◦ Less than 1 year (1)
◦ At least 1 year, but less than 2 years (2)
◦ At least 2 years, but less than 3 years (3)
◦ At least 3 years, but less than 4 years (4)
◦ At least 4 years, but less than 5 years (5)
◦ More than 5 years (6)
G1-1 Cloud storage providers offer both free accounts and paid accounts, where the latter offers more storage space. Do you use a free Cloud Storage account or a paid Cloud Storage account?
◦ Free account (1)
◦ Paid account (2)
◦ I’m not sure (3)
Display This Question: If Cloud storage providers offer both free accounts and paid accounts, where the latter offers more storage space. Do you use a free Cloud Storage account or a paid Cloud Storage account? Paid account Is Selected
G1-2 How much do you pay per month?
G2 How often do you use this Cloud Storage account for work or school purposes?
◦ At least once a week (1)
◦ At least once a month, but less than once a week (2)
◦ At least once a year, but less than once a month (3)
◦ Less than once a year, but sometimes (4)
◦ I do not use it for work or school purposes (5)
G3 How often do you use this Cloud Storage account for personal purposes (i.e., for purposes other than for work or school)?
◦ At least once a week (1)
◦ At least once a month, but less than once a week (2)
◦ At least once a year, but less than once a month (3)
◦ Less than once a year, but sometimes (4)
◦ I do not use it for personal purposes (5)
G4 I use this Cloud Storage account for the following purposes: (Check all that apply)
□ Collaborating with co-workers, classmates, or professional contacts by jointly creating and editing files (1)
□ Collaborating with friends and family by jointly creating and editing files (2)
□ Sharing files that I have created with co-workers, classmates, or other professional contacts (3)
□ Sharing files that I have created with family and friends (4)
□ Backing up files related to my job, school, or career (5)
□ Backing up files that are not related to my job, school, or career (6)
□ Other (7)
G5 There are multiple ways you can access files in your Cloud Storage account. One of these ways is by installing Cloud Storage software on your computer so that certain folders are automatically synced with your Cloud Storage account. How often do you access (view or edit) files or folders on your computer that are automatically synced with your Cloud Storage account?
◦ Daily or more frequently (1)
◦ Every few days (2)
◦ Weekly (3)
◦ Monthly (4)
◦ Less than once a month, but sometimes (5)
◦ Never (6)
G6 Another way to access files in your Cloud Storage account is by using a web browser like Chrome, Firefox, or Safari to log into the Cloud Storage website. How often do you log into the Cloud Storage website using this account?
◦ Daily or more frequently (1)
◦ Every few days (2)
◦ Weekly (3)
◦ Monthly (4)
◦ Less than once a month, but sometimes (5)
◦ Never (6)
G7 Yet another way to access files in your Cloud Storage account is by using an app on your smartphone (iPhone or Android). How often do you use a smartphone app to access files or folders stored in this Cloud Storage account?
◦ Daily or more frequently (1)
◦ Every few days (2)
◦ Weekly (3)
◦ Monthly (4)
◦ Less than once a month, but sometimes (5)
◦ Never, though I do use a smartphone (6)
◦ Never; I do not use a smartphone (7)
The following two questions concern the following distinction:
A file stored locally is accessible on your computer (i.e., stored on the hard drive) even if you are not connected to the Internet.
A file stored in the cloud is accessible only if you are connected to the Internet.
Note that a given file might be stored both locally and in the cloud.
G8-1 Which statement best describes your current situation regarding cloud files?
◦ All of my cloud files are also stored locally on my computer (1)
◦ Most of my cloud files are also stored locally on my computer (2)
◦ Some of my cloud files are also stored locally on my computer (3)
◦ None of my cloud files are also stored locally on my computer (4)
G8-2 Which statement best describes your current situation regarding files stored locally?
◦ All of my locally stored files are also accessible in the cloud via Cloud Storage (1)
◦ Most of my locally stored files are also accessible in the cloud via Cloud Storage (2)
◦ Some of my locally stored files are also accessible in the cloud via Cloud Storage (3)
◦ None of my locally stored files are also accessible in the cloud via Cloud Storage (4)
G9 On average, how often do you run out of storage space on your Cloud Storage account?
◦ I am almost always out of storage space (1)
◦ At least once a month (2)
◦ At least once a year, but less than once a month (3)
◦ Less than once a year, but sometimes (4)
◦ I have never run out of storage space (5)
◦ I don’t know (6)
G10 On average, how often do you organize your Cloud Storage by deleting unnecessary files, moving files to different folders, or performing similar clean-up tasks?
◦ At least once a week (1)
◦ At least once a month, but less than once a week (2)
◦ At least once a year, but less than once a month (3)
◦ Less than once a year, but sometimes (4)
◦ I have never organized my Cloud Storage (5)
◦ I don’t know (6)
G11 Overall, which of the following cloud services do you use? (Check all that apply)
□ Amazon Cloud Drive (1)
□ Apple iCloud (2)
□ Box (3)
□ Dropbox (4)
□ Google Drive (5)
□ Microsoft OneDrive (6)
□ SpiderOak One (7)
□ Other (8)
A.2 Content specific question
CF-1 After looking at this file, do you know what it is? (Note: You might not know what it is if the file was automatically created or automatically saved to your cloud storage.)
◦ Yes (1)
◦ No (2)
Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f... Yes Is Selected
CF-11 Prior to this survey, I remembered that this file was stored on any device or service I use.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f... Yes Is Selected
CF-12 Prior to this survey, I remembered that this file was stored in my Cloud Storage.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f... Yes Is Selected
CF-13 As far as you can remember, why did you originally store this file on Cloud Storage?
Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f... Yes Is Selected
CF-14 As far as you can remember, when did you originally store this file on Cloud Storage?
◦ Within the last week (1)
◦ At least a week ago, but less than a month ago (2)
◦ At least a month ago, but less than a year ago (3)
◦ At least a year ago, but less than five years ago (4)
◦ At least five years ago (5)
◦ I don’t know (6)
◦ As far as I remember, I did not store it on Cloud Storage (7)
CF-2 Which of these statements best characterizes what you would like to happen to this file?
◦ I would like to keep this file stored as-is in my Cloud Storage. (1)
◦ I would like to keep only an encrypted version of this file in my Cloud Storage. (2)
◦ I would like to delete this file from my Cloud Storage. (3)
Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f... Yes Is Selected
CF-15 As far as you remember, when is the last time you accessed (viewed or modified) this file?
◦ Less than a week ago (1)
◦ Over 1 week ago, but less than 1 month ago (2)
◦ Over 1 month ago, but less than 1 year ago (3)
◦ Over 1 year ago, but less than 5 years ago (4)
◦ Over 5 years ago (5)
◦ As far as I know, I have never accessed this file (6)
◦ I don’t remember (7)
Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f... Yes Is Selected
CF-16 When do you next expect to access (view or modify) this file in the future?
◦ Within the next week (1)
◦ Over 1 week from now, but less than 1 month from now (2)
◦ Over 1 month from now, but less than 1 year from now (3)
◦ Over 1 year from now, but less than 5 years from now (4)
◦ Over 5 years from now, but eventually (5)
◦ Never (6)
Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to keep this file stored as-is in my Cloud Storage. Is Selected Or Which of these statements best characterizes what you would like to happen to this file? I would like to keep only an encrypted version of this file in my Cloud Storage. Is Selected
CF-21 Files could potentially be stored in the cloud in a way that saves energy. However, this would mean that the file could only be accessed with some delay, rather than instantaneously. When I next try to access this file...
◦ ... no delay in the file being available is acceptable (1)
◦ ... a delay of up to a few minutes in being able to access the file is acceptable (2)
◦ ... a delay of up to a few hours in being able to access the file is acceptable (3)
◦ ... a delay of up to a few days in being able to access the file is acceptable (4)
Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to keep only an encrypted version of this file in my Cloud Storage. Is Selected
CFE-1 It is important to me that this file is encrypted, rather than remaining as-is in my Cloud Storage.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to delete this file from my Cloud Storage. Is Selected
CFD-1 It is important to me that this file is deleted, rather than remaining as-is in my Cloud Storage.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to keep this file stored as-is in my Cloud Storage. Is Selected
CFA-1 Why would you want to continue storing this file as-is on Cloud Storage?
Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to keep only an encrypted version of this file in my Cloud Storage. Is Selected
CFE-2 Why would you want to keep an encrypted version of this file on Cloud Storage?
Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to delete this file from my Cloud Storage. Is Selected
CFD-2 Why would you want to delete this file from Cloud Storage?
CF-3 It is important to me to keep this file safe from unauthorized access.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
CF-4 It is important to me that I never lose the ability to access this file.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
CF-5 As far as you know, do you have a copy of this file on any other device or service you use?
◦ Yes, I have another copy of the file somewhere (1)
◦ No, I do not have any other copies of this file (2)
◦ I’m not sure (3)
Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to delete this file from my Cloud Storage. Is Selected
CFD-3 Which of the following two statements better describes what you would want to happen?
◦ Although I would like to delete this file from my Cloud Storage account, I would want to keep a copy of the file on a local device (e.g., my computer or smartphone) (1)
◦ I would like to delete this file from my Cloud Storage account, and I would not want to keep a copy of the file on any of my local devices (2)
CF-6 Are there any other files stored in your Cloud Storage account for which you would want to apply the same file-management decision (keep as-is, encrypt, delete) as for this file?
◦ Yes (1)
◦ No (2)
◦ I’m not sure (3)
Display This Question: If Are there any other files stored in your Cloud Storage account for which you would want to... Yes Is Selected
CF-61 For what other files in your Cloud Storage would you want to apply the same file-management decision? Please describe those files using whatever language you use to think about them, rather than constraining yourself to the current Cloud Storage interface.
Display This Question: If Are there any other files stored in your Cloud Storage account for which you would want to... No Is Selected
CF-62 Why would you not want to apply the same file-management decision from this file to other files in your Cloud Storage?
CSP-1 For each of these people with whom the file is shared, indicate below whether you know who the person is.
         | I know who this is, and I have talked to them within the last year | I know who this is, but I have not talked to them in over a year | I do not know who this is
Member A | ◦ | ◦ | ◦
Member B | ◦ | ◦ | ◦
Member C | ◦ | ◦ | ◦
CSP-2 For each of these people, indicate below whether you would want to keep sharing this particular file with that person, stop sharing this particular file with that person, or whether it doesn’t matter to you.
         | Definitely keep sharing | Doesn’t matter | Definitely stop sharing
Member A | ◦ | ◦ | ◦
Member B | ◦ | ◦ | ◦
Member C | ◦ | ◦ | ◦
CSP-3 To your knowledge, were you the person who originally shared this file with those people?
◦ I am the person who shared this file with all of those people (1)
◦ I am the person who shared this file with some, but not all, of those people (2)
◦ I am not the person who shared this file with any of those people (3)
◦ I don’t know (4)
Display This Question:
If For each of these people, indicate below whether you would want to keep sharing this particular f... MemberA - Definitely keep sharing Is Selected
Or For each of these people, indicate below whether you would want to keep sharing this particular f... MemberB - Definitely keep sharing Is Selected
Or For each of these people, indicate below whether you would want to keep sharing this particular f... MemberC - Definitely keep sharing Is Selected
CSP-11 You indicated that you want to keep sharing this file with at least one other person. Why do you want to keep sharing this file with them?
Display This Question:
If For each of these people, indicate below whether you would want to keep sharing this particular f... MemberA - Definitely stop sharing Is Selected
Or For each of these people, indicate below whether you would want to keep sharing this particular f... MemberB - Definitely stop sharing Is Selected
Or For each of these people, indicate below whether you would want to keep sharing this particular f... MemberC - Definitely stop sharing Is Selected
CSP-12 You indicated that you want to stop sharing this file with at least one other person. Why do you want to stop sharing this file with them?
Display This Question:
If To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with all of those people Is Selected
Or To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with some, but not all, of those people Is Selected
CSP-31 If you remember, when did you first share this file with other people?
◦ Less than a week ago (1)
◦ Over 1 week ago, but less than 1 month ago (2)
◦ Over 1 month ago, but less than 1 year ago (3)
◦ Over 1 year ago, but less than 5 years ago (4)
◦ Over 5 years ago (5)
◦ I don’t know (6)
Display This Question:If MemberA Is Not Equal to —And IfIf To your knowledge, were you the person who originally shared this file with those people?Iam the person who shared this file with all of those people Is SelectedOr To your knowledge, were you the person who originally shared this file with those peo-ple?I am the person who shared this file with some, but not all, of those people Is SelectedCSP-32 (Optional) If you remember, why did you originally share this file withMamberA?
Display This Question:
If MemberB Is Not Equal to —
And If
If To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with all of those people Is Selected
Or To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with some, but not all, of those people Is Selected
CSP-33 (Optional) If you remember, why did you originally share this file with MemberB?
Display This Question:
If MemberC Is Not Equal to —
And If
If To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with all of those people Is Selected
Or To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with some, but not all, of those people Is Selected
CSP-34 (Optional) If you remember, why did you originally share this file with MemberC?
CSP-4 If anyone other than me changes (modifies or deletes) the file, my copy of the file should also reflect their changes.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
CSP-41 Why?
CSP-5 If I change (modify or delete) this file, other people’s copies of the file should also reflect my changes.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
CSP-51 Why?
CSI-1 To your knowledge, were you the person who created a shareable link for this file?
◦ Yes, I am the person who created the link for sharing this file (1)
◦ No, I am not the person who created the link for sharing this file (2)
◦ I don’t know (3)
Display This Question:
If To your knowledge, were you the person who created a shareable link for this file? Yes, I am the person who created the link for sharing this file Is Selected
CSI-11 To your knowledge, with how many people have you shared the link to access this file?
◦ No one other than yourself (1)
◦ 1 - 5 people (2)
◦ 6 - 10 people (3)
◦ 11 - 15 people (4)
◦ 16 - 20 people (5)
◦ More than 20 people (6)
◦ I don’t know (7)
CSI-2 Do you want to keep sharing this particular file with others using a link, stop sharing this particular file with others using a link, or does it not matter to you?
◦ Definitely keep sharing using a link (1)
◦ Doesn’t matter (2)
◦ Definitely stop sharing using a link (3)
Display This Question:
If Do you want to keep sharing this particular file with others using a link, stop sharing this part· · · Definitely keep sharing using a link Is Selected
CSI-21 You indicated that you want to keep sharing this file using a link. Why do you want to keep sharing this file?
Display This Question:
If Do you want to keep sharing this particular file with others using a link, stop sharing this part· · · Definitely stop sharing using a link Is Selected
CSI-22 You indicated that you want to stop sharing this file using a link. Why do you want to stop sharing this file?
Display This Question:
If To your knowledge, were you the person who created a shareable link for this file? Yes, I am the person who created the link for sharing this file Is Selected
CSI-12 If you remember, when did you set this file to be shared using a link?
◦ Less than a week ago (1)
◦ Over 1 week ago, but less than 1 month ago (2)
◦ Over 1 month ago, but less than 1 year ago (3)
◦ Over 1 year ago, but less than 5 years ago (4)
◦ Over 5 years ago (5)
◦ I don’t know (6)
Display This Question:
If To your knowledge, were you the person who created a shareable link for this file? Yes, I am the person who created the link for sharing this file Is Selected
CSI-13 (Optional) If you remember, why did you originally share this file using a link?
CSI-3 If anyone other than me changes (modifies or deletes) the file, my copy of the file should also reflect their changes.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
CSI-31 Why?
CSI-4 If I change (modify or delete) this file, other people’s copies of the file should also reflect my changes.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
CSI-41 Why?
A.3 Features and Demographics
Our last few questions cover features that Cloud Storage could possibly add in the future. Please respond to the following statements.
DE (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatically encrypted.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
Display This Question:
If (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Strongly agree Is Selected
Or (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Agree Is Selected
DE-11 Why would it be helpful?
Display This Question:
If (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Strongly agree Is Selected
Or (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Agree Is Selected
DE-12 How, if at all, could Cloud Storage identify files or folders in your account that should be automatically encrypted?
Display This Question:
If (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Disagree Is Selected
Or (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Strongly disagree Is Selected
DE-2 Why would it not be helpful?
DD It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
Display This Question:
If It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Strongly agree Is Selected
Or It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Agree Is Selected
DD-11 Why would it be helpful?
Display This Question:
If It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Strongly agree Is Selected
Or It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Agree Is Selected
DD-12 How, if at all, could Cloud Storage automatically identify files or folders in your account that should be automatically deleted?
Display This Question:
If It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Disagree Is Selected
Or It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Strongly disagree Is Selected
DD-2 Why would it not be helpful?
DA It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time I specify.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)
Display This Question:
If It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Strongly agree Is Selected
Or It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Agree Is Selected
DA-11 Why would it be helpful?
Display This Question:
If It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Strongly agree Is Selected
Or It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Agree Is Selected
DA-12 How, if at all, could Cloud Storage automatically identify files or folders in your account that should be automatically moved to an energy-saving archive?
Display This Question:
If It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Disagree Is Selected
Or It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Strongly disagree Is Selected
DA-2 Why would it not be helpful?
DC (Optional) Do you have any other comments about anything in today’s survey?
DP-1 With what gender do you identify?
◦ Male (1)
◦ Female (2)
◦ Other (3)
◦ Prefer not to answer (4)
DP-2 Are you majoring in, or do you have a degree or job in, computer science, computer engineering, information technology, or a related field?
◦ Yes (1)
◦ No (2)
◦ Prefer not to answer (3)
DP-3 How old are you?
DP-4 What is your occupation?