Content Analyses of Personal Emergency Response Calls: Towards a More Robust Spoken Dialogue-Based Personal
Emergency Response System
by
Victoria Young
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Rehabilitation Sciences Institute in collaboration with the Institute of Biomaterials and Biomedical Engineering
University of Toronto
© Copyright by Victoria Young 2016
Content Analyses of Personal Emergency Response Calls:
Towards a More Robust Spoken Dialogue-Based Personal
Emergency Response System
Victoria Young
Doctor of Philosophy
Rehabilitation Sciences Institute in collaboration with the Institute of Biomaterials and Biomedical Engineering
University of Toronto
2016
Abstract
In an attempt to address identified usability barriers of traditional push-button-type personal
emergency response systems, a novel automated, intelligent, spoken dialogue-based personal
emergency response system is being developed. To design this system and make it more robust
for end-users, further information, not currently available in the research literature, is needed to
improve the artificial intelligence and spoken dialogue components of the system. Using a mixed
methods design, this dissertation describes three studies that derive this needed information from
real personal emergency response calls. The first study identified 185 keywords and phrases
spoken by system users; 17 categories for classifying keywords; and a personal emergency
situation model including caller type, call reason, and risk level. The second study expanded the
situation model to a personal emergency response model by adding a response-type
classification. Various statistical analyses were applied to response calls using call classifications
and select conversational measures. Significant trends in call data that could be used to pre-
inform the automated personal emergency response system dialogue manager of a call’s potential
outcome were identified. Words per minute and turn length in words were found to be possible
predictors of caller type. Emergency medical services were the predominant response
for both high risk and medium risk calls, while non-professional responders appeared mainly in
medium risk calls. Care provider and older adult callers were also found to employ different
conversational strategies when responding to the call taker. In the third study, a speech corpus
was developed containing younger and older adult, actor-simulated, spontaneous, read, and
emotional speech, including personal emergency response keywords, phrases, and scenarios.
Taken together, these research results will contribute towards the design and development of a
more robust automated, personal emergency response system for older adults to help them age-
in-place.
Keywords: aging-in-place, assistive technology, personal emergency response, personal
emergency response system, content analysis, speech corpus, older adult, spoken dialogue
system
Acknowledgments
This research has developed through the concerted efforts and contributions of many individuals,
groups and organizations. As the Nigerians say, “it takes a whole village to raise a child.”
Supervisory Committee: First and foremost, I would like to acknowledge and thank my
supervisor, Alex Mihailidis, for providing me with a unique opportunity to work on this research
project. I extend a great big ‘thank you’ to you for supporting and encouraging me throughout
these many years. Your dedication and hard work are inspiring to watch, and your quiet caring and
patience have been very much appreciated.
I would like to acknowledge and thank the other members of my supervisory committee,
Elizabeth Rochon, Gerald Penn, Tom Chau, and Willy Wong, for giving me guidance and
contributing their expertise, critiques, wisdom, and ideas to the project. Your input has helped to
give this research shape and a solid ground on which to grow.
Examiners: I would like to acknowledge and extend my thanks to my internal-external
examiner, Yana Yunusova and external-external examiner, Ann McKibbon, for contributing their
time and providing thoughtful comments and critique on my dissertation. Your efforts have
helped me to refine the project and have taught me how to better defend my work.
Research Participants: I would like to acknowledge and thank the 40 volunteer participants
who graciously lent their time and voices in the development of the CARES corpus. It is through
your willingness to participate in research studies such as these that great strides can be made in
research and technology development.
Non-Research Organizations: I would like to acknowledge and thank the Personal Emergency
Response Call Centre, which provided the real call recordings. These calls are the foundation of
this project, which could not have been completed without your willingness to collaborate.
I would like to acknowledge and thank the employees of the local Personal Emergency Response
Call Centre, the Toronto EMS Communications Centre, and Toronto Fire Station #343, who took
the time to explain their work setup, how calls are responded to, and how emergency situations
are assessed. These on-site visits have provided needed context in order to better understand the
tasks of the emergency responder and to get a feeling of the emotions involved when dealing
with live, personal emergency response situations.
Research Collaborators: I would like to acknowledge and thank the two keyword coders,
Rozanne Wilson and Tammy Siemenkowski, for their time and efforts spent in identifying and
categorizing keywords in this project. Additionally, I also would like to thank Tammy for further
categorizing the personal emergency response calls by risk level.
I would like to acknowledge and thank, Mark Chignell, for providing statistical guidance and
lending his expertise and ideas for this project, specifically Study 2.
I would like to acknowledge and thank the speech processors, Heidi Diepstra, Andrew Chignell,
Oleksandr Nishta, and Sanaz Alali, for their effort, time, and precision in processing sound files.
Research Groups: I would like to acknowledge the many research teams who provided
opportunities to listen to and discuss research within a supportive community environment.
These teams include: the Toronto Rehabilitation Institute – University Health Network’s iDAPT
Communication and Artificial Intelligence and Robotics Teams; and at the University of
Toronto, the Oral Dynamics Lab (in Speech-Language Pathology) (lenders of the sound
attenuation booths); the Sensory Communications Team (in the Institute of Biomaterials and
Biomedical Engineering) (lenders of the sound level meter); and the Computational Linguistics
Lab (in Computer Science).
Special thanks go to the members of my home lab, the Intelligent Assistive Technology and
Systems Lab, my research family. Each of the members, past and present, has helped create a
warm and supportive environment and atmosphere in which to spend copious amounts of time
talking about “intelligent assistive technology and systems research” as well as non-research life.
I would also like to thank the administrative staff at the Rehabilitation Sciences Institute, iDAPT
in the Toronto Rehabilitation Institute – University Health Network, and the Institute of
Biomaterials and Biomedical Engineering for always responding quickly to my questions with
happy smiles and friendly faces.
Cheerleaders: Last but not least, I would like to acknowledge and thank my family and friends,
and especially my husband and daughter for their patience and encouragement over these
doctoral research years. Your presence, listening, caring, assistance, kindness, thoughtfulness,
and laughter remind me each day that I am not alone on this journey.
Funding Organizations: I would like to acknowledge and thank my supervisor and the many
organizations that provided funding for this research project. These funding sources included: the
Canadian Institutes of Health Research Strategic Training Initiative in Health Research (CIHR-
STIHR) Fellowship in Health Care Technology and Place (FRN:STP 53911); the Natural
Sciences and Engineering Research Council (NSERC) Graduate Award (doctoral); the Toronto
Rehabilitation Institute-University Health Network’s TRI-OSOTF Student Scholarship Fund
(which receives funding under the Provincial Rehabilitation Research Program from the Ministry
of Health and Long-Term Care in Ontario. The views expressed in this dissertation do not
necessarily reflect those of the Ministry); the University of Toronto’s Rehabilitation Sciences
Institute; the University of Toronto (Open Scholarship and Doctoral Completion Award);
Engineers Canada-TD Meloche Monnex.
Preface
La Cuisine by Jules Renard (1864–1910)
Seigneur, s’il est vrai que vous seul soyez grand, ne réservez pas à ma vieillesse un château,
mais faites-moi la grâce de me garder, comme dernier refuge, cette cuisine avec sa marmite
toujours en l’air,
avec la crémaillère aux dents diaboliques,
la lanterne d’écurie et le moulin à café,
le litre de pétrole, la boîte de chicorée extra et les allumettes de contrebande,
avec la lune en papier jaune qui bouche le trou du tuyau de poêle,
et les coquilles d’oeufs dans la cendre,
et les chenets au front luisant, au nez aplati,
et le soufflet qui écarte ses jambes raides et dont le ventre fait de gros plis,
avec ce chien à droite et ce chat à gauche de la cheminée, tous deux vivants peut-être,
et le fourneau d’où filent des étoiles de braise,
et la porte au coin rongé par les souris,
et la passoire grêlée, la bouille bavarde et le gril haut sur pattes comme un basset,
et le carreau cassé de l’unique fenêtre dont la vue se paierait cher à Paris,
et ces pavés de savon,
et cette chaise de paille honnêtement percée,
et ce balai inusable d’un côté,
et cette demi-douzaine de fers à repasser, à genoux sur leur planche, par rang de taille,
comme des religieuses qui prient, voilées de noir et les mains jointes.
{English Translation}
Lord, if it is true that you alone are great, do not reserve a castle for my old age,
but grant me the grace to keep, as a last refuge, this kitchen with its cooking pot always in the air,
with the pot hanger, and its evil teeth,
the stable lantern and coffee grinder,
the litre of oil, the box of “extra” chicory and the contraband matches,
with the yellow paper moon covering the stove pipe hole,
and the egg shells in the ash,
and firedogs with shining fronts and flattened nose,
and the bellows that spread stiff legs, with big belly folds,
with this dog on the right and this cat on the left of the fireplace, both alive, perhaps,
and the furnace where ember stars spin,
and the door with the corner gnawed by mice,
and the pockmarked colander, the talkative kettle, and the grill, high on its legs like a basset,
and the single broken window pane with a view one would pay dearly for in Paris,
and the cobblestones of soap,
and this chair of straw, honestly pierced,
and this broom, hard worn on one side,
and this half dozen of irons, kneeling on their boards, arranged by size, like the nuns who pray,
veiled in black, with clasped hands.
Situating the Work within Rehabilitation Science
The field of Rehabilitation Science is defined as “an integrated science dedicated to the study of
human function and participation and its relationship to health and well-being” (Rehabilitation
Sciences Institute Handbook, 2014/2015, p. 4). “Using basic and applied methods, the science is
focused on phenomena at the level of the cell, person, family, community, or society to develop
and evaluate theories, models, processes, measures, interventions and policies to prevent,
reverse, or minimize impairments; enable activity; and facilitate participation” (Graduate
Department of Rehabilitation Science [GDRS] Handbook, 2007, p. 4). Within the realm of
rehabilitation science, the research completed as part of this dissertation contributes to the area of
Rehabilitation Technology Sciences.
The research described in this dissertation focuses on deriving new knowledge from real personal
emergency response calls. Keywords and phrases were identified using keyword categories; a
method for characterizing personal emergency situations and emergency response was
developed; and patterns or trends in call conversations were examined based on various
conversational and verbal measures. As well, audio recordings of spoken keywords, phrases, and
simulated personal emergency response situation scenarios by younger and older adult actors
were collected to create a speech database tool. Together, these research outcomes will
help to advance knowledge in the area of personal emergency response as well as further the
design and development of an automated, artificially intelligent, spoken dialogue-based personal
emergency response system contained within a smart home monitoring system called the
HELPER.
The HELPER technology as a whole is considered a rehabilitation intervention because through
its use, individuals will be able to access appropriate medical care or obtain emergency attention
when needed. Access to immediate medical attention will help to prevent and minimize
impairment resulting from “waiting too long” for treatment or care. Minimizing impairment will
ultimately help to facilitate the older adult’s participation in daily living activities and will
support their aging-in-place.
Table of Contents
ACKNOWLEDGMENTS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ IV
PREFACE ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ VII
TABLE OF CONTENTS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ X
LIST OF TABLES ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ XVII
LIST OF FIGURES ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ XIX
LIST OF ACRONYMS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ XXIII
CHAPTER 1 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 1
1 INTRODUCTION AND LITERATURE REVIEW ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 1
1.1 Dissertation Introduction ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 1
1.2 Dissertation Overview ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 2
1.3 Introduction to the Problem ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 3
1.3.1 Aging‐in‐Place with Assistive Technology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 3
1.3.2 A Novel PERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 5
1.3.3 The First HELPER Prototype ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 7
1.3.4 Research Rationale and Problem Summary ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 8
1.3.5 Research Response ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 8
1.4 Literature Review ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 9
1.4.1 PART I: PERS Technology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 9
1.4.1.1 Health Challenges for the Older Adult ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 9
1.4.1.2 Personal Emergency Response Technology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 10
1.4.1.3 PERS Technology Basics ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 11
1.4.1.4 PERS Use by Older Adults ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 12
1.4.1.5 The HELPER System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 14
1.4.1.6 The HELPER Spoken Dialogue System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 16
1.4.2 PART II: Human to Computer Spoken Dialogue Interactions ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 20
1.4.2.1 Variables Affecting ASR Recognition Accuracy ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 20
1.4.2.2 The Older Adult Voice and ASR ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 22
1.4.2.3 The Older Adult User and Spoken Dialogue Systems ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 23
1.4.2.4 Spoken Dialogue Strategy ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 24
1.4.3 PART III: Human to Human Emergency Dialogues ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 25
1.4.3.1 Emergency Response Call Basics ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 25
1.4.3.2 Emergency Response Call Structure ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 26
1.4.4 Literature Review Summary ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 28
1.5 Research Purpose and Objectives ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 29
CHAPTER 2 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐32
2 IDENTIFICATION OF KEYWORDS AND PHRASES SPOKEN BY CALLERS IN
PERSONAL EMERGENCY RESPONSE CALLS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐32
2.1 Prologue ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 32
2.2 Abstract ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 32
2.3 Introduction ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 33
2.3.1 Need for a New PERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 33
2.3.2 The HELPER System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 34
2.3.3 HELPER Prototype Testing ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 34
2.3.4 Designing for the End‐User ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 35
2.3.5 Study Objective and Significance ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 36
2.3.6 Background ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 36
2.3.6.1 An Automated and Intelligent HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 36
2.3.6.2 The HELPER Communication Module ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 38
2.3.7 Study Focus as Applied to the HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 41
2.4 Methodology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 42
2.4.1 Research Design Method ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 42
2.4.1.1 Method Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 43
2.4.1.2 Method Implementation ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 43
2.4.1.3 Method Approaches ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 43
2.4.2 Research Design Details ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 44
2.4.2.1 Research Population ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 44
2.4.2.2 Research Setting ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 44
2.4.2.3 Data Collection ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 45
2.4.2.4 Data Processing ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 46
2.4.2.5 Data Analysis ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 47
2.5 Results‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 52
2.5.1 Extraction of keywords ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 52
2.5.2 Keyword Results from Coders ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 54
2.5.3 Characterizing the Personal Emergency Situation ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 57
2.5.3.1 Proposed PES Characteristics ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 57
2.5.3.2 PES ‐ Caller Type ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 58
2.5.3.3 PES ‐ Risk Level ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 58
2.5.3.4 PES ‐ Call Reason ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 59
2.5.3.5 PES ‐ Communication Ability ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 59
2.5.4 The Personal Emergency Situation (PES) Model ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 59
2.5.5 Classifying the Personal Emergency Response Calls ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 60
2.5.6 Reduction of Keyword List ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 61
2.5.7 Identification of Key PES Phrases ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 67
2.5.8 Keywords in Various PESs ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 68
2.6 Discussion ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 69
2.6.1 Word Categories ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 69
2.6.2 Coding Methods ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 70
2.6.3 Full Keyword List Identification ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 70
2.6.4 The PES Model ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 71
2.6.5 Small Keyword List Identification ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 72
2.6.6 PES Phrases ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 73
2.6.7 Application to HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 74
2.6.8 Study Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 75
2.7 Conclusion‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 76
CHAPTER 3 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐77
3 IDENTIFICATION OF CONVERSATIONAL TRENDS IN PERSONAL EMERGENCY
RESPONSE CALLS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐77
3.1 Prologue ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 77
3.2 Abstract ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 77
3.3 Introduction ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 78
3.3.1 Need for a New PERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 78
3.3.2 The HELPER System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 78
3.3.3 HELPER Prototype Testing ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 79
3.3.4 Older Adults and Spoken Dialogue Systems ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 80
3.3.5 Study Objective & Research Significance ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 80
3.3.6 Background ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 81
3.3.6.1 The HELPER System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 81
3.3.6.2 The HELPER Communication Module ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 82
3.3.6.3 Human to Machine Spoken Dialogue Systems ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 85
3.3.7 Study Focus as Applied to the HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 86
3.4 Methodology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 87
3.4.1 Research Design Method ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 87
3.4.1.1 Method Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 88
3.4.1.2 Method Implementation ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 89
3.4.1.3 Method Approaches ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 89
3.4.2 Research Design Details ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 90
3.4.2.1 Research Population ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 90
3.4.2.2 Research Setting ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 90
3.4.2.3 Data Collection ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 90
3.4.2.4 Data Processing ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 91
3.4.2.5 Data Analysis ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 92
3.5 Results‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 98
3.5.1 The Conventional Conversational Analysis ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 98
3.5.1.1 Two Main Response Types ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 98
3.5.1.2 A Closer Look at Response Types ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 98
3.5.1.3 The Personal Emergency Response (PER) Model ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 100
3.5.2 Conversational Analysis using PER Categories ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 101
3.5.2.1 Descriptive Statistics ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 101
3.5.2.2 Call Breakdown Using PER Classifications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 101
3.5.2.3 Breakdown of Response Types ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 103
3.5.3 Conversational Analysis using Conversational Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 104
3.5.3.1 Analysis of Verbal Ability Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 105
3.5.3.2 Analysis of Conversational Structure Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 110
3.5.3.3 Analysis of Timing Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 114
3.6 Discussion ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 116
3.6.1 Personal Emergency Response Call Trends ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 116
3.6.2 Verbal Ability Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 118
3.6.3 Conversational Structure Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 119
3.6.4 Timing Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 121
3.6.5 Study Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 122
3.6.6 Future Research ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 122
3.7 Conclusion‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 123
CHAPTER 4 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐124
4 THE CARES CORPUS: A DATABASE OF OLDER ADULT ACTOR-SIMULATED
EMERGENCY DIALOGUE FOR DEVELOPING A PERSONAL EMERGENCY RESPONSE
SYSTEM ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐124
4.1 Prologue ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 124
4.2 ABSTRACT ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 125
4.3 Introduction ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 125
4.3.1 Background & Motivation ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 127
4.3.1.1 The Traditional PERS Technology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 127
4.3.1.2 Re‐designing the PERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 128
4.3.1.3 The Application ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 128
4.4 Methodology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 130
4.4.1 Application Context and Target Population ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 130
4.4.2 Speech Corpus Design Specifications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 131
4.4.2.1 Live Emergency Calls ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 131
4.4.2.2 Phonetically Balanced Sentences ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 133
4.4.2.3 Spontaneous Speech Sample ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 133
4.4.2.4 Simulated Vocal Expression ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 133
4.4.3 Participant Recruitment ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 134
4.4.4 Recording Procedure ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 135
4.4.4.1 Recording Environment ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 139
4.4.4.2 Recording Equipment ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 140
4.5 Results‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 141
4.5.1 Participant Recruitment ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 141
4.5.2 Speech Recording Summary ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 142
4.6 Discussion ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 142
4.6.1 The Age Effect ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 142
4.6.2 Recording Difficulties ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 142
4.6.3 Design Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 143
4.6.4 Background Noise ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 144
4.6.5 Implementing the CARES Corpus ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 144
4.6.6 Other Applications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 144
4.7 Conclusions ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 145
CHAPTER 5 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 146
5 DISCUSSION & CONCLUSIONS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 146
5.1 Discussion ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 146
5.2 Study Highlights ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 147
5.2.1 Principal Findings from Study 1: Identification of Keywords and Phrases ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 147
5.2.2 Principal Findings from Study 2: Identification of Conversational Trends ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 148
5.2.3 Principal Findings from Study 3: Creating the CARES Corpus ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 150
5.2.4 Data Interpretation Highlights ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 152
5.2.4.1 Identification of Keywords & Phrases ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 152
5.2.4.2 Statistical Analyses of Conversational Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 153
5.2.4.3 Actor Simulated PESs‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 154
5.3 Contributions to Knowledge ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 155
5.3.1 Original Research with Response Call Recordings ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 155
5.3.2 Applying Research Findings to the HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 155
5.3.3 The CARES Corpus ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 162
5.4 Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 163
5.4.1 Study 1: Keyword and Phrase Identification ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 163
5.4.2 Study 2: Statistical Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 164
5.4.3 Study 3: Creating the CARES Corpus ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 164
5.4.4 PES and PER Call Classifications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 165
5.4.5 Methodology Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 165
5.5 Future Research ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 165
5.5.1 Supporting the ASR ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 165
5.5.2 Developing the Dialogue ‐ Assessing Patterns in Response Call Conversations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 166
5.5.3 HELPER Field Testing ‐ Future Proposed Studies ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 168
5.5.3.1 Developing the HELPER Speech Handler ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 168
5.5.3.2 Developing the HELPER Dialogue and Response Handlers ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 170
5.5.3.3 Testing the HELPER Module ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 170
5.6 Implications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 171
5.7 Final Remarks ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 172
BIBLIOGRAPHY ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 173
APPENDIX A: COMMON OLDER ADULT CONDITIONS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 186
APPENDIX B: ORIGINAL HELPER DIALOGUE STRATEGY ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 187
APPENDIX C: SMALL KEYWORD VOCABULARY SET ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 188
APPENDIX D: UNIQUE KEYWORD OCCURRENCES ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 193
APPENDIX E: QUESTIONS FOR PARTICIPANT ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 194
APPENDIX F: KEYWORDS AND PHRASES LIST ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 195
APPENDIX G: EMERGENCY SCENARIOS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 200
APPENDIX H: EMERGENCY RESPONSE SERVICES VISITS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 206
APPENDIX I: SUMMARY OF PEER REVIEWED JOURNAL PAPERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 209
List of Tables
Chapter 1
Table 1-1: Physical changes in the older adult and the possible effects on speech expression. ... 22
Table 1-2: Emergency Response Call Discourse and Speech Acts. ............................................. 26
Chapter 2
Table 2-1: Various distinct approaches to content analysis. ......................................................... 44
Table 2-2: The word categories derived from words extracted from response call transcripts. ... 53
Table 2-3: Summary of coding results for Coders 2 and 3 based on keywords and category
matching. ....................................................................................................................................... 54
Table 2-4: Breakdown of the keywords identified from Coder 3. ................................................ 55
Table 2-5: Summary of phrase results for Coder 3 with agreement of keyword selection. ......... 56
Table 2-6: Breakdown of the keywords identified from Coder 3. ................................................ 56
Table 2-7: Initial exclusion criteria for reducing keyword list. .................................................... 61
Table 2-8: Definitions and examples of the word inclusion criteria. ............................................ 65
Table 2-9: Definitions and examples of the final word exclusion criteria. ................................... 65
Table 2-10: A breakdown of the phrase categories included in the CARES corpus selected by
Coder 3, sorted by word categories. .............................................................................................. 68
Table 2-11: The number of keywords identified by response call classification. ........................ 69
Table 2-12: Example of how an incoming statement might be deciphered by the semantic
analyser ......................................................................................................................................... 74
Chapter 4
Table 4-1: Important aspects of emergency call transcript analysis applied to speech corpus
design specifications. .................................................................................................................. 131
Table 4-2: Summary of speech sample recorded and general recording details. ........................ 135
Table 4-3: Emergency scenario type, risk level and scenario detail. .......................................... 138
Table 4-4: Example of data combination arranged for each participant indicated. .................... 139
Table 4-5: Participants by Age Group and Gender ..................................................................... 141
Table 4-6: Participants by Age Range ........................................................................................ 141
Chapter 5
Table 5-1: Example of how the original HELPER initial dialogue strategy and response call
classifier may work with incoming user responses. .................................................................... 160
Table 5-2: Example of how initial dialogue from a high alert dialogue strategy and response call
classifier may work with incoming user responses. .................................................................... 160
Table 5-3: Example of how initial dialogue from a medium alert dialogue strategy and response
call classifier may work with incoming user responses. OA = Older adult. ............................... 161
List of Figures
Chapter 1
Figure 1-1: Differences between traditional PERS and the HELPER. -------------------------------- 6
Figure 1-2: Comparison of pathway to emergency response between traditional PERS and
HELPER. ------------------------------------------------------------------------------------------------------ 6
Figure 1-3: Pathway to personal emergency response using the traditional push-button PERS. - 11
Figure 1-4: Pathway to personal emergency response using the HELPER System. --------------- 14
Figure 1-5: Main components of the HELPER System. ----------------------------------------------- 15
Figure 1-6: Sub-sections and functional components of the HELPER Communication Module 18
Figure 1-7: The ASR component of the HELPER Communication Module. ---------------------- 18
Figure 1-8: Inside the dialogue handler component of an SDS. -------------------------------------- 19
Figure 1-9: The internal components of the response handler within the SDS. -------------------- 20
Chapter 2
Figure 2-1: Pathway to personal emergency response using the traditional push-button PERS. - 37
Figure 2-2: Pathway to personal emergency response using the HELPER System. --------------- 37
Figure 2-3: Sub-sections and functional components of the HELPER Communication Module 39
Figure 2-4: The ASR component of the HELPER Communication Module. ---------------------- 40
Figure 2-5: Possible data application areas within the HELPER communication module along the personal emergency response pathway. ------------------------------------------------ 41
Figure 2-6: Diagram of the process of exploratory sequential mixed methods design (Clark &
Creswell, 2011). -------------------------------------------------------------------------------------------- 42
Figure 2-7: Flow diagram illustrating the methodology followed to analyse the calls and
complete study objectives. -------------------------------------------------------------------------------- 47
Figure 2-8: Process of keyword identification and categorization from Coder 1. ------------------ 49
Figure 2-9: Process of keyword identification and categorization from Coders 2 and 3. --------- 50
Figure 2-10: Examples of differences between risk levels. ------------------------------------------- 58
Figure 2-11: Model of a PES ------------------------------------------------------------------------------ 60
Figure 2-12: Diagram outlining the decision process for selecting key words using the first word
focus set. ----------------------------------------------------------------------------------------------------- 63
Figure 2-13: Diagram outlining the decision process for selecting key words using the second
word focus set. ---------------------------------------------------------------------------------------------- 64
Figure 2-14: A diagram showing the pathway to personal emergency response including the PES model and categories within the classifier unit within the HELPER System. ---------------- 72
Chapter 3
Figure 3-1: Pathway to personal emergency response using the HELPER System. --------------- 81
Figure 3-2: Sub-sections and functional components of the HELPER Communication Module 83
Figure 3-3: Inside the Speech Informant sub-component of the Speech Handler. ----------------- 83
Figure 3-4: Inside the dialogue handler component of an SDS. -------------------------------------- 84
Figure 3-5: The internal components of the response handler within the SDS. -------------------- 85
Figure 3-6: Diagram of the pathway to personal emergency response using the HELPER with the
addition of the ‘conversational measures’ and ‘timing’ features added. ---------------------------- 87
Figure 3-7: Diagram of the process of exploratory sequential mixed methods design (Clark &
Creswell, 2011). -------------------------------------------------------------------------------------------- 88
Figure 3-8: This flow diagram illustrates how calls were analysed and how outcomes were and
could be applied. -------------------------------------------------------------------------------------------- 93
Figure 3-9: The PES model characterized by caller type, risk level, and call reason. ------------- 94
Figure 3-10: The personal emergency response (PER) model. -------------------------------------- 101
Figure 3-11: Older Adult and Care Provider responders requested during a response call. ----- 102
Figure 3-12: Boxplots of verbal ability measures broken down by risk levels for caller and
speaker types. ---------------------------------------------------------------------------------------------- 106
Figure 3-13: Box plots of conversational measures broken down by risk levels for caller and
speaker types. ---------------------------------------------------------------------------------------------- 111
Figure 3-14: Box plots of timing measures broken down by risk levels for caller and speaker
types. -------------------------------------------------------------------------------------------------------- 115
Chapter 4
Figure 4-1: The CARES Corpus application within the context of a PERS. ---------------------- 129
Figure 4-2: Sample screen shot of emergency phrases and words presented to the participant
during speech recording. The participants were provided with screen prompts to indicate how
the word was to be spoken. ------------------------------------------------------------------------------- 137
Figure 4-3. Participant room setup in sound attenuating booth. ------------------------------------- 140
Figure 4-4. Experimenter room setup in sound attenuating booth. ---------------------------------- 140
Chapter 5
Figure 5-1: Diagram of the internal components of the HELPER Communication Module. ---- 157
Figure 5-2: Diagram showing a possible response call classifier setup based on the study
findings. ----------------------------------------------------------------------------------------------------- 159
Figure 5-3: A flow diagram illustrating the methodology followed to analyse the calls and
complete study objective. --------------------------------------------------------------------------------- 167
Figure 5-4: The pathway to personal emergency response with “dialogue acts” applied to help
the HELPER. ----------------------------------------------------------------------------------------------- 168
List of Acronyms
General Acronyms
ASR: automatic speech recognition/recognizer (p.2)
CARES: Canadian Adult Regular and Emergency Speech (p.48)
EMS: emergency medical services (p.12)
HELPER: Health Evaluation Logging and Personal Emergency Response (System) (p.5)
PER: personal emergency response (p.92)
PERS: personal emergency response system (p.1)
PES: personal emergency situation (p.2)
SALT: Systematic Analysis of Language Transcripts (p.46)
SDS: spoken dialogue system (p.17)
Statistical Measure Acronyms
DF: discriminant function (p.108)
MZW: proportion of total words with mazes (p.104)
NQ: number of questions (p.109)
NRQ: number of responses to questions (p.109)
NS: number of statements (p.109)
OWU: number of one word utterances (p.109)
RM_MANOVA: repeated measures, multivariate analysis of variance (p.104)
S.D.: standard deviation (p.100)
ST: speaker turns (p.114)
TNL: turn length in words (p.104)
UPM: utterances per minute (p.94)
WPM: words per minute (p.194)
Chapter 1
1 Introduction and Literature Review
1.1 Dissertation Introduction
“It is not how old you are, but how you are old.” - Jules Renard
People age differently depending on their environment, access to financial and social supports, health, education, and thinking. Although the actual ‘age’ at which one might consider himself or herself an “older adult” varies, no one who lives long enough can avoid becoming “elderly.” In the century since this phrase was written, society’s ability to provide health and long-term care support for its elderly population has evolved tremendously. However, despite vast technological advances and societal changes, many elderly people still face challenges obtaining adequate health care, especially those individuals with multiple, long-term, chronic, and complex care problems coupled with mobility difficulties. Consequently, there exists a desire, a need, and an overall benefit for the elderly to stay healthy for as long as possible and to live independently within their homes and communities.
To help support aging-in-place, a number of assistive technologies have been designed and developed for the older adult population, including aids for general mobility, communication, and cognition. Under the communication umbrella, assistive technologies called personal emergency response systems (PERS) were developed to help individuals living at home with higher medical risk and/or mobility difficulties to contact emergency assistance when needed, any time of the day or night, typically by pushing a body-worn “button” activator.
Despite many noted benefits from using PERS technology, a majority of elderly people resist
PERS adoption and use. Prior research notes that this resistance results from barriers that span
the physical, social, and psychological realms. In an attempt to address these issues and to ‘re-
think’ how PERS technology can be better designed for the elderly cohort, the concept of a
novel, smart home monitoring system called the HELPER was devised. This system incorporates
within it an automatic, artificially intelligent, spoken-dialogue based PERS (herein called the
automated PERS”). The main premise behind the HELPER is that such a system will be able to automatically detect adverse events (e.g., a fall) visually or can be directly activated using spoken word(s) (e.g., a cry for help). The HELPER would interact with the end-user as a first responder and would allow users to contact their desired responder directly (or cancel a call) without going through an operator. The automatic activation and speech-based method of communication could also eliminate the need to wear a body-worn activator.
Preliminary testing of a HELPER prototype, by prior researchers, with younger adults using
limited vocabulary (e.g., yes/no responses) demonstrated the feasibility of using automatic
speech recognition (ASR) to communicate with a live user during a simulated personal
emergency situation. However, in order to bring this system to a state in which it can be tested
with real end-users in an actual emergency situation, the communication component of the
HELPER requires further design, development, and testing. Information is needed on what
actually happens during a personal emergency situation (PES) and the personal emergency
response call (herein also called the “response call” or “call”). However, no research literature
could be identified that specifically describes or characterizes what happens during a response
call and/or the response call conversation. Furthermore, in terms of training and testing the
HELPER communication module components (e.g., the ASR), few speech databases featuring older adult speakers are available, and none of those found contained Canadian English examples of emergency situations of sufficient recording quality.
It is hypothesized that the knowledge and data required to further the design of the HELPER
system can be obtained from real personal emergency response calls. Therefore, the main goal
and focus of this dissertation was to derive knowledge and data from analyses of real response
calls using the traditional push-button PERS and to identify ways in which this information could
be applied to help further the design and development of the new automated PERS.
1.2 Dissertation Overview
This dissertation contains five chapters based on three research studies. Chapter 1 provides an
introduction to the dissertation and the research problem, presents a review of the literature, and
summarizes the research purpose and objectives. Chapter 2 describes the first study that focuses
on the isolation of keywords and phrases from response calls, the categorization of keywords,
and the development of a model to characterize PESs. Chapter 3 describes the second study that
focuses on characterizing and identifying trends in response calls and response call conversations
that may be built into the automated PERS intelligence for predicting a user’s desired response
when calling for help. Chapter 4 describes the third study that focuses on the design and
development of a spoken speech corpus (the CARES corpus) that may be used for HELPER
training and testing. Chapter 5 is the last chapter and presents a summary for each of the three
studies, discusses contributions of the research work, proposes future work, and ends with a final
conclusion. This dissertation is presented in manuscript style. The studies in Chapters 2 and 3 are
being prepared for submission to peer-reviewed journals. The study in Chapter 4 has been
published in a peer-reviewed journal. Due to the nature of the manuscript format, there may be
some overlap in the content presented, specifically in the introduction, background, and
methodology sections of Chapters 2 to 4.
1.3 Introduction to the Problem
The following introduction will present the research rationale and problem in further detail,
summarize relevant literature, and outline the main research purpose and objectives.
1.3.1 Aging-in-Place with Assistive Technology
The mind is what sets humans apart from other animal species. The human mind’s ability to
communicate through language, to reason, imagine, and imitate at a high level is far superior to
that of any other living creature. It is the human mind that has conceived of tools and techniques
to advance technology, medical treatment, social development, and other initiatives in work and
life that have all combined to extend average human life expectancy into the eighties (age in
years) within the last century. It is also this mind that has led to the development of assistive
technologies to both assist and facilitate humans in their everyday lives (Childress, 2003; Mann,
Ottenbacher, Fraas, Tomita, & Granger, 1999; McCreadie & Tinker, 2005). Yet despite its
capabilities, even the mind is not able to prevent the inevitable decline of the human body or
human functioning, whether physical, cognitive, or both, as a consequence of injury, chronic
disease or advanced age. Surveys show that one of the greatest fears about aging for the older
human is the risk of losing one’s independence as a result of ill health, increasing frailty, and/or
the decline of the mind’s faculties (Disabled Living Foundation, 2009; News Agencies, 2014).
The success in extending human life combined with a lower rate of birth (Bernstein, 1999) has
led to a demographic shift resulting in population aging (World Health Organization, 2011).
With a growing aging population, there is mounting concern about how to handle the increasing
size of a potentially higher-maintenance, higher-risk, and higher-cost older adult cohort, at least
in terms of health care provision (Longino, 1994; World Health Organization, 2011). Herein lies
the desire and need to stay healthy for as long as possible and to age-in-place within one’s own
home and community. Research studies show that seniors, those individuals over 65 years of
age, who remain longer in their communities and who ‘age in place’ tend to age more
successfully (World Health Organization, 2011). They live longer and with a higher self-
perceived quality of life compared to those who age “out of place” in institutions such as long-
term care, nursing homes, or hospitals (Ramage-Morin, 2005).
One way to achieve the goal of aging-in-place is with the use of assistive technologies. An
assistive technology has been defined as “any device or system that allows an individual to
perform a task that they would otherwise be unable to do, or increases the ease and safety with
which the task can be performed (Cowan, Turner-Smith, & others, 1999).” In particular, PERS
assistive technology was developed specifically to provide individuals at higher risk for medical
complications with an easy way to communicate their need for emergency assistance any time of
the day or night when home alone. The goal was that, by receiving care quickly, the user could prevent or alleviate the negative effects that can arise when care is received too late (e.g., after a long lie, a heart attack, or a stroke). With the miniaturization of technologies and advances in computational
power, personal emergency response technologies are now melding into the next generation of
‘smart home’ technologies (Hessels, Le Prell, & Mann, 2011). An earlier definition of a smart
home or “smart housing” was “the electronic and computer-controlled integration of many of the
devices within the home (Cowan et al., 1999).” More recently a ‘smart home’ has been defined
as a “residence equipped with technology that enhances the safety of patients at home and
monitors their health conditions” (Chan, Campo, Estève, & Fourniols, 2009; Demiris, Hensel,
Skubic, & Rantz, 2008). These smart home systems would not only provide immediate
emergency assistance shortly after an adverse event but would also include the ability for
continuous home and health monitoring. By continuously monitoring the user within the home,
the need for possible medical intervention prior to any event even occurring may be possible.
The ultimate goal would be to prevent a personal emergency situation from happening
altogether.
Researchers and technology developers alike are making a concerted effort to further develop
rehabilitative assistive technologies like the PERS to make them more ‘age-friendly’. The hope
is that these new technologies will be widely adopted by older adults and that they will be
accessible, usable, and effective at helping to keep the older adult population healthy, mobile,
and living independently longer within their homes and communities.
1.3.2 A Novel PERS
In 2004, in the Intelligent Assistive Technology and Systems Lab (IATSL) at the Rehabilitation
Sciences Institute at the University of Toronto, research began on a novel smart home
technology that also integrated an automated PERS, called the HELPER or ‘Health Evaluation
Logging and Personal Emergency Response’ System (henceforth called the HELPER). As a
smart home technology, the HELPER concept was conceived to monitor the home for adverse
events by visually tracking a user’s movements and positions. If an adverse event such as a fall
was detected, the HELPER would attempt to communicate with the user to determine if
assistance was required. Strictly focusing on the PERS aspect of the HELPER, one of the main
limitations of the traditional push-button PERS that the HELPER seeks to overcome is the need
to physically wear a button actuator to initiate a response call. When using the traditional PERS,
not only does the individual have to remember to wear the ‘button,’ but he or she must also
decide to wear the button (Porter, 2005). Failing to wear the button essentially renders the system useless during a PES. Unwillingness to wear the button, or to use the system at all, leads to a failure of technology adoption. Figure 1-1 illustrates the main interface differences between the
traditional PERS (see older adult to the left of the arrow) and the automated PERS within the
HELPER (see images to the right of the arrow). The HELPER unit (an early prototype version)
is shown mounted on the ceiling in the lab.
By harnessing the human’s unique ability to communicate through language, the HELPER
concept plans to use speech as one of two methods a caller can use to initiate a response call.
Using speech activation would eliminate the need to wear a button actuator on the body and, as speech is a natural form of communication, may be seen as more amenable to the older adult user than wearing a button actuator 24 hours a day. The other method of response call initiation would be automatic, through the HELPER’s vision module.
Figure 1-1: Differences between traditional PERS and the HELPER. (Credits: older adult image obtained from WWW, drawing from Intelligent Assistive Technology and Systems Lab)
Figure 1-2 illustrates a flow diagram with the traditional PERS pathway on the left (solid purple
arrows) and the HELPER pathway on the right (dotted orange arrows).
Figure 1-2: Comparison of pathway to emergency response between traditional PERS and HELPER.
Following the HELPER pathway, if an adverse event is detected, the HELPER’s vision module
would automatically activate the communication module. In contrast to the traditional PERS
pathway, no active initiation is required from the user when an event is detected automatically. If
the individual does not respond to the HELPER verbally when assistance is deemed necessary,
the system default would be to automatically contact a live person for help. In the absence of a
personal first responder the personal emergency response call centre’s call taker (herein called
the “call taker”) is the default. However, if the individual does not want assistance, he or she
would have the autonomy to cancel the call before a live person is contacted. On the other hand,
in a situation where no adverse event has been detected or the individual has changed his/her
mind and has decided that assistance would be beneficial after all, he or she could still initiate a
response call using a simple spoken keyword or phrase.
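The branching just described can be summarized in a small decision routine. This is an illustrative sketch only; all names (e.g., `helper_pathway`) are hypothetical and do not come from the HELPER implementation.

```python
# Illustrative sketch of the HELPER call-initiation branching described
# above. All names are hypothetical, not actual HELPER code.
from typing import Optional

def helper_pathway(vision_event: bool, spoken_keyword: bool,
                   user_reply: Optional[str], user_cancelled: bool) -> str:
    """Return the action the HELPER would take in one situation."""
    if not (vision_event or spoken_keyword):
        return "idle"            # no adverse event detected, no keyword spoken
    if user_cancelled:
        return "call cancelled"  # user autonomy: cancel before a live person is reached
    if user_reply is None:
        # No verbal response: default to a live person (the call taker,
        # in the absence of a personal first responder).
        return "contact call taker"
    return "contact responder: " + user_reply

assert helper_pathway(True, False, None, False) == "contact call taker"
assert helper_pathway(False, True, "ambulance", False) == "contact responder: ambulance"
assert helper_pathway(True, False, "son", True) == "call cancelled"
```

The asserts illustrate the three cases from the text: the silent-user default, keyword-initiated calls, and the user’s option to cancel.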
1.3.3 The First HELPER Prototype
To demonstrate the feasibility of using ASR technology within a PERS, previous researchers
developed and tested a preliminary HELPER prototype on young adults in a controlled
laboratory environment (Hamill, Young, Boger, & Mihailidis, 2009). Details of the study
described in this section are summarized from Hamill et al. (2009; 2005). The ASR used in the
test was limited to the recognition of ‘yes’ and ‘no’ word forms (e.g., yah, nuh). The acoustic
model used consisted of speech samples from male and female adults speaking randomly
generated sequences of words (AN4 from Carnegie Mellon University). The dialogue was
limited to asking close-ended questions with instructions to respond with a “yes” or “no” answer.
See Appendix B for a flow diagram of this dialogue. In this preliminary lab test, the
communication module was initiated by fall detection and not via keyword initiation. With the
HELPER prototype mounted on the ceiling of the room, speech input from users was recognized
correctly 79% of the time, and the desired responses were correctly identified in all twelve
cases tested (Hamill et al., 2009). The success in identifying desired responses despite
recognition errors was attributed to the fact that the dialogue required users to confirm their
response. Thus, the system was able to recover from incorrect recognition in the confirmation
stage.
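The confirmation-based recovery described by Hamill et al. (2009) can be illustrated with a minimal loop; the turn representation below is an assumption for illustration, not the prototype’s actual design.

```python
# Minimal illustration of confirmation-based error recovery: the system
# accepts an answer only after the user confirms it, so a misrecognition
# on one turn is corrected on the next. Simulated turns, not prototype code.

def confirm_loop(recognized_turns):
    """recognized_turns: (answer, confirmation) pairs as the ASR heard them.
    Returns the first answer the user confirmed, else None."""
    for answer, confirmation in recognized_turns:
        if confirmation == "yes":   # user confirmed the recognized answer
            return answer
        # otherwise discard the answer and re-ask on the next turn

# A misrecognized first turn is caught at the confirmation stage:
turns = [("no", "no"),     # ASR heard "no"; user rejects, system re-asks
         ("yes", "yes")]   # second attempt recognized and confirmed
assert confirm_loop(turns) == "yes"
```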
1.3.4 Research Rationale and Problem Summary
The researchers recognized that younger adults speaking in a “calm and
casual” controlled laboratory environment were not truly representative of a real PES involving
older adults in a home environment. They proposed several recommendations for future studies
including: (1) expansion of the system vocabulary for words other than “yes” and “no” (e.g.,
ambulance, help); (2) ASR training with older adult speech in context; (3) investigation of the
potential usefulness of statistical modelling methods for planning and decision making; (4)
identification of additional dialog states; (5) ‘in context’ system testing with either live or
recorded voices in PESs including older adult speakers; and (6) optimization of microphone
input parameters (Hamill et al., 2009).
Essentially, aside from the last recommendation, which is a technology optimization problem,
designing and developing a more robust HELPER requires an understanding of how actual
end-users in PESs will respond during a response call conversation, what their needs
are in different PESs, and what types of PESs might arise. Furthermore, in order to
perform ASR training and system testing a database of speech samples would be required,
specifically including older adult speakers in personal emergency type situations.
To date, no research literature could be found that examines how personal emergency response
call conversations unfold in detail. If the technology developer is unaware of how PERS users
respond during PESs and how help is requested, how can the automated PERS be designed and
developed to work well in a real-life situation with actual end-users?
1.3.5 Research Response
This research focuses on the analyses of live personal emergency response calls for the purpose
of deriving knowledge and data that can be applied to the design and development of the HELPER
communication module, specifically the spoken-dialogue and artificial intelligence components.
Knowledge gained from this research has been directly applied to the development of a spoken
speech corpus that could be used to conduct the actual training and testing of these
communication components. These research outcomes will be beneficial for HELPER
development but will also contribute to knowledge in the area of personal emergency response
calls and call dialogue.
1.4 Literature Review
This literature review provides the background knowledge required to better understand the
contributions of this dissertation. The review has been divided into three parts. Part I:
summarizes literature relating to PERS technology. An overview of the main health challenges
faced by older adults is presented first, followed by an introduction to the technology basics, a
review of the complexities surrounding PERS usage, and an introduction to the HELPER system.
Part II: focuses on the literature relating to human-to-computer spoken dialogue interactions.
Potential variables affecting ASR recognition accuracy are presented first, followed by a
description of the characteristics of older adult speech expression, and finally a review of the use
of ASRs and SDSs with older adult users. Part III: provides a description of the basic structure of
general emergency response calls (e.g., 911) and the emergency response call handling
procedures.
1.4.1 PART I: PERS Technology
1.4.1.1 Health Challenges for the Older Adult
A great volume of literature is available detailing the many aspects of the aging process as well
as the common medical concerns that ail the elderly. See Appendix A for some common older
adult conditions. This is not surprising since seniors are also the most frequent users of the health
care system with the greatest health care costs being spent on those over 80 years of age (CIHI,
2011, 2013). The two greatest health concerns for the older adult include the onset of chronic
diseases, such as heart disease and kidney disease, and frailty. Weiss (2011) defines frailty as “an
increased vulnerability to advanced and persistent loss of function that, at least initially, only
becomes evident under stress.” With age also come varying degrees of functional limitations
such as difficulties in bending and reaching or stooping, which may ultimately increase the effort
the older adult needs to exert in order to complete instrumental activities of daily living, such as
cleaning, cooking, shopping, and managing household affairs (Cornman, Freedman, & Agree,
2005; Cowan et al., 1999). The increased likelihood of having functional limitations with one or
more chronic medical conditions or frailty also places the elderly adult at a higher risk for falling
down. In Canada, in 2008, more than 2500 seniors over the age of 65 were reported to have died
from injuries related to falls with over 78,000 fall-related hospitalizations associated with hip
fractures reported in 2010/2011 (Public Health Agency of Canada, 2014). For seniors 75 years of
age and older, falls have also been identified as one of the leading causes of both hospitalization
and institutionalization (Demiris et al., 2004; M. Johnson, Cusick, & Chang, 2001; Koski,
Luukinen, Laippala, & Kivela, 1996; Public Health Agency of Canada, 2014; G. Williams,
Doughty, Cameron, & Bradley, 1998). When combined with possible cognitive impairment and
the need for multiple medications, seniors are at a high risk for experiencing medical
complications during emergency situations (Gibson & Hayunga, 2006; Hwang & Morrison,
2007; Salvi et al., 2007). Many research studies highlight the importance of providing emergency
assistance as promptly as possible to increase chances for full recovery (Handschu, Poppe,
Rauss, Neundörfer, & Erbguth, 2003; Rosamond et al., 2005). Unfortunately, the older adult may
not immediately recognize the severity of a situation, may wait to ask for assistance, and then
discover he/she is unable to obtain assistance when needed (e.g., when injured and alone) (Fogle
et al., 2008). It is precisely for reasons such as these that personal emergency response
technologies were developed and why it is important that they be designed to be desirable, easy
to use, accessible and robust.
1.4.1.2 Personal Emergency Response Technology
Assistive technologies have been used by humans for rehabilitation for thousands of years
(Childress, 2003), and research studies show that assistive technologies can help slow the rate of
functional decline, as well as reduce an older adult’s reliance on and use of institutional and in-
home personal services (i.e., home and personal care attendants) (Freedman, Agree, Martin, &
Cornman, 2006; Mann et al., 1999; Ramage-Morin, 2005). The PERS technology was
established in the early 1970s, and research demonstrates that these systems can help diminish
the possibility of prolonged injury after a fall or medical trauma (e.g., heart attack or stroke)
(Dibner, 1993; Hessels et al., 2011; Maddox, 1992; Patel et al., 2012). PERS usage has also been
found to decrease overall health care costs and ease care provider and user anxiety (Mann,
Belchior, Tomita, & Kemp, 2005; Montgomery, 1993; Roush, Teasdale, Murphy, & Kirk, 1995).
Common names for the PERS include, but are not limited to, community alarms, social alarms,
personal triggers, medical alarms, dispersed alarms, or emergency alarms. In North America,
personal emergency response services are typically offered by private companies as opposed to
provincial, state operated or city run local municipal emergency response services (i.e., police,
fire, and ambulance) and are commonly paid for out of pocket by the end-user (Bernstein, 1999;
Hessels et al., 2011).
1.4.1.3 PERS Technology Basics
In terms of how the PERS technology works, a traditional PERS consists of three components
(Dibner, 1993; Hessels et al., 2011): (1) a wireless push-button typically worn on the body
similar to a necklace or watch, (2) a speaker phone base unit located in the subscriber’s residence
such as on a table or shelf in a main room, and (3) a call centre where the live operator (call
taker) receives and handles incoming calls (Mann et al., 2005). The call taker is defined as the
person designated and trained to answer response calls. He/she is armed with prior knowledge of
the subscriber’s medical history, place of residence, and a list of preferred first responders.
Figure 1-3 illustrates the pathway to an emergency response using the traditional PERS.
Figure 1-3: Pathway to personal emergency response using the traditional push-button PERS.
[Figure shows three steps: (1) personal emergency situation — the “hands on” push-button activator; (2) personal emergency call centre — spoken dialogue over the speakerphone or telephone with a “live person” call taker who asks: Who is calling? Call reason? Situation risk level? Response required?; (3) call response — emergency response services, personal responder(s), or no response (false alarm).]
During a PES (step 1), the subscriber or user initiates a response call by physically pushing their
button actuator. Once activated, a signal is transmitted through the caller’s speakerphone base
unit to the call centre and a call taker responds immediately (step 2). The call taker
communicates with the subscriber using the speakerphone or home
telephone (if speakerphone is not working). The call taker typically identifies who is calling and
the reason for the call, assesses the situation or risk level, and finally determines what response
to provide. A response may include contacting personal responders (i.e., family, friends, or care
providers) or emergency response services (i.e., paramedics, police, or fire fighters) (step 3). If
there is no response from the subscriber, either the subscriber’s first responder or another listed
care provider would be contacted and decisions about what to do next progress from there.
Where circumstances warrant, all emergency response services (i.e., police, fire, and/or
ambulance) may even be dispatched to the subscriber’s home (this was discussed on-site during
informal interviews with emergency medical service (EMS) providers and firefighters). In a false
alarm situation where the button was mistakenly pushed, no further action would be required and
the response call would be ended.
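The call taker’s assessment sequence (who is calling, the call reason, the risk level, the required response) can be sketched as a simple record plus a response rule. All field names, risk levels, and the rule itself are hypothetical illustrations, not a call centre’s actual protocol.

```python
# The call taker's assessment steps expressed as a record plus a simple
# response rule. Field names, risk levels, and the rule are hypothetical.
from dataclasses import dataclass

@dataclass
class ResponseCall:
    caller: str   # who is calling?
    reason: str   # call reason?
    risk: str     # situation risk level: "none", "low", or "high"

def choose_response(call: ResponseCall) -> str:
    """Response required? (step 3 of the pathway)"""
    if call.reason == "false alarm":
        return "no response; end call"
    if call.risk == "high":
        return "dispatch emergency response services"
    return "contact personal responders"

assert choose_response(ResponseCall("subscriber", "false alarm", "none")) == "no response; end call"
assert choose_response(ResponseCall("subscriber", "fall", "high")) == "dispatch emergency response services"
```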
1.4.1.4 PERS Use by Older Adults
With respect to human-machine interaction, a considerable amount of research has been
conducted involving PERS technology over the last four decades. These studies include a general
review of the technology over time (Hizer & Hamilton, 1983; Montgomery, 1993), examinations
of the older adult’s lived experience with the PERS (Johnston, Grimmer-Somers, & Sutherland,
2010; Porter, 2003, 2005), the adoption and impact of PERS usage at a personal and social level
(Davies & Mulley, 1993; De San Miguel & Lewin, 2008; Fallis, Silverthorne, Franklin, &
McClement, 2007; Heinbüchner, Hautzinger, Becker, & Pfeiffer, 2010; Mann et al., 2005), as
well as at a medical care or institutional level (Hyer & Rudick, 1994; McWhirter, 1987; Roush et
al., 1995). Overall, despite very high user satisfaction, PERS adoption within the older adult
population is not as pervasive as it could be. Only a small percentage of older adults who could
use a PERS actually own a PERS (Bernstein, 1999; Fallis et al., 2007; Hessels et al., 2011; Hizer
& Hamilton, 1983; Mann et al., 2005; Porter, 2005; Roush et al., 1995). Two main reasons for
failing to adopt the PERS include a lack of perceived need and basic issues with device form and
function (Davies & Mulley, 1993; Hessels et al., 2011; Mann et al., 2005; Porter, 2005). In
redesigning the traditional PERS, it would be essential to consider not only what would be usable
by an older adult but how to make it desirable and accessible for all users (Blythe, Monk, &
Doughty, 2005). Piau, Campo, Rumeau, Vellas, & Nourhashemi (2014) asserted, “technological
innovations need to be perceived by the elderly as relevant to their everyday lives to be useful.”
As it stands, many PERS owners do not comply with wearing their buttons 24 hours a day
(Davies & Mulley, 1993; Heinbüchner et al., 2010). In fact, the majority remove their buttons in
the evenings and during showering, when the risk of falls is greatest (De San Miguel & Lewin,
2008; Taylor & Agamanolis, 2010). Furthermore, even when the button is worn, a significant
proportion of older adult users choose not to activate their systems during a situation of need
(Davies & Mulley, 1993; Fleming, Brayne, & others, 2008; Heinbüchner et al., 2010; Hessels et
al., 2011; Levine & Tideiksaar, 1995; Mann et al., 2005; Porter, 2005; Taylor & Agamanolis,
2010). One study examining falls and emergency alarm use found that 80% of persons (113 of
141) who fell while alone did not use their PERS to obtain help, which included 37 of 38 who
remained on the floor for a long time (over one hour) (Fleming et al., 2008). Reasons for non-
compliance with wearing the button are physical, social, and psychological; they include
frustration when the button is too easy or too difficult to activate (creating a high potential for
false alarms), forgetting to wear the button or a lack of perceived need to wear it, a lack of
comfort or attractiveness when wearing the button, the perception that wearing the button labels
one as old or vulnerable, and cost (Blythe et al., 2005; Davies & Mulley, 1993; Heinbüchner et
al., 2010; Hobbs, 1993; Johnston et al., 2010; Porter, 2005; Taylor & Agamanolis, 2010). The
reasons found in the literature for not pushing the button include the subscriber wanting to
manage and solve the problem on their own (e.g., user wants to get up from a fall on their own or
get help using the telephone) (Fleming et al., 2008; Heinbüchner et al., 2010; Porter, 2005), not
wanting to bother anyone (De San Miguel & Lewin, 2008), and a fear of losing one’s
independence if institutionalized (Heinbüchner et al., 2010). On the call centre side, non-
emergency or false alarm calls are frequent and may increase the burden of already stressed
emergency response service providers (Hamill et al., 2009; Mann et al., 2005; McWhirter, 1987;
Taylor & Agamanolis, 2010; Tinker, 1993). McWhirter (1987) reported a false alarm rate as high
as 40% and Hobbs (1993) suggested it may be 90% or more. The need for a better designed
PERS system to adequately support a growing population of older adults has been suggested by
several researchers (Blythe et al., 2005; Davies & Mulley, 1993; Fallis et al., 2007; Porter, 2005;
Taylor & Agamanolis, 2010). This literature clearly demonstrates the challenges that persist with
having a system requiring users to wear part of the system, and also the complexities in
designing for the older adult end-user with their natural desire to remain autonomous. In essence,
if the button is not worn or pushed, the PERS is rendered useless.
1.4.1.5 The HELPER System
To overcome these barriers, the HELPER concept, as previously introduced in section 1.3.2, is a
smart home technology designed to address the current limitations of the traditional
PERS by improving upon its design and expanding upon its functionality. A diagram of the pathway
to emergency response using the HELPER is presented in Figure 1-4.
Figure 1-4: Pathway to personal emergency response using the HELPER System.
[Figure shows: (1) personal emergency situation — “hands free” speech or vision activation via a ceiling/wall/shelf-mounted camera, speaker, and microphone; (2) the HELPER computer, with (2a) a vision module (Is person present? Is person active? Is activity/inactivity normal? Activate communications?) and (2b) a communication module (Who is calling? Call reason? Situation risk level? Response required?); (3a) spoken dialogue with a live-person PERS call taker; (3b) call response — emergency response services, personal responder(s), or no response (false alarm).]
In the event of a PES (step 1), the HELPER computer would identify the adverse event (step 2)
either automatically via a computer vision based sensing module (step 2a) which tracks the user
with a camera or via a communication module (step 2b) that recognizes speech input from the
user with a microphone. If the vision module first identifies the event, it would activate the
communication module and an attempt would be made to converse with the user using spoken
dialogue (a speaker and microphone combination). By ‘conversing’ with the user, the HELPER would then
attempt to identify the user’s desired response and proceed to contact that responder. Similar to
the traditional PERS, the possible responders would include personal responders and/or
emergency response services (step 3b). Additionally, the user may also choose to be connected
with a call taker (step 3a), which would also be the default condition if the HELPER computer
was unable to determine what the user wants. If no response is desired, the HELPER would end
the call. Unlike with the traditional PERS, because the user interacts with a non-live HELPER
computer initially, his/her autonomy is maintained in that he/she can choose when to talk to a
live person and what type of response to initiate. In a sense, the HELPER would function like a
hands-free telephone but with select options and a safety default to the live call taker should
something go wrong.
Figure 1-5 illustrates the HELPER’s four main technology components: (1) the camera, (2)
microphone, (3) speaker, and (4) HELPER computer.
Figure 1-5: Main components of the HELPER System.
[Figure shows an older adult in a personal emergency situation (“Hello? Anyone?”) interacting with the HELPER through spoken dialogue (“How can I help you?”) via (1) a camera, (2) a microphone, and (3) a speaker. Within (4) the HELPER computer: (a) the Communication Module — Speech Handler (what does caller want?), Dialogue Handler (how to respond to caller), and Response Handler (respond to caller and get help; contact responder) — and (b) the Vision Module — user tracking (extract image; identify user location and activity) and image analysis (is activity normal? assess health).]
Within the HELPER computer, two main function modules are shown: (a) the Communication
Module and (b) the Vision Module. Ideally, the camera, microphone and speaker components
would be mounted on the ceiling or wall or situated in a spot with an optimal camera viewing
angle, microphone input range, and speaker output range. See Figure 1-1 for an example setup.
Inside the HELPER computer, both the communication and vision modules are actively working
to collect speech input and user images. Inside the vision module, the user tracking extracts the
incoming images, identifies the user location and tracks user activity over time. It then assesses
the images to determine whether an abnormal event has occurred. If an abnormal event is
detected, the communication module is activated. The Dialogue Handler is initiated and must
determine how to respond to the call. For example, if the call has just been initiated, the Dialogue
Handler will determine that the “opening” dialogue must be spoken. In this case, an opening
greeting might be, “Do you need help?” Incoming speech from the user is received by the Speech
Handler sub-component of the communication module. In the Speech Handler the potential
words or phrases spoken by the user are identified and the possible meaning of the response is
determined. The proposed user response results are then sent to the Dialogue Handler to continue
the next conversational turn. In this manner, the human-to-machine conversation proceeds until
the appropriate responder has been identified and contacted by the Response Handler.
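The turn-by-turn exchange just described (the Speech Handler interprets, the Dialogue Handler chooses the next move, the Response Handler contacts a responder) can be sketched as a loop. The handlers below are simplified stand-ins, not HELPER internals.

```python
# Sketch of the human-to-machine turn loop described above. The handler
# functions are simplified stand-ins, not HELPER internals.

def run_call(utterances, understand, next_prompt):
    """Loop over user utterances until a responder is chosen;
    default to the call taker if the conversation runs out."""
    transcript = ["Do you need help?"]            # opening dialogue
    for text in utterances:
        meaning = understand(text)                # Speech Handler step
        prompt, responder = next_prompt(meaning)  # Dialogue Handler step
        transcript.append(prompt)
        if responder is not None:                 # Response Handler takes over
            return responder, transcript
    return "call taker", transcript               # safety default: live person

def understand(text):
    return "ambulance" if "ambulance" in text else None

def next_prompt(meaning):
    if meaning is None:
        return "Sorry, could you repeat that?", None
    return "Calling the %s now." % meaning, meaning

responder, log = run_call(["help me", "I need an ambulance"], understand, next_prompt)
assert responder == "ambulance"
assert log[-1] == "Calling the ambulance now."
```

Note the safety default: if no responder can be identified, the loop falls through to the live call taker, mirroring the system behaviour described earlier.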
The research described in this dissertation will be concerned solely with the communication
module of the HELPER system.
1.4.1.6 The HELPER Spoken Dialogue System
Now that a proposed novel technological solution exists, how feasible is it to use computers to
recognize speech in this context? Research studies have shown that older adults are receptive to
using speech to interact with assistive home technologies (Anderson et al., 1999) including
PERS (J. L. Johnson, Davenport, & Mann, 2007; Portet, Vacher, Golanski, Roux, & Meillon,
2013; Taylor & Agamanolis, 2010). Using speech to activate a PERS would remove the need to
wear the button activator on the body, which would address the present traditional PERS
compliance issue with the push-button and in theory, may reduce the number of accidental calls.
The ability for the HELPER to function as an intermediary between the user and the eventual
live responder is an attempt to improve upon the functionality of the traditional PERS by
maintaining the user’s autonomy in deciding who to call for assistance and when to talk to a live
person. These new system features may hypothetically increase technology adoption and use.
The ability of the HELPER computer to communicate with a human user “verbally” over several
speaker-turns places its communication module into a category of interactive dialogue systems
called a spoken dialogue system (SDS) (Fraser, 1997). According to Möller (2005), an SDS is
characterized by its ability to accept continuous speech, allow for user initiatives, reason,
detect errors or incoherence, correct, anticipate, and/or predict the spoken user response. An
SDS typically comprises at least five functional components (Georgila, Wolters, Moore, &
Logie, 2010; Lamel, Minker, & Paroubek, 2000; Möller, 2005):
(1) The ASR - receives an acoustic signal (spoken input) and transforms this into a most
probable word sequence;
(2) The Semantic Analyser or Natural Language Understanding component - deciphers the
meaning or intention of the probable word sequence;
(3) The Dialogue Manager – maintains the dialogue and keeps a history of responses;
(4) The Response Generation component – determines the output dialogue according to “the
dialog state, the user utterance, and/or information returned from the database” (Lamel et
al., 2000);
(5) The Speech Synthesis component – converts selected system utterances to actual speech
output.
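As a rough illustration of how these five components chain together, the sketch below stubs each stage; it does not reflect any particular SDS implementation, and all names and dialogue acts are invented.

```python
# Rough illustration of the five SDS components chained as a pipeline.
# Each function is a stub for the component named in the list above.

def asr(audio):                     # (1) acoustic signal -> probable word sequence
    return audio.lower().split()

def understand(words):              # (2) word sequence -> meaning or intention
    return "request_help" if "help" in words else "unknown"

def dialogue_manager(intent, history):  # (3) maintains dialogue, keeps history
    history.append(intent)
    return "ask_confirm" if intent == "request_help" else "ask_repeat"

def generate(dialogue_act):         # (4) determines the output dialogue
    return {"ask_confirm": "Do you need help?",
            "ask_repeat": "Could you repeat that?"}[dialogue_act]

def synthesize(text):               # (5) converts the utterance to speech (stubbed)
    return "<audio:" + text + ">"

history = []
out = synthesize(generate(dialogue_manager(understand(asr("HELP me please")), history)))
assert out == "<audio:Do you need help?>"
assert history == ["request_help"]
```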
According to the best practice guidelines for spoken language dialogue systems and components
produced by the DISC European project, the six essential aspects of SDS development include:
speech recognition, language understanding and generation, dialogue management, speech
synthesis, human factors, and systems integration (Lamel et al., 2000). Currently, however,
although all SDSs include an ASR component and some form of speech synthesis or output, the
presence of (2) the Semantic Analyser, (3) the Dialogue Manager, and (4) the Response
Generation components ranges from absent, to limited in nature, to fully present and
possibly complex (Furui, 2003; Lamel et al., 2000; Vipperla, Wolters, Georgila, & Renals,
2009).
In the HELPER communication module, it is proposed that all the basic functional components
of the SDS be present to follow the DISC recommendations, in addition to a component for
contacting the live responder, conveniently called the “call responder” component. Figure 1-6
illustrates the proposed internal sub-components of the HELPER communication module with
the Call Responder component at the top.
Figure 1-6: Sub-sections and functional components of the HELPER Communication Module.
[Figure shows incoming speech entering the Speech Handler (Automatic Speech Recognizer (ASR) and Speech Informant), feeding the Dialogue Handler (Dialogue Manager), then the Response Handler (Response Generation and Speech Synthesis producing spoken output), with the Call Responder component at the top dispatching a responder (“responder on route”).]
The Semantic Analyser or Natural Language Understanding component of the SDS would be
included inside the Speech Informant component (located above the ASR) in Figure 1-6.
Taking a closer look at how the ASR component functions, Figure 1-7 illustrates the typical
internal structure of an ASR system. This diagram was derived from (Glass & Zue, 2003;
Jurafsky, 2014).
Figure 1-7: The ASR component of the HELPER Communication Module.
[Figure shows incoming speech from the user entering A/D conversion and feature extraction, then a decoder that consults three linguistic models (1. acoustic, 2. pronunciation, 3. language) before passing output to the ‘Speech Informant’ component.]
Starting at the bottom left, incoming speech from the user (the acoustic waveform) arrives
through the microphone and is digitized and processed into “numerical representations of speech
information or features” that describe relevant characteristics of the speech signal for ASR
(Scharenborg, 2007). These features are then sent to the Decoder, which attempts to decode the
speech signal, or recognize what was said, by searching through (1) a pre-assembled collection of
speech sound[1] representations within the acoustic model, (2) specific pronunciation rules in the
pronunciation model (lexicon), and (3) grammar and language rules in the language model, to
identify a “best match” (Scharenborg, 2007).
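This “best match” search can be illustrated with a toy scorer that combines acoustic-model and language-model scores; the candidate phrases and all probabilities below are invented for illustration.

```python
# Toy sketch of the decoder's "best match" search: each candidate word
# sequence gets an acoustic-model score and a language-model score, and
# the highest combined (log-probability) score wins. All numbers invented.
import math

def decode(candidates, acoustic_score, language_score, lm_weight=1.0):
    """Return the candidate maximizing log P(audio|words) + w * log P(words)."""
    return max(candidates,
               key=lambda w: acoustic_score[w] + lm_weight * language_score[w])

acoustic = {"call an ambulance": math.log(0.4),   # acoustically slightly worse
            "fall and ambulance": math.log(0.5)}  # acoustically slightly better
language = {"call an ambulance": math.log(0.3),   # far more plausible phrase
            "fall and ambulance": math.log(0.05)}

# The language model tips the balance toward the plausible phrase:
assert decode(acoustic, acoustic, language) == "call an ambulance"
```

Real decoders search over vastly larger hypothesis spaces, but the scoring principle — acoustic evidence weighed against linguistic plausibility — is the same.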
The ASR output is then sent to the Speech Informant component (see Figure 1-6) where the
proposed “best match” utterance is processed. Within the Speech Informant component, an
attempt is made to help the computer “understand” the meaning of the utterance. From this point
on, processed speech from the Speech Informant is sent to the Dialogue Handler as illustrated in
Figure 1-6, with the breakdown in Figure 1-8. Inside the Dialogue Handler, the dialogue
controller consults the dialogue history, the current dialogue set, and the dialogue state, and
determines how next to respond to what the user said. Once the Dialogue Manager decides how to proceed,
the Response Handler is then activated where a response can be generated, or a call to the
responder can be made.
Figure 1-8: Inside the dialogue handler component of an SDS.
[Figure shows input from the ‘Speech Informant’ component entering the Dialogue Handler, where the Dialogue Manager’s dialogue control consults the dialogue history, dialogue state, and dialogue set, with outputs to the Call Responder and to Response Generation.]
The Response Handler is illustrated in Figure 1-9.
[Footnote 1: The speech sounds are usually sub-word units such as phones, the smallest units of sound of a language (Gold & Morgan, 2000; Jurafsky & Martin, 2009).]
Figure 1-9: The internal components of the response handler within the SDS.
[Figure shows input from the Dialogue Manager entering the Response Handler: Response Generation (select response from a database of dialogue text), Speech Synthesis (speech output drawn from a database of spoken dialogue, sent to the speakers), and the Call Responder (responder information and response request history; initiate/confirm the call, after which the responder is on route).]
Aspects of the diagram are derived from Möller (2005). Inside the Response Generation
component, a database of possible dialogue responses (text) is searched for the response
requested by the Dialogue Manager. This response is then sent to the ‘Speech Synthesis’ component,
which searches a database for the desired spoken dialogue units, synthesizes the text to speech if
necessary (pre-recordings of output dialogue may be used), and sends the response out to the
user through a speaker system. If the Call Responder component is activated, the Call Responder
might check for a preferred responder or look through a history of requests to inform the
Dialogue Manager if any further query is required. Once a desired responder is confirmed, the
call to the desired responder is initiated.
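The flow just described can be sketched as follows; the database contents, field names, and confirmation rule are all hypothetical.

```python
# Sketch of the Response Handler flow: Response Generation looks up the
# requested response text, Speech Synthesis produces spoken output (stubbed;
# pre-recordings may be used), and the Call Responder confirms a preferred
# responder before initiating the call. All names are hypothetical.

RESPONSES = {"confirm_ambulance": "I will call an ambulance. Is that right?"}

def respond(request, preferred_responder, request_history):
    text = RESPONSES[request]          # Response Generation: database lookup
    spoken = "<audio:" + text + ">"    # Speech Synthesis step
    request_history.append(request)    # keep a history of requests
    # Call Responder: confirm against the responder on file, or query further
    if preferred_responder is None:
        return spoken, "query user for a responder"
    return spoken, "calling " + preferred_responder

history = []
spoken, action = respond("confirm_ambulance", "ambulance", history)
assert action == "calling ambulance"
assert history == ["confirm_ambulance"]
```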
Given this background knowledge of how the HELPER SDS should function in theory, the next
part of the literature review will focus on how these technologies function in the real-world.
1.4.2 PART II: Human to Computer Spoken Dialogue Interactions
1.4.2.1 Variables Affecting ASR Recognition Accuracy
The attempt to simulate or replicate the human ability to recognize and understand speech using
technology has been a growing area of research for over 60 years (Anusuya & Katti, 2009; Gold
& Morgan, 2000). Although considerable progress has been made in the field of ASR, a human’s
capacity for speech recognition and understanding in a range of environments is still unmatched
and is superior to that of any machine (Dusan & Rabiner, 2005; Furui, 2003; Scharenborg,
2007). A major source of ASR error arises from a mismatch between the speech sounds used to
train the acoustic models and the actual incoming spoken speech to be recognized (Furui, 2003;
King, 2006). There are many reasons why this mismatch occurs. King (2006) provides a concise
summary of these potential sources of speech variation, which ultimately affect ASR recognition
accuracy:
(1) inherent speaker variability: even with the same speaker, a speech sound is not repeated
in exactly the same way twice;
(2) speech production and rate: speech variation is best quantified by the rate of speaking
(e.g., the speed of speech output) and ‘speech production processes’ such as how speech
is spoken. For example, read, planned, and spontaneous speech, such as reading a
newspaper, presenting prepared lecture notes, or ‘everyday’ talk in conversations
respectively, are all acoustically and linguistically different from each other (Furui, 2003;
King, 2006);
(3) human machine adaptation: humans have been shown to adapt their speech (simplify and
reduce speed) when speaking with a machine;
(4) out-of-vocabulary sounds: disfluencies in spoken output such as word fragments,
repeated words/phrases, repairs, and similar phenomena can lower recognition rates.
Scharenborg (2007) also adds, with continuous speech recognition (i.e., recognition of
many incoming words at once) the common ASR system will always propose a possible
output based on its vocabulary. This means that the ASR system lacks the ability to
identify out-of-vocabulary words or non-words;
(5) overlapping conversation: overlapping speech results in signal mixing which can also
reduce recognition rates;
(6) microphone considerations: variation may result when using different microphones for
recording speech samples for ASR training and picking up incoming speech during
testing. Microphone positioning and distance from the speech source also affect the
speech waveform (Jurafsky & Martin, 2009);
(7) background noise and reverberant environments: the incoming speech signal may be
masked or degraded by background noise or interference within reverberant
environments making recognition even more challenging.
1.4.2.2 The Older Adult Voice and ASR
Taking a closer look at ASR use by older adults, Lippmann (1997) asserted that the
characteristics of the naturally aged voice are harder to recognize for commercial ASR systems,
which are typically designed for a non-disordered, specific-accent, younger adult age group. Other
research studies show that in an emergency or stressful situation human speech may become
altered, potentially to the point of impairment or disorder, as a result of medical
trauma, disease, or strong emotion (Devillers & Vidrascu, 2007; Fogle et al., 2008; Handschu et
al., 2003; LaPointe, 1994; Patil & Hansen, 2007) (p. 359).
Research literature suggests that age-related voice deterioration may start around the age of 60,
but the degree of deterioration is highly dependent on the individual’s health and well-being
(Ramig, 1994) (p.494). The physical changes associated with natural aging can affect the older
adult’s ability to express speech. Table 1-1 summarizes how certain physical changes can affect
speech expression (Gorham-Rowan & Laures-Gore, 2006; D. Hall & Sinard, 1998; Linville,
2002; Zraick, Gregg, & Whitehouse, 2006). These changes can ultimately alter speech acoustics
and an ASR’s overall ability to accurately recognize the speech.
Table 1-1: Physical changes in the older adult and the possible effects on speech expression.
Physical change | Effect on speech expression
increased respiration frequency | intra-word pauses
decreased muscle efficiency, increased tissue stiffness, and a dry laryngeal mucosa (could affect vocal tract resonance, phonation, and speech articulation) | changes in fundamental frequency or pitch; articulation imprecision (e.g., longer voice-onset time, longer duration of vowels and consonants); increased voice perturbations (e.g., tremor, spectral noise, hoarseness); decreased voice intensity
slower cognitive function | slower speaking pace
Wilpon & Jacobsen (1996) found that the accuracy of their ASR system, which was trained with
adult speech, was reduced when recognizing the speech of older adults over 70 years of age. Studies by
Anderson et al. (1999) and Baba, Yoshizawa, Yamada, Lee, & Shikano (2004) further showed
that an ASR acoustic model trained using older adult speech was better able to recognize older
adult voices than an acoustic model trained with only younger adult speech. These study findings
were also supported by Vipperla et al. (2009), who examined speech recognition accuracy using
an ASR acoustic model trained with in-context speech from younger adults versus in-context
speech from older adults. They found that the ASR word error rate (WER) dropped for both younger and
older adult users when using acoustic models trained with similarly age-matched
speech samples: the WER dropped from roughly 33% (baseline) to 25% for the
older adult users and from roughly 22% to 11% for younger adult users. These studies clearly
show that ASR recognition accuracy can be improved if the ASR acoustic model is trained with
the same type of speech and in a similar context to the type of speech it would expect to receive
when implemented in the real-world.
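For reference, the word error rate cited in these studies is the standard ASR metric: the minimum number of word substitutions, deletions, and insertions needed to transform the recognizer's output into the reference transcript, divided by the number of reference words. The following is a minimal illustrative computation, not drawn from any of the cited systems:

```python
def word_error_rate(reference, hypothesis):
    """Compute WER = (S + D + I) / N via word-level edit distance.

    Assumes a non-empty reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution over four reference words gives a WER of 0.25:
# word_error_rate("help i have fallen", "help i have falling")
```

A 33% to 25% drop, as Vipperla et al. (2009) report for older adults, therefore means roughly one fewer erroneous word for every twelve words of reference speech.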
In terms of error recovery, Takahashi, Morimoto, Maeda, & Tsuruta (2003) asserted that because
“current speech recognition technology is far from perfect and cannot completely avoid the
recognition errors, many researchers try to develop robust system[s] which can detect and
recover from the system’s misunderstanding.” Furui (2003) and Vipperla et al. (2009) also
support this statement. As mentioned previously, the requirement that users confirm their
responses within the conversation, in essence, enables the system to recover from recognition
errors. However, Hamill et al. (2009) noted that the probability of having two errors occur in a
row was high; therefore, the system still needs to be made more robust.
1.4.2.3 The Older Adult User and Spoken Dialogue Systems
Although the volume of research literature that discusses the use of SDSs with older adult users
is small, the findings are consistent. Research studies show that older adult users exhibit definite
patterns of interaction and linguistic variability with SDSs which are different from their younger
counterparts (Georgila et al., 2008; Wolters et al., 2010; Wolters, Georgila, Moore, &
MacPherson, 2009). For this reason, it is important to consider how older adults interact with all
components of the SDS and incorporate those features that will facilitate their ease of use with
these end-users (Wolters et al., 2010; Zajicek, Wales, & Lee, 2004). Research conducted by
Wolters et al. (2009) identified two types of SDS user groups. Factual users were those who
adapted to the system and who used a concise communication style (only necessary keywords)
with fairly uniform behavior. Social users were those who treated the system similar to a human
being and who did not adapt their interaction style. The social users were characterized “by more
interpersonal communication, higher verbosity [had longer dialogues], and greater variability
between users” (Wolters et al., 2009). Younger adults were found to be mainly factual,
whereas just over one-third of older adults were factual and the remaining nearly two-thirds were
social (Wolters et al., 2009). Georgila, Wolters, & Moore (2010) found that the dialogue of older
adult users showed more initiative and repetition of information than that of younger users. They
used a richer vocabulary and were more social (Georgila et al., 2008), which was expressed in
their speech by using more "definite articles, more auxiliaries, more first person pronouns, and
most importantly, more lexical items related to social interaction, such as 'please' and 'thank
you'” (Mӧller, Gӧdde, & Wolters, 2008). Compared to younger adults, Wolters et al. (2010)
found that the communication style and interaction of older adults were affected by age,
speech, cognition, hearing, language comprehension, language production, short-term memory,
and affinity to the technology. In essence, these studies underline the importance of designing
and testing the HELPER communication dialogue system with the older adult user, keeping in
mind the actual environment in which the system will be used.
1.4.2.4 Spoken Dialogue Strategy
The dialogue strategy used within a SDS can be divided into three main types: (1) system-
initiative, (2) mixed-initiative, and (3) user-initiative. A system-initiative SDS is one in which
the system’s communication or dialogue script is followed with no user deviation, but this may
lead to “long and tedious interactions and generally unnatural dialogues” (Wolters et al., 2009).
A mixed-initiative SDS allows for shared initiative taking with the system expecting the user to
respond to prompts but more information can be provided than what was requested (i.e., over-
answering allowed). Lastly, in a user-initiative system, the user can change the dialogue structure
freely; however, Wolters et al. (2009) stated that “given the limitations of current ASR and
Natural Language Understanding, user-initiative often leads to many errors and
misunderstandings throughout the interaction.” In this study, the social users were found to be
“less efficient and less satisfied” with the system-initiative SDS that they interacted with
(Wolters et al., 2009). Therefore, Wolters et al. (2009) suggested a mixed-initiative system may
be better to incorporate this group of users. However, the tradeoff of using a mixed-initiative
approach would be the need to increase the complexity of the ASR and Natural Language
Understanding processes as well as add better error recovery techniques, and the increased risk
of task failures (Wolters et al., 2009). Although older adults were harder to stereotype, it was
shown that they are able to learn how to speak to a system if help is given when errors are
encountered (Mӧller et al., 2008). In the HELPER prototype, a system-initiative dialogue
strategy is currently used. However, in an emergency situation, if the user utters words other than
‘yes’ and ‘no’, it would be important to use at least a mixed-initiative dialogue strategy so that the
system can accept non-yes/no words of significant importance, such as “I need an
ambulance.”
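The suggested behaviour, a system-initiative prompt that nevertheless accepts a small set of high-importance non-yes/no keywords, can be pictured as a thin interpretation layer over the ASR output. The keyword sets, labels, and function below are hypothetical illustrations of the idea, not part of the HELPER prototype:

```python
# Illustrative keyword sets; the actual HELPER vocabulary is the subject
# of the keyword-identification study described later in this thesis.
AFFIRMATIVE = {"yes", "yeah", "yep"}
NEGATIVE = {"no", "nope"}
EMERGENCY_KEYWORDS = {"ambulance", "fall", "fallen", "chest", "bleeding", "help"}

def interpret_reply(utterance):
    """Map a recognized utterance to a dialogue action.

    High-importance keywords override the plain yes/no interpretation,
    giving a minimal mixed-initiative 'over-answering' behaviour.
    """
    words = set(utterance.lower().split())
    matched = words & EMERGENCY_KEYWORDS
    if matched:
        return ("escalate", sorted(matched))  # e.g., "I need an ambulance"
    if words & AFFIRMATIVE:
        return ("confirm", [])
    if words & NEGATIVE:
        return ("deny", [])
    return ("reprompt", [])  # unrecognized: ask the question again
```

Under this sketch, a strictly system-initiative design would only ever reach the confirm/deny/reprompt branches; the escalation branch is what the mixed-initiative extension adds.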
1.4.3 PART III: Human to Human Emergency Dialogues
1.4.3.1 Emergency Response Call Basics
Armed with some knowledge of how older adults interact with SDSs and the challenges of using
ASR in the real-world, the literature review will now shift focus to look at what is known about
emergency response dialogue. Personal emergency response calls could be considered a
subset of the general (i.e., 911) emergency response call. Where the emergency response call is
concerned primarily with the ‘where, who, what, why, how, and when’ of a situation (Imbens-
Bailey, 2000), the personal emergency response call is mostly concerned with the ‘what, why,
and how’. Call takers already have access to information on the ‘who’ and ‘where’ and the
‘when’ is assumed to be ‘now’. Although there is a vast amount of research literature
surrounding emergency situations and emergency medicine (there are entire journals dedicated to
these topics), only a small body of research literature was identified that specifically examined
emergency response call conversation organization, structure, and call handling (Cromdal,
Osvaldsson, & Persson-Thunqvist, 2008; Garner & Johnson, 2007; Imbens-Bailey, 2000;
Waseem, Durrani, & Naseer, 2010; Whalen & Zimmerman, 1987). No research literature has yet
been found through the university library and internet searches that pertains specifically to the
conversations of personal emergency response calls with older adults, their organization,
structure, and/or call handling methods. However, a summary of personal emergency response
call protocol was obtained from the private call centre that provided the recorded calls used in
this research.
1.4.3.2 Emergency Response Call Structure
Knowing the structure of an emergency response call will be helpful in identifying potential
differences with respect to personal emergency response calls. The emergency response call
generally follows a basic pattern or call sequence including: (1) an opening sequence
(identification/greeting/acknowledgement), (2) a request sequence (basic information exchanged
about why the caller is requesting aid), (3) an interrogative series (dispatcher elicits further information
as required), (4) a response (offer/deny a response to the request or complaint), and (5) a closing
(dispatcher may assure caller that help is on the way) (Imbens-Bailey, 2000; Whalen &
Zimmerman, 1987; Zimmerman, 1992a, 1992b). Zimmerman (1992b) commented that the ER
call sequences presented could be “modified, augmented, and used repetitively or not at all”
depending on the situation. Imbens-Bailey (2000) further identified speech acts within the
discourse. The speech acts are labeled “SA#” in Table 1-2.
Table 1-2: Emergency Response Call Discourse and Speech Acts.
ER Caller: Opening (Greeting/Acknowledgement/Identify); Reason for Call: report problem (SA1: descriptive) or request (SA2: direct/demand; SA3: indirect); Ambient (no speaking); Closing
ER Dispatcher: Opening (Greeting/Acknowledgement); SA1: Compliance to need; SA2: Acknowledge/Confirmation; SA3: Elicit further information; SA4: No Response; Closing
Using the call centre prototype manual as a guide (Private_PERS_Call_Centre, 2008), the
personal emergency response call structure appears to follow fairly close to that of the
emergency response call sequence, with a few exceptions. Most notably, in the case of the
emergency response call, the caller risks being denied assistance, whereas in the
personal emergency response call, assistance is always provided unless the caller denies that they
need help. Furthermore, the call taker must obtain consent from the caller to dispatch aid and
allowance is given for the caller to choose from different responders. The basic call protocol for
the personal emergency response call taker is outlined as follows:
Step 1: Greet the subscriber or caller – a template structure is provided
Step 2: Identify yourself and confirm needs
Step 3: Get consent from subscriber or caller to dispatch
Step 4: Dispatch EMS or contact appropriate help
Step 5: Reassure the subscriber or caller
Step 6: Follow up and follow through with the alarm
Step 7: Reset the unit
Step 8: Close the alarm
From this protocol, only steps 1 to 3 will be considered within this dissertation. In terms of step
1, a common opening is used by all call takers. The opening script is: “Hello {Subscriber Name},
this is {Call Taker Name} from {Call Centre Name}, how may I help you?” The call taker’s
ultimate goal is to dispatch appropriate assistance. The protocol instructs them to: (1) assess the
situation, (2) determine if help is needed, (3) get permission from subscriber to place on hold,
and (4) call for appropriate help. To carry out these instructions, the guidelines for conversation
include the call takers asking questions to elicit a positive or negative response, to repeat back
the exact words used by the caller, and to probe to “establish the nature of the emergency and the
assistance required.”
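As a simple illustration, the scripted opening in Step 1 amounts to filling slots in a fixed template. The names used below are placeholders, not real call centre data:

```python
# Template reproduced from the call taker protocol; slot names are
# placeholders chosen for this illustration.
OPENING_TEMPLATE = ("Hello {subscriber}, this is {call_taker} "
                    "from {call_centre}, how may I help you?")

def opening_line(subscriber, call_taker, call_centre):
    """Fill the call centre's scripted greeting (Step 1 of the protocol)."""
    return OPENING_TEMPLATE.format(subscriber=subscriber,
                                   call_taker=call_taker,
                                   call_centre=call_centre)
```

A fixed, slot-filled opening of this kind is straightforward for an automated system such as the HELPER to reproduce; the open-ended "how may I help you?" that follows it is where the ASR and dialogue-management challenges discussed in this thesis begin.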
In terms of the medical aspect of the response call, the call takers are given some basic
information. The following information in this paragraph is taken from the call centre manual
(Private_PERS_Call_Centre, 2008). The manual describes a “medical distress” as being the state
in which the subscriber is experiencing the following symptoms: severe chest pains, suffering
from a stroke, difficulty breathing freely due to lack of oxygen, suffering from a seizure attack,
hemorrhaging (bleeding), suffering from insulin shock, or having an allergic reaction to
medication. To identify whether the caller is in medical distress, the call taker is advised to look for
symptoms including difficulty breathing, chest pains, excessive bleeding, nausea/vomiting, and other
pains, discomfort, or weakness. If the call involves a fall or an injury has occurred, the call taker is
instructed to determine the cause of the fall and when it occurred, how the caller fell (e.g., down
stairs or off a bed), any injuries (e.g., broken limbs), and whether the caller is fully awake, having difficulty
breathing, or is bleeding. The main conditions call takers are to report to EMS Responders
(e.g., 911) include whether the PERS user or subscriber is conscious and alert, breathing or
having difficulty breathing, and if he/she is bleeding severely. This information is aligned with
the “ABC’s” of emergency response: the mnemonic used by emergency responders to assess
perceived patient acuity (Canadian_Red_Cross_Association, 2006). The ‘ABC’s’ stand for
“airway” (is the airway clear for breathing and is the person conscious?), “breathing” (is the person
breathing? Can they talk?), and “circulation” (any injury to the circulatory system or signs of shock?)
(Canadian_Red_Cross_Association, 2006).
1.4.4 Literature Review Summary
In summary, despite the demonstrated benefits gained from using PERS technology, many
barriers to technology adoption and use exist which can only be addressed by re-designing how
the PERS is used as well as re-thinking how PERS technology can be applied and made more
desirable for end-users. The HELPER is one proposed solution. One of the difficulties in further
developing the communication capability of the HELPER is the need to design the system for
actual end users in real PESs. This means ensuring that the HELPER’s communication module
includes a Speech Handler component that can receive incoming adult and older adult speech
during a PES, then process, decode, and decipher its meaning; a Dialogue Handler component
that can coordinate the personal emergency response conversation with the PERS user; and
finally a Response Handler component that can output the necessary dialogue response or
contact an emergency responder as required. Whether it is possible to carry out a spoken
dialogue conversation sufficiently well in actual PESs with real users remains to be seen.
However, in order for the design and development of the HELPER to proceed any further, there
is a need for specific knowledge and tools that currently do not exist. Specifically, there is a gap
in the PERS literature that characterizes personal emergency response calls and call
conversations in detail. Furthermore, no suitable training and testing database containing Canadian
English speech from older adults in PESs could be identified that could be used for
training the HELPER ASR and other SDS components.
1.5 Research Purpose and Objectives
This research begins to address these gaps in knowledge and a spoken database tool has been
developed that can be used to advance the HELPER development. The main information source
used for this research was a collection of real personal emergency response calls attained from a
private personal emergency response call centre. Generally speaking, the overall goal was to
derive knowledge and data from analyses of the acquired response calls and to identify ways to
apply the research findings to help further the design and development of the HELPER
communication module. These response calls were analysed at various levels including the word,
speaker turn, conversation, and call levels. Three research objectives were identified:
Objective 1: to identify keywords and phrases used by existing PERS users in various
personal emergency response call situations.
Objective 2: to identify significant trends in personal emergency response calls and call
conversations that may be used to tailor the call response to the user.
Objective 3: to design and develop a corpus of speech to be used for training
and testing the communication module of the HELPER system.
The first objective was the focus of Study 1. In order to improve the HELPER communication
module’s ability to understand the end-user speaking words other than “yes” and “no” (and their
various forms), the recommendation from previous researchers was to expand the system
vocabulary. Research by Takahashi et al. (2003) (Vipperla et al., 2009) also supports the fact that
simple “yes” and “no” responses are not the only answers spoken by patients responding to
close-ended yes/no type questions (in their case, medical type questions). One method for
identifying keyword vocabulary would be to perform an analysis of response call conversations
in order to identify keywords and word combinations spoken by PERS users during various
response calls and PESs. In addition to identifying the keywords, a method for grouping the
keywords and characterizing the different PESs was also important for determining which of the
keywords are spoken during different PESs. The main research outcome from Study 1 can be
applied to improving the HELPER’s speech handler.
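At its simplest, the keyword-identification step described here amounts to tallying candidate content words across caller turns in call transcripts. The sketch below is an illustrative simplification: the stopword list and sample utterances are invented, and the actual study applied a richer content analysis than raw frequency counting:

```python
from collections import Counter
import re

# Invented, minimal stopword list for illustration only.
STOPWORDS = {"i", "the", "a", "an", "to", "is", "my", "and", "you"}

def keyword_counts(transcripts):
    """Tally candidate keywords across caller-turn transcripts."""
    counts = Counter()
    for text in transcripts:
        # Keep letters and apostrophes so contractions like "can't" survive.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

# Invented sample utterances of the kind a PERS caller might produce:
calls = ["I fell and I can't get up",
         "My chest hurts, I need an ambulance"]
# keyword_counts(calls).most_common() would surface "fell", "chest", etc.
```

Frequency counts alone would not distinguish keyword function or situation, which is why the study pairs the keyword list with word categories and a situation classification.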
The second objective was the focus of Study 2. In order to improve the HELPER communication
module’s intelligence, especially in dialogue planning and decision making, it was important to
focus on the main goal of the HELPER which is to provide an appropriate emergency response
to the end-user as quickly as possible. In order to achieve this goal, it would be important to
know when different types of responses (targets) are requested, what kind of dialogue should be
used to respond to the user, and how much time the system has to respond to a call. To facilitate
the HELPER’s decision making ability, it was also important to identify what decisions could be
made based on the incoming speech. In addition to recognizing spoken words, knowledge of
conversational patterns and call statistics may prove useful in helping the HELPER manage and
structure the call dialogue or even foreshadow the probable target response. For example, if all
fall calls were found to result in a request for a care provider, then the HELPER dialogue could
be designed to automatically suggest contacting a care provider responder for all identified fall
calls. The main research outcome from Study 2 could be applied to improving the HELPER’s
artificial intelligence (including decision making and dialogue management).
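A trend of this kind, such as the hypothetical fall-to-care-provider example above, could be encoded as a simple prior over target responses that pre-informs the dialogue manager. The mapping below is invented purely for illustration and is not a finding of this research:

```python
# Hypothetical prior: call reason -> likely target response. The actual
# associations are exactly what Study 2 sets out to discover empirically.
RESPONSE_PRIOR = {
    "fall": "care_provider",
    "medical": "ems",
    "accidental_activation": "cancel",
}

def suggest_responder(call_reason):
    """Pre-inform the dialogue manager with a likely target response.

    Falls back to asking the user when no trend is known for the reason.
    """
    return RESPONSE_PRIOR.get(call_reason, "ask_user")
```

With such a prior in place, the dialogue manager could open with a tailored suggestion ("Shall I contact your care provider?") instead of a generic prompt, while still deferring to the user's actual reply.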
The third and final objective was the focus of Study 3. In terms of improving the HELPER
communication module’s ability to robustly recognize speech from real end-users, prior research
suggested that training the ASR system with closely matched end-user speech in similar
situations could improve word recognition rates. Improving the system’s natural language
understanding would also help in correctly understanding user utterances. As well, in-context
system testing with end-user voices in PESs would be beneficial for testing and fine tuning the
HELPER communication module. In terms of obtaining speech samples of end-users, especially
older adults in PESs, ethically, it would not be feasible to create emergency response situations.
As well, these situations would be difficult to predict in advance and then record live while
remaining an “uninvolved” bystander. However, simulated or enacted emergency situations
would be possible. In order to realistically recreate a response call scenario, prior knowledge
about what actually happens and how speakers converse during a response call conversation is
needed. As previously mentioned, no research literature could be identified detailing this
information for personal emergency response calls. A review of the research literature and
existing speech corpora was also not successful in uncovering any speech corpus suitable for
ASR training that contained older adult speech in PESs in Canadian English. Other types of
databases were available with older adult users, such as the MATCH corpus described by
Georgila, Wolters, Moore, et al. (2010), but this corpus does not contain speech in PESs. The
acquired response calls from the personal emergency call centre were also not of sufficient
recording quality to use for ASR training. Additionally, obtaining consent to use the caller’s
recorded voices would be difficult. Likewise, privately held databases were not readily
accessible and we were unsure whether these collections contained appropriate content or sound
quality for testing the HELPER. Thus the decision was made to design and construct our own
corpus containing older adults speaking in mock emergency situations based on the real response
calls acquired. The main research outcome from Study 3 was a spoken corpus tool that can be
used to train and test components of the HELPER communication module.
Chapter 2
2 Identification of Keywords and Phrases Spoken by Callers in Personal Emergency Response Calls
2.1 Prologue
This chapter describes the process of analyzing personal emergency response calls in order to
isolate keywords and phrases used by PERS callers, categorize keywords by word function, and
develop a way to model personal emergency situations. The process of reducing the original
keyword set to a smaller set for inclusion into the CARES corpus is also described. This study
uses both qualitative and quantitative methods to explore the real call data. The contents of this
chapter are intended for publication but have not yet been published.
2.2 Abstract
Purpose: A novel automated, intelligent, spoken dialogue-based personal emergency response
system concept is being developed in an attempt to address the existing usability barriers
identified by prior research groups of traditional push-button type personal emergency response
systems. The main purpose of this study is to identify the keywords and phrases used during
various personal emergency response call situations in order to help, in future, tailor the spoken
dialogue system of an automated personal emergency response system to the end-user.
Method: An emergent, exploratory, sequential mixed methods design was used for this study
with word categories and response call classifications identified qualitatively and keywords and
phrases identified quantitatively using content analysis of personal emergency response calls.
Results: 18 word categories, 402 keywords, and 135 phrases from 84 personal emergency
response calls were identified in this study. The personal emergency response situations were
classified according to three categories: caller type, risk level, and call reason. The keyword list
was selectively reduced to 185 keywords and phrases for inclusion into a speech
database. Using the reduced keyword list and the risk level classification, common and unique
keywords were identified for low, medium and high risk personal emergency situations.
Conclusion: The results of this study can be used to improve the spoken-dialogue component of
the novel automated personal emergency response system by expanding the system’s automatic
speech recognition capability with keyword vocabulary; by improving the system’s ability to
understand incoming speech using keyword categories; and by enhancing the system’s ability to
classify a call based on pre-identified patterns or trends in keyword usage during different
personal emergency situations. This work will contribute to the future development of the
automated personal emergency response system’s speech handler and provides further
knowledge about the characteristics of actual personal emergency response calls and call
conversations.
2.3 Introduction
2.3.1 Need for a New PERS
Research studies have found that older adults, or individuals 65 years of age and older, who remain
living in their communities or who ‘age-in-place,’ tend to age more successfully
(World Health Organization, 2011). They live longer and with a higher self-perceived quality of
life compared to those who age “out-of-place” in institutions such as long-term care, nursing
homes, or hospitals (Ramage-Morin, 2005). One specific assistive technology being used to
facilitate aging-in-place is the personal emergency response system or PERS. The PERS was
developed in the early 1970s and was designed to provide individuals at higher risk for medical
complications and/or with mobility difficulties quick access to emergency assistance any time of
the day or night at the push of a body-worn button activator (Dibner, 1993). By providing access
to emergency care when needed, the PERS technology can be used to prevent or alleviate the
negative consequences that may arise when care is received too late (e.g., after a long lie, a
heart attack, or a stroke). In addition, PERS use has also been shown to decrease overall health care
costs and ease care provider and user anxiety (Mann et al., 2005; Montgomery, 1993; Roush et
al., 1995). However, despite the many benefits of using a PERS, only a small percentage of older
adults have adopted the technology and actually use it when needed (Bernstein, 1999; Fallis et
al., 2007; Hessels et al., 2011; Hizer & Hamilton, 1983; Mann et al., 2005; Porter, 2005; Roush
et al., 1995). Research studies have attributed the reasons for resistance to PERS use to barriers
spanning the physical, social, and psychological realms (Davies & Mulley, 1993; Hessels et al.,
2011; Mann et al., 2005; Porter, 2005). In light of these findings, researchers have concluded that
there is a need for a better designed PERS; one that is more tailored to the needs of the older
adult and overall, more desirable and accessible for all end-users (Blythe et al., 2005).
2.3.2 The HELPER System
To address this need, the Intelligent Assistive Technology and Systems Lab at the Rehabilitation
Sciences Institute at the University of Toronto is developing a novel, intelligent, spoken
dialogue-based PERS that is part of a larger smart home monitoring system concept called the
HELPER (“health evaluation logging and personal emergency response system”) (Belshaw,
Taati, Snoek, & Mihailidis, 2011; Hamill et al., 2009; Lee & Mihailidis, 2005; Tam, Dolan,
Boger, & Mihailidis, 2006). In theory, the HELPER would continuously monitor the home for an
adverse event (e.g., a fall) and then automatically initiate a response sequence if such an event is
detected. The person being monitored would communicate first with an artificially intelligent
HELPER call taker who would connect the user to their desired live responder. Using speech or
vision to activate the PERS removes the need to wear a body-worn activator such as the
traditional PERS “push-button” and will hypothetically increase the user's autonomy and privacy
by permitting the user to either direct or cancel the call before reaching a live call operator.
2.3.3 HELPER Prototype Testing
In terms of the current state of HELPER development, feasibility testing of a HELPER prototype
by previous researchers has successfully demonstrated that automatic system activation (i.e., via
camera detection of a simulated adverse event) followed by human-to-computer communication
using spoken-dialogue and automatic speech recognition (ASR) is possible (McLean, 2005).
Prototype testing was performed with younger adults in a controlled lab environment with the
ASR set to recognize “yes” and “no” word forms (McLean, 2005). The next step would be to
further design, develop, and fine-tune the communication module to work with actual end-users,
especially older adults, in real personal emergency situations (PESs). Only after this step is
completed should the system be field-tested with end-users in live emergency situations (Hamill
et al., 2009).
2.3.4 Designing for the End-User
The importance of considering the end-user and the real-world environment in the design of the
HELPER is supported by previous research studies that focus on ‘universal design’ and the older
adult use of SDSs. The ‘universal design’ approach as described by Federici & Scherer (2012) is
based on the premise that, “…designing products to match a mythical average of human abilities
and conditions is in conflict with the fact that all human users are diverse and experience
different personal and environmental circumstances. Inaccessible mainstream products and
services designed with a focus on a narrow subset of human functioning, such as information and
communication technologies, medical equipment, and physical infrastructure, can impose
significant barriers on people with disabilities and people who are aging.” (Section 1.3.6, p.18).
Essentially, it cannot be assumed that older adult users will interact with the automated PERS in
the same way as younger users, or even uniformly within the same age cohort. As well, different PESs
may also change the way users interact with the system.
Prior research on SDSs has shown that users do not always respond with strictly “yes” or “no”
responses when asked questions that require “yes” and “no” answers (Takahashi, Morimoto,
Maeda, & Tsuruta, 2003). Other research has also demonstrated that older adults do not interact
with SDSs in the same way as their younger counterparts. In fact, a majority of older adults use
both acoustically and linguistically different speech expressions compared to younger adults
when interacting with SDSs (Georgila et al., 2008; Wolters et al., 2010, 2009). Based on these
findings, in order to further develop the SDS and especially the Speech Handler component, it is
important to identify what type of dialogue and vocabulary are used in an actual PES by end-
users and whether different dialogue and vocabulary patterns exist for different PESs.
To the author’s knowledge, aside from the personal emergency response call company’s call
taker protocol manual, no research literature examines personal emergency response call
conversations, the words used within a conversation, or the dialogue patterns that may occur in
different PESs. Not knowing how PERS users respond during PESs makes it extremely difficult
for HELPER technology developers to universally design for end-users in actual situations.
2.3.5 Study Objective and Significance
In order to consider the end-user in the design and development of the HELPER, it was
important to identify a way to capture samples of PES conversations, either live or recorded. In
real life, PESs are not the kind of events that can be easily predicted or ethically induced.
Consequently, it was hypothesized that recorded samples of real personal emergency response calls (herein also referred to as the "call" or "response call") would be the most feasible and useful source of end-user conversation samples in the context of such situations. This study focuses on the
the analyses of a collection of real personal emergency response calls. The main objective of this
study was to identify the keywords and phrases used by existing PERS users in various
personal emergency response call situations.
According to Haggag (2013), keywords are significant words or terms that can "best present the document context in brief and relate to the textual context." Being able to identify these
keywords and phrases would be significant not only to individuals or organizations wishing to
better understand how PERS users communicate their needs in a personal emergency response
situation during a response call, but also, for the technology designers developing the HELPER
communication module or other similar technologies. Relevant background will be presented
first, followed by the study methodology, results, discussion, and conclusions.
2.3.6 Background
2.3.6.1 An Automated and Intelligent HELPER
Figures 2-1 and 2-2 illustrate the pathways to personal emergency response using the traditional push-button PERS and the HELPER, respectively. Both pathways are designed to engage the user in conversation, and the target responses are similar. In the traditional push-button PERS (Figure 2-1), however, the interaction is between the caller and a live call taker, whereas in the HELPER (Figure 2-2) the interaction is between the caller and the HELPER computer. Instead of being activated by the user pushing a button, the automated PERS component is triggered automatically when the Vision Module of the HELPER computer (Figure 2-2, 2a) detects the occurrence of an adverse event. The user may also activate the system manually by saying a specific keyword or phrase (e.g., a cry for help). Using speech or automatic event detection to activate the automated PERS essentially eliminates the need to wear a button activator continuously.
[Figure: diagram showing (1) the personal emergency situation, in which the user activates a speaker phone or telephone via a push-button activator; (2) the personal emergency call centre call taker, a live person who establishes through spoken dialogue who is calling, the call reason, the situation risk level, and the response required; and (3) the call response: emergency response services, personal responder(s), or no response (false alarm).]
Figure 2-1: Pathway to personal emergency response using the traditional push-button PERS.
[Figure: diagram showing (1) the personal emergency situation, monitored by a ceiling/wall/shelf-mounted camera, speaker, and microphone; (2) the HELPER computer, activated by speech or vision, comprising (2a) a Vision Module (is the person present? is the person active? is the activity/inactivity normal? activate communications?) and (2b) a Communication Module that establishes through spoken dialogue who is calling, the call reason, the situation risk level, and the response required; followed by (3a) the PERS call taker (a live person) or (3b) the call response: emergency response services, personal responder(s), or no response (false alarm).]
Figure 2-2: Pathway to personal emergency response using the HELPER System.
When using the automated PERS, communications are managed within the Communication
Module (Figure 2-2, 2b) and will occur through spoken dialogue between the user and the
HELPER computer. With the HELPER computer as a first responder, the PERS user’s autonomy
can be maintained. The user has the ability to directly request their desired target responder or to
cancel a false alarm call before reaching a live operator. In essence, the automated PERS
functions similarly to a hands-free telephone but with specialized and intelligent features.
2.3.6.2 The HELPER Communication Module
The ability of the HELPER computer to communicate with a human user "verbally" over several speaker-turns places its communication module into a category of interactive dialogue systems called a spoken dialogue system (SDS) (Fraser, 1997). According to Möller (2005), an SDS is characterized by its ability to accept continuous speech, allow for user initiatives, reason, detect errors or incoherence, correct itself, and anticipate and/or predict the spoken user response. An SDS typically comprises at least five functional components (Georgila, Wolters, Moore, et al., 2010; Lamel et al., 2000; Möller, 2005):
(1) The Automatic Speech Recognizer (ASR) - receives an acoustic signal (spoken input)
and transforms this into a most probable word sequence;
(2) The Semantic Analyser or Natural Language Understanding component - deciphers the
meaning or intention of the probable word sequence;
(3) The Dialogue Manager – maintains the dialogue and keeps a history of responses;
(4) The Response Generation component – determines the output dialogue according to “the
dialog state, the user utterance, and/or information returned from the database” (Lamel et
al., 2000);
(5) The Speech Synthesis – converts selected system utterances to actual speech output.
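As an illustration only, the five components above can be sketched as a toy pipeline; every function, class, and rule below is a hypothetical placeholder, not part of the HELPER implementation:

```python
# Illustrative skeleton of the five SDS components listed above.
# All names and rules are hypothetical placeholders, not the HELPER's design.

def asr(audio):
    """(1) Automatic Speech Recognizer: acoustic signal -> most probable word sequence."""
    # A real ASR would decode an audio signal; here the input is already text.
    return audio.lower().strip()

def semantic_analyser(words):
    """(2) Natural Language Understanding: word sequence -> meaning/intention."""
    if "help" in words or "fall" in words:
        return {"intent": "request_help"}
    if words in ("yes", "no"):
        return {"intent": "confirm", "value": words == "yes"}
    return {"intent": "unknown"}

class DialogueManager:
    """(3) Maintains the dialogue and keeps a history of responses."""
    def __init__(self):
        self.history = []
    def update(self, meaning):
        self.history.append(meaning)
        return meaning["intent"]

def response_generation(intent):
    """(4) Determines the output dialogue according to the dialogue state."""
    return {"request_help": "Do you need an ambulance?",
            "confirm": "Okay, contacting a responder.",
            "unknown": "Sorry, could you repeat that?"}[intent]

def speech_synthesis(text):
    """(5) Converts the selected system utterance to speech output (stubbed as text)."""
    return f"<spoken> {text}"

dm = DialogueManager()
intent = dm.update(semantic_analyser(asr("Help, I think I had a fall")))
print(speech_synthesis(response_generation(intent)))
```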
According to the best practice guidelines for spoken language dialogue systems and components
produced by the DISC European project, the six essential aspects of SDS development include:
speech recognition, language understanding and generation, dialogue management, speech
synthesis, human factors, and systems integration (Lamel et al., 2000). In practice, however, while all SDSs include an ASR component and some form of speech synthesis or output, the Semantic Analyser (2), the Dialogue Manager (3), and the Response Generation (4) components range from absent, to limited in nature, to fully present and possibly complex (Furui, 2003; Lamel et al., 2000; Vipperla et al., 2009).
In the HELPER communication module, it is proposed that all the basic functional components
of the SDS be present to follow the DISC recommendations, in addition to a component for
contacting a live responder, conveniently called the “call responder” component. Figure 2-3
illustrates the proposed internal sub-components of the HELPER communication module with
the Call Responder component at the top. The Semantic Analyser or Natural Language
Understanding component of the SDS would be included inside the Speech Informant
component (located above the ASR) in Figure 2-3.
The results of this study specifically focus on improving the Speech Handler component of the
HELPER communication module. Therefore, further detail is provided only on the Speech
Handler sub-components specifically.
Taking a closer look at the ASR component, Figure 2-4 illustrates the typical internal structure of
an ASR. This diagram was derived from (Glass & Zue, 2003; Jurafsky, 2014).
[Figure: block diagram of the HELPER Communication Module. Incoming speech enters the Speech Handler (the Automatic Speech Recognizer followed by the Speech Informant); the Dialogue Handler (Dialogue Manager) maintains the conversation; the Response Handler (Response Generation and Speech Synthesis) produces the spoken output; and the Call Responder component, at the top, contacts a live responder ("responder on route").]
Figure 2-3: Sub-sections and functional components of the HELPER Communication Module
[Figure: internal structure of the ASR. Incoming speech from the user undergoes A/D conversion and feature extraction; the resulting features are passed to the Decoder, which consults three linguistic models (1. acoustic, 2. pronunciation, 3. language) and sends its output to the Speech Informant component.]
Figure 2-4: The ASR component of the HELPER Communication Module.
In Figure 2-4, incoming speech from the user (the acoustic waveform) arrives through the microphone and is digitized and processed into "numerical representations of speech information or features" that describe relevant characteristics of the speech signal for ASR (Scharenborg, 2007). These features are then sent to the Decoder, which attempts to decode the speech signal, that is, to recognize what was said, by searching through (1) a pre-assembled collection of speech sound2 representations within the acoustic model, (2) specific pronunciation rules in the pronunciation model (lexicon), and (3) grammar and language rules in the language model to identify a "best match" (Scharenborg, 2007). The ASR output is then sent to the Speech Informant component, where the semantic analyser resides. The Semantic Analyser processes the incoming "best match" utterance and attempts to "understand" or derive the meaning of the utterance.
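The decoder's search for a "best match" can be illustrated with a toy example that combines acoustic and language model scores; all hypotheses and probabilities below are invented for illustration and bear no relation to a real decoder's lattice search:

```python
import math

# Toy illustration of the decoder's "best match" search described above.
# The candidate hypotheses and all probabilities are invented; a real decoder
# searches a vast lattice of sub-word units rather than three fixed strings.

# P(acoustics | words): how well each hypothesis explains the audio features.
acoustic_score = {"i fell down": 0.30, "i felt down": 0.35, "eye fell down": 0.30}

# P(words): language-model probability of each word sequence.
language_score = {"i fell down": 0.010, "i felt down": 0.001, "eye fell down": 0.0001}

def best_match(hypotheses):
    # The decoder picks the argmax of log P(acoustics|words) + log P(words).
    return max(hypotheses,
               key=lambda w: math.log(acoustic_score[w]) + math.log(language_score[w]))

print(best_match(acoustic_score))  # -> i fell down
```

Even though "i felt down" fits the acoustics slightly better here, the language model's preference for the more probable word sequence tips the combined score toward "i fell down", which is exactly the role the language model plays in the ASR.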
ASR and semantic analysis techniques are growing areas of research, and various approaches are available (Jurafsky & Martin, 2009). With respect to ASR, keyword spotting methods, of which there are various kinds, may be appropriate for implementation in the automated PERS. To identify keywords, one method translates the incoming speech to text, another matches potential keywords acoustically, and yet another breaks the speech down into phoneme components for comparison (Moyal, Aharonson, Tetariy, & Gishri, 2013). With respect
to semantic analysis, common techniques include syntactic parsing (e.g., nouns, verbs, placement in utterance), predicate logic (e.g., representations of word meaning), and statistical methods (e.g., probabilities of word order, word relationships, matches to existing known examples) (Jurafsky & Martin, 2009; Klapuri, 2007). No specific technique is being recommended at this point as this will be an area of future research.

2 The speech sounds are usually sub-word units such as phones, the smallest unit of sound of a language (Gold & Morgan, 2000; Jurafsky & Martin, 2009).
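A text-based keyword spotting approach (the first of the methods above) might look like the following sketch, assuming the ASR has already produced a transcript; the keyword list and categories are invented examples, not the study's actual set:

```python
import re

# Minimal text-based keyword spotting, assuming the ASR has already produced
# a transcript. Keywords and categories are invented examples only.

KEYWORDS = {
    "help": "request",
    "fell": "condition",
    "ambulance": "responder",
    "fine": "negative response",
}

def spot_keywords(transcript):
    """Return (keyword, category) pairs found in an utterance, in order."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return [(t, KEYWORDS[t]) for t in tokens if t in KEYWORDS]

print(spot_keywords("I fell and I need an ambulance"))
# -> [('fell', 'condition'), ('ambulance', 'responder')]
```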
2.3.7 Study Focus as Applied to the HELPER
Figure 2-5 illustrates how the outcome of this study could be applied within the HELPER SDS,
specifically to further develop the ASR, the Speech Informant (SI), and to help classify the
emergency situation (Classifier). Specifically, the results from this study will identify keywords
that could be used to expand the vocabulary size of the HELPER’s ASR; key phrases that could
be used to train the language model of the HELPER’s ASR; and word categories that could be
used in the semantic analyser sub-component of the HELPER Speech Informant to aid with
utterance understanding. In addition, identifying patterns in keyword usage for different PESs
may aid in classifying the PES. By knowing the class or category of a PES, the HELPER may in
future be able to foreshadow the target response to offer the user.
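The kind of keyword-based call classification envisioned here could, for instance, take the form of a simple rule-based mapping from observed keyword categories to a foreshadowed target response; all categories and response targets below are invented for illustration:

```python
# Toy rule-based PES classifier: keyword categories observed in a call are
# mapped to a foreshadowed target response. All rules are invented examples.

RULES = [
    ({"fall", "injury"}, "EMS"),
    ({"help", "responder"}, "personal responder"),
    ({"test", "button"}, "no response (false alarm)"),
]

def classify_call(observed_categories):
    """Return the first response target whose required categories are all present."""
    for required, response in RULES:
        if required <= observed_categories:  # subset test
            return response
    return "escalate to live call taker"  # safe default when uncertain

print(classify_call({"fall", "injury", "help"}))  # -> EMS
```

A safe default (escalating to a live call taker) reflects the design goal stated earlier: the automated system should foreshadow a response when the evidence allows, not replace human judgment when it does not.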
[Figure: (1) an older adult in a personal emergency situation calls out ("Hello? Anyone?"); (2) the HELPER computer processes the conversation through the ASR and Speech Informant (SI), supported by keywords and phrases and by word and phrase categories drawn from the CARES Corpus, while a Classifier determines what response is needed; (3) the PERS response follows.]
Figure 2-5: Possible data application areas within the HELPER communication module along the personal emergency response pathway.
In summary, these findings could be used to improve the HELPER’s Speech Handler and
enhance the system’s ability to both recognize and understand what is being said by the end-user.
Study 1’s main focus will be on identifying keywords and phrases used by callers and
determining which of these keywords appear for various PESs.
2.4 Methodology
2.4.1 Research Design Method
An exploratory, sequential, mixed methods design was used for this study. Clark & Creswell (2011) provide a good introduction to this method, which consists of a 'qualitative data collection and analysis' phase, followed by a 'quantitative data collection and analysis' phase, and ends with a 'final interpretation', as illustrated in Figure 2-6.
[Figure: Qualitative Data Collection and Analysis, builds to Quantitative Data Collection and Analysis, leading to Interpretation.]
Figure 2-6: Diagram of the process of exploratory sequential mixed methods design (Clark & Creswell, 2011).
For the ‘data collection and analysis phases’ of both the qualitative and quantitative portions of
this research design method, content analysis is the approach used. Crede & Borrego (2010)
provides an example of using the content analysis approach within a mixed methods design.
Content analysis is an attractive method of inquiry applied in many research fields for analyzing
text (and sometimes other media) in context of its use (Cavanagh, 1997; Krippendorff, 2012).
Over recent decades content analysis has been used increasingly in the field of health research
(Elo & Kyngäs, 2008; Mays & Pope, 2000). Content analysis is flexible enough to examine data
both qualitatively or quantitatively and inductively (e.g., specific to general) or deductively (e.g.,
general to specific based on existing theory) (Elo & Kyngäs, 2008; Krippendorff, 2012). When
used as a research method, content analysis is noted as being systematic, objective, repeatable
and a valid means of either quantifying phenomena or making inferences about data in context
(Krippendorff, 2012). Typically new knowledge or insights are dervied in the form of concepts
or categories describing some phonomenon or for the purpose of building a model, conceptual
system or map (Elo & Kyngäs, 2008). In this study, for example, words are selected and
43
classified into categories according to their context or meaning which is an exact example of
content analysis’ utility (Cavanagh, 1997). The outcome of a content analysis may also be used
to guide future action which is especially useful in the field of health research (Elo & Kyngäs,
2008) and for this particular research application. Furthermore, content analysis is used in the
field of artificial intelligence to help researchers design machines capable of understanding
natural language (Krippendorff, 2012), which again is another key component for the HELPER.
2.4.1.1 Method Limitations
In terms of limitations, the flexibility of content analysis is also its main restriction. Some researchers have noted that because content analysis does not proceed linearly and has minimal formalized procedures, it can become more complex and difficult to implement than quantitative analysis (Polit & Beck, 2004).
2.4.1.2 Method Implementation
The general procedure for implementing a content analysis includes (Elo & Kyngäs, 2008;
Graneheim & Lundman, 2004; Krippendorff, 2012):
1. Selecting a unit of analysis (e.g., interviews, a program, parts of text);
2. Within the unit of analysis, selecting a meaning/coding/content/recording unit. Essentially,
one must decide what to analyse, to what degree of detail, and how sampling will be
conducted (e.g., should the codes include silence, sighs, laughter, and postures?);
3. Organizing the data (e.g., use open coding, categories, themes, abstractions);
4. Creating a model, conceptual system or map, or categories.
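As a toy illustration of steps 2 through 4, coded recording units can be organized into categories programmatically; the units and categories below are invented examples, not study data:

```python
from collections import defaultdict

# Toy sketch of content-analysis steps 2-4: recording units are selected,
# open-coded into categories, and organized into a simple category map.
# All unit and category names are invented examples.

# Step 2: chosen recording units (here, single caller words with context notes).
recording_units = [
    ("fell", "caller describes event"),
    ("chest", "caller describes symptom"),
    ("pain", "caller describes symptom"),
]

# Step 3: open coding - assign each unit a category.
def categorize(word):
    return {"fell": "event", "chest": "body part", "pain": "condition"}.get(word, "uncoded")

# Step 4: organize the coded units into a category map.
categories = defaultdict(list)
for word, _context in recording_units:
    categories[categorize(word)].append(word)

print(dict(categories))
# -> {'event': ['fell'], 'body part': ['chest'], 'condition': ['pain']}
```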
2.4.1.3 Method Approaches
Various approaches exist for the application of content analysis in research; three of these approaches will be used within this dissertation. Table 2-1 provides a brief description of how these approaches are distinct from each other.
In addition to these approaches, it is important to know how the content will be examined.
Looking at manifest content refers to using the visible or obvious components of the content
being studied. This is as opposed to latent content which involves an “interpretation of the
44
underlying meaning of the text” (Downe-Wamboldt, 1992; Graneheim & Lundman, 2004;
Kondracki, Wellman, & Amundson, 2002). This study uses the manifest meaning of words.
Table 2-1: Various distinct approaches to content analysis.

Application Type | Description
Direct | Theory or relevant research findings are used to guide initial code development (Hsieh & Shannon, 2005).
Conventional | Coding categories are derived directly from the text data. Generally used to describe a phenomenon in the data (Hsieh & Shannon, 2005).
Quantitative | Text data are coded into explicit categories and then described using statistics (Morgan, 1993).
2.4.2 Research Design Details
2.4.2.1 Research Population
All recorded calls used in this study were between the PERS provider's call taker and either the provider's clients or a care provider. In a few cases, emergency medical service (EMS) dispatchers or the PERS setup personnel were also included in the call. No subscriber details were provided with the calls, but caller age and gender details were deduced from within the call conversations where possible. We are unaware of any prior call "sorting" (for example, with respect to gender, call type, caller type, and emergency risk level) that may have occurred.
2.4.2.2 Research Setting
This study was completed at the University of Toronto in the Rehabilitation Sciences Institute.
The data processing was performed in the Intelligent Assistive Technology and Systems
Laboratory. This study also included three visits to expert emergency response service sites to
gain a better understanding of how emergency responders operate and interact with older adults
in emergency situations. One visit each was made to a local fire hall, an EMS dispatch centre,
and a personal emergency response call centre.
2.4.2.3 Data Collection
Personal Emergency Response Call Recordings
The personal emergency response calls used in this study were provided by a local, private PERS
provider upon our request for a sample of emergency and non-emergency calls. The non-
emergency calls recorded included: false alarms or accidental system activations, installation
setups or equipment test calls, scheduled check-ins, translation requests, and follow-up calls. The
emergency calls recorded included genuine emergency calls for either EMS (i.e., 911,
paramedics) or non-EMS emergency responders (i.e., relatives, friends, or professional care
providers). A total of 109 digitized call recordings were obtained from the PERS provider (name
withheld for confidentiality). These recordings were collected in two sessions over two years
(2008 - 52 calls and 2009 - 57 calls). All recordings were made in Canada. To our knowledge, all
clients in this study used the traditional push-button activator.
Confidentiality
Confidentiality agreements were signed between the private call centre providing the call
recordings and the Intelligent Assistive Technology and Systems Lab. These agreements outlined
how the data would be used and stored. In terms of usage, all transcripts would be stripped of
personal or identifying information and access to call recordings would be limited to select
individuals upon approval by the Company. In terms of storage, all recordings would be kept in a
secure and locked location and all digital recordings on the computer would be kept under
password protection on a lab computer. All correspondences with the Company would also be
kept confidential.
On-site Visits with Emergency Response Service Providers
Informal discussions and observations were conducted with the emergency response call experts
at their business locations. A few hours were spent with one EMS dispatcher and three personal
emergency response call takers to gain familiarity with how the operators receive and handle
incoming calls and to understand how the call centre is organized. At the fire hall, an informal
discussion was held with three firefighters (one had experience working as a paramedic) to gain
a better understanding of general emergency procedures and some of the common response
difficulties encountered while attending to emergencies with older adult individuals.
2.4.2.4 Data Processing
Eighty-four (84) response calls were transcribed in total. The twenty-four (24) non-transcribed
calls consisted of repeat recordings or were conversations between the emergency response
service providers only (i.e. between the personal emergency response provider’s call taker and
EMS dispatchers without subscriber involvement). Transcription was performed verbatim from
digital audio files using the computer software, “Systematic Analysis of Language Transcripts”
(SALT), version 8.0 and 9.0 (Miller & Iglesias, 2006). The transcription process followed the
SALT protocol outlined in the user manual (Miller & Chapman, 2008). SALT is software specially designed for "eliciting, transcribing, and analyzing language samples." As such, in
addition to transcription tools, the SALT software also includes various analytical tools,
including, but not limited to, the ability to code words and utterances, and calculate words per
minute or conversational time lengths. The coding units of interest were extracted from the
response call transcripts using the "explore multiple transcripts" and "rectangular data file"
features of the SALT software.
Transcriptions were completed by listening to the digital call recordings on a computer using
headphones. The audio content was transcribed directly into text in the SALT program. An effort
was made to capture non-word utterances (e.g., coughing), fillers (e.g., ‘eh’, ‘ah’), and to note
silent moments (long pauses) during the conversation. Patient identifying information was
excluded in the transcripts (i.e., no names, addresses, or contact information). Due to the nature
of the working agreement with the company providing the PERS, only a limited number of the
laboratory research team members had permission to listen to the raw call recordings.
These real call samples all had a fair amount of background noise embedded in the recordings,
presumably caused by both the caller’s and call centre’s background environments, as well as
being inherent in the recording equipment. During transcription, recordings had to be paused
frequently and the volume adjusted to very high levels in order to catch what was being said in
the conversation. Call recordings were stored on the computer as *.wav files and played using Audacity (version 2.02), an open-source, free program for listening to and editing sound files (Mazzoni, Dannenberg, et al., 2000). The sound files were played back using the "mono" (single audio track) setting, with a sampling rate of 8 kHz and a sample format of 32-bit floating point.
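For readers wishing to reproduce such files programmatically, the playback parameters above (8 kHz, mono) can be illustrated with Python's standard library; note that the stdlib wave module handles PCM data only, so this sketch uses 16-bit PCM rather than the study's 32-bit float format, and the audio is a synthetic tone rather than call data:

```python
import math
import struct
import wave

# Sketch of the playback parameters described above (8 kHz sampling, mono),
# using a synthetic 440 Hz tone. Python's stdlib `wave` module supports PCM
# only; the study's 32-bit floating-point files would need a third-party
# library such as soundfile.

RATE = 8000   # 8 kHz sampling rate, as used for playback in the study
CHANNELS = 1  # "mono" / single audio track

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(CHANNELS)
    w.setsampwidth(2)          # 16-bit PCM samples
    w.setframerate(RATE)
    for n in range(RATE):      # one second of audio
        sample = int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / RATE))
        w.writeframes(struct.pack("<h", sample))

with wave.open("tone.wav", "rb") as w:
    print(w.getframerate(), w.getnchannels(), w.getnframes())  # -> 8000 1 8000
```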
2.4.2.5 Data Analysis
“Naïve” listening of the call recordings and reading of the transcripts were first used to obtain a
superficial and preliminary understanding of the conversations and to identify possible directions
for analyses. Figure 2-7 illustrates a flow diagram of the general steps used to complete the study
objective.
[Figure: flow diagram. Personal emergency response calls are transcribed (SALT) and unique words are isolated. (a) A conventional content analysis at the word level identifies word categories, to improve speech understanding (Speech Informant). (b) A manifest quantitative content analysis at the word level identifies keywords by word categorization, then a reduced keyword set and key phrases, to improve the ASR acoustic and language models and to apply to the CARES Corpus. (c) A directed content analysis at the call level, informed by the 911 call literature and emergency service provider site visits, identifies situation characteristics, develops the Personal Emergency Situation (PES) Model, and classifies response calls with the PES Model. Keywords used in various call categories are then identified, to improve HELPER call classification.]
Figure 2-7: Flow diagram illustrating the methodology followed to analyse the calls and complete study objectives.
Start at top arrow. Point (a): a word level analysis to identify word categories; Point (b): a word level analysis to identify keywords and key phrases; Point (c): a call level analysis to create the PES Model. Keywords were then identified for occurrence within different call categories.
A total of three analyses were performed on 84 of the transcribed calls, two at the word level and
one at the call level. A different approach was used for each content analysis. Beginning at the
top (fat green arrow) in Figure 2-7, personal emergency response calls were transcribed and
unique words were isolated using SALT. At point (a), a conventional content analysis was
performed at the word level to determine possible word categories. Next, at point (b), a manifest
quantitative content analysis was performed at the word level to identify keywords by assigning
word categories to the unique words. Once a keyword list was obtained, at point (c), a directed
content analysis was performed at the call level to identify personal emergency situation
characteristics. Information from previous literature, visits with emergency service providers (see
Appendix H), and the categorized keywords were combined to direct the call level analysis
where a PES model was developed. Response calls were then classified using the PES model
categories. The PES model was also used to help focus the inclusion/exclusion criteria in the
keyword reduction step. Once the final reduced keyword set was identified, the keyword
occurrence within different classified personal emergency response calls was examined.
The word categories identified from the conventional content analysis can be used to improve
speech understanding within the HELPER Speech Informant. The keywords identified from the
Quantitative Content Analysis can be used to expand the size of recognized vocabulary in the
automated PERS and can be included in the CARES (Canadian Adult Regular and Emergency
Speech) Corpus (see Study 3 for further details) for future ASR training and system testing. The
key phrases identified can be used to build up the ASR’s language model. The PES model
identified from the directed content analysis can be applied to improving the call classification of
the HELPER which may ultimately help in forecasting a final response target.
Full Keyword Identification Using Three Coders
The SALT software was used to extract all unique words from the transcripts spoken by the
PERS callers (users). This initial list of extracted unique words will be referred to as the “raw
word list.” The process of keyword identification was performed in total by three coders. Figure
2-8 illustrates a flow diagram of the process used by Coder 1 to determine the word categories
and “original keyword list”. The word categories were derived from the manifest word meaning
within the context of a personal emergency response call.
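The unique-word extraction performed here with SALT can be approximated in a few lines of code; the transcript below is an invented example, not study data:

```python
import re
from collections import Counter

# Approximation of the "raw word list" extraction performed with SALT:
# isolate the unique words spoken by the caller only. The transcript is an
# invented example, not study data.

transcript = """
Caller: Hello? I fell down and I can't get up.
Call taker: Okay, do you need an ambulance?
Caller: Yes please, I think I hurt my hip.
"""

def caller_word_counts(text):
    """Count unique words spoken by the caller only (call-taker turns excluded)."""
    counts = Counter()
    for line in text.strip().splitlines():
        speaker, _, utterance = line.partition(":")
        if speaker.strip().lower() == "caller":
            counts.update(re.findall(r"[a-z']+", utterance.lower()))
    return counts

raw_word_list = sorted(caller_word_counts(transcript))
print(raw_word_list)
```

Note that "ambulance" does not appear in the resulting list: it was spoken by the call taker, and only caller words are extracted, mirroring the procedure described above.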
Coder 1:
• Transcribes call recordings
• Extracts unique caller words (raw word list) from call transcripts using SALT
• Creates word categories and category descriptions based on extracted word function
• Assigns a word category to each extracted word
• Creates a list of categorized keywords called the 'original keyword list'
Figure 2-8: Process of keyword identification and categorization from Coder 1.
The word categorization process was repeated by a second coder (Coder 2) for all words
extracted by Coder 1 out-of-context (without having read the response call transcripts). See
Figure 2-9 for a flow diagram of the process used to determine the “Coders 1&2 keyword list.”
To examine how the out-of-context word selection compares to in-context word selection, a sub-
study was conducted with a third coder. Coder 3 was provided with a printed copy of the
transcribed call recordings and the word categories used by the first two coders. The third coder
was asked to select the keywords she felt were important from within the transcripts and to code
them using the pre-defined word categories used by Coder 1 and Coder 2. Coder 3 also
highlighted pertinent phrases within the text she felt were important to understanding the PESs
and the required response. The final keyword list determined after the last agreement session
with Coder 3 will be referred to as the "full keyword list." Inter-rater reliability was measured using Cohen's Kappa, first comparing Coder 1's original keyword list with Coder 2's keyword list for out-of-context word categorization, and then comparing the Coders 1&2 keyword list (from Coders 1 and 2) with Coder 3's keyword list for in-context word categorization. Inter-rater
reliability was also measured for Coder 3’s selection of keywords compared with Coders 1&2
keyword list.
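Cohen's Kappa for two coders can be computed directly from their paired category assignments; the following is a minimal sketch with invented labels, not the study's data:

```python
from collections import Counter

# Minimal Cohen's Kappa for two coders' category assignments over the same
# word list: kappa = (p_observed - p_expected) / (1 - p_expected).
# The example category labels are invented, not study data.

def cohens_kappa(coder_a, coder_b):
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each category's marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["condition", "condition", "response", "action", "response"]
b = ["condition", "action",    "response", "action", "response"]
print(round(cohens_kappa(a, b), 3))  # -> 0.706
```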
Coder 2:
• Receives the raw word list used by Coder 1 and the list of word categories with descriptions
• Assigns a word category to each word from the raw list (out-of-context)
• Coder 1 answers questions and clarifies word descriptions with Coder 2
• Coder 2 recommends any changes to word categories
• Final changes are made to keyword category assignments
• Coder 1 compares Coder 2's keyword list and word categories with the 'original keyword list'
• Misunderstandings are clarified and final modifications are made as necessary
• The 'Coders 1&2 keyword list' is created
• Coder 1 calculates inter-rater reliability

Coder 3:
• Receives call transcripts, the Coders 1&2 keyword list, and the list of word categories with descriptions
• Highlights keywords and phrases inside transcripts (in-context) and assigns categories to words
• Coder 1 answers questions and clarifies word category descriptions with Coder 3
• Coder 3 makes final changes to keyword category assignments
• Coder 1 extracts Coder 3's keyword list and assigned categories and compares Coder 3's list to the 'Coders 1&2 keyword list'
• Coder 1 and Coder 3 decide on final keywords and assigned categories; the resulting list is called the 'full keyword list'
• Coder 1 calculates inter-rater reliability
Figure 2-9: Process of keyword identification and categorization from Coders 2 and 3.
Coder 1 is the author of this dissertation and had no prior experience in coding. Coder 2 was a graduate student in the Department of Speech-Language Pathology with background experience in coding for qualitative studies and in conducting research involving older adults with dementia. Coder 3 is a practicing emergency room physician with a focus on geriatric care and an interest in technologies. Coders 2 and 3 were both aware of the main purpose of the
research but were not directly conducting research in this area. The full keyword list contains all
keywords selected from the Coders 1 and 2 keyword lists in addition to the keywords obtained
from Coder 3.
The use of more than two coders is not uncommon in qualitative research, and the number of coders used tends to be based on the needs of the project, the data being coded, and the capability and experience of the coders (i.e., their ability to pull out themes and identify categories, and their backgrounds) (Hak, 1997; Krippendorff, 2012; Ryan, 1999). For this study, it was felt that using three coders would increase the validity of the work if high inter-rater reliability could be achieved both when the keywords were identified in-context and when they were identified out-of-context. The fact that the third coder is an emergency room physician also means that this coder is familiar with conversations in which patients communicate health-related problems and symptoms. Selecting coders with background expertise is suggested to be an important qualification (Krippendorff, 2012). High inter-rater reliability amongst the three coders was expected to demonstrate greater validity in keyword selection.
Final Keyword Identification
The size to which the ASR vocabulary could be increased was constrained to some degree by
the two-hour time frame in which the speech data was to be collected; the number of different
speech types to be included (e.g., read sentences, free speech, etc.); and the number of older
adult participants who could be successfully recruited to provide speech samples for the
database (see Chapter 4 for further details). The two-hour time frame was also selected to
prevent the older participants from becoming too fatigued during the voice recording
process. Within these constraints, it was determined that a final maximum keyword vocabulary
size of 185 words could be recorded. This size was considered small enough to build a small or
small-medium sized ASR vocabulary but large enough to examine how ASR recognition might
be affected by adjusting the number of words being recognized. One-hundred and eighty-five
(185) words is also significantly larger than the previous ASR vocabulary size of two words
(yes/no) and should offer enough range for technology developers to locate the balance between
recognizing enough words to carry out a dialogue smoothly and maintaining a fairly low ASR
word error rate.
To identify a small keyword vocabulary set from the full keyword set, a series of word reduction
rules were required. The reduction rules consisted of inclusion and exclusion criteria developed
based on the goal of being able to isolate keywords that could be used by the HELPER to
determine the desired target response. Within this context, the preferred keywords were those
that carried significant meaning or provided enough detail to: (1) identify the PES and target
response, (2) indicate a positive or negative response to a question, and/or (3) perform some
other function vital to a response call conversation, such as the opening/closing dialogue early
on in the response call conversation.
Before the keyword reduction rules could be applied, a set of characteristics that could be used
to distinguish between different PESs was required. Therefore, a PES model was developed using
PES classification categories derived from the initial keyword list, the transcribed response calls,
on-site visits with emergency service providers, and research literature. The reduction rules were
then applied to the full keyword set and a small keyword set was obtained for inclusion into the
CARES corpus. This small keyword set was also checked to ensure that keywords from every
word category (e.g., positive/negative responses, conditions, etc.) of interest would be included
in this smaller vocabulary set.
Keyword Identification in Various PESs
In order to determine the keywords used for different PESs, the response calls were classified
using the PES model categories and the unique caller words spoken within each of the PES
model categories were extracted using SALT software. For each PES classification category,
only words from the full keyword list were retained.
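The per-category filtering just described can be sketched as follows. This is a minimal illustrative sketch, not the actual SALT software workflow; the category labels and word lists in the example are invented:

```python
def keywords_by_category(calls, full_keyword_list):
    """Map each PES classification category to the set of caller words
    used within it, retaining only words from the full keyword list."""
    keyword_set = set(full_keyword_list)
    by_category = {}
    for category, caller_words in calls:
        used = {w for w in caller_words if w in keyword_set}
        by_category.setdefault(category, set()).update(used)
    return by_category

# Invented example: two classified calls and a tiny keyword list.
result = keywords_by_category(
    [("high risk", ["help", "the", "bleeding"]),
     ("low risk", ["accident", "oh"])],
    ["help", "bleeding", "accident"],
)
```

Non-keywords such as "the" and "oh" are dropped because they do not appear in the full keyword list.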
2.5 Results
2.5.1 Extraction of keywords
A total of 779 possible words were isolated from the response call transcripts using only the
caller dialogue (all caller types). This raw list also included word fragments, code names (e.g.,
CGN = caregiver name), unintelligible code markers (e.g., ‘X’ denoted for unintelligible words),
and some word repetitions (e.g., because of spelling differences or computer symbol
differences). Removing 16 word repetitions left 763 possible words in the raw word list.
Generally, non-word sounds such as word codes representing coughing, silence, TV noise, and
other non-verbal noises (e.g. sighs, breathing, moans), were not included in the keyword lists.
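The extraction and de-duplication step described above can be sketched as follows. The tokenization, code markers, and normalization shown here are illustrative assumptions, not the actual transcript-processing procedure; the utterances are invented:

```python
import re

# Hypothetical transcript code markers (the study used, e.g., 'X' for
# unintelligible speech and 'CGN' for a caregiver's name).
CODE_MARKERS = {"x", "cgn"}

def extract_raw_word_list(caller_utterances):
    """Collect the unique words spoken by callers, dropping code
    markers and collapsing case/punctuation repetitions."""
    words = set()
    for utterance in caller_utterances:
        for token in re.findall(r"[a-zA-Z'*-]+", utterance.lower()):
            if token in CODE_MARKERS:
                continue
            words.add(token)
    return sorted(words)

raw = extract_raw_word_list([
    "Help me please",
    "X I fell CGN",      # unintelligible word and caregiver-name code
    "Help, I fell down",
])
```

The repeated "help" (differing only in capitalization and punctuation) collapses to a single entry, mirroring how the 16 word repetitions were removed from the raw list.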
Eighteen (18) word categories or themes were developed from the raw word list and are shown
in Table 2-2.
Table 2-2: The word categories derived from words extracted from response call transcripts.
Index   Word category (example)
1-p     Positive response to questions (e.g., yes)
1-n     Negative response to questions (e.g., no)
2       Request/command: verb related to obtaining assistance (e.g., help, get) (2a: weaker requests)
3-n     Problem condition, current (e.g., unconscious, clammy)
3-e     Problem condition, pre-existing (e.g., diabetes)
3-p     Positive condition description (e.g., good)
4       Neutral body state: made positive/negative with a descriptor word (e.g., can't feel, not breathing)
5       Politeness (e.g., thank you)
6       Opening/closing (e.g., hello, good-bye)
7       Repetition: could not hear, mumbled words
8       Targets (e.g., ambulance)
9       Question word (e.g., what, where, how)
10      Body part (e.g., arm, leg)
11      Negation word: reverses state (e.g., can't, not)
12      Interjection (e.g., ah)
13      Location (e.g., floor)
14      Special commands (i.e., to cancel call, turn the machine on/off, get weather/time/date)
15      Other (words not considered keywords, e.g., of, the, a)
The word categories developed out of logical groupings based on word function and meaning
within the context of a response call, for example, “positive word responses”, “question words”,
and “emergency response targets”. These categories were initially developed by the first coder
based on knowledge derived from the conversational structure observed in the transcribed
response calls, the call centre protocol, research literature on emergency response calls, and from
informal discussions with emergency response service providers. For example, in (Imbens-
Bailey, 2000; Zimmerman, 1992a, 1992b), the researchers discuss emergency response calls as
having an ‘opening/closing’, some ‘initial caller request’, followed by a possible ‘interrogative
series’, before a ‘response’ is either provided or denied. The word categories therefore generally
fall within this particular conversational context, with word categories for “openings/closings,”
“positive and negative responses,” and “target responses,” to name a few.
The process of assigning word categories was performed by all coders in the in-context and
out-of-context studies. Coder 1 and Coder 2 performed out-of-context coding with the 779 original raw
words extracted. Coder 1 used an original list of word categories. Coder 2 reviewed the proposed
word categories and her feedback was used to make slight modifications. Coder 2 then used this
modified set of word categories for classifying the words. Coder 1 subsequently revised her
coding using the finalized word category set. In the in-context study, Coder 3 performed the
coding by selecting keywords within printed transcripts and categorized these words using the
same word categories as Coder 1 and Coder 2. Coder 3 also highlighted possible key phrases
within the text and coded a majority of these phrases.
2.5.2 Keyword Results from Coders
Coder 1 identified 277 keywords from the raw word list. Table 2-3 shows a summary of the
coding results obtained for Coder 2 and Coder 3 after each coding process.
Table 2-3: Summary of coding results for Coders 2 and 3 based on keywords and category matching.
Measure                                Coder 2                    Coder 3
Keywords coded                         300*                       204+
Percent agreement                      81%^ (631/779)*            75.5%~ (575/762)*
  (category matches per total words)
Inter-rater reliability                0.682, p < 0.001,          0.564, p < 0.001,
  (Cohen's Kappa)                      95% CI (0.637, 0.727)      95% CI (0.511, 0.617)
# keywords after consolidation         324*                       348*

* Word repetitions removed, unintelligible words included.
+ Phrase words, non-words (i.e., coughs, silence), and word repetitions not included.
^ With Coder 1.
~ With Coders 1 & 2.
The remaining 19% of differences between Coder 1 and Coder 2 were resolved through
discussion. According to Landis & Koch (1977), this value of Cohen’s Kappa (0.682, p < 0.001)
is considered substantial agreement.
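For reference, percent agreement is simply the share of items coded identically, while Cohen's Kappa corrects that share for the agreement expected by chance from each coder's marginal category frequencies. A minimal sketch of the computation (the toy label lists are invented for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders' category labels."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: proportion of items coded identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each coder's marginal category rates.
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[c] * cb.get(c, 0) for c in ca) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Invented word-category codes for two coders over four words.
coder1 = ["1-p", "1-n", "2", "2"]
coder2 = ["1-p", "1-n", "2", "15"]
kappa = cohens_kappa(coder1, coder2)   # observed 0.75, expected 0.25
```

In this toy case the two coders agree on 3 of 4 items (75%), but Kappa is lower (about 0.67) once chance agreement is discounted, which is why the Kappa values reported in Table 2-3 sit below the raw percent-agreement figures.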
For the in-context study with Coder 3, some of the words categorized were assigned more than
one category depending on the context in which the word was used. For example “accident”
could be construed as a negative word if the caller says, “I had an accident and fell down,” or a
positive word if the caller says instead, “I pushed the button by accident.” For the purpose of
calculating the inter-rater reliability statistic, the more frequently used category was included. So
if “accident” was mostly used in a positive way, such as an ‘accidental call’, then the more
frequently associated category was used. If the frequency of occurrence was low or equivalent
(e.g., the word occurred once and was categorized with two category codes), the category
matching the one used by Coders 1 and 2 was selected. In situations where differences in the
interpretation of a category definition occurred, the situation was addressed and resolved
through discussion. For Coder 3, the 24.5% difference in categorization with Coders 1 and 2
was resolved this way. According to Landis & Koch (1977), the value of Cohen’s Kappa
(0.564, p < 0.001) would be considered moderate agreement.
Table 2-4 provides further detail on the breakdown of the word comparisons identified in Coder
3’s keyword list and the Coders 1 and 2 keyword list.
Table 2-4: Breakdown of the keywords identified from Coder 3.
Count               Description (keywords)
179                 Coder 3's keywords found in Coders 1 & 2's keyword list
25                  Coder 3's keywords not found in Coders 1 & 2's keyword list
204                 Total keywords identified by Coder 3 (no phrase words)
323*                Total keywords in Coders 1 & 2's list after modification
179/323 = 55.42%    Coder 3's keywords found in Coders 1 & 2's keyword list
25/323 = 7.74%      Coder 3's keywords not in Coders 1 & 2's keyword list

* Decreased by one because "spinal stenosis" was combined into one word.
Table 2-5 shows a summary of the coding results from the phrase words selected by Coder 3.
Ten phrases selected had unintelligible words that were included in the phrase count. Six phrases
were removed from Coder 3’s list because they were phrases identified in the “comments”
section of the transcripts and would not have been extracted in the raw word list. One phrase was
duplicated and coded with different categories. This duplication was removed from the phrase
count. Although the phrases were sorted into the same categories as the words, the phrase
categories could not be statistically compared to the word categories because the individual
phrase words, taken outside the phrase, would not have the same meaning as when used
within the phrase (i.e., they would be categorized differently). The inter-rater reliability measure was
calculated based on the number of keywords and keyword phrase words Coder 3 selected and
how these related to Coders 1 and 2’s keyword list. According to Landis & Koch (1977), the
Cohen’s Kappa values (0.523, p < 0.001, and 0.512, p < 0.001) are considered moderate
agreement.
Table 2-5: Summary of phrase results for Coder 3 with agreement of keyword selection.
Measure                                         Coder 3
Phrases selected                                135*
Phrase words (total)                            153^
Percent agreement (keywords only)               77.8%+ (593/762)
Percent agreement (keywords + phrase words)     76.4%+ (582/762)
Inter-rater reliability of word selection (Cohen's Kappa):
  keywords only                                 0.523, p < 0.001, 95% CI (0.464, 0.582)
  keywords + phrase words                       0.512, p < 0.001, 95% CI (0.449, 0.575)
# keywords after consolidation                  402*

* Repetitions removed, unintelligible words included.
^ 97 words outside of Coder 3's keyword list.
+ With Coders 1 & 2.
Table 2-6 provides further detail on the breakdown of the keyword and phrase word comparisons
identified in Coder 3’s keyword list and the Coders 1 and 2 keyword list.
Table 2-6: Breakdown of the keywords and phrase words identified from Coder 3.

Count               Description (keywords + phrase words)
222                 179 Coder 3 keywords + 43 phrase words found in Coders 1 & 2's keyword list
79                  25 Coder 3 keywords + 54 phrase words not in Coders 1 & 2's list (category 15)
301                 Total: 204 keywords + 97 phrase words in Coder 3's list
101                 323 Coders 1 & 2 keywords - 222 Coder 3 keywords found in that list
222/323 = 68.73%    All Coder 3's keywords identified in Coders 1 & 2's keyword list
79/323 = 24.46%     All Coder 3's keywords not in Coders 1 & 2's keyword list (category 15)
101/323 = 31.27%    Coders 1 & 2's keywords not identified by Coder 3
144/323 = 44.58%    Coders 1 & 2's keywords not in Coder 3's keyword list
The final number of keywords identified after the third coding was 402 as shown in Table 2-5.
2.5.3 Characterizing the Personal Emergency Situation
2.5.3.1 Proposed PES Characteristics
Findings from research literature and on-site observations with expert emergency responders
highlight the fact that risk level is an important factor to consider when assessing a potential
emergency situation. The identification of medical distress from the response call protocol, as
well as the way firefighters and call takers/dispatchers are trained to first assess the ABCs (i.e.,
airway, breathing, and circulation) and consciousness, suggests that risk level is of primary importance.
In addition to risk level, EMS dispatchers also classify certain types of call situations (e.g.,
pedestrian/vehicle accident), and this suggests that call reason may also be important. In the
personal emergency response call protocol manual, in addition to medically related symptoms,
specific questions are asked if a ‘fall’ has occurred. This may suggest that ‘medical’ and ‘fall’
calls may be reasonable categories in which to classify calls.
While listening to calls at the EMS dispatch centre, a few calls were received from personal
emergency response call takers requesting transportation from a care residence (e.g., long-term
care facility) to the hospital or back. On one occasion, the personal emergency response call
taker indicated that she had only spoken to the care provider and not to the PERS subscriber.
When the EMS dispatcher was queried about why the care provider did not just call 911 directly,
the EMS dispatcher explained that care providers are sometimes instructed by the
personal emergency response provider to activate the subscriber’s PERS as opposed to calling
911 directly in order for the personal emergency response provider to keep track of their client’s
emergency events. This statement has not been corroborated by a personal emergency response
provider; however, it suggests that older adult subscribers are not the only users of PERS, and
that knowing the caller type may also be important.
As mentioned in the literature review in Chapter 1, during stressful situations such as a medical
trauma or strong emotion, the human voice may change. In addition, natural aging can affect
voice quality and diseases causing conditions such as stroke, aphasia, or dementia may also
affect one’s ability to communicate. Given that the HELPER is based on spoken input, the
communication ability of the user is also important during a PES. However, communication
ability may be more a characteristic of the caller type than of the PES, and so it was not
included in this model.
2.5.3.2 PES - Caller Type
In terms of the caller type characteristic, response calls were found to be initiated by both older
adult subscribers as well as care providers. Three PES caller type categories were identified: (1)
the subscriber (herein referred to as the 'older adult' user), (2) the other caller (herein referred to
as the ‘care provider’), and (3) a combination of older adult and care provider callers. The term
'care provider' refers to any individual (e.g., neighbour, friend, professional home care worker,
staff nurse, or relative) assisting the older adult user and using the PERS to request assistance on
their behalf.
2.5.3.3 PES - Risk Level
In terms of risk level, response call transcripts were categorized into three emergency risk levels:
(1) low risk or non-emergent (a false alarm), (2) urgent or medium risk (needs help soon), and
(3) emergent or high risk (possible loss of life or limb) (Gilboy et al., 2005). Figure 2-10 outlines
the basic differences between low, medium and high risk situations.
Figure 2-10: Examples of differences between risk levels. The figure contrasts three situations:
a high risk (emergent) situation, with a non- or semi-responsive caller who needs attention
ASAP due to possible loss of life, sight, limb, or mind (e.g., unconscious, gasping, heavy
bleeding, choking, faint); a medium risk (urgent) situation, with a fully or semi-responsive
caller who needs attention fairly quickly but is not facing life or death (e.g., fallen but not
injured or bleeding heavily, or pneumonia requiring an x-ray and medication); and a low risk
situation, with a fully responsive caller who needs no assistance (e.g., an accidental or service
call where the user is in full control and needs help changing system batteries). The figure also
indicates the caller's possible mobility range (mobile to not mobile) and voice range
(calm/normal to impaired).
2.5.3.4 PES - Call Reason
In terms of the call reason characteristic, response call transcripts that were not false alarms were
grouped into two categories: (1) fall calls and (2) medical calls. In this study, a ‘fall call’ was
defined as one where the caller experiences an unintentional fall, is not hurt or hurt minimally,
but cannot get up without assistance. Fall calls resulting in physical injury such as bleeding were
considered to be medical calls. In medical calls the caller needs medical assistance either because
of a physical injury, pre-existing medical condition, new illness, or psychological concern. The
potential mobility range of a caller is illustrated in Figure 2-10.
2.5.3.5 PES - Communication Ability
In terms of communication ability, the voice may be either calm or normal to impaired as shown
in Figure 2-10. This does not mean that the voice will become impaired in high risk situations,
only that it could become so. With respect to the potential for ‘communication ability’ to be a
PES characteristic, no specific communication ability categories could be identified that would
help distinguish between different keywords. For example, the keyword “stroke” may suggest
a high risk situation regardless of communication ability; what matters is simply that keywords
be intelligible enough for ASR. Due to the important role that communication plays in a
SDS such as the HELPER, looking at the style or manner in which the caller communicates may
reveal interesting findings. It is suggested that communication ability be examined separately in
a future study.
2.5.4 The Personal Emergency Situation (PES) Model
Figure 2-11 illustrates one model of a PES with the various categories for the situation
classifications shown (e.g., the risk level classification has three category levels: high, medium,
and low). In this model the PERS user is in the centre and is represented by the 'caller type'. During
a PES, each caller exists in a certain physical and cognitive state which may affect his/her ability
to communicate and respond. During a PES, the user is also experiencing an adverse event which
motivates him or her to activate the PERS and this reason for calling is represented by the 'call
reason'. Finally, the PES situation can also be graded in terms of severity which is represented
by ‘risk level’.
Figure 2-11: Model of a PES. The user, represented by the caller type (older adult or care
provider), sits at the centre of the situation in a particular physical-cognitive state; the
surrounding situation is characterized by the call reason (medical call or fall call) and the risk
level (high, medium, or low).
2.5.5 Classifying the Personal Emergency Response Calls
The personal emergency response calls were classified by Coder 1 based on the PES model
categories of caller type, call reason, and risk level. For caller type, if the subscriber called, the
call was classified as an “older adult” call. If another person called instead, such as a care
provider (i.e., relative, friend, or personal support worker), the call was categorized as a “care
provider” call. For the situation in which the older adult and care provider both spoke, the call
was categorized as a combination call.
For call reason, as defined previously, a call was categorized as a 'fall call' only if a 'fall' was
mentioned and the older adult was simply unable to get up (i.e., was not or only minimally
injured). All other calls were considered medical calls.
For the risk level, response calls were categorized using the basic risk level divisions illustrated
in Figure 2-10, following the ABC’s of emergency response. To verify reliability, risk level
coding was performed by a second coder (who was the keyword Coder 3), the physician
specializing in geriatric emergency medicine. Call classification was made by this second coder
in-context after reading the call transcripts. Risk level classification results were compared
between the two coders and percent agreement was found to be very high: 89% (75
agreements/84 total calls). Inter-rater reliability was calculated using Cohen's Kappa = 0.820,
95% CI (0.706, 0.934), p < 0.001; a Kappa above 0.80 is considered almost perfect agreement
according to Landis & Koch (1977). Discussions were held to resolve the remaining
classification differences (11%) until final agreement was obtained.
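The resulting three-axis classification could be represented in software as a simple validated record. The category names below mirror the PES model; the class itself is a hypothetical sketch, not part of the study:

```python
from dataclasses import dataclass
from typing import Optional

CALLER_TYPES = {"older adult", "care provider", "combination"}
RISK_LEVELS = {"low", "medium", "high"}
CALL_REASONS = {"fall", "medical"}

@dataclass(frozen=True)
class PESClassification:
    """One response call classified along the PES model axes."""
    caller_type: str
    risk_level: str
    call_reason: Optional[str] = None   # low-risk false alarms carry no reason

    def __post_init__(self):
        # Reject labels outside the PES model categories.
        if self.caller_type not in CALLER_TYPES:
            raise ValueError(f"unknown caller type: {self.caller_type}")
        if self.risk_level not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.risk_level}")
        if self.call_reason is not None and self.call_reason not in CALL_REASONS:
            raise ValueError(f"unknown call reason: {self.call_reason}")

# Example: a medium-risk fall call placed by the older adult subscriber.
call = PESClassification("older adult", "medium", "fall")
```

Making call reason optional reflects the model above, where false alarms (low risk) are not assigned a fall or medical reason.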
2.5.6 Reduction of Keyword List
To reduce the full keyword list into a smaller list of vocabulary that could be used for inclusion
into a spoken speech database for ASR training (the CARES corpus), the full keyword list was
passed through a series of word reduction rules consisting of: (1) initial exclusion criteria, (2)
inclusion criteria, and (3) final exclusion criteria.
The initial set of eight exclusion criteria is shown in Table 2-7.
Table 2-7: Initial exclusion criteria for reducing keyword list.
1. Conjunctions (e.g., "for, and, nor, or, but"): function words removed, but these can be
   found within the other recorded utterances in the CARES corpus.
2. Connectors (e.g., "to, too, so, as, yet"): function words removed, but these can be found
   within the other recorded utterances in the CARES corpus.
3. Articles (e.g., "the"): function words removed, but these can be found within the other
   recorded utterances in the CARES corpus.
4. Interjections (e.g., "ah, ahem, oh, ugh, uh, uh-huh, um"): removed from the keywords,
   but in the CARES corpus a few were added to the phrases and PES scenarios.
5. Non-words (e.g., coughing, sighs, breathing): removed from the keywords, but in the
   CARES corpus some were included in the PES scenarios.
6. Unintelligible words (e.g., "wea*", "ye*", where '*' marks an incomplete word portion):
   words that cannot be replicated and are not intelligible.
7. Unknown words (e.g., "TN", "ER"): words whose meaning may be questionable or
   unknown were removed.
8. Reducible words (e.g., retain "bye" but exclude "bye-bye"): words that can normally
   stand alone with the same meaning and whose complement is already included in the
   keyword list.
In general, function words making up conjunctions (i.e., for, and, nor, or, but) and a few
connectors (i.e., to, too, so, as, yet) and articles (i.e., the) were removed from the final key
vocabulary list. These connectors would be present within the recorded phrases as well as the
5-minute monologue recorded by each person. Most interjections (i.e., ah, ahem, oh, ugh, uh,
uh-huh, um) were also removed; however, interjections that could imply that the caller requires
repetition of the call taker's utterance were retained, for example, 'eh?' and 'heh?' Non-word
sounds (i.e., coughing, sighs) were removed, but some were included within the
emergency scenarios. Unintelligible words were removed as their meanings are questionable and
difficult to replicate.
Before applying the next set of reduction rules, the keywords were divided into two focus sets.
Figures 2-12 and 2-13 illustrate the flow of the decision making process for each of the focus
sets during the reduction process. As illustrated in Figure 2-12, the first word focus set included
all higher frequency words or words occurring five times or more over all response calls and
across four or more PES categories. Next the word inclusion criteria were applied. Words that
met at least one of the inclusion criteria were included in the small vocabulary set. Words that
did not meet any of the inclusion criteria were passed through to the final exclusion criteria. If
the word did not meet any of the final exclusion criteria, the word was considered for non-
keyword inclusion into a PES phrase or scenario that would be recorded in the CARES corpus.
Words meeting at least one exclusion criterion were not included in the vocabulary set.
Figure 2-12: Diagram outlining the decision process for selecting keywords using the first word
focus set. This focus set contains words occurring five times or more across four or more PES
categories. A word related to at least one inclusion criterion (emergency request, high risk, low
risk, state/condition/symptom, conversational utility, caller type identifier, location, or
HELPER-related command word) is included. Otherwise it is tested against the exclusion
criteria (combo word; too vague to assess risk level/target; alternate tense already used): if it
meets one, it is excluded; if not, it is considered for non-keyword inclusion in a phrase or script.
As illustrated in Figure 2-13, the second word focus set includes any word with a lower
frequency of occurrence (less than five times in the response calls). These words were checked
for inclusion with the same criteria as for the first word focus set except for “location” words.
Location words occurring less than five times were not considered for inclusion.
Figure 2-13: Diagram outlining the decision process for selecting keywords using the second
word focus set. This focus set contains words occurring fewer than five times. The decision
flow matches Figure 2-12, except that location words are not among the inclusion criteria.
Words failing the inclusion check are tested against the exclusion criteria (combo word; too
vague to assess risk level/target; alternate tense already used) and are either excluded or
considered for non-keyword inclusion in a phrase or script.
All words were next checked against the inclusion and exclusion criteria. Words that met both
an inclusion and an exclusion criterion, or that met neither, were to be included in the CARES
corpus as a phrase word or in a PES script. If a word met an inclusion criterion but no
exclusion criterion, the word was included in the small keyword set. If a word met no inclusion
criterion but did meet an exclusion criterion, the word was not considered for the small
vocabulary set.
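The four-way routing just described can be sketched as a small function; the destination labels are invented for illustration:

```python
def route_keyword(meets_inclusion: bool, meets_exclusion: bool) -> str:
    """Route one candidate word through the reduction decision
    described above (a simplified sketch of Figures 2-12 and 2-13)."""
    if meets_inclusion and not meets_exclusion:
        return "small keyword set"
    if meets_exclusion and not meets_inclusion:
        return "excluded"
    # Met both criteria, or met neither: keep the word available
    # as a phrase word or within a PES scenario script instead.
    return "phrase or scenario"
```

For example, a word like "ambulance" (an emergency request word meeting no exclusion criterion) would route to the small keyword set, while a vague word meeting no inclusion criterion but an exclusion criterion would be dropped entirely.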
Tables 2-8 and 2-9 provide further descriptions of the word inclusion and exclusion criteria,
respectively, along with an example for each criterion.
Table 2-8: Definitions and examples of the word inclusion criteria.
1. Emergency request words: words related to requests for emergency care or assistance
   (e.g., ambulance, hospital).
2. High risk words: words related to an emergent care need such as loss of consciousness,
   inability to breathe, or a serious fall (e.g., breathing, beating).
3. Low risk words: words related to low risk situations (e.g., mistake, accident).
4. State/condition/symptom words: words related to a common older adult disease or
   condition, state of health, or fall; that target an important body area; or that indicate
   whether help is needed (e.g., don't, chest, asthma, terrible, discomfort).
5. Conversational utility words: words related to requests for repetition, politeness,
   opening/closing, responses to questions, or asking questions (e.g., okay, thank_you,
   yeah, no).
6. Caller type identifier words: pronouns identifying the caller type (e.g., I, me).
7. HELPER-related command words: possible command words used to turn the system on
   or off, or to ask for information (e.g., off, time, day).
Table 2-9: Definitions and examples of the final word exclusion criteria.
1. Combo word: compound words consisting of sub-words already included (e.g., pardon
   me, panic attack).
2. Too vague/cannot assess risk level: not enough information to assess the caller's
   condition or risk level, or too infrequent (e.g., lie, shape, been, yet).
3. Alternate tense used: the most common word tense or agreement is already included
   (e.g., waiting, falled, bleed).
While reviewing the words for possible inclusion in the small keyword list, two questions were
considered: (1) how important is the word's function, and (2) how significant is the word's
meaning with respect to identifying a target response; a high or low risk situation; a state or
condition suggestive of a high or low risk situation or a target response; a word necessary for
forming or responding to a question; a word that clearly suggests who is calling; an important
location; or an obvious possible command word for the HELPER system.
In terms of the word exclusion criteria, combo words were excluded because these words could,
in theory, be formed afterwards from their component words, as long as minimal word
convergence occurred when moving from the first word to the second. Word vagueness was
considered mostly with respect to the specific risk scenarios. For example, the word “abs”
occurred in a response call categorized as ‘high risk’ but was not included as a final keyword
because the use of this word would likely require several question/answer iterations to
determine why the call is high risk. On the other hand, words such
as “bleeding” or “oxygen” also occurred in a “high risk” response call and these words could
clearly suggest some possible urgent/emergent situation in their meaning. If a word had alternate
tenses or word agreements, typically only the most frequently used tense/agreement was used.
For example, “paramedics” was retained over “paramedic” and “ambulance” was retained over
“ambulances.”
After applying the reduction rules, a smaller list of keywords was obtained, but it was still larger
than 185 words. To bring the final number down to 185, the words were sorted by word category
to ensure all remaining 16 word categories were covered (two categories had been removed:
(15) other words and (12) interjections). Then, for each category, the words were reviewed again
and reduction decisions were made using the inclusion and exclusion criteria in Figure 2-13,
except that words which did not make the small keyword set were included in the CARES
corpus as phrase words or in the PES scenarios.
Of the words that did not make the small keyword set, 70 words, four (4) interjections (e.g., ah,
oh, uh, um), and three (3) combination words (i.e., heart attack, panic attack, blood pressure)
were specifically identified for inclusion in the PES phrases or scenarios of the CARES corpus.
Of the 179 keywords identified by both Coder 3 and Coders 1 and 2, 137 were included in the
small keyword set, with 35 of the words also included in the CARES corpus as words in a PES
phrase or scenario, or as an alternate tense of the same word (e.g., 'leg' included but not
"legs").
2.5.7 Identification of Key PES Phrases
A total of 185 PES phrases were selected from the response calls for inclusion in the CARES
corpus based on the following guidelines:
1. All keywords must be used at least once in the 185 phrases (in at least one instance, two
keywords occur together in one phrase);
2. The additional words identified during the reduction of the full keyword set are to be
contained within the 185 PES phrases or PES scenarios;
3. The phrases should span the range of all three PES risk levels, two caller types, two call
reasons, and various response requests (e.g., phrases might involve breathing, falling,
accidental calls, urgent or emergent requests, requests for paramedics, ambulance or
other, general descriptions of medical condition range);
4. The phrase range should include different styles of communication for requesting
assistance (e.g., direct or indirect, narrative like or succinct) and requests for repetition;
5. Phrases should be mostly extracted from the beginning of the response call (within the
first four to five (4-5) speaker turns);
6. As many as possible of the phrases and phrase segments identified by Coder 3 (from the
keyword coding) are to be incorporated into the final 185 phrases or the PES scenarios.
See Table 2-10 for a breakdown of what was included; and
7. An additional phrase(s) that deals with a hypothetical initiation of the HELPER system
should be included.
As shown in Table 2-10, of the 135 phrases identified by Coder 3, 81 were fully included in the
CARES corpus, 24 were included as separate words, and 17 were included partially (e.g.,
“needs the ambulance” vs. “needs an ambulance,” or “I’m really” vs. “I’m really so”). Of the 13
phrases not included in the corpus, ten (10) contained unintelligible words; of the last three (3),
one had a different word tense already included, one was a repeated phrase, and one did not
meet the criteria for inclusion (Guidelines 1-3 or 7).
Table 2-10: A breakdown of the phrase categories included in the CARES corpus selected by Coder 3, sorted by word categories.

Word Category (applied to the phrases)   Included fully   Included separately   Included partially
Positive Response (1-p)                         3                --                     1
Negative Response (1-n)                         1                 2                     2
Request/Command (2)                            15                 6                     3
Existing Problem (3-e)                          4                 2                    --
Current Problem (3-n)                          41                 8                     8
Positive Condition (3-p)                        5                 2                    --
Neutral Body State (4)                          4                 1                     1
Politeness (5)                                  1                --                    --
Targets (8)                                     1                --                    --
Body Part (10)                                 --                 2                    --
Negation word (11)                              5                 1                     2
Special Commands (14)                           1                --                    --
Sub-Total                                      81                24                    17
Total = 122
The selection of phrases for the CARES corpus began by taking the 185 keywords and searching
the response call transcripts for phrases that contained the keyword of interest. The phrases were
then organized according to the desired range, such as phrases of high risk, medium risk, and
low risk; fall call phrases; medical call phrases; phrases spoken by older adults and caregivers;
and phrases with succinct versus narrative-type requests for assistance. Phrases were then
selected from this list for inclusion in the corpus. Once a set of phrases was obtained, the
keywords that were not part of the small keyword set were also verified to be present in these
phrases. If they were not, phrases were replaced with others, or existing phrases were modified
to ensure their inclusion. The phrases from Coder 3 were not considered until after the phrases
for the corpus had already been selected. Fortunately, the majority of those phrases were
nonetheless included, and only 17 out of the 135 were partial inclusions.
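As an illustration only, the phrase-selection and coverage-verification procedure described above might be sketched as follows. All function names and example data here are assumptions for demonstration; they are not the tools actually used in the study.

```python
# Illustrative sketch: collect candidate phrases containing keywords,
# then verify that every keyword is covered by the selected phrase set.

def find_candidate_phrases(transcripts, keywords):
    """Collect utterances that contain at least one keyword."""
    candidates = []
    for utterance in transcripts:
        hits = [kw for kw in keywords if kw in utterance.lower().split()]
        if hits:
            candidates.append((utterance, hits))
    return candidates

def verify_coverage(selected_phrases, keywords):
    """Return the keywords not yet present in the selected phrase set."""
    covered = set()
    for phrase in selected_phrases:
        covered.update(kw for kw in keywords if kw in phrase.lower().split())
    return [kw for kw in keywords if kw not in covered]

transcripts = ["I think I need an ambulance", "My leg hurts", "I fell down"]
keywords = ["ambulance", "leg", "fell", "dizzy"]
candidates = find_candidate_phrases(transcripts, keywords)
missing = verify_coverage([u for u, _ in candidates], keywords)
print(missing)  # ['dizzy'] -> a phrase must be added or modified to cover it
```

In the study, any such uncovered keywords triggered the replacement or modification of phrases, as described above.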
2.5.8 Keywords in Various PESs
The PES model categories were used to classify the personal emergency response calls (i.e., low,
medium, and/or high risk levels, older adult and/or care provider callers, and/or fall or medical
calls) and the keywords used within each category were then identified. Table 2-11 shows a
breakdown of the number of keywords, from the final small keyword set, that were identified
according to the PES categories.
Table 2-11: The number of keywords identified by response call classification. (LR = low risk, MR = medium risk, HR = high risk, Fall = fall call, Med = medical call, OA = older adult call, and CG = caregiver call)

Call Classification              LR       MR       HR      Fall     Med      OA*      CG
# keywords                       26      147      137      112      162      160      113
% of total words (out of 185)  14.05%   79.46%   74.05%   60.54%   87.57%   86.49%   61.08%
# of calls in each category      10       34       40       21       53       61       22
* 1 combination call was not included when examining words between OA and CG. For the other combination call, the CG response was commented out for processing.
In terms of unique keywords spoken, looking at risk level only, three (3) keywords were found
for low risk calls, 44 for medium risk calls, and 31 for high risk calls. There were 16 words used
in common across the risk level and call reason PES categories. See Appendix C for a
breakdown of the 185 keywords and categories, and Appendix D for a list of the unique
keyword occurrences.
2.6 Discussion
2.6.1 Word Categories
With respect to the word categories developed for keyword identification from the call
transcripts, for both the out-of-context and in-context studies it would have been beneficial to run
a mock trial with the coders prior to the actual word categorization process. A mock trial would
have previewed how the coders were interpreting the word category definitions and offered an
opportunity to clarify misunderstandings before the actual coding of the words. Coders could
also have provided feedback at that time on any categories they felt needed further explanation,
or on whether new categories should be created or existing ones removed. In this study, the
coders were able to ask questions while coding; however, in some cases, differences in
interpretation did not surface until well into the coding process, or even after coding was
complete and the codes were being compared.
2.6.2 Coding Methods
In this study, Coder 1 and Coder 2 categorized words out-of-context, while Coder 3 categorized
the words in-context. As expected, the interrater reliability between Coder 3 and Coders 1 and 2
was lower than between Coder 1 and Coder 2, although still moderately similar. The fact that the
same word can have different meanings depending on its context underlines the importance of
coding words in-context (e.g., 'okay' could mean 'yes' or simply 'acknowledgement,' and
'accident' could be a negative or positive result depending on the context). The same word may
therefore be assigned a different word category depending on how it is used in the utterance. In
this study, only one code could be considered in the calculation of the Cohen's kappa statistic,
and this may have affected the results slightly.
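For reference, Cohen's kappa for two coders who each assign a single category per word (as was done here) can be computed as in the following sketch; the category labels and example codes are illustrative assumptions, not data from the study.

```python
# Minimal sketch of Cohen's kappa for two coders who each assign
# exactly one category code per word.
from collections import Counter

def cohens_kappa(codes1, codes2):
    n = len(codes1)
    observed = sum(a == b for a, b in zip(codes1, codes2)) / n
    c1, c2 = Counter(codes1), Counter(codes2)
    # Chance agreement: sum over categories of the marginal proportions' product.
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes for five words (category labels follow this study's scheme).
coder_a = ["2", "3-n", "1-p", "2", "11"]
coder_b = ["2", "3-n", "1-n", "2", "11"]
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.74
```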
Some inconsistency also resulted from how two-word units (single lexical units composed of
two words), such as "thank you" or "heart attack," were handled. In this study these words were
extracted separately by the computer (e.g., "thank" and "you"), as they were not linked in the
transcription process. When coding in-context, however, these words were identified as one
unit. It may therefore have been better to code these word units as one word rather than two in
the out-of-context condition, especially if all instances within the transcripts involve the two
words occurring together and never separately. However, the fact that the results still showed
moderate agreement despite this difference in coding method suggests that the process of word
categorization with the derived categories was reasonably robust.
2.6.3 Full Keyword List Identification
Several utterance extractions in the raw word list consisted of speech units that could not really
be considered keyword vocabulary, even though the utterances may be important for
understanding the situational context. For example, non-word sounds such as coughs or sighs,
and unintelligible utterances (e.g., partially spoken words or word cut-offs), may suggest that the
caller is having difficulty speaking or is ill; however, deciphering meaning from these speech
units by themselves would be difficult for a computer. Out-of-context, these types of speech
units were considered non-keywords and coded as such. The fact that Coder 3 identified some of
these unintelligible speech units as key while coding in-context does, however, highlight that
they are important in the actual conversational dialogue. For this study, unintelligible and
non-word sounds were not considered for inclusion in the small keyword set; however, examples
of these types of speech units were included in either the PES phrases or the scenarios of the
CARES corpus.
In the process of selecting keywords, all the coders eliminated function words (e.g., for, and,
to, so, etc.) from consideration as keywords. However, the identification of key short phrases by
Coder 3 re-introduced these function words, and other words that may not mean much out-of-
context by themselves, back into the list of possible keywords. As a result, even though adding
the phrase words increased the number of possible keywords, it also produced a greater
mismatch in the keyword comparison. This is reflected in the lower inter-rater reliability statistic
and percent agreement shown in Table 2-5 (agreement dropped from 77.8% to 76.4%).
2.6.4 The PES Model
Considering the end-user within the environment and situation where he/she will use the
technology during the design phase may make the difference between developing a technology
that will be adopted by the end-user and one that is not. The PES model developed in this study
is a very simple model and one of many possible ones to describe the PES. However, despite its
simplicity, incorporating these PES categories into the final selection of keywords and phrases
from the response calls should, in theory, ensure that the data reflect, to some degree, these
various PES aspects.
By replacing the older adult user with the PES model in Figure 2-5, and expanding the “classifier
unit” to include the PES categories, Figure 2-14 illustrates more closely the task of the HELPER
communication module and how the results of this study can be applied to improve the
automated PERS component.
In the future, other studies may want to consider other PES categories for caller types, such as
different genders, older adult age ranges, medical conditions, or personalities. Different care
provider types may also be of interest, such as those with a background in health care provision
versus those without this background. Another definition of a fall might also be used or the risk
level may be replaced by a condition type, such as individuals with chronic conditions versus
acute conditions (e.g., infections) as opposed to high versus medium risk.
[Figure: the diagram links (1) the Personal Emergency Situation (situation, call reason, risk level, physical-cognitive state, user/caller type; speech and non-speech communication ability) to (2) the HELPER System, whose ASR, SI, and classifier use keywords and phrases from the CARES corpus word and phrase categories to identify caller type (Who is calling?), call reason (fall or medical?), and risk level (patient acuity?), and, through conversation, to (3) the PERS response (What response?).]
Figure 2-14: A diagram showing the pathway to personal emergency response, including the PES model and categories within the classifier unit of the HELPER System.
2.6.5 Small Keyword List Identification
The process of small keyword list identification was performed by Coder 1 according to the
reduction rules identified in Figures 2-12 and 2-13 and Tables 2-7, 2-8, and 2-9. Ideally, it might
have been better to use all 402 keywords identified from the full keyword set as the "keyword
data set" in the CARES corpus; however, this would have taken twice the amount of time to
record and increased costs. In addition, it would have required a fairly long recording session for
the participants involved, which may not be suitable for the more elderly participants.
During the keyword reduction process, the decision to define the initial word focus set as all
words with a high frequency of occurrence and appearing across multiple risk categories was
made mainly to identify words that would be commonly used across PES categories as well as
frequently spoken. This is beneficial for ASR acoustic model training, as it ensures that
commonly used words across all situations would have the potential to be recognized if included
in the ASR vocabulary. On the other hand, not all PESs are the same, and if the aim is to identify
PES categories in hopes of deducing a target response, it would also be extremely valuable to
look for keywords that have a higher probability of being spoken only during specific situations
or PES categories and not others (e.g., words used in high risk versus low risk situations). As
well, some PESs do not occur frequently, but when they do occur it is imperative that they be
identified. For example, the word "heart attack" occurred only three times (less than the
minimum five-occurrence cut-off), but this is one situation that would require an immediate
emergency response. Words with a low frequency of occurrence were therefore also important
to identify and include. For these reasons, the second word focus set could not be eliminated.
Different word reduction processes were used for the first and second word focus sets, mainly
because even though the less frequently occurring words may have met at least one of the
inclusion criteria, they tended to be weaker at conveying why they should be included. For
example, these words seemed less able to clearly indicate the situation's risk level, desired target
response, or caller type, or to describe the state/condition/symptom of the individual so that the
other criteria could be determined. In the actual response call conversations, calls that took
longer for the call taker to resolve contained more words and thus more detail. However, this
detail may have included these less frequently occurring words, which are also less robust at
conveying a PES's risk level or the needed target response. An analogy might be a patient
complaining to a doctor of stomach pain and aches versus one who is bleeding profusely or
having a heart attack. Non-specific complaints of "aches and pains" tend to take the physician
more time to trace to a specific cause than complaints of profuse bleeding or a heart attack.
With respect to including the words that were not part of the small keyword set into the PES
phrases or scenarios recorded in the CARES corpus, adding the word into a phrase was the first
preference. The main reason for this is that every participant providing speech samples was
required to speak every phrase whereas only three scenarios, selected out of a total set of nine,
were to be enacted by a participant.
2.6.6 PES Phrases
Although the study began with the intent of selecting keywords from transcripts, as the study
proceeded it became clear that word combinations and phrases were also very important for
providing context to what was spoken. In particular, preference was given to phrases occurring
within the first several turns of the response call conversation. The interest in the initial speaker
turns arises because the HELPER must provide a response as quickly as possible; it is not
expected to engage the user in a long, extended conversation. Ideally, the HELPER must be able
to identify a target quickly, within a few speaker turns. If it cannot, it may be necessary to
default to a live call taker. Therefore, the hypothesis for phrase selection was that
selecting phrases from the first several speaker turns of the live response call would reflect more
closely how callers would respond to the HELPER during an actual PES. The information
provided by these key phrases will be useful for developing the language model within the ASR
sub-component of the HELPER Speech Handler (see Figure 2-4 for the ASR main components).
2.6.7 Application to HELPER
An expansion of the HELPER's recognizable vocabulary will also require some method of
'understanding' this increased vocabulary. Although the language model in the ASR will help
the Speech Handler identify possible word configurations, it is the keyword categories that will
provide the vital information required by the semantic analyser (in the Speech Informant sub-
component of the Speech Handler) to assist the HELPER in at least "artificially understanding"
the meaning behind what is presumed to have been said by the user.
In Figure 2-4, incoming speech being received by the Speech Handler’s ASR will be processed
by the Decoder using the three linguistic models trained with words and phrases from the
CARES corpus. The resulting “best match” of the words spoken by the user would then be sent
on to the semantic analyser in the Speech Informant which will attempt to interpret the meaning
of what was said by the user. So, in a hypothetical situation where the HELPER opens with "Do
you need help? (Please say 'yes' or 'no')," as the current system prototype does, and the user
responds, "yes, could you send an ambulance?", the goal would be for the semantic analyser
to break down the words and code the user's response as shown in Table 2-12.
Table 2-12: Example of how an incoming statement might be deciphered by the semantic analyser
Spoken word        yes                could           you    send              an     ambulance
Word Code          (1-p)              9               nc     2                 nc     8
Code description   positive response  question word   --     request/command   --     target

*nc = no code
Given this information, the Speech Informant might then tell the HELPER’s Dialogue Handler
component that the user responded to the asked question “positively”, plus there is a request for
“target”=ambulance. The Dialogue Handler might then respond with “please confirm you would
like an ambulance,” as opposed to what the previous system prototype would do which is “would
you like me to call an ambulance? (Please say ‘yes’ or ‘no’).” The main difference in these
responses is that with the expanded vocabulary combined with an enhanced language model and
semantic analyser, the system should be able to recognize that the user has already stated their
desired target, whereas in the existing HELPER communication module prototype, the user
response is only recognized as being more similar to a “yes” or a “no” and the “ambulance”
request would not be recognized.
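The word-coding and intent-extraction step illustrated in Table 2-12 might be sketched as follows. The keyword-to-category map is a tiny assumed subset of the study's categories, and the function names are hypothetical.

```python
# Sketch of the semantic analyser's word-coding step (cf. Table 2-12).

WORD_CODES = {
    "yes": "1-p",      # positive response
    "no": "1-n",       # negative response
    "could": "9",      # question word
    "send": "2",       # request/command
    "ambulance": "8",  # target
}

def code_utterance(utterance):
    """Assign each word its category code ('nc' = no code)."""
    return [(w, WORD_CODES.get(w, "nc")) for w in utterance.lower().split()]

def extract_intent(coded):
    """Summarize a coded utterance for the Dialogue Handler."""
    codes = {c for _, c in coded}
    intent = {}
    if "1-p" in codes:
        intent["response"] = "positive"
    if "2" in codes and "8" in codes:  # request/command plus a target
        intent["request_target"] = next(w for w, c in coded if c == "8")
    return intent

coded = code_utterance("yes could you send an ambulance")
print(extract_intent(coded))  # {'response': 'positive', 'request_target': 'ambulance'}
```

Given such an intent summary, a dialogue manager could skip a redundant "would you like me to call an ambulance?" query and move directly to confirming the already-stated target.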
An interesting comment made by the physician (Coder 3 for keyword coding) was that, while
reading the transcripts, it appeared the caller could often have been responded to sooner with the
information that had already been provided. This comment may allude to a need for more basic
emergency medical response training for call takers. The authors are not aware of what training
the call takers are provided, or of the minimum requirements needed to be hired as a call taker.
One research study did note, however, that emergency response operators tend to be hired based
on their personality traits: their ability to listen, and to be sensitive, insightful, empathetic, and
intuitive (Forslund, Kihlgren, & Kihlgren, 2004).
2.6.8 Study Limitations
The keywords and phrases obtained in this study were derived from a small sample of 84
response call transcripts provided by one call centre. The recordings cover only two provinces
in Canada, and not all cities within them. This research is therefore limited by the number of
different types of PESs and response call conversations contained in the collection of studied
personal emergency response calls. Researcher bias also plays a role in the development of the
keyword and phrase categories, as well as in the process of keyword selection and reduction.
2.7 Conclusion
In conclusion, this study describes the process by which keywords and phrases spoken by PERS
users were identified for various PESs based on a proposed PES model. The study derived data
using directed, conventional, and quantitative content analyses of transcribed personal
emergency response calls. The main results include the identification of 18 word categories; the
categorization and isolation of 402 keywords and 135 phrases from transcripts of 84 response
calls; the development of a PES model; the reduction of the full keyword list into a small
keyword list of 185 keywords and phrases for incorporation into a recorded speech database.
Common words across the PES categories, and words unique to the personal emergency
response situations categorized by risk level, were identified. No prior research examining the
keywords and phrases used in response calls by PERS callers has been identified by researchers
in the Intelligent Assistive Technology and Systems Lab. The results of this study
can be directly applied to improving the Speech Handler component of the HELPER and
expanding the HELPER’s recognition vocabulary for incoming speech. The hope is that this
work will contribute to the future development of a more robust HELPER system.
Chapter 3
3 Identification of Conversational Trends in Personal Emergency Response Calls
3.1 Prologue
This chapter explores the potential of using statistical measures of speech and conversational
data as an alternate source of information to increase the HELPER system’s confidence in
decision making, and to improve its ability to respond appropriately and efficiently to the end-
user. This study demonstrates the benefit of using both qualitative and quantitative analyses to
help examine the nuances of personal emergency response call conversations. The contents of
this chapter are intended for publication but have not yet been published.
3.2 Abstract
Purpose: A novel automated, intelligent, spoken dialogue-based personal emergency response
system concept is being developed in an attempt to address the existing usability barriers
identified by prior research groups of traditional push-button type personal emergency response
systems. However, spoken dialogue systems and automatic speech recognition technology
cannot perform optimally all of the time, especially with the expected target users in emergency
situations. The main objective of this study was to analyse statistical information from real
personal emergency response calls in order to identify significant call and conversation trends
that may be used to help the automated personal emergency response system tailor its dialogue
response to the end-user's need(s).
Method: An emergent, exploratory, sequential mixed methods design was used for this study.
Personal emergency response calls were classified according to the personal emergency response
categories identified qualitatively from transcribed personal emergency response calls. Various
statistical analyses were performed involving different combinations of three conversational
measures: verbal ability, conversational structure, and timing; and three independent factors:
caller type, risk level, and speaker type.
Results: Emergency medical response services were identified as preferred responders for the
majority of medium and high risk calls by both caller types. Non-emergency medical service
responders were requested mainly during medium risk situations by older adult callers. Older
adult callers may be predicted with fairly high accuracy by measuring the caller’s spoken ‘words
per minute’ and ‘turn length in words.’ Average call taker response times were calculated in both
speaker turns and in seconds. Care providers and older adults were found to use different
conversational strategies when responding to the call taker. The words ‘ambulance’ and
‘paramedic’ seem to hold different latent connotations.
Conclusion: Tailoring the response dialogue of the automated personal emergency response
system to the caller can help minimize user frustration and improve call efficiency. Classifying
calls by caller type or risk level may help tailor the call dialogue to the user. Call taker response
times can also be used to limit the length of conversation before reaching a live operator. System
designers should consider when to use the terms “ambulance” or “paramedic” in their response
dialogue and/or to include both as possible responder options.
3.3 Introduction
3.3.1 Need for a New PERS
Over the last few decades, mounting concern over a growing elderly population combined with
advances in computing technology have stimulated research into new methods for improving the
traditional, push-button, personal emergency response system (PERS). This "second" generation
of PERS technologies has begun to incorporate technological advances such as automatic fall
detection and home-based monitoring with sensors (Doughty, Cameron, & Garner, 1996;
Heinbüchner et al., 2010). Although the market for the next generation of PERS is large, the
technology is still young and the majority of existing PERS owners continue to use the
traditional, push-button PERSs. Hessels et al. (2011) provides a recent review of the latest
advances in personal emergency response technologies.
3.3.2 The HELPER System
In terms of identifying ways to improve PERS technology, some researchers have suggested that
activation be made using speech (keywords) (Hobbs, 1993; Taylor & Agamanolis, 2010). Other
researchers suggest that older adult home care technologies in general need to be made more
“attractive, provide privacy, [and] allow for informed choice” and reduced isolation (Blythe et
al., 2005). In an attempt to incorporate some of these design suggestions into a new PERS,
researchers in the Intelligent Assistive Technology and Systems Lab have been developing a
hands-free, speech and vision based smart home monitoring system called the HELPER system
(Health Evaluation Logging and Personal Emergency Response System) (Belshaw et al., 2011;
Hamill et al., 2009; Lee & Mihailidis, 2005; Tam et al., 2006). In theory, the HELPER would
continuously monitor the home for an adverse event (i.e., a fall) and then automatically initiate a
response sequence if such an event is detected. The person being monitored would communicate
first with an artificially intelligent HELPER call taker who would connect the user to their
desired responder. Using speech or vision to activate the PERS removes the need to wear a
body-worn activator such as the traditional PERS "push-button," and will hypothetically increase the
user's autonomy and privacy by permitting the user to either direct or cancel the call before
reaching a live call operator.
3.3.3 HELPER Prototype Testing
Feasibility testing of the HELPER communication module by previous researchers successfully
demonstrated that automatic system activation via visual detection of a simulated adverse event,
followed by human-to-computer communication using spoken dialogue and automatic
recognition of incoming speech is possible (McLean, 2005). The HELPER prototype was tested
with younger adults in a controlled lab environment. The HELPER automatic speech recognizer
(ASR) was set to recognize “yes” and “no” word forms only and the automated dialogue was
modelled off of existing personal emergency response call centre protocol (McLean, 2005). In
these studies, the group of young adults successfully navigated the HELPER dialogue and
obtained assistance by responding to the “yes” and “no” queries in both quiet and noisy
conditions. The next phase of this research is to focus on furthering the design, development, and
fine-tuning of the HELPER communication module for actual end-users, especially older adults,
in real personal emergency situations (PESs).
3.3.4 Older Adults and Spoken Dialogue Systems
Research examining the use of spoken dialogue systems by older adults has revealed that
older adults are more likely than non-older adults to communicate with spoken computer
dialogue systems as if they were human. Researchers have observed that their older adult
participants used more "definite articles, more auxiliaries, more first person pronouns, and ...
more lexical items related to social interaction, such as 'please' and 'thank you'" compared to
younger adults (Möller et al., 2008). The challenge of the HELPER's SDS will be to recognize
speech from end-users, mostly older adults, who may converse with it like a human, in
potentially stressful emergent situations where the speaker may have decreased communication
abilities (e.g., hesitations, disfluencies) or may not even be facing or close to the HELPER’s
input microphone. As such, it stands to reason that, to increase robustness in the HELPER
communication module, it would be important not only to include techniques for error recovery,
but also to provide any other supports that may assist the HELPER in deciding how to
respond to a call. This study hypothesizes that this "other support" may be derived from prior
real response call patterns and/or call dialogue (e.g., conversational measures) which may then
be used to classify a response call with a certain probability. This additional support information
combined with the ASR’s identified spoken word or utterance could be used to increase the
HELPER’s confidence in decision making and dialogue planning. Further details of the
HELPER system will be provided in the Background section of this chapter.
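As a purely hypothetical illustration of such "other support," a conversational measure like the caller's speaking rate could feed a simple call classifier. The threshold, labels, and function names below are assumptions for demonstration only; any real cutoff would have to come from statistical analyses such as those reported in this chapter.

```python
# Hypothetical sketch: use a conversational measure (words per minute)
# as additional support for classifying a response call.

def words_per_minute(word_count, duration_seconds):
    """Speaking rate over a span of the call."""
    return word_count / (duration_seconds / 60.0)

def classify_caller(wpm, threshold=120.0):
    """Guess the caller type from speaking rate (threshold is assumed)."""
    return "older adult" if wpm < threshold else "care provider"

wpm = words_per_minute(word_count=45, duration_seconds=30)
print(wpm, classify_caller(wpm))  # 90.0 older adult
```

Such a probabilistic classification would be combined with, not substituted for, the ASR's recognized utterance when the HELPER plans its next dialogue move.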
3.3.5 Study Objective & Research Significance
Previous research studies have used conversational analysis to examine emergency calls (911)
(Cromdal et al., 2008; Garcia & Parmer, 1999; Garner & Johnson, 2007; Imbens-Bailey, 2000;
Waseem et al., 2010; Whalen & Zimmerman, 1987) but no prior studies could be identified
specifically examining personal emergency response calls and call conversations with PERS
users. Therefore, the objective of this study is to identify significant trends in real personal
emergency response calls and call conversations that may be used to tailor the call response
to the user. The research results presented in this paper will incorporate a PES model previously
developed in study 1. Emergency medical response personnel, clinicians and care providers may
find the information from this study useful in helping them understand some of the
communication differences that may arise between PERS users during various PESs. Technology
developers should also find this information applicable to the design of personal emergency
communication technologies for older adults and for the development of personal emergency
communication protocols. Preliminary results from this study have been previously summarized
in a short, one-page conference paper (Young, Rochon, & Mihailidis, 2014). Relevant
background will be presented first, followed by the study methodology, results, discussion, and
conclusion.
3.3.6 Background
3.3.6.1 The HELPER System
Figure 3-1 illustrates the pathway to personal emergency response using the HELPER.
[Figure: (1) a Personal Emergency Situation is detected by (2) the HELPER computer via hands-free speech or vision activation, using a ceiling/wall/shelf-mounted camera, speaker, and microphone. The (2a) Vision Module asks: Is the person present? Is the person active? Is the activity/inactivity normal? Activate communications? The (2b) Communication Module then conducts spoken dialogue as (3a) the PERS call taker, determining: Who is calling? Call reason? Situation risk level? Response required? The outcome is (3b) a call response (emergency response services or personal responder(s), both live persons) or no response (false alarm).]
Figure 3-1: Pathway to personal emergency response using the HELPER System.
The design is based on the concept that the HELPER will monitor the home for an adverse event
(e.g. a fall) (using its Vision Module – at 2a) and when detected, the system will automatically
initiate dialogue with the individual (using the Communication Module – at 2b). The individual
then communicates his/her need using speech with the HELPER functioning as a non-live, first
responder. The user may also activate the system manually by saying a specific keyword or
phrase (e.g., a cry for help). If assistance is required, the HELPER will subsequently initiate
contact with the desired live responder (see points 3a and 3b in Figure 3-1) or cancel the call if
no response is needed (e.g. a false alarm). In essence, the automated PERS functions similarly to
a hands-free telephone but with specialized and intelligent features.
3.3.6.2 The HELPER Communication Module
The ability of the HELPER computer to communicate with a human user "verbally" over several
speaker turns places its communication module into a category of interactive dialogue systems
called a spoken dialogue system (SDS) (Fraser, 1997). According to Möller (2005), an SDS is
characterized by its ability to accept continuous speech, allow for user initiatives (i.e., the user
can provide more information than requested), reason, detect errors or incoherence, and correct,
anticipate, and/or predict the spoken user response. It is proposed that the HELPER
communication module contain all five of the basic functional components of an SDS
(Georgila, Wolters, Moore, et al., 2010; Lamel et al., 2000; Möller, 2005):
(1) an ASR that receives an acoustic signal (spoken input) and transforms this into a most
probable word sequence;
(2) a Semantic Analyser or Natural Language Understanding component that deciphers the
meaning or intention of the probable word sequence;
(3) a Dialogue Manager that maintains the dialogue and keeps a history of responses;
(4) a Response Generation component that determines the output dialogue according to “the
dialog state, the user utterance, and/or information returned from the database” (Lamel
et al., 2000); and
(5) a Speech Synthesis component that converts selected system utterances to actual speech
output.
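The five components listed above can be pictured as a pipeline. The following stub sketch (all names and return values are assumptions for illustration, not the HELPER implementation) shows how an utterance would flow through them:

```python
# Stub pipeline for the five basic SDS components; each stage is a
# placeholder standing in for a much larger real component.

def asr(audio):
    """(1) Map an acoustic signal to the most probable word sequence."""
    return "yes could you send an ambulance"  # stubbed recognition result

def semantic_analyser(words):
    """(2) Decipher the meaning/intention of the word sequence."""
    return {"response": "positive", "target": "ambulance"}  # stubbed intent

def dialogue_manager(intent, history):
    """(3) Maintain the dialogue and keep a history of responses."""
    history.append(intent)
    return {"state": "confirm_target", "intent": intent}

def response_generation(dialogue_state):
    """(4) Determine the output dialogue from the dialogue state."""
    return f"Please confirm you would like an {dialogue_state['intent']['target']}."

def speech_synthesis(text):
    """(5) Convert the selected system utterance to speech output (stubbed)."""
    return text

history = []
reply = speech_synthesis(
    response_generation(dialogue_manager(semantic_analyser(asr(b"...")), history)))
print(reply)  # Please confirm you would like an ambulance.
```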
Additionally, a component for contacting a live responder is included, conveniently called
the “call responder” component. ‘Dialogue measures’ and ‘classifier’ sub-components, located in
the Speech Informant and Dialogue Manager components respectively, are also proposed. Figure
3-2 illustrates the proposed internal components of the HELPER communication module.
[Figure: the HELPER Communication Module receives incoming speech and produces spoken output. It comprises a Speech Handler (Automatic Speech Recognizer (ASR), Speech Informant), a Dialogue Handler (Dialogue Manager), and a Response Handler (Response Generation, Speech Synthesis, Call Responder), with a responder dispatched en route as the final outcome.]
Figure 3-2: Sub-sections and functional components of the HELPER Communication Module
The results of this study specifically focus on improving various aspects within the Speech
Informant, the Dialogue Manager, and the Response Generation components of the HELPER
communication module. Further detail is provided only on these sub-components.
Figure 3-3 illustrates the internal sub-components of the Speech Informant component.
[Figure: the Speech Informant (SI) contains a Semantic Analyzer (NLU) and a Dialogue Measures sub-component; its input arrives from the ‘ASR’ sub-component and its output is sent to the ‘Call Classifier’ in the Dialogue Handler.]
Figure 3-3: Inside the Speech Informant sub-component of the Speech Handler.
The ‘best match’ speech utterance obtained from the ASR is sent both to the semantic analyser,
for natural language processing to “understand” the meaning of what was said, and to the
dialogue measures sub-component, in which “other information” will be extracted and
used to inform the Dialogue Handler about how to classify the PES.
Inside the Dialogue Handler (see Figure 3-4), information from the Speech Informant is first sent
to a PES classifier, which identifies any details from the user’s utterance that can be used to
classify the PES. This information is then sent to the dialogue control, where it is determined how
next to respond to what the user said. The controller first consults the dialogue history, the
current dialogue set, and the dialogue state, and then selects a response. The Response Handler is
then activated, where the proposed response can be generated or a call to the responder can be made.
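The dialogue-control step described above can be sketched as follows. This is a minimal illustration under invented assumptions: the dialogue sets, prompt texts, and the "first prompt not yet asked" selection rule are all hypothetical simplifications of what a real dialogue manager would do.

```python
# Hypothetical dialogue control: given the classifier's PES label and the
# dialogue history, pick the next system prompt from the current dialogue set.

DIALOGUE_SET = {
    "fall":    ["Are you hurt?", "Shall I call an ambulance?"],
    "medical": ["What is wrong?", "Is anyone with you?"],
    "unknown": ["Do you need help?"],
}

def next_prompt(pes_class, history):
    """Return the first prompt in the set not yet asked; None means the set
    is exhausted and the call responder component should be activated."""
    for prompt in DIALOGUE_SET.get(pes_class, DIALOGUE_SET["unknown"]):
        if prompt not in history:
            return prompt
    return None  # hand off to the call responder

history = []
first = next_prompt("fall", history)
history.append(first)
print(first, "/", next_prompt("fall", history))
```

The point of the sketch is only the flow of information: classifier output plus history and state select a response, exactly as the prose describes.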
[Figure: the Dialogue Handler (DH) contains a PES Classifier and the Dialogue Manager (DM), which holds the Dialogue Control, Dialogue History, Dialogue State, and Dialogue Set; input arrives from the ‘Speech Informant’ component (with the SA or NLU) and output goes to Response Generation (RG) or the Call Responder (CR) in the Response Handler.]
Figure 3-4: Inside the dialogue handler component of an SDS.
The Response Handler is illustrated in Figure 3-5. Aspects of the diagram are derived from
Möller (2005). Inside the Response Generation component, a database of possible dialogue
responses (text) is searched for the response requested by the Dialogue Manager. This response
is then sent to the ‘Speech Synthesizer’, which searches a database for the desired spoken
dialogue units, synthesizes the text to speech if necessary (pre-recordings of output dialogue may
be used), and sends the response out to the user through a speaker system.
If the Call Responder component is activated, the Call Responder might check for a preferred
responder or look through a history of requests to inform the Dialogue Manager if any further
query is required. Once a desired responder is confirmed, the call to the desired responder is
initiated.
[Figure: the Response Handler contains Response Generation (database of dialogue text; select response), Speech Synthesis (database of spoken dialogue; speech output to the speakers), and the Call Responder (responder information; response request history; initiate/confirm the call, with the responder then en route); input arrives from the Dialogue Manager.]
Figure 3-5: The internal components of the response handler within the SDS.
3.3.6.3 Human to Machine Spoken Dialogue Systems
The ability to simulate or replicate human speech recognition and understanding using
technology has been a growing area of research for over 60 years (Anusuya & Katti, 2009; Gold
& Morgan, 2000). Although considerable progress has been made in the field of ASR, a human’s
capacity for speech recognition and understanding across a range of environments is still
unmatched by any machine (Dusan & Rabiner, 2005; Furui, 2003; Scharenborg,
2007). A major source of ASR error arises from a mismatch between the speech sounds used to
train the acoustic models and the actual incoming speech to be recognized (Furui, 2003;
King, 2006). Automatically recognizing human-to-human conversational speech is also
known to be a more difficult task than recognizing human-to-machine speech (Jurafsky &
Martin, 2009).
Generally speaking, researchers have shown that when humans interact with a machine that can
recognize speech, they tend to simplify their speech, speaking more clearly and slowly
(Jurafsky & Martin, 2009). However, the danger in generalizing is that this statement may not be
true for all users interacting with speech-recognizing machines. Specifically, research findings
from Möller et al. (2008) revealed that close to two-thirds of their older adult participants
interacting with an SDS did not adapt their speech but instead spoke to the system as if it were a
real human. The remaining older adult participants in that study did perform as expected and
adapted the way they spoke, using only the speech necessary to convey their meaning (Möller et
al., 2008).
Collectively, the research literature highlights the complexity and challenges of using ASR and
SDS technology and underlines the fact that systems incorporating these techniques may not be
able to function perfectly 100% of the time, even in optimal conditions with the users they were
designed for (Furui, 2003; Takahashi et al., 2003; Vipperla et al., 2009).
3.3.7 Study Focus as Applied to the HELPER
Given that the SDS may not function perfectly, in addition to speech recognition and
understanding, another method involving call classification is proposed to further support the
HELPER communication module and help it to determine the best way to tailor a response to the
end-user. Figure 3-6 expands on Figure 3-1, which illustrates the pathway to personal emergency
response using the HELPER system. On the left side of Figure 3-6, the PES model developed in
study 1 is used to characterize the personal emergency situation (1). The HELPER system (2) is
represented in the middle-right, with some internal communication module components shown,
specifically the ASR, the Speech Informant (SI), and the classifier sub-component of the
Dialogue Handler. The PERS Response (3) completes the pathway on the far right.
This diagram demonstrates how the call classification sub-component (classifier) of the Dialogue
Handler would be used by the HELPER communication module. In this diagram, incoming
speech from the user is received by the HELPER computer. This speech input is processed
within the ASR to identify keywords and phrases. The keywords are categorized in the Speech
Informant to help derive the meaning of the recognized speech. In addition to recognizing and
categorizing the spoken words, conversational measures of speech (yellow star) could also be
used to help characterize the call situation.
Collectively this information would be used by the classifier sub-component to identify a
possible caller type, call reason, and/or medical risk level for the PES. If PES classifications can
be identified for a call, then this information could be used in addition to speech understanding
as a basis for modifying the call dialogue and matching it to the particular caller for the specific
PES. Timing information (yellow star) can also be used to limit the length of a response call
which would ensure the call does not continue indefinitely before defaulting to a live responder.
Study 2’s main focus will be on identifying the conversational measures that could be used to
classify a call as well as identifying the timing used to measure the length of a call.
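As an illustrative sketch of the proposed classifier sub-component, the fragment below combines spotted keywords with one conversational measure (words per minute) to guess a call reason and risk level. The keyword lists, threshold, and rules are invented here purely for illustration; study 2 derives the real measures and relationships.

```python
# Toy PES classifier: keywords + a conversational measure -> call reason and
# risk level. All word lists and thresholds are hypothetical.

FALL_WORDS = {"fall", "fell", "fallen", "floor"}
MEDICAL_WORDS = {"pain", "dizzy", "vomiting", "chest"}

def classify_call(utterance, words_per_minute):
    tokens = set(utterance.lower().split())
    if tokens & FALL_WORDS:
        reason = "fall"
    elif tokens & MEDICAL_WORDS:
        reason = "medical"
    else:
        reason = "unknown"
    # invented rule: unusually slow speech during an emergency nudges risk up
    risk = "high" if words_per_minute < 60 and reason != "unknown" else "medium"
    return {"call_reason": reason, "risk_level": risk}

print(classify_call("I fell on the floor and I cannot get up", 45))
```

In the actual HELPER, these labels would pre-inform the dialogue manager so that the call dialogue can be tailored to the caller and situation.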
[Figure: (1) the Personal Emergency Situation, comprising the user (physical-cognitive state, caller type), call reason, risk level, and speech and non-speech communication ability; (2) the HELPER System, in which the HELPER computer’s ASR extracts keywords and phrases, the SI applies word categories and conversational measures, and the classifier, together with timing information, estimates the caller type (who is calling?), call reason (fall or medical?), and risk level (patient acuity?) to decide what response to give; the conversation leads to (3) the PERS Response.]
Figure 3-6: Diagram of the pathway to personal emergency response using the HELPER, with the ‘conversational measures’ and ‘timing’ features added.
3.4 Methodology
3.4.1 Research Design Method
An exploratory, sequential, mixed methods design was used for this study. Clark & Creswell
(2011) provide a good introduction to this method, which consists of a ‘qualitative data collection
and analysis’ phase, followed by a ‘quantitative data collection and analysis’ phase, and ending
with a ‘final interpretation’, as illustrated in Figure 3-7 (Clark & Creswell, 2011).
[Figure: Qualitative Data Collection and Analysis → builds to → Quantitative Data Collection and Analysis → Interpretation.]
Figure 3-7: Diagram of the process of exploratory sequential mixed methods design (Clark & Creswell, 2011).
Content analysis was the approach used for the ‘data collection and analysis’ phases of both the
qualitative and quantitative portions of this research design method. Crede & Borrego (2010)
provide an example of using the content analysis approach within a mixed methods design.
Content analysis is an attractive method of inquiry applied in many research fields for analyzing
text (and sometimes other media) in context of its use (Cavanagh, 1997; Krippendorff, 2012).
Over the decades, content analysis has been used increasingly in the field of health research (Elo
& Kyngäs, 2008; Mays & Pope, 2000). Content analysis is flexible enough to examine data either
qualitatively or quantitatively, and either inductively (e.g., specific to general) or deductively (e.g.,
general to specific, based on existing theory) (Elo & Kyngäs, 2008; Krippendorff, 2012). Content
analysis has also been used frequently in the area of computer text analysis since the late 1950s
and in artificial intelligence (Krippendorff, 2012). In the field of artificial intelligence,
researchers were mainly focused on designing machines capable of understanding natural
language (Krippendorff, 2012), which is precisely a component of interest within the HELPER.
When used as a research method, content analysis is noted as being systematic, objective,
repeatable and a valid means of either quantifying phenomena or making inferences about data in
context (Krippendorff, 2012). Typically, new knowledge or insights are derived in the form of
concepts or categories describing some phenomenon, or for the purpose of building a model,
conceptual system or map (Elo & Kyngäs, 2008). The outcome of a content analysis may be used
to guide future action which is especially useful in the field of health research (Elo & Kyngäs,
2008).
3.4.1.1 Method Limitations
In terms of limitations, the flexibility that gives content analysis its advantage is also its
restriction. Some researchers have noted that because content analysis does not proceed linearly
and has minimal formalized procedures, it can become more complex and difficult to implement
than quantitative analysis (Polit & Beck, 2004).
3.4.1.2 Method Implementation
The general steps for implementing a content analysis include (Elo & Kyngäs, 2008;
Graneheim & Lundman, 2004; Krippendorff, 2012):
1. Selecting a unit of analysis (e.g., interviews, a program, parts of text);
2. Within the unit of analysis, selecting a meaning/coding/content/recording unit. Essentially,
one must decide what to analyse, to what degree of detail, and how sampling will be
conducted (e.g., should the codes include silence, sighs, laughter, and postures?);
3. Organizing the data (e.g., use open coding, categories, themes, abstractions);
4. Creating a model, conceptual system or map, or categories.
3.4.1.3 Method Approaches
Various approaches to the application of content analysis exist in research. Content analysis at
the conversation level becomes conversational analysis. For this study, a conventional
conversational analysis will be performed, followed by a quantitative conversational analysis. For
the conventional conversational analysis, coding categories are typically derived directly from
the conversational data and are generally used to describe a phenomenon in the data (Hsieh &
Shannon, 2005). For the quantitative conversational analysis, conversational data is coded into
explicit categories and then described using statistics (Morgan, 1993).
Conversational analysis focuses on studying naturally occurring speech as it ordinarily unfolds in
social settings (Mondada, 2012). The conversations may be studied through recorded voice or
video or by transcriptions of interactions using a specialized transcription convention
(Krippendorff, 2012). The main purpose of conversational analysis is to understand the structure
of “talk in interaction” with a minimum of two participants (Krippendorff, 2012). Conversational
analysis examines phenomena such as turn-taking, conversational moves, and other aspects of
conversation, all of which are of primary interest for this study (Krippendorff, 2012). This
method has been used to study medical interactions for over 30 years in settings between
physicians and patients, as well as, in other allied health specialty settings (Teas Gill & Roberts,
90
2012). The results of conversational analysis studies have also been applied to help improve
many medically related applications ranging from medical education, informing current medical
practices, and enhancing patient-provider communications (Teas Gill & Roberts, 2012).
3.4.2 Research Design Details
3.4.2.1 Research Population
All recorded calls used in this study were between the PERS provider’s call taker and either a
client of the PERS provider or a care provider. In a few cases, emergency medical service (EMS)
dispatchers or the PERS setup personnel were also included in the call. No subscriber details
were provided with the calls, but caller age and gender details were deduced from within the call
conversations where possible. We are unaware of any prior call "sorting" that may have occurred,
for example with respect to gender, call type, caller type, or emergency risk level.
3.4.2.2 Research Setting
This study was completed at the University of Toronto in the Rehabilitation Sciences Institute.
The data processing was performed in the Intelligent Assistive Technology and Systems
Laboratory.
3.4.2.3 Data Collection
Personal Emergency Response Call Recordings
The personal emergency response calls used in this study were provided by a local, private PERS
provider upon our request for a sample of emergency and non-emergency calls. The non-
emergency calls recorded included: false alarms or accidental system activations, installation
setups or equipment test calls, scheduled check-ins, translation requests, and follow-up calls. The
emergency calls recorded included genuine emergency calls for either EMS (i.e., 911,
paramedics) or non-EMS emergency responders (i.e., relatives, friends, or professional care
providers). A total of 109 digitized call recordings were obtained from the PERS provider (name
withheld for confidentiality). These recordings were collected in two sessions over two years
(2008 - 52 calls and 2009 - 57 calls). All recordings were made in Canada. To our knowledge, all
clients in this study used the traditional push-button activator.
Confidentiality
Confidentiality agreements were signed between the private call centre providing the call
recordings and the Intelligent Assistive Technology and Systems Lab. These agreements outlined
how the data would be used and stored. In terms of usage, all transcripts were to be stripped of
personal or identifying information and access to call recordings would be limited to select
individuals upon approval by the Company. In terms of storage, all recordings would be kept in a
secure and locked location and all digital recordings on the computer would be kept under
password protection on a lab computer. All correspondences with the Company would also be
kept confidential.
3.4.2.4 Data Processing
Call Transcripts
Eighty-four (84) response calls were transcribed in total. The 24 non-transcribed calls consisted
of repeat recordings or were conversations between the emergency response service providers
only (i.e. between the personal emergency response provider’s call taker and EMS dispatchers
without subscriber involvement). Transcription was performed verbatim from digital audio files
using the computer software, “Systematic Analysis of Language Transcripts” (SALT), version
8.0 and 9.0 (Miller & Iglesias, 2006). The transcription process followed the SALT protocol
outlined in the user manual (Miller & Chapman, 2008). SALT is software specially designed
for “eliciting, transcribing, and analyzing language samples.” As such, in addition to
transcription tools, the SALT software also includes various analytical tools, including, but not
limited to, the ability to code words and utterances, and calculate words per minute or
conversational time lengths. The coding units of interest were extracted from the response call
transcripts using the "explore multiple transcripts" and "rectangular data file" features of the
SALT software.
Transcriptions were completed by listening to the digital call recordings on a computer using
headphones. The audio content was then transcribed directly into text in the SALT program. An
effort was made to capture non-word utterances (e.g., coughing), fillers (e.g., ‘eh’, ‘ah’), and to
note silent moments during the conversation. Patient-identifying information was excluded from
the transcripts (i.e., no names, addresses, or contact information); however, caller gender was
postulated based on clues from the conversation (i.e., the use of “him” or “her” by a care provider,
or perceived voice pitch). If an age was mentioned in the conversation, this number was noted in
the comments section of the call transcript. Due to the nature of the working agreement with the
company providing the PERS, only a limited number of the laboratory research team members
had permission to listen to the raw call recordings.
These real call samples all had a fair amount of background noise embedded in the recordings,
presumably caused by both the caller’s and call centre’s background environments, as well as
being inherent in the recording equipment. During transcription, recordings had to be paused
frequently and the volume adjusted to very high levels in order to catch what was being said in
the conversation. Call recordings were stored on the computer as *.wav files and played using
Audacity (version 2.02), an open-source, free program for listening to and editing sound files
(Mazzoni et al., 2000). The sound files were played back using the “mono” (single audio track)
setting with a sampling rate of 8 kHz and a sample format of 32-bit floating point.
Statistical Software
All data exploration and statistical analyses were performed using IBM SPSS Statistical software
package versions 21 and 22 (IBM, 2014).
3.4.2.5 Data Analysis
Figure 3-8 illustrates the main steps followed within two content analyses conducted in this
study. Starting at the large green arrow at the top left, “naïve” listening of the call recordings and
reading of the transcripts were first used to obtain a superficial and preliminary understanding of
the conversations and to identify possible directions for the analyses. Both analyses were
performed on data from transcribed calls, with one analysis at the call level and the other at the
conversational level. Starting at (a), a conventional conversational analysis was performed to
identify possible responder-type categories. The PES model categories from study 1 were used in
conjunction with the responder-type category to create the ‘personal emergency response’ (PER)
model. For each of the response calls, a call response (responder) was identified. At (b), a
quantitative conversational analysis was performed to identify conversational measures that
could be used to help classify a call according to the PER model categories. Conversational
measure data was isolated from the response calls and statistical analyses were performed.
Significant relationships and trends were identified. This new information can be applied to
improve the HELPER communication module’s ability to tailor the call dialogue and proposed
responder to the specific user for various PESs.
[Figure: personal emergency response calls are transcribed in SALT; (a) a conventional content analysis at the call level identifies responder categories and response call responders, combining them with the personal emergency situation model categories to develop the personal emergency response (PER) model; (b) a quantitative conversational analysis at the conversation/turn level identifies conversational measures, extracts conversational measure data from the response calls, performs statistical analyses using the measure data and PER categories, and identifies significant relationships; outcomes feed into improving artificial intelligence (decision making and dialogue management).]
Figure 3-8: This flow diagram illustrates how calls were analysed and how outcomes were and could be applied. Column (a): a call-level analysis to identify responder-type categories and to develop the PER model; column (b): a quantitative conversational-level analysis to identify conversational measures and significant relationships between measures and PER categories.
Identification of Response-Type Categories
In Study 1 (Chapter 2), a PES model was introduced that characterized the personal emergency
situation by caller type, call reason, and risk level. The model is reproduced in Figure 3-9. When
a PES is linked to a personal emergency response call, the model may be expanded to include
categories representing the final personal emergency response to be provided. Categories for the
additional response-type classification were derived from the call conversations.
[Figure: the situation comprises the user (physical-cognitive state) together with the caller type (older adult; care provider), call reason (medical call; fall call), and risk level (high risk; medium risk; low risk).]
Figure 3-9: The PES model characterized by caller type, risk level, and call reason.
Personal Emergency Response Calls Included
With respect to organizing the call transcripts using the caller type category, in six (6) of the 84
calls, mixed conversations occurred with more than two callers speaking simultaneously;
specifically, the older adult user and one or more care providers. For analysis purposes, four of
these six combination calls were analysed using only speech input from the older adult user
(these calls were classified as 'older adult' calls). For the remaining two calls it was not possible
to extract only the older adult caller’s speech from the conversation whilst still maintaining
conversational coherence. As a result, these calls remain classified as “combination calls”.
Seventy-two (72) of 84 transcribed calls were used in the conventional conversational analysis.
Not included in the analyses were: nine false alarm calls (all non-emergent) initiated by the older
adult subscriber; one follow-up call (non-emergent) where the call taker was calling for a status
update from the older adult (e.g., have you been looked after?); and two combination medical calls
(as described above: one urgent and one emergent call) made by the older adult subscriber and
care provider together. One of the combination calls excluded was also identified as an outlier
call due to a large number of speaker turns. This excluded call was classified as an urgent
medium risk, medical call and involved a high number of interactions with the care provider.
Selection of Conversational Measures
The conversational measures examined in this study included measures of verbal ability,
conversational structure, and timing for both callers and call takers. A brief summary of these
measures is provided in this section.
Verbal Ability
Three aspects of verbal ability were examined: rate of speech, speaker turn length, and
disfluency.
Rate of Speech
An older adult’s overall rate of speech and intelligibility can be affected by physiological
changes in the aging body as a result of higher breathing frequency and reduced vocal range,
speed and accuracy of structural movement (Zraick et al., 2006). In this study, we hypothesized
that older adult users speak more slowly than call takers and analysed speaker differences using
mean words per minute (WPM) and utterances per minute (UPM). In SALT (Miller & Chapman,
2008), WPM is determined by calculating the total completed words spoken per minute based on
elapsed time (includes main body words and mazes) and UPM is calculated using the total
number of utterance attempts per minute based on elapsed time including all speaker attempts.
Existing literature has previously shown that the rate of speech for older adults tends to be lower
than that of younger individuals (Yuan, Liberman, & Cieri, 2006). Whether this is true during a
PES will be determined.
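The two rate-of-speech measures can be sketched as a short calculation, mirroring the SALT definitions quoted above: WPM counts completed words over elapsed time, and UPM counts utterance attempts over elapsed time. The data structure and numbers below are illustrative, not taken from the study.

```python
# Minimal rate-of-speech sketch: words per minute (WPM) and utterances per
# minute (UPM) from a list of utterance attempts and the elapsed time.

def rate_measures(utterances, elapsed_seconds):
    """utterances: list of word lists, one list per utterance attempt."""
    minutes = elapsed_seconds / 60.0
    total_words = sum(len(u) for u in utterances)
    wpm = total_words / minutes          # all completed words per minute
    upm = len(utterances) / minutes      # all utterance attempts per minute
    return wpm, upm

caller = [["I", "need", "help"], ["I", "I", "fell", "down"]]
wpm, upm = rate_measures(caller, elapsed_seconds=10)
print(round(wpm, 1), round(upm, 1))  # 42.0 12.0
```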
Speaker Turn Length
According to Sacks, Schegloff, & Jefferson (1974), “the organization of 'taking turns to talk' is
fundamental to conversation…" (pg.2). In this analysis, a ‘speaker turn’ is defined as the unit of
speech or thought communicated by a participant during their turn to talk in a response call
conversation. The end of the first speaker's turn may be signaled either by silence or by
interruption from the next speaker, causing the first speaker to stop speaking. The model of
turn-taking is outlined in Sacks et al. (1974). Measures of "mean turn length (in words)" indicate
how many words the caller(s) and call taker utter during their turn to speak. In SALT, the mean turn length
in words is calculated using all main body words but excludes maze words. A speaker turn
length includes all "contiguous utterances of the same speaker" including non-verbal,
incomplete, or unintelligible utterances (Miller & Chapman, 2008).
Disfluency
Disfluencies are part of normal speech (Culatta & Leeper, 1990) and may be marked by the
presence of mazes. Hall, Wagovich, & Bernstein Ratner (2007) define a maze as “a marker of
linguistic disfluency in spontaneous speech” (p. 162). The SALT help manual
defines a maze as, "any filled pause [e.g., uh, ah], false starts [e.g. and I (ha*) have], repetitions
[e.g., (and) and I] and reformulations [e.g. (He and) he said] that are parenthesized in the
utterance. ... When maze words are removed from the utterance, the remaining words can stand
alone." (Miller & Chapman, 2008). Ordinarily, mazes occur when a speaker is expressing an idea
that may be abstract, complicated or partially formulated (Leadholm & Miller, 1995). Research
studies suggest that 6-10% or more of spontaneous speech will contain mazes depending on the
discourse and situational context with older adults producing slightly more than younger adults
(Bortfeld, Leon, Bloom, Schober, & Brennan, 2001; Fox Tree, 1995; Shriberg, 1999). By
examining the calls in which maze words make up more than 10% of total words, an estimate
of the number of disfluencies above typical expectations will be obtained.
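The disfluency check just described can be sketched as a proportion calculation against the 10% literature baseline. As before, parenthesized tokens standing in for maze words is an illustrative convention, not the study's actual coding.

```python
# Sketch of the disfluency measure: proportion of maze words in a transcript,
# flagged when it exceeds the ~10% baseline reported in the literature.

def maze_proportion(transcript_words):
    mazes = sum(1 for w in transcript_words if w.startswith("("))
    return mazes / len(transcript_words)

words = "(uh) I (I) need help right now".split()
p = maze_proportion(words)
print(round(p, 2), p > 0.10)  # 0.29 True
```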
Note that 'speaker intelligibility' was excluded from this analysis. Upon close examination of the
recorded transcripts, it was difficult to determine true unintelligibility in many situations due to
recording issues (e.g., two speakers speaking concurrently; call taker’s voice being recorded
directly at the microphone versus the caller’s being transmitted over speaker phone).
Conversational Structure
Four aspects of conversational structure were examined: the number of statements, questions,
responses to questions, and one word responses.
Statements, Questions and Responses
Research literature on emergency calls (Whalen & Zimmerman, 1987), Emergency Call
Centre protocol (Private_PERS_Call_Centre, 2008), and on-site observations of call takers
indicate that a majority of the queries in a call conversation are made by call takers and a
majority of the responses to questions are made by PERS users. Analyzing the statement, query,
and response aspects of the call conversation will help identify any differences
in conversational structure between callers and call takers and verify what is known to occur via
the personal emergency response provider’s call handling protocol.
One Word Utterances
One speaker turn may be composed of one or more speaker utterances either verbal or non-
verbal. In this study, utterances were determined using the phonological method of segmentation
as described in the SALT manual (Miller & Chapman, 2008). The number of one-word
utterances gives an idea of the frequency of short one-word statements within a conversation.
The number of one-word utterances is expected to be higher among callers than call takers.
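The conversational-structure measures above can be tallied with a simple counter. In the study this coding was done from the SALT transcripts; the trailing question mark heuristic and the example utterances below are illustrative assumptions only.

```python
# Toy tally of conversational structure: questions, statements, and one-word
# utterances per speaker, from (speaker, text) pairs.

from collections import Counter

def structure_counts(utterances):
    counts = Counter()
    for speaker, text in utterances:
        kind = "question" if text.rstrip().endswith("?") else "statement"
        counts[(speaker, kind)] += 1
        if len(text.split()) == 1:
            counts[(speaker, "one_word")] += 1
    return counts

call = [("E", "Do you need an ambulance?"),
        ("C", "No"),
        ("C", "I want my daughter")]
c = structure_counts(call)
print(c[("E", "question")], c[("C", "one_word")])  # 1 1
```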
Response Call Response Time
In an actual emergency situation, seconds matter. Eight minutes or less is the current
recommended target time for 90% of emergency responses (Eisenberg, Bergner, Hallstrom, &
others, 1979; Mullie, Van Hoeyweghen, & Quets, 1989; Pons et al., 2005; Silverman et al.,
2007). For individuals in cardiac arrest, a response time of five minutes or less has been found to
increase survival rates for patients (Blackwell & Kaufman, 2002; Pons et al., 2005; Silverman et
al., 2007). In a PES, the call taker's main goal is to determine what response is required during a
call and to initiate an appropriate response as quickly as possible. We define the ‘call taker
response time’ as the time from when the response call conversation first begins to when the call
taker says "good-bye" or puts the caller on hold to initiate a call to the desired responder (e.g., to
call the ambulance). Two measures were used to determine the response time: (1) the
number of speaker turns and (2) time in seconds. These two timing measures will be useful for
establishing standards against which response call conversations with the HELPER can be
compared. In this study, two response call categories: (1) caller type and (2) risk level were used
to assess their effect on response time.
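The two response-time measures can be sketched as below: the number of speaker turns and the elapsed seconds from the start of the conversation to the call taker's closing turn. The tuple format, speaker codes, and the keyword test for the closing turn are hypothetical conveniences, not the study's actual timing procedure.

```python
# Sketch of 'call taker response time': (number of speaker turns, seconds)
# from the start of the conversation to the call taker's closing turn.

def response_time(turns):
    """turns: list of (speaker, start_seconds, text) tuples.
    The end point is the first call-taker ('E') turn that closes the call
    or places the caller on hold."""
    for i, (speaker, start, text) in enumerate(turns):
        if speaker == "E" and ("goodbye" in text.lower() or "hold" in text.lower()):
            return i + 1, start - turns[0][1]
    return len(turns), turns[-1][1] - turns[0][1]

call = [("E", 0.0, "Hello, do you need help?"),
        ("C", 3.2, "Yes, I fell"),
        ("E", 6.0, "Please hold while I call the ambulance")]
print(response_time(call))  # (3, 6.0)
```

Such measures provide a standard against which HELPER response call conversations could later be compared.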
3.5 Results
3.5.1 The Conventional Conversational Analysis
3.5.1.1 Two Main Response Types
Based on the response call transcript conversations, two broad categories were identified for the
“response type” classification: (1) EMS and (2) other responders. The EMS group includes
paramedics, fire fighters, and police. The “other responders” group includes non-EMS providers,
such as family, friends, or acquaintances, in addition to professional care providers such as
personal support workers (PSWs) or nurses. An “all responders” category could also be
considered, representing the situation where both EMS and other responders are called to attend
a PES.
3.5.1.2 A Closer Look at Response Types
In reading through the response call transcripts, one particular call revealed that the terms
“ambulance” and “paramedic” could be perceived as different assistance request types. This
finding is interesting as the caller was not simply using the terms interchangeably. The caller
specifically declined the proposal for an “ambulance” and requested a “paramedic” be sent
instead. The reasoning behind this was that this caller did not want to go to the hospital. The
following excerpt from the transcript (call example 1) is presented below (E = call taker, C =
older adult caller; angle brackets < > mark overlapping speech, parentheses ( ) mark repeated
speech or mazes, and braces { } mark comments or other noises):
Call Example 1:
Line 1 E: Do you need an ambulance?
Line 2 C: {Grunt} No, I don’t need an ambulance, I thought paramedics or something <> to check me over.
Line 3 E: <Yes>, you want the paramedics to come and check you over?
Line 4 C: Yeah, (I) I don’t want (an) an ambulance <>.
Line 5 E: <Oh>.
Line 6 C: <Cause> I’m not going anywhere.
In situations where callers are requesting non-EMS responders, it is important to note that even
when medical attention may be necessary, the PERS user is very clear about wanting someone
other than EMS support. In call example 2, the older adult is feeling weak and vomiting.
However, when asked if an ambulance is required, this caller requests the daughter as an
alternate response.
Call Example 2:
Line 1 E: Hello, how are you?
Line 2 C: Oh, I need help (weak, shaky voice).
Line 3 E: What’s wrong?
Line 4 C: (Oh I) I keep throwing up and going to the bathroom.
Line 5 E: (You) You’re vomiting?
Line 6 E: How long has this been going on?
Line 7 C: Oh, it just started now <xx>. {xx = two unintelligible words}
Line 8 E: <Okay>.
Line 9 E: Okay, is there anyone there with you right now?
Line 10 C: No.
Line 11 E: Okay.
Line 12 E: Okay so do you want me to call an ambulance for you or did <you wan*>>
Line 13 {E was cut-off mid-word}
Line 14 C: <No> no, I just want you to call my daughter.
Call Example 2 shows how the call taker quickly assesses the PES and makes an initial decision
about what response to provide. From lines 4 to 9, the call taker identifies the problem and
whether anyone is on site. At line 12, the call taker has decided to suggest an ambulance.
In call example 3, the older adult would like assistance and the paramedics are offered, but the
preference is for someone else. Unfortunately, there are no other responders on this caller’s list.
The call operator concludes that only paramedics can be sent in this situation.
Call Example 3:
Line 1 C: (Eh) I wonder if you could (have s*) send somebody down to my place?
Line 2 E: And who would you like me to call for you?
Line 3 C: (Eh) well x {possible grunt} nobody.
Line 4 E: Would you like the paramedics?
Line 5 C: (Ah) can you get somebody else?
Line 6 E: Somebody else, other than the paramedics?
Line 7 C: That’s right.
Line 8 E: Oh well (uh) you don’t have any responders on your file.
Line 9 E: (Uh), is there anyone in particular you would like me to call?
Line 10 C: No.
Line 11 E: Okay, we can only call the paramedics.
In Call Example 3, the caller requests assistance immediately in line 1, before the call taker has
had a chance to assess the situation. In line 2, the call taker leaves it to the caller to specify
what response is desired. However, over lines 3-10, the call taker discovers that even though the
caller does not want EMS, this is the only response that can be provided. In line 11, she explains
this to the caller.
These excerpts not only show the importance placed on having different types of response
targets, but also give examples of speech input that may make ASR challenging. For example, in
Call Example 1, line 2, a “no” response is immediately followed by a suggestion for a different
response, and the “grunt” at the beginning would be treated by the ASR system as an
out-of-vocabulary word. Call Example 3 also shows a special situation: even when help is offered
and refused, the dialogue set may need a state indicating that the response offered is the only
available option.
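The "only available option" situation in Call Example 3 could be handled by an explicit dialogue state. The following is a minimal, hypothetical sketch; the state names, function name, and responder list are illustrative assumptions, not part of the HELPER design:

```python
# Hypothetical sketch of a dialogue-manager decision for the situation in
# Call Example 3: help is refused, but the refused response type is the
# only option on the user's file. All names here are illustrative.

def choose_response(requested_other: bool, responders_on_file: list) -> str:
    """Return the next dialogue state given the caller's stated preference."""
    non_ems = [r for r in responders_on_file if r != "paramedics"]
    if not requested_other:
        # Caller accepts (or does not object to) the EMS proposal.
        return "DISPATCH_EMS"
    if non_ems:
        # Caller wants someone else, and alternatives exist on file.
        return "OFFER_OTHER_RESPONDER"
    # Caller wants someone else, but no other responders are listed:
    # explain that EMS is the only available option (cf. Call Example 3, line 11).
    return "EXPLAIN_ONLY_EMS_AVAILABLE"

print(choose_response(requested_other=True, responders_on_file=["paramedics"]))
# prints EXPLAIN_ONLY_EMS_AVAILABLE
```

In a full dialogue manager this decision would be one transition among many, but it illustrates how the transcript finding maps onto an explicit state.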
The final categories selected to characterize the response call’s response types include: (1)
ambulance, (2) paramedic, (3) other responder, and (4) all responders (EMS and other).
3.5.1.3 The Personal Emergency Response (PER) Model
As illustrated in Figure 3-10, the PES model was expanded to the PER model. The personal
emergency response model includes the situation classified by caller type, risk level, call reason,
and response type. Sub-categories within each classification are shown.
Figure 3-10: The personal emergency response (PER) model. [Diagram: the user's situation
(physical-cognitive state) is classified by caller type (older adult, care provider), call reason
(medical call, fall call), and risk level (high, medium, low risk), leading to the response
obtained, classified by response type (ambulance, paramedic, other responder, all responders).]
3.5.2 Conversational Analysis using PER Categories
3.5.2.1 Descriptive Statistics
Fifty (50) calls were made by older adult callers and 22 were made by care providers. Subscriber
age at the time of the call was determined for 53 of 84 calls (63%). Mean age was 82 years
(standard deviation (S.D.) = 8.79) with the youngest known age being 51 years and the oldest
known age being 100 years old. There were 69 female and 15 male subscribers, with gender
being inferred from the conversation (i.e. use of “he” or “she” by the other caller) or by voice
pitch (low for males, higher for females). The higher female caller ratio observed in the
collection of response call transcripts is common amongst PERS users in this age group (Fallis et
al., 2007; Heinbüchner et al., 2010; Hyer & Rudick, 1994; Taylor & Agamanolis, 2010).
3.5.2.2 Call Breakdown Using PER Classifications
Associations between caller type, risk level, call reason, and response type were examined. See
Figures 3-11a and 3-11b for a breakdown of the frequency of response calls by caller type (older
adult vs. care provider), risk level (emergent high vs. urgent medium), call reason (fall vs.
medical call), and response type (EMS vs. other responders).
Figure 3-11: Older Adult and Care Provider responders requested during a response call. The older adult responses are in (a) and the care provider responses are in (b). [Bar charts: frequency (#) of EMS vs. other responses for medical and fall calls at the emergent (high) and urgent (medium) risk levels.]
Using Pearson's Chi Square statistic and Fisher's Exact test (in cases where counts were less than
five), no significant associations were found between caller type and call reason; caller type and
risk level; and call reason and response type.
Three significant associations were found:
Caller Type vs. Response Type
Using Fisher’s Exact test, a borderline significant relationship was found between caller type and
response type, p=0.049 (Exact sig., 2-sided). This result suggests that a difference exists between
the response-type requested by different callers. Specifically, older adult and care provider
callers were found to both request EMS responses, however, older adults also made requests for
other responders.
Risk Level vs. Call Reason
Using Fisher’s Exact test, a significant relationship was found between risk level and call reason,
p=0.017 (Exact sig., 2-sided). This result suggests that emergent high risk calls were more likely
to be medically related than fall related.
Risk Level vs. Response Type
Using Fisher’s Exact test, a significant relationship was found between risk level and response
type, p=0.009 (Exact sig., 2-sided). This result suggests that emergent high risk calls were more
likely to lead to an EMS response whereas urgent medium risk calls might also lead to other
response types.
3.5.2.3 Breakdown of Response Types
In terms of response type, care providers requested EMS responses 100% of the time for both
high and medium risk medical situations. Of the three calls requesting a 'paramedic' response,
two calls were high risk and one was medium risk. For older adult callers, EMS responses were
requested 96% of the time in high risk, medical call situations: 19 out of 24 calls were for
ambulances. The other calls consisted of two calls for the 'paramedics', one call for an ‘other
responder’, and two calls for both ‘EMS and other’ responders. In medium risk medical call
situations, EMS requests dropped to 71% with 12 out of 21 calls for the ‘ambulance’, two calls
for the 'paramedics', six calls for ‘other responders’, and one call for ‘EMS and other’
responders. In medium risk fall situations (five calls total), there was a fairly distributed range of
requests with two calls for the ambulance, one call for the paramedic, and two calls for ‘other
responders’.
3.5.3 Conversational Analysis using Conversational Measures
Significant group relationships between caller type and response type, risk level and call reason,
and risk level and response type suggest that the desired ‘response type’ may be identified if the
‘caller type’ and/or ‘risk level’ could be determined. Identifying the ‘call reason’ may also help
in identifying ‘risk level’ or vice versa which could then be used to estimate “response type”.
In this exploratory analysis, three repeated measures multivariate analysis of variance
(RM_MANOVA) tests were conducted to examine the relationships between three independent
factors: (1) caller type, (2) risk level, and (3) speaker type, with three conversational speech
measures: (1) verbal ability, (2) conversational structure, and (3) timing. The independent factor
‘speaker type’ was included in the RM_MANOVAs to allow for a comparison of ‘callers’
against ‘call takers’. The ‘call takers’ represent the "norm" or "control" group because these
individuals are not experiencing the emergency situation themselves. Due to a low number of
data points within the ‘response type’ group’s 'non-EMS calls' and the ‘call reason’ group’s 'fall
calls', these factors were not included in the RM_MANOVAs. The independent factors used in
the analysis consisted of two levels each: speaker type included ‘callers’ and ‘call takers’; caller
type included ‘older adult’ and ‘care provider’ callers; and risk level included ‘high (emergent)’
and ‘medium (urgent)’ medical risk levels. Speaker type was a ‘within subjects factor’ and risk
level and caller type were ‘between subjects factors’.
An additional outlier was removed in this analysis. The call removed was an urgent medium risk,
fall call by an older adult and had a higher number of speaker turns due to hearing difficulties
between the older adult caller and the call taker. In the call there were several question
repetitions, clarifications, circular conversations and additional requests. This outlier was kept in
the analysis using PER categories because speaker turns and timing were not being assessed and
the data would not be affected by its inclusion. A total of 71 calls were used in this analysis with
unbalanced group counts.
Univariate analysis of variance (ANOVA) tests and t-tests were conducted following the
RM_MANOVA to compare different groups with significant multivariate effects. Discriminant
analyses were also conducted to examine which and how well these measures could be used to
predict significant independent factors.
3.5.3.1 Analysis of Verbal Ability Measures
The three measures of verbal ability were: words per minute (WPM), utterances per minute
(UPM), and turn length in words (TNL). The proportion of total words with mazes (MZW)
was examined independently as MZW could not be normalized sufficiently to include in the
RM_MANOVA. Log10(x) transformations were applied to UPM and TNL, and a square root
transformation was applied to MZW in order to normalize the data as outlined in (Field, 2005).
No transformations were required for WPM. Moderate correlation was observed between WPM,
UPM, and TNL; and between MZW and UPM; with Pearson's correlation coefficients ranging
from 0.3 to 0.65, p<0.001. All other correlations were less than 0.3 or were not significant.
The results of the RM_MANOVA revealed significant within subjects multivariate effects for
speaker type, Wilks' λ = 0.616, F(3,65) = 13.52, p<0.001, η2 = 0.384, and for the interaction
between speaker and caller type, Wilks' λ = 0.742, F(3,65) = 7.53, p<0.001, η2 = 0.258. A
borderline significant effect was observed between speaker type and risk level, Wilks' λ = 0.891,
F(3,65) = 2.65, p=0.056, η2 = 0.109. Between subjects, significant multivariate effects were
obtained for both caller type, Wilks' λ = 0.689, F(3,65) = 9.80, p<0.001, η2 = 0.311, and risk
level, Wilks' λ = 0.887, F(3,65) = 2.77, p=0.049, η2 = 0.113. There was no significant
multivariate interaction effect on caller type and risk level. No significant multivariate effect was
observed for the three-way interactions between speaker type, caller type and risk level. In
Figure 3-12, four box plots show: (a) mean words per minute; (b) mean utterances per minute;
(c) mean turn length in words; and (d) mean mazes per total number of spoken words, broken
down by risk levels for caller and speaker types.
Figure 3-12: Boxplots of verbal ability measures broken down by risk levels for caller and speaker types.
Words per Minute
As observed in Figure 3-12a, the mean WPM spoken by older adult callers was found to be
significantly lower than that of care provider callers, F(1,67)=19.75, p<0.001, η2 = 0.228
(univariate test for caller type). The mean WPM spoken by callers as a group was found to be
significantly different than the call taker group, F(1,67)=5.48, p=0.022, η2 = 0.076 (univariate
test for speaker type) and a significant interaction effect was obtained between speaker type and
caller type, F(1,67)=19.0, p<0.001, η2 = 0.221. Paired samples t-tests conducted between each
caller level and the associated call takers revealed no significant difference in WPM between
care provider callers and call takers but a significant difference between older adult callers and
call takers, t(48)=-6.51, p<0.001. These results suggest that older adult callers speak
significantly fewer WPM compared to both care provider callers and call takers.
The mean WPM spoken was found to be similar across high and medium risk levels for callers
(no significant interaction effect was observed between risk level and caller type), but was
significantly different between callers and call takers, F(1,67)=7.09, p=0.010, η2 = 0.096
(interaction effect between speaker type and risk level). Independent samples t-tests conducted
for each caller level and the call taker group between risk levels revealed no significant
differences in WPM between high and medium risk levels for the care provider or older adult
caller groups, but a significant difference in WPM was observed between high and medium risk
levels for the call taker group, t(69) = 3.15, p=0.002. These results suggest that the call taker
speaks significantly higher mean WPM during high risk situations compared to medium
risk situations, a difference not observed in the caller group.
Utterances per Minute
As observed in Figure 3-12b, callers were found to use significantly fewer UPM than call takers,
F(1,67)=41.22, p<0.001, η2 = 0.381 (univariate test for speaker type), and UPM also differed
significantly between risk levels, F(1,67)=7.34, p=0.009, η2 = 0.099 (univariate test for risk level). There was
no significant effect for caller type. A significant interaction effect was obtained between speaker
type and caller type, F(1,67)=7.13, p=0.010, η2 = 0.096. Paired samples t-tests conducted at each
caller level between callers and call takers revealed significant differences in UPM between call
takers and both care provider callers, t(21)=-2.58, p=0.018, and older adult callers, t(48)=-8.74,
p<0.001. These results suggest that call takers speak significantly more UPM than both
callers, but care providers and older adults are similar in their number of UPM.
No significant effects were obtained between speaker type and risk level nor between caller type
and risk level. Independent samples t-tests conducted at each caller level and with the call taker
group between risk levels confirmed no significant differences in UPM at high and medium risk
levels for the care provider or older adult caller levels. A significant difference in UPM was
obtained between high and medium risk levels for the call taker group, t(69) = 2.71, p=0.008.
These results suggest that the call taker speaks significantly more UPM during high risk
situations compared to medium risk situations, a difference not observed in the caller
group.
Turn Length in Words
As observed in Figure 3-12c, care provider callers were found to have significantly longer TNLs
compared to older adult callers, F(1,67)=8.47, p=0.005, η2 = 0.112 (univariate test for caller
type); and a significant mean difference was also found between the caller group and call takers,
F(1,67)=3.86, p=0.054, η2 = 0.054 (univariate test for speaker type). A significant interaction
effect was obtained between speaker type and caller type, F(1,67)=22.03, p<0.001, η2 = 0.247.
Paired samples t-tests conducted at each caller level between callers and call takers revealed no
significant difference in TNL between care provider callers and call takers but a significant
difference in TNL between older adult callers and call takers, t(48)=-6.12, p<0.001. These
results suggest that care provider callers have TNL comparable to call takers, but older
adults tend to have significantly shorter TNL compared to both call takers and care
providers.
TNL does not appear to be significantly different for different risk levels and no significant
effects were observed for risk level. Also, no significant effects were obtained between speaker
type and risk level or between caller type and risk level.
Percent Maze Words
As observed in Figure 3-12d, the mean proportion of MZW spoken by callers was higher than for
call takers. Paired-samples t-tests were conducted to compare the proportion of MZW between
each caller level and the call taker group. No significant difference was observed between care
provider and older adult callers, but call takers spoke a significantly lower proportion of MZW
compared to the callers combined, t(70)=5.35, p<0.001.
Independent-samples t-tests were conducted to compare the proportion of MZWs between risk
levels, and for each caller and speaker types at different risk levels. No significant difference was
observed between the overall risk levels or between the older adult and call taker groups at the
different risk levels. A borderline significant result was obtained for the care provider group at
different risk levels, t(20)=-2.03, p=0.056. These results show an increase observed in the
proportion of MZW produced by the care provider during medium risk calls but it is
borderline significant.
The frequency with which mazes exceeded 10% (or 0.1) of total words (see dotted line in
Figure 3-12d) was also calculated for each speaker. Maze proportions above 10% occurred most
often among older adult callers, in 34.7% of transcripts (17 of 49 calls), compared to care
provider callers, 22.7% of transcripts (5 of 22 calls), and
call takers, 5.6% of transcripts (4 of 71 calls). Using the Chi-Square test, frequencies between
older adult and care provider callers were not found to be significantly different, however, when
call taker frequencies were included a significant difference was obtained, χ2(2)=16.71, p<0.001.
Discriminant Analysis
A discriminant analysis was used to examine speaker predictability between caller types: the
older adult and care provider, using three predictor variables: WPM, UPM, and TNL. In an
automated PERS application, it is not necessary to identify the call taker since this role is played
by the automated PERS computer. The transformed variables were used in the analysis for UPM
and TNL. Box's M was non-significant at the 0.05 level. The discriminant function (DF) revealed
a significant association between caller type and all predictors. Entering independent variables
together, Wilks λ = 0.677, χ2(3)=26.31, canonical correlation = 0.568, p<0.001. 32.3% of the
variance between older adult and care provider speakers could be accounted for by the three
predictor variables. Using the standardized canonical discriminant function coefficients, the
discriminant function revealed two major predictors: WPM and TNL, DF = (1.465 x WPM)
+ (-0.631 x TNL) + (-0.323 x UPM). Classification based on the DF and group centroids (Care
Provider = 1.016; Older Adult = -0.456) using the original group cases resulted in high success at
81.7% of cases being correctly classified, 93.9% of older adults and 54.5% of care providers (out
of 22 Care Provider and 49 Older Adult cases). Classification using cross validation in SPSS is
performed where each case is classified by the functions derived from all cases except for the
case of interest. The results using cross-validated classification dropped the number of correctly
classified cases to 77.5% with the older adult and care provider percentage of correctly classified
cases dropping to 89.8% and 50% respectively. Re-running the discriminant analysis using only
the variables with significant differences, WPM and TNL, did not change the overall number of
cases correctly re-classified (using original group cases) but did modify the individual
percentages correctly classified to 91.8% of older adults and 59.1% of care providers. Box's M
remained non-significant and Wilks λ = 0.684, χ2(2)=25.80, canonical correlation = 0.562,
p<0.001, DF = (1.156 x WPM) + (-0.218 x TNL). Using cross-validation classification the
number of correctly classified cases was 80.3% with only the care provider percentage of
correctly classified cases dropping to 54.5%. These classification results apply only to the cases
used in this study.
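The two-step procedure above (classify the original cases, then classify under cross-validation) can be sketched with scikit-learn's linear discriminant analysis as an analogue of the SPSS procedure. The data below are synthetic stand-ins, not the study's measures:

```python
# Sketch of the caller-type discriminant analysis using scikit-learn's
# LinearDiscriminantAnalysis as an analogue of SPSS's procedure.
# The data are synthetic stand-ins, not the study's measures.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
# Predictors: WPM, log10(UPM), log10(TNL) for 49 older adult (0) and
# 22 care provider (1) cases; care providers get a higher mean WPM.
X = np.vstack([
    np.column_stack([rng.normal(110, 20, 49),
                     rng.normal(0.9, 0.2, 49),
                     rng.normal(0.8, 0.2, 49)]),
    np.column_stack([rng.normal(150, 20, 22),
                     rng.normal(1.0, 0.2, 22),
                     rng.normal(1.0, 0.2, 22)]),
])
y = np.array([0] * 49 + [1] * 22)

lda = LinearDiscriminantAnalysis()
# Resubstitution accuracy: classifying the original group cases.
resub = lda.fit(X, y).score(X, y)
# Leave-one-out cross-validation, as in SPSS's cross-validated
# classification: each case is classified by a function derived
# from all cases except the case of interest.
loo = cross_val_score(lda, X, y, cv=LeaveOneOut()).mean()
print(f"resubstitution = {resub:.3f}, leave-one-out = {loo:.3f}")
```

As in the results above, the cross-validated accuracy is typically lower than the resubstitution accuracy, which is why both are reported.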
3.5.3.2 Analysis of Conversational Structure Measures
The four conversational structure measures included were: number of statements (NS),
number of questions (NQ), number of responses to questions (NRQ), and number of one
word utterances (OWU). All measures followed a non-normal distribution. Log10(x+1)
transformations were applied to NS, NQ, NRQ and OWU measures in order to normalize the
data as outlined in (Field, 2005). Moderate correlations were observed between NS, NQ, NRQ
and OWU, with Pearson's correlation coefficients in the range of 0.3 to 0.7, p<0.001.
The results of the RM_MANOVA revealed a significant within subjects multivariate effect for
speaker type, Wilks' λ = 0.191, F(4,64) = 67.66, p<0.001, η2 = 0.809. A significant between
subjects multivariate effect was obtained for caller type, Wilks' λ = 0.857, F(4,64) = 2.66,
p=0.040, η2 = 0.143, and a borderline significant multivariate effect was obtained for risk level,
Wilks' λ = 0.869, F(4,64) = 2.42, p=0.057, η2 = 0.131. All 2 and 3 way interaction effects were
non-significant. In Figure 3-13, four box plots show: (a) mean number of statements; (b) mean
number of questions; (c) mean number of responses to questions; and (d) mean number of one
word utterances, broken down by risk levels for caller and speaker types.
Number of Statements
As observed in Figure 3-13a, the NS spoken by callers is similar between older adult and care
provider callers (no significant effects for caller type); but differ as a combined group from that
of the call taker group, F(1,67)=125.01, p<0.001, η2 = 0.651 (univariate test for speaker type).
Paired samples t-tests conducted at each caller level between callers and call takers revealed
significant differences in NS between care providers and call takers, t(21)=8.43, p<0.001, and
older adults and call takers, t(48)=10.43, p<0.001. These findings show that both callers made
significantly more statements than the call takers during the response call.
Figure 3-13: Box plots of conversational measures broken down by risk levels for caller and speaker types.
The NS spoken at high risk levels was found to differ significantly from those at medium risk
levels, F(1,67)=8.82, p=0.004, η2 = 0.116 (univariate test for risk level). Independent samples t-
tests conducted for each caller level and the call taker group between risk levels revealed
significant differences between high and medium risk levels for older adult callers, t(47)=-2.82,
p=0.007, and call takers, t(69)=-3.34, p=0.001, but no significant difference was observed for the
care provider callers. These results suggest that both older adult callers and call takers make
significantly fewer statements during high risk calls than medium risk calls, however care
providers make approximately the same NS across risk levels. No significant interaction
effects were obtained between caller type, speaker type and/or risk level.
Number of Questions
As observed in Figure 3-13b, the NQ asked by care provider callers was significantly lower than
that of older adult callers, F(1,67)=7.31, p=0.009, η2 = 0.098 (univariate test for caller type); and the
caller group NQs differed significantly from that of the call taker group, F(1,67)=269.46,
p<0.001, η2 = 0.801 (univariate test for speaker type). Paired samples t-tests conducted between
each caller level and the associated call takers revealed significant differences in NQ between
care providers and call takers, t(21)=-12.42, p<0.001, and older adults and call takers, t(48)=-
15.20, p<0.001. These findings suggest that both callers asked significantly fewer questions
than the call takers during the response calls. There was no significant effect for risk level and
no significant interaction effects were obtained.
Number of Responses to Questions
As observed in Figure 3-13c, care provider callers had significantly fewer NRQ than older adult
callers, F(1,67)=5.35, p=0.024, η2 = 0.074 (univariate test for caller type); and the caller group
had significantly more NRQ than the call taker group, F(1,67)=267.50, p<0.001, η2 = 0.800
(univariate test for speaker type). Paired samples t-tests conducted between each caller level and
the associated call takers revealed significant differences in NRQ between care providers and call
takers, t(21)=12.00, p<0.001, and older adults and call takers, t(48)=15.06, p<0.001. These
findings confirmed that both care provider and older adult callers responded to
significantly more questions than call takers, and older adults responded to more questions
than the care provider. The NRQ spoken did not differ significantly across risk levels. No
significant interaction effects were obtained.
Number of One Word Utterances
As observed in Figure 3-13d, care providers had borderline significantly fewer OWU than older
adult callers, F(1,67)=3.93, p=0.052, η2 = 0.055 (univariate test for caller type); and the caller
group differed significantly from that of the call taker group, F(1,67)=6.78, p=0.011, η2 = 0.092
(univariate test for speaker type). Paired samples t-tests conducted between each caller level and the
associated call takers revealed a significant difference in OWU between older adult callers and
call takers, t(48)=4.23, p<0.001, but no significant difference between care provider callers and
call takers. These findings suggest that older adult callers made significantly more one word
utterances than both care provider callers and call takers, while one word utterances are
similar between care provider callers and the call taker.
The OWUs spoken also differed between high and medium risk levels regardless of caller or
speaker type, F(1,67)=6.23, p=0.015, η2 = 0.085 (univariate test for risk level). Independent
samples t-tests conducted for each caller level and the call taker group between risk levels
revealed significant differences between high and medium risk levels for older adult callers,
t(47)=-2.27, p=0.028, and call takers, t(69)=-2.70, p=0.009, but no significant difference was
observed for the care provider callers. These results suggest that both older adult callers and
call takers made significantly fewer OWU during high risk calls than medium risk calls,
while care providers make approximately the same number of OWU across risk levels. No
significant interaction effects were obtained.
Discriminant Analysis
A discriminant analysis was used to examine speaker predictability between caller types (Older
Adult and Care Provider) using four predictor variables: NS, NQ, NRQ, and OWU. Transformed
variables were used in the analysis. Box’s M test was not significant at the 0.05 level. The
discriminant function revealed a significant association between caller type and all predictors.
Entering independent variables together, Wilks λ = 0.831, χ2(4)=12.38, canonical correlation =
0.411, p=0.015. 16.9% of the variance between older adult and care provider speakers was
accounted for. Using the standardized canonical discriminant function coefficients, the
discriminant function revealed three major predictors: NS, NRQ and OWU, DF = (-0.838 x
NS) + (0.150 x NQ) + (1.20 x NRQ) + (0.450 x OWU). Classification based on the DF and
group centroids (Care Provider = -0.663; Older Adult = 0.298) using the original group cases
resulted in moderately-high success at 70.4% of cases being correctly classified: 89.8% of older
adults and 27.3% of care providers (out of 22 Care Provider and 49 Older Adult cases). The
results using cross-validated classification dropped the number of correctly classified cases
slightly to 69%, with only the older adult percentage of correctly classified cases dropping to
87.8%. Re-running the discriminant analysis using only the variables with significant mean
differences, NRQ and OWU, resulted in increasing the number of cases correctly re-classified
(using original group cases) to 74.6%: 93.9% of older adults and 31.8% of care providers. Wilks
λ = 0.869, χ2(2)=9.54, canonical correlation = 0.362, p=0.008, DF = (0.702 x NRQ) + (0.404 x
OWU). The results using cross-validated classification dropped the number of correctly
classified cases slightly to 66.2%, with the older adult and care provider percentages of correctly
classified cases dropping to 87.8% and 18.2% respectively. These classification results apply
only to the cases used in this study.
These same four conversational structure measures were also found to predict risk level. The
discriminant function revealed a significant association between high risk and medium risk levels
and all predictors. Entering independent variables together, Wilks λ = 0.851, χ2(4)=10.84,
canonical correlation = 0.386, p=0.028. Box’s M test was not significant at the 0.05 level. 14.9%
of the variance between high and medium risk levels was accounted for. Using the standardized
canonical discriminant function coefficients, the discriminant function revealed one major
predictor: NS, DF = (1.05 x NS) + (0.153 x NQ) + (-0.258 x NRQ) + (0.186 x OWU).
Classification based on the DF and group centroids (high risk = -0.374; medium risk = 0.456)
using the original group cases resulted in moderate success at 67.6% of cases being correctly
classified, 76.9% at the high risk level and 56.3% at the medium risk level (out of 39 high risk
and 32 medium risk cases). The results using cross-validated classification dropped the number
of correctly classified cases slightly to 66.2%, with only the high risk percentage of correctly
classified cases dropping to 74.4%.
3.5.3.3 Analysis of Timing Measures
Timing measures included: number of speaker turns (ST) and time in seconds, both of which
followed a non-normal distribution. Log10(x) transformations were applied to these measures to
normalize the data. A very high and significant Pearson’s correlation coefficient of 0.8, p<0.001,
was observed between ST and seconds. As a result, the seconds measure was not included in the
RM_MANOVA and was examined separately. In Figure 3-14, two box plots show (a) mean number
of speaker turns and (b) mean time in seconds broken down by risk levels for caller and speaker
types.
Figure 3-14: Box plots of timing measures broken down by risk levels for caller and speaker types.
Number of Speaker Turns
As observed in Figure 3-14a, significant differences were found in the number of ST between
callers (Mean=7.85, S.D.=4.24) and call takers (Mean=9.27, S.D.=4.6). RM_MANOVA results
revealed a significant within subjects multivariate effect for speaker type, Wilks' λ = 0.599,
F(1,67) = 44.82, p<0.001, η2 = 0.401. The difference in number of ST between care provider
callers (Mean=6.27, S.D.=3.15) and older adult callers (Mean=8.55, S.D.=4.5) was not found to
be statistically significant. Paired samples t-tests conducted between each caller level and the
associated call takers revealed a significant difference between care provider callers and call
takers, t(21)=-5.17, p<0.001, and between older adult callers and call takers, t(48)=-5.70,
p<0.001. These results suggest that older adult and care provider callers speak, on average,
fewer ST than call takers, and that the two caller types use a similar number of speaker turns.
Essentially, the results suggest that call takers are indeed managing the conversation and
usually speak the first and last words.
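As an illustration of the paired-samples t-test used above (each call contributes one caller ST count and one call-taker ST count), the t statistic can be computed directly; the data values are invented, not the study’s:

```python
import math

# Sketch of a paired-samples t-test on speaker turns (ST); each index is one
# call, pairing the caller's ST with the call taker's ST. Hypothetical data.
caller_st =     [6, 8, 5, 9, 7, 10, 6, 8]
call_taker_st = [8, 9, 7, 11, 9, 12, 8, 10]

def paired_t(a, b):
    """t statistic for paired samples: mean difference over its std. error."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

t = paired_t(caller_st, call_taker_st)  # df = n - 1 = 7
print(t)  # a negative t indicates callers take fewer turns than call takers
```

The sign convention matches the thesis results: negative t values (e.g., t(21)=-5.17) arise because callers consistently take fewer turns than the call takers they are paired with.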
High risk calls (Mean=6.28, S.D.=2.79) were found to require fewer ST than medium risk calls
(Mean=9.75, S.D.=4.93). RM_MANOVA results revealed a significant between-subjects
univariate effect for risk level, F(1,67) = 7.61, p=0.007, η2 = 0.102, but no significant differences
were observed for caller type nor for any two- or three-way interactions within or between subjects.
Box’s M and Levene’s Tests were all non-significant at the 0.05 level. Low risk calls had a mean
of 3.50 ST with a S.D.=1.35. Independent samples t-tests conducted for each caller level and the
call taker group between risk levels revealed significant differences between high and medium
risk levels for older adult callers, t(47)=-2.33, p=0.024, and call takers, t(69)=-3.50, p=0.001, but
no significant difference was observed for the care provider callers (approaching significance at t(20)=-1.82,
p=0.084). These results suggest that both older adult callers and call takers take
significantly fewer ST during high risk calls, while care provider callers require
approximately the same number of ST across risk levels.
Time in Seconds
The results of a two-way ANOVA examining the relationship between call taker’s response time
‘time in seconds’ with caller type (at two levels: care provider and older adult) and risk level (at
two levels: high and medium risk) revealed a significant difference for risk level, F(1, 67)=13.31,
p=0.001, but no significant difference for caller type or the interaction between caller type and
risk level. These results suggest that high risk calls (Mean=40.64 sec, S.D.=22.25) have a shorter
response time than medium risk calls (Mean=69.59 sec, S.D.=39.16). See Figure 3-14b. The
average response time for low risk calls is 20.70 seconds, S.D. 1.73.
3.6 Discussion
3.6.1 Personal Emergency Response Call Trends
Care provider callers accounted for 30.6% of the recorded call sample collected. A possible
reason why care providers might use the PERS button to reach assistance as opposed to dialing
911 using a telephone may be because pushing the button is easier/quicker and would allow them
to keep their hands free to actively care for the older adult while making the call. An alternate
argument may be that pushing the button actually slows down the process for obtaining
emergency assistance because the caller would first need to discuss with the call taker before
reaching EMS. As suggested during an on-site visit to an EMS call centre, care providers may
also be instructed to do this by the PERS provider so they can keep track of their client’s events.
In terms of call situations, there were many more medical calls than fall calls. Defining fall
calls as calls involving “unintentional falls not resulting in injury” essentially excluded fall calls from
the ‘high’ (emergent) risk level category. Only medium risk level fall calls made by older adult
callers were identified. The main reason for using this definition was that caller health
information was limited to what was presented in conversation and it became difficult to
determine whether an unintentional fall call with resulting injury was caused by an underlying
medical condition or purely accidental. With the definition used, we could also determine if falls
without physical injury would elicit different responses from PERS users. One possible reason
why fall calls were not observed for care providers in medium risk situations is presumably
because the care provider, if present, would be able to assist the older adult in either getting up
from a fall on their own, or would obtain the necessary help required. Future studies may wish to
consider alternate fall definitions.
Looking at the data purely in terms of the numbers for the call response types: because care
provider callers were found to request EMS services 100% of the time, it would seem pertinent
for the HELPER to offer EMS services as a first response option to care provider callers. For
older adult callers in high risk situations, an EMS suggestion also seems to be appropriate.
However in medium risk situations, an EMS response might work for approximately 70% of the
medical calls and 50% of the fall calls. Given that these situations occur at the medium risk level,
if the older adult caller does not specify up front who they want called, perhaps the EMS
suggestion first would be the best approach as a default option. The additional length of time
required to suggest an EMS response would also be minimal, if no request is made initially.
Looking at the conversation as opposed to the numbers, older adult callers may be sensitive to
the different latent meaning behind the words “ambulance” versus “paramedic.” The difference
in meaning may be construed as the “ambulance” taking the individual away to be cared for in
the hospital versus the “paramedics” coming to the home of the individual to check on him/her
and to see how they can assist. In a situation where the older adult is trying to maintain his/her
independence, these differences in terms may be very significant. In terms of HELPER
technology design, using the term “paramedic” to offer assistance may be seen as less aggressive
than the term “ambulance,” especially in medium risk level calls made by older adults. The term
“ambulance” may be perfectly fine to use in high risk situations where the caller clearly wants to
go to the hospital or can only receive medical care in the hospital (e.g., stroke, heart attack).
The call examples where the caller declines an EMS service and instead requests a non-EMS
responder may further suggest that in fact defaulting to the EMS service may not be the best
option for certain situations. It may be that the HELPER should offer the caller the choice
between the call taker or a non-EMS responder, with the default being the call taker, in medium risk
situations. Then the ambulance offer only appears if the HELPER presumes that the PES is a
high risk situation, such as if the person is not moving (as seen through the video camera), or has
mentioned possible high risk terms such as ‘stroke’, ‘heart attack’, ‘need oxygen’, or ‘can’t
breathe’. Future studies might consider examining these finer differences in HELPER response
dialogue, specifically looking at what responses to offer (e.g., ambulance or call taker), when
should they be offered (e.g., as default, immediately, after response call classification?), and how
should they be offered (e.g., what words should be used?).
3.6.2 Verbal Ability Measures
In terms of verbal ability measures, compared to care provider callers, older adult callers
spoke more slowly and had shorter turn lengths, although their rates of utterances were similar. On
the other hand, care provider callers and call takers both spoke at similar rates to each other, both
more quickly and with longer turn lengths than older adult callers. Call takers had higher rates of
utterances compared to both caller types. Our hypothesis that older adult callers would speak
more slowly than call takers during a PES was confirmed. On average, the data suggests that care
provider callers and call takers will say more within their speaker turns and with greater speed
compared to older adult callers.
The discriminant function analysis demonstrated that WPM and TNL had a high rate of caller
type predictability. However, the DF was better at identifying older adults who were classified
correctly over 90% of the time, compared to care providers who were only correctly classified
between 50-60% of the time. In order to adjust the function to capture more care providers, the
optimal threshold can be adjusted to favour care providers more than older adults. This would
have to be done outside of SPSS. The results of this analysis suggest that measures of caller
WPM or TNL may be helpful when used in conjunction with ASR and Natural Language
Understanding to further increase the HELPER’s ability to classify the response call and
increase its confidence in deciding what response to provide. No verbal ability conversational
measures were found to be good predictors for response call risk level suggesting that risk level
information would need to be elicited from the actual semantic content of the response call
conversations, possibly through a caller’s use of keywords and phrases.
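The threshold adjustment mentioned above, which would need to be done outside of SPSS, amounts to shifting the cut point applied to the discriminant score before assigning a caller type. A minimal sketch, with hypothetical scores and true labels:

```python
# Sketch of adjusting the decision threshold on a discriminant score to
# favour care providers (CP) over older adults (OA). All values hypothetical.

scores = [-1.2, -0.8, -0.5, -0.2, 0.1, 0.3, 0.6, 0.9]       # discriminant scores
labels = ["OA", "OA", "OA", "CP", "OA", "CP", "CP", "CP"]   # true caller types

def classify(score, cutoff):
    """Caller type predicted from a discriminant score and a cut point."""
    return "CP" if score > cutoff else "OA"

def recall(target, cutoff):
    """Fraction of 'target' callers that the cutoff classifies correctly."""
    hits = sum(1 for s, l in zip(scores, labels)
               if l == target and classify(s, cutoff) == target)
    total = sum(1 for l in labels if l == target)
    return hits / total

# A midpoint cutoff vs. a cutoff shifted toward the older-adult side:
for cutoff in (0.0, -0.3):
    print(cutoff, recall("CP", cutoff), recall("OA", cutoff))
```

Moving the cutoff toward the older-adult side of the score axis captures more care providers, at the cost of (at best unchanged, at worst reduced) older-adult accuracy, which is the trade-off the text describes.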
In terms of disfluencies, older adult and care provider callers both have a higher average
proportion of maze words compared with the call taker. The higher proportion of mazes for
callers may just be a product of natural spontaneous speech and the need to "find one's words",
as opposed to the call taker who is mainly following an organized, scripted dialogue during the
conversation. The proportion of maze words per total words was found to be lower for the care
provider than the older adult but this was not significant. It is possible that with more data
samples a significant difference would be observed. In Figure 3-12d, Case 47 was found to have
a very high number of maze words, 48%. A closer examination of this case revealed that the
caller had a significant speech impediment which resulted in a great deal of stuttering. In
situations with many maze words, it may be difficult for the HELPER to decipher what an
individual is saying. If these situations could be identified early on in the conversation, automatic
default to a live call taker may be the best response for the HELPER. Future work might
determine how often maze words occur within the initial speaker turns of the response call
conversation and whether the proportion of maze words would be representative of the rest of the
conversation.
3.6.3 Conversational Structure Measures
In terms of conversational structure, compared to the callers, call takers were found to make
fewer statements, ask more questions, and respond to fewer questions. These results correspond
to the fact that the conversational script call takers follow requires them to ask mostly closed-
ended questions until they obtain enough information, justification, and verification to initiate a
call response. Compared to the call taker, both the older adult and care provider callers used a
similar number of statements and asked a similar number of questions. However, older adult
callers responded to more questions and had more one word utterances on average than care
providers. The number of one word utterances was similar between care providers and call takers
which disproved our hypothesis that all callers would have fewer OWU compared to call takers.
A possible explanation for these differences in conversational structure between caller types may
be in the way the caller responds to the questions posed by the call taker. It is possible that older
adults tend to be led more by the call taker and subsequently they respond with simple one word
answers (e.g., yes, no, fine, okay); whereas the care provider may be more direct and provide the
necessary information required by the call taker up front. For example, the care provider often
states what they want and justifies their need, “I need an ambulance because Mrs. Smith fell and
hit her head and it is bleeding”. The questions posed by the call taker may also differ between
caller types. The call taker may ask the older adult specific questions about their ailments versus
a more general overall condition question that might be asked of a care provider (e.g., Are you
hurt? Are you cold? Do you have a temperature?). Furthermore, with more questions and
responses, there is a higher probability of needing to repeat oneself when communication
difficulties occur or for confirmation of answers. In designing the HELPER communication
module, the designer should consider adjusting the dialogue to handle these different types of
callers and conversational structures. For example, the call dialogue could be tailored to handle
‘direct requests with justification’ responses from care providers, as well as conversations in
which a more ‘seek and find’ approach prevails, where the older adult subscriber answers several
questions before the best response to provide can be identified.
A discriminant function analysis demonstrated that NRQ and OWU were the best predictors of
caller type with a moderately high success rate of 74.6% correct classifications. The DF was
better at identifying older adults who were classified correctly over 90% of the time, compared to
care providers who were only correctly classified 32% of the time. In order to adjust the
function to capture more care providers, the optimal threshold can be adjusted as discussed
previously in Section 3.6.2 (Verbal Ability Measures) above. A second discriminant function
analysis demonstrated that NS was the best predictor of call risk level with a moderate success
rate of 67.6% correct classifications. The DF was better at identifying high risk levels compared
to medium risk levels. The use of conversational structure by the HELPER to predict caller type
or risk level, however, would probably not be practical simply because it would require several
turns of conversation to be completed before an analysis could begin. Rather, these results are
important because they support the fact that significantly different conversational structures
occur between caller types, based on how callers respond to questions, and they demonstrate a
conversational change, with the majority of higher risk calls requiring fewer statements to be made.
3.6.4 Timing Measures
With respect to timing measures, the difference in speaker turns between older adults and care
providers was not found to be statistically different, regardless of risk level. However, the
number of speaker turns for callers was on average less than the number of speaker turns for the
call taker. As the call taker generally opens and ends the call, this result was as expected.
Considering risk level, although both older adult callers and call takers took significantly fewer speaker
turns in high risk situations compared to medium risk situations, this difference was not
significant for care providers. It is possible that because care providers usually only call for EMS
services, they are more succinct when requesting a response and their responses are similar
regardless of risk level. Another possibility is that the data results are altered due to the presence
of two outliers, #4 and #2 as observed in the care provider high risk category in Figure 3-14a.
Removing the outliers and re-running the t-test did not change this result (p-value went slightly
lower to 0.067). More data samples would be beneficial to strengthen and confirm the result
outcomes.
In addition to lower ST in high risk situations, call takers were found to speak more quickly
(faster WPM, UPM) during their calls. This quickened pace is possibly associated with the call
taker recognizing the high risk situation and their need to obtain all the required information as
quickly as possible before initiating a response. In contrast, the older adult callers did not speak
more quickly in high risk situations, but they did make fewer statements and one word responses.
It is not clear whether the older adults are simply calmer about their situation in high risk
situations or whether they want to appear calm so as not to alarm the call taker, and possibly to
demonstrate they are in control of the situation.
The mean number of ST calculated can be used as a guideline for technology developers to target
when developing the automated PERS communication module for both caller types. During
medium risk situations, the dialogue for older adult callers varies considerably in terms of the
number of ST. In an effort to provide assistance as quickly as possible, it may be best to set a limit
for the number of ST before the system automatically defaults to a live operator.
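A turn-limit default of this kind could be sketched as a simple guard in the dialogue loop; the function name and the MAX_TURNS value below are hypothetical choices for illustration, not a prescribed design:

```python
# Hypothetical sketch: defaulting to a live operator once the conversation
# exceeds a speaker-turn budget. MAX_TURNS is illustrative; a real system
# might derive it from the mean ST figures reported for each caller type.

MAX_TURNS = 12

def handle_call(turns_so_far, response_identified):
    """Decide the next dialogue action for the automated PERS."""
    if response_identified:
        return "dispatch response"
    if turns_so_far >= MAX_TURNS:
        return "transfer to live operator"
    return "continue dialogue"

print(handle_call(5, False))   # still within the turn budget: keep talking
print(handle_call(12, False))  # budget exhausted: default to a human operator
```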
As expected, in terms of time in seconds, high risk calls were responded to more quickly than
medium risk calls, while low risk calls were identified the fastest among all risk levels. These
results were the same for both older adult and care provider callers. All calls were less than 3
minutes. These results also provide a baseline in actual time (sec).
3.6.5 Study Limitations
This study was limited by its small and unbalanced sample size. Increasing the sample size may
improve the robustness of the results. Also, using a different definition for a ‘fall call’, may
increase the number of ‘call reason – fall call’ events and allow this category to be included in
the RM_MANOVA analysis. The fact that all response call recordings had come from a single
PERS provider also limits the number of PESs represented in this study and the generalizability
of the findings. Other PERS providers may follow different call protocols and may experience
other types of events which were not observed with the PERS provider where the calls examined
were obtained. Another study limitation is transcription variability resulting from human error
(e.g., difficulty hearing call recordings clearly). In addition, the fact that statistical analyses are
based on mean measurements is also a limitation. Wide variances in measures were observed for
both caller and speaker types and simply looking at means does not provide a complete picture of
what may be happening within each call. Finally, call meta-data surrounding the speaker details
was not provided (e.g., which call taker is responding, gender of callers, caller medical history).
As such, this study is limited by assumptions made by the researcher in completing the analysis.
For example, the analysis was performed on the assumption that each caller was unique and
interacted with the call taker only one time.
3.6.6 Future Research
Future research may want to consider examining speech intelligibility measures if better call
recordings can be obtained. It would also be important to refine these results to examine only the
initial utterances from the caller. As time is of the essence in emergency response, the HELPER
system will need to classify the call within the initial speaker turns. Alternative methods of call
classification may also be considered including a different definition for fall calls.
3.7 Conclusion
In conclusion, this chapter outlines the process by which response call conversations were
analysed to identify significant conversational trends that could be used to help establish
development guidelines for the HELPER communication module’s speech and dialogue
handlers. Care providers were shown to request EMS services 100% of the time when using the
PERS. Older adults requested EMS services nearly 96% of the time for high risk
situations; however, this number dropped to 71% for medium risk situations. Therefore,
identifying a response call’s caller type and/or risk level may help the HELPER in predicting a
possible response outcome. In terms of trends in verbal ability measures, WPM and TNL, were
identified as possible useful predictors for a response call’s caller type. However, no verbal
ability measures were identified as useful predictors for risk level. The identification of average
call taker response times in speaker turns and in seconds will also provide a target against which
the HELPER system’s response times can be compared. In terms of trends in conversational
structure, care providers and older adult callers were shown to employ different strategies for
responding to the call taker. This result suggests that there may be benefit in tailoring the
HELPER dialogue to the actual caller type and risk level. Especially in emergency situations
where time is of the essence, a SDS that responds well to the caller type and PESs may not only
result in identifying the desired response type more quickly, but may also provide a better user
experience. An improved user experience could, theoretically, lead to higher usage rates and
lower technology abandonment.
Chapter 4
4 The CARES Corpus: A Database of Older Adult Actor Simulated Emergency Dialogue for Developing a Personal Emergency Response System
4.1 Prologue
This chapter describes the process used to design and develop a spoken speech database
containing Canadian adult regular and emergency speech (CARES). Although the main
motivation for building this speech corpus was to help train and test various components of the
HELPER communication module, this database will also be of benefit to researchers interested
in older adult speech and in other fields such as computational linguistics, natural language
processing, and linguistics. The contents of this chapter have been published in a peer-reviewed
journal.
*N.B. The “Intelligent Call Handler” and the “Communication Dialogue” components illustrated
in this Chapter in Figure 4-1 would house the Speech Informant, Dialogue Manager, and Call
Responder HELPER SDS components and the Response Generation and Speech Synthesis
HELPER SDS components respectively, as described in Chapter 1, Figure 1-6.
Author Contributions: V. Young wrote the manuscript, designed and developed the database
collection protocol, and managed and led the data collection process. A. Mihailidis reviewed the
manuscript and led the research for the automated PERS.
Journal Citation: Young V, Mihailidis A. (2013). The CARES Corpus: A database of older
adult actor simulated emergency dialogue for developing a personal emergency response system.
International Journal of Speech Technology. 16:55-73.
4.2 ABSTRACT
There has been limited research on automatic speech recognition systems developed specifically
for older adults and there exist few older adult speech corpora available for training them. For
our research, samples of primarily older adult voices within an emergency context were needed
to help develop, train, and test the automatic speech recognition component of a novel,
intelligent, speech-based personal emergency response system. We were unable to locate an
existing speech corpus with all the properties we required. Specifically, these properties included
spoken Canadian English, both male and female adult (especially older adult) speech, emotional
or stressed speech, and emergency type dialogue. As a result, we created the Canadian adult
regular and emergency speech (CARES) corpus. The goal of this paper is to describe
design and development of the CARES corpus. The CARES corpus has been designed using
information obtained from live emergency call centre call transcripts and research literature in
the field of automatic speech recognition. This corpus consists of a collection of spontaneous
speech, read sentences, simulated expression of words, phrases, and emergency scenarios from
adult actors aged 23-91 years. The emphasis is on emergency type dialogue and older adult
speech. A total of 40 participant voices are included in the corpus and over 70% of the voices are
from adults over the age of 50 years. Approximately 3,200 minutes of speech was acquired in
total.
4.3 Introduction
High recognition accuracy in automatic speech recognition (ASR) applications is heavily
dependent on how closely the incoming speech can be matched to the speech samples used to
train the ASR system despite the presence of non-targeted speech noise. For this reason, a great
deal of effort has been placed on developing speech corpora for training ASR systems that
contain speech samples highly representative of the final target population group within the
expected application context. Adding to the already complex task of speech recognition in quiet
environments with the “average adult” speaker, if one considers using ASR in a situation of high
stress, such as a potentially life threatening emergency event involving an older adult speaker,
the choice of speech corpora used for ASR training may be considerably more crucial. Research
literature underlines the fact that stressful situations can alter a speaker’s voice, negatively
affecting ASR performance (Baber & Noyes, 1996; Zhou, Hansen, & Kaiser, 1998). Other
research suggests that ASR performance with older adults improves when older adult voices are
used for training the ASR (Anderson et al., 1999; Baba et al., 2004).
A number of speech corpora currently exist and are available for research use through various
institutions; for example, the University of Pennsylvania’s Linguistics Data Consortium (LDC)
(www.ldc.upenn.edu), the Oregon Health and Science University’s Centre for Spoken Language
Understanding (www.cslu.ogi.edu/corpora/corpCurrent.html), Stanford University’s Department
of Linguistics (linguistics.stanford.edu/department-resources/corpora), the Division of
Psychology & Language Sciences at University College London
(www.phon.ucl.ac.uk/resource/scribe), and the European Language Resources Association
(catalog.elra.info/index.php). However, for our intended application involving older adults in
stressful situations, we were unable to identify an existing speech corpus that contained all the
necessary properties required. Specifically, the relevant properties of interest included speech in
Canadian English spoken by older adults and caregivers, use of emergency type dialogue, and
speech spoken in an emotional or stressed state. Good coverage of both male and female voices
was also desired. As a result, we decided to develop a new speech corpus specific to our needs
called the Canadian Adult Regular and Emergency Speech (CARES) corpus.
The CARES corpus was designed especially for future training and testing of the ASR
component of a novel speech-based, intelligent personal emergency response system or PERS, as
well as for overall PERS testing. The PERS system is still in development and further details can
be found in the next section “Background & Motivation”. This paper exclusively focuses on the
creation of the CARES corpus. The application of the CARES corpus in the context of the PERS
ASR will be the topic of another future paper. In this paper, we will outline the motivation
behind the creation of the corpus, present a detailed description of the design specifications and
development, and conclude with a final discussion of our results and the corpus limitations.
4.3.1 Background & Motivation
4.3.1.1 The Traditional PERS Technology
Our research focuses on exploring the use of speaker independent (or factory pre-trained) ASR,
artificial intelligence, and human-computer dialogue, within a PERS for initiating and
determining the most appropriate emergency response for an older adult user in an emergency
situation at home. Traditionally, PERS technology is installed in the home of an older adult and
provides him or her with access to immediate 24 hour emergency assistance at the push of a
button. This button actuator is typically worn on the body - around the neck or wrist. In the case
of a fall or medical complication, the button must be pressed in order to reach an emergency call
operator who will respond immediately over a speaker phone or by telephone. The emergency
call operator will then determine through a quick series of questions and responses the most
appropriate emergency response. Finally, the desired emergency service or care provider (e.g.,
ambulance or neighbour) would be contacted or, in the case of a false alarm, the emergency call
operator would end the call.
PERS technology has been shown to support aging-in-place or aging at home in one’s
community, lower caregiver and user anxiety, and decrease overall healthcare costs (Hizer &
Hamilton, 1983; Mann et al., 2005). However, despite high user satisfaction, a large proportion
of PERS owners do not use their systems when needed. Reasons for non-use vary but include
lack of perceived need; sensitivity or burden from having to wear or remember to wear the
button actuator; potential loss of independence from outcome related hospitalization; and
inability to press or access the button (Heinbüchner et al., 2010; Hessels et al., 2011; Mann et al.,
2005; Porter, 2005).
In addition, a large proportion of calls to personal emergency response call centres actually
consist of false alarms (accidental button presses) (Hamill et al., 2009). False alarm calls often
result in unexpected calls to the subscriber, loss of work hours for family responders, and an
increased workload for already stressed emergency care providers. In order for the successful
adoption of this potentially life-saving assistive technology among older adults to help with
aging-in-place, it is paramount that the PERS be made accessible, usable, efficient, and effective:
therefore, system re-design is necessary (Hessels et al., 2011; Porter, 2005).
4.3.1.2 Re-designing the PERS
In our research, we hypothesize that using ASR in the PERS could eliminate the need for a body-
worn button actuator. As well, an artificially intelligent PERS might also permit call
cancellation in the event of a false alarm. The first ‘proof of concept’,
speech-based, intelligent PERS was a ceiling-mounted device with a dialogue modeled after
existing call centre protocol (Hamill et al., 2009). An open-source ASR called Sphinx 4.0 from
Carnegie Mellon University was used in this prototype (Walker et al., 2004). The system was
designed to understand “yes” and “no” responses to questions and ASR and PERS testing was
completed in a controlled lab environment and performed on non-older adults.
The next step in our research is to further develop the second PERS prototype incorporating
design and testing for the older adult user (Young & Mihailidis, 2010). This improved prototype
will include an expanded vocabulary (but not more than 200 words); a dialogue manager that can
handle normal conversational acts such as repetition, silence, barging-in, mis-understandings,
openings and closings; an artificial intelligence capable of identifying false alarms from true
emergencies and recalling past history; and an ASR trained for the older adult and caregiver
voices in typical emergency situations. The automatic default of this system will always be a live
operator.
To improve the robustness and accuracy of the ASR component used within this new PERS, a
collection of speech samples was required for training and testing from the target population in
emergency type situations. This same collection of speech could also be used as a simulated user
for dialogue and intelligence testing of the overall PERS. The CARES corpus was created to
meet these needs and includes a collection of read and spontaneous speech, as well as emergency
words, phrases, and enacted dialogue collected from adults, the majority of which were older
adult actors over the age of 50 years.
4.3.1.3 The Application
The speech-based and intelligent PERS is comprised of three modules or sub-components: the
Automatic Speech Recognizer or ASR, the Intelligent Call Handler and the Communication
Dialogue (Figure 4-1).
The Caller, typically an older adult or caregiver, provides the speech input into the PERS which
goes to the ASR for processing. The ASR ‘Decoder’ takes the processed speech sample and
determines the possible user response (word or phrase) by referencing data from the ‘Linguist’.
The Linguist contains three data components: (1) the acoustic model, (2) the lexicon or
dictionary (the list of words and their pronunciations), and (3) the language model (the
probabilities of word sequences formed from words in the lexicon).
Figure 4-1: The CARES Corpus application within the context of a PERS.
When an ASR is trained, the speech samples from the speech corpus are used to create the
acoustic model component within the Linguist. The words of these speech samples and their
pronunciations are also stored within the language model and lexicon of the Linguist,
respectively. Once the
input speech is decoded, the “Intelligent Call Handler” module then determines whether
emergency response assistance can be initiated, whether the call should be cancelled (e.g., a
false alarm), or whether further dialogue is required. If further dialogue is required, the
“Communication Dialogue” module is initiated and the Caller is asked another question or
provided with further information.
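The call-handling decision just described can be sketched in simplified form. This is a minimal illustration only; the keyword sets, function name, and action labels below are our assumptions, not the actual HELPER vocabulary or logic:

```python
# Minimal sketch of the call-handling decision described above.
# Keyword sets and names are illustrative assumptions, not the
# actual HELPER vocabulary or API.

CANCEL_WORDS = {"cancel", "accident", "mistake"}
REQUEST_WORDS = {"help", "ambulance", "paramedics"}

def handle_call(decoded_words):
    """Choose the next system action from the ASR's decoded words."""
    words = set(decoded_words)
    if words & CANCEL_WORDS:
        return "cancel_call"        # likely false alarm
    if words & REQUEST_WORDS:
        return "initiate_response"  # direct or indirect request for aid
    return "continue_dialogue"      # ambiguous: query the caller again
```

A real handler would also weigh call history and estimated risk level before cancelling a call rather than acting on single keywords.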
As shown in Figure 4-1, in addition to ASR training, the CARES corpus will also be used for
PERS testing; both the ASR and the overall PERS. Pre-recorded words, phrases and emergency
scenarios from the corpus can be used repeatedly to examine how the PERS system components
might respond to different user input. The use of the corpus provides a method for controlled and
repeatable testing of different ASR acoustic models and various modifications to the
communication dialogue and the intelligent call handler. Using the CARES corpus rather than
live older adult participants in the initial testing phase is more cost-efficient and reliable, and
should provide a smoother technology transition to field testing with live subjects.
4.4 Methodology
The methodology used in this study was reviewed and approved by the University of Toronto
Ethics Board (Protocol Reference Number 23482).
4.4.1 Application Context and Target Population
In designing the CARES corpus, we wanted to include simulated speech of callers (the target
population) using a PERS during a live emergency call, for both false alarms and true
emergencies. To achieve this goal, our methodology involved the following steps:
(1) We analysed live emergency call centre calls to determine what occurs during a call, who
calls, why they call, what help is requested, and what key words and phrases are used.
(2) We recreated or simulated live emergency situations using adult actors, mainly older adult
actors, recording their voices as they simulated emergency-type words, phrases, and
situations.
(3) Older adult subjects (the individuals in need of care) and other adult subjects (the
caregivers) were solicited to provide speech samples of spontaneous speech, read speech,
simulated emotional speech and simulated conversational speech.
4.4.2 Speech Corpus Design Specifications
4.4.2.1 Live Emergency Calls
To design a speech corpus that accurately reflects the traditional PERS user, his/her speech
characteristics, and the application context, it was necessary to better understand what happens
during a live emergency call situation after the button activator is pressed.
A total of 84 recordings of live personal emergency response calls were obtained from the call
centre of a private personal emergency response service provider and transcribed using SALT
v.9.0 (Miller & Iglesias, 2006). The company’s name is not provided for reasons of
confidentiality. Discourse and conversational analyses were performed on the transcripts
(Wooffitt (2005) presents an introduction to these two methods of analysis), in addition to key
word isolation. Specific details of the transcript analysis and key word isolation will be discussed
in a separate paper; however, a brief summary of the results is included here. See Table 4-1 for a
summary of important aspects of live emergency calls, corresponding transcript analyses
findings, and the resulting speech corpus design specifications.
Table 4-1: Important aspects of emergency call transcript analysis applied to speech corpus design specifications.

1. Who uses the system?
   Findings: Older adults and care providers; mostly older adult females and some males.
   Design specifications: Include adult and older adult speakers, with emphasis on older adults; female and male voice samples needed.

2. How are requests for assistance made?
   Findings: Direct and indirect requests for assistance.
   Design specifications: Important vocabulary - select key words and phrases to include.

3. Important aspects of conversation and dialogue
   Findings: Communication acts, speech categories, and key words and phrases identified; older adult callers speak significantly more slowly than care provider callers and emergency call operators.
   Design specifications: Important vocabulary - select key words and phrases to include; vary manner of speech in recordings (e.g., speed of speaking, read speech, and emotional speech).

4. What types of emergency situations?
   Findings: Accidental, medical, and fall calls.
   Design specifications: Include each type of emergency situation in the scenarios.

5. Are there varying degrees of emergency situations?
   Findings: Low, medium, and high risk.
   Design specifications: Include different degrees of emergency situations in the emergency scenarios.
For item 1, both older adults and their care providers were found to use the PERS and activate
the system using the push button. Therefore, both adults and older adult voices should be
included in the speech corpus. As it is more difficult to recognize older adult voices using ASR,
a greater number of older adult voices should be collected. Although a majority of the emergency
response call centre clients are older adult females, both female and male voice samples would
be required.
For items 2 and 3, in terms of how requests are made and other important aspects of the
emergency conversation and dialogue, care providers were found to be more explicit or direct
when requesting an emergency response (e.g., “I need an ambulance”) compared to the older
adult callers, who were more explanatory (e.g., “I’ve been vomiting, I need help”). Within the
conversation itself, the words and phrases used could be categorized into different
communication acts or speech categories, for example, openings, closings, confirmations,
negations, and queries. To reflect this information in the speech corpus, the final recorded
vocabulary included the most common or important key words and phrases which were deemed
necessary to make direct or indirect emergency response requests, indicate a high or low risk
situation, or reflect a negative health condition or fall. In addition, words were also included that
are used in essential communication acts or speech categories required to carry out a
conversation (e.g., repetition requests, responses to questions, queries). All short phrases used in
the corpus contained one or more of the key words contained in the vocabulary.
On average, the older adult callers spoke significantly more slowly than both the care provider
callers and the ECOs. In an attempt to capture the different manners of speech among younger
and older adults within the speech corpus, participants were instructed to repeat key words using
different manners of speaking (e.g., normally, loudly, quietly, quickly, and slowly).
For items 4 and 5, emergency response calls were divided into medical emergencies and calls
associated with falls, and also categorized into low, medium and high risk situations. A low risk
situation is a false alarm; a high risk situation is one of life or death, possible loss of sight or
limb, or where immediate assistance is required (e.g., heart attack, stroke, fire). In a medium risk
situation, the individual requires assistance but would probably not lose their sight, limb, or life
while waiting (e.g., a fall without injury). In terms of the length of time before
emergency response is initiated, low and high risk calls were resolved faster compared to
medium risk calls. This information was covered in the speech corpus by including emergency
scenarios involving accidental, medical and fall calls and situations of low, medium and high
risk.
4.4.2.2 Phonetically Balanced Sentences
The words and phrases included in the speech corpus were chosen for their semantic importance
in emergency dialogue rather than their phonetic makeup. Thus we felt it necessary to also
include voice samples covering all the phonemes of the English language. These samples would
allow for the possibility of training the ASR system using phonemes (or di- or tri-phones) if
required. Sentences were selected from the SCRIBE database (Huckvale, 2004) to provide a set
of phonetically rich or balanced sentences and phonetically compact sentences for use in the
speech corpus.
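The idea of verifying a sentence set's phoneme coverage can be illustrated as follows. This is a toy sketch: the pronunciation dictionary below covers only one sample sentence and stands in for a full English lexicon (e.g., an ARPAbet-style dictionary spanning all English phonemes):

```python
# Toy sketch of a phoneme-coverage check for a candidate sentence set.
# PRON is a stand-in for a full pronunciation dictionary.

PRON = {
    "this": ["DH", "IH", "S"],
    "was":  ["W", "AH", "Z"],
    "easy": ["IY", "Z", "IY"],
    "for":  ["F", "AO", "R"],
    "us":   ["AH", "S"],
}

def phoneme_coverage(sentences, target_phonemes):
    """Return (covered, missing) phonemes for a set of sentences."""
    seen = set()
    for sentence in sentences:
        for word in sentence.lower().split():
            seen.update(PRON.get(word, []))
    return seen & target_phonemes, target_phonemes - seen

covered, missing = phoneme_coverage(["This was easy for us"],
                                    {"DH", "IH", "S", "B", "P"})
```

A check of this kind, run over the full lexicon, is how one would confirm that the selected rich and compact sentences jointly cover every English phoneme.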
4.4.2.3 Spontaneous Speech Sample
Research has shown that read speech and spontaneous speech are acoustically and
linguistically different (Howell & Kadi-Hanifi, 1991) and that ASR recognition rates can be
improved when spontaneous speech is used with an ASR system trained with comparable
spontaneous speech (Furui, Nakamura, Ichiba, & Iwano, 2005). Since emergency dialogue is not
typically read speech, we thought it would be useful to also include a sample of spontaneous
speech in the CARES corpus.
4.4.2.4 Simulated Vocal Expression
Adult actors were mainly used to simulate emergency speech in the CARES corpus because it
was not possible to use actual emergency speech, for several reasons. First, it is not ethical to
create a real emergency, or a realistic simulation of one, since this would create unacceptable
risks for the participants. Nor was it possible to use live recordings of emergency calls, because
the ones we managed to obtain were not of sufficient audio quality for ASR training.
A popular approach for research in the area of ‘affective computing,’ where human emotion is
considered when designing ASR systems, is to begin with a collection of emotional speech either
real or simulated by actor subjects (Campbell, 2000; ten Bosch, 2003). Although simulated vocal
expression using actors may not be as natural as the speech obtained from a “true” emergency
situation, there is research evidence to suggest that actor simulated vocal expressions will
provide more expressive speech than non-emotional read speech (Murray & Arnott, 2008;
Scherer, 1986, 2003; C. E. Williams & Stevens, 1972).
4.4.3 Participant Recruitment
Adult actors were targeted for recruitment within three age groups from the greater Toronto area
(GTA). The three age groups consisted of: (1) adults 19+ to <55 years of age; (2) adults 55 to 69
years of age; and (3) adults 70 years of age and over. The upper threshold of 70 years was
selected instead of 60 or 65 years (retirement age) based on literature suggesting that age-related
changes in the older adult voice may only start to affect ASR accuracy when the speaker is over
the age of 70 years (Wilpon & Jacobsen, 1996). The lower age of 19 years was selected because
most students in the post-secondary education system begin at this age. The middle age of 55
years was selected as a possible early retirement age. All actors were
required to have a minimum of one year of prior acting experience with an acting group.
Actor participants were recruited from local theatre events (e.g., the Toronto Fringe Festival), a
seniors' acting group (Act II) based at a local university (Ryerson University), and from a
“Performing Arts Lodge” located within the GTA, a residence for individuals in the performing
arts. Participants were also recruited via word of mouth from other participants. A
minimum of five participants of each gender (male and female) were recruited for each age
group category.
Participant suitability was determined through a telephone interview conducted prior to
acceptance into the study. See Appendix E for a list of the questions participants were asked.
Desired user characteristics are also listed below.
1. Fluent in English with an English language comprehension level high enough to understand the consent forms and follow simple instructions;
2. Canadian residents;
3. Minimal language accent;
4. Minimal motor speech difficulties;
5. English language literate;
6. No more than normal-mild hearing loss or corrected hearing loss;
7. Normal or corrected vision;
8. Medically stable;
9. Mobile and living independently in the community;
10. Cognitively capable of consenting to participate in the study.
4.4.4 Recording Procedure
The speech recording session was designed to last approximately two hours in total. Participants
were required to perform four different speaking exercises. See Table 4-2:
Table 4-2: Summary of speech sample recorded and general recording details.
Exercise 1 – Free speech (miscellaneous topic): spoken; 1 session, 5 minutes long.
Exercise 2 – Sentences: read; 96 sentences (50 rich and 46 compact); 10 sessions, 5 minutes per session.
Exercise 3 – Isolated emergency words and phrases: read with emotion; 185 phrases and key words, each repeated 5x; 5 sessions, 10 minutes per session.
Exercise 4 – Emergency scenarios: read with emotion; 3 scenarios, 10 minutes total.
Exercise 1 - Free speech: participants were asked to speak naturally and spontaneously about a
miscellaneous topic of their choosing. If they had difficulty, questions were asked to facilitate the
dialogue.
Exercise 2 - Sentence Reading: participants read a collection of phonetically rich and compact
sentences (96 sentences combined). In total, 200 phonetically rich sentences and 460
phonetically compact sentences were available from the SCRIBE database (Huckvale, 2004).
Two sentence sets were created: Set A included the phonetically rich sentences and Set B
included the phonetically compact sentences. Each of these sentence sets was further divided
into smaller sub-sets: four groups of 50 rich sentences and ten groups of 46 compact sentences.
Each participant was randomly assigned one sub-set of sentences from both the phonetically rich
Set A and the phonetically compact Set B. The selected sentence sub-sets were presented
on a computer monitor to the participant for reading. An example of these sentences is shown
below:
SET A – Phonetically Rich Sentences
A1-001. The price range is smaller than any of us expected.
A1-002. They asked if I wanted to come along on the barge trip.
A1-003. Amongst her friends she was considered beautiful.
A1-004. The smell of the freshly ground coffee never fails to entice me into the shop.
A1-005. I'm often perplexed by rapid advances in state of the art technology.
SET B – Phonetically Compact Sentences
B1-001. This was easy for us.
B1-002. Is this seesaw safe?
B1-003. Those thieves stole thirty jewels.
B1-004. Jane may earn more money by working hard.
Exercise 3 – Isolated emergency words and phrases: participants were instructed to read, and
then speak with emotion, 185 short pre-selected emergency phrases and words. Each key word
was pre-selected from within its phrase and was repeated five times in different manners of
speaking: normally, loudly, softly, quickly, and slowly. Spoken numbers were also included in
the recordings: 0 to 20, and 30 to 90 by tens. All phrases and words were displayed on a computer
monitor and participants were provided prompts for the five different manners of speaking. For
the “slow” manner of speaking, participants were instructed to imagine they had difficulty
forming their words. Figure 4-2 shows an example of the question to keep in mind, the response
sentence, and the key word of interest provided to participants during this session. Appendix F
presents a list of the words and phrases recorded by the participants.
In Exercise 3, the emergency words and phrases were randomized and arranged into a list called
‘Set-1’. ‘Set-2’ was Set-1 in reverse order. Each participant was assigned either Set-1 or Set-2 in
an alternating pattern. There was one exception where the first participant viewed the words in
alphabetical order. The order of presentation of the word and phrase list was counterbalanced to
reduce the effects of voice differences due to fatigue or effects resulting from beginning a new
verbal exercise.
Figure 4-2: Sample screen shot of emergency phrases and words presented to the participant during speech recording. The participants were provided with screen prompts to indicate how the word was to be spoken.
Exercise 4 - Enacting three emergency scenarios: In total, nine short emergency scenarios
were used, with each participant assigned three of the nine possible scenarios, all of
which involved dialogue that might occur after pressing an assist button on a PERS. The three
scenarios included: one accidental button push for assistance, one fall incident, and one request
for medical assistance. All scenarios were pre-written and pre-assigned by the researchers. The
scenarios were taken from the live emergency call transcripts and modified to remove any
identifying information. Scenarios were provided to the participants for review and practice prior
to the day of their voice recording. See Table 4-3 for a summary of the scenario type, risk level,
and scenario details. See Appendix G for the emergency scenarios.
Table 4-3: Emergency scenario type, risk level and scenario detail.
Accident (A):
  Low (1) – Unaware of button press, accidental call, hard of hearing
  Low (2) – Aware of button press, accidental call
  Low (3) – Unaware of button press, accidental call
Fall (F):
  Medium (4) – A fall, can’t get up, send responder
  Medium (5) – Caregiver call, a fall, broken bone, send ambulance
  High (6) – A fall, bleeding, hard of hearing, send ambulance
Medical (M):
  Medium (7) – Nauseous, vomiting, wants responder
  High (8) – Breathing difficulty, dizzy, wants paramedics
  High (9) – Shaky, difficulty breathing, wants responder
In Exercise 4, the nine emergency scenarios consisted of three accidental push-button scenarios
(A), three fall incident scenarios (F), and three medical assistance scenarios (M). Within these
emergency scenario groupings, the accidental push-button scenarios were all low risk events,
whereas the fall incident scenarios included two medium and one high risk situation, and the
medical assistance scenarios contained one medium and two high risk situations. A total of
twenty-seven scenario combinations were created and placed in random order. Each participant
was assigned one of the twenty-seven randomized scenario combinations which included one
accidental, one fall, and one medical scenario. When all twenty-seven scenario combinations had
been performed, the order was repeated (Table 4-4 – Scenarios). The three scenario types (i.e.,
A, F, and M) were also arranged into six order combinations (e.g., A-F-M, F-M-A, M-A-F).
Each participant was assigned one of these scenario order combinations to
determine the order in which the scenarios would be performed. This order was repeated when
all six scenario type combinations were completed (Table 4-4 – Scene Order). See Table 4-4 for
an example of the data combinations used for each Participant.
Table 4-4: Example of data combination arranged for each participant indicated.
Participant  Scenarios*  Scene Order  Sentences A  Sentences B  Sent. Order  Word Set
10 248 MFA 4 8 AB SET1
11 258 FAM 3 2 BA SET2
12 157 AMF 4 6 AB SET1
13 367 AFM 2 10 BA SET2
14 368 FMA 3 5 AB SET1
15 149 MAF 2 2 BA SET2
16 269 MFA 2 1 AB SET1
17 259 FAM 4 7 BA SET2
18 168 AMF 2 5 AB SET1
19 147 AFM 4 10 BA SET2
20 167 FMA 4 5 AB SET1
* For participant #10 for example, scenarios ‘248’ corresponds to scenes 2, 4 and 8.
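The counterbalancing scheme above can be sketched in a few lines of Python. Scenario numbering follows Table 4-3; the cycling function is our illustration of "repeat the order once exhausted," not the study's actual assignment code:

```python
# Sketch of the counterbalancing scheme: 27 scenario combinations
# (one accident x one fall x one medical) and 6 scene orders, each
# cycled across participants. Scenario numbers follow Table 4-3.

from itertools import permutations, product

accident = [1, 2, 3]  # low-risk accidental button-push scenarios
fall = [4, 5, 6]      # fall scenarios
medical = [7, 8, 9]   # medical scenarios

combos = list(product(accident, fall, medical))     # 27 combinations
orders = ["".join(p) for p in permutations("AFM")]  # 6 scene orders

def assignment(participant_index):
    """Cycle through combinations and orders as participants enroll."""
    return (combos[participant_index % len(combos)],
            orders[participant_index % len(orders)])
```

The alternating Set-1/Set-2 word-list assignment in Exercise 3 can be handled the same way, cycling with a period of two.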
4.4.4.1 Recording Environment
All speech recordings were conducted at the University of Toronto in quiet background
conditions inside a double-walled, sound-attenuated booth approximately 74 x 74 x 78.5 inches
(D x W x H) in size. Participants were seated inside the sound-attenuated booth in front of a
computer monitor. See Figure 4-3.
Figure 4-3. Participant room setup in sound attenuating booth.
Figure 4-4. Experimenter room setup in sound attenuating booth.
4.4.4.2 Recording Equipment
Speech recordings were made using ProTools TDM Software on a dedicated Apple Computer
(MAC OS X version 10.4.11, 3 GHz Dual-Core Intel Xeon). The microphone pre-amp was a
Digidesign “PRE” and the audio interface was the Digidesign “192 I/O”. Participant speech was
recorded at a sampling rate of 96 kHz with 24-bit depth. The participant used AKG Acoustics
K271 studio headphones and an Audio-Technica 4050 multi-pattern condenser microphone. The
experimenter used Sennheiser eH150 headphones and an AKG Acoustics C4000B large-
diaphragm condenser microphone.
4.5 Results
4.5.1 Participant Recruitment
A total of 40 participants, 19 male and 21 female, were recruited for the study over a six month
period. Thirteen participants fell within the 19+ to <55 years of age group, twelve participants
were in the 55 to 69 years of age group, and fifteen participants were in the 70 years of age and
older group. See Table 4-5 for a breakdown of the participants by Age Group and Gender.
Table 4-5: Participants by Age Group and Gender
Age Group (years)  Male  Female
19+ to <55 6^ 7
55 to 69 6 6*
70 and over 7& 8~
Gender Totals 19 21
^1 non-actor; &1 with minor accent; *1 with minor accent, 1 non-actor; ~3 with minor accent, 2 non-actors.
Nineteen of the participants had 15 years or more experience in the acting profession. Four
participants had no acting experience and five participants spoke with a minor accent. Minor
accents included British English, French and German. Participants spanned an age range from 23
to 91 years of age. See Table 4-6 for a breakdown of the participants by age range.
Table 4-6: Participants by Age Range
Age Range (years)  Male  Female
20’s 2 3
30’s 3 3
40’s 1 0
50’s 3 2
60’s 3 5
70’s 5 6
80’s 2 1
90’s 0 1
Fifteen of the participants were born outside of Canada. They represented the following
countries: Austria, England, France, Germany, Italy, Japan, Scotland and USA. For participants
born within Canada, the Canadian birth provinces included Alberta, British Columbia, Ontario,
Newfoundland, and Nova Scotia.
4.5.2 Speech Recording Summary
Each participant completed all four speech exercises described in the Methodology Section,
except for one participant who did not complete the emergency scenario exercise due to fatigue.
Two participants also did not complete the number counting. A total of ~3,200 minutes of speech
was recorded.
4.6 Discussion
4.6.1 The Age Effect
The length of time required for speech recording was approximately two hours; however, timing
was dependent on how quickly the participants spoke and how many breaks were required.
Younger participants tended to finish the study in less than two hours, while some of the older
participants required more time to finish the exercises. Older participants needed more voice
breaks during the recording sessions.
In the free speech exercise, younger participants tended to choose topics related to relationships
and travel, whereas older participants spoke more about life experience, work, and children.
Due to their age and life experiences, the older participants were observed to be more realistic
than their younger counterparts at portraying older adults in emergency situations, especially
for certain conditions (e.g., stroke, weakness, heart attack).
4.6.2 Recording Difficulties
In the sentence exercise, the majority of participants knew how to pronounce all the words in the
sentences; however, some individuals did have trouble pronouncing certain words, which may
have been a result of education, vision, or reading difficulties.
Some participants with hearing aids removed them during the recording sessions, saying they
could hear well over the headphones; other participants kept their hearing aids on. Wearing the
headphones did not seem to bother them.
Occasionally, a participant gestured or moved during the recording causing noise to be added to
the recorded voice sample. These samples were generally not re-recorded unless the noise was
sufficiently loud to interfere significantly with the recorded signal. If the participant coughed or
sneezed during a sentence, the sentence or word was repeated.
4.6.3 Design Limitations
In the emergency word and phrase exercise, for the word repetition component, the “slow”
manner of speaking was interpreted in different ways. Variations included lengthening the word
slightly, exaggerating the length, stuttering, and slurring the speech.
Some actors with stage or theatre experience were found to exaggerate and enunciate words
more clearly than non-stage actors. It is possible that this type of clear word enunciation may not
accurately reflect a real live emergency scenario. Some actors also seemed to over-act in some
situations. This has been noted to occur in other research studies involving actors and simulated
vocal affect (Scherer, 1986). It is possible that the individuals who over-acted may have had less
acting experience or possibly less life experience with respect to the specific emergency situation
being simulated.
In terms of the speech recording, the microphone position was set at the beginning of the study;
however, participants were free to move closer to or further from the microphone over the
course of the recording session, which may have affected the final recorded volume of the
speech samples. The microphone was very sensitive, and occasionally a change in speech from
a whisper to a very loud voice would saturate the input, requiring that word to be re-recorded. If
the microphone was positioned too close to the participant’s mouth and the participant spoke
loudly, certain consonants (plosives, e.g., /p/, /b/, /t/) could also produce a noise artifact (a burst
of airflow) that would not normally occur with the microphone positioned further away.
The CARES corpus is of similar size to the “few talker” set in the SCRIBE database (Huckvale, 2004).
Although the speech sample size may not be large enough to train a large vocabulary ASR, the
number of speech samples should be sufficient for training and testing the intelligent PERS ASR
(a small vocabulary ASR) and preliminary field testing. If required, additional speech samples
may be added in the future.
4.6.4 Background Noise
In a true emergency situation, the caller’s incoming speech signal will be contaminated with
background noise from their respective environments. In creating the CARES corpus, the actual
recordings of speech samples were carried out in the quiet environment of a sound attenuated
booth. The benefit of recording the speech signal in a quiet recording environment is that the
degree and type of background noise contamination can be controlled. For example, simulated
room noises, street noise or other noises (e.g., television, radio, crowds) can be added later to test
their effect on the ASR’s speech recognition capability.
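Adding noise to a clean recording at a controlled signal-to-noise ratio (SNR) is a standard way to do this. The sketch below is our illustration in pure NumPy; a real pipeline would mix in actual room, street, or television noise recordings rather than the synthetic noise used here:

```python
# Sketch: mixing background noise into a clean corpus recording at a
# chosen SNR. Synthetic signals stand in for real speech and noise.

import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power required for the target SNR, then the matching scale.
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # 1 s of stand-in "speech" at 16 kHz
noise = rng.standard_normal(16000)
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Sweeping `snr_db` from clean down to low values gives a controlled, repeatable way to measure how recognition accuracy degrades with each noise type.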
4.6.5 Implementing the CARES Corpus
Preliminary plans for using the CARES Corpus to train the PERS ASR include expanding
the existing vocabulary, which currently only recognizes “yes” and “no”. We also plan to examine
different acoustic model combinations, for example, training with strictly older adult voices, or
older adult and younger adult voices, or strictly the younger adult voices. It might also be
interesting to combine the speech samples from the CARES Corpus with speech samples from
other existing corpora to increase the speech data for ASR training for certain common words
such as “yes”, “no”, “help” or “ambulance”.
4.6.6 Other Applications
In addition to benefiting the development of PERS, we expect that the CARES corpus will be
useful for other applications involving older adult subjects such as audio interfaces to Smart
Home technologies. It may also be useful in linguistics research studying the speech patterns of
older adults. It is our intention that the CARES corpus will be made available to other
individuals for their research once the processing has been completed.
4.7 Conclusions
A collection of Canadian adult regular and emergency speech has been developed, containing
speech samples from 40 adults between the ages of 23 and 91 years. The CARES Corpus
specifications were based on transcript analyses of emergency response call conversations and
dialogue as well as other research literature in the field of ASR development. Participants
included mainly adult actors who were required to carry out four speech exercises including
spontaneous and read speech, and enacted emergency dialogue, phrases and words. The CARES
corpus contains roughly 3,200 minutes of speech. This corpus was primarily designed to further
develop the ASR component of an intelligent, speech-based PERS for older adults. It may also
find uses in future research studies involving smart home technologies, natural speech processing
and computational linguistics. The next step in our study will involve using the CARES Corpus
to train and test an acoustic model for the PERS. It will also be used to provide controlled input
to examine the communication dialogue and intelligent call handling aspects of the PERS.
Chapter 5
5 Discussion & Conclusions
5.1 Discussion
There are many challenges in designing and building a novel automated, artificially intelligent,
spoken dialogue-based PERS. Simply getting the system to work in the real world with the
intended user and in the right environment is a major one. Yet, as difficult as it may be to
overcome this challenge, a working system also does not guarantee technology adoption in the
end. Research studies have found that older adults are willing to use intelligent assistive
technologies on two conditions: the older adult must see a need for the technology and it must
work well (Demiris et al., 2004; Mann, Marchant, Tomita, Fraas, & Stanton, 2002; McCreadie &
Tinker, 2005). In addition, the desire or motivation must be there to use it. As researchers,
perhaps the best place our skills can be applied is in creating the knowledge and identifying the
means to make the technology work well. This project takes a step back from the actual physical
system development and focuses more on filling in the gaps in knowledge needed to complete
the design and development of the HELPER communication module for the intended end-user in
a real personal emergency situation. All three research studies in this dissertation use live
personal emergency response call data obtained from a local personal emergency response call
centre. The knowledge derived from analyzing these real personal emergency response calls
combined with prior research studies will pave the way toward a more robust HELPER – one
that works well for the end-user and instills in them the desire to use it.
The three main objectives of this doctoral research project are re-stated as follows:
(1) To identify keywords and phrases used by existing PERS users in various personal
emergency response call situations. (Study 1 – Chapter 2)
(2) To identify significant trends in personal emergency response calls and call conversations
that may be used to tailor the call response to the user. (Study 2 – Chapter 3)
(3) To design and develop a speech corpus to be used for training and testing the
communication module of the HELPER system. (Study 3 – Chapter 4)
In this final chapter, study and data highlights will first be summarized for each study. Next, the
research contributions to knowledge will be presented, followed by the research limitations and a
description of future research and proposed studies. Finally, the thesis will conclude with a
short discussion on the implications of the work and offer some final remarks.
5.2 Study Highlights
The three studies are presented in the order in which they appeared in the dissertation. In each
sub-section, the research objective is re-stated followed by a summary of the study highlights.
5.2.1 Principal Findings from Study 1: Identification of Keywords and Phrases
Study 1 Objective: To identify keywords and phrases used by existing PERS users in various
personal emergency response call situations.
Before keywords and phrases used by PERS callers could be identified, it was necessary to first
determine what “different types” of PESs occur as well as to figure out how a word or phrase
might be considered “key.”
To determine different types of PESs, a model of a PES was created which consisted of the
system user experiencing some possible personal emergency event and existing in some
physical-cognitive state. In the model, the user was described as a caller type, the situation was
characterized by a call reason and a risk level, and the physical-cognitive state was represented
by the user’s communication ability. Knowledge gained from on-site visits with emergency
response service providers, research literature, and the response call transcripts was used to
develop the model and to identify PES categories within the PES model. Caller type included three categories: the older adult, the care provider, and a combination of older adult and care provider.
Call reason included two categories: fall and medical calls. Risk level included three categories:
low, medium, and high risk.
Unique words were extracted from response call transcripts and word categories were developed
to help identify what words would be considered “key” and why. These categories were
developed qualitatively using the call transcripts and research literature. The process of
categorizing the extracted words was repeated by two coders categorizing words out-of-context.
A sub-study was also performed with a third coder identifying keywords, phrases, and their
categories in-context by reading the response call transcripts. Eighteen (18) category codes were
used by all three coders, and a full keyword list of 402 words was identified after integrating all coder results. Coder 3 identified 135 phrases.
A smaller keyword set was produced from the full keyword set for use as content material in the
CARES corpus. This list was obtained by applying a series of reduction rules to the full keyword
set. The reduction process took into consideration both the keyword frequency of occurrence, as
well as the number of PES classifications in which the keyword was used. A final small keyword
set of 185 keywords was obtained. For every keyword, a matching phrase was identified which
contained the keyword, for a total of 185 phrases. The 185 keywords fell into 16 of the word
categories (only the categories for “other” words and “interjections” were removed). The small
keyword set was found to include 16 common words used across the risk level and call reason
PES categories, as well as unique keywords for low risk (3 words), medium risk (44 words), and
high risk (31 words) calls.
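The reduction process described above can be illustrated with a short sketch. The thresholds and toy counts below are hypothetical stand-ins, not the actual rules or frequencies from Study 1; the sketch shows only the general idea of reducing a keyword set by overall frequency of occurrence and by spread across PES classifications:

```python
from collections import Counter

def reduce_keywords(word_occurrences, min_count=2, min_categories=1):
    """Illustrative reduction rule: keep a word if it occurs at least
    min_count times overall, or appears in more than min_categories
    PES classifications. Thresholds are hypothetical."""
    totals = Counter()
    spread = {}
    for word, per_category in word_occurrences.items():
        totals[word] = sum(per_category.values())
        spread[word] = sum(1 for n in per_category.values() if n > 0)
    return sorted(w for w in word_occurrences
                  if totals[w] >= min_count or spread[w] > min_categories)

# Toy counts per risk-level classification (invented data)
occurrences = {
    "fell":      {"low": 0, "medium": 3, "high": 5},
    "ambulance": {"low": 0, "medium": 1, "high": 4},
    "testing":   {"low": 1, "medium": 0, "high": 0},
    "um":        {"low": 1, "medium": 0, "high": 0},
}
print(reduce_keywords(occurrences))  # ['ambulance', 'fell']
```

In this toy example, infrequent words confined to a single classification ("testing", "um") are dropped, while words that are frequent or cross-cutting are retained.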
5.2.2 Principal Findings from Study 2: Identification of Conversational Trends
Study 2 Objective: To identify significant trends in personal emergency response calls and call
conversations that may be used to tailor the call response to the user.
The theory behind this objective was that in addition to recognizing and understanding incoming
speech, other information identified from the response call transcripts could be combined with
the words to support the HELPER’s decision-making capability. This other information might be in the form of non-semantic speech data, such as conversational measures, identified through an
analysis of the response call conversations. By examining trends within the response call
conversations these alternate sources of information would become apparent and could then be
integrated into the communication module’s Dialogue Manager. The Dialogue Manager would
use this information to classify the response calls thereby increasing the HELPER’s confidence
in directing the conversation to quickly identify the appropriate target response.
Before this objective could be met, a method for classifying a response call was required and
measurable aspects of interest within a response call conversation needed to be identified. In this
study, the PES model was expanded to the PER model to include an additional response-type
classification. The PER categories were then used to classify the response calls. The four
response type categories identified included: (1) an ambulance, (2) paramedics, (3) other
responders, and (4) all responders. An EMS response was considered to be any of response types
1, 2, or 4. Three groups of conversational measures were then selected for examination: (1)
verbal ability, (2) conversational structure, and (3) timing. Within verbal ability, the study looked
at: words per minute, utterances per minute, turn length in words, and mazes. Within
conversational structure, the study looked at: number of statements, number of questions,
number of responses to questions, and number of one word utterances. Timing was examined
using seconds and number of speaker turns.
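As a rough illustration, several of the verbal-ability and timing measures named above can be computed directly from a timed transcript. The sketch below uses simplified measure definitions and invented example turns; it is not the study's exact calculation:

```python
def conversational_measures(turns):
    """Compute simple conversational measures from a transcript given as
    (speaker, utterance, start_s, end_s) tuples. Definitions here are
    simplified stand-ins for the measures used in Study 2."""
    words = sum(len(u.split()) for _, u, _, _ in turns)
    duration_s = turns[-1][3] - turns[0][2]
    minutes = duration_s / 60.0
    return {
        "words_per_minute": words / minutes,
        "utterances_per_minute": len(turns) / minutes,
        "mean_turn_length_words": words / len(turns),
        "speaker_turns": len(turns),
        "duration_s": duration_s,
    }

# Invented three-turn exchange for illustration
transcript = [
    ("call_taker", "Hello do you need help", 0.0, 2.0),
    ("caller", "Yes I fell down and I need an ambulance", 2.5, 7.0),
    ("call_taker", "Okay I will send an ambulance right away", 7.5, 12.0),
]
m = conversational_measures(transcript)
print(m["speaker_turns"], round(m["words_per_minute"], 1))
```

Maze counting and question/statement tallies would require utterance-level annotation and are omitted from this sketch.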
The first statistical analysis identified that caller type and risk level were significantly related to
response type. Knowing the call reason might also indicate the call’s risk level. Care providers were found to request EMS services 100% of the time when using the PERS. Older adults requested EMS services nearly 96% of the time in high risk situations; however, this number dropped to 71% in medium risk situations. Thus, by knowing a response call’s caller type and/or risk level, the HELPER may be able to deduce the end response type. For example,
an EMS response might be suggested immediately for high risk calls but the system might
propose a non-EMS responder for medium risk calls.
The three independent factors used in the subsequent statistical analyses included caller type
(older adults and care providers), risk level (medium and high risk), and speaker type (call taker
versus callers). The statistical analyses performed included repeated measures analysis of
variance and discriminant analyses using the response call conversational measures and response
call categories of caller type, risk level and speaker type (call takers were the control group).
The results of these analyses showed that words per minute and turn length in words could be
used to help predict caller type. No measures were found to be useful for predicting a call’s risk
level. Mazes occurred more often in older adults’ speech than in care providers’, but the difference was not significant. High risk calls were resolved more quickly than medium risk calls, as measured by
time in seconds and number of speaker turns. Finally, care providers and older adult callers were
found to employ different conversational structure strategies for responding to the call taker. This
result suggests that different dialogue responses may be required depending on the caller type
and risk level.
5.2.3 Principal Findings from Study 3: Creating the CARES Corpus
Study 3 Objective: To design and develop a corpus of spoken speech to be used for training
and testing the communication module of the HELPER system.
There are many ways in which a speech corpus can be created and a multitude of reasons why
recordings of speech may be collected. The key to building a useful speech corpus is to make it
relevant for the purpose and context in which it is being built. Researchers who have been involved in the development of larger speech databases have noted that developing a speech database is an extremely labour-intensive process (Huckvale, 2004; Lamel, 1989). For this reason alone, care must be taken when selecting database content material to ensure that the database will be useful for all aspects of system training and testing (Lamel, 1989).
The goal of the CARES corpus was not to construct a corpus for training a large continuous spoken word recognizer, but rather to provide enough speech samples of relevant content with which to build a small-vocabulary isolated or continuous spoken word recognizer, following the recommendations outlined by Young and Mihailidis (2010). Work by
other researchers also suggested that combining a small amount of targeted speech data with a
larger collection of speech data could be used to finely tune an ASR for the intended end-user
(Vipperla et al., 2009).
The CARES corpus was primarily designed for testing the HELPER and to test and train various
components contained within the HELPER’s communication module, namely the ASR acoustic
model. For this reason, a portion of the speech content included in the CARES corpus was modeled on another database also used for training ASR systems, the SCRIBE corpus, which contains phonetically compact and phonetically rich sentences and some spontaneous speech. The other portion of the
CARES speech content comprises the personal emergency keywords, phrases, and scenarios that
represent the end-user in various PESs.
Context-relevant speech samples are invaluable for replaying a mock PES or a group of spoken words or phrases during HELPER communication module testing.
testing of different ASR acoustic models or spoken dialogues could then be performed and
differences in outcomes could be attributed to actual modifications made to the design or settings
of the system components themselves rather than to any variability present in the incoming
speech. Environmental noise could also be added or different microphones used with the
recorded speech samples to observe their effect on recognition accuracy and system output.
The CARES corpus was designed to include five different types of speech samples: (1) spontaneous, non-emergency-related, continuous speech; (2) non-emergency-related read sentences; (3) emergency-related phrases/sentences read with emotion; (4) read words spoken in different manners (i.e., fast, slow, loudly, softly, normally); and finally (5) emergency conversations read with emotion/enacted.
The database included speech samples from 40 adult participants between the ages of 23 and 91 years within three age ranges: (1) 19+ to <55 years, (2) 55 to 69 years, and (3) 70 years and over. A majority of participants were adult actors with over one year of acting experience who had to meet minimum demographic criteria (e.g., minimal foreign accent, Canadian residency, fluency in English with no speech difficulties).
All participants performed four speech exercises, over the course of 2 hours, which comprised:
(1) 5 minutes of spontaneous monologue, (2) the reading of 96 phonetically rich and compact sentences, (3) the reading with emotion of 185 emergency phrases, each with a keyword repeated in 5 different ways, and (4) the enacting of three different emergency scenarios. The 185 emergency keywords and
phrases and 9 scenarios were all derived from response call transcripts. The scenarios were
selected to portray various response call classifications (e.g., high, medium, low risk calls, fall
and medical calls, caregiver or older adult calls, ambulance versus request for other responder
types). Roughly 3,200 minutes (~53 hours) of speech were recorded.
5.2.4 Data Interpretation Highlights
The studies conducted as part of this dissertation demonstrate the richness of the data that can be
obtained from analyses using response call recordings. Some reflections on the research data are
discussed next.
5.2.4.1 Identification of Keywords & Phrases
To identify keywords and phrases in response call transcripts, this study followed a universal design approach in developing the PES model, taking into account different caller types during various PESs. Categories describing the caller type, call reason, and risk level were created, and keywords and phrases were identified through word categorization by meaning and function. The initial full keyword set was further reduced by taking into account word occurrences and their distribution across the PES categories. This process was intended to ensure that enough keywords (and associated phrases) were selected to sufficiently represent the variety
of different PESs observed in the response call recording set. The use of word meaning, function, and occurrence to select keywords is a common technique in studies examining automatic keyword extraction (Haggag, 2013; Madane, 2012). Whether the same
keywords would have been identified by a computer has not been examined. However, in this
particular study, humans were needed to categorize the keywords according to their meaning and
function, to identify keywords that were still important but occurred very infrequently, as well as
to assess the words for inclusion into/exclusion from the final small keyword set.
With respect to expanding the vocabulary of the HELPER’s communication module or the SDS,
there is a trade-off between having the HELPER recognize more words and maximizing the
recognition accuracy of the ASR. A smaller vocabulary usually equates to higher rates of word
recognition. In this study, the final small keyword set was limited to 185 words, mostly because the data had to be recorded with older adults in a timely fashion for the CARES Corpus, though this was also in keeping with the goal of a small vocabulary set. Although the
full set of 402 words might reflect to a greater extent the actual vocabulary used in the
transcripts, the small vocabulary set should also contain enough data with which future
researchers can determine whether more vocabulary is needed or not. For example, this set
should allow future researchers to determine whether a small subset of the 185 keywords is
sufficient, or if the full set of 185 keywords is needed, or if it would be better to use the entire
402 keyword set.
5.2.4.2 Statistical Analyses of Conversational Measures
Statistical measures look at trends in data, and significance is relative to both the sample size and the p value selected. However, a non-significant statistical test does not necessarily mean that the observation being tested is unimportant. The mere presence of an occurrence can be meaningful in and of itself, although, with a sample size of one, a statistical test is likely not the best way to demonstrate this. In this study, the frequency of mazes did not differ significantly between caller types. However,
in one of the response calls the older adult caller had significant communication difficulties
resulting in a higher number of mazes (see Chapter 3, Figure 3-12d, case#47). Despite the
statistical test being non-significant, this one case is important, as it indicates that the caller’s
communication ability was considerably reduced (it is an outlier). Furthermore, the presence of a
high number of mazes would likely limit the HELPER speech handler’s ability to both recognize
and understand this caller’s speech.
Looking at the timing measure results (see Chapter 3, Figures 3-14 a,b), the number of speaker
turns needed before a response was initiated was not found to be significantly different between
the older adult and care provider callers. A closer look at the boxplot graphs shows a lot of
variance in the data especially for the older adult caller at the medium risk level. It could be that
categorizing the response calls in a slightly different way would better divide the data and yield
different trends. For example, characterizing the callers based on their communication style as
opposed to being an ‘older adult’ or ‘care provider’ caller may be more aligned with the work of Wolters et al. (2009), who identified two groups of older adult SDS users. These groups
consisted of “social” older adult users who interacted with their SDS as if it was a human and the
“factual” older adult users who communicated succinctly with their SDS. Of course, having
more samples would also strengthen the power of the observed findings.
Finally, statistical analyses were performed using average measures of verbal ability,
conversational structure, and timing over the entire transcript. However, it is not expected that
the HELPER would be conversing at length with the user. Therefore, calculating the
conversational measures using only a few utterances after the opening query of the response call
conversation may have been a better choice for analysis and is recommended for future analyses.
In general, using statistical measures to examine the response call transcripts has revealed valuable information that can be applied to improving the HELPER communication module; however, it should be noted that these tests only examine data trends, and non-significant results may still be practically meaningful.
5.2.4.3 Actor Simulated PESs
In the making of the CARES Corpus actors were solicited to simulate personal emergency
response situations or emotions while providing voice samples of PES related words, phrases and
conversations. Some qualitative observations are offered based on having first listened to the live
response call recordings, followed by conducting the acquisition of speech samples from study
participants.
1. Older adult actors seemed able to portray older adults in PESs more realistically than younger adults could, most likely because of their personal life experiences.
2. Theatre/stage actors were found to enunciate more clearly than movie/TV actors, which may sound a little unnatural in real-life conversations.
3. Some actors tended to “over-act” when imagining how the older adult would respond
during an actual PES. This is likely due to a lack of experience with, or knowledge of, what happens when someone experiences these various PESs.
4. The degree to which the recorded speech samples are able to replicate true speech during
real PESs ranges from not-quite believable to quite believable.
For observation 4, this variability in acting ability could be attributed to the actors’ prior acting
and life experiences as well as fatigue. For example, one of the older adult actors, who, in my
opinion, was at the most believable end of the spectrum, commented that recording the PES
words, phrases, and scenarios was one of the most challenging assignments this actor had been
given. In this situation, the actor was referring to the degree of emotion and feeling required to
play the role and the rapidly changing situations occurring within each phrase/word and scenario
in the corpus. This actor paused before recording each sentence/phrase/keyword set to determine
how she would speak the utterance. In contrast, another less experienced actor did not always
take this extra time to ‘set up’, resulting in a less realistic outcome. Overall, however, it is believed that the speech samples collected in this study using actors reflect real PESs more accurately than speech samples from non-actors simply reading words would.
This observation is aligned with the findings from other researchers using actors in their studies
(Murray & Arnott, 2008; Scherer, 1986; Scherer, 2003; Williams & Stevens, 1972).
5.3 Contributions to Knowledge
5.3.1 Original Research with Response Call Recordings
Over the last four decades, there have been many research studies examining the use of PERS
technology. These studies have examined the PERS benefits, reviewed its impact at the personal,
family, social, and medical institution levels, as well as at the system design and technology
levels. In addition, a handful of literature exists examining emergency response (911) call
conversations. However, no research studies could be identified that actually focuses on
characterizing personal emergency response calls and call conversations. When trying to
understand the intricacies of a process in the real world involving people, looking only at
quantitative data or only at qualitative data would have provided only a partial view of the bigger
picture (Krippendorff, 2012). As a result, Study 1 and 2, as described in Chapters 2 and 3
respectively, are original studies performed using a mixed methods approach where the
knowledge gained from the initial qualitative content analysis is used to inform the design of the
quantitative content analysis. All three studies in this dissertation help to fill a research gap by
providing a better understanding of what happens during personal emergency response calls and
call conversations between PERS callers and call takers.
5.3.2 Applying Research Findings to the HELPER
A summary of the main research contributions to knowledge that pertain specifically to the
HELPER are listed next, followed by an explanation of where the knowledge can be applied specifically within the HELPER communication module.
1. Keywords were identified to increase the vocabulary of the HELPER ASR (acoustic and
pronunciation models). Key phrases were identified to improve the HELPER ASR language
model.
2. Keyword categories were identified to improve the Semantic Analyser of the Speech
Informant component of the HELPER communication module.
3. A personal emergency response (PER) model was developed and used to classify and
characterize personal emergency response calls.
4. Measures of words per minute or turn length in words were found to be fairly good at
predicting caller type (e.g., older adults spoke fewer words per minute than care providers).
5. Significant patterns in call response requests were observed that are related to the caller type
and risk level of a call (e.g., high risk calls were associated with ambulance requests over
95% of the time).
6. Differences in call dialogue were observed between caller types and at different risk levels
(e.g., care providers tended to be more succinct).
7. Call timing measures were obtained based on the call’s risk level. These timing values can be
used to plan the length of the HELPER dialogue by time and/or speaker turns.
8. The CARES Corpus was developed and can be used for HELPER ASR training, and ASR
and system testing.
These studies build upon prior research knowledge (i.e., the recommendations from the first
HELPER prototype testing), and contribute new knowledge that can be used to develop
HELPER application specifications and/or contribute to the actual development of the HELPER
communication module.
In the literature review in Chapter 1, the components of the HELPER communication module
were illustrated and described. Figure 5-1 presents the assembled internal components of the
HELPER communication module including the basic internal sub-component units. The units
circled in “orange” indicate the areas of the communication module in which the study results
can be applied. The application of these study findings to these specific communication module sub-components is outlined next.
[Figure content: block diagram of the HELPER Communication Module. Incoming speech (from microphone) enters the Speech Handler, which comprises A/D Conversion & Feature Extraction, the Automatic Speech Recognizer (ASR) with its Decoder and Linguistic Models (1. Acoustic, 2. Pronunciation, 3. Language), and the Speech Informant with its Semantic Analyzer (NLU) and Dialogue Measures units. The Speech Handler feeds the Dialogue Handler, whose Dialogue Manager contains the PERC Classifier, Dialogue Control, Dialogue Set, Dialogue State, and Dialogue History units, and whose Call Responder (Initiate/Confirm) unit uses Responder Information and a Response Request History to put a Responder On Route. The Dialogue Handler feeds the Response Handler, comprising Response Generation (Select Response, drawing on a Database of Dialogue Text) and Speech Synthesis (Speech Output, drawing on a Database of Spoken Dialogue), which produces the Spoken Output (to speakers).]
Figure 5-1: Diagram of the internal components of the HELPER Communication Module.
Application of findings to the ASR component of the Speech Handler:
1. ASR – The 185 keywords identified in Study 1 can be used to expand the Linguistic Model’s pronunciation model; they are also included in the CARES Corpus, which can be used to train the Linguistic Model’s acoustic model. The 185 key phrases identified in Study 1 can be used to train the Linguistic Model’s language model. Together, these three
models can be used to improve the ASR component of the HELPER SDS. The output of
the ASR is a “best match” guess at the speech input. This information is sent on to the
Speech Informant for interpretation of the meaning of this “best match” spoken utterance.
Application of findings to the Speech Informant component of the Speech Handler:
2. Semantic Analyser – The 16 word categories identified from Study 1 can be applied in
the semantic analyser to improve understanding of the recognized words coming from the
ASR decoder. Speech understanding could occur by matching the recognized words with
their word categories to derive the associated word function or word group intent. This
information would then be sent on to the Dialogue Manager for decision making purposes
(see Table 2-12 in Chapter 2 for an example of how a speech unit would be associated
with the word categories).
3. Dialogue Measures – the discriminant function using words per minute and caller turn
length in words identified in Study 2 could be used for predicting whether the incoming
speech from the first utterance is more likely to be from a care provider or an older adult.
The final prediction could then be sent to the response call classifier for processing.
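As a sketch of how such a discriminant function might be applied to a first utterance, the following uses a nearest-class-mean rule over the two measures. The class means are invented placeholders, not the coefficients estimated in Study 2:

```python
import math

# Hypothetical class means (words per minute, mean turn length in words);
# the study's actual discriminant coefficients are not reproduced here.
CLASS_MEANS = {
    "older_adult":   (95.0, 6.0),
    "care_provider": (140.0, 10.0),
}

def predict_caller_type(words_per_minute, turn_length_words):
    """Nearest-class-mean stand-in for the Study 2 discriminant function."""
    def dist(mean):
        return math.hypot(words_per_minute - mean[0],
                          turn_length_words - mean[1])
    return min(CLASS_MEANS, key=lambda c: dist(CLASS_MEANS[c]))

print(predict_caller_type(100.0, 5.5))   # near the older-adult mean
print(predict_caller_type(150.0, 12.0))  # near the care-provider mean
```

A fitted linear discriminant would weight the two measures rather than use raw Euclidean distance, but the decision flow into the response call classifier would be the same.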
Application of findings to the Dialogue Manager component of the Dialogue Handler:
4. Call Classifier – the keyword categories, measures of words per minute and turn length in
words, PER model, and response type preference results from Studies 1 and 2 can be used
to design the response call classifier unit in the Dialogue Manager. See Figure 5-2 for an
example of what the classifier structure might look like. The classifier unit would receive
information from the Speech Informant and use this information to classify the response
call if possible. For example, keywords and conversational/dialogue measures identified
from the incoming speech might indicate that assistance is needed, the call reason is a
fall, an older adult is calling, and the risk level is medium. The classifier in Figure 5-2
would then identify that the older adult’s first responder or the call taker (see response
type #2 under the medium risk level category) is the response type to propose to the
Dialogue Control sub-component in the Dialogue Manager.
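A minimal rule-based sketch of such a classifier unit is shown below. The response orderings loosely follow Figure 5-2 but are illustrative only, not the deployed HELPER logic:

```python
# Hypothetical ordered response plans per risk level; these lists are
# illustrative examples, not the study's final classifier design.
RESPONSE_PLANS = {
    "high":   ["HELPER", "EMS ambulance", "OA responder", "call taker"],
    "medium": ["HELPER", "OA responder", "EMS paramedic", "call taker"],
    "low":    ["HELPER", "call taker"],
}

def propose_response(risk_level, has_oa_responder=True):
    """Return the ordered list of responders to propose for a call."""
    plan = list(RESPONSE_PLANS[risk_level])
    if not has_oa_responder:
        # Figure 5-2 falls back to the call taker when no responder is listed
        plan = [p for p in plan if p != "OA responder"]
    return plan

print(propose_response("medium"))
print(propose_response("medium", has_oa_responder=False))
```

For the medium risk fall example above, the classifier would propose the older adult's responder after the HELPER itself, matching the ordering described in the text.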
[Figure content, reconstructed as a table:]

(Risk Level) | (Caller Type) | (Call Reason) | (Response Type)
NORMAL (Not activated) | No speaker | Normal routine; no issues | 1. None
LOW RISK (Not-urgent) | Older adult speaker | Accident; testing; check-in; other (i.e., time, day) | 1. HELPER; 2. Call taker
MED RISK (Urgent) | Care provider speaker or older adult speaker | Need assistance soon; non-life-threatening injury, fall, illness | 1. HELPER; 2. OA responder or call taker (if no responder listed); 3. EMS paramedic
HIGH RISK (Emergent) | Care provider speaker or older adult speaker | Immediate assistance; loss of life or limb; major illness or injury | 1. HELPER; 2. EMS ambulance; 3. OA responder; 4. Call taker
Figure 5-2: Diagram showing a possible response call classifier setup based on the study findings.
5. Dialogue Control – the findings from Study 2 suggest that the dialogue may need to be
adjusted depending on the caller type and risk level. Using the previous example from the
call classifier above, the information received by the dialogue control suggests that the
next response type should be an older adult responder or call taker if no responder is
listed. Based on this information, the dialogue control might then select a dialogue set or
script to follow based on a medium risk, fall call for an older adult caller. Some examples
are provided next.
In Table 5-1, the opening dialogue from the original script used by the first HELPER
prototype is shown. This script takes only ‘yes’ or ‘no’ responses and so the response call
classifier does not have much information to help the dialogue control. The user is
queried for more information and confirmation. The original HELPER dialogue is shown in column 1, the user response in the succinct style is shown in column 2, and column 3 shows the changes to the response call classifier settings as the dialogue proceeds.
Table 5-1: Example of how the original HELPER initial dialogue strategy and response call classifier may work with incoming user responses.
HELPER Dialogue (original script) | User Dialogue (system-initiative) | Response Call Classifier
OPENING: Hello {Mr. Smith}. This is your automated health monitoring system. Do you need help? Please say ‘yes’ or ‘no’. | Yes | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: Unknown; Response: Unknown
1st QUERY: Would you like me to call an ambulance? Please say ‘yes’ or ‘no’. | Yes | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: Unknown; Response: Ambulance
CONFIRM: Okay {Mr. Smith}. I will call an ambulance right away. Please say ‘yes’ to confirm. | Yes | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: Unknown; Response: Ambulance
In Table 5-2, the user response is more social or human-to-human like and so more
information is obtained immediately. The response call classifier is set to high and the
dialogue set follows a “high alert” type script. The ambulance response is confirmed in
the 1st query, and confirmation is requested from the user. The HELPER dialogue is shown in column 1, user responses in a “human-human like” style are shown in column 2, and column 3 shows the changes to the response call classifier settings as the dialogue proceeds.
Table 5-2: Example of how initial dialogue from a high alert dialogue strategy and response call classifier may work with incoming user responses.
HELPER Dialogue Set (high alert script) | User Dialogue (mixed-initiative) | Response Call Classifier
OPENING: Hello {Mr. Smith}. This is your automated health monitoring system. Do you need help? | Yes, could you send an ambulance please? | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: High; Response: Ambulance
1st QUERY: You said you wanted {an ambulance}, is that correct? | Yes, that’s right, I need an ambulance | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: High; Response: Ambulance
CONFIRM: Okay {Mr. Smith}. I will call an ambulance right away. Please hold on. | Alright, thank you | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: High; Response: Ambulance
In Table 5-3, the user response leads to the classifier being set at a medium level risk for
an older adult fall call. The dialogue control activates the “medium alert” dialogue set
script. The Older Adult responder is suggested in the 1st query as the best response to
propose. The HELPER dialogue is shown in column 1, user responses in the “human-
human like” style are shown in column 2, and column 3 shows the changes to the
response call classier settings as the dialogue proceeds.
Table 5-3: Example of how initial dialogue from a medium alert dialogue strategy and response call classifier may work with incoming user responses. OA = Older adult.
HELPER Dialogue Set (medium alert script) | User Dialogue (mixed-initiative) | Response Call Classifier
OPENING: Hello {Mr. Smith}. This is your automated health monitoring system. Do you need help? | Oh, yes, I fell down | Caller Type: Older Adult; Call Reason: Fall; Risk Level: Medium; Response: OA Responder
1st QUERY: You {fell down}. Would you like me to call {OA responder}? | Yes please, thank you | Caller Type: Older Adult; Call Reason: Fall; Risk Level: Medium; Response: OA Responder
CONFIRM: Okay {Mr. Smith}. I will call an {OA responder} right away. Please hold on. | Alright | Caller Type: Older Adult; Call Reason: Fall; Risk Level: Medium; Response: OA Responder
If the original script were followed for the situation in Table 5-3, the ambulance response
would still be proposed first and then other responders would be proposed subsequently.
Essentially, the dialogue control selects different dialogue sets depending on the
response call classification. The Dialogue Manager would also keep track of the current
dialogue set and state being implemented, as well as any necessary dialogue history.
The SDS in Table 5-1 is an example of the system-initiative dialogue style and Tables 5-2
and 5-3 are examples of the mixed-initiative dialogue style. Recall that the mixed-
initiative dialogue style means that the system will prompt the user for a response but if
more information is provided than requested, the system will attempt to decipher this
extra information. The knowledge created from these two studies will permit the
HELPER SDS to be expanded to the mixed-initiative style of dialogue.
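The dialogue-set selection described above could be sketched as follows. This is a minimal illustration only: the class and function names and the script labels are assumptions, not taken from the HELPER implementation.

```python
# A sketch of the dialogue-set selection step; the names and script labels
# below are hypothetical, not the HELPER implementation.
from dataclasses import dataclass

@dataclass
class ResponseCallClassifier:
    caller_type: str   # e.g. "older adult" or "care provider"
    call_reason: str   # e.g. "fall", "medical", "unknown"
    risk_level: str    # "low", "medium", or "high"
    response: str      # currently proposed responder

def select_dialogue_set(c: ResponseCallClassifier) -> str:
    """Map the classifier's current risk level to a dialogue set script."""
    if c.risk_level == "high":
        return "high alert script"    # ambulance proposed first (as in Table 5-2)
    if c.risk_level == "medium":
        return "medium alert script"  # OA responder proposed first (as in Table 5-3)
    return "low alert script"
```

For the Table 5-3 fall call, `select_dialogue_set` would return the medium alert script, after which the dialogue control would step through that script's OPENING, QUERY, and CONFIRM prompts.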
6. The timing measures calculated from Study 2 can also be applied to the dialogue control
to ensure that the HELPER SDS communications do not extend beyond the time that is
considered typical for a response call. The spoken dialogue responses themselves can be
constructed in such a way as to meet the minimum number of speaker turns needed in
a live response call.
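The timing checks in item 6 could take a form like the following. The numeric values are placeholders for illustration, not the measures actually reported in Study 2.

```python
# A sketch of a dialogue-control timing guard; both constants are placeholders,
# not the Study 2 values.
TYPICAL_CALL_SECONDS = 120.0  # placeholder for a typical response call duration
MIN_TURNS = 4                 # placeholder for the minimum speaker turns needed

def may_continue(elapsed_seconds: float) -> bool:
    """True while the SDS may keep talking without exceeding a typical call."""
    return elapsed_seconds < TYPICAL_CALL_SECONDS

def turn_minimum_met(turns_taken: int) -> bool:
    """True once the dialogue has produced the minimum number of speaker turns."""
    return turns_taken >= MIN_TURNS
```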
Application of Findings to the Response Generation component of the Response Handler:
7. Dialogue Text – the finding from Study 2 that the terms “paramedic” and
“ambulance” convey different meanings and represent different responses could be
incorporated into the actual dialogue response presented to the user. Examples of how the
dialogue may change can be observed in column 1 of Tables 5-2 (ambulance offer) and 5-3 (OA
responder offer). Also, in Figure 5-2, in the medium-risk situation, the
paramedic is offered as option #3, whereas in the high-risk situation, the ambulance is
offered as option #2. These responses are stored in the HELPER computer and accessed
by the Response Generation sub-component of the Response Handler.
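As a sketch of how risk-dependent wording could be stored and retrieved by the Response Generation sub-component: the template table and function below are hypothetical, with wording echoing Tables 5-2 and 5-3.

```python
# Hypothetical offer-template table; the {placeholders} follow the slot style of
# Tables 5-2 and 5-3, and the risk-to-responder mapping follows the text.
OFFER_TEMPLATES = {
    "high":   "You said you wanted {an ambulance}, is that correct?",
    "medium": "You {fell down}. Would you like me to call {OA responder}?",
}

def generate_offer(risk_level: str) -> str:
    """Return the offer wording that matches the classified risk level."""
    return OFFER_TEMPLATES[risk_level]
```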
Application of findings to the HELPER Communication Module:
8. System testing with the CARES Corpus – when a new HELPER communication module
prototype is constructed, the system can be tested as a whole using keywords, phrases,
and emergency scenarios contained in the CARES corpus. Individual components of the
communication module can also be trained, such as the ASR language and pronunciation
models, and/or tested, such as the Speech and Dialogue Handlers.
5.3.3 The CARES Corpus
The CARES corpus developed in Study 3 can be used as a development and testing tool for the
design and development of the HELPER system, as well as of other ASRs or SDSs that may
require interaction with older adult users.
Although a large number of speech databases have been created specifically for ASR training
and testing, none of these corpuses contained speech samples of older adults speaking Canadian
English in emergency situations. To our knowledge, the CARES corpus will be the first
speech database created that contains a collection of Canadian adult speech recordings, including
younger and older adult actors simulating emotion during various personal emergency situations
derived from real response call recordings.
to be used for enhancing the HELPER communication module, the contents of this database can
also be applied to other research applications across several research disciplines. For example,
the five-minute monologues may be of interest to linguists who wish to study different speaking
patterns or word usage by individuals across various age groups (e.g., 23-90 years). Sociologists
may be interested in the different monologue topics selected by individuals of different ages.
Computer scientists might be interested in testing out various computer algorithms for improving
speech recognition or natural language understanding of the spoken text across age ranges.
Acousticians or speech language pathologists may be interested in the acoustic phonetic features
present in the speech samples of individuals representing the older and younger ages or in
emergency type situations. Other researchers may be interested in examining the “believability”
aspect of the enacted scenarios and speech samples provided.
5.4 Limitations
In this section, an overview of the main study limitations will be presented. Where appropriate,
recommendations have been included for future studies that may be conducted in a similar field
or fashion.
5.4.1 Study 1: Keyword and Phrase Identification
This study was limited by:
(1) the researcher’s ability to accurately transcribe the call recordings, to derive the most
appropriate word categories from the call transcripts, and to identify a small keyword set that
maximally represents the call conversations with limited bias; and
(2) the ability of the coders to select words that accurately represent the keywords of the call
conversations and to accurately categorize the keywords selected with limited bias.
Although inter-rater reliability between the three coders showed moderate agreement, two of
the three coders categorized words out-of-context and one in-context. It is recommended that
if this process were to be replicated, all the coders would extract the keywords in-context by
reading the transcripts. Having all the coders identify and categorize the words in the same
way should theoretically improve inter-rater reliability, but also allow for comparison of
words with multiple meanings and thus multiple categories.
5.4.2 Study 2: Statistical Measures
This study was limited by the sample size of response call recordings (from only one call centre)
upon which the results were derived. It would have been beneficial to obtain a larger or more
balanced sample size for the different call classifications (for example, more samples of the fall
category), and to draw samples from different call centres. However, given that the response call
classifications were not known prior to obtaining the response call recordings, this may have
been difficult. Future studies may want to consider working out the parameters and factors to be
included in, and the number of samples required for, the statistical analyses prior to the end of
response call data collection. Call meta-data was also unavailable; it would have been useful for
confirming some of the assumptions made, for example, the gender and sex of participants, the
use of a traditional push button in all cases, that each call taker spoke with a different caller, and
that each caller was unique.
5.4.3 Study 3: Creating the CARES Corpus
The process of creating the CARES Corpus had several limitations:
(1) The end result of a response call was not known. This information would have been
beneficial for confirming the response call risk level and call reason classifications.
(2) Given the age of some of the speech participants it was decided that recording all 402 words
in the full keyword set would be too ambitious at this stage of the research.
(3) The recording results obtained in this study are limited by the speech participants’ ability to
act realistically, in their recording sessions, like an older adult or caregiver caller in a PES
during a response call.
In order to maximize realism in simulating PESs for speech sample recording, it likely would
have been beneficial to have the speech participants listen to some of the real response call
recordings prior to their actual recording sessions. Although they were given a sample response
call scenario to read prior to their session, listening to an actual call would have better
guided the actors’ performances, rather than giving them free rein for creative
expression.
5.4.4 PES and PER Call Classifications
These studies are limited by the researcher’s ability to identify suitable categories for the PES
and PER models. For example, although caller type was characterized by two categories: older
adult and care provider, perhaps characterizing the call by a caller’s communication ability using
‘factual’ and ‘social’ categories, as coined by Wolters et al. (2009), would have led to different
conclusions.
5.4.5 Methodology Limitations
The content analysis approach is flexible enough to be used qualitatively or quantitatively, which
is both its strength and its limitation. As Morgan (1993) stated, content analysis is argued to be
not qualitative enough by qualitative researchers and not quantitative enough by quantitative
researchers. Additionally, the reliability of the data obtained through content and conversational
analyses is highly dependent on the researchers who collect, code, and process the data,
especially when doing so qualitatively. Reliability of data tends to be demonstrated through the
use of multiple coders, descriptions of the process that show how the data and results are linked,
and by demonstrating support from existing research literature. However, there is still debate as
to what and how much evidence is required to support reliability and thus establish research
validity (Krippendorff, 2012).
5.5 Future Research
5.5.1 Supporting the ASR
With respect to identifying non-semantic information that could be used in conjunction with the
semantic information derived from the HELPER SDS, future studies may wish to consider
different measures from the ones examined. For example, in Study 2, speech intelligibility could
not be examined due to the quality of the call recordings received. However, future studies could
examine this measure if better quality call recordings were acquired. Other measures to consider
might be speaker disfluency, voice pitch, or shimmer and jitter. In a study by Müller, Wittig, and
Baus (2003), shimmer and jitter were used to identify a speaker’s age and gender.
5.5.2 Developing the Dialogue - Assessing Patterns in Response Call Conversations
The need to develop the communication dialogue for the older adult user was recommended by
researchers involved in the preliminary HELPER prototype testing. This recommendation
specifically described the need to add additional dialogue states to the HELPER communication
module. This need was based on a few factors. First, the HELPER prototype system does not
“hear” a user’s response until the system has finished speaking and has turned its microphone
‘on.’ Therefore, if the user speaks too soon, nothing is heard by the system. Second, the
HELPER would need some way to respond to instances of silence and out-of-vocabulary sounds
and when repetition is required. To determine how to improve the HELPER communication
dialogue for situations such as these described, one method would be to examine the
conversational dynamics between call takers and callers to identify how real call-takers handle
these conversational situations and to see how the users respond in return. This research would
be aligned with the research completed thus far using the real personal emergency response calls
and has, in fact, already started. The objective of this research in progress is to identify key
conversational patterns in personal emergency response calls.
This study looks at coding the call conversations based on dialogue acts and using these acts to
observe any conversational patterns that may be useful to replicate in the HELPER SDS. A
conventional conversational analysis performed at the conversation or turn level of the response
call is being considered. Figure 5-3 illustrates a flow diagram of how this work would unfold.
[Figure 5-3 flow: (a) Personal Emergency Response Calls → Call Transcriptions (SALT) → Identify Dialogue Codes → Code Response Call Dialogue Acts → Conventional Conversational Analysis at Conversation/Turn Level → Identify Patterns or Conversational Techniques → To improve dialogue development (Dialogue Management)]
Figure 5-3: A flow diagram illustrating the methodology followed to analyse the calls and complete study objective.
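One way the dialogue-act coding step could surface conversational patterns is to count adjacent act pairs across coded turns. The act labels below are illustrative placeholders, not the coding scheme adopted in the study.

```python
# A sketch of pattern counting over coded dialogue acts; the labels are
# illustrative, not the study's actual coding scheme.
from collections import Counter

coded_call = [            # (speaker, dialogue act) for each turn
    ("call-taker", "OPEN"),
    ("caller", "INFORM"),
    ("call-taker", "REQUEST"),
    ("caller", "INFORM"),
    ("call-taker", "PROPOSE"),
    ("caller", "ACCEPT"),
]

def act_bigrams(turns):
    """Count adjacent dialogue-act pairs to surface recurring patterns."""
    acts = [act for _, act in turns]
    return Counter(zip(acts, acts[1:]))
```

Aggregated over many coded calls, frequent bigrams (e.g., a REQUEST reliably followed by an INFORM) would indicate conversational techniques worth replicating in the HELPER SDS.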
Figure 5-4 further highlights where the work from this study would be applied along the pathway
to personal emergency response. In Figure 5-4, the potential results of this study are represented
by the term “dialogue acts” (see yellow star), which is positioned between the HELPER computer
and the user. It is located after the ‘conversation’ as any patterns identified would be applied to
further develop the dialogue states, the dialogue manager, and the actual dialogue used within or
by the HELPER.
[Figure 5-4 components: 1. Personal Emergency Situation (user/caller type, situation, call reason, physical-cognitive state, risk level, communication ability, speech and non-speech); 2. The HELPER System (ASR, SI, classifier – caller type: who is calling?; call reason: fall or medical?; risk level: patient acuity? – timing, keywords and phrases, word categories, conversational measures; HELPER computer: what response?); conversation; 3. PERS Response; dialogue acts]
Figure 5-4: The pathway to personal emergency response with “dialogue acts” applied to help the HELPER.
This study is currently in the data processing phase.
5.5.3 HELPER Field Testing - Future Proposed Studies
The next research recommendation involves testing the HELPER in mock or real PESs. Before
in-field testing is possible (i.e., in a home of an older adult), a new and robust HELPER
prototype must be designed and developed and undergo mock testing using simulated users.
Several future studies are proposed next which will bring the HELPER up to this point where
actual system testing can be performed.
5.5.3.1 Developing the HELPER Speech Handler
CARES Corpus Current State
Although the speech recording for the CARES corpus has been completed, the process of
organizing the data into a format so that it is available for distribution is still ongoing. Manual
segmentation of the audio recordings of PES phrases and keywords has been completed, but the
work needs to be verified. Further work must be done to organize this information including
proper labelling and sorting into appropriate file folders. In order for these speech samples to be
used for ASR training, assuming a typical continuous speech recognizer is being used, the
phrases and words will need to be further segmented into phone units. This may be done
automatically (using forced alignment); however, the output would need some verification,
especially for the keywords, since these words were spoken using five different methods of
expression (i.e., fast, slow, etc.) and may be more difficult for forced alignment software to
process.
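The labelling and sorting of segmented samples into file folders could follow a scheme like the one below. The directory layout and the five expression styles listed are assumptions for illustration only, not the corpus's actual organization.

```python
# A sketch of the labelling/sorting step; the directory layout and the listed
# expression styles are assumptions, not the actual CARES corpus organization.
from pathlib import Path

EXPRESSION_STYLES = ("normal", "fast", "slow", "loud", "soft")  # assumed set of five

def sample_path(root: Path, age_group: str, speaker_id: str,
                style: str, keyword: str) -> Path:
    """Build a consistent corpus path: <root>/<age_group>/<speaker>/<style>/<keyword>.wav"""
    if style not in EXPRESSION_STYLES:
        raise ValueError(f"unknown expression style: {style}")
    return root / age_group / speaker_id / style / f"{keyword}.wav"
```

A consistent naming scheme like this makes it straightforward to select subsets later (e.g., only older adult speakers, or only one expression style) when building acoustic models.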
Building the HELPER Speech Handler
A future example study involving the HELPER Speech Handler might include some or all
aspects of the following steps:
1. Perform forced alignment on the CARES Corpus audio speech samples, for example, on
keywords and phrases. However, phonetically balanced and compact sentences can be
used if needed.
2. Use these samples to build one or many acoustic models for the HELPER ASR.
3. Build-up the pronunciation and language models of the HELPER ASR.
4. Perform comparison testing of various ASR acoustic models with the phrases and
keywords from the CARES Corpus to identify the model with the best recognition
accuracy. The different acoustic models used in the comparison might include: (1) all
CARES corpus participant speaker samples, (2) only older adult speakers from the
CARES corpus, (3) only younger adults from the CARES Corpus, (4) an existing
acoustic model (e.g., AN4 or Wall Street Journal), (5) a mix of existing acoustic models
with CARES Corpus samples added.
5. Perform comparison testing of various language models using the CARES Corpus
content.
6. Build the Semantic Analyser and test for speech understanding using CARES Corpus
content.
7. Assemble the HELPER Speech Handler using the acoustic model with the highest
accuracy for recognizing the recorded PES speech samples from the CARES Corpus, the
associated pronunciation and language models and semantic analyser.
8. Test newly assembled HELPER Speech Handler with recorded speech samples from the
CARES Corpus.
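The recognition-accuracy comparison in steps 4 and 5 would typically be scored with word error rate. The following is the standard edit-distance formulation of that metric, not HELPER-specific code.

```python
# Standard word error rate via edit distance, for scoring the acoustic-model
# comparison; not HELPER-specific code.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

The acoustic model yielding the lowest average word error rate over the CARES Corpus test phrases and keywords would be the one carried forward in step 7.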
5.5.3.2 Developing the HELPER Dialogue and Response Handlers
A future example study involving the HELPER Dialogue Handler and Response Handler might
include some or all aspects of the following steps:
1. Identify the necessary dialogue states to include in the HELPER SDS.
2. Determine the dialogue sets and actual dialogue responses to be spoken within each
dialogue state.
3. Build the speech synthesizer component of the Response Handler or pre-record the
dialogue responses for inclusion into the dialogue database.
4. Design the dialogue strategy to be followed by the dialogue control in the Dialogue
Manager based on previous research findings (i.e. from Study 2, and the conversational
patterns study described in section 5.5.2).
5. Code and setup the new dialogue strategy and other necessary components of the
Dialogue Manager and Response Handler.
6. Various dialogue strategies, dialogue sets, and dialogues may be compared and tested
using Wizard of Oz techniques with older adult research participants.
7. Perform a survey to obtain user feedback and comments.
8. Select the best dialogue strategy, dialogue sets, and dialogues to include in the HELPER
Dialogue Handler and Response Handler.
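Steps 1 and 2 above amount to defining dialogue states and transitions. A minimal sketch follows; the states and transitions are illustrative only, echoing the OPENING → 1st QUERY → CONFIRM flow of Tables 5-2 and 5-3, and are not a proposed final design.

```python
# An illustrative dialogue state table echoing the OPENING -> QUERY -> CONFIRM
# flow of Tables 5-2 and 5-3; not a proposed final design.
TRANSITIONS = {
    ("OPENING", "yes"): "QUERY",     # user confirmed that help is needed
    ("QUERY", "yes"): "CONFIRM",     # user accepted the proposed responder
    ("QUERY", "no"): "QUERY",        # propose the next responder instead
    ("CONFIRM", "yes"): "DISPATCH",  # place the call to the responder
}

def next_state(state: str, user_reply: str) -> str:
    """Advance the dialogue; unknown replies leave the state unchanged (re-prompt)."""
    return TRANSITIONS.get((state, user_reply), state)
```

Keeping the transitions in a data table rather than in code would also make it easier to swap in the alternative dialogue strategies to be compared in the Wizard of Oz testing of step 6.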
5.5.3.3 Testing the HELPER Module
A future example study involving the HELPER communication module might include some or
all aspects of the following steps:
1. Using the desired Speech Handler, Dialogue Handler, and Response Handler
components, build the HELPER communication module.
2. Test the new HELPER communication module/SDS in a home-like setting using
simulated users from the CARES Corpus.
3. Make necessary adjustments.
4. Test the system in a home-like setting using live older actors and younger adults (care
providers). Use various response strategies as determined from Study 2 and the study
described in section 5.5.2.
5. Perform a survey to obtain user feedback and comments.
5.6 Implications
The goal of the HELPER system is to identify and respond to potentially critical personal
emergency situations. As such, it needs to work well. The infrequent occurrence of PESs,
combined with their delicate nature and the target age group for this technology (mainly older
adults), makes PESs difficult to study in the real world. To better understand the process of a
response call during a PES, to characterize the response call conversation, and to
understand the spoken keywords and phrases that convey one’s need, actual response call
recordings are the best medium for study. Otherwise, trying to witness these situations is
difficult, and it would be unreliable to depend on the recall memory of people who have
experienced these events. The personal emergency response call recording is our pot of gold.
The knowledge created from these research studies was derived from live personal emergency
response call recordings. Furthermore, the studies employed qualitative, quantitative,
and mixed methods approaches in order to obtain the required data. In forming the PES and
response call models, this research also considered both the user and his/her environment. By
incorporating these research techniques and methods of analyses, this research is, in a sense,
attempting to create in the HELPER an ability to predict the outcome of a PES in order to
improve its ability to make decisions and respond appropriately to the user. The main research
findings from these two studies and the CARES Corpus development collectively contribute new
knowledge to, plus a research tool for, the future development of the HELPER.
With respect to the field of rehabilitation, if the development of the HELPER can be realized, the
potential will be there for the older adult to use this technology to help him/her age-in-place. The
HELPER would enable the older adult to access medical or emergency care whenever needed.
The main advantage over the traditional PERS is that the HELPER would not require active user
initiation because this system would also be actively monitoring the older adult in their home on
a continual basis. Upon identification of an adverse event, the HELPER would initiate
conversation with the user and contact assistance as appropriate. Older adults who are well cared
for medically tend to suffer less impairment, recover more quickly, stay healthier, and
retain their functional ability for a longer time.
5.7 Final Remarks
This dissertation presents original work from three research studies. Chapter 2 describes the first
study where keywords, phrases, and word categories were identified from the response call
transcripts. These results could be applied to the development of the HELPER ASR and Speech
Informant units. In addition, a PES model was developed. Chapter 3 describes the second study
where a PER model was developed, conversational measures of response call conversations were
obtained, and significant trends in call conversations were identified that could be used in
combination with incoming semantic data to help the HELPER classify response calls. Being
able to classify response calls helps the HELPER predict a possible target response which in turn
enables it to modify its output dialogue to meet the needs of the caller type and situation risk
level. These results could be applied to the Speech and Dialogue Handler components of the
HELPER. Chapter 4 summarizes how a speech corpus was designed and developed for the
purpose of training and testing various components of the HELPER ASR and the system overall.
The combined results from these three studies provide sufficient preliminary knowledge as well
as a speech corpus that collectively can be used to continue the HELPER communication module
development phase and create the next, hopefully more robust, HELPER SDS.
Bibliography
Anderson, S., Liberman, N., Bernstein, E., Foster, S., Cate, E., Levin, B. & Hudson, R. (1999). Recognition of elderly speech and voice-driven document retrieval. In Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on (Vol. 1, pp. 145–148).
Anusuya, M. & Katti, S. K. (2009). Speech recognition by machine, a review. International Journal of Computer Science and Information Security, 6(3).
Baba, A., Yoshizawa, S., Yamada, M., Lee, A. & Shikano, K. (2004). Acoustic models of the elderly for large-vocabulary continuous speech recognition. Electronics and Communications in Japan (Part II: Electronics), 87(7), 49–57.
Baber, C. & Noyes, J. (1996). Automatic speech recognition in adverse environments. Human Factors: The Journal of the Human Factors and Ergonomics Society, 38(1), 142–155.
Belshaw, M., Taati, B., Snoek, J. & Mihailidis, A. (2011). Towards a single sensor passive solution for automated fall detection. In Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE (pp. 1773–1776).
Bernstein, M. (1999). “Low-tech” personal emergency response systems reduce costs and improve outcomes. Managed Care Quarterly, 8(1), 38–43.
Blackwell, T. H. & Kaufman, J. S. (2002). Response time effectiveness: comparison of response time and survival in an urban emergency medical services system. Academic Emergency Medicine, 9(4), 288–295.
Blythe, M. A., Monk, A. F. & Doughty, K. (2005). Socially dependable design: The challenge of ageing populations for HCI. Interacting with Computers, 17(6), 672–689.
Bortfeld, H., Leon, S. D., Bloom, J. E., Schober, M. F. & Brennan, S. E. (2001). Disfluency rates in conversation: Effects of age, relationship, topic, role, and gender. Language and Speech, 44(2), 123–147.
Campbell, N. (2000). Databases of emotional speech. In ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion: Developing a Conceptual Framework (pp. 34–39). Newark, N. Ireland.
Canadian Red Cross Association. (2006). First Aid & CPR Manual. The StayWell Health Company.
Cavanagh, S. (1997). Content analysis: concepts, methods and applications. Nurse Researcher, 4(3), 5–13.
Chan, M., Campo, E., Estève, D. & Fourniols, J.-Y. (2009). Smart homes—current features and future perspectives. Maturitas, 64(2), 90–97.
Childress, D. S. (2003). Development of rehabilitation engineering over the years: As I see it. Journal of Rehabilitation Research and Development, 39(6; SUPP), 1–10.
CIHI. (2011). Health Care in Canada, 2011: A Focus on Seniors and Aging (pp. 1–162).
CIHI. (2013). National Health Expenditure Trends, 1975 to 2013 (pp. 1–182).
Clark, V. L. P. & Creswell, J. W. (2011). Designing and conducting mixed methods research. In Clark, Vicki L Plano and Creswell, John W (Ed.), (pp. 53–106). Thousand Oaks, CA.: Sage.
Cornman, J. C., Freedman, V. A. & Agree, E. M. (2005). Measurement of assistive device use: Implications for estimates of device use and disability in late life. The Gerontologist, 45(3), 347–358.
Cowan, D., Turner-Smith, A. & others. (1999). The role of assistive technology in alternative models of care for older people. In A. Tinker and F. Wright and C. McCreadie and J. Askham and R. Hancock and A. Holmans (Ed.), Alternative Models of Care for Older People (pp. 325–346). Age Concerns Institute of Gerontology.
Crede, E. & Borrego, M. (2010). A content analysis of the use of mixed methods studies in engineering education. In American Society for Engineering Education.
Cromdal, J., Osvaldsson, K. & Persson-Thunqvist, D. (2008). Context that matters: Producing “thick-enough descriptions” in initial emergency reports. Journal of Pragmatics, 40(5), 927–959.
Culatta, R. & Leeper, L. H. (1990). The differential diagnosis of disfluency. National Student Speech Language Association Journal, 17, 59–64.
Davies, K. N. & Mulley, G. P. (1993). The views of elderly people on emergency alarm use. Clinical Rehabilitation, 7(4), 278–282.
De San Miguel, K. & Lewin, G. (2008). Brief Report: Personal emergency alarms: What impact do they have on older people’s lives? Australasian Journal on Ageing, 27(2), 103–105.
Demiris, G., Hensel, B., Skubic, M. & Rantz, M. (2008). Senior residents’ perceived need of and preferences for “smart home” sensor technologies. International Journal of Technology Assessment in Health Care, 24(1), 120.
Demiris, G., Rantz, M., Aud, M., Marek, K., Tyrer, H., Skubic, M. & Hussam, A. (2004). Older adults’ attitudes towards and perceptions of smart home technologies: a pilot study. Informatics for Health and Social Care, 29, 87–94.
Devillers, L. & Vidrascu, L. (2007). Real-life emotion recognition in speech. In Müller, C (Ed.), Speaker Classification II (pp. 34–42). Springer-Verlag.
Dibner, A. S. (1993). Personal response services present and future. Home Health Care Services Quarterly, 13(3-4), 239–243.
Disabled Living Foundation. (2009). Losing independence is a bigger ageing worry than dying. Retrieved May 31, 2015, from www.dlf.org.uk/blog/losing-independence-bigger-ageing-worry-dying
Doughty, K., Cameron, K. & Garner, P. (1996). Three generations of telecare of the elderly. Journal of Telemedicine and Telecare, 2(2), 71–80.
Downe-Wamboldt, B. (1992). Content analysis: method, applications, and issues. Health Care for Women International, 13(3), 313–321.
Dusan, S. & Rabiner, L. R. (2005). Can automatic speech recognition learn more from human speech perception? In P. 3rd Conf. Speech Tech. Hum.-Comput. Dialogue (pp. 21–36).
Eisenberg, M. S., Bergner, L., Hallstrom, A. & others. (1979). Cardiac resuscitation in the community. JAMA, 241(18), 1905–1907.
Elo, S. & Kyngäs, H. (2008). The qualitative content analysis process. Journal of Advanced Nursing, 62(1), 107–115.
Fallis, W. M., Silverthorne, D., Franklin, J. & McClement, S. (2007). Client and responder perceptions of a personal emergency response system: Lifeline. Home Health Care Services Quarterly, 26(3), 1–21.
Federici, S. & Scherer, M. (2012). Assistive technology assessment handbook. CRC Press.
Field, A. (2005). Discovering statistics using SPSS (2nd ed.). Sage publications.
Fleming, J., Brayne, C. & others. (2008). Inability to get up after falling, subsequent time on floor, and summoning help: prospective cohort study in people over 90. BMJ, 337, a2227.
Fogle, C. C., Oser, C. S., Troutman, T. P., McNamara, M., Williamson, A. P., Keller, M., … Harwell, T. S. (2008). Public education strategies to increase awareness of stroke warning signs and the need to call 911. Journal of Public Health Management and Practice, 14(3), e17–e22.
Forslund, K., Kihlgren, A. & Kihlgren, M. (2004). Operators’ experiences of emergency calls. Journal of Telemedicine and Telecare, 10(5), 290–297.
Fox Tree, J. E. (1995). The effects of false starts and repetitions on the processing of subsequent words in spontaneous speech. Journal of Memory and Language, 34(6), 709–738.
Fraser, N. (1997). Assessment of Interactive Systems. In Gibbon, D. and Moore, R. and Winski, R. (Ed.), Handbook on Standards and Resources for Spoken Language Systems (pp. 564–615). Mouton de Gruyter, Berlin.
Freedman, V. A., Agree, E. M., Martin, L. G. & Cornman, J. C. (2006). Trends in the use of assistive technology and personal care for late-life disability, 1992-2001. The Gerontologist, 46(1), 124–127.
Furui, S. (2003). Toward robust speech recognition and understanding. In Text, Speech and Dialogue (pp. 2–11).
Furui, S., Nakamura, M., Ichiba, T. & Iwano, K. (2005). Analysis and recognition of spontaneous speech using Corpus of Spontaneous Japanese. Speech Communication, 47(1), 208–219.
Garcia, A. C. & Parmer, P. A. (1999). Misplaced mistrust: The collaborative construction of doubt in 911 emergency calls. Symbolic Interaction, 22(4), 297–324.
Garner, M. & Johnson, E. (2007). Operational Communication: A paradigm for applied research into police call-handling. International Journal of Speech Language and the Law, 13(1), 55–75.
Georgila, K., Wolters, M. K. & Moore, J. D. (2010). Learning dialogue strategies from older and younger simulated users. In Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 103–106).
Georgila, K., Wolters, M., Karaiskos, V., Kronenthal, M., Logie, R., Mayo, N., … Watson, M. (2008). A fully annotated corpus for studying the effect of cognitive ageing on users’ interactions with spoken dialogue systems. In 6th International Conference on Language Resources and Evaluation.
Georgila, K., Wolters, M., Moore, J. D. & Logie, R. H. (2010). The MATCH corpus: A corpus of older and younger users’ interactions with spoken dialogue systems. Language Resources and Evaluation, 44(3), 221–261.
Gibson, M. J. & Hayunga, M. (2006). We can do better: lessons learned for protecting older persons in disasters.
Gilboy, N., Tanabe, P., Travers, D., Rosenau, A., Eitel, D. & others. (2005). Emergency severity index, version 4: implementation handbook. Rockville, MD: Agency for Healthcare Research and Quality, 1–72.
Glass, J. & Zue, V. (2003). 6.345 Automatic Speech Recognition, Spring 2003, Lecture#1 (Massachusetts Institute of Technology: MIT OpenCourseWare). Retrieved May 18, 2015, from http://ocw.mit.edu
Gold, B. & Morgan, N. (2000). Speech and Audio Signal Processing: Processing and Perception of Speech and Music. John Wiley & Sons Inc.
Gorham-Rowan, M. M. & Laures-Gore, J. (2006). Acoustic-perceptual correlates of voice quality in elderly men and women. Journal of Communication Disorders, 39(3), 171–184.
Graneheim, U. H. & Lundman, B. (2004). Qualitative content analysis in nursing research: concepts, procedures and measures to achieve trustworthiness. Nurse Education Today, 24(2), 105–112. doi:10.1016/j.nedt.2003.10.001
Haggag, M. H. (2013). Keyword Extraction using Semantic Analysis. International Journal of Computer Applications, 61(1), 1–6.
Hak, T. (1997). Coding effects in comparative research on definitions of health. The European Journal of Public Health, 7(4), 364–372.
Hall, D. & Sinard, R. J. (1998). The aging voice: how to differentiate disease from normal changes. Geriatrics, 53(7), 76–79.
Hall, N. E., Wagovich, S. A. & Bernstein Ratner, N. (2007). Language considerations in childhood stuttering: distinguishing between stuttering and other forms of disfluency in Section III: Intervention: Children who stutter with other co-occurring concerns . In Conture, Edward and Curlee, Richard (Ed.), Stuttering and related disorders of fluency (p. 162). Thieme.
Hamill, M., Young, V., Boger, J. & Mihailidis, A. (2009). Development of an automated speech recognition interface for personal emergency response systems. Journal of NeuroEngineering and Rehabilitation, 6(26), 1–11.
Handschu, R., Poppe, R., Rauss, J., Neundörfer, B. & Erbguth, F. (2003). Emergency calls in acute stroke. Stroke; a Journal of Cerebral Circulation, 34(4), 1005–9.
Heinbüchner, B., Hautzinger, M., Becker, C. & Pfeiffer, K. (2010). Satisfaction and use of personal emergency response systems. Zeitschrift Fuer Gerontologie Und Geriatrie, 43(4), 219–223.
Hessels, V., Le Prell, G. S. & Mann, W. C. (2011). Advances in personal emergency response and detection systems. Assistive Technology, 23(3), 152–161.
Hizer, D. D. & Hamilton, A. (1983). Emergency response systems: an overview. Journal of Applied Gerontology, 2(1), 70–77.
Hobbs, M. L. (1993). Product Design and Social Implications in a Personal Response Program. Home Health Care Services Quarterly, 13(3-4), 23–32.
Howell, P. & Kadi-Hanifi, K. (1991). Comparison of prosodic properties between read and spontaneous speech material. Speech Communication, 10(2), 163–169.
Hsieh, H.-F. & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15(9), 1277–1288.
Huckvale, M. (2004). SCRIBE Manual v 1.0 - Spoken Corpus Recordings in British English (1st ed.). Gower Street, London.
Hwang, U. & Morrison, R. S. (2007). The geriatric emergency department. Journal of the American Geriatrics Society, 55(11), 1873–1876.
Hyer, K. & Rudick, L. (1994). The effectiveness of personal emergency response systems in meeting the safety monitoring needs of home care clients. Journal of Nursing Administration, 24(6), 39–44.
IBM. (2014). IBM SPSS Statistical Software v.22.
Imbens-Bailey, A. (2000). The discourse of distress: A narrative analysis of emergency calls to 911. Language and Communication, 20(3), 275–296.
Johnson, J. L., Davenport, R. & Mann, W. C. (2007). Consumer feedback on smart home applications. Topics in Geriatric Rehabilitation, 23(1), 60–72.
Johnson, M., Cusick, A. & Chang, S. (2001). Home-screen: a short scale to measure fall risk in the home. Public Health Nursing, 18(3), 169–177.
Johnston, K., Grimmer-Somers, K. & Sutherland, M. (2010). Perspectives on use of personal alarms by older fallers. International Journal of General Medicine, 3, 231.
Jurafsky, D. (2014). CS224S/Linguist 285: Spoken Language Processing, Lecture 3: ASR: HMMs, Forward, Viterbi. Retrieved May 18, 2015, from http://web.stanford.edu/class/cs224s/
Jurafsky, D. & Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Pearson Education Inc.
King, S. (2006). Language variation in speech technologies. In Brown, Keith (Ed.), Encyclopedia of Language and Linguistics (2nd ed.). Elsevier.
Klapuri, A. (2007). Semantic Analysis of Text and Speech: SGN-9206 Signal Processing Graduate Seminar II.
Kondracki, N. L., Wellman, N. S. & Amundson, D. R. (2002). Content analysis: review of methods and their applications in nutrition education. Journal of Nutrition Education and Behavior, 34(4), 224–230.
Koski, K., Luukinen, H., Laippala, P. & Kivela, S. L. (1996). Physiological factors and medications as predictors of injurious falls by elderly people: a prospective population-based study. Age and Ageing, 25(1), 29–38.
Krippendorff, K. (2012). Content analysis: An introduction to its methodology (2nd ed.). Sage.
Lamel, L. (1989). Some Perspectives on Speech Database Development. In Speech Input/Output Assessment and Speech Databases.
Lamel, L., Minker, W. & Paroubek, P. (2000). Towards best practice in the development and evaluation of speech recognition components of a spoken language dialog system. Natural Language Engineering, 6(3&4), 305–322.
Landis, J. R. & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 159–174.
LaPointe, L. L. (1994). Introduction to communication sciences and disorders. In Minifie, F.D. (Ed.), (pp. 351–397). Singular Publishing Group: McNaughton & Gunn.
Leadholm, B. J. & Miller, J. (1995). Language Sample Analysis: The Wisconsin Guide. Madison, WI.
Lee, T. & Mihailidis, A. (2005). An intelligent emergency response system: preliminary development and testing of automated fall detection. Journal of Telemedicine and Telecare, 11(4), 194–198.
Levine, D. A. & Tideiksaar, R. (1995). Personal emergency response systems: factors associated with use among older persons. The Mount Sinai Journal of Medicine, New York, 62(4), 293–297.
Linville, S. E. (2002). Source characteristics of aged voice assessed from long-term average spectra. Journal of Voice, 16(4), 472–479.
Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, 22(1), 1–15.
Longino, C. F. J. (1994). Myths of an aging America. American Demographics, 16(8), 36–42.
Madane, A. (2012). Identifying Keywords and Key Phrases. IJSCE, July.
Maddox, G. L. (1992). Personal response systems: An international report of a new home care service; Foreword. Routledge.
Mann, W., Belchior, P., Tomita, M. R. & Kemp, B. J. (2005). Use of personal emergency response systems by older individuals with disabilities. Assistive Technology, 17(1), 82–88.
Mann, W., Marchant, T., Tomita, M., Fraas, L. & Stanton, K. (2002). Elder acceptance of health monitoring devices in the home. Care Management Journals, 3(2), 91.
Mann, W., Ottenbacher, K. J., Fraas, L., Tomita, M. & Granger, C. V. (1999). Effectiveness of assistive technology and environmental interventions in maintaining independence and reducing home care costs for the frail elderly: A randomized controlled trial. Archives of Family Medicine, 8(3), 210.
Mays, N. & Pope, C. (2000). Qualitative research in health care: Assessing quality in qualitative research. BMJ: British Medical Journal, 320(7226), 50.
Mazzoni, D., Dannenberg, R., et al. (2000). Audacity. Retrieved 2008, from http://audacity.sourceforge.net/
McCreadie, C. & Tinker, A. (2005). The acceptability of assistive technology to older people. Ageing and Society, 25(1), 91–110.
McLean, M. H. (2005). Design of a Speech Recognition Interface for a Personal Emergency Response and Health Monitoring System. University of Toronto.
McWhirter, M. (1987). A dispersed alarm system for the elderly and its relevance to local general practitioners. The Journal of the Royal College of General Practitioners, 37(299), 244.
Miller, J. F. & Chapman, R. S. (2008). Systematic Analysis of Language Transcripts Version 8. Madison, WI.
Miller, J. F. & Iglesias, A. (2006). Systematic Analysis of Language Transcripts (SALT), English and Spanish (Version 9). University of Wisconsin-Madison.
Mondada, L. (2012). The Conversation Analytic Approach to Data Collection. The Handbook of Conversation Analysis, 32–56.
Montgomery, C. (1993). Personal response systems in the United States. Home Health Care Services Quarterly, 13(3-4), 201–222.
Morgan, D. L. (1993). Qualitative content analysis: A guide to paths not taken. Qualitative Health Research, 3(1), 112–121.
Moyal, A., Aharonson, V., Tetariy, E. & Gishri, M. (2013). Keyword Spotting Methods. In Phonetic Search Methods for Large Speech Databases (pp. 7–11). Springer.
Müller, C., Wittig, F. & Baus, J. (2003). Exploiting speech for recognizing elderly users to respond to their special needs. In Proceedings of Eurospeech (Vol. 3, pp. 1305–1308).
Mullie, A., Van Hoeyweghen, R. & Quets, A. (1989). Influence of time intervals on outcome of CPR. Resuscitation, 17, S23–S33.
Murray, I. R. & Arnott, J. L. (2008). Applying an analysis of acted vocal emotions to improve the simulation of synthetic speech. Computer Speech & Language, 22(2), 107–129.
Möller, S. (2005). Quality of Human-Machine Interaction over the Phone. In Quality of Telephone-Based Spoken Dialogue Systems (pp. 9–91). Springer.
Möller, S., Gödde, F. & Wolters, M. (2008). A corpus analysis of spoken smart-home interactions with older users. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC) (pp. 735–740).
News Agencies. (2014). Fear of old age becomes acute after 50, study finds. Retrieved March 31, 2015, from http://www.telegraph.co.uk/news/health/news/10778168/Fear-of-old-age-becomes-acute-after-50-study-finds.html
Patel, S., Park, H., Bonato, P., Chan, L., Rodgers, M. & others. (2012). A review of wearable sensors and systems with application in rehabilitation. Journal of Neuroengineering and Rehabilitation, 9(12), 1–17.
Patil, S. A. & Hansen, J. H. (2007). Speech under stress: Analysis, modeling and recognition.
Piau, A., Campo, E., Rumeau, P., Vellas, B. & Nourhashemi, F. (2014). Aging society and gerontechnology: A solution for an independent living? The Journal of Nutrition, Health & Aging, 18(1), 97–112.
Polit, D. F. & Beck, C. T. (2004). Nursing research: Principles and methods. Lippincott Williams & Wilkins.
Pons, P. T., Haukoos, J. S., Bludworth, W., Cribley, T., Pons, K. A. & Markovchick, V. J. (2005). Paramedic response time: does it affect patient survival? Academic Emergency Medicine, 12(7), 594–600.
Porter, E. J. (2003). Moments of apprehension in the midst of a certainty: some frail older widows’ lives with a personal emergency response system. Qualitative Health Research, 13(9), 1311–1323.
Porter, E. J. (2005). Wearing and using personal emergency response system buttons. Journal of Gerontological Nursing, 31(10), 26–33.
Portet, F., Vacher, M., Golanski, C., Roux, C. & Meillon, B. (2013). Design and evaluation of a smart home voice interface for the elderly: acceptability and objection aspects. Personal and Ubiquitous Computing, 17(1), 127–144.
Private_PERS_Call_Centre. (2008). Operations Protocol for PERS Call Centre.
Public Health Agency of Canada. (2014). Seniors’ Falls in Canada: Second Report. Division of Aging and Seniors.
Ramage-Morin, P. L. (2005). Successful aging in health care institutions. Statistics Canada.
Ramig, L. O. (1994). Introduction to communication sciences and disorders. In Minifie, F.D. (Ed.), (pp. 481–519). Singular Publishing Group: McNaughton & Gunn.
Rosamond, W. D., Evenson, K. R., Schroeder, E. B., Morris, D. L., Johnson, A.-M. & Brice, J. H. (2005). Calling emergency medical services for acute stroke: a study of 9-1-1 tapes. Prehospital Emergency Care, 9(1), 19–23.
Roush, R. E., Teasdale, T. A., Murphy, J. N. & Kirk, M. S. (1995). Impact of a personal emergency response system on hospital utilization by community-residing elders. Southern Medical Journal, 88(9), 917–922.
Ryan, G. (1999). Measuring the typicality of text: Using multiple coders for more than just reliability and validity checks. Human Organization, 58(3), 313–322.
Sacks, H., Schegloff, E. A. & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 696–735.
Salvi, F., Morichi, V., Grilli, A., Giorgi, R., De Tommaso, G. & Dessi-Fulgheri, P. (2007). The elderly in the emergency department: a critical review of problems and solutions. Internal and Emergency Medicine, 2(4), 292–301.
Scharenborg, O. (2007). Reaching over the gap: A review of efforts to link human and automatic speech recognition research. Speech Communication, 49(5), 336–347.
Scherer, K. R. (1986). Vocal affect expression: a review and a model for future research. Psychological Bulletin, 99(2), 143–165.
Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40(1), 227–256.
Shriberg, E. E. (1999). Phonetic consequences of speech disfluency.
Silverman, R. A., Galea, S., Blaney, S., Freese, J., Prezant, D. J., Park, R., … others. (2007). The “vertical response time”: barriers to ambulance response in an urban area. Academic Emergency Medicine, 14(9), 772–778.
Takahashi, S., Morimoto, T., Maeda, S. & Tsuruta, N. (2003). Robust speech understanding based on expected discourse plan. In INTERSPEECH.
Tam, T., Dolan, A., Boger, J. & Mihailidis, A. (2006). An intelligent emergency response system: Preliminary development and testing of a functional health monitoring system. Gerontechnology, 4(4), 209–222.
Taylor, A. & Agamanolis, S. (2010). Service users’ views of a mainstream telecare product: the personal trigger. In Proceedings of the 28th of the international conference extended abstracts on Human factors in computing systems (pp. 3259–3264).
Teas Gill, V. & Roberts, F. (2012). Conversation Analysis in Medicine. The Handbook of Conversation Analysis, 575–592.
Ten Bosch, L. (2003). Emotions, speech and the ASR framework. Speech Communication, 40(1), 213–225.
Tinker, A. (1993). Alarms and telephones in emergency response-Research from the United Kingdom. Home Health Care Services Quarterly, 13(3-4), 177–189.
Vipperla, R., Wolters, M., Georgila, K. & Renals, S. (2009). Speech input from older users in smart environments: Challenges and perspectives. In Universal Access in Human-Computer Interaction. Intelligent and Ubiquitous Interaction Environments (pp. 117–126). Springer.
Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvea, E., … Woelfel, J. (2004). Sphinx-4: A flexible open source framework for speech recognition.
Waseem, H., Durrani, M. & Naseer, R. (2010). Prank calls: A major burden for an emergency medical service. Emergency Medicine Australasia, 22(5), 480–480.
Weiss, C. O. (2011). Frailty and chronic diseases in older adults. Clinics in Geriatric Medicine, 27(1), 39–52.
Whalen, M. & Zimmerman, D. (1987). Sequential and institutional contexts in calls for help. Social Psychology Quarterly, 172–185.
Williams, C. E. & Stevens, K. N. (1972). Emotions and speech: Some acoustical correlates. The Journal of the Acoustical Society of America, 52(4B), 1238–1250.
Williams, G., Doughty, K., Cameron, K. & Bradley, D. A. (1998). A smart fall and activity monitor for telecare applications. In Engineering in Medicine and Biology Society, 1998. Proceedings of the 20th Annual International Conference of the IEEE (Vol. 3, pp. 1151–1154).
Wilpon, J. G. & Jacobsen, C. N. (1996). A study of speech recognition for children and the elderly. In Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on (Vol. 1, pp. 349–352).
Wolters, M., Engelbrecht, K.-P., Gödde, F., Möller, S., Naumann, A. & Schleicher, R. (2010). Making it easier for older people to talk to smart homes: The effect of early help prompts. Universal Access in the Information Society, 9(4), 311–325.
Wolters, M., Georgila, K., Moore, J. D. & MacPherson, S. E. (2009). Being old doesn’t mean acting old: How older users interact with spoken dialog systems. ACM Transactions on Accessible Computing (TACCESS), 2(1), 2.
Wooffitt, R. (2005). Conversation analysis and discourse analysis: A comparative and critical introduction. London: Sage.
World Health Organization. (2011). Global Health and Aging.
Young, V. & Mihailidis, A. (2010). Difficulties in automatic speech recognition of dysarthric speakers and implications for speech-based applications used by the elderly: A literature review. Assistive Technology, 22(2), 99–112.
Young, V., Rochon, E. & Mihailidis, A. (2014). Towards the development of a speech-based and intelligent personal emergency response system: Identification of key conversational features in personal emergency response calls. Gerontechnology, 13(2), 315.
Yuan, J., Liberman, M. & Cieri, C. (2006). Towards an integrated understanding of speaking rate in conversation. In INTERSPEECH.
Zajicek, M., Wales, R. & Lee, A. (2004). Speech interaction for older adults. Universal Access in the Information Society, 3(2), 122–130.
Zhou, G., Hansen, J. H. & Kaiser, J. F. (1998). Linear and nonlinear speech feature analysis for stress classification. In ICSLP (pp. 883–886).
Zimmerman, D. (1992a). Achieving context: openings in emergency calls. In Watson, G. and Seiler, R.M. (Ed.), Text in Context: Contributions to Ethnomethodology (pp. 35–51). Sage Publications.
Zimmerman, D. (1992b). The interactional organization of calls for emergency assistance. In Drew, P. and Heritage, J. (Ed.), Talk at Work: interaction in institutional settings (pp. 418–469). Cambridge University Press.
Zraick, R. I., Gregg, B. A. & Whitehouse, E. L. (2006). Speech and voice characteristics of geriatric speakers: A review of the literature and a call for research and training. Journal of Medical Speech-Language Pathology, 14(3), 133–142.
Appendix A: Common Older Adult Conditions
This list was referenced when classifying risk levels for response call transcripts.
Common Older Adult Conditions (Salvi et al., 2007; Gorina et al., 2006)

Chronic Disease
- Heart Disease (Coronary Artery Disease): intense pain (tightness, pressure, burning, heavy weight) in the upper body (chest, shoulders, neck, arms, upper abs, jaw); loss of consciousness; nausea; shortness of breath; clammy, sweaty, cold, anxious, nervous, pale.
- Cancer: omitted (no rapid attacks).
- Stroke: confusion, slurred speech, moving difficulties, loss of consciousness, vision problems.
- Alzheimer's: omitted (no rapid attacks).
- Diabetes: shaky, sleepy, confused, loss of consciousness, clammy, pale, fast heartbeat.
- Renal Disease: fluid retention (swelling in legs/feet), seizures, vomiting, nausea, nosebleeds, hand tremor, high blood pressure, sluggish movement, prolonged bleeding.
- Lower Respiratory Disease (asthma/bronchitis, emphysema): wheezing, coughing, tight chest, shortness of breath, bluish skin.

Infection
- Influenza: fever, chills, shivering, muscle pain, headaches, cough, sore throat, stomach pain, vomiting, diarrhea.
- Pneumonia: shortness of breath, shivering, chills, headache, confusion, muscle pain, weakness, chest pain, blue lips, cough, high fever.
- Septicemia: fever, chills, rapid breathing and heart rate.

Drug Reaction
- Sweaty/dry, confused, pale/red, anxious, breathing difficulty, stomach pain, nausea, vomiting, diarrhea.

Accident
- Fall: blood, pain (injury), loss of consciousness.
- Motor Vehicle: omitted (not home based).
Appendix B: Original HELPER Dialogue Strategy
(McLean, 2005)
The original HELPER dialogue strategy is a branching tree of yes/no prompts. The flowchart's dialogue states and system prompts are listed below; each state's 'yes' and 'no' responses lead to the next state as drawn in the figure.

- Start → Root: "Hello {Mr. Smith}. This is your automated health monitoring system. Do you need help? (Please say 'yes' or 'no')." Yes → Amb; No → RootNoConf.
- RootNoConf: "Okay {Mr. Smith}. Just to confirm, please say 'yes' if you need any help or 'no' again if you do not." Yes → Amb; No → Afalse.
- Amb: "Would you like me to call an ambulance? (Please say 'yes' or 'no')." Yes → AmbConf; No → Responder.
- AmbConf: "Okay {Mr. Smith}. I will call an ambulance right away, please say 'yes' to confirm." Yes → Aamb; No → Responder.
- Responder: "Would you like me to call someone else to help you? (Please say 'yes' or 'no')." Yes → List; No → Op.
- List: "There are two people on your contact list, your {daughter Anne} and your {neighbour Paul}. Would you like me to call {your daughter Anne}? (Please say 'yes' or 'no')." Yes → Name1Conf; No → Name2Conf.
- Name1Conf: "I will call your {daughter Anne}. Please say 'yes' to confirm." Yes → Aname1; No → Name2Conf.
- Name2Conf: "I will call your {neighbour Paul}. Please say 'yes' to confirm or 'no' to be connected to an operator." Yes → Aname2; No → Aop.
- Op: "I can connect you to a live operator if you would like. Please say 'yes' to be connected, or say 'no' to exit the system." Yes → Aop; No → Afalse.
- Aamb (terminal): "I am calling an ambulance now. I will also notify one of your responders. I will give you an update momentarily."
- Aname1 (terminal): "I am calling your {daughter Anne} now. I will give you an update momentarily."
- Aname2 (terminal): "I am calling your {neighbour Paul} now. I will give you an update momentarily."
- Aop (terminal): "Connecting you to a live operator now."
- Afalse (terminal): "Sorry for the interruption. The system is now exiting."
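The HELPER dialogue tree can be expressed as a simple yes/no finite-state machine. The sketch below is illustrative only: state names follow the figure, prompts are abbreviated, and the targets of some unconfirmed 'no' branches are inferred from the flowchart's arrows rather than taken from McLean's (2005) implementation.

```python
# Hypothetical sketch of the HELPER dialogue tree as a yes/no state machine.
# Each entry maps a state to (abbreviated prompt, next state on 'yes',
# next state on 'no'). Some 'no' transitions are inferred from the figure.
STATES = {
    "Root":       ("Do you need help?",                            "Amb",       "RootNoConf"),
    "RootNoConf": ("Just to confirm, do you need any help?",       "Amb",       "Afalse"),
    "Amb":        ("Would you like me to call an ambulance?",      "AmbConf",   "Responder"),
    "AmbConf":    ("I will call an ambulance right away. Confirm?","Aamb",      "Responder"),
    "Responder":  ("Would you like me to call someone else?",      "List",      "Op"),
    "List":       ("Call your daughter Anne?",                     "Name1Conf", "Name2Conf"),
    "Name1Conf":  ("I will call your daughter Anne. Confirm?",     "Aname1",    "Name2Conf"),
    "Name2Conf":  ("I will call your neighbour Paul. Confirm?",    "Aname2",    "Aop"),
    "Op":         ("Connect you to a live operator?",              "Aop",       "Afalse"),
}
# Terminal states: an action is taken and the dialogue ends.
TERMINAL = {"Aamb", "Aname1", "Aname2", "Aop", "Afalse"}

def run_dialogue(answers):
    """Walk the tree with a scripted list of 'yes'/'no' answers; return the final state."""
    state = "Root"
    for ans in answers:
        if state in TERMINAL:
            break
        _, yes_next, no_next = STATES[state]
        state = yes_next if ans == "yes" else no_next
    return state
```

For example, a caller who confirms needing help and confirms the ambulance twice reaches the Aamb action, while two 'no' answers in a row exit through Afalse.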
Appendix C: Small Keyword Vocabulary Set
(Occ = occurrence, LR = low risk, MR = medium risk, HR = high risk, Fall = fall call, Med = medical call, OA = older adult, CG = caregiver, Final Cat = final category)
Occ Word Root LR MR HR Fall Med OA CG Final Cat
35 GET 1 1 1 1 1 1 1 2
21 NEED 1 1 1 1 1 1 2
18 COME 1 1 1 1 1 1 2
12 HELP 1 1 1 1 1 1 1 2
12 GO 1 1 1 1 1 1 2
10 WANT 1 1 1 1 1 1 2
6 TAKE 1 1 1 1 1 1 2
5 CALLING 1 1 1 1 1 1 1 2
4 CALL 1 1 1 1 1 2
4 COMING 1 1 1 1 1 2
4 SEND 1 1 1 1 1 1 2
3 TELL 1 1 1 1 1 1 2
3 PHONE 1 1 1 1 1 2
3 BRING 1 1 1 1 2
3 CHECK 1 1 1 1 2
2 WANTS 1 1 1 1 2
17 BREATHING 1 1 1 1 1 1 4
13 SEE 1 1 1 1 1 4
12 FEEL 1 1 1 1 1 4
9 FEELING 1 1 1 1 1 1 1 4
5 HEAR 1 1 1 1 1 4
3 BREATH 1 1 1 1 1 4
3 BREATHE 1 1 1 1 4
2 LIFT 1 1 1 1 1 4
2 SUGAR 1 1 1 4
2 MOVE 1 1 1 1 4
2 STRAIGHTEN 1 1 1 4
2 STAND 1 1 1 4
1 BEATING 1 1 1 4
1 BEATS 1 1 1 4
1 AWAKE 1 1 4
1 BREATHES 1 1 1 4
17 PLEASE 1 1 1 1 1 1 5
3 THANKS 1 1 1 1 1 1 5
30 HELLO 1 1 1 1 1 1 6
6 HI 1 1 1 1 1 6
4 BYE 1 1 1 1 1 6
20 PARDON 1 1 1 1 1 1 1 7
12 AGAIN 1 1 1 1 1 1 1 7
25 AMBULANCE 1 1 1 1 1 1 8
12 HOSPITAL 1 1 1 1 1 1 8
11 SOMETHING 1 1 1 1 1 1 8
5 DAUGHTER 1 1 1 1 1 1 8
5 DOCTOR 1 1 1 1 1 8
5 OXYGEN 1 1 1 1 8
3 PARAMEDICS 1 1 1 1 1 8
3 SOMEBODY 1 1 1 1 1 8
2 EMERGENCY 1 1 1 1 8
2 SOMEONE 1 1 1 1 1 8
2 NEIGHBOUR 1 1 1 8
1 BROTHER 1 1 1 8
1 FIREFIGHTER 1 1 1 8
1 MEDICS 1 1 1 8
1 ASSISTANCE 1 1 1 8
27 CAN 1 1 1 1 1 1 9
17 WHAT 1 1 1 1 1 1 9
13 COULD 1 1 1 1 1 1 9
6 WHEN 1 1 1 1 1 1 9
5 WHO 1 1 1 1 1 9
4 WHERE 1 1 1 1 1 9
3 WILL 1 1 1 1 1 1 9
21 BACK 1 1 1 1 1 1 10
18 CHEST 1 1 1 1 1 1 10
8 HEAD 1 1 1 1 1 1 10
6 HEART 1 1 1 1 1 1 10
4 BODY 1 1 1 1 1 10
4 STOMACH 1 1 1 1 1 10
4 ARM 1 1 1 10
4 LEG 1 1 1 10
3 THROAT 1 1 1 1 10
3 SHOULDER 1 1 1 10
3 FEET 1 1 1 10
3 SIDE 1 1 1 1 10
1 ABDOMIN 1 1 1 10
1 NOSE 1 1 1 10
1 KIDNEYS 1 1 1 10
1 NECK 1 1 1 10
1 FACE 1 1 1 10
1 RIB 1 1 1 10
45 DON’T 1 1 1 1 1 1 1 11
35 CAN’T 1 1 1 1 1 1 11
5 DIDN’T 1 1 1 1 1 1 11
6 DOESN’T 1 1 1 1 1 11
1 ISN'T 1 1 1 11
1 CANNOT 1 1 1 11
31 UP 1 1 1 1 1 13
21 DOWN 1 1 1 1 1 1 13
16 BED 1 1 1 1 1 1 13
13 FLOOR 1 1 1 1 1 1 13
13 OUT 1 1 1 1 1 1 13
9 OFF 1 1 1 1 1 1 14
8 TIME 1 1 1 1 1 1 14
7 DAY 1 1 1 1 1 14
3 HOUR 1 1 1 1 14
2 DATE 1 1 1 14
1 MISTAKE 1 1 14
1 WAIT 1 1 1 14
1 WEATHER 1 1 1 14
153 NO 1 1 1 1 1 1 1 1-n
45 NOT 1 1 1 1 1 1 1-n, 11
59 YEAH 1 1 1 1 1 1 1 1-p
16 DO 1 1 1 1 1 1 1-p
16 RIGHT 1 1 1 1 1 1 1-p
7 YUP 1 1 1 1 1 1 1-p
4 SURE 1 1 1 1 1 1 1-p
3 YA 1 1 1 1 1-p
2 MAYBE 1 1 1 1 1 1-p
71 OKAY 1 1 1 1 1 1 1 1-p, 3-n
13 ALRIGHT 1 1 1 1 1 1 1-p, 3-p
45 THANK_YOU 1 1 1 1 1 1 1 1-p, 5
99 YES 1 1 1 1 1 1 1 1-p, 6
3 ASK 1 1 1 2-a
2 TRY 1 1 1 1 2-a
2 ASTHMA 1 1 1 3-e
2 DIALYSIS 1 1 1 3-e
1 FIBRILLATION 1 1 1 3-e
1 DIABETIC 1 1 1 3-e
1+1 SHAKY 1 1 1 1 3-n
46 HAVE 1 1 1 1 1 1 3-n
23 FELL 1 1 1 1 1 3-n
22 PAIN 1 1 1 1 1 1 3-n
11 BAD 1 1 1 1 1 1 3-n
8 FALLEN 1 1 1 1 1 3-n
7 BLOOD 1 1 1 1 1 1 3-n
7 DIZZY 1 1 1 1 1 3-n
7 FALL 1 1 1 1 1 1 3-n
7 SICK 1 1 1 1 1 3-n
5 BLEEDING 1 1 1 1 1 3-n
5 WEAK 1 1 1 1 1 1 3-n
5 HIGH 1 1 1 1 3-n
5 HURT 1 1 1 1 3-n
5 TROUBLE 1 1 1 1 1 3-n
4 WRONG 1 1 1 1 1 1 1 3-n
4 PROBLEMS 1 1 1 1 1 3-n
4 SORE 1 1 1 1 1 1 3-n
4 PRESSURE 1 1 1 1 3-n
4 SWEATING 1 1 1 1 1 3-n
4 TIGHTNESS 1 1 1 1 3-n
4 DIFFICULTY 1 1 1 1 3-n
4 TERRIBLE 1 1 1 3-n
3 INJURED 1 1 1 1 1 3-n
3 COLD 1 1 1 1 3-n
3 BROKEN 1 1 1 3-n
3 CONSTIPATION 1 1 1 3-n
3 RASH 1 1 1 3-n
2 ATTACK 1 1 1 1 1 1 3-n
2 SHORT 1 1 1 1 1 3-n
2 HARD 1 1 1 1 3-n
2 NAUSEATED 1 1 1 3-n
2 PNEUMONIA 1 1 1 1 1 3-n
2 THROWING_UP 1 1 1 1 1 3-n
2 CLAMMY 1 1 1 1 3-n
2 DIARRHEA 1 1 1 3-n
2 TEMPERATURE 1 1 1 3-n
1 FAINTED 1 1 1 3-n
1 LOW 1 1 1 3-n
1 RAPID 1 1 1 3-n
1 STROKE 1 1 1 3-n
1 SUFFERING 1 1 1 3-n
1 TIA 1 1 1 3-n
1 TREMOR 1 1 1 3-n
1 WHEEZING 1 1 1 3-n
1 DISCOMFORT 1 1 1 3-n
1 NUMB 1 1 1 3-n
1 UNCONSCIOUS 1 1 1 3-n
1 ACCIDENT 1 1 3-n
1 LOSING 1 1 1 3-n
1 FEVER 1 1 1 3-n
1 FIRE 1 1 1 3-n
1 HEART_ATTACK 1 1 1 3-n
1 VOMITING 1 1 1 3-n
1 CHOKING 1 1 3-n
1 CONFUSED 1 1 1 3-n
1 DISORIENTED 1 1 1 3-n
1 ANGINA 1 1 1 3-n
33 KNOW 1 1 1 1 1 1 1 3-n (qualifiers)
12 HAVING 1 1 1 1 1 1 3-n (qualifiers)
8 GOING 1 1 1 1 1 1 3-n (qualifiers)
2 EVERYTHING 1 1 1 1 3-n (qualifiers)
46 WELL 1 1 1 1 1 1 3-p
14 GOOD 1 1 1 1 1 1 1 3-p
10 FINE 1 1 1 1 1 1 3-p
1 SORRY 1 1 5, 1-n
383 I 1 1 1 1 1 1 1 Identifier
35 ME 1 1 1 1 1 1 Identifier
Column totals: LR 26 (14.05%), MR 147 (79.46%), HR 137 (74.05%), Fall 112 (60.54%), Med 162 (87.57%), OA 160 (86.49%), CG 113 (61.08%)
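A dialogue manager could consult a matrix like the one above to weight a call's likely risk level from spotted keywords. The sketch below is a hypothetical illustration, not the classification method used in this thesis: it holds a five-word excerpt of the vocabulary (the flag values shown are placeholders, since exact column alignment is not preserved here) and tallies risk-level votes over keywords spotted in an utterance.

```python
# Illustrative excerpt of an Appendix C-style keyword matrix: each entry
# carries low/medium/high-risk flags and a final category code. Flag values
# are placeholders for illustration, not an exact reproduction of the table.
KEYWORDS = {
    "AMBULANCE": {"LR": 0, "MR": 1, "HR": 1, "cat": "8"},
    "FELL":      {"LR": 0, "MR": 1, "HR": 1, "cat": "3-n"},
    "PAIN":      {"LR": 0, "MR": 1, "HR": 1, "cat": "3-n"},
    "FINE":      {"LR": 1, "MR": 1, "HR": 1, "cat": "3-p"},
    "MISTAKE":   {"LR": 1, "MR": 0, "HR": 0, "cat": "14"},
}

def risk_votes(utterance):
    """Tally low/medium/high-risk flags over keywords spotted in an utterance."""
    votes = {"LR": 0, "MR": 0, "HR": 0}
    for token in utterance.upper().split():
        flags = KEYWORDS.get(token.strip(".,!?"))  # crude normalisation
        if flags:
            for level in votes:
                votes[level] += flags[level]
    return votes
```

Under this toy aggregation, "I fell and I have pain" accumulates medium- and high-risk votes, while "sorry, a mistake" votes only low-risk.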
Appendix D: Unique Keyword Occurrences
(Keywords occurring at only a single risk level; unique word counts: Low 3, Medium 44, High 31)

Low: MISTAKE, ACCIDENT, SORRY

Medium: MOVE, CHECK, SIDE, BYE, TROUBLE, DAY, FIREFIGHTER, MEDICS, KIDNEYS, NECK, WAIT, DIABETIC, FEVER, FIRE, HEART_ATTACK, VOMITING, WANTS, STRAIGHTEN, STAND, NEIGHBOUR, DATE, DIARRHEA, TEMPERATURE, FEET, HOUR, ASK, COLD, CONSTIPATION, RASH, ARM, TERRIBLE, HURT, ASSISTANCE, FACE, RIB, CANNOT, WEATHER, LOSING, CONFUSED, DISORIENTED, SHOULDER, BROKEN, LEG, CHOKING

High: YA, TIGHTNESS, OXYGEN, BLEEDING, BEATING, BEATS, ABDOMIN, NOSE, ISN'T, FIBRILLATION, FAINTED, LOW, RAPID, STROKE, SUFFERING, TIA, TREMOR, WHEEZING, ANGINA, SUGAR, EMERGENCY, NAUSEATED, AWAKE, BREATHES, BROTHER, DISCOMFORT, NUMB, UNCONSCIOUS, ASTHMA, DIALYSIS, HARD
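Lists like those above can be derived mechanically from a keyword-to-risk-level mapping such as Appendix C's. The sketch below uses a small toy mapping (the flags are hypothetical) to show the derivation: a keyword is "unique" to a risk level when that level is its only flag.

```python
# Toy keyword -> risk-level mapping (hypothetical flags, not the full
# Appendix C data) used to illustrate deriving Appendix D-style lists.
risk_levels = {
    "MISTAKE": {"low"},
    "SORRY":   {"low"},
    "MOVE":    {"medium"},
    "YA":      {"high"},
    "FELL":    {"medium", "high"},  # flagged at two levels, so unique to neither
}

def unique_to(level, mapping):
    """Keywords whose only risk flag is the given level, sorted alphabetically."""
    return sorted(word for word, levels in mapping.items() if levels == {level})
```

Applied to the toy mapping, MISTAKE and SORRY come out unique to the low-risk level, while FELL appears in no unique list because it carries both medium- and high-risk flags.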
Appendix E: Questions for Participant
1. Male or female?
2. Birthdate?
3. Age?
4. Birth place (city and province); if not Canada, indicate country?
5. Mother tongue (language/country)? (e.g., English/Britain)
6. Cultural ethnicity? (e.g., French)
7. This experiment will require you to listen to noises and sounds over speakers or headphones. Do you have any hearing impairment that may affect your performance in this study? If yes, please explain. (e.g., deaf in right ear)
8. This experiment requires you to be able to see and read various documents and scripts. Do you have any visual impairment that may affect your performance in this study? If yes, please explain. (e.g., blind in right eye)
9. This experiment will require you to sit/stand inside a small soundproof booth. Do you have any conditions that may affect your performance during the voice recordings? If yes, please explain. (e.g., claustrophobia, trouble sitting or standing for long periods)
10. Do you have any medical conditions we should be aware of that may affect your performance during the voice recordings? If yes, please explain. (e.g., respiratory or cardiac (heart) problems)
11. Do you have any previous acting experience? If yes, please indicate the number of years as an actor and the type of acting (e.g., theatre, movie).
12. What is the highest level of education you have attained? (e.g., elementary, high school, post-secondary (bachelor's, master's, doctoral), college)
13. Would you like to be contacted in the future about other research study opportunities for which you may be a suitable candidate? (yes or no) If yes, please provide a contact number or email.
14. Are you interested in receiving a copy of published literature discussing the results of this study? It may take several years before any publications are available. (yes or no)
15. If yes, how would you like to be contacted? Please provide contact information.
16. It is our hope that a freely accessible older adult speech database can be made available to interested researchers/individuals to help continue the development of speech recognition technologies for older adults. If an older adult speech database is successfully developed from this study, and your name, contact, and medical information are not included, but your speech sample, age, gender, region of birth, and ethnicity are included, would you agree to allow the database to be used in future research studies and technology development projects by other interested researchers and individuals? (yes or no)
Appendix F: Key Words and Phrases List
SET 1 (SET 2 is the reverse order)
INDEX KEY PHRASES KEY WORDS
SECTION 1
1 Will you help me? WILL
2 My breathing is not good. GOOD
3 I have tightness in the chest. TIGHTNESS
4 I need the ambulance to take me to the hospital. HOSPITAL
5 Uh, I need help but I didn’t fall. DIDN’T
6 I can’t move it. MOVE
7 They said to bring her in. BRING
8 I want to go to the hospital. GO
9 She has dialysis today, but she’s really sick. DIALYSIS
10 I’m having a terrible time. TERRIBLE
11 What day is it today? DAY
12 I can’t hear you, can you speak up? HEAR
13 He can't feel his body. BODY
14 I think I’m having a TIA. TIA
15 I’ve got a pain in my chest. CHEST
16 Ya, everything’s good. YA
17 Try calling my son. TRY
18 It’s an emergency, we need the ambulance right away. EMERGENCY
19 I’m really weak. WEAK
20 Can you call my brother? BROTHER
21 Is the ambulance coming? COMING
22 I’d like the paramedics to come. PARAMEDICS
23 I can’t straighten my leg. LEG
24 My head is light, I’m very sick. HEAD
25 He has discomfort in the chest. DISCOMFORT
26 He hit his back head and there’s blood BLOOD
27 I’m okay. OKAY
28 He’s choking on something. CHOKING
29 I feel awful, I’m throwing up constantly. THROWING UP
30 Oh, I’m dizzy. DIZZY
31 Can you get somebody else? SOMEBODY
32 Please ask the Superintendent to open the door. ASK
33 Sorry, I pushed it by mistake. MISTAKE
34 Um, he’s clammy. CLAMMY
35 Pardon me? Talk louder! PARDON
36 I get panic attacks. ATTACK
37 Can you send somebody down to my place? CAN
SECTION 2
38 I’m the caregiver, he has pains in his stomach. STOMACH
39 Yes, everything’s just fine, thank you. FINE
40 He breathes kind of funny. BREATHES
41 It’s affecting my breathing. BREATHING
42 I take water pills. TAKE
43 I’m not breathing very good again. AGAIN
44 Sugar, blood sugar too low. LOW
45 Computer off! OFF
46 I had like a tremor on my chest. TREMOR
47 I was running hot and cold, hot and cold, it’s just terrible. COLD
48 Can you get someone else? SOMEONE
49 My neighbour is helping me. NEIGHBOUR
50 Can you help me? ME
51 He’s broken his leg. BROKEN
52 Heh? I cannot hear you. CANNOT
53 Could someone come to the house and check me over. CHECK
54 I’ve fallen down. FALLEN
55 My back is very sore. SORE
56 I’m not well, I need some oxygen. WELL
57 I have terrible excruciating pain at night in my back. HAVE
58 Something isn’t right. ISN'T
59 Yes, I’m afraid I might fall over. YES
60 Yeah, I just tested the system. YEAH
61 I can’t get up. UP
62 Who are you? WHO
63 I’m wheezing too much. WHEEZING
64 I might fall down. DOWN
65 Her throat is all swollen up. THROAT
66 I’m having problems getting up. PROBLEMS
67 What is the date today? DATE
68 I’m nauseated. NAUSEATED
69 I can’t breathe this morning. BREATHE
70 Hello? HELLO
71 I broke my right arm. ARM
72 What time is it? TIME
73 No, everything is fine, thank you. EVERYTHING
74 He’s got asthma. ASTHMA
SECTION 3
75 I hurt my ribs, one rib feels broken. RIB
76 Something popped out the side of my stomach. SIDE
77 I think I need the firefighter medics. MEDICS
78 Yes, I do need help. DO
79 She’s confused. CONFUSED
80 The face has come alive again. FACE
81 I have a sore neck and I’m not feeling very good. NECK
82 I have a very high fever. FEVER
83 It could be angina. ANGINA
84 My sugar is low. SUGAR
85 I don’t know what happened. KNOW
86 I wonder if you could send somebody down to my place? SEND
87 My house is on fire! FIRE
88 Get help, I can’t get him up. GET
89 I’m having trouble breathing again. TROUBLE
90 What is the hour? HOUR
91 I have severe constipation. CONSTIPATION
92 I’m in a lot of pain. PAIN
93 I fainted again today. FAINTED
94 I've got diarrhea and I'm heavy and irritated. DIARRHEA
95 Yeah, sure, I could do with something. SURE
96 Call the staff. CALL
97 I have a bleeding nose. NOSE
98 She had a bad fall. FALL
99 My grandma’s fallen down and we can’t seem to lift her up. LIFT
100 I have difficulty breathing. DIFFICULTY
101 It beats for a while and then seems to break, and then starts again. BEATS
102 I believe he’s out, unconscious. UNCONSCIOUS
103 He’s losing a lot of blood. LOSING
104 He says he’s numb. NUMB
105 I’m aching all over, it’s my back, my adomin, my abs, everything! ABDOMEN
106 She’s disoriented. DISORIENTED
107 I was calling to tell you I was alright. TELL
108 Yes, I’ve seen the doctor yesterday. DOCTOR
109 Could you get an ambulance please? COULD
110 What do you mean? WHAT
111 Yes, please, my mom needs an ambulance. PLEASE
SECTION 4
112 I was taking some medication and I developed a horrible rash. RASH
113 I’m a little wobbly on my feet. FEET
114 I might fall out of bed. OUT
115 I just wanted you to call my daughter. DAUGHTER
116 I pulled something. SOMETHING
117 I'm very short of breath. SHORT
118 My mother wants me to get an ambulance. WANTS
119 Please send the firefighter! FIREFIGHTER
120 There’s nothing wrong, bye bye. BYE
121 Where is the ambulance? WHERE
122 I’m sweating and have discomfort in the chest. SWEATING
123 I have atrial fibrillation with my heart. FIBRILLATION
124 My heart isn’t beating smoothly. BEATING
125 I’m not injured, no. INJURED
126 There is a lot of bleeding. BLEEDING
127 I’m in bad shape. BAD
128 No, I don’t want the ambulance. WANT
129 Something’s wrong. WRONG
130 I’m able to breathe alright. ALRIGHT
131 When can you send for a paramedic? WHEN
132 He needs an ambulance. AMBULANCE
133 I don’t know, I just don’t feel good. DON’T
134 My husband fell down, he’s on the floor in the kitchen. FELL
135 It’s the caregiver calling, can you send an ambulance please? CALLING
136 Yes, maybe he can help. MAYBE
137 I slid out of bed. BED
138 It’s not a heart attack. HEART_ATTACK
139 I can’t catch my breath. BREATH
140 He doesn’t feel too well. DOESN’T
141 I can’t straighten the left one. STRAIGHTEN
142 I’m feeling sick. FEELING
143 Wait! I beg your pardon? WAIT
144 I’m really suffering right now, I need care. SUFFERING
145 I need a pull. NEED
146 Eh, Thanks for your help. THANKS
147 I had a little stroke. STROKE
148 I can’t even stand. STAND
SECTION 5
149 Nuh… no, I can’t see any bruises. SEE
150 No, I’m having problems. HAVING
151 I hurt myself. HURT
152 He’s not awake anymore. AWAKE
153 I have high blood pressure. PRESSURE
154 Can you phone my sister? PHONE
155 I keep going to the bathroom. GOING
156 Someone help me! HELP
157 I’m vomiting. VOMITING
158 Ah, yes, I was wondering, could the paramedics come and see me? COME
159 No, I should be back in the hospital. NO
160 Um, I think I have pneumonia. PNEUMONIA
161 I need some oxygen. OXYGEN
162 I just don’t feel well this morning. FEEL
163 I can’t get off the floor. FLOOR
164 I have trouble with my heart. HEART
165 It’s her husband. She feels she has broken her shoulder. SHOULDER
166 My fluid is back up again. BACK
167 My kidneys aren’t working. KIDNEYS
168 Yup, I fell and hurt myself. YUP
169 I'm sick and I'm in bed and I can't do anything for myself. CAN’T
170 Can we get some assistance? ASSISTANCE
171 I have high cholesterol. HIGH
172 I have a hard time breathing. HARD
173 I have awkward breathing, it’s very rapid. RAPID
174 She’s just not well. NOT
175 I’m shaky. SHAKY
176 Ah, it was an accident, thank you. ACCIDENT
177 I have a high temperature. TEMPERATURE
178 Hi, who’s there? HI
179 What’s the weather like today? WEATHER
180 I’m sorry? I didn’t hear you. SORRY
181 I’m very sick. SICK
182 That’s right, I’m really really sick. RIGHT
183 I’m fine, thank you. THANK_YOU
184 I can hardly walk. I
185 I’m a diabetic, see? DIABETIC
NUMBERS Counting from 0 to 20, in ones. Counting from 30 to 90, in tens.
Appendix G: Emergency Scenarios
Notation Symbols (not in all scenarios):
1. Non-verbal comments, extra information, and simultaneous speech cues are italicized and in parentheses, e.g., {calls an ambulance}.
2. Words that are incomplete end with: -- , e.g., pineap--.
3. Pauses are indicated by: …, e.g., okay…bye.
4. When one speaker is cut off by the other speaker, the sentence ends with: //.
Scenario 1: Low Risk Accident
The Situation: Imagine you are Mrs. Smith, around 85 years old, settling down to relax in your favourite arm chair. While adjusting yourself, you accidentally push the personal emergency response button without knowing it. The emergency call taker comes on the speaker phone suddenly asking what’s wrong. You inform her that you don’t need any help and you are fine.
--- Scenario Start ----
E Hello Mrs Smith, this is Judy from AssistMe Canada, how can I help you?
C Pardon?
E Hello Mrs Smith?
C Yes, dear.
E Hi, this is Judy from AssistMe.
C Yes?
E Are you alright?
C Yes, everything’s just fine.
E Okay, is there anything that I can do for you?
C No, I just had my home care worker here and I’m all looked after, and a just feeling fine.
E Alright…well…you have a good day.
C Thank you very much.
E You’re welcome, bye.
--- Scenario End ----
Scenario 2: Low Risk Accident
The Situation: Imagine you are Mrs. Smith, around 70 years old. You have accidentally pushed your help button while opening a can of pickles in the kitchen. You are expecting AssistMe to respond so you can tell them it was an accident.
--- Scenario Start ----
E Ms. Smith, it’s Judy from AssistMe Canada, how may I help you?
C Yeah, thank you, I pushed it by mistake.
E Alright, have a good day.
C Alright, bye.
--- Scenario End ----
Scenario 3: Low Risk - Accident
The Situation: Imagine you are Mr. Smith (John), around 80 years old. You are sitting in your easy chair watching your favourite television show. Suddenly, a voice is heard from the telephone speaker and you are quite surprised. You realize you must have pressed your panic button by mistake during the show. You inform the Emergency Call Taker that everything is fine.
--- Scenario Start ----
E Hello Mr. Smith, this is Judy from AssistMe Canada, how may I help you?
{no response, TV sounds}
E Hello John, do you need any help?
C No, I don’t know what w— happen--.
E Okay, we got a signal from the button that you wear, so you may have pressed it accidentally.
C I don’t know what happened there.
E Alright, is there anything else we can do?
C No thank you, I’m just fine.
E Okay then, I’ll reset, have a good day.
C Thank you.
E You’re welcome, good-bye.
--- Scenario End ----
Scenario 4: Medium Risk – Fall
The Situation: Imagine you are Mrs. Smith (Jane), around 85 years old. You haven’t gotten much exercise lately and have been feeling weak and frail. You use a walker which you rely on heavily for support. This afternoon you were walking from the bedroom to the kitchen but somehow turned too quickly and tripped over the leg of your walker. You are not hurt but you are alone and cannot get to the phone. You’ve tried a few times to get up but you just don’t have the strength to pull yourself up. You push your help button. You want to ask for your son Fred to give you a hand to get up.
--- Scenario Start ----
C Hello?
E Hello, this is Judy from AssistMe Canada, is this Mrs. Smith?
C Yeah, this, this is Jane. {Frustrated} I've just fallen and I can't get up.
E Okay, I'm gonna…are you hurt?
C Ah, no, I’m not hurt.
E You're not hurt? Okay, I'm gonna call your responders to help you, alright?
C {slight pause} Pardon?
E I'll call your responders to get someone to help you.
C Yes ... {E starts to speak} okay.
E Okay, just a moment.
{Non verbal action: Call Taker calls the responder}
E Mrs. Smith?
C Yes?
E Yes, your son, Fred, is on his way to help you.
C Uh, okay.
E Okay?
C Yup.
E Alright then, bye for now.
--- Scenario End ----
Scenario 5: Medium Risk – Fall
The Situation: Imagine you are Mr. Smith (Frank), the 82 year old husband of Mrs. Smith, your 80 year old wife. One day you hear a big crash and you are dismayed to discover that Mrs. Smith has tripped over the dog and fallen down. Mrs. Smith thinks her shoulder is broken and you need help quickly. You press the help button and wait for the emergency call taker to respond.
--- Scenario Start ----
E Hello Mrs. Smith, this is Judy from AssistMe Canada, how may I help you?
C {urgent, concerned voice} Yes, it's her husband Frank, she has fallen, a…tripped over her d-- the dog, fallen on the floor, and she feels she has broken her shoulder.
E Oh, okay.
C Can we get some assistance?
E And what shoulder do you think she broke?
C Eh…it's the right shoulder.
E Is there any bleeding?
C Ah…I can't see any de--, I'll take a look.
E Okay.
{Caregiver checks for blood, can hear wife moaning from husband moving her around}
C No, there is none.
E Oh okay, we will call the ambulance, hold on.
--- Scenario End ----
Scenario 6: High Risk - Fall
The Situation: Imagine you are Mrs. Smith, a frail 85 year old woman who lives alone. You have some hearing loss in your right ear and depend a lot on a cane to help you around your house, otherwise you are just fine. This morning you are in the bathroom getting ready when you slip on some water on the floor and hit your head on the tub. Although dazed, luckily you are still conscious but you feel some blood on the back of your head. You manage to get up and onto a chair but you don’t have much energy and need some help. You push your help button and wait for the Emergency Call Taker to respond.
--- Scenario Start ----
E Hello Mrs. Smith, it's Judy calling from AssistMe Canada, how may I help you?
C {No response}
E Mrs Smith?
C {slow} Hello?
E Hi, how are you?
C I fell…could someone come to the house and help me?
E Are you hurt?
C Well…I'm bleeding.
E Where are you bleeding from?
C Come to the back, or come to front door, I'll have to turn off an alarm.
E Okay, where are you bleeding from?
C {no response}
E Mrs. Smith?
C Yes?
E Where are you bleeding from?
C 54 Bankok street, apartment 201.
E {Louder} No, no, where are you {emphasize} bleeding from?
C My head.
E Okay, one moment okay?
{Calls EMS}
E Mrs. Smith, the ambulance is on the way.
C Thank you…{E starts to speak} are they coming to the front door?
E You're welcome. Yes.
C Oh.
E Alright, so we'll call you back shortly but help is on the way.
C Alright, thank you.
E You're welcome.
--- Scenario End ----
Scenario 7: Medium Risk - Medical
The Situation: Imagine you are Jane, an 85 year old female who lives alone. You have some medical complications such as high blood pressure and are currently taking some medication, but are otherwise healthy. One day you start feeling nauseous and can’t stop throwing up. You are scared, weak, feel terrible and you want help quickly. You press your help button and request that the emergency responder calls your daughter to help you.
--- Scenario Start ----
C Hello?
E Hello, Jane, it's Judy from AssistMe Canada.
C This is Jane.
E Hello, how are you?
C {weak, shaky voice} Oh, I need help.
E What’s wrong?
C Oh I, I keep throwing up and going to the bathroom.
E You…you’re vomiting?
E How long has this been going on?
C {painful and drawn out} Oh, it just started now {E speaks as C mumbles another word that is incomprehensible}.
E Okay…okay, is there anyone there with you right now?
C No.
E Okay…okay so do you want me to call an ambulance for you or {C starts to speak} did you wan--//
C No, No, I just want you to call my daughter.
E Okay, do you know why you’re vomiting?
C No.
E No, you don’t know, okay, just one moment, I just want to see your daughter’s//
C Yeah, and get her to get m-- , ah…Claire to come over.
E Is, your daughter’s name…is Claire?
C No, her name is Tonya.
E Okay, you want me to call Tonya and so Tonya can get Claire to come over?
C Yes, {E starts speaking} please.
E Who's…who’s Claire?
C {sigh}.
E Is Claire your caregiver?
C One, one of the girls that {E starts speaking} works//
E One…okay, are you sitting down right now?
C Eh?
E Are you sitting down?
C Yes.
E Okay, so you’re nauseated and you’re vomiting?
C Yup.
E Alright, and…that’s it?
C {guttural noise}.
E Are you having any difficulty breathing as well?
C No {whimper}.
E No, {C mumbles during the next word} okay…okay just, one moment, I’m going to call Tonya to get Claire, okay?
C Yes, please.
{dialing for the responder}
E Okay, Jane?
C Hm?
E Okay, it’s Judy again from AssistMe, so I’ve spoken to your daughter she’s going to try and call, ah…the…I guess, the agency that Claire works for.
C Thank you.
E Okay, so, I am going to try to get someone to go over and stay with you until they come okay?
C {silence}.
E Do you need us to call you an ambulance?
C No.
E No ambulance, okay, so I’ll, I’ll try and get someone else to come over, okay?
C Thank you {mumbled words}.
E I’ll call you back.
--- Scenario End ----
Scenario 8: High Risk - Medical
The Situation: Imagine you are Mrs Smith, an 85 year old frail woman who lives alone. Today you’ve been feeling a bit off, the weather is very humid and in the afternoon you didn’t much feel like eating lunch. Now you begin to feel shaky and start to have more and more difficulty with breathing. You start to worry as you aren’t sure what’s happening. Maybe you have anxiety or your sugar levels are low or is it your blood pressure? You feel like you shouldn’t move too quickly or too much. You find a chair and push your help button. You ask the emergency call taker to get your brother, Jerry, to come over and help you.
--- Scenario Start ----
E Hello Mrs. Smith it's Bob calling from AssistMe Canada, how may I help you?
C {No response}
E Mrs. Smith?
C {shaky, breathing difficulty} Yes, I am here.
E Do you need any help?
C I need help.
E What's wrong?
C I'm, I, I'm, I'm all shaky.
E Okay
C and, and uhm//
E How is your breathing?
C It, my breathing is not, not, not too good.
E Not too good? Okay, do you have any chest pain?
C No.
E No, okay, would you like me to call the ambulance?
C Well, no, I, I, I must get, not yet I don't think.
E No?
C Well, I don't know.
E Who would you like me to call? Jerry?
C Yes, maybe he can help me//
E Call Jerry?
C Yup.
E Okay, hold on okay?
C Okay, {mumbles} thank you.
{Call taker calls Jerry}
E Mrs. Smith?
C Yes?
E Jerry is on his way.
C Okay.
E Okay, so we'll call you back in about fifteen minutes okay?
C Thank you very much.
E Okay, you're welcome.
--- Scenario End ----
Scenario 9: High Risk - Medical
The Situation: Imagine you are Mrs. Smith, a frail woman of 75. The weather has been extremely humid and hot and you’ve been hanging out inside your home where it’s cooler and less humid. Over the last two days you’ve started to have a harder time with breathing but you think it’s just the weather. Today you are feeling a bit weak and you find it increasingly more difficult to get your breath. You decide to press your help button. You might need some oxygen from the paramedics.
--- Scenario Start ----
E Hello Mrs. Smith, this is Bob from AssistMe Canada, how may I help you?
C Yes, I was wondering, could the paramedics come and see me?
E Okay, what’s wrong?
C I have not cl--, I don’t have tightness of chest or pain, but I’m having trou--{breathe hard} ble breathing.
E Okay, so, no pain in your chest?
C No pain, no, no, no tightness nothing.
E Okay, how long has it been going on?
C Oh, ah…well… I’d say mostly today.
E Okay, now have you changed colour or anything?
C Haven’t changed a thing.
E Okay, and are you sweaty, are you clammy at all?
C Very dizzy yesterday.
E Okay, alright, so I’m going to call them now, is your apartment door unlocked?
C Yes, um, do you want the outer one lo-- unlocked too?
E Um, let me just check to see if we have an entry code for you.
E No, they can get into the building, just make sure that your door is not locked.
C Well, that’ll be wonderful.
E Okay, alright, so you can do that, I’ll come back to you once they’re on the way okay?
C Okay.
E Alright.
--- Scenario End ----
Appendix H: Emergency Response Services Visits
On-Site Visit with Emergency Response Services
The on-site visits to emergency call responders were short, single-day events. The findings are all specific to the City of Toronto, where the offices and fire hall were located. The visit with the firefighters was the shortest and consisted of an informal
interview with the three firefighters that lasted less than one hour. The visit to the EMS dispatch
centre in Toronto lasted several hours and consisted of informal interviews with two EMS
dispatchers, observations to see the process of how the calls are received and dispatched, and
listening to incoming EMS calls with one EMS dispatcher. The visit to the local call centre lasted
several hours as well, and consisted of informal interviews with several call takers and the call
taker leader, observations to see the process of how the calls are handled and responded to, and
listening to incoming response calls with three different call takers.
Firefighters
Firefighters, in addition to paramedics, may be dispatched to a scene if it is unclear who will be able to reach the location first. Police are also dispatched for the same reason, and if there
is a possible dispute, accident involving vehicles or pedestrians, or other need for police services.
During the discussion, the firefighters mentioned that it is occasionally necessary to force entry
into a home or building by breaking down a door or window if there is no other apparent way to
enter and if a person in medical or emergency distress is presumed to be inside. In terms of older
adults, common call types may be for fires that occur in the kitchen, for example, because
something is burning on the stove. When they approach a scene, typically within a few seconds
they are able to tell if a person is responsive and breathing. This statement suggests that a firefighter’s primary concern on arrival is assessing the health status of the individual in question.
EMS Dispatchers
The EMS dispatchers follow a typical dialogue structure when receiving a call. Their basic
objectives are listed in order of importance:
1. Verify caller’s location;
2. Verify caller’s contact info;
3. Identify what is happening. The dispatcher’s initial concern is to determine if the person
experiencing the problem is conscious and breathing. Then, depending on what the caller
says, a set script is followed which suggests what the dispatcher should query next.
4. Categorize the call. Incoming calls are categorized according to the perceived level of
response required. This assessment may also help the dispatcher steer the communication or
dialogue in the appropriate direction based on the call category.
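The ordered objectives above can be illustrated with a small sketch. This is a toy model only: the function, its parameters, and the mapping of conditions to category labels are hypothetical assumptions for illustration, not the actual dispatch protocol or scripting software.

```python
# Illustrative sketch of the EMS dispatcher's ordered call-handling
# objectives. All names and the condition-to-category mapping are
# hypothetical; real dispatch protocols use detailed scripted queries.

def triage_call(location, contact, conscious, breathing):
    """Return (complete, category), following the objective order above."""
    # Objectives 1 and 2: verify the caller's location and contact info
    # before anything else; without them no response can be dispatched.
    if not location or not contact:
        return False, None
    # Objective 3: the initial concern is whether the person experiencing
    # the problem is conscious and breathing.
    # Objective 4: categorize the call by the perceived level of response
    # required (labels mirror Table H1; the mapping here is assumed).
    if not breathing:
        category = "Echo"    # assumed highest-acuity response
    elif not conscious:
        category = "Delta"
    else:
        category = "Alpha"   # assumed lower-acuity response
    return True, category
```

The assessed category could then steer the remainder of the dialogue, as the dispatchers described, by selecting which scripted questions to ask next.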
Table H1 provides an example of how the incoming emergency response calls are categorized.
This information was derived from informal discussions and is only provided as an example.
Specific details of what is considered within each call category would need to be re-verified.
Table H1: Emergency response call classifications based on the type of situation.
Call categories: Alpha, Bravo, Charlie, Delta, Echo.
Example response types:
- Send ambulance after designation (e.g., paramedic truck, ambulance with supplies)
- Send ambulance and fire truck (and police) (e.g., chest pain, breathing, pedestrian/cycle/motor accident, long fall, stab/gunshot)
- Send ambulance and police (e.g., unconscious, not breathing)
The EMS dispatcher’s tools consist of a headset with microphone and two computer monitors
with keyboard and mouse for data entry. For every incoming call, the dispatcher must be very
alert, and must multi-task. He/she is looking to see if information on the phone number is
available, as well as the caller’s location. Details of the incoming call situation are entered to provide information for the emergency responders and for logging; the dialogue script is also followed; and the call is classified. EMS dispatchers must make decisions on the fly very quickly
and try to respond to calls and dispatch assistance in a minimum amount of time.
Personal Emergency Response Call Centre Call Takers
The personal emergency response call taker’s setup is similar to the EMS dispatcher’s in that a headset with microphone is worn and the call taker sits in front of a computer monitor where
information is received and entered during the call. The call taker follows a basic dialogue script
for their opening utterance (described in the literature review in Chapter 1), and general
guidelines for the remainder of the dialogue in which their goal is to determine what kind of
response is being requested. The call centre protocol provides call takers with basic information
on what details to request in order to inform EMS. Also, the use of the protocol ensures a
minimum level of call standardization for both quality control and company liability. During the
on-site visit, several response calls were observed over part of the day; the vast majority of the calls were non-emergent (e.g., routine “check-in” calls). This is consistent with other literature reporting that the majority of such calls are not emergency calls (Hamill et al., 2009). Like the EMS dispatcher, the call taker is also required to multi-task
during the call and must remain alert. They have access to a client’s medical history (whatever
was provided by the subscriber), as well as information on possible non-EMS responders. While
a call is in progress, this information must be read and processed by the call taker, important
details about the call must be logged, and call takers may reference the dialogue guidelines while
also listening to the caller and making decisions on how to respond to the call itself.
Appendix I: Summary of Peer Reviewed Journal Papers
Young, V., & Mihailidis, A. (2013). The CARES Corpus: A database of older adult actor simulated emergency dialogue for developing a personal emergency response system. International Journal of Speech Technology, 16:55-73. (*work outlined in Chapter 4 of this dissertation)
Young, V., & Mihailidis, A. (2010). Difficulties in Automatic Speech Recognition of dysarthric speakers and the implications for speech-based applications used by the elderly: A literature review. Assistive Technology Journal, 22:99-112. (*review paper resulting from comprehensive exam paper)
Hamill, M., Young, V., Boger, J., & Mihailidis, A. (2009). Development of an automated speech recognition interface for personal emergency response systems. Journal of NeuroEngineering and Rehabilitation, 6(26). (*assisted in paper review, added background information, and assisted with final revisions)