Content Analyses of Personal Emergency Response Calls: Towards a More Robust Spoken Dialogue-Based Personal
Emergency Response System
by
Victoria Young
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Rehabilitation Sciences Institute in collaboration with the Institute of Biomaterials and Biomedical Engineering
University of Toronto
© Copyright by Victoria Young 2016
Content Analyses of Personal Emergency Response Calls:
Towards a More Robust Spoken Dialogue-Based Personal
Emergency Response System
Victoria Young
Doctor of Philosophy
Rehabilitation Sciences Institute in collaboration with the Institute of Biomaterials and Biomedical Engineering
University of Toronto
2016
Abstract
In an attempt to address identified usability barriers of traditional push-button-type personal
emergency response systems, a novel automated, intelligent, spoken dialogue-based personal
emergency response system is being developed. To design this system and make it more robust
for end-users, further information, not currently available in the research literature, is needed to
improve the artificial intelligence and spoken dialogue components of the system. Using a mixed
methods design, this dissertation describes three studies that derive this needed information from
real personal emergency response calls. The first study identified 185 keywords and phrases
spoken by system users; 17 categories for classifying keywords; and a personal emergency
situation model including caller type, call reason, and risk level. The second study expanded the
situation model to a personal emergency response model by adding a response-type
classification. Various statistical analyses were applied to response calls using call classifications
and select conversational measures. Significant trends in call data that could be used to pre-
inform the automated personal emergency response system dialogue manager of a call’s potential
outcome were identified. Words per minute and turn length in words were found to be possible
predictors of caller type. Emergency medical services were the predominant response
for both high risk and medium risk calls, while non-professional responders appeared mainly in
medium risk calls. Care provider and older adult callers were also found to employ different
conversational strategies when responding to the call taker. In the third study, a speech corpus
was developed containing younger and older adult, actor-simulated, spontaneous, read, and
emotional speech, including personal emergency response keywords, phrases, and scenarios.
Taken together, these research results will contribute towards the design and development of a
more robust automated, personal emergency response system for older adults to help them age-
in-place.
Keywords: aging-in-place, assistive technology, personal emergency response, personal
emergency response system, content analysis, speech corpus, older adult, spoken dialogue
system
Acknowledgments
This research has developed through the concerted efforts and contributions of many individuals,
groups and organizations. As the Nigerians say, “it takes a whole village to raise a child.”
Supervisory Committee: First and foremost, I would like to acknowledge and thank my
supervisor, Alex Mihailidis, for providing me with a unique opportunity to work on this research
project. I extend a great big ‘thank you’ to you for supporting and encouraging me throughout
these many years. Your dedication and hard work are inspiring to watch, and your quiet caring and
patience have been very much appreciated.
I would like to acknowledge and thank the other members of my supervisory committee,
Elizabeth Rochon, Gerald Penn, Tom Chau, and Willy Wong, for giving me guidance and
contributing their expertise, critiques, wisdom, and ideas to the project. Your input has helped to
give this research shape and a solid ground on which to grow.
Examiners: I would like to acknowledge and extend my thanks to my internal-external
examiner, Yana Yunusova and external-external examiner, Ann McKibbon, for contributing their
time and providing thoughtful comments and critique on my dissertation. Your efforts have
helped me to refine the project and have taught me how to better defend my work.
Research Participants: I would like to acknowledge and thank the 40 volunteer participants
who graciously lent their time and voices in the development of the CARES corpus. It is through
your willingness to participate in research studies such as these that great strides can be made in
research and technology development.
Non-Research Organizations: I would like to acknowledge and thank the Personal Emergency
Response Call Centre, which provided the real call recordings. These calls are the foundation of
this project, which could not have been completed without your willingness to collaborate.
I would like to acknowledge and thank the employees of the local Personal Emergency Response
Call Centre, the Toronto EMS Communications Centre, and Toronto Fire Station #343, who took
the time to explain their work setup, how calls are responded to, and how emergency situations
are assessed. These on-site visits have provided needed context in order to better understand the
tasks of the emergency responder and to get a feeling of the emotions involved when dealing
with live, personal emergency response situations.
Research Collaborators: I would like to acknowledge and thank the two keyword coders,
Rozanne Wilson and Tammy Siemenkowski, for their time and efforts spent in identifying and
categorizing keywords in this project. Additionally, I also would like to thank Tammy for further
categorizing the personal emergency response calls by risk level.
I would like to acknowledge and thank, Mark Chignell, for providing statistical guidance and
lending his expertise and ideas for this project, specifically Study 2.
I would like to acknowledge and thank the speech processors, Heidi Diepstra, Andrew Chignell,
Oleksandr Nishta, and Sanaz Alali, for their effort, time, and precision in processing sound files.
Research Groups: I would like to acknowledge the many research teams who provided
opportunities to listen to and discuss research within a supportive community environment.
These teams include: the Toronto Rehabilitation Institute – University Health Network’s iDAPT
Communication and Artificial Intelligence and Robotics Teams; and at the University of
Toronto, the Oral Dynamics Lab (in Speech-Language Pathology) (lenders of the sound
attenuation booths); the Sensory Communications Team (in the Institute of Biomaterials and
Biomedical Engineering) (lenders of the sound level meter); and the Computational Linguistics
Lab (in Computer Science).
Special thanks go to the members of my home lab, the Intelligent Assistive Technology and
Systems Lab, my research family. Each of the members, past and present, has helped create a
warm and supportive environment and atmosphere in which to spend copious amounts of time
talking about “intelligent assistive technology and systems research” as well as non-research life.
I would also like to thank the administrative staff at the Rehabilitation Sciences Institute, iDAPT
in the Toronto Rehabilitation Institute – University Health Network, and the Institute of
Biomaterials and Biomedical Engineering for always responding quickly to my questions with
happy smiles and friendly faces.
Cheerleaders: Last but not least, I would like to acknowledge and thank my family and friends,
and especially my husband and daughter for their patience and encouragement over these
doctoral research years. Your presence, listening, caring, assistance, kindness, thoughtfulness,
and laughter remind me each day that I am not alone on this journey.
Funding Organizations: I would like to acknowledge and thank my supervisor and the many
organizations that provided funding for this research project. These funding sources included: the
Canadian Institutes of Health Research Strategic Training Initiative in Health Research (CIHR-
STIHR) Fellowship in Health Care Technology and Place (FRN:STP 53911); the Natural
Sciences and Engineering Research Council (NSERC) Graduate Award (doctoral); the Toronto
Rehabilitation Institute-University Health Network’s TRI-OSOTF Student Scholarship Fund
(which receives funding under the Provincial Rehabilitation Research Program from the Ministry
of Health and Long-Term Care in Ontario. The views expressed in this dissertation do not
necessarily reflect those of the Ministry); the University of Toronto’s Rehabilitation Sciences
Institute; the University of Toronto (Open Scholarship and Doctoral Completion Award);
Engineers Canada-TD Meloche Monnex.
Preface
La Cuisine by Jules Renard (1864–1910)
Seigneur, s’il est vrai que vous seul soyez grand, ne réservez pas à ma vieillesse un château,
mais faites-moi la grâce de me garder, comme dernier refuge, cette cuisine avec sa marmite
toujours en l’air,
avec la crémaillère aux dents diaboliques,
la lanterne d’écurie et le moulin à café,
le litre de pétrole, la boîte de chicorée extra et les allumettes de contrebande,
avec la lune en papier jaune qui bouche le trou du tuyau de poêle,
et les coquilles d’oeufs dans la cendre,
et les chenets au front luisant, au nez aplati,
et le soufflet qui écarte ses jambes raides et dont le ventre fait de gros plis,
avec ce chien à droite et ce chat à gauche de la cheminée, tous deux vivants peut-être,
et le fourneau d’où filent des étoiles de braise,
et la porte au coin rongé par les souris,
et la passoire grêlée, la bouille bavarde et le gril haut sur pattes comme un basset,
et le carreau cassé de l’unique fenêtre dont la vue se paierait cher à Paris,
et ces pavés de savon,
et cette chaise de paille honnêtement percée,
et ce balai inusable d’un côté,
et cette demi-douzaine de fers à repasser, à genoux sur leur planche, par rang de taille,
comme des religieuses qui prient, voilées de noir et les mains jointes.
{English Translation}
Lord, if it is true that you alone are great, do not reserve a castle for my old age,
but grant me the grace to keep, as a last refuge, this kitchen with its cooking pot always in the air,
with the pot hanger, and its evil teeth,
the stable lantern and coffee grinder,
the litre of oil, the box of “extra” chicory and the contraband matches,
with the yellow paper moon covering the stove pipe hole,
and the egg shells in the ash,
and firedogs with shining fronts and flattened nose,
and the bellows that spread stiff legs, with big belly folds,
with this dog on the right and this cat on the left of the fireplace, both alive, perhaps,
and the furnace where ember stars spin,
and the door with the corner gnawed by mice,
and the pockmarked colander, the talkative kettle, and the grill, high on its legs like a basset,
and the single broken window pane with a view one would pay dearly for in Paris,
and the cobblestones of soap,
and this chair of straw, honestly pierced,
and this broom, hard worn on one side,
and this half dozen of irons, kneeling on their boards, arranged by size, like the nuns who pray,
veiled in black, with clasped hands.
Situating the Work within Rehabilitation Science
The field of Rehabilitation Science is defined as “an integrated science dedicated to the study of
human function and participation and its relationship to health and well-being” (Rehabilitation
Sciences Institute Handbook, 2014/2015, p. 4). “Using basic and applied methods, the science is
focused on phenomena at the level of the cell, person, family, community, or society to develop
and evaluate theories, models, processes, measures, interventions and policies to prevent,
reverse, or minimize impairments; enable activity; and facilitate participation” (Graduate
Department of Rehabilitation Science [GDRS] Handbook, 2007, p. 4). Within the realm of
rehabilitation science, the research completed as part of this dissertation contributes to the area of
Rehabilitation Technology Sciences.
The research described in this dissertation focuses on deriving new knowledge from real personal
emergency response calls. Keywords and phrases were identified using keyword categories; a
method for characterizing personal emergency situations and emergency response was
developed; and patterns or trends in call conversations were examined based on various
conversational and verbal measures. As well, audio recordings of spoken keywords, phrases, and
simulated personal emergency response situation scenarios by younger and older adult actors
were collected to create a speech database tool. Together, these research outcomes will
help to advance knowledge in the area of personal emergency response as well as further the
design and development of an automated, artificially intelligent, spoken dialogue-based personal
emergency response system contained within a smart home monitoring system called the
HELPER.
The HELPER technology as a whole is considered a rehabilitation intervention because through
its use, individuals will be able to access appropriate medical care or obtain emergency attention
when needed. Access to immediate medical attention will help to prevent and minimize
impairment resulting from “waiting too long” for treatment or care. Minimizing impairment will
ultimately help to facilitate the older adult’s participation in daily living activities and will
support their aging-in-place.
Table of Contents
ACKNOWLEDGMENTS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ IV
PREFACE ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ VII
TABLE OF CONTENTS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ X
LIST OF TABLES ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ XVII
LIST OF FIGURES ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ XIX
LIST OF ACRONYMS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ XXIII
CHAPTER 1 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 1
1 INTRODUCTION AND LITERATURE REVIEW ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 1
1.1 Dissertation Introduction ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 1
1.2 Dissertation Overview ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 2
1.3 Introduction to the Problem ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 3
1.3.1 Aging‐in‐Place with Assistive Technology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 3
1.3.2 A Novel PERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 5
1.3.3 The First HELPER Prototype ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 7
1.3.4 Research Rationale and Problem Summary ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 8
1.3.5 Research Response ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 8
1.4 Literature Review ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 9
1.4.1 PART I: PERS Technology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 9
1.4.1.1 Health Challenges for the Older Adult ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 9
1.4.1.2 Personal Emergency Response Technology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 10
1.4.1.3 PERS Technology Basics ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 11
1.4.1.4 PERS Use by Older Adults ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 12
1.4.1.5 The HELPER System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 14
1.4.1.6 The HELPER Spoken Dialogue System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 16
1.4.2 PART II: Human to Computer Spoken Dialogue Interactions ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 20
1.4.2.1 Variables Affecting ASR Recognition Accuracy ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 20
1.4.2.2 The Older Adult Voice and ASR ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 22
1.4.2.3 The Older Adult User and Spoken Dialogue Systems ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 23
1.4.2.4 Spoken Dialogue Strategy ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 24
1.4.3 PART III: Human to Human Emergency Dialogues ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 25
1.4.3.1 Emergency Response Call Basics ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 25
1.4.3.2 Emergency Response Call Structure ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 26
1.4.4 Literature Review Summary ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 28
1.5 Research Purpose and Objectives ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 29
CHAPTER 2 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐32
2 IDENTIFICATION OF KEYWORDS AND PHRASES SPOKEN BY CALLERS IN
PERSONAL EMERGENCY RESPONSE CALLS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐32
2.1 Prologue ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 32
2.2 Abstract ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 32
2.3 Introduction ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 33
2.3.1 Need for a New PERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 33
2.3.2 The HELPER System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 34
2.3.3 HELPER Prototype Testing ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 34
2.3.4 Designing for the End‐User ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 35
2.3.5 Study Objective and Significance ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 36
2.3.6 Background ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 36
2.3.6.1 An Automated and Intelligent HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 36
2.3.6.2 The HELPER Communication Module ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 38
2.3.7 Study Focus as Applied to the HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 41
2.4 Methodology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 42
2.4.1 Research Design Method ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 42
2.4.1.1 Method Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 43
2.4.1.2 Method Implementation ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 43
2.4.1.3 Method Approaches ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 43
2.4.2 Research Design Details ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 44
2.4.2.1 Research Population ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 44
2.4.2.2 Research Setting ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 44
2.4.2.3 Data Collection ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 45
2.4.2.4 Data Processing ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 46
2.4.2.5 Data Analysis ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 47
2.5 Results‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 52
2.5.1 Extraction of keywords ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 52
2.5.2 Keyword Results from Coders ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 54
2.5.3 Characterizing the Personal Emergency Situation ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 57
2.5.3.1 Proposed PES Characteristics ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 57
2.5.3.2 PES ‐ Caller Type ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 58
2.5.3.3 PES ‐ Risk Level ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 58
2.5.3.4 PES ‐ Call Reason ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 59
2.5.3.5 PES ‐ Communication Ability ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 59
2.5.4 The Personal Emergency Situation (PES) Model ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 59
2.5.5 Classifying the Personal Emergency Response Calls ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 60
2.5.6 Reduction of Keyword List ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 61
2.5.7 Identification of Key PES Phrases ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 67
2.5.8 Keywords in Various PESs ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 68
2.6 Discussion ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 69
2.6.1 Word Categories ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 69
2.6.2 Coding Methods ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 70
2.6.3 Full Keyword List Identification ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 70
2.6.4 The PES Model ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 71
2.6.5 Small Keyword List Identification ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 72
2.6.6 PES Phrases ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 73
2.6.7 Application to HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 74
2.6.8 Study Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 75
2.7 Conclusion‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 76
CHAPTER 3 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐77
3 IDENTIFICATION OF CONVERSATIONAL TRENDS IN PERSONAL EMERGENCY
RESPONSE CALLS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐77
3.1 Prologue ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 77
3.2 Abstract ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 77
3.3 Introduction ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 78
3.3.1 Need for a New PERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 78
3.3.2 The HELPER System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 78
3.3.3 HELPER Prototype Testing ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 79
3.3.4 Older Adults and Spoken Dialogue Systems ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 80
3.3.5 Study Objective & Research Significance ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 80
3.3.6 Background ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 81
3.3.6.1 The HELPER System ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 81
3.3.6.2 The HELPER Communication Module ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 82
3.3.6.3 Human to Machine Spoken Dialogue Systems ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 85
3.3.7 Study Focus as Applied to the HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 86
3.4 Methodology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 87
3.4.1 Research Design Method ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 87
3.4.1.1 Method Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 88
3.4.1.2 Method Implementation ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 89
3.4.1.3 Method Approaches ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 89
3.4.2 Research Design Details ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 90
3.4.2.1 Research Population ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 90
3.4.2.2 Research Setting ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 90
3.4.2.3 Data Collection ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 90
3.4.2.4 Data Processing ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 91
3.4.2.5 Data Analysis ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 92
3.5 Results‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 98
3.5.1 The Conventional Conversational Analysis ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 98
3.5.1.1 Two Main Response Types ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 98
3.5.1.2 A Closer Look at Response Types ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 98
3.5.1.3 The Personal Emergency Response (PER) Model ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 100
3.5.2 Conversational Analysis using PER Categories ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 101
3.5.2.1 Descriptive Statistics ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 101
3.5.2.2 Call Breakdown Using PER Classifications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 101
3.5.2.3 Breakdown of Response Types ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 103
3.5.3 Conversational Analysis using Conversational Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 104
3.5.3.1 Analysis of Verbal Ability Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 105
3.5.3.2 Analysis of Conversational Structure Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 110
3.5.3.3 Analysis of Timing Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 114
3.6 Discussion ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 116
3.6.1 Personal Emergency Response Call Trends ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 116
3.6.2 Verbal Ability Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 118
3.6.3 Conversational Structure Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 119
3.6.4 Timing Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 121
3.6.5 Study Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 122
3.6.6 Future Research ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 122
3.7 Conclusion‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 123
CHAPTER 4 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐124
4 THE CARES CORPUS: A DATABASE OF OLDER ADULT ACTOR-SIMULATED
EMERGENCY DIALOGUE FOR DEVELOPING A PERSONAL EMERGENCY RESPONSE
SYSTEM ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐124
4.1 Prologue ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 124
4.2 ABSTRACT ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 125
4.3 Introduction ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 125
4.3.1 Background & Motivation ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 127
4.3.1.1 The Traditional PERS Technology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 127
4.3.1.2 Re‐designing the PERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 128
4.3.1.3 The Application ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 128
4.4 Methodology ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 130
4.4.1 Application Context and Target Population ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 130
4.4.2 Speech Corpus Design Specifications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 131
4.4.2.1 Live Emergency Calls ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 131
4.4.2.2 Phonetically Balanced Sentences ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 133
4.4.2.3 Spontaneous Speech Sample ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 133
4.4.2.4 Simulated Vocal Expression ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 133
4.4.3 Participant Recruitment ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 134
4.4.4 Recording Procedure ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 135
4.4.4.1 Recording Environment ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 139
4.4.4.2 Recording Equipment ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 140
4.5 Results‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 141
4.5.1 Participant Recruitment ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 141
4.5.2 Speech Recording Summary ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 142
4.6 Discussion ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 142
4.6.1 The Age Effect ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 142
4.6.2 Recording Difficulties ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 142
4.6.3 Design Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 143
4.6.4 Background Noise ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 144
4.6.5 Implementing the CARES Corpus ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 144
4.6.6 Other Applications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 144
4.7 Conclusions ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 145
CHAPTER 5 ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 146
5 DISCUSSION & CONCLUSIONS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 146
5.1 Discussion ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 146
5.2 Study Highlights ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 147
5.2.1 Principal Findings from Study 1: Identification of Keywords and Phrases ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 147
5.2.2 Principal Findings from Study 2: Identification of Conversational Trends ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 148
5.2.3 Principal Findings from Study 3: Creating the CARES Corpus ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 150
5.2.4 Data Interpretation Highlights ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 152
5.2.4.1 Identification of Keywords & Phrases ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 152
5.2.4.2 Statistical Analyses of Conversational Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 153
5.2.4.3 Actor Simulated PESs‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 154
5.3 Contributions to Knowledge ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 155
5.3.1 Original Research with Response Call Recordings ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 155
5.3.2 Applying Research Findings to the HELPER ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 155
5.3.3 The CARES Corpus ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 162
5.4 Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 163
5.4.1 Study 1: Keyword and Phrase Identification ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 163
5.4.2 Study 2: Statistical Measures ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 164
5.4.3 Study 3: Creating the CARES Corpus ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 164
5.4.4 PES and PER Call Classifications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 165
5.4.5 Methodology Limitations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 165
5.5 Future Research ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 165
5.5.1 Supporting the ASR ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 165
5.5.2 Developing the Dialogue ‐ Assessing Patterns in Response Call Conversations ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 166
5.5.3 HELPER Field Testing ‐ Future Proposed Studies ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 168
5.5.3.1 Developing the HELPER Speech Handler ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 168
5.5.3.2 Developing the HELPER Dialogue and Response Handlers ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 170
5.5.3.3 Testing the HELPER Module ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 170
5.6 Implications ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 171
5.7 Final Remarks ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 172
BIBLIOGRAPHY ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 173
APPENDIX A: COMMON OLDER ADULT CONDITIONS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 186
APPENDIX B: ORIGINAL HELPER DIALOGUE STRATEGY ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 187
APPENDIX C: SMALL KEYWORD VOCABULARY SET ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 188
APPENDIX D: UNIQUE KEYWORD OCCURRENCES ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 193
APPENDIX E: QUESTIONS FOR PARTICIPANT ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 194
APPENDIX F: KEYWORDS AND PHRASES LIST ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 195
APPENDIX G: EMERGENCY SCENARIOS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 200
APPENDIX H: EMERGENCY RESPONSE SERVICES VISITS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 206
APPENDIX I: SUMMARY OF PEER REVIEWED JOURNAL PAPERS ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ 209
List of Tables
Chapter 1
Table 1-1: Physical changes in the older adult and the possible effects on speech expression. ... 22
Table 1-2: Emergency Response Call Discourse and Speech Acts. ............................................. 26
Chapter 2
Table 2-1: Various distinct approaches to content analysis. ......................................................... 44
Table 2-2: The word categories derived from words extracted from response call transcripts. ... 53
Table 2-3: Summary of coding results for Coders 2 and 3 based on keywords and category
matching. ....................................................................................................................................... 54
Table 2-4: Breakdown of the keywords identified from Coder 3. ................................................ 55
Table 2-5: Summary of phrase results for Coder 3 with agreement of keyword selection. ......... 56
Table 2-6: Breakdown of the keywords identified from Coder 3. ................................................ 56
Table 2-7: Initial exclusion criteria for reducing keyword list. .................................................... 61
Table 2-8: Definitions and examples of the word inclusion criteria. ............................................ 65
Table 2-9: Definitions and examples of the final word exclusion criteria. ................................... 65
Table 2-10: A breakdown of the phrase categories included in the CARES corpus selected by
Coder 3, sorted by word categories. .............................................................................................. 68
Table 2-11: The number of keywords identified by response call classification. ........................ 69
Table 2-12: Example of how an incoming statement might be deciphered by the semantic
analyser ......................................................................................................................................... 74
Chapter 4
Table 4-1: Important aspects of emergency call transcript analysis applied to speech corpus
design specifications. .................................................................................................................. 131
Table 4-2: Summary of speech sample recorded and general recording details. ........................ 135
Table 4-3: Emergency scenario type, risk level and scenario detail. .......................................... 138
Table 4-4: Example of data combination arranged for each participant indicated. .................... 139
Table 4-5: Participants by Age Group and Gender ..................................................................... 141
Table 4-6: Participants by Age Range ........................................................................................ 141
Chapter 5
Table 5-1: Example of how the original HELPER initial dialogue strategy and response call
classifier may work with incoming user responses. .................................................................... 160
Table 5-2: Example of how initial dialogue from a high alert dialogue strategy and response call
classifier may work with incoming user responses. .................................................................... 160
Table 5-3: Example of how initial dialogue from a medium alert dialogue strategy and response
call classifier may work with incoming user responses. OA = Older adult. ............................... 161
List of Figures
Chapter 1
Figure 1-1: Differences between traditional PERS and the HELPER. -------------------------------- 6
Figure 1-2: Comparison of pathway to emergency response between traditional PERS and
HELPER. ------------------------------------------------------------------------------------------------------ 6
Figure 1-3: Pathway to personal emergency response using the traditional push-button PERS. - 11
Figure 1-4: Pathway to personal emergency response using the HELPER System. --------------- 14
Figure 1-5: Main components of the HELPER System. ----------------------------------------------- 15
Figure 1-6: Sub-sections and functional components of the HELPER Communication Module 18
Figure 1-7: The ASR component of the HELPER Communication Module. ---------------------- 18
Figure 1-8: Inside the dialogue handler component of an SDS. -------------------------------------- 19
Figure 1-9: The internal components of the response handler within the SDS. -------------------- 20
Chapter 2
Figure 2-1: Pathway to personal emergency response using the traditional push-button PERS. - 37
Figure 2-2: Pathway to personal emergency response using the HELPER System. --------------- 37
Figure 2-3: Sub-sections and functional components of the HELPER Communication Module 39
Figure 2-4: The ASR component of the HELPER Communication Module. ---------------------- 40
Figure 2-5: Possible data application areas within the HELPER communication module along the personal emergency response pathway. ------------------------------------------------ 41
Figure 2-6: Diagram of the process of exploratory sequential mixed methods design (Clark &
Creswell, 2011). -------------------------------------------------------------------------------------------- 42
Figure 2-7: Flow diagram illustrating the methodology followed to analyse the calls and
complete study objectives. -------------------------------------------------------------------------------- 47
Figure 2-8: Process of keyword identification and categorization from Coder 1. ------------------ 49
Figure 2-9: Process of keyword identification and categorization from Coders 2 and 3. --------- 50
Figure 2-10: Examples of differences between risk levels. ------------------------------------------- 58
Figure 2-11: Model of a PES ------------------------------------------------------------------------------ 60
Figure 2-12: Diagram outlining the decision process for selecting key words using the first word
focus set. ----------------------------------------------------------------------------------------------------- 63
Figure 2-13: Diagram outlining the decision process for selecting key words using the second
word focus set. ---------------------------------------------------------------------------------------------- 64
Figure 2-14: A diagram showing the pathway to personal emergency response including the PES model and categories within the classifier unit within the HELPER System. ---------------- 72
Chapter 3
Figure 3-1: Pathway to personal emergency response using the HELPER System. --------------- 81
Figure 3-2: Sub-sections and functional components of the HELPER Communication Module 83
Figure 3-3: Inside the Speech Informant sub-component of the Speech Handler. ----------------- 83
Figure 3-4: Inside the dialogue handler component of an SDS. -------------------------------------- 84
Figure 3-5: The internal components of the response handler within the SDS. -------------------- 85
Figure 3-6: Diagram of the pathway to personal emergency response using the HELPER with the
addition of the ‘conversational measures’ and ‘timing’ features added. ---------------------------- 87
Figure 3-7: Diagram of the process of exploratory sequential mixed methods design (Clark &
Creswell, 2011). -------------------------------------------------------------------------------------------- 88
Figure 3-8: This flow diagram illustrates how calls were analysed and how outcomes were and
could be applied. -------------------------------------------------------------------------------------------- 93
Figure 3-9: The PES model characterized by caller type, risk level, and call reason. ------------- 94
Figure 3-10: The personal emergency response (PER) model. -------------------------------------- 101
Figure 3-11: Older Adult and Care Provider responders requested during a response call. ----- 102
Figure 3-12: Boxplots of verbal ability measures broken down by risk levels for caller and
speaker types. ---------------------------------------------------------------------------------------------- 106
Figure 3-13: Box plots of conversational measures broken down by risk levels for caller and
speaker types. ---------------------------------------------------------------------------------------------- 111
Figure 3-14: Box plots of timing measures broken down by risk levels for caller and speaker
types. -------------------------------------------------------------------------------------------------------- 115
Chapter 4
Figure 4-1: The CARES Corpus application within the context of a PERS. ---------------------- 129
Figure 4-2: Sample screen shot of emergency phrases and words presented to the participant
during speech recording. The participants were provided with screen prompts to indicate how
the word was to be spoken. ------------------------------------------------------------------------------- 137
Figure 4-3. Participant room setup in sound attenuating booth. ------------------------------------- 140
Figure 4-4. Experimenter room setup in sound attenuating booth. ---------------------------------- 140
Chapter 5
Figure 5-1: Diagram of the internal components of the HELPER Communication Module. ---- 157
Figure 5-2: Diagram showing a possible response call classifier setup based on the study
findings. ----------------------------------------------------------------------------------------------------- 159
Figure 5-3: A flow diagram illustrating the methodology followed to analyse the calls and
complete study objective. --------------------------------------------------------------------------------- 167
Figure 5-4: The pathway to personal emergency response with “dialogue acts” applied to help
the HELPER. ----------------------------------------------------------------------------------------------- 168
List of Acronyms
General Acronyms
ASR: automatic speech recognition/recognizer (p.2)
CARES: Canadian Adult Regular and Emergency Speech (p.48)
EMS: emergency medical services (p.12)
HELPER: Health Evaluation Logging and Personal Emergency Response (System) (p.5)
PER: personal emergency response (p.92)
PERS: personal emergency response system (p.1)
PES: personal emergency situation (p.2)
SALT: Systematic Analysis of Language Transcripts (p.46)
SDS: spoken dialogue system (p.17)
Statistical Measure Acronyms
DF: discriminant function (p.108)
MZW: proportion of total words with mazes (p.104)
NQ: number of questions (p.109)
NRQ: number of responses to questions (p.109)
NS: number of statements (p.109)
OWU: number of one word utterances (p.109)
RM_MANOVA: repeated measures, multivariate analysis of variance (p.104)
S.D.: standard deviation (p.100)
ST: speaker turns (p.114)
TNL: turn length in words (p.104)
UPM: utterances per minute (p.94)
WPM: words per minute (p.194)
Chapter 1
1 Introduction and Literature Review
1.1 Dissertation Introduction
“It is not how old you are, but how you are old.” - Jules Renard
People age differently depending on their environment, access to financial and social supports, health, education, and thinking. Although the actual ‘age’ at which one might consider himself or herself an “older adult” varies, no one who lives long enough can avoid becoming “elderly.” In the century since this phrase was written, society’s ability to provide health and long-term care support for its elderly population has evolved tremendously. However, despite vast technological advances and societal changes, many elderly people still face challenges obtaining adequate health care, especially those individuals with multiple, long-term, chronic, and complex care problems coupled with mobility difficulties. Consequently, there exists a desire, a need, and an overall benefit for the elderly to stay healthy for as long as possible and to live independently within their homes and communities.
To help support aging-in-place, a number of assistive technologies have been designed and developed for the older adult population, including aids for general mobility, communication, and cognition. Under the communication umbrella, assistive technologies called personal emergency response systems (PERS) were developed to help individuals living at home with higher medical risk and/or mobility difficulties to contact emergency assistance when needed, any time of the day or night, typically by pushing a body-worn “button” activator.
Despite many noted benefits from using PERS technology, a majority of elderly people resist
PERS adoption and use. Prior research notes that this resistance results from barriers that span
the physical, social, and psychological realms. In an attempt to address these issues and to ‘re-
think’ how PERS technology can be better designed for the elderly cohort, the concept of a
novel, smart home monitoring system called the HELPER was devised. This system incorporates
within it an automatic, artificially intelligent, spoken-dialogue based PERS (herein called the
automated PERS”). The main premise behind the HELPER is that such a system will be able to automatically detect adverse events (e.g., a fall) visually or can be directly activated using spoken word(s) (e.g., a cry for help). The HELPER would interact with the end-user as a first responder and would allow users to contact their desired responder directly (or cancel a call) without going through an operator. The automatic activation and speech-based method of communication could also eliminate the need to wear a body-worn activator.
Preliminary testing of a HELPER prototype, by prior researchers, with younger adults using
limited vocabulary (e.g., yes/no responses) demonstrated the feasibility of using automatic
speech recognition (ASR) to communicate with a live user during a simulated personal
emergency situation. However, in order to bring this system to a state in which it can be tested
with real end-users in an actual emergency situation, the communication component of the
HELPER requires further design, development, and testing. Information is needed on what
actually happens during a personal emergency situation (PES) and the personal emergency
response call (herein also called the “response call” or “call”). However, no research literature
could be identified that specifically describes or characterizes what happens during a response
call and/or the response call conversation. Furthermore, in terms of training and testing the
HELPER communication module components (e.g., the ASR), few speech databases featuring older adult speakers are available, and none of those found contained Canadian English examples of emergency situations of sufficient recording quality.
It is hypothesized that the knowledge and data required to further the design of the HELPER
system can be obtained from real personal emergency response calls. Therefore, the main goal
and focus of this dissertation was to derive knowledge and data from analyses of real response
calls using the traditional push-button PERS and to identify ways in which this information could
be applied to help further the design and development of the new automated PERS.
1.2 Dissertation Overview
This dissertation contains five chapters based on three research studies. Chapter 1 provides an
introduction to the dissertation and the research problem, presents a review of the literature, and
summarizes the research purpose and objectives. Chapter 2 describes the first study that focuses
on the isolation of keywords and phrases from response calls, the categorization of keywords,
and the development of a model to characterize PESs. Chapter 3 describes the second study that
focuses on characterizing and identifying trends in response calls and response call conversations
that may be built into the automated PERS intelligence for predicting a user’s desired response
when calling for help. Chapter 4 describes the third study that focuses on the design and
development of a spoken speech corpus (the CARES corpus) that may be used for HELPER
training and testing. Chapter 5 is the last chapter and presents a summary for each of the three
studies, discusses contributions of the research work, proposes future work, and ends with a final
conclusion. This dissertation is presented in manuscript style. The studies in Chapters 2 and 3 are
being prepared for submission to peer-reviewed journals. The study in Chapter 4 has been
published in a peer-reviewed journal. Due to the nature of the manuscript format, there may be
some overlap in the content presented, specifically in the introduction, background, and
methodology sections of Chapters 2 to 4.
1.3 Introduction to the Problem
The following introduction will present the research rationale and problem in further detail,
summarize relevant literature, and outline the main research purpose and objectives.
1.3.1 Aging-in-Place with Assistive Technology
The mind is what sets humans apart from other animal species. The human mind’s ability to
communicate through language, to reason, imagine, and imitate at a high level is far superior to
that of any other living creature. It is the human mind that has conceived of tools and techniques
to advance technology, medical treatment, social development, and other initiatives in work and
life that have all combined to extend average human life expectancy into the eighties (age in
years) within the last century. It is also this mind that has led to the development of assistive
technologies to both assist and facilitate humans in their everyday lives (Childress, 2003; Mann,
Ottenbacher, Fraas, Tomita, & Granger, 1999; McCreadie & Tinker, 2005). Yet despite its
capabilities, even the mind is not able to prevent the inevitable decline of the human body or
human functioning, whether physical, cognitive, or both, as a consequence of injury, chronic
disease or advanced age. Surveys show that one of the greatest fears about aging for the older
human is the risk of losing one’s independence as a result of ill health, increasing frailty, and/or
the decline of the mind’s faculties (Disabled Living Foundation, 2009; News Agencies, 2014).
The success in extending human life combined with a lower rate of birth (Bernstein, 1999) has
led to a demographic shift resulting in population aging (World Health Organization, 2011).
With a growing aging population, there is mounting concern about how to handle the increasing
size of a potentially higher-maintenance, higher-risk, and higher-cost older adult cohort, at least
in terms of health care provision (Longino, 1994; World Health Organization, 2011). Herein lies
the desire and need to stay healthy for as long as possible and to age-in-place within one’s own
home and community. Research studies show that seniors, those individuals over 65 years of
age, who remain longer in their communities and who ‘age in place’ tend to age more
successfully (World Health Organization, 2011). They live longer and with a higher self-
perceived quality of life compared to those who age “out of place” in institutions such as long-
term care, nursing homes, or hospitals (Ramage-Morin, 2005).
One way to achieve the goal of aging-in-place is with the use of assistive technologies. An
assistive technology has been defined as “any device or system that allows an individual to
perform a task that they would otherwise be unable to do, or increases the ease and safety with
which the task can be performed (Cowan, Turner-Smith, & others, 1999).” In particular, PERS
assistive technology was developed specifically to provide individuals at higher risk for medical
complications with an easy way to communicate their need for emergency assistance any time of
the day or night when home alone. The goal was that, by receiving care quickly, the user could prevent or alleviate the negative effects that can arise when care is received too late (e.g., after a long lie, a heart attack, or a stroke). With the miniaturization of technologies and advances in computational
power, personal emergency response technologies are now melding into the next generation of
‘smart home’ technologies (Hessels, Le Prell, & Mann, 2011). An earlier definition of a smart
home or “smart housing” was “the electronic and computer-controlled integration of many of the
devices within the home (Cowan et al., 1999).” More recently a ‘smart home’ has been defined
as a “residence equipped with technology that enhances the safety of patients at home and
monitors their health conditions” (Chan, Campo, Estève, & Fourniols, 2009; Demiris, Hensel,
Skubic, & Rantz, 2008). These smart home systems would not only provide immediate
emergency assistance shortly after an adverse event but would also include the ability for
continuous home and health monitoring. By continuously monitoring the user within the home,
the need for possible medical intervention prior to any event even occurring may be possible.
The ultimate goal would be to prevent a personal emergency situation from happening
altogether.
Researchers and technology developers alike are making a concerted effort to further develop
rehabilitative assistive technologies like the PERS to make them more ‘age-friendly’. The hope
is that these new technologies will be widely adopted by older adults and that they will be
accessible, usable, and effective at helping to keep the older adult population healthy, mobile,
and living independently longer within their homes and communities.
1.3.2 A Novel PERS
In 2004, in the Intelligent Assistive Technology and Systems Lab (IATSL) at the Rehabilitation
Sciences Institute at the University of Toronto, research began on a novel smart home
technology that also integrated an automated PERS, called the HELPER or ‘Health Evaluation
Logging and Personal Emergency Response’ System (henceforth called the HELPER). As a
smart home technology, the HELPER concept was conceived to monitor the home for adverse
events by visually tracking a user’s movements and positions. If an adverse event such as a fall
was detected, the HELPER would attempt to communicate with the user to determine if
assistance was required. Strictly focusing on the PERS aspect of the HELPER, one of the main
limitations of the traditional push-button PERS that the HELPER seeks to overcome is the need
to physically wear a button actuator to initiate a response call. When using the traditional PERS,
not only does the individual have to remember to wear the ‘button,’ but he or she must also
decide to wear the button (Porter, 2005). Failing to wear the button essentially renders the system useless during a PES. Unwillingness to wear the button, or to use the system at all, leads to a failure of technology adoption. Figure 1-1 illustrates the main interface differences between the
traditional PERS (see older adult to the left of the arrow) and the automated PERS within the
HELPER (see images to the right of the arrow). The HELPER unit (an early prototype version)
is shown mounted on the ceiling in the lab.
By harnessing the human’s unique ability to communicate through language, the HELPER
concept plans to use speech as one of two methods a caller can use to initiate a response call.
Using speech activation would eliminate the need to wear a button actuator on the body and, as speech is a natural form of communication, may be seen as more amenable to the older adult user than wearing a button actuator 24 hours a day. The other method of response call initiation would be automatic, through the HELPER’s vision module.
Figure 1-1: Differences between traditional PERS and the HELPER. (Credits: older adult image obtained from WWW, drawing from Intelligent Assistive Technology and Systems Lab)
Figure 1-2 illustrates a flow diagram with the traditional PERS pathway on the left (solid purple
arrows) and the HELPER pathway on the right (dotted orange arrows).
Figure 1-2: Comparison of pathway to emergency response between traditional PERS and HELPER.
Following the HELPER pathway, if an adverse event is detected, the HELPER’s vision module
would automatically activate the communication module. In contrast to the traditional PERS
pathway, no active initiation is required from the user when an event is detected automatically. If
the individual does not respond to the HELPER verbally when assistance is deemed necessary,
the system default would be to automatically contact a live person for help. In the absence of a
personal first responder the personal emergency response call centre’s call taker (herein called
the “call taker”) is the default. However, if the individual does not want assistance, he or she
would have the autonomy to cancel the call before a live person is contacted. On the other hand,
in a situation where no adverse event has been detected or the individual has changed his/her
mind and has decided that assistance would be beneficial after all, he or she could still initiate a
response call using a simple spoken keyword or phrase.
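The branching just described can be summarized in a small decision routine. This is an illustrative sketch only; all names (e.g., `helper_pathway`) are hypothetical and do not come from the HELPER implementation.

```python
# Illustrative sketch of the HELPER call-initiation branching described
# above. All names are hypothetical, not actual HELPER code.
from typing import Optional

def helper_pathway(vision_event: bool, spoken_keyword: bool,
                   user_reply: Optional[str], user_cancelled: bool) -> str:
    """Return the action the HELPER would take in one situation."""
    if not (vision_event or spoken_keyword):
        return "idle"            # no adverse event detected, no keyword spoken
    if user_cancelled:
        return "call cancelled"  # user autonomy: cancel before a live person is reached
    if user_reply is None:
        # No verbal response: default to a live person (the call taker,
        # in the absence of a personal first responder).
        return "contact call taker"
    return "contact responder: " + user_reply

assert helper_pathway(True, False, None, False) == "contact call taker"
assert helper_pathway(False, True, "ambulance", False) == "contact responder: ambulance"
assert helper_pathway(True, False, "son", True) == "call cancelled"
```

The asserts illustrate the three cases from the text: the silent-user default, keyword-initiated calls, and the user’s option to cancel.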
1.3.3 The First HELPER Prototype
To demonstrate the feasibility of using ASR technology within a PERS, previous researchers
developed and tested a preliminary HELPER prototype on young adults in a controlled
laboratory environment (Hamill, Young, Boger, & Mihailidis, 2009). Details of the study
described in this section are summarized from Hamill et al. (2009; 2005). The ASR used in the
test was limited to the recognition of ‘yes’ and ‘no’ word forms (e.g., yah, nuh). The acoustic
model used consisted of speech samples from male and female adults speaking randomly
generated sequences of words (AN4 from Carnegie Mellon University). The dialogue was
limited to asking close-ended questions with instructions to respond with a “yes” or “no” answer.
See Appendix B for a flow diagram of this dialogue. In this preliminary lab test, the
communication module was initiated by fall detection and not via keyword initiation. With the
HELPER prototype mounted on the ceiling of the room, speech input from users was recognized
correctly 79% of the time, and the desired responses were correctly identified in all twelve
cases tested (Hamill et al., 2009). The success in identifying desired responses despite
recognition errors was attributed to the fact that the dialogue required users to confirm their
response. Thus, the system was able to recover from incorrect recognition in the confirmation
stage.
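The confirmation-based recovery described by Hamill et al. (2009) can be illustrated with a minimal loop; the turn representation below is an assumption for illustration, not the prototype’s actual design.

```python
# Minimal illustration of confirmation-based error recovery: the system
# accepts an answer only after the user confirms it, so a misrecognition
# on one turn is corrected on the next. Simulated turns, not prototype code.

def confirm_loop(recognized_turns):
    """recognized_turns: (answer, confirmation) pairs as the ASR heard them.
    Returns the first answer the user confirmed, else None."""
    for answer, confirmation in recognized_turns:
        if confirmation == "yes":   # user confirmed the recognized answer
            return answer
        # otherwise discard the answer and re-ask on the next turn

# A misrecognized first turn is caught at the confirmation stage:
turns = [("no", "no"),     # ASR heard "no"; user rejects, system re-asks
         ("yes", "yes")]   # second attempt recognized and confirmed
assert confirm_loop(turns) == "yes"
```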
1.3.4 Research Rationale and Problem Summary
The researchers recognized that younger adults speaking in a “calm and
casual” controlled laboratory environment were not truly representative of a real PES involving
older adults in a home environment. They proposed several recommendations for future studies
including: (1) expansion of the system vocabulary for words other than “yes” and “no” (e.g.,
ambulance, help); (2) ASR training with older adult speech in context; (3) investigation of the
potential usefulness of statistical modelling methods for planning and decision making; (4)
identification of additional dialog states; (5) ‘in context’ system testing with either live or
recorded voices in PESs including older adult speakers; and (6) optimization of microphone
input parameters (Hamill et al., 2009).
Essentially, aside from the last recommendation, which is a technology optimization problem,
designing and developing a more robust HELPER requires an understanding of how actual
end-users in PESs will respond during a response call conversation, what their needs
are in different PESs, and what types of PESs might arise. Furthermore, in order to
perform ASR training and system testing a database of speech samples would be required,
specifically including older adult speakers in personal emergency type situations.
To date, no research literature could be found that examines how personal emergency response
call conversations unfold in detail. If the technology developer is unaware of how PERS users
respond during PESs and how help is requested, how can the automated PERS be designed and
developed to work well in a real-life situation with actual end-users?
1.3.5 Research Response
This research focuses on the analyses of live personal emergency response calls for the purpose
of deriving knowledge and data that can be applied to the design and development of the HELPER
communication module, specifically the spoken-dialogue and artificial intelligence components.
Knowledge gained from this research has been directly applied to the development of a spoken
speech corpus that could be used to conduct the actual training and testing of these
communication components. These research outcomes will be beneficial for HELPER
development but will also contribute to knowledge in the area of personal emergency response
calls and call dialogue.
1.4 Literature Review
This literature review provides the background knowledge required to better understand the
contributions of this dissertation. The review has been divided into three parts. Part I:
summarizes literature relating to PERS technology. An overview of the main health challenges
faced by older adults is presented first, followed by an introduction to the technology basics, a
review of the complexities surrounding PERS usage, and an introduction to the HELPER system.
Part II: focuses on the literature relating to human-to-computer spoken dialogue interactions.
Potential variables affecting ASR recognition accuracy are presented first, followed by a
description of the characteristics of older adult speech expression, and finally a review of the use
of ASRs and SDSs with older adult users. Part III: provides a description of the basic structure of
general emergency response calls (e.g., 911) and the emergency response call handling
procedures.
1.4.1 PART I: PERS Technology
1.4.1.1 Health Challenges for the Older Adult
A great volume of literature is available detailing the many aspects of the aging process as well
as the common medical concerns that ail the elderly. See Appendix A for some common older
adult conditions. This is not surprising since seniors are also the most frequent users of the health
care system with the greatest health care costs being spent on those over 80 years of age (CIHI,
2011, 2013). The two greatest health concerns for the older adult include the onset of chronic
diseases, such as heart disease and kidney disease, and frailty. Weiss (2011) defines frailty as “an
increased vulnerability to advanced and persistent loss of function that, at least initially, only
becomes evident under stress.” With age also come varying degrees of functional limitations
such as difficulties in bending and reaching or stooping, which may ultimately increase the effort
the older adult needs to exert in order to complete instrumental activities of daily living, such as
cleaning, cooking, shopping, and managing household affairs (Cornman, Freedman, & Agree,
2005; Cowan et al., 1999). The increased likelihood of having functional limitations with one or
more chronic medical conditions or frailty also places the elderly adult at a higher risk for falling
down. In Canada, in 2008, more than 2500 seniors over the age of 65 were reported to have died
from injuries related to falls with over 78,000 fall-related hospitalizations associated with hip
fractures reported in 2010/2011 (Public Health Agency of Canada, 2014). For seniors 75 years of
age and older, falls have also been identified as one of the leading causes of both hospitalization
and institutionalization (Demiris et al., 2004; M. Johnson, Cusick, & Chang, 2001; Koski,
Luukinen, Laippala, & Kivela, 1996; Public Health Agency of Canada, 2014; G. Williams,
Doughty, Cameron, & Bradley, 1998). When combined with possible cognitive impairment and
the need for multiple medications, seniors are at a high risk for experiencing medical
complications during emergency situations (Gibson & Hayunga, 2006; Hwang & Morrison,
2007; Salvi et al., 2007). Many research studies highlight the importance of providing emergency
assistance as promptly as possible to increase chances for full recovery (Handschu, Poppe,
Rauss, Neundörfer, & Erbguth, 2003; Rosamond et al., 2005). Unfortunately, the older adult may
not immediately recognize the severity of a situation, may wait to ask for assistance, and then
discover he/she is unable to obtain assistance when needed (e.g., when injured and alone) (Fogle
et al., 2008). It is precisely for reasons such as these that personal emergency response
technologies were developed and why it is important that they be designed to be desirable, easy
to use, accessible and robust.
1.4.1.2 Personal Emergency Response Technology
Assistive technologies have been used by humans for rehabilitation for thousands of years
(Childress, 2003), and research studies show that assistive technologies can help slow the rate of
functional decline, as well as reduce an older adult’s reliance on and use of institutional and in-
home personal services (i.e., home and personal care attendants) (Freedman, Agree, Martin, &
Cornman, 2006; Mann et al., 1999; Ramage-Morin, 2005). The PERS technology was
established in the early 1970s, and research demonstrates that these systems can help diminish
the possibility of prolonged injury after a fall or medical trauma (e.g., heart attack or stroke)
(Dibner, 1993; Hessels et al., 2011; Maddox, 1992; Patel et al., 2012). PERS usage has also been
found to decrease overall health care costs and ease care provider and user anxiety (Mann,
Belchior, Tomita, & Kemp, 2005; Montgomery, 1993; Roush, Teasdale, Murphy, & Kirk, 1995).
Common names for the PERS include, but are not limited to, community alarms, social alarms,
personal triggers, medical alarms, dispersed alarms, or emergency alarms. In North America,
personal emergency response services are typically offered by private companies as opposed to
provincial, state operated or city run local municipal emergency response services (i.e., police,
fire, and ambulance) and are commonly paid for out of pocket by the end-user (Bernstein, 1999;
Hessels et al., 2011).
1.4.1.3 PERS Technology Basics
In terms of how the PERS technology works, a traditional PERS consists of three components
(Dibner, 1993; Hessels et al., 2011): (1) a wireless push-button typically worn on the body
similar to a necklace or watch, (2) a speaker phone base unit located in the subscriber’s residence
such as on a table or shelf in a main room, and (3) a call centre where the live operator (call
taker) receives and handles incoming calls (Mann et al., 2005). The call taker is defined as the
person designated and trained to answer response calls. He/she is armed with prior knowledge of
the subscriber’s medical history, place of residence, and a list of preferred first responders.
Figure 1-3 illustrates the pathway to an emergency response using the traditional PERS.
Figure 1-3: Pathway to personal emergency response using the traditional push-button PERS.
[Figure shows three steps: (1) personal emergency situation — the “hands on” push-button activator; (2) personal emergency call centre — spoken dialogue over the speakerphone or telephone with a “live person” call taker who asks: Who is calling? Call reason? Situation risk level? Response required?; (3) call response — emergency response services, personal responder(s), or no response (false alarm).]
During a PES (step 1), the subscriber or user initiates a response call by physically pushing their
button actuator. Once activated, a signal is transmitted through the caller’s speakerphone base
unit to the call centre and a call taker responds immediately (step 2). The call taker
communicates with the subscriber using the speakerphone or home
telephone (if speakerphone is not working). The call taker typically identifies who is calling and
the reason for the call, assesses the situation or risk level, and finally determines what response
to provide. A response may include contacting personal responders (i.e., family, friends, or care
providers) or emergency response services (i.e., paramedics, police, or fire fighters) (step 3). If
there is no response from the subscriber, either the subscriber’s first responder or another listed
care provider would be contacted and decisions about what to do next progress from there.
Where circumstances warrant, all emergency response services (i.e., police, fire, and/or
ambulance) may even be dispatched to the subscriber’s home (this was discussed on-site during
informal interviews with emergency medical service (EMS) providers and firefighters). In a false
alarm situation where the button was mistakenly pushed, no further action would be required and
the response call would be ended.
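The call taker’s assessment sequence (who is calling, the call reason, the risk level, the required response) can be sketched as a simple record plus a response rule. All field names, risk levels, and the rule itself are hypothetical illustrations, not a call centre’s actual protocol.

```python
# The call taker's assessment steps expressed as a record plus a simple
# response rule. Field names, risk levels, and the rule are hypothetical.
from dataclasses import dataclass

@dataclass
class ResponseCall:
    caller: str   # who is calling?
    reason: str   # call reason?
    risk: str     # situation risk level: "none", "low", or "high"

def choose_response(call: ResponseCall) -> str:
    """Response required? (step 3 of the pathway)"""
    if call.reason == "false alarm":
        return "no response; end call"
    if call.risk == "high":
        return "dispatch emergency response services"
    return "contact personal responders"

assert choose_response(ResponseCall("subscriber", "false alarm", "none")) == "no response; end call"
assert choose_response(ResponseCall("subscriber", "fall", "high")) == "dispatch emergency response services"
```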
1.4.1.4 PERS Use by Older Adults
With respect to human-machine interaction, a considerable amount of research has been
conducted involving PERS technology over the last four decades. These studies include a general
review of the technology over time (Hizer & Hamilton, 1983; Montgomery, 1993), examinations
of the older adult’s lived experience with the PERS (Johnston, Grimmer-Somers, & Sutherland,
2010; Porter, 2003, 2005), the adoption and impact of PERS usage at a personal and social level
(Davies & Mulley, 1993; De San Miguel & Lewin, 2008; Fallis, Silverthorne, Franklin, &
McClement, 2007; Heinbüchner, Hautzinger, Becker, & Pfeiffer, 2010; Mann et al., 2005), as
well as at a medical care or institutional level (Hyer & Rudick, 1994; McWhirter, 1987; Roush et
al., 1995). Overall, despite very high user satisfaction, PERS adoption within the older adult
population is not as pervasive as it could be. Only a small percentage of older adults who could
use a PERS actually own a PERS (Bernstein, 1999; Fallis et al., 2007; Hessels et al., 2011; Hizer
& Hamilton, 1983; Mann et al., 2005; Porter, 2005; Roush et al., 1995). Two main reasons for
failing to adopt the PERS include a lack of perceived need and basic issues with device form and
function (Davies & Mulley, 1993; Hessels et al., 2011; Mann et al., 2005; Porter, 2005). In
redesigning the traditional PERS, it would be essential to consider not only what would be usable
by an older adult but how to make it desirable and accessible for all users (Blythe, Monk, &
Doughty, 2005). Piau, Campo, Rumeau, Vellas, & Nourhashemi (2014) asserted, “technological
innovations need to be perceived by the elderly as relevant to their everyday lives to be useful.”
As it stands, many PERS owners do not comply with wearing their buttons 24 hours a day
(Davies & Mulley, 1993; Heinbüchner et al., 2010). In fact, the majority remove their buttons in
the evenings and during showering, when the risk of falls is greatest (De San Miguel & Lewin,
2008; Taylor & Agamanolis, 2010). Furthermore, even when the button is worn, a significant
proportion of older adult users choose not to activate their systems during a situation of need
(Davies & Mulley, 1993; Fleming, Brayne, & others, 2008; Heinbüchner et al., 2010; Hessels et
al., 2011; Levine & Tideiksaar, 1995; Mann et al., 2005; Porter, 2005; Taylor & Agamanolis,
2010). One study examining falls and emergency alarm use found that 80% of persons (113 of
141) who fell while alone did not use their PERS to obtain help, which included 37 of 38 who
remained on the floor for a long time (over one hour) (Fleming et al., 2008). Reasons for non-
compliance with wearing the button are physical, social, and psychological; they include
frustration when the button is too easy or too difficult to activate (creating a high potential for
false alarms), forgetting to wear the button or a lack of perceived need to wear it, a lack of
comfort or attractiveness when wearing the button, the perception that wearing the button labels
one as old or vulnerable, and cost (Blythe et al., 2005; Davies & Mulley, 1993; Heinbüchner et
al., 2010; Hobbs, 1993; Johnston et al., 2010; Porter, 2005; Taylor & Agamanolis, 2010). The
reasons found in the literature for not pushing the button include the subscriber wanting to
manage and solve the problem on their own (e.g., user wants to get up from a fall on their own or
get help using the telephone) (Fleming et al., 2008; Heinbüchner et al., 2010; Porter, 2005), not
wanting to bother anyone (De San Miguel & Lewin, 2008), and a fear of losing one’s
independence if institutionalized (Heinbüchner et al., 2010). On the call centre side, non-
emergency or false alarm calls are frequent and may increase the burden of already stressed
emergency response service providers (Hamill et al., 2009; Mann et al., 2005; McWhirter, 1987;
Taylor & Agamanolis, 2010; Tinker, 1993). McWhirter (1987) reported a false alarm rate as high
as 40% and Hobbs (1993) suggested it may be 90% or more. The need for a better designed
PERS system to adequately support a growing population of older adults has been suggested by
several researchers (Blythe et al., 2005; Davies & Mulley, 1993; Fallis et al., 2007; Porter, 2005;
Taylor & Agamanolis, 2010). This literature clearly demonstrates the challenges that persist with
having a system requiring users to wear part of the system, and also the complexities in
designing for the older adult end-user with their natural desire to remain autonomous. In essence,
if the button is not worn or pushed, the PERS is rendered useless.
1.4.1.5 The HELPER System
To overcome these barriers, the HELPER concept, as previously introduced in section 1.3.2, is a
smart home technology designed to address the current limitations of the traditional
PERS by improving upon its design and expanding upon its functionality. A diagram of the pathway
to emergency response using the HELPER is presented in Figure 1-4.
Figure 1-4: Pathway to personal emergency response using the HELPER System.
[Figure shows: (1) personal emergency situation — “hands free” speech or vision activation via a ceiling/wall/shelf-mounted camera, speaker, and microphone; (2) the HELPER computer, with (2a) a vision module (Is person present? Is person active? Is activity/inactivity normal? Activate communications?) and (2b) a communication module (Who is calling? Call reason? Situation risk level? Response required?); (3a) spoken dialogue with a live-person PERS call taker; (3b) call response — emergency response services, personal responder(s), or no response (false alarm).]
In the event of a PES (step 1), the HELPER computer would identify the adverse event (step 2)
either automatically via a computer vision based sensing module (step 2a) which tracks the user
with a camera or via a communication module (step 2b) that recognizes speech input from the
user with a microphone. If the vision module first identifies the event, it would activate the
communication module and an attempt would be made to converse with the user using spoken
dialogue (a speaker and microphone combination). By ‘conversing’ with the user, the HELPER would then
attempt to identify the user’s desired response and proceed to contact that responder. Similar to
the traditional PERS, the possible responders would include personal responders and/or
emergency response services (step 3b). Additionally, the user may also choose to be connected
with a call taker (step 3a), which would also be the default condition if the HELPER computer
was unable to determine what the user wants. If no response is desired, the HELPER would end
the call. Unlike with the traditional PERS, because the user interacts with a non-live HELPER
computer initially, his/her autonomy is maintained in that he/she can choose when to talk to a
live person and what type of response to initiate. In a sense, the HELPER would function like a
hands-free telephone but with select options and a safety default to the live call taker should
something go wrong.
Figure 1-5 illustrates the HELPER’s four main technology components: (1) the camera, (2)
microphone, (3) speaker, and (4) HELPER computer.
Figure 1-5: Main components of the HELPER System.
[Figure shows an older adult in a personal emergency situation (“Hello? Anyone?”) interacting with the HELPER through spoken dialogue (“How can I help you?”) via (1) a camera, (2) a microphone, and (3) a speaker. Within (4) the HELPER computer: (a) the Communication Module — Speech Handler (what does caller want?), Dialogue Handler (how to respond to caller), and Response Handler (respond to caller and get help; contact responder) — and (b) the Vision Module — user tracking (extract image; identify user location and activity) and image analysis (is activity normal? assess health).]
Within the HELPER computer, two main function modules are shown: (a) the Communication
Module and (b) the Vision Module. Ideally, the camera, microphone and speaker components
would be mounted on the ceiling or wall or situated in a spot with an optimal camera viewing
angle, microphone input range, and speaker output range. See Figure 1-1 for an example setup.
Inside the HELPER computer, both the communication and vision modules are actively working
to collect speech input and user images. Inside the vision module, the user tracking extracts the
incoming images, identifies the user location and tracks user activity over time. It then assesses
the images to determine whether an abnormal event has occurred. If an abnormal event is
detected, the communication module is activated. The Dialogue Handler is initiated and must
determine how to respond to the call. For example, if the call has just been initiated, the Dialogue
Handler will determine that the “opening” dialogue must be spoken. In this case, an opening
greeting might be, “Do you need help?” Incoming speech from the user is received by the Speech
Handler sub-component of the communication module. In the Speech Handler the potential
words or phrases spoken by the user are identified and the possible meaning of the response is
determined. The proposed user response results are then sent to the Dialogue Handler to continue
the next conversational turn. In this manner, the human-to-machine conversation proceeds until
the appropriate responder has been identified and contacted by the Response Handler.
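The turn-by-turn exchange just described (the Speech Handler interprets, the Dialogue Handler chooses the next move, the Response Handler contacts a responder) can be sketched as a loop. The handlers below are simplified stand-ins, not HELPER internals.

```python
# Sketch of the human-to-machine turn loop described above. The handler
# functions are simplified stand-ins, not HELPER internals.

def run_call(utterances, understand, next_prompt):
    """Loop over user utterances until a responder is chosen;
    default to the call taker if the conversation runs out."""
    transcript = ["Do you need help?"]            # opening dialogue
    for text in utterances:
        meaning = understand(text)                # Speech Handler step
        prompt, responder = next_prompt(meaning)  # Dialogue Handler step
        transcript.append(prompt)
        if responder is not None:                 # Response Handler takes over
            return responder, transcript
    return "call taker", transcript               # safety default: live person

def understand(text):
    return "ambulance" if "ambulance" in text else None

def next_prompt(meaning):
    if meaning is None:
        return "Sorry, could you repeat that?", None
    return "Calling the %s now." % meaning, meaning

responder, log = run_call(["help me", "I need an ambulance"], understand, next_prompt)
assert responder == "ambulance"
assert log[-1] == "Calling the ambulance now."
```

Note the safety default: if no responder can be identified, the loop falls through to the live call taker, mirroring the system behaviour described earlier.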
The research described in this dissertation will be concerned solely with the communication
module of the HELPER system.
1.4.1.6 The HELPER Spoken Dialogue System
Now that a proposed novel technological solution exists, how feasible is it to use computers to
recognize speech in this context? Research studies have shown that older adults are receptive to
using speech to interact with assistive home technologies (Anderson et al., 1999) including
PERS (J. L. Johnson, Davenport, & Mann, 2007; Portet, Vacher, Golanski, Roux, & Meillon,
2013; Taylor & Agamanolis, 2010). Using speech to activate a PERS would remove the need to
wear the button activator on the body, which would address the present traditional PERS
compliance issue with the push-button and in theory, may reduce the number of accidental calls.
The ability for the HELPER to function as an intermediary between the user and the eventual
live responder is an attempt to improve upon the functionality of the traditional PERS by
maintaining the user’s autonomy in deciding who to call for assistance and when to talk to a live
person. These new system features may hypothetically increase technology adoption and use.
The ability of the HELPER computer to communicate with a human user “verbally” over several
speaker-turns places its communication module into a category of interactive dialogue systems
called a spoken dialogue system (SDS) (Fraser, 1997). According to Möller (2005), an SDS is
characterized by its ability to accept continuous speech, allow for user initiatives, reason,
detect errors or incoherence, correct, anticipate, and/or predict the spoken user response. An
SDS typically comprises at least five functional components (Georgila, Wolters, Moore, &
Logie, 2010; Lamel, Minker, & Paroubek, 2000; Möller, 2005):
(1) The ASR - receives an acoustic signal (spoken input) and transforms this into a most
probable word sequence;
(2) The Semantic Analyser or Natural Language Understanding component - deciphers the
meaning or intention of the probable word sequence;
(3) The Dialogue Manager – maintains the dialogue and keeps a history of responses;
(4) The Response Generation component – determines the output dialogue according to “the
dialog state, the user utterance, and/or information returned from the database” (Lamel et
al., 2000);
(5) The Speech Synthesis component – converts selected system utterances to actual speech
output.
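As a rough illustration of how these five components chain together, the sketch below stubs each stage; it does not reflect any particular SDS implementation, and all names and dialogue acts are invented.

```python
# Rough illustration of the five SDS components chained as a pipeline.
# Each function is a stub for the component named in the list above.

def asr(audio):                     # (1) acoustic signal -> probable word sequence
    return audio.lower().split()

def understand(words):              # (2) word sequence -> meaning or intention
    return "request_help" if "help" in words else "unknown"

def dialogue_manager(intent, history):  # (3) maintains dialogue, keeps history
    history.append(intent)
    return "ask_confirm" if intent == "request_help" else "ask_repeat"

def generate(dialogue_act):         # (4) determines the output dialogue
    return {"ask_confirm": "Do you need help?",
            "ask_repeat": "Could you repeat that?"}[dialogue_act]

def synthesize(text):               # (5) converts the utterance to speech (stubbed)
    return "<audio:" + text + ">"

history = []
out = synthesize(generate(dialogue_manager(understand(asr("HELP me please")), history)))
assert out == "<audio:Do you need help?>"
assert history == ["request_help"]
```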
According to the best practice guidelines for spoken language dialogue systems and components
produced by the DISC European project, the six essential aspects of SDS development include:
speech recognition, language understanding and generation, dialogue management, speech
synthesis, human factors, and systems integration (Lamel et al., 2000). Currently, however,
although all SDSs include an ASR component and some form of speech synthesis or output, the
presence of (2) the Semantic Analyser, (3) the Dialogue Manager, and (4) the Response
Generation components ranges from absent, to limited in nature, to fully present and
possibly complex (Furui, 2003; Lamel et al., 2000; Vipperla, Wolters, Georgila, & Renals,
2009).
In the HELPER communication module, it is proposed that all the basic functional components
of the SDS be present to follow the DISC recommendations, in addition to a component for
contacting the live responder, conveniently called the “call responder” component. Figure 1-6
illustrates the proposed internal sub-components of the HELPER communication module with
the Call Responder component at the top.
Figure 1-6: Sub-sections and functional components of the HELPER Communication Module.
[Figure shows incoming speech entering the Speech Handler (Automatic Speech Recognizer (ASR) and Speech Informant), feeding the Dialogue Handler (Dialogue Manager), then the Response Handler (Response Generation and Speech Synthesis producing spoken output), with the Call Responder component at the top dispatching a responder (“responder on route”).]
The Semantic Analyser or Natural Language Understanding component of the SDS would be
included inside the Speech Informant component (located above the ASR) in Figure 1-6.
Taking a closer look at how the ASR component functions, Figure 1-7 illustrates the typical
internal structure of an ASR system. This diagram was derived from (Glass & Zue, 2003;
Jurafsky, 2014).
Figure 1-7: The ASR component of the HELPER Communication Module.
[Figure shows incoming speech from the user entering A/D conversion and feature extraction, then a decoder that consults three linguistic models (1. acoustic, 2. pronunciation, 3. language) before passing output to the ‘Speech Informant’ component.]
Starting at the bottom left, incoming speech from the user (the acoustic waveform) arrives
through the microphone and is digitized and processed into “numerical representations of speech
information or features” that describe relevant characteristics of the speech signal for ASR
(Scharenborg, 2007). These features are then sent to the Decoder, which attempts to decode the
speech signal, or recognize what was said, by searching through (1) a pre-assembled collection of
speech sound[1] representations within the acoustic model, (2) specific pronunciation rules in the
pronunciation model (lexicon), and (3) grammar and language rules in the language model, to
identify a “best match” (Scharenborg, 2007).
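This “best match” search can be illustrated with a toy scorer that combines acoustic-model and language-model scores; the candidate phrases and all probabilities below are invented for illustration.

```python
# Toy sketch of the decoder's "best match" search: each candidate word
# sequence gets an acoustic-model score and a language-model score, and
# the highest combined (log-probability) score wins. All numbers invented.
import math

def decode(candidates, acoustic_score, language_score, lm_weight=1.0):
    """Return the candidate maximizing log P(audio|words) + w * log P(words)."""
    return max(candidates,
               key=lambda w: acoustic_score[w] + lm_weight * language_score[w])

acoustic = {"call an ambulance": math.log(0.4),   # acoustically slightly worse
            "fall and ambulance": math.log(0.5)}  # acoustically slightly better
language = {"call an ambulance": math.log(0.3),   # far more plausible phrase
            "fall and ambulance": math.log(0.05)}

# The language model tips the balance toward the plausible phrase:
assert decode(acoustic, acoustic, language) == "call an ambulance"
```

Real decoders search over vastly larger hypothesis spaces, but the scoring principle — acoustic evidence weighed against linguistic plausibility — is the same.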
The ASR output is then sent to the Speech Informant component (see Figure 1-6) where the
proposed “best match” utterance is processed. Within the Speech Informant component, an
attempt is made to help the computer “understand” the meaning of the utterance. From this point
on, processed speech from the Speech Informant is sent to the Dialogue Handler as illustrated in
Figure 1-6, with the breakdown in Figure 1-8. Inside the Dialogue Handler, the dialogue
controller consults the dialogue history, the current dialogue set, and the dialogue state, and
determines how next to respond to what the user said. Once the Dialogue Manager decides how to proceed,
the Response Handler is then activated where a response can be generated, or a call to the
responder can be made.
Figure 1-8: Inside the dialogue handler component of an SDS.
[Figure shows input from the ‘Speech Informant’ component entering the Dialogue Handler, where the Dialogue Manager’s dialogue control consults the dialogue history, dialogue state, and dialogue set, with outputs to the Call Responder and to Response Generation.]
The Response Handler is illustrated in Figure 1-9.
[Footnote 1: The speech sounds are usually sub-word units such as phones, the smallest units of sound of a language (Gold & Morgan, 2000; Jurafsky & Martin, 2009).]
Figure 1-9: The internal components of the response handler within the SDS.
[Figure shows input from the Dialogue Manager entering the Response Handler: Response Generation (select response from a database of dialogue text), Speech Synthesis (speech output drawn from a database of spoken dialogue, sent to the speakers), and the Call Responder (responder information and response request history; initiate/confirm the call, after which the responder is on route).]
Aspects of the diagram are derived from Möller (2005). Inside the Response Generation
component, a database of possible dialogue responses (text) is searched for the response
requested by the Dialogue Manager. This response is then sent to the ‘Speech Synthesis’ component,
which searches a database for the desired spoken dialogue units, synthesizes the text to speech if
necessary (pre-recordings of output dialogue may be used), and sends the response out to the
user through a speaker system. If the Call Responder component is activated, the Call Responder
might check for a preferred responder or look through a history of requests to inform the
Dialogue Manager if any further query is required. Once a desired responder is confirmed, the
call to the desired responder is initiated.
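The flow just described can be sketched as follows; the database contents, field names, and confirmation rule are all hypothetical.

```python
# Sketch of the Response Handler flow: Response Generation looks up the
# requested response text, Speech Synthesis produces spoken output (stubbed;
# pre-recordings may be used), and the Call Responder confirms a preferred
# responder before initiating the call. All names are hypothetical.

RESPONSES = {"confirm_ambulance": "I will call an ambulance. Is that right?"}

def respond(request, preferred_responder, request_history):
    text = RESPONSES[request]          # Response Generation: database lookup
    spoken = "<audio:" + text + ">"    # Speech Synthesis step
    request_history.append(request)    # keep a history of requests
    # Call Responder: confirm against the responder on file, or query further
    if preferred_responder is None:
        return spoken, "query user for a responder"
    return spoken, "calling " + preferred_responder

history = []
spoken, action = respond("confirm_ambulance", "ambulance", history)
assert action == "calling ambulance"
assert history == ["confirm_ambulance"]
```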
Given this background knowledge of how the HELPER SDS should function in theory, the next
part of the literature review will focus on how these technologies function in the real-world.
1.4.2 PART II: Human to Computer Spoken Dialogue Interactions
1.4.2.1 Variables Affecting ASR Recognition Accuracy
The attempt to simulate or replicate the human ability to recognize and understand speech using
technology has been a growing area of research for over 60 years (Anusuya & Katti, 2009; Gold
& Morgan, 2000). Although considerable progress has been made in the field of ASR, a human’s
capacity for speech recognition and understanding in a range of environments is still unmatched
and is superior to that of any machine (Dusan & Rabiner, 2005; Furui, 2003; Scharenborg,
2007). A major source of ASR error arises from a mismatch between the speech sounds used to
train the acoustic models and the actual incoming spoken speech to be recognized (Furui, 2003;
King, 2006). There are many reasons why this mismatch occurs. King (2006) provides a concise
summary of these potential sources of speech variation, which ultimately affect ASR recognition
accuracy:
(1) inherent speaker variability: even with the same speaker, a speech sound is not repeated
in exactly the same way twice;
(2) speech production and rate: speech variation is best quantified by the rate of speaking
(e.g., the speed of speech output) and ‘speech production processes’ such as how speech
is spoken. For example, read, planned, and spontaneous speech, such as reading a
newspaper, presenting prepared lecture notes, or ‘everyday’ talk in conversations
respectively, are all acoustically and linguistically different from each other (Furui, 2003;
King, 2006);
(3) human machine adaptation: humans have been shown to adapt their speech (simplify and
reduce speed) when speaking with a machine;
(4) out-of-vocabulary sounds: disfluencies in spoken output such as word fragments,
repeated words/phrases, repairs, and similar phenomena can lower recognition rates.
Scharenborg (2007) also adds, with continuous speech recognition (i.e., recognition of
many incoming words at once) the common ASR system will always propose a possible
output based on its vocabulary. This means that the ASR system lacks the ability to
identify out-of-vocabulary words or non-words;
(5) overlapping conversation: overlapping speech results in signal mixing which can also
reduce recognition rates;
(6) microphone considerations: variation may result when using different microphones for
recording speech samples for ASR training and picking up incoming speech during
testing. Microphone positioning and distance from the speech source also affect the
speech waveform (Jurafsky & Martin, 2009);
(7) background noise and reverberant environments: the incoming speech signal may be
masked or degraded by background noise or interference within reverberant
environments making recognition even more challenging.
1.4.2.2 The Older Adult Voice and ASR
Taking a closer look at ASR use by older adults, Lippmann (1997) asserted that the
characteristics of the naturally aged voice are harder to recognize for commercial ASR systems,
which are typically designed for a non-disordered, specific-accent, younger adult age group. Other
research studies show that in an emergency or stressful situation human speech may become
altered, potentially to the point of impairment or disorder, as a result of medical
trauma, disease, or strong emotion (Devillers & Vidrascu, 2007; Fogle et al., 2008; Handschu et
al., 2003; LaPointe, 1994; Patil & Hansen, 2007) (p. 359).
Research literature suggests that age-related voice deterioration may start around the age of 60,
but the degree of deterioration is highly dependent on the individual’s health and well-being
(Ramig, 1994) (p.494). The physical changes associated with natural aging can affect the older
adult’s ability to express speech. Table 1-1 summarizes how certain physical changes can affect
speech expression (Gorham-Rowan & Laures-Gore, 2006; D. Hall & Sinard, 1998; Linville,
2002; Zraick, Gregg, & Whitehouse, 2006). These changes can ultimately alter speech acoustics
and an ASR’s overall ability to accurately recognize the speech.
Table 1-1: Physical changes in the older adult and the possible effects on speech expression.
Physical change | Effect on speech expression
increased respiration frequency | intra-word pauses
decreased muscle efficiency, increased tissue stiffness, and a dry laryngeal mucosa (could affect vocal tract resonance, phonation, and speech articulation) | changes in fundamental frequency or pitch; articulation imprecision (e.g., longer voice-onset time, longer duration of vowels and consonants); increased voice perturbations (e.g., tremor, spectral noise, hoarseness); decreased voice intensity
slower cognitive function | slower speaking pace
Wilpon & Jacobsen (1996) found that the accuracy of their ASR system, which was trained with
adult speech, was reduced when recognizing the speech of older adults over 70 years of age. Studies by
Anderson et al. (1999) and Baba, Yoshizawa, Yamada, Lee, & Shikano (2004) further showed
that an ASR acoustic model trained using older adult speech was better able to recognize older
adult voices than an acoustic model trained with only younger adult speech. These study findings
were also supported by Vipperla et al. (2009), who examined speech recognition accuracy using
an ASR acoustic model trained with in-context speech from younger adults versus in-context
speech from older adults. They found that the ASR word error rate (WER) dropped for both younger and
older adult users when using acoustic models trained with similarly age-matched
speech samples: the WER dropped from roughly 33% (baseline) to 25% for the
older adult users and from roughly 22% to 11% for younger adult users. These studies clearly
show that ASR recognition accuracy can be improved if the ASR acoustic model is trained with
the same type of speech and in a similar context to the type of speech it would expect to receive
when implemented in the real-world.
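For reference, the word error rate cited in these studies is the standard ASR metric: the minimum number of word substitutions, deletions, and insertions needed to transform the recognizer's output into the reference transcript, divided by the number of reference words. The following is a minimal illustrative computation, not drawn from any of the cited systems:

```python
def word_error_rate(reference, hypothesis):
    """Compute WER = (S + D + I) / N via word-level edit distance.

    Assumes a non-empty reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution over four reference words gives a WER of 0.25:
# word_error_rate("help i have fallen", "help i have falling")
```

A 33% to 25% drop, as Vipperla et al. (2009) report for older adults, therefore means roughly one fewer erroneous word for every twelve words of reference speech.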
In terms of error recovery, Takahashi, Morimoto, Maeda, & Tsuruta (2003) asserted that because
“current speech recognition technology is far from perfect and cannot completely avoid the
recognition errors, many researchers try to develop robust system[s] which can detect and
recover from the system’s misunderstanding.” Furui (2003) and Vipperla et al. (2009) also
support this statement. As mentioned previously, the requirement that users confirm their
responses within the conversation, in essence, enables the system to recover from recognition
errors. However, Hamill et al. (2009) noted that the probability of having two errors occur in a
row was high; therefore, the system still needs to be made more robust.
1.4.2.3 The Older Adult User and Spoken Dialogue Systems
Although the volume of research literature that discusses the use of SDSs with older adult users
is small, the findings are consistent. Research studies show that older adult users exhibit definite
patterns of interaction and linguistic variability with SDSs which are different from their younger
counterparts (Georgila et al., 2008; Wolters et al., 2010; Wolters, Georgila, Moore, &
MacPherson, 2009). For this reason, it is important to consider how older adults interact with all
components of the SDS and incorporate those features that will facilitate their ease of use with
these end-users (Wolters et al., 2010; Zajicek, Wales, & Lee, 2004). Research conducted by
Wolters et al. (2009) identified two types of SDS user groups. Factual users were those who
adapted to the system and who used a concise communication style (only necessary keywords)
with fairly uniform behavior. Social users were those who treated the system similar to a human
being and who did not adapt their interaction style. The social users were characterized “by more
interpersonal communication, higher verbosity [had longer dialogues], and greater variability
between users” (Wolters et al., 2009). Younger adults were found to be mainly factual,
whereas just over one-third of older adults were factual and the remaining nearly two-thirds were
social (Wolters et al., 2009). Georgila, Wolters, & Moore (2010) found that the dialogue of older
adult users showed more initiative and repetition of information than that of younger users. They
used a richer vocabulary and were more social (Georgila et al., 2008), which was expressed in
their speech by using more "definite articles, more auxiliaries, more first person pronouns, and
most importantly, more lexical items related to social interaction, such as 'please' and 'thank
you'” (Mӧller, Gӧdde, & Wolters, 2008). Compared to younger adults, Wolters et al. (2010)
found that the communication style and interaction of older adults were affected by age,
speech, cognition, hearing, language comprehension, language production, short-term memory,
and affinity to the technology. In essence, these studies underline the importance of designing
and testing the HELPER communication dialogue system with the older adult user, keeping in
mind the actual environment in which the system will be used.
1.4.2.4 Spoken Dialogue Strategy
The dialogue strategy used within a SDS can be divided into three main types: (1) system-
initiative, (2) mixed-initiative, and (3) user-initiative. A system-initiative SDS is one in which
the system’s communication or dialogue script is followed with no user deviation, but this may
lead to “long and tedious interactions and generally unnatural dialogues” (Wolters et al., 2009).
A mixed-initiative SDS allows for shared initiative taking with the system expecting the user to
respond to prompts but more information can be provided than what was requested (i.e., over-
answering allowed). Lastly, in a user-initiative system, the user can change the dialogue structure
freely; however, Wolters et al. (2009) stated that “given the limitations of current ASR and
Natural Language Understanding, user-initiative often leads to many errors and
misunderstandings throughout the interaction.” In this study, the social users were found to be
“less efficient and less satisfied” with the system-initiative SDS that they interacted with
(Wolters et al., 2009). Therefore, Wolters et al. (2009) suggested a mixed-initiative system may
be better to incorporate this group of users. However, the tradeoff of using a mixed-initiative
approach would be the need to increase the complexity of the ASR and Natural Language
Understanding processes as well as add better error recovery techniques, and the increased risk
of task failures (Wolters et al., 2009). Although older adults were harder to stereotype, it was
shown that they are able to learn how to speak to a system if help is given when errors are
encountered (Mӧller et al., 2008). In the HELPER prototype, a system-initiative dialogue
strategy is currently used. However, in an emergency situation, if the user utters words other than
‘yes’ and ‘no’, it would be important to use at least a mixed-initiative dialogue strategy so that the
system can accept non-yes/no words of significant importance, such as “I need an
ambulance.”
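The suggested behaviour, a system-initiative prompt that nevertheless accepts a small set of high-importance non-yes/no keywords, can be pictured as a thin interpretation layer over the ASR output. The keyword sets, labels, and function below are hypothetical illustrations of the idea, not part of the HELPER prototype:

```python
# Illustrative keyword sets; the actual HELPER vocabulary is the subject
# of the keyword-identification study described later in this thesis.
AFFIRMATIVE = {"yes", "yeah", "yep"}
NEGATIVE = {"no", "nope"}
EMERGENCY_KEYWORDS = {"ambulance", "fall", "fallen", "chest", "bleeding", "help"}

def interpret_reply(utterance):
    """Map a recognized utterance to a dialogue action.

    High-importance keywords override the plain yes/no interpretation,
    giving a minimal mixed-initiative 'over-answering' behaviour.
    """
    words = set(utterance.lower().split())
    matched = words & EMERGENCY_KEYWORDS
    if matched:
        return ("escalate", sorted(matched))  # e.g., "I need an ambulance"
    if words & AFFIRMATIVE:
        return ("confirm", [])
    if words & NEGATIVE:
        return ("deny", [])
    return ("reprompt", [])  # unrecognized: ask the question again
```

Under this sketch, a strictly system-initiative design would only ever reach the confirm/deny/reprompt branches; the escalation branch is what the mixed-initiative extension adds.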
1.4.3 PART III: Human to Human Emergency Dialogues
1.4.3.1 Emergency Response Call Basics
Armed with some knowledge of how older adults interact with SDSs and the challenges of using
ASR in the real-world, the literature review will now shift focus to look at what is known about
emergency response dialogue. Personal emergency response calls could be considered a
subset of the general (i.e., 911) emergency response call. Where the emergency response call is
concerned primarily with the ‘where, who, what, why, how, and when’ of a situation (Imbens-
Bailey, 2000), the personal emergency response call is mostly concerned with the ‘what, why,
and how’. Call takers already have access to information on the ‘who’ and ‘where’ and the
‘when’ is assumed to be ‘now’. Although there is a vast amount of research literature
surrounding emergency situations and emergency medicine (there are entire journals dedicated to
these topics), only a small body of research literature was identified that specifically examined
emergency response call conversation organization, structure, and call handling (Cromdal,
Osvaldsson, & Persson-Thunqvist, 2008; Garner & Johnson, 2007; Imbens-Bailey, 2000;
Waseem, Durrani, & Naseer, 2010; Whalen & Zimmerman, 1987). No research literature has yet
been found through the university library and internet searches that pertains specifically to the
conversations of personal emergency response calls with older adults, their organization,
structure, and/or call handling methods. However, a summary of personal emergency response
call protocol was obtained from the private call centre that provided the recorded calls used in
this research.
1.4.3.2 Emergency Response Call Structure
Knowing the structure of an emergency response call will be helpful in identifying potential
differences with respect to personal emergency response calls. The emergency response call
generally follows a basic pattern or call sequence including: (1) an opening sequence
(identification/greeting/acknowledgement), (2) a request sequence (basic information exchanged
about why the caller is requesting aid), (3) an interrogative series (dispatcher elicits further information
as required), (4) a response (offer/deny a response to the request or complaint), and (5) a closing
(dispatcher may assure caller that help is on the way) (Imbens-Bailey, 2000; Whalen &
Zimmerman, 1987; Zimmerman, 1992a, 1992b). Zimmerman (1992b) commented that the ER
call sequences presented could be “modified, augmented, and used repetitively or not at all”
depending on the situation. Imbens-Bailey (2000) further identified speech acts within the
discourse. The speech acts are labeled “SA#” in Table 1-2.
Table 1-2: Emergency Response Call Discourse and Speech Acts.
ER Caller: Opening (Greeting/Acknowledgement/Identify); Reason for Call: report problem (SA1: descriptive) or request (SA2: direct/demand; SA3: indirect); Ambient (no speaking); Closing
ER Dispatcher: Opening (Greeting/Acknowledgement); SA1: Compliance to need; SA2: Acknowledge/Confirmation; SA3: Elicit further information; SA4: No Response; Closing
Using the call centre prototype manual as a guide (Private_PERS_Call_Centre, 2008), the
personal emergency response call structure appears to follow fairly close to that of the
emergency response call sequence, with a few exceptions. Most notably, in the case of the
emergency response call, the caller risks being denied assistance, whereas in the
personal emergency response call, assistance is always provided unless the caller denies that they
need help. Furthermore, the call taker must obtain consent from the caller to dispatch aid and
allowance is given for the caller to choose from different responders. The basic call protocol for
the personal emergency response call taker is outlined as follows:
Step 1: Greet the subscriber or caller – a template structure is provided
Step 2: Identify yourself and confirm needs
Step 3: Get consent from subscriber or caller to dispatch
Step 4: Dispatch EMS or contact appropriate help
Step 5: Reassure the subscriber or caller
Step 6: Follow up and follow through with the alarm
Step 7: Reset the unit
Step 8: Close the alarm
From this protocol, only steps 1 to 3 will be considered within this dissertation. In terms of step
1, a common opening is used by all call takers. The opening script is: “Hello {Subscriber Name},
this is {Call Taker Name} from {Call Centre Name}, how may I help you?” The call taker’s
ultimate goal is to dispatch appropriate assistance. The protocol instructs them to: (1) assess the
situation, (2) determine if help is needed, (3) get permission from subscriber to place on hold,
and (4) call for appropriate help. To carry out these instructions, the guidelines for conversation
include the call takers asking questions to elicit a positive or negative response, to repeat back
the exact words used by the caller, and to probe to “establish the nature of the emergency and the
assistance required.”
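As a simple illustration, the scripted opening in Step 1 amounts to filling slots in a fixed template. The names used below are placeholders, not real call centre data:

```python
# Template reproduced from the call taker protocol; slot names are
# placeholders chosen for this illustration.
OPENING_TEMPLATE = ("Hello {subscriber}, this is {call_taker} "
                    "from {call_centre}, how may I help you?")

def opening_line(subscriber, call_taker, call_centre):
    """Fill the call centre's scripted greeting (Step 1 of the protocol)."""
    return OPENING_TEMPLATE.format(subscriber=subscriber,
                                   call_taker=call_taker,
                                   call_centre=call_centre)
```

A fixed, slot-filled opening of this kind is straightforward for an automated system such as the HELPER to reproduce; the open-ended "how may I help you?" that follows it is where the ASR and dialogue-management challenges discussed in this thesis begin.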
In terms of the medical aspect of the response call, the call takers are given some basic
information. The following information in this paragraph is taken from the call centre manual
(Private_PERS_Call_Centre, 2008). The manual describes a “medical distress” as being the state
in which the subscriber is experiencing the following symptoms: severe chest pains, suffering
from a stroke, difficulty breathing freely due to lack of oxygen, suffering from a seizure attack,
hemorrhaging (bleeding), suffering from insulin shock, or having an allergic reaction to
medication. To identify whether the caller is in medical distress, the call taker is advised to look for
symptoms including difficulty breathing, chest pains, excessive bleeding, nausea/vomiting, and other
pains, discomfort, or weakness. If the call involves a fall or an injury has occurred, the call taker is
instructed to determine the cause of the fall and when it occurred, how the caller fell (e.g., down
stairs or off a bed), any injuries (e.g., broken limbs), and whether the caller is fully awake, having difficulty
breathing, or is bleeding. The main conditions call takers are to report to EMS Responders
(e.g., 911) include whether the PERS user or subscriber is conscious and alert, breathing or
having difficulty breathing, and if he/she is bleeding severely. This information is aligned with
the “ABC’s” of emergency response: the mnemonic used by emergency responders to assess
perceived patient acuity (Canadian_Red_Cross_Association, 2006). The ‘ABC’s’ stand for
“airway” (is the airway clear for breathing and is the person conscious?), “breathing” (is the person
breathing? Can they talk?), and “circulation” (any injury to the circulatory system or signs of shock?)
(Canadian_Red_Cross_Association, 2006).
1.4.4 Literature Review Summary
In summary, despite the demonstrated benefits gained from using PERS technology, many
barriers to technology adoption and use exist which can only be addressed by re-designing how
the PERS is used as well as re-thinking how PERS technology can be applied and made more
desirable for end-users. The HELPER is one proposed solution. One of the difficulties in further
developing the communication capability of the HELPER is the need to design the system for
actual end users in real PESs. This means ensuring that the HELPER’s communication module
includes a Speech Handler component that can receive incoming adult and older adult speech
during a PES, then process, decode, and decipher its meaning; a Dialogue Handler component
that can coordinate the personal emergency response conversation with the PERS user; and
finally a Response Handler component that can output the necessary dialogue response or
contact an emergency responder as required. Whether it is possible to carry out a spoken
dialogue conversation sufficiently well in actual PESs with real users remains to be seen.
However, in order for the design and development of the HELPER to proceed any further, there
is a need for specific knowledge and tools that currently do not exist. Specifically, there is a gap
in the PERS literature that characterizes personal emergency response calls and call
conversations in detail. Furthermore, no suitable training and testing database containing Canadian
English speech from older adults in PESs could be identified that could be used for
training the HELPER ASR and other SDS components.
1.5 Research Purpose and Objectives
This research begins to address these gaps in knowledge and a spoken database tool has been
developed that can be used to advance the HELPER development. The main information source
used for this research was a collection of real personal emergency response calls attained from a
private personal emergency response call centre. Generally speaking, the overall goal was to
derive knowledge and data from analyses of the acquired response calls and to identify ways to
apply the research findings to help further the design and development of the HELPER
communication module. These response calls were analysed at various levels including the word,
speaker turn, conversation, and call levels. Three research objectives were identified:
Objective 1: to identify keywords and phrases used by existing PERS users in various
personal emergency response call situations.
Objective 2: to identify significant trends in personal emergency response calls and call
conversations that may be used to tailor the call response to the user.
Objective 3: to design and develop a corpus of speech to be used for training
and testing the communication module of the HELPER system.
The first objective was the focus of Study 1. In order to improve the HELPER communication
module’s ability to understand the end-user speaking words other than “yes” and “no” (and their
various forms), the recommendation from previous researchers was to expand the system
vocabulary. Research by Takahashi et al. (2003) (Vipperla et al., 2009) also supports the fact that
simple “yes” and “no” responses are not the only answers spoken by patients responding to
close-ended yes/no type questions (in their case, medical type questions). One method for
identifying keyword vocabulary would be to perform an analysis of response call conversations
in order to identify keywords and word combinations spoken by PERS users during various
response calls and PESs. In addition to identifying the keywords, a method for grouping the
keywords and characterizing the different PESs was also important for determining which of the
keywords are spoken during different PESs. The main research outcome from Study 1 can be
applied to improving the HELPER’s speech handler.
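At its simplest, the keyword-identification step described here amounts to tallying candidate content words across caller turns in call transcripts. The sketch below is an illustrative simplification: the stopword list and sample utterances are invented, and the actual study applied a richer content analysis than raw frequency counting:

```python
from collections import Counter
import re

# Invented, minimal stopword list for illustration only.
STOPWORDS = {"i", "the", "a", "an", "to", "is", "my", "and", "you"}

def keyword_counts(transcripts):
    """Tally candidate keywords across caller-turn transcripts."""
    counts = Counter()
    for text in transcripts:
        # Keep letters and apostrophes so contractions like "can't" survive.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

# Invented sample utterances of the kind a PERS caller might produce:
calls = ["I fell and I can't get up",
         "My chest hurts, I need an ambulance"]
# keyword_counts(calls).most_common() would surface "fell", "chest", etc.
```

Frequency counts alone would not distinguish keyword function or situation, which is why the study pairs the keyword list with word categories and a situation classification.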
The second objective was the focus of Study 2. In order to improve the HELPER communication
module’s intelligence, especially in dialogue planning and decision making, it was important to
focus on the main goal of the HELPER which is to provide an appropriate emergency response
to the end-user as quickly as possible. In order to achieve this goal, it would be important to
know when different types of responses (targets) are requested, what kind of dialogue should be
used to respond to the user, and how much time the system has to respond to a call. To facilitate
the HELPER’s decision making ability, it was also important to identify what decisions could be
made based on the incoming speech. In addition to recognizing spoken words, knowledge of
conversational patterns and call statistics may prove useful in helping the HELPER manage and
structure the call dialogue or even foreshadow the probable target response. For example, if all
fall calls were found to result in a request for a care provider, then the HELPER dialogue could
be designed to automatically suggest contacting a care provider responder for all identified fall
calls. The main research outcome from Study 2 could be applied to improving the HELPER’s
artificial intelligence (including decision making and dialogue management).
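A trend of this kind, such as the hypothetical fall-to-care-provider example above, could be encoded as a simple prior over target responses that pre-informs the dialogue manager. The mapping below is invented purely for illustration and is not a finding of this research:

```python
# Hypothetical prior: call reason -> likely target response. The actual
# associations are exactly what Study 2 sets out to discover empirically.
RESPONSE_PRIOR = {
    "fall": "care_provider",
    "medical": "ems",
    "accidental_activation": "cancel",
}

def suggest_responder(call_reason):
    """Pre-inform the dialogue manager with a likely target response.

    Falls back to asking the user when no trend is known for the reason.
    """
    return RESPONSE_PRIOR.get(call_reason, "ask_user")
```

With such a prior in place, the dialogue manager could open with a tailored suggestion ("Shall I contact your care provider?") instead of a generic prompt, while still deferring to the user's actual reply.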
The third and final objective was the focus of Study 3. In terms of improving the HELPER
communication module’s ability to robustly recognize speech from real end-users, prior research
suggested that training the ASR system with closely matched end-user speech in similar
situations could improve word recognition rates. Improving the system’s natural language
understanding would also help in correctly understanding user utterances. As well, in-context
system testing with end-user voices in PESs would be beneficial for testing and fine tuning the
HELPER communication module. In terms of obtaining speech samples of end-users, especially
older adults in PESs, ethically, it would not be feasible to create emergency response situations.
As well, these situations would be difficult to predict in advance and then record live while
remaining an “uninvolved” bystander. However, simulated or enacted emergency situations
would be possible. In order to realistically recreate a response call scenario, prior knowledge
about what actually happens and how speakers converse during a response call conversation is
needed. As previously mentioned, no research literature could be identified detailing this
information for personal emergency response calls. A review of the research literature and
existing speech corpora was also not successful in uncovering any speech corpus suitable for
ASR training that contained older adult speech in PESs in Canadian English. Other types of
databases were available with older adult users, such as the MATCH corpus described by
Georgila, Wolters, Moore, et al. (2010), but this corpus does not contain speech in PESs. The
acquired response calls from the personal emergency call centre were also not of sufficient
recording quality to use for ASR training. Additionally, obtaining consent to use the caller’s
recorded voices would be difficult. Likewise, privately held databases were not readily
accessible and we were unsure whether these collections contained appropriate content or sound
quality for testing the HELPER. Thus the decision was made to design and construct our own
corpus containing older adults speaking in mock emergency situations based on the real response
calls acquired. The main research outcome from Study 3 was a spoken corpus tool that can be
used to train and test components of the HELPER communication module.
Chapter 2
2 Identification of Keywords and Phrases Spoken by Callers in Personal Emergency Response Calls
2.1 Prologue
This chapter describes the process of analyzing personal emergency response calls in order to
isolate keywords and phrases used by PERS callers, categorize keywords by word function, and
develop a way to model personal emergency situations. The process of reducing the original
keyword set to a smaller set for inclusion into the CARES corpus is also described. This study
uses both qualitative and quantitative methods to explore the real call data. The contents of this
chapter are intended for publication but have not yet been published.
2.2 Abstract
Purpose: A novel automated, intelligent, spoken dialogue-based personal emergency response
system concept is being developed in an attempt to address the existing usability barriers
identified by prior research groups of traditional push-button type personal emergency response
systems. The main purpose of this study is to identify the keywords and phrases used during
various personal emergency response call situations in order to help, in future, tailor the spoken
dialogue system of an automated personal emergency response system to the end-user.
Method: An emergent, exploratory, sequential mixed methods design was used for this study
with word categories and response call classifications identified qualitatively and keywords and
phrases identified quantitatively using content analysis of personal emergency response calls.
Results: 18 word categories, 402 keywords, and 135 phrases from 84 personal emergency
response calls were identified in this study. The personal emergency response situations were
classified according to three categories: caller type, risk level, and call reason. The keyword list
was selectively reduced to 185 keywords and phrases for inclusion into a speech
database. Using the reduced keyword list and the risk level classification, common and unique
keywords were identified for low, medium and high risk personal emergency situations.
Conclusion: The results of this study can be used to improve the spoken-dialogue component of
the novel automated personal emergency response system by expanding the system’s automatic
speech recognition capability with keyword vocabulary; by improving the system’s ability to
understand incoming speech using keyword categories; and by enhancing the system’s ability to
classify a call based on pre-identified patterns or trends in keyword usage during different
personal emergency situations. This work will contribute to the future development of the
automated personal emergency response system’s speech handler and provides further
knowledge about the characteristics of actual personal emergency response calls and call
conversations.
2.3 Introduction
2.3.1 Need for a New PERS
Research studies have found that older adults, or individuals 65 years of age and older, who remain
living in their communities or who ‘age-in-place,’ tend to age more successfully
(World Health Organization, 2011). They live longer and with a higher self-perceived quality of
life compared to those who age “out-of-place” in institutions such as long-term care, nursing
homes, or hospitals (Ramage-Morin, 2005). One specific assistive technology being used to
facilitate aging-in-place is the personal emergency response system or PERS. The PERS was
developed in the early 1970s and was designed to provide individuals at higher risk for medical
complications and/or with mobility difficulties quick access to emergency assistance any time of
the day or night at the push of a body-worn button activator (Dibner, 1993). By providing access
to emergency care when needed, the PERS technology can be used to prevent or alleviate the
negative consequences that may arise when care is received too late (e.g., after a long lie, a
heart attack, or a stroke). In addition, PERS use has also been shown to decrease overall health care
costs and ease care provider and user anxiety (Mann et al., 2005; Montgomery, 1993; Roush et
al., 1995). However, despite the many benefits of using a PERS, only a small percentage of older
adults have adopted the technology and actually use it when needed (Bernstein, 1999; Fallis et
al., 2007; Hessels et al., 2011; Hizer & Hamilton, 1983; Mann et al., 2005; Porter, 2005; Roush
et al., 1995). Research studies have attributed the reasons for resistance to PERS use to barriers
spanning the physical, social, and psychological realms (Davies & Mulley, 1993; Hessels et al.,
2011; Mann et al., 2005; Porter, 2005). In light of these findings, researchers have concluded that
there is a need for a better designed PERS; one that is more tailored to the needs of the older
adult and overall, more desirable and accessible for all end-users (Blythe et al., 2005).
2.3.2 The HELPER System
To address this need, the Intelligent Assistive Technology and Systems Lab at the Rehabilitation
Sciences Institute at the University of Toronto is developing a novel, intelligent, spoken
dialogue-based PERS that is part of a larger smart home monitoring system concept called the
HELPER (“health evaluation logging and personal emergency response system”) (Belshaw,
Taati, Snoek, & Mihailidis, 2011; Hamill et al., 2009; Lee & Mihailidis, 2005; Tam, Dolan,
Boger, & Mihailidis, 2006). In theory, the HELPER would continuously monitor the home for an
adverse event (e.g., a fall) and then automatically initiate a response sequence if such an event is
detected. The person being monitored would communicate first with an artificially intelligent
HELPER call taker who would connect the user to their desired live responder. Using speech or
vision to activate the PERS removes the need to wear a body-worn activator such as the
traditional PERS “push-button” and will hypothetically increase the user's autonomy and privacy
by permitting the user to either direct or cancel the call before reaching a live call operator.
2.3.3 HELPER Prototype Testing
In terms of the current state of HELPER development, feasibility testing of a HELPER prototype
by previous researchers has successfully demonstrated that automatic system activation (i.e., via
camera detection of a simulated adverse event) followed by human-to-computer communication
using spoken-dialogue and automatic speech recognition (ASR) is possible (McLean, 2005).
Prototype testing was performed with younger adults in a controlled lab environment with the
ASR set to recognize “yes” and “no” word forms (McLean, 2005). The next step would be to
further design, develop, and fine-tune the communication module to work with actual end-users,
especially older adults, in real personal emergency situations (PESs). Only after this step is
completed should the system be field-tested with end-users in live emergency situations (Hamill
et al., 2009).
2.3.4 Designing for the End-User
The importance of considering the end-user and the real-world environment in the design of the
HELPER is supported by previous research studies that focus on ‘universal design’ and the older
adult use of SDSs. The ‘universal design’ approach as described by Federici & Scherer (2012) is
based on the premise that, “…designing products to match a mythical average of human abilities
and conditions is in conflict with the fact that all human users are diverse and experience
different personal and environmental circumstances. Inaccessible mainstream products and
services designed with a focus on a narrow subset of human functioning, such as information and
communication technologies, medical equipment, and physical infrastructure, can impose
significant barriers on people with disabilities and people who are aging.” (Section 1.3.6, p.18).
Essentially, it cannot be assumed that older adult users will interact with the automated PERS in
the same way as younger users, or even uniformly within the same age cohort. As well, different PESs
may also change the way users interact with the system.
Prior research on SDSs has shown that users do not always respond with strictly “yes” or “no”
responses when asked questions that require “yes” and “no” answers (Takahashi, Morimoto,
Maeda, & Tsuruta, 2003). Other research has also demonstrated that older adults do not interact
with SDSs in the same way as their younger counterparts. In fact, a majority of older adults use
both acoustically and linguistically different speech expressions compared to younger adults
when interacting with SDSs (Georgila et al., 2008; Wolters et al., 2010, 2009). Based on these
findings, in order to further develop the SDS and especially the Speech Handler component, it is
important to identify what type of dialogue and vocabulary are used in an actual PES by end-
users and whether different dialogue and vocabulary patterns exist for different PESs.
To the author’s knowledge, aside from the personal emergency response call company’s call
taker protocol manual, no research literature examines personal emergency response call
conversations, the words used within a conversation, or the dialogue patterns that may occur in
different PESs. Not knowing how PERS users respond during PESs makes it extremely difficult
for HELPER technology developers to universally design for end-users in actual situations.
2.3.5 Study Objective and Significance
In order to consider the end-user in the design and development of the HELPER, it was
important to identify a way to capture samples of PES conversations, either live or recorded. In
real life, PESs are not the kind of events that can be easily predicted or ethically induced.
Consequently, it was hypothesized that recorded samples of real personal emergency response calls (herein also referred to as the "call" or "response call") would be the most feasible and useful source of end-user conversation samples in the context of such situations. This study focuses on the
the analyses of a collection of real personal emergency response calls. The main objective of this
study was to identify the keywords and phrases used by existing PERS users in various
personal emergency response call situations.
According to Haggag (2013), keywords are significant words or terms that can "best present the document context in brief and relate to the textual context." Being able to identify these
keywords and phrases would be significant not only to individuals or organizations wishing to
better understand how PERS users communicate their needs in a personal emergency response
situation during a response call, but also, for the technology designers developing the HELPER
communication module or other similar technologies. Relevant background will be presented
first, followed by the study methodology, results, discussion, and conclusions.
2.3.6 Background
2.3.6.1 An Automated and Intelligent HELPER
Figures 2-1 and 2-2 illustrate the pathways to personal emergency response using the traditional push-button PERS and the HELPER, respectively. Both pathways are designed to engage the user in conversation, and the target responses are similar. In the traditional push-button PERS (Figure 2-1), however, the interaction is between the caller and a live call taker, whereas in the HELPER (Figure 2-2) the interaction is between the caller and the HELPER computer. Instead of being activated by the user pushing a button, the automated PERS component is triggered automatically when the Vision Module of the HELPER computer (Figure 2-2, 2a) detects the occurrence of an adverse event. The user may also activate the system manually by saying a specific keyword or phrase (e.g., a cry for help). Using speech or automatic event detection to activate the automated PERS essentially eliminates the need to wear a button activator continuously.
[Figure: diagram showing (1) the personal emergency situation, in which the user activates a speaker phone or telephone via a push-button activator; (2) the personal emergency call centre call taker, a live person who establishes through spoken dialogue who is calling, the call reason, the situation risk level, and the response required; and (3) the call response: emergency response services, personal responder(s), or no response (false alarm).]
Figure 2-1: Pathway to personal emergency response using the traditional push-button PERS.
[Figure: diagram showing (1) the personal emergency situation, monitored by a ceiling/wall/shelf-mounted camera, speaker, and microphone; (2) the HELPER computer, activated by speech or vision, comprising (2a) a Vision Module (is the person present? is the person active? is the activity/inactivity normal? activate communications?) and (2b) a Communication Module that establishes through spoken dialogue who is calling, the call reason, the situation risk level, and the response required; followed by (3a) the PERS call taker (a live person) or (3b) the call response: emergency response services, personal responder(s), or no response (false alarm).]
Figure 2-2: Pathway to personal emergency response using the HELPER System.
When using the automated PERS, communications are managed within the Communication
Module (Figure 2-2, 2b) and will occur through spoken dialogue between the user and the
HELPER computer. With the HELPER computer as a first responder, the PERS user’s autonomy
can be maintained. The user has the ability to directly request their desired target responder or to
cancel a false alarm call before reaching a live operator. In essence, the automated PERS
functions similarly to a hands-free telephone but with specialized and intelligent features.
2.3.6.2 The HELPER Communication Module
The ability of the HELPER computer to communicate with a human user "verbally" over several speaker-turns places its communication module into a category of interactive dialogue systems called a spoken dialogue system (SDS) (Fraser, 1997). According to Möller (2005), an SDS is characterized by its ability to accept continuous speech, allow for user initiatives, reason, detect errors or incoherence, correct itself, and anticipate and/or predict the spoken user response. An SDS typically comprises at least five functional components (Georgila, Wolters, Moore, et al., 2010; Lamel et al., 2000; Möller, 2005):
(1) The Automatic Speech Recognizer (ASR) - receives an acoustic signal (spoken input)
and transforms this into a most probable word sequence;
(2) The Semantic Analyser or Natural Language Understanding component - deciphers the
meaning or intention of the probable word sequence;
(3) The Dialogue Manager – maintains the dialogue and keeps a history of responses;
(4) The Response Generation component – determines the output dialogue according to “the
dialog state, the user utterance, and/or information returned from the database” (Lamel et
al., 2000);
(5) The Speech Synthesis – converts selected system utterances to actual speech output.
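As an illustration only, the five components above can be sketched as a toy pipeline; every function, class, and rule below is a hypothetical placeholder, not part of the HELPER implementation:

```python
# Illustrative skeleton of the five SDS components listed above.
# All names and rules are hypothetical placeholders, not the HELPER's design.

def asr(audio):
    """(1) Automatic Speech Recognizer: acoustic signal -> most probable word sequence."""
    # A real ASR would decode an audio signal; here the input is already text.
    return audio.lower().strip()

def semantic_analyser(words):
    """(2) Natural Language Understanding: word sequence -> meaning/intention."""
    if "help" in words or "fall" in words:
        return {"intent": "request_help"}
    if words in ("yes", "no"):
        return {"intent": "confirm", "value": words == "yes"}
    return {"intent": "unknown"}

class DialogueManager:
    """(3) Maintains the dialogue and keeps a history of responses."""
    def __init__(self):
        self.history = []
    def update(self, meaning):
        self.history.append(meaning)
        return meaning["intent"]

def response_generation(intent):
    """(4) Determines the output dialogue according to the dialogue state."""
    return {"request_help": "Do you need an ambulance?",
            "confirm": "Okay, contacting a responder.",
            "unknown": "Sorry, could you repeat that?"}[intent]

def speech_synthesis(text):
    """(5) Converts the selected system utterance to speech output (stubbed as text)."""
    return f"<spoken> {text}"

dm = DialogueManager()
intent = dm.update(semantic_analyser(asr("Help, I think I had a fall")))
print(speech_synthesis(response_generation(intent)))
```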
According to the best practice guidelines for spoken language dialogue systems and components
produced by the DISC European project, the six essential aspects of SDS development include:
speech recognition, language understanding and generation, dialogue management, speech
synthesis, human factors, and systems integration (Lamel et al., 2000). In practice, however, while all SDSs include an ASR component and some form of speech synthesis or output, the Semantic Analyser (2), the Dialogue Manager (3), and the Response Generation (4) components range from absent, to limited in nature, to fully present and possibly complex (Furui, 2003; Lamel et al., 2000; Vipperla et al., 2009).
In the HELPER communication module, it is proposed that all the basic functional components
of the SDS be present to follow the DISC recommendations, in addition to a component for
contacting a live responder, conveniently called the “call responder” component. Figure 2-3
illustrates the proposed internal sub-components of the HELPER communication module with
the Call Responder component at the top. The Semantic Analyser or Natural Language
Understanding component of the SDS would be included inside the Speech Informant
component (located above the ASR) in Figure 2-3.
The results of this study specifically focus on improving the Speech Handler component of the
HELPER communication module. Therefore, further detail is provided only on the Speech
Handler sub-components specifically.
Taking a closer look at the ASR component, Figure 2-4 illustrates the typical internal structure of
an ASR. This diagram was derived from (Glass & Zue, 2003; Jurafsky, 2014).
[Figure: block diagram of the HELPER Communication Module. Incoming speech enters the Speech Handler (the Automatic Speech Recognizer followed by the Speech Informant); the Dialogue Handler (Dialogue Manager) maintains the conversation; the Response Handler (Response Generation and Speech Synthesis) produces the spoken output; and the Call Responder component, at the top, contacts a live responder ("responder on route").]
Figure 2-3: Sub-sections and functional components of the HELPER Communication Module
[Figure: internal structure of the ASR. Incoming speech from the user undergoes A/D conversion and feature extraction; the resulting features are passed to the Decoder, which consults three linguistic models (1. acoustic, 2. pronunciation, 3. language) and sends its output to the Speech Informant component.]
Figure 2-4: The ASR component of the HELPER Communication Module.
In Figure 2-4, incoming speech from the user (the acoustic waveform) arrives through the microphone and is digitized and processed into "numerical representations of speech information or features" that describe relevant characteristics of the speech signal for ASR (Scharenborg, 2007). These features are then sent to the Decoder, which attempts to decode the speech signal, that is, to recognize what was said, by searching through (1) a pre-assembled collection of speech sound2 representations within the acoustic model, (2) specific pronunciation rules in the pronunciation model (lexicon), and (3) grammar and language rules in the language model to identify a "best match" (Scharenborg, 2007). The ASR output is then sent to the Speech Informant component, where the semantic analyser resides. The Semantic Analyser processes the incoming "best match" utterance and attempts to "understand" or derive the meaning of the utterance.
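The decoder's search for a "best match" can be illustrated with a toy example that combines acoustic and language model scores; all hypotheses and probabilities below are invented for illustration and bear no relation to a real decoder's lattice search:

```python
import math

# Toy illustration of the decoder's "best match" search described above.
# The candidate hypotheses and all probabilities are invented; a real decoder
# searches a vast lattice of sub-word units rather than three fixed strings.

# P(acoustics | words): how well each hypothesis explains the audio features.
acoustic_score = {"i fell down": 0.30, "i felt down": 0.35, "eye fell down": 0.30}

# P(words): language-model probability of each word sequence.
language_score = {"i fell down": 0.010, "i felt down": 0.001, "eye fell down": 0.0001}

def best_match(hypotheses):
    # The decoder picks the argmax of log P(acoustics|words) + log P(words).
    return max(hypotheses,
               key=lambda w: math.log(acoustic_score[w]) + math.log(language_score[w]))

print(best_match(acoustic_score))  # -> i fell down
```

Even though "i felt down" fits the acoustics slightly better here, the language model's preference for the more probable word sequence tips the combined score toward "i fell down", which is exactly the role the language model plays in the ASR.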
ASR and semantic analysis techniques are growing areas of research, and various approaches are available (Jurafsky & Martin, 2009). With respect to ASR, keyword spotting methods, of which there are various kinds, may be appropriate for implementation in the automated PERS. To identify keywords, one method translates the incoming speech to text, another matches potential keywords acoustically, and yet another breaks the speech down into phoneme components for comparison (Moyal, Aharonson, Tetariy, & Gishri, 2013). With respect
to semantic analysis, common techniques include syntactic parsing (e.g., nouns, verbs, placement in utterance), predicate logic (e.g., representations of word meaning), and statistical methods (e.g., probabilities of word order, word relationships, matches to existing known examples) (Jurafsky & Martin, 2009; Klapuri, 2007). No specific technique is being recommended at this point as this will be an area of future research.

2 The speech sounds are usually sub-word units such as phones, the smallest unit of sound of a language (Gold & Morgan, 2000; Jurafsky & Martin, 2009).
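A text-based keyword spotting approach (the first of the methods above) might look like the following sketch, assuming the ASR has already produced a transcript; the keyword list and categories are invented examples, not the study's actual set:

```python
import re

# Minimal text-based keyword spotting, assuming the ASR has already produced
# a transcript. Keywords and categories are invented examples only.

KEYWORDS = {
    "help": "request",
    "fell": "condition",
    "ambulance": "responder",
    "fine": "negative response",
}

def spot_keywords(transcript):
    """Return (keyword, category) pairs found in an utterance, in order."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return [(t, KEYWORDS[t]) for t in tokens if t in KEYWORDS]

print(spot_keywords("I fell and I need an ambulance"))
# -> [('fell', 'condition'), ('ambulance', 'responder')]
```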
2.3.7 Study Focus as Applied to the HELPER
Figure 2-5 illustrates how the outcome of this study could be applied within the HELPER SDS,
specifically to further develop the ASR, the Speech Informant (SI), and to help classify the
emergency situation (Classifier). Specifically, the results from this study will identify keywords
that could be used to expand the vocabulary size of the HELPER’s ASR; key phrases that could
be used to train the language model of the HELPER’s ASR; and word categories that could be
used in the semantic analyser sub-component of the HELPER Speech Informant to aid with
utterance understanding. In addition, identifying patterns in keyword usage for different PESs
may aid in classifying the PES. By knowing the class or category of a PES, the HELPER may in
future be able to foreshadow the target response to offer the user.
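The kind of keyword-based call classification envisioned here could, for instance, take the form of a simple rule-based mapping from observed keyword categories to a foreshadowed target response; all categories and response targets below are invented for illustration:

```python
# Toy rule-based PES classifier: keyword categories observed in a call are
# mapped to a foreshadowed target response. All rules are invented examples.

RULES = [
    ({"fall", "injury"}, "EMS"),
    ({"help", "responder"}, "personal responder"),
    ({"test", "button"}, "no response (false alarm)"),
]

def classify_call(observed_categories):
    """Return the first response target whose required categories are all present."""
    for required, response in RULES:
        if required <= observed_categories:  # subset test
            return response
    return "escalate to live call taker"  # safe default when uncertain

print(classify_call({"fall", "injury", "help"}))  # -> EMS
```

A safe default (escalating to a live call taker) reflects the design goal stated earlier: the automated system should foreshadow a response when the evidence allows, not replace human judgment when it does not.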
[Figure: (1) an older adult in a personal emergency situation calls out ("Hello? Anyone?"); (2) the HELPER computer processes the conversation through the ASR and Speech Informant (SI), supported by keywords and phrases and by word and phrase categories drawn from the CARES Corpus, while a Classifier determines what response is needed; (3) the PERS response follows.]
Figure 2-5: Possible data application areas within the HELPER communication module along the personal emergency response pathway.
In summary, these findings could be used to improve the HELPER’s Speech Handler and
enhance the system’s ability to both recognize and understand what is being said by the end-user.
Study 1’s main focus will be on identifying keywords and phrases used by callers and
determining which of these keywords appear for various PESs.
2.4 Methodology
2.4.1 Research Design Method
An exploratory, sequential, mixed methods design was used for this study. Clark & Creswell (2011) provide a good introduction to this method, which consists of a 'qualitative data collection and analysis' phase, followed by a 'quantitative data collection and analysis' phase, and ends with a 'final interpretation', as illustrated in Figure 2-6.
[Figure: Qualitative Data Collection and Analysis, builds to Quantitative Data Collection and Analysis, leading to Interpretation.]
Figure 2-6: Diagram of the process of exploratory sequential mixed methods design (Clark & Creswell, 2011).
For the ‘data collection and analysis phases’ of both the qualitative and quantitative portions of
this research design method, content analysis is the approach used. Crede & Borrego (2010)
provides an example of using the content analysis approach within a mixed methods design.
Content analysis is an attractive method of inquiry applied in many research fields for analyzing
text (and sometimes other media) in context of its use (Cavanagh, 1997; Krippendorff, 2012).
Over recent decades content analysis has been used increasingly in the field of health research
(Elo & Kyngäs, 2008; Mays & Pope, 2000). Content analysis is flexible enough to examine data
both qualitatively or quantitatively and inductively (e.g., specific to general) or deductively (e.g.,
general to specific based on existing theory) (Elo & Kyngäs, 2008; Krippendorff, 2012). When
used as a research method, content analysis is noted as being systematic, objective, repeatable
and a valid means of either quantifying phenomena or making inferences about data in context
(Krippendorff, 2012). Typically new knowledge or insights are dervied in the form of concepts
or categories describing some phonomenon or for the purpose of building a model, conceptual
system or map (Elo & Kyngäs, 2008). In this study, for example, words are selected and
43
classified into categories according to their context or meaning which is an exact example of
content analysis’ utility (Cavanagh, 1997). The outcome of a content analysis may also be used
to guide future action which is especially useful in the field of health research (Elo & Kyngäs,
2008) and for this particular research application. Furthermore, content analysis is used in the
field of artificial intelligence to help researchers design machines capable of understanding
natural language (Krippendorff, 2012), which again is another key component for the HELPER.
2.4.1.1 Method Limitations
In terms of limitations, the flexibility of content analysis is also its main restriction. Some researchers have noted that because content analysis does not proceed linearly and has minimal formalized procedures, it can become more complex and difficult to implement than quantitative analysis (Polit & Beck, 2004).
2.4.1.2 Method Implementation
The general procedure for implementing a content analysis includes (Elo & Kyngäs, 2008;
Graneheim & Lundman, 2004; Krippendorff, 2012):
1. Selecting a unit of analysis (e.g., interviews, a program, parts of text);
2. Within the unit of analysis, selecting a meaning/coding/content/recording unit. Essentially,
one must decide what to analyse, to what degree of detail, and how sampling will be
conducted (e.g., should the codes include silence, sighs, laughter, and postures?);
3. Organizing the data (e.g., use open coding, categories, themes, abstractions);
4. Creating a model, conceptual system or map, or categories.
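As a toy illustration of steps 2 through 4, coded recording units can be organized into categories programmatically; the units and categories below are invented examples, not study data:

```python
from collections import defaultdict

# Toy sketch of content-analysis steps 2-4: recording units are selected,
# open-coded into categories, and organized into a simple category map.
# All unit and category names are invented examples.

# Step 2: chosen recording units (here, single caller words with context notes).
recording_units = [
    ("fell", "caller describes event"),
    ("chest", "caller describes symptom"),
    ("pain", "caller describes symptom"),
]

# Step 3: open coding - assign each unit a category.
def categorize(word):
    return {"fell": "event", "chest": "body part", "pain": "condition"}.get(word, "uncoded")

# Step 4: organize the coded units into a category map.
categories = defaultdict(list)
for word, _context in recording_units:
    categories[categorize(word)].append(word)

print(dict(categories))
# -> {'event': ['fell'], 'body part': ['chest'], 'condition': ['pain']}
```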
2.4.1.3 Method Approaches
Various approaches exist for the application of content analysis in research; three of these approaches will be used within this dissertation. Table 2-1 provides a brief description of how these approaches are distinct from each other.
In addition to these approaches, it is important to know how the content will be examined.
Looking at manifest content refers to using the visible or obvious components of the content
being studied. This is as opposed to latent content which involves an “interpretation of the
44
underlying meaning of the text” (Downe-Wamboldt, 1992; Graneheim & Lundman, 2004;
Kondracki, Wellman, & Amundson, 2002). This study uses the manifest meaning of words.
Table 2-1: Various distinct approaches to content analysis.

Application Type | Description
Direct | Theory or relevant research findings are used to guide initial code development (Hsieh & Shannon, 2005).
Conventional | Coding categories are derived directly from the text data. Generally used to describe a phenomenon in the data (Hsieh & Shannon, 2005).
Quantitative | Text data are coded into explicit categories and then described using statistics (Morgan, 1993).
2.4.2 Research Design Details
2.4.2.1 Research Population
All recorded calls used in this study were between the PERS provider's call taker and either the provider's clients or a care provider. In a few cases, emergency medical service (EMS) dispatchers or the PERS setup personnel were also included in the call. No subscriber details were provided with the calls, but caller age and gender details were deduced from within the call conversations where possible. We are unaware of any prior call "sorting" (for example, with respect to gender, call type, caller type, and emergency risk level) that may have occurred.
2.4.2.2 Research Setting
This study was completed at the University of Toronto in the Rehabilitation Sciences Institute.
The data processing was performed in the Intelligent Assistive Technology and Systems
Laboratory. This study also included three visits to expert emergency response service sites to
gain a better understanding of how emergency responders operate and interact with older adults
in emergency situations. One visit each was made to a local fire hall, an EMS dispatch centre,
and a personal emergency response call centre.
2.4.2.3 Data Collection
Personal Emergency Response Call Recordings
The personal emergency response calls used in this study were provided by a local, private PERS
provider upon our request for a sample of emergency and non-emergency calls. The non-
emergency calls recorded included: false alarms or accidental system activations, installation
setups or equipment test calls, scheduled check-ins, translation requests, and follow-up calls. The
emergency calls recorded included genuine emergency calls for either EMS (i.e., 911,
paramedics) or non-EMS emergency responders (i.e., relatives, friends, or professional care
providers). A total of 109 digitized call recordings were obtained from the PERS provider (name
withheld for confidentiality). These recordings were collected in two sessions over two years
(2008 - 52 calls and 2009 - 57 calls). All recordings were made in Canada. To our knowledge, all
clients in this study used the traditional push-button activator.
Confidentiality
Confidentiality agreements were signed between the private call centre providing the call
recordings and the Intelligent Assistive Technology and Systems Lab. These agreements outlined
how the data would be used and stored. In terms of usage, all transcripts would be stripped of
personal or identifying information and access to call recordings would be limited to select
individuals upon approval by the Company. In terms of storage, all recordings would be kept in a
secure and locked location and all digital recordings on the computer would be kept under
password protection on a lab computer. All correspondences with the Company would also be
kept confidential.
On-site Visits with Emergency Response Service Providers
Informal discussions and observations were conducted with the emergency response call experts
at their business locations. A few hours were spent with one EMS dispatcher and three personal
emergency response call takers to gain familiarity with how the operators receive and handle
incoming calls and to understand how the call centre is organized. At the fire hall, an informal
discussion was held with three firefighters (one had experience working as a paramedic) to gain
a better understanding of general emergency procedures and some of the common response
difficulties encountered while attending to emergencies with older adult individuals.
2.4.2.4 Data Processing
Eighty-four (84) response calls were transcribed in total. The twenty-four (24) non-transcribed
calls consisted of repeat recordings or were conversations between the emergency response
service providers only (i.e. between the personal emergency response provider’s call taker and
EMS dispatchers without subscriber involvement). Transcription was performed verbatim from
digital audio files using the computer software, “Systematic Analysis of Language Transcripts”
(SALT), version 8.0 and 9.0 (Miller & Iglesias, 2006). The transcription process followed the
SALT protocol outlined in the user manual (Miller & Chapman, 2008). SALT is software specially designed for "eliciting, transcribing, and analyzing language samples." As such, in
addition to transcription tools, the SALT software also includes various analytical tools,
including, but not limited to, the ability to code words and utterances, and calculate words per
minute or conversational time lengths. The coding units of interest were extracted from the
response call transcripts using the "explore multiple transcripts" and "rectangular data file"
features of the SALT software.
Transcriptions were completed by listening to the digital call recordings on a computer using
headphones. The audio content was transcribed directly into text in the SALT program. An effort
was made to capture non-word utterances (e.g., coughing), fillers (e.g., ‘eh’, ‘ah’), and to note
silent moments (long pauses) during the conversation. Patient identifying information was
excluded in the transcripts (i.e., no names, addresses, or contact information). Due to the nature
of the working agreement with the company providing the PERS, only a limited number of the
laboratory research team members had permission to listen to the raw call recordings.
These real call samples all had a fair amount of background noise embedded in the recordings,
presumably caused by both the caller’s and call centre’s background environments, as well as
being inherent in the recording equipment. During transcription, recordings had to be paused
frequently and the volume adjusted to very high levels in order to catch what was being said in
the conversation. Call recordings were stored on the computer as *.wav files and played using Audacity (version 2.02), an open-source, free program for listening to and editing sound files (Mazzoni, Dannenberg, et al., 2000). The sound files were played back using the "mono" (single audio track) setting, with a sampling rate of 8 kHz and a sample format of 32-bit floating point.
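For readers wishing to reproduce such files programmatically, the playback parameters above (8 kHz, mono) can be illustrated with Python's standard library; note that the stdlib wave module handles PCM data only, so this sketch uses 16-bit PCM rather than the study's 32-bit float format, and the audio is a synthetic tone rather than call data:

```python
import math
import struct
import wave

# Sketch of the playback parameters described above (8 kHz sampling, mono),
# using a synthetic 440 Hz tone. Python's stdlib `wave` module supports PCM
# only; the study's 32-bit floating-point files would need a third-party
# library such as soundfile.

RATE = 8000   # 8 kHz sampling rate, as used for playback in the study
CHANNELS = 1  # "mono" / single audio track

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(CHANNELS)
    w.setsampwidth(2)          # 16-bit PCM samples
    w.setframerate(RATE)
    for n in range(RATE):      # one second of audio
        sample = int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / RATE))
        w.writeframes(struct.pack("<h", sample))

with wave.open("tone.wav", "rb") as w:
    print(w.getframerate(), w.getnchannels(), w.getnframes())  # -> 8000 1 8000
```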
2.4.2.5 Data Analysis
“Naïve” listening of the call recordings and reading of the transcripts were first used to obtain a
superficial and preliminary understanding of the conversations and to identify possible directions
for analyses. Figure 2-7 illustrates a flow diagram of the general steps used to complete the study
objective.
[Figure: flow diagram. Personal emergency response calls are transcribed (SALT) and unique words are isolated. (a) A conventional content analysis at the word level identifies word categories, to improve speech understanding (Speech Informant). (b) A manifest quantitative content analysis at the word level identifies keywords by word categorization, then a reduced keyword set and key phrases, to improve the ASR acoustic and language models and to apply to the CARES Corpus. (c) A directed content analysis at the call level, informed by the 911 call literature and emergency service provider site visits, identifies situation characteristics, develops the Personal Emergency Situation (PES) Model, and classifies response calls with the PES Model. Keywords used in various call categories are then identified, to improve HELPER call classification.]
Figure 2-7: Flow diagram illustrating the methodology followed to analyse the calls and complete study objectives.
Start at top arrow. Point (a): a word level analysis to identify word categories; Point (b): a word level analysis to identify keywords and key phrases; Point (c): a call level analysis to create the PES Model. Keywords were then identified for occurrence within different call categories.
A total of three analyses were performed on 84 of the transcribed calls, two at the word level and
one at the call level. A different approach was used for each content analysis. Beginning at the
top (fat green arrow) in Figure 2-7, personal emergency response calls were transcribed and
unique words were isolated using SALT. At point (a), a conventional content analysis was
performed at the word level to determine possible word categories. Next, at point (b), a manifest
quantitative content analysis was performed at the word level to identify keywords by assigning
word categories to the unique words. Once a keyword list was obtained, at point (c), a directed
content analysis was performed at the call level to identify personal emergency situation
characteristics. Information from previous literature, visits with emergency service providers (see
Appendix H), and the categorized keywords were combined to direct the call level analysis
where a PES model was developed. Response calls were then classified using the PES model
categories. The PES model was also used to help focus the inclusion/exclusion criteria in the
keyword reduction step. Once the final reduced keyword set was identified, the keyword
occurrence within different classified personal emergency response calls was examined.
The word categories identified from the conventional content analysis can be used to improve
speech understanding within the HELPER Speech Informant. The keywords identified from the
Quantitative Content Analysis can be used to expand the size of recognized vocabulary in the
automated PERS and can be included in the CARES (Canadian Adult Regular and Emergency
Speech) Corpus (see Study 3 for further details) for future ASR training and system testing. The
key phrases identified can be used to build up the ASR’s language model. The PES model
identified from the directed content analysis can be applied to improving the call classification of
the HELPER which may ultimately help in forecasting a final response target.
Full Keyword Identification Using Three Coders
The SALT software was used to extract all unique words from the transcripts spoken by the
PERS callers (users). This initial list of extracted unique words will be referred to as the “raw
word list.” The process of keyword identification was performed in total by three coders. Figure
2-8 illustrates a flow diagram of the process used by Coder 1 to determine the word categories
and “original keyword list”. The word categories were derived from the manifest word meaning
within the context of a personal emergency response call.
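The unique-word extraction performed here with SALT can be approximated in a few lines of code; the transcript below is an invented example, not study data:

```python
import re
from collections import Counter

# Approximation of the "raw word list" extraction performed with SALT:
# isolate the unique words spoken by the caller only. The transcript is an
# invented example, not study data.

transcript = """
Caller: Hello? I fell down and I can't get up.
Call taker: Okay, do you need an ambulance?
Caller: Yes please, I think I hurt my hip.
"""

def caller_word_counts(text):
    """Count unique words spoken by the caller only (call-taker turns excluded)."""
    counts = Counter()
    for line in text.strip().splitlines():
        speaker, _, utterance = line.partition(":")
        if speaker.strip().lower() == "caller":
            counts.update(re.findall(r"[a-z']+", utterance.lower()))
    return counts

raw_word_list = sorted(caller_word_counts(transcript))
print(raw_word_list)
```

Note that "ambulance" does not appear in the resulting list: it was spoken by the call taker, and only caller words are extracted, mirroring the procedure described above.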
Coder 1:
• Transcribes call recordings
• Extracts unique caller words (raw word list) from call transcripts using SALT
• Creates word categories and category descriptions based on extracted word function
• Assigns a word category to each extracted word
• Creates a list of categorized keywords called the 'original keyword list'
Figure 2-8: Process of keyword identification and categorization from Coder 1.
The word categorization process was repeated by a second coder (Coder 2) for all words
extracted by Coder 1 out-of-context (without having read the response call transcripts). See
Figure 2-9 for a flow diagram of the process used to determine the “Coders 1&2 keyword list.”
To examine how the out-of-context word selection compares to in-context word selection, a sub-
study was conducted with a third coder. Coder 3 was provided with a printed copy of the
transcribed call recordings and the word categories used by the first two coders. The third coder
was asked to select the keywords she felt were important from within the transcripts and to code
them using the pre-defined word categories used by Coder 1 and Coder 2. Coder 3 also
highlighted pertinent phrases within the text she felt were important to understanding the PESs
and the required response. The final keyword list determined after the last agreement session
with Coder 3 will be referred to as the "full keyword list." Inter-rater reliability was measured using Cohen's Kappa, first comparing Coder 1's original keyword list with Coder 2's keyword list for out-of-context word categorization, and then comparing the Coders 1&2 keyword list (from Coders 1 and 2) with Coder 3's keyword list for in-context word categorization. Inter-rater
reliability was also measured for Coder 3’s selection of keywords compared with Coders 1&2
keyword list.
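Cohen's Kappa for two coders can be computed directly from their paired category assignments; the following is a minimal sketch with invented labels, not the study's data:

```python
from collections import Counter

# Minimal Cohen's Kappa for two coders' category assignments over the same
# word list: kappa = (p_observed - p_expected) / (1 - p_expected).
# The example category labels are invented, not study data.

def cohens_kappa(coder_a, coder_b):
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each category's marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["condition", "condition", "response", "action", "response"]
b = ["condition", "action",    "response", "action", "response"]
print(round(cohens_kappa(a, b), 3))  # -> 0.706
```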
Coder 2:
• Receives the raw word list used by Coder 1 and the list of word categories with descriptions
• Assigns a word category to each word from the raw list (out-of-context)
• Coder 1 answers questions and clarifies word descriptions with Coder 2
• Coder 2 recommends any changes to word categories
• Final changes are made to keyword category assignments
• Coder 1 compares Coder 2's keyword list and word categories with the 'original keyword list'
• Misunderstandings are clarified and final modifications are made as necessary
• The 'Coders 1&2 keyword list' is created
• Coder 1 calculates inter-rater reliability

Coder 3:
• Receives call transcripts, the Coders 1&2 keyword list, and the list of word categories with descriptions
• Highlights keywords and phrases inside transcripts (in-context) and assigns categories to words
• Coder 1 answers questions and clarifies word category descriptions with Coder 3
• Coder 3 makes final changes to keyword category assignments
• Coder 1 extracts Coder 3's keyword list and assigned categories and compares Coder 3's list to the 'Coders 1&2 keyword list'
• Coder 1 and Coder 3 decide on final keywords and assigned categories; the resulting list is called the 'full keyword list'
• Coder 1 calculates inter-rater reliability
Figure 2-9: Process of keyword identification and categorization from Coders 2 and 3.
Coder 1 is the author of this dissertation and had no prior experience in coding. Coder 2 was a graduate student in the Department of Speech-Language Pathology with background experience in coding for qualitative studies and in conducting research involving older adults with dementia. Coder 3 is a practicing emergency room physician with a focus on geriatric care and an interest in technologies. Coders 2 and 3 were both aware of the main purpose of the
research but were not directly conducting research in this area. The full keyword list contains all
keywords selected from the Coders 1 and 2 keyword lists in addition to the keywords obtained
from Coder 3.
The use of more than two coders is not uncommon in qualitative research, and the number of coders used tends to be based on the needs of the project, the data being coded, and the capability and experience of the coders (i.e., their ability to pull out themes and identify categories, and their backgrounds) (Hak, 1997; Krippendorff, 2012; Ryan, 1999). For this study, it was felt that using three coders would increase the validity of the work if high inter-rater reliability could be achieved both when the keywords were identified in-context and when they were identified out-of-context. The fact that the third coder is an emergency room physician also means that this coder is familiar with conversations in which patients communicate health-related problems and symptoms. Selecting coders with background expertise is suggested to be an important qualification (Krippendorff, 2012). High inter-rater reliability amongst the three coders was expected to demonstrate greater validity in keyword selection.
Final Keyword Identification
The size to which the ASR vocabulary could be increased was constrained to some degree by
the two-hour time frame in which the speech data was to be collected; the number of different
speech types to be included (e.g., read sentences, free speech, etc.); and the number of older
adult participants who could be successfully recruited to provide speech samples for the
database (see Chapter 4 for further details). The two-hour time frame was also selected to
prevent the older participants from becoming too fatigued during the voice recording
process. Within these constraints, it was determined that a final maximum keyword vocabulary
size of 185 words could be recorded. This size was considered small enough to build a small or
small-medium sized ASR vocabulary but large enough to examine how ASR recognition might
be affected by adjusting the number of words being recognized. One-hundred and eighty-five
(185) words is also significantly larger than the previous ASR vocabulary size of two words
(yes/no) and should offer enough range for technology developers to locate the balance between
recognizing enough words to carry out a dialogue smoothly and maintaining a fairly low ASR
word error rate.
To identify a small keyword vocabulary set from the full keyword set, a series of word reduction
rules were required. The reduction rules consisted of inclusion and exclusion criteria developed
based on the goal of being able to isolate keywords that could be used by the HELPER to
determine the desired target response. Within this context, the preferred keywords were those
that carried significant meaning or provided enough detail to: (1) identify the PES and target
response, (2) indicate a positive or negative response to a question, and/or (3) perform some
other function vital to a response call conversation, such as the opening/closing dialogue early
on in the response call conversation.
Before the keyword reduction rules could be applied, a set of characteristics that could be used
to distinguish between different PESs was required. Therefore, a PES model was developed using
PES classification categories derived from the initial keyword list, the transcribed response calls,
on-site visits with emergency service providers, and research literature. The reduction rules were
then applied to the full keyword set and a small keyword set was obtained for inclusion into the
CARES corpus. This small keyword set was also checked to ensure that keywords from every
word category (e.g., positive/negative responses, conditions, etc.) of interest would be included
in this smaller vocabulary set.
Keyword Identification in Various PESs
In order to determine the keywords used for different PESs, the response calls were classified
using the PES model categories and the unique caller words spoken within each of the PES
model categories were extracted using SALT software. For each PES classification category,
only words from the full keyword list were retained.
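The per-category filtering just described can be sketched as follows. This is a minimal illustrative sketch, not the actual SALT software workflow; the category labels and word lists in the example are invented:

```python
def keywords_by_category(calls, full_keyword_list):
    """Map each PES classification category to the set of caller words
    used within it, retaining only words from the full keyword list."""
    keyword_set = set(full_keyword_list)
    by_category = {}
    for category, caller_words in calls:
        used = {w for w in caller_words if w in keyword_set}
        by_category.setdefault(category, set()).update(used)
    return by_category

# Invented example: two classified calls and a tiny keyword list.
result = keywords_by_category(
    [("high risk", ["help", "the", "bleeding"]),
     ("low risk", ["accident", "oh"])],
    ["help", "bleeding", "accident"],
)
```

Non-keywords such as "the" and "oh" are dropped because they do not appear in the full keyword list.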
2.5 Results
2.5.1 Extraction of keywords
A total of 779 possible words were isolated from the response call transcripts using only the
caller dialogue (all caller types). This raw list also included word fragments, code names (e.g.,
CGN = caregiver name), unintelligible code markers (e.g., ‘X’ denoted for unintelligible words),
and some word repetitions (e.g., because of spelling differences or computer symbol
differences). Removing 16 word repetitions left 763 possible words in the raw word list.
Generally, non-word sounds such as word codes representing coughing, silence, TV noise, and
other non-verbal noises (e.g. sighs, breathing, moans), were not included in the keyword lists.
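The extraction and de-duplication step described above can be sketched as follows. The tokenization, code markers, and normalization shown here are illustrative assumptions, not the actual transcript-processing procedure; the utterances are invented:

```python
import re

# Hypothetical transcript code markers (the study used, e.g., 'X' for
# unintelligible speech and 'CGN' for a caregiver's name).
CODE_MARKERS = {"x", "cgn"}

def extract_raw_word_list(caller_utterances):
    """Collect the unique words spoken by callers, dropping code
    markers and collapsing case/punctuation repetitions."""
    words = set()
    for utterance in caller_utterances:
        for token in re.findall(r"[a-zA-Z'*-]+", utterance.lower()):
            if token in CODE_MARKERS:
                continue
            words.add(token)
    return sorted(words)

raw = extract_raw_word_list([
    "Help me please",
    "X I fell CGN",      # unintelligible word and caregiver-name code
    "Help, I fell down",
])
```

The repeated "help" (differing only in capitalization and punctuation) collapses to a single entry, mirroring how the 16 word repetitions were removed from the raw list.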
Eighteen (18) word categories or themes were developed from the raw word list and are shown
in Table 2-2.
Table 2-2: The word categories derived from words extracted from response call transcripts.
Index   Word category (example)
1-p     Positive response to questions (e.g., yes)
1-n     Negative response to questions (e.g., no)
2       Request/command: verb related to obtaining assistance (e.g., help, get) (2a: weaker requests)
3-n     Problem condition, current (e.g., unconscious, clammy)
3-e     Problem condition, pre-existing (e.g., diabetes)
3-p     Positive condition description (e.g., good)
4       Neutral body state: made positive/negative with a descriptor word (e.g., can't feel, not breathing)
5       Politeness (e.g., thank you)
6       Opening/closing (e.g., hello, good-bye)
7       Repetition: could not hear, mumbled words
8       Targets (e.g., ambulance)
9       Question word (e.g., what, where, how)
10      Body part (e.g., arm, leg)
11      Negation word: reverses state (e.g., can't, not)
12      Interjection (e.g., ah)
13      Location (e.g., floor)
14      Special commands (i.e., to cancel call, turn the machine on/off, get weather/time/date)
15      Other (words not considered keywords, e.g., of, the, a)
The word categories developed out of logical groupings based on word function and meaning
within the context of a response call, for example, “positive word responses”, “question words”,
and “emergency response targets”. These categories were initially developed by the first coder
based on knowledge derived from the conversational structure observed in the transcribed
response calls, the call centre protocol, research literature on emergency response calls, and from
informal discussions with emergency response service providers. For example, in (Imbens-
Bailey, 2000; Zimmerman, 1992a, 1992b), the researchers discuss emergency response calls as
having an ‘opening/closing’, some ‘initial caller request’, followed by a possible ‘interrogative
series’, before a ‘response’ is either provided or denied. The word categories therefore generally
fall within this particular conversational context, with word categories for “openings/closings,”
“positive and negative responses,” and “target responses,” to name a few.
The process of assigning word categories was performed by all coders in the in-context and
out-of-context studies. Coder 1 and Coder 2 performed out-of-context coding with the 779 original raw
words extracted. Coder 1 used an original list of word categories. Coder 2 reviewed the proposed
word categories and her feedback was used to make slight modifications. Coder 2 then used this
modified set of word categories for classifying the words. Coder 1 subsequently revised her
coding using the finalized word category set. In the in-context study, Coder 3 performed the
coding by selecting keywords within printed transcripts and categorized these words using the
same word categories as Coder 1 and Coder 2. Coder 3 also highlighted possible key phrases
within the text and coded a majority of these phrases.
2.5.2 Keyword Results from Coders
Coder 1 identified 277 keywords from the raw word list. Table 2-3 shows a summary of the
coding results obtained for Coder 2 and Coder 3 after each coding process.
Table 2-3: Summary of coding results for Coders 2 and 3 based on keywords and category matching.
Measure                                Coder 2                    Coder 3
Keywords coded                         300*                       204+
Percent agreement                      81%^ (631/779)*            75.5%~ (575/762)*
  (category matches per total words)
Inter-rater reliability                0.682, p < 0.001,          0.564, p < 0.001,
  (Cohen's Kappa)                      95% CI (0.637, 0.727)      95% CI (0.511, 0.617)
# keywords after consolidation         324*                       348*

* Word repetitions removed, unintelligible words included.
+ Phrase words, non-words (i.e., coughs, silence), and word repetitions not included.
^ With Coder 1.
~ With Coders 1 & 2.
The remaining 19% of differences between Coder 1 and Coder 2 were resolved through
discussion. According to Landis & Koch (1977), this value of Cohen’s Kappa (0.682, p < 0.001)
is considered substantial agreement.
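For reference, percent agreement is simply the share of items coded identically, while Cohen's Kappa corrects that share for the agreement expected by chance from each coder's marginal category frequencies. A minimal sketch of the computation (the toy label lists are invented for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders' category labels."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: proportion of items coded identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each coder's marginal category rates.
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[c] * cb.get(c, 0) for c in ca) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Invented word-category codes for two coders over four words.
coder1 = ["1-p", "1-n", "2", "2"]
coder2 = ["1-p", "1-n", "2", "15"]
kappa = cohens_kappa(coder1, coder2)   # observed 0.75, expected 0.25
```

In this toy case the two coders agree on 3 of 4 items (75%), but Kappa is lower (about 0.67) once chance agreement is discounted, which is why the Kappa values reported in Table 2-3 sit below the raw percent-agreement figures.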
For the in-context study with Coder 3, some of the words categorized were assigned more than
one category depending on the context in which the word was used. For example “accident”
could be construed as a negative word if the caller says, “I had an accident and fell down,” or a
positive word if the caller says instead, “I pushed the button by accident.” For the purpose of
calculating the inter-rater reliability statistic, the more frequently used category was included. So
if “accident” was mostly used in a positive way, such as an ‘accidental call’, then the more
frequently associated category was used. If the frequency of occurrence was low or equivalent
(e.g., the word occurred once and was categorized with two category codes), the category
matching the one used by Coders 1 and 2 was selected. In situations where differences in the
interpretation of a category definition occurred, the situation was addressed and resolved
through discussion. For Coder 3, the 24.5% difference in categorization with Coders 1 and 2
was resolved this way. According to Landis & Koch (1977), the value of Cohen’s Kappa
(0.564, p < 0.001) would be considered moderate agreement.
Table 2-4 provides further detail on the breakdown of the word comparisons identified in Coder
3’s keyword list and the Coders 1 and 2 keyword list.
Table 2-4: Breakdown of the keywords identified from Coder 3.
Count               Description (keywords)
179                 Coder 3's keywords found in Coders 1 & 2's keyword list
25                  Coder 3's keywords not found in Coders 1 & 2's keyword list
204                 Total keywords identified by Coder 3 (no phrase words)
323*                Total keywords in Coders 1 & 2's list after modification
179/323 = 55.42%    Coder 3's keywords found in Coders 1 & 2's keyword list
25/323 = 7.74%      Coder 3's keywords not in Coders 1 & 2's keyword list

* Decreased by one because "spinal stenosis" was combined into one word.
Table 2-5 shows a summary of the coding results from the phrase words selected by Coder 3.
Ten phrases selected had unintelligible words that were included in the phrase count. Six phrases
were removed from Coder 3’s list because they were phrases identified in the “comments”
section of the transcripts and would not have been extracted in the raw word list. One phrase was
duplicated and coded with different categories. This duplication was removed from the phrase
count. Although the phrases were sorted into the same categories as the words, the phrase
categories could not be statistically compared to the word categories because the individual
phrase words, taken outside the phrase, would not have the same meaning as when used
within the phrase (i.e., they would be categorized differently). The inter-rater reliability measure was
calculated based on the number of keywords and keyword phrase words Coder 3 selected and
how these related to Coders 1 and 2’s keyword list. According to Landis & Koch (1977), the
Cohen’s Kappa values (0.523, p < 0.001, and 0.512, p < 0.001) are considered moderate
agreement.
Table 2-5: Summary of phrase results for Coder 3 with agreement of keyword selection.
Measure                                         Coder 3
Phrases selected                                135*
Phrase words (total)                            153^
Percent agreement (keywords only)               77.8%+ (593/762)
Percent agreement (keywords + phrase words)     76.4%+ (582/762)
Inter-rater reliability of word selection (Cohen's Kappa):
  keywords only                                 0.523, p < 0.001, 95% CI (0.464, 0.582)
  keywords + phrase words                       0.512, p < 0.001, 95% CI (0.449, 0.575)
# keywords after consolidation                  402*

* Repetitions removed, unintelligible words included.
^ 97 words outside of Coder 3's keyword list.
+ With Coders 1 & 2.
Table 2-6 provides further detail on the breakdown of the keyword and phrase word comparisons
identified in Coder 3’s keyword list and the Coders 1 and 2 keyword list.
Table 2-6: Breakdown of the keywords and phrase words identified from Coder 3.

Count               Description (keywords + phrase words)
222                 179 Coder 3 keywords + 43 phrase words found in Coders 1 & 2's keyword list
79                  25 Coder 3 keywords + 54 phrase words not in Coders 1 & 2's list (category 15)
301                 Total: 204 keywords + 97 phrase words in Coder 3's list
101                 323 Coders 1 & 2 keywords - 222 Coder 3 keywords found in that list
222/323 = 68.73%    All Coder 3's keywords identified in Coders 1 & 2's keyword list
79/323 = 24.46%     All Coder 3's keywords not in Coders 1 & 2's keyword list (category 15)
101/323 = 31.27%    Coders 1 & 2's keywords not identified by Coder 3
144/323 = 44.58%    Coders 1 & 2's keywords not in Coder 3's keyword list
The final number of keywords identified after the third coding was 402 as shown in Table 2-5.
2.5.3 Characterizing the Personal Emergency Situation
2.5.3.1 Proposed PES Characteristics
Findings from research literature and on-site observations with expert emergency responders
highlight the fact that risk level is an important factor to consider when assessing a potential
emergency situation. The identification of medical distress from the response call protocol, as
well as the way firefighters and call takers/dispatchers are trained to first assess the ABCs (i.e.,
airway, breathing, and circulation) and consciousness, suggests that risk level is of primary importance.
In addition to risk level, EMS dispatchers also classify certain types of call situations (e.g.,
pedestrian/vehicle accident), and this suggests that call reason may also be important. In the
personal emergency response call protocol manual, in addition to medically related symptoms,
specific questions are asked if a ‘fall’ has occurred. This may suggest that ‘medical’ and ‘fall’
calls may be reasonable categories in which to classify calls.
While listening to calls at the EMS dispatch centre, a few calls were received from personal
emergency response call takers requesting transportation from a care residence (e.g., long-term
care facility) to the hospital or back. On one occasion, the personal emergency response call
taker indicated that she had only spoken to the care provider and not to the PERS subscriber.
When the EMS dispatcher was queried about why the care provider did not just call 911 directly,
the EMS dispatcher explained that care providers are sometimes instructed by the
personal emergency response provider to activate the subscriber’s PERS as opposed to calling
911 directly in order for the personal emergency response provider to keep track of their client’s
emergency events. This statement has not been corroborated by a personal emergency response
provider; however, it suggests that older adult subscribers are not the only users of PERS, and
that knowing the caller type may also be important.
As mentioned in the literature review in Chapter 1, during stressful situations such as a medical
trauma or strong emotion, the human voice may change. In addition, natural aging can affect
voice quality and diseases causing conditions such as stroke, aphasia, or dementia may also
affect one’s ability to communicate. Given that the HELPER is based on spoken input, the
communication ability of the user is also important during a PES. However, communication
ability may be more a characteristic of the caller type than of the PES, and so it was not
included in this model.
2.5.3.2 PES - Caller Type
In terms of the caller type characteristic, response calls were found to be initiated by both older
adult subscribers as well as care providers. Three PES caller type categories were identified: (1)
the subscriber (herein referred to as the 'older adult' user), (2) the other caller (herein referred to
as the ‘care provider’), and (3) a combination of older adult and care provider callers. The term
'care provider' refers to any individual (e.g., neighbour, friend, professional home care worker,
staff nurse, or relative) assisting the older adult user and using the PERS to request assistance on
their behalf.
2.5.3.3 PES - Risk Level
In terms of risk level, response call transcripts were categorized into three emergency risk levels:
(1) low risk or non-emergent (a false alarm), (2) urgent or medium risk (needs help soon), and
(3) emergent or high risk (possible loss of life or limb) (Gilboy et al., 2005). Figure 2-10 outlines
the basic differences between low, medium and high risk situations.
Figure 2-10: Examples of differences between risk levels. The figure contrasts three situations:
a high risk (emergent) situation, with a non- or semi-responsive caller who needs attention
ASAP due to possible loss of life, sight, limb, or mind (e.g., unconscious, gasping, heavy
bleeding, choking, faint); a medium risk (urgent) situation, with a fully or semi-responsive
caller who needs attention fairly quickly but is not facing life or death (e.g., fallen but not
injured or bleeding heavily, or pneumonia requiring an x-ray and medication); and a low risk
situation, with a fully responsive caller who needs no assistance (e.g., an accidental or service
call where the user is in full control and needs help changing system batteries). The figure also
indicates the caller's possible mobility range (mobile to not mobile) and voice range
(calm/normal to impaired).
2.5.3.4 PES - Call Reason
In terms of the call reason characteristic, response call transcripts that were not false alarms were
grouped into two categories: (1) fall calls and (2) medical calls. In this study, a ‘fall call’ was
defined as one where the caller experiences an unintentional fall, is not hurt or hurt minimally,
but cannot get up without assistance. Fall calls resulting in physical injury such as bleeding were
considered to be medical calls. In medical calls the caller needs medical assistance either because
of a physical injury, pre-existing medical condition, new illness, or psychological concern. The
potential mobility range of a caller is illustrated in Figure 2-10.
2.5.3.5 PES - Communication Ability
In terms of communication ability, the voice may be either calm or normal to impaired as shown
in Figure 2-10. This does not mean that the voice will become impaired in high risk situations,
only that it could become so. With respect to the potential for ‘communication ability’ to be a
PES characteristic, no specific communication ability categories could be identified that would
help distinguish between different keywords. For example, the keyword “stroke” may suggest
a high risk situation regardless of communication ability; what matters is simply that keywords
be intelligible enough for ASR. Due to the important role that communication plays in a
SDS such as the HELPER, looking at the style or manner in which the caller communicates may
reveal interesting findings. It is suggested that communication ability be examined separately in
a future study.
2.5.4 The Personal Emergency Situation (PES) Model
Figure 2-11 illustrates one model of a PES with the various categories for the situation
classifications shown (e.g., the risk level classification has three category levels: high, medium,
and low). In this model the PERS user is in the centre and is represented by the 'caller type'. During
a PES, each caller exists in a certain physical and cognitive state which may affect his/her ability
to communicate and respond. During a PES, the user is also experiencing an adverse event which
motivates him or her to activate the PERS and this reason for calling is represented by the 'call
reason'. Finally, the PES situation can also be graded in terms of severity which is represented
by ‘risk level’.
Figure 2-11: Model of a PES. The user, represented by the caller type (older adult or care
provider), sits at the centre of the situation in a particular physical-cognitive state; the
surrounding situation is characterized by the call reason (medical call or fall call) and the risk
level (high, medium, or low).
2.5.5 Classifying the Personal Emergency Response Calls
The personal emergency response calls were classified by Coder 1 based on the PES model
categories of caller type, call reason, and risk level. For caller type, if the subscriber called, the
call was classified as an “older adult” call. If another person called instead, such as a care
provider (i.e., relative, friend, or personal support worker), the call was categorized as a “care
provider” call. For the situation in which the older adult and care provider both spoke, the call
was categorized as a combination call.
For call reason, as defined previously, a call was categorized as a 'fall call' only if a 'fall' was
mentioned and the older adult was simply unable to get up (i.e., was not or only minimally
injured). All other calls were considered medical calls.
For the risk level, response calls were categorized using the basic risk level divisions illustrated
in Figure 2-10, following the ABC’s of emergency response. To verify reliability, risk level
coding was performed by a second coder (who was the keyword Coder 3), the physician
specializing in geriatric emergency medicine. Call classification was made by this second coder
in-context after reading the call transcripts. Risk level classification results were compared
between the two coders and percent agreement was found to be very high: 89% (75
agreements/84 total calls). Inter-rater reliability was calculated using Cohen's Kappa = 0.820,
95% CI (0.706, 0.934), p < 0.001; a Kappa above 0.80 is considered almost perfect agreement
according to Landis & Koch (1977). Discussions were held to resolve the remaining
classification differences (11%) until final agreement was obtained.
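The resulting three-axis classification could be represented in software as a simple validated record. The category names below mirror the PES model; the class itself is a hypothetical sketch, not part of the study:

```python
from dataclasses import dataclass
from typing import Optional

CALLER_TYPES = {"older adult", "care provider", "combination"}
RISK_LEVELS = {"low", "medium", "high"}
CALL_REASONS = {"fall", "medical"}

@dataclass(frozen=True)
class PESClassification:
    """One response call classified along the PES model axes."""
    caller_type: str
    risk_level: str
    call_reason: Optional[str] = None   # low-risk false alarms carry no reason

    def __post_init__(self):
        # Reject labels outside the PES model categories.
        if self.caller_type not in CALLER_TYPES:
            raise ValueError(f"unknown caller type: {self.caller_type}")
        if self.risk_level not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.risk_level}")
        if self.call_reason is not None and self.call_reason not in CALL_REASONS:
            raise ValueError(f"unknown call reason: {self.call_reason}")

# Example: a medium-risk fall call placed by the older adult subscriber.
call = PESClassification("older adult", "medium", "fall")
```

Making call reason optional reflects the model above, where false alarms (low risk) are not assigned a fall or medical reason.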
2.5.6 Reduction of Keyword List
To reduce the full keyword list into a smaller list of vocabulary that could be used for inclusion
into a spoken speech database for ASR training (the CARES corpus), the full keyword list was
passed through a series of word reduction rules consisting of: (1) initial exclusion criteria, (2)
inclusion criteria, and (3) final exclusion criteria.
The initial set of eight exclusion criteria is shown in Table 2-7.
Table 2-7: Initial exclusion criteria for reducing keyword list.
1. Conjunctions (e.g., "for, and, nor, or, but"): function words removed, but these can be
   found within the other recorded utterances in the CARES corpus.
2. Connectors (e.g., "to, too, so, as, yet"): function words removed, but these can be found
   within the other recorded utterances in the CARES corpus.
3. Articles (e.g., "the"): function words removed, but these can be found within the other
   recorded utterances in the CARES corpus.
4. Interjections (e.g., "ah, ahem, oh, ugh, uh, uh-huh, um"): removed from the keywords,
   but in the CARES corpus a few were added to the phrases and PES scenarios.
5. Non-words (e.g., coughing, sighs, breathing): removed from the keywords, but in the
   CARES corpus some were included in the PES scenarios.
6. Unintelligible words (e.g., "wea*", "ye*", where '*' marks an incomplete word portion):
   words that cannot be replicated and are not intelligible.
7. Unknown words (e.g., "TN", "ER"): words whose meaning may be questionable or
   unknown were removed.
8. Reducible words (e.g., retain "bye" but exclude "bye-bye"): words that can normally
   stand alone with the same meaning and whose complement is already included in the
   keyword list.
In general, function words making up conjunctions (i.e., for, and, nor, or, but) and a few
connectors (i.e., to, too, so, as, yet) and articles (i.e., the) were removed from the final key
vocabulary list. These connectors would be present within the recorded phrases as well as the
5-minute monologue recorded by each person. Most interjections (i.e., ah, ahem, oh, ugh, uh,
uh-huh, um) were also removed; however, interjections that could imply that the caller requires
repetition of the call taker's utterance were retained, for example, 'eh?' and 'heh?' Non-word
sounds (i.e., coughing, sighs) were removed, but some were included within the
emergency scenarios. Unintelligible words were removed as their meanings are questionable and
difficult to replicate.
Before applying the next set of reduction rules, the keywords were divided into two focus sets.
Figures 2-12 and 2-13 illustrate the flow of the decision making process for each of the focus
sets during the reduction process. As illustrated in Figure 2-12, the first word focus set included
all higher frequency words or words occurring five times or more over all response calls and
across four or more PES categories. Next the word inclusion criteria were applied. Words that
met at least one of the inclusion criteria were included in the small vocabulary set. Words that
did not meet any of the inclusion criteria were passed through to the final exclusion criteria. If
the word did not meet any of the final exclusion criteria, the word was considered for non-
keyword inclusion into a PES phrase or scenario that would be recorded in the CARES corpus.
Words meeting at least one exclusion criterion were not included in the vocabulary set.
Figure 2-12: Diagram outlining the decision process for selecting keywords using the first word
focus set. This focus set contains words occurring five times or more across four or more PES
categories. A word related to at least one inclusion criterion (emergency request, high risk, low
risk, state/condition/symptom, conversational utility, caller type identifier, location, or
HELPER-related command word) is included. Otherwise it is tested against the exclusion
criteria (combo word; too vague to assess risk level/target; alternate tense already used): if it
meets one, it is excluded; if not, it is considered for non-keyword inclusion in a phrase or script.
As illustrated in Figure 2-13, the second word focus set includes any word with a lower
frequency of occurrence (less than five times in the response calls). These words were checked
for inclusion with the same criteria as for the first word focus set except for “location” words.
Location words occurring less than five times were not considered for inclusion.
Figure 2-13: Diagram outlining the decision process for selecting keywords using the second
word focus set. This focus set contains words occurring fewer than five times. The decision
flow matches Figure 2-12, except that location words are not among the inclusion criteria.
Words failing the inclusion check are tested against the exclusion criteria (combo word; too
vague to assess risk level/target; alternate tense already used) and are either excluded or
considered for non-keyword inclusion in a phrase or script.
All words were next checked against the inclusion and exclusion criteria. Words that met both
an inclusion and an exclusion criterion, or that met neither, were to be included in the CARES
corpus as a phrase word or in a PES script. If a word met an inclusion criterion but no
exclusion criterion, the word was included in the small keyword set. If a word met no inclusion
criterion but did meet an exclusion criterion, the word was not considered for the small
vocabulary set.
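The four-way routing just described can be sketched as a small function; the destination labels are invented for illustration:

```python
def route_keyword(meets_inclusion: bool, meets_exclusion: bool) -> str:
    """Route one candidate word through the reduction decision
    described above (a simplified sketch of Figures 2-12 and 2-13)."""
    if meets_inclusion and not meets_exclusion:
        return "small keyword set"
    if meets_exclusion and not meets_inclusion:
        return "excluded"
    # Met both criteria, or met neither: keep the word available
    # as a phrase word or within a PES scenario script instead.
    return "phrase or scenario"
```

For example, a word like "ambulance" (an emergency request word meeting no exclusion criterion) would route to the small keyword set, while a vague word meeting no inclusion criterion but an exclusion criterion would be dropped entirely.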
Tables 2-8 and 2-9 provide further descriptions of the word inclusion and exclusion criteria,
respectively, along with an example for each criterion.
Table 2-8: Definitions and examples of the word inclusion criteria.
1. Emergency request words: words related to requests for emergency care or assistance
   (e.g., ambulance, hospital).
2. High risk words: words related to an emergent care need such as loss of consciousness,
   inability to breathe, or a serious fall (e.g., breathing, beating).
3. Low risk words: words related to low risk situations (e.g., mistake, accident).
4. State/condition/symptom words: words related to a common older adult disease or
   condition, state of health, or fall; that target an important body area; or that indicate
   whether help is needed (e.g., don't, chest, asthma, terrible, discomfort).
5. Conversational utility words: words related to requests for repetition, politeness,
   opening/closing, responses to questions, or asking questions (e.g., okay, thank_you,
   yeah, no).
6. Caller type identifier words: pronouns identifying the caller type (e.g., I, me).
7. HELPER-related command words: possible command words used to turn the system on
   or off, or to ask for information (e.g., off, time, day).
Table 2-9: Definitions and examples of the final word exclusion criteria.
1. Combo word: compound words consisting of sub-words already included (e.g., pardon
   me, panic attack).
2. Too vague/cannot assess risk level: not enough information to assess the caller's
   condition or risk level, or too infrequent (e.g., lie, shape, been, yet).
3. Alternate tense used: the most common word tense or agreement is already included
   (e.g., waiting, falled, bleed).
While reviewing the words for possible inclusion in the small keyword list, two questions were
considered: (1) how important is the word's function, and (2) how significant is the word's
meaning with respect to identifying a target response; a high or low risk situation; a state or
condition suggestive of a high or low risk situation or a target response; a word necessary for
forming or responding to a question; a word that clearly suggests who is calling; an important
location; or an obvious possible command word for the HELPER system.
In terms of the word exclusion criteria, combo words were excluded because these words could,
in theory, be formed afterwards from their component words, as long as minimal word
convergence occurred when moving from the first word to the second. Word vagueness was
considered mostly with respect to the specific risk scenarios. For example, the word “abs”
occurred in a response call categorized as ‘high risk’ but was not included as a final keyword
because the use of this word would likely require several question/answer iterations to
determine why the call is high risk. On the other hand, words such
as “bleeding” or “oxygen” also occurred in a “high risk” response call and these words could
clearly suggest some possible urgent/emergent situation in their meaning. If a word had alternate
tenses or word agreements, typically only the most frequently used tense/agreement was used.
For example, “paramedics” was retained over “paramedic” and “ambulance” was retained over
“ambulances.”
After applying the reduction rules, a smaller list of keywords was obtained, but it was still larger
than 185 words. To bring the final number down to 185, the words were sorted by word category
to ensure all remaining 16 word categories were covered (two categories had been removed:
(15) other words and (12) interjections). Then, for each category, the words were reviewed again
and reduction decisions were made using the inclusion and exclusion criteria in Figure 2-13,
except that words which did not make the small keyword set were included in the CARES
corpus as phrase words or in the PES scenarios.
Of the words that did not make the small keyword set, 70 words, four (4) interjections (e.g., ah,
oh, uh, um), and three (3) combination words (i.e., heart attack, panic attack, blood pressure)
were specifically identified for inclusion in the PES phrases or scenarios of the CARES corpus.
Of the 179 keywords identified by both Coder 3 and Coders 1 and 2, 137 were included in the
small keyword set, with 35 of the words also included in the CARES corpus as words in a PES
phrase or scenario, or as an alternate tense of the same word (e.g., 'leg' included but not
"legs").
2.5.7 Identification of Key PES Phrases
A total of 185 PES phrases were selected from the response calls for inclusion in the CARES
corpus based on the following guidelines:
1. All keywords must be used at least once in the 185 phrases (in at least one instance, two
keywords occur together in one phrase);
2. The additional words identified during the reduction of the full keyword set are to be
contained within the 185 PES phrases or PES scenarios;
3. The phrases should span the range of all three PES risk levels, two caller types, two call
reasons, and various response requests (e.g., phrases might involve breathing, falling,
accidental calls, urgent or emergent requests, requests for paramedics, ambulance or
other, general descriptions of medical condition range);
4. The phrase range should include different styles of communication for requesting
assistance (e.g., direct or indirect, narrative like or succinct) and requests for repetition;
5. Phrases should be mostly extracted from the beginning of the response call (within the
first four to five (4-5) speaker turns);
6. As many as possible of the phrases and phrase segments identified by Coder 3 (from the
keyword coding) are to be incorporated into the final 185 phrases or the PES scenarios.
See Table 2-10 for a breakdown of what was included; and
7. An additional phrase(s) that deals with a hypothetical initiation of the HELPER system
should be included.
As shown in Table 2-10, of the 135 phrases identified by Coder 3, 81 were fully included in the
CARES corpus, 24 were included as separate words, and 17 were included partially (e.g.,
“needs the ambulance” vs. “needs an ambulance,” or “I’m really” vs. “I’m really so”). Of the 13
phrases not included in the corpus, ten (10) contained unintelligible words; of the last three (3),
one had a different word tense already included, one was a repeated phrase, and one did not
meet the criteria for inclusion (Guidelines 1-3 or 7).
Table 2-10: A breakdown of the phrase categories included in the CARES corpus selected by Coder 3, sorted by word categories.

Word Category (applied to the phrases)   Included fully   Included separately   Included partially
Positive Response (1-p)                         3                --                     1
Negative Response (1-n)                         1                 2                     2
Request/Command (2)                            15                 6                     3
Existing Problem (3-e)                          4                 2                    --
Current Problem (3-n)                          41                 8                     8
Positive Condition (3-p)                        5                 2                    --
Neutral Body State (4)                          4                 1                     1
Politeness (5)                                  1                --                    --
Targets (8)                                     1                --                    --
Body Part (10)                                 --                 2                    --
Negation word (11)                              5                 1                     2
Special Commands (14)                           1                --                    --
Sub-Total                                      81                24                    17
Total = 122
The selection of phrases for the CARES corpus began by taking the 185 keywords and searching
the response call transcripts for phrases that contained the keyword of interest. The phrases were
then organized according to the desired range, such as phrases of high risk, medium risk, and
low risk; fall call phrases; medical call phrases; phrases spoken by older adults and caregivers;
and phrases with succinct versus narrative-type requests for assistance. Phrases were then
selected from this list for inclusion in the corpus. Once a set of phrases was obtained, the
keywords that were not part of the small keyword set were also verified to be present in these
phrases. If they were not, phrases were replaced with others, or existing phrases were modified
to ensure their inclusion. The phrases from Coder 3 were not considered until after the phrases
for the corpus had already been selected. Fortunately, the majority of those phrases were
nonetheless included, and only 17 out of the 135 were partial inclusions.
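As an illustration only, the phrase-selection and coverage-verification procedure described above might be sketched as follows. All function names and example data here are assumptions for demonstration; they are not the tools actually used in the study.

```python
# Illustrative sketch: collect candidate phrases containing keywords,
# then verify that every keyword is covered by the selected phrase set.

def find_candidate_phrases(transcripts, keywords):
    """Collect utterances that contain at least one keyword."""
    candidates = []
    for utterance in transcripts:
        hits = [kw for kw in keywords if kw in utterance.lower().split()]
        if hits:
            candidates.append((utterance, hits))
    return candidates

def verify_coverage(selected_phrases, keywords):
    """Return the keywords not yet present in the selected phrase set."""
    covered = set()
    for phrase in selected_phrases:
        covered.update(kw for kw in keywords if kw in phrase.lower().split())
    return [kw for kw in keywords if kw not in covered]

transcripts = ["I think I need an ambulance", "My leg hurts", "I fell down"]
keywords = ["ambulance", "leg", "fell", "dizzy"]
candidates = find_candidate_phrases(transcripts, keywords)
missing = verify_coverage([u for u, _ in candidates], keywords)
print(missing)  # ['dizzy'] -> a phrase must be added or modified to cover it
```

In the study, any such uncovered keywords triggered the replacement or modification of phrases, as described above.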
2.5.8 Keywords in Various PESs
The PES model categories were used to classify the personal emergency response calls (i.e., low,
medium, and/or high risk levels, older adult and/or care provider callers, and/or fall or medical
calls) and the keywords used within each category were then identified. Table 2-11 shows a
breakdown of the number of keywords, from the final small keyword set, that were identified
according to the PES categories.
Table 2-11: The number of keywords identified by response call classification. (LR = low risk, MR = medium risk, HR = high risk, Fall = fall call, Med = medical call, OA = older adult call, and CG = caregiver call)

Call Classification              LR       MR       HR      Fall     Med      OA*      CG
# keywords                       26      147      137      112      162      160      113
% of total words (out of 185)  14.05%   79.46%   74.05%   60.54%   87.57%   86.49%   61.08%
# of calls in each category      10       34       40       21       53       61       22
* 1 combination call was not included when examining words between OA and CG. For the other combination call, the CG response was commented out for processing.
In terms of unique keywords spoken, looking at risk level only, three (3) keywords were found
for low risk calls, 44 for medium risk calls, and 31 for high risk calls. There were 16 words used
in common across the risk level and call reason PES categories. See Appendix C for a
breakdown of the 185 keywords and categories, and Appendix D for a list of the unique
keyword occurrences.
2.6 Discussion
2.6.1 Word Categories
With respect to the word categories developed for keyword identification from the call
transcripts, for both the out-of-context and in-context studies it would have been beneficial to run
a mock trial with the coders prior to the actual word categorization process. A mock trial would
have previewed how the coders were interpreting the word category definitions and offered an
opportunity to clarify misunderstandings before the actual coding of the words. Coders could
also have provided feedback at that time on any categories they felt needed further explanation,
or on whether new categories should be created or existing ones removed. In this study, the
coders were able to ask questions while coding; however, in some cases, differences in
interpretation did not surface until well into the coding process, or even after coding was
complete and the codes were being compared.
2.6.2 Coding Methods
In this study, Coder 1 and Coder 2 categorized words out-of-context, while Coder 3 categorized
the words in-context. As expected, the interrater reliability between Coder 3 and Coders 1 and 2
was lower than between Coder 1 and Coder 2, although still moderately similar. The fact that the
same word can have different meanings depending on its context underlines the importance of
coding words in-context (e.g., 'okay' could mean 'yes' or simply 'acknowledgement,' and
'accident' could be a negative or positive result depending on the context). The same word may
therefore be assigned a different word category depending on how it is used in the utterance. In
this study, only one code could be considered in the calculation of the Cohen's kappa statistic,
and this may have affected the results slightly.
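For reference, Cohen's kappa for two coders who each assign a single category per word (as was done here) can be computed as in the following sketch; the category labels and example codes are illustrative assumptions, not data from the study.

```python
# Minimal sketch of Cohen's kappa for two coders who each assign
# exactly one category code per word.
from collections import Counter

def cohens_kappa(codes1, codes2):
    n = len(codes1)
    observed = sum(a == b for a, b in zip(codes1, codes2)) / n
    c1, c2 = Counter(codes1), Counter(codes2)
    # Chance agreement: sum over categories of the marginal proportions' product.
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes for five words (category labels follow this study's scheme).
coder_a = ["2", "3-n", "1-p", "2", "11"]
coder_b = ["2", "3-n", "1-n", "2", "11"]
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.74
```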
Some inconsistency also resulted from how two-word units (single lexical units composed of
two words), such as "thank you" or "heart attack," were handled. In this study these words were
extracted separately by the computer (e.g., "thank" and "you"), as they were not linked in the
transcription process. When coding in-context, however, these words were identified as one
unit. It may therefore have been better to code these word units as one word rather than two in
the out-of-context condition, especially if all instances within the transcripts involve the two
words occurring together and never separately. However, the fact that the results still showed
moderate agreement despite this difference in coding method suggests that the process of word
categorization with the derived categories was reasonably robust.
2.6.3 Full Keyword List Identification
Several utterance extractions in the raw word list consisted of speech units that could not really
be considered keyword vocabulary, even though the utterances may be important for
understanding the situational context. For example, non-word sounds such as coughs or sighs,
and unintelligible utterances (e.g., partially spoken words or word cut-offs), may suggest that the
caller is having difficulty speaking or is ill; however, deciphering meaning from these speech
units by themselves would be difficult for a computer. Out-of-context, these types of speech
units were considered non-keywords and coded as such. The fact that Coder 3 identified some of
these unintelligible speech units as key while coding in-context does, however, highlight that
they are important in the actual conversational dialogue. For this study, unintelligible and
non-word sounds were not considered for inclusion in the small keyword set; however, examples
of these types of speech units were included in either the PES phrases or the scenarios of the
CARES corpus.
In the process of selecting keywords, all the coders eliminated function words (e.g., for, and,
to, so, etc.) from consideration as keywords. However, the identification of key short phrases by
Coder 3 re-introduced these function words, and other words that may not mean much out-of-
context by themselves, back into the list of possible keywords. As a result, even though adding
the phrase words increased the number of possible keywords, it also produced a greater
mismatch in the keyword comparison. This is reflected in the lower inter-rater reliability statistic
and percent agreement shown in Table 2-5 (agreement dropped from 77.8% to 76.4%).
2.6.4 The PES Model
Considering the end-user within the environment and situation where he/she will use the
technology during the design phase may make the difference between developing a technology
that will be adopted by the end-user and one that is not. The PES model developed in this study
is a very simple model and one of many possible ones to describe the PES. However, despite its
simplicity, incorporating these PES categories into the final selection of keywords and phrases
from the response calls should, in theory, ensure that the data reflect, to some degree, these
various PES aspects.
By replacing the older adult user with the PES model in Figure 2-5, and expanding the “classifier
unit” to include the PES categories, Figure 2-14 illustrates more closely the task of the HELPER
communication module and how the results of this study can be applied to improve the
automated PERS component.
In the future, other studies may want to consider other PES categories for caller types, such as
different genders, older adult age ranges, medical conditions, or personalities. Different care
provider types may also be of interest, such as those with a background in health care provision
versus those without this background. Another definition of a fall might also be used or the risk
level may be replaced by a condition type, such as individuals with chronic conditions versus
acute conditions (e.g., infections) as opposed to high versus medium risk.
[Figure: the diagram links (1) the Personal Emergency Situation (situation, call reason, risk level, physical-cognitive state, user/caller type; speech and non-speech communication ability) to (2) the HELPER System, whose ASR, SI, and classifier use keywords and phrases from the CARES corpus word and phrase categories to identify caller type (Who is calling?), call reason (fall or medical?), and risk level (patient acuity?), and, through conversation, to (3) the PERS response (What response?).]
Figure 2-14: A diagram showing the pathway to personal emergency response, including the PES model and categories within the classifier unit of the HELPER System.
2.6.5 Small Keyword List Identification
The process of small keyword list identification was performed by Coder 1 according to the
reduction rules identified in Figures 2-12 and 2-13 and Tables 2-7, 2-8, and 2-9. Ideally, it might
have been better to use all 402 keywords identified from the full keyword set as the "keyword
data set" in the CARES corpus; however, this would have taken twice the amount of time to
record and increased costs. In addition, it would have required a fairly long recording session for
the participants involved, which may not be suitable for the more elderly participants.
During the keyword reduction process, the decision to define the initial word focus set as all
words with a high frequency of occurrence and appearing across multiple risk categories was
made mainly to identify words that would be commonly used across PES categories as well as
frequently spoken. This is beneficial for ASR acoustic model training, as it ensures that
commonly used words across all situations would have the potential to be recognized if included
in the ASR vocabulary. On the other hand, not all PESs are the same, and if the aim is to identify
PES categories in hopes of deducing a target response, it would also be extremely valuable to
look for keywords that have a higher probability of being spoken only during specific situations
or PES categories and not others (e.g., words used in high risk versus low risk situations). As
well, some PESs do not occur frequently, but when they do occur it is imperative that they be
identified. For example, the word "heart attack" occurred only three times (less than the
minimum five-occurrence cut-off), but this is one situation that would require an immediate
emergency response. Words with a low frequency of occurrence were therefore also important
to identify and include. For these reasons, the second word focus set could not be eliminated.
Different word reduction processes were used for the first and second word focus sets, mainly
because even though the less frequently occurring words may have met at least one of the
inclusion criteria, they tended to be weaker at conveying why they should be included. For
example, these words seemed less able to clearly indicate the situation's risk level, desired target
response, or caller type, or to describe the state/condition/symptom of the individual so that the
other criteria could be determined. In the actual response call conversations, calls that took
longer for the call taker to resolve contained more words and thus more detail. However, this
detail may have included these less frequently occurring words, which are also less robust at
conveying a PES's risk level or the needed target response. An analogy might be a patient
complaining to a doctor of stomach pain and aches versus one who is bleeding profusely or
having a heart attack. Non-specific complaints of "aches and pains" tend to take the physician
more time to trace to a specific cause than complaints of profuse bleeding or a heart attack.
With respect to including the words that were not part of the small keyword set into the PES
phrases or scenarios recorded in the CARES corpus, adding the word into a phrase was the first
preference. The main reason for this is that every participant providing speech samples was
required to speak every phrase whereas only three scenarios, selected out of a total set of nine,
were to be enacted by a participant.
2.6.6 PES Phrases
Although the study began with the intent of selecting keywords from transcripts, as the study
proceeded it became clear that word combinations and phrases were also very important for
providing context to what was spoken. In particular, preference was given to phrases occurring
within the first several turns of the response call conversation. The interest in the initial speaker
turns arises because the HELPER must provide a response as quickly as possible; it is not
expected to engage the user in a long, extended conversation. Ideally, the HELPER must be able
to identify a target quickly, within a few speaker turns. If it cannot, it may be necessary to
default to a live call taker. Therefore, the hypothesis for phrase selection was that
selecting phrases from the first several speaker turns of the live response call would reflect more
closely how callers would respond to the HELPER during an actual PES. The information
provided by these key phrases will be useful for developing the language model within the ASR
sub-component of the HELPER Speech Handler (see Figure 2-4 for the ASR main components).
2.6.7 Application to HELPER
An expansion of the HELPER's recognizable vocabulary will also require some method of
'understanding' this increased vocabulary. Although the language model in the ASR will help
the Speech Handler identify possible word configurations, it is the keyword categories that will
provide the vital information required by the semantic analyser (in the Speech Informant sub-
component of the Speech Handler) to assist the HELPER in at least "artificially understanding"
the meaning behind what is presumed to have been said by the user.
In Figure 2-4, incoming speech being received by the Speech Handler’s ASR will be processed
by the Decoder using the three linguistic models trained with words and phrases from the
CARES corpus. The resulting “best match” of the words spoken by the user would then be sent
on to the semantic analyser in the Speech Informant which will attempt to interpret the meaning
of what was said by the user. So, in a hypothetical situation where the HELPER opens with "Do
you need help? (Please say 'yes' or 'no')," as the current system prototype does, and the user
responds, "yes, could you send an ambulance?", the goal would be for the semantic analyser
to break down the words and code the user's response as shown in Table 2-12.
Table 2-12: Example of how an incoming statement might be deciphered by the semantic analyser
Spoken word        yes                could           you    send              an     ambulance
Word Code          (1-p)              9               nc     2                 nc     8
Code description   positive response  question word   --     request/command   --     target

*nc = no code
Given this information, the Speech Informant might then tell the HELPER’s Dialogue Handler
component that the user responded to the asked question “positively”, plus there is a request for
“target”=ambulance. The Dialogue Handler might then respond with “please confirm you would
like an ambulance,” as opposed to what the previous system prototype would do which is “would
you like me to call an ambulance? (Please say ‘yes’ or ‘no’).” The main difference in these
responses is that with the expanded vocabulary combined with an enhanced language model and
semantic analyser, the system should be able to recognize that the user has already stated their
desired target, whereas in the existing HELPER communication module prototype, the user
response is only recognized as being more similar to a “yes” or a “no” and the “ambulance”
request would not be recognized.
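The word-coding and intent-extraction step illustrated in Table 2-12 might be sketched as follows. The keyword-to-category map is a tiny assumed subset of the study's categories, and the function names are hypothetical.

```python
# Sketch of the semantic analyser's word-coding step (cf. Table 2-12).

WORD_CODES = {
    "yes": "1-p",      # positive response
    "no": "1-n",       # negative response
    "could": "9",      # question word
    "send": "2",       # request/command
    "ambulance": "8",  # target
}

def code_utterance(utterance):
    """Assign each word its category code ('nc' = no code)."""
    return [(w, WORD_CODES.get(w, "nc")) for w in utterance.lower().split()]

def extract_intent(coded):
    """Summarize a coded utterance for the Dialogue Handler."""
    codes = {c for _, c in coded}
    intent = {}
    if "1-p" in codes:
        intent["response"] = "positive"
    if "2" in codes and "8" in codes:  # request/command plus a target
        intent["request_target"] = next(w for w, c in coded if c == "8")
    return intent

coded = code_utterance("yes could you send an ambulance")
print(extract_intent(coded))  # {'response': 'positive', 'request_target': 'ambulance'}
```

Given such an intent summary, a dialogue manager could skip a redundant "would you like me to call an ambulance?" query and move directly to confirming the already-stated target.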
An interesting comment made by the physician (Coder 3 for keyword coding) was that, while
reading the transcripts, it appeared the caller could often have been responded to sooner with the
information that had already been provided. This comment may allude to a need for more basic
emergency medical response training for call takers. The authors are not aware of what training
the call takers are provided, or of the minimum requirements needed to be hired as a call taker.
One research study did note, however, that emergency response operators tend to be hired based
on their personality traits: their ability to listen, and to be sensitive, insightful, empathetic, and
intuitive (Forslund, Kihlgren, & Kihlgren, 2004).
2.6.8 Study Limitations
The keywords and phrases obtained in this study were derived from a small sample of 84
response call transcripts provided by one call centre. The recordings cover only two provinces
in Canada, and not all cities within them. This research is therefore limited by the number of
different types of PESs and response call conversations contained in the collection of studied
personal emergency response calls. Researcher bias also plays a role in the development of the
keyword and phrase categories, as well as in the process of keyword selection and reduction.
2.7 Conclusion
In conclusion, this study describes the process by which keywords and phrases spoken by PERS
users were identified for various PESs based on a proposed PES model. The study derived data
using directed, conventional, and quantitative content analyses of transcribed personal
emergency response calls. The main results include the identification of 18 word categories; the
categorization and isolation of 402 keywords and 135 phrases from transcripts of 84 response
calls; the development of a PES model; the reduction of the full keyword list into a small
keyword list of 185 keywords and phrases for incorporation into a recorded speech database.
Common words across the PES categories, and words unique to the personal emergency
response situations categorized by risk level, were identified. No prior research examining the
keywords and phrases used in response calls by PERS callers has been identified by researchers
in the Intelligent Assistive Technology and Systems Lab. The results of this study
can be directly applied to improving the Speech Handler component of the HELPER and
expanding the HELPER’s recognition vocabulary for incoming speech. The hope is that this
work will contribute to the future development of a more robust HELPER system.
Chapter 3
3 Identification of Conversational Trends in Personal Emergency Response Calls
3.1 Prologue
This chapter explores the potential of using statistical measures of speech and conversational
data as an alternate source of information to increase the HELPER system’s confidence in
decision making, and to improve its ability to respond appropriately and efficiently to the end-
user. This study demonstrates the benefit of using both qualitative and quantitative analyses to
help examine the nuances of personal emergency response call conversations. The contents of
this chapter are intended for publication but have not yet been published.
3.2 Abstract
Purpose: A novel automated, intelligent, spoken dialogue-based personal emergency response
system concept is being developed in an attempt to address the existing usability barriers
identified by prior research groups of traditional push-button type personal emergency response
systems. However, spoken dialogue systems and automatic speech recognition technology
cannot perform optimally all of the time, especially with the expected target users in emergency
situations. The main objective of this study was to analyse statistical information from real
personal emergency response calls in order to identify significant call and conversation trends
that may be used to help the automated personal emergency response system tailor its dialogue
response to the end-user's need(s).
Method: An emergent, exploratory, sequential mixed methods design was used for this study.
Personal emergency response calls were classified according to the personal emergency response
categories identified qualitatively from transcribed personal emergency response calls. Various
statistical analyses were performed involving different combinations of three conversational
measures: verbal ability, conversational structure, and timing; and three independent factors:
caller type, risk level, and speaker type.
Results: Emergency medical response services were identified as preferred responders for the
majority of medium and high risk calls by both caller types. Non-emergency medical service
responders were requested mainly during medium risk situations by older adult callers. Older
adult callers may be predicted with fairly high accuracy by measuring the caller’s spoken ‘words
per minute’ and ‘turn length in words.’ Average call taker response times were calculated in both
speaker turns and in seconds. Care providers and older adults were found to use different
conversational strategies when responding to the call taker. The words ‘ambulance’ and
‘paramedic’ seem to hold different latent connotations.
Conclusion: Tailoring the response dialogue of the automated personal emergency response
system to the caller can help minimize user frustration and improve call efficiency. Classifying
calls by caller type or risk level may help tailor the call dialogue to the user. Call taker response
times can also be used to limit the length of conversation before reaching a live operator. System
designers should consider when to use the terms “ambulance” or “paramedic” in their response
dialogue and/or to include both as possible responder options.
3.3 Introduction
3.3.1 Need for a New PERS
Over the last few decades, mounting concern over a growing elderly population combined with
advances in computing technology have stimulated research into new methods for improving the
traditional, push-button, personal emergency response system (PERS). This "second" generation
of PERS technologies has begun to incorporate technological advances such as automatic fall
detection and home-based monitoring with sensors (Doughty, Cameron, & Garner, 1996;
Heinbüchner et al., 2010). Although the market for the next generation of PERS is large, the
technology is still young and the majority of existing PERS owners continue to use the
traditional, push-button PERSs. Hessels et al. (2011) provides a recent review of the latest
advances in personal emergency response technologies.
3.3.2 The HELPER System
In terms of identifying ways to improve PERS technology, some researchers have suggested that
activation be made using speech (keywords) (Hobbs, 1993; Taylor & Agamanolis, 2010). Other
researchers suggest that older adult home care technologies in general need to be made more
“attractive, provide privacy, [and] allow for informed choice” and reduced isolation (Blythe et
al., 2005). In an attempt to incorporate some of these design suggestions into a new PERS,
researchers in the Intelligent Assistive Technology and Systems Lab have been developing a
hands-free, speech and vision based smart home monitoring system called the HELPER system
(Health Evaluation Logging and Personal Emergency Response System) (Belshaw et al., 2011;
Hamill et al., 2009; Lee & Mihailidis, 2005; Tam et al., 2006). In theory, the HELPER would
continuously monitor the home for an adverse event (i.e., a fall) and then automatically initiate a
response sequence if such an event is detected. The person being monitored would communicate
first with an artificially intelligent HELPER call taker who would connect the user to their
desired responder. Using speech or vision to activate the PERS removes the need to wear a
body-worn activator such as the traditional PERS "push-button," and will hypothetically increase the
user's autonomy and privacy by permitting the user to either direct or cancel the call before
reaching a live call operator.
3.3.3 HELPER Prototype Testing
Feasibility testing of the HELPER communication module by previous researchers successfully
demonstrated that automatic system activation via visual detection of a simulated adverse event,
followed by human-to-computer communication using spoken dialogue and automatic
recognition of incoming speech is possible (McLean, 2005). The HELPER prototype was tested
with younger adults in a controlled lab environment. The HELPER automatic speech recognizer
(ASR) was set to recognize “yes” and “no” word forms only and the automated dialogue was
modelled off of existing personal emergency response call centre protocol (McLean, 2005). In
these studies, the group of young adults successfully navigated the HELPER dialogue and
obtained assistance by responding to the “yes” and “no” queries in both quiet and noisy
conditions. The next phase of this research is to focus on furthering the design, development, and
fine-tuning of the HELPER communication module for actual end-users, especially older adults,
in real personal emergency situations (PESs).
3.3.4 Older Adults and Spoken Dialogue Systems
Research examining the use of spoken dialogue systems by older adults has revealed that
older adults are more likely than non-older adults to communicate with spoken computer
dialogue systems as if they were human. Researchers have observed that their older adult
participants used more "definite articles, more auxiliaries, more first person pronouns, and ...
more lexical items related to social interaction, such as 'please' and 'thank you'" compared to
younger adults (Möller et al., 2008). The challenge of the HELPER's SDS will be to recognize
speech from end-users, mostly older adults, who may converse with it like a human, in
potentially stressful emergent situations where the speaker may have decreased communication
abilities (e.g., hesitations, disfluencies) or may not even be facing or close to the HELPER’s
input microphone. As such, it stands to reason that, to increase robustness in the HELPER
communication module, it would be important not only to include techniques for error recovery,
but also to provide any other supports that may assist the HELPER in deciding how to
respond to a call. This study hypothesizes that this "other support" may be derived from prior
real response call patterns and/or call dialogue (e.g., conversational measures) which may then
be used to classify a response call with a certain probability. This additional support information
combined with the ASR’s identified spoken word or utterance could be used to increase the
HELPER’s confidence in decision making and dialogue planning. Further details of the
HELPER system will be provided in the Background section of this chapter.
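As a purely hypothetical illustration of such "other support," a conversational measure like the caller's speaking rate could feed a simple call classifier. The threshold, labels, and function names below are assumptions for demonstration only; any real cutoff would have to come from statistical analyses such as those reported in this chapter.

```python
# Hypothetical sketch: use a conversational measure (words per minute)
# as additional support for classifying a response call.

def words_per_minute(word_count, duration_seconds):
    """Speaking rate over a span of the call."""
    return word_count / (duration_seconds / 60.0)

def classify_caller(wpm, threshold=120.0):
    """Guess the caller type from speaking rate (threshold is assumed)."""
    return "older adult" if wpm < threshold else "care provider"

wpm = words_per_minute(word_count=45, duration_seconds=30)
print(wpm, classify_caller(wpm))  # 90.0 older adult
```

Such a probabilistic classification would be combined with, not substituted for, the ASR's recognized utterance when the HELPER plans its next dialogue move.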
3.3.5 Study Objective & Research Significance
Previous research studies have used conversational analysis to examine emergency calls (911)
(Cromdal et al., 2008; Garcia & Parmer, 1999; Garner & Johnson, 2007; Imbens-Bailey, 2000;
Waseem et al., 2010; Whalen & Zimmerman, 1987) but no prior studies could be identified
specifically examining personal emergency response calls and call conversations with PERS
users. Therefore, the objective of this study is to identify significant trends in real personal
emergency response calls and call conversations that may be used to tailor the call response
to the user. The research results presented in this paper will incorporate a PES model previously
developed in study 1. Emergency medical response personnel, clinicians and care providers may
find the information from this study useful in helping them understand some of the
communication differences that may arise between PERS users during various PESs. Technology
developers should also find this information applicable to the design of personal emergency
communication technologies for older adults and for the development of personal emergency
communication protocols. Preliminary results from this study have been previously summarized
in a short, one-page conference paper (Young, Rochon, & Mihailidis, 2014). Relevant
background will be presented first, followed by the study methodology, results, discussion, and
conclusion.
3.3.6 Background
3.3.6.1 The HELPER System
Figure 3-1 illustrates the pathway to personal emergency response using the HELPER.
[Figure: (1) a Personal Emergency Situation is detected by (2) the HELPER computer via hands-free speech or vision activation, using a ceiling/wall/shelf-mounted camera, speaker, and microphone. The (2a) Vision Module asks: Is the person present? Is the person active? Is the activity/inactivity normal? Activate communications? The (2b) Communication Module then conducts spoken dialogue as (3a) the PERS call taker, determining: Who is calling? Call reason? Situation risk level? Response required? The outcome is (3b) a call response (emergency response services or personal responder(s), both live persons) or no response (false alarm).]
Figure 3-1: Pathway to personal emergency response using the HELPER System.
The design is based on the concept that the HELPER will monitor the home for an adverse event
(e.g. a fall) (using its Vision Module – at 2a) and when detected, the system will automatically
initiate dialogue with the individual (using the Communication Module – at 2b). The individual
then communicates his/her need using speech with the HELPER functioning as a non-live, first
responder. The user may also activate the system manually by saying a specific keyword or
phrase (e.g., a cry for help). If assistance is required, the HELPER will subsequently initiate
contact with the desired live responder (see points 3a and 3b in Figure 3-1) or cancel the call if
no response is needed (e.g. a false alarm). In essence, the automated PERS functions similarly to
a hands-free telephone but with specialized and intelligent features.
3.3.6.2 The HELPER Communication Module
The ability of the HELPER computer to communicate with a human user "verbally" over several
speaker turns places its communication module into a category of interactive dialogue systems
called a spoken dialogue system (SDS) (Fraser, 1997). According to Möller (2005), an SDS is
characterized by its ability to accept continuous speech, allow for user initiatives (i.e., the user
can provide more information than requested), reason, detect errors or incoherence, and correct,
anticipate, and/or predict the spoken user response. It is proposed that the HELPER
communication module contain all five of the basic functional components of an SDS
(Georgila, Wolters, Moore, et al., 2010; Lamel et al., 2000; Möller, 2005):
(1) an ASR that receives an acoustic signal (spoken input) and transforms this into a most
probable word sequence;
(2) a Semantic Analyser or Natural Language Understanding component that deciphers the
meaning or intention of the probable word sequence;
(3) a Dialogue Manager that maintains the dialogue and keeps a history of responses;
(4) a Response Generation component that determines the output dialogue according to “the
dialog state, the user utterance, and/or information returned from the database” (Lamel
et al., 2000); and
(5) a Speech Synthesis component that converts selected system utterances to actual speech
output.
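The five components listed above can be pictured as a pipeline. The following stub sketch (all names and return values are assumptions for illustration, not the HELPER implementation) shows how an utterance would flow through them:

```python
# Stub pipeline for the five basic SDS components; each stage is a
# placeholder standing in for a much larger real component.

def asr(audio):
    """(1) Map an acoustic signal to the most probable word sequence."""
    return "yes could you send an ambulance"  # stubbed recognition result

def semantic_analyser(words):
    """(2) Decipher the meaning/intention of the word sequence."""
    return {"response": "positive", "target": "ambulance"}  # stubbed intent

def dialogue_manager(intent, history):
    """(3) Maintain the dialogue and keep a history of responses."""
    history.append(intent)
    return {"state": "confirm_target", "intent": intent}

def response_generation(dialogue_state):
    """(4) Determine the output dialogue from the dialogue state."""
    return f"Please confirm you would like an {dialogue_state['intent']['target']}."

def speech_synthesis(text):
    """(5) Convert the selected system utterance to speech output (stubbed)."""
    return text

history = []
reply = speech_synthesis(
    response_generation(dialogue_manager(semantic_analyser(asr(b"...")), history)))
print(reply)  # Please confirm you would like an ambulance.
```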
Additionally, a component for contacting a live responder is included, conveniently called
the “call responder” component. ‘Dialogue measures’ and ‘classifier’ sub-components, located in
the Speech Informant and Dialogue Manager components respectively, are also proposed. Figure
3-2 illustrates the proposed internal components of the HELPER communication module.
[Figure: the HELPER Communication Module receives incoming speech and produces spoken output. It comprises a Speech Handler (Automatic Speech Recognizer (ASR), Speech Informant), a Dialogue Handler (Dialogue Manager), and a Response Handler (Response Generation, Speech Synthesis, Call Responder), with a responder dispatched en route as the final outcome.]
Figure 3-2: Sub-sections and functional components of the HELPER Communication Module
The results of this study specifically focus on improving various aspects within the Speech
Informant, the Dialogue Manager, and the Response Generation components of the HELPER
communication module. Further detail is provided only on these sub-components.
Figure 3-3 illustrates the internal sub-components of the Speech Informant component.
[Figure: the Speech Informant (SI) contains a Semantic Analyzer (NLU) and a Dialogue Measures sub-component; its input arrives from the ‘ASR’ sub-component and its output is sent to the ‘Call Classifier’ in the Dialogue Handler.]
Figure 3-3: Inside the Speech Informant sub-component of the Speech Handler.
The ‘best match’ speech utterance obtained from the ASR is sent both to the semantic analyser,
for natural language processing to “understand” the meaning of what was said, and to the
dialogue measures sub-component, in which “other information” will be extracted and
used to inform the Dialogue Handler about how to classify the PES.
Inside the Dialogue Handler (see Figure 3-4), information from the Speech Informant is first sent
to a PES classifier, which identifies any details from the user’s utterance that can be used to
classify the PES. This information is then sent to the dialogue control, where it is determined how
next to respond to what the user said. The controller first consults the dialogue history, the
current dialogue set, and the dialogue state, and then selects a response. The Response Handler is
then activated, where the proposed response can be generated or a call to the responder can be made.
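The dialogue-control step described above can be sketched as follows. This is a minimal illustration under invented assumptions: the dialogue sets, prompt texts, and the "first prompt not yet asked" selection rule are all hypothetical simplifications of what a real dialogue manager would do.

```python
# Hypothetical dialogue control: given the classifier's PES label and the
# dialogue history, pick the next system prompt from the current dialogue set.

DIALOGUE_SET = {
    "fall":    ["Are you hurt?", "Shall I call an ambulance?"],
    "medical": ["What is wrong?", "Is anyone with you?"],
    "unknown": ["Do you need help?"],
}

def next_prompt(pes_class, history):
    """Return the first prompt in the set not yet asked; None means the set
    is exhausted and the call responder component should be activated."""
    for prompt in DIALOGUE_SET.get(pes_class, DIALOGUE_SET["unknown"]):
        if prompt not in history:
            return prompt
    return None  # hand off to the call responder

history = []
first = next_prompt("fall", history)
history.append(first)
print(first, "/", next_prompt("fall", history))
```

The point of the sketch is only the flow of information: classifier output plus history and state select a response, exactly as the prose describes.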
[Figure: the Dialogue Handler (DH) contains a PES Classifier and the Dialogue Manager (DM), which holds the Dialogue Control, Dialogue History, Dialogue State, and Dialogue Set; input arrives from the ‘Speech Informant’ component (with the SA or NLU) and output goes to Response Generation (RG) or the Call Responder (CR) in the Response Handler.]
Figure 3-4: Inside the dialogue handler component of an SDS.
The Response Handler is illustrated in Figure 3-5. Aspects of the diagram are derived from
Möller (2005). Inside the Response Generation component, a database of possible dialogue
responses (text) is searched for the response requested by the Dialogue Manager. This response
is then sent to the ‘Speech Synthesizer’, which searches a database for the desired spoken
dialogue units, synthesizes the text to speech if necessary (pre-recordings of output dialogue may
be used), and sends the response out to the user through a speaker system.
If the Call Responder component is activated, the Call Responder might check for a preferred
responder or look through a history of requests to inform the Dialogue Manager if any further
query is required. Once a desired responder is confirmed, the call to the desired responder is
initiated.
[Figure: the Response Handler contains Response Generation (database of dialogue text; select response), Speech Synthesis (database of spoken dialogue; speech output to the speakers), and the Call Responder (responder information; response request history; initiate/confirm the call, with the responder then en route); input arrives from the Dialogue Manager.]
Figure 3-5: The internal components of the response handler within the SDS.
3.3.6.3 Human to Machine Spoken Dialogue Systems
The ability to simulate or replicate human speech recognition and understanding using
technology has been a growing area of research for over 60 years (Anusuya & Katti, 2009; Gold
& Morgan, 2000). Although considerable progress has been made in the field of ASR, a human’s
capacity for speech recognition and understanding across a range of environments is still
unmatched by any machine (Dusan & Rabiner, 2005; Furui, 2003; Scharenborg,
2007). A major source of ASR error arises from a mismatch between the speech sounds used to
train the acoustic models and the actual incoming speech to be recognized (Furui, 2003;
King, 2006). Automatically recognizing human-to-human conversational speech is also
known to be a more difficult task than recognizing human-to-machine speech (Jurafsky &
Martin, 2009).
Generally speaking, researchers have shown that when humans interact with a machine that can
recognize speech, they tend to simplify their speech, speaking more clearly and slowly
(Jurafsky & Martin, 2009). However, the danger in generalizing is that this statement may not be
true for all users interacting with speech-recognizing machines. Specifically, research findings
from Möller et al. (2008) revealed that close to two-thirds of their older adult participants
interacting with an SDS did not adapt their speech but instead spoke to the system as if it were a
real human. The remaining older adult participants in that study did perform as expected and
adapted the way they spoke, using only the speech necessary to convey their meaning (Möller et
al., 2008).
Collectively, the research literature highlights the complexity and challenges of using ASR and
SDS technology and underlines the fact that systems incorporating these techniques may not be
able to function perfectly 100% of the time, even in optimal conditions with the users they were
designed for (Furui, 2003; Takahashi et al., 2003; Vipperla et al., 2009).
3.3.7 Study Focus as Applied to the HELPER
Given that the SDS may not function perfectly, in addition to speech recognition and
understanding, another method involving call classification is proposed to further support the
HELPER communication module and help it to determine the best way to tailor a response to the
end-user. Figure 3-6 expands on Figure 3-1, which illustrates the pathway to personal emergency
response using the HELPER system. On the left side of Figure 3-6, the PES model developed in
study 1 is used to characterize the personal emergency situation (1). The HELPER system (2) is
represented in the middle-right, with some internal communication module components shown,
specifically the ASR, the Speech Informant (SI), and the classifier sub-component of the
Dialogue Handler. The PERS Response (3) completes the pathway on the far right.
This diagram demonstrates how the call classification sub-component (classifier) of the Dialogue
Handler would be used by the HELPER communication module. In this diagram, incoming
speech from the user is received by the HELPER computer. This speech input is processed
within the ASR to identify keywords and phrases. The keywords are categorized in the Speech
Informant to help derive the meaning of the recognized speech. In addition to recognizing and
categorizing the spoken words, conversational measures of speech (yellow star) could also be
used to help characterize the call situation.
Collectively this information would be used by the classifier sub-component to identify a
possible caller type, call reason, and/or medical risk level for the PES. If PES classifications can
be identified for a call, then this information could be used in addition to speech understanding
as a basis for modifying the call dialogue and matching it to the particular caller for the specific
PES. Timing information (yellow star) can also be used to limit the length of a response call
which would ensure the call does not continue indefinitely before defaulting to a live responder.
Study 2’s main focus will be on identifying the conversational measures that could be used to
classify a call as well as identifying the timing used to measure the length of a call.
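As an illustrative sketch of the proposed classifier sub-component, the fragment below combines spotted keywords with one conversational measure (words per minute) to guess a call reason and risk level. The keyword lists, threshold, and rules are invented here purely for illustration; study 2 derives the real measures and relationships.

```python
# Toy PES classifier: keywords + a conversational measure -> call reason and
# risk level. All word lists and thresholds are hypothetical.

FALL_WORDS = {"fall", "fell", "fallen", "floor"}
MEDICAL_WORDS = {"pain", "dizzy", "vomiting", "chest"}

def classify_call(utterance, words_per_minute):
    tokens = set(utterance.lower().split())
    if tokens & FALL_WORDS:
        reason = "fall"
    elif tokens & MEDICAL_WORDS:
        reason = "medical"
    else:
        reason = "unknown"
    # invented rule: unusually slow speech during an emergency nudges risk up
    risk = "high" if words_per_minute < 60 and reason != "unknown" else "medium"
    return {"call_reason": reason, "risk_level": risk}

print(classify_call("I fell on the floor and I cannot get up", 45))
```

In the actual HELPER, these labels would pre-inform the dialogue manager so that the call dialogue can be tailored to the caller and situation.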
[Figure: (1) the Personal Emergency Situation, comprising the user (physical-cognitive state, caller type), call reason, risk level, and speech and non-speech communication ability; (2) the HELPER System, in which the HELPER computer’s ASR extracts keywords and phrases, the SI applies word categories and conversational measures, and the classifier, together with timing information, estimates the caller type (who is calling?), call reason (fall or medical?), and risk level (patient acuity?) to decide what response to give; the conversation leads to (3) the PERS Response.]
Figure 3-6: Diagram of the pathway to personal emergency response using the HELPER, with the ‘conversational measures’ and ‘timing’ features added.
3.4 Methodology
3.4.1 Research Design Method
An exploratory, sequential, mixed methods design was used for this study. Clark & Creswell
(2011) provide a good introduction to this method, which consists of a ‘qualitative data collection
and analysis’ phase, followed by a ‘quantitative data collection and analysis’ phase, and ending
with a ‘final interpretation’, as illustrated in Figure 3-7 (Clark & Creswell, 2011).
[Figure: Qualitative Data Collection and Analysis → builds to → Quantitative Data Collection and Analysis → Interpretation.]
Figure 3-7: Diagram of the process of exploratory sequential mixed methods design (Clark & Creswell, 2011).
Content analysis was the approach used for the ‘data collection and analysis’ phases of both the
qualitative and quantitative portions of this research design method. Crede & Borrego (2010)
provide an example of using the content analysis approach within a mixed methods design.
Content analysis is an attractive method of inquiry applied in many research fields for analyzing
text (and sometimes other media) in context of its use (Cavanagh, 1997; Krippendorff, 2012).
Over the decades, content analysis has been used increasingly in the field of health research (Elo
& Kyngäs, 2008; Mays & Pope, 2000). Content analysis is flexible enough to examine data either
qualitatively or quantitatively, and either inductively (e.g., specific to general) or deductively (e.g.,
general to specific, based on existing theory) (Elo & Kyngäs, 2008; Krippendorff, 2012). Content
analysis has also been used frequently in the area of computer text analysis since the late 1950s
and in artificial intelligence (Krippendorff, 2012). In the field of artificial intelligence,
researchers were mainly focused on designing machines capable of understanding natural
language (Krippendorff, 2012), which is precisely a component of interest within the HELPER.
When used as a research method, content analysis is noted as being systematic, objective,
repeatable and a valid means of either quantifying phenomena or making inferences about data in
context (Krippendorff, 2012). Typically, new knowledge or insights are derived in the form of
concepts or categories describing some phenomenon, or for the purpose of building a model,
conceptual system or map (Elo & Kyngäs, 2008). The outcome of a content analysis may be used
to guide future action which is especially useful in the field of health research (Elo & Kyngäs,
2008).
3.4.1.1 Method Limitations
In terms of limitations, the flexibility that gives content analysis its advantage is also its
restriction. Some researchers have noted that because content analysis does not proceed linearly
and has minimal formalized procedures, it can become more complex and difficult to implement
than quantitative analysis (Polit & Beck, 2004).
3.4.1.2 Method Implementation
The general steps for implementing a content analysis include (Elo & Kyngäs, 2008;
Graneheim & Lundman, 2004; Krippendorff, 2012):
1. Selecting a unit of analysis (e.g., interviews, a program, parts of text);
2. Within the unit of analysis, selecting a meaning/coding/content/recording unit. Essentially,
one must decide what to analyse, to what degree of detail, and how sampling will be
conducted (e.g., should the codes include silence, sighs, laughter, and postures?);
3. Organizing the data (e.g., use open coding, categories, themes, abstractions);
4. Creating a model, conceptual system or map, or categories.
3.4.1.3 Method Approaches
Various approaches to the application of content analysis exist in research. Content analysis at
the conversation level becomes conversational analysis. For this study, a conventional
conversational analysis will be performed, followed by a quantitative conversational analysis. For
the conventional conversational analysis, coding categories are typically derived directly from
the conversational data and are generally used to describe a phenomenon in the data (Hsieh &
Shannon, 2005). For the quantitative conversational analysis, conversational data is coded into
explicit categories and then described using statistics (Morgan, 1993).
Conversational analysis focuses on studying naturally occurring speech as it ordinarily unfolds in
social settings (Mondada, 2012). The conversations may be studied through recorded voice or
video or by transcriptions of interactions using a specialized transcription convention
(Krippendorff, 2012). The main purpose of conversational analysis is to understand the structure
of “talk in interaction” with a minimum of two participants (Krippendorff, 2012). Conversational
analysis examines phenomena such as turn-taking, conversational moves, and other aspects of
conversation, all of which are of primary interest for this study (Krippendorff, 2012). This
method has been used to study medical interactions for over 30 years in settings between
physicians and patients, as well as, in other allied health specialty settings (Teas Gill & Roberts,
90
2012). The results of conversational analysis studies have also been applied to help improve
many medically related applications ranging from medical education, informing current medical
practices, and enhancing patient-provider communications (Teas Gill & Roberts, 2012).
3.4.2 Research Design Details
3.4.2.1 Research Population
All recorded calls used in this study were between the PERS provider’s call taker and either a
client of the PERS provider or a care provider. In a few cases, emergency medical service (EMS)
dispatchers or the PERS setup personnel were also included in the call. No subscriber details
were provided with the calls, but caller age and gender details were deduced from within the call
conversations where possible. We are unaware of any prior call "sorting" that may have occurred,
for example with respect to gender, call type, caller type, or emergency risk level.
3.4.2.2 Research Setting
This study was completed at the University of Toronto in the Rehabilitation Sciences Institute.
The data processing was performed in the Intelligent Assistive Technology and Systems
Laboratory.
3.4.2.3 Data Collection
Personal Emergency Response Call Recordings
The personal emergency response calls used in this study were provided by a local, private PERS
provider upon our request for a sample of emergency and non-emergency calls. The non-
emergency calls recorded included: false alarms or accidental system activations, installation
setups or equipment test calls, scheduled check-ins, translation requests, and follow-up calls. The
emergency calls recorded included genuine emergency calls for either EMS (i.e., 911,
paramedics) or non-EMS emergency responders (i.e., relatives, friends, or professional care
providers). A total of 109 digitized call recordings were obtained from the PERS provider (name
withheld for confidentiality). These recordings were collected in two sessions over two years
(2008 - 52 calls and 2009 - 57 calls). All recordings were made in Canada. To our knowledge, all
clients in this study used the traditional push-button activator.
Confidentiality
Confidentiality agreements were signed between the private call centre providing the call
recordings and the Intelligent Assistive Technology and Systems Lab. These agreements outlined
how the data would be used and stored. In terms of usage, all transcripts were to be stripped of
personal or identifying information and access to call recordings would be limited to select
individuals upon approval by the Company. In terms of storage, all recordings would be kept in a
secure and locked location and all digital recordings on the computer would be kept under
password protection on a lab computer. All correspondences with the Company would also be
kept confidential.
3.4.2.4 Data Processing
Call Transcripts
Eighty-four (84) response calls were transcribed in total. The 24 non-transcribed calls consisted
of repeat recordings or were conversations between the emergency response service providers
only (i.e. between the personal emergency response provider’s call taker and EMS dispatchers
without subscriber involvement). Transcription was performed verbatim from digital audio files
using the computer software, “Systematic Analysis of Language Transcripts” (SALT), version
8.0 and 9.0 (Miller & Iglesias, 2006). The transcription process followed the SALT protocol
outlined in the user manual (Miller & Chapman, 2008). SALT is software specially designed
for “eliciting, transcribing, and analyzing language samples.” As such, in addition to
transcription tools, the SALT software also includes various analytical tools, including, but not
limited to, the ability to code words and utterances, and calculate words per minute or
conversational time lengths. The coding units of interest were extracted from the response call
transcripts using the "explore multiple transcripts" and "rectangular data file" features of the
SALT software.
Transcriptions were completed by listening to the digital call recordings on a computer using
headphones. The audio content was then transcribed directly into text in the SALT program. An
effort was made to capture non-word utterances (e.g., coughing), fillers (e.g., ‘eh’, ‘ah’), and to
note silent moments during the conversation. Patient-identifying information was excluded from
the transcripts (i.e., no names, addresses, or contact information); however, caller gender was
postulated based on clues from the conversation (i.e., the use of “him” or “her” by a care provider,
or perceived voice pitch). If an age was mentioned in the conversation, this number was noted in
the comments section of the call transcript. Due to the nature of the working agreement with the
company providing the PERS, only a limited number of the laboratory research team members
had permission to listen to the raw call recordings.
These real call samples all had a fair amount of background noise embedded in the recordings,
presumably caused by both the caller’s and call centre’s background environments, as well as
being inherent in the recording equipment. During transcription, recordings had to be paused
frequently and the volume adjusted to very high levels in order to catch what was being said in
the conversation. Call recordings were stored on the computer as *.wav files and played using
Audacity (version 2.02), an open-source, free program for listening to and editing sound files
(Mazzoni et al., 2000). The sound files were played back using the “mono” (single audio track)
setting with a sampling rate of 8 kHz and a sample format of 32-bit floating point.
Statistical Software
All data exploration and statistical analyses were performed using IBM SPSS Statistical software
package versions 21 and 22 (IBM, 2014).
3.4.2.5 Data Analysis
Figure 3-8 illustrates the main steps followed within two content analyses conducted in this
study. Starting at the large green arrow at the top left, “naïve” listening of the call recordings and
reading of the transcripts were first used to obtain a superficial and preliminary understanding of
the conversations and to identify possible directions for the analyses. Both analyses were
performed on data from transcribed calls, with one analysis at the call level and the other at the
conversational level. Starting at (a), a conventional conversational analysis was performed to
identify possible responder-type categories. The PES model categories from study 1 were used in
conjunction with the responder-type category to create the ‘personal emergency response’ (PER)
model. For each of the response calls, a call response (responder) was identified. At (b), a
quantitative conversational analysis was performed to identify conversational measures that
could be used to help classify a call according to the PER model categories. Conversational
measure data was isolated from the response calls and statistical analyses were performed.
Significant relationships and trends were identified. This new information can be applied to
improve the HELPER communication module’s ability to tailor the call dialogue and proposed
responder to the specific user for various PESs.
[Figure: personal emergency response calls are transcribed in SALT; (a) a conventional content analysis at the call level identifies responder categories and response call responders, combining them with the personal emergency situation model categories to develop the personal emergency response (PER) model; (b) a quantitative conversational analysis at the conversation/turn level identifies conversational measures, extracts conversational measure data from the response calls, performs statistical analyses using the measure data and PER categories, and identifies significant relationships; outcomes feed into improving artificial intelligence (decision making and dialogue management).]
Figure 3-8: This flow diagram illustrates how calls were analysed and how outcomes were and could be applied. Column (a): a call-level analysis to identify responder-type categories and to develop the PER model; column (b): a quantitative conversational-level analysis to identify conversational measures and significant relationships between measures and PER categories.
Identification of Response-Type Categories
In Study 1 (Chapter 2), a PES model was introduced that characterized the personal emergency
situation by caller type, call reason, and risk level. The model is reproduced in Figure 3-9. When
a PES is linked to a personal emergency response call, the model may be expanded to include
categories representing the final personal emergency response to be provided. Categories for the
additional response-type classification were derived from the call conversations.
[Figure: the situation comprises the user (physical-cognitive state) together with the caller type (older adult; care provider), call reason (medical call; fall call), and risk level (high risk; medium risk; low risk).]
Figure 3-9: The PES model characterized by caller type, risk level, and call reason.
Personal Emergency Response Calls Included
With respect to organizing the call transcripts using the caller type category, in six (6) of the 84
calls, mixed conversations occurred with more than two callers speaking simultaneously;
specifically, the older adult user and one or more care providers. For analysis purposes, four of
these six combination calls were analysed using only speech input from the older adult user
(these calls were classified as 'older adult' calls). For the remaining two calls it was not possible
to extract only the older adult caller’s speech from the conversation whilst still maintaining
conversational coherence. As a result, these calls remain classified as “combination calls”.
Seventy-two (72) of 84 transcribed calls were used in the conventional conversational analysis.
Not included in the analyses were: nine false alarm calls (all non-emergent) initiated by the older
adult subscriber; one follow-up call (non-emergent) where the call taker was calling for a status
update from the older adult (e.g., have you been looked after?); and two combination medical calls
(as described above: one urgent and one emergent call) made by the older adult subscriber and
care provider together. One of the combination calls excluded was also identified as an outlier
call due to a large number of speaker turns. This excluded call was classified as an urgent
medium risk, medical call and involved a high number of interactions with the care provider.
Selection of Conversational Measures
The conversational measures examined in this study included measures of verbal ability,
conversational structure, and timing for both callers and call takers. A brief summary of these
measures is provided in this section.
Verbal Ability
Three aspects of verbal ability were examined: rate of speech, speaker turn length, and
disfluency.
Rate of Speech
An older adult’s overall rate of speech and intelligibility can be affected by physiological
changes in the aging body as a result of higher breathing frequency and reduced vocal range,
speed and accuracy of structural movement (Zraick et al., 2006). In this study, we hypothesized
that older adult users speak more slowly than call takers and analysed speaker differences using
mean words per minute (WPM) and utterances per minute (UPM). In SALT (Miller & Chapman,
2008), WPM is determined by calculating the total completed words spoken per minute based on
elapsed time (includes main body words and mazes) and UPM is calculated using the total
number of utterance attempts per minute based on elapsed time including all speaker attempts.
Existing literature has previously shown that the rate of speech for older adults tends to be lower
than that of younger individuals (Yuan, Liberman, & Cieri, 2006). Whether this is true during a
PES will be determined.
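The two rate-of-speech measures can be sketched as a short calculation, mirroring the SALT definitions quoted above: WPM counts completed words over elapsed time, and UPM counts utterance attempts over elapsed time. The data structure and numbers below are illustrative, not taken from the study.

```python
# Minimal rate-of-speech sketch: words per minute (WPM) and utterances per
# minute (UPM) from a list of utterance attempts and the elapsed time.

def rate_measures(utterances, elapsed_seconds):
    """utterances: list of word lists, one list per utterance attempt."""
    minutes = elapsed_seconds / 60.0
    total_words = sum(len(u) for u in utterances)
    wpm = total_words / minutes          # all completed words per minute
    upm = len(utterances) / minutes      # all utterance attempts per minute
    return wpm, upm

caller = [["I", "need", "help"], ["I", "I", "fell", "down"]]
wpm, upm = rate_measures(caller, elapsed_seconds=10)
print(round(wpm, 1), round(upm, 1))  # 42.0 12.0
```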
Speaker Turn Length
According to Sacks, Schegloff, & Jefferson (1974), “the organization of 'taking turns to talk' is
fundamental to conversation…" (pg.2). In this analysis, a ‘speaker turn’ is defined as the unit of
speech or thought communicated by a participant during their turn to talk in a response call
conversation. The end of the first speaker's turn may be signaled either by silence or by
interruption from the next speaker, causing the first speaker to stop speaking. The model of
turn-taking is outlined in Sacks et al. (1974). Measures of "mean turn length (in words)" indicate
how many words the caller(s) and call taker utter during their turn to speak. In SALT, the mean turn length
in words is calculated using all main body words but excludes maze words. A speaker turn
length includes all "contiguous utterances of the same speaker" including non-verbal,
incomplete, or unintelligible utterances (Miller & Chapman, 2008).
Disfluency
Disfluencies are part of normal speech (Culatta & Leeper, 1990) and may be marked by the
presence of mazes. Hall, Wagovich, & Bernstein Ratner (2007) define a maze as “a marker of
linguistic disfluency in spontaneous speech” (p. 162). The SALT help manual
defines a maze as, "any filled pause [e.g., uh, ah], false starts [e.g. and I (ha*) have], repetitions
[e.g., (and) and I] and reformulations [e.g. (He and) he said] that are parenthesized in the
utterance. ... When maze words are removed from the utterance, the remaining words can stand
alone." (Miller & Chapman, 2008). Ordinarily, mazes occur when a speaker is expressing an idea
that may be abstract, complicated or partially formulated (Leadholm & Miller, 1995). Research
studies suggest that 6-10% or more of spontaneous speech will contain mazes depending on the
discourse and situational context with older adults producing slightly more than younger adults
(Bortfeld, Leon, Bloom, Schober, & Brennan, 2001; Fox Tree, 1995; Shriberg, 1999). By
examining the calls in which maze words make up more than 10% of total words, an estimate
of the number of disfluencies above typical expectations will be obtained.
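The disfluency check just described can be sketched as a proportion calculation against the 10% literature baseline. As before, parenthesized tokens standing in for maze words is an illustrative convention, not the study's actual coding.

```python
# Sketch of the disfluency measure: proportion of maze words in a transcript,
# flagged when it exceeds the ~10% baseline reported in the literature.

def maze_proportion(transcript_words):
    mazes = sum(1 for w in transcript_words if w.startswith("("))
    return mazes / len(transcript_words)

words = "(uh) I (I) need help right now".split()
p = maze_proportion(words)
print(round(p, 2), p > 0.10)  # 0.29 True
```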
Note that 'speaker intelligibility' was excluded from this analysis. Upon close examination of the
recorded transcripts, it was difficult to determine true unintelligibility in many situations due to
recording issues (e.g., two speakers speaking concurrently; call taker’s voice being recorded
directly at the microphone versus the caller’s being transmitted over speaker phone).
Conversational Structure
Four aspects of conversational structure were examined: the number of statements, questions,
responses to questions, and one word responses.
Statements, Questions and Responses
Research literature on emergency calls (Whalen & Zimmerman, 1987), Emergency Call
Centre protocol (Private_PERS_Call_Centre, 2008), and on-site observations of call takers
indicate that a majority of the queries in a call conversation are made by call takers and a
majority of the responses to questions are made by PERS users. Analyzing the statement, query,
and response aspects of the call conversation will help identify any differences
in conversational structure between callers and call takers and verify what is known to occur via
the personal emergency response provider’s call handling protocol.
One Word Utterances
One speaker turn may be composed of one or more speaker utterances either verbal or non-
verbal. In this study, utterances were determined using the phonological method of segmentation
as described in the SALT manual (Miller & Chapman, 2008). The number of one-word
utterances gives an idea of the frequency of short one-word statements within a conversation.
The number of one-word utterances is expected to be higher among callers than call takers.
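The conversational-structure measures above can be tallied with a simple counter. In the study this coding was done from the SALT transcripts; the trailing question mark heuristic and the example utterances below are illustrative assumptions only.

```python
# Toy tally of conversational structure: questions, statements, and one-word
# utterances per speaker, from (speaker, text) pairs.

from collections import Counter

def structure_counts(utterances):
    counts = Counter()
    for speaker, text in utterances:
        kind = "question" if text.rstrip().endswith("?") else "statement"
        counts[(speaker, kind)] += 1
        if len(text.split()) == 1:
            counts[(speaker, "one_word")] += 1
    return counts

call = [("E", "Do you need an ambulance?"),
        ("C", "No"),
        ("C", "I want my daughter")]
c = structure_counts(call)
print(c[("E", "question")], c[("C", "one_word")])  # 1 1
```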
Response Call Response Time
In an actual emergency situation, seconds matter. Eight minutes or less is the current
recommended target time for 90% of emergency responses (Eisenberg, Bergner, Hallstrom, &
others, 1979; Mullie, Van Hoeyweghen, & Quets, 1989; Pons et al., 2005; Silverman et al.,
2007). For individuals in cardiac arrest, a response time of five minutes or less has been found to
increase survival rates for patients (Blackwell & Kaufman, 2002; Pons et al., 2005; Silverman et
al., 2007). In a PES, the call taker's main goal is to determine what response is required during a
call and to initiate an appropriate response as quickly as possible. We define the ‘call taker
response time’ as the time from when the response call conversation first begins to when the call
taker says "good-bye" or puts the caller on hold to initiate a call to the desired responder (e.g., to
call the ambulance). Two measures were used to determine the response time: (1) the
number of speaker turns and (2) time in seconds. These two timing measures will be useful for
establishing standards against which response call conversations with the HELPER can be
compared. In this study, two response call categories: (1) caller type and (2) risk level were used
to assess their effect on response time.
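The two response-time measures can be sketched as below: the number of speaker turns and the elapsed seconds from the start of the conversation to the call taker's closing turn. The tuple format, speaker codes, and the keyword test for the closing turn are hypothetical conveniences, not the study's actual timing procedure.

```python
# Sketch of 'call taker response time': (number of speaker turns, seconds)
# from the start of the conversation to the call taker's closing turn.

def response_time(turns):
    """turns: list of (speaker, start_seconds, text) tuples.
    The end point is the first call-taker ('E') turn that closes the call
    or places the caller on hold."""
    for i, (speaker, start, text) in enumerate(turns):
        if speaker == "E" and ("goodbye" in text.lower() or "hold" in text.lower()):
            return i + 1, start - turns[0][1]
    return len(turns), turns[-1][1] - turns[0][1]

call = [("E", 0.0, "Hello, do you need help?"),
        ("C", 3.2, "Yes, I fell"),
        ("E", 6.0, "Please hold while I call the ambulance")]
print(response_time(call))  # (3, 6.0)
```

Such measures provide a standard against which HELPER response call conversations could later be compared.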
3.5 Results
3.5.1 The Conventional Conversational Analysis
3.5.1.1 Two Main Response Types
Based on the response call transcript conversations, two broad categories were identified for the
“response type” classification: (1) EMS and (2) other responders. The EMS group includes
paramedics, fire fighters, and police. The “other responders” group includes non-EMS providers,
such as family, friends, or acquaintances, in addition to professional care providers such as
personal support workers (PSWs) or nurses. An “all responders” category could also be
considered, representing the situation where both EMS and other responders are called to attend
a PES.
3.5.1.2 A Closer Look at Response Types
In reading through the response call transcripts, one particular call revealed that the terms
“ambulance” and “paramedic” could be perceived as different assistance request types. This
finding is interesting as the caller was not simply using the terms interchangeably. The caller
specifically declined the proposal for an “ambulance” and requested a “paramedic” be sent
instead. The reasoning behind this was that this caller did not want to go to the hospital. The
following excerpt from the transcript (call example 1) is presented below (E = call taker, C =
older adult caller; angle brackets < > mark overlapping speech, parentheses ( ) mark repeated
speech or mazes, and braces { } mark comments or other noises):
Call Example 1:
Line 1 E: Do you need an ambulance?
Line 2 C: {Grunt} No, I don’t need an ambulance, I thought paramedics or something <> to check me over.
Line 3 E: <Yes>, you want the paramedics to come and check you over?
Line 4 C: Yeah, (I) I don’t want (an) an ambulance <>.
Line 5 E: <Oh>.
Line 6 C: <Cause> I’m not going anywhere.
In situations where callers are requesting non-EMS responders, it is important to note that even
when medical attention may be necessary, the PERS user is very clear about wanting someone
other than EMS support. In call example 2, the older adult is feeling weak and vomiting.
However, when asked if an ambulance is required, this caller requests the daughter as an
alternate response.
Call Example 2:
Line 1 E: Hello, how are you?
Line 2 C: Oh, I need help (weak, shaky voice).
Line 3 E: What’s wrong?
Line 4 C: (Oh I) I keep throwing up and going to the bathroom.
Line 5 E: (You) You’re vomiting?
Line 6 E: How long has this been going on?
Line 7 C: Oh, it just started now <xx>. {xx = two unintelligible words}
Line 8 E: <Okay>.
Line 9 E: Okay, is there anyone there with you right now?
Line 10 C: No.
Line 11 E: Okay.
Line 12 E: Okay so do you want me to call an ambulance for you or did <you wan*>>
Line 13 {E was cut-off mid-word}
Line 14 C: <No> no, I just want you to call my daughter.
Call Example 2 shows how the call taker quickly assesses the PES and makes an initial decision
about what response to provide. From lines 4 to 9, the call taker identifies the problem and
whether anyone is on site. At line 12, the call taker has decided to suggest an ambulance.
In call example 3, the older adult would like assistance and the paramedics are offered, but the
preference is for someone else. Unfortunately, there are no other responders on this caller’s list.
The call operator concludes that only paramedics can be sent in this situation.
Call Example 3:
Line 1 C: (Eh) I wonder if you could (have s*) send somebody down to my place?
Line 2 E: And who would you like me to call for you?
Line 3 C: (Eh) well x {possible grunt} nobody.
Line 4 E: Would you like the paramedics?
Line 5 C: (Ah) can you get somebody else?
Line 6 E: Somebody else, other than the paramedics?
Line 7 C: That’s right.
Line 8 E: Oh well (uh) you don’t have any responders on your file.
Line 9 E: (Uh), is there anyone in particular you would like me to call?
Line 10 C: No.
Line 11 E: Okay, we can only call the paramedics.
In Call Example 3, the caller requests assistance immediately in line 1, before the call taker has
had a chance to assess the situation. In line 2, the call taker leaves it to the caller to specify
what response is desired. However, over lines 3-10, the call taker discovers that even though the
caller does not want EMS, this is the only response that can be provided. In line 11, she explains
this to the caller.
These excerpts not only show the importance placed on having different types of response
targets, but also give examples of speech input that may make ASR challenging. For example, in
Call Example 1, line 2, a “no” response is immediately followed by a suggestion for a different
response, and the “grunt” at the beginning would be treated by the ASR system as an
out-of-vocabulary word. Call Example 3 also shows a special situation: even when help is offered
and refused, the dialogue set may need a state indicating that the response offered is the only
available option.
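The "only available option" situation in Call Example 3 could be handled by an explicit dialogue state. The following is a minimal, hypothetical sketch; the state names, function name, and responder list are illustrative assumptions, not part of the HELPER design:

```python
# Hypothetical sketch of a dialogue-manager decision for the situation in
# Call Example 3: help is refused, but the refused response type is the
# only option on the user's file. All names here are illustrative.

def choose_response(requested_other: bool, responders_on_file: list) -> str:
    """Return the next dialogue state given the caller's stated preference."""
    non_ems = [r for r in responders_on_file if r != "paramedics"]
    if not requested_other:
        # Caller accepts (or does not object to) the EMS proposal.
        return "DISPATCH_EMS"
    if non_ems:
        # Caller wants someone else, and alternatives exist on file.
        return "OFFER_OTHER_RESPONDER"
    # Caller wants someone else, but no other responders are listed:
    # explain that EMS is the only available option (cf. Call Example 3, line 11).
    return "EXPLAIN_ONLY_EMS_AVAILABLE"

print(choose_response(requested_other=True, responders_on_file=["paramedics"]))
# prints EXPLAIN_ONLY_EMS_AVAILABLE
```

In a full dialogue manager this decision would be one transition among many, but it illustrates how the transcript finding maps onto an explicit state.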
The final categories selected to characterize the response call’s response types include: (1)
ambulance, (2) paramedic, (3) other responder, and (4) all responders (EMS and other).
3.5.1.3 The Personal Emergency Response (PER) Model
As illustrated in Figure 3-10, the PES model was expanded to the PER model. The personal
emergency response model includes the situation classified by caller type, risk level, call reason,
and response type. Sub-categories within each classification are shown.
Figure 3-10: The personal emergency response (PER) model. [Diagram: the user's situation
(physical-cognitive state) is classified by caller type (older adult, care provider), call reason
(medical call, fall call), and risk level (high, medium, low risk), leading to the response
obtained, classified by response type (ambulance, paramedic, other responder, all responders).]
3.5.2 Conversational Analysis using PER Categories
3.5.2.1 Descriptive Statistics
Fifty (50) calls were made by older adult callers and 22 were made by care providers. Subscriber
age at the time of the call was determined for 53 of 84 calls (63%). Mean age was 82 years
(standard deviation (S.D.) = 8.79) with the youngest known age being 51 years and the oldest
known age being 100 years old. There were 69 female and 15 male subscribers, with gender
being inferred from the conversation (i.e. use of “he” or “she” by the other caller) or by voice
pitch (low for males, higher for females). The higher female caller ratio observed in the
collection of response call transcripts is common amongst PERS users in this age group (Fallis et
al., 2007; Heinbüchner et al., 2010; Hyer & Rudick, 1994; Taylor & Agamanolis, 2010).
3.5.2.2 Call Breakdown Using PER Classifications
Associations between caller type, risk level, call reason, and response type were examined. See
Figures 3-11a and 3-11b for a breakdown of the frequency of response calls by caller type (older
adult vs. care provider), risk level (emergent high vs. urgent medium), call reason (fall vs.
medical call), and response type (EMS vs. other responders).
Figure 3-11: Older Adult and Care Provider responders requested during a response call. The older adult responses are in (a) and the care provider responses are in (b). [Bar charts: frequency (#) of EMS vs. other responses for medical and fall calls at the emergent (high) and urgent (medium) risk levels.]
Using Pearson's Chi Square statistic and Fisher's Exact test (in cases where counts were less than
five), no significant associations were found between caller type and call reason; caller type and
risk level; and call reason and response type.
Three significant associations were found:
Caller Type vs. Response Type
Using Fisher’s Exact test, a borderline significant relationship was found between caller type and
response type, p=0.049 (Exact sig., 2-sided). This result suggests that a difference exists between
the response-type requested by different callers. Specifically, older adult and care provider
callers were found to both request EMS responses, however, older adults also made requests for
other responders.
Risk Level vs. Call Reason
Using Fisher’s Exact test, a significant relationship was found between risk level and call reason,
p=0.017 (Exact sig., 2-sided). This result suggests that emergent high risk calls were more likely
to be medically related than fall related.
Risk Level vs. Response Type
Using Fisher’s Exact test, a significant relationship was found between risk level and response
type, p=0.009 (Exact sig., 2-sided). This result suggests that emergent high risk calls were more
likely to lead to an EMS response whereas urgent medium risk calls might also lead to other
response types.
3.5.2.3 Breakdown of Response Types
In terms of response type, care providers requested EMS responses 100% of the time for both
high and medium risk medical situations. Of the three calls requesting a 'paramedic' response,
two calls were high risk and one was medium risk. For older adult callers, EMS responses were
requested 96% of the time in high risk, medical call situations: 19 out of 24 calls were for
ambulances. The other calls consisted of two calls for the 'paramedics', one call for an ‘other
responder’, and two calls for both ‘EMS and other’ responders. In medium risk medical call
situations, EMS requests dropped to 71% with 12 out of 21 calls for the ‘ambulance’, two calls
for the 'paramedics', six calls for ‘other responders’, and one call for ‘EMS and other’
responders. In medium risk fall situations (five calls total), there was a fairly distributed range of
requests with two calls for the ambulance, one call for the paramedic, and two calls for ‘other
responders’.
3.5.3 Conversational Analysis using Conversational Measures
Significant group relationships between caller type and response type, risk level and call reason,
and risk level and response type suggest that the desired ‘response type’ may be identified if the
‘caller type’ and/or ‘risk level’ could be determined. Identifying the ‘call reason’ may also help
in identifying ‘risk level’ or vice versa which could then be used to estimate “response type”.
In this exploratory analysis, three repeated measures multivariate analysis of variance
(RM_MANOVA) tests were conducted to examine the relationships between three independent
factors: (1) caller type, (2) risk level, and (3) speaker type, with three conversational speech
measures: (1) verbal ability, (2) conversational structure, and (3) timing. The independent factor
‘speaker type’ was included in the RM_MANOVAs to allow for a comparison of ‘callers’
against ‘call takers’. The ‘call takers’ represent the "norm" or "control" group because these
individuals are not experiencing the emergency situation themselves. Due to a low number of
data points within the ‘response type’ group’s 'non-EMS calls' and the ‘call reason’ group’s 'fall
calls', these factors were not included in the RM_MANOVAs. The independent factors used in
the analysis consisted of two levels each: speaker type included ‘callers’ and ‘call takers’; caller
type included ‘older adult’ and ‘care provider’ callers; and risk level included ‘high (emergent)’
and ‘medium (urgent)’ medical risk levels. Speaker type was a ‘within subjects factor’ and risk
level and caller type were ‘between subjects factors’.
An additional outlier was removed in this analysis. The call removed was an urgent medium risk,
fall call by an older adult and had a higher number of speaker turns due to hearing difficulties
between the older adult caller and the call taker. In the call there were several question
repetitions, clarifications, circular conversations and additional requests. This outlier was kept in
the analysis using PER categories because speaker turns and timing were not being assessed and
the data would not be affected by its inclusion. A total of 71 calls were used in this analysis with
unbalanced group counts.
Univariate analysis of variance (ANOVA) tests and t-tests were conducted following the
RM_MANOVA to compare different groups with significant multivariate effects. Discriminant
analyses were also conducted to examine which and how well these measures could be used to
predict significant independent factors.
3.5.3.1 Analysis of Verbal Ability Measures
The three measures of verbal ability were: words per minute (WPM), utterances per minute
(UPM), and turn length in words (TNL). The proportion of total words with mazes (MZW)
was examined independently as MZW could not be normalized sufficiently to include in the
RM_MANOVA. Log10(x) transformations were applied to UPM and TNL, and a square root
transformation was applied to MZW in order to normalize the data as outlined in (Field, 2005).
No transformations were required for WPM. Moderate correlation was observed between WPM,
UPM, and TNL; and between MZW and UPM; with Pearson's correlation coefficients ranging
from 0.3 to 0.65, p<0.001. All other correlations were less than 0.3 or were not significant.
The results of the RM_MANOVA revealed significant within subjects multivariate effects for
speaker type, Wilks' λ = 0.616, F(3,65) = 13.52, p<0.001, η2 = 0.384, and for the interaction
between speaker and caller type, Wilks' λ = 0.742, F(3,65) = 7.53, p<0.001, η2 = 0.258. A
borderline significant effect was observed between speaker type and risk level, Wilks' λ = 0.891,
F(3,65) = 2.65, p=0.056, η2 = 0.109. Between subjects, significant multivariate effects were
obtained for both caller type, Wilks' λ = 0.689, F(3,65) = 9.80, p<0.001, η2 = 0.311, and risk
level, Wilks' λ = 0.887, F(3,65) = 2.77, p=0.049, η2 = 0.113. There was no significant
multivariate interaction effect on caller type and risk level. No significant multivariate effect was
observed for the three-way interactions between speaker type, caller type and risk level. In
Figure 3-12, four box plots show: (a) mean words per minute; (b) mean utterances per minute;
(c) mean turn length in words; and (d) mean mazes per total number of spoken words, broken
down by risk levels for caller and speaker types.
Figure 3-12: Boxplots of verbal ability measures broken down by risk levels for caller and speaker types.
Words per Minute
As observed in Figure 3-12a, the mean WPM spoken by older adult callers was found to be
significantly lower than that of care provider callers, F(1,67)=19.75, p<0.001, η2 = 0.228
(univariate test for caller type). The mean WPM spoken by callers as a group was found to be
significantly different than the call taker group, F(1,67)=5.48, p=0.022, η2 = 0.076 (univariate
test for speaker type) and a significant interaction effect was obtained between speaker type and
caller type, F(1,67)=19.0, p<0.001, η2 = 0.221. Paired samples t-tests conducted between each
caller level and the associated call takers revealed no significant difference in WPM between
care provider callers and call takers but a significant difference between older adult callers and
call takers, t(48)=-6.51, p<0.001. These results suggest that older adult callers speak
significantly fewer WPM compared to both care provider callers and call takers.
The mean WPM spoken was found to be similar across high and medium risk levels for callers
(no significant interaction effect was observed between risk level and caller type), but was
significantly different between callers and call takers, F(1,67)=7.09, p=0.010, η2 = 0.096
(interaction effect between speaker type and risk level). Independent samples t-tests conducted
for each caller level and the call taker group between risk levels revealed no significant
differences in WPM between high and medium risk levels for the care provider or older adult
caller groups, but a significant difference in WPM was observed between high and medium risk
levels for the call taker group, t(69) = 3.15, p=0.002. These results suggest that the call taker
speaks significantly higher mean WPM during high risk situations compared to medium
risk situations, a difference not observed in the caller group.
Utterances per Minute
As observed in Figure 3-12b, callers were found to use significantly fewer UPM than call takers,
F(1,67)=41.22, p<0.001, η2 = 0.381 (univariate test for speaker type), and UPM also differed
significantly between risk levels, F(1,67)=7.34, p=0.009, η2 = 0.099 (univariate test for risk level). There was
no significant effect for caller type. A significant interaction effect was obtained between speaker
type and caller type, F(1,67)=7.13, p=0.010, η2 = 0.096. Paired samples t-tests conducted at each
caller level between callers and call takers revealed significant differences in UPM between call
takers and both care provider callers, t(21)=-2.58, p=0.018, and older adult callers, t(48)=-8.74,
p<0.001. These results suggest that call takers speak significantly more UPM than both
callers, but care providers and older adults are similar in their number of UPM.
No significant effects were obtained between speaker type and risk level nor between caller type
and risk level. Independent samples t-tests conducted at each caller level and with the call taker
group between risk levels confirmed no significant differences in UPM at high and medium risk
levels for the care provider or older adult caller levels. A significant difference in UPM was
obtained between high and medium risk levels for the call taker group, t(69) = 2.71, p=0.008.
These results suggest that the call taker speaks significantly more UPM during high risk
situations compared to medium risk situations, a difference not observed in the caller
group.
Turn Length in Words
As observed in Figure 3-12c, care provider callers were found to have significantly longer TNLs
compared to older adult callers, F(1,67)=8.47, p=0.005, η2 = 0.112 (univariate test for caller
type); and a significant mean difference was also found between the caller group and call takers,
F(1,67)=3.86, p=0.054, η2 = 0.054 (univariate test for speaker type). A significant interaction
effect was obtained between speaker type and caller type, F(1,67)=22.03, p<0.001, η2 = 0.247.
Paired samples t-tests conducted at each caller level between callers and call takers revealed no
significant difference in TNL between care provider callers and call takers but a significant
difference in TNL between older adult callers and call takers, t(48)=-6.12, p<0.001. These
results suggest that care provider callers have TNL comparable to call takers, but older
adults tend to have significantly shorter TNL compared to both call takers and care
providers.
TNL does not appear to be significantly different for different risk levels and no significant
effects were observed for risk level. Also, no significant effects were obtained between speaker
type and risk level or between caller type and risk level.
Percent Maze Words
As observed in Figure 3-12d, the mean proportion of MZW spoken by callers was higher than for
call takers. Paired-samples t-tests were conducted to compare the proportion of MZW between
each caller level and the call taker group. No significant difference was observed between care
provider and older adult callers, but call takers spoke a significantly lower proportion of MZW
compared to the callers combined, t(70)=5.35, p<0.001.
Independent-samples t-tests were conducted to compare the proportion of MZWs between risk
levels, and for each caller and speaker types at different risk levels. No significant difference was
observed between the overall risk levels or between the older adult and call taker groups at the
different risk levels. A borderline significant result was obtained for the care provider group at
different risk levels, t(20)=-2.03, p=0.056. These results show an increase observed in the
proportion of MZW produced by the care provider during medium risk calls but it is
borderline significant.
The frequency with which mazes exceeded 10% (or 0.1) of total words (see dotted line in
Figure 3-12d) was also calculated for each speaker. Maze proportions above 10% occurred most
often among older adult callers, in 34.7% of transcripts (17 of 49 calls), compared to care
provider callers, 22.7% of transcripts (5 of 22 calls), and
call takers, 5.6% of transcripts (4 of 71 calls). Using the Chi-Square test, frequencies between
older adult and care provider callers were not found to be significantly different, however, when
call taker frequencies were included a significant difference was obtained, χ2(2)=16.71, p<0.001.
Discriminant Analysis
A discriminant analysis was used to examine speaker predictability between caller types: the
older adult and care provider, using three predictor variables: WPM, UPM, and TNL. In an
automated PERS application, it is not necessary to identify the call taker since this role is played
by the automated PERS computer. The transformed variables were used in the analysis for UPM
and TNL. Box's M was non-significant at the 0.05 level. The discriminant function (DF) revealed
a significant association between caller type and all predictors. Entering independent variables
together, Wilks λ = 0.677, χ2(3)=26.31, canonical correlation = 0.568, p<0.001. 32.3% of the
variance between older adult and care provider speakers could be accounted for by the three
predictor variables. Using the standardized canonical discriminant function coefficients, the
discriminant function revealed two major predictors: WPM and TNL, DF = (1.465 x WPM)
+ (-0.631 x TNL) + (-0.323 x UPM). Classification based on the DF and group centroids (Care
Provider = 1.016; Older Adult = -0.456) using the original group cases resulted in high success at
81.7% of cases being correctly classified, 93.9% of older adults and 54.5% of care providers (out
of 22 Care Provider and 49 Older Adult cases). Classification using cross validation in SPSS is
performed where each case is classified by the functions derived from all cases except for the
case of interest. The results using cross-validated classification dropped the number of correctly
classified cases to 77.5% with the older adult and care provider percentage of correctly classified
cases dropping to 89.8% and 50% respectively. Re-running the discriminant analysis using only
the variables with significant differences, WPM and TNL, did not change the overall number of
cases correctly re-classified (using original group cases) but did modify the individual
percentages correctly classified to 91.8% of older adults and 59.1% of care providers. Box's M
remained non-significant and Wilks λ = 0.684, χ2(2)=25.80, canonical correlation = 0.562,
p<0.001, DF = (1.156 x WPM) + (-0.218 x TNL). Using cross-validation classification the
number of correctly classified cases was 80.3% with only the care provider percentage of
correctly classified cases dropping to 54.5%. These classification results apply only to the cases
used in this study.
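The two-step procedure above (classify the original cases, then classify under cross-validation) can be sketched with scikit-learn's linear discriminant analysis as an analogue of the SPSS procedure. The data below are synthetic stand-ins, not the study's measures:

```python
# Sketch of the caller-type discriminant analysis using scikit-learn's
# LinearDiscriminantAnalysis as an analogue of SPSS's procedure.
# The data are synthetic stand-ins, not the study's measures.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
# Predictors: WPM, log10(UPM), log10(TNL) for 49 older adult (0) and
# 22 care provider (1) cases; care providers get a higher mean WPM.
X = np.vstack([
    np.column_stack([rng.normal(110, 20, 49),
                     rng.normal(0.9, 0.2, 49),
                     rng.normal(0.8, 0.2, 49)]),
    np.column_stack([rng.normal(150, 20, 22),
                     rng.normal(1.0, 0.2, 22),
                     rng.normal(1.0, 0.2, 22)]),
])
y = np.array([0] * 49 + [1] * 22)

lda = LinearDiscriminantAnalysis()
# Resubstitution accuracy: classifying the original group cases.
resub = lda.fit(X, y).score(X, y)
# Leave-one-out cross-validation, as in SPSS's cross-validated
# classification: each case is classified by a function derived
# from all cases except the case of interest.
loo = cross_val_score(lda, X, y, cv=LeaveOneOut()).mean()
print(f"resubstitution = {resub:.3f}, leave-one-out = {loo:.3f}")
```

As in the results above, the cross-validated accuracy is typically lower than the resubstitution accuracy, which is why both are reported.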
3.5.3.2 Analysis of Conversational Structure Measures
The four conversational structure measures included were: number of statements (NS),
number of questions (NQ), number of responses to questions (NRQ), and number of one
word utterances (OWU). All measures followed a non-normal distribution. Log10(x+1)
transformations were applied to NS, NQ, NRQ and OWU measures in order to normalize the
data as outlined in (Field, 2005). Moderate correlations were observed between NS, NQ, NRQ
and OWU, with Pearson's correlation coefficients in the range of 0.3 to 0.7, p<0.001.
The results of the RM_MANOVA revealed a significant within subjects multivariate effect for
speaker type, Wilks' λ = 0.191, F(4,64) = 67.66, p<0.001, η2 = 0.809. A significant between
subjects multivariate effect was obtained for caller type, Wilks' λ = 0.857, F(4,64) = 2.66,
p=0.040, η2 = 0.143, and a borderline significant multivariate effect was obtained for risk level,
Wilks' λ = 0.869, F(4,64) = 2.42, p=0.057, η2 = 0.131. All 2 and 3 way interaction effects were
non-significant. In Figure 3-13, four box plots show: (a) mean number of statements; (b) mean
number of questions; (c) mean number of responses to questions; and (d) mean number of one
word utterances, broken down by risk levels for caller and speaker types.
Number of Statements
As observed in Figure 3-13a, the NS spoken by callers is similar between older adult and care
provider callers (no significant effects for caller type); but differ as a combined group from that
of the call taker group, F(1,67)=125.01, p<0.001, η2 = 0.651 (univariate test for speaker type).
Paired samples t-tests conducted at each caller level between callers and call takers revealed
significant differences in NS between care providers and call takers, t(21)=8.43, p<0.001, and
older adults and call takers, t(48)=10.43, p<0.001. These findings show that both callers made
significantly more statements than the call takers during the response call.
Figure 3-13: Box plots of conversational measures broken down by risk levels for caller and speaker types.
The NS spoken at high risk levels was found to differ significantly from those at medium risk
levels, F(1,67)=8.82, p=0.004, η2 = 0.116 (univariate test for risk level). Independent samples t-
tests conducted for each caller level and the call taker group between risk levels revealed
significant differences between high and medium risk levels for older adult callers, t(47)=-2.82,
p=0.007, and call takers, t(69)=-3.34, p=0.001, but no significant difference was observed for the
care provider callers. These results suggest that both older adult callers and call takers make
significantly fewer statements during high risk calls than medium risk calls, however care
providers make approximately the same NS across risk levels. No significant interaction
effects were obtained between caller type, speaker type and/or risk level.
Number of Questions
As observed in Figure 3-13b, the NQ asked by care provider callers was significantly lower than
that of older adult callers, F(1,67)=7.31, p=0.009, η2 = 0.098 (univariate test for caller type); and the
caller group NQs differed significantly from that of the call taker group, F(1,67)=269.46,
p<0.001, η2 = 0.801 (univariate test for speaker type). Paired samples t-tests conducted between
each caller level and the associated call takers revealed significant differences in NQ between
care providers and call takers, t(21)=-12.42, p<0.001, and older adults and call takers, t(48)=-
15.20, p<0.001. These findings suggest that both callers asked significantly fewer questions
than the call takers during the response calls. There was no significant effect for risk level and
no significant interaction effects were obtained.
Number of Responses to Questions
As observed in Figure 3-13c, care provider callers had significantly fewer NRQ than older adult
callers, F(1,67)=5.35, p=0.024, η2 = 0.074 (univariate test for caller type); and the caller group
had significantly more NRQ than the call taker group, F(1,67)=267.50, p<0.001, η2 = 0.800
(univariate test for speaker type). Paired samples t-tests conducted between each caller level and
the associated call takers revealed significant differences in NRQ between care providers and call
takers, t(21)=12.00, p<0.001, and older adults and call takers, t(48)=15.06, p<0.001. These
findings confirmed that both care provider and older adult callers responded to
significantly more questions than call takers, and older adults responded to more questions
than the care provider. The NRQ spoken did not differ significantly across risk levels. No
significant interaction effects were obtained.
Number of One Word Utterances
As observed in Figure 3-13d, care providers had borderline significantly fewer OWU than older
adult callers, F(1,67)=3.93, p=0.052, η2 = 0.055 (univariate test for caller type); and the caller
group differed significantly from that of the call taker group, F(1,67)=6.78, p=0.011, η2 = 0.092
(univariate test for speaker type). Paired samples t-tests conducted between each caller level and the
associated call takers revealed a significant difference in OWU between older adult callers and
call takers, t(48)=4.23, p<0.001, but no significant difference between care provider callers and
call takers. These findings suggest that older adult callers made significantly more one word
utterances than both care provider callers and call takers, while one word utterances are
similar between care provider callers and the call taker.
The OWUs spoken also differed between high and medium risk levels regardless of caller or
speaker type, F(1,67)=6.23, p=0.015, η2 = 0.085 (univariate test for risk level). Independent
samples t-tests conducted for each caller level and the call taker group between risk levels
revealed significant differences between high and medium risk levels for older adult callers,
t(47)=-2.27, p=0.028, and call takers, t(69)=-2.70, p=0.009, but no significant difference was
observed for the care provider callers. These results suggest that both older adult callers and
call takers made significantly fewer OWU during high risk calls than medium risk calls,
while care providers make approximately the same number of OWU across risk levels. No
significant interaction effects were obtained.
Discriminant Analysis
A discriminant analysis was used to examine speaker predictability between caller types (Older
Adult and Care Provider) using four predictor variables: NS, NQ, NRQ, and OWU. Transformed
variables were used in the analysis. Box’s M test was not significant at the 0.05 level. The
discriminant function revealed a significant association between caller type and all predictors.
Entering independent variables together, Wilks λ = 0.831, χ2(4)=12.38, canonical correlation =
0.411, p=0.015. 16.9% of the variance between older adult and care provider speakers was
accounted for. Using the standardized canonical discriminant function coefficients, the
discriminant function revealed three major predictors: NS, NRQ and OWU, DF = (-0.838 x
NS) + (0.150 x NQ) + (1.20 x NRQ) + (0.450 x OWU). Classification based on the DF and
group centroids (Care Provider = -0.663; Older Adult = 0.298) using the original group cases
resulted in moderately-high success at 70.4% of cases being correctly classified: 89.8% of older
adults and 27.3% of care providers (out of 22 Care Provider and 49 Older Adult cases). The
results using cross-validated classification dropped the number of correctly classified cases
slightly to 69%, with only the older adult percentage of correctly classified cases dropping to
87.8%. Re-running the discriminant analysis using only the variables with significant mean
differences, NRQ and OWU, resulted in increasing the number of cases correctly re-classified
(using original group cases) to 74.6%: 93.9% of older adults and 31.8% of care providers. Wilks
λ = 0.869, χ2(2)=9.54, canonical correlation = 0.362, p=0.008, DF = (0.702 x NRQ) + (0.404 x
OWU). The results using cross-validated classification dropped the number of correctly
classified cases slightly to 66.2%, with the older adult and care provider percentages of correctly
classified cases dropping to 87.8% and 18.2% respectively. These classification results apply
only to the cases used in this study.
These same four conversational structure measures were also found to predict risk level. The
discriminant function revealed a significant association between high risk and medium risk levels
and all predictors. Entering independent variables together, Wilks λ = 0.851, χ2(4)=10.84,
canonical correlation = 0.386, p=0.028. Box’s M test was not significant at the 0.05 level. 14.9%
of the variance between high and medium risk levels was accounted for. Using the standardized
canonical discriminant function coefficients, the discriminant function revealed one major
predictor: NS, DF = (1.05 x NS) + (0.153 x NQ) + (-0.258 x NRQ) + (0.186 x OWU).
Classification based on the DF and group centroids (high risk = -0.374; medium risk = 0.456)
using the original group cases resulted in moderate success at 67.6% of cases being correctly
classified, 76.9% at the high risk level and 56.3% at the medium risk level (out of 39 high risk
and 32 medium risk cases). The results using cross-validated classification dropped the number
of correctly classified cases slightly to 66.2%, with only the high risk percentage of correctly
classified cases dropping to 74.4%.
3.5.3.3 Analysis of Timing Measures
Timing measures included: number of speaker turns (ST) and time in seconds, both of which
followed a non-normal distribution. Log10(x) transformations were applied to these measures to
normalize the data. A very high and significant Pearson’s correlation coefficient of 0.8, p<0.001,
was observed between ST and seconds. As a result, the seconds measure was not included in the
RM_MANOVA and was examined separately. In Figure 3-14, two box plots show (a) mean number
of speaker turns and (b) mean time in seconds broken down by risk levels for caller and speaker
types.
Figure 3-14: Box plots of timing measures broken down by risk levels for caller and speaker types.
Number of Speaker Turns
As observed in Figure 3-14a, significant differences were found in the number of ST between
callers (Mean=7.85, S.D.=4.24) and call takers (Mean=9.27, S.D.=4.6). RM_MANOVA results
revealed a significant within subjects multivariate effect for speaker type, Wilks' λ = 0.599,
F(1,67) = 44.82, p<0.001, η2 = 0.401. The difference in number of ST between care provider
callers (Mean=6.27, S.D.=3.15) and older adult callers (Mean=8.55, S.D.=4.5) was not found to
be statistically significant. Paired samples t-tests conducted between each caller level and the
associated call takers revealed a significant difference between care provider callers and call
takers, t(21)=-5.17, p<0.001, and between older adult callers and call takers, t(48)=-5.70,
p<0.001. These results suggest that older adult and care provider callers speak, on average,
fewer ST than call takers, and that the two caller types use a similar number of speaker turns.
Essentially, the results suggest that call takers are indeed managing the conversation and
usually speak the first and last words.
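As an illustration of the paired-samples t-test used above (each call contributes one caller ST count and one call-taker ST count), the t statistic can be computed directly; the data values are invented, not the study’s:

```python
import math

# Sketch of a paired-samples t-test on speaker turns (ST); each index is one
# call, pairing the caller's ST with the call taker's ST. Hypothetical data.
caller_st =     [6, 8, 5, 9, 7, 10, 6, 8]
call_taker_st = [8, 9, 7, 11, 9, 12, 8, 10]

def paired_t(a, b):
    """t statistic for paired samples: mean difference over its std. error."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

t = paired_t(caller_st, call_taker_st)  # df = n - 1 = 7
print(t)  # a negative t indicates callers take fewer turns than call takers
```

The sign convention matches the thesis results: negative t values (e.g., t(21)=-5.17) arise because callers consistently take fewer turns than the call takers they are paired with.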
High risk calls (Mean=6.28, S.D.=2.79) were found to require fewer ST than medium risk calls
(Mean=9.75, S.D.=4.93). RM_MANOVA results revealed a significant between-subjects
univariate effect for risk level, F(1,67) = 7.61, p=0.007, η2 = 0.102, but no significant differences
were observed for caller type nor for any two- or three-way interactions within or between subjects.
Box’s M and Levene’s Tests were all non-significant at the 0.05 level. Low risk calls had a mean
of 3.50 ST with a S.D.=1.35. Independent samples t-tests conducted for each caller level and the
call taker group between risk levels revealed significant differences between high and medium
risk levels for older adult callers, t(47)=-2.33, p=0.024, and call takers, t(69)=-3.50, p=0.001, but
no significant difference was observed for the care provider callers (approaching significance at t(20)=-1.82,
p=0.084). These results suggest that both older adult callers and call takers take
significantly fewer ST during high risk calls, while care provider callers require
approximately the same number of ST across risk levels.
Time in Seconds
The results of a two-way ANOVA examining the relationship between call taker’s response time
‘time in seconds’ with caller type (at two levels: care provider and older adult) and risk level (at
two levels: high and medium risk) revealed a significant difference for risk level, F(1, 67)=13.31,
p=0.001, but no significant difference for caller type or the interaction between caller type and
risk level. These results suggest that high risk calls (Mean=40.64 sec, S.D.=22.25) have a shorter
response time than medium risk calls (Mean=69.59 sec, S.D.=39.16). See Figure 3-14b. The
average response time for low risk calls is 20.70 seconds, S.D. 1.73.
3.6 Discussion
3.6.1 Personal Emergency Response Call Trends
Care provider callers accounted for 30.6% of the recorded call sample collected. A possible
reason why care providers might use the PERS button to reach assistance as opposed to dialing
911 using a telephone may be because pushing the button is easier/quicker and would allow them
to keep their hands free to actively care for the older adult while making the call. An alternate
argument may be that pushing the button actually slows down the process for obtaining
emergency assistance because the caller would first need to discuss with the call taker before
reaching EMS. As suggested during an on-site visit to an EMS call centre, care providers may
also be instructed to do this by the PERS provider so they can keep track of their client’s events.
In terms of call situations, there were many more medical calls than fall calls. Defining fall
calls as calls involving “unintentional falls not resulting in injury” essentially excluded fall calls from
the ‘high’ (emergent) risk level category. Only medium risk level fall calls made by older adult
callers were identified. The main reason for using this definition was that caller health
information was limited to what was presented in conversation and it became difficult to
determine whether an unintentional fall call with resulting injury was caused by an underlying
medical condition or purely accidental. With the definition used, we could also determine if falls
without physical injury would elicit different responses from PERS users. One possible reason
why fall calls were not observed for care providers in medium risk situations is presumably
because the care provider, if present, would be able to assist the older adult in either getting up
from a fall on their own, or would obtain the necessary help required. Future studies may wish to
consider alternate fall definitions.
Looking at the data purely in terms of the numbers for the call response types: because care
provider callers were found to request EMS services 100% of the time, it would seem pertinent
for the HELPER to offer EMS services as a first response option to care provider callers. For
older adult callers in high risk situations, an EMS suggestion also seems to be appropriate.
However in medium risk situations, an EMS response might work for approximately 70% of the
medical calls and 50% of the fall calls. Given that these situations occur at the medium risk level,
if the older adult caller does not specify up front who they want called, perhaps the EMS
suggestion first would be the best approach as a default option. The additional length of time
required to suggest an EMS response would also be minimal, if no request is made initially.
Looking at the conversation as opposed to the numbers, older adult callers may be sensitive to
the different latent meaning behind the words “ambulance” versus “paramedic.” The difference
in meaning may be construed as the “ambulance” taking the individual away to be cared for in
the hospital versus the “paramedics” coming to the home of the individual to check on him/her
and to see how they can assist. In a situation where the older adult is trying to maintain his/her
independence, these differences in terms may be very significant. In terms of HELPER
technology design, using the term “paramedic” to offer assistance may be seen as less aggressive
than the term “ambulance,” especially in medium risk level calls made by older adults. The term
“ambulance” may be perfectly fine to use in high risk situations where the caller clearly wants to
go to the hospital or can only receive medical care in the hospital (e.g., stroke, heart attack).
The call examples where the caller declines an EMS service and instead requests a non-EMS
responder may further suggest that in fact defaulting to the EMS service may not be the best
option for certain situations. It may be that the HELPER should offer the caller the choice
between the call taker or a non-EMS responder, with the default being the call taker, in medium risk
situations. Then the ambulance offer only appears if the HELPER presumes that the PES is a
high risk situation, such as if the person is not moving (as seen through the video camera), or has
mentioned possible high risk terms such as ‘stroke’, ‘heart attack’, ‘need oxygen’, or ‘can’t
breathe’. Future studies might consider examining these finer differences in HELPER response
dialogue, specifically looking at what responses to offer (e.g., ambulance or call taker), when
should they be offered (e.g., as default, immediately, after response call classification?), and how
should they be offered (e.g., what words should be used?).
3.6.2 Verbal Ability Measures
In terms of verbal ability measures, compared to care provider callers, older adult callers
spoke more slowly and had shorter turn lengths, although their rates of utterances were similar. On
the other hand, care provider callers and call takers both spoke at similar rates to each other, both
more quickly and with longer turn lengths than older adult callers. Call takers had higher rates of
utterances compared to both caller types. Our hypothesis that older adult callers would speak
more slowly than call takers during a PES was confirmed. On average, the data suggests that care
provider callers and call takers will say more within their speaker turns and with greater speed
compared to older adult callers.
The discriminant function analysis demonstrated that WPM and TNL had a high rate of caller
type predictability. However, the DF was better at identifying older adults who were classified
correctly over 90% of the time, compared to care providers who were only correctly classified
between 50-60% of the time. In order to adjust the function to capture more care providers, the
optimal threshold can be adjusted to favour care providers more than older adults. This would
have to be done outside of SPSS. The results of this analysis suggest that measures of caller
WPM or TNL may be helpful when used in conjunction with ASR and Natural Language
Understanding to further increase the HELPER’s ability to classify the response call and
increase its confidence in deciding what response to provide. No verbal ability conversational
measures were found to be good predictors for response call risk level suggesting that risk level
information would need to be elicited from the actual semantic content of the response call
conversations, possibly through a caller’s use of keywords and phrases.
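The threshold adjustment mentioned above, which would need to be done outside of SPSS, amounts to shifting the cut point applied to the discriminant score before assigning a caller type. A minimal sketch, with hypothetical scores and true labels:

```python
# Sketch of adjusting the decision threshold on a discriminant score to
# favour care providers (CP) over older adults (OA). All values hypothetical.

scores = [-1.2, -0.8, -0.5, -0.2, 0.1, 0.3, 0.6, 0.9]       # discriminant scores
labels = ["OA", "OA", "OA", "CP", "OA", "CP", "CP", "CP"]   # true caller types

def classify(score, cutoff):
    """Caller type predicted from a discriminant score and a cut point."""
    return "CP" if score > cutoff else "OA"

def recall(target, cutoff):
    """Fraction of 'target' callers that the cutoff classifies correctly."""
    hits = sum(1 for s, l in zip(scores, labels)
               if l == target and classify(s, cutoff) == target)
    total = sum(1 for l in labels if l == target)
    return hits / total

# A midpoint cutoff vs. a cutoff shifted toward the older-adult side:
for cutoff in (0.0, -0.3):
    print(cutoff, recall("CP", cutoff), recall("OA", cutoff))
```

Moving the cutoff toward the older-adult side of the score axis captures more care providers, at the cost of (at best unchanged, at worst reduced) older-adult accuracy, which is the trade-off the text describes.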
In terms of disfluencies, older adult and care provider callers both have a higher average
proportion of maze words compared with the call taker. The higher proportion of mazes for
callers may just be a product of natural spontaneous speech and the need to "find one's words",
as opposed to the call taker who is mainly following an organized, scripted dialogue during the
conversation. The proportion of maze words per total words was found to be lower for the care
provider than the older adult but this was not significant. It is possible that with more data
samples a significant difference would be observed. In Figure 3-12d, Case 47 was found to have
a very high number of maze words, 48%. A closer examination of this case revealed that the
caller had a significant speech impediment which resulted in a great deal of stuttering. In
situations with many maze words, it may be difficult for the HELPER to decipher what an
individual is saying. If these situations could be identified early on in the conversation, automatic
default to a live call taker may be the best response for the HELPER. Future work might
determine how often maze words occur within the initial speaker turns of the response call
conversation and whether the proportion of maze words would be representative of the rest of the
conversation.
3.6.3 Conversational Structure Measures
In terms of conversational structure, compared to the callers, call takers were found to make
fewer statements, ask more questions, and respond to fewer questions. These results correspond
to the fact that the conversational script call takers follow requires them to ask mostly closed-
ended questions until they obtain enough information, justification, and verification to initiate a
call response. Compared to the call taker, both the older adult and care provider callers used a
similar number of statements and asked a similar number of questions. However, older adult
callers responded to more questions and had more one word utterances on average than care
providers. The number of one word utterances was similar between care providers and call takers
which disproved our hypothesis that all callers would have fewer OWU compared to call takers.
A possible explanation for these differences in conversational structure between caller types may
be in the way the caller responds to the questions posed by the call taker. It is possible that older
adults tend to be led more by the call taker and subsequently they respond with simple one word
answers (e.g., yes, no, fine, okay); whereas the care provider may be more direct and provide the
necessary information required by the call taker up front. For example, the care provider often
states what they want and justifies their need, “I need an ambulance because Mrs. Smith fell and
hit her head and it is bleeding”. The questions posed by the call taker may also differ between
caller types. The call taker may ask the older adult specific questions about their ailments versus
a more general overall condition question that might be asked of a care provider (e.g., Are you
hurt? Are you cold? Do you have a temperature?). Furthermore, with more questions and
responses, there is a higher probability of needing to repeat oneself when communication
difficulties occur or for confirmation of answers. In designing the HELPER communication
module, the designer should consider adjusting the dialogue to handle these different types of
callers and conversational structures. For example, the call dialogue could be tailored to handle
‘direct requests with justification’ responses from care providers, as well as conversations in
which a more ‘seek and find’ approach prevails, where the older adult subscriber answers several
questions before the best response to provide can be identified.
A discriminant function analysis demonstrated that NRQ and OWU were the best predictors of
caller type with a moderately high success rate of 74.6% correct classifications. The DF was
better at identifying older adults who were classified correctly over 90% of the time, compared to
care providers who were only correctly classified 32% of the time. In order to adjust the
function to capture more care providers, the optimal threshold can be adjusted as discussed
previously in Section 3.6.2 (Verbal Ability Measures) above. A second discriminant function
analysis demonstrated that NS was the best predictor of call risk level with a moderate success
rate of 67.6% correct classifications. The DF was better at identifying high risk levels compared
to medium risk levels. The use of conversational structure by the HELPER to predict caller type
or risk level, however, would probably not be practical simply because it would require several
turns of conversation to be completed before an analysis could begin. Rather, these results are
important because they support the fact that significantly different conversational structures
occur between caller types, based on how callers respond to questions, and they demonstrate a
conversational change, with the majority of higher risk calls requiring fewer statements to be made.
3.6.4 Timing Measures
With respect to timing measures, the difference in speaker turns between older adults and care
providers was not found to be statistically different, regardless of risk level. However, the
number of speaker turns for callers was on average less than the number of speaker turns for the
call taker. As the call taker generally opens and ends the call, this result was as expected.
Considering risk level, although both older adult callers and call takers took significantly fewer speaker
turns in high risk situations compared to medium risk situations, this difference was not
significant for care providers. It is possible that because care providers usually only call for EMS
services, they are more succinct when requesting a response and their responses are similar
regardless of risk level. Another possibility is that the data results are altered due to the presence
of two outliers, #4 and #2 as observed in the care provider high risk category in Figure 3-14a.
Removing the outliers and re-running the t-test did not change this result (p-value went slightly
lower to 0.067). More data samples would be beneficial to strengthen and confirm the result
outcomes.
In addition to lower ST in high risk situations, call takers were found to speak more quickly
(faster WPM, UPM) during their calls. This quickened pace is possibly associated with the call
taker recognizing the high risk situation and their need to obtain all the required information as
quickly as possible before initiating a response. In contrast, the older adult callers did not speak
more quickly in high risk situations, but they did make fewer statements and one word responses.
It is not clear whether the older adults are simply calmer about their situation in high risk
situations or whether they want to appear calm so as not to alarm the call taker, and possibly to
demonstrate they are in control of the situation.
The mean number of ST calculated can be used as a guideline for technology developers to target
when developing the automated PERS communication module for both caller types. During
medium risk situations, the dialogue for older adult callers varies considerably in terms of the
number of ST. In an effort to provide assistance as quickly as possible, it may be best to set a limit
for the number of ST before the system automatically defaults to a live operator.
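A turn-limit default of this kind could be sketched as a simple guard in the dialogue loop; the function name and the MAX_TURNS value below are hypothetical choices for illustration, not a prescribed design:

```python
# Hypothetical sketch: defaulting to a live operator once the conversation
# exceeds a speaker-turn budget. MAX_TURNS is illustrative; a real system
# might derive it from the mean ST figures reported for each caller type.

MAX_TURNS = 12

def handle_call(turns_so_far, response_identified):
    """Decide the next dialogue action for the automated PERS."""
    if response_identified:
        return "dispatch response"
    if turns_so_far >= MAX_TURNS:
        return "transfer to live operator"
    return "continue dialogue"

print(handle_call(5, False))   # still within the turn budget: keep talking
print(handle_call(12, False))  # budget exhausted: default to a human operator
```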
As expected, in terms of time in seconds, high risk calls were responded to more quickly than
medium risk calls, while low risk calls were identified the fastest among all risk levels. These
results were the same for both older adult and care provider callers. All calls were less than 3
minutes. These results also provide a baseline in actual time (sec).
3.6.5 Study Limitations
This study was limited by its small and unbalanced sample size. Increasing the sample size may
improve the robustness of the results. Also, using a different definition for a ‘fall call’, may
increase the number of ‘call reason – fall call’ events and allow this category to be included in
the RM_MANOVA analysis. The fact that all response call recordings had come from a single
PERS provider also limits the number of PESs represented in this study and the generalizability
of the findings. Other PERS providers may follow different call protocols and may experience
other types of events which were not observed with the PERS provider where the calls examined
were obtained. Another study limitation is transcription variability resulting from human error
(e.g., difficulty hearing call recordings clearly). In addition, the fact that statistical analyses are
based on mean measurements is also a limitation. Wide variances in measures were observed for
both caller and speaker types and simply looking at means does not provide a complete picture of
what may be happening within each call. Finally, call meta-data surrounding the speaker details
was not provided (e.g., which call taker is responding, gender of callers, caller medical history).
As such, this study is limited by assumptions made by the researcher in completing the analysis.
For example, the analysis was performed on the assumption that each caller was unique and
interacted with the call taker only one time.
3.6.6 Future Research
Future research may want to consider examining speech intelligibility measures if better call
recordings can be obtained. It would also be important to refine these results to examine only the
initial utterances from the caller. As time is of the essence in emergency response, the HELPER
system will need to classify the call within the initial speaker turns. Alternative methods of call
classification may also be considered including a different definition for fall calls.
3.7 Conclusion
In conclusion, this chapter outlines the process by which response call conversations were
analysed to identify significant conversational trends that could be used to help establish
development guidelines for the HELPER communication module’s speech and dialogue
handlers. Care providers were shown to request EMS services 100% of the time when using the
PERS. Older adults requested EMS services nearly 96% of the time for high risk
situations; however, this number dropped to 71% for medium risk situations. Therefore,
identifying a response call’s caller type and/or risk level may help the HELPER in predicting a
possible response outcome. In terms of trends in verbal ability measures, WPM and TNL, were
identified as possible useful predictors for a response call’s caller type. However, no verbal
ability measures were identified as useful predictors for risk level. The identification of average
call taker response times in speaker turns and in seconds will also provide a target against which
the HELPER system’s response times can be compared. In terms of trends in conversational
structure, care providers and older adult callers were shown to employ different strategies for
responding to the call taker. This result suggests that there may be benefit in tailoring the
HELPER dialogue to the actual caller type and risk level. Especially in emergency situations
where time is of the essence, a SDS that responds well to the caller type and PESs may not only
result in identifying the desired response type more quickly, but may also provide a better user
experience. An improved user experience could, theoretically, lead to higher usage rates and
lower technology abandonment.
Chapter 4
4 The CARES Corpus: A Database of Older Adult Actor Simulated Emergency Dialogue for Developing a Personal Emergency Response System
4.1 Prologue
This chapter describes the process used to design and develop a spoken speech database
containing Canadian adult regular and emergency speech (CARES). Although the main
motivation for building this speech corpus was to help train and test various components of the
HELPER communication module, this database will also be of benefit to researchers interested
in older adult speech and in other fields such as computational linguistics, natural language
processing, and linguistics. The contents of this chapter have been published in a peer-reviewed
journal.
*N.B. The “Intelligent Call Handler” and the “Communication Dialogue” components illustrated
in this Chapter in Figure 4-1 would house the Speech Informant, Dialogue Manager, and Call
Responder HELPER SDS components and the Response Generation and Speech Synthesis
HELPER SDS components respectively, as described in Chapter 1, Figure 1-6.
Author Contributions: V. Young wrote the manuscript, designed and developed the database
collection protocol, and managed and led the data collection process. A. Mihailidis reviewed the
manuscript and led the research for the automated PERS.
Journal Citation: Young V, Mihailidis A. (2013). The CARES Corpus: A database of older
adult actor simulated emergency dialogue for developing a personal emergency response system.
International Journal of Speech Technology. 16:55-73.
4.2 ABSTRACT
There has been limited research on automatic speech recognition systems developed specifically
for older adults and there exist few older adult speech corpora available for training them. For
our research, samples of primarily older adult voices within an emergency context were needed
to help develop, train, and test the automatic speech recognition component of a novel,
intelligent, speech-based personal emergency response system. We were unable to locate an
existing speech corpus with all the properties we required. Specifically, these properties included
spoken Canadian English, both male and female adult (especially older adult) speech, emotional
or stressed speech, and emergency type dialogue. As a result, we created the Canadian adult
regular and emergency speech (CARES) corpus. The goal of this paper is to describe
design and development of the CARES corpus. The CARES corpus has been designed using
information obtained from live emergency call centre call transcripts and research literature in
the field of automatic speech recognition. This corpus consists of a collection of spontaneous
speech, read sentences, simulated expression of words, phrases, and emergency scenarios from
adult actors aged 23-91 years. The emphasis is on emergency type dialogue and older adult
speech. A total of 40 participant voices are included in the corpus and over 70% of the voices are
from adults over the age of 50 years. Approximately 3,200 minutes of speech was acquired in
total.
4.3 Introduction
High recognition accuracy in automatic speech recognition (ASR) applications is heavily
dependent on how closely the incoming speech can be matched to the speech samples used to
train the ASR system despite the presence of non-targeted speech noise. For this reason, a great
deal of effort has been placed on developing speech corpora for training ASR systems that
contain speech samples highly representative of the final target population group within the
expected application context. Adding to the already complex task of speech recognition in quiet
environments with the “average adult” speaker, if one considers using ASR in a situation of high
stress, such as a potentially life threatening emergency event involving an older adult speaker,
the choice of speech corpora used for ASR training may be considerably more crucial. Research
literature underlines the fact that stressful situations can alter a speaker’s voice, negatively
affecting ASR performance (Baber & Noyes, 1996; Zhou, Hansen, & Kaiser, 1998). Other
research suggests that ASR performance with older adults improves when older adult voices are
used for training the ASR (Anderson et al., 1999; Baba et al., 2004).
A number of speech corpora currently exist and are available for research use through various
institutions; for example, the University of Pennsylvania’s Linguistics Data Consortium (LDC)
(www.ldc.upenn.edu), the Oregon Health and Science University’s Centre for Spoken Language
Understanding (www.cslu.ogi.edu/corpora/corpCurrent.html), Stanford University’s Department
of Linguistics (linguistics.stanford.edu/department-resources/corpora), the Division of
Psychology & Language Sciences at University College London
(www.phon.ucl.ac.uk/resource/scribe), and the European Language Resources Association
(catalog.elra.info/index.php). However, for our intended application involving older adults in
stressful situations, we were unable to identify an existing speech corpus that contained all the
necessary properties required. Specifically, the relevant properties of interest included speech in
Canadian English spoken by older adults and caregivers, use of emergency type dialogue, and
speech spoken in an emotional or stressed state. Good coverage of both male and female voices
was also desired. As a result, we decided to develop a new speech corpus specific to our needs
called the Canadian Adult Regular and Emergency Speech (CARES) corpus.
The CARES corpus was designed especially for future training and testing of the ASR
component of a novel speech-based, intelligent personal emergency response system or PERS, as
well as for overall PERS testing. The PERS system is still in development and further details can
be found in the next section “Background & Motivation”. This paper exclusively focuses on the
creation of the CARES corpus. The application of the CARES corpus in the context of the PERS
ASR will be the topic of another future paper. In this paper, we will outline the motivation
behind the creation of the corpus, present a detailed description of the design specifications and
development, and conclude with a final discussion of our results and the corpus limitations.
4.3.1 Background & Motivation
4.3.1.1 The Traditional PERS Technology
Our research focuses on exploring the use of speaker independent (or factory pre-trained) ASR,
artificial intelligence, and human-computer dialogue, within a PERS for initiating and
determining the most appropriate emergency response for an older adult user in an emergency
situation at home. Traditionally, PERS technology is installed in the home of an older adult and
provides him or her with access to immediate 24 hour emergency assistance at the push of a
button. This button actuator is typically worn on the body - around the neck or wrist. In the case
of a fall or medical complication, the button must be pressed in order to reach an emergency call
operator who will respond immediately over a speaker phone or by telephone. The emergency
call operator will then determine through a quick series of questions and responses the most
appropriate emergency response. Finally, the desired emergency service or care provider (e.g.,
ambulance or neighbour) would be contacted or, in the case of a false alarm, the emergency call
operator would end the call.
PERS technology has been shown to support aging-in-place or aging at home in one’s
community, lower caregiver and user anxiety, and decrease overall healthcare costs (Hizer &
Hamilton, 1983; Mann et al., 2005). However, despite high user satisfaction, a large proportion
of PERS owners do not use their systems when needed. Reasons for non-use vary but include
lack of perceived need; sensitivity or burden from having to wear or remember to wear the
button actuator; potential loss of independence from outcome related hospitalization; and
inability to press or access the button (Heinbüchner et al., 2010; Hessels et al., 2011; Mann et al.,
2005; Porter, 2005).
In addition, a large proportion of calls to personal emergency response call centres actually
consist of false alarms (accidental button presses) (Hamill et al., 2009). False alarm calls often
result in unexpected calls to the subscriber, loss of work hours for family responders, and an
increased workload for already stressed emergency care providers. In order for the successful
adoption of this potentially life-saving assistive technology among older adults to help with
aging-in-place, it is paramount that the PERS be made accessible, usable, efficient, and effective:
therefore, system re-design is necessary (Hessels et al., 2011; Porter, 2005).
4.3.1.2 Re-designing the PERS
In our research, we hypothesize that using ASR in the PERS could eliminate the need for a body-
worn button actuator. As well, an artificially intelligent PERS might also permit call
cancellation in the event of a false alarm. The first ‘proof of concept’,
speech-based, intelligent PERS was a ceiling-mounted device with a dialogue modeled after
existing call centre protocol (Hamill et al., 2009). An open-source ASR called Sphinx 4.0 from
Carnegie Mellon University was used in this prototype (Walker et al., 2004). The system was
designed to understand “yes” and “no” responses to questions and ASR and PERS testing was
completed in a controlled lab environment and performed on non-older adults.
The next step in our research is to further develop the second PERS prototype incorporating
design and testing for the older adult user (Young & Mihailidis, 2010). This improved prototype
will include an expanded vocabulary (but not more than 200 words); a dialogue manager that can
handle normal conversational acts such as repetition, silence, barging-in, mis-understandings,
openings and closings; an artificial intelligence capable of identifying false alarms from true
emergencies and recalling past history; and an ASR trained for the older adult and caregiver
voices in typical emergency situations. The automatic default of this system will always be a live
operator.
To improve the robustness and accuracy of the ASR component used within this new PERS, a
collection of speech samples was required for training and testing from the target population in
emergency type situations. This same collection of speech could also be used as a simulated user
for dialogue and intelligence testing of the overall PERS. The CARES corpus was created to
meet these needs and includes a collection of read and spontaneous speech, as well as emergency
words, phrases, and enacted dialogue collected from adults, the majority of which were older
adult actors over the age of 50 years.
4.3.1.3 The Application
The speech-based and intelligent PERS is comprised of three modules or sub-components: the
Automatic Speech Recognizer or ASR, the Intelligent Call Handler and the Communication
Dialogue (Figure 4-1).
The Caller, typically an older adult or caregiver, provides the speech input into the PERS which
goes to the ASR for processing. The ASR ‘Decoder’ takes the processed speech sample and
determines the possible user response (word or phrase) by referencing data from the ‘Linguist’.
The Linguist contains three data components: (1) the acoustic model, (2) the lexicon or
dictionary (the list of words and their pronunciations), and (3) the language model (the
probabilities of word sequences formed from words in the lexicon).
Figure 4-1: The CARES Corpus application within the context of a PERS.
When an ASR is trained, the speech samples from the speech corpus are used to create the
acoustic model component within the Linguist. The words of these speech samples and their
pronunciations are also stored within the language model and lexicon of the Linguist,
respectively. Once the
input speech is decoded, the “Intelligent Call Handler” module then determines whether
emergency response assistance can be initiated, whether the call should be cancelled (e.g., a
false alarm), or whether further dialogue is required. If further dialogue is required, the
“Communication Dialogue” module is initiated and the Caller is asked another question or
provided with further information.
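The call-handling decision just described can be sketched in simplified form. This is a minimal illustration only; the keyword sets, function name, and action labels below are our assumptions, not the actual HELPER vocabulary or logic:

```python
# Minimal sketch of the call-handling decision described above.
# Keyword sets and names are illustrative assumptions, not the
# actual HELPER vocabulary or API.

CANCEL_WORDS = {"cancel", "accident", "mistake"}
REQUEST_WORDS = {"help", "ambulance", "paramedics"}

def handle_call(decoded_words):
    """Choose the next system action from the ASR's decoded words."""
    words = set(decoded_words)
    if words & CANCEL_WORDS:
        return "cancel_call"        # likely false alarm
    if words & REQUEST_WORDS:
        return "initiate_response"  # direct or indirect request for aid
    return "continue_dialogue"      # ambiguous: query the caller again
```

A real handler would also weigh call history and estimated risk level before cancelling a call rather than acting on single keywords.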
As shown in Figure 4-1, in addition to ASR training, the CARES corpus will also be used for
PERS testing; both the ASR and the overall PERS. Pre-recorded words, phrases and emergency
scenarios from the corpus can be used repeatedly to examine how the PERS system components
might respond to different user input. The use of the corpus provides a method for controlled and
repeatable testing of different ASR acoustic models and various modifications to the
communication dialogue and the intelligent call handler. Using the CARES corpus rather than
live older adult participants in the initial testing phase is more cost-efficient and reliable, and
should provide a smoother technology transition to field testing with live subjects.
4.4 Methodology
The methodology used in this study was reviewed and approved by the University of Toronto
Ethics Board (Protocol Reference Number 23482).
4.4.1 Application Context and Target Population
In designing the CARES corpus, we wanted to include simulated speech of callers (the target
population) using a PERS during a live emergency call, for both false alarms and true
emergencies. To achieve this goal, our methodology involved the following steps:
(1) We analysed live emergency call centre calls to determine what occurs during a call, who
calls, why they call, what help is requested, and what key words and phrases are used.
(2) We recreated or simulated live emergency situations using adult actors, mainly older adult
actors, recording their voices as they simulated emergency-type words, phrases, and
situations.
(3) Older adult subjects (the individuals in need of care) and other adult subjects (the
caregivers) were solicited to provide speech samples of spontaneous speech, read speech,
simulated emotional speech and simulated conversational speech.
4.4.2 Speech Corpus Design Specifications
4.4.2.1 Live Emergency Calls
To design a speech corpus that accurately reflects the traditional PERS user, his/her speech
characteristics, and the application context, it was necessary to better understand what happens
during a live emergency call situation after the button activator is pressed.
A total of 84 recordings of live personal emergency response calls were obtained from the call
centre of a private personal emergency response service provider and transcribed using SALT
v.9.0 (Miller & Iglesias, 2006). The company’s name is not provided for reasons of
confidentiality. Discourse and conversational analyses were performed on the transcripts
(Wooffitt (2005) presents an introduction to these two methods of analysis), in addition to key
word isolation. Specific details of the transcript analysis and key word isolation will be discussed
in a separate paper; however, a brief summary of the results is included here. See Table 4-1 for a
summary of important aspects of live emergency calls, corresponding transcript analyses
findings, and the resulting speech corpus design specifications.
Table 4-1: Important aspects of emergency call transcript analysis applied to speech corpus design specifications.

1. Who uses the system?
   Findings: Older adults and care providers; mostly older adult females and some males.
   Design specifications: Include adult and older adult speakers, with emphasis on older adults; female and male voice samples needed.

2. How are requests for assistance made?
   Findings: Direct and indirect requests for assistance.
   Design specifications: Important vocabulary - select key words and phrases to include.

3. Important aspects of conversation and dialogue
   Findings: Communication acts, speech categories, and key words and phrases identified; older adult callers speak significantly more slowly than care provider callers and emergency call operators.
   Design specifications: Important vocabulary - select key words and phrases to include; vary manner of speech in recordings (e.g., speed of speaking, read speech, and emotional speech).

4. What types of emergency situations?
   Findings: Accidental, medical, and fall calls.
   Design specifications: Include each type of emergency situation in the scenarios.

5. Are there varying degrees of emergency situations?
   Findings: Low, medium, and high risk.
   Design specifications: Include different degrees of emergency situations in the emergency scenarios.
For item 1, both older adults and their care providers were found to use the PERS and activate
the system using the push button. Therefore, both adults and older adult voices should be
included in the speech corpus. As it is more difficult to recognize older adult voices using ASR,
a greater number of older adult voices should be collected. Although a majority of the emergency
response call centre clients are older adult females, both female and male voice samples would
be required.
For items 2 and 3, in terms of how requests are made and other important aspects of the
emergency conversation and dialogue, care providers were found to be more explicit or direct
when requesting an emergency response (e.g., “I need an ambulance”) compared to the older
adult callers, who were more explanatory (e.g., “I’ve been vomiting, I need help”). Within the
conversation itself, the words and phrases used could be categorized into different
communication acts or speech categories, for example, openings, closings, confirmations,
negations, and queries. To reflect this information in the speech corpus, the final recorded
vocabulary included the most common or important key words and phrases which were deemed
necessary to make direct or indirect emergency response requests, indicate a high or low risk
situation, or reflect a negative health condition or fall. In addition, words were also included that
are used in essential communication acts or speech categories required to carry out a
conversation (e.g., repetition requests, responses to questions, queries). All short phrases used in
the corpus contained one or more of the key words contained in the vocabulary.
On average, the older adult callers spoke significantly more slowly than both the care provider
callers and the ECOs. In an attempt to capture the different manners of speech among younger
and older adults within the speech corpus, participants were instructed to repeat key words using
different manners of speaking (e.g., normally, loudly, quietly, quickly, and slowly).
For items 4 and 5, emergency response calls were divided into medical emergencies and calls
associated with falls, and also categorized into low, medium and high risk situations. A low risk
situation is a false alarm; a high risk situation is one of life or death, possible loss of sight or
limb, or where immediate assistance is required (e.g., heart attack, stroke, fire). In a medium risk
situation, the individual requires assistance but would probably not lose their sight, limb, or life
while waiting (e.g., a fall without injury). In terms of the length of time before
emergency response is initiated, low and high risk calls were resolved faster compared to
medium risk calls. This information was covered in the speech corpus by including emergency
scenarios involving accidental, medical and fall calls and situations of low, medium and high
risk.
4.4.2.2 Phonetically Balanced Sentences
The words and phrases included in the speech corpus were chosen for their semantic importance
in emergency dialogue rather than their phonetic makeup. Thus we felt it necessary to also
include voice samples covering all the phonemes of the English language. These samples would
allow for the possibility of training the ASR system using phonemes (or di- or tri-phones) if
required. Sentences were selected from the SCRIBE database (Huckvale, 2004) to provide a set
of phonetically rich or balanced sentences and phonetically compact sentences for use in the
speech corpus.
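The idea of verifying a sentence set's phoneme coverage can be illustrated as follows. This is a toy sketch: the pronunciation dictionary below covers only one sample sentence and stands in for a full English lexicon (e.g., an ARPAbet-style dictionary spanning all English phonemes):

```python
# Toy sketch of a phoneme-coverage check for a candidate sentence set.
# PRON is a stand-in for a full pronunciation dictionary.

PRON = {
    "this": ["DH", "IH", "S"],
    "was":  ["W", "AH", "Z"],
    "easy": ["IY", "Z", "IY"],
    "for":  ["F", "AO", "R"],
    "us":   ["AH", "S"],
}

def phoneme_coverage(sentences, target_phonemes):
    """Return (covered, missing) phonemes for a set of sentences."""
    seen = set()
    for sentence in sentences:
        for word in sentence.lower().split():
            seen.update(PRON.get(word, []))
    return seen & target_phonemes, target_phonemes - seen

covered, missing = phoneme_coverage(["This was easy for us"],
                                    {"DH", "IH", "S", "B", "P"})
```

A check of this kind, run over the full lexicon, is how one would confirm that the selected rich and compact sentences jointly cover every English phoneme.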
4.4.2.3 Spontaneous Speech Sample
Research has shown that read speech and spontaneous speech are acoustically and
linguistically different (Howell & Kadi-Hanifi, 1991) and that ASR recognition rates can be
improved when spontaneous speech is used with an ASR system trained with comparable
spontaneous speech (Furui, Nakamura, Ichiba, & Iwano, 2005). Since emergency dialogue is not
typically read speech, we thought it would be useful to also include a sample of spontaneous
speech in the CARES corpus.
4.4.2.4 Simulated Vocal Expression
Adult actors were mainly used to simulate emergency speech in the CARES corpus because it
was not possible to use actual emergency speech, for several reasons. First, it is not ethical to
create a real emergency, or a realistic simulation of one, since this would create unacceptable
risks for the participants. Nor was it possible to use live recordings of emergency calls, because
the ones we managed to obtain were not of sufficient audio quality for ASR training.
A popular approach for research in the area of ‘affective computing,’ where human emotion is
considered when designing ASR systems, is to begin with a collection of emotional speech either
real or simulated by actor subjects (Campbell, 2000; ten Bosch, 2003). Although simulated vocal
expression using actors may not be as natural as the speech obtained from a “true” emergency
situation, there is research evidence to suggest that actor simulated vocal expressions will
provide more expressive speech than non-emotional read speech (Murray & Arnott, 2008;
Scherer, 1986, 2003; C. E. Williams & Stevens, 1972).
4.4.3 Participant Recruitment
Adult actors were targeted for recruitment within three age groups from the greater Toronto area
(GTA). The three age groups consisted of: (1) adults 19+ to <55 years of age; (2) adults 55 to 69
years of age; and (3) adults 70 years of age and over. The upper threshold of 70 years was
selected instead of 60 or 65 years (retirement age) based on literature suggesting that age-related
changes in the older adult voice may only start to affect ASR accuracy when the speaker is over
the age of 70 years (Wilpon & Jacobsen, 1996). The lower age of 19 years was selected because
most students in the post-secondary education system begin at this age. The middle age of 55
years was selected as a possible early retirement age. All actors were
required to have a minimum of one year of prior acting experience with an acting group.
Actor participants were recruited from local theatre events (e.g., the Toronto Fringe Festival), a
seniors' acting group (Act II) based at a local university (Ryerson University), and from a
“Performing Arts Lodge” located within the GTA, a residence for individuals in the performing
arts. Participants were also recruited via word of mouth from other participants. A
minimum of five participants of each gender (male and female) were recruited for each age
group category.
Participant suitability was determined through a telephone interview conducted prior to
acceptance into the study. See Appendix E for a list of the questions participants were asked.
Desired user characteristics are also listed below.
1. Fluent in English with an English language comprehension level high enough to understand the consent forms and follow simple instructions;
2. Canadian residents;
3. Minimal language accent;
4. Minimal motor speech difficulties;
5. English language literate;
6. No more than normal-mild hearing loss or corrected hearing loss;
7. Normal or corrected vision;
8. Medically stable;
9. Mobile and living independently in the community;
10. Cognitively capable of consenting to participate in the study.
4.4.4 Recording Procedure
The speech recording session was designed to last approximately two hours in total. Participants
were required to perform four different speaking exercises. See Table 4-2:
Table 4-2: Summary of speech sample recorded and general recording details.
Exercise 1 – Free speech (miscellaneous topic): spoken; 1 session, 5 minutes long.
Exercise 2 – Sentences: read; 96 sentences (50 rich and 46 compact); 10 sessions, 5 minutes per session.
Exercise 3 – Isolated emergency words and phrases: read with emotion; 185 phrases and key words, each repeated 5x; 5 sessions, 10 minutes per session.
Exercise 4 – Emergency scenarios: read with emotion; 3 scenarios, 10 minutes total.
Exercise 1 - Free speech: participants were asked to speak naturally and spontaneously about a
miscellaneous topic of their choosing. If they had difficulty, questions were asked to facilitate the
dialogue.
Exercise 2 - Sentence Reading: participants read a collection of phonetically rich and compact
sentences (96 sentences combined). In total, 200 phonetically rich sentences and 460
phonetically compact sentences were available from the SCRIBE database (Huckvale, 2004).
Two sentence sets were created: Set A included the phonetically rich sentences and Set B
included the phonetically compact sentences. Each of these sentence sets was further divided
into smaller sub-sets: four groups of 50 rich sentences and ten groups of 46 compact sentences.
Each participant was randomly assigned one sub-set of sentences from both the phonetically rich
Set A and the phonetically compact Set B. The selected sentence sub-sets were presented
on a computer monitor to the participant for reading. An example of these sentences is shown
below:
SET A – Phonetically Rich Sentences
A1-001. The price range is smaller than any of us expected.
A1-002. They asked if I wanted to come along on the barge trip.
A1-003. Amongst her friends she was considered beautiful.
A1-004. The smell of the freshly ground coffee never fails to entice me into the shop.
A1-005. I'm often perplexed by rapid advances in state of the art technology.
SET B – Phonetically Compact Sentences
B1-001. This was easy for us.
B1-002. Is this seesaw safe?
B1-003. Those thieves stole thirty jewels.
B1-004. Jane may earn more money by working hard.
Exercise 3 – Isolated emergency words and phrases: participants were instructed to read, and
then speak with emotion, 185 short pre-selected emergency phrases and words. Each key word
was pre-selected from within its phrase and was repeated five times in different manners of
speaking: normally, loudly, softly, quickly, and slowly. Spoken numbers were also included in
the recordings: 0 to 20, and 30 to 90 by tens. All phrases and words were displayed on a computer
monitor and participants were provided prompts for the five different manners of speaking. For
the “slow” manner of speaking, participants were instructed to imagine they had difficulty
forming their words. Figure 4-2 shows an example of the question to keep in mind, the response
sentence, and the key word of interest provided to participants during this session. Appendix F
presents a list of the words and phrases recorded by the participants.
In Exercise 3, the emergency words and phrases were randomized and arranged into a list called
‘Set-1’. ‘Set-2’ was Set-1 in reverse order. Each participant was assigned either Set-1 or Set-2 in
an alternating pattern. There was one exception where the first participant viewed the words in
alphabetical order. The order of presentation of the word and phrase list was counterbalanced to
reduce the effects of voice differences due to fatigue or effects resulting from beginning a new
verbal exercise.
Figure 4-2: Sample screen shot of emergency phrases and words presented to the participant during speech recording. The participants were provided with screen prompts to indicate how the word was to be spoken.
Exercise 4 - Enacting three emergency scenarios: In total, nine short emergency scenarios
were used, with each participant assigned three of the nine possible scenarios, all of
which involved dialogue that might occur after pressing an assist button on a PERS. The three
scenarios included: one accidental button push for assistance, one fall incident, and one request
for medical assistance. All scenarios were pre-written and pre-assigned by the researchers. The
scenarios were taken from the live emergency call transcripts and modified to remove any
identifying information. Scenarios were provided to the participants for review and practice prior
to the day of their voice recording. See Table 4-3 for a summary of the scenario type, risk level,
and scenario details. See Appendix G for the emergency scenarios.
Table 4-3: Emergency scenario type, risk level and scenario detail.
Accident (A):
  Low (1) – Unaware of button press, accidental call, hard of hearing
  Low (2) – Aware of button press, accidental call
  Low (3) – Unaware of button press, accidental call
Fall (F):
  Medium (4) – A fall, can’t get up, send responder
  Medium (5) – Caregiver call, a fall, broken bone, send ambulance
  High (6) – A fall, bleeding, hard of hearing, send ambulance
Medical (M):
  Medium (7) – Nauseous, vomiting, wants responder
  High (8) – Breathing difficulty, dizzy, wants paramedics
  High (9) – Shaky, difficulty breathing, wants responder
In Exercise 4, the nine emergency scenarios consisted of three accidental push-button scenarios
(A), three fall incident scenarios (F), and three medical assistance scenarios (M). Within these
emergency scenario groupings, the accidental push-button scenarios were all low risk events,
whereas the fall incident scenarios included two medium and one high risk situation, and the
medical assistance scenarios contained one medium and two high risk situations. A total of
twenty-seven scenario combinations were created and placed in random order. Each participant
was assigned one of the twenty-seven randomized scenario combinations which included one
accidental, one fall, and one medical scenario. When all twenty-seven scenario combinations had
been performed, the order was repeated (Table 4-4 – Scenarios). The three scenario types (i.e.,
A, F, and M) were also arranged into six order combinations (e.g., A-F-M, F-M-A, M-A-F).
Each participant was assigned one of these scenario order combinations to
determine the order in which the scenarios would be performed. This order was repeated when
all six scenario type combinations were completed (Table 4-4 – Scene Order). See Table 4-4 for
an example of the data combinations used for each Participant.
Table 4-4: Example of data combination arranged for each participant indicated.
Participant  Scenarios*  Scene Order  Sentences A  Sentences B  Sent. Order  Word Set
10 248 MFA 4 8 AB SET1
11 258 FAM 3 2 BA SET2
12 157 AMF 4 6 AB SET1
13 367 AFM 2 10 BA SET2
14 368 FMA 3 5 AB SET1
15 149 MAF 2 2 BA SET2
16 269 MFA 2 1 AB SET1
17 259 FAM 4 7 BA SET2
18 168 AMF 2 5 AB SET1
19 147 AFM 4 10 BA SET2
20 167 FMA 4 5 AB SET1
* For participant #10 for example, scenarios ‘248’ corresponds to scenes 2, 4 and 8.
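The counterbalancing scheme above can be sketched in a few lines of Python. Scenario numbering follows Table 4-3; the cycling function is our illustration of "repeat the order once exhausted," not the study's actual assignment code:

```python
# Sketch of the counterbalancing scheme: 27 scenario combinations
# (one accident x one fall x one medical) and 6 scene orders, each
# cycled across participants. Scenario numbers follow Table 4-3.

from itertools import permutations, product

accident = [1, 2, 3]  # low-risk accidental button-push scenarios
fall = [4, 5, 6]      # fall scenarios
medical = [7, 8, 9]   # medical scenarios

combos = list(product(accident, fall, medical))     # 27 combinations
orders = ["".join(p) for p in permutations("AFM")]  # 6 scene orders

def assignment(participant_index):
    """Cycle through combinations and orders as participants enroll."""
    return (combos[participant_index % len(combos)],
            orders[participant_index % len(orders)])
```

The alternating Set-1/Set-2 word-list assignment in Exercise 3 can be handled the same way, cycling with a period of two.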
4.4.4.1 Recording Environment
All speech recordings were conducted at the University of Toronto in quiet background
conditions inside a double-walled, sound-attenuated booth approximately 74 x 74 x 78.5 inches
(D x W x H) in size. Participants were seated inside the sound-attenuated booth in front of a
computer monitor. See Figure 4-3.
Figure 4-3. Participant room setup in sound attenuating booth.
Figure 4-4. Experimenter room setup in sound attenuating booth.
4.4.4.2 Recording Equipment
Speech recordings were made using ProTools TDM Software on a dedicated Apple Computer
(MAC OS X version 10.4.11, 3 GHz Dual-Core Intel Xeon). The microphone pre-amp was a
Digidesign “PRE” and the audio interface was the Digidesign “192 I/O”. Participant speech was
recorded at a sampling rate of 96 kHz with 24-bit depth. The participant used AKG Acoustics
K271 studio headphones and an Audio-Technica 4050 multi-pattern condenser microphone. The
experimenter used Sennheiser eH150 headphones and an AKG Acoustics C4000B large-
diaphragm condenser microphone.
4.5 Results
4.5.1 Participant Recruitment
A total of 40 participants, 19 male and 21 female, were recruited for the study over a six month
period. Thirteen participants fell within the 19+ to <55 years of age group, twelve participants
were in the 55 to 69 years of age group, and fifteen participants were in the 70 years of age and
older group. See Table 4-5 for a breakdown of the participants by Age Group and Gender.
Table 4-5: Participants by Age Group and Gender
Age Group (years)  Male  Female
19+ to <55 6^ 7
55 to 69 6 6*
70 and over 7& 8~
Gender Totals 19 21
^1 non-actor; &1 with minor accent; *1 with minor accent, 1 non-actor; ~3 with minor accent, 2 non-actors.
Nineteen of the participants had 15 years or more experience in the acting profession. Four
participants had no acting experience and five participants spoke with a minor accent. Minor
accents included British English, French and German. Participants spanned an age range from 23
to 91 years of age. See Table 4-6 for a breakdown of the participants by age range.
Table 4-6: Participants by Age Range
Age Range (years)  Male  Female
20’s 2 3
30’s 3 3
40’s 1 0
50’s 3 2
60’s 3 5
70’s 5 6
80’s 2 1
90’s 0 1
Fifteen of the participants were born outside of Canada. They represented the following
countries: Austria, England, France, Germany, Italy, Japan, Scotland and USA. For participants
born within Canada, the Canadian birth provinces included Alberta, British Columbia, Ontario,
Newfoundland, and Nova Scotia.
4.5.2 Speech Recording Summary
Each participant completed all four speech exercises described in the Methodology Section,
except for one participant who did not complete the emergency scenario exercise due to fatigue.
Two participants also did not complete the number counting. A total of ~3,200 minutes of speech
was recorded.
4.6 Discussion
4.6.1 The Age Effect
The length of time required for speech recording was approximately two hours; however, timing
was dependent on how quickly the participants spoke and how many breaks were required.
Younger participants tended to finish the study in less than two hours, while some of the older
participants required more time to finish the exercises. Older participants needed more voice
breaks during the recording sessions.
In the free speech exercise, younger participants tended to choose topics related to relationships
and travel, whereas older participants spoke more about life experience, work, and children.
Due to their age and life experiences, the older participants were observed to be more realistic
than their younger counterparts at portraying older adults in emergency situations, especially
for certain conditions (e.g., stroke, weakness, heart attack).
4.6.2 Recording Difficulties
In the sentence exercise, the majority of participants knew how to pronounce all the words in the
sentences; however, some individuals did have trouble pronouncing certain words, which may
have been a result of education, vision, or reading difficulties.
Some participants with hearing aids removed them during the recording sessions, saying they
could hear well over the headphones; other participants kept their hearing aids on. Wearing the
headphones did not seem to bother them.
Occasionally, a participant gestured or moved during the recording causing noise to be added to
the recorded voice sample. These samples were generally not re-recorded unless the noise was
sufficiently loud to interfere significantly with the recorded signal. If the participant coughed or
sneezed during a sentence, the sentence or word was repeated.
4.6.3 Design Limitations
In the emergency word and phrase exercise, for the word repetition component, the “slow”
manner of speaking was interpreted in different ways. Variations included lengthening the word
slightly, exaggerating the length, stuttering, and slurring the speech.
Some actors with stage or theatre experience were found to exaggerate and enunciate words
more clearly than non-stage actors. It is possible that this type of clear word enunciation may not
accurately reflect a real live emergency scenario. Some actors also seemed to over-act in some
situations. This has been noted to occur in other research studies involving actors and simulated
vocal affect (Scherer, 1986). It is possible that the individuals who over-acted may have had less
acting experience or possibly less life experience with respect to the specific emergency situation
being simulated.
In terms of the speech recording, the microphone position was set at the beginning of the study;
however, participants were free to move closer to or further from the microphone over the
course of the recording session, which may have affected the final recorded volume of the
speech samples. The microphone was very sensitive, and occasionally a change in speech from
a whisper to a very loud voice would saturate the input, requiring that word to be re-recorded. If
the microphone was positioned too close to the participant’s mouth and the participant spoke
loudly, certain consonants (plosives, e.g., /p/, /b/, /t/) could also produce a noise artifact (a burst
of airflow) that would not normally occur with the microphone positioned further away.
The CARES corpus is of similar size to the “few talker” set in the SCRIBE database (Huckvale, 2004).
Although the speech sample size may not be large enough to train a large vocabulary ASR, the
number of speech samples should be sufficient for training and testing the intelligent PERS ASR
(a small vocabulary ASR) and preliminary field testing. If required, additional speech samples
may be added in the future.
4.6.4 Background Noise
In a true emergency situation, the caller’s incoming speech signal will be contaminated with
background noise from their respective environments. In creating the CARES corpus, the actual
recordings of speech samples were carried out in the quiet environment of a sound attenuated
booth. The benefit of recording the speech signal in a quiet recording environment is that the
degree and type of background noise contamination can be controlled. For example, simulated
room noises, street noise or other noises (e.g., television, radio, crowds) can be added later to test
their effect on the ASR’s speech recognition capability.
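Adding noise to a clean recording at a controlled signal-to-noise ratio (SNR) is a standard way to do this. The sketch below is our illustration in pure NumPy; a real pipeline would mix in actual room, street, or television noise recordings rather than the synthetic noise used here:

```python
# Sketch: mixing background noise into a clean corpus recording at a
# chosen SNR. Synthetic signals stand in for real speech and noise.

import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power required for the target SNR, then the matching scale.
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # 1 s of stand-in "speech" at 16 kHz
noise = rng.standard_normal(16000)
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Sweeping `snr_db` from clean down to low values gives a controlled, repeatable way to measure how recognition accuracy degrades with each noise type.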
4.6.5 Implementing the CARES Corpus
Preliminary plans for using the CARES Corpus to train the PERS ASR include expanding
the existing vocabulary, which currently only recognizes “yes” and “no”. We also plan to examine
different acoustic model combinations, for example, training with strictly older adult voices, or
older adult and younger adult voices, or strictly the younger adult voices. It might also be
interesting to combine the speech samples from the CARES Corpus with speech samples from
other existing corpora to increase the speech data for ASR training for certain common words
such as “yes”, “no”, “help” or “ambulance”.
4.6.6 Other Applications
In addition to benefiting the development of PERS, we expect that the CARES corpus will be
useful for other applications involving older adult subjects such as audio interfaces to Smart
Home technologies. It may also be useful in linguistics research studying the speech patterns of
older adults. It is our intention that the CARES corpus will be made available to other
individuals for their research once the processing has been completed.
4.7 Conclusions
A collection of Canadian adult regular and emergency speech has been developed, containing
speech samples from 40 adults between the ages of 23 and 91 years. The CARES Corpus
specifications were based on transcript analyses of emergency response call conversations and
dialogue as well as other research literature in the field of ASR development. Participants
included mainly adult actors who were required to carry out four speech exercises including
spontaneous and read speech, and enacted emergency dialogue, phrases and words. The CARES
corpus contains roughly 3,200 minutes of speech. This corpus was primarily designed to further
develop the ASR component of an intelligent, speech-based PERS for older adults. It may also
find uses in future research studies involving smart home technologies, natural speech processing
and computational linguistics. The next step in our study will involve using the CARES Corpus
to train and test an acoustic model for the PERS. It will also be used to provide controlled input
to examine the communication dialogue and intelligent call handling aspects of the PERS.
Chapter 5
5 Discussion & Conclusions
5.1 Discussion
There are many challenges in designing and building a novel automated, artificially intelligent,
spoken dialogue-based PERS. Simply getting the system to work in the real world with the
intended user and in the right environment is a major one. Yet, as difficult as it may be to
overcome this challenge, a working system also does not guarantee technology adoption in the
end. Research studies have found that older adults are willing to use intelligent assistive
technologies on two conditions: the older adult must see a need for the technology and it must
work well (Demiris et al., 2004; Mann, Marchant, Tomita, Fraas, & Stanton, 2002; McCreadie &
Tinker, 2005). In addition, the desire or motivation must be there to use it. As researchers,
perhaps the best place our skills can be applied is in creating the knowledge and identifying the
means to make the technology work well. This project takes a step back from the actual physical
system development and focuses more on filling in the gaps in knowledge needed to complete
the design and development of the HELPER communication module for the intended end-user in
a real personal emergency situation. All three research studies in this dissertation use live
personal emergency response call data obtained from a local personal emergency response call
centre. The knowledge derived from analyzing these real personal emergency response calls
combined with prior research studies will pave the way toward a more robust HELPER – one
that works well for the end-user and instills in them the desire to use it.
The three main objectives of this doctoral research project are re-stated as follows:
(1) To identify keywords and phrases used by existing PERS users in various personal
emergency response call situations. (Study 1 – Chapter 2)
(2) To identify significant trends in personal emergency response calls and call conversations
that may be used to tailor the call response to the user. (Study 2 – Chapter 3)
(3) To design and develop a speech corpus to be used for training and testing the
communication module of the HELPER system. (Study 3 – Chapter 4)
In this final chapter, study and data highlights will first be summarized for each study. Next, the
research contributions to knowledge will be presented, followed by the research limitations and a
description of future research and proposed studies. Finally, the thesis will conclude with a
short discussion on the implications of the work and offer some final remarks.
5.2 Study Highlights
The three studies are presented in the order in which they appeared in the dissertation. In each
sub-section, the research objective is re-stated followed by a summary of the study highlights.
5.2.1 Principal Findings from Study 1: Identification of Keywords and Phrases
Study 1 Objective: To identify keywords and phrases used by existing PERS users in various
personal emergency response call situations.
Before keywords and phrases used by PERS callers could be identified, it was necessary to first
determine what “different types” of PESs occur as well as to figure out how a word or phrase
might be considered “key.”
To determine different types of PESs, a model of a PES was created which consisted of the
system user experiencing some possible personal emergency event and existing in some
physical-cognitive state. In the model, the user was described as a caller type, the situation was
characterized by a call reason and a risk level, and the physical-cognitive state was represented
by the user’s communication ability. Knowledge gained from on-site visits with emergency
response service providers, research literature, and the response call transcripts was used to
develop the model and to identify PES categories within the PES model. Caller type included three categories: the older adult, the care provider, and a combination of older adult and care provider.
Call reason included two categories: fall and medical calls. Risk level included three categories:
low, medium, and high risk.
Unique words were extracted from response call transcripts and word categories were developed
to help identify what words would be considered “key” and why. These categories were
developed qualitatively using the call transcripts and research literature. The process of
categorizing the extracted words was repeated by two coders categorizing words out-of-context.
A sub-study was also performed with a third coder identifying keywords, phrases, and their
categories in-context by reading the response call transcripts. Eighteen (18) category codes were
used by all three coders, and a full keyword list of 402 words was identified after integrating all coder results. Coder 3 identified 135 phrases.
A smaller keyword set was produced from the full keyword set for use as content material in the
CARES corpus. This list was obtained by applying a series of reduction rules to the full keyword
set. The reduction process took into consideration both the keyword frequency of occurrence, as
well as the number of PES classifications in which the keyword was used. A final small keyword
set of 185 keywords was obtained. For every keyword, a matching phrase was identified which
contained the keyword, for a total of 185 phrases. The 185 keywords fell into 16 of the word
categories (only the categories for “other” words and “interjections” were removed). The small
keyword set was found to include 16 common words used across the risk level and call reason
PES categories, as well as unique keywords for low risk (3 words), medium risk (44 words), and
high risk (31 words) calls.
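The reduction process described above can be illustrated with a short sketch. The thresholds and toy counts below are hypothetical stand-ins, not the actual rules or frequencies from Study 1; the sketch shows only the general idea of reducing a keyword set by overall frequency of occurrence and by spread across PES classifications:

```python
from collections import Counter

def reduce_keywords(word_occurrences, min_count=2, min_categories=1):
    """Illustrative reduction rule: keep a word if it occurs at least
    min_count times overall, or appears in more than min_categories
    PES classifications. Thresholds are hypothetical."""
    totals = Counter()
    spread = {}
    for word, per_category in word_occurrences.items():
        totals[word] = sum(per_category.values())
        spread[word] = sum(1 for n in per_category.values() if n > 0)
    return sorted(w for w in word_occurrences
                  if totals[w] >= min_count or spread[w] > min_categories)

# Toy counts per risk-level classification (invented data)
occurrences = {
    "fell":      {"low": 0, "medium": 3, "high": 5},
    "ambulance": {"low": 0, "medium": 1, "high": 4},
    "testing":   {"low": 1, "medium": 0, "high": 0},
    "um":        {"low": 1, "medium": 0, "high": 0},
}
print(reduce_keywords(occurrences))  # ['ambulance', 'fell']
```

In this toy example, infrequent words confined to a single classification ("testing", "um") are dropped, while words that are frequent or cross-cutting are retained.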
5.2.2 Principal Findings from Study 2: Identification of Conversational Trends
Study 2 Objective: To identify significant trends in personal emergency response calls and call
conversations that may be used to tailor the call response to the user.
The theory behind this objective was that in addition to recognizing and understanding incoming
speech, other information identified from the response call transcripts could be combined with
the words to support the HELPER’s decision-making capability. This other information might be in the form of non-semantic speech data, such as conversational measures, identified through an
analysis of the response call conversations. By examining trends within the response call
conversations these alternate sources of information would become apparent and could then be
integrated into the communication module’s Dialogue Manager. The Dialogue Manager would
use this information to classify the response calls thereby increasing the HELPER’s confidence
in directing the conversation to quickly identify the appropriate target response.
Before this objective could be met, a method for classifying a response call was required and
measurable aspects of interest within a response call conversation needed to be identified. In this
study, the PES model was expanded to the PER model to include an additional response-type
classification. The PER categories were then used to classify the response calls. The four
response type categories identified included: (1) an ambulance, (2) paramedics, (3) other
responders, and (4) all responders. An EMS response was considered to be any of response types
1, 2, or 4. Three groups of conversational measures were then selected for examination: (1)
verbal ability, (2) conversational structure, and (3) timing. Within verbal ability, the study looked
at: words per minute, utterances per minute, turn length in words, and mazes. Within
conversational structure, the study looked at: number of statements, number of questions,
number of responses to questions, and number of one word utterances. Timing was examined
using seconds and number of speaker turns.
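As a rough illustration, several of the verbal-ability and timing measures named above can be computed directly from a timed transcript. The sketch below uses simplified measure definitions and invented example turns; it is not the study's exact calculation:

```python
def conversational_measures(turns):
    """Compute simple conversational measures from a transcript given as
    (speaker, utterance, start_s, end_s) tuples. Definitions here are
    simplified stand-ins for the measures used in Study 2."""
    words = sum(len(u.split()) for _, u, _, _ in turns)
    duration_s = turns[-1][3] - turns[0][2]
    minutes = duration_s / 60.0
    return {
        "words_per_minute": words / minutes,
        "utterances_per_minute": len(turns) / minutes,
        "mean_turn_length_words": words / len(turns),
        "speaker_turns": len(turns),
        "duration_s": duration_s,
    }

# Invented three-turn exchange for illustration
transcript = [
    ("call_taker", "Hello do you need help", 0.0, 2.0),
    ("caller", "Yes I fell down and I need an ambulance", 2.5, 7.0),
    ("call_taker", "Okay I will send an ambulance right away", 7.5, 12.0),
]
m = conversational_measures(transcript)
print(m["speaker_turns"], round(m["words_per_minute"], 1))
```

Maze counting and question/statement tallies would require utterance-level annotation and are omitted from this sketch.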
The first statistical analysis identified that caller type and risk level were significantly related to
response type. Knowing the call reason might also indicate the call’s risk level. Care providers were found to request EMS services 100% of the time when using the PERS. Older adults requested EMS services nearly 96% of the time in high risk situations; however, this number dropped to 71% in medium risk situations. Thus, by knowing a response call’s caller type and/or risk level, the HELPER may be able to deduce the end response type. For example,
an EMS response might be suggested immediately for high risk calls but the system might
propose a non-EMS responder for medium risk calls.
The three independent factors used in the subsequent statistical analyses included caller type
(older adults and care providers), risk level (medium and high risk), and speaker type (call taker
versus callers). The statistical analyses performed included repeated measures analysis of
variance and discriminant analyses using the response call conversational measures and response
call categories of caller type, risk level and speaker type (call takers were the control group).
The results of these analyses showed that words per minute and turn length in words could be
used to help predict caller type. No measures were found to be useful for predicting a call’s risk
level. Mazes occurred more often in older adults’ speech than in care providers’, but the difference was not significant. High risk calls were resolved more quickly than medium risk calls, as measured by
time in seconds and number of speaker turns. Finally, care providers and older adult callers were
found to employ different conversational structure strategies for responding to the call taker. This
result suggests that different dialogue responses may be required depending on the caller type
and risk level.
5.2.3 Principal Findings from Study 3: Creating the CARES Corpus
Study 3 Objective: To design and develop a corpus of spoken speech to be used for training
and testing the communication module of the HELPER system.
There are many ways in which a speech corpus can be created and a multitude of reasons why
recordings of speech may be collected. The key to building a useful speech corpus is to make it
relevant for the purpose and context in which it is being built. Researchers who have been involved in the development of larger speech databases have noted that developing a speech database is an extremely labour-intensive process (Huckvale, 2004; Lamel, 1989). For this reason alone, care must be taken when selecting database content material to ensure that the database will be useful for all aspects of system training and testing (Lamel, 1989).
The goal of the CARES corpus was not to construct a corpus for training a large continuous spoken word recognizer, but rather to provide enough speech samples of relevant content with which to build a small-vocabulary isolated or continuous spoken word recognizer, following the recommendations outlined by Young and Mihailidis (2010). Work by
other researchers also suggested that combining a small amount of targeted speech data with a
larger collection of speech data could be used to finely tune an ASR for the intended end-user
(Vipperla et al., 2009).
The CARES corpus was primarily designed for testing the HELPER and to test and train various
components contained within the HELPER’s communication module, namely the ASR acoustic
model. For this reason, a portion of the speech content included in the CARES corpus was modeled on another database also used for training ASR systems, the SCRIBE corpus, which contains phonetically compact and phonetically rich sentences and some spontaneous speech. The other portion of the
CARES speech content comprises the personal emergency keywords, phrases, and scenarios that
represent the end-user in various PESs.
Context-relevant speech samples are invaluable for replaying a mock PES or a group of spoken words or phrases during HELPER communication module testing.
testing of different ASR acoustic models or spoken dialogues could then be performed and
differences in outcomes could be attributed to actual modifications made to the design or settings
of the system components themselves rather than to any variability present in the incoming
speech. Environmental noise could also be added or different microphones used with the
recorded speech samples to observe their effect on recognition accuracy and system output.
The CARES corpus was designed to include five different types of speech samples: (1) spontaneous, non-emergency-related, continuous speech; (2) non-emergency-related read sentences; (3) emergency-related phrases/sentences read with emotion; (4) read words spoken in different manners (i.e., fast, slow, loudly, softly, normally); and finally (5) emergency conversations read with emotion/enacted.
The database included speech samples from 40 adult participants between the ages of 23 and 91 years within three age ranges: (1) 19+ to <55 years, (2) 55 to 69 years, and (3) 70 years and over. A majority of participants were adult actors with over one year of acting experience who had to meet minimum demographic criteria (e.g., minimal foreign accent, Canadian residency, fluency in English with no speech difficulties).
All participants performed four speech exercises, over the course of 2 hours, which comprised:
(1) 5 minutes of spontaneous monologue, (2) the reading of 96 phonetically rich and compact sentences, (3) the reading with emotion of 185 emergency phrases, each with a keyword repeated in 5 different ways, and (4) the enacting of three different emergency scenarios. The 185 emergency keywords and
phrases and 9 scenarios were all derived from response call transcripts. The scenarios were
selected to portray various response call classifications (e.g., high, medium, low risk calls, fall
and medical calls, caregiver or older adult calls, ambulance versus request for other responder
types). Roughly 3,200 minutes (~53 hours) of speech were recorded.
5.2.4 Data Interpretation Highlights
The studies conducted as part of this dissertation demonstrate the richness of the data that can be
obtained from analyses using response call recordings. Some reflections on the research data are
discussed next.
5.2.4.1 Identification of Keywords & Phrases
To identify keywords and phrases in response call transcripts, this study followed a universal design approach in developing the PES model, taking into account different caller types during various PESs. Categories describing the caller type, call reason, and risk level were created, and keywords and phrases were identified through word categorization by meaning and function. The initial full keyword set was further reduced by taking into account word occurrences and their distribution across the PES categories. This process was intended to ensure that enough keywords (and associated phrases) were selected to sufficiently represent the variety
of different PESs observed in the response call recording set. The use of word meaning, function, and occurrence to select keywords is a common technique in studies examining automatic keyword extraction (Haggag, 2013; Madane, 2012). Whether the same
keywords would have been identified by a computer has not been examined. However, in this
particular study, humans were needed to categorize the keywords according to their meaning and
function, to identify keywords that were still important but occurred very infrequently, as well as
to assess the words for inclusion into/exclusion from the final small keyword set.
With respect to expanding the vocabulary of the HELPER’s communication module or the SDS,
there is a trade-off between having the HELPER recognize more words and maximizing the
recognition accuracy of the ASR. A smaller vocabulary usually equates to higher rates of word
recognition. In this study, the final small keyword set was limited to 185 words, mostly because the data had to be recorded with older adults in a timely fashion for the CARES Corpus, though this was also in keeping with the goal of a small vocabulary set. Although the
full set of 402 words might reflect to a greater extent the actual vocabulary used in the
transcripts, the small vocabulary set should also contain enough data with which future
researchers can determine whether more vocabulary is needed or not. For example, this set
should allow future researchers to determine whether a small subset of the 185 keywords is
sufficient, or if the full set of 185 keywords is needed, or if it would be better to use the entire
402 keyword set.
5.2.4.2 Statistical Analyses of Conversational Measures
Statistical measures look at trends in data, and significance is relative to both the sample size and the p value selected. However, a non-significant statistical test does not necessarily mean that the observation being tested is unimportant. The mere presence of an occurrence can be meaningful in and of itself, although, with a sample size of one, a statistical test is likely not the best way to demonstrate this. In this study, the frequency of mazes did not differ significantly between caller types. However,
in one of the response calls the older adult caller had significant communication difficulties
resulting in a higher number of mazes (see Chapter 3, Figure 3-12d, case#47). Despite the
statistical test being non-significant, this one case is important, as it indicates that the caller’s
communication ability was considerably reduced (it is an outlier). Furthermore, the presence of a
high number of mazes would likely limit the HELPER speech handler’s ability to both recognize
and understand this caller’s speech.
Looking at the timing measure results (see Chapter 3, Figures 3-14 a,b), the number of speaker
turns needed before a response was initiated was not found to be significantly different between
the older adult and care provider callers. A closer look at the boxplot graphs shows a lot of
variance in the data especially for the older adult caller at the medium risk level. It could be that
categorizing the response calls in a slightly different way would better divide the data and yield
different trends. For example, characterizing the callers based on their communication style as
opposed to being an ‘older adult’ or ‘care provider’ caller may be more aligned with the work of Wolters et al. (2009), who identified two groups of older adult SDS users. These groups
consisted of “social” older adult users who interacted with their SDS as if it was a human and the
“factual” older adult users who communicated succinctly with their SDS. Of course, having
more samples would also strengthen the power of the observed findings.
Finally, statistical analyses were performed using average measures of verbal ability,
conversational structure, and timing over the entire transcript. However, it is not expected that
the HELPER would be conversing at length with the user. Therefore, calculating the
conversational measures using only a few utterances after the opening query of the response call
conversation may have been a better choice for analysis and is recommended for future analyses.
In general, using statistical measures to examine the response call transcripts has revealed valuable information that can be applied to improving the HELPER communication module; however, it should be noted that these tests only examine data trends, and non-significant results may still be practically meaningful.
5.2.4.3 Actor Simulated PESs
In the making of the CARES Corpus actors were solicited to simulate personal emergency
response situations or emotions while providing voice samples of PES related words, phrases and
conversations. Some qualitative observations are offered based on having first listened to the live
response call recordings, followed by conducting the acquisition of speech samples from study
participants.
1. Older adult actors seemed able to portray older adults in PESs more realistically than younger adults could, most likely because of their personal life experiences.
2. Theatre/stage actors were found to enunciate more clearly than movie/TV actors, which may sound a little unnatural in real-life conversations.
3. Some actors tended to “over-act” when imagining how the older adult would respond
during an actual PES. This is likely due to a lack of experience with, or knowledge of, what happens when someone experiences these various PESs.
4. The degree to which the recorded speech samples are able to replicate true speech during
real PESs ranges from not-quite believable to quite believable.
For observation 4, this variability in acting ability could be attributed to the actors’ prior acting
and life experiences as well as fatigue. For example, one of the older adult actors, who, in my
opinion, was at the most believable end of the spectrum, commented that recording the PES
words, phrases, and scenarios was one of the most challenging assignments this actor had been
given. In this situation, the actor was referring to the degree of emotion and feeling required to
play the role and the rapidly changing situations occurring within each phrase/word and scenario
in the corpus. This actor paused before recording each sentence/phrase/keyword set to determine
how she would speak the utterance. In contrast, another less experienced actor did not always
take this extra time to ‘set up’, resulting in a less realistic outcome. Overall, however, it is believed that the speech samples collected in this study using actors reflect real PESs more accurately than speech samples from non-actors simply reading words would.
This observation is aligned with the findings from other researchers using actors in their studies
(Murray & Arnott, 2008; Scherer, 1986; Scherer, 2003; Williams & Stevens, 1972).
5.3 Contributions to Knowledge
5.3.1 Original Research with Response Call Recordings
Over the last four decades, there have been many research studies examining the use of PERS
technology. These studies have examined the PERS benefits, reviewed its impact at the personal,
family, social, and medical institution levels, as well as at the system design and technology
levels. In addition, a handful of literature exists examining emergency response (911) call
conversations. However, no research studies could be identified that actually focuses on
characterizing personal emergency response calls and call conversations. When trying to
understand the intricacies of a process in the real world involving people, looking only at
quantitative data or only at qualitative data would have provided only a partial view of the bigger
picture (Krippendorff, 2012). As a result, Study 1 and 2, as described in Chapters 2 and 3
respectively, are original studies performed using a mixed methods approach where the
knowledge gained from the initial qualitative content analysis is used to inform the design of the
quantitative content analysis. All three studies in this dissertation help to fill a research gap by
providing a better understanding of what happens during personal emergency response calls and
call conversations between PERS callers and call takers.
5.3.2 Applying Research Findings to the HELPER
A summary of the main research contributions to knowledge that pertain specifically to the
HELPER are listed next, followed by an explanation of where the knowledge can be applied specifically within the HELPER communication module.
1. Keywords were identified to increase the vocabulary of the HELPER ASR (acoustic and
pronunciation models). Key phrases were identified to improve the HELPER ASR language
model.
2. Keyword categories were identified to improve the Semantic Analyser of the Speech
Informant component of the HELPER communication module.
3. A personal emergency response (PER) model was developed and used to classify and
characterize personal emergency response calls.
4. Measures of words per minute or turn length in words were found to be fairly good at
predicting caller type (e.g., older adults spoke fewer words per minute than care providers).
5. Significant patterns in call response requests were observed that are related to the caller type
and risk level of a call (e.g., high risk calls were associated with ambulance requests over
95% of the time).
6. Differences in call dialogue were observed between caller types and at different risk levels
(e.g., care providers tended to be more succinct).
7. Call timing measures were obtained based on the call’s risk level. These timing values can be
used to plan the length of the HELPER dialogue by time and/or speaker turns.
8. The CARES Corpus was developed and can be used for HELPER ASR training, and ASR
and system testing.
These studies build upon prior research knowledge (i.e., the recommendations from the first
HELPER prototype testing), and contribute new knowledge that can be used to develop
HELPER application specifications and/or contribute to the actual development of the HELPER
communication module.
In the literature review in Chapter 1, the components of the HELPER communication module
were illustrated and described. Figure 5-1 presents the assembled internal components of the
HELPER communication module including the basic internal sub-component units. The units
circled in “orange” indicate the areas of the communication module in which the study results
can be applied. The application of these study findings to these specific communication module sub-components is outlined next.
[Figure content: block diagram of the HELPER Communication Module. Incoming speech (from microphone) enters the Speech Handler, which comprises A/D Conversion & Feature Extraction, the Automatic Speech Recognizer (ASR) with its Decoder and Linguistic Models (1. Acoustic, 2. Pronunciation, 3. Language), and the Speech Informant with its Semantic Analyzer (NLU) and Dialogue Measures units. The Speech Handler feeds the Dialogue Handler, whose Dialogue Manager contains the PERC Classifier, Dialogue Control, Dialogue Set, Dialogue State, and Dialogue History units, and whose Call Responder (Initiate/Confirm) unit uses Responder Information and a Response Request History to put a Responder On Route. The Dialogue Handler feeds the Response Handler, comprising Response Generation (Select Response, drawing on a Database of Dialogue Text) and Speech Synthesis (Speech Output, drawing on a Database of Spoken Dialogue), which produces the Spoken Output (to speakers).]
Figure 5-1: Diagram of the internal components of the HELPER Communication Module.
Application of findings to the ASR component of the Speech Handler:
1. ASR – The 185 keywords identified in Study 1 can be used to expand the Linguistic Model’s pronunciation model; they are also included in the CARES Corpus, which can be used to train the Linguistic Model’s acoustic model. The 185 key phrases identified in Study 1 can be used to train the Linguistic Model’s language model. Together, these three
models can be used to improve the ASR component of the HELPER SDS. The output of
the ASR is a “best match” guess at the speech input. This information is sent on to the
Speech Informant for interpretation of the meaning of this “best match” spoken utterance.
Application of findings to the Speech Informant component of the Speech Handler:
2. Semantic Analyser – The 16 word categories identified from Study 1 can be applied in
the semantic analyser to improve understanding of the recognized words coming from the
ASR decoder. Speech understanding could occur by matching the recognized words with
their word categories to derive the associated word function or word group intent. This
information would then be sent on to the Dialogue Manager for decision making purposes
(see Table 2-12 in Chapter 2 for an example of how a speech unit would be associated
with the word categories).
3. Dialogue Measures – the discriminant function using words per minute and caller turn
length in words identified in Study 2 could be used for predicting whether the incoming
speech from the first utterance is more likely to be from a care provider or an older adult.
The final prediction could then be sent to the response call classifier for processing.
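As a sketch of how such a discriminant function might be applied to a first utterance, the following uses a nearest-class-mean rule over the two measures. The class means are invented placeholders, not the coefficients estimated in Study 2:

```python
import math

# Hypothetical class means (words per minute, mean turn length in words);
# the study's actual discriminant coefficients are not reproduced here.
CLASS_MEANS = {
    "older_adult":   (95.0, 6.0),
    "care_provider": (140.0, 10.0),
}

def predict_caller_type(words_per_minute, turn_length_words):
    """Nearest-class-mean stand-in for the Study 2 discriminant function."""
    def dist(mean):
        return math.hypot(words_per_minute - mean[0],
                          turn_length_words - mean[1])
    return min(CLASS_MEANS, key=lambda c: dist(CLASS_MEANS[c]))

print(predict_caller_type(100.0, 5.5))   # near the older-adult mean
print(predict_caller_type(150.0, 12.0))  # near the care-provider mean
```

A fitted linear discriminant would weight the two measures rather than use raw Euclidean distance, but the decision flow into the response call classifier would be the same.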
Application of findings to the Dialogue Manager component of the Dialogue Handler:
4. Call Classifier – the keyword categories, measures of words per minute and turn length in
words, PER model, and response type preference results from Studies 1 and 2 can be used
to design the response call classifier unit in the Dialogue Manager. See Figure 5-2 for an
example of what the classifier structure might look like. The classifier unit would receive
information from the Speech Informant and use this information to classify the response
call if possible. For example, keywords and conversational/dialogue measures identified
from the incoming speech might indicate that assistance is needed, the call reason is a
fall, an older adult is calling, and the risk level is medium. The classifier in Figure 5-2
would then identify that the older adult’s first responder or the call taker (see response
type #2 under the medium risk level category) is the response type to propose to the
Dialogue Control sub-component in the Dialogue Manager.
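A minimal rule-based sketch of such a classifier unit is shown below. The response orderings loosely follow Figure 5-2 but are illustrative only, not the deployed HELPER logic:

```python
# Hypothetical ordered response plans per risk level; these lists are
# illustrative examples, not the study's final classifier design.
RESPONSE_PLANS = {
    "high":   ["HELPER", "EMS ambulance", "OA responder", "call taker"],
    "medium": ["HELPER", "OA responder", "EMS paramedic", "call taker"],
    "low":    ["HELPER", "call taker"],
}

def propose_response(risk_level, has_oa_responder=True):
    """Return the ordered list of responders to propose for a call."""
    plan = list(RESPONSE_PLANS[risk_level])
    if not has_oa_responder:
        # Figure 5-2 falls back to the call taker when no responder is listed
        plan = [p for p in plan if p != "OA responder"]
    return plan

print(propose_response("medium"))
print(propose_response("medium", has_oa_responder=False))
```

For the medium risk fall example above, the classifier would propose the older adult's responder after the HELPER itself, matching the ordering described in the text.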
[Figure content, reconstructed as a table:]

(Risk Level) | (Caller Type) | (Call Reason) | (Response Type)
NORMAL (Not activated) | No speaker | Normal routine; no issues | 1. None
LOW RISK (Not-urgent) | Older adult speaker | Accident; testing; check-in; other (i.e., time, day) | 1. HELPER; 2. Call taker
MED RISK (Urgent) | Care provider speaker or older adult speaker | Need assistance soon; non-life-threatening injury, fall, illness | 1. HELPER; 2. OA responder or call taker (if no responder listed); 3. EMS paramedic
HIGH RISK (Emergent) | Care provider speaker or older adult speaker | Immediate assistance; loss of life or limb; major illness or injury | 1. HELPER; 2. EMS ambulance; 3. OA responder; 4. Call taker
Figure 5-2: Diagram showing a possible response call classifier setup based on the study findings.
5. Dialogue Control – the findings from Study 2 suggest that the dialogue may need to be
adjusted depending on the caller type and risk level. Using the previous example from the
call classifier above, the information received by the dialogue control suggests that the
next response type should be an older adult responder or call taker if no responder is
listed. Based on this information, the dialogue control might then select a dialogue set or
script to follow based on a medium risk, fall call for an older adult caller. Some examples
are provided next.
In Table 5-1, the opening dialogue from the original script used by the first HELPER
prototype is shown. This script takes only ‘yes’ or ‘no’ responses and so the response call
classifier does not have much information to help the dialogue control. The user is
queried for more information and confirmation. The original HELPER dialogue is shown in column 1, the user response in the succinct style is shown in column 2, and column 3 shows the changes to the response call classifier settings as the dialogue proceeds.
Table 5-1: Example of how the original HELPER initial dialogue strategy and response call classifier may work with incoming user responses.
HELPER Dialogue (original script) | User Dialogue (system-initiative) | Response Call Classifier
OPENING: Hello {Mr. Smith}. This is your automated health monitoring system. Do you need help? Please say ‘yes’ or ‘no’. | Yes | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: Unknown; Response: Unknown
1st QUERY: Would you like me to call an ambulance? Please say ‘yes’ or ‘no’. | Yes | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: Unknown; Response: Ambulance
CONFIRM: Okay {Mr. Smith}. I will call an ambulance right away. Please say ‘yes’ to confirm. | Yes | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: Unknown; Response: Ambulance
In Table 5-2, the user response is more social or human-to-human like and so more
information is obtained immediately. The response call classifier is set to high and the
dialogue set follows a “high alert” type script. The ambulance response is confirmed in
the 1st query, and confirmation is requested from the user. The HELPER dialogue is shown in column 1, user responses in a “human-human like” style are shown in column 2, and column 3 shows the changes to the response call classifier settings as the dialogue proceeds.
Table 5-2: Example of how initial dialogue from a high alert dialogue strategy and response call classifier may work with incoming user responses.
HELPER Dialogue Set (high alert script) | User Dialogue (mixed-initiative) | Response Call Classifier
OPENING: Hello {Mr. Smith}. This is your automated health monitoring system. Do you need help? | Yes, could you send an ambulance please? | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: High; Response: Ambulance
1st QUERY: You said you wanted {an ambulance}, is that correct? | Yes, that’s right, I need an ambulance | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: High; Response: Ambulance
CONFIRM: Okay {Mr. Smith}. I will call an ambulance right away. Please hold on. | Alright, thank you | Caller Type: Older Adult; Call Reason: Unknown; Risk Level: High; Response: Ambulance
In Table 5-3, the user response leads to the classifier being set at a medium level risk for
an older adult fall call. The dialogue control activates the “medium alert” dialogue set
script. The Older Adult responder is suggested in the 1st query as the best response to
propose. The HELPER dialogue is shown in column 1, user responses in the “human-
human like” style are shown in column 2, and column 3 shows the changes to the
response call classier settings as the dialogue proceeds.
Table 5-3: Example of how initial dialogue from a medium alert dialogue strategy and response call classifier may work with incoming user responses. OA = Older adult.
HELPER Dialogue Set (medium alert script) | User Dialogue (mixed-initiative) | Response Call Classifier
OPENING: Hello {Mr. Smith}. This is your automated health monitoring system. Do you need help? | Oh, yes, I fell down | Caller Type: Older Adult; Call Reason: Fall; Risk Level: Medium; Response: OA Responder
1st QUERY: You {fell down}. Would you like me to call {OA responder}? | Yes please, thank you | Caller Type: Older Adult; Call Reason: Fall; Risk Level: Medium; Response: OA Responder
CONFIRM: Okay {Mr. Smith}. I will call an {OA responder} right away. Please hold on. | Alright | Caller Type: Older Adult; Call Reason: Fall; Risk Level: Medium; Response: OA Responder
If the original script were followed for the situation in Table 5-3, the ambulance response
would still be proposed first and then other responders would be proposed subsequently.
Essentially, the dialogue control selects different dialogue sets depending on the
response call classification. The Dialogue Manager would also keep track of the current
dialogue set and state being implemented, as well as any necessary dialogue history.
The SDS in Table 5-1 is an example of the system-initiative dialogue style and Tables 5-2
and 5-3 are examples of the mixed-initiative dialogue style. Recall that the mixed-
initiative dialogue style means that the system will prompt the user for a response but if
more information is provided than requested, the system will attempt to decipher this
extra information. The knowledge created from these two studies will permit the
HELPER SDS to be expanded to the mixed-initiative style of dialogue.
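The dialogue-set selection described above could be sketched as follows. This is a minimal illustration only: the class and function names and the script labels are assumptions, not taken from the HELPER implementation.

```python
# A sketch of the dialogue-set selection step; the names and script labels
# below are hypothetical, not the HELPER implementation.
from dataclasses import dataclass

@dataclass
class ResponseCallClassifier:
    caller_type: str   # e.g. "older adult" or "care provider"
    call_reason: str   # e.g. "fall", "medical", "unknown"
    risk_level: str    # "low", "medium", or "high"
    response: str      # currently proposed responder

def select_dialogue_set(c: ResponseCallClassifier) -> str:
    """Map the classifier's current risk level to a dialogue set script."""
    if c.risk_level == "high":
        return "high alert script"    # ambulance proposed first (as in Table 5-2)
    if c.risk_level == "medium":
        return "medium alert script"  # OA responder proposed first (as in Table 5-3)
    return "low alert script"
```

For the Table 5-3 fall call, `select_dialogue_set` would return the medium alert script, after which the dialogue control would step through that script's OPENING, QUERY, and CONFIRM prompts.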
6. The timing measures calculated from Study 2 can also be applied to the dialogue control
to ensure that the HELPER SDS communications do not extend beyond the time that is
considered typical for a response call. The spoken dialogue responses themselves can be
constructed in such a way as to meet the minimum number of speaker turns needed in
a live response call.
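The timing checks in item 6 could take a form like the following. The numeric values are placeholders for illustration, not the measures actually reported in Study 2.

```python
# A sketch of a dialogue-control timing guard; both constants are placeholders,
# not the Study 2 values.
TYPICAL_CALL_SECONDS = 120.0  # placeholder for a typical response call duration
MIN_TURNS = 4                 # placeholder for the minimum speaker turns needed

def may_continue(elapsed_seconds: float) -> bool:
    """True while the SDS may keep talking without exceeding a typical call."""
    return elapsed_seconds < TYPICAL_CALL_SECONDS

def turn_minimum_met(turns_taken: int) -> bool:
    """True once the dialogue has produced the minimum number of speaker turns."""
    return turns_taken >= MIN_TURNS
```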
Application of Findings to the Response Generation component of the Response Handler:
7. Dialogue Text – the finding from Study 2 that the terms “paramedic” and
“ambulance” convey different meanings and represent different responses could be
incorporated into the actual dialogue response presented to the user. Examples of how the
dialogue may change can be observed in column 1 of Tables 5-2 (ambulance offer) and 5-3 (OA
responder offer). Also, in Figure 5-2, in the medium-risk situation, the
paramedic is offered as option #3, whereas in the high-risk situation, the ambulance is
offered as option #2. These responses are stored in the HELPER computer and accessed
by the Response Generation sub-component of the Response Handler.
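As a sketch of how risk-dependent wording could be stored and retrieved by the Response Generation sub-component: the template table and function below are hypothetical, with wording echoing Tables 5-2 and 5-3.

```python
# Hypothetical offer-template table; the {placeholders} follow the slot style of
# Tables 5-2 and 5-3, and the risk-to-responder mapping follows the text.
OFFER_TEMPLATES = {
    "high":   "You said you wanted {an ambulance}, is that correct?",
    "medium": "You {fell down}. Would you like me to call {OA responder}?",
}

def generate_offer(risk_level: str) -> str:
    """Return the offer wording that matches the classified risk level."""
    return OFFER_TEMPLATES[risk_level]
```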
Application of findings to the HELPER Communication Module:
8. System testing with the CARES Corpus – when a new HELPER communication module
prototype is constructed, the system can be tested as a whole using keywords, phrases,
and emergency scenarios contained in the CARES corpus. Individual components of the
communication module can also be trained, such as the ASR language and pronunciation
models, and/or tested, such as the Speech and Dialogue Handlers.
5.3.3 The CARES Corpus
The CARES corpus developed in Study 3 can be used as a development and testing tool for the
design and development of the HELPER system, as well as of other ASRs or SDSs that may
require interaction with older adult users.
Although a large number of speech databases have been created specifically for ASR training
and testing, none of these corpuses contained speech samples of older adults speaking Canadian
English in emergency situations. To our knowledge, the CARES corpus will be the first
speech database created that contains a collection of Canadian adult speech recordings, including
younger and older adult actors simulating emotion during various personal emergency situations
derived from real response call recordings.
to be used for enhancing the HELPER communication module, the contents of this database can
also be applied to other research applications across several research disciplines. For example,
the five-minute monologues may be of interest to linguists who wish to study different speaking
patterns or word usage by individuals across various age groups (e.g., 23-90 years). Sociologists
may be interested in the different monologue topics selected by individuals of different ages.
Computer scientists might be interested in testing out various computer algorithms for improving
speech recognition or natural language understanding of the spoken text across age ranges.
Acousticians or speech language pathologists may be interested in the acoustic phonetic features
present in the speech samples of individuals representing the older and younger ages or in
emergency type situations. Other researchers may be interested in examining the “believability”
aspect of the enacted scenarios and speech samples provided.
5.4 Limitations
In this section, an overview of the main study limitations will be presented. Where appropriate,
recommendations have been included for future studies that may be conducted in a similar field
or fashion.
5.4.1 Study 1: Keyword and Phrase Identification
This study was limited by:
(1) the researcher’s ability to accurately transcribe the call recordings, to derive the most
appropriate word categories from the call transcripts, and to identify a small keyword set that
maximally represents the call conversations with limited bias; and
(2) the ability of the coders to select words that accurately represent the keywords of the call
conversations and to accurately categorize the keywords selected with limited bias.
Although inter-rater reliability between the three coders showed moderate agreement, two of
the three coders categorized words out-of-context and one in-context. It is recommended that
if this process were to be replicated, all the coders would extract the keywords in-context by
reading the transcripts. Having all the coders identify and categorize the words in the same
way should theoretically improve inter-rater reliability, but also allow for comparison of
words with multiple meanings and thus multiple categories.
5.4.2 Study 2: Statistical Measures
This study was limited by the sample size of response call recordings (from only one call centre)
upon which the results were derived. It would have been beneficial to obtain a larger or more
balanced sample size for the different call classifications (for example, more samples of the fall
category), and to draw samples from different call centres. However, given that the response call
classifications were not known prior to obtaining the response call recordings, this may have
been difficult. Future studies may want to consider working out the parameters and factors to be
included in, and the number of samples required for, the statistical analyses prior to the end of
response call data collection. Call meta-data was also unavailable; it would have been useful for
confirming some of the assumptions made, for example, the gender and sex of participants, the
use of a traditional push button in all cases, that each call taker spoke with a different caller, and
that each caller was unique.
5.4.3 Study 3: Creating the CARES Corpus
The process of creating the CARES Corpus had several limitations:
(1) The end result of a response call was not known. This information would have been
beneficial for confirming the response call risk level and call reason classifications.
(2) Given the age of some of the speech participants it was decided that recording all 402 words
in the full keyword set would be too ambitious at this stage of the research.
(3) The recording results obtained in this study are limited by the speech participants’ ability to
act realistically, in their recording sessions, like an older adult or caregiver caller in a PES
during a response call.
In order to maximize realism in simulating PESs for speech sample recording, it likely would
have been beneficial to have the speech participants listen to some of the real response call
recordings prior to their actual recording sessions. Although they were given a sample response
call scenario to read prior to their session, listening to an actual call would have better
guided the actors’ performances, rather than giving them free rein for creative
expression.
5.4.4 PES and PER Call Classifications
These studies are limited by the researcher’s ability to identify suitable categories for the PES
and PER models. For example, although caller type was characterized by two categories: older
adult and care provider, perhaps characterizing the call by a caller’s communication ability using
‘factual’ and ‘social’ categories, as coined by Wolters et al. (2009), would have led to different
conclusions.
5.4.5 Methodology Limitations
The content analysis approach is flexible enough to be used qualitatively or quantitatively, which
is both its strength and its limitation. As Morgan (1993) stated, content analysis is argued to be
not qualitative enough by qualitative researchers and not quantitative enough by quantitative
researchers. Additionally, the reliability of the data obtained through content and conversational
analyses is highly dependent on the researchers who collect, code, and process the data,
especially when doing so qualitatively. Reliability of data tends to be demonstrated through the
use of multiple coders, descriptions of the process that show how the data and results are linked,
and by demonstrating support from existing research literature. However, there is still debate as
to what and how much evidence is required to support reliability and thus establish research
validity (Krippendorff, 2012).
5.5 Future Research
5.5.1 Supporting the ASR
With respect to identifying non-semantic information that could be used in conjunction with the
semantic information derived from the HELPER SDS, future studies may wish to consider
different measures from the ones examined. For example, in Study 2, speech intelligibility could
not be examined due to the quality of the call recordings received. However, future studies could
examine this measure if better quality call recordings were acquired. Other measures to consider
might be speaker disfluency, voice pitch, or shimmer and jitter. In a study by Müller, Wittig, and
Baus (2003), shimmer and jitter were used to identify a speaker’s age and gender.
5.5.2 Developing the Dialogue - Assessing Patterns in Response Call Conversations
The need to develop the communication dialogue for the older adult user was recommended by
researchers involved in the preliminary HELPER prototype testing. This recommendation
specifically described the need to add additional dialogue states to the HELPER communication
module. This need was based on a few factors. First, the HELPER prototype system does not
“hear” a user’s response until the system has finished speaking and has turned its microphone
‘on.’ Therefore, if the user speaks too soon, nothing is heard by the system. Second, the
HELPER would need some way to respond to instances of silence and out-of-vocabulary sounds
and when repetition is required. To determine how to improve the HELPER communication
dialogue for situations such as these described, one method would be to examine the
conversational dynamics between call takers and callers to identify how real call-takers handle
these conversational situations and to see how the users respond in return. This research would
be aligned with the research completed thus far using the real personal emergency response calls
and has, in fact, already started. The objective of this research in progress is to identify key
conversational patterns in personal emergency response calls.
This study looks at coding the call conversations based on dialogue acts and using these acts to
observe any conversational patterns that may be useful to replicate in the HELPER SDS. A
conventional conversational analysis performed at the conversation or turn level of the response
call is being considered. Figure 5-3 illustrates a flow diagram of how this work would unfold.
[Figure 5-3 flow: (a) Personal Emergency Response Calls → Call Transcriptions (SALT) → Identify Dialogue Codes → Code Response Call Dialogue Acts → Conventional Conversational Analysis at Conversation/Turn Level → Identify Patterns or Conversational Techniques → To improve dialogue development (Dialogue Management)]
Figure 5-3: A flow diagram illustrating the methodology followed to analyse the calls and complete study objective.
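One way the dialogue-act coding step could surface conversational patterns is to count adjacent act pairs across coded turns. The act labels below are illustrative placeholders, not the coding scheme adopted in the study.

```python
# A sketch of pattern counting over coded dialogue acts; the labels are
# illustrative, not the study's actual coding scheme.
from collections import Counter

coded_call = [            # (speaker, dialogue act) for each turn
    ("call-taker", "OPEN"),
    ("caller", "INFORM"),
    ("call-taker", "REQUEST"),
    ("caller", "INFORM"),
    ("call-taker", "PROPOSE"),
    ("caller", "ACCEPT"),
]

def act_bigrams(turns):
    """Count adjacent dialogue-act pairs to surface recurring patterns."""
    acts = [act for _, act in turns]
    return Counter(zip(acts, acts[1:]))
```

Aggregated over many coded calls, frequent bigrams (e.g., a REQUEST reliably followed by an INFORM) would indicate conversational techniques worth replicating in the HELPER SDS.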
Figure 5-4 further highlights where the work from this study would be applied along the pathway
to personal emergency response. In Figure 5-4, the potential results of this study are represented
by the term “dialogue acts” (see yellow star), which is positioned between the HELPER computer
and the user. It is located after the ‘conversation’ as any patterns identified would be applied to
further develop the dialogue states, the dialogue manager, and the actual dialogue used within or
by the HELPER.
[Figure 5-4 components: 1. Personal Emergency Situation (user/caller type, situation, call reason, physical-cognitive state, risk level, communication ability, speech and non-speech); 2. The HELPER System (ASR, SI, classifier – caller type: who is calling?; call reason: fall or medical?; risk level: patient acuity? – timing, keywords and phrases, word categories, conversational measures; HELPER computer: what response?); conversation; 3. PERS Response; dialogue acts]
Figure 5-4: The pathway to personal emergency response with “dialogue acts” applied to help the HELPER.
This study is currently in the data processing phase.
5.5.3 HELPER Field Testing - Future Proposed Studies
The next research recommendation involves testing the HELPER in mock or real PESs. Before
in-field testing is possible (i.e., in a home of an older adult), a new and robust HELPER
prototype must be designed and developed and undergo mock testing using simulated users.
Several future studies are proposed next which will bring the HELPER up to this point where
actual system testing can be performed.
5.5.3.1 Developing the HELPER Speech Handler
CARES Corpus Current State
Although the speech recording for the CARES corpus has been completed, the process of
organizing the data into a format so that it is available for distribution is still ongoing. Manual
segmentation of the audio recordings of PES phrases and keywords has been completed, but the
work needs to be verified. Further work must be done to organize this information including
proper labelling and sorting into appropriate file folders. In order for these speech samples to be
used for ASR training, assuming a typical continuous speech recognizer is being used, the
phrases and words will need to be further segmented into phone units. This may be done
automatically (using forced alignment); however, the output would need some verification,
especially for the keywords, since these words were spoken using five different methods of
expression (i.e., fast, slow, etc.) and may be more difficult for forced alignment software to
process.
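The labelling and sorting of segmented samples into file folders could follow a scheme like the one below. The directory layout and the five expression styles listed are assumptions for illustration only, not the corpus's actual organization.

```python
# A sketch of the labelling/sorting step; the directory layout and the listed
# expression styles are assumptions, not the actual CARES corpus organization.
from pathlib import Path

EXPRESSION_STYLES = ("normal", "fast", "slow", "loud", "soft")  # assumed set of five

def sample_path(root: Path, age_group: str, speaker_id: str,
                style: str, keyword: str) -> Path:
    """Build a consistent corpus path: <root>/<age_group>/<speaker>/<style>/<keyword>.wav"""
    if style not in EXPRESSION_STYLES:
        raise ValueError(f"unknown expression style: {style}")
    return root / age_group / speaker_id / style / f"{keyword}.wav"
```

A consistent naming scheme like this makes it straightforward to select subsets later (e.g., only older adult speakers, or only one expression style) when building acoustic models.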
Building the HELPER Speech Handler
A future example study involving the HELPER Speech Handler might include some or all
aspects of the following steps:
1. Perform forced alignment on the CARES Corpus audio speech samples, for example, on
keywords and phrases. However, phonetically balanced and compact sentences can be
used if needed.
2. Use these samples to build one or many acoustic models for the HELPER ASR.
3. Build-up the pronunciation and language models of the HELPER ASR.
4. Perform comparison testing of various ASR acoustic models with the phrases and
keywords from the CARES Corpus to identify the model with the best recognition
accuracy. The different acoustic models used in the comparison might include: (1) all
CARES corpus participant speaker samples, (2) only older adult speakers from the
CARES corpus, (3) only younger adults from the CARES Corpus, (4) an existing
acoustic model (e.g., AN4 or Wall Street Journal), (5) a mix of existing acoustic models
with CARES Corpus samples added.
5. Perform comparison testing of various language models using the CARES Corpus
content.
6. Build the Semantic Analyser and test for speech understanding using CARES Corpus
content.
7. Assemble the HELPER Speech Handler using the acoustic model with the highest
accuracy for recognizing the recorded PES speech samples from the CARES Corpus, the
associated pronunciation and language models and semantic analyser.
8. Test newly assembled HELPER Speech Handler with recorded speech samples from the
CARES Corpus.
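The recognition-accuracy comparison in steps 4 and 5 would typically be scored with word error rate. The following is the standard edit-distance formulation of that metric, not HELPER-specific code.

```python
# Standard word error rate via edit distance, for scoring the acoustic-model
# comparison; not HELPER-specific code.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

The acoustic model yielding the lowest average word error rate over the CARES Corpus test phrases and keywords would be the one carried forward in step 7.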
5.5.3.2 Developing the HELPER Dialogue and Response Handlers
A future example study involving the HELPER Dialogue Handler and Response Handler might
include some or all aspects of the following steps:
1. Identify the necessary dialogue states to include in the HELPER SDS.
2. Determine the dialogue sets and actual dialogue responses to be spoken within each
dialogue state.
3. Build the speech synthesizer component of the Response Handler or pre-record the
dialogue responses for inclusion into the dialogue database.
4. Design the dialogue strategy to be followed by the dialogue control in the Dialogue
Manager based on previous research findings (i.e. from Study 2, and the conversational
patterns study described in section 5.5.2).
5. Code and setup the new dialogue strategy and other necessary components of the
Dialogue Manager and Response Handler.
6. Various dialogue strategies, dialogue sets, and dialogues may be compared and tested
using Wizard of Oz techniques with older adult research participants.
7. Perform a survey to obtain user feedback and comments.
8. Select the best dialogue strategy, dialogue sets, and dialogues to include in the HELPER
Dialogue Handler and Response Handler.
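Steps 1 and 2 above amount to defining dialogue states and transitions. A minimal sketch follows; the states and transitions are illustrative only, echoing the OPENING → 1st QUERY → CONFIRM flow of Tables 5-2 and 5-3, and are not a proposed final design.

```python
# An illustrative dialogue state table echoing the OPENING -> QUERY -> CONFIRM
# flow of Tables 5-2 and 5-3; not a proposed final design.
TRANSITIONS = {
    ("OPENING", "yes"): "QUERY",     # user confirmed that help is needed
    ("QUERY", "yes"): "CONFIRM",     # user accepted the proposed responder
    ("QUERY", "no"): "QUERY",        # propose the next responder instead
    ("CONFIRM", "yes"): "DISPATCH",  # place the call to the responder
}

def next_state(state: str, user_reply: str) -> str:
    """Advance the dialogue; unknown replies leave the state unchanged (re-prompt)."""
    return TRANSITIONS.get((state, user_reply), state)
```

Keeping the transitions in a data table rather than in code would also make it easier to swap in the alternative dialogue strategies to be compared in the Wizard of Oz testing of step 6.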
5.5.3.3 Testing the HELPER Module
A future example study involving the HELPER communication module might include some or
all aspects of the following steps:
1. Using the desired Speech Handler, Dialogue Handler, and Response Handler
components, build the HELPER communication module.
2. Test the new HELPER communication module/SDS in a home-like setting using
simulated users from the CARES Corpus.
3. Make necessary adjustments.
4. Test the system in a home-like setting using live older actors and younger adults (care
providers). Use various response strategies as determined from Study 2 and the study
described in section 5.5.2.
5. Perform a survey to obtain user feedback and comments.
5.6 Implications
The goal of the HELPER system is to identify and respond to potentially critical personal
emergency situations. As such, it needs to work well. The infrequent occurrence of PESs,
combined with their delicate nature and the target age group for this technology (mainly older
adults), makes PESs difficult to study in the real world. To better understand the process of a
response call during a PES, to characterize the response call conversation, and to
understand the spoken keywords and phrases that convey one’s need, actual response call
recordings are the best medium for study. Otherwise, trying to witness these situations is
difficult, and it would be unreliable to depend on the recall memory of people who have
experienced these events. The personal emergency response call recording is our pot of gold.
The knowledge created from these research studies was derived from live personal emergency
response call recordings. Furthermore, the studies employed qualitative, quantitative,
and mixed methods approaches in order to obtain the required data. In forming the PES and
response call models, this research also considered both the user and his/her environment. By
incorporating these research techniques and methods of analyses, this research is, in a sense,
attempting to create in the HELPER an ability to predict the outcome of a PES in order to
improve its ability to make decisions and respond appropriately to the user. The main research
findings from these two studies and the CARES Corpus development collectively contribute new
knowledge to, plus a research tool for, the future development of the HELPER.
With respect to the field of rehabilitation, if the development of the HELPER can be realized, the
potential will be there for the older adult to use this technology to help him/her age-in-place. The
HELPER would enable the older adult to access medical or emergency care whenever needed.
The main advantage over the traditional PERS is that the HELPER would not require active user
initiation because this system would also be actively monitoring the older adult in their home on
a continual basis. Upon identification of an adverse event, the HELPER would initiate
conversation with the user and contact assistance as appropriate. Older adults who are well cared
for medically tend to suffer less impairment, recover more quickly, stay healthier, and
retain their functional ability for a longer time.
5.7 Final Remarks
This dissertation presents original work from three research studies. Chapter 2 describes the first
study where keywords, phrases, and word categories were identified from the response call
transcripts. These results could be applied to the development of the HELPER ASR and Speech
Informant units. In addition, a PES model was developed. Chapter 3 describes the second study
where a PER model was developed, conversational measures of response call conversations were
obtained, and significant trends in call conversations were identified that could be used in
combination with incoming semantic data to help the HELPER classify response calls. Being
able to classify response calls helps the HELPER predict a possible target response which in turn
enables it to modify its output dialogue to meet the needs of the caller type and situation risk
level. These results could be applied to the Speech and Dialogue Handler components of the
HELPER. Chapter 4 summarizes how a speech corpus was designed and developed for the
purpose of training and testing various components of the HELPER ASR and the system overall.
The combined results from these three studies provide sufficient preliminary knowledge as well
as a speech corpus that collectively can be used to continue the HELPER communication module
development phase and create the next, hopefully more robust, HELPER SDS.
Bibliography
Anderson, S., Liberman, N., Bernstein, E., Foster, S., Cate, E., Levin, B. & Hudson, R. (1999). Recognition of elderly speech and voice-driven document retrieval. In Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on (Vol. 1, pp. 145–148).
Anusuya, M. & Katti, S. K. (2009). Speech recognition by machine, a review. International Journal of Computer Science and Information Security, 6(3).
Baba, A., Yoshizawa, S., Yamada, M., Lee, A. & Shikano, K. (2004). Acoustic models of the elderly for large-vocabulary continuous speech recognition. Electronics and Communications in Japan (Part II: Electronics), 87(7), 49–57.
Baber, C. & Noyes, J. (1996). Automatic speech recognition in adverse environments. Human Factors: The Journal of the Human Factors and Ergonomics Society, 38(1), 142–155.
Belshaw, M., Taati, B., Snoek, J. & Mihailidis, A. (2011). Towards a single sensor passive solution for automated fall detection. In Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE (pp. 1773–1776).
Bernstein, M. (1999). “Low-tech” personal emergency response systems reduce costs and improve outcomes. Managed Care Quarterly, 8(1), 38–43.
Blackwell, T. H. & Kaufman, J. S. (2002). Response time effectiveness: comparison of response time and survival in an urban emergency medical services system. Academic Emergency Medicine, 9(4), 288–295.
Blythe, M. A., Monk, A. F. & Doughty, K. (2005). Socially dependable design: The challenge of ageing populations for HCI. Interacting with Computers, 17(6), 672–689.
Bortfeld, H., Leon, S. D., Bloom, J. E., Schober, M. F. & Brennan, S. E. (2001). Disfluency rates in conversation: Effects of age, relationship, topic, role, and gender. Language and Speech, 44(2), 123–147.
Campbell, N. (2000). Databases of emotional speech. In ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion: Developing a Conceptual Framework (pp. 34–39). Newark, N. Ireland.
Canadian Red Cross Association. (2006). First Aid & CPR Manual. The StayWell Health Company.
Cavanagh, S. (1997). Content analysis: concepts, methods and applications. Nurse Researcher, 4(3), 5–13.
Chan, M., Campo, E., Estève, D. & Fourniols, J.-Y. (2009). Smart homes—current features and future perspectives. Maturitas, 64(2), 90–97.
Childress, D. S. (2003). Development of rehabilitation engineering over the years: As I see it. Journal of Rehabilitation Research and Development, 39(6; SUPP), 1–10.
CIHI. (2011). Health Care in Canada, 2011: A Focus on Seniors and Aging (pp. 1–162).
CIHI. (2013). National Health Expenditure Trends, 1975 to 2013 (pp. 1–182).
Clark, V. L. P. & Creswell, J. W. (2011). Designing and conducting mixed methods research. In Clark, Vicki L Plano and Creswell, John W (Ed.), (pp. 53–106). Thousand Oaks, CA.: Sage.
Cornman, J. C., Freedman, V. A. & Agree, E. M. (2005). Measurement of assistive device use: Implications for estimates of device use and disability in late life. The Gerontologist, 45(3), 347–358.
Cowan, D., Turner-Smith, A. & others. (1999). The role of assistive technology in alternative models of care for older people. In A. Tinker and F. Wright and C. McCreadie and J. Askham and R. Hancock and A. Holmans (Ed.), Alternative Models of Care for Older People (pp. 325–346). Age Concerns Institute of Gerontology.
Crede, E. & Borrego, M. (2010). A content analysis of the use of mixed methods studies in engineering education. In American Society for Engineering Education.
Cromdal, J., Osvaldsson, K. & Persson-Thunqvist, D. (2008). Context that matters: Producing “thick-enough descriptions” in initial emergency reports. Journal of Pragmatics, 40(5), 927–959.
Culatta, R. & Leeper, L. H. (1990). The differential diagnosis of disfluency. National Student Speech Language Association Journal, 17, 59–64.
Davies, K. N. & Mulley, G. P. (1993). The views of elderly people on emergency alarm use. Clinical Rehabilitation, 7(4), 278–282.
De San Miguel, K. & Lewin, G. (2008). Brief Report: Personal emergency alarms: What impact do they have on older people’s lives? Australasian Journal on Ageing, 27(2), 103–105.
Demiris, G., Hensel, B., Skubic, M. & Rantz, M. (2008). Senior residents’ perceived need of and preferences for “smart home” sensor technologies. International Journal of Technology Assessment in Health Care, 24(1), 120.
Demiris, G., Rantz, M., Aud, M., Marek, K., Tyrer, H., Skubic, M. & Hussam, A. (2004). Older adults’ attitudes towards and perceptions of smart home technologies: a pilot study. Informatics for Health and Social Care, 29, 87–94.
Devillers, L. & Vidrascu, L. (2007). Real-life emotion recognition in speech. In Müller, C (Ed.), Speaker Classification II (pp. 34–42). Springer-Verlag.
Dibner, A. S. (1993). Personal response services present and future. Home Health Care Services Quarterly, 13(3-4), 239–243.
Disabled Living Foundation. (2009). Losing independence is a bigger ageing worry than dying. Retrieved May 31, 2015, from www.dlf.org.uk/blog/losing-independence-bigger-ageing-worry-dying
Doughty, K., Cameron, K. & Garner, P. (1996). Three generations of telecare of the elderly. Journal of Telemedicine and Telecare, 2(2), 71–80.
Downe-Wamboldt, B. (1992). Content analysis: method, applications, and issues. Health Care for Women International, 13(3), 313–321.
Dusan, S. & Rabiner, L. R. (2005). Can automatic speech recognition learn more from human speech perception? In P. 3rd Conf. Speech Tech. Hum.-Comput. Dialogue (pp. 21–36).
Eisenberg, M. S., Bergner, L., Hallstrom, A. & others. (1979). Cardiac resuscitation in the community. JAMA, 241(18), 1905–1907.
Elo, S. & Kyngäs, H. (2008). The qualitative content analysis process. Journal of Advanced Nursing, 62(1), 107–115.
Fallis, W. M., Silverthorne, D., Franklin, J. & McClement, S. (2007). Client and responder perceptions of a personal emergency response system: Lifeline. Home Health Care Services Quarterly, 26(3), 1–21.
Federici, S. & Scherer, M. (2012). Assistive technology assessment handbook. CRC Press.
Field, A. (2005). Discovering statistics using SPSS (2nd ed.). Sage publications.
Fleming, J., Brayne, C. & others. (2008). Inability to get up after falling, subsequent time on floor, and summoning help: prospective cohort study in people over 90. BMJ, 337, a2227.
Fogle, C. C., Oser, C. S., Troutman, T. P., McNamara, M., Williamson, A. P., Keller, M., … Harwell, T. S. (2008). Public education strategies to increase awareness of stroke warning signs and the need to call 911. Journal of Public Health Management and Practice, 14(3), e17–e22.
Forslund, K., Kihlgren, A. & Kihlgren, M. (2004). Operators’ experiences of emergency calls. Journal of Telemedicine and Telecare, 10(5), 290–297.
Fox Tree, J. E. (1995). The effects of false starts and repetitions on the processing of subsequent words in spontaneous speech. Journal of Memory and Language, 34(6), 709–738.
Fraser, N. (1997). Assessment of Interactive Systems. In Gibbon, D. and Moore, R. and Winski, R. (Ed.), Handbook on Standards and Resources for Spoken Language Systems (pp. 564–615). Mouton de Gruyter, Berlin.
Freedman, V. A., Agree, E. M., Martin, L. G. & Cornman, J. C. (2006). Trends in the use of assistive technology and personal care for late-life disability, 1992-2001. The Gerontologist, 46(1), 124–127.
Furui, S. (2003). Toward robust speech recognition and understanding. In Text, Speech and Dialogue (pp. 2–11).
Furui, S., Nakamura, M., Ichiba, T. & Iwano, K. (2005). Analysis and recognition of spontaneous speech using Corpus of Spontaneous Japanese. Speech Communication, 47(1), 208–219.
Garcia, A. C. & Parmer, P. A. (1999). Misplaced mistrust: The collaborative construction of doubt in 911 emergency calls. Symbolic Interaction, 22(4), 297–324.
Garner, M. & Johnson, E. (2007). Operational Communication: A paradigm for applied research into police call-handling. International Journal of Speech Language and the Law, 13(1), 55–75.
Georgila, K., Wolters, M. K. & Moore, J. D. (2010). Learning dialogue strategies from older and younger simulated users. In Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 103–106).
Georgila, K., Wolters, M., Karaiskos, V., Kronenthal, M., Logie, R., Mayo, N., … Watson, M. (2008). A fully annotated corpus for studying the effect of cognitive ageing on users’ interactions with spoken dialogue systems. In 6th International Conference on Language Resources and Evaluation.
Georgila, K., Wolters, M., Moore, J. D. & Logie, R. H. (2010). The MATCH corpus: A corpus of older and younger users’ interactions with spoken dialogue systems. Language Resources and Evaluation, 44(3), 221–261.
Gibson, M. J. & Hayunga, M. (2006). We can do better: lessons learned for protecting older persons in disasters.
Gilboy, N., Tanabe, P., Travers, D., Rosenau, A., Eitel, D. & others. (2005). Emergency severity index, version 4: implementation handbook. Rockville, MD: Agency for Healthcare Research and Quality, 1–72.
Glass, J. & Zue, V. (2003). 6.345 Automatic Speech Recognition, Spring 2003, Lecture#1 (Massachusetts Institute of Technology: MIT OpenCourseWare). Retrieved May 18, 2015, from http://ocw.mit.edu
Gold, B. & Morgan, N. (2000). Speech and Audio Signal Processing: Processing and Perception of Speech and Music. John Wiley & Sons Inc.
Gorham-Rowan, M. M. & Laures-Gore, J. (2006). Acoustic-perceptual correlates of voice quality in elderly men and women. Journal of Communication Disorders, 39(3), 171–184.
Graneheim, U. H. & Lundman, B. (2004). Qualitative content analysis in nursing research: concepts, procedures and measures to achieve trustworthiness. Nurse Education Today, 24(2), 105–112. doi:10.1016/j.nedt.2003.10.001
Haggag, M. H. (2013). Keyword Extraction using Semantic Analysis. International Journal of Computer Applications, 61(1), 1–6.
Hak, T. (1997). Coding effects in comparative research on definitions of health. The European Journal of Public Health, 7(4), 364–372.
Hall, D. & Sinard, R. J. (1998). The aging voice: how to differentiate disease from normal changes. Geriatrics, 53(7), 76–79.
Hall, N. E., Wagovich, S. A. & Bernstein Ratner, N. (2007). Language considerations in childhood stuttering: distinguishing between stuttering and other forms of disfluency in Section III: Intervention: Children who stutter with other co-occurring concerns . In Conture, Edward and Curlee, Richard (Ed.), Stuttering and related disorders of fluency (p. 162). Thieme.
Hamill, M., Young, V., Boger, J. & Mihailidis, A. (2009). Development of an automated speech recognition interface for personal emergency response systems. Journal of NeuroEngineering and Rehabilitation, 6(26), 1–11.
Handschu, R., Poppe, R., Rauss, J., Neundörfer, B. & Erbguth, F. (2003). Emergency calls in acute stroke. Stroke; a Journal of Cerebral Circulation, 34(4), 1005–9.
Heinbüchner, B., Hautzinger, M., Becker, C. & Pfeiffer, K. (2010). Satisfaction and use of personal emergency response systems. Zeitschrift Fuer Gerontologie Und Geriatrie, 43(4), 219–223.
Hessels, V., Le Prell, G. S. & Mann, W. C. (2011). Advances in personal emergency response and detection systems. Assistive Technology, 23(3), 152–161.
Hizer, D. D. & Hamilton, A. (1983). Emergency response systems: an overview. Journal of Applied Gerontology, 2(1), 70–77.
Hobbs, M. L. (1993). Product Design and Social Implications in a Personal Response Program. Home Health Care Services Quarterly, 13(3-4), 23–32.
Howell, P. & Kadi-Hanifi, K. (1991). Comparison of prosodic properties between read and spontaneous speech material. Speech Communication, 10(2), 163–169.
Hsieh, H.-F. & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15(9), 1277–1288.
Huckvale, M. (2004). SCRIBE Manual v 1.0 - Spoken Corpus Recordings in British English (1st ed.). Gower Street, London.
Hwang, U. & Morrison, R. S. (2007). The geriatric emergency department. Journal of the American Geriatrics Society, 55(11), 1873–1876.
Hyer, K. & Rudick, L. (1994). The effectiveness of personal emergency response systems in meeting the safety monitoring needs of home care clients. Journal of Nursing Administration, 24(6), 39–44.
IBM. (2014). IBM SPSS Statistical Software v.22.
Imbens-Bailey, A. (2000). The discourse of distress: A narrative analysis of emergency calls to 911. Language and Communication, 20(3), 275–296.
Johnson, J. L., Davenport, R. & Mann, W. C. (2007). Consumer feedback on smart home applications. Topics in Geriatric Rehabilitation, 23(1), 60–72.
Johnson, M., Cusick, A. & Chang, S. (2001). Home-screen: a short scale to measure fall risk in the home. Public Health Nursing, 18(3), 169–177.
Johnston, K., Grimmer-Somers, K. & Sutherland, M. (2010). Perspectives on use of personal alarms by older fallers. International Journal of General Medicine, 3, 231.
Jurafsky, D. (2014). CS224S/Linguist 285: Spoken Language Processing, Lecture 3: ASR: HMMs, Forward, Viterbi. Retrieved May 18, 2015, from http://web.stanford.edu/class/cs224s/
Jurafsky, D. & Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Pearson Education Inc.
King, S. (2006). Language variation in speech technologies. In Brown, Keith (Ed.), Encyclopedia of Language and Linguistics (2nd ed.). Elsevier.
Klapuri, A. (2007). Semantic Analysis of Text and Speech: SGN-9206 Signal Processing Graduate Seminar II.
Kondracki, N. L., Wellman, N. S. & Amundson, D. R. (2002). Content analysis: review of methods and their applications in nutrition education. Journal of Nutrition Education and Behavior, 34(4), 224–230.
Koski, K., Luukinen, H., Laippala, P. & Kivela, S. L. (1996). Physiological factors and medications as predictors of injurious falls by elderly people: a prospective population-based study. Age and Ageing, 25(1), 29–38.
Krippendorff, K. (2012). Content analysis: An introduction to its methodology (2nd ed.). Sage.
Lamel, L. (1989). Some Perspectives on Speech Database Development. In Speech Input/Output Assessment and Speech Databases.
Lamel, L., Minker, W. & Paroubek, P. (2000). Towards best practice in the development and evaluation of speech recognition components of a spoken language dialog system. Natural Language Engineering, 6(3&4), 305–322.
Landis, J. R. & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 159–174.
LaPointe, L. L. (1994). Introduction to communication sciences and disorders. In Minifie, F.D. (Ed.), (pp. 351–397). Singular Publishing Group: McNaughton & Gunn.
Leadholm, B. J. & Miller, J. (1995). Language Sample Analysis: The Wisconsin Guide. Madison, WI.
Lee, T. & Mihailidis, A. (2005). An intelligent emergency response system: preliminary development and testing of automated fall detection. Journal of Telemedicine and Telecare, 11(4), 194–198.
Levine, D. A. & Tideiksaar, R. (1995). Personal emergency response systems: factors associated with use among older persons. The Mount Sinai Journal of Medicine, New York, 62(4), 293–297.
Linville, S. E. (2002). Source characteristics of aged voice assessed from long-term average spectra. Journal of Voice, 16(4), 472–479.
Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, 22(1), 1–15.
Longino, C. F. J. (1994). Myths of an aging America. American Demographics, 16(8), 36–42.
Madane, A. (2012). Identifying Keywords and Key Phrases. IJSCE, July.
Maddox, G. L. (1992). Personal response systems: An international report of a new home care service; Foreword. Routledge.
Mann, W., Belchior, P., Tomita, M. R. & Kemp, B. J. (2005). Use of personal emergency response systems by older individuals with disabilities. Assistive Technology, 17(1), 82–88.
Mann, W., Marchant, T., Tomita, M., Fraas, L. & Stanton, K. (2002). Elder acceptance of health monitoring devices in the home. Care Management Journals, 3(2), 91.
Mann, W., Ottenbacher, K. J., Fraas, L., Tomita, M. & Granger, C. V. (1999). Effectiveness of assistive technology and environmental interventions in maintaining independence and reducing home care costs for the frail elderly: A randomized controlled trial. Archives of Family Medicine, 8(3), 210.
Mays, N. & Pope, C. (2000). Qualitative research in health care: Assessing quality in qualitative research. BMJ: British Medical Journal, 320(7226), 50.
Mazzoni, D., Dannenberg, R., et al. (2000). Audacity. Retrieved 2008, from http://audacity.sourceforge.net/
McCreadie, C. & Tinker, A. (2005). The acceptability of assistive technology to older people. Ageing and Society, 25(1), 91–110.
McLean, M. H. (2005). Design of a Speech Recognition Interface for a Personal Emergency Response and Health Monitoring System. University of Toronto.
McWhirter, M. (1987). A dispersed alarm system for the elderly and its relevance to local general practitioners. The Journal of the Royal College of General Practitioners, 37(299), 244.
Miller, J. F. & Chapman, R. S. (2008). Systematic Analysis of Language Transcripts Version 8. Madison, WI.
Miller, J. F. & Iglesias, A. (2006). Systematic Analysis of Language Transcripts (SALT), English and Spanish (Version 9). University of Wisconsin-Madison.
Mondada, L. (2012). The Conversation Analytic Approach to Data Collection. The Handbook of Conversation Analysis, 32–56.
Montgomery, C. (1993). Personal response systems in the United States. Home Health Care Services Quarterly, 13(3-4), 201–222.
Morgan, D. L. (1993). Qualitative content analysis: A guide to paths not taken. Qualitative Health Research, 3(1), 112–121.
Moyal, A., Aharonson, V., Tetariy, E. & Gishri, M. (2013). Keyword Spotting Methods. In Phonetic Search Methods for Large Speech Databases (pp. 7–11). Springer.
Müller, C., Wittig, F. & Baus, J. (2003). Exploiting speech for recognizing elderly users to respond to their special needs. In Proceedings of Eurospeech (Vol. 3, pp. 1305–1308).
Mullie, A., Van Hoeyweghen, R. & Quets, A. (1989). Influence of time intervals on outcome of CPR. Resuscitation, 17, S23–S33.
Murray, I. R. & Arnott, J. L. (2008). Applying an analysis of acted vocal emotions to improve the simulation of synthetic speech. Computer Speech & Language, 22(2), 107–129.
Möller, S. (2005). Quality of Human-Machine Interaction over the Phone. In Quality of Telephone-Based Spoken Dialogue Systems (pp. 9–91). Springer.
Möller, S., Gödde, F. & Wolters, M. (2008). A corpus analysis of spoken smart-home interactions with older users. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC) (pp. 735–740).
News Agencies. (2014). Fear of old age becomes acute after 50, study finds. Retrieved March 31, 2015, from http://www.telegraph.co.uk/news/health/news/10778168/Fear-of-old-age-becomes-acute-after-50-study-finds.html
Patel, S., Park, H., Bonato, P., Chan, L., Rodgers, M. & others. (2012). A review of wearable sensors and systems with application in rehabilitation. Journal of Neuroengineering and Rehabilitation, 9(12), 1–17.
Patil, S. A. & Hansen, J. H. (2007). Speech under stress: Analysis, modeling and recognition.
Piau, A., Campo, E., Rumeau, P., Vellas, B. & Nourhashemi, F. (2014). Aging society and gerontechnology: A solution for an independent living? The Journal of Nutrition, Health & Aging, 18(1), 97–112.
Polit, D. F. & Beck, C. T. (2004). Nursing research: Principles and methods. Lippincott Williams & Wilkins.
Pons, P. T., Haukoos, J. S., Bludworth, W., Cribley, T., Pons, K. A. & Markovchick, V. J. (2005). Paramedic response time: does it affect patient survival? Academic Emergency Medicine, 12(7), 594–600.
Porter, E. J. (2003). Moments of apprehension in the midst of a certainty: some frail older widows’ lives with a personal emergency response system. Qualitative Health Research, 13(9), 1311–1323.
Porter, E. J. (2005). Wearing and using personal emergency response system buttons. Journal of Gerontological Nursing, 31(10), 26–33.
Portet, F., Vacher, M., Golanski, C., Roux, C. & Meillon, B. (2013). Design and evaluation of a smart home voice interface for the elderly: acceptability and objection aspects. Personal and Ubiquitous Computing, 17(1), 127–144.
Private_PERS_Call_Centre. (2008). Operations Protocol for PERS Call Centre.
Public Health Agency of Canada. (2014). Seniors’ Falls in Canada: Second Report. Division of Aging and Seniors.
Ramage-Morin, P. L. (2005). Successful aging in health care institutions. Statistics Canada.
Ramig, L. O. (1994). Introduction to communication sciences and disorders. In Minifie, F.D. (Ed.), (pp. 481–519). Singular Publishing Group: McNaughton & Gunn.
Rosamond, W. D., Evenson, K. R., Schroeder, E. B., Morris, D. L., Johnson, A.-M. & Brice, J. H. (2005). Calling emergency medical services for acute stroke: a study of 9-1-1 tapes. Prehospital Emergency Care, 9(1), 19–23.
Roush, R. E., Teasdale, T. A., Murphy, J. N. & Kirk, M. S. (1995). Impact of a personal emergency response system on hospital utilization by community-residing elders. Southern Medical Journal, 88(9), 917–922.
Ryan, G. (1999). Measuring the typicality of text: Using multiple coders for more than just reliability and validity checks. Human Organization, 58(3), 313–322.
Sacks, H., Schegloff, E. A. & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 696–735.
Salvi, F., Morichi, V., Grilli, A., Giorgi, R., De Tommaso, G. & Dessi-Fulgheri, P. (2007). The elderly in the emergency department: a critical review of problems and solutions. Internal and Emergency Medicine, 2(4), 292–301.
Scharenborg, O. (2007). Reaching over the gap: A review of efforts to link human and automatic speech recognition research. Speech Communication, 49(5), 336–347.
Scherer, K. R. (1986). Vocal affect expression: a review and a model for future research. Psychological Bulletin, 99(2), 143–165.
Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40(1), 227–256.
Shriberg, E. E. (1999). Phonetic consequences of speech disfluency.
Silverman, R. A., Galea, S., Blaney, S., Freese, J., Prezant, D. J., Park, R., … others. (2007). The “vertical response time”: barriers to ambulance response in an urban area. Academic Emergency Medicine, 14(9), 772–778.
Takahashi, S., Morimoto, T., Maeda, S. & Tsuruta, N. (2003). Robust speech understanding based on expected discourse plan. In INTERSPEECH.
Tam, T., Dolan, A., Boger, J. & Mihailidis, A. (2006). An intelligent emergency response system: Preliminary development and testing of a functional health monitoring system. Gerontechnology, 4(4), 209–222.
Taylor, A. & Agamanolis, S. (2010). Service users’ views of a mainstream telecare product: the personal trigger. In Proceedings of the 28th of the international conference extended abstracts on Human factors in computing systems (pp. 3259–3264).
Teas Gill, V. & Roberts, F. (2012). Conversation Analysis in Medicine. The Handbook of Conversation Analysis, 575–592.
Ten Bosch, L. (2003). Emotions, speech and the ASR framework. Speech Communication, 40(1), 213–225.
Tinker, A. (1993). Alarms and telephones in emergency response-Research from the United Kingdom. Home Health Care Services Quarterly, 13(3-4), 177–189.
Vipperla, R., Wolters, M., Georgila, K. & Renals, S. (2009). Speech input from older users in smart environments: Challenges and perspectives. In Universal Access in Human-Computer Interaction. Intelligent and Ubiquitous Interaction Environments (pp. 117–126). Springer.
Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvea, E., … Woelfel, J. (2004). Sphinx-4: A flexible open source framework for speech recognition.
Waseem, H., Durrani, M. & Naseer, R. (2010). Prank calls: A major burden for an emergency medical service. Emergency Medicine Australasia, 22(5), 480–480.
Weiss, C. O. (2011). Frailty and chronic diseases in older adults. Clinics in Geriatric Medicine, 27(1), 39–52.
Whalen, M. & Zimmerman, D. (1987). Sequential and institutional contexts in calls for help. Social Psychology Quarterly, 172–185.
Williams, C. E. & Stevens, K. N. (1972). Emotions and speech: Some acoustical correlates. The Journal of the Acoustical Society of America, 52(4B), 1238–1250.
Williams, G., Doughty, K., Cameron, K. & Bradley, D. A. (1998). A smart fall and activity monitor for telecare applications. In Engineering in Medicine and Biology Society, 1998. Proceedings of the 20th Annual International Conference of the IEEE (Vol. 3, pp. 1151–1154).
Wilpon, J. G. & Jacobsen, C. N. (1996). A study of speech recognition for children and the elderly. In Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on (Vol. 1, pp. 349–352).
Wolters, M., Engelbrecht, K.-P., Gödde, F., Möller, S., Naumann, A. & Schleicher, R. (2010). Making it easier for older people to talk to smart homes: The effect of early help prompts. Universal Access in the Information Society, 9(4), 311–325.
Wolters, M., Georgila, K., Moore, J. D. & MacPherson, S. E. (2009). Being old doesn’t mean acting old: How older users interact with spoken dialog systems. ACM Transactions on Accessible Computing (TACCESS), 2(1), 2.
Wooffitt, R. (2005). Conversation analysis and discourse analysis: A comparative and critical introduction. London: Sage.
World Health Organization. (2011). Global Health and Aging.
Young, V. & Mihailidis, A. (2010). Difficulties in automatic speech recognition of dysarthric speakers and implications for speech-based applications used by the elderly: A literature review. Assistive Technology, 22(2), 99–112.
Young, V., Rochon, E. & Mihailidis, A. (2014). Towards the development of a speech-based and intelligent personal emergency response system: Identification of key conversational features in personal emergency response calls. Gerontechnology, 13(2), 315.
Yuan, J., Liberman, M. & Cieri, C. (2006). Towards an integrated understanding of speaking rate in conversation. In INTERSPEECH.
Zajicek, M., Wales, R. & Lee, A. (2004). Speech interaction for older adults. Universal Access in the Information Society, 3(2), 122–130.
Zhou, G., Hansen, J. H. & Kaiser, J. F. (1998). Linear and nonlinear speech feature analysis for stress classification. In ICSLP (pp. 883–886).
Zimmerman, D. (1992a). Achieving context: openings in emergency calls. In Watson, G. and Seiler, R.M. (Ed.), Text in Context: Contributions to Ethnomethodology (pp. 35–51). Sage Publications.
Zimmerman, D. (1992b). The interactional organization of calls for emergency assistance. In Drew, P. and Heritage, J. (Ed.), Talk at Work: interaction in institutional settings (pp. 418–469). Cambridge University Press.
Zraick, R. I., Gregg, B. A. & Whitehouse, E. L. (2006). Speech and voice characteristics of geriatric speakers: A review of the literature and a call for research and training. Journal of Medical Speech-Language Pathology, 14(3), 133–142.
Appendix A: Common Older Adult Conditions
This list was referenced when classifying risk levels for response call transcripts.
Common Older Adult Conditions (Salvi et al., 2007; Gorina et al., 2006)

Chronic Disease
- Heart Disease (Coronary Artery Disease): intense pain (tightness, pressure, burning, heavy weight) in the upper body (chest, shoulders, neck, arms, upper abs, jaw); loss of consciousness; nausea; shortness of breath; clammy, sweaty, cold, anxious, nervous, pale.
- Cancer: omitted (no rapid attacks).
- Stroke: confusion, slurred speech, moving difficulties, loss of consciousness, vision problems.
- Alzheimer's: omitted (no rapid attacks).
- Diabetes: shaky, sleepy, confused, loss of consciousness, clammy, pale, fast heartbeat.
- Renal Disease: fluid retention (swelling in legs/feet), seizures, vomiting, nausea, nosebleeds, hand tremor, high blood pressure, sluggish movement, prolonged bleeding.
- Lower Respiratory Disease (asthma/bronchitis, emphysema): wheezing, coughing, tight chest, shortness of breath, bluish skin.

Infection
- Influenza: fever, chills, shivering, muscle pain, headaches, cough, sore throat, stomach pain, vomiting, diarrhea.
- Pneumonia: shortness of breath, shivering, chills, headache, confusion, muscle pain, weakness, chest pain, blue lips, cough, high fever.
- Septicemia: fever, chills, rapid breathing and heart rate.

Drug Reaction
- Sweaty/dry, confused, pale/red, anxious, breathing difficulty, stomach pain, nausea, vomiting, diarrhea.

Accident
- Fall: blood, pain (injury), loss of consciousness.
- Motor Vehicle: omitted (not home based).
Appendix B: Original HELPER Dialogue Strategy
(McLean, 2005)
The original HELPER dialogue strategy is a branching tree of yes/no prompts. The flowchart's dialogue states and system prompts are listed below; each state's 'yes' and 'no' responses lead to the next state as drawn in the figure.

- Start → Root: "Hello {Mr. Smith}. This is your automated health monitoring system. Do you need help? (Please say 'yes' or 'no')." Yes → Amb; No → RootNoConf.
- RootNoConf: "Okay {Mr. Smith}. Just to confirm, please say 'yes' if you need any help or 'no' again if you do not." Yes → Amb; No → Afalse.
- Amb: "Would you like me to call an ambulance? (Please say 'yes' or 'no')." Yes → AmbConf; No → Responder.
- AmbConf: "Okay {Mr. Smith}. I will call an ambulance right away, please say 'yes' to confirm." Yes → Aamb; No → Responder.
- Responder: "Would you like me to call someone else to help you? (Please say 'yes' or 'no')." Yes → List; No → Op.
- List: "There are two people on your contact list, your {daughter Anne} and your {neighbour Paul}. Would you like me to call {your daughter Anne}? (Please say 'yes' or 'no')." Yes → Name1Conf; No → Name2Conf.
- Name1Conf: "I will call your {daughter Anne}. Please say 'yes' to confirm." Yes → Aname1; No → Name2Conf.
- Name2Conf: "I will call your {neighbour Paul}. Please say 'yes' to confirm or 'no' to be connected to an operator." Yes → Aname2; No → Aop.
- Op: "I can connect you to a live operator if you would like. Please say 'yes' to be connected, or say 'no' to exit the system." Yes → Aop; No → Afalse.
- Aamb (terminal): "I am calling an ambulance now. I will also notify one of your responders. I will give you an update momentarily."
- Aname1 (terminal): "I am calling your {daughter Anne} now. I will give you an update momentarily."
- Aname2 (terminal): "I am calling your {neighbour Paul} now. I will give you an update momentarily."
- Aop (terminal): "Connecting you to a live operator now."
- Afalse (terminal): "Sorry for the interruption. The system is now exiting."
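The HELPER dialogue tree can be expressed as a simple yes/no finite-state machine. The sketch below is illustrative only: state names follow the figure, prompts are abbreviated, and the targets of some unconfirmed 'no' branches are inferred from the flowchart's arrows rather than taken from McLean's (2005) implementation.

```python
# Hypothetical sketch of the HELPER dialogue tree as a yes/no state machine.
# Each entry maps a state to (abbreviated prompt, next state on 'yes',
# next state on 'no'). Some 'no' transitions are inferred from the figure.
STATES = {
    "Root":       ("Do you need help?",                            "Amb",       "RootNoConf"),
    "RootNoConf": ("Just to confirm, do you need any help?",       "Amb",       "Afalse"),
    "Amb":        ("Would you like me to call an ambulance?",      "AmbConf",   "Responder"),
    "AmbConf":    ("I will call an ambulance right away. Confirm?","Aamb",      "Responder"),
    "Responder":  ("Would you like me to call someone else?",      "List",      "Op"),
    "List":       ("Call your daughter Anne?",                     "Name1Conf", "Name2Conf"),
    "Name1Conf":  ("I will call your daughter Anne. Confirm?",     "Aname1",    "Name2Conf"),
    "Name2Conf":  ("I will call your neighbour Paul. Confirm?",    "Aname2",    "Aop"),
    "Op":         ("Connect you to a live operator?",              "Aop",       "Afalse"),
}
# Terminal states: an action is taken and the dialogue ends.
TERMINAL = {"Aamb", "Aname1", "Aname2", "Aop", "Afalse"}

def run_dialogue(answers):
    """Walk the tree with a scripted list of 'yes'/'no' answers; return the final state."""
    state = "Root"
    for ans in answers:
        if state in TERMINAL:
            break
        _, yes_next, no_next = STATES[state]
        state = yes_next if ans == "yes" else no_next
    return state
```

For example, a caller who confirms needing help and confirms the ambulance twice reaches the Aamb action, while two 'no' answers in a row exit through Afalse.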
Appendix C: Small Keyword Vocabulary Set
(Occ = occurrence, LR = low risk, MR = medium risk, HR = high risk, Fall = fall call, Med = medical call, OA = older adult, CG = caregiver, Final Cat = final category)
Occ Word Root LR MR HR Fall Med OA CG Final Cat
35 GET 1 1 1 1 1 1 1 2
21 NEED 1 1 1 1 1 1 2
18 COME 1 1 1 1 1 1 2
12 HELP 1 1 1 1 1 1 1 2
12 GO 1 1 1 1 1 1 2
10 WANT 1 1 1 1 1 1 2
6 TAKE 1 1 1 1 1 1 2
5 CALLING 1 1 1 1 1 1 1 2
4 CALL 1 1 1 1 1 2
4 COMING 1 1 1 1 1 2
4 SEND 1 1 1 1 1 1 2
3 TELL 1 1 1 1 1 1 2
3 PHONE 1 1 1 1 1 2
3 BRING 1 1 1 1 2
3 CHECK 1 1 1 1 2
2 WANTS 1 1 1 1 2
17 BREATHING 1 1 1 1 1 1 4
13 SEE 1 1 1 1 1 4
12 FEEL 1 1 1 1 1 4
9 FEELING 1 1 1 1 1 1 1 4
5 HEAR 1 1 1 1 1 4
3 BREATH 1 1 1 1 1 4
3 BREATHE 1 1 1 1 4
2 LIFT 1 1 1 1 1 4
2 SUGAR 1 1 1 4
2 MOVE 1 1 1 1 4
2 STRAIGHTEN 1 1 1 4
2 STAND 1 1 1 4
1 BEATING 1 1 1 4
1 BEATS 1 1 1 4
1 AWAKE 1 1 4
1 BREATHES 1 1 1 4
17 PLEASE 1 1 1 1 1 1 5
3 THANKS 1 1 1 1 1 1 5
30 HELLO 1 1 1 1 1 1 6
6 HI 1 1 1 1 1 6
4 BYE 1 1 1 1 1 6
20 PARDON 1 1 1 1 1 1 1 7
12 AGAIN 1 1 1 1 1 1 1 7
25 AMBULANCE 1 1 1 1 1 1 8
12 HOSPITAL 1 1 1 1 1 1 8
11 SOMETHING 1 1 1 1 1 1 8
5 DAUGHTER 1 1 1 1 1 1 8
5 DOCTOR 1 1 1 1 1 8
5 OXYGEN 1 1 1 1 8
3 PARAMEDICS 1 1 1 1 1 8
3 SOMEBODY 1 1 1 1 1 8
2 EMERGENCY 1 1 1 1 8
2 SOMEONE 1 1 1 1 1 8
2 NEIGHBOUR 1 1 1 8
1 BROTHER 1 1 1 8
1 FIREFIGHTER 1 1 1 8
1 MEDICS 1 1 1 8
1 ASSISTANCE 1 1 1 8
27 CAN 1 1 1 1 1 1 9
17 WHAT 1 1 1 1 1 1 9
13 COULD 1 1 1 1 1 1 9
6 WHEN 1 1 1 1 1 1 9
5 WHO 1 1 1 1 1 9
4 WHERE 1 1 1 1 1 9
3 WILL 1 1 1 1 1 1 9
21 BACK 1 1 1 1 1 1 10
18 CHEST 1 1 1 1 1 1 10
8 HEAD 1 1 1 1 1 1 10
6 HEART 1 1 1 1 1 1 10
4 BODY 1 1 1 1 1 10
4 STOMACH 1 1 1 1 1 10
4 ARM 1 1 1 10
4 LEG 1 1 1 10
3 THROAT 1 1 1 1 10
3 SHOULDER 1 1 1 10
3 FEET 1 1 1 10
3 SIDE 1 1 1 1 10
1 ABDOMIN 1 1 1 10
1 NOSE 1 1 1 10
1 KIDNEYS 1 1 1 10
1 NECK 1 1 1 10
1 FACE 1 1 1 10
1 RIB 1 1 1 10
45 DON’T 1 1 1 1 1 1 1 11
35 CAN’T 1 1 1 1 1 1 11
5 DIDN’T 1 1 1 1 1 1 11
6 DOESN’T 1 1 1 1 1 11
1 ISN'T 1 1 1 11
1 CANNOT 1 1 1 11
31 UP 1 1 1 1 1 13
21 DOWN 1 1 1 1 1 1 13
16 BED 1 1 1 1 1 1 13
13 FLOOR 1 1 1 1 1 1 13
13 OUT 1 1 1 1 1 1 13
9 OFF 1 1 1 1 1 1 14
8 TIME 1 1 1 1 1 1 14
7 DAY 1 1 1 1 1 14
3 HOUR 1 1 1 1 14
2 DATE 1 1 1 14
1 MISTAKE 1 1 14
1 WAIT 1 1 1 14
1 WEATHER 1 1 1 14
153 NO 1 1 1 1 1 1 1 1-n
45 NOT 1 1 1 1 1 1 1-n, 11
59 YEAH 1 1 1 1 1 1 1 1-p
16 DO 1 1 1 1 1 1 1-p
16 RIGHT 1 1 1 1 1 1 1-p
7 YUP 1 1 1 1 1 1 1-p
4 SURE 1 1 1 1 1 1 1-p
3 YA 1 1 1 1 1-p
2 MAYBE 1 1 1 1 1 1-p
71 OKAY 1 1 1 1 1 1 1 1-p, 3-n
13 ALRIGHT 1 1 1 1 1 1 1-p, 3-p
45 THANK_YOU 1 1 1 1 1 1 1 1-p, 5
99 YES 1 1 1 1 1 1 1 1-p, 6
3 ASK 1 1 1 2-a
2 TRY 1 1 1 1 2-a
2 ASTHMA 1 1 1 3-e
2 DIALYSIS 1 1 1 3-e
1 FIBRILLATION 1 1 1 3-e
1 DIABETIC 1 1 1 3-e
1+1 SHAKY 1 1 1 1 3-n
46 HAVE 1 1 1 1 1 1 3-n
23 FELL 1 1 1 1 1 3-n
22 PAIN 1 1 1 1 1 1 3-n
11 BAD 1 1 1 1 1 1 3-n
8 FALLEN 1 1 1 1 1 3-n
7 BLOOD 1 1 1 1 1 1 3-n
7 DIZZY 1 1 1 1 1 3-n
7 FALL 1 1 1 1 1 1 3-n
7 SICK 1 1 1 1 1 3-n
5 BLEEDING 1 1 1 1 1 3-n
5 WEAK 1 1 1 1 1 1 3-n
5 HIGH 1 1 1 1 3-n
5 HURT 1 1 1 1 3-n
5 TROUBLE 1 1 1 1 1 3-n
4 WRONG 1 1 1 1 1 1 1 3-n
4 PROBLEMS 1 1 1 1 1 3-n
4 SORE 1 1 1 1 1 1 3-n
4 PRESSURE 1 1 1 1 3-n
4 SWEATING 1 1 1 1 1 3-n
4 TIGHTNESS 1 1 1 1 3-n
4 DIFFICULTY 1 1 1 1 3-n
4 TERRIBLE 1 1 1 3-n
3 INJURED 1 1 1 1 1 3-n
3 COLD 1 1 1 1 3-n
3 BROKEN 1 1 1 3-n
3 CONSTIPATION 1 1 1 3-n
3 RASH 1 1 1 3-n
2 ATTACK 1 1 1 1 1 1 3-n
2 SHORT 1 1 1 1 1 3-n
2 HARD 1 1 1 1 3-n
2 NAUSEATED 1 1 1 3-n
2 PNEUMONIA 1 1 1 1 1 3-n
2 THROWING_UP 1 1 1 1 1 3-n
2 CLAMMY 1 1 1 1 3-n
2 DIARRHEA 1 1 1 3-n
2 TEMPERATURE 1 1 1 3-n
1 FAINTED 1 1 1 3-n
1 LOW 1 1 1 3-n
1 RAPID 1 1 1 3-n
1 STROKE 1 1 1 3-n
1 SUFFERING 1 1 1 3-n
1 TIA 1 1 1 3-n
1 TREMOR 1 1 1 3-n
1 WHEEZING 1 1 1 3-n
1 DISCOMFORT 1 1 1 3-n
1 NUMB 1 1 1 3-n
1 UNCONSCIOUS 1 1 1 3-n
1 ACCIDENT 1 1 3-n
1 LOSING 1 1 1 3-n
1 FEVER 1 1 1 3-n
1 FIRE 1 1 1 3-n
1 HEART_ATTACK 1 1 1 3-n
1 VOMITING 1 1 1 3-n
1 CHOKING 1 1 3-n
1 CONFUSED 1 1 1 3-n
1 DISORIENTED 1 1 1 3-n
1 ANGINA 1 1 1 3-n
33 KNOW 1 1 1 1 1 1 1 3-n (qualifiers)
12 HAVING 1 1 1 1 1 1 3-n (qualifiers)
8 GOING 1 1 1 1 1 1 3-n (qualifiers)
2 EVERYTHING 1 1 1 1 3-n (qualifiers)
46 WELL 1 1 1 1 1 1 3-p
14 GOOD 1 1 1 1 1 1 1 3-p
10 FINE 1 1 1 1 1 1 3-p
1 SORRY 1 1 5, 1-n
383 I 1 1 1 1 1 1 1 Identifier
35 ME 1 1 1 1 1 1 Identifier
Column totals: LR 26 (14.05%), MR 147 (79.46%), HR 137 (74.05%), Fall 112 (60.54%), Med 162 (87.57%), OA 160 (86.49%), CG 113 (61.08%)
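A dialogue manager could consult a matrix like the one above to weight a call's likely risk level from spotted keywords. The sketch below is a hypothetical illustration, not the classification method used in this thesis: it holds a five-word excerpt of the vocabulary (the flag values shown are placeholders, since exact column alignment is not preserved here) and tallies risk-level votes over keywords spotted in an utterance.

```python
# Illustrative excerpt of an Appendix C-style keyword matrix: each entry
# carries low/medium/high-risk flags and a final category code. Flag values
# are placeholders for illustration, not an exact reproduction of the table.
KEYWORDS = {
    "AMBULANCE": {"LR": 0, "MR": 1, "HR": 1, "cat": "8"},
    "FELL":      {"LR": 0, "MR": 1, "HR": 1, "cat": "3-n"},
    "PAIN":      {"LR": 0, "MR": 1, "HR": 1, "cat": "3-n"},
    "FINE":      {"LR": 1, "MR": 1, "HR": 1, "cat": "3-p"},
    "MISTAKE":   {"LR": 1, "MR": 0, "HR": 0, "cat": "14"},
}

def risk_votes(utterance):
    """Tally low/medium/high-risk flags over keywords spotted in an utterance."""
    votes = {"LR": 0, "MR": 0, "HR": 0}
    for token in utterance.upper().split():
        flags = KEYWORDS.get(token.strip(".,!?"))  # crude normalisation
        if flags:
            for level in votes:
                votes[level] += flags[level]
    return votes
```

Under this toy aggregation, "I fell and I have pain" accumulates medium- and high-risk votes, while "sorry, a mistake" votes only low-risk.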
Appendix D: Unique Keyword Occurrences
(Keywords occurring at only a single risk level; unique word counts: Low 3, Medium 44, High 31)

Low: MISTAKE, ACCIDENT, SORRY

Medium: MOVE, CHECK, SIDE, BYE, TROUBLE, DAY, FIREFIGHTER, MEDICS, KIDNEYS, NECK, WAIT, DIABETIC, FEVER, FIRE, HEART_ATTACK, VOMITING, WANTS, STRAIGHTEN, STAND, NEIGHBOUR, DATE, DIARRHEA, TEMPERATURE, FEET, HOUR, ASK, COLD, CONSTIPATION, RASH, ARM, TERRIBLE, HURT, ASSISTANCE, FACE, RIB, CANNOT, WEATHER, LOSING, CONFUSED, DISORIENTED, SHOULDER, BROKEN, LEG, CHOKING

High: YA, TIGHTNESS, OXYGEN, BLEEDING, BEATING, BEATS, ABDOMIN, NOSE, ISN'T, FIBRILLATION, FAINTED, LOW, RAPID, STROKE, SUFFERING, TIA, TREMOR, WHEEZING, ANGINA, SUGAR, EMERGENCY, NAUSEATED, AWAKE, BREATHES, BROTHER, DISCOMFORT, NUMB, UNCONSCIOUS, ASTHMA, DIALYSIS, HARD
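Lists like those above can be derived mechanically from a keyword-to-risk-level mapping such as Appendix C's. The sketch below uses a small toy mapping (the flags are hypothetical) to show the derivation: a keyword is "unique" to a risk level when that level is its only flag.

```python
# Toy keyword -> risk-level mapping (hypothetical flags, not the full
# Appendix C data) used to illustrate deriving Appendix D-style lists.
risk_levels = {
    "MISTAKE": {"low"},
    "SORRY":   {"low"},
    "MOVE":    {"medium"},
    "YA":      {"high"},
    "FELL":    {"medium", "high"},  # flagged at two levels, so unique to neither
}

def unique_to(level, mapping):
    """Keywords whose only risk flag is the given level, sorted alphabetically."""
    return sorted(word for word, levels in mapping.items() if levels == {level})
```

Applied to the toy mapping, MISTAKE and SORRY come out unique to the low-risk level, while FELL appears in no unique list because it carries both medium- and high-risk flags.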
Appendix E: Questions for Participant
1. Male or female?
2. Birthdate?
3. Age?
4. Birth place (city and province); if not Canada, indicate country?
5. Mother tongue (language/country)? (e.g., English/Britain)
6. Cultural ethnicity? (e.g., French)
7. This experiment will require you to listen to noises and sounds over speakers or headphones. Do you have any hearing impairment that may affect your performance in this study? If yes, please explain. (e.g., deaf in right ear)
8. This experiment requires you to be able to see and read various documents and scripts. Do you have any visual impairment that may affect your performance in this study? If yes, please explain. (e.g., blind in right eye)
9. This experiment will require you to sit/stand inside a small soundproof booth. Do you have any conditions that may affect your performance during the voice recordings? If yes, please explain. (e.g., claustrophobia, trouble sitting or standing for long periods)
10. Do you have any medical conditions we should be aware of that may affect your performance during the voice recordings? If yes, please explain. (e.g., respiratory or cardiac (heart) problems)
11. Do you have any previous acting experience? If yes, please indicate the number of years as an actor and the type of acting (e.g., theatre, movie).
12. What is the highest level of education you have attained? (e.g., elementary, high school, post-secondary (bachelor's, master's, doctoral), college)
13. Would you like to be contacted in the future about other research study opportunities for which you may be a suitable candidate? (yes or no) If yes, please provide a contact number or email.
14. Are you interested in receiving a copy of published literature discussing the results of this study? It may take several years before any publications are available. (yes or no)
15. If yes, how would you like to be contacted? Please provide contact information.
16. It is our hope that a freely accessible older adult speech database can be made available to interested researchers/individuals to help continue the development of speech recognition technologies for older adults. If an older adult speech database is successfully developed from this study, and your name, contact, and medical information are not included, but your speech sample, age, gender, region of birth, and ethnicity are included, would you agree to allow the database to be used in future research studies and technology development projects by other interested researchers and individuals? (yes or no)
Appendix F: Key Words and Phrases List
SET 1 (SET 2 is the reverse order)
INDEX KEY PHRASES KEY WORDS
SECTION 1
1 Will you help me? WILL
2 My breathing is not good. GOOD
3 I have tightness in the chest. TIGHTNESS
4 I need the ambulance to take me to the hospital. HOSPITAL
5 Uh, I need help but I didn’t fall. DIDN’T
6 I can’t move it. MOVE
7 They said to bring her in. BRING
8 I want to go to the hospital. GO
9 She has dialysis today, but she’s really sick. DIALYSIS
10 I’m having a terrible time. TERRIBLE
11 What day is it today? DAY
12 I can’t hear you, can you speak up? HEAR
13 He can't feel his body. BODY
14 I think I’m having a TIA. TIA
15 I’ve got a pain in my chest. CHEST
16 Ya, everything’s good. YA
17 Try calling my son. TRY
18 It’s an emergency, we need the ambulance right away. EMERGENCY
19 I’m really weak. WEAK
20 Can you call my brother? BROTHER
21 Is the ambulance coming? COMING
22 I’d like the paramedics to come. PARAMEDICS
23 I can’t straighten my leg. LEG
24 My head is light, I’m very sick. HEAD
25 He has discomfort in the chest. DISCOMFORT
26 He hit his back head and there’s blood BLOOD
27 I’m okay. OKAY
28 He’s choking on something. CHOKING
29 I feel awful, I’m throwing up constantly. THROWING UP
30 Oh, I’m dizzy. DIZZY
31 Can you get somebody else? SOMEBODY
32 Please ask the Superintendent to open the door. ASK
33 Sorry, I pushed it by mistake. MISTAKE
34 Um, he’s clammy. CLAMMY
35 Pardon me? Talk louder! PARDON
36 I get panic attacks. ATTACK
37 Can you send somebody down to my place? CAN
SECTION 2
38 I’m the caregiver, he has pains in his stomach. STOMACH
39 Yes, everything’s just fine, thank you. FINE
40 He breathes kind of funny. BREATHES
41 It’s affecting my breathing. BREATHING
42 I take water pills. TAKE
43 I’m not breathing very good again. AGAIN
44 Sugar, blood sugar too low. LOW
45 Computer off! OFF
46 I had like a tremor on my chest. TREMOR
47 I was running hot and cold, hot and cold, it’s just terrible. COLD
48 Can you get someone else? SOMEONE
49 My neighbour is helping me. NEIGHBOUR
50 Can you help me? ME
51 He’s broken his leg. BROKEN
52 Heh? I cannot hear you. CANNOT
53 Could someone come to the house and check me over. CHECK
54 I’ve fallen down. FALLEN
55 My back is very sore. SORE
56 I’m not well, I need some oxygen. WELL
57 I have terrible excruciating pain at night in my back. HAVE
58 Something isn’t right. ISN'T
59 Yes, I’m afraid I might fall over. YES
60 Yeah, I just tested the system. YEAH
61 I can’t get up. UP
62 Who are you? WHO
63 I’m wheezing too much. WHEEZING
64 I might fall down. DOWN
65 Her throat is all swollen up. THROAT
66 I’m having problems getting up. PROBLEMS
67 What is the date today? DATE
68 I’m nauseated. NAUSEATED
69 I can’t breathe this morning. BREATHE
70 Hello? HELLO
71 I broke my right arm. ARM
72 What time is it? TIME
73 No, everything is fine, thank you. EVERYTHING
74 He’s got asthma. ASTHMA
SECTION 3
75 I hurt my ribs, one rib feels broken. RIB
76 Something popped out the side of my stomach. SIDE
77 I think I need the firefighter medics. MEDICS
78 Yes, I do need help. DO
79 She’s confused. CONFUSED
80 The face has come alive again. FACE
81 I have a sore neck and I’m not feeling very good. NECK
82 I have a very high fever. FEVER
83 It could be angina. ANGINA
84 My sugar is low. SUGAR
85 I don’t know what happened. KNOW
86 I wonder if you could send somebody down to my place? SEND
87 My house is on fire! FIRE
88 Get help, I can’t get him up. GET
89 I’m having trouble breathing again. TROUBLE
90 What is the hour? HOUR
91 I have severe constipation. CONSTIPATION
92 I’m in a lot of pain. PAIN
93 I fainted again today. FAINTED
94 I've got diarrhea and I'm heavy and irritated. DIARRHEA
95 Yeah, sure, I could do with something. SURE
96 Call the staff. CALL
97 I have a bleeding nose. NOSE
98 She had a bad fall. FALL
99 My grandma’s fallen down and we can’t seem to lift her up. LIFT
100 I have difficulty breathing. DIFFICULTY
101 It beats for a while and then seems to break, and then starts again. BEATS
102 I believe he’s out, unconscious. UNCONSCIOUS
103 He’s losing a lot of blood. LOSING
104 He says he’s numb. NUMB
105 I’m aching all over, it’s my back, my adomin, my abs, everything! ABDOMEN
106 She’s disoriented. DISORIENTED
107 I was calling to tell you I was alright. TELL
108 Yes, I’ve seen the doctor yesterday. DOCTOR
109 Could you get an ambulance please? COULD
110 What do you mean? WHAT
111 Yes, please, my mom needs an ambulance. PLEASE
SECTION 4
112 I was taking some medication and I developed a horrible rash. RASH
113 I’m a little wobbly on my feet. FEET
114 I might fall out of bed. OUT
115 I just wanted you to call my daughter. DAUGHTER
116 I pulled something. SOMETHING
117 I'm very short of breath. SHORT
118 My mother wants me to get an ambulance. WANTS
119 Please send the firefighter! FIREFIGHTER
120 There’s nothing wrong, bye bye. BYE
121 Where is the ambulance? WHERE
122 I’m sweating and have discomfort in the chest. SWEATING
123 I have atrial fibrillation with my heart. FIBRILLATION
124 My heart isn’t beating smoothly. BEATING
125 I’m not injured, no. INJURED
126 There is a lot of bleeding. BLEEDING
127 I’m in bad shape. BAD
128 No, I don’t want the ambulance. WANT
129 Something’s wrong. WRONG
130 I’m able to breathe alright. ALRIGHT
131 When can you send for a paramedic? WHEN
132 He needs an ambulance. AMBULANCE
133 I don’t know, I just don’t feel good. DON’T
134 My husband fell down, he’s on the floor in the kitchen. FELL
135 It’s the caregiver calling, can you send an ambulance please? CALLING
136 Yes, maybe he can help. MAYBE
137 I slid out of bed. BED
138 It’s not a heart attack. HEART_ATTACK
139 I can’t catch my breath. BREATH
140 He doesn’t feel too well. DOESN’T
141 I can’t straighten the left one. STRAIGHTEN
142 I’m feeling sick. FEELING
143 Wait! I beg your pardon? WAIT
144 I’m really suffering right now, I need care. SUFFERING
145 I need a pull. NEED
146 Eh, Thanks for your help. THANKS
147 I had a little stroke. STROKE
148 I can’t even stand. STAND
SECTION 5
149 Nuh… no, I can’t see any bruises. SEE
150 No, I’m having problems. HAVING
151 I hurt myself. HURT
152 He’s not awake anymore. AWAKE
153 I have high blood pressure. PRESSURE
154 Can you phone my sister? PHONE
155 I keep going to the bathroom. GOING
156 Someone help me! HELP
157 I’m vomiting. VOMITING
158 Ah, yes, I was wondering, could the paramedics come and see me? COME
159 No, I should be back in the hospital. NO
160 Um, I think I have pneumonia. PNEUMONIA
161 I need some oxygen. OXYGEN
162 I just don’t feel well this morning. FEEL
163 I can’t get off the floor. FLOOR
164 I have trouble with my heart. HEART
165 It’s her husband. She feels she has broken her shoulder. SHOULDER
166 My fluid is back up again. BACK
167 My kidneys aren’t working. KIDNEYS
168 Yup, I fell and hurt myself. YUP
169 I'm sick and I'm in bed and I can't do anything for myself. CAN’T
170 Can we get some assistance? ASSISTANCE
171 I have high cholesterol. HIGH
172 I have a hard time breathing. HARD
173 I have awkward breathing, it’s very rapid. RAPID
174 She’s just not well. NOT
175 I’m shaky. SHAKY
176 Ah, it was an accident, thank you. ACCIDENT
177 I have a high temperature. TEMPERATURE
178 Hi, who’s there? HI
179 What’s the weather like today? WEATHER
180 I’m sorry? I didn’t hear you. SORRY
181 I’m very sick. SICK
182 That’s right, I’m really really sick. RIGHT
183 I’m fine, thank you. THANK_YOU
184 I can hardly walk. I
185 I’m a diabetic, see? DIABETIC
NUMBERS Counting from 0 to 20, in ones. Counting from 30 to 90, in tens.
Appendix G: Emergency Scenarios
Notation Symbols (not in all scenarios):
1. Non-verbal comments, extra information, and simultaneous speech cues are italicized and in parentheses, e.g., {calls an ambulance}.
2. Words that are incomplete end with: -- , e.g., pineap--.
3. Pauses are indicated by: …, e.g., okay…bye.
4. When one speaker is cut off by the other speaker, the sentence ends with: //.
Scenario 1: Low Risk Accident
The Situation: Imagine you are Mrs. Smith, around 85 years old, settling down to relax in your favourite arm chair. While adjusting yourself, you accidentally push the personal emergency response button without knowing it. The emergency call taker comes on the speaker phone suddenly asking what’s wrong. You inform her that you don’t need any help and you are fine.
--- Scenario Start ----
E Hello Mrs Smith, this is Judy from AssistMe Canada, how can I help you?
C Pardon?
E Hello Mrs Smith?
C Yes, dear.
E Hi, this is Judy from AssistMe.
C Yes?
E Are you alright?
C Yes, everything’s just fine.
E Okay, is there anything that I can do for you?
C No, I just had my home care worker here and I’m all looked after, and a just feeling fine.
E Alright…well…you have a good day.
C Thank you very much.
E You’re welcome, bye.
--- Scenario End ----
Scenario 2: Low Risk Accident
The Situation: Imagine you are Mrs. Smith, around 70 years old. You have accidentally pushed your help button while opening a can of pickles in the kitchen. You are expecting AssistMe to respond so you can tell them it was an accident.
--- Scenario Start ----
E Ms. Smith, it’s Judy from AssistMe Canada, how may I help you?
C Yeah, thank you, I pushed it by mistake.
E Alright, have a good day.
C Alright, bye.
--- Scenario End ----
Scenario 3: Low Risk - Accident
The Situation: Imagine you are Mr. Smith (John), around 80 years old. You are sitting in your easy chair watching your favourite television show. Suddenly, a voice is heard from the telephone speaker and you are quite surprised. You realize you must have pressed your panic button by mistake during the show. You inform the Emergency Call Taker that everything is fine.
--- Scenario Start ----
E Hello Mr. Smith, this is Judy from AssistMe Canada, how may I help you?
{no response, TV sounds}
E Hello John, do you need any help?
C No, I don’t know what w— happen--.
E Okay, we got a signal from the button that you wear, so you may have pressed it accidentally.
C I don’t know what happened there.
E Alright, is there anything else we can do?
C No thank you, I’m just fine.
E Okay then, I’ll reset, have a good day.
C Thank you.
E You’re welcome, good-bye.
--- Scenario End ----
Scenario 4: Medium Risk – Fall
The Situation: Imagine you are Mrs. Smith (Jane), around 85 years old. You haven’t gotten much exercise lately and have been feeling weak and frail. You use a walker which you rely on heavily for support. This afternoon you were walking from the bedroom to the kitchen but somehow turned too quickly and tripped over the leg of your walker. You are not hurt but you are alone and cannot get to the phone. You’ve tried a few times to get up but you just don’t have the strength to pull yourself up. You push your help button. You want to ask for your son Fred to give you a hand to get up.
--- Scenario Start ----
C Hello?
E Hello, this is Judy from AssistMe Canada, is this Mrs. Smith?
C Yeah, this, this is Jane. {Frustrated} I've just fallen and I can't get up.
E Okay, I'm gonna…are you hurt?
C Ah, no, I’m not hurt.
E You're not hurt? Okay, I'm gonna call your responders to help you, alright?
C {slight pause} Pardon?
E I'll call your responders to get someone to help you.
C Yes ... {E starts to speak} okay.
E Okay, just a moment.
{Non verbal action: Call Taker calls the responder}
E Mrs. Smith?
C Yes?
E Yes, your son, Fred, is on his way to help you.
C Uh, okay.
E Okay?
C Yup.
E Alright then, bye for now.
--- Scenario End ----
Scenario 5: Medium Risk – Fall
The Situation: Imagine you are Mr. Smith (Frank), the 82 year old husband of Mrs. Smith, your 80 year old wife. One day you hear a big crash and you are dismayed to discover that Mrs. Smith has tripped over the dog and fallen down. Mrs. Smith thinks her shoulder is broken and you need help quickly. You press the help button and wait for the emergency call taker to respond.
--- Scenario Start ----
E Hello Mrs. Smith, this is Judy from AssistMe Canada, how may I help you?
C {urgent, concerned voice} Yes, it's her husband Frank, she has fallen, a…tripped over her d-- the dog, fallen on the floor, and she feels she has broken her shoulder.
E Oh, okay.
C Can we get some assistance?
E And what shoulder do you think she broke?
C Eh…it's the right shoulder.
E Is there any bleeding?
C Ah…I can't see any de--, I'll take a look.
E Okay.
{Caregiver checks for blood, can hear wife moaning from husband moving her around}
C No, there is none.
E Oh okay, we will call the ambulance, hold on.
--- Scenario End ----
Scenario 6: High Risk - Fall
The Situation: Imagine you are Mrs. Smith, a frail 85 year old woman who lives alone. You have some hearing loss in your right ear and depend a lot on a cane to help you around your house, otherwise you are just fine. This morning you are in the bathroom getting ready when you slip on some water on the floor and hit your head on the tub. Although dazed, luckily you are still conscious but you feel some blood on the back of your head. You manage to get up and onto a chair but you don’t have much energy and need some help. You push your help button and wait for the Emergency Call Taker to respond.
--- Scenario Start ----
E Hello Mrs. Smith, it's Judy calling from AssistMe Canada, how may I help you?
C {No response}
E Mrs Smith?
C {slow} Hello?
E Hi, how are you?
C I fell…could someone come to the house and help me?
E Are you hurt?
C Well…I'm bleeding.
E Where are you bleeding from?
C Come to the back, or come to front door, I'll have to turn off an alarm.
E Okay, where are you bleeding from?
C {no response}
E Mrs. Smith?
C Yes?
E Where are you bleeding from?
C 54 Bankok street, apartment 201.
E {Louder} No, no, where are you {emphasize} bleeding from?
C My head.
E Okay, one moment okay?
{Calls EMS}
E Mrs. Smith, the ambulance is on the way.
C Thank you…{E starts to speak} are they coming to the front door?
E You're welcome. Yes.
C Oh.
E Alright, so we'll call you back shortly but help is on the way.
C Alright, thank you.
E You're welcome.
--- Scenario End ----
Scenario 7: Medium Risk - Medical
The Situation: Imagine you are Jane, an 85 year old female who lives alone. You have some medical complications such as high blood pressure and are currently taking some medication, but are otherwise healthy. One day you start feeling nauseous and can’t stop throwing up. You are scared, weak, feel terrible and you want help quickly. You press your help button and request that the emergency responder calls your daughter to help you.
--- Scenario Start ----
C Hello?
E Hello, Jane, it's Judy from AssistMe Canada.
C This is Jane.
E Hello, how are you?
C {weak, shaky voice} Oh, I need help.
E What’s wrong?
C Oh I, I keep throwing up and going to the bathroom.
E You…you’re vomiting?
E How long has this been going on?
C {painful and drawn out} Oh, it just started now {E speaks as C mumbles another word that is incomprehensible}.
E Okay…okay, is there anyone there with you right now?
C No.
E Okay…okay so do you want me to call an ambulance for you or {C starts to speak} did you wan--//
C No, No, I just want you to call my daughter.
E Okay, do you know why you’re vomiting?
C No.
E No, you don’t know, okay, just one moment, I just want to see your daughter’s//
C Yeah, and get her to get m-- , ah…Claire to come over.
E Is, your daughter’s name…is Claire?
C No, her name is Tonya.
E Okay, you want me to call Tonya and so Tonya can get Claire to come over?
C Yes, {E starts speaking} please.
E Who's…who’s Claire?
C {sigh}.
E Is Claire your caregiver?
C One, one of the girls that {E starts speaking} works//
E One…okay, are you sitting down right now?
C Eh?
E Are you sitting down?
C Yes.
E Okay, so you’re nauseated and you’re vomiting?
C Yup.
E Alright, and…that’s it?
C {guttural noise}.
E Are you having any difficulty breathing as well?
C No {whimper}.
E No, {C mumbles during the next word} okay…okay just, one moment, I’m going to call Tonya to get Claire, okay?
C Yes, please.
{dialing for the responder}
E Okay, Jane?
C Hm?
E Okay, it’s Judy again from AssistMe, so I’ve spoken to your daughter she’s going to try and call, ah…the…I guess, the agency that Claire works for.
C Thank you.
E Okay, so, I am going to try to get someone to go over and stay with you until they come okay?
C {silence}.
E Do you need us to call you an ambulance?
C No.
E No ambulance, okay, so I’ll, I’ll try and get someone else to come over, okay?
C Thank you {mumbled words}.
E I’ll call you back.
--- Scenario End ----
Scenario 8: High Risk - Medical
The Situation: Imagine you are Mrs Smith, an 85 year old frail woman who lives alone. Today you’ve been feeling a bit off, the weather is very humid and in the afternoon you didn’t much feel like eating lunch. Now you begin to feel shaky and start to have more and more difficulty with breathing. You start to worry as you aren’t sure what’s happening. Maybe you have anxiety or your sugar levels are low or is it your blood pressure? You feel like you shouldn’t move too quickly or too much. You find a chair and push your help button. You ask the emergency call taker to get your brother, Jerry, to come over and help you.
--- Scenario Start ----
E Hello Mrs. Smith it's Bob calling from AssistMe Canada, how may I help you?
C {No response}
E Mrs. Smith?
C {shaky, breathing difficulty} Yes, I am here.
E Do you need any help?
C I need help.
E What's wrong?
C I'm, I, I'm, I'm all shaky.
E Okay
C and, and uhm//
E How is your breathing?
C It, my breathing is not, not, not too good.
E Not too good? Okay, do you have any chest pain?
C No.
E No, okay, would you like me to call the ambulance?
C Well, no, I, I, I must get, not yet I don't think.
E No?
C Well, I don't know.
E Who would you like me to call? Jerry?
C Yes, maybe he can help me//
E Call Jerry?
C Yup.
E Okay, hold on okay?
C Okay, {mumbles} thank you.
{Call taker calls Jerry}
E Mrs. Smith?
C Yes?
E Jerry is on his way.
C Okay.
E Okay, so we'll call you back in about fifteen minutes okay?
C Thank you very much.
E Okay, you're welcome.
--- Scenario End ----
Scenario 9: High Risk - Medical
The Situation: Imagine you are Mrs. Smith, a frail woman of 75. The weather has been extremely humid and hot and you’ve been hanging out inside your home where it’s cooler and less humid. Over the last two days you’ve started to have a harder time with breathing but you think it’s just the weather. Today you are feeling a bit weak and you find it increasingly more difficult to get your breath. You decide to press your help button. You might need some oxygen from the paramedics.
--- Scenario Start ----
E Hello Mrs. Smith, this is Bob from AssistMe Canada, how may I help you?
C Yes, I was wondering, could the paramedics come and see me?
E Okay, what’s wrong?
C I have not cl--, I don’t have tightness of chest or pain, but I’m having trou--{breathe hard} ble breathing.
E Okay, so, no pain in your chest?
C No pain, no, no, no tightness nothing.
E Okay, how long has it been going on?
C Oh, ah…well… I’d say mostly today.
E Okay, now have you changed colour or anything?
C Haven’t changed a thing.
E Okay, and are you sweaty, are you clammy at all?
C Very dizzy yesterday.
E Okay, alright, so I’m going to call them now, is your apartment door unlocked?
C Yes, um, do you want the outer one lo-- unlocked too?
E Um, let me just check to see if we have an entry code for you.
E No, they can get into the building, just make sure that your door is not locked.
C Well, that’ll be wonderful.
E Okay, alright, so you can do that, I’ll come back to you once they’re on the way okay?
C Okay.
E Alright.
--- Scenario End ----
Appendix H: Emergency Response Services Visits
On-Site Visit with Emergency Response Services
The on-site visits to emergency call responders were short, single-day events. The findings are all specific to the City of Toronto, where the offices and fire hall were located. The visit with the firefighters was the shortest and consisted of an informal
interview with the three firefighters that lasted less than one hour. The visit to the EMS dispatch
centre in Toronto lasted several hours and consisted of informal interviews with two EMS
dispatchers, observations to see the process of how the calls are received and dispatched, and
listening to incoming EMS calls with one EMS dispatcher. The visit to the local call centre lasted
several hours as well, and consisted of informal interviews with several call takers and the call
taker leader, observations to see the process of how the calls are handled and responded to, and
listening to incoming response calls with three different call takers.
Firefighters
Firefighters, in addition to paramedics, may be dispatched to a scene if it is unclear who will be able to reach the location first. Police are also dispatched for the same reason, and if there
is a possible dispute, accident involving vehicles or pedestrians, or other need for police services.
During the discussion, the firefighters mentioned that it is occasionally necessary to force entry
into a home or building by breaking down a door or window if there is no other apparent way to
enter and if a person in medical or emergency distress is presumed to be inside. In terms of older
adults, common call types may be for fires that occur in the kitchen, for example, because
something is burning on the stove. When they approach a scene, typically within a few seconds
they are able to tell if a person is responsive and breathing. This statement suggests that a firefighter’s primary concern on arrival is assessing the health status of the individual in question.
EMS Dispatchers
The EMS dispatchers follow a typical dialogue structure when receiving a call. Their basic
objectives are listed in order of importance:
1. Verify caller’s location;
2. Verify caller’s contact info;
3. Identify what is happening. The dispatcher’s initial concern is to determine if the person
experiencing the problem is conscious and breathing. Then, depending on what the caller
says, a set script is followed which suggests what the dispatcher should query next.
4. Categorize the call. Incoming calls are categorized according to the perceived level of
response required. This assessment may also help the dispatcher steer the communication or
dialogue in the appropriate direction based on the call category.
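The ordered objectives above can be illustrated with a small sketch. This is a toy model only: the function, its parameters, and the mapping of conditions to category labels are hypothetical assumptions for illustration, not the actual dispatch protocol or scripting software.

```python
# Illustrative sketch of the EMS dispatcher's ordered call-handling
# objectives. All names and the condition-to-category mapping are
# hypothetical; real dispatch protocols use detailed scripted queries.

def triage_call(location, contact, conscious, breathing):
    """Return (complete, category), following the objective order above."""
    # Objectives 1 and 2: verify the caller's location and contact info
    # before anything else; without them no response can be dispatched.
    if not location or not contact:
        return False, None
    # Objective 3: the initial concern is whether the person experiencing
    # the problem is conscious and breathing.
    # Objective 4: categorize the call by the perceived level of response
    # required (labels mirror Table H1; the mapping here is assumed).
    if not breathing:
        category = "Echo"    # assumed highest-acuity response
    elif not conscious:
        category = "Delta"
    else:
        category = "Alpha"   # assumed lower-acuity response
    return True, category
```

The assessed category could then steer the remainder of the dialogue, as the dispatchers described, by selecting which scripted questions to ask next.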
Table H1 provides an example of how the incoming emergency response calls are categorized.
This information was derived from informal discussions and is only provided as an example.
Specific details of what is considered within each call category would need to be re-verified.
Table H1: Emergency response call classifications based on the type of situation.
Call categories: Alpha, Bravo, Charlie, Delta, Echo.
Example response types:
- Send ambulance after designation (e.g., paramedic truck, ambulance with supplies)
- Send ambulance and fire truck (and police) (e.g., chest pain, breathing, pedestrian/cycle/motor accident, long fall, stab/gunshot)
- Send ambulance and police (e.g., unconscious, not breathing)
The EMS dispatcher’s tools consist of a headset with microphone and two computer monitors
with keyboard and mouse for data entry. For every incoming call, the dispatcher must be very
alert, and must multi-task. He/she is looking to see if information on the phone number is
available, as well as the caller’s location. Details of the incoming call situation are entered to provide information for the emergency responders and for logging; the dialogue script is also followed; and the call is classified. EMS dispatchers must make decisions on the fly very quickly
and try to respond to calls and dispatch assistance in a minimum amount of time.
Personal Emergency Response Call Centre Call Takers
The personal emergency response call taker’s setup is similar to the EMS dispatcher’s in that a headset with microphone is worn and the call taker sits in front of a computer monitor where
information is received and entered during the call. The call taker follows a basic dialogue script
for their opening utterance (described in the literature review in Chapter 1), and general
guidelines for the remainder of the dialogue in which their goal is to determine what kind of
response is being requested. The call centre protocol provides call takers with basic information
on what details to request in order to inform EMS. Also, the use of the protocol ensures a
minimum level of call standardization for both quality control and company liability. During the
on-site visit, several response calls were observed over part of the day; the vast majority of the calls were non-emergent (e.g., routine “check-in” calls). This is consistent with other literature reporting that the majority of such calls are not emergency calls (Hamill et al., 2009). Like the EMS dispatcher, the call taker is also required to multi-task
during the call and must remain alert. They have access to a client’s medical history (whatever
was provided by the subscriber), as well as information on possible non-EMS responders. While
a call is in progress, this information must be read and processed by the call taker, important
details about the call must be logged, and call takers may reference the dialogue guidelines while
also listening to the caller and making decisions on how to respond to the call itself.
Appendix I: Summary of Peer Reviewed Journal Papers
Young, V., & Mihailidis, A. (2013). The CARES Corpus: A database of older adult actor simulated emergency dialogue for developing a personal emergency response system. International Journal of Speech Technology, 16:55-73. (*work outlined in Chapter 4 of this dissertation)
Young, V., & Mihailidis, A. (2010). Difficulties in Automatic Speech Recognition of dysarthric speakers and the implications for speech-based applications used by the elderly: A literature review. Assistive Technology Journal, 22:99-112. (*review paper resulting from comprehensive exam paper)
Hamill, M., Young, V., Boger, J., & Mihailidis, A. (2009). Development of an automated speech recognition interface for personal emergency response systems. Journal of NeuroEngineering and Rehabilitation, 6(26). (*assisted in paper review, added background information, and assisted with final revisions)