
On the Analysis and Classification of Sleep Stages

from Cardiorespiratory Activity

DISSERTATION

submitted to obtain the degree of doctor at Eindhoven University of Technology, on the authority of the rector magnificus prof.dr.ir. F.P.T. Baaijens, to be defended in public before a committee appointed by the Doctorate Board on Tuesday 30 June 2015 at 16:00

by

Xi Long

born in Ganzhou, China

This dissertation has been approved by the promotors, and the composition of the doctoral committee is as follows:

chairman: prof.dr.ir. P.H.N. de With

1st promotor: prof.dr. R.M. Aarts

2nd promotor: prof.dr. J.B.A.M. Arends

copromotor: dr.ir. R. Haakma (Philips Research)

members: prof.dr. P. Markopoulos

Prof.Dr.-Ing.Dr.med. S. Leonhardt (RWTH Aachen University)

prof.dr.ir. S. Van Huffel (KU Leuven)


On the Analysis and Classification of Sleep Stages from Cardiorespiratory Activity / by Xi Long – Eindhoven : Eindhoven University of Technology, 2015.

A catalogue record is available from the Eindhoven University of Technology Library.

Proefschrift. – ISBN : 978-90-386-3850-8.

The research presented in this thesis was supported by Philips Group Innovation – Research,

Eindhoven, The Netherlands.

Cover Design : Ya Shu, Eindhoven, The Netherlands.

Reproduction : Eindhoven University of Technology.

Copyright © 2015, Xi Long

All rights reserved. Copyright of the individual chapters belongs to the publisher of the journal

listed at the beginning of each respective chapter. No part of this publication may be reproduced,

stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical,

photocopying, recording or otherwise, without the prior written permission from the copyright

owner.

To my beloved parents

to the memory of my wonderful youth

and to my country, China

Summary

Sleep is a state of reversible disconnection from the environment and plays an essential role in maintaining internal homeostasis, memory consolidation, energy conservation, and cognitive and behavioral performance. Sleep problems are now widely prevalent around the world, and sleep complaints are increasing. Historically, such problems were less common because the regulation of sleep was synchronized with the external environment through a biological circadian rhythm. However, in a modern industrialized society with artificial environments where lighting, heat, and food are available at any moment, sleep disturbances and disorders have reached epidemic levels. People experience the symptoms of disturbed sleep, such as fatigue, increased impulsiveness, and agitation, without being aware of the link between these issues and their sleeping patterns.

To stay healthy in body and mind, people should be able to monitor their sleep easily and without disturbing it, to assess sleep quality or sleep-related problems, and to adjust their sleep habits accordingly. However, the traditional sleep monitoring method, polysomnography (PSG), is usually performed in a sleep laboratory with costly facilities and requires many sleep-disturbing devices with electrodes and wires to be attached to the body. Furthermore, the measurements of such devices can only be interpreted by highly trained sleep clinicians. Therefore, although PSG is currently considered the gold standard and common practice for sleep monitoring, it is unsuitable for daily use at home by people without specialized training, and it introduces undesired sleep disturbances. This has motivated the investigation of alternative sensors and methods that allow sleep to be monitored unobtrusively, preferably at low cost and without requiring training.

Objective sleep assessment is often based on monitoring sleep stages throughout the night. In the past decades, cardiorespiratory signals have attracted increasing attention in the context of sleep staging, or sleep stage classification. Cardiorespiratory activity has been shown to be associated with sleep stages through the regulation of the autonomic nervous system. More importantly, cardiorespiratory signals can be acquired unobtrusively using advanced technologies such as microwave Doppler radar, ballistocardiography, photoplethysmography, pressure-sensitive bed sheets, acoustic devices, and near-infrared cameras. Investigating cardiac and respiratory characteristics in different sleep stages is therefore important for reliable sleep stage classification, which in turn enables a more adequate sleep assessment.

This thesis first explores characteristics of cardiac and respiratory activity, and of their interaction, during sleep using several signal analysis methods: frequency band adaptation on heart rate variability (Chapter 2), dynamic time/frequency warping for respiration (Chapter 3), analysis of breathing depth and volume (Chapter 4), uniform scaling for measuring the self-dissimilarity of respiration (Chapter 5), and visibility graph analysis in complex networks for cardiorespiratory interaction (Chapter 6). Based on these methods, novel cardiorespiratory features expressing certain physiological properties are proposed to classify sleep stages. Results show that these features help to substantially improve the performance of sleep stage classification.

In addition, Chapter 7 demonstrates that there is a time delay between changes in brain activity and autonomic variations during sleep transitions: cardiac changes appear to consistently precede the variations in brain activity during light-deep sleep and sleep-wake transitions. In Chapter 8, this finding is used to detect deep sleep (i.e., slow wave sleep) by using feature values from a preceding time interval, a few minutes earlier, which significantly improves the detection results. Furthermore, the major challenge of sleep stage classification based on cardiorespiratory activity is discussed in Chapter 9. Classification performance is found to be limited mainly by between- and within-subject variations in autonomic physiology as well as by subject demographics. Therefore, Chapter 10 proposes feature normalization and feature smoothing over the entire night, which serve to reduce the between- and within-subject variations observed in the cardiorespiratory features. As a result, marked improvements in sleep stage classification are observed.

In summary, this thesis focuses on objectively analyzing and classifying sleep stages using cardiorespiratory signals. It shows that sleep stage classifiers can be substantially improved, and reliable results ultimately achieved, by extracting novel features from the signals, post-processing the features using normalization and smoothing, and applying new findings regarding the autonomic-brain time delay.

Dutch summary (Nederlandse samenvatting)

Sleep is a reversible state of disconnection from the environment and plays an exceptionally important role in maintaining internal homeostasis, memory consolidation, energy conservation, cognitive performance, and behavior. Nowadays, sleep problems are increasingly common all over the world. This was not a problem in the past, because the regulation of sleep was always well synchronized with the environment through a biological circadian rhythm; but since we live in a modern industrialized society with artificial environments in which light, heat, and food are available at any moment, sleep disturbance and sleep problems are reaching epidemic levels. People experience the symptoms of disturbed sleep, such as fatigue, increased impulsiveness, and agitation, without relating them to their sleep patterns.

To be in good mental and physical condition, people should be able to assess their sleep quality or sleep problems in a simple way, without disturbing their sleep, and to adjust their sleep habits accordingly. Conventional sleep recording methods, known as polysomnography (PSG), are applied in a sleep laboratory with expensive facilities and involve many sleep-disturbing measurements with wired electrodes attached to the body, the results of which can moreover only be interpreted by highly trained sleep technicians. Although PSG is now the gold standard and common practice for sleep recording, it is unsuitable for daily home use by people without special training and it introduces undesired sleep disturbances. This has motivated research into alternative sensors and methods that make measurement possible without these problems, preferably inexpensively and without requiring special training.

Objective assessment of sleep parameters is often based on recording sleep states throughout the night. In recent decades, cardiorespiratory signals have received more and more attention in determining sleep stages and classifying sleep. Cardiorespiratory activity has been shown to be related to sleep stages through the regulation of the autonomic nervous system and, more importantly, the signals can be obtained unobtrusively using advanced techniques such as microwave Doppler radar, ballistocardiography, photoplethysmography, pressure-sensitive bed sheets, and near-infrared cameras. Research into the cardiorespiratory characteristics of different sleep stages is therefore important for obtaining a reliable sleep classification, which enables a better sleep assessment.

This thesis exploits the properties of cardiorespiratory activity and their interaction during sleep, using signal analysis methods. These are: adaptive filters for heart rate (Chapter 2), dynamic time/frequency warping for respiration (Chapter 3), analysis of breathing depth and volume (Chapter 4), uniform scaling for measuring the self-dissimilarity of respiration (Chapter 5), and visibility graph analysis of complex networks for cardiorespiratory interaction (Chapter 6). Using these methods, new cardiac and respiratory features (representing physiological properties) are proposed for sleep stage classification. The results show that these features can thoroughly improve sleep stage classification.

In addition, an interesting phenomenon is demonstrated in Chapter 7, concerning the time delay between the activity of the brain and that of the autonomic nervous system during sleep stage transitions. It appears that cardiac changes consistently precede the variation in brain activity during transitions between light and deep sleep and during transitions between sleep and wake. In Chapter 8 we apply this phenomenon to detect deep sleep, which yields significantly improved detection results. The main problem of sleep stage classification with cardiorespiratory activity is addressed in Chapter 9. It turns out that the classification results are mainly limited by the variation between and within subjects, both in autonomic physiology and in subject demographics. Chapter 10 therefore proposes methods for normalizing and smoothing features over the entire night, which serve to reduce these variations in cardiorespiratory activity. This results in marked improvements in sleep stage classification.

In summary, this thesis focuses on objectively analyzing and classifying sleep stages using cardiorespiratory signals. It shows that by deriving new features from the signals, further processing these features by means of normalization and smoothing, and applying new findings concerning the time delay between the activity of the brain and that of the autonomic nervous system, sleep stage classification can be substantially improved and reliable results can ultimately be achieved.

List of abbreviations

AASM American Academy of Sleep Medicine

AIC Akaike’s information criterion

ANA Autonomic nervous activity

ANN Artificial neural network

ANOVA Analysis of variance

ANS Autonomic nervous system

AR Autoregressive

ASMD Absolute standardized mean difference

AUC Area under the curve

BB Breath-to-breath interval

BCG Ballistocardiography

BM Body movement

BMI Body mass index

BR Breathing rate

BS Baseline

CFS Correlation-based feature selection

CRI Cardiorespiratory interaction

CS Correction scheme

CV Cross validation

D Deep sleep (slow wave sleep)

DFA Detrended fluctuation analysis

DFT Discrete Fourier transform

DFW Dynamic frequency warping

DS Deep sleep (slow wave sleep)

DTW Dynamic time warping

DVG Difference visibility graph

DW Dynamic warping

ECG Electrocardiography

EEG Electroencephalography


EMG Electromyography

EOG Electrooculography

FFT Fast Fourier transform

FN False negative

FP False positive

HF High frequency

HHT Hilbert-Huang transform

HMM Hidden Markov model

HR Heart rate

HRV Heart rate variability

ICC Intra-group correlation coefficient

IG Information gain

IGLS Iterated generalized least squares

IQR Interquartile range

KNN K-nearest neighbor

L Light sleep

LD Linear discriminant

LF Low frequency

LOSOCV Leave-one-subject-out cross-validation

LS Light sleep

LSA Least squares approximation

N1 Stage 1 NREM sleep

N2 Stage 2 NREM sleep

N3 Stage 3 NREM sleep (slow wave sleep, stage S3 and S4)

NB Naive Bayes

NREM Non-rapid-eye-movement

PAT Peripheral arterial tone

PDFA Progressive DFA

pNN50 Percentage of successive RR differences >50 ms

PPG Photoplethysmography

PR Precision-recall

PSD Power spectral density

PSG Polysomnography

PSQI Pittsburgh Sleep Quality Index

PSSA Pressure-sensitive sensor array

PVE Proportion of variance explained

Q-Q Quantile-Quantile

QD Quadratic discriminant

QRS Three successive extrema in the ECG

OSA Obstructive sleep apnea

R REM sleep


R&K Rechtschaffen and Kales

rANOVA Repeated measures ANOVA

RE Respiration (sometimes respiratory effort)

REM Rapid-eye-movement

RF Random forest

RIP Respiratory inductance plethysmography

rMANOVA Repeated measures multivariate ANOVA

RMSSD Root mean square of successive RR differences

ROC Receiver operating characteristic

RR Two successive heartbeats

RS REM sleep

S1 Stage 1 NREM sleep




SampEn Sample entropy

SaO2 Oxygen saturation

SCSB Static-charge-sensitive bed

SD Standard deviation

SDBR Standard deviation of breathing rates

SDNN Standard deviation of heartbeat intervals

SDSD Standard deviation of successive RR differences

SE Sleep efficiency (Chapter 2)

SE Sample entropy (Chapter 4 and 8)

SE Standard error (Chapter 9)

SFS Sequential forward selection

SOL Sleep onset latency

SSA Self-Assessment of Sleep and Awakening Quality Scale

ST Snooze time

STFT Short-time Fourier transform

SVM Support vector machines

SWS Slow wave sleep

TH Thresholding

TN True negative

TP True positive

TRT Total recording time

TST Total sleep time

TVPP Time-varying prior probability

TWT Total wake time

VG Visibility graph

VLF Very low frequency


W Wake

WASO Wake after sleep onset

WDFA Windowed DFA

WRLD Wake, REM sleep, light sleep, and deep sleep

WRN Wake, REM sleep, and NREM sleep

Contents

Summary

Nederlandse samenvatting

List of abbreviations

1 General introduction

1.1 Human sleep

1.2 Sleep stages in electrophysiology

1.3 Polysomnography – standard for sleep assessment

1.4 Automatic sleep monitoring

1.4.1 PSG-based sleep stage classification

1.4.2 Cardiorespiratory-based sleep stage classification

1.5 Research question and objective

1.6 Outline of the thesis

Part I: Signal Analysis for Sleep Stage Classification

2 Spectral boundary adaptation on heart rate variability for sleep and wake classification

2.1 Introduction

2.2 Materials and methods

2.2.1 Data acquisition

2.2.2 PSD estimation

2.2.3 Boundary adaptation

2.2.4 Feature extraction

2.2.5 Feature evaluation

2.2.6 Sleep and wake classification

2.2.7 Classifier evaluation

2.3 Results

2.4 Discussion

2.4.1 Discriminative power

2.4.2 Classification

2.4.3 Healthy subjects versus insomniacs

2.4.4 Determination of adaptive boundaries

2.5 Conclusion

3 Sleep and wake classification with actigraphy and respiratory effort using dynamic warping

3.1 Introduction

3.2 Subjects and data

3.3 Methods

3.3.1 Dynamic warping algorithm

3.3.2 Sleep and wake classification

3.3.3 Experiments and evaluation

3.4 Results

3.5 Discussion

3.6 Conclusion

4 Analysis of respiratory effort amplitude for sleep stage classification

4.1 Introduction

4.2.1 Subjects and data

4.2.2 Signal preprocessing

4.2.3 Existing respiratory features

4.2.4 Respiratory amplitude features

4.2.5 Subject-specific feature normalization

4.2.6 Classifier

4.2.7 Experiments and evaluation

4.3 Results

4.4 Discussion

4.5 Conclusion

5 Measuring dissimilarity between respiratory effort signals based on uniform scaling for sleep staging

5.1 Introduction

5.2.1 Subjects and protocol

5.2.2 Polysomnographic measurements

5.2.3 Signal processing

5.2.4 Dissimilarity measure with uniform scaling

5.2.5 Windowed dissimilarity feature

5.2.6 Feature analysis

5.2.7 Sleep staging

5.3 Results

5.4 Discussion

5.5 Conclusion

6 Modeling cardiorespiratory interaction during sleep with complex networks

6.1 Introduction

6.2 Cardiorespiratory interaction

6.3 Visibility graph network

6.4 Network properties of cardiorespiratory interaction

6.4.1 Degree distribution

6.4.2 Assortativity mixing

6.5 Conclusion

Part II: Timing Between Autonomic and Brain Activity

7 Time delay between cardiac and brain activity during sleep transitions

7.1 Introduction

7.2.1 Subjects and recordings

7.2.2 EEG and cardiac activity

7.3 Correlation-analysis during sleep transitions

7.4 Results and discussion

7.5 Conclusion

8 Detection of nocturnal slow wave sleep based on cardiorespiratory activity

8.1 Introduction

8.3 Methods

8.3.1 Signal preprocessing

8.3.3 Spline fitting for feature smoothing

8.3.4 Feature subset selection

8.3.5 Classifier

8.3.6 Time delay

8.4 Experiments and evaluation

8.5 Results

8.6 Discussion

8.7 Conclusion

Part III: Cardiorespiratory-Based Sleep Stage Classification

9 Effects of between- and within-subject variability on autonomic cardiorespiratory activity during sleep and their limitations on sleep staging: a multilevel analysis

9.1 Introduction

9.2.1 Subjects and protocol

9.2.2 PSG recordings

9.2.3 Data preparation

9.2.4 Cardiorespiratory parameters

9.2.5 Descriptive statistics

9.2.6 Multilevel analysis

9.2.7 Explanations of variance

9.2.8 Between- and within-subject effects in sleep staging

9.3 Results

9.3.1 Descriptive results

9.3.2 Multilevel modeling

9.3.3 Proportion of variance explained

9.3.4 Sleep staging results

9.4 Discussion

9.5 Conclusion

9.A Appendix

10 Sleep stage classification with ECG and respiratory effort

10.1 Introduction

10.2 Materials and Methods

10.2.1 Data sets

10.2.3 Classifier

10.2.4 Feature selection

10.2.5 Validation and evaluation

10.3 Results and discussion

10.3.1 Feature selection

10.3.2 Cross-validation

10.3.3 Comparison with state-of-the-art

10.4 Conclusion

11 General discussion and future perspectives

11.1 Analysis of features

11.2 Sleep stage classification

11.3 Classifier

11.4 Subject/patient groups

11.5 Objective and subjective sleep assessments

11.6 Towards unobtrusive sleep monitoring

List of the author’s publications

Acknowledgements

About the author

CHAPTER 1

General introduction

“Everything is one; during sleep the soul, undistracted, is absorbed into the unity;

when awake, distracted, it sees the different beings.”

— Chuang Tzu, 300 B.C., Warring States period¹

¹ Translated by M. Palmer, The Book of Chuang Tzu, 1st ed., Penguin Classics, 2006.


1.1 Human sleep

Sleep occupies approximately one-third of our lifetime and is very important for maintaining health and wellbeing, homeostasis, memory, and cognitive and behavioral performance [48, 165, 285]. Sleep exerts significant effects on systemic hemodynamics, cardiac function, endothelial function, and coagulation [311]. Sleep deprivation can lead to loss of daytime performance, disturbance of the circadian rhythm, impairments such as mental or physical fatigue, weakened immune function, reduced cognitive functioning, and other health risks [10, 15, 45, 80]. In epidemiology and pathophysiology, sleep disorders or abnormalities have been linked to depression, diabetes, metabolic syndrome, sudden death, and cardiovascular diseases such as cardiac arrhythmias, hypertension, atherosclerosis, stroke, and heart failure [113, 252, 282, 287, 311].

Human sleep is a complex biological process with its own internal architecture, expressed by sleep states or stages [63, 281]. Sleep states usually include nighttime wakefulness, rapid-eye-movement (REM) sleep, and non-REM (NREM) sleep, where NREM sleep can be further divided into stages S1-S4 according to the rules recommended by Rechtschaffen and Kales (R&K) [247]. In the more recent guidelines of the American Academy of Sleep Medicine (AASM) [136], S3 and S4 are merged into a single stage, called slow wave sleep (SWS) or “deep” sleep, since no essential difference was found between them. S1 and S2 usually correspond to “light” sleep [276]. Note that, for simplicity, sleep states and stages are both referred to as “sleep stages” in this thesis.
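The stage groupings described above can be written down as a small lookup table. The following is a minimal sketch, not the implementation used in this thesis: the label strings ("W", "REM", "S1" to "S4") and the function name `simplify_hypnogram` are assumptions chosen for illustration, while the single-letter class names follow the WRLD and WRN abbreviations listed earlier.

```python
# Hypothetical R&K label strings mapped to wake (W), REM (R),
# light sleep (L), and deep sleep (D), following the grouping in the text.
RK_TO_WRLD = {
    "W": "W",    # nighttime wakefulness
    "REM": "R",  # REM sleep
    "S1": "L",   # light sleep
    "S2": "L",
    "S3": "D",   # deep sleep (slow wave sleep)
    "S4": "D",
}

# Light and deep sleep together form NREM sleep (N).
WRLD_TO_WRN = {"W": "W", "R": "R", "L": "N", "D": "N"}


def simplify_hypnogram(rk_labels, scheme="WRLD"):
    """Map a sequence of R&K epoch labels to W/R/L/D or W/R/N classes."""
    if scheme not in ("WRLD", "WRN"):
        raise ValueError("scheme must be 'WRLD' or 'WRN'")
    wrld = [RK_TO_WRLD[label] for label in rk_labels]
    if scheme == "WRN":
        return [WRLD_TO_WRN[label] for label in wrld]
    return wrld
```

For example, `simplify_hypnogram(["S1", "S4", "REM", "W"], scheme="WRN")` collapses both NREM epochs into the single class N.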

1.2 Sleep stages in electrophysiology

For normal or healthy subjects, sleep progresses with about four NREM-REM sleep cycles per

night, where each cycle lasts around 90-110 min on average, starting with light sleep followed

by deep sleep before REM sleep (Figure 1.1) [63, 216, 243]. Electrophysiological interpre-

tation of sleep stage changes during a sleep cycle can be described as follows. During sleep

onset (usually from wake to S1 sleep), the changes in electroencephalography (EEG) from

clear rhythmic alpha waves (8-13 Hz) to a mixed frequency pattern with less alpha waves but

more theta waves (4-7 Hz), accompanying a gradual decrease of muscle activity that can be

observed in electromyography (EMG) as well as slow and asynchronous eye movements in

electrooculography (EOG) [63]. Many studies argued that the acknowledgement of sleep onset

should require the presence of S2. This is because the transition from wake to S1 may not

coincide with perceived sleep onset and it often occurs several times, which is considered as

‘unequivocal sleep’ associated with a low arousal threshold where subjects often report they

are still awake [5, 82]. During S2 sleep, K-complexes or sleep spindles appear along with the

incremental presence of high-amplitude and low-frequency activity as S2 progresses [243]. Af-

terwards, sleep enters SWS (S3 and then S4), during which high-voltage (≥75 µV) and slow

(delta) wave activity (0.5-2 Hz) accounts for at least 20% (S3) or 50% (S4) of the EEG ac-

tivity with no eye movements [17, 136, 247]. SWS represents the most restorative period of

[Figure 1.1 plot: hypnogram with sleep stages (Wake, REM, S1, S2, SWS) on the vertical axis and time (0-8 h) on the horizontal axis]

Figure 1.1: An example of the sleep stage progression throughout an entire night hypnogram from a

healthy adult.

sleep for metabolic functioning associated with sleep quality [10, 91], where brain and body

energy can be efficiently conserved and recovered [34, 35] and new memories are consolidated

[61, 285, 302]. During SWS, the field potentials in EEG oscillations are related to synchronized

patterns of burst-pause firing in cortical neurons [92, 284]. REM sleep is characterized by bursts

of rapid eye movements, muscle atonia, low-voltage brain waves, and irregular heartbeats and

breathing, and it is the state in which dreaming often takes place [88].

1.3 Polysomnography – standard for sleep assessment

In clinical practice, overnight polysomnography (PSG) is currently regarded as the “gold standard”

for objective assessment of sleep architecture/pattern and occurrence of sleep-related disorders

such as insomnia, parasomnia, sleep-disordered breathing (apnea and hypopnea), and REM

sleep behavior disorder [136, 168, 247]. It is usually recorded in a sleep laboratory (Figure 1.2).

PSG comprises multi-channel biological signals including EEG, EMG, EOG, electrocardiog-

raphy (ECG), airflow, respiratory effort (chest and abdomen), blood oxygen saturation (SaO2),

etc. Figure 1.3 shows an example of a PSG recording (20 min) of a healthy adult. According

to the R&K rules [247] or the AASM guidelines [136] (or the revised version [39]), overnight

sleep stages are typically scored by trained sleep technicians on continuous 30-s epochs through

visually inspecting the EEG, EMG, and EOG channels in PSG, forming a hypnogram throughout

the entire night.

1.4 Automatic sleep monitoring

1.4.1 PSG-based sleep stage classification

As stated, objective assessment of sleep is often based on analyzing overnight sleep architecture

so that identifying sleep stages is required. Since 1968, manual scoring with PSG recordings

has become the gold standard in clinical environment for identifying sleep stages [247], where


Figure 1.2: A sleep laboratory where a subject was being monitored with PSG and a sleep technician

was visually inspecting the recorded PSG (adapted from source: www.newscenter.philips.com).

rules and regulations had been used for more than 40 years. In the 21st century, the AASM guidelines

[136] and their updated version [39] were released in 2007 and 2012, respectively, which

can yield an increased inter-rater agreement when scoring sleep stages. However, visual scoring

is very tedious and time consuming. This has resulted in a large number of studies (since

the 1970s) investigating computer-assisted automatic sleep staging systems with PSG

channels including EEG, EMG, and/or EOG [40, 104, 114, 115, 173, 223, 300] where reliable

classification results have been achieved. Further, some validated computer-assisted sleep scoring

systems have been applied in clinical routine such as the Somnolyzer developed by the

SIESTA group [19, 20].

1.4.2 Cardiorespiratory-based sleep stage classification

Although PSG is the gold standard and common practice for objective sleep assessment and

sleep stage classification based on PSG can be automated, it has several disadvantages from

a healthcare perspective. For example, it requires costly equipment and facilities in a sleep

laboratory, it disrupts ‘normal’ sleep, and it is unsuitable for long-term sleep monitoring in

a home environment. This has motivated the investigation of signals and sensors that allow

for reliable physiological measurements during sleep in an unobtrusive and convenient man-

ner. Figure 1.4 shows an obtrusive (with PSG) and an expected unobtrusive scenario for sleep

monitoring. In this context, alternatives such as body movements and cardiorespiratory activity

have attracted more and more attention in the past years, mainly because they can be easily


[Figure 1.3 plot: 20-min multi-channel PSG traces; the top channel is the hypnogram with per-epoch labels (S1-S4, REM, Wake)]

Figure 1.3: An example of a continuous PSG recording (20 min) with multiple channels of bio-signals

from a healthy adult. The channels from top to bottom are: hypnogram, EEG (Fp1-A2), EEG (C3-A2),

EEG (O1-A2), EEG (Fp2-A1), EEG (C4-A1), EEG (M2-M1), EOG (P8-A1 left), EOG (P18-A1 right),

EMG (mental), EMG (leg), ECG (chest), airflow, respiratory effort (chest wall movements), respiratory

effort (abdominal wall movements), and SaO2.

acquired using less-obtrusive or even non-contact sensors with minimal discomfort to subjects

along with the fast development of wearable/off-body unobtrusive sensing techniques.

Body movements can be measured with several methods. For example, actigraphy is a

well-known less-obtrusive way of measuring body movements using an accelerometer,

typically worn on the wrist. It has been extensively studied [18, 74, 126, 204, 234] and

is regarded as a standard method for sleep assessment when PSG is not available [204]. There

are many commercialized actigraphy-based products to monitor sleep. For example, Philips

developed an Actiwatch [229] to measure activity counts during sleep, which is a clinically

validated device. Recently, Fitbit [105] and Jawbone [143] also released their wearable prod-

ucts that can quantify body movements for sleep monitoring. Some studies proposed to use an

‘off-the-shelf’ smartphone by placing it in the bed or close to the pillow to capture body move-

ments during sleep and satisfactory results were obtained in computing some sleep statistics

compared with actigraphy [33, 208]. Unlike PSG, actigraphy can only be used to identify

sleep and wake periods rather than different sleep stages. This is because it only measures



Figure 1.4: Scenario of (a) an obtrusive (with PSG) and (b) a conceptual unobtrusive sleep monitoring

(adapted from source: www.newscenter.philips.com).

physical movements of the body, reflecting limited internal physiological information [33, 258].

Researchers argued that, even for distinguishing between sleep and wake states, actigraphy is still

prone to errors when compared with PSG [33, 295, 310]. For example, it cannot correctly

identify ‘quiet wake’ periods with low or no body activity, leading to a low accuracy

in detecting wakefulness [18, 87, 234], in particular for subjects with insomnia, jet lag, or shift

work [175, 220]. To obtain a better performance in identifying sleep/wake and to achieve clas-

sification of multiple sleep stages, additional physiological information is required. Figure 1.5

compares the overnight sleep stages with the corresponding actigraphy measured by a Philips

Actiwatch from a healthy subject [see Figure 1.5(a) and Figure 1.5(b)]. It indicates that the

activity count in actigraphy is correlated only with sleep and wake states, not with different sleep stages.

Actigraphy alone is therefore inadequate for classifying multiple sleep stages.

Cardiorespiratory activity differs across sleep stages due to the substantial

differences in manifestation or regulation of the autonomic nervous system (ANS), including sympathetic

activity and parasympathetic (or vagal) activity [13, 226, 267, 281, 292]. These two branches mostly

have ‘opposite’ actions where one activates a response in physiology while the other suppresses

it [231]. In regard to cardiac activity, for example, heart rate (HR) and standard deviation of

normal-to-normal heartbeat/interbeat intervals (SDNN) are associated with sympathetic activ-

ity, the spectral power in the high-frequency band between 0.15 and 0.4 Hz is a marker of

parasympathetic nervous modulation activated by respiratory-stimulated stretch receptors, and

the spectral power in the low-frequency band between 0.04 and 0.15 Hz is assumed to indicate

sympathetic tone [12, 24, 265, 288]. All these non-invasively measured characteristics have

been experimentally shown to differ across sleep stages. In addition, respiratory dynamics,

such as respiratory frequency (breathing rate, BR) [95], respiratory variability [256], and res-

piratory regularity [67, 129], have also been proven to vary over sleep stages. This means that

cardiac and respiratory activity can in turn be used to separate sleep stages, which is of significant

clinical relevance. As displayed in Figure 1.5, the hypnogram with full sleep stages seems

more correlated to the variations of BR and HR in comparison with actigraphy. Furthermore,


[Figure 1.5 plots: four panels over time in 30-s epochs (0-800): (a) hypnogram (Wake, REM, S1, S2, SWS), (b) activity count (a.u.), (c) SDNN (a.u.), (d) SDBR (a.u.)]

Figure 1.5: Comparison between (a) a hypnogram, (b) an actigraphy measured by Philips Actiwatch,

(c) standard deviation of normal-to-normal heartbeat intervals (SDNN), and (d) standard deviation of

breathing rates (SDBR) from a healthy adult.

the coupling or interaction between cardiac and respiratory signals has also been demonstrated

to change over sleep stages in previous work [29, 30, 41]. For example, SWS corresponds to

an enhanced phase synchronization between heartbeats and respiration [30].
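
As a small illustration of two of the cardiac measures named above (HR and SDNN), the following Python sketch computes them for one window of normal-to-normal intervals. The function name `hr_and_sdnn`, the use of the sample standard deviation, and the example intervals are assumptions made for illustration only; the thesis does not specify an implementation here.

```python
import numpy as np

def hr_and_sdnn(rr_s):
    """Return mean heart rate (beats/min) and SDNN (s) for one window.

    rr_s: normal-to-normal interbeat intervals in seconds (hypothetical input).
    """
    rr = np.asarray(rr_s, dtype=float)
    mean_hr = 60.0 / rr.mean()   # beats per minute from the mean interval
    sdnn = rr.std(ddof=1)        # sample standard deviation (assumed convention)
    return mean_hr, sdnn

# Made-up intervals around 1 s, i.e. roughly 60 beats/min.
hr, sdnn = hr_and_sdnn([1.00, 0.95, 1.05, 1.00, 0.98, 1.02])
```

Computing such values per 30-s epoch (or over a longer window centered on each epoch) would yield time series comparable in spirit to panel (c) of Figure 1.5.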

In the past decade, researchers have devoted considerable effort to exploring new sensors or approaches to

acquire cardiac and/or respiratory signals, which can eventually be applied for sleep analysis.

Instead of the traditional Holter system, wearable textile electrodes were developed for record-


Table 1.1: Summary of some unobtrusive or less-obtrusive approaches for measuring cardiac and/or respiratory activity

Approach           Activity    Placement                              References (examples)
Textile electrode  ECG/RE      On-body patch or T-shirt               [93, 176, 199, 221, 316]
PAT                HR          Wrist                                  [28]
PPG                HR and RE   Wrist                                  [174, 318]
BCG                HR/RE       Mattress, pillow, load cells, or bed   [66, 68, 161, 200, 218]
PSSA               RE          Bedsheet                               [141, 264]
Web-camera         HR          In front of face                       [232]
Infrared-camera    RE          Bedside table                          [128]
Microphone         RE          Off-body                               [52, 228]
Radar              RE          Off-body                               [85, 319]

PAT: peripheral arterial tone; PPG: photoplethysmography; BCG: ballistocardiography; PSSA: pressure-sensitive sensor array; HR: heart rate (pulse rate for PAT and PPG); RE: respiration.

ing ECG signals [93, 176, 199, 221, 316]. Bar et al. [28] proposed a WatchPAT ambulatory

system to obtain a peripheral arterial tone (PAT) signal from which HR or heartbeat intervals can

be derived. More recently, photoplethysmography (PPG) has become more widely used; its

sensor is placed at the skin surface to detect blood volume changes in the microvascular

bed of tissue [16]. From PPG, HR and respiration can be reliably estimated [174, 318].

Several PPG-based watches are available in the market including Adidas miCoach [2], Mio

Alpha [201], TomTom Runner Cardio [291], etc. Ballistocardiography (BCG), collected with

piezo-electric sensors for example, has also received growing recognition because it can

be acquired non-invasively and it contains physiological activity of HR and even respiration

[7, 189]. It has been increasingly employed to monitor sleep when integrated into a mattress

[218], load cells [68], (underneath) a pillow [66], or a bed [161, 200]. Furthermore, a textile

bedsheet with a pressure-sensitive sensor array was designed to estimate respiration and body

posture during bedtime sleep [141, 264]. In addition, video-based [128, 232] and audio-based

[52, 228] approaches were applied to measure cardiac or respiratory activity. These signals can also be

obtained with an off-body microwave Doppler radar or radio-frequency sensor [85, 319]. All

these approaches can potentially be used for unobtrusive sleep monitoring. Table 1.1 summarizes

some unobtrusive or less-obtrusive approaches for cardiac and/or respiratory measurement.

Automatic classification of sleep stages using body movements, cardiac activity [or more

specifically, heart rate variability (HRV)], and/or respiratory activity has been intensively re-

searched to date due to the rationale of the regulatory autonomic fluctuations occurring over

various sleep stages. With actigraphy used to quantify body movements, the studies were

mostly focused on detecting sleep/wake states [74, 126, 259]. Combining body movements

with cardiorespiratory activity can result in a superior sleep/wake performance [89, 150]. By

means of cardiac and/or respiratory signals, numerous papers were published aiming at differ-

ent classification tasks, such as the classification between sleep and wake [145, 151], between


[Figure 1.6 block diagram with the blocks: data acquisition, signal pre-processing, feature extraction, feature post-processing, feature selection, and sleep stage classification]

Figure 1.6: A general framework of sleep stage classification, in which the present thesis is devoted to

feature extraction and feature post-processing.

wake, REM sleep, and NREM sleep [94, 161, 248, 249, 303], between wake, REM sleep, light

sleep, and SWS [138, 309], between REM sleep and NREM sleep [69, 197], between light sleep

and SWS [51], between SWS and all the other sleep stages [68, 273], and even between full

sleep stages (wake, REM sleep, S1 sleep, S2 sleep, and SWS) [167, 214, 315]. Note that some

of those studies executed several different sleep stage classification tasks and some made also

use of information about body movements. The general framework of sleep stage classification

is illustrated in Figure 1.6.

1.5 Research question and objective

The main research question addressed in the present thesis is: can overnight sleep stages be

classified reliably with body movements, cardiac activity, and/or respiratory activity?

It is known that, to discriminate between sleep stages, many existing features describing

certain physiological characteristics of cardiorespiratory activity have been presented (see, e.g.,

[32, 59, 197, 249, 289, 315]). However, in comparison with PSG scoring, the sleep staging per-

formance obtained using those features still remains low, suggesting a strong need for further

improvement to obtain more reliable results. This motivated us to exploit additional information

in autonomic activity characterized by sleep stages that is complementary to the existing fea-

tures. On the other hand, variability between and within subjects conveyed by the signals would

be considerable barriers to achieving a reliable performance in sleep stage classification. The

between-subject variability (individual difference) that influences the cardiorespiratory activity

can relate to subject demographics (including body size) such as age, gender, and body mass in-

dex, and internal physiology such as response of autonomic regulation and metabolic function.

Additionally, some other factors (e.g., measuring noise, body movements, conscious breath-

ing control, sleep environment, daytime activity, and stressful events) varying within subjects

or from subject to subject may also affect the nighttime cardiorespiratory activity. Therefore,

there is a need to extract features that are not or less affected by the between-/within-subject

variability on their own or to (post-) process the features for the purpose of mitigating those

variations in the signals.

The general objective of this thesis is to achieve performance improvement in reliably classi-

fying sleep stages based on the above-mentioned signals. As indicated in Figure 1.6, this thesis


focuses on (1) extracting new features that contain cardiorespiratory characteristics in addition to the

existing features and/or are robust to the variability between or within subjects, and (2) reducing

the variability in cardiorespiratory signals through feature post-processing. The

population studied in this thesis consists mainly of healthy subjects; patients with

disordered sleep are out of scope.

1.6 Outline of the thesis

The thesis is comprised of three parts. Part I introduces novel informative features for sleep

stage classification by analyzing the cardiac and respiratory signals in different aspects of phys-

iology. These new features include HRV spectral powers with adaptive boundaries using HRV-

derived respiratory frequency (Chapter 2), respiratory self-similarity in signal morphology

quantified by means of dynamic warping (Chapter 3) and uniform scaling (Chapter 5), mea-

sures expressing certain properties of breathing depth and volume (Chapter 4), and interaction

between heartbeats and respiration translated in a two-dimensional visibility graph (Chapter 6).

The detailed methods and results of the new features are provided in the corresponding chap-

ters where they are originally designed to improve the performance for different classification

tasks. In other words, these features should have their own special superiority in identifying

certain sleep stages. Taking the respiratory self-similarity features as examples, the dynamic

warping distance aims at distinguishing wakefulness from sleep while the uniform scaling

measure can help separate SWS from the other sleep stages.

Part II demonstrates the findings with regard to timing between autonomic and brain activ-

ity and examines its usefulness in sleep stage classification. Chapter 7 discusses the time delay

phenomenon between changes in cardiac and brain activity for different hierarchy of sleep stage

transitions. In Chapter 8, we apply these findings to help predict SWS periods using early cardiorespiratory

activity that precedes the transitions between SWS and the other sleep stages

by a few minutes.

Part III is devoted to understanding the challenges from the effects caused by between-

/within-subject variability for separating sleep stages. Chapter 9 systematically quantifies and

assesses those effects on cardiorespiratory activity caused by difference in, for example, sub-

ject demographics, internal physiology, and time of the night. In order to overcome these

challenges to some extent, in Chapter 10, we propose to normalize and smooth features for

each night’s recording aiming at diminishing the between-subject and within-subject variabil-

ity, respectively. The classification results using an extended feature set (including the existing

features presented in literature and the new features in this thesis) with feature post-processing

are reported in this chapter.
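
To make the idea of per-night feature post-processing concrete, the following Python sketch normalizes and smooths a feature matrix from one recording. This outline does not state which normalization or smoothing is used in Chapter 10, so the z-score normalization, the centered moving average, the window length, and the function names are all assumptions chosen purely for illustration.

```python
import numpy as np

def zscore_per_night(features):
    """Normalize each feature column of one night's recording.

    features: array of shape (n_epochs, n_features). Z-scoring is an assumed
    choice, not necessarily the normalization used in the thesis.
    """
    x = np.asarray(features, dtype=float)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0              # leave constant features unscaled
    return (x - x.mean(axis=0)) / sd

def moving_average(values, window=5):
    """Smooth one feature over neighboring epochs (assumed odd window length)."""
    kernel = np.ones(window) / window
    padded = np.pad(np.asarray(values, dtype=float), window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

Normalizing within each night targets offsets and scale differences between subjects, while smoothing over adjacent epochs damps short-lived fluctuations within a recording.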

The last chapter (Chapter 11) generally discusses the work presented in this thesis and

answers the main research question raised before. Additionally, future work that would be

interesting and promising for sleep stage classification is suggested.

Part I: Signal Analysis for Sleep Stage

Classification

CHAPTER 2

Spectral boundary adaptation on heart rate variability for

sleep and wake classification

This chapter is adapted from: X. Long, P. Fonseca, R. Haakma, R. M. Aarts and J. Foussier. Spectral

boundary adaptation on heart rate variability for sleep and wake classification. International Journal on

Artificial Intelligence Tools, 23(3):1460002, 2014. c©World Scientific Publishing

Abstract – A method of adapting the boundaries when extracting the spectral features from

heart rate variability (HRV) for sleep and wake classification is described. HRV series can be

derived from electrocardiogram (ECG) signals obtained from single-night polysomnography

(PSG) recordings. Conventionally, the HRV spectral features are extracted from the spectrum

of an HRV series with fixed boundaries specifying bands of very low frequency (VLF), low

frequency (LF), and high frequency (HF). However, because they are fixed, they may fail to

accurately reflect certain aspects of autonomic nervous activity which in turn may limit their

discriminative power, e.g. in sleep and wake classification. This is in part related to the fact that

the sympathetic tone (partially reflected in the LF band) and the respiratory activity (modulated

in the HF band) vary over time. In order to minimize the impact of these variations, we adapt

the HRV spectral boundaries using time-frequency analysis. Experiments were conducted on

a data set acquired from two groups with 15 healthy and 15 insomnia subjects each. Results

show that adapting the HRV spectral features significantly increased their discriminative power

when classifying sleep and wake. Additionally, this method also provided a significant im-

provement of the overall classification performance when used in combination with other HRV

non-spectral features. Furthermore, compared with the use of actigraphy, the classification

performed better when combining it with the HRV features.


2.1 Introduction

Sleep plays an important role in human health. Night-time polysomnography (PSG) record-

ings, along with manually scored hypnograms, are considered the “gold standard” for objec-

tively analyzing sleep architecture and occurrence of sleep-related problems [247, 248]. PSG

recordings are typically recorded and analyzed in sleep laboratories, and are usually split into

non-overlapping time intervals (or epochs) of 30 s according to the Rechtschaffen & Kales

(R&K) rules [247].

As shown in literature, monitoring heart rate variability (HRV) during bedtime is helpful

in sleep staging [89, 248], particularly to distinguish between rapid-eye-movement (REM) and

non rapid-eye-movement (NREM) [59, 197]. HRV reflects the variation, over time, of the period

between consecutive heart beats. HRV is derived from the length variations of RR-intervals,

i.e. time intervals between consecutive R-peaks of the QRS complex in the electrocardiogram

(ECG). Spectral analysis of HRV has been widely employed in the assessment of autonomic

nervous activity during bedtime [59, 197, 299]. It traditionally involves the computation of

the power spectral density (PSD) of an HRV series. An HRV spectrum is typically divided

into three bands, namely a very low frequency (VLF) band from 0.003 to 0.04 Hz, a low

frequency (LF) band from 0.04 to 0.15 Hz, and a high frequency (HF) band between 0.15

and 0.4 Hz [190, 288]. These bands are then used to compute certain properties such as

the spectral power of the VLF, LF, and HF components and the power ratio of low-to-high

frequency (LF/HF) components [59, 202, 265]. In general, it has been found that the VLF

spectral power is associated with long-term regulatory mechanisms, the LF spectral power is a

marker of sympathetic modulation of the heart and it also reflects some parasympathetic activity

when the respiratory frequency components partially fall into the LF band, the HF spectral

power is related to parasympathetic activity mainly caused by respiratory sinus arrhythmia

(RSA), and the LF/HF ratio is an indication of sympathetic-parasympathetic balance [265, 275,

288]. In particular, the HRV spectrum usually contains a peak centered around the respiration

frequency, located in the HF band, and another peak in the LF band which reflects, to a certain

degree, sympathetic activation [13, 190, 219].
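
The fixed-band properties described above can be sketched in Python as follows. The band limits are those quoted in the text; the function names, the trapezoidal integration, and the synthetic spectrum are assumptions for illustration rather than the implementation used in this work.

```python
import numpy as np

# Conventional fixed band limits in Hz, as quoted in the text.
BANDS = {"vlf": (0.003, 0.04), "lf": (0.04, 0.15), "hf": (0.15, 0.4)}

def band_power(freqs, psd, lo, hi):
    """Integrate the PSD over [lo, hi) with the trapezoidal rule."""
    mask = (freqs >= lo) & (freqs < hi)
    f, p = freqs[mask], psd[mask]
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(f)))

def hrv_spectral_features(freqs, psd):
    """Spectral power per band plus the LF/HF ratio."""
    powers = {name: band_power(freqs, psd, lo, hi)
              for name, (lo, hi) in BANDS.items()}
    powers["lf_hf"] = powers["lf"] / powers["hf"]
    return powers

# Made-up spectrum with an LF peak at 0.10 Hz and an HF peak at 0.25 Hz.
freqs = np.linspace(0.0, 0.5, 501)
psd = (np.exp(-((freqs - 0.10) / 0.02) ** 2)
       + 0.5 * np.exp(-((freqs - 0.25) / 0.03) ** 2))
features = hrv_spectral_features(freqs, psd)
```

Because the limits are constants, any power belonging to a peak that drifts across 0.15 Hz is attributed to the neighboring band, which is the limitation discussed in the remainder of this section.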

The parameters derived from HRV PSD are often used as “features” in automatic sleep

staging [248] or sleep and wake classification systems [89]. Previous work has used HRV

spectral features with fixed boundaries for sleep and wake classification [89]. This classifier

exploits the fact that sympathetic tone and the respiratory activity are modulated in different

frequency bands of the HRV spectrum and exhibit different properties during sleep and wake,

allowing them to be distinguished.

It is known that the HRV spectrum and the dominant (or peak) frequencies of the LF and

HF bands are not constant but rather vary over time according to the autonomic modulations

of the heart beats [288]. Hence, as long as fixed band boundaries are used to compute HRV

spectral features, we might produce inaccurate estimates of cardiac autonomic activities. Since

the discrimination of sleep states (or sleep and wake in our case) depends on these estimates,

the classification accuracy will be affected. To avoid this issue, we use a feature adaptation

method when estimating the HRV features.

[Figure 2.1 plot: normalized PSD (ms2/Hz, scale ×10−3) versus frequency (0-0.5 Hz), with curves for wake (mean), wake (standard errors), sleep (mean), and sleep (standard errors)]

Figure 2.1: An example of the mean HRV PSD with standard errors for sleep and wake states over an

entire-night’s recording of a subject.

Figure 2.2: An example of the normalized HRV PSD versus time (30-s epoch) over an entire-night’s

recording of a subject.

The problem of boundary adaptation has been analyzed before in other areas such as stress

detection [25, 117] and anesthesia analysis [270]. It has been suggested that the LF and HF

boundaries are related to the peak frequency in the traditional LF band, called “LF peak fre-

quency”, and the peak frequency in the traditional HF band, called “HF peak frequency”, re-


[Figure 2.3 block diagram with the blocks: data acquisition, (HRV) PSD estimation, boundary adaptation, spectrum information (LF and HF peaks), feature extraction, classification (training/testing), classifier evaluation, and feature evaluation]

Figure 2.3: Block diagram of the feature adaptation method used for sleep and wake classification.

spectively [117, 270]. In practice, these two peak frequencies can be estimated by determining

the frequency of the local maximum in the band between 0.003 and 0.15 Hz (i.e. the traditional

VLF band and LF band) and in the band from 0.15 to 0.4 Hz (i.e. the traditional HF band),

respectively. The working assumption here is that the peaks always fall within those two bands.

By centering the new bands around these peaks instead of using fixed boundaries, we can com-

pensate for their time-varying behavior. This should help, to some extent, reduce within- and

between-subject variabilities in the way these features express sympathetic activation and res-

piratory activity, ultimately helping improve sleep and wake classification. Figure 2.1 shows an

example of the mean HRV PSDs with standard errors [standard deviations (SD)] for sleep and

wake states of a subject. It can be observed that, although their standard errors overlap, their

mean values are not the same in different frequency ranges. This should provide an opportunity

of discriminating between sleep and wake states. Figure 2.2 illustrates the time variation of the

HRV PSD for a subject.
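
The peak-frequency estimation described above can be sketched as follows. The search ranges are those given in the text; taking the largest PSD value inside each range is a simplification of "local maximum", and the function names are hypothetical.

```python
import numpy as np

def peak_frequency(freqs, psd, lo, hi):
    """Frequency of the largest PSD value within [lo, hi]."""
    mask = (freqs >= lo) & (freqs <= hi)
    return float(freqs[mask][np.argmax(psd[mask])])

def lf_hf_peaks(freqs, psd):
    """LF peak searched in 0.003-0.15 Hz, HF peak in 0.15-0.4 Hz."""
    lf_peak = peak_frequency(freqs, psd, 0.003, 0.15)
    hf_peak = peak_frequency(freqs, psd, 0.15, 0.4)
    return lf_peak, hf_peak
```

One caveat of this simplification is that, when the spectrum rises monotonically toward a band edge, the returned frequency is the edge itself rather than a true local maximum.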

2.2 Materials and methods

The proposed boundary adaptation method applied on HRV spectral features used for sleep and

wake classification is described by a block diagram in Figure 2.3. Each block will be explained

further in the following subsections.

2.2.1 Data acquisition

In total, the data acquired from 30 subjects were used in our experiment. Fifteen subjects belonged

to the healthy group and fifteen were insomniacs. The insomniacs were randomly selected

from a larger-sized group in order to evenly compare the classification performance between


Table 2.1: Summary of subject demographics

Group                     Parameter             Mean ± SD
Healthy group (N = 15)    Sex                   5 males and 10 females
                          Age (y)               31.0 ± 10.4
                          BMI (kg/m2)           24.4 ± 3.3
                          Sleep efficiency (%)  92.3 ± 3.8
Insomnia group (N = 15)   Sex                   8 males and 7 females
                          Age (y)               47.4 ± 14.5
                          BMI (kg/m2)           27.7 ± 4.5
                          Sleep efficiency (%)  69.7 ± 14.7

the healthy and insomnia groups with equal numbers of subjects.

A subject was considered healthy if his/her Pittsburgh Sleep Quality Index (PSQI) [60]

was less than 6, while a subject was considered an insomniac based on his/her self-report. For each

subject, a full PSG was recorded according to the guidelines of the American Academy of Sleep

Medicine (AASM) [136]. Among the 30 subjects, the PSG recordings of fifteen insomniacs

and nine healthy subjects were recorded in the Sleep Health Center, Boston, USA during 2009

(Alice 5 PSG, Philips Respironics) and of the remaining six healthy subjects in the Philips

Experience Lab, Eindhoven, The Netherlands during 2010 (Vitaport 3 PSG, TEMEC). The

ECG was recorded with a modified V2 Lead, sampled at 500 Hz (Boston data) and 256 Hz

(Eindhoven data).

Sleep stages were manually scored on 30-s epochs by sleep experts according to the AASM

guidelines as wake, REM sleep, and each of the NREM sleep stages (N1-N3). For sleep and

wake classification, we considered two classes: wake and sleep (including REM and NREM

sleep). Each PSG recording was manually clipped to the interval from the

moment the subject turned the lights off with the intention of sleeping until the moment the

lights were turned on before the subject got out of bed in the morning. The study protocol was

approved by the ethics committees of both centers and all subjects signed an informed consent form.

The subject demographics including sex, age, body mass index (BMI), and sleep efficiency are

summarized in Table 2.1.

2.2.2 PSD estimation

To estimate the PSDs of HRV, RR-intervals were first computed from the ECG signals. In our

study, the following steps were performed to obtain an RR interval series: (1) a peak detec-

tor based on the Afonso-Tompkins filter-bank algorithm [4] was used to locate the R peaks,

yielding an RR-interval series; (2) the very short (less than 0.3 s) and long (more than 2 s) RR

intervals (usually caused by ectopic heart beats, misidentification of R peaks, or badly attached

electrodes during measurement) were removed; (3) the RR-interval series was normalized by

dividing it by the mean value; (4) the resulting series was “re-sampled” at 4 Hz using linear in-

terpolation; and (5) the PSD was finally estimated with an autoregressive model with adaptive


order automatically determined using the Akaike information criterion (AIC) [43].
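
Steps (2) to (5) above can be sketched in Python as follows. The thresholds (0.3 s and 2 s) and the 4 Hz resampling rate come from the text; everything else is an assumption for illustration: the function names, the Yule-Walker estimator for the autoregressive coefficients, the maximum order of 20, the form AIC = n ln(sigma^2) + 2p, and the two-sided PSD scaling. R-peak detection (step 1) is not shown.

```python
import numpy as np

def preprocess_rr(rr_s, fs=4.0):
    """Steps (2)-(4): drop implausible intervals, normalize, resample evenly."""
    rr = np.asarray(rr_s, dtype=float)
    t = np.cumsum(rr)                        # beat times in seconds
    keep = (rr >= 0.3) & (rr <= 2.0)         # remove very short/long intervals
    rr, t = rr[keep], t[keep]
    rr = rr / rr.mean()                      # normalize by the mean value
    t_even = np.arange(t[0], t[-1], 1.0 / fs)
    return np.interp(t_even, t, rr)          # linear interpolation at fs

def yule_walker(x, order):
    """AR coefficients and noise variance from the Yule-Walker equations."""
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return a, r[0] - np.dot(a, r[1:])

def ar_psd(x, fs=4.0, max_order=20, n_freqs=512):
    """Step (5): AR spectrum with the order chosen by minimum AIC."""
    x = np.asarray(x, dtype=float)
    best = None
    for p in range(1, max_order + 1):
        a, s2 = yule_walker(x, p)
        if s2 <= 0:                          # skip numerically degenerate fits
            continue
        aic = len(x) * np.log(s2) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, a, s2)
    if best is None:
        raise ValueError("no valid AR model found")
    _, a, s2 = best
    freqs = np.linspace(0.0, fs / 2.0, n_freqs)
    k = np.arange(1, len(a) + 1)
    # |1 - sum_k a_k exp(-j 2 pi f k / fs)|^2 in the denominator
    denom = np.abs(1.0 - np.exp(-2j * np.pi * np.outer(freqs, k) / fs) @ a) ** 2
    return freqs, s2 / (fs * denom)
```

In use, the output of `preprocess_rr` for one analysis window would be passed to `ar_psd`, giving the frequency axis and spectrum from which the band features are computed.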

2.2.3 Boundary adaptation

As explained in Section 2.1, the use of fixed boundaries in HRV spectrum may not be appro-

priate to accurately represent different states of the autonomic nervous system and further to

classify sleep and wake. The respiratory frequency, and therefore the corresponding peak in

the HF band, varies over time. Likewise, the peak corresponding to the sympathetic tone in the LF

band also varies, reflecting differences in the autonomic activation during sleep. By applying

a time-frequency analysis, the boundaries that define each band can be dynamically adapted

so that the frequency components can be more correctly assigned to the corresponding bands.

To do this, it is required to estimate the LF and HF peak frequencies, which change over time.

Figure 2.4 illustrates, with a filled contour plot, an example of the HRV spectrum over time

for a subject together with the traditional fixed frequency bands. As can easily be seen, the

dominant LF and HF peak frequencies vary over time. Moreover, it can be observed that, for

some epochs, the spectral power of a frequency band spills over its neighboring bands when

using the fixed boundaries. For instance, for the epochs from 140 to 150, the spectral power of

the LF band also partially falls into the HF band (see Figure 2.4).

By adapting the boundaries of the LF and HF bands for each epoch, we can overcome the

issues mentioned above. This can be achieved in the following way.

• The new HF band (HF∗) is centered on the HF peak frequency [25, 117] and has a con-

stant bandwidth of 0.1 Hz [153]. This bandwidth was chosen after analyzing the HRV

PSDs of all 15 healthy subjects and empirically determining that most of the spectral

power related to RSA lies within a bandwidth of 0.1 Hz. A larger bandwidth (0.25 Hz)

was empirically used in other work [25, 142], but we found that on some occasions it

overlapped the adjacent LF band.

• The new LF band (LF∗) is centered on the dominant frequency found in the traditional

LF band, and has a bandwidth of 0.11 Hz that is similar to the traditional definition.

• The new VLF band (VLF∗) is defined from its traditional lower limit of 0.003 Hz up to

the lower limit of the LF band.

Figure 2.5 illustrates the adapted boundaries for the same HRV PSD shown in Figure 2.4. We

note that the LF∗ and HF∗ bands overlap in some epochs. This occurs when the LF and HF

peaks are too close to each other or when there is no HF peak (often during REM sleep [36]).
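The boundary adaptation for a single epoch can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the LF peak is searched between 0.04 and 0.15 Hz (a traditional LF band consistent with the 0.11 Hz bandwidth and the 0.15 Hz HF lower limit), the HF peak between 0.15 and 0.4 Hz, and the lower LF∗ limit is clamped at 0.003 Hz so that the VLF∗ band never has a negative width.

```python
def peak_frequency(freqs, psd, lo, hi):
    """Frequency of the largest PSD value within [lo, hi]."""
    candidates = [(p, f) for f, p in zip(freqs, psd) if lo <= f <= hi]
    return max(candidates)[1]

def adaptive_bands(freqs, psd, lf_bandwidth=0.11, hf_bandwidth=0.10,
                   lf_search=(0.04, 0.15), hf_search=(0.15, 0.40), vlf_low=0.003):
    """Return the (lower, upper) limits of the VLF*, LF*, and HF* bands for one epoch."""
    lf_peak = peak_frequency(freqs, psd, *lf_search)
    hf_peak = peak_frequency(freqs, psd, *hf_search)
    # Assumption: clamp the lower LF* limit so that VLF* stays a valid interval.
    lf_low = max(vlf_low, lf_peak - lf_bandwidth / 2)
    return {
        "VLF*": (vlf_low, lf_low),
        "LF*": (lf_low, lf_peak + lf_bandwidth / 2),
        "HF*": (hf_peak - hf_bandwidth / 2, hf_peak + hf_bandwidth / 2),
    }

# Hypothetical spectrum with an LF peak at 0.10 Hz and an HF peak at 0.25 Hz.
freqs = [0.005 * i for i in range(101)]
psd = [max(0.0, 1.0 - abs(f - 0.10) / 0.03) + 0.5 * max(0.0, 1.0 - abs(f - 0.25) / 0.03)
       for f in freqs]
bands = adaptive_bands(freqs, psd)
```

With these defaults the LF∗ and HF∗ bands are allowed to overlap, as observed in Figure 2.5; nothing in the sketch separates them.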

2.2.4 Feature extraction

2.2.4.1 HRV spectral features

After determining the bands we can extract HRV-related features for sleep and wake classifica-

tion. In our study we computed the logarithm of the spectral power in the VLF∗, LF∗, and HF∗

bands (from here on expressed as hrv vlf, hrv lf, and hrv hf) and, in addition, the ratio


[Figure 2.4 (plot not reproduced): filled contour plot with time (30-s epochs, 0 to 500) on the horizontal axis and frequency (0 to 0.5 Hz) on the vertical axis; the VLF, LF, and HF bands are labeled.]

Figure 2.4: HRV spectrum versus time (30-s epoch) of a subject. The fixed boundaries of the VLF, HF,

and LF bands are plotted in solid lines and the corresponding bands are indicated.

[Figure 2.5 (plot not reproduced): filled contour plot with time (30-s epochs, 0 to 500) on the horizontal axis and frequency (0 to 0.5 Hz) on the vertical axis; the VLF∗, LF∗, and HF∗ bands are labeled.]

Figure 2.5: HRV spectrum versus time (30-s epoch) of a subject. The limits of the new HF∗ and LF∗

bands are plotted in dotted and solid curves, respectively. The lower boundary of the new VLF∗ band (at

0.003 Hz) is plotted as a dashed line.

between the spectral powers of the LF∗ and the HF∗ bands (expressed as hrv lf/hf ). Before

computing the logarithm, the power of each band was first normalized. This was achieved by

dividing the power in the VLF∗, LF∗, and HF∗ bands by the total spectrum power [202, 298].

Alternatively we could have normalized it by dividing the power in each band by the total

spectrum power minus the power in the VLF∗ band [59, 299]. Since we did not observe any


significant differences in the final result, the first method was used.
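As an illustration of this feature computation, the sketch below normalizes each band power by the total spectrum power before taking the logarithm. The function names and dictionary keys are hypothetical, the natural logarithm is an assumption (the base is not stated above), and the total power is taken over the whole frequency grid that is passed in.

```python
import math

def band_power(freqs, psd, lo, hi):
    """Approximate the power in [lo, hi) by a Riemann sum on a uniform grid."""
    df = freqs[1] - freqs[0]
    return df * sum(p for f, p in zip(freqs, psd) if lo <= f < hi)

def hrv_spectral_features(freqs, psd, bands):
    """Log of the normalized VLF*, LF*, and HF* powers, plus the LF*/HF* ratio."""
    df = freqs[1] - freqs[0]
    total = df * sum(psd)  # total spectrum power (first normalization method)
    power = {name: band_power(freqs, psd, lo, hi) for name, (lo, hi) in bands.items()}
    # Note: a band without any power would make the logarithm undefined.
    return {
        "hrv_vlf": math.log(power["VLF*"] / total),
        "hrv_lf": math.log(power["LF*"] / total),
        "hrv_hf": math.log(power["HF*"] / total),
        "hrv_lf/hf": power["LF*"] / power["HF*"],
    }

# Hypothetical flat spectrum and band limits, only to exercise the functions.
freqs = [0.005 * i for i in range(101)]
psd = [1.0] * len(freqs)
bands = {"VLF*": (0.003, 0.045), "LF*": (0.045, 0.155), "HF*": (0.20, 0.30)}
features = hrv_spectral_features(freqs, psd, bands)
```

The second normalization method mentioned above would only change `total`, by subtracting the VLF∗ power from it.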

2.2.4.2 Spectrum information

As mentioned, the adaptation of the new boundaries requires knowledge of information derived

from the power spectrum, namely the LF and HF peak frequencies, which must be obtained

before extracting the features. The LF peak frequency can be estimated by detecting the location

of the peak in the HRV spectral range from 0.003 to 0.15 Hz. The HF peak frequency can be

estimated from a respiratory effort signal simultaneously recorded with the PSG data or it can

be derived from the HRV series directly by searching for the peak in the range between 0.15 and

0.4 Hz. In this study, to avoid using an additional sensor modality, we used the latter approach.

2.2.5 Feature evaluation

A Hellinger distance metric [130] was employed to evaluate the discriminative power (i.e. class

separability) of the HRV spectral features between sleep and wake. It is estimated by computing

the amount of overlap between two probability density estimates in a binary class problem,

expressed as

$$D_H(p,q) = \sqrt{1 - \sum_{x} \sqrt{p(x)\,q(x)}} \qquad (2.1)$$

where p(x) and q(x) are the probability density estimates of the feature values given class sleep

and wake, respectively. In their most basic form, these density estimates can be computed by

means of a normalized histogram with either a fixed number of bins or a specific bin size. In

our study the histograms were computed with a fixed number of 100 bins. A larger Hellinger

distance reflects a higher discriminative power in separating the two classes.
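A minimal sketch of this computation with normalized 100-bin histograms is given below. The function names are hypothetical, and the choice of a common bin range spanning the values of both classes is an assumption.

```python
import math

def histogram_density(values, lo, hi, bins=100):
    """Normalized histogram (bin probabilities summing to 1) over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # put v == hi in the last bin
        counts[index] += 1
    return [c / len(values) for c in counts]

def hellinger_distance(sleep_values, wake_values, bins=100):
    """Hellinger distance between the two class-conditional histograms."""
    lo = min(min(sleep_values), min(wake_values))
    hi = max(max(sleep_values), max(wake_values))
    if hi == lo:  # all values identical: the two distributions coincide
        return 0.0
    p = histogram_density(sleep_values, lo, hi, bins)
    q = histogram_density(wake_values, lo, hi, bins)
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - overlap))  # guard against rounding below zero

# Hypothetical feature values: identical samples overlap fully, distant ones not at all.
d_same = hellinger_distance([0.0, 0.1, 0.2, 0.3], [0.0, 0.1, 0.2, 0.3])
d_apart = hellinger_distance([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])
```

The distance ranges from 0 (complete overlap) to 1 (no overlap), which is why larger values indicate better class separability.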

2.2.6 Sleep and wake classification

It has been demonstrated that a linear discriminant- (LD-) based classifier is appropriate for

the task of sleep and wake classification [89, 177]. Assuming that all features are normally dis-

tributed and their covariance matrices for the two classes are identical, the “linear discriminant”

function is given by

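$$G_c(\mathbf{f}) = -\frac{1}{2}(\mathbf{f}-\boldsymbol{\mu}_c)^{T}\,\boldsymbol{\Sigma}^{-1}(\mathbf{f}-\boldsymbol{\mu}_c) + \ln \Pr(c) \qquad (2.2)$$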

where µ_c is the mean of the feature vector f for class c, Σ is the pooled covariance matrix, and

Pr(c) is the prior probability of class c [97]. In this study, c is either sleep (the negative class)

or wake (the positive class). Based on its feature vector, an epoch is assigned to one class

when the computed discriminant score of this class minus that of the other class is higher than a

decision making threshold T (here we chose T = 0). For instance, an epoch is classified as sleep

if Gsleep(f) > Gwake(f) for this epoch. Because quadratic discriminants are known to require

a larger sample size than linear discriminants and they seem to be more sensitive to possible

violations of the assumptions of normality [110], the linear discriminant was used instead.
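The discriminant function and the decision rule can be sketched as follows. This is an illustration under assumptions rather than the classifier used in this study: the function names and the toy training data are hypothetical, and the linear system is solved with plain Gaussian elimination to keep the example self-contained.

```python
import math

def mean_vector(rows):
    return [sum(column) / len(rows) for column in zip(*rows)]

def pooled_covariance(classes):
    """Pooled within-class covariance; classes maps a label to its feature vectors."""
    dim = len(next(iter(classes.values()))[0])
    scatter = [[0.0] * dim for _ in range(dim)]
    dof = 0
    for rows in classes.values():
        mu = mean_vector(rows)
        for row in rows:
            d = [v - m for v, m in zip(row, mu)]
            for i in range(dim):
                for j in range(dim):
                    scatter[i][j] += d[i] * d[j]
        dof += len(rows) - 1
    return [[s / dof for s in row] for row in scatter]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def discriminant(f, mu, cov, prior):
    """G_c(f) = -0.5 (f - mu_c)^T Sigma^-1 (f - mu_c) + ln Pr(c)."""
    d = [fi - mi for fi, mi in zip(f, mu)]
    z = solve(cov, d)
    return -0.5 * sum(di * zi for di, zi in zip(d, z)) + math.log(prior)

def classify(f, means, cov, priors, threshold=0.0):
    score = (discriminant(f, means["wake"], cov, priors["wake"])
             - discriminant(f, means["sleep"], cov, priors["sleep"]))
    return "wake" if score > threshold else "sleep"

# Hypothetical two-feature training data.
training = {"sleep": [(0, 0), (1, 0), (0, 1), (1, 1)],
            "wake": [(3, 3), (4, 3), (3, 4), (4, 4)]}
means = {label: mean_vector(rows) for label, rows in training.items()}
cov = pooled_covariance(training)
priors = {"sleep": 0.5, "wake": 0.5}
label_a = classify((0.4, 0.6), means, cov, priors)
label_b = classify((3.2, 3.9), means, cov, priors)
```

Varying `threshold` away from 0 moves the operating point of the classifier, which is how a curve over the whole solution space can be traced.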


The prior probability Pr(c) is typically estimated during training. However, it can be

observed that the probabilities of the different classes vary throughout the night [249].

For example, the probability of being asleep in the middle

of the night is much higher than right after going to bed or at the end of the night.

In order to exploit these variations, instead of using a fixed prior probability we computed a

time-varying prior probability for each epoch by counting the number of times that specific

epoch (relative to the instant when lights were turned off) was annotated as each class [108].

In addition, a prior probability ‘emphasis’ factor (or weight) γ (γ ∈ [0,1]) was

used to bias the classifier towards a pre-defined class: it raises the barrier for an epoch to be

assigned to one class and at the same time lowers it for the other class during decision

making. Because the classes are imbalanced, with many more sleep epochs than wake epochs

(see Section 2.2.7), the prior probability of wake is very low in our study. We therefore used

this factor to “emphasize” the wake class and “penalize” the sleep class. The

time-varying prior probabilities of the two classes after applying the emphasis factor are

Pr′(sleep) = γ · Pr(sleep) and Pr′(wake) = 1 − Pr′(sleep), where γ = 0.79 was

chosen experimentally for sleep and wake classification.
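The time-varying priors with the emphasis factor can be sketched as follows. The function name and the toy hypnograms are hypothetical, and each hypnogram is assumed to be a list of per-epoch labels aligned to the instant when lights were turned off.

```python
def time_varying_priors(hypnograms, gamma=0.79):
    """Per-epoch priors Pr'(sleep) = gamma * Pr(sleep) and Pr'(wake) = 1 - Pr'(sleep)."""
    n_epochs = max(len(h) for h in hypnograms)
    priors = []
    for k in range(n_epochs):
        # Count how often epoch k was annotated as sleep across the training nights.
        labels = [h[k] for h in hypnograms if k < len(h)]
        p_sleep = labels.count("sleep") / len(labels)
        p_sleep_adjusted = gamma * p_sleep
        priors.append({"sleep": p_sleep_adjusted, "wake": 1.0 - p_sleep_adjusted})
    return priors

# Hypothetical training annotations for four nights of three epochs each.
hypnograms = [
    ["wake", "sleep", "sleep"],
    ["wake", "wake", "sleep"],
    ["sleep", "sleep", "sleep"],
    ["wake", "sleep", "wake"],
]
priors = time_varying_priors(hypnograms)
```

An epoch that is never annotated as sleep in the training data would receive a sleep prior of zero, whose logarithm in Equation 2.2 is undefined; a practical implementation would need some form of smoothing, which is not covered here.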

2.2.7 Classifier evaluation

To assess the performance of this classifier, conventional measures of sensitivity (proportion

of correctly identified actual wake epochs) and specificity (proportion of correctly identified

actual sleep epochs) are often used. They can be calculated as sensitivity = TP/(TP+FN) and

specificity = TN/(TN+FP), with TP, TN, FP, and FN indicating the number of true positive,

true negative, false positive, and false negative classifications, respectively. However, these

two measures are not the most adequate criteria for this binary classification problem. The

reason is that the number of epochs of the wake class during an entire recording lasting the

whole night is naturally smaller than the number of epochs of the sleep class, in what is usually

called “imbalanced class distribution”. On average, the sleep and wake classes account for

respectively 92.3% and 7.7% of all epochs for the healthy subjects, and respectively 69.7% and

30.3% of all epochs in the insomnia group.

Cohen’s Kappa coefficient of agreement [72] (denoted as κ) not only provides a better

understanding of the general performance of the classifier in correctly identifying both classes,

but also allows for a better interpretation of the class imbalance problem when it is used as a

criterion to optimize performance [26]. Although it indicates how well a classifier performs for both

classes, evaluating a method with this metric that represents a single point in the entire solution

space might not be sufficient [237]. An alternative is to use a receiver operating characteristic

(ROC) curve which plots the true positive rate (i.e. sensitivity or recall) versus false positive rate

(i.e. one minus specificity) thus illustrating the classifier’s performance over the entire solution

space by means of varying a decision making threshold [103]. However, the ROC curve has

been shown to be over-optimistic when there is a heavy imbalance between two classes [83], for

instance, sleep and wake in the healthy group. Hence, a so-called Precision-Recall (PR) curve

that plots precision versus recall is used instead, where precision, defined as TP/(TP+FP), measures the


positive predictive value. When comparing different classifiers, a larger ‘area under the PR

curve’ (AUCPR) or ‘area under the ROC curve’ (AUCROC) indicates a better performance. In

this study, the three metrics (κ , AUCROC and AUCPR) were used to evaluate the performance

of sleep and wake classification with and without HRV boundary adaptation.
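The single-point measures can be computed from the confusion counts as sketched below, with wake as the positive class. The function name and the example counts are hypothetical; the counts are chosen to mimic the 7.7% wake prevalence of the healthy group.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision, and Cohen's kappa."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of both raters.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "kappa": (accuracy - expected) / (1.0 - expected),
    }

# A classifier that labels every epoch as sleep (hypothetical counts, 7.7% wake).
always_sleep = binary_metrics(tp=0, tn=923, fp=0, fn=77)
# A classifier that detects roughly half of the wake epochs.
moderate = binary_metrics(tp=40, tn=913, fp=10, fn=37)
```

The first example reaches an accuracy of 92.3% and a specificity of 100% while its κ is zero, which illustrates why accuracy and specificity alone are not adequate for imbalanced classes.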

In addition, we combined the HRV spectral features with some other HRV (non-spectral)

features selected from the feature set used in previous work [89], including time domain fea-

tures [89, 248], nonlinear measures extracted using detrended fluctuation analysis [289] and

sample entropy [75]. Five HRV non-spectral features were selected using the feature selection

method described in [108]. This serves the purpose of examining whether the feature adap-

tation method described in this chapter can help improve the classification performance when

combined with other relevant features. Note that all features were extracted from the same

HRV series. In addition, we compared the results with those obtained using the actigraphy feature

(activity counts over a 30-s epoch, expressed as ac), a well-known feature for sleep and wake

classification [74]. Finally, we also examined the classification performance by combining the

HRV features with this actigraphy feature.

2.3 Results

A leave-one-subject-out cross-validation (LOSOCV) procedure was conducted to assess the

discriminative power of the HRV spectral features and also to assess the performance of our

classifier. Table 2.2 compares the discriminative power (as measured by a Hellinger distance

DH) of the HRV spectral features using the traditional fixed boundaries and using the adaptive

boundaries for healthy and insomnia subjects. They were obtained by averaging the results

computed based on training data over all iterations of the LOSOCV process.
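The LOSOCV split itself can be sketched as a small generator. The function name and the subject identifiers are hypothetical; the only assumption is that every epoch carries the identifier of the subject it was recorded from.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for every subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# Hypothetical epoch-to-subject assignment.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))
```

In each iteration, quantities such as the Hellinger distance, the selected features, and the classifier parameters would be computed on the training indices only, and the held-out subject would be used for testing.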

Table 2.3 and Table 2.4 summarize the classification performance obtained with and without

boundary adaptation using different sets of features for the healthy and insomnia groups. The

HRV spectral features consist of hrv vlf, hrv lf, hrv hf, and hrv lf/hf and the HRV non-

spectral features were selected based on the training sets during the cross-validation procedure.

The results are also illustrated in Figure 2.6 and Figure 2.7 using ROC and PR curves, giving an

overview of the performance of our sleep and wake classifier in a two-dimensional solution

space. Note that the ROC and PR curves were obtained by thresholding the discriminant scores

pooled over all iterations of the LOSOCV for each group.
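Obtaining both curves by thresholding pooled scores can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, a score is taken to be the wake discriminant minus the sleep discriminant, tied scores are not grouped, the PR curve is extended to recall 0 with its first precision value, and the areas are computed with the trapezoidal rule.

```python
def roc_pr_curves(scores, labels):
    """Sweep a threshold over the scores; labels use 1 for wake and 0 for sleep."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    roc, pr = [(0.0, 0.0)], []
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        roc.append((fp / n_neg, tp / n_pos))     # (1 - specificity, sensitivity)
        pr.append((tp / n_pos, tp / (tp + fp)))  # (recall, precision)
    pr.insert(0, (0.0, pr[0][1]))  # assumption: extend the PR curve to recall 0
    return roc, pr

def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points with increasing x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical pooled scores: perfectly separated, then with the labels reversed.
roc, pr = roc_pr_curves([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
auc_roc, auc_pr = area_under_curve(roc), area_under_curve(pr)
roc_reversed, _ = roc_pr_curves([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1])
auc_roc_reversed = area_under_curve(roc_reversed)
```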

2.4 Discussion

2.4.1 Discriminative power

Table 2.2 shows that, after using the adaptation method, the discriminative power of the HRV

spectral features is significantly increased for the subjects in both the healthy and insomnia groups

(with a paired Wilcoxon signed-rank test). For comparison, the table also indicates the Hellinger

distance of the actigraphy feature ac. Although the feature adaptation improves, to different

extents, the discriminative power of each HRV spectral feature, it is still relatively


Table 2.2: Discriminative power comparison of the HRV spectral features for healthy and insomnia groups

| Group | Feature | Hellinger distance DH, fixed boundaries | Hellinger distance DH, adaptive boundaries | p value† |
|---|---|---|---|---|
| Healthy | hrv vlf | 0.19±0.02 | 0.22±0.01 | 0.0004 |
| Healthy | hrv lf | 0.25±0.01 | 0.26±0.01 | 0.0026 |
| Healthy | hrv hf | 0.23±0.01 | 0.29±0.01 | 0.0001 |
| Healthy | hrv lf/hf | 0.22±0.01 | 0.27±0.01 | 0.0001 |
| Healthy | ac | 0.49±0.02 | | – |
| Insomnia | hrv vlf | 0.13±0.01 | 0.14±0.01 | 0.049 |
| Insomnia | hrv lf | 0.18±0.01 | 0.19±0.01 | 0.0001 |
| Insomnia | hrv hf | 0.17±0.01 | 0.21±0.01 | 0.0001 |
| Insomnia | hrv lf/hf | 0.20±0.01 | 0.21±0.01 | 0.0001 |
| Insomnia | ac | 0.41±0.02 | | – |

†Significance of difference between using fixed and using adaptive boundaries was examined with a paired Wilcoxon signed-rank test.

Table 2.3: Classification performance (mean ± SD of accuracy, sensitivity, and specificity) for healthy and insomnia groups

| Group | Feature set | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| Healthy | Actigraphy feature | 94.8 ± 2.7 | 46.8 ± 19.6 | 99.1 ± 1.0 |
| Healthy | HRV spectral features (F) | 90.3 ± 9.0 | 32.7 ± 14.1 | 95.4 ± 9.6 |
| Healthy | HRV spectral features (A) | 89.3 ± 10.7 | 33.9 ± 13.8 | 94.1 ± 11.6 |
| Healthy | HRV features† (F) | 89.9 ± 8.5 | 50.6 ± 13.4 | 93.3 ± 8.9 |
| Healthy | HRV features† (A) | 93.1 ± 4.2 | 49.7 ± 19.2 | 96.6 ± 3.3 |
| Healthy | Actigraphy + HRV features† (F) | 95.7 ± 2.1 | 56.9 ± 16.4 | 99.0 ± 0.9 |
| Healthy | Actigraphy + HRV features† (A) | 95.8 ± 2.2 | 58.1 ± 18.0 | 99.1 ± 0.9 |
| Insomnia | Actigraphy feature | 79.1 ± 12.1 | 47.2 ± 17.1 | 95.3 ± 3.5 |
| Insomnia | HRV spectral features (F) | 65.2 ± 12.9 | 42.9 ± 20.0 | 78.6 ± 17.3 |
| Insomnia | HRV spectral features (A) | 69.0 ± 11.2 | 49.1 ± 16.5 | 78.5 ± 12.5 |
| Insomnia | HRV features† (F) | 70.1 ± 9.8 | 54.6 ± 18.4 | 80.7 ± 14.2 |
| Insomnia | HRV features† (A) | 72.9 ± 8.8 | 53.2 ± 15.3 | 83.0 ± 12.2 |
| Insomnia | Actigraphy + HRV features† (F) | 79.2 ± 10.8 | 57.4 ± 17.6 | 91.5 ± 9.0 |
| Insomnia | Actigraphy + HRV features† (A) | 80.6 ± 8.5 | 57.8 ± 16.9 | 92.2 ± 8.2 |

F: using fixed boundaries (without adaptation) on the HRV spectral features.

A: using adaptive boundaries (with adaptation) on the HRV spectral features.

†The HRV features consist of the spectral features and the non-spectral features selected from a larger feature set used in [89].


Table 2.4: Classification performance (mean ± SD of κ and pooled AUCPR and AUCROC) for healthy and insomnia groups

| Group | Feature set | Kappa κ | AUCPR | AUCROC |
|---|---|---|---|---|
| Healthy | Actigraphy feature | 0.53 ± 0.15 | 0.67 | 0.90 |
| Healthy | HRV spectral features (F) | 0.33 ± 0.18 | 0.30 | 0.71 |
| Healthy | HRV spectral features (A) | 0.33 ± 0.19 | 0.36 | 0.74 |
| Healthy | HRV features† (F) | 0.44 ± 0.25 | 0.51 | 0.80 |
| Healthy | HRV features† (A) | 0.48 ± 0.24∗ | 0.54 | 0.81 |
| Healthy | Actigraphy + HRV features† (F) | 0.63 ± 0.10 | 0.71 | 0.89 |
| Healthy | Actigraphy + HRV features† (A) | 0.64 ± 0.13 | 0.72 | 0.90 |
| Insomnia | Actigraphy feature | 0.45 ± 0.17 | 0.64 | 0.71 |
| Insomnia | HRV spectral features (F) | 0.20 ± 0.14 | 0.48 | 0.60 |
| Insomnia | HRV spectral features (A) | 0.25 ± 0.13∗ | 0.52 | 0.68 |
| Insomnia | HRV features† (F) | 0.31 ± 0.11 | 0.56 | 0.68 |
| Insomnia | HRV features† (A) | 0.34 ± 0.12∗ | 0.59 | 0.72 |
| Insomnia | Actigraphy + HRV features† (F) | 0.47 ± 0.17 | 0.70 | 0.78 |
| Insomnia | Actigraphy + HRV features† (A) | 0.50 ± 0.14∗ | 0.72 | 0.81 |

F: using fixed boundaries (without adaptation) on the HRV spectral features.

A: using adaptive boundaries (with adaptation) on the HRV spectral features.

†The HRV features consist of the spectral features and the non-spectral features selected from a larger feature set used in [89].

∗The difference between using fixed and using adaptive boundaries is significant, examined with a paired Wilcoxon signed-rank test (with p < 0.05).

lower than that of the actigraphy feature, which captures body motion during bedtime. As is

known from the literature, body movements often occur during wake states [74, 258].

2.4.2 Classification

As shown in Table 2.4, in general, adapting the boundaries of the HRV spectral features can

improve the performance as evaluated by the three metrics. For the healthy group, it is inter-

esting to note that the value of κ is similar when using HRV spectral features with and without

boundary adaptation. This seems to contradict the significant increase in discriminating power

found with the Hellinger distance. Upon closer inspection we found that actually this occurs

only for that single point in the solution space. In fact, when evaluating the performance over

the entire solution space with AUCPR we see an increase from 0.30 to 0.36. The ROC and PR

curves (plotted in Figure 2.6 and Figure 2.7, respectively) obtained with the HRV spectral features

clearly show that the adapted versions are superior to the original ones, particularly in the

region where recall is lower than about 0.30 or larger than about 0.60. For the insomnia group,

the figures also indicate a clear improvement after adapting the HRV spectral features.

When combining the HRV spectral features with the additional HRV features indicated earlier,

we see a significant increase (Wilcoxon test, p < 0.01) in κ from 0.44±0.25 (without


[Figure 2.6 (plot not reproduced): panels (a) Healthy subjects and (b) Insomniacs, each plotting sensitivity versus 1 − specificity on axes from 0 to 1. Curves: HRV spectral features (without and with adaptation), HRV features (without and with adaptation), actigraphy, and actigraphy + HRV features (without and with adaptation).]

Figure 2.6: Pooled ROC curves for sleep and wake classification using different feature sets with and

without adaptation for healthy subjects (a) and insomniacs (b).


[Figure 2.7 (plot not reproduced): panels (a) Healthy subjects and (b) Insomniacs, each plotting precision versus recall on axes from 0 to 1. Curves: HRV spectral features (without and with adaptation), HRV features (without and with adaptation), actigraphy, and actigraphy + HRV features (without and with adaptation).]

Figure 2.7: Pooled PR curves for sleep and wake classification using different feature sets with and

without adaptation for healthy subjects (a) and insomniacs (b).


adaptation) to 0.48±0.24 (with adaptation) for the healthy group and from 0.31±0.11 (without

adaptation) to 0.34±0.12 (with adaptation) for the insomnia group. Because the Wilcoxon signed-rank

test compares the paired results of each subject, this indicates that boundary adaptation

improved the classification performance for the majority of the subjects. Likewise, the pooled

AUCPR and AUCROC metrics increased when applying boundary adaptation. As shown in Ta-

ble 2.4, the variations of κ are relatively large compared to the mean values, indicating a large

between-subject variability in the classification performance.

For comparison purposes, Table 2.3 and Table 2.4 also show the classification results using

the actigraphy feature ac. As expected, for the healthy group, it outperforms the HRV features.

For the insomnia group, although the κ value obtained with the HRV feature set is generally lower

than that obtained with ac, the HRV features (in particular the adapted versions) outperform this actigraphy

feature when recall is higher than ∼0.55 (see Figure 2.6 and Figure 2.7). This indicates that the

sensitivity to wake might be increased by adding these HRV features for the insomnia subjects.

It also highlights the disadvantage of a metric such as κ, which only represents a single point

reflecting a single solution in the space.

The classification results with the actigraphy and the HRV features are also given in the

tables. Although actigraphy is adequate for sleep and wake classification, combining it with the

HRV features (in particular when applying boundary adaptation on the HRV spectral features)

significantly increases the classification performance measured by κ value. The significance

was confirmed with a Wilcoxon signed-rank test (p < 0.01).

2.4.3 Healthy subjects versus insomniacs

To compare between the healthy subjects and insomniacs, it makes less sense to use the pooled

AUCPR metric due to the difference in the ratio between the numbers of sleep and wake epochs

in both groups. For instance, a decision making rule that classifies all epochs as

wake (i.e. recall = 1) leads to different precisions for the healthy and insomnia groups,

∼8% and ∼30%, respectively (the proportions of wake epochs), which depend only on their prior probabilities. Such differences in

class balance prevent a comparison between the areas under the curves of the two groups. Therefore,

here we used the pooled AUCROC metric instead. Figure 2.6 illustrates that the sleep and

wake classification performance with different feature sets is much better for the healthy subjects

than for the insomniacs. This confirms earlier findings, which show that

discrimination between wake and sleep (especially REM sleep) is more difficult in insomniacs

than in healthy subjects, when using cardiac activity [283] or actigraphy [175].
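The dependence of the PR curve on class balance can be checked with a short calculation. With wake as the positive class, a rule that labels every epoch as wake has recall 1 and a precision equal to the proportion of wake epochs. The function name below is hypothetical; the class proportions are those reported in Section 2.2.7.

```python
def all_wake_precision(wake_share, sleep_share):
    """Precision of a rule that labels every epoch as wake (recall = 1)."""
    tp, fp = wake_share, sleep_share  # every wake epoch is a TP, every sleep epoch an FP
    return tp / (tp + fp)

# Class proportions (in percent) of the healthy and insomnia groups.
precision_healthy = all_wake_precision(7.7, 92.3)
precision_insomnia = all_wake_precision(30.3, 69.7)
```

The end point of the PR curve at recall 1 therefore differs between the two groups regardless of the classifier, whereas the corresponding ROC end point is (1, 1) in both cases.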

2.4.4 Determination of adaptive boundaries

The method described in this chapter performs a time-varying adaptation of the HRV spectral

features, which offers higher discriminative power in classifying sleep and wake states. The features

are used as inputs to a sleep and wake classifier. We re-defined the spectral boundaries which

are adapted to the spectrum information (related to autonomic activity) that can be obtained

before feature extraction. The aim is to find frequency bands that can more

accurately capture certain aspects of physiology during sleep. For instance, the HF band should

only include respiratory activity rather than sympathetic activation, which should be in the

LF band. An excessively large HF bandwidth might incorrectly include spectral power that spills over

from sympathetic activation (see Figure 2.4). For this purpose, we used an HF∗

bandwidth of 0.1 Hz instead of the 0.25 Hz used in the traditional HF band. Alternatively, rather

than using a constant HF bandwidth (0.1 Hz) as in this study, the bandwidth can be determined by measuring

respiratory effort signals and analyzing their PSDs [117], but this requires an additional

sensor.

Additionally, we observed that the LF and HF bands can overlap under different circum-

stances: when the peak in the LF and in the HF band are close to each other, when there is no

clear peak in the HF band, or when the respiratory-frequency peak is below 0.15 Hz and there-

fore lies in the traditional LF band. Such overlaps (or spillovers) can be observed in Figure 2.4.

In these situations, the overlapped part of the spectrum components will actually influence the

features computed for both the LF∗ and the HF∗ bands. This may have an impact on the classification

process, decreasing the accuracy of the classifier. Therefore, a more accurate method

is needed for defining a threshold which separates the two bands rather than just using fixed

bandwidths. This merits further investigation.

Finally, as mentioned, the respiratory information was derived from the HRV data. Although

this may not be as good an estimate as a direct measurement of respiratory effort, it has

been shown to be a usable estimate of respiratory rate, especially during sleep [79]. More

importantly, it does not require the use of an additional sensor to measure respiratory effort. Al-

ternatively, the respiration rate can also be estimated from the ECG signal directly, for example

by computing the changes in the “envelope” of the ECG due to the modulation induced by the

respiration movements [203]. This method will be further studied in future work.

2.5 Conclusion

In this chapter, we used a method based on the time-frequency analysis of HRV spectral power

to adapt HRV spectral features. It aimed at providing more accurate interpretations of the sym-

pathetic and respiratory activities in order to better discriminate between sleep and wake states.

It was achieved by adapting the spectral boundaries according to the peaks found in HF and LF

bands of the HRV power spectral density. The adaptation improved the discriminative power

of the HRV spectral features, and therefore enhanced the sleep and wake classification perfor-

mance, especially after combining the adapted HRV spectral features with the other selected

HRV non-spectral features. Using a linear discriminant classifier tested with leave-one-subject-

out cross-validation, we achieved a significant increase in Cohen’s Kappa coefficient κ (from

0.44 to 0.48 for healthy subjects and from 0.31 to 0.34 for insomniacs). Furthermore, by com-

bining these HRV features and actigraphy, we obtained a significantly increased κ compared

with that obtained when only using actigraphy (0.64 versus 0.53 for the healthy group and 0.50

versus 0.45 for the insomnia group).

CHAPTER 3

Sleep and wake classification with actigraphy and

respiratory effort using dynamic warping

This chapter is adapted from: X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Sleep and

Wake Classification with Actigraphy and Respiratory Effort using Dynamic Warping. IEEE Journal of

Biomedical and Health Informatics, 18(4):1272–1284, 2014, © IEEE

Abstract – This chapter proposes the use of dynamic warping (DW) methods for improving

automatic sleep and wake classification using actigraphy and respiratory effort. DW is an al-

gorithm that finds an optimal non-linear alignment between two series allowing scaling and

shifting. It is widely used to quantify (dis)similarity between two series. To compare the res-

piratory effort between sleep and wake states by means of (dis)similarity, we constructed two

novel features based on DW. For a given epoch of a respiratory effort recording, the features

search for the optimally aligned epoch within the same recording in time and frequency do-

main. This is expected to yield a high (or low) similarity score when this epoch is sleep (or

wake). Since the comparison occurs throughout the entire-night recording of a subject, it may

reduce the effects of within- and between-subject variations of respiratory effort, and thus help

discriminate between sleep and wake states. The DW-based features were evaluated using a

Linear Discriminant classifier on a dataset of 15 healthy subjects. Results show that the DW-

based features can provide a Cohen’s Kappa coefficient of agreement κ = 0.59 which is signifi-

cantly higher than the existing respiratory-based features and is comparable to actigraphy. After

combining the actigraphy and the DW-based features, the classifier achieved a κ of 0.66 and an

overall accuracy of 95.7%, outperforming an earlier actigraphy- and respiratory-based feature

set (κ = 0.62). The results are also comparable with those obtained using an actigraphy- and

cardiorespiratory-based feature set but have the important advantage that they do not require an

ECG signal to be recorded.


30 Chapter 3. Dynamic warping on respiratory effort

3.1 Introduction

Sleep plays an important role in emotional wellbeing and physical health. Many people

live with sleep-related problems (e.g., insomnia and obstructive sleep apnea) that have a

major impact on their health [27, 247, 248]. Objective assessment of sleep

is often based on the monitoring of sleep and wake stages throughout the entire night during

bedtime [89, 151]. According to the guidelines of the American Academy of Sleep Medicine

(AASM) [136], the sleep stages consist of rapid-eye-movement (REM) and non-REM (NREM,

including N1, N2, and N3) sleep.

Overnight polysomnography (PSG) recordings with manually annotated hypnograms are

considered the “gold standard” for objectively analyzing sleep architecture and occurrence

of specific sleep-related problems [247]. A PSG typically comprises physiological data such

as the electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG),

electrooculogram (EOG), oxygen saturation, and respiratory effort. When used for sleep staging,

recorded signals are typically split in non-overlapping epochs of 30 s each in accordance with

the Rechtschaffen and Kales (R&K) rules [247] and also the more recent AASM guidelines

[136].

Although PSG is the gold standard for sleep assessment, it has several drawbacks such as the

high costs of laboratory facilities, disruption of “normal” sleep, and impossibility to perform

long-term monitoring. This has motivated the investigation of sensors/methods that allow for a

reliable acquisition of physiological modalities in an unobtrusive or at least more comfortable

and convenient way. In particular, actigraphy and cardiorespiratory signals have been often

considered in the context of automatic sleep monitoring [89, 248].

Actigraphy is a less obtrusive way of measuring the body movement of a subject based

on an accelerometer, which is typically worn on the wrist. It has been extensively studied [18, 74,

126, 204, 234, 295] and is considered a standard method for sleep assessment when PSG is

not available [204]. However, researchers argue that actigraphy is prone to error when

compared with PSG [295], and that it cannot cope with ‘quiet wake’ epochs with

low body activity, resulting in low accuracy in detecting the wake state [18, 234]. Since

actigraphy only measures body movement, it reflects limited physiological information. It has been

shown that cardiorespiratory signals contain relevant physiological information which can help

improve actigraphy-based sleep and wake classification [89, 150, 226]. More importantly, these

signal modalities can be acquired unobtrusively in different ways (e.g.,

ballistocardiogram [189], Doppler radar [194], near-infrared camera [166], under-pillow sensor

[66], bed sensor [303]). For example, acquiring cardiorespiratory information using a static-

charge-sensitive bed (SCSB) [140, 158] has been investigated, and in recent years it has become

more popular for unobtrusive (or non-contact) monitoring of sleep [161, 303]. However, it has

proven difficult to discriminate between wake and REM sleep [249] when only using

cardiorespiratory signals. It is therefore important to improve sleep and wake

classification when actigraphy is absent. On the other hand, cardiac activity is relatively difficult

to capture reliably in an unobtrusive manner, particularly when compared with body move-

ment and respiratory activity [158]. For example, a novel radio-frequency sensing system [85],


which can only capture respiratory effort, was developed for sleep/wake measurement. Thus,

enhancing the sleep and wake classification performance in the absence of cardiac activity is also

of importance. This work therefore addresses the problem of obtaining a reliable sleep and

wake classification based on the following physiological signal modalities: (1) only respiratory

effort, and (2) the combination of actigraphy and respiratory effort.

As presented in previous studies, a large number of features have been explored for sleep

and wake classification [74, 89, 248]. Whenever either ECG or actigraphy is excluded, the

classification performance degrades to a certain degree [89, 150, 151]. In this work we present

new features based on respiratory effort, which result in a classification performance not only

better than the previous respiratory feature set (and the actigraphy feature), but also comparable

to the cardiorespiratory feature set described in [89]. Unlike that work, this study does

not require ECG signals, which makes it particularly well-suited to the problem of unobtrusive sleep

and wake classification.

It is known that the breathing rhythm is usually more stable and more regular during sleep than when awake [111, 163]. After observing different respiratory effort signals in the time and the frequency domains, we found that the morphology of the respiratory waveform and the properties of its power spectral density (PSD) differ between sleep and wake epochs. As illustrated in Figure 3.1, the respiratory effort is more regular during sleep than during wake. Note that irregularity of the respiratory effort can also be caused by body motions. Additionally, the PSD of the respiratory effort signal of a sleep epoch typically shows a single clear peak indicating the dominant respiration frequency, while that of a wake epoch often shows multiple peaks. Therefore, it is assumed that a sleep epoch is more similar to another sleep epoch and less similar to a wake epoch from the perspective of “series shape”, regardless of whether the series is in the time or in the frequency domain. We thereby concentrate on two questions: (1) how to quantify the “(dis)similarity” between two series in terms of their morphological properties, and (2) which template best reflects the shape of a specific state (sleep/wake)?

Dynamic Warping (DW) algorithms have been used to assess (dis)similarity of two data

series with respect to their values. In particular, Dynamic Time Warping (DTW) [37] is a signal

matching algorithm that represents the time-alignment between two time series via dynamic

programming by means of a total cumulative distance function. It can therefore be used to

establish the degree to which two patterns match. Dynamic Frequency Warping (DFW) [209]

is an exact analog of DTW but applied in the frequency domain, where it aims at aligning two

PSD curves (often known as spectrogram frames). When used with respiratory effort signals,

DTW is expected to find a good match between the waveforms of the respiratory effort in

two separate sleep periods. In contrast, it should not find any good match of the respiratory

waveform between a sleep and a wake period, or even between two distinct wake periods. This

is simply because the breathing pattern during wake is usually not as regular as it is during

sleep, and sometimes it is more related to body motion artifacts. Analogously to DTW, DFW can help distinguish the respiratory PSD curves of sleep and wake states. Using DTW and

DFW we can express the (dis)similarity of signals in the time and in the frequency domains,

and accordingly capture properties of the respiratory effort signals which are characteristic of


sleep and wake.

[Figure 3.1 graphic: four panels labelled Sleep or Wake; panels (a) and (b) plot Resp. effort (a.u.) against Time (s), 0 to 60; panels (c) and (d) plot PSD (a.u.) against Frequency (Hz), 0 to 0.7.]

Figure 3.1: Typical examples of respiratory time series (a) during sleep and (b) during wake in a period of one min, and respiratory PSD series (c) during sleep and (d) during wake.

In this chapter we propose two respiratory-based features based on DW algorithms to dis-

criminate the respiration pattern between a sleep and a wake state. More concretely, one feature

uses DTW to calculate dissimilarity scores in the time domain and is applied on the respiratory

(effort) time series; the other uses DFW to calculate dissimilarity scores in the frequency do-

main and is applied on the respiratory PSD series. Both algorithms find an optimal alignment

between two discrete data series allowing variations in two dimensions, e.g., scaling or shift-

ing, and amplitude or offset [37, 152, 209]. For a given epoch from a subject’s recording, the

features search for the most similar epoch (i.e., optimally aligned epoch) as a template over

some other epochs of the same recording based on DW, instead of using a globally pre-defined

template for all subjects. This may reduce the impact of physiological differences between subjects. Moreover, because these epochs are all taken from the same subject and the properties of the respiratory activity do not change dramatically throughout the night, the impact of within-subject variation is expected to be small. Together, these properties could increase the classification performance across the entire data set.

DW has been widely applied to recognize patterns in various fields such as speech processing [241], fingerprint verification [162], and gene expression [1]. However, to our knowledge, no studies have explored the application of DW to sleep staging.

3.2 Subjects and data

The data set comprised single-night PSG recordings and actigraphy (Actiwatch, Philips Respironics) of fifteen healthy adults. Inclusion in the data collection trial was defined by a score lower than 6 on the Pittsburgh Sleep Quality Index (PSQI) [60]. For each subject, full


Table 3.1: Subject Demographics

Parameter                        Mean ± SD        Range
Sex                              5 males and 10 females
Age (y)                          31.0 ± 10.4      23 − 58
Body mass index, BMI (kg/m²)     24.4 ± 3.3       20.2 − 31.2
Total recording time (h)         7.2 ± 1.1        4.2 − 9.1
Number of total epochs           866.2 ± 135.6    507 − 1092
Sleep efficiency∗ (%)            92.3 ± 3.8       86.0 − 97.9

For some subjects, only a portion of recording was used because EEG electrodes fell off during the night.
∗Ratio between total sleep time and total time in bed (here equal to the recording length) based on the manual scores.

PSG was recorded according to the guidelines of the AASM [136]. The PSG recordings of

nine subjects were recorded in the Sleep Health Center, Boston, USA, during 2009 (Alice 5

PSG, Philips Respironics) and of six subjects in the Philips Experience Lab of the High Tech

Campus in Eindhoven, The Netherlands, during 2010 (Vitaport 3 PSG, TEMEC). The subject

demographics are presented in Table 3.1 as mean ± standard deviation (SD) and range. The

Ethics Committee of the two sleep laboratories (or labs) approved the study protocol and all

subjects signed an informed consent form.

Actigraphy was obtained with the wrist-worn Actiwatch where acceleration data, caused by

body movements, were recorded and converted into activity counts per second (influenced by

the intensity and frequency of acceleration) [229, 254]. The thoracic respiratory effort signal

was recorded using respiratory inductance plethysmography with a sampling rate of 10 Hz.

Note that the recordings from the Actiwatch were synchronized with those from the PSG, using

markers in both the Actiwatch and the PSG clocks.

Sleep stages were scored on 30-s epochs by sleep experts based on the AASM guidelines

as wake, REM sleep, and three NREM stages N1-N3. For sleep and wake classification, we

considered two classes: wake and sleep (including REM and NREM sleep). Each PSG recording was manually clipped to the interval between the instant when the subject turned the lights off with the intention of sleeping and the moment the lights were turned on before the subject got out of bed in the morning.

3.3 Methods

3.3.1 Dynamic warping algorithm

3.3.1.1 Dynamic warping distance

DW computes a distance between two series by non-linearly aligning them in a given dimen-

sion. Consider two series:

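$$A = a_1, a_2, \ldots, a_i, \ldots, a_n \quad (\text{length } n), \tag{3.1}$$

$$B = b_1, b_2, \ldots, b_j, \ldots, b_m \quad (\text{length } m). \tag{3.2}$$

[Figure 3.2 graphic: warping matrix spanned by series A and B, with indices i, j, n, m, the warping path elements w_1, ..., w_k, ..., w_K, and the upper and lower warping bands of size r.]

Figure 3.2: An example of the DW process between two series A and B, where the warping path (circle markers) and the Sakoe-Chiba warping bands of size r (dashed lines) are indicated.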

These two series can be arranged such that they form an n-by-m “warping matrix”, where each element (i, j) of the matrix is given by a distance function D, expressing the squared distance between $a_i$ and $b_j$:

$$D(i, j) = (a_i - b_j)^2. \tag{3.3}$$

A warping path maps the elements of A and B through the matrix so that the total cumulative

distance between them is minimized. The warping path W belongs to a set Ω including all

possible warping paths, and is denoted as

$$W = w_1, w_2, \ldots, w_k, \ldots, w_K \quad (\text{length } K), \tag{3.4}$$

where $w_k = (i, j)_k$ is the $k$th element of the warping path $W$ and $\max(n, m) \le K \le m + n - 1$. The DW distance between the two series is the minimum measure based on $W$ such that:

$$\mathrm{DW}(A, B) = \min\left[\frac{1}{K}\sqrt{\sum_{k=1}^{K} w_k}\,\right], \quad W \in \Omega, \tag{3.5}$$

where the distance is normalized by a factor K (path length). Figure 3.2 illustrates an example

of the dynamic warping procedure between two series A and B in a 2-D space.


3.3.1.2 Warping conditions

Since the DW algorithm searches for an optimal warping path through all possible paths, the

number of possible combinations quickly explodes with the length of the series. The search

space can be reduced by means of “conditions”, which help to effectively mitigate the quadratic

complexity of the algorithm [37]. Several conditions are used to decrease the number of

possible paths including continuity, monotonicity, slope constraint, and boundary constraint

[37, 245]. They can be used to construct a warping path specified by a recurrence:

$$\Delta(i, j) = D(i, j) + \min[\Delta(i-1, j-1), \Delta(i-1, j), \Delta(i, j-1)], \tag{3.6}$$

where the cumulative distance ∆(i, j) is defined as the sum of the distance D(i, j) found in a

warping step with the minimum of the cumulative distances of the adjacent elements on the

warping matrix.
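As a minimal sketch of Equations 3.3, 3.5 and 3.6, the following Python function fills the cumulative-distance matrix and normalises the result by the path length K. It is illustrative only. The function name is hypothetical, the sum in Equation 3.5 is interpreted as the cumulative squared distance D along the path (an assumption), and the path is chosen to minimise that cumulative distance before normalisation, which is a common dynamic-programming simplification rather than a confirmed detail of the implementation used in this work.

```python
import math


def dw_distance(a, b):
    """Normalised DW distance between two numeric sequences (sketch)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cum[i][j]: minimum cumulative squared distance of a path ending at (i, j)
    # steps[i][j]: number of path elements (K) on that path
    cum = [[inf] * m for _ in range(n)]
    steps = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = (a[i] - b[j]) ** 2  # Eq. 3.3
            if i == 0 and j == 0:
                cum[i][j], steps[i][j] = d, 1
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cum[i - 1][j - 1], steps[i - 1][j - 1]))
            if i > 0:
                candidates.append((cum[i - 1][j], steps[i - 1][j]))
            if j > 0:
                candidates.append((cum[i][j - 1], steps[i][j - 1]))
            # Eq. 3.6; ties are broken in favour of the shorter path (assumption)
            best_cum, best_steps = min(candidates)
            cum[i][j] = d + best_cum
            steps[i][j] = best_steps + 1
    # Eq. 3.5: square root of the cumulative distance, divided by path length K
    return math.sqrt(cum[n - 1][m - 1]) / steps[n - 1][m - 1]
```

For instance, a pulse and a shifted copy of it, `[0, 0, 1, 0]` and `[0, 1, 0, 0]`, can be aligned at zero cost, so this sketch returns a distance of zero for that pair.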

Additionally, the warping path can be restricted by a band of size r (i.e., |ik − jk| ≤ r) on

both sides of the diagonal points of the warping matrix to reduce computational complexity of

a DW procedure (i.e., to reduce the search space of the warping matrix). This is called the warping band condition, and the corresponding band is commonly known as the Sakoe-Chiba band [261] (see Figure 3.2). Using a band size r that is too large often results in “over-warping” of periodic series with multiple cycles, and thus introduces artificial features [71]. These artificial features usually occur when the warping path takes an excessive number of non-diagonal (i.e., vertical or horizontal) moves. Conversely, a very small band size may lead to “under-warping” between two series (the extreme case is the Euclidean alignment, which corresponds to the diagonal line of the warping matrix) [152]. Over-warping and under-warping are both undesirable. To determine a suitable band size r, we search for the parameter

value that would result in the highest feature discriminative power. This will be presented later

in Section 3.3.3.
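The effect of the warping band can be illustrated with a small Python sketch that only evaluates cells satisfying |i − j| ≤ r. The function name is hypothetical, and the normalisation follows the same assumptions as Equation 3.5 read as the square root of the cumulative squared distance divided by the path length K. With r = 0 and equal-length series, only the diagonal is visited, which corresponds to the Euclidean alignment mentioned above.

```python
import math


def dw_distance_banded(a, b, r):
    """DW distance restricted to a Sakoe-Chiba band of size r (sketch)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cum = [[inf] * m for _ in range(n)]   # cumulative squared distances
    steps = [[0] * m for _ in range(n)]   # path lengths
    for i in range(n):
        # Only cells with |i - j| <= r are evaluated; the rest stay infinite.
        for j in range(max(0, i - r), min(m, i + r + 1)):
            d = (a[i] - b[j]) ** 2
            if i == 0 and j == 0:
                cum[i][j], steps[i][j] = d, 1
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cum[i - 1][j - 1], steps[i - 1][j - 1]))
            if i > 0:
                candidates.append((cum[i - 1][j], steps[i - 1][j]))
            if j > 0:
                candidates.append((cum[i][j - 1], steps[i][j - 1]))
            best_cum, best_steps = min(candidates)
            cum[i][j] = d + best_cum
            steps[i][j] = best_steps + 1
    if cum[n - 1][m - 1] == inf:
        # The end point lies outside the band (|n - m| > r).
        raise ValueError("band too narrow for series of these lengths")
    return math.sqrt(cum[n - 1][m - 1]) / steps[n - 1][m - 1]
```

For the shifted pulses `[0, 0, 1, 0]` and `[0, 1, 0, 0]`, a band of r = 0 forces the diagonal alignment and yields a non-zero distance, whereas r = 1 is already wide enough to absorb the one-sample shift.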

3.3.1.3 DW versus Euclidean

The Euclidean distance (computed as a sequential mapping of two series) is a special case

of the DW distance, where the warping path coincides with the diagonal line of the warping

matrix. It is known to be sensitive to distortion in the horizontal dimension of a series [245].

Figure 3.3 depicts an example of the Euclidean and the DW alignments between series A and

B. It illustrates that the DW allows them to scale or shift along the horizontal dimension. Thus,

in this example, the DW distance should be smaller than the Euclidean distance.

3.3.1.4 DTW and DFW distance

When the DW algorithm is used to compute the distance between two time series $A^T$ and $B^T$, it is called the “DTW algorithm”, with corresponding DTW distance. Similarly, it is called the “DFW algorithm”, with corresponding DFW distance, when used to compute the distance between two frequency (or PSD) series $A^F$ and $B^F$. The superscripts indicate the time series ($T$) and PSD series ($F$). These two distance measures can be obtained based on Equations 3.5 and 3.6 described before.


[Figure 3.3 graphic: two panels, Euclidean (left) and DW (right), each showing the alignment between series A and B.]

Figure 3.3: An example of the alignment between two series (A and B) when computing the Euclidean (left) and DW (right) distances.

3.3.2 Sleep and wake classification

3.3.2.1 Signal preprocessing and PSD estimation

Before feature extraction, the respiratory effort signal of each recording is first low-pass filtered

(using a 10th order Butterworth filter with a cut-off frequency of 0.7 Hz) to eliminate high

frequency noise, after which the baseline is removed by subtracting the median peak-to-trough

amplitude estimated over the entire recording. Then, for each epoch, a Short-Time Fourier Transform (STFT) is used to estimate a PSD from the pre-processed respiratory effort signal according to the following procedure: the signal is first divided into 60-s frames centered on the epoch of interest, with a frame-to-frame overlap of 50%; after that, a Hanning window with a length of 60 s is used to reduce spectral leakage;

the spectrum is then computed using the Fast Fourier Transform (FFT); finally, the absolute

spectral values along the positive frequency axis are squared, yielding the PSD estimate for this

epoch.
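The windowing and squaring steps of this PSD estimate can be sketched in Python as follows. This is an illustration under stated assumptions, not the routine used in this work: the function name is hypothetical, the transform length of 2048 points and the 144 retained bins are assumptions chosen to match the 1024 positive-frequency bins (0 to 5 Hz) and the 0 to ∼0.7 Hz range quoted in this chapter, and a direct discrete Fourier transform of only the retained bins replaces the FFT so that the example needs no external library. Filtering and baseline removal are not shown.

```python
import cmath
import math


def epoch_psd(frame, n_fft=2048, n_bins=144):
    """PSD estimate of one 60-s frame: Hann window, DFT, squared magnitude."""
    n = len(frame)
    # Symmetric Hann (Hanning) window applied sample by sample.
    windowed = [
        x * 0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1)))
        for k, x in enumerate(frame)
    ]
    psd = []
    for f in range(n_bins):
        # Direct DFT of bin f, equivalent to zero-padding the frame to n_fft.
        acc = 0j
        for k, x in enumerate(windowed):
            acc += x * cmath.exp(-2j * math.pi * f * k / n_fft)
        psd.append(abs(acc) ** 2)  # squared absolute spectral value
    return psd
```

With a 10 Hz sampling rate, bin f of this sketch corresponds to a frequency of f × 10/2048 Hz, so the 144 bins span roughly 0 to 0.7 Hz.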

3.3.2.2 Feature extraction

Respiratory effort and actigraphy are considered, from which features are extracted for sleep

and wake classification. First, an actigraphy feature can be extracted from the output (activity

counts per second) of the Actiwatch. Second, we introduce two new features: a DTW fea-

ture called “minimum DTW distance” and a DFW feature called “minimum DFW distance”,

extracted from the pre-processed respiratory effort signal.

The actigraphy feature (ac) is first calculated as the sum of activity counts over one 30-s epoch; it is then smoothed with a weighted moving average over a window of 9 epochs in order to reduce noise introduced during measurement [89]. This feature gives an indication of gross body movements during sleep.
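A minimal Python sketch of this two-step computation is given below. The weights of the moving average are not specified in this section, so the triangular weights used here (1, 2, 3, 4, 5, 4, 3, 2, 1) are purely an assumption, as are the function name and the truncation of the window at the edges of the recording.

```python
def actigraphy_feature(counts_per_second, epoch_len=30, window=9, weights=None):
    """Sum activity counts per epoch, then smooth over neighbouring epochs."""
    n_epochs = len(counts_per_second) // epoch_len
    sums = [
        sum(counts_per_second[e * epoch_len:(e + 1) * epoch_len])
        for e in range(n_epochs)
    ]
    if weights is None:
        half = window // 2
        # Assumed triangular weights, largest at the centre epoch.
        weights = [half + 1 - abs(k - half) for k in range(window)]
    half = len(weights) // 2
    smoothed = []
    for e in range(n_epochs):
        num = den = 0.0
        for k, w in enumerate(weights):
            idx = e + k - half
            if 0 <= idx < n_epochs:  # truncate the window at the edges
                num += w * sums[idx]
                den += w
        smoothed.append(num / den)
    return smoothed
```

Passing a different `weights` list makes it easy to try other smoothing kernels without changing the rest of the sketch.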

[Figure 3.4 graphic: six alignment panels. Panels (a)-(c) plot time series samples (0 to 300) and panels (d)-(f) plot frequency series samples (0 to 140); each panel pairs two epochs labelled Sleep or Wake. Legible distance annotations include DTW distance = 0.003 and DFW distance = 0.7e−4.]

Figure 3.4: Examples of DTW alignments of the respiratory time series (a-c) and of DFW alignments of respiratory PSD series (d-f), respectively, between two sleep epochs (S-S), between a wake and a sleep epoch (W-S), and between two wake epochs (W-W). Each time series lasts 30 s sampled at 10 Hz and each PSD series contains 144 samples falling within a frequency range of 0 to ∼0.7 Hz. The values of corresponding DTW and DFW distances are indicated.

The DTW feature (dtw) is computed from the respiratory time series of a subject with the DTW algorithm described earlier. For each epoch, it measures the maximum similarity in the time domain between that epoch and a “time-series template” of the same length. Assume that the respiratory effort data recorded for a given subject is split into L non-overlapping epochs. Each of them consists of N data points in the time domain and has a length of 30 s, such that:

$$E^T(L) = E^T_1, E^T_2, \ldots, E^T_p, \ldots, E^T_L, \tag{3.7}$$

where $E^T_p = x_{p,1}, x_{p,2}, \ldots, x_{p,N}$ is the time series of the $p$th epoch ($p \in \mathbb{Z}^+$ and $1 \le p \le L$) and $N$ is the number of data points per epoch ($N = 300$ at a signal sample rate of 10 Hz). In order to compute the feature value for a given epoch of a recording, the template needs to be determined. We search for the template within a window $\Lambda^T$ with a size of $2\lambda^T$ ($< 2\lambda^T$ when $p < \lambda^T$ or $p > L - \lambda^T$) centered on the given epoch ($\pm\lambda^T$), where this epoch itself is excluded to avoid “self-alignment”. Thus, for the $p$th epoch $E^T_p$, the time-series template $\Gamma^T_p$ is selected using

$$\Gamma^T_p = \underset{E^T_q}{\arg\min}\; \mathrm{DW}(E^T_p, E^T_q) \quad \text{for all } q \in \mathbb{Z}^+,\ |q - p| \le \lambda^T,\ \text{and } q \ne p, \tag{3.8}$$

where $\lambda^T$ is a positive integer with $1 \le \lambda^T \le L - 1$. Then the feature value of the $p$th epoch is computed by

$$\mathrm{dtw}(p) = \mathrm{DW}(E^T_p, \Gamma^T_p). \tag{3.9}$$

This means that we choose, as the feature value, the minimum of all DTW distances between the given epoch $E^T_p$ and all the other epochs within the searching window $\Lambda^T$.
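The template search of Equations 3.8 and 3.9 can be sketched in Python as a minimum over the neighbouring epochs. The sketch is self-contained and therefore includes a compact banded DW distance; both function names are hypothetical, indices are 0-based rather than 1-based, and the distance normalisation (square root of the cumulative squared distance divided by the path length) is the same assumption about Equation 3.5 as elsewhere. At least two epochs are required.

```python
import math


def dw(a, b, r):
    """Compact DW distance with a Sakoe-Chiba band of size r (sketch)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # Each cell stores (cumulative squared distance, path length).
    cum = [[(inf, 0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - r), min(m, i + r + 1)):
            d = (a[i] - b[j]) ** 2
            if i == 0 and j == 0:
                cum[i][j] = (d, 1)
                continue
            prev = min(
                cum[i - 1][j - 1] if i and j else (inf, 0),
                cum[i - 1][j] if i else (inf, 0),
                cum[i][j - 1] if j else (inf, 0),
            )
            cum[i][j] = (d + prev[0], prev[1] + 1)
    total, k = cum[-1][-1]
    return math.sqrt(total) / k


def min_dw_feature(epochs, lam, r):
    """For each epoch p, the minimum DW distance to the other epochs q
    with |q - p| <= lam (the epoch itself is excluded)."""
    features = []
    last = len(epochs) - 1
    for p, epoch in enumerate(epochs):
        lo, hi = max(0, p - lam), min(last, p + lam)
        features.append(
            min(dw(epoch, epochs[q], r) for q in range(lo, hi + 1) if q != p)
        )
    return features
```

The same function can be applied to a list of PSD series instead of time series, in which case the returned values play the role of the DFW feature.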

The DFW feature (dfw ) is computed based on the DFW algorithm. The procedure of com-

puting dfw is the same as that of computing dtw , but for a respiratory PSD series rather than

its time series. This feature compares the shape of the PSD curve between a given epoch and a

“frequency-series template” with an indication of maximum similarity in the frequency domain.

Therefore, the feature value of dfw for the pth epoch is obtained as

$$\mathrm{dfw}(p) = \mathrm{DW}(E^F_p, \Gamma^F_p), \tag{3.10}$$

where $E^F_p = \varphi_{p,1}, \varphi_{p,2}, \ldots, \varphi_{p,M}$ is the PSD series of the $p$th epoch ($p \in \mathbb{Z}^+$ and $1 \le p \le L$), containing $M$ frequency bins, and $\Gamma^F_p$ is the selected frequency-series template. Here the template searching window of the DFW feature is $\Lambda^F$ with a size of $2\lambda^F$ epochs. As explained

before, the PSD series are obtained after STFT, for each of which the number of frequency bins

is M = 144 in a frequency range between 0 and ∼0.7 Hz (a subset of the original spectrum

with 1024 frequency bins in the range of 0 to 5 Hz). We limit the comparison of the PSD of

each epoch to this frequency range since it can be observed that the frequency components of

a healthy subject’s respiration during sleep are usually below 0.7 Hz. We experimentally found

that including higher frequency components would result in a lower discriminative power of

the feature since they carry very small but unexpected non-zero noise that would contaminate

the DFW alignment.

The template searching window serves to reduce the computational complexity of extracting the DW features by restricting the search for the minimum DW value to that window. The underlying assumption is that, for a given epoch, a suitable template can always be found among the other epochs within the window. The procedure for determining $\lambda^T$ and $\lambda^F$ will be presented in Section 3.3.3.

3.3.2.3 Understanding of DW-based features

Intuitively, there should be higher similarities of respiratory waveform and PSD shape between

any two sleep epochs than between a wake and a sleep epoch or between two wake epochs.

This will be expressed by the minimum DTW and minimum DFW distances found for each

epoch. To further understand this, we consider two simple cases: the current epoch is either sleep or wake. The minimal DTW (or DFW) distance that defines the feature dtw (or dfw) of this epoch can then arise in three possible situations: between two sleep epochs (S-S), between a wake and a sleep epoch (W-S), or between two wake epochs (W-W). Regarding the DTW feature, we can state the following: (1) if the current epoch is sleep, a small DTW distance is likely to be found when searching for similar signal waveforms among the remaining epochs in a certain window, since an S-S match may occur; (2) if the current epoch is wake, a small feature value is unlikely, because only W-S or W-W matches can occur. For this reason, the feature has discriminative power for distinguishing sleep and wake states. The same reasoning applies to the DFW feature; but instead of (dis)similarities of the respiratory waveform, this feature expresses (dis)similarities in the shape of the PSD series.

Figure 3.4 depicts two examples of the alignment found by DTW and DFW between epochs,

in the three situations (S-S, W-S, and W-W).

It should be kept in mind that the respiratory waveform and PSD shape might carry some information about body motion artifacts, which often appear during the wake state and can make a recorded respiratory effort signal irregular. As illustrated in Figure 3.5, some peaks (e.g., around the 420th epoch) of the DW-based features seem correlated with the actigraphy feature (ac), which expresses the activity counts. This suggests that the two DW-based features might help detect body motion artifacts. On the other hand, some peaks (e.g., around the 750th epoch) of the DW-based features seem related to wake epochs in which no activity counts are observed. These peaks might correspond to an irregular breathing rhythm.

3.3.2.4 Classifier

A linear discriminant (LD) classifier is adopted in this study. It has previously been shown to be appropriate for the task of sleep and wake classification using actigraphy, respiratory, and cardiac data [89, 108, 178, 249]. The details of an LD classifier can be found in [249] and [97]. Note that the classifier used here performs epoch-by-epoch classification.

Regarding the prior probability in the LD classifier, it can be observed that the probabilities of the different classes vary throughout the night. For example, the probability of being awake right after sleep onset or at the end of the night is much higher than in the middle of the night. To exploit these variations, we compute a time-varying prior probability for each epoch by counting the relative frequency with which that specific epoch was annotated as each class [108, 249].
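One way to read this time-varying prior is as a per-epoch-index relative frequency over a set of annotated recordings, which the following Python sketch illustrates. The function name is hypothetical, and several details are assumptions not stated in this section: the recordings are taken to be aligned at their first epoch, only recordings long enough to contain a given epoch index contribute to it, and indices beyond all recordings reuse the last available value.

```python
def time_varying_prior(annotations, n_epochs, positive="wake"):
    """Relative frequency of the positive class at each epoch index.

    annotations: list of per-recording label sequences.
    """
    priors = []
    for t in range(n_epochs):
        labels = [rec[t] for rec in annotations if t < len(rec)]
        if labels:
            priors.append(sum(1 for lab in labels if lab == positive) / len(labels))
        else:
            # No recording reaches this index: reuse the last value (assumption).
            priors.append(priors[-1] if priors else 0.5)
    return priors
```

The prior of the other class at each epoch index is then simply one minus the returned value.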

3.3.3 Experiments and evaluation

3.3.3.1 Experimental validation

Due to the relatively small size of our data set, it is not appropriate to split it into separate train-

ing and test sets. To alleviate this issue, a leave-one-subject-out cross validation (LOSOCV)

procedure [97] can be used to evaluate the performance of our sleep and wake classifier. Given

a set of feature vectors, we first divide it into l subsets (corresponding to l = 15 subjects in

this study). On each iteration, one subset is used as test set and the remaining subsets are used

to train the classifier. The classifier is evaluated on each test set, obtaining its performance

for each iteration of the cross-validation. Finally, results are averaged and pooled to obtain an

indication of the overall performance.
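The fold construction of such a leave-one-subject-out procedure can be sketched in a few lines of Python. The function name and the representation of the data as one subject identifier per feature vector are assumptions made for illustration.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx
```

Per-fold results can then be averaged across folds, or the test-set predictions of all folds can be pooled before computing a metric.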

3.3.3.2 Evaluation

[Figure 3.5 graphic: five stacked panels over time in 30-s epochs (0 to 800): (a) annotation (Wake/Sleep), (b) Resp., (c) dtw, (d) dfw, (e) ac.]

Figure 3.5: An example of (a) manually scored sleep/wake annotation, (b) respiratory effort recording at 10 Hz, and feature values of (c) dtw, (d) dfw, and (e) ac for each 30-s epoch of a healthy subject.

To evaluate the performance of our classifier, the overall accuracy (i.e., ratio of correctly identified samples to the total number of samples) commonly used in binary classification problems is not the most adequate metric. The reason is that during a recording of a whole night the number of epochs of the

wake class (accounting for 7.6% of all epochs) is much smaller than that of the sleep class

(accounting for 92.4% of all epochs), in what is usually called an “imbalanced class distribu-

tion” [125]. Thus we also consider the metrics specificity (proportion of correctly identified

actual negatives), sensitivity or recall (proportion of correctly identified positives), and precision (ratio of true positives to true positives plus false positives). Besides these metrics, Cohen’s Kappa coefficient of agreement κ [72] provides a more insightful measure of the

general performance of the classifier (0-0.20: slight, 0.21-0.40: fair, 0.41-0.60: moderate, 0.61-

0.80: substantial, and 0.81-1: almost perfect agreement [172]); but it only represents a single

point in the entire solution space [237]. In order to have an overview of the performance across

the entire solution space, we use a Precision-Recall (PR) curve [103], which plots precision ver-

sus recall by varying the classifier’s decision-making threshold. Compared with the well-known

Receiver Operating Characteristic (ROC) curve that has been shown to be over-optimistic when

the data set is heavily imbalanced between classes [83], a PR curve gives a more conservative

view of the classifier’s performance. The corresponding ‘Area Under the PR Curve’ (AUCPR)

can then be estimated [83]. In the remainder of the chapter, we will consider wake and sleep as

the positive and negative classes, respectively.
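The single-operating-point metrics described above can be computed from the four confusion counts, as in the following Python sketch with wake as the positive class. The function name and the label strings are assumptions; undefined ratios (zero denominators) are returned as NaN.

```python
def binary_metrics(reference, predicted, positive="wake"):
    """Accuracy, sensitivity, specificity, precision and Cohen's kappa."""
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive and pred == positive:
            tp += 1
        elif ref == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    nan = float("nan")

    def ratio(num, den):
        return num / den if den else nan

    accuracy = ratio(tp + tn, n)
    # Agreement expected by chance, from the marginal class frequencies.
    p_e = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    return {
        "accuracy": accuracy,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "kappa": ratio(accuracy - p_e, 1.0 - p_e),
    }
```

In an imbalanced example with 2 wake and 8 sleep epochs, one missed wake epoch and one false wake detection give an accuracy of 0.8 but a kappa of only 0.375, which illustrates why accuracy alone is not the most adequate metric here.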

An absolute standardized mean difference (ASMD) metric is utilized to evaluate the discriminative power (i.e., separability) of a single feature. It is computed as the absolute difference between the mean feature values of sleep and wake epochs, divided by the standard deviation of the feature values over all epochs. A Mann-Whitney unpaired (1-sided) test is applied to check whether the

feature values of the two classes significantly differ. Moreover, the Spearman’s rank correla-

tion coefficient (denoted as ρ) measures the correlation between features. The significance of


correlation can be examined with a Student’s t-test.
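The ASMD computation can be sketched in Python as follows. The function name is hypothetical, and the use of the population standard deviation over all epochs (rather than the sample standard deviation) is an assumption, since the text does not specify which estimator is used.

```python
import statistics


def asmd(sleep_values, wake_values):
    """Absolute standardized mean difference of one feature (sketch)."""
    all_values = list(sleep_values) + list(wake_values)
    mean_diff = abs(statistics.mean(sleep_values) - statistics.mean(wake_values))
    # Standard deviation over all epochs of both classes (population form).
    return mean_diff / statistics.pstdev(all_values)
```

A larger value indicates that the feature separates the two classes better relative to its overall spread.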

In addition to evaluating the feature discriminative power between sleep and wake epochs, we also evaluate it more specifically between wake and REM epochs and between sleep and quiet-wake epochs. This is because sleep and wake misclassification often occurs between wake and REM epochs when using traditional (cardio)respiratory features [249, 283], and between quiet-wake and sleep epochs when using actigraphy only [18]. Here quiet-wake is defined as wake with computed activity counts lower than 4.5, which is approximately the mean value over all sleep epochs.

The ASMD metric can also be used to determine the parameters (i.e., the Sakoe-Chiba band sizes $r^T$ and $r^F$ and the template searching window sizes $\lambda^T$ and $\lambda^F$) for computing the DW-based features. For each feature, a grid search is applied to find the two parameter values that maximize the feature’s ASMD value. To obtain an unbiased determination, the grid search is run on each training set during the LOSOCV procedure. Then, for each parameter, the determined value is the one that occurs most often among the optimal values found on the different training sets.

For the purpose of objectively assessing different aspects of sleep, it makes sense to evaluate the performance of the classifier with respect to its ability to deliver good estimates of so-called “sleep statistics”. The sleep statistics include: total sleep time (TST), total wake time

(TWT), sleep efficiency (SE) computed as the ratio of TST to total time in bed, sleep onset la-

tency (SOL) computed as the time it took before the subject fell asleep, wake after sleep onset

(WASO), and snooze time (ST). Since we are considering exclusively sleep and wake states in

this study, SOL is defined as the period between the beginning of a recording and the first epoch

that is annotated (or classified) as sleep according to the AASM guidelines. For the computa-

tion of ST, we follow a similar criterion, measuring the period between the last epoch that is

annotated (or classified) as sleep and the end of the recording. Keep in mind that the recordings

are restricted to the intervals from lights off until lights on. For each statistic, we compute the

error as the difference value (estimation bias) and as the absolute difference value (absolute

error) between the reference (computed based on the PSG-based manual annotation) and the

estimate (computed based on the classification result). Furthermore, we apply Bland-Altman

scatter plots to assess the degree of agreement between the PSG-based and estimated statistics.
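The sleep statistics can be derived from an epoch-by-epoch label sequence, as in the following Python sketch. The function name and label strings are hypothetical. WASO is not defined explicitly in this section, so it is assumed here to be the wake time between the first and the last sleep epoch, which makes TWT equal to SOL + WASO + ST; the handling of a recording without any sleep epoch is likewise an assumption.

```python
def sleep_statistics(labels, epoch_s=30):
    """Sleep statistics in minutes (SE in percent) from 'sleep'/'wake' labels."""
    n = len(labels)
    minutes = epoch_s / 60.0
    sleep_idx = [i for i, lab in enumerate(labels) if lab == "sleep"]
    tst = len(sleep_idx) * minutes
    twt = (n - len(sleep_idx)) * minutes
    if sleep_idx:
        first, last = sleep_idx[0], sleep_idx[-1]
        sol = first * minutes                 # time before the first sleep epoch
        st = (n - 1 - last) * minutes         # time after the last sleep epoch
        waso = sum(1 for lab in labels[first:last + 1] if lab != "sleep") * minutes
    else:
        # No sleep at all: treat the whole recording as sleep onset latency.
        sol, st, waso = n * minutes, 0.0, 0.0
    se = 100.0 * tst / (n * minutes) if n else float("nan")
    return {"TST": tst, "TWT": twt, "SE": se, "SOL": sol, "WASO": waso, "ST": st}
```

Applying the function to both the manual annotation and the classifier output gives the reference and the estimate from which the estimation bias and absolute error of each statistic follow.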

3.3.3.3 Classification performance comparison

The actigraphy and the DW-based features used in this study are first compared. They are

denoted as FAC, FDTW, and FDFW for comparison with other feature sets (see Table 3.2).

Our earlier studies [89, 108] considered a large number of features for sleep and wake classification. In those studies, a subset of features was selected based on the feature selection method described in [108]. It consists of five features: an actigraphy feature, activity counts (ac); three respiratory features, namely the standard deviation of respiratory frequency over 9 epochs (sdf), high frequency components (hfc) [248], and a non-linear measure by means of sample entropy (se) [75]; and a cardiac feature, mean heart rate (mhr). However, these selected respiratory features reflect only weakly, if at all, the characteristic morphological properties of the respiratory effort waveform or their variation over time by means of PSD shape. Those properties will be exploited with the introduction of the new DW-based features.

Table 3.2: Summary of feature sets

Feature set   Features                           #   Signal modality∗
FAC           ac                                 1   A
FDTW          dtw                                1   R
FDFW          dfw                                1   R
FR1           sdf, hfc, se                       3   R
FR2           sdf, hfc, se, dtw, dfw             5   R
FDW           dtw, dfw                           2   R
FAR1          ac, sdf, hfc, se                   4   A, R
FAR2          ac, sdf, hfc, se, dtw, dfw         6   A, R
FAC-DW        ac, dtw, dfw                       3   A, R
FARC1         ac, sdf, hfc, se, mhr              5   A, R, C
FARC2         ac, sdf, hfc, se, dtw, dfw, mhr    7   A, R, C

∗A: actigraphy data; R: respiratory effort data; C: cardiac data.

To understand whether the new DW-based features add discriminative power to a sleep

and wake classifier that uses the selected features extracted from different signal modalities,

we consider three respiratory-based feature sets and three actigraphy- and respiratory-based

feature sets, in which the features included are presented in Table 3.2. For the comparison of

classification performance with and without cardiac information, two feature sets FARC1 and

FARC2 including all the previously selected features (or together with the DW-based features)

are also considered.

Since our data were collected from two distinct sleep labs (Boston and Eindhoven), the lab effect (possibly caused by differences in PSG setup between labs) on sleep and wake classification is analyzed by using the data from one lab for training and the data from the other for testing.

3.3.3.4 Computational complexity of DW-based features

The original dynamic programming (without any conditions) is extraordinarily computationally

intensive because it searches through all possible warping paths [37]. The use of the warping

conditions can, to a great extent, speed up the DW computation [37, 152]. To compare the

computational complexity when extracting DW-based features, three approaches are considered

as follows.

• The most commonly used DW approach is the one with the warping conditions but without the warping band condition [205]. It has a computational complexity of O(N²), where the two series have the same length N. When using exhaustive template searching, the complexity of computing a DW-based feature value becomes O(LN²), in which L is the number of epochs of a recording. This approach is denoted as A1.

• The Sakoe-Chiba warping band condition brings down the computational complexity to O(LrN) instead of O(LN²), where r is the warping band size and typically r ≪ N [205]. This approach is denoted as A2.

• Setting a template searching window Λ with a size of 2λ can reduce the complexity to O(λrN), where λ < L. This approach is denoted as A3.

These three DW approaches will be compared in terms of the average computation time for extracting a DW-based feature for one epoch, implemented in a MEX-compiled C routine used in Matlab (Mathworks, Natick, MA). All computations were carried out on a laptop computer with a single Intel(R) Core(TM) i5 processor (2.53 GHz) and 4 GB of RAM.

Table 3.3: Parameter determination procedure

Parameter                         Symbol   Grid search            Determined value
                                           Min    Max    Step
Sakoe-Chiba warping band          r^T      0      300    5        60 samples†
                                  r^F      0      144    1        5 frequency bins‡
Template searching window size    λ^T      25     500∗   25       200 epochs§
(1-side)                          λ^F      25     500∗   25       250 epochs¶

∗The maximal template searching window size could be limited to the total number of epochs when computing the features.
†6 s (6.4 ± 0.9 s).
‡∼0.024 Hz (0.026 ± 0.004 Hz).
§100 min (94.4 ± 10.3 min).
¶125 min (134.4 ± 23.3 min).

3.4 Results

Table 3.3 indicates the determined parameter values obtained by the grid search method. Since

the determination was based on the training set of each iteration during the LOSOCV procedure,

the optimal values for each iteration might differ. Their means and variances (over grid search

iterations) are also indicated in the table.

Table 3.4 shows the pooled discriminative power (as measured by ASMD) of the selected features for all subjects in separating the sleep and wake classes. As confirmed with the Mann-Whitney unpaired (1-sided) test, the differences of the features between these two classes are significant. The table also indicates that the DW-based features perform much better than actigraphy when discriminating between quiet-wake and sleep, and that the feature dtw offers a higher discriminative power than the other features for wake and REM separation. Figure 3.6 illustrates the box plots of the three features (ac, dtw, and dfw) for sleep and wake epochs for every subject and for the pool of all subjects. It shows how the features can help discriminate (albeit not perfectly) between the two classes; classification errors will occur for feature values where the box plots overlap. In addition, the between-feature correlations for all subjects are presented in Table 3.5, indicating that the correlation between ac and dtw is higher than the others.

Table 3.4: Feature discriminative power (ASMD)

| Feature | Sleep vs. Wake | Quiet-wake vs. Sleep | Wake vs. REM sleep |
|---|---|---|---|
| ac | 1.77∗ | 0.16∗∗ | 0.92∗ |
| dtw | 1.75∗ | 0.48∗ | 1.03∗ |
| dfw | 1.39∗ | 0.74∗ | 0.70∗ |

For each feature, the significance of difference between classes (sleep/wake, quiet-wake/sleep, and wake/REM sleep) was examined with a Mann-Whitney test (∗p < 0.0001 and ∗∗p < 0.005).

Table 3.5: Feature correlation matrix

| Correlation (ρ)∗ | ac | dtw | dfw |
|---|---|---|---|
| ac | 1 | 0.32† | 0.26† |
| dtw | – | 1 | 0.26† |
| dfw | – | – | 1 |

∗Spearman’s rank correlation coefficient.
†Significance of correlation was tested with a t-test, p < 0.0001.

The classification results obtained with each of the feature sets after LOSOCV are summarized in Table 3.6, where both ‘averaged’ and ‘pooled’ results are presented. Note that, for each feature set, the decision threshold (i.e., operating point) of the classifier was chosen to optimize the Kappa coefficient (based on the training sets) rather than the overall accuracy, because of the between-class imbalance of our data. As can be seen in the table, the two DW-based features (i.e., FDW) provide a pooled κ of 0.59, which is comparable with that of the actigraphy feature (κ of 0.58). Combining them with the actigraphy feature in FAC-DW, we achieved a pooled κ of 0.66 and a pooled accuracy of 95.7%. The table also presents the classification results obtained with FAR1 and FAR2, indicating that the addition of DW-based features significantly improves the classification performance. It also shows that the feature set FAC-DW performs significantly better than FAR1 and comparably with FAR2. For comparison, the results based on the feature sets comprising actigraphy, respiratory, and cardiac features are also provided in Table 3.6. No significant difference was found between FAC-DW, FARC1, and FARC2. Figure 3.7 compares the pooled PR curves using different feature sets.
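The choice of an operating point that maximizes Kappa rather than accuracy can be sketched as below. The sketch assumes that wake is the positive class and that an epoch is classified as wake when its classifier score reaches the threshold; `cohen_kappa` and `best_threshold` are hypothetical names, and the scores are toy values.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa of a 2x2 confusion matrix (wake taken as positive class)."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n                                              # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)  # chance agreement
    if pe == 1.0:
        return 0.0  # degenerate case: only one class present and predicted
    return (po - pe) / (1.0 - pe)

def best_threshold(scores, labels, thresholds):
    """Return (threshold, kappa) maximizing Kappa; labels: 1 = wake, 0 = sleep."""
    best = None
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        kappa = cohen_kappa(tp, fp, fn, tn)
        if best is None or kappa > best[1]:
            best = (t, kappa)
    return best

# Labeling every epoch as sleep gives 93% accuracy here but zero Kappa.
print(cohen_kappa(tp=0, fp=0, fn=70, tn=930))

scores = [0.1, 0.2, 0.3, 0.4, 0.9, 0.8, 0.35, 0.05]
labels = [0, 0, 0, 0, 1, 1, 1, 0]
threshold, kappa = best_threshold(scores, labels, [0.0, 0.25, 0.5, 0.75])
print(threshold, round(kappa, 3))
```

The first call shows why accuracy is a poor criterion for imbalanced sleep and wake data: a classifier that never detects wake still reaches a high accuracy, while its Kappa is zero.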

The classifier’s learning curves (based on FAC-DW) using LOSOCV are displayed in Figure 3.8. They are plotted as pooled κ versus the number of subjects (varying from 2 to 15). The results on the training and test sets start converging rapidly from 4 or 5 subjects and become stable at 13 subjects, ultimately achieving a κ of ∼0.66. This confirms the unsuitability of splitting the data into separate training and test sets in our experiment.


[Figure 3.6 image: three panels showing ac (a.u.), dtw (a.u.), and dfw (a.u.) on the vertical axes versus Subject (1 to 15 and Pool) on the horizontal axis; legend: Sleep, Wake.]

Figure 3.6: Box plots (mean and SD) of the feature values of ac, dtw, and dfw for sleep and wake epochs for each of the 15 subjects and for the pool of all these subjects.

[Figure 3.7 image: Precision (vertical axis, 0 to 1) versus Recall (horizontal axis, 0 to 1); curves for FAC, FR1, FDW, FAR1, and FAC-DW.]

Figure 3.7: PR curves obtained with different features or feature sets; the corresponding operating points of the classifier (representing κ) are marked.


Table 3.6: Summary of sleep and wake classification results using LOSOCV

| Feature set | Precision (%) | Sensitivity (%) | Specificity (%) | Accuracy (%) | AUCPR∗ | Kappa κ∗ |
|---|---|---|---|---|---|---|
| FAC | 64.5 (66.2 ± 20.9) | 57.5 (61.8 ± 16.1) | 97.4 (97.3 ± 2.0) | 94.4 (94.2 ± 1.7) | 0.66 (0.73 ± 0.09) | 0.58 (0.57 ± 0.07) |
| FDTW | 50.7 (51.0 ± 17.9) | 62.9 (67.7 ± 17.0) | 95.0 (94.9 ± 2.5) | 92.5 (92.4 ± 2.1) | 0.55 (0.60 ± 0.13) | 0.52 (0.51 ± 0.11) |
| FDFW | 43.3 (42.6 ± 11.8) | 50.8 (53.7 ± 11.9) | 94.5 (94.3 ± 2.3) | 91.2 (91.0 ± 2.9) | 0.43 (0.44 ± 0.10) | 0.41 (0.41 ± 0.08) |
| FR1 | 45.2 (51.6 ± 20.2) | 54.3 (52.9 ± 16.8) | 94.6 (93.8 ± 6.9) | 91.5 (90.9 ± 6.0) | 0.52 (0.55 ± 0.14) | 0.45 (0.44 ± 0.12) |
| FR2 | 64.2 (66.8 ± 20.5) | 56.0 (55.0 ± 19.3) | 97.3 (96.6 ± 3.0) | 94.2 (94.1 ± 2.6) | 0.64 (0.67 ± 0.16) | 0.57 (0.55 ± 0.17) |
| FDW | 63.5 (64.0 ± 18.9) | 59.9 (63.4 ± 16.2) | 97.3 (97.2 ± 1.9) | 94.3 (94.2 ± 2.2) | 0.64 (0.68 ± 0.12) | 0.59 (0.58 ± 0.11) |
| FAR1 | 70.6 (75.3 ± 29.2) | 60.5 (62.4 ± 18.7) | 97.8 (97.6 ± 2.7) | 95.0 (94.8 ± 2.3) | 0.68 (0.75 ± 0.12) | 0.62 (0.61 ± 0.12) |
| FAR2 | 75.0 (80.2 ± 20.1) | 62.9 (64.2 ± 20.6) | 98.1 (97.9 ± 2.8) | 95.6 (95.3 ± 2.4) | 0.73 (0.78 ± 0.13) | 0.66 (0.64 ± 0.15) |
| FAC-DW | 77.3 (79.1 ± 16.5) | 61.2 (64.5 ± 20.6) | 98.5 (98.3 ± 1.9) | 95.7 (95.5 ± 2.0) | 0.74 (0.78 ± 0.12) | 0.66 (0.65 ± 0.13) |
| FARC1† | 76.9 (81.4 ± 17.0) | 60.3 (59.8 ± 19.1) | 98.5 (98.4 ± 2.3) | 95.6 (95.5 ± 2.9) | 0.72 (0.77 ± 0.12) | 0.65 (0.64 ± 0.14) |
| FARC2† | 75.4 (79.8 ± 16.6) | 63.2 (63.0 ± 18.8) | 98.3 (98.1 ± 2.3) | 95.7 (95.5 ± 2.2) | 0.74 (0.77 ± 0.13) | 0.67 (0.65 ± 0.12) |

For each metric, the pooled and the averaged (between brackets) results over subjects are provided. Results were chosen to optimize κ.
∗Significance of difference between feature sets was examined with a t-test (with 14 degrees of freedom and p < 0.05). Normality of the results was confirmed with a Q-Q plot method.
†Compared to the previous work [89, 177], a larger data set with 15 subjects was used; the features (except the DW-based features) were selected from a larger feature set based on the selection method described in [108].


[Figure 3.8 image: Kappa coefficient (vertical axis, 0 to 1) versus Number of subjects (horizontal axis, 2 to 15); curves for Training set and Test set.]

Figure 3.8: Learning curves with LOSOCV by varying the number of subjects.

Table 3.7: Classification results with split training and test sets

| Training set | Test set | Accuracy (%) | AUCPR | Kappa κ |
|---|---|---|---|---|
| Boston | Boston | 96.0 | 0.80 | 0.71 |
| Boston | Eindhoven | 95.4 | 0.66 | 0.61 |
| Eindhoven | Eindhoven | 95.2 | 0.65 | 0.59 |
| Eindhoven | Boston | 95.5 | 0.78 | 0.67 |

Table 3.7 shows the classification results (pooled overall accuracy, AUCPR, and Kappa) of

using our actigraphy- and respiratory-based feature set FAC-DW by splitting training and test sets

with regard to lab (i.e., using the Boston set to train the classifier and testing it on the Eindhoven

set, and the other way around).

The results (absolute error and estimation bias) of the sleep statistics over subjects using different actigraphy and respiratory feature sets (FAR1 and FAC-DW) are summarized and compared in Table 3.8. Using FAC-DW we achieved significantly lower absolute errors (t-test, p < 0.05) in estimating the sleep statistics than with FAR1, with the exception of ST. To compare the degree of agreement, Bland-Altman scatter plots are shown in Figure 3.9. It can be seen that the difference values of SE, TST, TWT, SOL, and WASO are less dispersed when using FAC-DW than when using FAR1, indicating a lower variance (or a higher degree of agreement) when estimating the sleep statistics with FAC-DW.
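The quantities behind such an agreement analysis can be sketched as follows. The bias follows the convention used here (reference value minus estimated value) and the limits are taken as bias ± 1.96 SD; the use of the sample standard deviation, the function name `bland_altman`, and the toy sleep efficiency values are assumptions of this sketch.

```python
import statistics

def bland_altman(reference, estimated):
    """Return per-subject averages and differences, the mean bias,
    the 95% limits of agreement, and the mean absolute error."""
    diffs = [r - e for r, e in zip(reference, estimated)]  # reference minus estimate
    avgs = [(r + e) / 2 for r, e in zip(reference, estimated)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    abs_error = statistics.mean(abs(d) for d in diffs)
    return avgs, diffs, bias, limits, abs_error

# Toy sleep efficiency values (%) for four subjects.
reference = [90, 85, 95, 88]
estimated = [92, 84, 96, 85]
avgs, diffs, bias, limits, abs_error = bland_altman(reference, estimated)
print(bias, limits, abs_error)
```

Narrower limits of agreement correspond to less dispersed difference values in the scatter plot, which is the sense in which one feature set agrees better with the reference than another.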

Table 3.9 compares the computational complexity of the different DW approaches (A1, A2, and A3). It shows that, when extracting DW-based features, using a warping band for DW and constraining the template searching range reduces the computation time significantly (t-test, p < 0.001) to an average of 0.53 s and 0.10 s for computing dtw and dfw of each 30-s epoch, respectively. On average, it takes approximately 7.5 min for the DTW feature and 1.5 min for the DFW feature to compute all feature values of one night per subject.


Table 3.8: Comparison of sleep statistics (mean ± SD over subjects)

| Sleep statistics | Absolute error, FAR1 | Absolute error, FAC-DW | Estimation bias∗, FAR1 | Estimation bias∗, FAC-DW |
|---|---|---|---|---|
| SE (%) | 3.3 ± 2.4 | 2.8 ± 2.1 | −0.1 ± 4.1 | −1.3 ± 3.3 |
| TST (min) | 13.7 ± 10.4 | 11.2 ± 7.9 | −0.43 ± 17.6 | −6.5 ± 12.4 |
| TWT (min) | 13.7 ± 10.4 | 11.4 ± 8.0 | 0.43 ± 17.6 | −6.9 ± 12.4 |
| SOL (min) | 6.4 ± 7.1 | 5.0 ± 6.8 | 3.9 ± 8.8 | 3.4 ± 7.8 |
| WASO (min) | 12.2 ± 10.4 | 7.1 ± 5.3 | −3.6 ± 15.9 | 3.6 ± 8.2 |
| ST (min) | 1.0 ± 1.3 | 1.1 ± 1.7 | 0.17 ± 1.7 | −0.13 ± 2.0 |

∗For each subject, estimation bias was computed as reference value minus estimated value.

Table 3.9: Comparison of computational complexity for different DW-based feature extraction approaches

| DW approach | Computational complexity | Average computation time (s), dtw (one epoch) | Average computation time (s), dfw (one epoch) |
|---|---|---|---|
| A1 | O(LN²) | 2.29 ± 0.08 | 0.44 ± 0.02 |
| A2 | O(LrN)∗ | 1.43 ± 0.05 | 0.21 ± 0.01 |
| A3 | O(λrN)∗ | 0.53 ± 0.04 | 0.10 ± 0.03 |

∗Here the parameters are rT = 60, rF = 5, λT = 200, and λF = 250.

3.5 Discussion

During the training step of each LOSOCV iteration, some parameters, evaluated by the pooled AUCPR, were determined. The determined Sakoe-Chiba warping band for DTW (rT = 60) is much larger than that for DFW (rF = 5). This is because, when computing the DTW distance between two respiratory time series, they usually start and end at different phases of a breathing cycle. A larger DTW warping band allows a larger signal variation (caused by differences in breathing phase, length, amplitude, etc.) between two epochs. It helps compensate for the signal variation and thus makes it possible to find a better alignment between them. On the other hand, when computing the DFW distance, the respiratory PSDs were normalized between 0 and 1 so that the amplitude variation between epochs would no longer exist (no improvement in classification performance was observed without normalizing them). Also, they usually have fewer peaks and no troughs compared with time series (see Figure 3.1). These properties would yield a higher similarity between two respiratory PSD series than between two respiratory time series. Besides, using a smaller warping band for DFW avoids over-alignment between two PSD series, so that sleep and wake can still be discriminated with respect to their minimum distance.

The searching window sizes for extracting DW-based features were also determined with the use of the grid search method. Since we relied on the observation that the minimum DW distance for a sleep epoch is small, the potential disadvantage of restricting the search space is alleviated by the fact that sleep epochs are usually not isolated in time, i.e., there are, very likely, other sleep epochs close to any given (sleep) epoch during the night. Furthermore, a larger searching window might not provide a better separation between sleep and wake classes. For instance, when analyzing a wake epoch, the inclusion of more distant (in time) candidate templates might increase the likelihood of selecting a more similar wake template. This would result in a smaller DW distance and thus decrease the feature’s discriminative power. Here we found that the discriminative power of these two features did not change dramatically when λ > 25 epochs.

[Figure 3.9 image: two sets of six Bland-Altman panels, titled “Bland Altman plots using FAR1” and “Bland Altman plots using FAC-DW”; each panel plots the difference against the average for SE (%), TST (min), TWT (min), SOL (min), WASO (min), and ST (min).]

Figure 3.9: Bland-Altman plots for sleep statistics estimated using FAR1 (Left) with data points marked by “×” and FAC-DW (Right) with data points marked by “”. Data points in a plot represent different subjects. Mean bias and 95% limits (± 1.96 SD) are shown as solid and dashed lines, respectively.

The DW-based features performed well for sleep and wake classification. These features can effectively encode differences in the waveform and PSD shape of the respiratory effort between sleep and wake states. As shown in Table 3.6, when considering the use of only respiratory effort, our DW-based feature set FDW offers a relative increase in κ of around 31% compared to the existing respiratory feature set FR1 (i.e., κ of 0.59 versus 0.45), and it is comparable with the well-known actigraphy (κ = 0.58). After combining actigraphy with the respiratory effort signal, our DW-based features improved the classification performance from κ = 0.62 to κ = 0.66, yielding a higher relative increase (∼14%) when compared with actigraphy alone. The reason might be that the DW-based features (particularly the DFW feature) better help distinguish between quiet-wake and sleep (see Table 3.4).

A previous study [126] presented a novel actigraphy-based algorithm for sleep and wake classification, in which the authors reported an overall accuracy of ∼86%, a sleep accuracy of ∼91%, and a wake accuracy of ∼69% for a group of 38 normal subjects. In [234], the overall accuracy was ∼87% (measured in 14 healthy subjects). In this study, to allow a fair comparison, we varied the operating point of our classifier and obtained comparable results based on actigraphy only. After combining it with the DW-based respiratory features, as shown in Table 3.6, we achieved much better results.

It is known that wrist actigraphy ultimately measures the body (or more precisely, wrist) movements during sleep, which have proved to be an indication of the wake state [18, 234]. To a certain extent, these movements are often reflected in the respiratory effort signal as body motion artifacts during measurement. This can be observed in Figure 3.5, which suggests a relatively high correlation between peaks in the actigraphy feature and the respiratory effort series. As mentioned, the respiratory waveform and PSD shape not only reflect the respiration information but also contain some information about body motion artifacts. This means that the DW-based features might encode the artifact information in both the time and the frequency domains. Table 3.5 supports this, given the significant correlation between ac and dtw (ρ = 0.32) and between ac and dfw (ρ = 0.26). These two features (particularly the DTW feature) might help separate wake and REM sleep, resulting in an improved classification when actigraphy is not provided (see Table 3.6).

The inclusion of the cardiac feature (i.e., using FARC2) did not significantly improve the performance of sleep and wake classification (see Table 3.6). This means that a good performance can still be obtained when using fewer physiological signal modalities. However, it is still worthwhile to explore new cardiac features containing additional information, not contained in actigraphy and respiratory activity, that can better discriminate between sleep and wake states. Moreover, the κ of 0.59 with only DW-based respiratory features is comparable with that of 0.60 reported in [249], where not only respiratory but also cardiac information was used.

The results of using FAR2 (with six features) are comparable with those of using FAC-DW (with three features). Since we aimed at evaluating the proposed new DW-based features, they were simply combined with the other pre-selected features. Using more features does not necessarily guarantee a better performance, and in some cases the performance may even decrease. This is because features may be mutually correlated to some extent, and thus some features are likely redundant. As a consequence, they may hardly contribute to (or may even harm) the classification when the additional useful information they carry is limited compared with the increase in noise level. Therefore, selecting features from a larger feature set with the aim of removing feature set redundancy (e.g., correlation-based feature selection [121]) merits further investigation.

As shown in Table 3.7, the sleep and wake classification results obtained on the Eindhoven set remain worse than those on the Boston set, regardless of which set was used for training. This might be associated with between-subject variability rather than a lab effect. Thus, we cannot confidently conclude whether a lab effect exists based on our data set with a small number of subjects. Although results have been shown to be consistent between labs [126], this should be studied further on a larger data set.

By choosing different classifier operating points, we can obtain results that favor a higher specificity or sensitivity. In practice, this choice often depends on the accuracy required in estimating the sleep statistics that are delivered to subjects. For example, the operating point should be chosen to optimize the estimate of SOL for subjects who might have insomnia, while for overall assessment of sleep, one can choose to optimize the estimate of SE.

In addition, this study focused on healthy subjects with high sleep efficiencies (>86%) rather than, e.g., insomniacs with low sleep efficiencies. However, it has been indicated that distinguishing between sleep and wake states is more difficult in insomniacs than in healthy subjects when using cardiorespiratory activity [85, 283] or actigraphy [175]. Although the DW-based features perform well in separating sleep and wake states for healthy subjects, it is necessary to further evaluate how robust they are against low sleep efficiency.

Finally, although the DW-based features seem computationally intensive compared with many other existing features, it is still practically feasible to achieve an offline classification of sleep and wake. In fact, recent research has developed a set of techniques that can make the DW computation much faster and comparable with the Euclidean alignment, so that DW is applicable to large data sets in real time [242]. Speeding up our algorithms using these techniques is left for future work.

3.6 Conclusion

In this chapter, we proposed two new features extracted from respiratory effort based on dynamic warping (DW) algorithms to enhance the performance of sleep and wake classification. The features compare the shape (dis)similarity between two series (in the time and frequency domains) for a given 30-s epoch and the other epochs within a pre-determined window from an entire-night respiratory effort recording. The minimal dissimilarity (measured by a DW distance) was computed as the feature value for this epoch. To evaluate the sleep and wake classification performance, a linear discriminant classifier was tested with a leave-one-subject-out cross-validation. By combining the two DW-based features with a well-known actigraphy feature, we obtained a significantly increased Cohen’s Kappa coefficient (κ = 0.66) compared with the use of the actigraphy feature and the traditional respiratory features (κ = 0.62), and this combination significantly outperforms the use of actigraphy only (κ = 0.58). It is comparable with the κ of 0.67 obtained with a feature set comprising the DW-based features and the previously used actigraphy and cardiorespiratory features. Furthermore, when using the respiratory signal only, the DW-based features provided a large improvement compared with the existing respiratory features (κ of 0.59 versus 0.45).

CHAPTER 4

Analysis of respiratory effort amplitude for sleep stage

classification

This chapter is adapted from: X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Analyzing respiratory effort amplitude for automated sleep stage classification. Biomedical Signal Processing and Control, 14:197-205, 2014. © Elsevier

Abstract – Respiratory effort has been widely used for objective analysis of human sleep during bedtime. Several features extracted from the respiratory effort signal have been used successfully in automated sleep stage classification throughout the night, such as variability of respiratory frequency, spectral powers in different frequency bands, respiratory regularity, and self-similarity. With regard to the respiratory amplitude, it has been found that the respiratory depth is more irregular and the tidal volume is smaller during rapid-eye-movement (REM) sleep than during non-REM (NREM) sleep. However, these physiological properties have not been explicitly elaborated for sleep stage classification. By analyzing the respiratory effort amplitude, we propose a set of 12 novel features that should reflect respiratory depth and volume. They are expected to help classify sleep stages. Experiments were conducted with a data set of 48 sleepers using a linear discriminant (LD) classifier, and classification performance was evaluated by overall accuracy and Cohen’s Kappa coefficient of agreement. Cross-validations (10-fold) show that adding the new features to the existing feature set achieved significantly improved results in classifying wake, REM sleep, light sleep, and deep sleep (Kappa of 0.38 and accuracy of 63.8%) and in classifying wake, REM sleep, and NREM sleep (Kappa of 0.45 and accuracy of 76.2%). In particular, the incorporation of these new features can help improve deep sleep detection to a greater extent (with a Kappa coefficient increasing from 0.33 to 0.43). We also found that calibrating the respiratory effort signals by means of body movements and performing subject-specific feature normalization can ultimately yield enhanced classification performance.


4.1 Introduction

According to the rules presented by Rechtschaffen and Kales (the R&K rules) [247], human

sleep is comprised of wake, rapid-eye-movement (REM) sleep and four non-REM (NREM)

sleep stages S1-S4. S1 and S2 are usually grouped as “light sleep” and S3 and S4 correspond

to slow-wave sleep (SWS) or “deep sleep” [276]. The gold standard for nocturnal sleep assess-

ment is overnight polysomnography (PSG) which is typically collected in a sleep laboratory.

With PSG, sleep stage is manually scored on each 30-s epoch throughout the night by trained

sleep experts, forming a sleep hypnogram [247]. PSG recordings usually contain multiple bio-

signals such as electroencephalography (EEG), electrocardiography (ECG), electrooculography

(EOG), electromyography (EMG), respiratory effort, and blood oxygen saturation.

Respiratory information has been widely used for objectively assessing human nocturnal sleep [95, 226, 281]. Detecting sleep stages overnight is beneficial to the interpretation of sleep architecture or the monitoring of sleep-related disorders [102, 248]. Cardiorespiratory-based automated sleep stage classification has been increasingly studied in recent years [158, 180, 249, 309, 312]. Some of those studies only made use of respiratory activity because, by comparison, cardiac activity is more difficult to capture reliably in an unobtrusive manner [158, 180]. For respiratory activity, in comparison with the breathing ventilation acquired with traditional devices such as nasal prongs or a face mask [106], respiratory effort can be obtained in an easier and less invasive way, e.g., using a respiratory inductance plethysmography (RIP) sensor [73], an infrared camera [166], or a pressure-sensitive bed sheet [264].

Several parameters have been derived from respiratory effort signals for sleep analysis, including respiratory frequency, powers of different respiratory spectral bands [249], respiratory self-similarity [180], and regularity [250]. These parameters are usually called “features” in the tasks of epoch-by-epoch sleep stage classification. In addition, it has been reported that the respiratory amplitude (e.g., depth and volume) differs between sleep stages [95]. For instance, the “respiratory depth” is more irregular and the tidal volume, minute ventilation, and inspiratory flow rate are significantly lower during REM sleep than during NREM sleep (particularly during deep sleep) [67, 129]. To the authors’ knowledge, these characteristics, which express different physiological properties across sleep stages, have not been explicitly elaborated and quantified for applications of sleep stage classification. We therefore exploit these characteristics by analyzing the respiratory effort signal envelope and area. This motivates the design of features that quantify these characteristics and that are in turn expected to help separate different sleep stages.

It is assumed that the information about respiratory depth or volume is obtainable from the respiratory effort signal. For instance, the signal (upper and lower) envelopes and area should correspond to respiratory depth and volume, respectively. In fact, respiratory effort has often been used as a surrogate of tidal volume since it is obtained by measuring motions of the rib cage or abdomen with, e.g., RIP [73]. However, Whyte et al. [307] argued that this assumption does not always hold, particularly when a sleeper changes his/her posture along with body movements during sleep. This is because the respiratory effort amplitude might be affected by body movements, as the sensor position may shift and/or the sensor may be stretched. This causes an unfair comparison of the signal amplitude before and after body movements, yielding errors when computing the feature values. In order to provide a more accurate estimate of respiratory depth and volume from the respiratory effort signal, we must calibrate the signal by means of body movements. These movements can be quantified by analyzing the artifacts of the respiratory effort signal (often in line with body movements) using a dynamic time warping (DTW)-based method [180]. DTW is a signal-matching algorithm that quantifies an optimal nonlinear alignment between two time series allowing scaling and offset [37]. Our previous work [180] proposed a DTW measure to effectively capture body motion artifacts by measuring the self-similarity of respiratory effort. This measure was successfully used as a feature for classifying sleep and wake states in that work. Therefore, we simply adopted this measure to detect motion artifacts modulated by body movements in respiratory effort signals. Using the DTW-based method makes an additional sensor modality (e.g., actigraphy) specifically used for detecting body movements unnecessary.

The focus of this work is exclusively on investigating a set of novel features that can characterize respiratory amplitude in different aspects, with the ultimate goal of improving sleep stage classification performance. Previous studies have shown that a linear discriminant (LD) is an appropriate algorithm for sleep stage classification [179, 248, 249]. Likewise, we simply adopted an LD classifier. Preliminary results of this work in classifying REM and NREM sleep have been previously published [181].


4.2.1 Subjects and data

Data of 48 healthy subjects (21 males and 27 females) in the SIESTA project, supported by

the European Commission [160], were included in our data set. The subjects had a Pittsburgh

Sleep Quality Index (PSQI) [60] of no more than 5 and met several criteria (no shift work, no

depressive symptoms, usual bedtime before midnight, etc.). All the subjects signed an informed

consent form prior to the study, documented their sleep habits over 14 nights, and underwent

overnight PSG study for two consecutive nights (on day 7 and day 8) in sleep laboratories. The

PSG recordings collected on day 7 were used for analyses; the respiratory effort signals in these recordings (sampling rate of 10 Hz) were recorded with thoracic inductance plethysmography.

Sleep stages were manually scored on 30-s epochs as wake, REM sleep, or one of the NREM sleep stages by sleep clinicians based on the R&K rules. For sleep stage classification, epochs were labeled as four classes, W (wake), R (REM sleep), L (light sleep), and D (deep sleep), or three classes, W, R, and N (NREM sleep).

The subject demographics and some sleep statistics [mean ± standard deviation (SD) and range] of the data used in this study are summarized in Table 4.1.

4.2.2 Signal preprocessing

Table 4.1: Summary of subject demographics and sleep statistics (N = 48)

| Parameter | Mean ± SD | Range |
|---|---|---|
| Age (y) | 41.3 ± 16.1 | 20 − 83 |
| Body mass index (kg/m²) | 23.6 ± 2.9 | 19.1 − 31.3 |
| Wake, W (%) | 12.9 ± 6.1 | 1.2 − 24.5 |
| REM sleep, R (%) | 19.0 ± 3.3 | 15.3 − 26.5 |
| NREM sleep, N (%) | 68.1 ± 4.9 | 56.1 − 76.3 |
| Light sleep, L (%) | 53.6 ± 5.5 | 42.7 − 66.7 |
| Deep sleep, D (%) | 14.5 ± 4.8 | 5.3 − 28.5 |

The raw respiratory effort signals of all subjects were preprocessed before feature extraction. They were filtered with a 10th order Butterworth low-pass filter with a cut-off frequency of 0.6 Hz to eliminate high-frequency noise. Afterwards, the baseline was removed by subtracting the median peak-to-trough amplitude. To locate the peaks and troughs, we identified the turning points based on sign changes of the signal slope and then corrected the falsely detected ‘dubious’ peaks and troughs (1) with too short intervals between peak and trough pairs, where the sum of two successive intervals is less than the median of all intervals over the entire recording, and (2) with too small amplitudes, where the peak-to-trough difference is smaller than 15% of the median of the entire respiratory effort signal. These methods were validated by comparing automatically detected results with manually annotated peaks and troughs, and an accuracy of ∼98% was achieved.
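The turning-point detection and the amplitude-based correction can be sketched as below. This is a simplified illustration, not the validated procedure: it implements only rule (2), it interprets the reference amplitude as the median peak-to-trough difference (an assumption), it ignores flat plateaus, and the names `turning_points` and `remove_small_pairs` are hypothetical.

```python
import math
import statistics

def turning_points(x):
    """Return (index, kind) pairs where the signal slope changes sign."""
    points = []
    for i in range(1, len(x) - 1):
        left, right = x[i] - x[i - 1], x[i + 1] - x[i]
        if left > 0 and right < 0:
            points.append((i, "peak"))
        elif left < 0 and right > 0:
            points.append((i, "trough"))
    return points

def remove_small_pairs(x, points, fraction=0.15):
    """Repeatedly drop the adjacent peak/trough pair with the smallest amplitude
    difference while that difference is below fraction * median difference."""
    kept = list(points)
    amps = [abs(x[a[0]] - x[b[0]]) for a, b in zip(kept, kept[1:])]
    if not amps:
        return kept
    limit = fraction * statistics.median(amps)
    while len(kept) >= 2:
        amps = [abs(x[a[0]] - x[b[0]]) for a, b in zip(kept, kept[1:])]
        k = min(range(len(amps)), key=amps.__getitem__)
        if amps[k] >= limit:
            break
        del kept[k:k + 2]  # removing a neighboring pair keeps peaks and troughs alternating
    return kept

# Toy signal: 0.25 Hz sinusoid sampled at 10 Hz for 30 s, with one small ripple.
x = [math.sin(2 * math.pi * 0.25 * t / 10.0) for t in range(300)]
x[11] = 0.94  # creates a dubious trough at sample 11 and a dubious peak at sample 12
raw = turning_points(x)
clean = remove_small_pairs(x, raw)
print(len(raw), len(clean), clean[:3])
```

Removing the smallest excursion first tends to preserve the larger of two competing extrema, here the true peak at sample 10 rather than the ripple at sample 12.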

4.2.3 Existing respiratory features

A pool of 14 existing features extracted from the respiratory effort signal has been used in

previous studies for sleep stage classification. In the time domain, the mean and SD of breath

lengths (Lm and Lsd) and the mean and SD of breath-by-breath correlations (Cm and Csd) were

calculated [248]. In the frequency domain, we extracted features based on the respiratory effort

spectrum for each epoch where the spectrum was estimated using a short time Fourier transform

(STFT) with a Hanning window. From the spectrum the dominant frequency (Fr) in the range of

0.05-0.5 Hz (estimated as the respiratory frequency) and the logarithm of its power (Fp) were

obtained [248]. We also took the logarithm of the spectral power in the very low frequency

band between 0.01 and 0.05 Hz (VLF), low frequency band between 0.05 and 0.15 Hz (LF),

and high frequency band from 0.15 to 0.5 Hz (HF) and the ratio between LF and HF spectral

powers (LF/HF) [248, 249]. Furthermore the standard deviation of respiratory frequency over

5 epochs (Fsd) was computed [249]. Non-linear features consist of self-similarity measured

between each epoch of interest and the other epochs by means of dynamic time and frequency

warping (Sdtw and Sdfw) [180] and signal regularity estimated by sample entropy (Rse) [250].

The latter was implemented with the PhysioNet toolkit sampen [170].
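The frequency-domain features can be illustrated for a single 30-s epoch with a minimal sketch. It assumes a sampling rate of 10 Hz, removes the epoch mean before windowing (an assumption), and uses a direct DFT instead of an optimized STFT routine; the function names and the dictionary keys are hypothetical labels for Fr, Fp, VLF, LF, HF, and LF/HF.

```python
import math

def power_spectrum(x, fs):
    """One-sided power spectrum of a Hann-windowed epoch via a direct DFT."""
    n = len(x)
    mean = sum(x) / n
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [(v - mean) * w for v, w in zip(x, window)]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(xw))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(xw))
        freqs.append(k * fs / n)
        power.append(re * re + im * im)
    return freqs, power

def band_power(freqs, power, lo, hi):
    return sum(p for f, p in zip(freqs, power) if lo <= f < hi)

def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")

def spectral_features(x, fs=10.0):
    freqs, power = power_spectrum(x, fs)
    # Dominant frequency in 0.05-0.5 Hz, taken as the respiratory frequency.
    peak_power, fr = max((p, f) for f, p in zip(freqs, power) if 0.05 <= f <= 0.5)
    vlf = band_power(freqs, power, 0.01, 0.05)
    lf = band_power(freqs, power, 0.05, 0.15)
    hf = band_power(freqs, power, 0.15, 0.5)
    return {"Fr": fr, "Fp": safe_log(peak_power), "VLF": safe_log(vlf),
            "LF": safe_log(lf), "HF": safe_log(hf), "LF/HF": lf / hf}

# Toy epoch: regular breathing at 0.2 Hz (12 breaths per minute).
epoch = [math.sin(2 * math.pi * 0.2 * t / 10.0) for t in range(300)]
features = spectral_features(epoch)
print(features["Fr"], features["LF/HF"])
```

For this regular toy epoch nearly all power falls in the HF band, so the LF/HF ratio is close to zero; note that a 30-s epoch gives a frequency resolution of only about 0.033 Hz, so the VLF band contains a single bin.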


[Figure 4.1 image: four panels (Wake, REM sleep, Light sleep, Deep sleep) showing Resp. effort (a.u.) versus Time (s); legend: Peak sequence, Trough sequence.]

Figure 4.1: A typical example of a 2-min (or 4-epoch) respiratory effort signal in wake, REM sleep, light sleep and deep sleep. The peaks and troughs are represented by filled circles and filled squares, respectively.

4.2.4 Respiratory amplitude features

4.2.4.1 Analysis of respiratory effort amplitude

Figure 4.1 illustrates four short segments of a respiratory effort signal during different sleep

stages. It is observed that the envelopes formed by the peak and trough sequences of the signal

during wake and REM sleep, when compared with that during light and deep sleep: (1) are more

‘irregular’; (2) have generally lower absolute mean or median; and (3) have larger variance. In

addition, as illustrated in Figure 4.2, we also considered the respiratory effort ‘area’ comprised

between the respiratory effort amplitude and its mean value (zero in the example). As explained,

this area should correlate with respiratory volume to a certain extent, which differs across sleep

stages. Relying on these observations, several new respiratory amplitude features were explored

in two aspects, namely respiratory depth-based and volume-based features.

4.2.4.2 Depth-based features

A total of five depth-based features were extracted from the peak and trough sequences (i.e.,

upper and lower envelopes) of the respiratory effort signal. The amplitudes of these peaks and


Figure 4.2: A typical example of a 30-s (or one-epoch) respiratory effort signal in wake, REM sleep, light sleep and deep sleep. The areas between the curves and the baseline are filled in light gray (inhalation) and dark gray (exhalation). Examples of one breathing cycle period are indicated. [Panels: wake, REM sleep, light sleep, deep sleep; axes: time (s) versus respiratory effort (a.u.).]

troughs should include the information in regard to respiratory depth. Let us consider p =

p1, p2, . . . , pn and t = t1, t2, . . . , tn the peak and trough sequences from a window of 25 epochs

or 12.5 min centered at the epoch under consideration, containing n peaks and troughs, respec-

tively. We thus computed the standardized median of the peaks (and troughs) by dividing the

median by their interquartile range (IQR, the difference between the third and the first quartile),

such that

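\[ P_{\mathrm{sdm}} = \frac{\mathrm{median}(p_1, p_2, \ldots, p_n)}{\mathrm{IQR}(p_1, p_2, \ldots, p_n)}, \tag{4.1} \]

\[ T_{\mathrm{sdm}} = \frac{\mathrm{median}(t_1, t_2, \ldots, t_n)}{\mathrm{IQR}(t_1, t_2, \ldots, t_n)}. \tag{4.2} \]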

These two features consider the mean respiratory depth and its variability at the same time in

terms of inhalation (for peaks) and exhalation (for troughs). Note that the period length of 25

epochs was chosen to maximize the average discriminative power of all respiratory amplitude

features in separating wake, REM sleep, light sleep, and deep sleep.
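As a minimal sketch of the standardized median in Equations (4.1) and (4.2), the following uses Python's standard library. The quartile convention (the "inclusive" method) and the function names are assumptions; the quartile estimator used in this study is not specified here.

```python
import statistics


def interquartile_range(values):
    """IQR: difference between the third and the first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def standardized_median(values):
    """Median divided by the IQR, as in Psdm (peaks) and Tsdm (troughs)."""
    return statistics.median(values) / interquartile_range(values)
```

Applied to the peak amplitudes within the 25-epoch window this yields Psdm, and applied to the trough amplitudes it yields Tsdm.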

To examine how regular the envelopes are, we used the non-linear sample entropy mea-

sure, which has been broadly used in quantifying regularity of biomedical time series [250].


Considering a time series with n data points $u = u_1, u_2, \ldots, u_n$, let $v(i) = u_i, u_{i+1}, \ldots, u_{i+m-1}$ ($1 \le i \le n-m+1$) be a subsequence of u, where the window length m is a positive integer and m < n. Then for each i, we have $B_{i,m}(r) = (n-m+1)^{-1}\,\eta(r)$, in which $\eta(r)$ is the number of j such that $d_m[v(i), v(j)] \le r$ ($1 \le j \le n-m$, $j \ne i$), where the distance metric $d_m$ between two subsequences v(i) and v(j) is given by $d_m[v(i), v(j)] = \max |u_{i+l} - u_{j+l}|$ for all $l = 0, 1, \ldots, m-1$. For a higher dimension m+1, we have $A_{i,m}(r)$. Then the sample entropy of the time series u is defined by

\[ SE = -\ln\left[\frac{A^{m}(r)}{B^{m}(r)}\right], \tag{4.3} \]

where

\[ A^{m}(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} A_{i,m}(r), \tag{4.4} \]

\[ B^{m}(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} B_{i,m}(r). \tag{4.5} \]

Similarly, the sample entropy measures of the peak and trough sequences Pse and Tse are com-

puted as

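\[ P_{\mathrm{se}} = -\ln\left[\frac{A^{m}_{\mathrm{peak}}(r)}{B^{m}_{\mathrm{peak}}(r)}\right], \tag{4.6} \]

\[ T_{\mathrm{se}} = -\ln\left[\frac{A^{m}_{\mathrm{trough}}(r)}{B^{m}_{\mathrm{trough}}(r)}\right], \tag{4.7} \]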

in which r is the tolerance that usually takes the value of 0.1-0.25 SD of the peak or the trough

sequence and m takes a value of 1 or 2 for the sequence of length n larger than 100 data points

[171, 250]. In our study, r of 0.20 SD of the sequence and m of 2 were experimentally chosen

to maximize the discriminative power of the two features.

Additionally, the median of peak-to-trough differences expresses the range of inhale and exhale depths. It was computed as

\[ PT_{\mathrm{diff}} = \mathrm{median}\left[(p_1 - t_1), (p_2 - t_2), \ldots, (p_n - t_n)\right]. \tag{4.8} \]

4.2.4.3 Volume-based features

A total of seven volume-based features were extracted from the respiratory effort signal. They

should reflect certain properties of respiratory volume. The respiratory effort signal (sampled

at 10 Hz) over a window of 25 epochs or 12.5 min centered at the epoch of interest is expressed

as s = s1,s2, . . . ,sx, . . . ,sM (x = 1,2, . . . ,M), where M is the number of sample points in this

period. Suppose that $\Omega^{\mathrm{br}}_k$ is the kth breathing cycle in the epoch where there are in total K consecutive breathing cycles (k = 1, 2, . . . , K). Then the corresponding kth inhalation and exhalation periods are $\Omega^{\mathrm{in}}_k$ and $\Omega^{\mathrm{ex}}_k$, respectively. As illustrated in Figure 4.2, a breathing cycle is the


period between two consecutive troughs and thereby the inhalation and exhalation periods in

this breathing cycle are separated by the peak in between these two troughs. We first computed

the median respiratory volume (expressed by respiratory effort area) measured during breathing

cycles (Vbr), inhalation periods (Vin), and exhalation periods (Vex) for each epoch, such that

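\[ V_{\mathrm{br}} = \mathrm{median}\left( \sum_{s_x \in \Omega^{\mathrm{br}}_1} s_x,\ \sum_{s_x \in \Omega^{\mathrm{br}}_2} s_x,\ \ldots,\ \sum_{s_x \in \Omega^{\mathrm{br}}_K} s_x \right), \tag{4.9} \]

\[ V_{\mathrm{in}} = \mathrm{median}\left( \sum_{s_x \in \Omega^{\mathrm{in}}_1} s_x,\ \sum_{s_x \in \Omega^{\mathrm{in}}_2} s_x,\ \ldots,\ \sum_{s_x \in \Omega^{\mathrm{in}}_K} s_x \right), \tag{4.10} \]

\[ V_{\mathrm{ex}} = \mathrm{median}\left( \sum_{s_x \in \Omega^{\mathrm{ex}}_1} s_x,\ \sum_{s_x \in \Omega^{\mathrm{ex}}_2} s_x,\ \ldots,\ \sum_{s_x \in \Omega^{\mathrm{ex}}_K} s_x \right). \tag{4.11} \]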

In addition, we computed the median respiratory “flow rate” (expressed by the respiratory

effort area over time) during breathing cycles (FRbr), inhalation periods (FRin), and exhalation

periods (FRex), such that

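\[ FR_{\mathrm{br}} = \mathrm{median}\left( \frac{1}{\tau^{\mathrm{br}}_1} \sum_{s_x \in \Omega^{\mathrm{br}}_1} s_x,\ \frac{1}{\tau^{\mathrm{br}}_2} \sum_{s_x \in \Omega^{\mathrm{br}}_2} s_x,\ \ldots,\ \frac{1}{\tau^{\mathrm{br}}_K} \sum_{s_x \in \Omega^{\mathrm{br}}_K} s_x \right), \tag{4.12} \]

\[ FR_{\mathrm{in}} = \mathrm{median}\left( \frac{1}{\tau^{\mathrm{in}}_1} \sum_{s_x \in \Omega^{\mathrm{in}}_1} s_x,\ \frac{1}{\tau^{\mathrm{in}}_2} \sum_{s_x \in \Omega^{\mathrm{in}}_2} s_x,\ \ldots,\ \frac{1}{\tau^{\mathrm{in}}_K} \sum_{s_x \in \Omega^{\mathrm{in}}_K} s_x \right), \tag{4.13} \]

\[ FR_{\mathrm{ex}} = \mathrm{median}\left( \frac{1}{\tau^{\mathrm{ex}}_1} \sum_{s_x \in \Omega^{\mathrm{ex}}_1} s_x,\ \frac{1}{\tau^{\mathrm{ex}}_2} \sum_{s_x \in \Omega^{\mathrm{ex}}_2} s_x,\ \ldots,\ \frac{1}{\tau^{\mathrm{ex}}_K} \sum_{s_x \in \Omega^{\mathrm{ex}}_K} s_x \right), \tag{4.14} \]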

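in which $\tau^{\mathrm{in}}_k$ and $\tau^{\mathrm{ex}}_k$ are the kth inhalation and exhalation times (unit: 100 ms),

\[ \tau^{\mathrm{in}}_k = \max_{s_x \in \Omega^{\mathrm{in}}_k}(x) - \min_{s_x \in \Omega^{\mathrm{in}}_k}(x), \tag{4.15} \]

\[ \tau^{\mathrm{ex}}_k = \max_{s_x \in \Omega^{\mathrm{ex}}_k}(x) - \min_{s_x \in \Omega^{\mathrm{ex}}_k}(x), \tag{4.16} \]

and accordingly the time of the kth breathing cycle is given by

\[ \tau^{\mathrm{br}}_k = \tau^{\mathrm{in}}_k + \tau^{\mathrm{ex}}_k. \tag{4.17} \]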

The ratio of the inhalation and the exhalation flow rates FRin and FRex was finally computed as

\[ RT_{\mathrm{fr}} = \frac{FR_{\mathrm{in}}}{FR_{\mathrm{ex}}}. \tag{4.18} \]
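The seven volume-based features can be sketched under a few stated assumptions. Trough and peak positions are taken as given sample indices, the inhalation period runs from a trough to the following peak and the exhalation period from that peak to the next trough (so the peak sample belongs to both), and absolute sample values are summed so that areas above and below the baseline do not cancel. The last choice is an assumption about how the area is measured and is not stated by the equations above; the function name is hypothetical.

```python
import statistics


def volume_features(s, troughs, peaks):
    """Vbr, Vin, Vex, FRbr, FRin, FRex and RTfr from K breathing cycles.

    s       : respiratory effort samples (zero-mean after calibration)
    troughs : K + 1 trough indices delimiting K breathing cycles
    peaks   : K peak indices with troughs[k] < peaks[k] < troughs[k + 1]
    """
    areas = {"br": [], "in": [], "ex": []}
    rates = {"br": [], "in": [], "ex": []}
    for k, peak in enumerate(peaks):
        start, end = troughs[k], troughs[k + 1]
        spans = {"br": (start, end), "in": (start, peak), "ex": (peak, end)}
        for name, (lo, hi) in spans.items():
            # Assumption: area as the sum of absolute sample values.
            area = sum(abs(v) for v in s[lo:hi + 1])
            areas[name].append(area)
            rates[name].append(area / (hi - lo))  # duration in samples
    feats = {"V" + name: statistics.median(a) for name, a in areas.items()}
    feats.update({"FR" + name: statistics.median(r) for name, r in rates.items()})
    feats["RTfr"] = feats["FRin"] / feats["FRex"]
    return feats
```

With a 10 Hz sampling rate, the durations `hi - lo` are expressed in units of 100 ms, matching the unit given for the inhalation and exhalation times.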


Figure 4.3: An example of (a) an overnight respiratory effort signal and (b) the corresponding epoch-based DTW measure, where the threshold (0.01) for identifying epochs with body movements is indicated. [Axes: (a) time (min) versus respiratory effort (a.u.); (b) time (min) versus DTW measure (a.u.).]

4.2.4.4 Signal calibration by body movements

As mentioned, the respiratory amplitude features are sensitive to body motion artifacts. We therefore calibrated the respiratory effort signal before computing these features. This was done by standardizing each signal segment lying between two consecutive epochs detected as containing body movements to zero mean and unit variance. As mentioned in Section 4.1, a DTW-based method measuring the respiratory similarity between each epoch and its adjacent epochs using DTW distance [37] was applied to estimate the body movements. For the details of computing the DTW measure we refer to our previous work [180]. Here, epochs were identified as containing body movements if their DTW measures (expressing body motion artifacts) were larger than a threshold. A threshold of 0.01 was experimentally found to be adequate for this purpose. Figure 4.3 compares an overnight preprocessed respiratory effort signal with the corresponding epoch-based DTW measure from a subject, where the peaks (reflecting body movements) are well aligned along the time axis.
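The calibration step can be sketched as follows, assuming that the per-epoch DTW measures are already available (see [180] for their computation). The function name is hypothetical, and leaving the samples of movement epochs unchanged is an assumption of this sketch, since their treatment is not specified above.

```python
import statistics


def calibrate(signal, dtw_per_epoch, samples_per_epoch, threshold=0.01):
    """Standardize each run of movement-free epochs to zero mean, unit variance."""
    movement = [d > threshold for d in dtw_per_epoch]
    out = list(signal)
    run_start = None
    for e in range(len(movement) + 1):
        is_boundary = e == len(movement) or movement[e]
        if not is_boundary and run_start is None:
            run_start = e  # a movement-free segment begins
        elif is_boundary and run_start is not None:
            lo, hi = run_start * samples_per_epoch, e * samples_per_epoch
            segment = signal[lo:hi]
            mu = statistics.fmean(segment)
            sd = statistics.pstdev(segment)
            if sd > 0:
                out[lo:hi] = [(v - mu) / sd for v in segment]
            run_start = None
    return out
```

For 30-s epochs sampled at 10 Hz, `samples_per_epoch` would be 300.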

4.2.5 Subject-specific feature normalization

Following the feature extraction procedure as described above, we performed a subject-specific

Z-score normalization for each feature. It was done per subject/recording by subtracting the

mean of feature values and dividing by their standard deviation. This allows for reducing phys-

iological and equipment-related variations from subject to subject, thereafter enhancing the

discrimination between sleep stages.
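A minimal sketch of the subject-specific Z-score normalization for a single feature is given here. The data layout (a dictionary mapping each subject to its per-epoch feature values) and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def zscore_per_subject(values_by_subject):
    """Normalize one feature separately for each subject/recording."""
    normalized = {}
    for subject, values in values_by_subject.items():
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)  # assumption: population SD
        normalized[subject] = [(v - mu) / sd for v in values]
    return normalized
```

Two recordings whose feature values differ only by an offset and a scale factor, for example because of sensor placement, map to identical normalized values.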


4.2.6 Classifier

An LD classifier was used for sleep stage classification in this study. The prior probabilities of the different classes (i.e., sleep stages) have been observed to change over time. To exploit this change, we calculated a time-varying prior probability for each epoch by counting the relative frequency with which that specific epoch index was labeled as each class [179, 248, 249].
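The time-varying prior can be sketched as a per-epoch-index relative frequency over the training hypnograms. The function name, the single-letter class labels, and the handling of recordings of unequal length (only recordings that reach a given epoch index contribute to it) are assumptions of this sketch.

```python
from collections import Counter


def time_varying_priors(training_hypnograms, classes=("W", "R", "L", "D")):
    """Return one {class: prior} dictionary per epoch index."""
    n_epochs = max(len(h) for h in training_hypnograms)
    priors = []
    for t in range(n_epochs):
        labels = [h[t] for h in training_hypnograms if t < len(h)]
        counts = Counter(labels)
        priors.append({c: counts[c] / len(labels) for c in classes})
    return priors
```

In the cross validation setting described in this chapter, such priors would be estimated from the training recordings only.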

4.2.7 Experiments and evaluation

4.2.7.1 Cross validation

A 10-fold cross validation (10-fold CV) was conducted in our experiments. The subjects were

first randomly divided into 10 subsets, yielding 8 subsets with 5 subjects each and 2 subsets with

4 subjects each. During each iteration of the 10-fold CV procedure, data from 9 subsets were

used to train the classifier and the remaining one was used for testing. After CV, classification results obtained for each subject in each iteration's testing set were collected and performance metrics (averaged or pooled over all subjects) were computed to evaluate the classifier.
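The subject-wise partitioning can be sketched as follows. The random seed and the function names are hypothetical; the point is that whole subjects, not individual epochs, are assigned to folds, so that no subject contributes to both training and testing in the same iteration.

```python
import random


def subject_folds(subject_ids, n_folds=10, seed=0):
    """Randomly split subjects into n_folds subsets of near-equal size."""
    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def cross_validation_splits(subject_ids, n_folds=10, seed=0):
    """Yield (train_subjects, test_subjects) for each CV iteration."""
    folds = subject_folds(subject_ids, n_folds, seed)
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test
```

With 48 subjects this produces 8 subsets of 5 subjects and 2 subsets of 4 subjects, so each training set contains 43 or 44 recordings.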

4.2.7.2 Feature evaluation and ranking

We first compared the values of the new respiratory amplitude features in different sleep stages

to see whether they are statistically different between sleep stages. This serves to understand

their feasibility to detect sleep stage at first glance. For each of them, an unpaired Mann-

Whitney test (two-sided) was applied to examine the significance of difference.

To assess the discriminative power or class separability of each single feature in separat-

ing different classes, the information gain (IG) [239] metric was employed. IG describes the

change in information entropy caused by knowing the informative feature values. A higher discriminative power of a feature is reflected by a larger IG value, and vice versa. In this study the

discriminative power of the new features (in separating wake, REM sleep, light sleep, and deep

sleep) with and without calibrating the respiratory effort signal and with and without perform-

ing subject-specific normalization were compared. To examine which sleep stage they are able

to detect best, we compared their IG values (after signal calibration and feature normalization)

in discriminating between each stage and all the other stages as a whole. The new features in

combination with the existing features were ranked by IG which serves to select features.

During each 10-fold CV iteration, features were first ranked by means of the discriminative

power (measured by IG) in a descending order based on the associated training set. Afterwards,

a certain number of top-ranked features were selected. With this approach, we would get 10

feature subsets for all the 10 iterations. To compare the classification performance using different numbers of features, we plotted the performance metric versus the number of selected features and then reported the best results. Note that the feature ranking and thus the selected features

may change during each iteration of the cross validation. We allowed for this during our ex-

periments since we found that the feature rankings in different iterations were similar for the

relatively large-sized training data sets (with 43 or 44 overnight recordings) used in this study.
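The IG-based ranking can be illustrated with a short sketch. Continuous feature values must be discretized before the entropy can be computed; the equal-width binning with 10 bins used here is an assumption, as the discretization used in this study is not described above, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, n_bins=10):
    """IG = H(class) - H(class | binned feature), with equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant features
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    conditional = 0.0
    for b in set(bins):
        subset = [y for bb, y in zip(bins, labels) if bb == b]
        conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(feature_table, labels):
    """Return feature names sorted by IG in descending order."""
    return sorted(feature_table,
                  key=lambda name: information_gain(feature_table[name], labels),
                  reverse=True)
```

Within each CV iteration, `rank_features` would be applied to the training set only, and the top-ranked names would then be used to select columns in both the training and the testing set.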


4.2.7.3 Classification performance evaluation

We evaluated the performance of several sleep stage classification tasks. They are (1) two

multiple-stage classification tasks: WRLD (classification of W, R, L, and D) and WRN (clas-

sification of W, R, and N); and (2) four detection tasks: W, R, D, and N (binary classification

between each of them versus all the other stages).

To evaluate the performance of classifiers, the conventional metric of overall accuracy was considered. However, the high class imbalance makes this metric less appropriate. For instance,

the wake epochs account for an average of only 12.9% of all the epochs throughout the night

while the light sleep constitutes 53.6% of the night. The Cohen’s Kappa coefficient of agree-

ment [72] which has often been used in the area of sleep stage classification is considered to be

a better criterion for this problem. By factoring out chance agreement, it is not sensitive to class

imbalance. By these means, it offers a better understanding of the general performance of the

classifier in correctly identifying different classes. For the binary classification tasks, we chose

the classifier decision-making threshold leading to the maximum pooled Kappa and therefore

with this threshold the mean and SD of the overall accuracy and Kappa over all subjects were

computed.
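Cohen's Kappa can be computed from the reference and predicted labels as sketched here; the function name is hypothetical. The usage note illustrates why overall accuracy alone is misleading under class imbalance.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Agreement between two label sequences, corrected for chance."""
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # Chance agreement from the marginal class frequencies.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / n ** 2
    return (observed - expected) / (1 - expected)
```

For a reference with nine light sleep epochs and one wake epoch, a classifier that always predicts light sleep reaches an accuracy of 90% but a Kappa of zero, because all of its agreement is expected by chance.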

For each classification task, the 10-fold CV using the LD classifier was conducted with

the feature sets comprising the existing pool of 14 respiratory features (set “exist”) and the

combination of the existing features and the new respiratory amplitude features (set “all”). In

addition, we also compared the classification results obtained using features with and without

performing subject-specific (Z-score) normalization. A paired Wilcoxon signed-rank test (two-

sided) was applied to test the significance of difference between classification performances.

4.3 Results

As shown in Figure 4.4, the respiratory amplitude features were found to significantly differ

across sleep stages. This means that the information regarding respiratory depth and volume

estimated from respiratory effort, which are indicators of some properties of respiratory phys-

iology, is not independent of sleep stages and therefore it can be in turn used to classify sleep

stages.

Figure 4.5 compares their discriminative power in separating wake, REM sleep, light sleep

and deep sleep with and without respiratory signal calibration (by means of body motion arti-

facts) and subject-specific feature normalization. Mostly, by calibrating the respiratory effort

and normalizing the features per subject, the IG values of these new features were increased.

The discriminative powers of all the 26 respiratory features for different classification tasks are

presented in Figure 4.6. We note that the respiratory amplitude features rank higher than most

existing features for multiple-stage classifications and NREM sleep detection. Psdm and Tsdm

(reflecting the variability of depth) perform better in detecting deep sleep; Pse and Tse (reflect-

ing the regularity of respiratory depth) have a relatively larger power in distinguishing between

wake and sleep. It can be seen that the volume-based features (with an exception of RTfr) have

higher discriminative powers in detecting REM sleep.


Figure 4.4: Boxplots of values of the 12 respiratory amplitude features (with signal calibration and subject-specific normalization) in different classes (W, R, L and D). Outliers are not shown in order to visualize the boxes more clearly. A significant difference was found between each pair of classes for each feature using an unpaired Mann-Whitney test at p < 0.01. [Panels: Psdm, Tsdm, Pse, Tse, PTdiff, Vbr, Vin, Vex, FRbr, FRin, FRex, RTfr; axes: class (W, R, L, D) versus feature value (a.u.).]

Figure 4.5: Comparison of discriminative power (as measured by IG) of all the 12 respiratory amplitude features without and with calibrating the respiratory effort signals for WRLD classification, where the values (a) without and (b) with subject-specific feature normalization are both presented. IG was computed by pooling epochs over all subjects. [Panels: (a) without subject-specific normalization, (b) with subject-specific normalization; axes: feature versus IG; legend: without signal calibration, with signal calibration.]

Figure 4.7 illustrates the average Cohen’s Kappa coefficient versus the number of features

(ranked and selected by IG values) used for different classification tasks. For most tasks the

classification performance obtained using the feature set “all” is always better than that ob-

tained using the feature set “exist” when the number of selected features is larger than a certain

value. The overall accuracy and Kappa coefficient with the number of selected features yield-


Figure 4.6: Discriminative power of all the 26 respiratory features (with signal calibration and subject-specific feature normalization) for (a) WRLD classification, (b) WRN classification, (c) W detection, (d) R detection, (e) D detection, and (f) N detection. The features were ranked by IG (computed by pooling epochs over all subjects) for WRLD classification in a descending order. [Axes: feature versus IG; legend: existing respiratory features, respiratory amplitude features.]

ing maximum Kappa are summarized in Table 4.2. We see that, on the one hand, normalizing

the features per subject largely increased the sleep stage classification performance for all the

classification tasks. It also shows that, to a certain extent, this method is able to reduce between-

subject variability in respiratory physiology (by comparing their SD). On the other hand, com-

bining the existing and the new respiratory amplitude features resulted in significantly improved

results except for wake detection. In particular, the relatively large improvement in detecting

deep sleep epochs (Kappa of 0.43 ± 0.19 versus 0.33 ± 0.17) indicates that the new features

can benefit the deep sleep detection most.

Table 4.3 compares the performance of our sleep stage classifiers (for multiple stages) with

those reported in literature. For instance, Hedner et al. [127] presented a Kappa of 0.48 and an


Figure 4.7: Kappa coefficient of sleep stage classification versus the number of selected features ranked by their IG values in a descending order. Results were obtained based on 10-fold CV using feature set (a) “exist” and (b) “all” without subject-specific feature normalization and using feature set (c) “exist” and (d) “all” with subject-specific feature normalization. WRLD: classification of wake, REM sleep, light sleep and deep sleep; WRN: classification of wake, REM sleep and NREM sleep; W: wake detection; R: REM sleep detection; D: deep sleep detection; N: NREM sleep detection. [Axes: number of features versus Kappa coefficient.]

overall accuracy of 65.4% in classifying wake, REM sleep, light sleep and deep sleep, which outperforms our results; however, they used more signal modalities, namely peripheral arterial tone, pulse rate, oxyhemoglobin saturation and actigraphy. With respect to WRN classification, although Redmond et al. [249] obtained better results compared with our study, they included

more signal modalities including cardiac activity. Besides, our results are slightly better than

those reported in some other studies, e.g., Kappa of 0.42 by Mendez et al. [197] and Kappa of

0.44 by Kortelainen et al. [161], where they considered ballistocardiogram (BCG) that contains

also cardiac information. Nevertheless, when only using respiratory activity, Sloboda et al.

[277] achieved an overall accuracy of ∼70% (with 9 respiratory features using a naive Bayes

classifier) which is much lower than that presented in this chapter.

4.4 Discussion

The respiratory effort signals were calibrated using the DTW-based method. The DTW measure

has been proven to be in association with body movements [180], where a significant Spear-

man’s rank correlation coefficient (r = 0.32, p < 0.0001) was reported. Further, we obtained

a higher correlation (r = 0.56, p < 0.0001) between the quantified body movements using the

DTW-based method (where the DTW measures lower than 0.01 were set to be zero) and activ-

ity counts computed using actigraphy based on the data set used in that study. We also tested


Table 4.2: Summary of sleep stage classification performance (10-fold CV) using feature set “exist” and “all” with and without performing subject-specific feature normalization

                          Without normalization                     With normalization
Task          Feat. set   #    Acc. (%)        Kappa                #    Acc. (%)       Kappa
WRLD          Exist       14   58.4 ± 6.8      0.26 ± 0.12          13   61.7 ± 6.9     0.32 ± 0.11
              All         25   59.2 ± 8.6∗     0.29 ± 0.14∗         24   63.8 ± 8.1∗    0.38 ± 0.14∗
WRN           Exist       14   71.7 ± 7.4      0.32 ± 0.14          13   75.0 ± 6.7     0.41 ± 0.13
              All         25   72.3 ± 8.1∗     0.34 ± 0.15∗         23   76.2 ± 7.9∗    0.45 ± 0.15∗
W detection   Exist       6    89.8 ± 6.3      0.49 ± 0.16          10   90.1 ± 4.2     0.50 ± 0.14
              All         9    89.8 ± 6.2∗     0.49 ± 0.16∗         15   90.3 ± 4.1∗    0.51 ± 0.15∗
R detection   Exist       14   79.4 ± 7.8      0.29 ± 0.19          14   82.0 ± 5.6     0.39 ± 0.20
              All         26   79.9 ± 7.6∗     0.31 ± 0.19∗         26   82.7 ± 5.8∗    0.44 ± 0.20∗
D detection   Exist       12   84.6 ± 4.9      0.26 ± 0.19          10   84.9 ± 4.3     0.33 ± 0.17
              All         8    86.1 ± 4.1∗     0.34 ± 0.22∗         5    86.1 ± 4.1∗    0.43 ± 0.19∗
N detection   Exist       13   72.8 ± 10.8     0.40 ± 0.17          14   75.2 ± 8.0     0.44 ± 0.17
              All         23   73.3 ± 11.6∗    0.42 ± 0.19∗         25   76.8 ± 8.7∗    0.48 ± 0.18∗

For each feature set, the results obtained using the selected features leading to maximum Kappa coefficient are reported (see Figure 4.7). # denotes the number of selected features. Significance of difference between the results obtained using feature set “exist” and “all” was examined with a paired two-sided Wilcoxon signed-rank test (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, NS: not significant). For all metrics, significant difference was found between the results obtained with and without subject-specific feature normalization at p < 0.01 except for wake detection.

the sensitivity of the threshold and found that the discriminative power of the respiratory am-

plitude features did not dramatically change when the threshold was ranging between ∼0.005

and ∼0.013. To analyze the adequacy of this method for sleep stage classification, we com-

pared the discriminative power as well as the classification performance of these new features

between using actigraphy [181] and using the DTW-based method to calibrate the respiratory

effort signals. The results are comparable. This suggests that the DTW measure is an adequate

estimate of actigraphy for identifying body movements and is therefore effective in mitigating

the effect of body motion artifacts on computing the respiratory amplitude features.

As stated in Sections 4.2.4.2 and 4.2.4.3, the respiratory amplitude features were computed with

a window of 25 epochs (12.5 min). This served to capture the changes of respiratory depth and

volume as well as providing reliable regularity measures of peak/trough sequences using sam-

ple entropy with sufficient data points. Additionally, we hypothesized that the respiratory effort

area can accurately represent breathing tidal volume or ventilation when extracting the respi-

ratory volume-based features. However, this hypothesis is not always acceptable, in particular

for subjects who change their posture during sleep [307]. In those cases these features might

be inaccurately computed, thus harming classification performance. This challenge should be

further studied.


Table 4.3: Comparison of multiple-stage sleep classification performance between this work and studies reported in the literature

Task   First author/year        Modality   N     # Feat.   Classifier   Acc. (%)   Kappa
WRLD   Hedner 2011 [127]        PAT        227   –         zzzPAT       65.4       0.48
       Isa 2011 [138]           ECG        16    9         RF           60.3       0.26
       This work                RE         48    26        LD           63.8       0.38
WRN    Redmond 2007 [249]       ECG, RE    31    30        LD           76.1       0.45
       Mendez 2010 [198]        BCG†       17    46        KNN          72.0       0.42
       Kortelainen 2010 [161]   BCG‡       18    4         HMM          79.0       0.44
       Sloboda 2011 [277]       RE         16    9         NB           ∼70        –
       Xiao 2013 [312]          ECG        45    41        RF           72.6       0.46
       This work                RE         48    26        LD           76.2       0.45

For signal modalities: PAT includes peripheral arterial tone, pulse rate, oxyhemoglobin saturation, and actigraphy; RE, respiration; BCG, ballistocardiogram measured with bed sensor (†BCG with cardiorespiratory activity and body movement and ‡BCG with cardiac activity and body movement).
For classifier: zzzPAT, a sleep staging algorithm developed by Herscovici et al. [131]; RF, random forest; LD, linear discriminant; HMM, hidden Markov model; NB, naive Bayes; KNN, k-nearest neighbor.

Although the addition of the respiratory amplitude features resulted in enhanced perfor-

mance in WRLD and WRN classifications (Table 4.2), the improvements seem relatively mod-

est in general. One explanation is that these new features are correlated with the existing fea-

tures as discussed before and the additional information is limited. Upon a closer look, we

found that the new features contributed more to deep sleep detection than to other detection tasks.

As a result, this would yield relatively lower performance improvements for multiple-stage

classifications since deep sleep only accounts for an average of 14.5% over the entire night.

As shown in Table 4.2, the new features could not help improve wake detection. Actually, the

existing features Sdtw and Sdfw have been shown to be reliable in detecting wake epochs with

body movements in our previous study [180]. In this work, to focus more on the respiratory

depth and volume properties without being influenced by body movements, we excluded the

‘dubious’ peaks and troughs (see Section 4.2), some of which are possibly body motion artifacts that often indicate wake epochs. Therefore, the new features here might not

be able to help detect ‘quiet wake’ (wakefulness without body movements). Nevertheless, the

effect of body movements on the respiratory depth and volume needs to be further studied.

In addition, we observe that the variation of sleep stage classification results between sub-

jects still remains high (see Table 4.2). For instance, the average Kappa values of WRLD and

WRN classifications over all subjects are 0.38 ± 0.14 and 0.45 ± 0.15, respectively. This is

mainly caused by large physiological differences between subjects in the way sleep stages are

expressed on respiratory features, which naturally leads to difficulties in enhancing the clas-

sification performance for some subjects. Therefore, it is still worth investigating methods to

reduce the between-subject variability of the features.

In this work we selected features solely based on their discriminative power measured by


IG. This approach did not take the correlation or relevance between features into account so

that some of them might be redundant to some extent. On average, the maximum absolute Spearman’s rank correlation coefficient |r|max between each new feature and the existing features is 0.35 ± 0.11 (ranging from 0.07 to 0.46 for different new features, p < 0.01). For

instance, the highest correlation (r = 0.46, p < 0.0001) occurs between Fsd and Tsdm, indicat-

ing that the variation of respiratory frequency is highly correlated with respiratory depth and

its change. Hence, employing feature selectors that aim at reducing feature redundancy merits

further investigation, especially when more features are incorporated.

As presented in Table 4.3, our methods achieved acceptable sleep stage classification results

when using respiratory information alone. Although the results are lower than some other stud-

ies, those studies used more signal modalities such as cardiac activity. We therefore anticipate

that the classification performance should be further enhanced when combining respiratory and

cardiac activity, which will be further studied. Moreover, we only used the simple LD classifier because we focused exclusively on analyzing new features for sleep stage classification.

Nevertheless, more advanced classification algorithms merit investigation in future work.

4.5 Conclusion

In this chapter, respiratory effort amplitude (with respect to breathing depth and volume) was

analyzed and quantified during nighttime sleep, which was found to differ across sleep stages.

Based on this, 12 novel features that characterize different aspects of respiratory effort ampli-

tude were extracted for automated sleep stage classification. To eliminate the effect of body

movements during sleep, respiratory effort signals were calibrated by using a DTW measure

which has been shown to correlate with body motion artifacts. By calibrating the signals and

normalizing the features for each subject, the discriminative power of the features can be in-

creased. When using only respiratory effort signals, combining the new features proposed in

this work with the existing respiratory features (known in literature) can help significantly im-

prove the performance in classifying and identifying different sleep stages with an exception of

wake state detection.

CHAPTER 5

Measuring dissimilarity between respiratory effort signals

based on uniform scaling for sleep staging

This chapter is adapted from: X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Fonseca, and R.

M. Aarts. Measuring dissimilarity between respiratory effort signals based on uniform scaling for sleep

staging. Physiological Measurement, 35(12):2529–2542, 2014. © IOP Publishing

Abstract – Polysomnography (PSG) has been extensively studied for sleep staging, where sleep

stages are usually classified as wake, rapid-eye-movement (REM) sleep, or non-REM (NREM)

sleep (including light and deep sleep). Respiratory information has been proven to correlate

with autonomic nervous activity that is related to sleep stages. For example, it is known that the

breathing rate and amplitude during NREM sleep, in particular during deep sleep, are steadier

and more regular compared to periods of wakefulness that can be influenced by body move-

ments, conscious control, or other external factors. However, the respiratory morphology has

not been well investigated across sleep stages. We thus explore the dissimilarity of respira-

tory effort with respect to its signal waveform or morphology. The dissimilarity measure is

computed between two respiratory effort signal segments with the same number of consecu-

tive breaths using a uniform scaling distance. To capture the property of signal morphological

dissimilarity, we propose a novel window-based feature in a framework of sleep staging.
Experiments were conducted on a data set of 48 healthy subjects using a linear discriminant
classifier and 10-fold cross-validation. The results show that this feature helps discriminate
between sleep stages, except for separating wake from REM sleep. When combining

the new feature with 26 existing respiratory features, we achieved a Cohen’s Kappa coefficient

of 0.48 for 3-stage classification (wake, REM sleep, and NREM sleep) and of 0.41 for 4-stage

classification (wake, REM sleep, light sleep, and deep sleep), which outperform the results

obtained without using this new feature.


72 Chapter 5. Uniform scaling dissimilarity on respiratory effort

5.1 Introduction

Previous studies have shown that characteristics of human respiratory activity are associated

with sleep stages throughout the entire night [95, 281]. Respiratory effort has been increasingly

used for objective sleep analysis [253] and sleep staging [69, 249] in contrast to traditional

polysomnography (PSG) which is considered the “gold standard” in sleep studies. This is

because respiratory activity can be acquired easily and unobtrusively using,
for example, bed sensors [161, 304], Doppler radar [194], photoplethysmography [174], or a
watch-based device [131]. Sleep consists of wake, rapid-eye-movement (REM) sleep, and four

non-REM (NREM) sleep stages S1-S4 according to the R&K rules [247]. In regard to S3 and

S4, the American Academy of Sleep Medicine (AASM) guidelines [136] and their updated

rules [38] suggest merging them into a single “deep sleep” or slow wave sleep stage. S1 and

S2 often correspond to “light sleep” [51, 276]. With PSG, sleep stages are manually scored

by sleep technicians on 30-s epochs based on multiple electrophysiological signals including

electroencephalography (EEG), electrooculography (EOG), and electromyography (EMG). The

manually scored sleep stages can be visualized in a hypnogram.

It has been reported in earlier studies that some characteristics of respiration differ across

sleep stages such as respiratory frequency [95], respiratory variability [256], different frequency

components of respiratory spectrum [249], etc. However, the dissimilarity of respiratory effort

in terms of signal waveform or morphology for different sleep stages has not been well explored.

In fact, the respiratory pattern (e.g., amplitude and frequency) has been shown to be more stable

and regular during NREM sleep (in particular during deep sleep) than during wake and REM

sleep [67, 129]. The irregularity of breathing is usually caused by body movements, alteration
of ventilation control, or behavioral factors when awake [230], and it is related to paralysis of
voluntary musculature (muscle atonia) during REM sleep [233]. We may therefore anticipate that
if a sleep stage has a higher regularity in breathing, respiratory effort segments within this
stage will be less dissimilar to one another. In addition, respiratory dynamics have been found
to be associated with physiologic states such as sleep stages, which correspond distinctly to
autonomic regulatory mechanisms [226, 267, 292]. We therefore hypothesize that (1) the
respiratory effort is characterized by signal morphology and (2) the dissimilarity between two
respiratory effort periods is influenced by their corresponding sleep stages. Research has
focused on investigating respiration changes during sleep [149, 256]. For instance, some
researchers analyzed non-random variability of respiration (e.g., breath-by-breath intervals) on
short- and long-term scales [256], but with much less focus on comparing respiratory patterns
of multiple breaths. Although some parameters including breathing rate, inspiratory/expiratory
volumes, and minute volume were investigated, the respiratory morphology was less researched.

Many methods have been used to compare two time series, such as cross-correlation, detrended
fluctuation analysis, and cross-approximate entropy. However, they can be limited by several
factors including the non-stationary trend of data, an insufficient number of data points (for,
e.g., polynomial fitting), low relative consistency, and/or unequal length between time series
[31, 133, 250]. The idea here is to use a Euclidean-based distance as a dissimilarity metric
between two respiratory effort signal segments from a subject. When computing the distance,
each signal segment is selected inside its corresponding 30-s epoch to contain a certain number
of consecutive breaths, which serves to provide an even comparison of their signal morphology.
These signal segments are usually shorter than 30 s. Inevitably, the lengths (i.e., numbers of
data points) of any two signal segments differ, so they must be scaled to an equal length in
order to perform a Euclidean (sequential) mapping. To resolve this problem, we propose to use a
uniform scaling method [314] to re-scale the two signal segments by searching for the minimal
Euclidean distance between them. In other words, they are uniformly ‘stretched’ to reduce, to a
certain degree, the effects of varying breathing frequency, so that the comparison focuses more
on signal morphology.

As for automatic sleep staging, it is particularly interesting to know if different sleep stages

can be distinguished by means of respiratory effort data when the PSG-based hypnogram is

absent. This would benefit the applications of home-based sleep staging or sleep stage clas-

sification which has been attracting increasing attention in recent years [89, 182, 248, 264].

Information regarding sleep stages is usually extracted as epoch-based “features” used to
perform epoch-by-epoch classification. For this purpose, we propose a new feature to describe
the dissimilarity of respiratory effort morphology between different epochs from the same
recording. The discriminative power of this feature in classifying sleep stages will be
evaluated, and the feature is expected to help improve sleep staging performance.


5.2.1 Subjects and protocol

Forty-eight healthy subjects [21 men and 27 women; mean age 41.3 y, ranging from 20 to
83, standard deviation (SD) 16.1; mean body mass index 23.6 kg·m⁻², ranging from 19.1 to
31.3, SD 2.9] from the SIESTA project [160] are considered. The project was supported by the

European Commission and the subjects were monitored in seven different sleep laboratories

located in five European countries over a period of three years from 1997 to 2000. The subjects

had a Pittsburgh Sleep Quality Index [60] of less than 6 and fulfilled several criteria (e.g., no

depressive symptoms, no reported medical, neurological, mental or cardiovascular disorders,

no history of drug abuse or habituation, no psychoactive medication, no shift work, and a usual
bedtime before midnight). According to the study protocol of the SIESTA project, all subjects
provided informed consent, documented their sleep habits over 14 nights, and spent two

consecutive nights (on days 7 and 8) in the sleep laboratory [19]. More details regarding the

subject information and the study protocol can be found online (http://www.ofai.at/siesta). In

this study, we only include single-night PSG recordings (on day 7) for analysis.

5.2.2 Polysomnographic measurements

Full PSG data, including multiple EEG, EOG, and EMG channels, electrocardiography (ECG),
respiratory effort, oxygen saturation, snoring, etc., were recorded for each subject, and the
sleep stages were visually scored by professional sleep technicians as wake, REM, and S1-S4 on
30-s epochs according to the R&K rules. Thoracic breathing movements were measured by
respiratory inductance plethysmography (RIP) in the form of respiratory effort signals at a
sampling rate of 10 Hz. For the problem of sleep staging, we consider deep sleep (merged S3
and S4) as a single stage, as suggested by the AASM guidelines. Likewise, S1 and S2
are merged into a single light sleep stage.

Table 5.1: Sleep data from 48 healthy subjects, where mean ± SD and range are given.

Parameter                     Mean ± SD       Range
Total number of epochs (#)    938.3 ± 44.5    796–1026
Wake (%)                      12.9 ± 6.1      1.2–24.5
REM sleep (%)                 19.0 ± 3.3      15.3–26.5
NREM sleep (%)                68.1 ± 4.9      56.1–76.3
Light sleep (%)               53.6 ± 5.5      42.7–66.7
Deep sleep (%)                14.5 ± 4.8      5.3–28.5

Referring to the statistics of normal sleepers across the human lifespan reported previously

[216], the selection of overnight recordings from a larger data set met several criteria including

the sleep efficiency of ≥75%, REM sleep of ≥15%, and deep sleep of ≥5%. The sleep data is

summarized in Table 5.1, in which mean and SD over subjects and range are presented.

5.2.3 Signal processing

The raw respiratory effort signals are first low-pass filtered (10th order Butterworth filter with

a cut-off frequency of 0.6 Hz) in order to eliminate high-frequency noise. Then the baseline is

removed by subtracting the median peak-to-trough amplitude estimated over the entire record-

ing, which serves to compute the respiratory volume-based features. These features will be

described further in Section 5.2.7. The localization of respiratory peaks/troughs is achieved by

detecting the signal turning points based on sign changes of the signal slopes. Afterwards, we

remove the falsely detected peaks/troughs (1) with too short peak-to-trough or trough-to-peak

intervals (where the sum of two successive intervals is less than the median of all intervals over

the entire recording) and (2) with too small amplitudes (where the peak-to-trough difference

is smaller than 0.15 times the median of the entire respiratory signal). These methods were

validated by comparing the automatically detected results with manually annotated peaks and

troughs and an accuracy of ∼98% was achieved.
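The turning-point step described above can be illustrated with a minimal sketch. It only covers the detection of peaks and troughs from sign changes of the slope; the subsequent removal of falsely detected points is not included. The function name `turning_points` and the handling of flat stretches are assumptions made for illustration, not details given in the text.

```python
import numpy as np

def turning_points(signal):
    """Return indices of local peaks and troughs found from slope sign changes."""
    slope_sign = np.sign(np.diff(signal))
    # Assumption: on a flat stretch, carry the previous slope sign forward
    # so that a plateau does not hide a turning point.
    for k in range(1, len(slope_sign)):
        if slope_sign[k] == 0:
            slope_sign[k] = slope_sign[k - 1]
    change = np.diff(slope_sign)
    peaks = np.where(change < 0)[0] + 1    # slope turns from rising to falling
    troughs = np.where(change > 0)[0] + 1  # slope turns from falling to rising
    return peaks, troughs

# Synthetic 30-s epoch at 10 Hz with a breathing frequency of 0.25 Hz
t = np.arange(300) / 10.0
effort = np.sin(2 * np.pi * 0.25 * t)
peak_idx, trough_idx = turning_points(effort)
```

On real recordings this would be applied to the low-pass filtered signal, since unfiltered noise produces many spurious sign changes.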

5.2.4 Dissimilarity measure with uniform scaling

Given an overnight respiratory effort recording with $L$ epochs from a subject, the $i$th epoch is
expressed as $U_i = \{u_{i,1}, u_{i,2}, \ldots, u_{i,n}\}$ ($i = 1, 2, \ldots, L$) with $n$ data points (here $n = 300$ at the
signal sampling rate of 10 Hz). As explained before, we only choose a signal segment with
a certain number of consecutive breaths $\lambda$ inside this epoch when computing the dissimilarity
score; the chosen signal segment for this particular epoch $U_i$ is thus expressed by
$V_i = \{v_{i,1}, v_{i,2}, \ldots, v_{i,m_i}\}$ with $m_i$ data points ($m_i \le n$). The locations of $v_{i,1}$ and $v_{i,m_i}$ are based on the
detected respiratory peaks or troughs within this epoch so that the segment $V_i$ contains several
complete breaths, starting and ending at two different troughs. The signal segment length $m_i$
depends on $i$ because respiratory frequency usually varies between signal segments, even if
they have the same number of breaths. It also depends on the prescribed number
of breaths $\lambda$.

Let us consider two epochs $U_i$ and $U_j$ ($i, j = 1, 2, \ldots, L$ and $i \ne j$) with $p_i$ and $q_i$ consecutive
breaths, respectively. To ensure an equal number of breaths, so that their dissimilarity is
compared evenly, we set $\lambda = \min\{p_i, q_i\}$. For the epoch with more breaths, only the $\lambda$
breaths in the middle are selected, yielding a signal segment within this epoch. Then the two
signal segments $V_i$ and $V_j$ ($i \ne j$) with $\lambda$ breaths each are normalized to zero mean and unit
variance (Z-score normalization). However, the two signal segments may have unequal lengths,
in which case the Euclidean distance between them cannot be computed directly. To tackle this, we
utilize uniform scaling, a Euclidean-based minimization method. For $V_i$ and $V_j$, assuming that
$m_i \le m_j$, a uniformly scaled series of $V_i$ is expressed as $V_i^k = \{v^k_{i,1}, v^k_{i,2}, \ldots, v^k_{i,k}\}$ with length $k$
($m_i \le k \le m_j$), where $v^k_{i,x} = v_{i,\lceil x \cdot m_i \cdot k^{-1} \rceil}$ for $x = 1, 2, \ldots, k$. Hence, the dissimilarity score $d_{\mathrm{score}}$
between $U_i$ and $U_j$ is the uniform scaling distance $d_{\mathrm{us}}$ between $V_i$ and $V_j$, which can be obtained
by minimizing the Euclidean distance subject to $m_i \le m_j$, such that

$$d_{\mathrm{score}}(U_i, U_j) \equiv d_{\mathrm{us}}(V_i, V_j) = \min_{m_i \le k \le m_j} \sqrt{\frac{1}{k} \sum_{x=1}^{k} \left( v^k_{i,x} - v_{j,x} \right)^2}. \qquad (5.1)$$

Since the Euclidean distance in k-dimensional space is sensitive to the series length k, which
takes different values during the minimization in Equation 5.1, the distance is normalized by k. Figure 5.1

depicts an example of computing the dissimilarity score dscore between two epochs. Note that

dscore is computed within each recording (or subject for the single-night data) to avoid the effect

of between-subject variability, often caused by the existence of physiological difference from

subject to subject.
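The dissimilarity score of Equation 5.1 can be sketched in a few lines. This is an illustrative reading of the equation rather than the implementation used in this work: the function names are hypothetical, the index x is taken to run from 1 to k so that the stretched series has k points, and the stretched series is compared with the first k points of the longer segment.

```python
import numpy as np

def zscore(v):
    """Normalize a segment to zero mean and unit variance."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def uniform_scaling_distance(v_i, v_j):
    """Length-normalized uniform scaling distance between two segments."""
    a, b = zscore(v_i), zscore(v_j)
    if len(a) > len(b):            # make `a` the shorter segment (m_i <= m_j)
        a, b = b, a
    m_i, m_j = len(a), len(b)
    best = np.inf
    for k in range(m_i, m_j + 1):
        x = np.arange(1, k + 1)
        # ceil(x * m_i / k) in integer arithmetic, shifted to 0-based indexing
        idx = (x * m_i + k - 1) // k - 1
        stretched = a[idx]         # the shorter segment stretched to length k
        d = np.sqrt(np.mean((stretched - b[:k]) ** 2))
        best = min(best, d)
    return best

# Two single-cycle sinusoids of different lengths (40 and 50 samples)
seg_a = np.sin(2 * np.pi * np.arange(40) / 40)
seg_b = np.sin(2 * np.pi * np.arange(50) / 50)
score = uniform_scaling_distance(seg_a, seg_b)
```

Because the two example segments share the same shape and differ only in duration, the score stays small, which is the behaviour the stretching is meant to achieve.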

5.2.5 Windowed dissimilarity feature

It is of interest to extract a feature for each 30-s epoch to capture the dissimilarity property of

respiratory effort morphology. This feature can in turn be used to separate different sleep stages.

To do so, we compute the mean dissimilarity score between each epoch and the other epochs
from the same recording within a window, termed the windowed (self-)dissimilarity feature and
denoted as Dwin henceforth. We expect that this feature is not independent of sleep stage and
is thus informative for sleep staging. For the ith epoch $U_i$ of a given subject, it is computed as

$$D_{\mathrm{win}}(U_i) = \frac{\sum_{j} d_{\mathrm{score}}(U_i, U_j)}{\min(w, i-1) + \min(w, L-i)}, \quad \text{for } |j-i| \le w \text{ and } j \ne i, \qquad (5.2)$$

in which L is the total number of epochs for this specific subject and w = 1, 2, . . . , L is the
(single-side) size of the window centered at $U_i$. This means that Dwin is a feature with a certain
time (or window) scale. The window size w is determined by maximizing the feature
discriminative power. Intuitively, the majority of the epochs contained within a small window should
be in the same sleep stage as the given epoch. This can be examined by comparing the
percentage of occurrence for different sleep stages versus the time difference ∆ between epochs. We
also analyze the changes of dscore for ‘self-comparisons’ versus ∆, where dscore is computed
between epochs with the same sleep stage (i.e., wake-wake, REM-REM, light-light, and deep-deep).
To reduce noise at the feature level caused by measurement errors or body motion artifacts, Dwin is
smoothed over the entire-night recording using a moving average method (with a 10-min span).

[Figure 5.1 not reproduced: three panels showing respiratory effort (a.u.) versus time (s) in (a) and versus sample index (at 10 Hz) in (b) and (c).]

Figure 5.1: An example of computing the dissimilarity score of respiratory effort between two epochs:
(a) original signals $U_i$ and $U_j$ at 10 Hz within 30-s epochs; (b) selected signal segments $V_i$ and $V_j$ with
5 consecutive breaths, where series lengths are unequal; (c) uniformly scaled series $V_i^k$ and $V_j$, where k
equals the length of $V_j$. Note that the signal segments in (a) and (b) are normalized to have zero mean
and unit variance.
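As an illustration of Equation 5.2, the sketch below computes the windowed feature from a precomputed L × L matrix of pairwise dissimilarity scores and then smooths it. The function names are hypothetical, the recording is assumed to contain at least two epochs, and the 10-min span is taken here as ±10 epochs around each epoch with a shrinking window at the edges; the text does not specify how the edges are handled.

```python
import numpy as np

def windowed_dissimilarity(d, w=25):
    """D_win for every epoch from an L x L matrix of pairwise scores d[i, j]."""
    d = np.asarray(d, dtype=float)
    L = d.shape[0]
    d_win = np.empty(L)
    for i in range(L):
        lo, hi = max(0, i - w), min(L - 1, i + w)
        neighbours = [j for j in range(lo, hi + 1) if j != i]
        # With 0-based i, len(neighbours) = min(w, i) + min(w, L - 1 - i),
        # which is the denominator of Eq. 5.2 written for 1-based epochs.
        d_win[i] = d[i, neighbours].mean()
    return d_win

def moving_average(x, span=20):
    """Centered moving average; the window shrinks at the recording edges."""
    x = np.asarray(x, dtype=float)
    half = span // 2
    return np.array([x[max(0, k - half):k + half + 1].mean()
                     for k in range(len(x))])

# Toy score matrix in which the score grows with the distance between epochs
L = 60
toy_scores = np.abs(np.subtract.outer(np.arange(L), np.arange(L))) / L
feature = moving_average(windowed_dissimilarity(toy_scores, w=25))
```

In practice only the band of the score matrix with |j − i| ≤ w is needed, so the pairwise scores outside the window do not have to be computed.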

5.2.6 Feature analysis

For the windowed dissimilarity feature Dwin, we first compare its mean value and SD over
all subjects between sleep stages. In addition, we compute its discriminative power for
sleep staging using the one-way analysis of variance (ANOVA) F-statistic, where a larger
F-statistic indicates a higher discriminative power. The F-statistic of Dwin is then compared
with that of the existing features by ranking it among all the features. Using a
quantile-quantile (Q-Q) plot, the distributions of Dwin in the different sleep stages were found
to approximately follow a normal distribution.
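For reference, the one-way ANOVA F-statistic of a single feature is the ratio of the between-stage to the within-stage mean squares. The sketch below follows this textbook definition; the function name and the toy data are illustrative only.

```python
import numpy as np

def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature grouped by class label."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    grand_mean = values.mean()
    ss_between = 0.0
    ss_within = 0.0
    for c in classes:
        group = values[labels == c]
        ss_between += len(group) * (group.mean() - grand_mean) ** 2
        ss_within += ((group - group.mean()) ** 2).sum()
    df_between = len(classes) - 1
    df_within = len(values) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)

# Toy example: feature values of six epochs from two stages
f_stat = anova_f([1, 2, 3, 2, 3, 4], [0, 0, 0, 1, 1, 1])
```

Ranking the features then amounts to sorting them by this value in descending order for each set of stages under comparison.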


5.2.7 Sleep staging

As stated, the new feature Dwin can be incorporated to perform automatic sleep staging when

solely using respiratory effort data. A set of 26 existing respiratory features have been used to

classify sleep stages in previous studies. They comprise features in both time and frequency do-

main [248], respiratory depth- and volume-based features [182], and non-linear features based

on sample entropy [75] and dynamic warping [180]. Table 5.2 lists and describes all the respi-

ratory features. To examine whether Dwin can help achieve an enhanced classification perfor-

mance, we compare the classification results with and without adding it to the existing feature

set. Note that for the purpose of reducing between-subject variability in respiration, all the

features are normalized (Z-score) for each overnight recording.
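The per-recording normalization can be sketched as follows, assuming the features of all epochs are stacked in one array with a recording identifier per row. The function name and the handling of constant features are assumptions made for this example.

```python
import numpy as np

def zscore_per_recording(features, recording_ids):
    """Z-score each feature column separately within each recording."""
    features = np.asarray(features, dtype=float)
    recording_ids = np.asarray(recording_ids)
    out = np.empty_like(features)
    for rec in np.unique(recording_ids):
        rows = recording_ids == rec
        block = features[rows]
        std = block.std(axis=0)
        std[std == 0] = 1.0  # assumption: leave constant features at zero
        out[rows] = (block - block.mean(axis=0)) / std
    return out

# Two recordings with two epochs each and two features per epoch
toy = [[1, 10], [3, 10], [100, 5], [300, 7]]
normalized = zscore_per_recording(toy, [0, 0, 1, 1])
```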

We simply adopt a linear discriminant (LD) classifier, which has been widely used for the
task of sleep staging [89, 108, 182, 249]. The data, comprising 48 entire-night recordings, are
randomly divided into 10 folds of four or five recordings each, and
sleep staging is then executed iteratively using a 10-fold cross-validation (CV). During each
iteration, the classifier is trained on nine folds and validated on the remaining one in order to
minimize the classifier bias.
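A recording-wise split of this kind keeps all epochs of one recording in the same fold. The sketch below shows one way to obtain it; the function name and the fixed random seed are illustrative choices, not details taken from the study.

```python
import random

def recording_folds(recording_ids, n_folds=10, seed=0):
    """Randomly assign whole recordings to folds of nearly equal size."""
    ids = list(recording_ids)
    random.Random(seed).shuffle(ids)
    # Dealing the shuffled ids round-robin keeps fold sizes within one of each other
    return [ids[f::n_folds] for f in range(n_folds)]

folds = recording_folds(range(48))
for f, test_ids in enumerate(folds):
    # All recordings outside the held-out fold form the training set
    train_ids = [r for g, fold in enumerate(folds) if g != f for r in fold]
```

With 48 recordings and 10 folds this yields eight folds of five recordings and two folds of four.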

To evaluate the classifier, we use Cohen’s Kappa coefficient κ [72] in addition to overall

accuracy because it is more appropriate for analyzing unbalanced data (in our case light sleep

accounts for 53.6%, which is much larger than the other stages). Because the prior
probabilities of the sleep stages in an LD classifier may change over the night, we compute

a time-varying prior probability (TVPP) for each epoch by counting the relative frequency of

occurrence of each sleep stage at its corresponding time of the night based on the associated

training data. More details about TVPP can be found elsewhere [249]. Here we present results

for two sleep staging schemes, including 4-stage classification (wake, REM sleep, light sleep,

and deep sleep) and 3-stage classification (wake, REM sleep, and NREM sleep).
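The counting step behind the TVPP can be sketched as below. This is a simplified illustration based only on the description given here; the procedure in [249] may differ in details such as time alignment or smoothing. The function name, the integer coding of the stages, and the uniform fallback for epoch indices not covered by any training night are assumptions.

```python
import numpy as np

def time_varying_priors(train_hypnograms, n_stages, n_epochs):
    """Relative frequency of each stage at each epoch index over training nights."""
    counts = np.zeros((n_epochs, n_stages))
    for hyp in train_hypnograms:
        for t, stage in enumerate(hyp[:n_epochs]):
            counts[t, stage] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Assumption: epoch indices without training data get a uniform prior
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_stages)

# Two short toy hypnograms with stages coded as 0, 1, 2
priors = time_varying_priors([[0, 1, 1], [0, 0, 1, 2]], n_stages=3, n_epochs=5)
```

Each row of the returned array sums to one and can be used as the prior of the classifier for the epoch at that time of the night.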

5.3 Results

The (single-side) window size w of 25 epochs was experimentally found to be an appropriate

value when computing the new feature Dwin, where its feature discriminative power in classi-

fying wake, REM sleep, light sleep, and deep sleep was maximized. Figure 5.2 compares the
percentage of occurrence of the different sleep stages as a function of ∆. The figure indicates
that self-comparisons are the most likely when |∆| is smaller than a certain value (e.g., ∼30 epochs
for wake, REM sleep, and deep sleep), whereas comparisons between each sleep
stage and light sleep dominate when |∆| is larger than that value. These graphs imply that, for our

choice of w = 25 epochs, the feature values of Dwin depend more on the self-comparisons. As

shown in Figure 5.3, in regard to the self-comparisons, we observe that different sleep stages

can be separated by the dissimilarity score within the 25-epoch window except for that between

wake and REM sleep where overlaps occur.

Figure 5.4 compares the feature values of Dwin in different sleep stages (mean ± SD and
histogram), in which the separation can be observed between sleep stages, particularly
between deep sleep and the other stages and between REM and NREM sleep. An example of an
overnight hypnogram and the corresponding Dwin values from a 50-year-old female are
illustrated in Figure 5.5, where the correlation between them can be seen. Table 5.3 presents the
discriminative powers (as measured by ANOVA F-statistic) of Dwin in separating different sleep
stages. For comparison, we also provide its ranking among all features as well as the top-10
ranked features (in a descending order in terms of F-statistic) in the table.

Table 5.2: A list of respiratory features

Feature index   Description
Existing*
 1   Respiratory frequency estimated in the frequency domain
 2   Spectral power of respiratory frequency
 3   Spectral power in the very low frequency (VLF) band (0.01-0.05 Hz)
 4   Spectral power in the low frequency (LF) band (0.05-0.15 Hz)
 5   Spectral power in the high frequency (HF) band (0.15-0.5 Hz)
 6   Ratio of spectral power between LF and HF bands
 7   Standard deviation of respiratory frequency over 150 s
 8   Mean breath-by-breath correlation
 9   Standard deviation of breath-by-breath correlation
10   Standard deviation of breath length
11   Respiratory frequency estimated in the time domain
12   Respiratory regularity measured by sample entropy
13   Respiratory similarity measured by dynamic time warping
14   Respiratory similarity measured by dynamic frequency warping
15   Standardized median of respiratory peaks
16   Standardized median of respiratory troughs
17   Respiratory peak regularity measured by sample entropy
18   Respiratory trough regularity measured by sample entropy
19   Median respiratory peak-to-trough difference
20   Median respiratory volume during breath cycles
21   Median respiratory volume during inhalations
22   Median respiratory volume during exhalations
23   Median respiratory flow rate during breath cycles
24   Median respiratory flow rate during inhalations
25   Median respiratory flow rate during exhalations
26   Ratio of inhalation and exhalation flow rate
New
27   Respiratory dissimilarity measured by uniform scaling (Dwin)

*The references for the features are 1-11 [249, 288], 12 [75], 13 and 14 [180], and 15-26 [182].

The respiratory effort-based sleep staging results using the feature set with and without Dwin
are compared in Table 5.4, where the overall accuracy and the Cohen’s Kappa coefficient are
reported. Combining Dwin with the existing features resulted in a significantly
increased κ of 0.41 at an overall accuracy of 64.9% when classifying four sleep stages and
of 0.48 at an overall accuracy of 77.1% when classifying three sleep stages (both with TVPP).
The table also shows the results obtained without applying TVPP, indicating that using TVPP
can help achieve significantly better results. Here the significance was checked with a
two-sided Wilcoxon signed-rank test. To understand what aspects of sleep staging the new feature
improves, we present the confusion matrices obtained with and without Dwin in Table 5.5 (for
4-stage classification) and in Table 5.6 (for 3-stage classification), where TVPP was applied.

[Figure 5.2 not reproduced: four panels, (a) Wake, (b) REM, (c) Light, (d) Deep; x-axis: ∆ (30-s epoch), from −200 to 200; y-axis: percentage; one curve each for wake, REM, light, and deep sleep.]

Figure 5.2: The probability of occurrence of different sleep stages versus time difference ∆ for (a) wake,
(b) REM, (c) light, and (d) deep sleep epochs. The boundary of the 25-epoch window for computing
Dwin is indicated (dashed line). For all stages, the light sleep percentage is larger than that of any other
stage when |∆| > ∼30 epochs.

[Figure 5.3 not reproduced: x-axis: |∆| (30-s epoch), from 0 to 100; y-axis: dscore (a.u.); curves: wake-wake, REM-REM, light-light, deep-deep.]

Figure 5.3: Mean dissimilarity score dscore versus absolute time difference |∆| for self-comparisons
wake-wake, REM-REM, light-light, and deep-deep. The boundary of the 25-epoch window for
computing Dwin is indicated (dashed line).

[Figure 5.4 not reproduced: (a) Dwin (a.u.) per sleep stage (wake, REM, light, deep), annotated with the values 0.78 ± 0.17, 0.89 ± 0.16, 0.98 ± 0.13, and 0.99 ± 0.14, listed here in no particular order; (b) normalized histogram (%) of Dwin (a.u.) per sleep stage.]

Figure 5.4: Comparison of the windowed dissimilarity feature Dwin in different sleep stages: (a)
mean ± SD and (b) normalized histogram (i.e., percentage, %).

[Figure 5.5 not reproduced: (a) hypnogram (wake, REM, light, deep) and (b) Dwin (a.u.), both versus time (30-s epoch).]

Figure 5.5: An example of (a) overnight annotation and (b) feature values of Dwin from a 50-year-old
female, where the unsmoothed (gray) and smoothed (black) feature values are both shown.


Table 5.3: Discriminative power of Dwin in separating different sleep stages as evaluated
and ranked by ANOVA F-statistic. Results are pooled over all subjects

Sleep stages           F-statistic   Rank†   Top 10 features‡ (descending order)
Wake/REM               10.7**        25      12, 13, 3, 5, 4, 14, 20, 21, 18, 22
Wake/light             1487.5*       9       13, 14, 7, 4, 5, 3, 15, 17, [27], 16
Wake/deep              3694.4*       2       7, [27], 16, 15, 14, 4, 17, 13, 3, 18
REM/light              1679.0*       2       7, [27], 14, 15, 20, 21, 16, 22, 24, 23
REM/deep               4915.8*       2       7, [27], 16, 20, 15, 21, 25, 22, 23, 24
Light/deep             1420.9*       4       16, 15, 7, [27], 17, 14, 10, 4, 8, 18
Wake/REM/light/deep    1912.6*       6       7, 16, 15, 13, 14, [27], 4, 5, 17, 3
Wake/REM/NREM          2012.8*       4       7, 13, 14, [27], 5, 4, 15, 16, 3, 12

†Ranking of F-statistic among all respiratory features.
‡The feature indices refer to Table 5.2 and the new feature (feature 27) is indicated with brackets.
*p < 0.0001, **p < 0.005.

Table 5.4: Ten-fold CV results of 4-stage (wake, REM sleep, light sleep and deep
sleep) and 3-stage (wake, REM sleep, and NREM sleep) classification schemes obtained
using the feature set with and without Dwin, where the results obtained with
and without using TVPP are also presented

                    Without Dwin†                  With Dwin‡
Scheme     TVPP     Accuracy       Kappa (κ)       Accuracy         Kappa (κ)
4 stages   No       53.7 ± 8.3%    0.34 ± 0.12     55.2 ± 8.0%*     0.37 ± 0.11*
4 stages   Yes      63.8 ± 8.0%    0.38 ± 0.14     64.9 ± 7.8%**    0.41 ± 0.14*
3 stages   No       69.2 ± 9.7%    0.43 ± 0.16     70.0 ± 9.3%**    0.45 ± 0.15**
3 stages   Yes      76.1 ± 7.8%    0.45 ± 0.16     77.1 ± 7.6%**    0.48 ± 0.17*

†26 existing features.
‡27 features (26 existing features and Dwin).
Significance of difference was found with and without Dwin using a paired Wilcoxon
signed-rank test (two-sided) at *p < 0.001 or **p < 0.01.

Table 5.5: Confusion matrix of 4-stage classification (10-fold CV) obtained using the feature
set with and without Dwin, where the results without Dwin are given in parentheses

PSG ↓ / Classified →   Wake          REM sleep     Light sleep      Deep sleep
Wake                   2608 (2606)   512 (453)     2533 (2622)      56 (28)
REM sleep              269 (288)     4259 (3679)   3992 (4492)      13 (74)
Light sleep            844 (831)     2018 (1839)   19285 (19569)    1883 (1791)
Deep sleep             35 (33)       55 (65)       3532 (3664)      2887 (2747)


Table 5.6: Confusion matrix of 3-stage classification (10-fold CV) obtained
using the feature set with and without Dwin, where the results without
Dwin are given in parentheses

PSG ↓ / Classified →   Wake          REM sleep     NREM sleep
Wake                   2605 (2596)   540 (495)     2564 (2618)
REM sleep              271 (278)     4255 (3909)   4007 (4346)
NREM sleep             851 (861)     2112 (2050)   27576 (27628)
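As a worked example, the overall accuracy and Cohen’s Kappa coefficient can be computed directly from a confusion matrix. The sketch below uses the pooled 4-stage matrix obtained with Dwin (Table 5.5); the function name is illustrative. The values in Table 5.4 are reported as mean ± SD, so a pooled computation such as this one is only expected to agree with them approximately.

```python
import numpy as np

def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                                   # observed agreement
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2   # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)

# Rows: PSG annotation; columns: classifier output (wake, REM, light, deep)
table_5_5_with_dwin = [
    [2608, 512, 2533, 56],
    [269, 4259, 3992, 13],
    [844, 2018, 19285, 1883],
    [35, 55, 3532, 2887],
]
acc, kappa = accuracy_and_kappa(table_5_5_with_dwin)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

For this matrix the accuracy is about 0.65 and κ is about 0.41, in line with the 4-stage results with TVPP in Table 5.4.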

5.4 Discussion

We investigated the use of respiratory effort dissimilarity over several consecutive breaths
(as measured by a uniform scaling distance) to characterize the regulation of breathing within
different sleep stages. On average, we observe the lowest dissimilarity score between

two deep sleep epochs. This is because respiratory effort during NREM sleep (in particular dur-

ing deep sleep) is steadier and more regular compared with that during wake and REM sleep

as mentioned before. As illustrated in Figure 5.3, the discrimination between wake and REM

sleep in terms of respiratory effort dissimilarity over time difference is not consistent and seems

maximized at |∆| beyond 40 epochs. With smaller time differences, overlap can be observed

between the dissimilarity scores for wake-wake and REM-REM comparisons. During wake,
breathing might be somewhat less affected by conscious control, body movements,
or other external influences over a short range (e.g., with a |∆| of less than 10 epochs or 5
minutes). This would decrease the dissimilarity scores of wake-wake comparisons within that
range, making it difficult to distinguish between wake and REM sleep. As a result,
the windowed dissimilarity feature Dwin has a low discriminative power in separating wake and
REM sleep, as shown in Table 5.3. In fact, distinguishing wake from REM sleep can sometimes
be difficult even with PSG-based visual scoring [276].

In this work, we chose the window size w of 25 epochs to compute Dwin by globally maxi-

mizing the feature discriminative power in classifying wake, REM sleep, light sleep, and deep

sleep. However, it might not be the optimal choice all the time, particularly in separating wake

and REM sleep (see Figure 5.3). The optimal window size might vary when classifying dif-

ferent sleep stages. Therefore, we think that using an adaptive window size to discriminate

between different sleep stages merits further investigation.

Regarding sleep staging, the new feature Dwin helped improve the classification performance

(Table 5.4) and it contributed more to the detection of REM and deep sleep from the other sleep

stages (Table 5.5). It is therefore suggested that this feature contains additional information

that is not carried by the existing features. We also reveal that using TVPP can lead to better

classification results, as shown in Table 5.4. With cardiorespiratory activity, a κ of 0.46 and an

overall accuracy of 76.1% were achieved when classifying wake, REM sleep, and NREM sleep

for 31 healthy subjects [249]. We obtained slightly better results with the use of the respiratory

information alone. For 4-stage classification (wake, REM sleep, light sleep, and deep sleep),

a κ of 0.48 and an overall accuracy of 65.4% (re-computed based on the reported confusion


matrix) were achieved by Hedner et al. [127], which outperform our results. However, they

employed more signal modalities including peripheral arterial tone, pulse rate, oxyhemoglobin

saturation, and actigraphy. In a more recent study, Willemen et al. [309] reported a κ of 0.56
(at an accuracy of 69%) for 4-stage classification using cardiorespiratory and body movement
features, although they used 60-s epochs instead of the standard 30-s epochs used in most
sleep staging studies. Nevertheless, we anticipate that combining respiratory
and cardiac activity will improve sleep stage classification performance, and
this will be studied further.

The PSG-based sleep stages were manually scored based on the R&K rules in the SIESTA

database. However, it has been reported that the overall inter-scorer agreement using the new

AASM standard is slightly higher than that obtained using the R&K rules [81]. Therefore, we
suggest applying the AASM standard for PSG-based sleep stage scoring in future work,
which is expected to deliver more reliable annotations of overnight sleep stages for the
task of respiratory-based sleep stage classification.

This study only considered healthy subjects without any reported medical, neurological,

mental, or cardiovascular diseases as mentioned before. However, for patients with sleep-

disordered breathing (e.g., sleep apnea/hypopnea) or other respiratory abnormalities, abnormal

respiratory events during the night can affect measuring the dissimilarity between respiratory

effort signals. Therefore, the approach described in this work needs to be tested further for

these patients. In addition, it has been shown that the respiratory effort is more sensitive to

changes of sleep posture and body movements during sleep in comparison with measurements

by nasal cannulas [307]. In that case, Dwin might be erroneously calculated, thus harming the

classification performance. However, for the dissimilarity measure described in this work, the
effect of sleep posture might be limited, since the measure was computed by comparing each
respiratory signal segment with nearby segments, for which the same sleep posture was expected.

Moreover, the dissimilarity measure focused on comparing signal morphology with a certain

number of breaths, where the falsely detected peaks and troughs (often corresponding to body

movements) were removed. As a result, the influences of sleep posture and body movements

should be diminished to some extent. Despite that, those influences merit further investigation.

5.5 Conclusion

In this chapter, by analyzing overnight respiratory effort from healthy subjects, we found that

sleep stages can be differentiated using a dissimilarity measure. This measure expresses the

dissimilarity between respiratory effort signals in their morphology. The dissimilarity can be

evoked by autonomic activity, alteration of ventilation control, or other external factors. A

new feature was extracted based on the properties of respiratory effort dissimilarity. Although

it performed worse than an existing feature (standard deviation of respiratory frequency), it can

help improve the performance of sleep staging when combined with all 26 existing respiratory

features (except for detecting wake from REM sleep). This indicates that this new feature

contains additional information that is not carried by the existing features for sleep staging.

CHAPTER 6

Modeling cardiorespiratory interaction during sleep with

complex networks

This chapter is adapted from: X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Foussier. Modeling

cardiorespiratory interaction during sleep with complex networks. Applied Physics Letters, 105:203701,

2014. ©AIP

Abstract – Human sleep comprises several stages including wake, rapid-eye-movement (REM)

sleep, light sleep, and deep sleep. Cardiorespiratory activity has been shown to correlate with

sleep stages due to the regulation of the autonomic nervous system. Here the cardiorespiratory

interaction (CRI) during sleep is analyzed using a visibility graph (VG) method that represents

the CRI time series in complex networks. We demonstrate that the dynamics of the interac-

tion between heartbeats and respiration can be revealed by VG-based networks, whereby sleep

stages can be characterized and differentiated.


6.1 Introduction

Human sleep is considered a complex biological process with its own internal architecture

expressed by sleep stages [63, 281]. Sleep stages can be typically separated based on pat-

terns observed in standard polysomnography (PSG) recordings including electroencephalogra-

phy (EEG), electromyography (EMG), and electrooculography (EOG) [136, 247]. With PSG,

sleep stages are manually scored on continuous and non-overlapping epochs (lasting 30 s each)

as wake, rapid-eye-movement (REM) sleep, and several non-REM (NREM) sleep stages for

adults. This is usually done by trained sleep technicians according to either the recommenda-

tions provided by Rechtschaffen and Kales (R&K) [247] or using the more recent guidelines of

the American Academy of Sleep Medicine (AASM) [136]. NREM sleep can be further divided

into stages S1-S4 based on the R&K rules, or stages N1-N3 based on the AASM guidelines.

S1 and S2 (or N1 and N2) are associated with ‘light sleep’. S3 and S4 (or N3) correspond

to slow-wave sleep or ‘deep sleep’. For normal subjects, sleep usually starts with light sleep

and then deep sleep with REM sleep following [63]. This sequence is called a sleep cycle and

occurs about every 90 minutes, four to six times per night [63, 243].

Cardiorespiratory activity has been shown to exhibit different characteristics across sleep stages due to

the manifestation of autonomic (sympathetic and vagal) nervous activity [13, 281, 292]. Re-

cently, dynamics of heartbeats and respiration during sleep have been extensively described

[54, 179, 182, 222, 225, 226]. In particular, characteristics of cardiorespiratory interaction

(CRI) or coupling during sleep have attracted more and more attention since they can be used

to provide means to clinically diagnose sleep-related disorders or to identify sleep stages for ob-

jective sleep assessment [29, 30, 147, 266]. For example, Bartsch et al. [30] proposed methods

based on Hilbert-Huang transform (HHT) and detrended fluctuation analysis (DFA) to quantify

and analyze cardiorespiratory phase synchronization in different sleep stages.

In recent years, exploration of a time series has been extended to a two-dimensional complex

network with encoded information stored in the time series, aiming at better exploiting its

dynamics or properties [6, 96, 188, 313, 320]. Lacasa et al. [169] proposed a nonlinear visibility

graph (VG) method in order to describe a time series in a graph based on specific geometric

criteria. They found that random, fractal, and periodic time series correspond to networks with

exponential, scale-free, and regular characteristics, respectively, which means that VG is an

adaptive method for investigating different types of time series.

Some studies have analyzed human physiological activity by means of VG-based networks

[144, 272, 321]. For example, heartbeat dynamics in VG-based networks have been investigated

for healthy subjects and patients with congestive heart failure [272] and for subjects with med-

itation training [144]. In the field of sleep, a recent work has shown that sleep stages are able

to be identified using parameters extracted from EEG signals based on VG-based algorithms

[321, 322]. However, the characteristics of the interaction between cardiac and respiratory

activities in a two-dimensional network during sleep have not been studied. Modeling these

characteristics during sleep will potentially benefit the cardiorespiratory-based classification of sleep

stages. Therefore, we investigate the CRI dynamics across sleep stages in complex networks

using the VG-based method.


[Figure 6.1: three panels versus Time (s): (a) ECG (a.u.) with an RR interval marked, (b) Resp. (a.u.), and (c) CRI (a.u.).]

Figure 6.1: An example of using (a) a 15-s ECG signal and (b) the corresponding respiratory effort

signal to obtain (c) a CRI time series.

6.2 Cardiorespiratory interaction

We consider 330 overnight PSG recordings from 165 healthy subjects (87 males) from the

SIESTA database [160]. Each subject spent two consecutive nights in a sleep laboratory. The

subjects had an average age of 51.8 ± 19.4 y [mean ± standard deviation (SD)] and an average

total recording time of 7.8 ± 0.5 h. According to the SIESTA study protocol, they met several

criteria such as no reported symptoms of neurological, mental, medical, or cardiovascular dis-

orders, no sleep-related disorders, no shift work, and usual bedtime between 22:00 and 24:00.

The PSG recordings were visually scored on 30-s epochs by two independent raters based on

the R&K rules and in case of disagreement, a consensus annotation was obtained.

Here for each 30-s epoch, the location of individual heartbeats is identified by applying

the Hamilton-Tompkins R-peak detector [124] followed by a slope-based QRS localization

method [107] on the ECG signal with a window of 7 epochs (3.5 min) centered on the epoch of

interest. This window serves the purpose of including sufficient data points to capture changes

in heartbeat (or RR) intervals [288]. Afterwards, the corresponding respiratory effort at the

time stamps of the heartbeats is sampled. The resulting CRI time series is then used for VG

analysis. Figure 6.1 illustrates an example of the computation of a CRI time series from its

corresponding ECG signal and respiratory (effort) signal.
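
The sampling step described above can be sketched as follows. This is a minimal illustration, assuming that the R-peak times are already available from a detector and that the respiratory effort at each heartbeat is obtained by linear interpolation. The interpolation scheme, the function name `cri_series`, and the argument names are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def cri_series(r_peak_times, resp, fs_resp):
    """Sample a respiratory effort signal at heartbeat time stamps.

    r_peak_times : R-peak times in seconds (assumed already detected)
    resp         : respiratory effort samples (arbitrary units)
    fs_resp      : sampling rate of `resp` in Hz
    """
    resp = np.asarray(resp, dtype=float)
    t_resp = np.arange(resp.size) / fs_resp  # time axis of the respiratory signal
    r_peak_times = np.asarray(r_peak_times, dtype=float)
    # Keep only heartbeats that fall inside the recorded respiratory signal.
    inside = (r_peak_times >= t_resp[0]) & (r_peak_times <= t_resp[-1])
    t_beats = r_peak_times[inside]
    # Linear interpolation is an assumption; the text only says "sampled".
    return t_beats, np.interp(t_beats, t_resp, resp)
```

The returned pair (heartbeat times, sampled respiratory effort) is the kind of unevenly sampled series that the VG method of the next section takes as input.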

6.3 Visibility graph network

In this work, we apply the VG method to build complex networks for modeling a CRI time series

and to analyze its dynamics across different sleep stages for healthy subjects. To formalize

the VG method, let us consider a time series with n data points $\{(x_k, t_k)\}_{k=1,2,\ldots,n}$. Two data points


[Figure 6.2: (a) time series segment with data points $x_{i+1}, \ldots, x_{i+7}$ at times $t_{i+1}, \ldots, t_{i+7}$; (b) the corresponding VG network.]

Figure 6.2: An example of converting (a) a time series segment with 7 data points into (b) a network

using the VG method, where the respective degrees of nodes from $x_{i+1}$ to $x_{i+7}$ are 4, 3, 3, 5, 3, 2, and 4.

$(x_i, t_i)$ and $(x_j, t_j)$ are connected as vertices or nodes through an undirected edge if and only if

the following rule [169] is satisfied

$$\forall \ell \in (i, j):\quad x_\ell < x_j - (x_j - x_i)\,\frac{t_j - t_\ell}{t_j - t_i}. \tag{6.1}$$

Intuitively, this means that the two data points are connected if they are able to ‘see’ each

other (i.e., the straight line joining them lies above every data point located between them in

time). The time series can therefore be converted into a VG by applying

this rule on all the data points, resulting in its associated complex network with occurrence of

edges that are linked between nodes. Figure 6.2 illustrates an example of converting a time

series x into a VG-based network. For each node, its degree δ is defined as the number of

edges attached to it, giving a heuristic indication of the network’s complexity. Thus, the degree

distribution of the nodes P(δ ) can be used to characterize the time series.
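
The visibility rule and the node degree can be illustrated with a short sketch. It is a direct, unoptimized translation of the rule in Eq. (6.1), written for clarity rather than speed. The function name `visibility_degrees` and the default of unit time steps when no time stamps are given are assumptions made for illustration.

```python
def visibility_degrees(x, t=None):
    """Return the degree of each node of the visibility graph of the series (t, x)."""
    n = len(x)
    if t is None:
        t = list(range(n))  # assumed unit time steps when no time stamps are given
    degree = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Points i and j are linked if and only if every intermediate point
            # lies strictly below the straight line joining them (Eq. 6.1).
            visible = all(
                x[l] < x[j] - (x[j] - x[i]) * (t[j] - t[l]) / (t[j] - t[i])
                for l in range(i + 1, j)
            )
            if visible:
                degree[i] += 1
                degree[j] += 1
    return degree
```

For the toy series `[3, 1, 2, 1, 3]` this yields the degrees `[3, 2, 4, 2, 3]`: neighboring points are always connected, and the two end points see each other over the lower values in between. The brute-force check grows quickly with the series length, so a faster algorithm would be preferable for long windows.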

6.4 Network properties of cardiorespiratory interaction

6.4.1 Degree distribution

A total of 310,503 epochs (including 19.2% wake, 15.2% REM sleep, 53.5% light sleep, and

12.1% deep sleep) are analyzed in this work. Figure 6.3 plots the node degree distribution of

CRI, denoted as P(δ ), pooled over all epochs for each sleep stage (wake, REM sleep, light

sleep, and deep sleep). As illustrated, the degree distribution P(δ ) for each sleep stage follows

a power-law topology such that $P(\delta) \sim \delta^{-\lambda}$, in particular when the degree is large (e.g., δ > 4).

The power λ is shown to differ across sleep stages (wake: λ = 3.7, REM sleep: λ = 3.8, light

sleep: λ = 4.1, and deep sleep: λ = 4.2). As reported in literature, a power-law topology should

correspond to a scale-free dynamics [14, 211, 286], suggesting that the CRI time series during


[Figure 6.3: log-log axes, Degree δ versus Degree distribution P(δ); curves for Wake, REM sleep, Light sleep, and Deep sleep, with annotations λ = 3.7 and λ = 4.2.]

Figure 6.3: Log-log plot of degree distribution P(δ ) of CRI during wake, REM sleep, light sleep, and

deep sleep. P(δ) follows a power-law topology when δ is larger than 4, such that $P(\delta) \sim \delta^{-\lambda}$ with λ of

3.7 for wake, 3.8 for REM sleep, 4.1 for light sleep, and 4.2 for deep sleep.

[Figure 6.4: log-log axes, Mean degree δm versus Mean degree distribution P(δm); curves for Wake, REM sleep, Light sleep, and Deep sleep, with annotations λ = 6.8 and λ = 8.6.]

Figure 6.4: Log-log plot of mean degree distribution P(δm) of CRI during wake, REM sleep, light sleep,

and deep sleep. P(δm) follows a power-law [$P(\delta_m) \sim \delta_m^{-\lambda}$] when λ ≥ 6 with λ of 6.8 for wake, 7.2 for

REM sleep, 8.0 for light sleep, and 8.6 for deep sleep.


[Figure 6.5: Mean δm (axis from 3 to 8) for Wake, REM sleep, Light sleep, and Deep sleep.]

Figure 6.5: Mean degree δm of the CRI time series networks (mean and SD) in different sleep stages. A

Mann-Whitney test shows significant differences between all pairs of sleep stages, with p < 0.0001.

a specific sleep stage are non-stationary and fractal [169]. In addition, we also observe that the

VG-based networks of CRI for wake epochs have a higher percentage of high-degree nodes (the

networks have a higher complexity) compared with other sleep stages, such as deep sleep, which

has the fewest high-degree nodes in the associated networks. A possible explanation for this is

that the CRI time series is noisier (caused by the weaker coupling between cardiac and

respiratory signals) during wake and it is more regular (due to the stronger cardiorespiratory

coupling) during deep sleep when compared with the other stages [30, 188]. Consequently, the

CRI time series are more irregular for wake epochs while they are more regular for deep sleep

epochs. The ‘blur’ in the figure at large values of δ might be due to the presence of outliers in

CRI time series caused by loose cables during measurement or body motion artifacts.
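
As an illustration of how an exponent λ could be read off a degree distribution, the sketch below fits a straight line to the tail of the empirical distribution in log-log coordinates. The text does not state how λ was estimated, so the least-squares fit, the function name `powerlaw_exponent`, and the parameter `delta_min` (set to 5 to mirror δ > 4) are assumptions. Maximum-likelihood estimators are often preferred for heavy-tailed data.

```python
import math
from collections import Counter

def powerlaw_exponent(degrees, delta_min=5):
    """Estimate lambda in P(delta) ~ delta**(-lambda) from a list of node degrees.

    Only degrees >= delta_min enter the log-log least-squares fit (assumed method).
    """
    counts = Counter(degrees)
    total = sum(counts.values())
    pts = [(math.log10(d), math.log10(c / total))
           for d, c in counts.items() if d >= delta_min]
    if len(pts) < 2:
        raise ValueError("need at least two distinct degrees in the fitted tail")
    mean_x = sum(px for px, _ in pts) / len(pts)
    mean_y = sum(py for _, py in pts) / len(pts)
    slope = (sum((px - mean_x) * (py - mean_y) for px, py in pts)
             / sum((px - mean_x) ** 2 for px, _ in pts))
    return -slope  # the slope of the log-log line is -lambda
```

Pooling the node degrees of all epochs of one sleep stage and passing them to this function would give one exponent per stage, comparable in spirit to the values quoted above.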

Since the degree is different between different sleep stages, it can be used to distinguish them

on an epoch-by-epoch basis. For this purpose, the mean degree δm for each epoch (computed

by averaging the degrees over the nodes with a window of 7 epochs centered on that epoch) can

be used as a quantification of the network ‘complexity’ of the CRI time series in VG for each

epoch. Figure 6.4 shows the distribution of δm for different sleep stages where the separations

between sleep stages can be clearly observed, in particular when the mean degree is smaller

than 3 or larger than 6. These results are similar to those obtained based on the analysis of

EEG signals [321]. In Figure 6.5, the δm values in different sleep stages are compared. Using

a two-tailed Mann-Whitney test, δm is found to be significantly different between each pair of

sleep stages (all with p < 0.0001). This means that, on average, wake epochs have the highest

mean degree in the networks followed by REM sleep epochs, then by light sleep and finally by

deep sleep. Moreover, if we consider the degree variation δsd , computed as the SD of the node

degrees in each epoch, we also find statistically significant differences between sleep stages

(all with p < 0.0001) as illustrated in Figure 6.6. The Spearman’s rank correlation coefficient

r between these two parameters δm and δsd is found to be high [r = 0.733, p < 0.00001; 95%



[Figure 6.6: Mean δsd (axis from 4 to 8) for each sleep stage.]

Figure 6.6: Degree variation δsd of the CRI time series networks (mean and SD) in different sleep stages.

A Mann-Whitney test shows significant differences between all pairs of sleep stages, with p < 0.0001.

confidence interval (CI) 0.730-0.736].

6.4.2 Assortative mixing

Another important property of a network is its assortative mixing [210], which has been widely

used to analyze many real-world networks such as biological [100], neural [86], and social

networks [212]. It describes the preference of a node in a network to connect to high- or

low-degree nodes. Considering a network including a total of M edges, the ith

edge connects two nodes with degree of αi and βi at their ends. The assortativity coefficient ζ

of this network [210] is given by

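$$\zeta = \frac{M^{-1}\sum_i \alpha_i\beta_i - \left[M^{-1}\sum_i \tfrac{1}{2}(\alpha_i+\beta_i)\right]^2}{M^{-1}\sum_i \tfrac{1}{2}(\alpha_i^2+\beta_i^2) - \left[M^{-1}\sum_i \tfrac{1}{2}(\alpha_i+\beta_i)\right]^2}, \tag{6.2}$$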

with ζ ranging between −1 and 1. The network is assortative if ζ > 0, in which case the

high-degree (or low-degree) nodes are more likely to be connected to each other than to the

low-degree (or high-degree) nodes; if ζ = 0, the network is randomly mixed; and if ζ < 0 the

network exhibits disassortativity, in which case the high-degree nodes tend to connect to the

low-degree ones, and vice versa. For the CRI time series in this work, the assortativity coeffi-

cients of the associated VG-based networks in different sleep stages are shown in Figure 6.7.

The CRI networks in all sleep stages are assortative. In comparison with REM and NREM

sleep, the CRI network has a decreased assortativity coefficient during wake, indicating that

the network is more randomly mixed. Deep sleep, on the other hand, has a larger ζ compared

with light sleep, possibly because the CRI time series during deep sleep exhibit a more regular

pattern than light sleep. These findings suggest that sleep stages can also be separated based

on differences between the assortativity coefficients of VG-based CRI networks. It should also



[Figure 6.7: Mean ζ (axis from 0.05 to 0.25) for each sleep stage.]

Figure 6.7: Assortativity coefficient ζ of the CRI time series networks (mean and SD) for different

sleep stages. A Mann-Whitney test shows significant differences between all pairs of sleep stages, with

p < 0.0001.

be noted that ζ is significantly correlated to δm (r = −0.363, p < 0.0001; 95% CI −0.368 to

−0.358) and δsd (r = −0.526, p < 0.0001; 95% CI −0.531 to −0.522).
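
The assortativity coefficient of Eq. (6.2) can be computed from the list of edges alone, as in the following sketch. The input is assumed to be one pair (α_i, β_i) of end-node degrees per edge. The function name `assortativity` and the error handling for degenerate networks are assumptions made for illustration.

```python
def assortativity(edge_degrees):
    """Assortativity coefficient (Eq. 6.2) from (alpha_i, beta_i) pairs, one per edge."""
    m = len(edge_degrees)
    if m == 0:
        raise ValueError("network has no edges")
    mean_prod = sum(a * b for a, b in edge_degrees) / m
    mean_half_sum = sum(0.5 * (a + b) for a, b in edge_degrees) / m
    mean_half_sq = sum(0.5 * (a * a + b * b) for a, b in edge_degrees) / m
    denom = mean_half_sq - mean_half_sum ** 2
    if denom == 0:
        # All edge ends have the same degree, so the ratio is undefined.
        raise ValueError("assortativity is undefined for this network")
    return (mean_prod - mean_half_sum ** 2) / denom
```

Two small checks show the range of the coefficient: a star with three leaves, `[(3, 1), (3, 1), (3, 1)]`, gives −1 (fully disassortative), and a path of four nodes, `[(1, 2), (2, 2), (2, 1)]`, gives −0.5.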

6.5 Conclusion

In this chapter, we quantified the dynamics of cardiorespiratory interaction

during sleep by converting the CRI time series into complex networks using the VG method. These dynamics can be described

by some important characteristics of the networks including (mean) degree and its distribu-

tion, degree variation, and assortativity coefficient. These characteristics were shown to behave

differently across sleep stages. However, they were found to be correlated, possibly due to

the presence of mutual information between them. Nevertheless, in practice, they can offer

promising features used for classifying sleep stages based on cardiorespiratory activity.

Part II: Timing Between Autonomic

and Brain Activity

CHAPTER 7

Time delay between cardiac and brain activity during

sleep transitions

This chapter is adapted from: X. Long, J. B. Arends, R. M. Aarts, R. Haakma, P. Fonseca, and J. Rolink.

Time delay between changes of cardiac and brain activity during sleep transitions. Applied Physics Letters,

106:143702, 2015. ©AIP

Abstract – Human sleep consists of wake, rapid-eye-movement (REM) sleep, and non-REM

(NREM) sleep that includes light and deep sleep stages. This work investigated the time de-

lay between changes of cardiac and brain activity for sleep transitions. Here the brain activity

was quantified by electroencephalographic (EEG) mean frequency and the cardiac parameters

included heart rate, standard deviation of heartbeat intervals and their low- and high-frequency

spectral powers. Using a cross-correlation analysis, we found that the cardiac variations during

wake-sleep and NREM sleep transitions preceded the EEG changes by 1-3 min but this was not

the case for REM sleep transitions. These important findings can be further used to predict the

onset and ending of some sleep stages in an early manner.


7.1 Introduction

In the past decades a phenomenon has been recognized in many domains that two coupled

sources or systems exhibit an unsynchronized interaction with a time difference or delay in be-

tween [29, 50, 90, 155, 157, 290]. For instance, neural oscillators have enhanced coupling in

delayed-time [90]. In particular, this may occur during transitions between two physical or bi-

ological states such as chaotic state changes [290], gene switches [50], neutron emission [157],

and cardiorespiratory phase synchronization transitions [29]. Understanding these phenomena

can help, e.g., explore the coherence of neurons and information transmission of the brain in

neurology [90] and improve ‘perception-action’ planning with stimulus events from external

world in cognitive science [236].

In this work we apply the time delay analysis in the area of human sleep. Neurophysiological

mechanisms of sleep are exceptionally important for humans to maintain, for instance, health,

internal homeostasis, memory, and cognitive and behavioral performance [61, 165]. Numerous

studies have reported significant association between heart rate (and heart rate variability, HRV)

and electroencephalographic (EEG) activity during sleep, where they both vary across sleep

states/stages [46, 54, 292]. Previous studies have demonstrated the presence of unsynchronized

changes of HRV and EEG activity in time course over the entire night [146, 217]. However,

the variations of brain activity and autonomic cardiac dynamics should not be independent of

sleep (state/stage) transitions, for which their coupling might change. We therefore investigated

the time delay in sleep transition profiles between cardiac and EEG activity using a cross-

correlation analysis, which was not studied before.

It is known that human sleep consists of wake state, rapid-eye-movement (REM) sleep state,

and non-REM (NREM) sleep state including four stages 1, 2, 3, and 4 according to the rules

recommended by Rechtschaffen and Kales (R&K) [247]. With the more recent guidelines of the

American Academy of Sleep Medicine [136], stages 3 and 4 are merged into a single

slow-wave sleep or “deep” sleep stage since no essential difference was found between them. Besides,

stages 1 and 2 usually correspond to “light” sleep. According to one of these manuals, sleep

states/stages are scored by sleep clinicians on continuous 30-s epochs by visually inspecting

polysomnographic (PSG) recordings including multi-channel EEG, electrooculography (EOG),

and electromyography (EMG).


7.2.1 Subjects and recordings

A total of 330 overnight PSG recordings in the SIESTA database [160] from 165 normal sub-

jects (88 females) were considered in our analysis, where each subject spent two consecutive

nights for sleep monitoring. The SIESTA data were collected in seven sleep centers located

in five EU countries within a period from 1997 to 2000. The study was approved by the local

ethical committees of the recording partners and all subjects provided their informed consent.

The subjects had an average age of 51.8 ± 19.4 y and the average total recording length was


[Figure 7.1: Cohen’s Kappa coefficient (axis from 0.5 to 1.1) for the stage pairs REM/deep, Wake/deep, Wake/REM, REM/light, Wake/light, and Light/deep.]

Figure 7.1: Inter-rater agreement as evaluated by Cohen’s Kappa [mean and standard deviation (SD)

over recordings] between different sleep stages. Statistical significance of difference between each two

Kappa values was examined with a t-test, where the Kappa had no significant difference between REM

sleep/deep sleep and wake/deep sleep and between REM sleep/light sleep and wake/light sleep (p >

0.05) but it was significantly different between the others (p < 0.001).

7.8 ± 0.5 h per night. They fulfilled several criteria, such as no reported symptoms of neuro-

logical, mental, medical, or cardiovascular disorders, no history of drug or alcohol abuse, no

psychoactive medication, no shift work, and retirement to bed between 22:00 and 24:00 de-

pending on their habitual bedtime. Sleep states/stages were scored by two independent raters

based on the R&K rules. In case of disagreement, a consensus annotation was obtained.

The inter-rater reliability (measured by Cohen’s Kappa coefficient of agreement [72], rang-

ing from 0 to 1) in separating different sleep stages is compared in Figure 7.1. It shows that

the Kappa in distinguishing between light and deep sleep was statistically significantly lower

than that for separating other sleep stages. This is due to the gradual changes of physiological

behaviors within NREM sleep.
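
For reference, Cohen’s Kappa between two raters can be computed as in the sketch below. The function name `cohens_kappa` is an assumption. To obtain a pairwise value such as those in Figure 7.1, the caller would first have to restrict the two label sequences to the two stages of interest. How that restriction was done is not described in the text and is left out here.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equally long sequences of categorical labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Observed agreement: fraction of epochs with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal label frequencies of each rater.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)
```

With the labels `["W", "W", "LS", "LS"]` and `["W", "LS", "LS", "LS"]`, the observed agreement is 0.75, the chance agreement is 0.5, and the resulting Kappa is 0.5.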

7.2.2 EEG and cardiac activity

The EEG activity was quantified by a parameter fEEG, called EEG mean frequency [217]. To

calculate it, the EEG signals were first band-pass filtered between 0.3 and 35 Hz and then

the power spectral density was computed for each non-overlapping 2-s interval with a discrete

Fourier transform (DFT). Afterwards, the associated peak frequencies between 0.5 and 30 Hz

were detected accordingly and then for each 30-s epoch, they were averaged over a window of

9 epochs (4.5 min) centered on that epoch, yielding the epoch-based estimates of fEEG. The

cardiac parameters, derived from electrocardiography (ECG) signals over a 9-epoch window

centered on each 30-s epoch, included mean heart rate (HR), standard deviation of heartbeat

intervals (SDNN), and the logarithmic spectral powers of heartbeat intervals in low-frequency

(LF, 0.01-0.15 Hz) and high-frequency (HF, 0.15-0.4 Hz) bands. They have been proven to re-

late to certain properties of autonomic nervous system [13, 281]. For instance, HR, SDNN, and


[Figure 7.2: sleep profile (Wake, REM sleep, Light sleep, Deep sleep) and the parameters f_EEG, HR, SDNN, LF, and HF (axis from −2 to 2) versus Time (h), from 0 to 8 h.]

Figure 7.2: An example of epoch-based sleep states/stages over night and the normalized (Z-score) EEG

mean frequency fEEG and cardiac parameters HR, SDNN, LF, and HF (in nu).

LF are associated with sympathetic activity and the HF power is a marker of parasympathetic or

vagal activity activated by respiratory-stimulated stretch receptors [24, 281, 288]. Many stud-

ies have shown that autonomic nervous activity is effective in identifying sleep states or stages

when PSG is absent [179, 183, 248]. Here all the parameters were normalized to zero mean

and unit variance (Z-score) for each recording, leading to a normalized unit “nu”. Note that the

use of a window aimed at including sufficient heartbeats to capture cardiac rhythms and at helping to

reduce signal noise, so that the autonomic nervous activity can be reliably expressed; a

window size of about 5 min has been recommended for this purpose [288].

For analyzing the time delay during sleep transitions, we chose 30 s as the minimum epoch length

because (1) it is the standard resolution for PSG-based manual scoring of sleep stages [247]

and (2) using a smaller length the parameters could be influenced by the subtle changes caused

by the physiological response during arousals [268], which would likely lead to spurious cross-

correlation analysis results. Figure 7.2 illustrates an example of overnight sleep profile and the

EEG and cardiac parameter values from a healthy subject. It can be seen that these parameters

seem correlated with sleep states/stages to some extent.
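
The computation of the EEG mean frequency and the per-recording normalization can be sketched as follows. The sketch assumes that the EEG has already been band-pass filtered, that each 30-s epoch contains 15 two-second intervals, and that the averaging window is simply truncated at the start and end of the recording. The function names, the parameter names, and the edge handling are assumptions made for illustration.

```python
import numpy as np

def peak_frequencies(eeg, fs, win_s=2.0, f_lo=0.5, f_hi=30.0):
    """Peak frequency (Hz) of each non-overlapping `win_s`-second interval."""
    eeg = np.asarray(eeg, dtype=float)
    n = int(round(win_s * fs))                    # samples per interval
    n_win = eeg.size // n                         # number of complete intervals
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)      # search range for the peak
    segments = eeg[: n_win * n].reshape(n_win, n)
    power = np.abs(np.fft.rfft(segments, axis=1)) ** 2
    return freqs[band][np.argmax(power[:, band], axis=1)]

def epoch_mean_frequency(peaks, per_epoch=15, window_epochs=9):
    """Average peak frequencies over a window of epochs centred on each epoch."""
    peaks = np.asarray(peaks, dtype=float)
    n_epochs = peaks.size // per_epoch
    half = window_epochs // 2
    out = np.empty(n_epochs)
    for e in range(n_epochs):
        lo = max(0, e - half) * per_epoch         # window truncated at the edges
        hi = min(n_epochs, e + half + 1) * per_epoch
        out[e] = peaks[lo:hi].mean()
    return out

def zscore(values):
    """Normalize one recording's parameter values to zero mean and unit variance."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()
```

Calling `zscore(epoch_mean_frequency(peak_frequencies(eeg, fs)))` on one recording would give an epoch-based series in normalized units. The cardiac parameters would be normalized in the same way.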

7.3 Correlation analysis during sleep transitions

To capture the delayed changes of cardiac and EEG activity, we restricted our analysis to

periods of 15 epochs (7.5 min) before and after each transition moment in which only one

transition occurred, in the middle of the period. We note that only a small portion of transitions was

sampled according to these criteria, which might lead to under-representation of fragmented

sleep transitions, i.e., transitions with other transitions immediately preceding or following

within a short time. The number of these periods was 1077 out of a total of 28359 transitions from


[Figure 7.3: (a) transition diagram between Wake, REM sleep, Light sleep, and Deep sleep (the latter two grouped as NREM sleep), with arrows labeled with transition percentages; (b) bar chart of the Distribution (%) of Wake, REM sleep, Light sleep, and Deep sleep.]

Figure 7.3: (a) Mean percentages of sleep transitions over recordings. The average number of total

transitions per recording is 85.9. The transitions are indicated with arrows, where the REM–deep sleep

transitions are not shown because they account for less than 0.01% of total transitions. (b) Sleep stage

distribution (mean and SD over recordings).

all 330 recordings. The first and the last 5 epochs of these periods were excluded, yielding

10-min segments used for analyzing time delays. This served to avoid the time-delayed effects

of the previous and the next transitions when analyzing the parameter values for the time delay

of current sleep transition and meanwhile, to include enough data points for computing cross-

correlation coefficients. By these means, we only considered major types of sleep transitions in

three “hierarchical” levels, as shown in Figure 7.3. They are the transitions: (1) between wake

and sleep including W→LS (from wake to light sleep), LS→W (from light sleep to wake), and

RS→W (from REM sleep to wake); (2) between REM and NREM sleep including RS→LS

(from REM to light sleep) and LS→RS (from light to REM sleep); and (3) within NREM sleep

including LS→DS (from light to deep sleep) and DS→LS (from deep to light sleep). These

seven types of transitions predominate among all sleep transitions [154, 159], which

can also be observed in our data (see Figure 7.3). The transitions between REM and deep sleep

and from deep sleep to wake were not included. For each parameter, we calculated the mean

values over all the 10-min segments for each type of transition and then they were Z-score

normalized. Figure 7.4 illustrates the mean parameter values 5 min (or 10 epochs) before and

after sleep transitions.
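
The selection of isolated transitions and the trimming to 10-min segments can be sketched as below. The input is assumed to be one stage label per 30-s epoch, and a transition at index k means that the label changes between epochs k − 1 and k. The function name `isolated_transitions` and its parameters are assumptions made for illustration.

```python
def isolated_transitions(stages, half_period=15, trim=5):
    """Yield (index, from_stage, to_stage, segment) for isolated transitions.

    A transition is kept only if the `half_period` epochs before it share one
    stage and the `half_period` epochs after it share another. `segment` is a
    slice covering the period with `trim` epochs removed at each end.
    """
    n = len(stages)
    for k in range(half_period, n - half_period + 1):
        if stages[k - 1] == stages[k]:
            continue  # no transition between epochs k - 1 and k
        before = stages[k - half_period:k]
        after = stages[k:k + half_period]
        if len(set(before)) == 1 and len(set(after)) == 1:
            segment = slice(k - half_period + trim, k + half_period - trim)
            yield k, stages[k - 1], stages[k], segment
```

With the default values each retained segment spans 20 epochs (10 min), with the transition in the middle. Parameter values inside such segments can then be averaged per transition type.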

The cross-correlation between EEG mean frequency fEEG and each cardiac parameter αc

(HR, SDNN, LF, or HF) for a given time segment with m epochs is expressed by a cross-

correlation function G,

$$G_{f_{\mathrm{EEG}},\alpha_c}(n) \equiv (f_{\mathrm{EEG}} \star \alpha_c)(n) = \frac{1}{m}\sum_{i=1}^{m-n} f_{\mathrm{EEG},i}\cdot\alpha_{c,i+n}, \tag{7.1}$$

where n is the number of time shifts (a.k.a. time lag) of the cross-correlation between fEEG and αc.

Therefore, the delayed time ∆τ can be obtained by searching for the lag leading to maximum


[Figure 7.4: rows f_EEG, HR, SDNN, LF, and HF (axis from −2 to 2) versus time relative to the transition (−5 to 5); columns W→LS, LS→W, RS→W (Wake-sleep), RS→LS, LS→RS (REM-NREM), and LS→DS, DS→LS (NREM).]

Figure 7.4: Mean values of the normalized parameter fEEG, HR, SDNN, LF, and HF (in nu) with 10

epochs (5 min) before and after different sleep transitions (W→LS, LS→W, RS→W, RS→LS, LS→RS,

LS→DS, and DS→LS).

absolute correlation coefficient, such that

$$\Delta\tau = \arg\max_{n}\,\left|G_{f_{\mathrm{EEG}},\alpha_c}(n)\right|. \tag{7.2}$$

The time delay ∆τ can be positive or negative. A positive ∆τ value indicates that fEEG starts

changing earlier than the cardiac parameter αc, and conversely, a negative value reflects that the

variations of αc precede those of fEEG by |∆τ| epochs (|∆τ|/2 min) on average.
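
Equations (7.1) and (7.2) can be sketched in a few lines. Equation (7.1) is written for non-negative lags. The extension to negative lags by shifting the other series, the search range `max_lag`, and the function names are assumptions made for illustration. With this convention a positive delay means that the EEG parameter changes first, and a negative delay means that the cardiac parameter changes first.

```python
def cross_correlation(f_eeg, alpha_c, n):
    """G(n) of Eq. (7.1); for n < 0 the shift is applied in the other direction."""
    m = len(f_eeg)
    if len(alpha_c) != m:
        raise ValueError("both series must have the same number of epochs")
    if n >= 0:
        pairs = zip(f_eeg[:m - n], alpha_c[n:])    # f_i with alpha_{i+n}
    else:
        pairs = zip(f_eeg[-n:], alpha_c[:m + n])   # same pairing for negative n
    return sum(f * a for f, a in pairs) / m

def time_delay(f_eeg, alpha_c, max_lag):
    """Lag (in epochs) that maximizes |G(n)| over -max_lag..max_lag (Eq. 7.2)."""
    lags = range(-max_lag, max_lag + 1)
    # If several lags tie, the most negative one is returned.
    return max(lags, key=lambda n: abs(cross_correlation(f_eeg, alpha_c, n)))
```

For example, if `alpha_c` is a copy of `f_eeg` delayed by two epochs, `time_delay` returns 2. Swapping the two arguments returns −2.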

7.4 Results and discussion

As shown in Table 7.1, the cardiac parameters started changing approximately 1.5 min ahead

of the EEG mean frequency for the entire-night recordings, confirming the findings reported

by Otzenberger et al. [217]. This indicates that the changes of autonomic activity generally

precede the EEG changes. It was also revealed that, on average, HR, SDNN, and LF were

positively correlated with EEG mean frequency while HF was negatively correlated with it

(p < 0.05). In addition, the table provides the time delay analysis results for different types of

sleep transitions, where the time lag ∆τ (in 30-s epoch) and the associated maximum correlation

coefficients r are given. For SDNN, LF, and HF, we found that the time lag was −3 to −1 min

for the transitions between wake and sleep and −2 to −1 min for NREM sleep transitions.

This indicates that the changes of HRV anticipated the variations of EEG mean frequency by

1-3 min for these types of transitions.

In general, the relatively constant time delay between cardiac and EEG parameters indicates

the existence of time differences between autonomic and cortical changes during sleep transi-


Table 7.1: Results of time delay analysis between EEG mean frequency fEEG and four cardiac parameters HR, SDNN, LF, and HF for different sleep transitions (∆τ in 30-s epochs)

Sleep transition   Samples      HR               SDNN             LF               HF
                                ∆τ      r        ∆τ      r        ∆τ      r        ∆τ      r
Full-night recording
All                N = 330      −2.4    0.22     −2.6    0.24     −2.6    0.19     −3.3    −0.19
Wake-sleep transition
W→LS               N = 159      −1      0.90     −3      0.86     −5      0.71     −4      −0.77
LS→W               N = 84       −1      0.89     −5      0.62     −2      0.74     −2      −0.73
RS→W               N = 29       −2      0.86     −6      0.70     −3      0.79     −3      −0.86
REM-NREM transition
RS→LS              N = 180      0       0.84     0       0.90     0       0.71     0       −0.71
LS→RS              N = 284      1       0.89     0       0.84     1       0.92     2       −0.90
NREM transition
LS→DS              N = 196      0       −0.96    −2      0.70     −2      0.75     −3      −0.64
DS→LS              N = 145      1       −0.60    −4      0.78     −4      0.81     −4      −0.83

Correlation coefficients r were computed for lags from −20 to +20 epochs. For full-night recordings, the average time delays and correlation coefficients are presented, which were significant (p < 0.05) for the majority of the recordings (82.7% for HR, 85.2% for SDNN, 76.1% for LF, and 78.4% for HF). For sleep transitions, the maximum correlations are presented and they were found to be significant (p < 0.0001). The positive delays mean that EEG changes are prior to cardiac changes and the negative delays indicate the changes in cardiac activity preceding those in EEG activity.

tions. The constant earlier appearance of autonomic variations suggests that cortical changes

are secondary to changes elsewhere in the brain (e.g., brain stem) or central nervous system.

These time differences are sleep state/stage dependent and do not seem to occur for REM sleep

(i.e., REM-NREM transitions). This also suggests that the physiology of these changes dur-

ing REM sleep is different from that during wake and NREM sleep. In fact, REM sleep has

different physiological mechanisms compared with NREM sleep, where REM transitions are

‘switch-like’ transitions [187] while the physiological variations within NREM sleep are grad-

ual [55]. The lack of time delay during REM transitions might also be caused by the fact that

the R&K rules force human raters to merge REM epochs of 30 s into one REM sleep period if

they occur within 3 min [247]. On closer inspection, most of the W→LS transitions occurred at the beginning of the night, indicating the presence of a time delay between cardiac and brain activity during sleep onset. The time delay from sleep (REM or light sleep) to wake could be due to the gradual steps of awakening [8, 116]. Additionally, as shown in the table, the changes of HR always seem to occur later than the HRV changes. We therefore speculate that, to a certain degree, parasympathetic changes (reflected by HF changes) might appear slightly earlier than the variations of sympathetic activity (corresponding to HR changes) during wake-sleep and NREM transitions.

As stated, when computing the parameters, we applied averaging or filtering over a 9-epoch


[Figure 7.5 plot not reproduced. Panels: HR, SDNN, LF, HF; x-axis: window size (30 s), 1 to 9; y-axes: ∆τ (30 s) and |r| (-); legend: W→LS, LS→W, RS→W, RS→LS, LS→RS, LS→DS, DS→LS.]

Figure 7.5: Time delay ∆τ between cardiac and EEG activity and the associated (maximum) absolute

correlation coefficient |r| versus averaging window size (1-9 epochs, step size 2 epochs) for computing

the epoch-based parameters.

[Figure 7.6 plot not reproduced. y-axis: absolute HR change (bpm), 0 to 14; x-axis groups: wake-sleep transitions (LS→W, RS→W, W→LS), REM-NREM transitions (LS→RS, RS→LS), NREM transitions (DS→LS, LS→DS).]

Figure 7.6: Absolute changes of HR (mean and SD) during sleep transitions, computed based on the

10-min segments.

(4.5-min) window centered on each epoch in order to obtain reliable parameter values. Figure 7.5 illustrates the time delay and the associated absolute correlation coefficient versus the averaging window size. The figure supports this choice: the correlations generally increased and the time delays ∆τ stabilized as the window size increased. Moreover, in a cross-correlation analysis between two signals, applying the same symmetric linear-phase filter (with the same window size) to both signals does not distort the signal phase [240]. Thus, the averaging here should not affect the lag found when searching for the time delays.
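The lag search and the effect of symmetric averaging can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (`moving_average`, `pearson`, `best_lag`) and the handling of the signal edges are assumptions made for illustration only. The sign convention follows the table: a negative lag means that the cardiac change precedes the EEG change.

```python
import math

def moving_average(x, width):
    """Centered (symmetric) moving average; edges use the samples available."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def best_lag(cardiac, eeg, max_lag=20):
    """Return (lag, r) maximizing |r| between cardiac[t + lag] and eeg[t].

    A negative lag means that the cardiac change precedes the EEG change.
    """
    best = (0, 0.0)
    n = len(eeg)
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(n, n - lag)  # keep both indices in range
        r = pearson(cardiac[lo + lag: hi + lag], eeg[lo:hi])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best
```

Applied to two synthetic transitions in which the cardiac change leads by four epochs, the sketch returns a lag of −4 both before and after a 9-epoch symmetric average is applied to both signals, in line with the argument above.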

Figure 7.6 shows the absolute changes of HR (in beats per minute, bpm) during different sleep state/stage transitions. Large HR changes (4.6-9.1 bpm) occurred during the wake-sleep transitions, while the NREM transitions had the smallest changes in HR (1.1-2.7 bpm). This supports the “hierarchical” nature of the various transitions and the validity of the results.

7.5 Conclusion

In this chapter, we investigated the time delay between cardiac and brain activity for different

sleep transitions using a cross-correlation analysis. The presented results indicate that the au-

tonomic nervous system changes generally precede the EEG changes by 1-3 min during sleep

transitions except for REM-NREM transitions. In practice, the important findings here can be

used in future research to predict sleep state/stage changes based on autonomic nervous activity.

CHAPTER 8

Detection of nocturnal slow wave sleep based on

cardiorespiratory activity

This chapter is adapted from: X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Rolink. Detection of

nocturnal slow wave sleep based on cardiorespiratory activity. Submitted.

Abstract – Human slow wave sleep (SWS) during bedtime is paramount for energy conservation and memory consolidation. This work aims at automatically detecting SWS from nocturnal sleep using cardiorespiratory signals that can be acquired with unobtrusive sensors in a home-based scenario. From the signals, time-dependent features are extracted for continuous 30-s epochs. To reduce the measurement noise, body motion artifacts, and/or within-subject variability in physiology conveyed by the features, and thus enhance the detection performance, we propose to smooth the features over each night using a spline fitting method. In addition, the changes in cardiorespiratory activity are found to precede the transitions between SWS and the other sleep stages (non-SWS). To exploit this, a novel scheme is proposed that performs the SWS detection for each epoch using the feature values prior to that epoch. Experiments were conducted with a large data set of 325 overnight polysomnography (PSG) recordings using a linear discriminant classifier and ten-fold cross validation. Features were selected with a correlation-based method. Results show that the performance in classifying SWS and non-SWS can be significantly improved by smoothing the features and using the feature values from 5 min earlier. When compared with manual PSG scoring, we achieved a Cohen’s Kappa coefficient of 0.57 (at an accuracy of 88.8%) using only six selected features for 257 recordings with a minimum of 30 min of overnight SWS, which were considered representative of the subjects’ habitual sleeping pattern at home. A marked drop in Kappa to 0.21 was observed for the other nights with an SWS time of less than 30 min, which were found to be more likely to occur in older subjects. This remains a future challenge in cardiorespiratory-based SWS detection.



8.1 Introduction

Nocturnal sleep of humans comprises rapid-eye-movement (REM) sleep, stages S1-S4 of non-REM (NREM) sleep, and wake according to the R&K rules [247]. S1 and S2 are grouped into “light sleep” and correspond to stages N1 and N2, respectively, in the more recent guidelines of the American Academy of Sleep Medicine (AASM) [136]. S3 and S4 are considered slow wave sleep (SWS), corresponding to stage N3 in the AASM guidelines. SWS is characterized by delta electroencephalographic (EEG) activity with no eye movements [136]. It represents the most restorative period of sleep for metabolic functioning, during which brain and body energy are conserved [35] and new memories are consolidated [285]. SWS is associated with the maintenance of sleep and with sleep quality [45]. Lack of SWS may result in, e.g., loss of daytime performance [45] and increased risk of diabetes [287]. Moreover, attention has been paid in the past decade to improving nighttime sleep (i.e., to enhance memory consolidation) through external stimulation of sleep slow waves in humans [192, 193, 213]. We were therefore motivated to develop a system that accurately detects SWS from nocturnal sleep, particularly in a home scenario.

Polysomnography (PSG) is the “gold standard” for objective sleep assessment, from which a hypnogram can be derived through visual scoring by sleep technicians [136, 247]. A PSG recording typically consists of various bio-signals such as EEG, electromyography (EMG), electrooculography (EOG), electrocardiography (ECG), respiratory effort (RE), and blood oxygen saturation (SaO2). These signals are usually split into continuous 30-s non-overlapping intervals, called epochs. Although PSG is a standard method for sleep analysis, it has some disadvantages: it is conducted in a sleep laboratory, leading to high facility costs; it requires many electrodes to be attached to the body, which disrupts a subject’s normal sleep; and it requires subjects to stay in the sleep laboratory overnight, which is not compatible with prolonged sleep monitoring. To overcome these disadvantages, cardiac and/or respiratory information has been used for years to assess sleep, since it can be acquired with unobtrusive sensing systems in a home-based environment, such as a wrist-worn watch [127], a bed sensor [161], a textile bedsheet [264], a web-camera [232], an acoustic device [228], a Doppler radar [319], and a photoplethysmographic sensor [16]. It has been shown that cardiorespiratory signals contain relevant physiological information for sleep staging, such as heart rate variability (HRV) [46] and respiration rhythm [95]. This is because they are related to the activity of the autonomic nervous system, which differs between sleep stages [292]. For example, SWS coincides with a decreased sympathetic activity conveyed by the low-frequency power in HRV.

Cardiorespiratory-based sleep stage classification has been increasingly studied in recent years, with many features (representing certain physiological aspects) designed and extracted from cardiac and/or respiratory signals [151, 182, 197, 249, 309]. However, rather than SWS detection, those studies investigated either wake–REM–NREM or sleep–wake classification. Other studies have reported results in classifying wake, REM sleep, light sleep, and SWS [127], detecting REM sleep [131], or differentiating light sleep and SWS [51], but they used additional physiological signal modalities such as peripheral arterial tone and oxyhemoglobin saturation. Shinar et al. [273] developed an HRV-based SWS detector and obtained an accuracy of about 80%, but they validated it on a very small portion of data with a total duration of 100 min (SWS of 50 min) rather than on entire-night recordings. Therefore, this chapter addresses the problem of continuously classifying overnight SWS and non-SWS (all the other stages) with cardiorespiratory signals that can be unobtrusively acquired.

The sleeping pattern of healthy adults usually progresses through several regular cycles throughout the night [63]. This means that, for each recording, the sleep stage and the associated physiological activity vary over the night, so that each feature can be considered an epoch-based time series. After visually comparing some feature values with the PSG-based annotations over the night, we observed many errors occurring in the middle of a long SWS/non-SWS period, possibly due to measurement noise, feature computation variance, or body motion artifacts. Another cause might be the ‘within-subject variability’ in physiology, which means that the physiological expression of features was not perfectly discriminative and thus could not deliver an ideal separation between sleep stages. For these reasons, we decided to low-pass filter or smooth each feature’s values over time using a spline fitting method [296]. The main reason for using spline fitting was that, unlike many other low-pass filters, it is capable of interpolating missing data [84, 296]. This is of particular importance because sleep is a continuous process and our data had an average of ∼10% missing values.

Several researchers have investigated the temporal relationship between cardiac dynamics and brain activity [62, 146, 217]. For instance, Otzenberger et al. [217] reported that the overnight HRV changes generally precede the variations in EEG activity by around 1-2 min. Jurysta et al. [146] demonstrated that the high-frequency power of heartbeat or RR intervals precedes the delta-wave power of the EEG spectrum by approximately 7 min (i.e., a negative time delay). Additionally, the decrease of heart rate in stage S2 was found to anticipate the onset of SWS by several minutes [62]. These studies indicate that the autonomic changes are not exactly synchronized with the variations in EEG activity, in particular during the transitions between SWS and non-SWS; rather, a time difference appears between them. In our data, we also observed that many features started changing prior to the transition moments between the annotated SWS and non-SWS epochs. This time delay would lead to errors in classifying SWS and non-SWS epochs. To address this, we propose a novel scheme that uses the preceding feature values in earlier epochs to further improve the identification of the sleep state of each epoch (SWS or non-SWS). This could also enable an early prediction of SWS onset, allowing a real-time SWS detection system, as is usually required for slow wave stimulation in practice.

Previous work has shown that a linear discriminant (LD) classifier is appropriate for sleep stage classification [180, 182, 249]; it was therefore adopted in this work for SWS and non-SWS classification. Preliminary results of this work have been previously reported [186].


Table 8.1: Subject demographics and sleep data from normal nights (with a minimal SWS time of 30 min)

Parameter          Mean ± SD       Range
Recording          N = 257 (145 subjects)
Sex                65 males and 80 females
Age (y)            49.5 ± 19.2     20 − 95
Wake (%)           17.5 ± 11.0     1.1 − 63.0
REM sleep (%)      15.8 ± 5.5      0.0 − 29.0
Light sleep (%)    51.9 ± 8.6      21.1 − 70.4
SWS (%)            14.8 ± 5.1      6.2 − 32.2


Full PSG data (at least 16 channels of bio-signals) from 165 healthy subjects in the SIESTA project [160] were included; the subjects were monitored in seven different sleep centers located in five European countries. In accordance with the SIESTA protocol, the subjects met several criteria, such as no reported symptoms of neurological, mental, medical, or cardiovascular disorders, no history of drug or alcohol abuse, no shift work, and retirement to bed before midnight depending on their habitual bedtime [160]. Each subject spent two consecutive nights in a sleep laboratory, resulting in a total of 330 overnight recordings. For each recording, 30-s epoch-based sleep stages were scored by sleep technicians based on PSG according to the R&K rules. For SWS and non-SWS classification, wake, REM sleep, S1, and S2 were merged into a single non-SWS class; S3 and S4 were labeled as the SWS class. The epochs with invalid PSG scoring (∼3%) were removed.

Five recordings were excluded due to the absence of SWS, leaving 325 recordings in our data set. In addition, this work primarily focused on ‘normal’ sleep nights (from lights OFF in the evening till lights ON in the morning), during which the total SWS time throughout the night was no less than 30 min [216], resulting in a normal group of 257 recordings from 145 subjects. These nights were more representative of the normal sleeping pattern in terms of SWS [216], as would be expected with home-based sleep monitoring. The remaining 68 nights (from 51 subjects) with an overnight total SWS time of less than 30 min (low-SWS group), more of them first nights than second nights, were excluded from the main analysis because they might be strongly influenced by the “laboratory effects”, where the subjects could not sleep as well as they habitually do at home [196]. The subject demographics and sleep data for the normal group used in this study are summarized in Table 8.1. Nevertheless, we also tested our approach on the recordings from the low-SWS group.

The thoracic RE signals (sampled at 10 Hz) were acquired with a respiratory inductance

plethysmographic (RIP) chest belt and the cardiac signals (sampled at ≥100 Hz) were recorded

with a modified V1 lead ECG.


8.3 Methods

8.3.1 Signal preprocessing

The RE signal was filtered with a tenth-order Butterworth low-pass filter (with a cut-off frequency of 0.6 Hz) to eliminate high-frequency noise. Afterwards, the baseline was removed by subtracting the median peak-to-trough amplitude over the entire recording [182, 248]. Because we also extracted respiratory features in the frequency domain, a fast Fourier transform (FFT) with a Hanning window (used to reduce spectral leakage) was applied to the resulting signal to estimate the power spectral density (PSD) for each epoch [248].

The ECG signal was high-pass filtered using a Kaiser window (with a cut-off frequency of 0.8 Hz and a side-lobe attenuation of 30 dB) to remove baseline wander, after which the resulting signal was zero-meaned. To extract features from RR intervals for each epoch, a Hamilton-Tompkins R-peak detector [124] combined with a precise QRS localization algorithm [107] was applied to locate R peaks on the ECG signal within a window of nine epochs centered at the epoch of interest. This window served to include sufficient data points to capture the changes in RR intervals; its size (4.5 min) is close to the value of 5 min recommended in [288]. The resulting RR interval series was then re-sampled via linear interpolation at a sampling rate of 4 Hz. The PSD of RR intervals was estimated using an autoregressive (AR) model with adaptive order [42]. The AR model was used instead of a Fourier-based approach because the latter has limitations such as poor spectral resolution and leakage [44], to which the PSD estimate of the RR interval series is more sensitive because this series has a lower sampling rate than the RE signal.
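The re-sampling step can be sketched as follows. This is only an illustration under stated assumptions: the function name `resample_rr` is hypothetical, and assigning each RR interval to the time of the R peak that ends it is a common convention that the text does not specify.

```python
def resample_rr(r_peak_times, fs=4.0):
    """Resample an RR interval series onto a uniform grid by linear interpolation.

    r_peak_times: increasing R-peak times in seconds (at least three peaks).
    Returns (grid_times, rr_values), with RR intervals in seconds.
    """
    if len(r_peak_times) < 3:
        raise ValueError("need at least three R peaks")
    # Assumption: each RR interval is placed at the time of the peak ending it.
    times = r_peak_times[1:]
    rr = [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]
    grid, values = [], []
    k = 0  # index of the segment [times[k], times[k + 1]] containing the grid point
    n_steps = int((times[-1] - times[0]) * fs)
    for i in range(n_steps + 1):
        t = times[0] + i / fs
        while k < len(times) - 2 and times[k + 1] < t:
            k += 1
        frac = (t - times[k]) / (times[k + 1] - times[k])
        grid.append(t)
        values.append(rr[k] + frac * (rr[k + 1] - rr[k]))
    return grid, values
```

A uniformly sampled series of this kind is what a spectral estimator (AR-based in this work) would then take as input.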


A total of 70 features were extracted for each 30-s epoch from ECG and thoracic RE signals,

which are briefly described below. Note that the features for a specific epoch were mostly

computed within a certain window centered at that epoch.

The ECG features were obtained from the RR intervals or heart rates over a window of nine epochs (with around 300 beats during sleep on average). In the time domain, they included the mean heart rate, mean RR interval (detrended and non-detrended), standard deviation and range of RR intervals, the percentage of successive RR intervals that differ by more than 50 ms, and the root mean square and standard deviation of successive RR interval differences [288]. Frequency-domain features comprised the logarithm of normalized power in the very low frequency (VLF, 0.003-0.04 Hz), low frequency (LF, 0.04-0.15 Hz), and high frequency (HF, 0.15-0.4 Hz) spectral bands, the ratio of LF and HF spectral powers [249, 288], and the modulus and phase of the HF pole [197]. The VLF, LF, and HF power and the LF-to-HF ratio with adapted spectral bands have been shown to improve sleep/wake detection [178]. The maximum power in the HF band and its associated frequency (in line with the mean respiratory frequency) were also calculated [248]. Additionally, non-linear properties of RR intervals were quantified based on detrended fluctuation analysis (DFA) with parameter α [148] and its short-term (parameter α1) and long-term (parameter α2) exponents [224], and multi-scale sample entropy (length: 1 and 2 samples, scale: 1-10) [75].
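As an illustration of the time-domain features listed above, the following sketch computes a few of them from a list of RR intervals. The function name `hrv_time_features` and the dictionary keys are hypothetical, and details such as the use of the sample standard deviation (n − 1) and the mean of the instantaneous heart rates are assumptions rather than the exact definitions used in this work.

```python
import math

def hrv_time_features(rr):
    """Time-domain HRV features from RR intervals given in seconds."""
    n = len(rr)
    mean_rr = sum(rr) / n
    # SDNN: sample standard deviation of the RR intervals.
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr) / (n - 1))
    # Differences between successive RR intervals.
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # pNN50: percentage of successive differences larger than 50 ms.
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 0.050) / len(diffs)
    mean_hr = sum(60.0 / x for x in rr) / n  # beats per minute
    return {"mean_hr": mean_hr, "mean_rr": mean_rr, "sdnn": sdnn,
            "range_rr": max(rr) - min(rr), "rmssd": rmssd, "pnn50": pnn50}
```

In the setting described here, `rr` would hold the RR intervals found within the nine-epoch window centered on the epoch of interest.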

The RE features included the mean respiratory frequency estimated in the time and the fre-

quency domain, respiratory frequency standard deviation over five epochs, mean and standard

deviation of breath-by-breath correlations, standard deviation of breath lengths, and the spec-

tral power of respiratory frequency [249]. Several features regarding the RE amplitude were

derived: the standardized median and sample entropy of respiratory peaks and troughs (i.e.,

the respiratory upper and lower envelopes indicating the inhalation and exhalation depths, re-

spectively), median peak-to-trough difference, median volumes and flow rates of breath cycles,

inhalations, and exhalations, and the ratio of inhalation and exhalation flow rates [182]. When

computing these amplitude-based features for each epoch, we used a window of thirteen epochs

(with around 120 breath cycles) since the sample entropy measures are less reliable if the number of samples is less than 100 [250]. Similar to the spectrum analysis of RR intervals, we computed the power in different spectral bands (VLF, LF, and HF) and the LF-to-HF ratio from the respiratory PSD [248]. In addition, we extracted the respiratory regularity quantified with

sample entropy over seven epochs [182] and windowed respiratory dissimilarity measured by

means of uniform scaling [185] and dynamic (time and frequency) warping [180], respectively.

8.3.3 Spline fitting for feature smoothing

As stated, the feature values are expected to vary cyclically over the night along with the sleep stages, which motivated us to consider a recording- or night-specific feature smoothing. Before that, each feature was normalized for each recording to have zero mean and unit variance (Z-score normalization). This served to reduce the variability between subjects caused by the difference between PSG systems used in different sleep laboratories and/or the difference in physiological expression during sleep. Our previous work [186] has revealed that the Z-score normalization can help improve SWS detection.

The spline fitting method has been widely used for time series smoothing [84]. Let $x = x_1, x_2, \ldots, x_n$ ($x_1 < x_2 < \ldots < x_n$) represent a sequence of observations and $y = y_1, y_2, \ldots, y_n$ their responses; then a relation between them can be modeled by

$$y_i = g(x_i) + \varepsilon_i \quad (i = 1, 2, \ldots, n), \tag{8.1}$$

where $g$ is a smoothing (spline) function and $\varepsilon_i$ are independent and identically distributed residuals. The smoothing function can be estimated by minimizing the objective function (i.e., the penalized sum of squares) such that

$$g = \arg\min_{g} \left[ \sum_{i=1}^{n} [y_i - g(x_i)]^2 + \lambda \int_{x_1}^{x_n} g''(x)^2 \, dx \right], \tag{8.2}$$

where $\lambda$ is a smoothing parameter that controls the trade-off between residual and local variation. The smoothing function can be expressed by cubic B-splines as basis functions and determined via least squares approximation (LSA) [84, 296].

Given a feature for a recording, the observations here are the epoch indices $t = t_1, t_2, \ldots, t_m$ and the responses are their corresponding feature values $v = v_1, v_2, \ldots, v_m$, where $m$ is the total number of epochs. To build a spline fitting model, the entire sequence is divided into $k$ contiguous subsequences separated by $k - 1$ boundaries called knots or breaks, each subsequence containing $l$ epochs. The feature values and epoch indices for this recording are then expressed respectively as

$$v = \underbrace{v_{11}, v_{12}, \ldots, v_{1l}}_{1}, \underbrace{v_{21}, v_{22}, \ldots, v_{2l}}_{2}, \ldots, \underbrace{v_{k1}, v_{k2}, \ldots, v_{kl}}_{k} \tag{8.3}$$

and

$$t = \underbrace{t_{11}, t_{12}, \ldots, t_{1l}}_{1}, \underbrace{t_{21}, t_{22}, \ldots, t_{2l}}_{2}, \ldots, \underbrace{t_{k1}, t_{k2}, \ldots, t_{kl}}_{k}. \tag{8.4}$$

Thereafter, each subsequence is modeled by Equations 8.1 and 8.2, yielding a spline fit over the entire sequence with multiple knots. Since the total number of epochs differs between recordings, we preferred to fix the window size of the subsequences, $w = \lceil m/k \rceil$, instead of using a fixed number of breaks $k$. A larger window size (or fewer knots) results in a smoother fitted curve, while a smaller window size (or more knots) decreases its smoothness. For example, as depicted in Figure 8.1, the feature values throughout the night seem to be better mapped to the PSG-based annotations after spline smoothing. The figure also shows that the RR interval and respiratory rate have lower variances during SWS compared with the other stages.
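The relation between the fixed window size and the number of subsequences can be made concrete with a small helper. It only lays out the breaks; it does not fit the spline itself. The name `spline_breaks` and the choice of letting the last subsequence be shorter when the window size does not divide the number of epochs are assumptions for illustration.

```python
import math

def spline_breaks(m, w):
    """Split m epochs into subsequences of w epochs (the last may be shorter).

    Returns (k, breaks): the number of subsequences k and the k - 1 interior
    break positions, given as the index of the first epoch after each break.
    """
    k = math.ceil(m / w)
    breaks = [j * w for j in range(1, k)]
    return k, breaks
```

For instance, an 8-h recording has m = 960 epochs; with a window of w = 25 epochs this gives k = 39 subsequences and 38 interior breaks, whereas a shorter recording with the same window size simply yields fewer breaks.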

8.3.4 Feature subset selection

Since an LD classifier is usually sensitive to the presence of redundant and non-discriminative features, including such features would degrade the classification performance. Hence, we applied a correlation-based feature selector (CFS) [121] to select features that maximize the discriminative power. CFS is a supervised algorithm that aims at finding an ‘optimal’ feature subset containing features that are uncorrelated with each other and highly correlated with the classes. With CFS, the heuristic evaluation criterion, called “merit”, is formulated by taking the feature-to-class and feature-to-feature correlations into account. Starting with no features, a forward search was used to add new features one by one until adding further features no longer increased the merit. More details of CFS can be found elsewhere [121].
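The forward search can be sketched as below. The merit used here is the commonly cited CFS heuristic, $k\bar{r}_{cf} / \sqrt{k + k(k-1)\bar{r}_{ff}}$, where $\bar{r}_{cf}$ is the mean feature-to-class correlation and $\bar{r}_{ff}$ the mean feature-to-feature correlation of a subset of $k$ features. The function names, the use of precomputed absolute correlations as inputs, and the greedy stopping rule are assumptions for illustration, not the exact implementation used in this work.

```python
import math

def merit(subset, r_cf, r_ff):
    """CFS merit of a feature subset (list of feature indices).

    r_cf[i]: absolute correlation between feature i and the class.
    r_ff[i][j]: absolute correlation between features i and j (symmetric).
    """
    k = len(subset)
    mean_cf = sum(r_cf[i] for i in subset) / k
    if k == 1:
        return mean_cf
    pairs = [(i, j) for n, i in enumerate(subset) for j in subset[n + 1:]]
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs)
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)

def forward_cfs(r_cf, r_ff):
    """Greedy forward search: add features while the merit keeps increasing."""
    selected, best = [], 0.0
    remaining = list(range(len(r_cf)))
    while remaining:
        score, candidate = max((merit(selected + [i], r_cf, r_ff), i)
                               for i in remaining)
        if score <= best:
            break  # no candidate increases the merit
        selected.append(candidate)
        remaining.remove(candidate)
        best = score
    return selected, best
```

With three hypothetical features, of which the second is strongly correlated with the first, the search keeps the first and third features and rejects the redundant one.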

8.3.5 Classifier

Here a simple LD classifier was adopted and the classification was performed on each epoch over the whole recording. The linear discriminant function is given by

$$G_c(\mathbf{f}) = -\frac{1}{2} (\mathbf{f} - \boldsymbol{\mu}_c)^T \boldsymbol{\Sigma}^{-1} (\mathbf{f} - \boldsymbol{\mu}_c) + \ln \Pr(c), \tag{8.5}$$

where $\boldsymbol{\mu}_c$ is the mean of the feature vectors $\mathbf{f}$ of class $c$, $\boldsymbol{\Sigma}$ the pooled covariance matrix, and $\Pr(c)$ the prior probability of class $c$ [SWS (positive class) or non-SWS (negative class)]. Given a feature vector $\mathbf{f}_j$, the $j$th epoch $E_j$ ($j = 1, 2, \ldots, m$) of a recording is classified based on the decision rule

$$C(E_j \mid \mathbf{f}_j) = \begin{cases} \text{SWS} & \text{if } G_{\text{SWS}}(\mathbf{f}_j) > G_{\text{non-SWS}}(\mathbf{f}_j) \\ \text{non-SWS} & \text{otherwise.} \end{cases} \tag{8.6}$$
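A minimal sketch of Equations 8.5 and 8.6 for a two-feature vector is given below. It is restricted to two features so that the pooled covariance matrix can be inverted by hand; the function names and the numerical values in the example are hypothetical.

```python
import math

def invert_2x2(c):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return [[c[1][1] / det, -c[0][1] / det],
            [-c[1][0] / det, c[0][0] / det]]

def ld_score(f, mu, cov_inv, prior):
    """Discriminant of Eq. 8.5: -0.5 (f - mu)^T Sigma^-1 (f - mu) + ln Pr(c)."""
    d = [f[0] - mu[0], f[1] - mu[1]]
    quad = (d[0] * (cov_inv[0][0] * d[0] + cov_inv[0][1] * d[1])
            + d[1] * (cov_inv[1][0] * d[0] + cov_inv[1][1] * d[1]))
    return -0.5 * quad + math.log(prior)

def classify(f, mu_sws, mu_non, cov, prior_sws):
    """Decision rule of Eq. 8.6 with a pooled covariance matrix cov."""
    cov_inv = invert_2x2(cov)
    g_sws = ld_score(f, mu_sws, cov_inv, prior_sws)
    g_non = ld_score(f, mu_non, cov_inv, 1.0 - prior_sws)
    return "SWS" if g_sws > g_non else "non-SWS"
```

For a feature vector lying halfway between the two class means, the quadratic terms are equal and the decision is driven by the prior alone: the epoch is labeled non-SWS with a prior Pr(SWS) of 0.15 and SWS with a prior of 0.6.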

We observed that the occurrence of each class varied throughout the night. For instance, the probability of being in SWS at the end of the night should be lower than that in the middle of the night. This indicates that the prior probabilities are time-varying. Hence, instead of using a fixed prior probability, we computed a time-varying prior probability for each epoch by simply counting the relative frequency with which that specific epoch index was annotated as each class [248].
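The counting procedure can be sketched as follows, assuming that the training annotations are available as one list of labels per recording (1 for SWS, 0 for non-SWS). The function name `time_varying_prior` and the treatment of epoch indices beyond the end of shorter recordings are assumptions for illustration.

```python
def time_varying_prior(training_labels, n_epochs):
    """Prior probability of SWS for each epoch index.

    training_labels: list of per-recording label lists (1 = SWS, 0 = non-SWS).
    For epoch index j, the prior is the fraction of training recordings
    that are annotated as SWS at index j, among those long enough to have it.
    """
    priors = []
    for j in range(n_epochs):
        at_j = [rec[j] for rec in training_labels if j < len(rec)]
        priors.append(sum(at_j) / len(at_j) if at_j else 0.0)
    return priors
```

The prior of the non-SWS class at each index is one minus the returned value. Note that a prior of exactly zero or one makes the logarithm in Equation 8.5 undefined for one of the classes, so a practical implementation would presumably smooth or clip these estimates.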

8.3.6 Time delay

As illustrated in Figure 8.1, there seem to be mismatches between the feature values and the annotations a few minutes before the transitions between SWS and non-SWS, implying the presence of a time delay between the changes of cardiorespiratory properties and the PSG-based annotations. Taking this time delay into account, earlier cardiorespiratory activity can be used to identify the SWS or non-SWS class. Supposing that we want to classify the $j$th epoch $E_j$ ($j = 1, 2, \ldots, m$), we can use the feature values of the $(j+\tau)$th epoch (with a delay of $\tau$ epochs relative to the target epoch) instead of the feature values from the epoch itself, such that

$$C(E_j \mid \mathbf{f}_{j+\tau}) = \begin{cases} \text{SWS} & \text{if } G_{\text{SWS}}(\mathbf{f}_{j+\tau}) > G_{\text{non-SWS}}(\mathbf{f}_{j+\tau}) \\ \text{non-SWS} & \text{otherwise,} \end{cases} \tag{8.7}$$

in which a negative time delay (i.e., a preceding time) was expected. This means that the class of the target epoch is inferred from the feature values $|\tau|$ epochs earlier. To evaluate this approach, we computed the discriminative power of the features and the classification results by varying the time delay from −30 to 0 epochs with a step size of one epoch (a $\tau$ of zero corresponds to the absence of time delay).
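The pairing of each target epoch with earlier feature values can be sketched as follows. The function name `delayed_features` is hypothetical, and marking the first |τ| epochs, for which no earlier features exist, with `None` is an assumption; the text does not state how these epochs were handled.

```python
def delayed_features(features, tau):
    """Pair each epoch j with the feature vector of epoch j + tau (tau <= 0).

    features: list of per-epoch feature vectors.
    Returns a list of the same length; entries for which j + tau falls
    before the start of the recording are None.
    """
    if tau > 0:
        raise ValueError("tau is expected to be zero or negative")
    return [features[j + tau] if j + tau >= 0 else None
            for j in range(len(features))]
```

Sweeping `tau` from −30 to 0 and re-evaluating the classifier on the shifted features then reproduces the kind of analysis described above.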

8.4 Experiments and evaluation

From a practical point of view, we considered a subject-independent cross validation: the two nights’ recordings from the same subject were either both included in the training set or both in the test set. To provide an unbiased evaluation of our classifier, a ten-fold cross validation (CV) procedure was conducted. The data set was partitioned into ten subsets containing as nearly equal numbers of recordings as possible. During each iteration of the ten-fold CV, nine subsets were used to generate feature subsets and then train the classifier, and the remaining one was used for testing. The classification results were then obtained on each test set of the cross validation; thereafter, the classifier’s performance was evaluated by pooling (i.e., aggregating) or averaging all results.
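A subject-independent partition can be sketched as follows. The round-robin assignment of subjects to folds is an assumption made to keep the example deterministic; a random assignment of subjects would serve the same purpose. The function name `subject_folds` is hypothetical.

```python
def subject_folds(recording_subjects, n_folds=10):
    """Assign recordings to folds so that all recordings of a subject share a fold.

    recording_subjects: subject identifier of each recording, in recording order.
    Returns a list of n_folds lists of recording indices.
    """
    subjects = sorted(set(recording_subjects))
    # Round-robin over subjects keeps the fold sizes nearly equal.
    fold_of_subject = {s: i % n_folds for i, s in enumerate(subjects)}
    folds = [[] for _ in range(n_folds)]
    for rec_index, subject in enumerate(recording_subjects):
        folds[fold_of_subject[subject]].append(rec_index)
    return folds
```

In each iteration, one fold serves as the test set and the recordings in the other nine folds are used for feature selection and training, so no subject contributes to both sets.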


[Figure 8.1 plot not reproduced. Panels: PSG annotation (SWS / non-SWS), SDNN_RR (a.u.), SDF_RE (a.u.); x-axis: time (h), 0 to 7.]

Figure 8.1: An example of overnight PSG-based annotations of SWS and non-SWS and the values of two representative features, SDNN_RR (standard deviation of RR intervals) and SDF_RE (standard deviation of respiratory frequency), from a subject. The unsmoothed (dashed) and smoothed (solid) feature values are plotted. The window width for spline fitting was 25 epochs. Comparing the annotations with the two features suggests that classification errors might occur around the transitions between SWS and non-SWS (e.g., the transition around the 5th h).

To avoid selecting features on the whole data set and thus biasing the classifier, CFS was applied during each iteration of the ten-fold CV, yielding ten ‘optimal’ feature subsets, one for each training set. In order to assemble a single feature list, only the features appearing in all feature subsets were selected. This list was then used in all iterations of the ten-fold CV to test the classifier.

Although the feature selector can automatically choose features that optimally separate the classes SWS and non-SWS, evaluating the discriminative power of each single feature helps explore which physiological aspects distinguish the two classes. It not only allows for comparison among features but also indicates to what extent the smoothing and time delay help improve the features. For these purposes, the absolute standardized mean difference (ASMD) was used to measure the discriminative power of a single feature. Given a feature $f$, it is computed as the absolute mean difference of the feature values between SWS and non-SWS epochs divided by the standard deviation of the values over all epochs:

$$\mathrm{ASMD}_f = \frac{\left| \mu^f_{\text{SWS}} - \mu^f_{\text{non-SWS}} \right|}{\sigma^f}, \tag{8.8}$$

where $\mu^f_{\text{SWS}}$ and $\mu^f_{\text{non-SWS}}$ are the sample means of the feature over SWS and non-SWS epochs, respectively, and $\sigma^f$ is the sample standard deviation over all epochs. A higher discriminative power in separating the two classes translates to a larger ASMD value.
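Equation 8.8 translates directly into a few lines of code. In this sketch the function name `asmd` and the label coding (1 for SWS, 0 for non-SWS) are assumptions, as is the use of the n − 1 denominator for the sample standard deviation.

```python
import math

def asmd(values, labels):
    """Absolute standardized mean difference of one feature (Eq. 8.8).

    values: feature values per epoch; labels: 1 for SWS, 0 for non-SWS.
    """
    sws = [v for v, c in zip(values, labels) if c == 1]
    non = [v for v, c in zip(values, labels) if c == 0]
    mean_sws = sum(sws) / len(sws)
    mean_non = sum(non) / len(non)
    mean_all = sum(values) / len(values)
    # Sample standard deviation over all epochs of both classes.
    sd_all = math.sqrt(sum((v - mean_all) ** 2 for v in values)
                       / (len(values) - 1))
    return abs(mean_sws - mean_non) / sd_all
```

Computing this value for the unsmoothed, smoothed, and time-delayed versions of a feature gives the kind of comparison between schemes that is described in the text.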

Overall accuracy, precision, sensitivity, and specificity were first considered to evaluate the classifier. However, they might not be appropriate criteria for the “imbalanced class distribution” in our data, where the non-SWS epochs account for an average of 87.6% of the night. Cohen’s Kappa coefficient of agreement κ [72] offers an indication of the general classification performance in correctly identifying imbalanced classes by compensating for the probability of chance agreement. Here the classifier threshold was chosen to optimize the pooled Kappa based on training data. To have an overview of the classification performance across the entire solution space, a Precision-Recall (PR) curve was used. It plots precision versus recall (or sensitivity) as the classifier threshold used to separate the two classes is varied. To compare classifiers, the metric ‘area under the PR curve’ (AUCPR) was calculated. In general, a larger AUCPR corresponds to a better classification performance.
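The difference between accuracy and Kappa under class imbalance can be illustrated with a short sketch based on the standard definition of Cohen’s Kappa, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. The function names and the example counts are hypothetical.

```python
def accuracy(tp, fp, fn, tn):
    """Overall accuracy from the entries of a 2x2 confusion matrix."""
    return (tp + tn) / (tp + fp + fn + tn)

def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa for two classes (SWS = positive, non-SWS = negative)."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement from the marginal totals of both raters.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)
```

With 12 SWS epochs among 100, a classifier that always outputs non-SWS reaches an accuracy of 0.88 but a Kappa of 0, whereas one that finds 8 of the 12 SWS epochs with 4 false alarms reaches an accuracy of 0.92 and a Kappa of about 0.62.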

In order to evaluate the effectiveness of the feature smoothing and the time delay approaches

in improving SWS and non-SWS classification, we compared four classification schemes by

using features

• A: without smoothing and time delay,

• B: with smoothing but without time delay,

• C: without smoothing but with time delay, and

• D: with smoothing and time delay.

The spline window size and the delayed time were determined to optimize κ based on training

data. Moreover, the classification performance was also compared between using only ECG

and only RE signals and between the normal group and the low-SWS group.

8.5 Results

After the feature selection procedure described before, a total of six features were selected with

CFS when including all cardiorespiratory features. In the same way, we obtained a list of four

features when using ECG alone and four when using solely RE. The selected features using

different signal modalities are listed in Table 8.2.

The average discriminative powers of the selected features in the different schemes are compared in Figure 8.2. The figure indicates that smoothing with spline fitting can improve the discriminative power of the features. Experimentally, the κ value was found to be maximized at a spline window of 25 epochs. In addition, using the features with a negative time delay also increased their discriminative power, as seen by comparing the ASMD values between schemes A and C (or between schemes B and D). Here the optimal time delays τ of −2.5 and −5 min were experimentally found for schemes C and D, respectively.

Figure 8.3 plots the classification performance (κ and AUCPR) versus time delay (τ) in schemes C and D. The figure shows that the highest κ and AUCPR occurred with a negative time delay of five epochs (2.5 min) for the unsmoothed features and of ten epochs (5 min) for the smoothed features. This suggests that the optimal time delay depends on the window size of the spline fitting. As expected, it was longer in scheme D (with smoothing) than in scheme C (without smoothing).

The results of SWS and non-SWS classification obtained with respect to the four schemes

are summarized in Table 8.3. The best result, obtained with smoothing and time delay, cor-


Table 8.2: A list of selected features for SWS detection

Feature description                            Denotation   Signal modality
RR standard deviation∗,†                       SDNN_RR      ECG
RR spectrum power LF band∗,†                   LF_RR        ECG
RR DFA (parameter α)∗,†                        DFA_RR       ECG
RR sample entropy (length 2, scale 1)†         SE_RR        ECG
Respiratory frequency standard deviation∗,‡    SDF_RE       RE
Respiratory peak standardized median∗,‡        SDMP_RE      RE
Respiratory trough standardized median∗,‡      SDMT_RE      RE
Respiratory uniform scaling dissimilarity‡     UNIS_RE      RE

Selected features for SWS detection ∗using both ECG and RE signals, †using only ECG signal, or ‡using only RE signal.

[Figure 8.2 plot: one panel per selected feature (SDFRE, SDMPRE, SDMTRE, UNISRE, SERR, DFARR, LFRR, SDNNRR); x-axis: Scheme (A, B, C, D); y-axis: ASMD]

Figure 8.2: Average discriminative power (as measured by ASMD) of the selected features in different

schemes. The ASMD of scheme D was found to be significantly higher than the others for all the selected

features using a paired (two-sided) Wilcoxon signed-rank test (p < 0.001). The time delay τ was −2.5

min for scheme C and −5 min for scheme D.

responds to a pooled κ of 0.57, an overall accuracy of 88.8%, and an AUCPR of 0.68. With

an average κ of 0.56 ± 0.17, an average accuracy of 88.7 ± 4.2%, and an average AUCPR

of 0.69 ± 0.18, this scheme significantly outperforms all others, tested with a Wilcoxon test

(p < 0.0001). The table indicates that smoothing the features per recording resulted in a significant increase in both κ and AUCPR regardless of whether a time delay was considered. The

classification performances of the four schemes are also compared by PR curves in Figure 8.4.

Taking a recording as an example, Figure 8.5 visually compares the PSG-based annotations and

the identified classes, suggesting an enhancement in classification performance when applying

feature smoothing and time delay. The figure also illustrates that feature smoothing can help


[Figure 8.3 plot: panels (a) and (b); x-axis: Time delay (30-s epoch), from −30 to 0; y-axis: Value (-); curves: Kappa and AUCPR]

Figure 8.3: Classification performance using features (a) with smoothing and (b) without smoothing

versus time delay (τ), in epochs. The minus sign of τ indicates the use of preceding feature values.

Table 8.3: Summary of SWS and non-SWS classification results in different schemes using ten-fold CV

Result    Prec. (%)     Sens. (%)     Spec. (%)    Acc. (%)     Kappa κ       AUCPR

Scheme A: without smoothing and without time delay
Pool      53.8          53.9          91.8         86.1         0.45          0.54
Average   53.5 ± 17.7   54.9 ± 18.7   91.8 ± 3.5   86.0 ± 4.3   0.43 ± 0.17   0.55 ± 0.18

Scheme B: with smoothing and without time delay
Pool      56.8          57.2          92.3         87.0         0.49          0.60
Average   56.8 ± 18.3   58.1 ± 18.7   92.4 ± 3.8   87.0 ± 4.3   0.48 ± 0.17   0.61 ± 0.18

Scheme C: without smoothing and with time delay (τ = −2.5 min)
Pool      59.1          61.7          92.5         87.9         0.53          0.62
Average   59.0 ± 17.7   63.2 ± 20.5   92.5 ± 3.5   87.8 ± 4.4   0.52 ± 0.18   0.63 ± 0.18

Scheme D: with smoothing and with time delay (τ = −5 min)
Pool      61.8          65.6          92.9         88.8         0.57          0.68
Average   62.0 ± 17.8   67.2 ± 20.4   93.0 ± 3.7   88.7 ± 4.2   0.56 ± 0.17   0.69 ± 0.18

In total, six features were selected via CFS (see Table 8.2). The classifier threshold was chosen to maximize κ for the training data. Significance of difference was confirmed between scheme D and the others for accuracy, κ, and AUCPR using a paired (two-sided) Wilcoxon signed-rank test (p < 0.0001).

remove spurious detections (lasting very few epochs) of one class in the middle of a longer period of the other class. This confirms our expectation that feature smoothing is an adequate way to handle this type of error.

Table 8.4 presents the confusion matrix of our SWS and non-SWS classifier based on cardiorespiratory features with smoothing and a 5-min negative delay. To analyze the source of false positives, or false alarms (i.e., non-SWS epochs classified as SWS), the classification results for non-SWS are also broken down into wake, REM sleep, S1, and S2.
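The pooled metrics can be recomputed from the epoch counts reported in Table 8.4. The sketch below is only an illustration (the function name and dictionary keys are chosen for this example); it uses the standard definition of Cohen's κ for a two-class confusion matrix and yields a κ close to the pooled value of 0.57 reported for scheme D.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, sensitivity, accuracy, and Cohen's kappa from 2x2 counts."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"precision": precision, "sensitivity": sensitivity,
            "accuracy": accuracy, "kappa": kappa}


# Counts taken from Table 8.4 (SWS is the positive class).
m = binary_metrics(tp=23122, fp=14283, fn=12106, tn=185309)

# Share of S2 epochs classified as SWS (S2 column of Table 8.4).
s2_false_alarm_rate = 12841 / (12841 + 90693)
```

The last line corresponds to the proportion of S2 epochs that were misclassified as SWS.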


[Figure 8.4 plot: x-axis: Recall; y-axis: Precision; curves: Scheme A (without smoothing and time delay), Scheme B (with smoothing, without time delay), Scheme C (without smoothing, with time delay of −2.5 min), Scheme D (with smoothing and time delay of −5 min)]

Figure 8.4: Pooled PR curves of SWS and non-SWS classification in different schemes, where scheme D performed best.

[Figure 8.5 plot: five rows (PSG, Scheme A, Scheme B, Scheme C, Scheme D), each showing SWS versus non-SWS over Time (h), from 0 to 7]

Figure 8.5: An example of overnight PSG-based annotations and the corresponding SWS and non-SWS

classification results in different schemes.

When only one signal modality was used, the classification performance degraded, as shown in Table 8.5 (average κ of 0.54 for ECG and of 0.51 for RE). Since the optimal time delay (−5 min) was found to be the same for ECG and RE features, it was used for

comparison. Although the inclusion of ECG and RE signals yielded a better classification per-


Table 8.4: Confusion matrix of SWS and non-SWS classification with indication of false positives (normal group)

               PSG: SWS   PSG: non-SWS
Classified ↓              Total     S2      S1      REM     Wake
SWS            23122      14283     12841   444     424     574
non-SWS        12106      185309    90693   18172   36844   39060

Table 8.5: Comparison of SWS and non-SWS classification results using different signal modalities (normal group)

Signal modality   #Features∗   Accuracy (%)   Kappa κ   AUCPR
ECG               4            88.2           0.54      0.65
RE                4            87.7           0.51      0.61

The pooled results (in scheme D) are presented. ∗Features were selected with CFS (see Table 8.2).

Table 8.6: Performance comparison of SWS detection for different subject groups of recordings in terms of SWS time.

Subject group   N     SWS time (min)   Accuracy (%)   Kappa κ   AUCPR
Normal∗         257   ≥30              88.8           0.57      0.68
Low-SWS         68    (0, 30)          92.3           0.21      0.17
All             325   >0               88.9           0.51      0.58

Pooled results (in scheme D) are presented. ∗The group focused on in this work, in which the recordings had a more representative normal sleeping pattern, as in a home-based environment, where the overnight SWS time was less influenced by laboratory effects.

formance and they can be easily and unobtrusively acquired as mentioned before, our approach

is still applicable to achieve reasonable results when one of them is absent.

We also applied our SWS detection approach to all 325 recordings and to those in the low-SWS group (68 recordings from 51 subjects), with a total SWS time of less than 30 min; the classification results are presented in Table 8.6. The results for the low-SWS group (κ = 0.21) were much worse than those for the normal group considered in this study, as a result of which the classification performance for all recordings dropped to κ = 0.51. Figure 8.6 (upper graph) illustrates the relation between the amount of SWS and age, confirming what is known from the literature [216], i.e., that the amount of SWS decreases with age. Figure 8.6 (middle graph) illustrates the classification performance versus SWS time, which were significantly positively correlated. Figure 8.6 (lower graph) shows a significant negative correlation between κ and age, indicating that the classification performance was age-dependent.


[Figure 8.6 plot: three panels with fitted lines: SWS time (min) versus Age (y); Kappa versus SWS time (min); Kappa versus Age (y). R² annotations: 0.13, 0.28, 0.11]

Figure 8.6: Relation between overnight SWS time, subject age, and classification performance (κ) of all

325 recordings (including the normal and the low-SWS recordings). Lines represent the linear equations

fitted for data from samples. Significant Spearman’s rank correlation was found between SWS time and

age (r = −0.35), between κ and SWS time (r = 0.52), and between κ and age (r = −0.32) at p < 0.001.

8.6 Discussion

Hedner et al. [127] evaluated a sleep staging system and obtained a SWS and non-SWS classification with a κ of 0.48 (re-computed from their reported confusion matrix). They used pulse rate, peripheral arterial tone, and actigraphy, and their κ value is smaller than that obtained in the current study. For a fairer comparison, we note that our approach achieved a κ of 0.51 for all 325 recordings, which still outperforms their result. A respiratory-based sleep stager developed in our previous work [182] reported a κ of 0.43 in detecting SWS, where a subset of 48 normal sleep nights from the same database was included. This is lower than the result presented here (κ = 0.51), which was generated using only four respiratory features.

Although feature smoothing can, in general, benefit SWS detection, it may also introduce errors when detecting very short SWS periods. This is because some of the high-frequency feature components are likely not due to measurement noise or to outliers caused by motion artifacts, but rather reflect essential characteristics of short SWS periods. For longer SWS periods, smoothing the feature values would reduce noise and, in consequence, increase specificity. However, for shorter SWS periods (i.e., fragmented SWS), sensitivity would decrease. Smoothing should therefore be applied with an optimal trade-off between rejecting noise and keeping useful information. As shown in Table 8.3, the spline fitting increased all metrics. This was also the case for the low-SWS recordings, where we found that the improvement was mainly due to the increase in specificity. The reason might be that many false positives (misclassified non-SWS epochs) in a long period of non-SWS were corrected through feature smoothing.

In addition, the “optimal” time delay might vary over the night, possibly influenced by the sleep stage immediately before or after SWS periods, since the transition dynamics between SWS and the different non-SWS stages usually change over time [159]. Hence, fixing the time delay on features might not be the most appropriate strategy. An adaptive time delay model that depends on sleeping time should be investigated instead.

In Table 8.4, we notice that the false alarms mostly occurred in stage S2, of which 12.4% of the epochs were misclassified as SWS. Evidence has shown that autonomic activity differs little between S2 and SWS [292]. Consequently, regardless of whether cardiac or respiratory activity was used, the differences between these two sleep stages might be small. In fact, even with PSG, scoring SWS is somewhat difficult due to the gradual changes in physiology between SWS and light sleep. A relatively low inter-rater agreement, with a Cohen’s Kappa coefficient of only 0.71, was reported [81], which would lead to the presence of fragmented SWS in the hypnogram (PSG-based annotations). Nevertheless, how to better discriminate between S2 and SWS by means of cardiorespiratory information merits further investigation.

As shown in Table 8.6, the markedly decreased classification results for the low-SWS group indicate that our classifier could not adequately handle recordings with a very low number of SWS epochs. The recordings in the low-SWS group had heavily imbalanced classes, which is clearly more challenging from a classification point of view. Moreover, recordings with a decreased SWS time along with fragmented SWS appeared more often in the first nights than in the second nights, likely caused by the “first-night effect” in a laboratory study [196]. Figure 8.6 shows that the total overnight SWS time was significantly correlated with subject age (negatively) and with the classification performance (positively). The figure also illustrates that our SWS detector performed better for young subjects with more SWS time than for elderly subjects. The low-SWS nights should not be neglected, and it is therefore suggested that future work address SWS detection for subjects with less overnight SWS time (likely at older ages).

Finally, as mentioned before, automatic cardiorespiratory-based SWS detection will benefit applications that prompt slow waves to enhance memory consolidation during sleep in an unobtrusive manner. This usually requires an online system for detecting SWS in real time. Although our proposed approach can anticipate the occurrence of SWS 5 min ahead, and the (one-sided) window sizes used for computing the features were all shorter than this interval, the feature normalization (based on the entire-night recording) and smoothing (with a spline window of 25 epochs) would still limit the realization of an online SWS detector. However, these limitations seem manageable. First, since the feature normalization mainly served to diminish between-subject variability, this can alternatively be achieved by using the previous night (as a baseline) to normalize the following nights in real applications, where multiple nights are usually expected. Second, the smoothing window size can be reduced and the smoothing can be made ‘time-progressive’ to fit the online requirement. Nevertheless, their influence on the detection performance should be further studied when targeting an online SWS detection system.


8.7 Conclusion

In this chapter, overnight epoch-by-epoch classification of nocturnal SWS and non-SWS was achieved based on cardiorespiratory signals, which can be acquired unobtrusively. To reduce classification errors caused by, for example, sensor noise, body motion artifacts, and/or within-subject variability, a recording-specific feature smoothing using spline fitting was employed. In addition, we used features preceding each target epoch to identify SWS in that epoch, because changes in cardiorespiratory activity appeared to precede the transitions between SWS and non-SWS in the PSG-based annotations. With an LD classifier, we showed that the use of feature smoothing and time delay substantially improved the classification performance (κ of 0.57 versus 0.45). Our approach also produced reasonable classification results when only one of the signal modalities was present. Furthermore, the classifier performed much better for subjects who had more total SWS time than for subjects with less SWS time.

Part III: Cardiorespiratory-Based Sleep

Stage Classification

CHAPTER 9

Effects of between- and within-subject variability on

autonomic cardiorespiratory activity during sleep and

their limitations on sleep staging: a multilevel analysis

This chapter is adapted from: X. Long, R. Haakma, T.R.M. Leufkens, P. Fonseca, and R.M. Aarts. Ef-

fects of between- and within-subject variability on autonomic cardiorespiratory activity during sleep and

their limitations on sleep staging: a multilevel analysis. Submitted.

Abstract – Autonomic cardiorespiratory activity changes across sleep stages. However, it remains unknown to what extent it is affected by variability between and within subjects during sleep. Hypothesizing that the variability is caused by differences in subject demographics (age, gender, and body mass index), time, and physiology, we quantitatively investigated these effects and their limitations on achieving reliable cardiorespiratory-based sleep staging. Polysomnographic recordings from 165 normal sleepers were included. Six representative parameters (on a 30-s basis) obtained from overnight heartbeats and respiration, such as breathing rate and heart rate, were analyzed. Multilevel models were used to evaluate the effects evoked by differences in sleep stages, demographics, time, and physiology between and within subjects. We also compared the cardiorespiratory-based sleep staging performance with and without correcting for the associated effects. The between- and within-subject effects were found to be significant for each parameter. Compared with the within-subject variability, the between-subject variability had a larger influence on breathing rate and heart rate but a smaller influence on their variations. When adjusted for sleep stages, the physiological effects between and within subjects explained more than 80% of the total variance, whereas the other effects explained less. When these effects were corrected, profound improvements in sleep staging were observed. These results indicate that the differences in subject demographics, time, and physiology have significant effects on cardiorespiratory activity during sleep. The primary effects come from the physiological variability between and within subjects, which markedly limits the sleep staging performance using cardiorespiratory information. Efforts to diminish these effects will be the main challenge.


9.1 Introduction

Polysomnography (PSG) is the gold standard and common practice for the objective analysis of sleep architecture (hypnogram) and sleep-related disorders such as insomnia/parasomnia, sleep-disordered breathing, and rapid-eye-movement (REM) sleep behavior disorder [168]. With PSG, sleep stages are manually scored on continuous 30-s epochs based on electrophysiological signals including electroencephalogram (EEG), electromyogram (EMG), and electrooculogram (EOG) according to the Rechtschaffen and Kales (R&K) rules [247] or the more recent guidelines of the American Academy of Sleep Medicine (AASM) [136]. PSG recordings are usually acquired in a sleep laboratory, and their visual scoring requires considerable manual labor. PSG is costly and uncomfortable for subjects and is therefore not suited for long-term monitoring. These disadvantages have motivated sleep researchers and clinicians to devote more attention to alternatives such as cardiac and respiratory activity, allowing unobtrusive sleep staging with minimal discomfort to subjects [127, 161, 249, 253, 303].

Cardiorespiratory activity has been shown to be associated with the autonomic sympathetic and parasympathetic (or vagal) nervous systems in humans, which relate to sleep stages [95, 135, 183, 279, 292]. For example, sympathetic activation of the heart usually translates into an increased spectral power of heart rate variability (HRV) in the low-frequency (LF) band between 0.04 and 0.15 Hz, and vagal activity (primarily caused by respiratory sinus arrhythmia) is associated with the spectral power in the high-frequency (HF) band between 0.15 and 0.4 Hz [288]. During REM sleep, the HF spectral power increases while the LF spectral power decreases, when compared with non-REM (NREM) sleep and wakefulness [265]. Furthermore, the respiratory volume and frequency are more regular during NREM sleep than during REM sleep and wakefulness [95]. Irregular respiration patterns occurring during wakefulness are usually caused by body movements or by alterations of ventilation control induced by external factors; during REM sleep they can be related to muscle atonia or subcortical structures with a possible involvement of the bizarre content of dreams [230, 233].

In addition to sleep stages, cardiorespiratory activity can be influenced by between-subject variability with respect to 1) subject demographics (including body size) such as age, gender, and body mass index (BMI) [49, 227, 267], and 2) internal physiology such as the response of autonomic regulation, metabolic function, and subcortical arousals [132, 269, 305]. Other factors that differ between and within subjects, such as conscious breathing control and the external sleep environment (e.g., noise and temperature), can also cause variations in autonomic response during sleep [55, 56, 64, 206]. Furthermore, autonomic activity varies as a function of time, and the ratio of NREM to REM sleep in a sleep cycle changes over the course of the night [46, 292]. These would also be reflected in changes of cardiorespiratory activity throughout the night within subjects. Additionally, daytime activity and stressful events may change the sleep architecture and, consequently, affect autonomic control of cardiorespiratory activity during the night [11, 118, 122]. It is, however, not clear to what extent each of these effects can explain the variations in cardiorespiratory activity during sleep.

In regard to automatic sleep staging with autonomic cardiorespiratory activity, parameters


Table 9.1: Subject demographics and sleep statistics (n=165).

Parameter         Mean ± SD      Range
Gender            77 men and 88 women
Age (y)           51.8 ± 19.4    20–95
Wake (%)          22.7 ± 13.2    1.2–78.6
REM sleep (%)     13.6 ± 5.3     0–26.5
Light sleep (%)   52.3 ± 10.0    15.6–72.1
Deep sleep (%)    11.4 ± 6.6     0–28.5

are usually derived from cardiac and respiratory signals on a 30-s epoch basis [136, 247]. Due

to the existence of between- and within-subject (variability) effects, the correct identification of

sleep stages based on the cardiorespiratory parameters seems challenging, in particular when

a subject-independent model is used (i.e., when a model is derived from a set of subjects, and

used to identify sleep stages for other new subjects).

The aim of this fundamental study was to quantitatively investigate the effects of between-

and within-subject variability on cardiorespiratory activity during sleep and to evaluate the

limitations of these effects on achieving reliable cardiorespiratory-based sleep staging results.


9.2.1 Subjects and protocol

A total of 165 healthy subjects participating in the SIESTA project [160] were included in this study. The subjects were monitored over a period of three years from 1997 to 2000 in seven different sleep laboratories located in five European countries. The subject demographics [mean ± standard deviation (SD)] including age, gender, and BMI are given in Table 9.1. The protocol was approved by the local ethics committees of all sleep laboratories involved and all subjects provided written informed consent. The subjects fulfilled the following criteria: no significant medical disorders, no reported symptoms of neurological, mental, medical or cardiovascular disorders, no history of drug abuse or habituation (including alcohol), no psychoactive medication or other drugs (e.g., beta blockers), no shift work, and usually retiring to bed between 22:00 and 24:00, depending on their habitual bedtime [160].

9.2.2 PSG recordings

For each subject, a single-night full PSG recording was obtained. Each recording consists of at least 16 channels including EEG (C3-M2, C4-M1, O1-M2, O2-M1, Fp1-M2 and Fp2-M1), EMG (chin and leg), EOG (2 leads), electrocardiogram (ECG, single-channel, modified V1 lead), nasal airflow, respiratory effort (abdominal and chest wall with respiratory inductance plethysmography), snoring (microphone), and blood oxygen saturation [160]. Only the ECG signals, sampled at 100 Hz, 200 Hz, or 256 Hz depending on the equipment setup of each sleep laboratory, and the respiratory (chest) effort signals, all sampled at 10 Hz, were used in this study.

Each PSG recording was visually annotated in 30-s epochs as nighttime wake, REM sleep, or one of the NREM sleep stages S1-S4 by two independent raters according to the R&K rules. In case of disagreement, consensus annotations between the two raters were obtained. For the analysis in this study, we considered four stages: wake, REM sleep, light sleep (merging S1 and S2), and deep sleep or slow wave sleep (merging S3 and S4). Table 9.1 presents some sleep statistics of the recording nights.

9.2.3 Data preparation

The ECG and respiratory effort signals of all subjects were preprocessed before computing the parameters used for analysis. The baseline wander of the ECG signal was removed with a linear-phase high-pass filter using a 1.106-s Kaiser window with a 0.8-Hz cutoff frequency and a 30-dB side-lobe attenuation [297]. The resulting signal was normalized with regard to mean and amplitude, and a low-complexity, precise QRS complex localization algorithm [107] was used to locate the R peaks in the signal. The resulting heartbeat (RR) intervals were resampled at 4 Hz using a linear interpolator. To compute the cardiac parameters in the frequency domain, the power spectral density of the resampled RR intervals was estimated with an autoregressive model [42]. Ectopic RR intervals longer than 2 s, shorter than 0.3 s, or shorter than 0.6 times their previous value were discarded.
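The RR-interval rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in this study: each RR interval is time-stamped at the R peak that ends it, "previous value" is taken to mean the last accepted interval, and the function names are hypothetical.

```python
def clean_rr(r_peak_times):
    """Return (time, rr) pairs in seconds, dropping intervals treated as ectopic."""
    kept = []
    prev_rr = None  # assumption: last accepted interval
    for t0, t1 in zip(r_peak_times, r_peak_times[1:]):
        rr = t1 - t0
        too_long = rr > 2.0
        too_short = rr < 0.3
        sudden_drop = prev_rr is not None and rr < 0.6 * prev_rr
        if not (too_long or too_short or sudden_drop):
            kept.append((t1, rr))
            prev_rr = rr
    return kept


def resample_linear(samples, fs=4.0):
    """Linearly interpolate (time, value) pairs onto a uniform grid at fs Hz.

    Assumes strictly increasing time stamps.
    """
    if len(samples) < 2:
        return []
    t_start, t_end = samples[0][0], samples[-1][0]
    out, k, n = [], 0, 0
    while t_start + n / fs <= t_end:
        t = t_start + n / fs
        while samples[k + 1][0] < t:
            k += 1  # advance to the segment that contains t
        (ta, va), (tb, vb) = samples[k], samples[k + 1]
        out.append(va + (vb - va) * (t - ta) / (tb - ta))
        n += 1
    return out


# Hypothetical R-peak times (s) containing one short and one long interval.
r_peaks = [0.0, 1.0, 2.0, 2.25, 3.25, 6.0, 7.0]
rr_clean = clean_rr(r_peaks)
rr_4hz = resample_linear(rr_clean, fs=4.0)
```

Comparing against the last accepted interval avoids rejecting a normal beat merely because it follows an artifact; comparing against the immediately preceding raw interval would be an equally plausible reading of the rule.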

The respiratory effort signal was first low-pass filtered using a 10th-order Butterworth filter with a cut-off frequency of 0.6 Hz to eliminate high-frequency noise. Afterwards, the signal baseline was removed by subtracting the median peak-to-trough amplitude estimated over the entire signal. The respiratory peaks and troughs were detected by locating the signal turning points based on sign changes of signal slopes. Finally, we excluded incorrectly detected peaks and troughs 1) in peak-to-trough or trough-to-peak intervals where the sum of two successive intervals was less than the median of all intervals over the entire recording and 2) with amplitudes where the peak-to-trough difference was smaller than 0.15 times the median of the entire-night respiratory signal [185].
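The turning-point detection step can be illustrated with a short sketch. It only covers the detection of candidate peaks and troughs from sign changes of the slope; the filtering and the two exclusion rules are not included, and the handling of flat segments is an assumption made for this example.

```python
def turning_points(signal):
    """Return (peaks, troughs) as sample indices, from sign changes of the slope."""
    peaks, troughs = [], []
    prev_slope = 0.0
    for i in range(1, len(signal)):
        slope = signal[i] - signal[i - 1]
        if slope == 0:
            continue  # flat segment: keep the last non-zero slope (assumption)
        if prev_slope > 0 and slope < 0:
            peaks.append(i - 1)    # rising then falling
        elif prev_slope < 0 and slope > 0:
            troughs.append(i - 1)  # falling then rising
        prev_slope = slope
    return peaks, troughs


# Hypothetical respiratory effort samples.
effort = [0, 1, 2, 1, 0, -1, 0, 1, 0]
peaks, troughs = turning_points(effort)
```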

9.2.4 Cardiorespiratory parameters

We analyzed six cardiorespiratory (two respiratory and four cardiac) parameters. The respiratory parameters were BR, the mean breathing rate or respiratory frequency, and SDBR, the standard deviation of breathing rates. For cardiac activity, the time-domain parameters included HR, the mean heart rate, and SDNN, the standard deviation of heartbeat intervals. The spectral-domain parameters included LF, the spectral power of heartbeat intervals in the LF band, and HF, the spectral power in the HF band. Note that LF and HF were normalized by dividing them by the total spectral power minus the power in the very-low-frequency (VLF, 0.003-0.05 Hz) band [58, 288]. They are therefore expressed in normalized units (nu) instead of absolute units (ms²). These parameters have been widely used for the task of cardiorespiratory-based sleep staging [94, 182, 185, 248, 249]. A logarithmic transformation was applied to BR, SDBR, HR, and SDNN to correct for asymmetry in their frequency distributions. Measurement units are therefore expressed in natural logarithmic Hz (ln-Hz) for BR and SDBR, natural logarithmic beats per minute (ln-bpm) for HR, and natural logarithmic milliseconds (ln-ms) for SDNN.
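As a small worked example of the normalization and the logarithmic transformation described above, consider the sketch below. The power values and the heart rate are hypothetical numbers chosen for illustration, and the function names are not taken from this work.

```python
import math


def normalized_power(band_power, total_power, vlf_power):
    """Band power in normalized units: band / (total - VLF)."""
    return band_power / (total_power - vlf_power)


def log_transform(value):
    """Natural logarithm, as applied to BR, SDBR, HR, and SDNN."""
    return math.log(value)


# Hypothetical spectral powers of one epoch, in ms^2.
total, vlf, lf, hf = 1000.0, 200.0, 300.0, 500.0
lf_nu = normalized_power(lf, total, vlf)  # 300 / 800
hf_nu = normalized_power(hf, total, vlf)  # 500 / 800

# Hypothetical mean heart rate of 60 bpm, expressed in ln-bpm.
ln_hr = log_transform(60.0)
```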

9.2.5 Descriptive statistics

Values of the cardiorespiratory parameters (mean ± SD) measured from subjects with different demographics (gender, age, and BMI) and at different times of night are presented. We considered different cohort sets including three age groups: young (20-39 y), middle (40-69 y), and elderly (>69 y), and three BMI groups: underweight (<18.5 kg/m²), normal weight (18.5-25 kg/m²), and overweight (>25 kg/m²). In addition, total sleep time was divided into four periods: 0-90 min, 90-180 min, 180-270 min, and >270 min. Significance of difference between groups was tested with the analysis of variance (ANOVA) F-test.

9.2.6 Multilevel analysis

Traditional statistical methods such as repeated measures ANOVA (rANOVA) and repeated measures multivariate ANOVA (rMANOVA) are often used to analyze longitudinal data. However, they might not be appropriate here since they expect uncorrelated and independent observations [23]. A more general multilevel regression analysis [134] takes into account structural variables with fixed and random effects measured at multiple hierarchical levels. Compared with the traditional methods, multilevel analysis has several advantages [134, 301]. First, it can deal with incomplete data, whereas ANOVA-based methods handle this by simply deleting all subjects with missing measures. Second, it handles data with a hierarchical structure and thus allows for the analysis of explanatory variables with effects on different levels simultaneously. Third, it is able to quantify the variability within levels. For these reasons, we applied multilevel models to statistically evaluate the effects of between- and within-subject variability on the cardiorespiratory parameters. Multilevel models are also known under a variety of names used by different authors, e.g., mixed models, random-effects models, and hierarchical linear models [134].

9.2.6.1 Between- and within-subject effects

The between-subject variability effects on cardiorespiratory activity can be linked to physiology and subject demographics (age, gender, and BMI). On the other hand, cardiorespiratory activity may change depending on the time of night within subjects [292]. This time effect can also vary between subjects. Most multilevel models assume homogeneity or equality of variance for each predictor variable, whereas this might not hold for the time effect. Therefore, it is hypothesized that the time effect also changes along with subject demographics. This can be evaluated by ‘cross-interactions’ between time and demographic variables. Here we did not take into account the influences of differences in sleep environment, daytime energy expenditure, and other factors or behaviors such as stress, smoking, and personality. These influences, if present, were assumed to be conveyed by the physiological variability. Additionally, in our previous work [184], no effects of the different laboratories on cardiac activity were found based on the same data. For this reason, we disregarded the laboratory factor during our modeling procedure.

To evaluate the between- and within-subject effects, we constructed a multilevel model with two levels (level two: subject; level one: time or epoch) for a given cardiorespiratory parameter y. The model predicts/estimates the values of the parameter based on a set of variables including sleep stages, age, gender, BMI, and time of night. For the parameter value y_ij in the ith epoch of the night (i = 1, 2, ..., N, with a total of N epochs) from subject j (j = 1, 2, ..., M, where M is the total number of subjects), the two-level regression model with associated coefficients is given by

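\begin{equation}
\begin{aligned}
\text{Model \#1:}\quad y_{ij} ={}& \beta_0 + \sum_{s}(\beta_s + \mu_{sj})\,s_{ij} + (\beta_t + \mu_{tj})\,\mathrm{time}_{ij} + e_{0ij} \\
&+ \beta_a\,\mathrm{age}_j + \beta_g\,\mathrm{gender}_j + \beta_b\,\mathrm{BMI}_j \\
&+ \beta_{ta}(\mathrm{time}\times\mathrm{age})_{ij} + \beta_{tg}(\mathrm{time}\times\mathrm{gender})_{ij} + \beta_{tb}(\mathrm{time}\times\mathrm{BMI})_{ij} \\
\text{with}\quad & \begin{pmatrix}\mu_{0j}\\ \mu_{sj}\\ \mu_{tj}\end{pmatrix} \sim N\!\left(\begin{pmatrix}0\\ 0\\ 0\end{pmatrix}, \begin{pmatrix}\Omega_0 & & \\ & \Omega_s & \\ & & \Omega_t\end{pmatrix}\right) \quad\text{and}\quad e_{0ij} \sim N(0, \Omega_e),
\end{aligned}
\tag{9.1}
\end{equation}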

in which $\beta_0$ is the fixed intercept, $\mu_{0j}$ is the random effect with variance $\Omega_0$ indicating the between-subject variability in physiology (independent of sleep stages or corrected by sleep stages), and $e_{0ij}$ is the (random) residual term with variance $\Omega_e$ quantifying the within-subject physiological variability (independent of time). $s_{ij}$ is a dummy variable (0 or 1) specifying the sleep stage ($s$ = wake, REM, light, deep) of epoch $i$ from subject $j$ with its fixed effect $\beta_s$ and random effect $\mu_{sj}$, where $\Omega_s$ reflects the between-subject physiological variability in sleep stage $s$. The demographic variables age (y), gender (dummy variable: 0 = man, 1 = woman), and BMI (kg/m²), which vary between subjects, correspond to the fixed effects $\beta_a$, $\beta_g$, and $\beta_b$, respectively. The variable $\mathrm{time}_{ij}$ (min) expresses the relative time of epoch $i$ ($\mathrm{time}_{ij} = i/2$) from subject $j$, $\beta_t$ is the fixed time effect corresponding to linear changes over time within subjects, $\mu_{tj}$ is the random time effect with variance $\Omega_t$ indicating the variability of the time effect between subjects, and $\beta_{ta}$, $\beta_{tg}$, and $\beta_{tb}$ are cross-interactions specifying the fixed age-, gender-, and BMI-related time effects, respectively. Note that the random effects (including residuals) were assumed to be drawn from normal distributions with zero mean. Here the normality was visually checked using a heuristic Quantile-Quantile (Q-Q) plot method since the commonly used numerical normality tests are not appropriate for large samples [271].
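The visual normality check can be sketched in a few lines. The following is a minimal illustration and not the procedure used in this study: the function name `qq_points`, the plotting positions (i - 0.5)/n, and the residual values are all assumptions chosen for simplicity. It pairs each sorted, standardized residual with the standard normal quantile of the same rank; points lying close to the identity line suggest approximate normality.

```python
from statistics import NormalDist, mean, pstdev

def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot.

    The plotting positions (i - 0.5) / n are an assumption; other
    conventions exist.
    """
    n = len(residuals)
    mu, sigma = mean(residuals), pstdev(residuals)
    # Standardize and sort the sample so it is comparable to N(0, 1).
    sample = sorted((r - mu) / sigma for r in residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))

# Hypothetical residuals, for illustration only.
points = qq_points([-0.21, 0.05, 0.12, -0.08, 0.30, -0.15, 0.02, -0.05])
```

Plotting the pairs in `points` against the line y = x gives the kind of plot referred to above.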


9.2.6.2 Centering effect

Intuitively, the mean value of a specific cardiorespiratory parameter over the entire night may differ from subject to subject, which might be due to the physiological variability between subjects at the general mean level. Cronbach [76] proposed a model with an additional predictor representing the between-group centering effect, which allows parameter values to be expressed as deviations from the group means. In this study, the model with centering (physiological) effect for a given parameter can be expressed as

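\[
\begin{aligned}
\text{Model \#2}: \quad y_{ij} ={} & \beta_0 + \sum_{s} (\beta_s + \mu_{sj})\, s_{ij} + (\beta_t + \mu_{tj})\, \mathrm{time}_{ij} + \beta_c\, \bar{y}_j + e_{0ij} \\
& + \beta_a\, \mathrm{age}_j + \beta_g\, \mathrm{gender}_j + \beta_b\, \mathrm{BMI}_j \\
& + \beta_{ta}\, (\mathrm{time} \times \mathrm{age})_{ij} + \beta_{tg}\, (\mathrm{time} \times \mathrm{gender})_{ij} + \beta_{tb}\, (\mathrm{time} \times \mathrm{BMI})_{ij}
\end{aligned}
\]

with

\[
\begin{bmatrix} \mu_{0j} \\ \mu_{sj} \\ \mu_{tj} \end{bmatrix}
\sim N\!\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \Omega_0 \\ \Omega_s \\ \Omega_t \end{bmatrix} \right)
\quad \text{and} \quad e_{0ij} \sim N(0, \Omega_e), \tag{9.2}
\]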

where $\bar{y}_j$ is the variable that gives the within-subject mean value of the parameter over the entire night for subject $j$ and its associated fixed slope $\beta_c$ corresponds to the between-subject centering effect. This effect is meant to reflect the physiological difference between subjects at the (individual) overnight mean level. Here the estimation of the overnight mean value was assumed to be independent of sleep stage composition (percentages of sleep stages) over the entire night. To a certain degree, the demographic effects were expected to be conveyed by the centering effect. Therefore, the model without the centering term (Model #1) should be used for exploring the actual demographic effects with a single model.
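As a minimal sketch of how the centering variable could be derived, the snippet below computes the overnight mean per subject and attaches it to every epoch of that subject. The data layout (a list of `(subject_id, value)` pairs), the function name `overnight_means`, and the numbers are illustrative assumptions and are not taken from the original analysis.

```python
from collections import defaultdict

def overnight_means(epochs):
    """Map each subject id to the mean parameter value over all of its epochs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subject_id, value in epochs:
        sums[subject_id] += value
        counts[subject_id] += 1
    return {sid: sums[sid] / counts[sid] for sid in sums}

# Hypothetical ln-bpm heart-rate values for two subjects.
epochs = [("s1", 4.10), ("s1", 4.20), ("s1", 4.15), ("s2", 4.30), ("s2", 4.26)]
means = overnight_means(epochs)
# Each epoch row gets the subject-level predictor used by the centering term.
centered_rows = [(sid, y, means[sid]) for sid, y in epochs]
```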

9.2.6.3 Model estimation and optimization

The multilevel modeling was implemented using the MLwiN software (Centre for Multilevel Modelling, University of Bristol, UK), where an iterative generalized least squares (IGLS) algorithm is used for the model estimation, i.e., for estimating the regression coefficients and their variances [244]. The model goodness-of-fit can be evaluated by the deviance (measured by -2·log-likelihood) obtained during the modeling procedure.

A Wald Z-test was used to statistically examine the significance of the effects, testing the null hypothesis that a coefficient equals zero [134]. For each estimated model coefficient or variance $\gamma$ corresponding to a specific effect, the Wald Z statistic is computed as the square of the ratio of the estimate to its standard error (SE):

\[
Z = \frac{\gamma^2}{\mathrm{SE}^2(\gamma)}. \tag{9.3}
\]

The acceptance or rejection of the null hypothesis can be tested with a Chi-squared ($\chi^2$) test with one degree of freedom (df).
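A minimal sketch of this test is given below. It is not the MLwiN implementation; the function name `wald_test` is an assumption. It relies on the fact that, for one degree of freedom, the upper tail probability of the Chi-squared distribution can be written with the complementary error function, so only the standard library is needed. The example values are illustrative.

```python
import math

def wald_test(estimate, standard_error):
    """Return the Wald statistic Z = (estimate / SE)^2 and its chi-squared (1 df) p-value."""
    z = (estimate / standard_error) ** 2
    # For one degree of freedom, P(chi2 > z) = erfc(sqrt(z / 2)).
    p_value = math.erfc(math.sqrt(z / 2.0))
    return z, p_value

# Illustrative values: a coefficient of 0.042 with a standard error of 0.021.
z, p = wald_test(0.042, 0.021)
```

With these inputs the statistic is about 4 and the p-value falls just below 0.05.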


Table 9.2: Description of the seven explanatory effects (with exclusion of sleep stage effects) on cardiorespiratory activity considered in this study.

| Group | Effect | Description |
|---|---|---|
| Overall between-subject effect | Demographic effect | Fixed, variability in age, gender, BMI between subjects |
| Overall between-subject effect | Centering (physiological) effect | Fixed, variability in overnight mean level between subjects |
| Overall between-subject effect | Between-subject time effect | Random, variability in time of night between subjects |
| Overall between-subject effect | Between-subject physiological effect | Random, variability in physiology between subjects |
| Overall within-subject effect | Within-subject time effect | Fixed, variability in time of night within subjects |
| Overall within-subject effect | Within-subject physiological effect | Random, variability in physiology within subjects |
| Cross-interaction effect | Demographic-related time effect | Fixed, demographic-related variability in time of night |

The models described in Equations 9.1 and 9.2 are ‘full’ models and need to be optimized by excluding the effects whose coefficients are not statistically different from zero (tested with the Wald statistic). Differences between models are assessed by comparing model deviances using a χ² statistic (i.e., likelihood ratio test) with df = 2. This chapter only presents the results of the optimized models, which contain the significant effects.
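The deviance comparison can be illustrated with a short sketch. The function name `likelihood_ratio_test` and the deviance values are hypothetical; the code only covers df = 1 and df = 2, for which the Chi-squared tail probability has a simple closed form.

```python
import math

def likelihood_ratio_test(deviance_reduced, deviance_full, df=2):
    """Compare two nested models by their deviances (-2 * log-likelihood).

    Only df = 1 and df = 2 are handled, where the chi-squared survival
    function has a simple closed form.
    """
    statistic = deviance_reduced - deviance_full
    if statistic < 0:
        raise ValueError("the fuller model should not have a larger deviance")
    if df == 1:
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    elif df == 2:
        p_value = math.exp(-statistic / 2.0)
    else:
        raise ValueError("this sketch supports df = 1 or df = 2 only")
    return statistic, p_value

# Hypothetical deviances, for illustration only.
stat, p = likelihood_ratio_test(1250.0, 1240.0, df=2)
```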

9.2.7 Explanations of variance

It is of particular interest to determine how much of the model variance is explained by different variables or effects. As described in Table 9.2, a total of seven explanatory effects for each cardiorespiratory parameter were considered in this study. Raudenbush and Bryk [246] proposed an approach that uses the squared multiple correlation R² over a sequence of models. Suppose that the full model under consideration for a given parameter is Model #2, given by Equation 9.2. A sequence of seven models (Model A-G) can be established in a certain order that serves to compute the proportion of variance explained (PVE) of each effect. The details are described in the Appendix.
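The exact procedure is given in the Appendix; the sketch below only illustrates the general idea under a stated assumption, namely that the PVE of an effect is the reduction in variance between two consecutive models divided by the total variance (residual plus random intercept) of the baseline Model A. The function name is hypothetical. The inputs are the rounded BR variances listed in Table 9.6, so the outputs can only approximate the values reported in Table 9.7.

```python
def proportion_of_variance_explained(variance_before, variance_after, total_variance):
    """Percentage of the baseline total variance removed by adding one effect."""
    return 100.0 * (variance_before - variance_after) / total_variance

# Rounded BR variances as listed in Table 9.6.
total = 0.0229 + 0.0328  # Model A: residual + random-intercept variance

# Random-intercept variance before (Model C) and after (Model D) adding
# the centering effect.
pve_centering = proportion_of_variance_explained(0.0308, 0.0001, total)

# Residual variance left in the last model (Model G), i.e. the share that
# no added effect explains (within-subject physiological variability).
pve_residual = proportion_of_variance_explained(0.0186, 0.0, total)
```

Under this assumption the two shares come out at roughly 55% and 33%, close to the corresponding BR entries in Table 9.7; small differences are expected from rounding.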

9.2.8 Between- and within-subject effects in sleep staging

9.2.8.1 Sleep staging algorithm

Linear discriminant (LD) has been shown to be an appropriate algorithm for classifying overnight sleep stages based on cardiorespiratory activity in many studies [248, 249]. In this work we adopted an LD classifier to perform automatic sleep staging. Overall accuracy and Cohen’s Kappa coefficient of agreement [72] were used to evaluate the classifier’s performance. Additionally, sleep statistics including the percentages of wake, REM sleep, light sleep, and deep sleep were calculated. In order to verify the classification performance, the subjects were randomly divided into a training set of 82 subjects used to train the classifier and a testing set of 83 subjects for testing.
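The two performance measures and the sleep statistics can be computed per recording as sketched below. This is a generic illustration with hypothetical function names and epoch labels, not the evaluation code used in this study.

```python
from collections import Counter

def accuracy_and_kappa(reference, predicted):
    """Overall accuracy and Cohen's Kappa for two equal-length label sequences."""
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(ref_counts[label] * pred_counts[label] for label in ref_counts) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

def stage_percentages(labels):
    """Percentage of epochs assigned to each sleep stage."""
    counts = Counter(labels)
    return {stage: 100.0 * c / len(labels) for stage, c in counts.items()}

# Hypothetical epoch labels, for illustration only.
ref = ["wake", "light", "light", "deep", "REM", "light", "wake", "deep"]
hyp = ["wake", "light", "light", "light", "light", "light", "wake", "deep"]
acc, kappa = accuracy_and_kappa(ref, hyp)
composition = stage_percentages(hyp)
```

In this toy example six of eight epochs agree, yet Kappa is noticeably lower than the accuracy because part of that agreement is expected by chance from the dominant light-sleep class.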

9.2.8.2 Comparison of correction schemes

The objective was to examine how much the between- and within-subject effects on the cardiorespiratory activity would restrict the performance in classifying sleep stages (wake, REM sleep, light sleep, and deep sleep) and then estimating the sleep statistics. For comparison, we analyzed three different ‘correction’ schemes (CS) based on the optimized Model #2 with estimated model coefficients to correct (or predict) the values for each parameter. The corrected values were then used to perform sleep staging. The sleep staging using originally measured values without any corrections served as the baseline (BS).

• The first correction scheme CS1 predicts the parameter values with subtraction of all the fixed effects independent of sleep stages, such that

\[
\text{CS1}: \quad y_{ij} = \mu_{0j} + \sum_{s} (\beta_s + \mu_{sj})\, s_{ij} + \mu_{tj}\, \mathrm{time}_{ij} + e_{0ij}. \tag{9.4}
\]

• The second scheme CS2 corrects the parameter values by subtracting all the (sleep-stage-independent) fixed effects and all the between-subject random effects, such that

\[
\text{CS2}: \quad y_{ij} = \sum_{s} \beta_s\, s_{ij} + e_{0ij}. \tag{9.5}
\]

• The third scheme CS3 excludes all the (sleep-stage-independent) fixed effects and the within-subject effect to correct the parameter values, such that

\[
\text{CS3}: \quad y_{ij} = \mu_{0j} + \sum_{s} (\beta_s + \mu_{sj})\, s_{ij} + \mu_{tj}\, \mathrm{time}_{ij}. \tag{9.6}
\]

Note that, again, the exclusive aim of analyzing these correction schemes in the present study was to evaluate in what respect and to what extent the cardiorespiratory parameters can be improved for sleep staging, rather than to perform sleep staging in practice. In other words, we intended to answer the question of what sleep staging performance can be achieved if the effects caused by the between- or within-subject variability can be eliminated. Methods for estimating the fixed coefficients and random variances without knowing the sleep stages were not addressed here.
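To make the first scheme concrete, the sketch below subtracts the sleep-stage-independent fixed part of Model #2 from a measured value, which is one way to read CS1. The function name, the coefficient keys, and all numbers are hypothetical; terms removed during model optimization are treated as zero.

```python
def cs1_correct(y, time_min, age, gender, bmi, overnight_mean, coef):
    """Subtract the sleep-stage-independent fixed effects of Model #2 from y.

    `coef` maps hypothetical coefficient names to estimates; absent terms
    (removed during model optimization) default to zero.
    """
    def c(name):
        return coef.get(name, 0.0)

    fixed = (
        c("b0")
        + c("bc") * overnight_mean
        + c("ba") * age + c("bg") * gender + c("bb") * bmi
        + c("bt") * time_min
        + time_min * (c("bta") * age + c("btg") * gender + c("btb") * bmi)
    )
    return y - fixed

# Hypothetical coefficients and one epoch of a hypothetical subject.
coef = {"b0": 0.1, "bc": 0.99, "bt": -1e-4}
corrected = cs1_correct(y=4.15, time_min=120.0, age=30, gender=1, bmi=22.0,
                        overnight_mean=4.12, coef=coef)
```

What remains after the subtraction is the sleep-stage-dependent part plus the random effects and the residual, as on the right-hand side of the CS1 expression.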

9.3 Results

9.3.1 Descriptive results

Figure 9.1 compares the skewness of the parameters with and without transformation by the natural logarithm. It indicates that the four parameters BR, SDBR, HR, and SDNN need to be log-transformed since their distributions were skewed and their skewness values decreased markedly after the log-transformation. Table 9.3 shows the values (mean ± SD) of the six cardiorespiratory parameters BR, SDBR, HR, SDNN, LF, and HF analyzed in this study for different cohort sets by gender, age group, BMI group, time period, and sleep stage. The values differed significantly across groups for all the cohort sets (ANOVA F-test, p < 0.001).


[Figure 9.1 plot: skewness (vertical axis, -2 to 3) for each cardiorespiratory parameter (BR, SDBR, HR, SDNN, LF, HF); series: Original, Natural logarithm.]

Figure 9.1: Skewness comparison of cardiorespiratory parameters with and without natural logarithm transformation, indicating that BR, SDBR, HR, and SDNN should be log-transformed.
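The comparison behind this figure can be reproduced in outline as follows. The skewness definition used here (third standardized moment, population form) is an assumption, since the estimator is not specified in the text, and the sample values are hypothetical.

```python
import math

def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed positive values (e.g. SDNN in ms), for illustration only.
raw = [20.0, 25.0, 30.0, 35.0, 40.0, 60.0, 120.0]
logged = [math.log(v) for v in raw]
skew_raw, skew_log = skewness(raw), skewness(logged)
```

For a right-skewed sample such as this one, the skewness of the log-transformed values is smaller than that of the raw values.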

9.3.2 Multilevel modeling

In comparison with the F-test, the multilevel regression models enable a more adequate and thorough statistical analysis. With the multilevel Model #1, the estimated coefficients and variances for all the parameters are shown in Table 9.4. The model was optimized by removing the insignificant variables (tested using the Wald Z-test with p > 0.05), except for the constant intercept and sleep stage variables. The table indicates that the demographics significantly influenced different aspects of the cardiorespiratory activity. On closer inspection, the breathing rate BR of the healthy subjects with a higher BMI was significantly higher than that of the subjects with a lower BMI (0.011 ln-Hz per kg/m², p < 0.01) at the baseline of -1.458 ln-Hz, whereas its variation SDBR remained the same. For cardiac activity, the mean heart rate HR of women was higher than that of men (0.042 ln-bpm, p < 0.05) at the baseline of 4.221 ln-bpm, while its variation SDNN was lower than that of men (-0.247 ln-ms, p < 0.0001) at the baseline of 4.823 ln-ms. SDNN was also negatively correlated with subject age (-0.009 ln-ms per y, p < 0.0001) and BMI (-0.025 ln-ms per kg/m², p < 0.01). With the spectral analysis of HRV, men had an LF power that was higher by 0.045 nu (p < 0.05) but an HF power that was lower by 0.052 nu (p < 0.01) compared with women during bedtime sleep. The HF power slightly decreased with increasing age for men (-0.002 nu per y, p < 0.05). These results are consistent with previous work [49, 101, 257].

Most of the analyzed parameters were found to be time-variant (i.e., they were modulated by time of night), with the exception of breathing rate (Table 9.4). For instance, the heart rate HR dropped gradually as the night progressed (-0.0001 ln-bpm per min, p < 0.0001) at the baseline of 4.221 ln-bpm, while the variation in heartbeat intervals SDNN increased (0.001 ln-ms per min, p < 0.0001) at the baseline of 4.823 ln-ms, confirming the findings reported previously [57]. This time modulation varied from subject to subject, as shown by the significant variance Ωt (p < 0.0001) of the random time effect. The time effect was also modulated by some demographic variables (such as age for SDNN and BMI for SDBR, LF, and HF). We note in the table that there appeared to be significant between-subject physiological effects for all parameters (p < 0.0001), measured by the random variances of sleep stage variables. These variances seemed approximately homogeneous across sleep stages for BR and HR but were clearly different for their variations SDBR and SDNN. Figure 9.2 illustrates an example that compares the parameter values (estimated by multilevel regression based on Model #1) changing over time between two subjects with different demographics. It shows that the fixed time and demographic effects were generally larger than the differences between sleep stages.

Table 9.3: Values (mean ± SD) of the six cardiorespiratory parameters in different cohort sets (n=165)

| Cohort set | Group | BR (ln-Hz) | SDBR (ln-Hz) | HR (ln-bpm) | SDNN (ln-ms) | LF (nu) | HF (nu) |
|---|---|---|---|---|---|---|---|
| Gender | Man | -1.20 ± 0.24 | -3.67 ± 0.75 | 4.13 ± 0.15 | 3.74 ± 0.77 | 0.42 ± 0.23 | 0.47 ± 0.23 |
| Gender | Woman | -1.22 ± 0.23 | -3.81 ± 0.76 | 4.16 ± 0.16 | 3.49 ± 0.71 | 0.39 ± 0.22 | 0.50 ± 0.23 |
| Age | Young | -1.24 ± 0.24 | -3.85 ± 0.74 | 4.11 ± 0.16 | 3.94 ± 0.63 | 0.36 ± 0.20 | 0.56 ± 0.22 |
| Age | Middle | -1.20 ± 0.24 | -3.71 ± 0.78 | 4.15 ± 0.16 | 3.52 ± 0.69 | 0.45 ± 0.23 | 0.45 ± 0.23 |
| Age | Elderly | -1.18 ± 0.20 | -3.70 ± 0.71 | 4.17 ± 0.13 | 3.39 ± 0.81 | 0.38 ± 0.24 | 0.45 ± 0.22 |
| BMI | Underweight | -1.24 ± 0.14 | -4.00 ± 0.66 | 4.11 ± 0.12 | 4.01 ± 0.53 | 0.36 ± 0.18 | 0.56 ± 0.19 |
| BMI | Normal | -1.23 ± 0.23 | -3.77 ± 0.74 | 4.14 ± 0.16 | 3.72 ± 0.73 | 0.41 ± 0.22 | 0.48 ± 0.23 |
| BMI | Overweight | -1.18 ± 0.24 | -3.70 ± 0.77 | 4.15 ± 0.15 | 3.46 ± 0.75 | 0.39 ± 0.23 | 0.48 ± 0.23 |
| Time of night | 0-90 min | -1.22 ± 0.22 | -3.81 ± 0.80 | 4.16 ± 0.15 | 3.52 ± 0.73 | 0.39 ± 0.22 | 0.50 ± 0.23 |
| Time of night | 90-180 min | -1.21 ± 0.22 | -3.85 ± 0.75 | 4.17 ± 0.15 | 3.58 ± 0.74 | 0.42 ± 0.23 | 0.46 ± 0.23 |
| Time of night | 180-270 min | -1.20 ± 0.23 | -3.77 ± 0.77 | 4.15 ± 0.16 | 3.61 ± 0.77 | 0.41 ± 0.23 | 0.48 ± 0.23 |
| Time of night | >270 min | -1.21 ± 0.24 | -3.66 ± 0.72 | 4.12 ± 0.15 | 3.67 ± 0.75 | 0.40 ± 0.22 | 0.49 ± 0.23 |
| Sleep stage | Wake | -1.16 ± 0.23 | -3.25 ± 0.62 | 4.19 ± 0.15 | 3.61 ± 0.78 | 0.42 ± 0.24 | 0.44 ± 0.23 |
| Sleep stage | REM sleep | -1.18 ± 0.22 | -3.44 ± 0.52 | 4.15 ± 0.16 | 3.64 ± 0.76 | 0.45 ± 0.23 | 0.42 ± 0.22 |
| Sleep stage | Light sleep | -1.23 ± 0.23 | -3.89 ± 0.73 | 4.13 ± 0.15 | 3.64 ± 0.73 | 0.40 ± 0.22 | 0.49 ± 0.23 |
| Sleep stage | Deep sleep | -1.24 ± 0.23 | -4.29 ± 0.71 | 4.14 ± 0.15 | 3.45 ± 0.72 | 0.33 ± 0.21 | 0.57 ± 0.21 |

ln, natural logarithm; nu, normalized unit; young, 20-39 y; middle, 40-69 y; elderly, >69 y; underweight, <18.5 kg/m²; normal weight, 18.5-25 kg/m²; overweight, >25 kg/m²; light sleep, S1 and S2 stages; deep sleep, S3 and S4 stages. For all the parameters, values between the groups of each cohort set were significantly different (F-test, p < 0.001), but this may be imprecise since subject demographics, time of night, and sleep stages were possibly not independent.

Table 9.4: Coefficients and their standard errors (SE) of the optimized multilevel model without the between-subject centering effect (Model #1) for the six cardiorespiratory parameters.

| Part | Coef. | BR | SDBR | HR | SDNN | LF | HF |
|---|---|---|---|---|---|---|---|
| Fixed, coefficient (SE) | β0 | -1.458 (0.087) | -3.320 (0.032) | 4.221 (0.016) | 4.823 (0.255) | 0.464 (0.014) | 0.535 (0.027) |
| Fixed, coefficient (SE) | βwake | Baseline | Baseline | Baseline | Baseline | Baseline | Baseline |
| Fixed, coefficient (SE) | βREM | 0.002 (0.008) | -0.205 (0.026) | -0.028 (0.004) | -0.104 (0.027) | 0.030 (0.007) | -0.037 (0.007) |
| Fixed, coefficient (SE) | βlight | -0.035 (0.008) | -0.611 (0.026) | -0.061 (0.004) | -0.052 (0.021) | -0.027 (0.006) | 0.039 (0.006) |
| Fixed, coefficient (SE) | βdeep | -0.044 (0.010) | -0.997 (0.033) | -0.055 (0.004) | -0.249 (0.026) | -0.096 (0.008) | -0.106 (0.008) |
| Fixed, coefficient (SE) | βa | | | | -0.009 (0.002) | | -0.002 (0.001) |
| Fixed, coefficient (SE) | βg | | | 0.042 (0.021) | -0.247 (0.069) | -0.045 (0.018) | 0.052 (0.017) |
| Fixed, coefficient (SE) | βb | 0.011 (0.004) | | | -0.025 (0.011) | | |
| Fixed, coefficient (SE) | βt | | 0.001 (4e-4) | -1e-4 (2e-5) | 0.001 (2e-4) | 4e-4 (1e-4) | -4e-4 (1e-4) |
| Fixed, coefficient (SE) | βta | | | | -1e-5 (3e-6) | | |
| Fixed, coefficient (SE) | βtg | | | | | | |
| Fixed, coefficient (SE) | βtb | | -3e-5 (1e-5) | | | -2e-5 (5e-6) | 2e-5 (5e-6) |
| Random, coefficient (SE) | Ω0 | | | | | | |
| Random, coefficient (SE) | Ωwake | 0.030 (0.003) | 0.159 (0.018) | 0.018 (0.002) | 0.224 (0.025) | 0.018 (0.002) | 0.014 (0.002) |
| Random, coefficient (SE) | ΩREM | 0.029 (0.003) | 0.171 (0.018) | 0.019 (0.002) | 0.280 (0.031) | 0.022 (0.002) | 0.018 (0.002) |
| Random, coefficient (SE) | Ωlight | 0.030 (0.003) | 0.219 (0.018) | 0.020 (0.002) | 0.256 (0.028) | 0.019 (0.002) | 0.017 (0.002) |
| Random, coefficient (SE) | Ωdeep | 0.031 (0.003) | 0.257 (0.018) | 0.020 (0.002) | 0.324 (0.036) | 0.020 (0.002) | 0.017 (0.002) |
| Random, coefficient (SE) | Ωt | 1e-7 (1e-8) | 7e-7 (8e-8) | 4e-8 (4e-9) | 7e-7 (8e-8) | 5e-8 (6e-9) | 5e-8 (5e-9) |
| Residual and deviance (dev.) | Ωe | 0.019 (1e-4) | 0.290 (0.001) | 0.003 (1e-5) | 0.230 (0.001) | 0.033 (1e-4) | 0.033 (1e-4) |
| Residual and deviance (dev.) | Dev. | -150487 | 217253 | -398075 | 186380 | -75029 | -74306 |

ln, natural logarithm; nu, normalized unit. Only the statistically significant effects (Wald Z-test, p < 0.05), the fixed constant intercept β0, and the sleep stage intercepts βs are presented.

With the addition of the centering variable to Model #1, we have Model #2, and the estimated regression coefficients after model optimization (Wald Z-test at p < 0.05, for each coefficient) are shown in Table 9.5. As stated, this model included the between-subject physiological effect at the overnight mean level (i.e., centering effect), resulting in an obvious reduction of the random variance in each sleep stage compared with Model #1. This indicates that, regardless of sleep stages, the between-subject variability in physiology can be reflected, to a certain degree, by the difference of the mean value over the night. In addition, centering the parameter values per subject slightly influenced the time effect in both fixed and random parts. In comparison with Model #1, a lower deviance using Model #2 was obtained for all the parameters (p < 0.0001) as shown in Tables 9.4 and 9.5, indicating a better goodness-of-fit on the parameters using the model with the centering variable.

[Figure 9.2 plots: for each parameter (BR in ln-Hz, SDBR in ln-Hz, HR in ln-bpm, SDNN in ln-ms, LF in nu, HF in nu), one panel for the man and one panel for the woman, showing the regression value against time (min, 0 to 400) for wake, REM sleep, light sleep, and deep sleep.]

Figure 9.2: An example of multilevel regressions of the six cardiorespiratory parameters for a man (age: 24 y, BMI: 21.3 kg/m²) and a woman (age: 70 y, BMI: 28.6 kg/m²) using coefficients estimated through Model #1 excluding the random coefficients and residual term. The regression variables included age, gender, BMI, time, time×age, time×gender, time×BMI, and sleep stages wake, REM, light, and deep.
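The regression curves in Figure 9.2 use only the fixed part of Model #1. A minimal sketch of that computation for HR is given below, using the fixed coefficients reported for HR in Table 9.4 (intercept, sleep stage, gender, and time; the remaining terms are not listed for HR and are taken as zero). The dictionary layout and the function name are illustrative assumptions.

```python
# Fixed HR coefficients (ln-bpm) as reported in Table 9.4; wake is the baseline stage.
HR_COEF = {
    "b0": 4.221,
    "stage": {"wake": 0.0, "REM": -0.028, "light": -0.061, "deep": -0.055},
    "bg": 0.042,  # gender: 0 = man, 1 = woman
    "bt": -1e-4,  # per minute of relative time
}

def predict_hr_fixed(stage, gender, time_min, coef=HR_COEF):
    """Fixed-part prediction of ln(HR), ignoring random effects and the residual."""
    return coef["b0"] + coef["stage"][stage] + coef["bg"] * gender + coef["bt"] * time_min

man_wake_start = predict_hr_fixed("wake", gender=0, time_min=0.0)
woman_deep_late = predict_hr_fixed("deep", gender=1, time_min=400.0)
```

Evaluating the function over a range of times for each stage gives one straight line per sleep stage, shifted by the gender term, which is the structure visible in the HR panels.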


Table 9.5: Coefficients and their standard errors (SE) of the optimized multilevel model with the additional between-subject centering effect (Model #2) for the six cardiorespiratory parameters.

Coef. BR SDBR HR SDNN LF HF


Fixed, coefficient (SE)

β0 -0.098 (0.079) -0.012 (0.017) 0.104 (0.028) -0.060 (0.047) -0.018 (0.034) 0.131 (0.030)

βc 0.973 (0.011) 0.884 (0.020) 0.993 (0.007) 0.979 (0.011) 0.936 (0.012) 0.923 (0.011)

βwake Baseline Baseline Baseline Baseline Baseline Baseline

βREM 0.002 (0.008) -0.199 (0.025) -0.027 (0.004) -0.104 (0.027) 0.030 (0.007) -0.037 (0.007)

βlight -0.035 (0.008) -0.606 (0.026) -0.062 (0.004) -0.052 (0.020) -0.027 (0.005) 0.039 (0.006)

βdeep -0.044 (0.010) -0.992 (0.033) -0.054 (0.004) -0.248 (0.026) -0.096 (0.008) -0.105 (0.008)

βa -0.002 (0.001) -1e-4 (5e-5) 4e-4 (1e-4) 2e-4 (1e-4)

βg -0.024 (0.012)

βb 0.005 (0.001) -0.004 (0.001)

βt 3e-4 (1e-4) -1e-4 (2e-5) 0.001 (1e-4) 4e-4 (1e-4) -4e-4 (1e-4)

βta -1e-5 (1e-6)

βtg 1e-4 (5e-5)

βtb -2e-5 (5e-6) 2e-5 (5e-6)

Random, coefficient (SE)

Ω0

Ωwake 0.012 (0.001) 0.093 (0.011) 0.004 (4e-4) 0.094 (0.011) 0.006 (0.001) 0.005 (0.001)

ΩREM 0.014 (0.002) 0.099 (0.011) 0.003 (3e-4) 0.095 (0.011) 0.007 (0.001) 0.006 (0.001)

Ωlight 0.006 (0.001) 0.061 (0.007) 0.002 (3e-4) 0.044 (0.005) 0.004 (0.001) 0.003 (3e-4)

Ωdeep 0.010 (0.001) 0.131 (0.015) 0.003 (3e-4) 0.087 (0.010) 0.006 (0.001) 0.006 (0.001)

Ωt 1e-7 (1e-8) 7e-7 (8e-8) 4e-8 (4e-9) 7e-7 (8e-8) 5e-8 (6e-9) 4e-8 (5e-9)

Residual and deviance (dev.)

Ωe 0.019 (1e-4) 0.290 (0.001) 0.003 (1e-5) 0.230 (0.001) 0.033 (1e-4) 0.033 (1e-4)

Dev. -151084 216873 -398866 185774 -75617 -74903

ln, natural logarithm; nu, normalized unit. Only the statistically significant effects (Wald Z-test, p < 0.05), the fixed constant intercept β0, and the sleep stage intercepts βs are presented.

Normality of the random terms was checked using the Q-Q plot method for all models. For example, the Q-Q plots of the residual term (with variance Ωe, in Model #1) for all the parameters are shown in Figure 9.3, suggesting that the residuals were approximately drawn from a normal distribution.

[Figure 9.3 plots: one Q-Q panel per parameter (BR, SDBR, HR, SDNN, LF, HF), with standard normal quantiles on the horizontal axis and quantiles of Ωe on the vertical axis.]

Figure 9.3: Q-Q plots of the residual term (variance Ωe) of the multilevel models (Model #1) for the six cardiorespiratory parameters. These plots suggest that the residuals are approximately normally distributed.

9.3.3 Proportion of variance explained

To examine which effects explained the variance and how much each contributed, we computed for each cardiorespiratory parameter the PVE for each effect by analyzing the estimated variances of random intercept and residual in a sequence of models (Model A-G in the Appendix). The variance changes in the models with the inclusion of different effects in a specific order are shown in Table 9.6, from which the PVE values in Table 9.7 were obtained. Note that the variances explained by sleep stages were not included in PVE. For BR and HR, the between-subject centering effects dominated the variances (55.26% for BR and 77.95% for HR), indicating that the subjects behaved differently with respect to their breathing rate and heart rate at the general mean level throughout the whole night. We also see that the variations in breathing rate and heart rate had a lower centering difference between subjects (with PVE of 26.23% for SDBR and of 39.06% for SDNN) compared with the physiological variability within subjects (with PVE of 61.69% for SDBR and of 40.87% for SDNN). This was also the case for LF and HF powers in the spectral domain of HRV as shown in Table 9.7. As a result, compared with the overall within-subject variability, the overall between-subject variability had a larger influence on breathing rate (PVE of 66.58%) and heart rate (PVE of 86.25%), while its influence on their variations was smaller (PVE of 37.94%, 58.66%, 33.62%, and 35.13% for SDBR, SDNN, LF, and HF, respectively). In general, the variances explained by the effects in physiology between subjects (including the effect at the overnight mean level and random effect) and within subjects accounted for 83.83-97.16% of the total variance for different cardiorespiratory parameters.

Specifically, a relatively larger percentage (13.7%) of the demographic effect can be found on SDNN compared with the other parameters. The PVE of between-subject physiological variability (in the random part) ranged from 2.27% to 7.62% depending on the parameters. For the time effect, the PVE in the fixed part (0.01-1.32%), reflecting the linear changes of parameters over time within subjects, was smaller than that in the random part (1.58-2.74%), which indicates different changes over time between subjects. In general, the time effect accounted for relatively less of the total variance than most other effects. Finally, although cross-interactions existed between time and demographics for BR, SDNN, LF, and HF, the proportion of variance they explained was very small (<0.20%).


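Table 9.6: Variances of a sequence of models (Model A-G in the Appendix) with different effects for computing their PVE for the six cardiorespiratory parameters.

| Model | Added effect | Term | BR (ln-Hz) | SDBR (ln-Hz) | HR (ln-bpm) | SDNN (ln-ms) | LF (nu) | HF (nu) |
|---|---|---|---|---|---|---|---|---|
| A | baseline model | Ωe | 0.0229 | 0.3306 | 0.0043 | 0.2626 | 0.0354 | 0.0356 |
| A | baseline model | Ω0 | 0.0328 | 0.1389 | 0.0192 | 0.2997 | 0.0151 | 0.0156 |
| A | baseline model | Dev. | -125045 | 232926 | -348717 | 202249 | -66487 | -65952 |
| B | + within-subject time effect (fixed) | Ωe | 0.0228 | 0.3284 | 0.0040 | 0.2600 | 0.0353 | 0.0355 |
| B | + within-subject time effect (fixed) | Ω0 | 0.0328 | 0.1393 | 0.0191 | 0.2999 | 0.0150 | 0.0155 |
| B | + within-subject time effect (fixed) | Dev. | -125109 | 232056 | -357783 | 200926 | -66724 | -66131 |
| C | + demographic effect (fixed) | Ωe | 0.0228 | 0.3284 | 0.0040 | 0.2600 | 0.0353 | 0.0355 |
| C | + demographic effect (fixed) | Ω0 | 0.0308 | 0.1329 | 0.0183 | 0.2230 | 0.0147 | 0.0136 |
| C | + demographic effect (fixed) | Dev. | -125120 | 232048 | -357790 | 200877 | -66730 | -66152 |
| D | + centering effect (fixed) | Ωe | 0.0228 | 0.3284 | 0.0040 | 0.2600 | 0.0353 | 0.0355 |
| D | + centering effect (fixed) | Ω0 | 0.0001 | 0.0098 | 0.0001 | 0.0033 | 0.0003 | 0.0002 |
| D | + centering effect (fixed) | Dev. | -126064 | 231624 | -358718 | 200200 | -67367 | -66850 |
| E | + demographic-related time effect (fixed) | Ωe | 0.0227 | 0.3284 | 0.0040 | 0.2597 | 0.0352 | 0.0354 |
| E | + demographic-related time effect (fixed) | Ω0 | 0.0001 | 0.0098 | 0.0001 | 0.0033 | 0.0003 | 0.0002 |
| E | + demographic-related time effect (fixed) | Dev. | -126393 | 231624 (Ne) | -358718 (Ne) | 200027 | -67718 | -67206 |
| F | + between-subject time effect (random) | Ωe | 0.0210 | 0.3157 | 0.0034 | 0.2476 | 0.0343 | 0.0346 |
| F | + between-subject time effect (random) | Ω0 | 0.0003 | 0.0097 | 0.0001 | 0.0041 | 0.0003 | 0.0002 |
| F | + between-subject time effect (random) | Ωt | 1.1e-7 | 7.3e-7 | 3.6e-8 | 7.1e-7 | 5.2e-8 | 4.5e-8 |
| F | + between-subject time effect (random) | Dev. | -136185 | 226933 | -380964 | 194316 | -70913 | -69899 |
| G | + between-subject physiological effect (random) | Ωe | 0.0186 | 0.2896 | 0.0029 | 0.2298 | 0.0328 | 0.033 |
| G | + between-subject physiological effect (random) | Ω0 | 0 | 0 | 0 | 0 | 0 | 0 |
| G | + between-subject physiological effect (random) | Ωt | 1.0e-7 | 6.7e-7 | 3.5e-8 | 7.1e-7 | 4.8e-8 | 4.3e-8 |
| G | + between-subject physiological effect (random) | Dev. | -151084 | 216874 | -398866 | 185774 | -75617 | -74903 |

ln, natural logarithm; nu, normalized unit; Dev., model deviance; Ne, no effect. All the models include fixed (β0) and random (µ0) intercepts, and sleep-stage-dependent variables (wake, REM, light, and deep) with their coefficients. The models were optimized by excluding the effects with their coefficients statistically equal to zero (Wald Z-test, p > 0.05) and the variances presented in the table were all statistically significant (Wald Z-test, p < 0.01).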

Table 9.7: Proportion of variance explained (PVE, %) accounted for by different effects for the six cardiorespiratory parameters.

| Group | Effect | BR | SDBR | HR | SDNN | LF | HF |
|---|---|---|---|---|---|---|---|
| Overall between-subject effect | Demographic effect | 3.55% | 1.37% | 3.36% | 13.69% | 0.63% | 3.70% |
| Overall between-subject effect | Centering (physiological) effect | 55.26% | 26.23% | 77.95% | 39.06% | 28.63% | 26.41% |
| Overall between-subject effect | Between-subject time effect | 2.74% | 2.72% | 2.67% | 2.00% | 1.87% | 1.58% |
| Overall between-subject effect | Between-subject physiological effect | 5.03% | 7.62% | 2.27% | 3.91% | 3.49% | 3.44% |
| Overall within-subject effect | Within-subject time effect | 0.01% | 0.37% | 1.32% | 0.42% | 0.16% | 0.14% |
| Overall within-subject effect | Within-subject physiological effect | 33.39% | 61.69% | 12.43% | 40.87% | 65.04% | 64.54% |
| Cross-interaction effect | Demographic-related time effect | 0.02% | Ne | Ne | 0.06% | 0.18% | 0.19% |

ln, natural logarithm; Ne, no effect. For each cardiorespiratory parameter, the sum of PVEs from all the effects is 100%, representing the total variance for that parameter. The centering effect reflected some between-subject physiological variability (at the overnight mean level) that was assumed to be independent of sleep stage composition over the entire night.

9.3.4 Sleep staging results

The results of sleep staging are presented in Table 9.8, where different schemes (BS and CS1-CS3) were compared. We observe that the correction by means of the between- and/or within-subject effects for the parameters generally enabled performance improvement in sleep staging (by comparing the results of CS1-CS3 with BS). In particular, correcting the parameters by the fixed effects (demographics, time, and their cross-interactions) independent of sleep stages (CS1) resulted in a significantly increased Kappa of 0.29 ± 0.11 and a significantly increased accuracy of 60.4 ± 8.8% (Wilcoxon test, p < 0.00001) compared with the baseline without any correction (Kappa of 0.19 ± 0.10 and accuracy of 55.8 ± 9.8%). In addition, if we could manage to further correct the variability of the parameters evoked by the between-subject random effects (CS2), the sleep staging results would significantly increase to a Kappa of 0.35 ± 0.09 and an accuracy of 62.9 ± 7.8% (Wilcoxon test, p < 0.00001), where the SD of results over subjects would be simultaneously reduced. On the other hand, if the within-subject variability could be corrected (CS3), the sleep staging performance would be markedly improved (to a Kappa of 0.72 ± 0.23 and an accuracy of 83.5 ± 14.4%) (Wilcoxon test, p < 0.00001), but meanwhile, the SD would increase because this correction scheme focused on reducing effects within subjects rather than those between subjects. Similarly, as shown in Table 9.8, correcting the parameters could help obtain a more accurate estimation of sleep stage composition.

Table 9.8: Comparison of sleep staging results (wake/REM sleep/light sleep/deep sleep) using different schemes in correcting the cardiorespiratory parameters.

| Group | Measure | PSG | BS | CS1 | CS2 | CS3 |
|---|---|---|---|---|---|---|
| Overall performance | Accuracy, % | – | 55.8 ± 9.8 | 60.4 ± 8.8 | 62.9 ± 7.8 | 83.5 ± 14.4 |
| Overall performance | Kappa coefficient | – | 0.19 ± 0.10 | 0.29 ± 0.11 | 0.35 ± 0.09 | 0.72 ± 0.23 |
| Sleep stage composition (percentage) | Wake, % | 19.8 ± 12.5 | 19.9 ± 14.4 | 18.4 ± 4.9 | 20.6 ± 6.4 | 19.7 ± 10.7 |
| Sleep stage composition (percentage) | REM sleep, % | 14.0 ± 5.6 | 0.7 ± 1.0 | 2.4 ± 2.0 | 3.0 ± 1.7 | 10.5 ± 7.8 |
| Sleep stage composition (percentage) | Light sleep, % | 53.4 ± 10.7 | 74.7 ± 15.1 | 73.5 ± 8.1 | 71.0 ± 8.2 | 59.9 ± 12.0 |
| Sleep stage composition (percentage) | Deep sleep, % | 12.8 ± 7.2 | 4.7 ± 5.6 | 5.7 ± 5.2 | 5.4 ± 4.0 | 9.9 ± 7.6 |

BS, baseline with original parameter values without correction; CS1, with correction by fixed effects; CS2, with correction by fixed effects and between-subject random effects; CS3, with correction by fixed effects and within-subject random effect (model residual). For CS2 and CS3, results were obtained when assuming the sleep stages were known, which was usually not the case in practice. For accuracy and Kappa coefficient, significance of difference between using each correction scheme and BS was confirmed with a paired (two-sided) Wilcoxon signed-rank test, all at p < 0.00001.

9.4 Discussion

The results of demographic and time of night effects found in this study are consistent with the findings reported in previous work [49, 57, 101, 257]. It is noted that the model used to facilitate the interpretation of the demographic effects (Model #1) should not include the (between-subject) centering variable. This is because the demographic differences usually correspond to the autonomic changes at the overnight mean level. Given the inclusion of the centering effect in Model #2, it came as a surprise that some demographic variables still had significant effects (see Table 9.5), which contradicts our expectation that their effects on the cardiorespiratory activity are fully manifested by the parameter mean values. The cause is that the percentages (or composition) of sleep stages were not exactly the same for all subjects. Therefore, the demographic differences were only partially explained by the centering variable and the unexplained part depends on the difference of sleep stage composition between subjects.

It is important to note that, since some effects were correlated with each other, the order in which the sequence of models is constructed (see the Appendix) must be specifically determined in order to precisely quantify the proportion of variance explained by each effect. The procedure should be such that the models with fixed effects (e.g., demographic effects) that are explainable by other effects are addressed first and the models with random effects are included later [134].

In Tables 9.4 and 9.5, it can be seen that the time variable was able to explain variance at the subject level, given the significance of the random time effect. Two explanations are possible. First, the slope of cardiorespiratory activity changing over time might depend on sleep stages (or their transitions) and thus might not follow a continuous linear trend. One way of handling the sleep-stage dependency is to use a model that contains the cross-interactions between sleep stages and time; to handle the influence of sleep stage transitions, it is suggested to divide the night into segments without any sleep stage transitions. Second, the random time effect could be due to differences between subjects in autonomic control or in changes in sleep architecture, caused by other factors such as daytime activity, work stress, and response to the sleep environment during sleep. This was not addressed in this study and merits further investigation. On the other hand, the cross-interactions between time and demographics (in particular, BMI) explained some of the total variance at both subject and epoch levels. Although the amount and proportion of variance explained by the time-related effects seem much smaller than those of some other effects, as shown in Table 9.7, these effects are still statistically unequal for different subjects and are relatively large compared with the differences between sleep stages for some parameters such as LF and HF, especially at the end of the night, which can be observed in Figure 9.2.

Regarding the quantified within-subject effects, several factors in addition to internal physiology, such as body movements, body position, sleep environment, conscious breathing control, and even daytime activity, may also explain some of the total variance within subjects in cardiorespiratory activity. However, this work did not determine which of these factors were at play, and this should be studied in the future.

When evaluating the performance of sleep staging using the cardiorespiratory parameters,

Model #2 should be preferred. For each parameter, although the estimate of

its overnight mean value for each subject was not completely accurate (due to the difference

of sleep stage composition between subjects), correcting it can still result in a reduction of the

physiological variability between subjects to a great extent. As a consequence, the sleep staging

results can be improved. Table 9.6 confirms that the centering effect actually constituted a

large proportion of the total variance. Moreover, Figure 9.2 illustrates that the variations of the

parameters caused by demographic and time effects were somewhat comparable with or even

larger than the differences between sleep stages, leading to difficulty in separating sleep stages.

With respect to the capability of the parameters in classifying sleep stages, Table 9.5 shows

that, for example, SDBR had a larger difference between sleep stages compared with the other

parameters while BR had no difference between REM sleep and wakefulness. This indicates

that the intrinsic separation of sleep stages should vary between the parameters that express

different aspects of the autonomic activity.

Table 9.8 indicates that the variability between and within subjects conveyed by the car-

diorespiratory activity limited the sleep staging performance. To improve it, the correction

scheme CS1 seems potentially applicable from a practical point of view because the fixed ef-

fects are usually prior information that is independent of sleep stages or they can be estimated

from the training data before performing sleep staging. However, realizing CS2 and CS3
requires either information about sleep stages (which are unknown in practice and need to be
identified) or estimation of random variances (which are hardly predictable for new subjects).

Therefore, the challenge will be how to diminish the random effects caused by variability

either between or within subjects when sleep stages are unknown. For instance, normalizing

the parameter values based on their variation or distribution throughout the night for each sub-

ject might allow for reduction of between-subject random effect in physiology to some extent.

Incorporating more explanatory variables in the model that are independent of sleep stages and

are able to explain some variance of the model would help better correct the parameters.
Compared to the parameters analyzed in this study, new parameters with smaller random
variances (i.e., parameters that are less influenced by the between- or within-subject
physiological variability) or with additional information for separating sleep stages may
improve the sleep staging performance.

Nevertheless, we argue that the performance of cardiorespiratory-based sleep staging will al-

ways be limited unless the between- and/or within-subject random variances are successfully

explained and corrected.


9.5 Conclusion

In this chapter, with a multilevel analysis we statistically modeled and quantified the effects on

autonomic cardiorespiratory activity during sleep caused by differences in subject
demographics, time of night, and physiology within and between subjects. All these effects
were found to be significant. The primary effects were the physiological variability within and
between subjects.

They markedly limit the performance of sleep staging when using cardiorespiratory
information. Therefore, diminishing these effects will be the main challenge in further
improving cardiorespiratory-based sleep staging.

9.A Appendix

The sequence of models constructed to compute the PVE values for different effects is described

in the following.

• The first model is the model with solely the constant and random intercepts as well as the

fixed sleep-stage-dependent variables. This baseline model can be written as

\[
\text{Model A:}\quad y_{ij} = \beta_0^A + \mu_{0j}^A + \sum_s \beta_s^A s_{ij} + e_{0ij}^A,
\quad \text{with } \mu_{0j}^A \sim N(0, \Omega_0^A) \text{ and } e_{0ij}^A \sim N(0, \Omega_e^A), \tag{9.7}
\]

where $s \in \{\text{wake}, \text{REM}, \text{light}, \text{deep}\}$, and the total variance
$\Omega_{\text{total}}$ consists of variance in two levels: the between-subject variance
$\Omega_0^A$ at the subject level and the within-subject (residual) variance $\Omega_e^A$ at the
time/epoch level. The percentage of the total variance taken by $\Omega_0^A$, called
intra-group correlation coefficient (ICC) $\rho$ (21, 39), is computed by

\[
\rho = \frac{\Omega_0^A}{\Omega_{\text{total}}} = \frac{\Omega_0^A}{\Omega_e^A + \Omega_0^A}. \tag{9.8}
\]

• Let us then consider the model with fixed time effect at the first level

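\[
\text{Model B:}\quad y_{ij} = \beta_0^B + \mu_{0j}^B + \sum_s \beta_s^B s_{ij} + \beta_t^B\,\text{time}_{ij} + e_{0ij}^B,
\quad \text{with } \mu_{0j}^B \sim N(0, \Omega_0^B) \text{ and } e_{0ij}^B \sim N(0, \Omega_e^B). \tag{9.9}
\]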

For the variance analysis of the time variable, instead of using the original time stamps
mentioned before (i.e., $\text{time}_{ij} = i/2$), we use the shifted (centered) values computed
as the original time minus the mean value of the median time over all subjects. This is because,
for a longitudinal multilevel analysis, time is an occasion-level variable within subjects for
which a linear trend over the measurements usually suffices, and it would thus explain part of
the total variance at both levels [134]. In fact, the models with and without shifting the
occasion measures are equivalent, with exactly the same model coefficients (including residual)
and deviance, except for the variance estimates of the random effects. The variance estimates
obtained by shifting the time values are considered to be more accurate and realistic [134]. To
quantify the PVE constituted by the fixed time effect, we exploit the relative variance
reductions with respect to the baseline model at the two levels, $R_1^2$ (epoch level) and
$R_2^2$ (subject level), such that

\[
\begin{aligned}
\text{PVE}_{\text{time fixed}} &= (1-\rho)R_1^2 + \rho R_2^2 \\
&= (1-\rho)\,\frac{\Omega_e^A - \Omega_e^B}{\Omega_e^A} + \rho\,\frac{\Omega_0^A - \Omega_0^B}{\Omega_0^A} \\
&= \frac{(\Omega_e^A - \Omega_e^B) + (\Omega_0^A - \Omega_0^B)}{\Omega_{\text{total}}}.
\end{aligned} \tag{9.10}
\]
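As a minimal numerical sketch of (9.8) and (9.10), the snippet below computes the ICC and the PVE of the fixed time effect from the variance estimates of two nested models. The variance values and the function names `icc` and `pve_step` are hypothetical and only serve to illustrate the arithmetic; they are not estimates from this study.

```python
def icc(omega_0, omega_e):
    """Intra-group correlation (Eq. 9.8): share of total variance at the subject level."""
    return omega_0 / (omega_0 + omega_e)


def pve_step(prev, curr, omega_total):
    """PVE of the effect added between two nested models (form of Eq. 9.10).

    prev, curr: (omega_e, omega_0) variance estimates of the model before
    and after adding the effect.
    """
    (e_prev, u_prev), (e_curr, u_curr) = prev, curr
    return ((e_prev - e_curr) + (u_prev - u_curr)) / omega_total


# Hypothetical variance estimates, for illustration only.
model_a = (6.0, 4.0)   # (omega_e^A, omega_0^A)
model_b = (5.5, 3.9)   # (omega_e^B, omega_0^B)
omega_total = sum(model_a)

rho = icc(omega_0=model_a[1], omega_e=model_a[0])
pve_time_fixed = pve_step(model_a, model_b, omega_total)

# The level-weighted form (1 - rho) * R1^2 + rho * R2^2 gives the same value.
r1_sq = (model_a[0] - model_b[0]) / model_a[0]   # epoch-level reduction
r2_sq = (model_a[1] - model_b[1]) / model_a[1]   # subject-level reduction
weighted = (1 - rho) * r1_sq + rho * r2_sq
```

The same `pve_step` call applies to each later pair of nested models, always dividing by the total variance of the baseline Model A.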

Now we consider the subject-level fixed effects.

• The model including demographic variables is

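\[
\begin{aligned}
\text{Model C:}\quad y_{ij} = {} & \beta_0^C + \mu_{0j}^C + \sum_s \beta_s^C s_{ij} + \beta_t^C\,\text{time}_{ij} \\
& + \beta_a^C\,\text{age}_j + \beta_g^C\,\text{gender}_j + \beta_b^C\,\text{BMI}_j + e_{0ij}^C, \\
& \text{with } \mu_{0j}^C \sim N(0, \Omega_0^C) \text{ and } e_{0ij}^C \sim N(0, \Omega_e^C).
\end{aligned} \tag{9.11}
\]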

Similarly, the PVE explained by the between-subject demographic variables can be com-

puted by

\[
\text{PVE}_{\text{demographic}} = \frac{(\Omega_e^B - \Omega_e^C) + (\Omega_0^B - \Omega_0^C)}{\Omega_{\text{total}}}. \tag{9.12}
\]

The demographic variables only explain the variability between subjects, so the variance
change at the epoch level should be approximately zero ($\Omega_e^B - \Omega_e^C \approx 0$).

• Further, Model D is the model with the inclusion of between-subject centering effect

(expressing the physiological difference between subjects at the overnight mean level),

given by

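\[
\begin{aligned}
\text{Model D:}\quad y_{ij} = {} & \beta_0^D + \mu_{0j}^D + \sum_s \beta_s^D s_{ij} + \beta_t^D\,\text{time}_{ij} + \beta_c^D\,\bar{y}_j \\
& + \beta_a^D\,\text{age}_j + \beta_g^D\,\text{gender}_j + \beta_b^D\,\text{BMI}_j + e_{0ij}^D, \\
& \text{with } \mu_{0j}^D \sim N(0, \Omega_0^D) \text{ and } e_{0ij}^D \sim N(0, \Omega_e^D),
\end{aligned} \tag{9.13}
\]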

from which the corresponding PVE is computed such that

\[
\text{PVE}_{\text{center}} = \frac{(\Omega_e^C - \Omega_e^D) + (\Omega_0^C - \Omega_0^D)}{\Omega_{\text{total}}}. \tag{9.14}
\]


• With the inclusion of cross-interactions that express the demographic-related time
effects, the model is

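\[
\begin{aligned}
\text{Model E:}\quad y_{ij} = {} & \beta_0^E + \mu_{0j}^E + \sum_s \beta_s^E s_{ij} + \beta_t^E\,\text{time}_{ij} + \beta_c^E\,\bar{y}_j \\
& + \beta_a^E\,\text{age}_j + \beta_g^E\,\text{gender}_j + \beta_b^E\,\text{BMI}_j \\
& + \beta_{ta}^E(\text{time}\times\text{age})_{ij} + \beta_{tg}^E(\text{time}\times\text{gender})_{ij} + \beta_{tb}^E(\text{time}\times\text{BMI})_{ij} + e_{0ij}^E, \\
& \text{with } \mu_{0j}^E \sim N(0, \Omega_0^E) \text{ and } e_{0ij}^E \sim N(0, \Omega_e^E),
\end{aligned} \tag{9.15}
\]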

and the proportion of cross-interaction variance is

\[
\text{PVE}_{\text{cross}} = \frac{(\Omega_e^D - \Omega_e^E) + (\Omega_0^D - \Omega_0^E)}{\Omega_{\text{total}}}. \tag{9.16}
\]

In addition to the fixed part, we consider the random part of some effects.

• The model with an additional random time effect is

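\[
\begin{aligned}
\text{Model F:}\quad y_{ij} = {} & \beta_0^F + \mu_{0j}^F + \sum_s \beta_s^F s_{ij} + (\beta_t^F + \mu_{tj}^F)\,\text{time}_{ij} + \beta_c^F\,\bar{y}_j \\
& + \beta_a^F\,\text{age}_j + \beta_g^F\,\text{gender}_j + \beta_b^F\,\text{BMI}_j \\
& + \beta_{ta}^F(\text{time}\times\text{age})_{ij} + \beta_{tg}^F(\text{time}\times\text{gender})_{ij} + \beta_{tb}^F(\text{time}\times\text{BMI})_{ij} + e_{0ij}^F, \\
& \text{with }
\begin{bmatrix} \mu_{0j}^F \\ \mu_{tj}^F \end{bmatrix}
\sim N\!\left(
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \Omega_0^F & \\ & \Omega_t^F \end{bmatrix}
\right)
\text{ and } e_{0ij}^F \sim N(0, \Omega_e^F).
\end{aligned} \tag{9.17}
\]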

The PVE accounted for by the random time effect can accordingly be obtained by

\[
\text{PVE}_{\text{time random}} = \frac{(\Omega_e^E - \Omega_e^F) + (\Omega_0^E - \Omega_0^F)}{\Omega_{\text{total}}}. \tag{9.18}
\]

• Afterwards, the model with random effects for different sleep stages (expressing the
between-subject physiological variability associated with each sleep stage in the random
part) is expressed as

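\[
\begin{aligned}
\text{Model G:}\quad y_{ij} = {} & \beta_0^G + \mu_{0j}^G + \sum_s (\beta_s^G + \mu_{sj}^G)\,s_{ij} + (\beta_t^G + \mu_{tj}^G)\,\text{time}_{ij} + \beta_c^G\,\bar{y}_j \\
& + \beta_a^G\,\text{age}_j + \beta_g^G\,\text{gender}_j + \beta_b^G\,\text{BMI}_j \\
& + \beta_{ta}^G(\text{time}\times\text{age})_{ij} + \beta_{tg}^G(\text{time}\times\text{gender})_{ij} + \beta_{tb}^G(\text{time}\times\text{BMI})_{ij} + e_{0ij}^G, \\
& \text{with }
\begin{bmatrix} \mu_{0j}^G \\ \mu_{tj}^G \\ \mu_{sj}^G \end{bmatrix}
\sim N\!\left(
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \Omega_0^G & & \\ & \Omega_t^G & \\ & & \Omega_s^G \end{bmatrix}
\right)
\text{ and } e_{0ij}^G \sim N(0, \Omega_e^G).
\end{aligned} \tag{9.19}
\]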


In this model, the random variance $\Omega_s^G$ not only explains part of the variance in
$\Omega_0^F$ and $\Omega_e^F$, but also reflects some variance of the random time effect
$\Omega_t^F$. Therefore, the proportion of the total variance contained in $\Omega_s^G$ is

\[
\text{PVE}_{\text{betw subj random}} = \frac{(\Omega_e^F - \Omega_e^G) + (\Omega_0^F - \Omega_0^G) + (\Omega_t^F - \Omega_t^G)}{\Omega_{\text{total}}}. \tag{9.20}
\]

Then the PVE of the random time effect to the total variance should be corrected to

\[
\text{PVE}_{\text{time random}} = \frac{(\Omega_e^E - \Omega_e^F) + (\Omega_0^E - \Omega_0^F) - (\Omega_t^F - \Omega_t^G)}{\Omega_{\text{total}}}. \tag{9.21}
\]

• Finally, the remaining residual variance is assumed to be associated only with the
physiological variability within subjects, and its proportion can be obtained such that

\[
\text{PVE}_{\text{within subj random}} = \frac{\Omega_e^G}{\Omega_{\text{total}}}. \tag{9.22}
\]

Note that all these models are optimized by keeping only the variables whose coefficients
differ significantly from zero.

CHAPTER 10

Sleep stage classification with ECG and respiratory effort

This chapter is adapted from: P. Fonseca∗, X. Long∗, M. Radha, R. Haakma, R. M. Aarts, and J. Rolink.

Sleep stage classification with ECG and respiratory effort. Submitted. (∗Joint first authorship)

Abstract – Automatic sleep stage classification with cardiorespiratory signals has attracted

increasing attention. In contrast to the traditional manual scoring based on polysomnography

(PSG), these signals can be measured using advanced unobtrusive techniques that are currently

available, which holds promise for personal and continuous home sleep monitoring. This
chapter describes a methodology for classifying wake, rapid-eye-movement (REM) sleep, and

non-REM (NREM) light and deep sleep on a 30-s epoch basis. A total of 142 features were ex-

tracted from electrocardiogram (ECG) and thoracic respiratory effort measured with respiratory

inductance plethysmography (RIP). To improve the quality of these features, subject-specific

Z-score normalization and spline smoothing were used to reduce between-subject and
within-subject variability. A modified sequential forward search (SFS) feature selection procedure

was applied, yielding 80 features while preventing the introduction of bias in the estimation of

cross-validation performance. Data from 48 healthy adults were used to validate our methods.

Using a linear discriminant classifier and a ten-fold cross-validation, we achieved a Cohen’s

Kappa coefficient of 0.49 and an accuracy of 69% in the classification of wake, REM, light,

and deep sleep. These values increased to Kappa = 0.56 and accuracy = 80% when the classifi-

cation problem was reduced to three classes, wake, REM sleep, and NREM sleep.


10.1 Introduction

Sleep is a state of reversible disconnection from the environment and plays an essential role in

the homeostatic regulation of body and mind. The limited consciousness during sleep makes it

one of the hardest lifestyle patterns to reflect upon. Historically this has not been a problem as

the regulation of sleep is rigorously synchronized through a biological circadian rhythm with

the external environment. Yet, in the modern industrialized society where we spend our lives

in artificial environments where lighting, heat and food are available at any moment, sleep

disturbances and disorders have reached epidemic levels [65]. People experience the symptoms

of disturbed sleep such as fatigue, increased impulsiveness and agitation, without the means to

link these issues to their sleeping patterns.

To ensure fitness of body and mind, individuals must be empowered with the ability to mon-

itor sleep easily in order to identify sleep-related problems and adjust their sleeping habits ac-

cordingly. Yet a problem with traditional sleep monitoring, known as polysomnography (PSG),

is that a wide array of potentially sleep-disturbing sensors must be applied to the body and their

measurements can only be interpreted by highly trained sleep technicians or scientists. The

traditional PSG is therefore rather unsuited for individual untrained use and will only introduce

more sleep disturbances when applied on a daily basis. This scenario makes apparent a need for

unobtrusive methods of sleep monitoring, preferably inexpensive and with no training required

to operate them. Cardiorespiratory monitoring can be unobtrusive and the data can be analyzed

by a computer, which makes this technology a promising candidate for personal, continuous

and unobtrusive sleep monitoring.

Cardiorespiratory sleep staging or sleep stage classification is often based on heart rate vari-

ability (HRV) calculated from electrocardiogram (ECG) and respiratory effort, often from res-

piratory inductance plethysmography (RIP). Usually cardiorespiratory information is combined

with body movements from an accelerometer to more accurately distinguish wake from sleep.

One of the earliest studies that presented a successful machine learning approach to cardiorespi-

ratory sleep stage classification with these modalities was done by Redmond et al. [248]. Using

a set of HRV features to model the autonomic nervous activity and a set of respiratory features

to model the parasympathetic tone, Redmond and colleagues showed the viability of a sleep

stage classifier that can generate a simplified hypnogram for an entire night indicating, for each

30-s segment, a sleep stage, classified as either wake, rapid-eye-movement (REM) sleep, or

non-REM (NREM) with no PSG (wake-REM-NREM or WRN classification for short). More

recent research has shown that it is possible to obtain the same cardiorespiratory information

from other sensors for sleep stage classification, such as from bed-mounted ballistocardiogram

[161, 303] or contactless radio frequency [85]. Although these studies focused on distinction

between wake, REM sleep, and NREM sleep (without separating NREM sleep in other sleep

stages) or between wake and sleep (merging REM and NREM sleep), these attempts promised

that cardiorespiratory methods could one day be completely unobtrusive.

In previous work [182] we proposed methods to simultaneously classify wake, REM sleep,

light sleep (NREM stage S1 and S2), and deep sleep or slow wave sleep (stage S3 and S4) us-

ing respiratory activity in order to estimate an overnight wake-REM-light-deep sleep (WRLD)


hypnogram. In comparison with WRN classification, achieving WRLD classification would

allow a more adequate assessment of sleep since, for example, deep sleep is regarded as an in-

dicator of brain memory consolidation and energy reservation [35, 285]. In that work, we also

reviewed the state-of-the-art in sleep stage classification with cardiac and/or respiratory activ-

ity. The methods presented there will be used to benchmark the method proposed in this work.

Since then, at least two additional approaches using cardiorespiratory features have been pro-

posed, which will also be compared with our work. For example, Willemen et al. [309] achieved

a significantly improved performance in cardiorespiratory sleep stage classification. However,

that study classified sleep stages on the basis of one-minute epochs while the standardized scor-

ing of sleep epochs is done on the basis of 30 s [136]. Comparing classification results with a

reference scoring thus involves merging the ground-truth scores of two successive epochs,

for which no official guidelines exist. Nevertheless, the performance reported sets this method

apart from the previous generations of published algorithms. Another cardiorespiratory-based

algorithm with comparable results has been proposed by Domingues et al. [94]. However,

this work only reports results on a three-class task (WRN classification) rather than the more

difficult four-class problem (WRLD classification).

In this chapter, a methodology is described for automatic sleep stage classification based

on machine learned models of the autonomic nervous system during sleep from ECG and RIP

signals. Compared to previous studies, our methodology includes novel features, new feature

post-processing methods, and a refined feature selection method which guarantees that no bias

is introduced in the validation of the algorithm while avoiding the use of a hold-out validation

set, all this applied to both the three-class (WRN) as well as the four-class (WRLD) problem.

10.2 Materials and Methods

10.2.1 Data sets

The data set was the same as used in earlier work [182] and it comprised full single-night

polysomnographic (PSG) recordings of 48 subjects (27 females) acquired in the SIESTA project

[160]. All subjects were healthy sleepers with a Pittsburgh Sleep Quality Index [60] of less than

6 and had no regular sleep complaints nor earlier diagnosis of sleep disorders. The subjects had

an average age of 41.3(±16.1) y at the time of the recording. Full subject demographics can

be found in our earlier work [182]. Sleep stages were scored by trained sleep technicians in six

classes according to the R&K rules [247]. In the scope of this study, S1 and S2 were merged into
a single L (light sleep) class and S3 and S4 were merged into a single D (deep sleep) class.
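The class merging can be sketched as a simple label mapping. The label strings below ("W", "REM", "S1", and so on) are assumed for illustration; the recordings may encode the R&K stages differently. The second mapping shows the further reduction to the three-class WRN problem.

```python
# Label strings are assumed for illustration; actual annotations may differ.
RK_TO_WRLD = {
    "W": "W", "REM": "R",
    "S1": "L", "S2": "L",   # light sleep
    "S3": "D", "S4": "D",   # deep (slow wave) sleep
}
WRLD_TO_WRN = {"W": "W", "R": "R", "L": "N", "D": "N"}


def to_wrld(rk_labels):
    """Map per-epoch R&K labels to wake / REM / light / deep."""
    return [RK_TO_WRLD[s] for s in rk_labels]


def to_wrn(wrld_labels):
    """Merge light and deep sleep into a single NREM class."""
    return [WRLD_TO_WRN[s] for s in wrld_labels]


hypnogram = ["W", "S1", "S2", "S3", "S4", "S2", "REM"]
wrld = to_wrld(hypnogram)
wrn = to_wrn(wrld)
```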

Each PSG recording comprised, besides the standard signals required for sleep scoring,

modified lead II ECG, and (thoracic) respiratory effort recorded with respiratory inductance

plethysmography (RIP). QRS complexes were detected and localized from ECG signals using

a combination of a Hamilton-Tompkins detector [123, 124] and a post-processing localization

algorithm [107]. Prior to feature extraction, RIP signals were filtered with a 10th order
Butterworth low-pass filter with a cut-off frequency of 0.6 Hz, after which the baseline was
removed by subtracting the median peak-to-trough amplitude [182].



We extracted a set of 142 features from cardiac and respiratory activity, and from cardiores-

piratory interaction (CRI) using a sliding window centered on each 30-s epoch, guaranteeing

sufficient data to capture the changes in autonomic activity [288].

10.2.2.1 Cardiac features

Considering cardiac activity, 86 cardiac features were computed from the QRS complexes de-

tected in the ECG signal. Time domain features, computed over nine epochs, include mean

heart rate, mean heartbeat interval (detrended and non-detrended), standard deviation (SD) of

heartbeat intervals, difference between maximal and minimal heartbeat intervals, root mean

square and SD of successive heartbeat interval differences, and percentage of successive heart-

beat intervals differing by >50 ms [249, 288]. We also computed the mean absolute difference

and different percentiles (at 10%, 25%, 50%, 75%, and 90%) of detrended and non-detrended

heart rates and heartbeat intervals [309, 315] as well as the mean, median, minimal, and max-

imal likelihood ratios of heart rates [32]. In the frequency domain, the features include the

logarithmic spectral powers in the very low frequency band (VLF) from 0.003 to 0.04 Hz, in

the low frequency band (LF) from 0.04 to 0.15 Hz, in the high frequency band (HF) from
0.15 to 0.4 Hz, and the LF-to-HF ratio [59], where the power spectral densities were estimated

over nine epochs. The spectral boundaries were adapted to the corresponding peak frequency,

yielding their boundary-adapted versions [179]. We also computed the maximum modulus and

phase of HF pole [197] and the maximal power in the HF band and its associated frequency

representing respiratory rate [249]. Features describing non-linear properties of heartbeat in-

tervals were quantified with detrended fluctuation analysis (DFA) over eleven epochs [148] and

its short-term (α1), long-term (α2), and all time scaling exponents [139, 224], progressive DFA

with non-overlapping segments of 64 heartbeats [289], windowed DFA over eleven epochs [3],

and multi-scale sample entropy over 17 epochs (length of 1 and 2 samples with scales of 1-10)

[75]. Approximate entropy of the symbolic binary sequence that encodes the increase or de-

crease in successive heartbeat intervals over nine epochs was also calculated [78]. In addition,

we propose new features based on a visibility graph (VG) and a difference VG (DVG) method

to characterize HRV time series in a two-dimensional complex network where samples are con-

nected as nodes in terms of certain criteria [169, 183]. The network-based features, computed

over seven epochs, comprise mean, SD, and slope of node degrees and number of nodes in

VG- and DVG-based networks with a small degree (≤ 3 for VG and ≤ 2 for DVG) and a large

degree (≥ 10 for VG and ≥ 8 for DVG), and assortativity coefficient in the VG-based network

[183, 272, 321].
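A few of the time-domain features listed above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `hrv_time_domain` and the input intervals are made up, the mean heart rate is approximated as 60000 divided by the mean interval, and the nine-epoch windowing and detrending are omitted.

```python
import math


def hrv_time_domain(rr_ms):
    """Time-domain HRV statistics from heartbeat (RR) intervals in milliseconds."""
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SD of heartbeat intervals (sample SD).
    sd_rr = math.sqrt(sum((r - mean_rr) ** 2 for r in rr_ms) / (n - 1))
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    # Root mean square of successive interval differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Percentage of successive intervals differing by more than 50 ms.
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    return {
        "mean_hr_bpm": 60000.0 / mean_rr,   # approximation from the mean interval
        "mean_rr_ms": mean_rr,
        "sd_rr_ms": sd_rr,
        "range_ms": max(rr_ms) - min(rr_ms),
        "rmssd_ms": rmssd,
        "pnn50_pct": pnn50,
    }


rr = [800, 860, 790, 810, 900, 820]  # made-up intervals
features = hrv_time_domain(rr)
```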

10.2.2.2 Respiratory features

Concerning respiratory activity, 44 features were derived from RIP signals. In the time do-

main, we estimated the variance of respiratory signal, the respiratory frequency and its SD over

150, 210, and 270 s, the mean and SD of breath-by-breath correlation, and the SD in breath


length [249]. Our previous study [182] introduced respiratory amplitude features for sleep

stage classification, including the standardized mean, standardized median, and sample entropy

of respiratory peaks and troughs (indicating inhalation and exhalation breathing depth, respec-

tively), median peak-to-trough difference, median volume and flow rate for complete breath

cycle, inhalation, and exhalation, and inhalation-to-exhalation flow rate ratio. These features

were adopted in this work. Besides, we also computed the similarity between the peaks and

troughs by means of the envelope morphology using a dynamic time warping (DTW) metric

[37]. From the respiratory spectrum, the respiratory frequency and its power, the logarithm of

the spectral power in VLF (0.01-0.05 Hz), LF (0.05-0.15 Hz), and HF (0.15-0.5 Hz) bands,

and the LF-to-HF ratio were estimated [248]. Respiratory regularity was measured by means

of sample entropy over seven epochs [185, 250] and self-(dis)similarity based on DTW and

dynamic frequency warping (DFW) [180] and uniform scaling [185] were derived. The same

network analysis features as for HRV were also computed for breath-to-breath intervals.

10.2.2.3 Cardiorespiratory interaction features

Numerous studies have shown that the interaction between cardiac and respiratory activity

varies across sleep stages [77, 137, 183]. The power associated with respiratory-modulated

heartbeat intervals was quantified over windows of nine epochs [137]. In addition, we
extracted the VG- and DVG-based features for CRI [183]. These resulted in a total of 12 CRI

features in our feature set.

10.2.2.4 Feature post-processing

In order to reduce the impact of physiological differences and equipment-related variations from

subject to subject, the features of each subject were first Z-score normalized by subtracting

their mean and dividing by their SD. Further, it is known that the sleep pattern of healthy

adults progresses with several cycles throughout the night [63]. For example, REM and NREM

sleep alternate with 4-6 cycles of about 90-110 minutes with deep sleep usually dominating the

NREM periods during the first half of the night. This suggests that the autonomic physiological

response with its associated sleep stage is time-variant across the night for each subject. For

this reason, we were motivated to smooth each feature for each subject by means of a cubic

spline fitting method [84]. This is also expected to help reduce signal measurement noise and

variability within subjects for each sleep stage conveyed by the feature values. The latter can be

caused by body movements, conscious breathing control, internal physiological variations, or

other external factors such as changes in environmental noise and temperature during bedtime

sleep. Instead of other simpler low-pass filters, spline fitting was chosen since it can interpolate

feature values which could not be computed, for example due to motion artifacts (about 10%

observed in our data set), effectively allowing all epochs in each recording to be classified.
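The subject-wise Z-score normalization can be sketched as below. The helper name `zscore_per_subject`, the use of `None` for feature values that could not be computed, and the choice of the sample SD are assumptions made for this illustration. The two hypothetical subjects have different baselines and spreads for the same feature, yet map to the same normalized values.

```python
import math


def zscore_per_subject(values):
    """Z-score one feature of one subject, ignoring missing values (None)."""
    valid = [v for v in values if v is not None]
    mean = sum(valid) / len(valid)
    sd = math.sqrt(sum((v - mean) ** 2 for v in valid) / (len(valid) - 1))
    return [None if v is None else (v - mean) / sd for v in values]


# Two hypothetical subjects with different baselines for the same feature.
subject_a = [60.0, 62.0, None, 58.0, 60.0]
subject_b = [75.0, 78.0, 72.0, None, 75.0]
z_a = zscore_per_subject(subject_a)
z_b = zscore_per_subject(subject_b)
```

Missing values stay missing at this stage; filling them in is left to the subsequent smoothing step.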

Let $v = v_1, v_2, \ldots, v_n$ represent a sequence of feature values at their corresponding
time (or epoch) indices $t = t_1, t_2, \ldots, t_n$ (in units of 30 s); then the relation between
them can be modeled by

\[
v_i = h(t_i) + \varepsilon_i \quad (i = 1, 2, \ldots, n), \tag{10.1}
\]


where $h$ is a smoothing (spline) function and $\varepsilon_i$ are independent and identically
distributed residuals. The smoothing function can be estimated by minimizing the objective
function (i.e., the penalized sum of squares) such that

\[
\hat{h} = \arg\min_h \left[ \sum_{i=1}^{n} \left[ v_i - h(t_i) \right]^2 + \lambda \int_{t_1}^{t_n} h''(t)^2 \, dt \right], \tag{10.2}
\]

where λ is a smoothing parameter that controls the trade-off between residual and local vari-

ation. The smoothing function can be expressed by cubic B-splines as basis functions and

determined via least squares approximation [84, 296].

A specific overnight recording with a total of m epochs is divided into s contiguous
segments (s = ⌈m/n⌉), designated as smoothing splines. Each segment can then be modeled by
the spline function, yielding a general spline fit for the epochs over the entire recording.
Here n represents the smoothing window size, where a larger n translates to a smoother fitted
curve. In this work, a window size of nine epochs for modeling splines was experimentally
found to be appropriate for the task of sleep stage classification.
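To illustrate the trade-off in (10.2) and the interpolation of missing values, the sketch below uses a discrete analogue of the penalized objective: squared residuals plus λ times the squared second differences of the fitted values. This is a swapped-in simplification (a Whittaker-style smoother solved with plain Gaussian elimination), not the cubic B-spline fit with nine-epoch segments used in this chapter; the function name `smooth` and the parameter `lam` are hypothetical.

```python
def smooth(values, lam):
    """Discrete analogue of Eq. (10.2): minimize
    sum_i w_i (v_i - h_i)^2 + lam * sum_i (h_{i-1} - 2 h_i + h_{i+1})^2,
    where w_i = 0 for missing values (None), so they are interpolated."""
    n = len(values)
    w = [0.0 if x is None else 1.0 for x in values]
    v = [0.0 if x is None else float(x) for x in values]
    # Normal equations: (W + lam * D^T D) h = W v, D = second-difference operator.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = w[i]
    for r in range(n - 2):
        idx, coef = (r, r + 1, r + 2), (1.0, -2.0, 1.0)
        for p in range(3):
            for q in range(3):
                a[idx[p]][idx[q]] += lam * coef[p] * coef[q]
    b = [w[i] * v[i] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (b[r] - sum(a[r][c] * h[c] for c in range(r + 1, n))) / a[r][r]
    return h


# Made-up feature values on a straight line, with two missing epochs.
feature = [0.0, 1.0, None, 3.0, 4.0, 5.0, None, 7.0, 8.0]
fitted = smooth(feature, lam=1.0)
```

Because a straight line has zero second differences, the fit reproduces it and fills the two gaps; for noisy data a larger `lam` yields a smoother curve at the cost of larger residuals.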

10.2.3 Classifier

This work used a multi-class Bayesian linear discriminant with time-varying prior probabilities

[249], similar to that used in previous work [182]. For each epoch, the selected class (D, L, R,
or W) is the class $\omega_i$ that maximizes the posterior probability given a feature vector
$x$ [97],

\[
\omega_i(x) = \arg\max_i \left[ g_i(x) \right] \tag{10.3}
\]

with the discriminant function $g_i$ for each class given by

\[
g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(\omega_i, t) \tag{10.4}
\]

where µi is the average feature vector for class i, Σ is the pooled covariance matrix for all

classes, and P(ωi, t) is the prior probability for class i at time (since lights off) t. All parameters

are estimated during training.
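The decision rule in (10.3) and (10.4) can be sketched in a few lines. All numbers below (class means, inverse pooled covariance, and the priors at two times of night) are made up for illustration, and the inverse covariance is passed in directly rather than estimated. The example shows how the same feature vector can be assigned to different classes when only the time-varying prior changes.

```python
import math


def discriminant(x, mu, sigma_inv, prior):
    """g_i(x) = -1/2 (x - mu_i)^T Sigma^{-1} (x - mu_i) + ln P(omega_i, t)."""
    d = [xk - mk for xk, mk in zip(x, mu)]
    quad = sum(d[r] * sigma_inv[r][c] * d[c]
               for r in range(len(d)) for c in range(len(d)))
    return -0.5 * quad + math.log(prior)


def classify(x, means, sigma_inv, priors_at_t):
    """Return the class with the largest discriminant value."""
    return max(means, key=lambda c: discriminant(x, means[c], sigma_inv, priors_at_t[c]))


# Hypothetical two-feature example; all numbers are made up.
means = {"W": [1.5, 1.0], "R": [0.5, -0.5], "L": [0.0, 0.0], "D": [-1.0, -1.0]}
sigma_inv = [[1.0, 0.0], [0.0, 1.0]]   # inverse of the pooled covariance matrix
priors_early = {"W": 0.1, "R": 0.05, "L": 0.45, "D": 0.4}   # early in the night
priors_late = {"W": 0.1, "R": 0.5, "L": 0.35, "D": 0.05}    # late in the night

x = [0.3, -0.3]
early = classify(x, means, sigma_inv, priors_early)
late = classify(x, means, sigma_inv, priors_late)
```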

10.2.4 Feature selection

To select the final list of features we used a wrapper feature selection method based on sequen-

tial forward selection (SFS) [306] using as criterion the Cohen’s Kappa coefficient of agreement

κ [72] on the training set. This measure of agreement between the classification predictions and

the ground-truth annotations is more adequate than traditional measures of accuracy for this

problem since there is a strong imbalance between classes (L epochs, for example, account for

more than 50% of all epochs in the data set) and this coefficient factors out chance agreement,

compensating for class imbalance.
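The effect of class imbalance on accuracy versus Kappa can be seen in a small made-up example: a degenerate classifier that labels every epoch as light sleep still reaches an accuracy equal to the share of L epochs, while its Kappa is zero. The function below uses the standard definition κ = (p_o − p_e) / (1 − p_e), with p_o the observed agreement and p_e the agreement expected by chance; the data are hypothetical.

```python
from collections import Counter


def cohen_kappa(truth, pred):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(truth)
    p_o = sum(t == p for t, p in zip(truth, pred)) / n
    t_count, p_count = Counter(truth), Counter(pred)
    # Chance agreement from the marginal class frequencies.
    p_e = sum(t_count[c] * p_count[c] for c in t_count) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Made-up night dominated by light sleep, and a classifier that always says "L".
truth = ["L"] * 6 + ["D"] * 2 + ["R"] * 1 + ["W"] * 1
all_light = ["L"] * 10
accuracy = sum(t == p for t, p in zip(truth, all_light)) / len(truth)
kappa = cohen_kappa(truth, all_light)
```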

In many machine learning studies, supervised feature selection is applied to the entire
data set, even if the training and validation sets are kept separate (for example using
cross-validation). This common pitfall is known to introduce a bias in the evaluation of a classifier’s


performance, which will often be overestimated [278]. Although keeping a hold-out set for

validation would solve this problem, the limited size of the data set would either mean that

the model learning would be based on potentially insufficient examples, or that the classi-

fier would be evaluated on a very small sample, potentially unrepresentative of the problem

at hand. Instead, feature selection was executed by strictly separating the training and
validation sets in an iterative scheme akin to cross-validation, according to the following
procedure:

1. Randomly divide all subjects in the data set amongst ten folds of the same size

2. For each fold-iteration i = 1, ..,10,

(a) Hold out fold i as a validation set and combine the subjects in the remaining folds

to form a training set

(b) Perform an iterative SFS procedure for the total number of available features N =

142, i.e., for each SFS-iteration j = 1, ..,N,

i. Select which feature $f_{i_j}$ should be added to the set of features selected in the
previous iteration (an empty set when $j = 1$),

\[
f_{i_j} = \arg\max_k \left( \kappa_{i_k} \right) \quad \forall k : f_{i_k} \notin F_{i_{j-1}} \tag{10.5}
\]

where $\kappa_{i_k}$ is the Kappa coefficient of agreement obtained after training and
classification on the training set using the set of features

\[
F_{i_{j-1}} \cup f_{i_k} = \left\{ f_{i_1}, \ldots, f_{i_{j-1}}, f_{i_k} \right\} \tag{10.6}
\]

and $F_{i_{j-1}}$ is the set of features selected in the previous iteration of SFS.

ii. Store the set of features selected up to this iteration,

\[
F_{i_j} = \left\{ f_{i_1}, \ldots, f_{i_{j-1}}, f_{i_j} \right\} \tag{10.7}
\]

and the Kappa coefficient obtained with that set of features, $\kappa_{i_j}$.

After the sets of features for all fold- and SFS-iterations and corresponding Kappa coefficients

are computed, the final consolidated list is obtained:

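1. For a varying number of features $j = 1, \ldots, N$, calculate the average Kappa across the
ten iterations, $\bar{\kappa}_j$,

\[
\bar{\kappa} = \left\{ \bar{\kappa}_j : \forall j = 1, \ldots, N \right\} \tag{10.8}
\]

with

\[
\bar{\kappa}_j = \frac{\sum_{i=1}^{10} \kappa_{i_j}}{10} \tag{10.9}
\]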

2. Calculate the smallest number of features S that yields a certain percentage P of the
maximum average Kappa such that the Kappa values per fold-iteration are not significantly
different from those which give the maximum average Kappa,

\[
S = \left\{ j \mid \forall k \neq j : \bar{\kappa}_j \geq P \cdot \max(\bar{\kappa}) \wedge \bar{\kappa}_k \geq \bar{\kappa}_j \right\} \tag{10.10}
\]


3. For each feature $l$, count the number of iterations in which that feature is selected,
$fc_l$, when limiting the set of features on each iteration to S,

\[
fc_l = \sum_{i=1}^{10} fc_{il} \tag{10.11}
\]

where $fc_{il}$ indicates whether feature $l$ is present in the set of selected features for
fold-iteration $i$ and SFS-iteration S,

\[
fc_{il} =
\begin{cases}
1, & f_l \in F_{i_S} \\
0, & \text{otherwise}
\end{cases} \tag{10.12}
\]

4. Pick the S features with the largest feature count $fc_l$ to assemble the final set of
consolidated features, $F_S$.
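The two stages above, per-fold SFS followed by consolidation across folds, can be sketched as follows. This is a toy illustration under stated assumptions: `score` stands in for "train and classify on the training set of fold i and return Kappa", the per-fold weights are made up, only three folds and four features are used, and the choice of S (step 2, which involves a significance test) is not implemented but simply passed in.

```python
from collections import Counter


def sfs(n_features, score):
    """Greedy sequential forward selection.

    score(subset) returns the training Kappa for a list of feature indices.
    Returns the features in order of selection and the Kappa after each step.
    """
    selected, kappas = [], []
    remaining = set(range(n_features))
    while remaining:
        # Add the feature that maximizes the criterion given those already selected.
        best = max(sorted(remaining), key=lambda k: score(selected + [k]))
        selected.append(best)
        remaining.remove(best)
        kappas.append(score(selected))
    return selected, kappas


def consolidate(orders, s):
    """Pick the s features selected most often among the first s of each fold."""
    counts = Counter(f for order in orders for f in order[:s])
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:s]


# Toy stand-in for the training Kappa: each feature has a hypothetical
# usefulness that differs slightly per fold, with a penalty on subset size.
def make_score(weights):
    return lambda subset: sum(weights[k] for k in subset) / (1 + 0.1 * len(subset))


fold_weights = [
    [0.9, 0.1, 0.5, 0.05],
    [0.8, 0.2, 0.6, 0.05],
    [0.7, 0.6, 0.5, 0.05],
]
orders = [sfs(4, make_score(w))[0] for w in fold_weights]
final_features = consolidate(orders, s=2)
```

Because the validation fold never enters `score`, the consolidated list can later be evaluated on the same folds without the selection step having seen the held-out subjects of each iteration.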

The discriminative power of selected features was evaluated with the absolute standardized

mean distance (ASMD) between the feature values of two classes, computed as

\[
\text{ASMD} = \left| \frac{\bar{x}_1 - \bar{x}_2}{\sigma} \right| \tag{10.13}
\]

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means for class 1 and 2 and $\sigma$ is the
pooled sample SD.
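A minimal sketch of the ASMD computation is given below. The pooled SD is taken here as the square root of the degrees-of-freedom-weighted average of the two sample variances, which is a common definition but an assumption with respect to this chapter; the feature values are made up.

```python
import math
from statistics import mean, variance


def asmd(class_1, class_2):
    """Absolute standardized mean distance between two classes of feature values."""
    n1, n2 = len(class_1), len(class_2)
    # Pooled sample variance from the two sample variances (assumed definition).
    pooled_var = ((n1 - 1) * variance(class_1) + (n2 - 1) * variance(class_2)) / (n1 + n2 - 2)
    return abs(mean(class_1) - mean(class_2)) / math.sqrt(pooled_var)


deep = [-1.2, -0.8, -1.0, -1.1, -0.9]   # made-up normalized feature values
rem = [0.9, 1.1, 1.0, 1.2, 0.8]
distance = asmd(deep, rem)
```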

10.2.5 Validation and evaluation

After feature selection was performed and the set of features $F_S$ was chosen, the
classification results per subject were evaluated in a ten-fold cross-validation procedure with
the same folds as in the feature selection procedure:

1. For each iteration i = 1, ..,10,

(a) Hold out fold i as a validation set and combine the subjects in the remaining folds

to form a training set

(b) Restrict the feature set in the training set to the set FS

(c) Train an LD classifier with the training data

(d) For each subject in the validation set

i. Use the classifier trained in this iteration to classify each epoch of the current

subject

ii. Calculate the Kappa coefficient of agreement between the classification results

and the ground-truth annotations for this subject.


After computing the Kappa coefficient for all subjects in the data set, the average and pooled
performance were calculated.

As mentioned, the (Cohen’s) Kappa coefficient κ is an adequate and well-accepted metric

for evaluating the agreement between sleep technician and computer-based classification since

it compensates for the random agreement that can occur due to class imbalance. In addition

to the Kappa coefficient, we also computed the traditional metric overall accuracy, i.e., the

percentage of correctly identified epochs. For these two metrics, the results are computed both

after pooling the predictions over all epochs of all subjects and after averaging the performance

for each subject.
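The difference between the pooled and the per-subject averaged Kappa can be illustrated with a short sketch. The `cohen_kappa` function below follows the standard definition of Cohen's Kappa (observed agreement corrected for the agreement expected by chance); the two toy subjects and their label sequences are invented for the example.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa between two label sequences of equal length."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e equals 1


# Toy reference and predicted hypnograms for two subjects (illustrative only).
subjects = {
    "s1": (list("WWLLLLRR"), list("WLLLLLRR")),
    "s2": (list("LLLLDDRW"), list("LLLDDDRL")),
}

# Mean Kappa: compute per subject, then average.
per_subject = [cohen_kappa(t, p) for t, p in subjects.values()]
mean_kappa = sum(per_subject) / len(per_subject)

# Pooled Kappa: concatenate all epochs of all subjects first.
pooled_true = [c for t, _ in subjects.values() for c in t]
pooled_pred = [c for _, p in subjects.values() for c in p]
pooled_kappa = cohen_kappa(pooled_true, pooled_pred)

print(mean_kappa, pooled_kappa)
```

The two values generally differ because pooling weights each subject by the number of epochs and uses the class distribution of the whole data set when estimating chance agreement.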

10.3 Results and discussion

10.3.1 Feature selection

Figure 10.1 indicates the Kappa coefficient obtained for each training set, for a varying number of features. As illustrated, the maximum average training performance is obtained with 105 features, with an average Kappa of 0.58. The figure also shows a plateau in performance between 70 and 100 features. This suggests that the number of features can be greatly decreased without affecting the training performance. A small feature set is often desirable to prevent over-fitting to the training data, as long as it is not so small that the model cannot learn the characteristics of the problem.

Figure 10.2 illustrates the decrease in average training performance associated with a decrease in the number of features when choosing different operating points in Figure 10.1 (expressed in the scatter plot as percentages of the maximum training performance). As can be seen in the figure, by allowing a reduction of 0.5% in the training performance, the number of features can be reduced by 16.2%, to a total of 88 features, without a statistically significant decrease in performance. Allowing a further decrease of 0.5%, the number of features is reduced by 23.8%, to a total of 80 features, also without a statistically significant decrease in performance. Beyond this point the performance reduction is significant, and further reducing the number of features will likely lead to a decrease in classification performance after cross-validation. Taking as criterion the smallest number of features that does not significantly decrease the training performance, a total of S = 80 features was chosen.
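The selection rule and the quoted percentages can be checked with a short sketch. The operating points are those labelled in Figure 10.2; treating a single asterisk (p < 0.05) as a significant decrease and the variable names are assumptions made for the example.

```python
# (number of features, % of maximum training performance, significant decrease?)
# Values as labelled in Figure 10.2; significance taken at p < 0.05.
points = [
    (58, 95.0, True), (60, 95.5, True), (61, 96.0, True), (64, 96.5, True),
    (66, 97.0, True), (68, 97.5, True), (71, 98.0, True), (75, 98.5, True),
    (80, 99.0, False), (88, 99.5, False), (105, 100.0, False),
]

n_max = 105  # number of features giving the maximum training performance

# Smallest feature set whose performance is not significantly lower.
S = min(n for n, _, significant in points if not significant)


def reduction_pct(n):
    """Reduction in the number of features relative to n_max, in percent."""
    return 100 * (n_max - n) / n_max


print(S, round(reduction_pct(S), 1), round(reduction_pct(88), 1))
```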

Figure 10.3 illustrates the feature count given by (10.11) for each of the 142 features, using

S = 80. A total of 14 features were selected in all 10 iterations of the selection process, while

95 features were selected in at least 50% of the iterations. This means that after ranking the

features by their feature count and selecting the 80 features with the highest count, all features

in the final list of selected features were selected in at least 5 of the 10 iterations (with a mean

count of 7.67). This illustrates the robustness of the modified SFS method described earlier:

despite their simplicity and computational efficiency, sequential selection algorithms are known

to suffer from a so-called ‘nesting effect’, potentially leading to sub-optimal feature sets [238].

By iteratively performing several unbound SFS searches on different training sets and keeping

only the features that are selected most often, this effect should be reduced, as attested by the


[Figure 10.1 plot. Horizontal axis: number of features (-), 0 to 150. Vertical axis: training performance (κ), 0.2 to 0.7. Legend: average performance across all folds; maximum performance, marked at (105, 0.58).]

Figure 10.1: Training performance per fold and average training performance. The maximum performance is indicated with a marker.

[Figure 10.2 plot. Horizontal axis: reduction in number of features (%), 0 to 45. Vertical axis: reduction in training performance (%), 0 to 5. Labelled points, given as (number of features) percentage of the maximum training performance: (105) 100.0%, (88) 99.5%, (80) 99.0%, (75) 98.5%*, (71) 98.0%*, (68) 97.5%*, (66) 97.0%**, (64) 96.5%**, (61) 96.0%**, (60) 95.5%**, (58) 95.0%**.]

Figure 10.2: Reduction in training performance caused by a reduction in the number of features. For each point, the number of features (in parentheses) and the corresponding percentage of the maximum training performance are indicated. Significance of difference between performance with and without feature reduction was tested with a Wilcoxon signed-rank test (∗p < 0.05, ∗∗p < 0.01).

large average number of iterations each feature in the final set was selected.

For brevity, only the 14 features selected in all iterations will be discussed further. Table 10.1 indicates the discriminative power of each feature using the pooled ASMD, computed for each pair of classes after aggregating the feature values of all subjects, together with the 90th percentile of the ASMD values (in parentheses) obtained per feature for the individual subjects. Pooled ASMD values below 0.5 and 90th percentile ASMD values below 1 were omitted.

The top features are clearly discriminative for different pairs of classes which helps explain

the relatively large number of features selected. Additionally, it is interesting to observe that

there is one feature (median likelihood ratio) which does not have a pooled ASMD above 0.5 for

any class pair. However, its 90th percentile ASMD value is larger than 1 for the pairs D/W and

L/W. This is a good example of a feature which is discriminative for only a subset of the subjects


[Figure 10.3 plot. Horizontal axis: feature index (-). Vertical axis: feature count (-), 0 to 10.]

Figure 10.3: Feature count indicating, per feature, in how many iterations it was selected when the number of features was limited to S = 80.

(at least 10%) but not for all subjects. The fact that it was selected in every single iteration

using the wrapper method described in Section 10.2.4 suggests that it is complementary to

other chosen features for certain subjects, helping raise the overall training performance.

10.3.2 Cross-validation

Table 10.2 indicates the classification performance obtained after 10-fold cross-validation using

the selected set of 80 features. In addition, it indicates the performance per class, obtained by

considering each class as the positive class and merging the remaining in a single negative class.

The highest performance is obtained for R detection, followed by W. The lowest performance

is obtained for L. This is further confirmed by the confusion matrix of Table 10.3 which shows

that the largest proportion of errors occurs when trying to distinguish L from the other classes.

For all confusions between classes other than L, the percentage of misclassified epochs (relative to the total number of epochs) is below 1%.

In order to evaluate the performance of the classifier in a three-class task (WRN), classes D and L were merged into a single N (non-REM) class. Table 10.2 indicates the resulting performance. The classification performance rises substantially, to a Kappa of 0.56 and an accuracy of 80%. This was expected since a large number of classification errors occurred between D and L, and in a WRN task these two classes no longer need to be distinguished.
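The effect of merging D and L can be reproduced from the pooled confusion matrix of Table 10.3. The sketch below computes the pooled accuracy and Kappa for the four-class matrix and for the three-class matrix obtained by summing the D and L rows and columns; the function names `accuracy_and_kappa` and `merge_classes` are illustrative.

```python
labels = ["D", "L", "R", "W"]
# Rows: predicted class, columns: reference class (counts from Table 10.3).
cm = [
    [3431, 1949, 5, 97],
    [2969, 19165, 2947, 2302],
    [86, 2071, 5383, 404],
    [31, 952, 243, 2996],
]


def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix."""
    n = sum(map(sum, cm))
    p_o = sum(cm[i][i] for i in range(len(cm))) / n
    rows = [sum(r) for r in cm]
    cols = [sum(c) for c in zip(*cm)]
    p_e = sum(r * c for r, c in zip(rows, cols)) / n ** 2
    return p_o, (p_o - p_e) / (1 - p_e)


def merge_classes(cm, labels, groups):
    """Sum rows and columns of classes that map to the same merged label."""
    new_labels = sorted(set(groups.values()))
    idx = {lab: i for i, lab in enumerate(new_labels)}
    merged = [[0] * len(new_labels) for _ in new_labels]
    for i, li in enumerate(labels):
        for j, lj in enumerate(labels):
            merged[idx[groups[li]]][idx[groups[lj]]] += cm[i][j]
    return merged, new_labels


acc4, kappa4 = accuracy_and_kappa(cm)
cm3, labels3 = merge_classes(cm, labels, {"D": "N", "L": "N", "R": "R", "W": "W"})
acc3, kappa3 = accuracy_and_kappa(cm3)
print(round(acc4, 2), round(kappa4, 2), round(acc3, 2), round(kappa3, 2))
```

Rounded to two decimals, the results agree with the pooled WRLD and WRN values in Table 10.2.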

To evaluate whether the procedure used to determine the number of features during feature selection was adequate, we repeated the whole feature selection procedure from (10.11) onwards to select feature sets of different sizes, repeated the cross-validation with the corresponding feature sets, and plotted the resulting average classification performance (Figure 10.4).

As can be seen, the maximum cross-validation performance (κ = 0.50) is obtained with 76 features, only 5.3% fewer features than the 80 features chosen by the feature selection procedure,


Table 10.1: Pooled and individual 90th percentile ASMD values for features selected in all

iterations

Feature D/L D/R D/W L/R L/W R/W

Respiratory features:

VLF spectral power 0.56 1.02 0.86 0.68

(1.34) (1.69) (1.52) (1.26)

LF/HF spectral power ratio 0.56 0.85 0.95 0.70

(1.36) (1.62) (1.10) (1.62) (1.30)

Frequency SD over 270 s 0.79 1.46 1.41 0.84 0.97

(1.20) (1.82) (1.87) (1.38) (1.67) (1.09)

Mean breath-by-breath correlation 0.59 1.03 0.82

(1.27) (1.78) (1.76) (1.61) (1.51) (1.46)

Sample entropy regularity 0.71 0.61 0.55 0.86

(1.67) (1.53) (1.41) (1.49) (1.65)

DTW self-dissimilarity 0.59 0.86 0.86 0.56

(1.62) (1.58) (1.68) (1.39)

Standardized mean of troughs 0.82 1.21 0.97 0.56

(1.41) (1.83) (1.85) (1.19) (1.18) (1.34)

DTW peak-to-trough similarity 0.55

(1.06) (1.34) (1.04) (1.38)

Uniform scaling self-dissimilarity 0.92 1.46 1.16 0.85 0.56

(1.47) (1.87) (1.85) (1.50) (1.47) (1.22)

Cardiac (HRV) features:

Mean likelihood ratio 0.86

(1.50) (1.60) (1.09) (1.46) (1.23)

Median likelihood ratio

(1.19) (1.23)

Adapted LF spectral power 0.65 0.88 0.70

(1.47) (1.70) (1.59) (1.09) (1.15) (1.14)

Assortativity coefficient in VG 0.53

(1.34) (1.11) (1.22) (1.44)

Number small-degree nodes in VG 0.59

(1.05) (1.39) (1.32) (1.14) (1.13) (1.24)

The features are described in Section 10.2.4. The pooled ASMD was computed for each pair of classes after aggregating the feature values for all subjects (values below 0.5 were omitted). The 90th ASMD percentiles (in parentheses) were obtained after computing the ASMD of each feature for each subject (values below 1 were omitted).

but 38.2% less than the 105 features that give the maximum training performance. Furthermore,

the performance obtained with 80 features (κ = 0.49) is actually slightly larger than the perfor-

mance obtained with 105 features (κ = 0.48), confirming that the feature reduction procedure


Table 10.2: Cross-validation performance for 3 and 4 classes

Task/class   Pooled Kappa   Pooled Acc.   Mean Kappa   Mean Acc.
WRLD         0.49           0.69          0.49±0.13    0.69±0.08
D            0.51           0.89          0.50±0.17    0.89±0.04
L            0.40           0.71          0.41±0.14    0.71±0.07
R            0.57           0.87          0.58±0.19    0.87±0.08
W            0.54           0.91          0.51±0.18    0.91±0.04
WRN          0.56           0.80          0.56±0.15    0.80±0.08

The pooled performance was computed after aggregating all epochs of all subjects. The mean and SD were calculated based on the performance for each individual subject.

Table 10.3: Confusion matrix after cross-validation

Pred.↓ Ref.→   D              L                R               W
D              3431 (7.6%)    1949 (4.3%)      5 (0.0%)        97 (0.2%)
L              2969 (6.6%)    19165 (42.6%)    2947 (6.5%)     2302 (5.1%)
R              86 (0.2%)      2071 (4.6%)      5383 (12.0%)    404 (0.9%)
W              31 (0.1%)      952 (2.1%)       243 (0.5%)      2996 (6.7%)

[Figure 10.4 plot. Horizontal axis: number of features (-), 0 to 140. Vertical axis: cross-validation performance (κ), 0.2 to 0.6. Legend: maximum performance, (76, 0.50); performance using feature selection, (80, 0.49); using features that give maximum training performance, (105, 0.48); average performance for all subjects; standard deviation of the performance for all subjects.]

Figure 10.4: Performance after cross-validation for a varying number of features, with markers indicating the maximum performance, the performance with the number of features resulting from the feature selection procedure, and the performance with the number of features that gives the best training performance.

is beneficial to reduce over-fitting.

Figure 10.5 illustrates three examples of predicted hypnograms, as compared with the refer-

ence, for three subjects in the data set: the subject with the worst performance (with κ = 0.17),

with the median performance (with κ = 0.50) and with the best performance (with κ = 0.69).

A possible explanation for the poor performance obtained for the worst subject is that the model


[Figure 10.5 plots. Three panels, each showing the PSG reference hypnogram and the predicted hypnogram with stages D, L, R, and W against time since lights off. Panel titles: κ = 0.17, κ = 0.50, and κ = 0.68.]

Figure 10.5: Example of sleep stage reference (top) and predictions (bottom) for the subject with the worst performance (left), with the median performance (middle) and with the best performance (right).

trained with the characteristics of the general sample population does not fully capture this sub-

ject’s cardiac and respiratory expression of different sleep stages. However, despite the low

Kappa coefficient, the predicted hypnogram still exhibits some correct features, namely, most

REM intervals were detected, albeit with the incorrect length, and the two deep sleep periods

were also detected. As the performance improves, we see that the predicted hypnograms match

better the characteristics of the reference hypnogram, and in the best case the most obvious

mistakes are in the missed detection of brief periods of wake during the night while the rest of

the sleep stages are correctly predicted. This is likely caused by the use of spline smoothing

during feature post-processing, which is adequate to capture the slow-changing characteristics

of most sleep stages, but penalizes short, abrupt changes such as brief periods of awakening.

10.3.3 Comparison with state-of-the-art

In the literature, only a few studies have focused on WRLD classification based on cardiac and/or respiratory signals, and our results are amongst the best performing. The first observation is that the results of our previous work [185] (κ = 0.41 and accuracy = 0.65), which used only respiratory features, are worse than those produced in the present work, indicating that combining cardiac and respiratory activity can lead to an improved classification performance. Isa et al. [138] reported a Kappa coefficient of 0.26 (with an accuracy of 0.60) using only cardiac features. The study of Hedner et al. [127] achieved results similar to ours (κ = 0.48 and accuracy = 0.66) but used more signal modalities, including peripheral arterial tone, actigraphy, and pulse oximetry. The recent study by Willemen et al. [309] also achieved a good performance, with a κ of 0.56 and an accuracy of 0.69, although it was validated with a younger sample population (age 22.1 ± 3.2 y), excluded 12% of the epochs from validation, and used 60-s epochs instead of the standard scoring basis of 30 s, which makes the results not directly comparable.

For WRN classification with cardiac and/or respiratory activity, we see that, to the best of

our knowledge, our results also outperform those reported in almost all of the previous studies,

such as κ = 0.32 and accuracy = 0.67 [248], κ = 0.45 and accuracy = 0.76 [249], κ = 0.42

and accuracy = 0.72 [198], κ = 0.44 and accuracy = 0.79 [161], κ = 0.55 and accuracy = 0.77

[200], κ = 0.48 and accuracy = 0.78 [167], κ = 0.46 and accuracy = 0.73 [312], κ = 0.62 and

accuracy = 0.81 [309], and κ = 0.58 and accuracy = 0.78 [94]. In comparison with one of the

best performing studies [94], we obtain a higher accuracy (albeit a slightly smaller Kappa) but

require one less modality (actigraphy). Regarding the work of Willemen et al. [309], it is again important to note that the results in that study were obtained on the basis of 60-s epochs.

10.4 Conclusion

This chapter presents a method to identify overnight sleep stages using cardiorespiratory features extracted from ECG and RIP signals. These features were post-processed by means of subject-specific Z-score normalization and spline smoothing, which help reduce the influence of signal noise and of between- and within-subject variability in autonomic physiology. Eighty features were selected from a set of 142 features using a modified SFS-based feature selector designed to avoid biasing validation performance. Using a linear discriminant classifier in a ten-fold cross-validation procedure, the classification results achieved in this work (for both the four-class WRLD and three-class WRN classification tasks) outperform those of most previous studies.

CHAPTER 11

General discussion and future perspectives


11.1 Analysis of features

As mentioned in Chapter 1, one of the aims of this thesis, in pursuit of improved sleep stage classification, was to extract new features that capture cardiorespiratory characteristics in addition to the existing features or that are robust to the variability between or within subjects. Table 11.1 lists all 142 features, including 86 cardiac, 44 respiratory, and 12 cardiorespiratory interaction (CRI) features, used for sleep stage classification in the previous chapters. Among those features, 53 features (15 cardiac, 27 respiratory, and 11 CRI features) were newly proposed in this thesis. The feasibility of these new features in enhancing sleep stage classification has been demonstrated in previous chapters. The spectral features with adaptive boundaries (C72-C75) were presented in Chapter 2, and two novel self-similarity respiratory features measured by means of dynamic time and frequency warping (R16 and R17) were presented in Chapter 3. These features were shown to help identify sleep/wake states, especially when actigraphy was absent. A different similarity metric, uniform scaling, was used to extract a respiratory self-dissimilarity feature (R33) as described in Chapter 5, which was beneficial for classifying sleep stages, in particular for distinguishing deep sleep or slow wave sleep (SWS) from the other stages. In Chapter 4, a set of respiratory features (R18-R32) was derived from the respiratory signal envelopes and the areas under the curves, expressing breathing depth and volume, respectively. These features were superior in identifying deeper sleep stages. Chapter 6 discussed a visibility graph (VG) model that can potentially be used to extract novel features reflecting CRI properties in complex networks. In Chapter 10, besides the VG-based CRI features (X2-X12), the model also allowed extracting VG-based features from heartbeat intervals (C76-C86) and breath-to-breath intervals (R34-R44). Some of them were automatically selected by the feature selection procedure described in that chapter, contributing to better sleep stage classification results.

To provide a more detailed comparison between the new features and the existing features, Figure 11.1 illustrates their discriminative power in separating each pair of classes (sleep stages), namely wake versus REM sleep (W/R), wake versus light sleep (W/L), wake versus deep sleep (W/D), REM sleep versus light sleep (R/L), REM sleep versus deep sleep (R/D), and light sleep versus deep sleep (L/D). The discriminative power was measured by the absolute standardized mean distance (ASMD), computed by pooling over all 30-s epochs. Note that the feature values were post-processed per night with Z-score normalization and spline smoothing as described previously. The data set used here was the same as that used in Chapters 4, 5, and 10, comprising single-night polysomnographic (PSG) recordings acquired from 48 healthy subjects with normal overnight sleep architectures. The figure shows that many new features proposed in this thesis exhibited a relatively high discriminative power, indicating that they were effective in helping to classify sleep stages. Further, we observe that the feature ranking is not consistent across the different pairs of classes. For example, the cardiorespiratory interaction VG-based features X2-X12 generally performed better in identifying wake epochs when compared with the other new features, while the respiratory amplitude features R18-R32 and the dissimilarity feature R33 ranked higher for discriminating deep sleep from the other sleep stages. This motivates us to investigate specific features for identifying different sleep


Table 11.1: A list of features used in this thesis

Feature index Description References

Existing cardiac features

C1-C3 Mean HR, mean RR, and detrended mean RR [248, 288]

C4-C8 SDNN, RR range, pNN50, RMSSD, and SDSD [288]

C9-C12 RR logarithmic VLF, LF, and HF power and LF-to-HF ratio [59, 288]

C13-C16 RR mean resp frequency and power, max phase and module in HF pole [197]

C17-C36 RR SampEn regularity at length 1 scale 1-10 and length 2 scale 1-10 [75]

C37-C42 RR DFA, its short, long exponents and all scales, PDFA, and WDFA [148, 224, 289]

C43-C46 Mean absolute difference in HR and RR and in detrended HR and RR [308, 315]

C47-C56 RR and HR percentiles (10%, 25%, 50%, 75%, and 90%) [308, 315]

C57-C66 Detrended RR and HR percentiles (10%, 25%, 50%, 75%, and 90%) [308, 315]

C67-C70 Mean, median, minimum, and maximum of RR likelihood ratios [32]

C71 RR SampEn regularity in symbolic binary changes [78]

⊲ New cardiac features

C72-C75 Adaptive logarithmic RR VLF, LF, and HF power and LF-to-HF ratio Chapter 2

C76-C81 Node degree mean, SD, and slope in VG and in DVG of RR Chapter 6, 10

C82-C85 Number of small- and large-degree nodes in VG and in DVG of RR Chapter 6, 10

C86 Assortativity mixing coefficient in VG of RR Chapter 6, 10

Existing respiratory (resp) features

R1-R3 Resp frequency in time and frequency domain and its spectral power [248, 249]

R4-R7 Resp logarithmic VLF, LF, and HF power and LF-to-HF ratio [248]

R8-R10 Resp frequency SD over 150, 210, and 270 s [249]

R11-R13 Mean and SD of breath-by-breath correlations, SD of breath lengths [248]

R14-R15 Resp SampEn regularity and resp variance [249, 250]

⊲ New respiratory features

R16,R17 Resp dynamic time and frequency warping self-(dis)similarity Chapter 3

R18-R21 Standardized mean and median of resp peaks and troughs Chapter 4

R22,R23 SampEn regularity of resp peaks and troughs Chapter 4

R24,R25 Median peak-to-trough difference and dynamic time warping similarity Chapter 4

R26-R31 Median volume and flow rate of breaths, inhalations, and exhalations Chapter 4

R32 Ratio of inhalation-to-exhalation flow rate Chapter 4

R33 Resp uniform scaling self-dissimilarity Chapter 5

R34-R39 Node degree mean, SD, and slope in VG and in DVG of BB Chapter 6, 10

R40-R43 Number of small- and large-degree nodes in VG and in DVG of BB Chapter 6, 10

R44 Assortativity mixing coefficient in VG of BB Chapter 6, 10

Existing cardiorespiratory interaction (CRI) features

X1 Co-power between RR and resp [137]

⊲ New cardiorespiratory interaction (CRI) features

X2-X7 Node degree mean, SD, and slope in VG and in DVG of CRI Chapter 6, 10

X8-X11 Number of small- and large-degree nodes in VG and in DVG of CRI Chapter 6, 10

X12 Assortativity mixing coefficient in VG of CRI Chapter 6, 10

HR, heart rate; RR, heartbeat interval; SD, standard deviation; SDNN, SD of RR; RR range, maximal-

to-minimal RR difference; pNN50, percentage of successive RR differences >50 ms; RMSSD, root mean

square of successive RR differences; SDSD, SD of successive RR differences; VLF, very low frequency;

LF, low frequency; HF, high frequency; SampEn, sample entropy; DFA, detrended fluctuation analysis;

PDFA, progressive DFA; WDFA, windowed DFA; VG, visibility graph; DVG, difference VG; BB, breath-

to-breath interval.


[Figure 11.1 plots. Six panels (W/R, W/L, W/D, R/L, R/D, and L/D), each showing the ASMD against feature index, with the features grouped into existing features (C1-C71, R1-R15, X1) and new features (C72-C86, R16-R44, X2-X12).]

Figure 11.1: Discriminative power as measured by ASMD of all the 142 features with post-processing (Z-score and/or spline smoothing) in separating each pair of sleep stages.

stages to further improve the classification performance in future work.

When classifying multiple sleep stages simultaneously [wake, REM sleep, light sleep, and deep sleep (W/R/L/D), or wake, REM sleep, and NREM sleep (W/R/N)], the feature discriminative power can be quantified by the one-way analysis of variance (ANOVA) F-statistic (Figure 11.2). It also indicates that several of the new features outperformed many existing features. Chapters 4, 5, 8, and 10 have shown that post-processing with (subject- or night-specific) Z-score normalization and spline smoothing can improve the features and thereby enhance the sleep stage classification performance. This is because these methods help reduce either the between- or the within-subject effect to a certain extent. Additionally, the use of spline smoothing instead of other low-pass filtering methods also helps interpolate the missing feature values, which constituted an average of about 10% of the total number of epochs per night for some features. These missing values were possibly caused by, e.g., an insufficient number of detected heartbeats or respiratory peaks/troughs due to the presence of body motion artifacts. Figure 11.2 also illustrates the discriminative power (ANOVA F-statistic) for all features (1) without post-processing, (2) with Z-score normalization, and (3) with both Z-score normalization and spline smoothing. It shows that both methods can yield a clear increase in the ANOVA F-statistic for most of the features. However, we also see that, for some features, the smoothing resulted in a decreased discriminative power. For example, the feature respiratory dynamic time warping self-(dis)similarity (R16) was regarded as an indicator of body

[Figure 11.2 plots. Two panels (W/R/L/D and W/R/N), each showing the ANOVA F-statistic against feature index (groups C1-C71, C72-C86, R1-R15, R16-R44, X1, and X2-X12) for three conditions: without post-processing, with Z-score, and with Z-score and smoothing.]

Figure 11.2: Discriminative power as measured by ANOVA F-statistic of all the 142 features with and without post-processing (Z-score and/or spline smoothing) for W/R/L/D and W/R/N separation.

movements that has been successfully used to identify wake epochs. Its feature values had a

strongly skewed distribution, where the body movement information was usually reflected by

the high-frequency components in the spectral domain. Smoothing this feature would filter out

the useful body movement information, leading to a deteriorated discriminative power. There-

fore, we think that the post-processing methods should be ‘feature-dependent’. In other words,

it is worthwhile to investigate a criterion that can be used to determine if a feature needs to be

post-processed or not. For example, this criterion can be linked to the distribution of a specific

feature.
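How per-night Z-score normalization can raise the ANOVA F-statistic of a feature is illustrated by the sketch below. It uses the standard one-way ANOVA F-statistic (between-class mean square divided by within-class mean square) and two invented nights whose feature values follow the same pattern across stages but differ by a constant offset. The use of the population SD in `zscore`, the function names, and the toy values are assumptions; spline smoothing is not included.

```python
from statistics import mean, pstdev


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of feature values."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def zscore(values):
    """Z-score normalization of one night (population SD, an assumption)."""
    m, s = mean(values), pstdev(values)  # fails if all values are identical
    return [(v - m) / s for v in values]


# Two toy nights with the same stage pattern but a between-subject offset.
nights = {
    "A": ([1.0, 2.0, 5.0, 6.0], ["L", "L", "W", "W"]),
    "B": ([11.0, 12.0, 15.0, 16.0], ["L", "L", "W", "W"]),
}


def group_by_stage(nights, transform=lambda v: v):
    """Pool feature values over nights, grouped by sleep stage."""
    groups = {}
    for values, stages in nights.values():
        for v, s in zip(transform(values), stages):
            groups.setdefault(s, []).append(v)
    return list(groups.values())


f_raw = anova_f(group_by_stage(nights))
f_z = anova_f(group_by_stage(nights, zscore))
print(f_raw, f_z)
```

In this toy case the offset between the two nights dominates the within-class variance of the raw values, so removing it per night increases the F-statistic considerably.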

It is likely that features with a high discriminative power are significantly correlated with one another and thus share mutual information for sleep stage classification. To give a general view of feature-to-feature correlations, Figure 11.3 plots the Spearman’s rank correlation coefficients between all the 142 features. The features respiratory frequency SD over 150, 210, and 270 s (R8-R10)


[Figure 11.3 plot. Matrix of correlation coefficients with feature index on both axes (groups C1-C71, C72-C86, R1-R15, R16-R44, X1, and X2-X12); scale ticks from 0.1 to 1.]

Figure 11.3: Correlation coefficients (Spearman’s rank) between features.

are typical examples. These three features had the highest discriminative power in general but, apparently, they are strongly correlated. On the other hand, some lower-ranked features could still contribute to the classification if they contained additional physiological information that was not captured by the top-ranked features. Therefore, feature selection should take both the feature discriminative power and the correlation between features into account. For example, for the binary-class problem, the correlation-based feature selector (CFS) has been successfully used for deep sleep detection (Chapter 8), where only six features were selected without loss in final classification performance. However, CFS was considered to be inapplicable to the multiple-class problem since the changes in different features (reflecting certain aspects of physiology) across sleep stages were not linear and were not even always consistent. A supervised sequential forward search (SFS) feature selection algorithm was described in Chapter 10, although with this algorithm some features with a low discriminative power were still selected.
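The feature-to-feature correlation used for this kind of analysis can be sketched as follows. Spearman's rank correlation is computed here in its standard form, as the Pearson correlation of the ranks with ties receiving their average rank; the function names and the toy feature values are illustrative.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation coefficient between two features."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values of two features over four epochs (illustrative only).
feature_a = [0.2, 0.5, 0.9, 1.4]
feature_b = [0.3, 0.4, 1.1, 1.0]
rho = spearman(feature_a, feature_b)
print(rho)
```

A selector that accounts for redundancy could, for instance, skip a candidate feature whose absolute correlation with an already selected feature exceeds a chosen threshold; that rule is only an example and is not the method used in this thesis.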

In Chapter 9, it was demonstrated that the physiological variations within subjects and from subject to subject are the main barrier to achieving reliable sleep stage classification results, and a multilevel modeling method was proposed to evaluate features by quantifying the amount of those variations. As discussed in the associated chapters, the new features were expected either to carry additional physiological information or to be robust to between-/within-subject variability. Additionally, the feature post-processing methods (normalization and smoothing) were assumed to diminish the variations conveyed by the features. However, what exactly drives the contribution of those new features or post-processing methods to the improved sleep stage classification results has not been thoroughly explored.


[Figure 11.4 scale. κ < 0: no agreement; 0 to 0.20: slight agreement; 0.21 to 0.40: fair agreement; 0.41 to 0.60: moderate agreement; 0.61 to 0.80: substantial agreement; 0.81 to 1: almost perfect agreement.]

Figure 11.4: Indication of agreement level of the Cohen’s Kappa coefficient [172].

Further research is required at this point using the method proposed in Chapter 9, which will

help understand the new features and the post-processing methods, thus inspiring us to, for ex-

ample, develop adaptive post-processing/feature selection algorithms to optimize each feature

for identifying different sleep stages.

11.2 Sleep stage classification

Nocturnal sleep stage classification with body movements, cardiac, and/or respiratory activity that can be unobtrusively acquired represents a novel frontier for quantitative sleep assessment. Since the overall accuracy also depends on the class balance (here the percentages of sleep stages), which usually differs between data sets, we primarily compare the Cohen’s Kappa coefficient, which is robust to chance agreement. The Kappa values can be characterized in different levels of agreement [172] as illustrated in Figure 11.4.
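The agreement levels of Figure 11.4 can be written as a simple lookup. In this sketch the upper bound of each interval is treated as inclusive, so that a value such as 0.205 falls in the "fair" band; that handling of values between the printed bounds and the function name `agreement_level` are assumptions.

```python
def agreement_level(kappa):
    """Map a Kappa value to the agreement levels shown in Figure 11.4."""
    if kappa < 0:
        return "no agreement"
    # Upper bounds are treated as inclusive (assumption).
    for upper, level in [
        (0.20, "slight agreement"),
        (0.40, "fair agreement"),
        (0.60, "moderate agreement"),
        (0.80, "substantial agreement"),
        (1.00, "almost perfect agreement"),
    ]:
        if kappa <= upper:
            return level
    raise ValueError("Kappa cannot exceed 1")


# Kappa values reported in Chapter 10 for WRLD and WRN classification.
print(agreement_level(0.49), agreement_level(0.56))
```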

To benchmark the classification methods and results, Table 11.2 compares the best classification results produced in our work with those reported in the literature. For each study, the signal modalities (body movements, cardiac, and/or respiratory activity), subjects, number of recordings, number of features, algorithms, and classification performance (accuracy and Kappa coefficient) are presented. We included four classification tasks for comparison: four-stage (WRLD) classification, three-stage (WRN) classification, sleep and wake (SW) classification, and deep sleep (D) or SWS detection. To allow fair comparisons, only the classification results obtained with a subject-independent scheme (i.e., classifying sleep stages for an ‘unseen’ subject using a model trained on the data from other subjects) are shown in the table. This scheme is more realistic with regard to home sleep monitoring than a subject-specific scheme (i.e., classifying sleep stages for a subject using a model trained on data from the same subject), which requires at least a one-night stay in a sleep laboratory with PSG-based scoring. With a subject-specific scheme, the reported classification performances (usually much better than those estimated with a subject-independent scheme [151, 248, 312]) would be over-optimistic.

Table 11.2 indicates that the classification performance reported in earlier studies was mostly poorer than that achieved in this thesis. However, it should be noted that comparing the classification performance between studies is complicated by the variety of experimental settings, such as data recording protocols (e.g., recording between lights ON and OFF or not), number of subjects/recordings, subject group (e.g., healthy/unhealthy or age group), and signal modalities, and by the variety of classification procedures, such as features, type of classification algorithm, and validation method. The study by Willemen et al. [309] reported better classification results than those obtained in this thesis in the WRLD and WRN classification tasks, but they


Table 11.2: Studies on sleep stage classification with cardiorespiratory activity (and body movements)

Task First author, year Modalitya Record.b Epoch Algo.c Acc.d Kappad

WRLD Yilmaz, 2010 [315] ECG 8 H 30 s SVM 73%† n.a.

classification Isa, 2011 [138] ECG 16 O 30 s RF 60% 0.26

Hedner, 2011 [127] BM, PAT, PO 227 H/O 30 s zzzPAT 66% 0.48

Willemen, 2014 [309] BM, ECG, RE 85 H 60 s SVM 69% 0.56

This thesis (Chapter 10)∗ ECG 48 H 30 s LD 67% 0.46

RE 48 H 30 s LD 66% 0.44

ECG, RE 48 H 30 s LD 69% 0.49

WRN Redmond, 2006 [248] ECG, RE 37 O 30 s LD 67% 0.32

classification Redmond, 2007 [249] ECG, RE 31 H 30 s LD 76% 0.45

Mendez, 2010 [198] BCG 17 H 30 s KNN 72% 0.42

Kortelainen, 2010 [161] BCG 18 H 30 s HMM 79% 0.44

Kurihara, 2012 [167] BCG 20 H 30 s IR 78% 0.48

Xiao, 2013 [312] ECG 45 H 30 s RF 73% 0.46


Domingues, 2014 [94] BM, ECG, RE 24 H 30 s HMM 78%‡ 0.58‡

This thesis (Chapter 10)∗ ECG 48 H 30 s LD 78% 0.52

RE 48 H 30 s LD 77% 0.50

ECG, RE 48 H 30 s LD 80% 0.56

SW Redmond, 2007 [249] ECG, RE 31 H 30 s LD 89% 0.60

classification Karlen, 2009 [151] ECG, RE 6 H 30 s ANN 85% n.a.

Devot, 2010 [89] BM, ECG, RE 35 H/I 30 s LD 87% 0.62

Jung, 2013 [145] BCG 10 H 30 s TH 97%§ 0.83§


This thesis (Chapter 2) ECG 15 H 30 s LD 93% 0.48

BM, ECG 15 H 30 s LD 96% 0.64

This thesis (Chapter 3) RE 15 H 30 s LD 94% 0.59

BM, RE 15 H 30 s LD 96% 0.66

BM, ECG, RE 15 H 30 s LD 96% 0.67

D (SWS) Shinar, 2001 [273] ECG 34 H 30 s TH 80% n.a.

detection Choi, 2009 [68] BM, ECG 4 H 30 s BACT 92%§ 0.62§

Bsoul, 2010 [53] ECG 16 H/O 30 s SVM 83%¶ n.a.

Hedner, 2011 [127] BM, PAT 227 H/O 30 s zzzPAT 89%♯ 0.49♯

Ebrahimi, 2013 [99] ECG 30 H 30 s LD 80% n.a.

Long, 2014 [186] ECG 15 H 30 s LD 81% 0.42

This thesis (Chapter 8) ECG 257 H 30 s LD 88% 0.54

RE 257 H 30 s LD 88% 0.51

ECG, RE 257 H 30 s LD 89% 0.57

aBM, body movements; ECG, mostly heart rate variability (HRV) was used; RE, respiration; BCG, ballisto-

cardiogram (including BM, HRV, and/or RE); PAT, peripheral arterial tone; PO, pulse oximetry (also pulse

rate). bH, healthy subjects; O, subjects with obstructive sleep apnea; I, insomniacs. cSVM, support vector ma-

chine; RF, random forest; zzzPAT, the algorithm described in [131]; LD, (Bayesian) linear discriminant; KNN,

k-nearest neighbour; HMM, hidden Markov model; IR, incidence ratio; ANN, artificial neural network; TH,

thresholding; BACT, the algorithm described in [68]. dSubject-independent classification results are presented.∗Results were either presented in the corresponding chapter or produced using the methods in that chapter.†Cross-validation was used within each subject. ‡Ambiguous epochs were rejected, §Training and test sets

were mixed. ¶Light sleep were disregarded. ♯Results were re-computed from the reported confusion matrix.


used 60-s epochs rather than the clinically standard 30 s which made the classification easier.

Jung et al. [145] (for SW classification) and Choi et al. [68] validated their algorithm without

splitting training and test sets, obviously leading to bias of the classification performance.

The findings regarding the time delay between changes in autonomic and brain activity during some sleep stage transitions have been described in Chapter 7 and were used to help separate deep sleep from all the other sleep stages (Chapter 8). These findings were not incorporated into the multiple-stage classification methodology presented in Chapter 10; doing so is a promising way to further improve the classification performance. It is important to note that time-delayed methods should be applied in a stage-dependent manner, because some sleep stage transitions show no time delay between autonomic and brain activity.

In addition to sleep stages, arousals also influence the changes in cardiorespiratory activity during sleep [293], which limits the sleep stage classification. Hence, correcting for the influence of arousals is expected to improve cardiorespiratory-based sleep stage classification. This merits further study.

To answer the research question of this thesis raised in Chapter 1, Figure 11.5 shows the progressive improvement in our sleep stage classification results achieved in different phases of the four years of PhD work. It can be seen that increasingly reliable performance in sleep stage classification (for healthy adults) with body movements and cardiac and/or respiratory activity has been achieved. It is interesting to compare the performance of sleep stage classification (for wake, REM sleep, light sleep, and deep sleep) using our cardiorespiratory-based approach with that of an automatic PSG-based system. Here the agreement between automatic classification and standard manual scoring was used for comparison. With the validated Somnolyzer [20], a Cohen’s Kappa coefficient of 0.80 and an accuracy of 85% were reached for classifying the four stages (re-computed from the reported confusion matrix). These results are comparable to the agreement between human raters [81], indicating that, unsurprisingly, the PSG-based automatic sleep staging system far outperforms the cardiorespiratory-based approach presented in this thesis. This implies that sleep stage classification with cardiorespiratory activity is not yet suitable for clinical use. However, it is still promising for home sleep monitoring that aims to offer an understandable sleep assessment to healthy users. Further research is nevertheless encouraged to improve cardiorespiratory-based sleep classification, particularly in distinguishing between wake and REM sleep and between light and deep sleep, which seem more difficult than separating the other sleep stages (see Chapter 10).
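Several of the accuracy and Kappa values discussed here were re-computed from a reported confusion matrix. A minimal sketch of that computation is given below; the function name and the two toy matrices are assumptions for illustration, and none of the numbers are taken from the cited studies.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's Kappa) for a square confusion matrix.

    confusion[i][j] is the number of epochs with reference class i that the
    classifier assigned to class j. Kappa is undefined when the chance
    agreement equals 1 (all epochs in one class for both raters).
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of epochs on the diagonal.
    observed = sum(confusion[k][k] for k in range(n_classes)) / total
    # Chance agreement from the marginal totals of both raters.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(row[k] for row in confusion) for k in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical balanced two-class example.
balanced = [[40, 10],
            [20, 30]]
# Hypothetical imbalanced example: the first class dominates the night.
imbalanced = [[90, 5],
              [3, 2]]

print(accuracy_and_kappa(balanced))
print(accuracy_and_kappa(imbalanced))
```

In the imbalanced toy matrix the accuracy is 0.92 while Kappa is only about 0.29, which illustrates why both measures are reported when one class dominates the recording.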

11.3 Classifier

Figure 11.5: Progression of increases in sleep stage classification performance (Cohen’s Kappa coefficient) achieved in different phases during the PhD work. All increases were found to be significant (p < 0.05), examined with a Wilcoxon (two-sided) signed-rank test. The signal modalities included body movements (BM), electrocardiogram (ECG), and/or respiratory effort (RE). The classification tasks included: sleep and wake (SW) classification, deep sleep (D) or slow wave sleep (SWS) detection, wake, REM sleep, and NREM sleep (WRN) classification, and wake, REM sleep, light sleep, and deep sleep (WRLD) classification. The highest Kappa for each classification task is marked in bold. [Plot not reproduced: Cohen’s Kappa coefficient (vertical axis, 0.2 to 0.7) against year (horizontal axis, 2011 to 2015), one curve per classification task, with points labeled by signal modality.]

Evaluating different classifiers is outside the scope of this thesis, in which a simple linear discriminant (LD) classifier was used throughout. Many other classifiers have been tested over the years, including thresholding (TH), quadratic discriminant (QD), hidden Markov models (HMM), support vector machines (SVM), random forest (RF), and neural networks (NN). For many classification tasks, the LD classifier was found to be one of the best performing algorithms (see Table 11.2). The strength of LD lies in its simple underlying model, which provides a robust model of the features over the different sleep stages. From a machine learning point of view, we speculate that the current features express only limited physiological information for separating sleep stages, so that the classification performance would not be markedly improved unless new features or classifiers that can characterize additional inherent physiological information are used. For example, because the LD classifier is independent of time whereas sleep is a structured process (i.e., the state and characteristics of each epoch are not independent), temporal classifiers that exploit this structure are expected to improve the classification. Exploring these types of classifiers should therefore be part of future work.
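To make the idea of a temporal classifier concrete, the sketch below applies Viterbi decoding to per-epoch class probabilities, so that the decision for one epoch depends on its neighbours. This is only one possible way of exploiting the structure of sleep and was not evaluated in this thesis; the transition probabilities, the prior, and the per-epoch probabilities are hypothetical, and treating classifier outputs as emission likelihoods is a simplification.

```python
import math


def viterbi(epoch_probs, transition, prior):
    """Most likely state sequence given per-epoch state probabilities.

    epoch_probs: list (one entry per epoch) of dicts state -> probability,
                 e.g. the output of a time-independent classifier.
    transition:  dict of dicts, transition[a][b] = P(next = b | current = a).
    prior:       dict state -> probability of the first epoch's state.
    """
    states = list(prior)

    def log(p):
        return math.log(max(p, 1e-12))  # guard against log(0)

    score = {s: log(prior[s]) + log(epoch_probs[0][s]) for s in states}
    backpointers = []
    for probs in epoch_probs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states,
                            key=lambda p: score[p] + log(transition[p][s]))
            new_score[s] = (score[best_prev] + log(transition[best_prev][s])
                            + log(probs[s]))
            pointers[s] = best_prev
        backpointers.append(pointers)
        score = new_score

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-state example with "sticky" states.
prior = {"wake": 0.5, "sleep": 0.5}
transition = {"wake": {"wake": 0.9, "sleep": 0.1},
              "sleep": {"wake": 0.1, "sleep": 0.9}}
epoch_probs = [{"wake": 0.2, "sleep": 0.8},
               {"wake": 0.2, "sleep": 0.8},
               {"wake": 0.55, "sleep": 0.45},  # weak, isolated wake evidence
               {"wake": 0.2, "sleep": 0.8},
               {"wake": 0.2, "sleep": 0.8}]

independent = [max(p, key=p.get) for p in epoch_probs]
temporal = viterbi(epoch_probs, transition, prior)
print(independent)
print(temporal)
```

With these toy numbers the epoch-by-epoch decision labels the third epoch as wake, whereas the temporal decoding keeps it as sleep because a single weakly supported transition is unlikely under the assumed transition probabilities.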

11.4 Subject/patient groups

With a view to sleep monitoring applications in a home environment, this thesis primarily addressed sleep stage classification for healthy adults, who usually have a normal overnight sleep architecture in which each sleep stage has a certain number of epochs, as suggested by Ohayon et al. [216]. Therefore, our methods and the corresponding settings (e.g., parameters when computing features, selected features, classifier operating thresholds, the use of normalization, and smoothing window size) were tuned on the data from healthy subjects and might not be appropriate for other subject groups with prevalent sleep problems. For example, Figure 11.6 depicts typical examples of overnight sleep architecture (sleep stages) from a healthy subject, an insomniac, and a patient with severe obstructive sleep apnea (OSA). The sleep stages (wake, REM sleep, light sleep, and SWS) were obtained through PSG-based manual scoring by sleep technicians. The figure clearly shows that the sleep architecture differs between the three subjects throughout the night, because sleep stages are altered by the pathophysiology of disordered sleep. For example, in comparison with healthy subjects, insomniacs experience much longer wake time [215], sometimes along with SWS deficiency [109, 112]. The sleep apnea syndrome is associated with sleep fragmentation (with many sleep stage transitions) due to the repeated occurrence of end-apneic arousals [156, 317], altered cardiac variability [207], and dysfunction in the changes of autonomic nervous activity across sleep stages [119, 120], changes which are the rationale and hypothesis for autonomic-based sleep stage classification. In addition, autonomic function is also influenced by the presence of some sleep problems [22, 195, 280]. As a consequence, these factors will lead to difficulties in cardiorespiratory-based sleep stage classification for these patient groups.

Figure 11.6: Examples of overnight sleep stages for a healthy subject, an insomniac, and an OSA patient. [Hypnograms not reproduced: three panels, each with sleep stage (Wake, REM, Light, Deep) on the vertical axis and Time (h), 0 to 7, on the horizontal axis.]

Using a classifier trained on data from healthy subjects to classify sleep stages for patients with sleep disorders would clearly not be applicable. Even when a classifier derived from some patients in a specific patient group is applied to other patients in the same group, the classification performance would still be worse than that obtained for healthy subjects. This can be seen in Table 11.2, for example, by comparing the classification results in [249] for healthy subjects (Kappa = 0.46, accuracy = 76%) and in [248] for OSA patients (Kappa = 0.32, accuracy = 67%). Moreover, the effectiveness of the features and the post-processing methods proposed in this thesis is unknown for other subject groups and should be studied further. For example, the Z-score normalization assumes that the percentages of sleep stages are similar across subjects, which is not always the case for patients with disrupted sleep such as insomniacs. Smoothing the feature values per night does not seem appropriate for OSA patients with a fragmented sleep architecture, since it would not capture the fast and subtle changes in cardiorespiratory activity caused by frequent arousals. Thus, the post-processing methods must be re-examined for those patient groups.
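The two post-processing steps mentioned above can be sketched as follows. This is a simplified illustration rather than the implementation used in the earlier chapters: a centered moving average stands in for the smoothing method, the population standard deviation is assumed for the Z-score, and the function names, window size, and toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_per_night(values):
    """Z-score normalize one feature over one overnight recording."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]


def moving_average(values, window):
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


# Hypothetical feature values for five consecutive epochs, with one brief
# excursion such as might accompany an arousal.
feature = [0.0, 0.0, 9.0, 0.0, 0.0]

print(zscore_per_night([1.0, 2.0, 3.0]))
print(moving_average(feature, window=3))
```

In the toy example the single-epoch excursion of 9.0 is spread out to 3.0 over three epochs, which is the kind of attenuation of fast changes that makes smoothing questionable for recordings with frequent arousals. The Z-score depends on the mean and spread over the whole night, so a night dominated by one stage shifts the normalized values of all epochs.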

In addition, it was found that the sleep stage classification performance was dependent on age. Chapter 8 has revealed that deep sleep epochs were easier to identify correctly with cardiorespiratory activity for younger subjects than for elderly people. In fact, the overnight sleep architecture is age-related, as shown by Ohayon et al. [216], whose meta-analysis of sleep parameters across the human lifespan (for healthy subjects with ages 5 to 85 y) showed that the total sleep time (TST), REM sleep time, and deep sleep time decrease with increasing age, whereas wake after sleep onset (WASO) increases. Moreover, the multilevel analysis results presented in Chapter 9 indicate that the autonomic cardiorespiratory activity during sleep was significantly influenced by age. For these reasons, performing sleep stage classification separately for different age groups is a promising way to further enhance the classification performance. This merits further exploration.

11.5 Objective and subjective sleep assessments

Although the focus of this thesis was on objective sleep monitoring, it is important to assess

sleep from subjective perspectives because ‘sleep quality’ is also linked to the perception or

feeling by humans. For example, sleep deprivation has been consistently associated with the

loss of daytime (cognitive or behavioral) performance, such as drowsiness, irritability, or in-

creased fatigue [47, 98, 263]. Chronic sleep disruption caused by, e.g., sleep fragmentation, has

been shown to relate to worsened mood [255].

Several clinical questionnaires have been published for examining sleep quality such as the

Pittsburgh Sleep Quality Index (PSQI) [60] and the Self-Assessment of Sleep and Awakening

Quality Scale (SSA) [262]. The relationship between objective sleep variables, derived from a

PSG-based sleep architecture, and subjective sleep quality, obtained from questionnaires, has

been researched thoroughly in the past. Although some inconsistent or even paradoxical results were found, with studies differing in the extent to which the objective variables were correlated with subjective ratings, some objective variables were consistently related to the subjective sleep experience. The strongest association was between wake time and subjective sleep quality, with a correlation of r = −0.59 [9, 21, 251]. A reliable objective sleep stage classification will enable the analysis and construction of a frontier model that combines objective and subjective measurements to assess overall sleep quality, in particular when the classification is done with cardiorespiratory activity (and body movements) that can be acquired unobtrusively at home.

Krystal and Edinger [164] proposed new methods that analyze PSG data in terms of measures of the nature/depth of sleep, such as indices of the frequency content of electroencephalogram (EEG) signals obtained during NREM sleep, or that look at particular patterns in NREM sleep and their sequence, instead of only taking the variables derived from sleep stages. In the context of sleep monitoring with cardiorespiratory activity, it would be tempting to go a step further and investigate the associations between autonomic physiological measures and subjective sleep quality, which have not been explicitly analyzed.

Intuitively, it was expected that stable sleep, reflected in, for example, a low breathing rate variation overnight, is indicative of a good sleep quality rating. Taking this respiratory measure

as an example, we analyzed the (Spearman’s rank) correlation between the total SSA score and

the overnight mean standard deviation of breathing rates (SDBR) for males and females and for

three age groups [young: 20-39 y (n = 52), middle aged: 40-69 y (n = 69), and elderly: ≥70 y

(n = 44)]. The SSA consists of 27 questions divided into four parts: sleep quality, awakening quality, somatic complaints, and estimates of the sleeping times of the last nights. A total SSA score can be calculated from the first three parts, or a subscore can be calculated for each part separately. The total score ranges between 20 and 80, where higher scores indicate

poorer sleep quality. The respiratory measure here was obtained as the mean of the whole

night, calculated by taking the mean SDBR for each sleep stage separately and followed by

calculating the mean of those separate means for different sleep stages. This was done so

that the final mean value was not influenced by the differences in the percentages of the sleep

stages, serving to purely look at the physiological measure without including information of

sleep stages. The data used here was the same as that used in Chapter 9 including 165 subjects

monitored in sleep laboratories with PSG for two consecutive nights. Positive correlations were

found between the mean SDBR and the total SSA score [night 1: r = 0.179, p = 0.024; night 2:

r = 0.213, p = 0.007]. This means that a higher variation of breathing rate was associated

with worse sleep experience. However, the correlation coefficients were low, implying only a weak association. A gender effect was observed in both nights: significant correlations between the mean SDBR and the total SSA score were found for females [night 1: r = 0.263, p = 0.014, Figure 11.7(a); night 2: r = 0.300, p = 0.005, Figure 11.7(b)], but not for males. Therefore, a higher mean variation of the breathing rate was associated with a worse sleep quality in women. The significant correlations found for females contributed to the (weak) correlations found for all subjects. An explanation for this gender effect is not clear and needs further investigation. Additionally, a moderate correlation was found between the mean SDBR and the total SSA score in the first night in the elderly group [r = 0.399, p = 0.008, Figure 11.7(c)]. No significant correlations were observed for the other age groups, suggesting that, especially for elderly subjects, a higher breathing rate variation was associated with worse sleep experience. However, this result was not present in the second night, meaning that the finding might be due to the “first-night effect” present in this data set [196].
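The two computations described above, the stage-balanced overnight mean and the Spearman rank correlation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code behind the reported values: the function names and toy numbers are hypothetical, tied values receive average ranks, and no p-value is computed.

```python
from collections import defaultdict
from statistics import mean


def stage_balanced_mean(values, stages):
    """Mean of the per-stage means, so each sleep stage counts equally
    regardless of how many epochs it contains."""
    by_stage = defaultdict(list)
    for value, stage in zip(values, stages):
        by_stage[stage].append(value)
    return mean([mean(v) for v in by_stage.values()])


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical night: three light-sleep epochs and one wake epoch.
sdbr = [0.02, 0.02, 0.02, 0.08]
stages = ["light", "light", "light", "wake"]
print(mean(sdbr), stage_balanced_mean(sdbr, stages))

# Hypothetical per-subject values (mean SDBR versus total SSA score).
print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

For the toy night, the plain mean over epochs is 0.035 whereas the stage-balanced mean is 0.05, because the single wake epoch carries the same weight as the three light-sleep epochs together.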

This was a preliminary analysis (as an example) and future research with more in-depth

analyses of the PSG data is needed to better understand the relationship between objective

sleep measurements and subjective sleep quality. Moreover, multiple nights are necessary to

assess the night-to-night variability within this relationship.


Figure 11.7: Scatter plot of the total score on the SSA and the mean SDBR for (a) women of night 1 and (b) women of night 2, and (c) the elderly group of night 1. [Plots not reproduced: three panels (a), (b), and (c), each with Mean SDBR (Hz), 0 to 0.1, on the horizontal axis and Total SSA score, 20 to 50, on the vertical axis.]

11.6 Towards unobtrusive sleep monitoring

This thesis has demonstrated the feasibility of using cardiorespiratory activity (and body movements) to classify nocturnal sleep stages for healthy adults. Although those signals were measured in the traditional ways, i.e., ECG for the cardiac signal and respiratory inductance plethysmography (RIP) for the respiratory signal, they are less obtrusive than full PSG recordings. Moreover, these signal modalities have been shown to be feasible for long-term home sleep monitoring, since they can be acquired using several advanced unobtrusive measurement techniques, as mentioned in Section 1.4 (Chapter 1).

Researchers have already attempted to identify sleep stages using cardiorespiratory activity (and body movements) measured with unobtrusive techniques, e.g., BCG in the form of a mattress [161, 303], textile pressure-sensitive sensor arrays in the form of a bedsheet [264], photoplethysmography (PPG) in the form of a wrist watch/band [260], and peripheral arterial tone (PAT) in the form of a fingertip sensor [127]. However, as discussed, the performance reported in these studies is not as good as expected. The challenges (such as the reliability of the measurements relative to the standard ECG and RIP signals, the robustness against motion artifacts during measurement, and the influence of sleep postures) vary between the techniques. These challenges need to be addressed when using those techniques for sleep stage classification. Towards unobtrusive sleep monitoring, the methods presented in this thesis need to be validated further once such unobtrusively acquired cardiorespiratory (and body movement) information becomes available.

References

[1] J. Aach and G. M. Church. Aligning gene expression time series with time warping

algorithms. Bioinformatics, 17(6):495-508, 2001.

[2] Adidas miCoach Heart Rate Monitor (retrieved in Jan. 2015). [Online] Available:

http://micoach.adidas.com/heartratemonitor.

[3] M. Adnane, Z. Jiang, and Z. Yan. Sleep-wake stages classification and sleep efficiency

estimation using single-lead electrocardiogram. Expert Syst. Appl., 39(1):1401–1413,

2012.

[4] V. X. Afonso, W. J. Tompkins, T. Q. Nguyen, and S. Luo. ECG beat detection using filter

banks. IEEE Trans. Biomed. Eng., 46(2):192–202, 1999.

[5] H. W. Agnew and W. B. Webb. Measurement of sleep onset by EEG criteria. Am. J. EEG

Technol., 12:127-134, 1972.

[6] M. Ahmadlou and H. Adeli. Visibility graph similarity: A new measure of generalized

synchronization in coupled dynamic systems. Physica D, 241(4):326-332, 2012.

[7] J. Alihanka, K. Vaahtoranta, and I. Saarikivi. A new method for long-term monitoring of

the ballistocardiogram, heart rate, and respiration. Am. J. Physiol. Regul. Integr. Comp.

Physiol., 240(5):R384-R392, 1981.

[8] T. Akerstedt, M. Billiard, M. Bonnet, G. Ficca, L. Garma, M. Mariotti, P. Salzarulo, and

H. Schulz. Awakening from sleep. Sleep Med. Rev., 6(4):267–286, 2002.

[9] T. Akerstedt, K. Hume, D. Minors, and J. Waterhouse. The meaning of good sleep: a lon-

gitudinal study of polysomnography and subjective sleep quality. J. Sleep Res., 3(3):152–

158, 1994.

[10] T. Akerstedt, K. Hume, D. Minors, and J. Waterhouse. Good sleep–its timing and physi-

ological sleep characteristics. J. Sleep Res., 6(4):221–229, 1997.


[11] T. Akerstedt, A. Knutsson, P. Westerholm, T. Theorell, L. Alfredsson, and G. Kecklund.

Sleep disturbances, work stress and work hours: a cross-sectional study. J. Psychosom.

Res., 53(3):741–748, 2002.

[12] M. Ako, T. Kawara, S. Uchida, S. Miyazaki, K. Nishihara, J. Mukai, K. Hirao, J. Ako and

Y. Okubo. Correlation between electroencephalography and heart rate variability during

sleep. Psychiatry Clin. Neurosci., 57(1):59–65, 2003.

[13] S. Akselrod, D. Gordon, F. A. Ubel, D. C. Shannon, A. C. Berger, and R. J. Cohen.

Power spectrum analysis of heart rate fluctuation: a quantitative probe of beat-to-beat

cardiovascular control. Science, 213(4504):220–222, 1981.

[14] R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Rev. Mod.

Phys., 74:47, 2002.

[15] P. Alhola and P. Polo-Kantola. Sleep deprivation: impact on cognitive performance. Neuropsychiatr. Dis. Treat., 3(5):553-567, 2007.

[16] J. Allen. Photoplethysmography and its application in clinical physiological measure-

ment. Physiol. Meas., 28(3):R1–R39, 2007.

[17] F. Amzica and M. Steriade. Electrophysiological correlates of sleep delta waves. Elec-

troencephalogr. Clin. Neurophysiol., 107(2):69–83, 1998.

[18] S. Ancoli-Israel, R. Cole, C. Alessi, M. Chambers, W. Moorcroft, and C. P. Pollak. The

role of actigraphy in the study of sleep and circadian rhythms. Sleep, 26(3):342-392,

2003.

[19] P. Anderer, G. Gruber, S. Parapatics, M. Woertz, T. Miazhynskaia, G. Klosch, B. Saletu,

J. Zeitlhofer, M. J. Barbanoj, H. Danker-Hopfe, S. L. Himanen, B. Kemp, T. Penzel, M.

Grozinger, D. Kunz, P. Rappelsberger, A. Schlogl, and G. Dorffner. An E-health solution

for automatic sleep classification according to Rechtschaffen and Kales: validation study

of the Somnolyzer 24×7 utilizing the Siesta database. Neuropsychobiology, 51(3):115–

133, 2005.

[20] P. Anderer, A. Moreau, M. Woertz, M. Ross, G. Gruber, S. Parapatics, E. Loretz, E.

Heller, A. Schmidt, M. Boeck, D. Moser, G. Kloesch, B. Saletu, G. M. Saletu-Zyhlarz,

H. Danker-Hopfe, J. Zeitlhofer, and G. Dorffner. Computer-assisted sleep classification

according to the standard of the American Academy of Sleep Medicine: validation study

of the AASM version of the Somnolyzer 24×7. Neuropsychobiology, 62(4):250–264,

2010.

[21] R. Armitage, M. Trivedi, R. Hoffmann and A. J. Rush. Relationship between objective

and subjective sleep measures in depressed patients and healthy controls. Depress. Anx-

iety , 5(2):97-102, 1997.


[22] M. Aydin, R. Altin, A. Ozeren, L. Kart, M. Bilge, and M. Unalacak. Cardiac autonomic

activity in obstructive sleep apnea: time-dependent and spectral analysis of heart rate

variability using 24-hour Holter electrocardiograms. Tex. Heart Inst. J., 31(2):132–136,

2004.

[23] E. Bagiella, R. P. Sloan, and D. F. Heitjan. Mixed-effects models in psychophysiology.

Psychophysiology, 37(1):13–20, 2000.

[24] A. Baharav, S. Kotagal, V. Gibbons, B. K. Rubin, G. Pratt, J. Karin, and S. Akselrod.

Fluctuations in autonomic nervous activity during sleep displayed by power spectrum

analysis of heart rate variability. Neurology, 45(6):1183–1187, 1995.

[25] R. Bailon, P. Laguna, L. Mainardi, and L. Sornmo. Analysis of heart rate variability

using time-varying frequency bands based on respiratory frequency. In Proc. 29th Ann.

Int. Conf. IEEE Eng. Med. Biol. Soc., pp. 6675–6678, Lyon, France, 2007.

[26] R. Bakeman and J. M. Gottman. Observing Interaction: An Introduction to Sequential

Analysis, 2nd edn., Cambridge University Press, 1986.

[27] S. Banks and D. F. Dinges. Behavioral and physiological consequences of sleep restric-

tion. J. Clin. Sleep Med., 3(5):519–528, 2007.

[28] A. Bar, G. Pillar, I. Dvir, J. Sheffy, R. P. Schnall, and P. Lavie. Evaluation of a portable

device based on peripheral arterial tone for unattended home sleep studies. Chest,

123(3):695–703, 2003.

[29] R. P. Bartsch, J. W. Kantelhardt, T. Penzel, and S. Havlin. Experimental evidence for

phase synchronization transitions in the human cardiorespiratory system. Phys. Rev.

Lett., 98(5):054102, 2007.

[30] R. P. Bartsch, A. Y. Schumann, J. W. Kantelhardt, T. Penzel, and P. Ch. Ivanov. Phase

transitions in physiologic coupling. Proc. Natl. Acad. Sci. U.S.A., 109(26):10181-10186,

2012.

[31] A. Bashan, R. P. Bartsch, J. W. Kantelhardt, and S. Havlin. Comparison of detrending

methods for fluctuation analysis. Physica A Stat. Mech. Appl., 387(21):5080–5090, 2008.

[32] M. Basner, B. Griefahn, U. Muller, G. Plath, and A. Samel. An ECG-based algorithm

for the automatic identification of autonomic activations associated with cortical arousal.

Sleep, 30(10):1349–1361, 2007.

[33] J. Behar, A. Roebuck, J. S. Domingos, E. Gederi, and G. D. Clifford. A review of current

sleep screening applications for smartphones. Physiol. Meas., 34(7):R29–R46, 2013.

[34] J. H. Benington and H. C. Heller. Restoration of brain energy metabolism as the function

of sleep. Prog. Neurobiol., 45(4):347–360, 1995.


[35] R. J. Berger and N. H. Phillips. Energy conservation and sleep. Behav. Brain Res., 69(1-

2):65–73, 1995.

[36] I. I. Berlad, A. Shlitner, S. Ben-Haim, and P. Lavie. Power spectrum analysis and heart

rate variability in stage 4 and REM Sleep: evidence for state-specific changes in auto-

nomic dominance. J. Sleep Res., 2(2):88–90, 1993.

[37] D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series.

In Proc. Assoc. Advancement Artif. Intell. Workshop Knowl. Disc. Databases (AAAI-

KDD’94), pp. 359–370, 1994.

[38] R. B. Berry, R. Budhiraja, D. Gottlieb, D. Gozal, C. Iber, V. K. Kapur, C. L. Marcus, R.

Mehra, S. Parthasarathy, S. F. Quan, S. Redline, K. P. Strohl, S. L. Davidson Ward, and

M. M. Tangredi. Rules for scoring respiratory events in sleep: update of the 2007 AASM

manual for the scoring of sleep and associated events. J. Clin. Sleep Med., 8(5):597–619,

2012.

[39] R. B. Berry, R. Brooks, C. E. Gamaldo, S. M. Harding, C. L. Marcus, and B. V. Vaughn.

The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminol-

ogy and Technical Specifications, Version 2.0. American Academy of Sleep Medicine,

Darien, IL, 2012.

[40] C. Berthomier, X. Drouot, M. Herman-Stoica, P. Berthomier, J. Prado, D. Bokar-Thire,

O. Benoit, J. Mattout, and M.-P. d’Ortho. Automatic analysis of single-channel sleep

EEG: validation in healthy individuals. Sleep, 30(11):1587–1595, 2007.

[41] H. Bettermann, D. Cysarz, and P. Van Leeuwen. Detecting cardiorespiratory coordination by respiratory pattern analysis of heart period dynamics: the musical rhythm approach. Int. J. Bifurcation Chaos Appl. Sci. Eng., 10(10):2349–2360, 2000.

[42] A. M. Bianchi, L. Mainardi, E. Petrucci, M. G. Signorini, M. Mainardi, and S. Cerutti.

Time-variant power spectrum analysis for the detection of transient episodes in HRV

signal. IEEE Trans. Biomed. Eng., 40(2):136–144, 1993.

[43] A. M. Bianchi, L. T. Mainardi, C. Meloni, S. Chierchia, and S. Cerutti. Continuous

monitoring of the sympatho-vagal balance through spectral analysis. IEEE Eng. Med.

Biol. Mag., 16(5):64–73, 1997.

[44] A. Boardman, F. S. Schlindwein, A. P. Rocha, and A. Leite. A study on the optimum

order of autoregressive models for heart rate variability. Physiol. Meas., 23(2):325–336,

2002.

[45] M. H. Bonnet. Effect of sleep disruption on sleep, performance, and mood. Sleep,

8(1):11–19, 1985.


[46] M. H. Bonnet and D. L. Arand. Heart rate variability: sleep stage, time of night, and

arousal influences. Electroenceph. Clin. Neurophysiol., 102(5):390–396, 1997.

[47] M. H. Bonnet and D. L. Arand. Clinical effects of sleep fragmentation versus sleep de-

privation. Sleep Med. Rev., 7(4):297–310, 2003.

[48] A. A. Borbely and P. Achermann. Sleep homeostasis and models of sleep regulation. J.

Biol. Rhythms., 14(6):559-570, 1999.

[49] G. Brandenberger, A. U. Viola, J. Ehrhart, A. Charloux, B. Geny, F. Piquard, and C.

Simon. Age-related changes in cardiac autonomic control during sleep. J. Sleep Res.,

12(3):173-180, 2003.

[50] D. Bratsun, D. Volfson, L. S. Tsimring, and J. Hasty. Delay-induced stochastic oscilla-

tions in gene regulation. Proc. Natl. Acad. Sci. U.S.A., 102(41):14593-14598, 2005.

[51] M. Bresler, K. Sheffy, G. Pillar, M. Preiszler, and S. Herscovici. Differentiating be-

tween light and deep sleep stages using an ambulatory device based on peripheral arterial

tonometry. Physiol. Meas., 29(5):571–584, 2008.

[52] G. de Bruijne, P. Sommen, and R.M. Aarts. Detection of epileptic seizures through audio

classification. In 4th Eur. Congr. Int. Fed. Med. Biol. Eng. (IFMBE’08), pp. 1450–1454,

Antwerpen, Belgium, 2008.

[53] M. Bsoul, H. Minn, M. Nourani, G. Gupta, and L. Tamil. Real-time sleep quality as-

sessment using single-lead ECG and multi-stage SVM classifier. In Proc. 32nd Ann. Int.

Conf. IEEE Eng. Med. Biol. Soc. (EMBC’10), pp. 1178–1181, Buenos Aires, Argentina,

2010.

[54] A. Bunde, S. Havlin, J. W. Kantelhardt, T. Penzel, J.-H. Peter, and K. Voigt. Corre-

lated and uncorrelated regions in heart-rate fluctuations during sleep. Phys. Rev. Lett.,

85(17):3736, 2000.

[55] H. J. Burgess, A. L. Holmes, and D. Dawson. The relationship between slow-wave activ-

ity, body temperature, and cardiac activity during nighttime sleep. Sleep, 24(3):343–349,

2001.

[56] H. J. Burgess, T. Sletten, N. Savic, S. S. Gilbert, and D. Dawson. Effects of bright light

and melatonin on sleep propensity, temperature, and cardiac activity at night. J. Appl.

Physiol., 91(3):1214–1222, 2001.

[57] H. J. Burgess, J. Trinder, Y. Kim, and D. Luke. Sleep and circadian influences on cardiac

autonomic nervous system activity. Am. J. Physiol. Heart Circ. Physiol., 273(4):H1761–

H1768, 1997.

[58] R. L. Burr. Interpretation of normalized spectral heart rate variability indices in sleep

research: a critical review. Sleep, 30(7):913–919, 2007.


[59] P. Busek, J. Vankova, J. Opavsky, J. Salinger, and S. Nevsimalova. Spectral analysis of

the heart rate variability in sleep. Physiol. Res., 54(4):369–376, 2005.

[60] D. J. Buysse, C. F. Reynolds, T. H. Monk, S. R. Berman, and D. J. Kupfer. The Pittsburgh

Sleep Quality Index: a new instrument for psychiatric practice and research. Psychiatry

Res., 28(2):193–213, 1989.

[61] G. Buzsáki. Memory consolidation during sleep: a neurophysiological perspective. J.

Sleep Res., 7(S1):17–23, 1998.

[62] C. Cajochen, J. Pischke, D. Aeschbach, and A. A. Borbely. Heart rate dynamics during

human sleep. Physiol. Behav., 55(4):767–774, 1994.

[63] M. A. Carskadon and W. C. Dement. Normal human sleep: an overview. In Principles

and Practice of Sleep Medicine, edited by M. H. Kryger, T. Roth, and W. C. Dement,

Chap. 2, pp. 16-26, Elsevier Saunders, St. Louis, 2011.

[64] N. Carter, R. Henderson, S. Lai, M. Hart, S. Booth, and S. Hunyor. Cardiovascular and

autonomic response to environmental noise during sleep in night shift workers. Sleep,

25(4):457-464, 2002.

[65] Centers for Disease Control and Prevention (CDC). Perceived insufficient rest or sleep

among adults – United States, 2008. MMWR Morb. Mortal. Wkly. Rep., 58(42):1175-

1179, 2009.

[66] W. Chen, X. Zhu, T. Nemoto, Y. Kanemitsu, K. Kitamura, and K. Yamakoshi. Uncon-

strained detection of respiration rhythm and pulse rate with one under-pillow sensor dur-

ing sleep. Med. Biol. Eng. Comput., 43(2):306-312, 2005.

[67] N. S. Cherniack. Respiratory dysrhythmias during sleep. N. Engl. J. Med., 305(6):325–

330, 1981.

[68] B. H. Choi, G. S. Chung, J.-S. Lee, D.-U. Jeong, and K. S. Park. Slow-wave sleep

estimation on a load-cell-installed bed: a non-constrained method. Physiol. Meas.,

30(11):1163–1170, 2009.

[69] G. S. Chung, B. H. Choi, J.-S. Lee, J. S. Lee, D.-U. Jeong, and K. S. Park. REM sleep

estimation only using respiratory dynamics. Physiol. Meas., 30(12):1327–1340, 2009.

[70] G. S. Chung, J. S. Lee, S. H. Hwang, Y. K. Kim, D.-U. Jeong, and K. S. Park. Wake-

fulness estimation only using ballistocardiogram: Nonintrusive method for sleep moni-

toring. In Proc. 32nd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC’10), pp. 2459–

2462, Buenos Aires, Argentina, 2010.

[71] D. Clifford, G. Stone, I. Montoliu, S. Rezzi, F. P. Martin, P. Guy, S. Bruce, and

S. Kochhar. Alignment using variable penalty dynamic time warping. Anal. Chem.,

81(3):1000-1007, 2009.

[72] J. Cohen. A coefficient of agreement for nominal scales. Educ. Psychol. Meas., 20(1):37–

46, 1960.

[73] M. A. Cohn, A. S. Rao, M. Broudy, S. Birch, H. Watson, N. Atkins, B. Davis, F. D.

Stott, and M. A. Sackner. The respiratory inductive plethysmograph: a new non-invasive

monitor of respiration. Bull. Eur. Physiopathol. Respir., 18(4):643–658, 1982.

[74] R. J. Cole, D. F. Kripke, W. Gruen, D. J. Mullaney, and J. C. Gillin. Automatic

sleep/wake identification from wrist activity. Sleep, 15(5):461–469, 1992.

[75] M. Costa, A. L. Goldberger, and C.-K. Peng. Multiscale entropy analysis of biological

signals. Phys. Rev. E, 71(2):021906, 2005.

[76] L. J. Cronbach. Research in classrooms and schools: Formulation of questions, designs

and analysis, Occasional Paper, Stanford Evaluation Consortium, 1976.

[77] D. Cysarz, H. Bettermann, S. Lange, D. Geue, and P. Van Leeuwen. A quantitative com-

parison of different methods to detect cardiorespiratory coordination during night-time

sleep. Biomed. Eng. Online, 3:44, 2004.

[78] D. Cysarz, H. Bettermann, and P. Van Leeuwen. Entropies of short binary sequences in

heart period dynamics. Am. J. Physiol. Heart Circ. Physiol., 278(6):H2163–2172, 2000.

[79] D. Cysarz, R. Zerm, H. Bettermann, M. Fruhwirth, M. Moser, and M. Kroz. Comparison

of respiratory rates derived from heart rate variability, ECG amplitude, and nasal/oral

airflow. Ann. Biomed. Eng., 36(12):2085–2094, 2008.

[80] Y. Dagan. Circadian rhythm sleep disorders (CRSD). Sleep Med. Rev., 6(1):45–54, 2002.

[81] H. Danker-Hopfe, P. Anderer, J. Zeitlhofer, M. Boeck, H. Dorn, G. Gruber, E. Heller,

E. Loretz, D. Moser, S. Parapatics, B. Saletu, A. Schmidt, and G. Dorffner. Interrater

reliability for sleep scoring according to the Rechtschaffen & Kales and the new AASM

standard. J. Sleep. Res., 18(1):74–84, 2009.

[82] H. Davis, P. A. Davis, A. L. Loomis, E. N. Harvey, and G. Hobart. Human brain poten-

tials during the onset of sleep. J. Neurophysiol., 1:24-38, 1938.

[83] J. Davis and M. Goadrich. The relationship between precision-recall and ROC curves. In

Proc. 23rd Int. Conf. Machine Learn. (ICML’06), pp. 233–240, Pittsburgh, PA, 2006.

[84] C. De Boor. A Practical Guide to Splines, Springer-Verlag, New York, NY, 2001.

[85] P. De Chazal, N. Fox, E. O’Hare, C. Heneghan, A. Zaffaroni, P. Boyle, S. Smith, C.

O’Connell, and W. T. McNicholas. Sleep/wake measurement using a non-contact biomo-

tion sensor. J. Sleep Res., 20(2):356-366, 2011.

[86] S. De Franciscis, S. Johnson, and J. J. Torres. Enhancing neural-network performance

via assortativity. Phys. Rev. E, 83:036114, 2011.

[87] L. De Souza, A. A. Benedito-Silva, M. L. N. Pires, D. Poyares, S. Tufik, and H. M. Calil.

Further validation of actigraphy for sleep studies. Sleep, 26(1):81–85, 2003.

[88] W. Dement and N. Kleitman. The relation of eye movements during sleep to dream

activity: an objective method for the study of dreaming. J. Exp. Psychol., 53(5):339–

346, 1957.

[89] S. Devot, R. Dratwa, and E. Naujokat. Sleep/wake detection based on cardiorespira-

tory signals and actigraphy. In Proc. 32nd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc.

(EMBC’10), pp. 5089–5092, Buenos Aires, Argentina, 2010.

[90] M. Dhamala, V. K. Jirsa, and M. Ding. Enhancement of neural synchrony by time delay.

Phys. Rev. Lett., 92(7):074104, 2004.

[91] D. J. Dijk. Slow-wave sleep: characteristics and homeostatic regulation. In Slow-Wave

Sleep: Beyond Insomnia – The Importance of Slow-Wave Sleep for Your Patients, edited

by T. Roth and D. J. Dijk, Wolters Kluwer Health Pharma Solutions, London, UK, 2010.

[92] D. J. Dijk. Slow-wave sleep, diabetes, and the sympathetic nervous system. Proc. Natl.

Acad. Sci. U.S.A., 105(4):1107-1108, 2008.

[93] M. Di Rienzo, F. Rizzo, G. Parati, G. Brambilla, M. Ferratini, and P. Castiglioni. MagIC

system: a new textile-based wearable device for biological signal monitoring. Applicability

in daily life and clinical setting. In Proc. 27th Ann. Int. Conf. IEEE Eng. Med. Biol.

Soc. (EMBC’05), pp. 7167–7169, Shanghai, China, 2005.

[94] A. Domingues, T. Paiva, and J. M. Sanches. Hypnogram and sleep parameter computa-

tion from activity and cardiovascular data. IEEE Trans. Biomed. Eng., 61(6):1711–1719,

2014.

[95] N. J. Douglas, D. P. White, C. K. Pickett, J. V. Weil, and C. W. Zwillich. Respiration

during sleep in normal man. Thorax, 37(11):840–844, 1982.

[96] R. V. Donner, Y. Zou, J. F. Donges, N. Marwan, and J. Kurths. Recurrence networks – a

novel paradigm for nonlinear time series analysis. New J. Phys., 12:033025, 2010.

[97] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd edn., Wiley-

Interscience Press, 2000.

[98] J. S. Durmer and D. F. Dinges. Neurocognitive consequences of sleep deprivation. Semin.

Neurol., 25(1):117–129, 2005.

[99] F. Ebrahimi, S.-K. Setarehdan, J. Ayala-Moyeda, and H. Nazeran. Automatic sleep stag-

ing using empirical mode decomposition, discrete wavelet transform, time-domain, and

nonlinear dynamics features of heart rate variability signals. Comput. Methods Programs

Biomed., 112(1):47–57, 2013.

[100] V. M. Eguiluz, D. R. Chialvo, G. A. Cecchi, M. Baliki, and A. V. Apkarian. Scale-free

brain functional networks. Phys. Rev. Lett., 94(1):018102, 2005.

[101] S. Elsenbruch, M. J. Harnish, and W. C. Orr. Heart rate variability during waking and

sleep in healthy males and females. Sleep, 22(8):1067–1071, 1999.

[102] P. A. Estevez, C. M. Held, C. A. Holzmann, C. A. Perez, J. P. Perez, J. Heiss, M. Garrido,

and P. Peirano. Polysomnographic pattern recognition for automated classification of

sleep-waking states in infants. Med. Biol. Eng. Comput., 40(1):105–113, 2002.

[103] T. Fawcett. ROC graphs: notes and practical considerations for researchers, Tech. Rep.

HP Labs, Palo Alto, CA, 2004.

[104] J. Fell, J. Roschke, K. Mann, and C. Schaffner. Discrimination of sleep stages: a com-

parison between spectral and nonlinear EEG measures. Electroencephalogr. Clin. Neu-

rophysiol., 98(5):401–410, 1996.

[105] Fitbit ONE Wireless Activity and Sleep Tracker (retrieved in Jan. 2015). [Online] Available:

https://www.fitbit.com/one.

[106] M. Folke, L. Cernerud, M. Ekstrom, and B. Hok. Critical review of non-invasive respi-

ratory monitoring in medical care. Med. Biol. Eng. Comput., 41(4):377–383, 2003.

[107] P. Fonseca, R. M. Aarts, J. Foussier, and X. Long. A novel low-complexity post-processing

algorithm for precise QRS localization. SpringerPlus, 3:376, 2014.

[108] J. Foussier, P. Fonseca, X. Long, and S. Leonhardt. Automatic feature selection for

sleep/wake classification with small data sets. In 7th Int. Joint Conf. Biomed. Eng. Syst.

Technol. (BIOSTEC’13), pp. 178–184, Barcelona, Spain, 2013.

[109] B. L. Frankel, R. D. Coursey, R. Buchbinder, and F. Snyder. Recorded and reported sleep

in chronic primary insomnia. Arch. Gen. Psychiatry, 33(5):615–623, 1976.

[110] J. H. Friedman. Regularized discriminant analysis. J. Am. Stat. Assoc., 84(405):165–175,

1989.

[111] P. M. Fuller and C. J. Amlaner (eds.). SRS Basics of Sleep Guide, 2nd ed., Sleep Research

Society, Darien, IL, 2011.

[112] J. M. Gaillard. Chronic primary insomnia: possible physiopathological involvement of

slow wave sleep deficiency. Sleep, 1(2):133–147, 1978.

[113] A. S. Gami, D. E. Howard, E. J. Olson, and V. K. Somers. Day-night pattern of sudden

death in obstructive sleep apnea. N. Engl. J. Med., 352(12):1206–1214, 2005.

[114] G. Garcia-Molina, M. Bellesi, S. Pastoor, S. Pfundtner, B. Riedner, and G. Tononi. On-

line single EEG channel based automatic sleep staging. Eng. Psychol. Cogn. Erg. Appl.

Serv. LNCS, 8020:333–342, 2013.

[115] I. Gath and E. Bar-On. Computerized method for scoring of polygraphic sleep record-

ings. Comput. Prog. Biomed., 11(3):217–223, 1980.

[116] D. R. Goodenough, H. B. Lewis, A. Shapiro, L. Jaret, and I. Sleser. Dream reporting

following abrupt and gradual awakenings from different types of sleep. J. Pers. Soc.

Psychol., 2(2):170–179, 1965.

[117] Y. Goren, L. R. Davrath, I. Pinhas, E. Toledo, and S. Akselrod. Individual time-dependent

spectral boundaries for improved accuracy in time-frequency analysis of heart rate vari-

ability. IEEE Trans. Biomed. Eng., 53(1):35–42, 2006.

[118] P. Grossman, F. H. Wilhelm, and M. Spoerle. Respiratory sinus arrhythmia, cardiac vagal

control, and daily activity. Am. J. Physiol. Heart Circ. Physiol., 287(2):H728–H734,

2004.

[119] C. Guilleminault, J. G. Briskin, M. S. Greenfield, and R. Silvestri. The impact of au-

tonomic nervous system dysfunction on breathing during sleep. Sleep, 4(3):263–278,

1981.

[120] C. Guilleminault, A. Tilkian, K. Lehrman, L. Forno, and W. C. Dement. Sleep apnoea

syndrome: state of sleep and autonomic dysfunction. J. Neurol. Neurosurg. Psychiatry,

40(7):718–725, 1977.

[121] M. A. Hall. Correlation-based feature selection for machine learning. Ph.D. dissertation,

Dept. Computer Science, The Univ. of Waikato, Hamilton, New Zealand, 1999.

[122] M. Hall, R. Vasko, D. Buysse, H. Ombao, Q. Chen, J. D. Cashmere, D. Kupfer, and

J. F. Thayer. Acute stress affects heart rate variability during sleep. Psychosom. Med.,

66(1):56–62, 2004.

[123] P. S. Hamilton. Open Source ECG Analysis. In Computing in Cardiology (CinC),

pp. 101–104, Memphis, TN, 2002.

[124] P. S. Hamilton and W. J. Tompkins. Quantitative investigation of QRS detection rules

using the MIT/BIH arrhythmia database. IEEE Trans. Biomed. Eng., 33(12):1157–1165,

1986.

[125] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng.,

21(9):1263-1284, 2009.

[126] J. Hedner, G. Pillar, S. D. Pittman, D. Zou, L. Grote, and D. P. White. A novel adap-

tive wrist actigraphy algorithm for sleep-wake assessment in sleep apnea patients. Sleep,

27(8):1560-1566, 2004.

[127] J. Hedner, D. P. White, A. Malhotra, S. Herscovici, S. D. Pittman, D. Zou, L. Grote, and

G. Pillar. Sleep staging based on autonomic signals: a multi-center validation study. J.

Clin. Sleep Med., 7(3):301–306, 2011.

[128] A. Heinrich, F. Van Heesch, B. Puvvula, and M. Rocque. Video based actigraphy and

breathing monitoring from the bedside table of shared beds. J. Ambient. Intell. Human

Comput., 6(1):107–120, 2015.

[129] R. C. Heinzer and F. Series. Normal physiology of the upper and lower airways. In Principles

and Practice of Sleep Medicine, edited by M. H. Kryger, T. Roth, and W. C. Dement,

pp. 581–596, Saunders Elsevier, St. Louis, MO, 2011.

[130] E. Hellinger. Neue Begründung der Theorie quadratischer Formen von unendlichvielen

Veränderlichen. J. für die Reine und Angew. Math., 136:210–271, 1909.

[131] S. Herscovici, A. Pe’er, S. Papyan, and P. Lavie. Detecting REM sleep from the finger: an

automatic REM sleep algorithm based on peripheral arterial tone (PAT) and actigraphy.

Physiol. Meas., 28(2):129–140, 2007.

[132] R. L. Horner. Autonomic consequences of arousal from sleep: mechanisms and implica-

tions. Sleep, 19(10 Suppl.):S193-195, 1996.

[133] D. Horvatic, H. E. Stanley, and B. Podobnik. Detrended cross-correlation analysis for

non-stationary time series with periodic trends. Eur. Phys. Lett., 94(1):18007, 2011.

[134] J. J. Hox. Multilevel Analysis: Techniques and Applications, 2nd edn., Routledge, 2010.

[135] D. W. Hudgel, R. J. Martin, B. Johnson, and P. Hill. Mechanics of the respiratory system

and breathing pattern during sleep in normal humans. J. Appl. Physiol. Respir. Environ.

Exerc. Physiol., 56(1):133–137, 1984.

[136] C. Iber, S. Ancoli-Israel, A. L. Chesson, and S. F. Quan. The AASM Manual for the Scor-

ing of Sleep and Associated Events: Rules, Terminology and Technical Specifications.

American Academy of Sleep Medicine, Westchester, IL, 2007.

[137] Y. Ichimaru, K. P. Clark, J. Ringler, and W. J. Weiss. Effect of sleep stage on the relation-

ship between respiration and heart rate variability. In Computers in Cardiology (CinC),

pp. 657–660, Chicago, IL, 1990.

[138] S. M. Isa, I. Wasito, and A. M. Arymurthy. Kernel dimensionality reduction on sleep

stage classification using ECG signal. Int. J. Comput. Spec. Iss., 8(4):115–123, 2011.

[139] N. Iyengar, C. K. Peng, R. Morin, A. L. Goldberger, and L. A. Lipsitz. Age-related

alterations in the fractal scaling of cardiac interbeat interval dynamics. Am. J. Physiol.

Regul. Integr. Comp. Physiol., 271(4):R1078–1084, 1996.

[140] B. H. Jansen and K. Shankar. Sleep staging with movement-related signals. Int. J.

Biomed. Comput., 32(3-4):289-297, 1993.

[141] J. J. Liu, W. Xu, M.-C. Huang, N. Alshurafa, M. Sarrafzadeh, N. Raut, and B. Yadegar.

Sleep posture analysis using a dense pressure sensitive bedsheet. Perv. Mobile Comput.,

10(2):34-50, 2014.

[142] S. Jasson, C. Medigue, P. Maison-Blanche, N. Montano, L. Meyer, C. Vermeiren, P.

Mansier, P. Coumel, and A. Malliani. Instant power spectrum analysis of heart rate

variability during orthostatic tilt using a time-/frequency-domain method. Circulation,

96(10):3521–3526, 1997.

[143] Jawbone UP Fitness Trackers (retrieved in Jan. 2015). [Online] Available:

http://www.jawbone.com/up.

[144] S. Jiang, C. H. Bian, X. B. Ning, and Q. D. Y. Ma. Visibility graph analysis on heartbeat

dynamics of meditation training. Appl. Phys. Lett., 102(25):253702, 2013.

[145] D. W. Jung, S. H. Hwang, H. N. Yoon, Y.-J. G. Lee, D.-U. Jeong, and K. S. Park. Noc-

turnal awakening and sleep efficiency estimation using unobtrusively measured ballisto-

cardiogram. IEEE Trans. Biomed. Eng., 61(1):131–138, 2013.

[146] F. Jurysta, P. Van De Borne, P.-F. Migeotte, M. Dumont, J.-P. Lanquart, J.-P. Degaute,

and P. Linkowski. A study of the dynamic interactions between sleep EEG and heart rate

variability in healthy young men. Clin. Neurophysiol., 114(11):2146–2155, 2003.

[147] M. M. Kabir, H. Dimitri, P. Sanders, R. Antic, E. Nalivaiko, D. Abbott, and M. Baumert.

Cardiorespiratory phase-coupling is reduced in patients with obstructive sleep apnea.

PLoS One, 5(5):e10602, 2010.

[148] J. W. Kantelhardt, E. Koscielny-Bunde, H. H. A. Rego, S. Havlin, and A. Bunde. Detect-

ing long-range correlations with detrended fluctuation analysis. Physica A Stat. Mech.

Appl., 295(3-4):441–454, 2001.

[149] J. W. Kantelhardt, T. Penzel, S. Rostig, H. F. Becker, S. Havlin, and A. Bunde. Breathing

during REM and non-REM sleep: correlated versus uncorrelated behaviour. Physica A

Stat. Mech. Appl., 319:447–457, 2003.

[150] W. Karlen, C. Mattiussi, and D. Floreano. Improving actigraph sleep/wake classification

with cardio-respiratory signals. In Proc. 30th Ann. Int. Conf. IEEE Eng. Med. Biol. Soc.

(EMBC), pp. 5262-5265, Vancouver, Canada, 2008.

[151] W. Karlen, C. Mattiussi, and D. Floreano. Sleep and wake classification with ECG and

respiratory effort signals. IEEE Trans. Biomed. Circuits Syst., 3(2):71–78, 2009.

[152] E. Keogh and M. Pazzani. Scaling up dynamic time warping for datamining applications.

In Proc. 6th Assoc. Comput. Mach. SIG Knowl. Discovery Data Mining (ACM SIGKDD),

pp. 285-289, 2000.

[153] L. Keselbrener and S. Akselrod. Selective discrete Fourier transform algorithm for time-

frequency analysis: method and application on simulated and cardiovascular signals.

IEEE Trans. Biomed. Eng., 43(8):789–802, 1996.

[154] J. W. Kim, J.-S. Lee, P. A. Robinson, and D.-U. Jeong. Markov analysis of sleep dynam-

ics. Phys. Rev. Lett., 102(17):178104, 2009.

[155] S. Kim, S. H. Park, and C. S. Ryu. Multistability in coupled oscillator systems with time

delay. Phys. Rev. Lett., 79(15):2911, 1997.

[156] R. J. Kimoff. Sleep fragmentation in obstructive sleep apnea. Sleep, 19(Suppl.9):S61-

S66, 1996.

[157] M. T. Kinlaw and A. W. Hunt. Time dependence of delayed neutron emission for fission-

able isotope identification. Appl. Phys. Lett., 86(25):254104, 2005.

[158] T. Kirjavainen, D. Cooper, O. Polo, and C. E. Sullivan. Respiratory and body movements

as indicators of sleep stage and wakefulness in infants and young children. J. Sleep Res.,

5(3):186-194, 1996.

[159] A. Kishi, Z. R. Struzik, B. H. Natelson, F. Togo, and Y. Yamamoto. Dynamics of sleep

stage transitions in healthy humans and patients with chronic fatigue syndrome. Am. J.

Physiol. Regul. Integr. Comp. Physiol., 294(6):R1980-R1987, 2008.

[160] G. Klosch, B. Kemp, T. Penzel, A. Schlogl, P. Rappelsberger, E. Trenker, G. Gruber,

J. Zeitlhofer, B. Saletu, W. M. Herrmann, S. L. Himanen, D. Kunz, M. J. Barbanoj,

J. Roschke, A. Varri, and G. Dorffner. The SIESTA project polygraphic and clinical

database. IEEE Eng. Med. Biol. Mag., 20(3):51–57, 2001.

[161] J. M. Kortelainen, M. O. Mendez, A. M. Bianchi, M. Matteucci, and S. Cerutti. Sleep

staging based on signals acquired through bed sensor. IEEE Trans Inf. Technol. Biomed.,

14(3):776-785, 2010.

[162] Z. M. Kovacs-Vajna. A fingerprint verification system based on triangular matching

and dynamic time warping. IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1266-1276,

2000.

[163] J. Krieger. Breathing during sleep in normal subjects. Clin. Chest Med., 6(4):577-594,

1985.

[164] A. D. Krystal and J. D. Edinger. Measuring sleep quality. Sleep Med., 9(Suppl.1):S10-

S17, 2008.

[165] J. M. Krueger, D. M. Rector, S. Roy, H. P. Van Dongen, G. Belenky, and J. Panksepp.

Sleep as a fundamental property of neuronal assemblies. Nat. Rev. Neurosci., 9(12):910-

919, 2008.

[166] Y. M. Kuo, J. S. Lee, and P. C. Chung. A visual context-awareness-based sleeping-

respiration measurement system. IEEE Trans. Inf. Technol. Biomed., 14(2):255-265,

2010.

[167] Y. Kurihara and K. Watanabe. Sleep-stage decision algorithm by using heartbeat and

body-movement signals. IEEE Trans. Syst. Man. Cybern. A Syst. Hum., 42(6):1450-

1459, 2012.

[168] C. A. Kushida, M. R. Littner, T. Morgenthaler, C. A. Alessi, D. Bailey, J. Coleman, L.

Friedman, M. Hirshkowitz, S. Kapen, M. Kramer, T. Lee-Chiong, D. L. Loube, J. Owens,

J. P. Pancer, and M. Wise. Practice parameters for the indications for polysomnography

and related procedures: an update for 2005. Sleep, 28(4):499-521, 2005.

[169] L. Lacasa, B. Luque, F. Ballesteros, J. Luque, and J. C. Nuno. From time series to com-

plex networks: the visibility graph. Proc. Natl. Acad. Sci. U.S.A., 105(13):4972-4975,

2008.

[170] D. E. Lake, J. R. Moorman, and H. Cao. Sample entropy estimation using sampen. In

PhysioNet (May 2014), [Online] Available: http://physionet.org/physiotools/sampen.

[171] D. E. Lake, J. S. Richman, M. P. Griffin, and J. R. Moorman. Sample entropy anal-

ysis of neonatal heart rate variability. Am. J. Physiol. Regul. Integr. Comp. Physiol.,

283(3):R789-797, 2002.

[172] J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical

data. Biometrics, 33(1):159–174, 1977.

[173] L. E. Larsen and D. O. Walter. On automatic methods of sleep staging by EEG spectra.

Electroencephalogr. Clin. Neurophysiol., 28(5):459-467, 1970.

[174] J. Lazaro, E. Gil, R. Bailon, A. Minchole, and P. Laguna. Deriving respiration from

photoplethysmographic pulse width. Med. Biol. Eng. Comput., 51(1-2):233–242, 2013.

[175] K. L. Lichstein, K. C. Stone, J. Donaldson, S. D. Nau, J. P. Soeffing, D. Murray, K. W.

Lester, and R. N. Aguillard. Actigraphy validation with insomnia. Sleep, 29(2):232–239,

2006.

[176] S. S. Lobodzinski and M. M. Laks. New devices for very long-term ECG monitoring.

Cardiol. J., 19(2):210–214, 2012.

[177] X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Using dynamic time

warping for sleep and wake discrimination. In Proc. IEEE-EMBS Int. Conf. Biomed.

Health Inf. (BHI), pp. 886–889, Hong Kong and Shenzhen, China, 2012.

[178] X. Long, P. Fonseca, R. Haakma, R. M. Aarts, and J. Foussier. Time-frequency analysis

of heart rate variability for sleep and wake classification. In Proc. 12th IEEE Int. Conf.

BioInform. BioEng. (BIBE), pp. 85–90, Larnaca, Cyprus, 2012.

[179] X. Long, P. Fonseca, R. Haakma, R. M. Aarts, and J. Foussier. Spectral boundary adap-

tation on heart rate variability for sleep and wake classification. Int. J. Artif. Intell. Tools,

23(3):1460002, 2014.

[180] X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Sleep and wake classi-

fication with actigraphy and respiratory effort using dynamic warping. IEEE J. Biomed.

Health Inform., 18(4):1272-1284, 2014.

[181] X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Respiration amplitude

analysis for REM and NREM sleep classification. In Proc. 35th Ann. Int. Conf. IEEE

Eng. Med. Biol. Soc. (EMBC), pp. 5017–5020, Osaka, Japan, 2013.

[182] X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Analyzing respiratory ef-

fort amplitude for automated sleep stage classification. Biomed. Signal Process. Control,

14:197–205, 2014.

[183] X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Foussier. Modeling cardiorespira-

tory interaction during sleep with complex networks. Appl. Phys. Lett., 105(20):203701,

2014.

[184] X. Long, R. Haakma, R. M. Aarts, P. Fonseca, and J. Foussier. Between-laboratory and

demographic effects on heart rate and its variability during sleep. In Workshop Smart

Healthcare and Healing Environments (SHHE), Eindhoven, The Netherlands, 2014.

[185] X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Fonseca, and R. M. Aarts.

Measuring dissimilarity between respiratory effort signals based on uniform scaling for

sleep staging. Physiol. Meas., 35(12):2529–2542, 2014.

[186] X. Long, P. Fonseca, R. Haakma, J. Foussier, and R. M. Aarts. Automatic detection of

overnight deep sleep based on heart rate variability: a preliminary study. In Proc. 36th

Ann. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), pp. 50–53, Chicago, IL, 2014.

[187] J. Lu, D. Sherman, M. Devor, and C. B. Saper. A putative flip-flop switch for control of

REM sleep. Nature, 441(7093):589–594, 2006.

[188] B. Luque, L. Lacasa, F. Ballesteros, and J. Luque. Horizontal visibility graphs: exact

results for random time series. Phys. Rev. E, 80:046103, 2009.

[189] D. C. Mack, J. T. Patrie, P. M. Suratt, R. A. Felder, and M. A. Alwan. Development

and preliminary validation of heart rate and breathing rate detection using a passive,

ballistocardiography-based sleep monitoring system. IEEE Trans. Inf. Technol. Biomed.,

13(1):111-120, 2009.

[190] A. Malliani, M. Pagani, F. Lombardi, and S. Cerutti. Cardiovascular neural regulation

explored in the frequency domain. Circulation, 84(2):482–492, 1991.

[191] P. Maquet, C. Degueldre, G. Delfiore, J. Aerts, J. M. Peters, A. Luxen, and G. Franck.

Functional neuroanatomy of human slow wave sleep. J. Neurosci., 17(8):2807–2812,

1997.

[192] L. Marshall, H. Helgadottir, M. Molle, and J. Born. Boosting slow oscillations during

sleep potentiates memory. Nature, 444(7119):610–613, 2006.

[193] M. Massimini, F. Ferrarelli, S. K. Esser, B. A. Riedner, R. Huber, M. Murphy, M. J. Pe-

terson, and G. Tononi. Triggering sleep slow waves by transcranial magnetic stimulation.

Proc. Natl. Acad. Sci. U.S.A., 104(20):8496–8501, 2007.

[194] G. Matthews, B. Sudduth, and M. Burrow. A non-contact vital signs monitor. Crit. Rev.

Biomed. Eng., 28(1-2):173-178, 2000.

[195] P. Meerlo, A. Sgoifo, and D. Suchecki. Restricted and disrupted sleep: Effects on auto-

nomic function, neuroendocrine stress systems and stress responsivity. Sleep Med. Rev.,

12(3):197-210, 2008.

[196] J. Mendels and D. R. Hawkins. Sleep laboratory adaptation in normal subjects

and depressed patients (“first night effect”). Electroencephalogr. Clin. Neurophysiol.,

22(6):556-558, 1967.

[197] M. O. Mendez, M. Matteucci, V. Castronovo, L. Ferini-Strambi, S. Cerutti, and A. M.

Bianchi. Sleep staging from heart rate variability: time-varying spectral features and

hidden Markov models. Int. J. Biomed. Eng. Technol., 3(3-4):246–263, 2010.

[198] M. O. Mendez, M. Migliorini, J. M. Kortelainen, D. Nistico, E. Arce-Santana, S. Cerutti,

and A. M. Bianchi. Evaluation of the sleep quality based on bed sensor signals: Time-

variant analysis. In Proc. 32nd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC’10),

pp. 3994–3997, Buenos Aires, Argentina, 2010.

[199] N. Meziane, J. G. Webster, M. Attari, and A. J. Nimunkar. Dry electrodes for electrocar-

diography. Physiol. Meas., 34(9):R47–R69, 2013.

[200] M. Migliorini, A. M. Bianchi, D. Nistico, J. Kortelainen, E. Arce-Santana, S. Cerutti,

and M. O. Mendez. Automatic sleep staging based on ballistocardiographic signals

recorded through bed sensors. In Proc. 32nd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc.

(EMBC’10), pp. 3273–3276, Buenos Aires, Argentina, 2010.

[201] Mio Alpha Intensive Heart Rate Monitor (retrieved in Jan. 2015). [Online] Available:

http://www.mioglobal.com.

[202] N. Montano, T. G. Ruscone, A. Porta, F. Lombardi, M. Pagani, and A. Malliani. Power

spectrum analysis of heart rate variability to assess the changes in sympathovagal balance

during graded orthostatic tilt. Circulation, 90:1826–1834, 1994.

[203] G. B. Moody, R. G. Mark, A. Zoccola, and S. Mantero. Derivation of respiratory signals

from multi-lead ECGs. In Computers in Cardiology (CinC), pp. 113–116, Linkoping,

Sweden, 1985.

[204] T. Morgenthaler, C. Alessi, L. Friedman, J. Owens, V. Kapur, B. Boehlecke, T. Brown,

A. Chesson, J. Coleman, T. Lee-Chiong, J. Pancer, and T. J. Swick. Practice parameters

for the use of actigraphy in the assessment of sleep and sleep disorders: An update for

2007. Sleep, 30(4):519-529, 2007.

[205] M. Muller. Part 1: Analysis and retrieval techniques for music data – Dynamic time

warping. In Information Retrieval for Music and Motion, Chap. 4, pp. 69-84, Springer-

Verlag, Berlin, Germany, 2007.

[206] A. Muzet. Environmental noise, sleep and health. Sleep Med. Rev., 11(2):135-142, 2007.

[207] K. Narkiewicz, N. Montano, C. Cogliati, P. J. H. Van De Borne, M. E. Dyken, and V.

K. Somers. Altered cardiovascular variability in obstructive sleep apnea. Circulation,

98(11):1071–1077, 1998.

[208] V. Natale, M. Drejak, A. Erbacci, L. Tonetti, M. Fabbri, and M. Martoni. Monitoring

sleep with a smartphone accelerometer. Sleep Biol. Rhythm., 10(4):287–292, 2012.

[209] E. P. Neuburg. Frequency warping by dynamic programming. In Proc. IEEE Int. Conf.

Acoust. Speech Signal Process. (ICASSP), pp. 573-575, New York, NY, 1988.

[210] M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89(20):208701, 2002.

[211] M. E. J. Newman. The structure and function of complex networks. SIAM Rev.,

45(2):167-256, 2003.

[212] M. E. J. Newman and J. Park. Why social networks are different from other types of

networks. Phys. Rev. E, 68(3):036122, 2003.

[213] H.-V. V. Ngo, T. Martinetz, J. Born, and M. Molle. Auditory closed-loop stimulation of

the sleep slow oscillation enhances memory. Neuron, 78(3):545–553, 2013.

[214] A. Noviyanto, S. M. Isa, I. Wasito, and A. M. Arymurthy. Selecting features of single

lead ecg signal for automatic sleep stages classification using correlation-based feature

subset selection. Int. J. Comput. Spec. Iss., 8(5):139–148, 2011.

[215] M. M. Ohayon. Epidemiology of insomnia: what we know and what we still need to

learn. Sleep Med. Rev., 6(2):97–111, 2002.

[216] M. M. Ohayon, M. A. Carskadon, C. Guilleminault, and M. V. Vitiello. Meta-analysis of

quantitative sleep parameters from childhood to old age in healthy individuals: develop-

ing normative sleep values across the human lifespan. Sleep, 27(7):1255–1273, 2004.

[217] H. Otzenberger, C. Simon, C. Gronfier, and G. Brandenberger. Temporal relationship

between dynamic heart rate variability and electroencephalographic activity during sleep

in man. Neurosci. Lett., 229(3):173–176, 1997.

[218] J. Paalasmaa, M. Waris, H. Toivonen, L. Leppakorpi, and M. Partinen. Unobtrusive on-

line monitoring of sleep at home. In Proc. 34th Ann. Int. Conf. IEEE Eng. Med. Biol.

Soc. (EMBC’12), pp. 3784–3788, San Diego, CA, 2012.

[219] M. Pagani, F. Lombardi, S. Guzzetti, O. Rimoldi, R. Furlan, P. Pizzinelli, G. Sandrone, G.

Malfatto, S. Dell’Orto, and E. Piccaluga. Power spectral analysis of heart rate and arterial

pressure variabilities as a marker of sympatho-vagal interaction in man and conscious

dog. Circ. Res., 59:178–193, 1986.

[220] J. Paquet, A. Kawinska, and J. Carrier. Wake detection capacity of actigraphy during

sleep. Sleep, 30(10):1362–1369, 2007.

[221] R. Paradiso, G. Loriga, and N. Taccini. A wearable health care system based on knitted

integrated sensors. IEEE Trans. Inf. Technol. Biomed., 9(3):337–344, 2005.

[222] C.-K. Peng, J. Mietus, J. M. Hausdorff, S. Havlin, H. E. Stanley, and A. L. Goldberger.

Long-range anticorrelations and non-Gaussian behavior of the heartbeat. Phys. Rev. Lett.,

70(9):1343, 1993.

[223] T. Penzel and R. Conradt. Computer based sleep recording and analysis. Sleep Med. Rev.,

4(2):131–148, 2000.

[224] T. Penzel, J. W. Kantelhardt, L. Grote, J. H. Peter, and A. Bunde. Comparison of de-

trended fluctuation analysis and spectral analysis for heart rate variability in sleep and

sleep apnea. IEEE Trans. Biomed. Eng., 50(10):1143–1151, 2003.

[225] T. Penzel, J. W. Kantelhardt, C.-C. Lo, K. Voigt, and C. Vogelmeier. Dynamics of heart

rate and sleep stages in normals and patients with sleep apnea. Neuropsychopharmacol-

ogy, 28:S48-S53, 2003.

[226] T. Penzel, N. Wessel, M. Riedl, J. W. Kantelhardt, S. Rostig, M. Glos, A. Suhrbier,

H. Malberg, and I. Fietze. Cardiovascular and respiratory dynamics during normal and

pathological sleep. Chaos, 17(1):015116, 2007.

[227] H. R. Peterson, M. Rothschild, C. R. Weinberg, R. D. Fell, K. R. McLeish, and M. A.

Pfeifer. Body fat and the activity of the autonomic nervous system. N. Engl. J. Med.,

318(17):1077-1083, 1988.

[228] D. Pevernagie, R. M. Aarts, and M. D. Meyer. The acoustics of snoring. Sleep Med. Rev.,

14(2):131–144, 2010.

[229] Philips Respironics Actiwatch, Philips Healthcare (retrieved in Nov. 2012). [Online]

Available: http://www.actiwatch.respironics.com.

[230] E. A. Phillipson. Control of breathing during sleep. Am. Rev. Respir. Dis., 118(5):909–

939, 1978.

[231] G. Pocock, C. D. Richards, and D. A. Richards. Human Physiology, 4th edn., Oxford

University Press, 2013.

[232] M.-Z. Poh, D. J. McDuff, and R. W. Picard. Advancements in noncontact, multiparameter

physiological measurements using a webcam. IEEE Trans. Biomed. Eng., 58(1):7–

11, 2011.

[233] M. I. Polkey, M. Green, and J. Moxham. Measurement of respiratory muscle strength.

Thorax, 50(11):1131–1135, 1995.

[234] C. P. Pollak, W. W. Tryon, H. Nagaraja, and R. Dzwonczyk. How accurately does wrist

actigraphy identify the states of sleep and wakefulness? Sleep, 24(8):957-965, 2001.

[235] I. P. Priban and W. F. Fincham. Self-adaptive control and respiratory system. Nature,

208(5008):339–343, 1965.

[236] W. Prinz. Perception and action planning. Eur. J. Cognit. Psychol., 9(2):129–154, 1997.

[237] F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for compar-

ing induction algorithms. In Proc. 15th Int. Conf. Machine Learn. (ICML), pp. 445–453,

Madison, WI, 1998.

[238] P. Pudil, J. Novovicova, and J. Kittler. Floating search methods in feature selection. Pattern Recogn.

Lett., 15(11):1119–1125, 1994.

[239] J. R. Quinlan. C4.5: programs for machine learning, Morgan Kaufmann Publishers Inc.,

San Francisco, CA, 1993.

[240] L. R. Rabiner and B. Gold. Theory and Application of Digital Signal Processing, Prentice

Hall Press, 1975.

[241] L. Rabiner, A. Rosenberg, and S. Levinson. Considerations in dynamic time warping

algorithms for discrete word recognition. IEEE Trans. Acoust. Speech Signal Process.,

26(6):575–582, 1978.

[242] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria,

and E. Keogh. Searching and mining trillions of time series subsequences under dynamic

time warping. In Proc. Assoc. Comput. Mach. SIG Knowl. Discovery Data Mining (ACM

SIGKDD), pp. 262–270, 2012.

[243] A. N. Rama, S. C. Cho, and C. A. Kushida. Normal human sleep. In Sleep: A Compre-

hensive Handbook, edited by T. Lee-Chiong, Chap. 1, pp. 3-9, Wiley-Liss, New Jersey,

2006.

[244] J. Rasbash, F. Steele, W. J. Browne, and H. Goldstein. A User’s Guide to MLwiN, Centre

for Multilevel Modelling, Univ. of Bristol, Bristol, UK, 2009.

[245] C. A. Ratanamahatana and E. Keogh. Making time-series classification more accurate

using learned constraints. In Proc. SIAM Int. Conf. Data Mining (ICDM), pp. 11-22,

2004.

[246] S. W. Raudenbush and A. S. Bryk. Hierarchical Linear Models, Sage, Thousand Oaks,

CA, 2002.

[247] A. Rechtschaffen and A. Kales. A Manual of Standardized Terminology, Techniques and

Scoring System for Sleep Stages of Human Subjects. National Institutes of Health, Wash-

ington DC, 1968.

[248] S. J. Redmond and C. Heneghan. Cardiorespiratory-based sleep staging in subjects with

obstructive sleep apnea. IEEE Trans. Biomed. Eng., 53(3):485–496, 2006.

[249] S. J. Redmond, P. De Chazal, C. O’Brien, S. Ryan, W. T. McNicholas, and C. Heneghan.

Sleep staging using cardiorespiratory signals. Somnologie, 11(4):245–256, 2007.

[250] J. S. Richman and J. R. Moorman. Physiological time-series analysis using approximate

entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol., 278(6):H2039–2049,

2000.

[251] B. W. Riedel and K. L. Lichstein. Objective sleep measures and subjective sleep satis-

faction: How do older adults with insomnia define a good night’s sleep? Psychol. Aging,

13(1):159–163, 1998.

[252] D. Riemann, M. Berger, and U. Voderholzer. Sleep and depression results from psy-

chobiological studies: an overview. Biol. Psychol., 57(1-3):67–103, 2001.

[253] A. Roebuck, V. Monasterio, E. Gederi, M. Osipov, J. Behar, A. Malhotra, T. Penzel, and

G. D. Clifford. A review of signals used in sleep analysis. Physiol. Meas., 35(1):R1–R57,

2014.

[254] R. Robillard, T. J. R. Lambert, and N. L. Rogers. Measuring sleep-wake patterns with

physical activity and energy expenditure monitors. Biol. Rhythm Res., 43(5):555-562,

2012.

[255] I. M. Rosen, P. A. Gimotty, J. A. Shea, and L. M. Bellini. Evolution of sleep quantity,

sleep deprivation, mood disturbances, empathy, and burnout among interns. Acad. Med.,

81(1):82-85, 2006.

[256] S. Rostig, J. W. Kantelhardt, T. Penzel, W. Cassel, J.-H. Peter, C. Vogelmeier, H. F.

Becker, and A. Jerrentrup. Nonrandom variability of respiration during sleep in healthy

humans. Sleep, 28(4):411–417, 2005.

[257] S. M. Ryan, A. L. Goldberger, S. M. Pincus, J. Mietus, and L. A. Lipsitz. Gender- and

age-related differences in heart rate dynamics: are women more complex than men? J.

Am. Coll. Cardiol., 24(7):1700–1707, 1994.

[258] A. Sadeh. The role and validity of actigraphy in sleep medicine: an update. Sleep Med.

Rev., 15(4):259–267, 2011.

[259] A. Sadeh, K. M. Sharkey, and M. A. Carskadon. Activity-based sleep-wake identifica-

tion: an empirical test of methodological issues. Sleep, 17(3):201–207, 1994.

[260] C. C. R. Sady, U. S. Freitas, A. Portmann, J.-F. Muir, C. Letellier, and L. A. Aguirre. Au-

tomatic sleep staging from ventilator signals in non-invasive ventilation. Comput. Biol.

Med., 43(7):833–839, 2013.

[261] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word

recognition. IEEE Trans. Acoust., Speech, Signal Process., ASSP-26(1):43-49, 1978.

[262] B. Saletu, P. Wessely, P. Grunberger, and M. Schultes. Erste klinische Erfahrungen

mit einem neuen schlafanstoßenden Benzodiacepin Cinolazepam mittels eines Selbstbeurteilungsbogens

für Schlaf- und Aufwachqualität (SSA). Acad. Med., 66(11):687-

693, 1991.

[263] J. S. Samkoff and C. H. Jacques. A review of studies concerning effects of sleep depri-

vation and fatigue on residents’ performance. Acad. Med., 66(11):687-693, 1991.

[264] L. Samy, M.-C. Huang, J. Liu, W. Xu, and M. Sarrafzadeh. Unobtrusive sleep stage iden-

tification using a pressure-sensitive bed sheet. IEEE Sens. J., 14(7):2092–2101, 2014.

[265] J. P. Saul, R. F. Rea, D. L. Eckberg, R. D. Berger, and R. J. Cohen. Heart rate and

muscle sympathetic nerve variability during reflex changes of autonomic activity. Am. J.

Physiol., 258(3):H713–H721, 1990.

[266] C. Schafer, M. G. Rosenblum, J. Kurths, and H.-H. Abel. Heartbeat synchronized with

ventilation. Nature, 392:239–240, 1998.

[267] A. Y. Schumann, R. P. Bartsch, T. Penzel, P. Ch. Ivanov, and J. W. Kantelhardt. Aging

effects on cardiac and respiratory dynamics in healthy subjects across sleep stages. Sleep,

33(7):943–955, 2010.

[268] E. Sforza, C. Jouny, and V. Ibanez. Cardiac activation during arousal in humans: further

evidence for hierarchy in the arousal response. Clin. Neurophysiol., 111(9):1611–1619,

2000.

[269] A. Sgoifo, C. Coe, S. Parmigiani, and J. Koolhaas. Individual differences in behavior and

physiology: causes and consequences. Neurosci. Biobehav. Rev., 29:1–2, 2005.

[270] K. Shafqat, S. K. Pal, S. Kumari, and P. A. Kyriacou. Time-frequency analysis of HRV

data from locally anesthetized patients. In Proc. 31st Ann. Int. Conf. IEEE Eng. Med.

Biol. Soc. (EMBC), pp. 1824–1827, Minneapolis, MN, 2009.

[271] S. S. Shapiro, M. B. Wilk, and H. J. Chen. A comparative study of various tests for normality.

J. Am. Stat. Assoc., 63(324):1343–1372, 1968.

[272] Z.-G. Shao. Network analysis of human heartbeat dynamics. Appl. Phys. Lett.,

96(7):073703, 2010.

[273] Z. Shinar, A. Baharav, Y. Dagan, and S. Akselrod. Automatic detection of slow-wave-

sleep using heart rate variability. In Computers in Cardiology (CinC), pp. 593–596, Rot-

terdam, The Netherlands, 2001.

[274] Z. Shinar, S. Akselrod, Y. Dagan, and A. Baharav. Autonomic changes during wake-

sleep transition: a heart rate variability based approach. Auton. Neurosci., 130(1-2):17–

27, 2006.

[275] T. Shiomi, C. Guilleminault, R. Sasanabe, I. Hirota, M. Maekawa, and T. Kobayashi.

Augmented very low frequency component of heart rate variability during obstructive

sleep apnea. Sleep, 19(5):370–377, 1996.

[276] M. H. Silber, S. Ancoli-Israel, M. H. Bonnet, S. Chokroverty, M. M. Grigg-Damberger,

M. Hirshkowitz, S. Kapen, S. A. Keenan, M. H. Kryger, T. Penzel, M. R. Pressman, and

C. Iber. The visual scoring of sleep in adults. J. Clin. Sleep Med., 3(2):121–131, 2007.

[277] J. Sloboda and M. Das. A simple sleep stage identification technique for incorporation

in inexpensive electronic sleep screening devices. In Proc. IEEE Nat. Aero. Elect. Conf.

(NAECON), pp. 21–24, Dayton, OH, 2011.

[278] P. Smialowski, D. Frishman, and S. Kramer. Pitfalls of supervised feature selection.

Bioinformatics, 26(3):440–443, 2010.

[279] F. Snyder, J. A. Hobson, D. F. Morrison, and F. Goldfrank. Changes in respiration, heart

rate, and systolic blood pressure in human sleep. J. Appl. Physiol., 19(5):417–422, 1964.

[280] V. K. Somers, M. E. Dyken, M. P. Clary, and F. M. Abboud. Sympathetic neural mecha-

nisms in obstructive sleep apnea. J. Clin. Invest., 96(4):1897–1904, 1995.

[281] V. K. Somers, M. E. Dyken, A. L. Mark, and F. M. Abboud. Sympathetic-nerve activity

during sleep in normal subjects. N. Engl. J. Med., 328(5):303–307, 1993.

[282] K. Spiegel, R. Leproult, and E. Van Cauter. Impact of sleep debt on metabolic and endocrine

function. The Lancet, 354(9188):1435–1439, 1999.

[283] K. Spiegelhalder, L. Fuchs, J. Ladwig, S. D. Kyle, C. Nissen, U. Voderholzer, B. Feige,

and D. Riemann. Heart rate and heart rate variability in subjectively reported insomnia.

J. Sleep Res., 20(1pt2):137–145, 2011.

[284] M. Steriade. The corticothalamic system in sleep. Front. Biosci., 8:878–899, 2003.

[285] R. Stickgold. Sleep-dependent memory consolidation. Nature, 437(7063):1272–1278,

2005.

[286] S. H. Strogatz. Exploring complex networks. Nature, 410(6825):268–276, 2001.

[287] E. Tasali, R. Leproult, D. A. Ehrmann, and E. Van Cauter. Slow-wave sleep and the risk of

type 2 diabetes in humans. Proc. Natl. Acad. Sci. U.S.A., 105(3):1044–1049, 2008.

[288] Task Force of the European Society of Cardiology and the North American Society of

Pacing and Electrophysiology. Heart rate variability: standards of measurement, physio-

logical interpretation and clinical use. Circulation, 93:1043–1065, 1996.

[289] S. Telser, M. Staudacher, Y. Ploner, A. Amann, H. Hinterhuber, and M. Ritsch-Marte.

Can one detect sleep stage transitions for on-line sleep scoring by monitoring the heart

rate variability? Somnologie, 8(2):33–41, 2004.

[290] C. Texier and S. N. Majumdar. Wigner time-delay distribution in chaotic cavities and

freezing transition. Phys. Rev. Lett., 110(25):250602, 2013.

[291] TomTom Runner Cardio Monitor (retrieved in Jan. 2015). [Online] Available:

http://www.tomtom.com/products/your-sports/running.

[292] J. Trinder, J. Kleiman, M. Carrington, S. Smith, S. Breen, N. Tan, and Y. Kim. Auto-

nomic activity during human sleep as a function of time and sleep stage. J. Sleep Res.,

10(4):253–264, 2001.

[293] J. Trinder, M. Padula, D. Berlowitz, J. Kleiman, S. Breen, P. Rochford, C. Worsnop, B.

Thompson, and R. Pierce. Cardiac and respiratory activity at arousal from sleep under

controlled ventilation conditions. J. Appl. Physiol., 90(4):1455–1463, 2001.

[294] J. Trinder, F. Whitworth, A. Kay, and P. Wilkin. Respiratory instability during sleep

onset. J. Appl. Physiol., 73(6):2462–2469, 1992.

[295] W. W. Tryon. Issues of validity in actigraphic sleep assessment. Sleep, 27(1):158-165,

2004.

[296] M. Unser. Splines: a perfect fit for signal and image processing. IEEE Signal Proc. Mag.,

16(6):22–38, 1999.

[297] J. A. Van Alste and T. S. Schilder. Removal of base-line wander and power-line interference

from the ECG by an efficient FIR filter with a reduced number of taps. IEEE Trans.

Biomed. Eng., BME-32(12):1052–1060, 1985.

[298] P. Van De Borne, H. Nguyen, P. Biston, P. Linkowski, and J. P. Degaute. Effects of

wake and sleep stages on the 24-h autonomic control of blood pressure and heart rate in

recumbent men. Am. J. Physiol., 266(2):H548–H554, 1994.

[299] E. Vanoli, P. B. Adamson, L. Ba, G. D. Pinna, R. Lazzara, and W. C. Orr. Heart rate

variability during specific sleep stages: a comparison of healthy subjects with patients

after myocardial infarction. Circulation, 91:1918–1922, 1995.

[300] J. Virkkala, J. Hasan, A. Varri, S.-L. Himanen, and K. Muller. Automatic sleep stage

classification using two-channel electro-oculography. J. Neurosci. Meth., 166(1):109–

115, 2007.

[301] P. E. Wainwright, S. T. Leatherdale, and J. A. Dubin. Advantages of mixed effects models

over traditional ANOVA models in developmental studies: a worked example in a mouse

model of fetal alcohol syndrome. Develop. Psychobiol., 49(1):664–674, 2007.

[302] M. P. Walker and R. Stickgold. Sleep-dependent learning and memory consolidation.

Neuron, 44(1):121–133, 2004.

[303] T. Watanabe and K. Watanabe. Noncontact method for sleep stage estimation. IEEE

Trans. Biomed. Eng., 51(10):1735–1748, 2004.

[304] K. Watanabe, T. Watanabe, H. Watanabe, H. Ando, T. Ishikawa, and K. Kobayashi.

Noninvasive measurement of heartbeat, respiration, snoring and body movements of a

subject in bed via a pneumatic method. IEEE Trans. Biomed. Eng., 52(12):2100–2107,

2005.

[305] D. O. White, J. V. Weil, and C. W. Zwillich. Metabolic rate and breathing during sleep.

J. Appl. Physiol., 59(2):384–391, 1985.

[306] A. W. Whitney. A direct method of nonparametric measurement selection. IEEE Trans.

Comput., C-20(9):1100–1103, 1971.

[307] K. F. Whyte, M. Gugger, G. A. Gould, J. Molloy, P. K. Wraith, and N. J. Douglas. Accu-

racy of respiratory inductive plethysmograph in measuring tidal volume during sleep. J.

Appl. Physiol., 71(5):1866–1871, 1991.

[308] T. Willemen, D. Van Deun, V. Verhaert, S. Pirrera, V. Exadaktylos, J. Verbraecken, B.

Haex, and J. Vander Sloten. Automatic sleep stage classification based on easy to register

signals as a validation tool for ergonomic steering in smart bedding systems. Work: J.

Prev. Ass. Rehabil., 41:1985-1989, 2012.

[309] T. Willemen, D. Van Deun, V. Verhaert, M. Vandekerckhove, V. Exadaktylos, J. Verbraecken,

S. Van Huffel, B. Haex, and J. Vander Sloten. An evaluation of cardio-respiratory

and movement features with respect to sleep stage classification. IEEE J. Biomed. Health

Inform., 18(2):661-669, 2014.

[310] P. Wohlfahrt, J. W. Kantelhardt, M. Zinkhan, A. Y. Schumann, T. Penzel, I. Fietze, F.

Pillmann, and A. Stang. Transitions in effective scaling behavior of accelerometric time

series across sleep and wake. Eur. Phys. Lett., 103(6):68002, 2013.

[311] R. Wolk, A. S. Gami, A. Garcia-Touchard, and V. K. Somers. Sleep and cardiovascular

disease. Curr. Probl. Cardiol., 30:625–662, 2005.

[312] M. Xiao, H. Yan, J. Song, Y. Yang, and X. Yang. Sleep stages classification based on

heart rate variability and random forest. Biomed. Signal Process. Control, 8(6):624–633,

2013.

[313] X. Xu, J. Zhang, and M. Small. Superfamily phenomena and motifs of networks induced

from time series. Proc. Natl. Acad. Sci. U.S.A., 105(50):19601-19605, 2008.

[314] D. Yankov, E. Keogh, J. Medina, B. Chiu, and V. Zordan. Detecting time series mo-

tifs under uniform scaling. In Proc. Assoc. Comput. Mach. SIG Knowl. Discovery Data

Mining (ACM SIGKDD), pp. 844–853, 2005.

[315] B. Yilmaz, M. H. Asyali, E. Arikan, S. Yetkin, and F. Ozgen. Sleep stage and obstructive

apneaic epoch classification using single-lead ECG. Biomed. Eng. Online, 9:39,

2010.

[316] J. Yoo, L. Yan, S. Lee, H. Kim, and H.-J. Yoo. A wearable ECG acquisition system with

compact planar-fashionable circuit board-based shirt. IEEE Trans. Inf. Technol. Biomed.,

13(6):897–902, 2009.

[317] T. Young, P. E. Peppard, and D. J. Gottlieb. Epidemiology of obstructive sleep apnea: a

population health perspective. Am. J. Respir. Crit. Care Med., 165(9):1217–1239, 2002.

[318] C. Yu, Z. Liu, T. McKenna, A. T. Reisner, and J. Reifman. A method for automatic

identification of reliable heart rates calculated from ECG and PPG waveforms. J. Am.

Med. Inform. Assoc., 13(3):309–320, 2006.

[319] M. Zakrzewski, H. Raittinen, and J. Vanhala. Comparison of center estimation algorithms

for heart and respiration monitoring with microwave Doppler radar. IEEE Sens.

J., 12(3):627–634, 2012.

[320] J. Zhang and M. Small. Complex network from pseudoperiodic time series: topology

versus dynamics. Phys. Rev. Lett., 96(23):238701, 2006.

[321] G. Zhu, Y. Li, and P. Wen. Analysis and classification of sleep stages based on difference

visibility graphs from a single-channel EEG signal. IEEE J. Biomed. Health Inform.,

18(6):1813–1821, 2014.

[322] G. Zhu, Y. Li, and P. Wen. An efficient visibility graph similarity algorithm and its ap-

plication on sleep stages classification. Brain Inform. LNCS, 7670:185–195, 2012.

List of the author’s publications

Journal articles

1. X. Long, P. Fonseca, R. Haakma, R. M. Aarts, and J. Foussier. Spectral boundary adap-

tation on heart rate variability for sleep and wake classification. International Journal on

Artificial Intelligence Tools, 23(3):1460002, 2014.

2. X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Sleep and wake classi-

fication with actigraphy and respiratory effort using dynamic warping. IEEE Journal of

Biomedical and Health Informatics, 18(4):1272–1284, 2014.

3. P. Fonseca, J. Foussier, R. M. Aarts, and X. Long. A novel low-complexity post-process-

ing algorithm for precise QRS localization. SpringerPlus, 3:376, 2014.

4. X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Analyzing respiratory

effort amplitude for automated sleep stage classification. Biomedical Signal Processing

and Control, 14:197–205, 2014.

5. X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Fonseca, and R. M. Aarts.

Measuring dissimilarity between respiratory effort signals based on uniform scaling for

sleep staging. Physiological Measurement, 35(12):2529–2542, 2014.

6. X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Foussier. Modeling cardiorespira-

tory interaction during sleep with complex networks. Applied Physics Letters, 105(20):

203701, 2014.

7. M. S. Goelema, X. Long, and R. Haakma. Correlations between overnight breathing rate

variation and subjective sleep quality scores. Sleep-Wake Research in the Netherlands

(NSWO Jaarboek), 25:60–63, 2015.

8. X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Fonseca, and R. M. Aarts.

Erratum: Measuring dissimilarity between respiratory effort signals based on uniform

scaling for sleep staging (2014 Physiol. Meas. 35 2529). Physiological Measurement,

36(3):625, 2015.

9. X. Long, J. B. Arends, R. M. Aarts, R. Haakma, P. Fonseca, and J. Rolink. Time de-

lay between cardiac and brain activity during sleep transitions. Applied Physics Letters,

106(14):143702, 2015.

10. J. Rolink, M. Kutz, P. Fonseca, X. Long, B. Misgeld, and S. Leonhardt. Recurrence

quantification analysis across sleep stages. Biomedical Signal Processing and Control,

20:107–116, 2015.

11. X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Rolink. Detection of nocturnal slow

wave sleep based on cardiorespiratory activity. Submitted.

12. X. Long, R. Haakma, T. Leufkens, P. Fonseca, and R. M. Aarts. Effects of between- and

within-subject variability on autonomic cardiorespiratory activity during sleep and their

limitations on sleep staging: a multilevel analysis. Submitted.

13. P. Fonseca∗, X. Long∗, M. Radha, R. Haakma, R. M. Aarts, and J. Rolink. Sleep stage

classification with ECG and respiratory effort. Submitted. (∗Joint first authorship)

14. P. Fonseca, R. M. Aarts, X. Long, and R. Haakma. Estimating actigraphy from motion

artifacts in ECG and respiratory effort signals. Submitted.

15. P. Fonseca, N. Den Teuling, X. Long, J. Rolink, and R. M. Aarts. Cardiorespiratory sleep

stage detection using conditional random fields. Submitted.

16. J. Werth, L. Atallah, P. Andriessen, X. Long, E. Zwartkruis-Pelgrim, and R. M. Aarts.

Unobtrusive sleep state measurements in preterm infants: a review. Submitted.

Conference articles and abstracts

1. X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Using dynamic time

warping for sleep and wake discrimination. IEEE-EMBS International Conference on

Biomedical and Health Informatics (BHI’12), pp. 886–889, Hong Kong and Shenzhen,

China, Jan. 2012. (First Runner-up Student Paper Award)

2. X. Long, P. Fonseca, R. Haakma, R. M. Aarts, and J. Foussier. Time-frequency analysis

of heart rate variability for sleep and wake classification. 12th IEEE International Conference

on BioInformatics and BioEngineering (BIBE’12), pp. 85–90, Larnaca, Cyprus,

Nov. 2012. (Best Student Paper Award)

3. J. Foussier, P. Fonseca, X. Long, and S. Leonhardt. Automatic feature selection for

sleep/wake classification with small data sets. International Joint Conference on Biomed-

ical Engineering Systems and Technologies (BIOSTEC’13), pp. 178–184, Barcelona,

Spain, Feb. 2013.

4. X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Respiration amplitude

analysis for REM and NREM sleep classification. 35th Annual International Conference

of the IEEE Engineering in Medicine and Biology Society (EMBC’13), pp. 5017–5020,

Osaka, Japan, Jul. 2013.

5. P. Fonseca, X. Long, J. Foussier, and R. M. Aarts. On the impact of arousals on the

performance of sleep and wake classification using actigraphy. 35th Annual Interna-

tional Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’13),

pp. 6760–6763, Osaka, Japan, Jul. 2013.

6. J. Foussier, P. Fonseca, X. Long, B. Misgeld, and S. Leonhardt. Combining HRV fea-

tures for automatic arousal detection. Computing in Cardiology (CinC), pp. 1003–1006,

Zaragoza, Spain, Sep. 2013.

7. J. Foussier, X. Long, P. Fonseca, B. Misgeld, and S. Leonhardt. On the relationship of

arousals and artifacts in respiratory effort signals. International Conference on Health

Informatics, pp. 31–34, Vilamoura, Portugal, Nov. 2013. (Finalists Young Investigator

Award)

8. X. Long, R. Haakma, M. Goelema, T. Weysen, P. Fonseca, J. Foussier, and R. M.

Aarts. Self-dissimilarity of respiratory effort across sleep states and time. Sleep, vol. 37

(Abstract Supplement), p. A36, May 2014.

9. X. Long, P. Fonseca, R. Haakma, J. Foussier, and R. M. Aarts. Automatic detection

of overnight deep sleep based on heart rate variability: a preliminary study. 36th An-

nual International Conference of the IEEE Engineering in Medicine and Biology Society

(EMBC’14), pp. 50–53, Chicago, IL, Aug. 2014.

10. X. Long, R. Haakma, R. M. Aarts, P. Fonseca, and J. Foussier. Between-laboratory and

demographic effects on heart rate and its variability during sleep. Workshop on Smart

Healthcare and Healing Environments in conjunction with the European Conference on

Ambient Intelligence (AmI’14), pp. 1–4, Eindhoven, The Netherlands, Nov. 2014.

11. M. S. Goelema, X. Long, and R. Haakma. Gender effect found in the association be-

tween overnight breathing rate variation and reported sleep quality scores. Sleep, vol. 38

(Abstract Supplement), p. A60, Jun. 2015.

12. X. Long, R. Haakma, P. Fonseca, R. M. Aarts, M. S. Goelema, and J. Rolink. What

causes the differences in cardiac activity within and between subjects during sleep? Sleep,

vol. 38 (Abstract Supplement), p. A63, Jun. 2015.

13. X. Long, R. Haakma, J. Rolink, P. Fonseca, and R. M. Aarts. Improving sleep/wake

detection via boundary adaptation for respiratory spectral features. Submitted, 2015.

14. M.-M. Nano, X. Long, J. Werth, R. M. Aarts, and R. Heusdens. Sleep apnea detection

using time-delayed heart rate variability. Submitted, 2015.

Patent application filings

1. X. Long, R. Haakma, P. Fonseca, and R. M. Aarts. System and method for determining

spectral boundaries for sleep stage classification. Pending.

2. X. Long, P. Fonseca, N. den Teuling, R. Haakma, and R. M. Aarts. System and method

for slow wave sleep detection. Pending.

3. P. Fonseca, N. den Teuling, X. Long, R. Haakma, and R. M. Aarts. System and method

for cardiorespiratory sleep stage classification. Pending.

4. P. Fonseca, R. Haakma, R. M. Aarts, and X. Long. Actigraphy methods and apparatuses.

Pending.

Articles out of the thesis’s scope

1. X. Long, B. Yin, and R. M. Aarts. Single-accelerometer-based daily physical activ-

ity classification. 31st Annual International Conference of the IEEE Engineering in

Medicine and Biology Society (EMBC’09), pp. 6107–6110, Minneapolis, MN, Sep. 2009.

2. X. Long, S. Pauws, M. Pijl, J. Lacroix, A. Goris, and R. M. Aarts. Analysis and predic-

tion of daily physical activity level data using autoregressive integrated moving average

models. 3rd Workshop on Behaviour Monitoring and Interpretation (BMI’09), pp. 1–15,

Paderborn, Germany, Oct. 2009.

3. X. Long, W. Yin, L. An, H. Ni, L. Huang, Q. Luo, and Y. Chen. Churn analysis of online

social network users using data mining techniques. International MultiConference of

Engineers and Computer Scientists (IMECS’12), pp. 551–556, Hong Kong, Mar. 2012.

4. X. Long, M. Pijl, S. Pauws, J. Lacroix, A. Goris, and R. M. Aarts. Towards tailored phys-

ical activity health intervention: Predicting dropout participants. Health and Technology,

4:273–287, 2014.

5. X. Long, S. Pauws, M. Pijl, J. Lacroix, A. Goris, and R. M. Aarts. Predicting daily

physical activity in a lifestyle intervention program. In Ambient Intelligence and Smart

Environments, Vol. 9: Behaviour Monitoring and Interpretation – BMI, edited by B.

Gottfried and H. Aghajan, Part III, pp. 131-146, IOS Press, Amsterdam, The Netherlands,

2011.

Acknowledgements

I still remember the moment, more than four years ago, when I decided to pursue this PhD project in

Western Europe (Eindhoven, the Netherlands), 11472 km away from my home in the Far East

(Huizhou, China). It was not an easy decision since it meant that I had to change my career

path and stay on the other side of the earth for the second time after I finished my master's study

in Eindhoven in 2009, and this time it would be much longer. Even so, I felt

excited and full of strength at that moment because it seemed that I had found my dream, a dream

of dedicating myself to what I am extraordinarily interested in; and now, after four years, you

see this book. Herewith, I would like to express my heartfelt appreciation to all of you who

shared my experience over the years.

First and foremost, I would like to express my deepest thanks and gratitude to my supervisors,

Prof. Ronald M. Aarts and Dr. Reinder Haakma. Thank you, Ronald, for your sincere

advice and encouragement on the long walk of both professional development and personal

life that I took during the past four years, as well as during my master's period. Thank you,

Reinder, for masterfully and patiently coaching me through the doctoral work and for giving me the

freedom to explore my own ideas. I will never forget the discussions during our regular meetings,

which were always so inspiring and happy. I learned so much from you about how to energize

creative thinking during scientific research. I have always felt lucky to be under your supervision.

A special note of gratitude goes to Prof. Jan Bergmans, the chair of the Signal Processing Systems (SPS)

group at TU/e, who offered me this wonderful opportunity to pursue my doctorate degree.

Enormous thanks must go to my colleague Pedro Fonseca, who worked closely with me.

Your support and knowledge have been of great help in overcoming the obstacles I encountered.

I have benefited greatly from your critical reviews of my articles. Without your

help, this thesis could never have been finished. Particular thanks are due

to my second promoter Prof. Johan Arends for your advice and discussions regarding

the neurophysiological aspects of the work. Many thanks go also to Jerome Rolink and Mustafa

Radha, who provided inspiring comments on my manuscripts and made huge contributions

to the algorithm framework of the project, and to Dr. Sandrine Devot and Reimund Dratwa,

who initiated the framework. I would also like to thank Maaike Goelema, Dr. Tim Weysen, Dr.

Tim Leufkens, Dr. Roy Raymann, Tine Smits, and Renske de Bruijn for supporting the work

with your expertise in psychology or physiology. My gratitude is also due to the other former

team members Adrienne Heinrich and Dr. Igor Berezhnyy as well as the former master's students

Jie Yang, Niek den Teuling, Xi Yang, Antonio Rebelo, Yuan Lu, and Xi Zhang, who were involved

in the project. Your enthusiasm and your hard work during different phases of the

project accelerated the success of my work. I feel fortunate to have had you in the project.

Special thanks go to Timothy A. Nathan, a senior intellectual property counsel from the IP&S

department at Philips United States, for your active responses that expedited the approval of

my work for publication and for helping me with filing several patent applications.

I would also like to acknowledge the committee of this thesis, Prof. Panos Markopoulos,

Prof. Sabine Van Huffel (KU Leuven), Prof. Steffen Leonhardt (RWTH Aachen University),

and the chairman Prof. Peter de With, for their insightful comments on the thesis.

Most of the work presented in this thesis has been conducted at Philips Research, the Netherlands.

For this reason, thanks must go to Marieke van der Hoeven, the head of the Brain, Body

& Behavior department, in whose group I spent the first two years, and to Dr. Jorg Habetha,

the head of the Personal Health department, where I had the honor to work

during the past two years. It was a gratifying and pleasant time of my life, in which I met so many

knowledgeable and energetic scientists, from whom I have learned a lot during coffee breaks,

lunch time, and offsite events that we had together. Dr. Michael Rooijakkers, thank you for

helping me with presenting my work at the EMBC’14 conference in Chicago. Since I have also

been involved in a couple of other projects apart from my PhD work, I would like to thank Prof.

Guofu Zhou, Jan Werth, Dr. Peter Andriessen, Dr. Louis Atallah, Elly Zwartkruis-Pelgrim,

Marina Nano, and Dr. Richard Heusdens for the great discussions we had. Thanks go

also to the Philips and SPS secretaries as well as the other administrative staff who helped me

with many non-technical matters such as providing instructions and ICT support at

the beginning of the project, arranging business trips and reimbursements, scheduling

teleconferences, and preparing support letters for my parents’ and friends’ visits to the Netherlands. I would

also like to express my sincerest gratitude to Dr. Bin Yin and Dr. Steffen Pauws for guiding my

master's project. You started me down the amazing road of scientific research.

During my life in the Netherlands, I have to admit that I owe a lot of thanks to my Chinese

friends and colleagues who never let me feel lonely: Anmin, Liya, Yuanjia, Xiaoyin, BoC, Qing,

Tao, Tao, Wei, Jianhua, Bin, Rui, Pu, Wei, Quan, Shaoxiong, Yanan, Yan, Lin, Xin, Xiong and

so many others. In particular, I would like to give special thanks to Le, Wenyao, Xiaomin, Xin,

Dan, Tingyun, Joanne, Chen, Fei, and Anqi for different reasons. Last but not least, I want to

express my sincerest thanks to my parents for giving me life to see, listen, feel, and experience

this wonderful world and to my relatives and friends in China for your support during the past

11586 days.

Going back 18 years, to my childhood at middle school in 1997, I wrote an essay

for my writing homework in which I dreamed of being awarded the Nobel Prize in

Biomedicine in 2016, although I had no idea what ‘Biomedicine’ literally meant. Unfortunately,

with 2016 coming soon, I now realize that there is not even a Nobel Prize in Biomedicine,

but who knows if that will come in the future.

About the author

Xi Long was born on October 9, 1983 in Ganzhou, China, and

moved to Huizhou, China, in 1992. He received the B.Eng.

degree (with honors) in electronic and information engineering

from Zhejiang University, Hangzhou, China, in June 2006 and

the M.Sc. degree in electrical engineering (with a fully-funded

scholarship awarded by NXP) from Eindhoven University of

Technology, Eindhoven, the Netherlands, in August 2009. Dur-

ing the period between May 2008 and August 2009, he was a

research intern at Philips Research Eindhoven and worked on

accelerometer-based activity monitoring, supervised by Prof.

Ronald M. Aarts, Dr. Bin Yin, and Dr. Steffen Pauws. After that, from January 2010 until

June 2011, he worked for Tencent Inc., Shenzhen, China, with responsibilities for user research

and quantitative data analysis of web-based products and services.

From July 2011 to June 2015, he was a Ph.D. candidate in the Signal Processing Systems

group at the Eindhoven University of Technology, the Netherlands, funded by Philips Research

in Eindhoven, the Netherlands, under the supervision of Prof. Ronald M. Aarts and Dr. Rein-

der Haakma. At the same time, he joined the Brain, Body & Behavior group (and later the

Personal Health group) at Philips Research where he investigated autonomic markers and ma-

chine learning algorithms for sleep stage classification and objective sleep assessment. He has

published over thirty papers and reports, and holds four first US patent application filings and

over ten Philips invention disclosures. He was the recipient of the Best Student Paper Award

at the IEEE 12th International Conference on BioInformatics and BioEngineering (BIBE) in

2012 and the First Runner-up Student Paper Award at the IEEE-EMBS International

Conference on Biomedical and Health Informatics (BHI) in 2012. He serves as a reviewer for several

journals in biomedical engineering and healthcare, such as IEEE Journal of Biomedical and

Health Informatics, Biomedical Signal Processing and Control, and Physiological Measure-

ment. His research interests include objective sleep analysis, vital sign monitoring, unobtrusive

and wearable sensing, and biomedical signal processing as well as machine learning, time series

analysis, and computational models for medicine and healthcare.
