Speech in Mobile and Pervasive Environments · PDF file · 2015-10-27

Upload: donguyet

Post on 18-Mar-2018

Speech in Mobile and Pervasive Environments

Wiley Series on Wireless Communications and Mobile Computing

Series Editors:
Dr Xuemin (Sherman) Shen, University of Waterloo, Canada
Dr Yi Pan, Georgia State University, USA

The “Wiley Series on Wireless Communications and Mobile Computing” is a series of comprehensive, practical and timely books on wireless communication and network systems. The series focuses on topics ranging from wireless communication and coding theory to wireless applications and pervasive computing. The books provide engineers and other technical professionals, researchers, educators, and advanced students in these fields with invaluable insight into the latest developments and cutting-edge research.

Other titles in the series:

Misic and Misic: Wireless Personal Area Networks: Performance, Interconnection, and Security with IEEE 802.15.4, January 2008, 978-0-470-51847-2

Takagi and Walke: Spectrum Requirement Planning in Wireless Communications: Model and Methodology for IMT-Advanced, April 2008, 978-0-470-98647-9

Perez-Fontan and Espineira: Modeling the Wireless Propagation Channel: A simulation approach with MATLAB®, August 2008, 978-0-470-72785-0

Ippolito: Satellite Communications Systems Engineering: Atmospheric Effects, Satellite Link Design and System Performance, August 2008, 978-0-470-72527-6

Lin and Sou: Charging for Mobile All-IP Telecommunications, September 2008, 978-0-470-77565-3

Myung and Goodman: Single Carrier FDMA: A New Air Interface for Long Term Evolution, October 2008, 978-0-470-72449-1

Wang, Kondi, Luthra and Ci: 4G Wireless Video Communications, April 2009, 978-0-470-77307-9

Cai, Shen and Mark: Multimedia Services in Wireless Internet: Modeling and Analysis, June 2009, 978-0-470-77065-8

Stojmenovic: Wireless Sensor and Actuator Networks: Algorithms and Protocols for Scalable Coordination and Data Communication, February 2010, 978-0-470-17082-3

Liu and Weiss: Wideband Beamforming: Concepts and Techniques, March 2010, 978-0-470-71392-1

Riccharia and Westbrook: Satellite Systems for Personal Applications: Concepts and Technology, July 2010, 978-0-470-71428-7

Qian, Muller and Chen: Security in Wireless Networks and Systems, March 2014, 978-0-470-512128

Speech in Mobile and Pervasive Environments

Nitendra Rajput and Amit A. Nanavati

IBM Research, New Delhi, India

A John Wiley & Sons, Ltd., Publication

This edition first published 2012
© 2012 John Wiley & Sons Ltd.

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Rajput, Nitendra.
Speech in mobile and pervasive environments / Nitendra Rajput and Amit A. Nanavati.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-69435-0 (cloth)
1. Speech processing systems. 2. Cell phone systems. I. Nanavati, Amit A. II. Title.
TK7882.S65R334 2012
006.5–dc23
2011033626

A catalogue record for this book is available from the British Library.

ISBN: 9780470694350 (H/B)

Typeset in 10.5/13pt Times by Laserwords Private Limited, Chennai, India

For,Z �ooz �oo & Po�o

To, Family & Friends

Contents

About the Series Editors
List of Contributors
Foreword
Preface
Acknowledgments

1 Introduction
  1.1 Application design
  1.2 Interaction modality
  1.3 Speech processing
  1.4 Evaluations

2 Mobile Speech Hardware: The Case for Custom Silicon
  2.1 Introduction
  2.2 Mobile hardware: Capabilities and limitations
    2.2.1 Looking inside a mobile device: Smartphone example
    2.2.2 Processing limitations
    2.2.3 Memory limitations
    2.2.4 Power limitations
    2.2.5 Silicon technology and mobile hardware
  2.3 Profiling existing software systems
    2.3.1 Speech recognition overview
    2.3.2 Profiling techniques summary
    2.3.3 Processing time breakdown
    2.3.4 Memory usage
    2.3.5 Power and energy breakdown
    2.3.6 Summary
  2.4 Recognizers for mobile hardware: Conventional approaches
    2.4.1 Reduced-resource embedded recognizers
    2.4.2 Network recognizers
    2.4.3 Distributed recognizers
    2.4.4 An alternative approach: Custom hardware
  2.5 Custom hardware for mobile speech recognition
    2.5.1 Motivation
    2.5.2 Hardware implementation: Feature extraction
    2.5.3 Hardware implementation: Feature scoring
    2.5.4 Hardware implementation: Search
    2.5.5 Hardware implementation: Performance and power evaluation
    2.5.6 Hardware implementation: Summary
  2.6 Conclusion
  Bibliography

3 Embedded Automatic Speech Recognition and Text-to-Speech Synthesis
  3.1 Automatic speech recognition
  3.2 Mathematical formulation
  3.3 Acoustic parameterization
    3.3.1 Landmark-based approach
  3.4 Acoustic modeling
    3.4.1 Unit selection
    3.4.2 Hidden Markov models
  3.5 Language modeling
  3.6 Modifications for embedded speech recognition
    3.6.1 Feature computation
    3.6.2 Likelihood computation
  3.7 Applications
    3.7.1 Car navigation systems
    3.7.2 Smart homes
    3.7.3 Interactive toys
    3.7.4 Smartphones
  3.8 Text-to-speech synthesis
  3.9 Text to speech in a nutshell
  3.10 Front end
  3.11 Back end
    3.11.1 Rule-based synthesis
    3.11.2 Data-driven synthesis
    3.11.3 Statistical parametric speech synthesis
  3.12 Embedded text-to-speech
  3.13 Evaluation
  3.14 Summary
  Bibliography

4 Distributed Speech Recognition
  4.1 Elements of distributed speech processing
  4.2 Front-end processing
    4.2.1 Device requirements
    4.2.2 Transmission issues in DSR
    4.2.3 Back-end processing
  4.3 ETSI standards
    4.3.1 Basic front-end standard ES 201 108
    4.3.2 Noise-robust front-end standard ES 202 050
    4.3.3 Tonal-language recognition standard ES 202 211
  4.4 Transfer protocol
    4.4.1 Signaling
    4.4.2 RTP payload format
  4.5 Energy-aware distributed speech recognition
  4.6 ESR, NSR, DSR
  Bibliography

5 Context in Conversation
  5.1 Context modeling and aggregation
    5.1.1 An example of composer specification
  5.2 Context-based speech applications: Conspeakuous
    5.2.1 Conspeakuous architecture
    5.2.2 B-Conspeakuous
    5.2.3 Learning as a source of context
    5.2.4 Implementation
    5.2.5 A tourist portal application
  5.3 Context-based speech applications: Responsive information architect
  5.4 Conclusion
  Bibliography

6 Software: Infrastructure, Standards, Technologies
  6.1 Introduction
  6.2 Mobile operating systems
  6.3 Voice over internet protocol
    6.3.1 Implications for mobile speech
    6.3.2 Sample speech applications
    6.3.3 Access channels
  6.4 Standards
  6.5 Standards: VXML
  6.6 Standards: VoiceFleXML
    6.6.1 Brief overview of speech-based systems
    6.6.2 System architecture
    6.6.3 System architecture: VoiceFleXML interpreter
    6.6.4 VoiceFleXML: Voice browser
    6.6.5 A prototype implementation
  6.7 SAMVAAD
    6.7.1 Background and problem setting
    6.7.2 Reorganization algorithms
    6.7.3 Minimizing the number of dialogs
    6.7.4 Hybrid call-flows
    6.7.5 Minimally altered call-flows
    6.7.6 Device-independent call-flow characterization
    6.7.7 SAMVAAD: Architecture, implementation and experiments
    6.7.8 Splitting dialog call-flows
  6.8 Conclusion
  6.9 Summary and future work
  Bibliography

7 Architecture of Mobile Speech-Based and Multimodal Dialog Systems
  7.1 Introduction
  7.2 Multimodal architectures
  7.3 Multimodal frameworks
  7.4 Multimodal mobile applications
    7.4.1 Mobile companion
    7.4.2 MUMS
    7.4.3 TravelMan
    7.4.4 Stopman
  7.5 Architectural models
    7.5.1 Client–server systems
    7.5.2 Dialog description systems
    7.5.3 Generic model for distributed mobile multimodal speech systems
  7.6 Distribution in the Stopman system
  7.7 Conclusions
  Bibliography

8 Evaluation of Mobile and Pervasive Speech Applications
  8.1 Introduction
    8.1.1 Spoken interaction
    8.1.2 Mobile-use context
    8.1.3 Speech and mobility
  8.2 Evaluation of mobile speech-based systems
    8.2.1 User interface evaluation methodology
    8.2.2 Technical evaluation of speech-based systems
    8.2.3 Usability evaluations
    8.2.4 Subjective metrics and objective metrics
    8.2.5 Laboratory and field studies
    8.2.6 Simulating mobility in the laboratory
    8.2.7 Studying social context
    8.2.8 Long- and short-term studies
    8.2.9 Validity
  8.3 Case studies
    8.3.1 STOPMAN evaluation
    8.3.2 TravelMan evaluation
    8.3.3 Discussion
  8.4 Theoretical measures for dialog call-flows
    8.4.1 Introduction
    8.4.2 Dialog call-flow characterization
    8.4.3 〈m,q,a〉-characterization
    8.4.4 〈m,q,a〉-complexity
    8.4.5 Call-flow analysis using 〈m,q,a〉-complexity
  8.5 Conclusions
  Bibliography

9 Developing Regions
  9.1 Introduction
  9.2 Applications and studies
    9.2.1 VoiKiosk
    9.2.2 HealthLine
    9.2.3 The spoken web
    9.2.4 TapBack
  9.3 Systems
  9.4 Challenges
  Bibliography

Index

About the Series Editors

Xuemin (Sherman) Shen (M’97-SM’02) received a BSc degree in electrical engineering from Dalian Maritime University, China in 1982, and the MSc and PhD degrees (both in electrical engineering) from Rutgers University, New Jersey, USA, in 1987 and 1990 respectively. He is a Professor and University Research Chair, and the Associate Chair for Graduate Studies, at the Department of Electrical and Computer Engineering, University of Waterloo, Canada. His research focuses on mobility and resource management in interconnected wireless/wired networks, UWB wireless communications systems, wireless security, and ad hoc and sensor networks. He is a co-author of three books, and has published more than 300 papers and book chapters on wireless communications and networks, control and filtering. Dr. Shen serves as a founding area editor for IEEE Transactions on Wireless Communications; editor-in-chief for Peer-to-Peer Networking and Application; associate editor for IEEE Transactions on Vehicular Technology, KICS/IEEE Journal of Communications and Networks, Computer Networks, ACM/Wireless Networks and Wireless Communications and Mobile Computing. He has also served as a guest editor for IEEE JSAC, IEEE Wireless Communications and IEEE Communications Magazine. Dr. Shen received the Excellent Graduate Supervision Award in 2006, and the Outstanding Performance Award in 2004 from the University of Waterloo, the Premier’s Research Excellence Award (PREA) in 2003 from the Province of Ontario, Canada, and the Distinguished Performance Award in 2002 from the Faculty of Engineering, University of Waterloo. Dr. Shen is a registered Professional Engineer of Ontario, Canada.