2005 IEEE International
Conference on Acoustics, Speech, and Signal Processing
Proceedings
Volume I of V
Speech Processing
March 18-23, 2005
Pennsylvania Convention Center/Marriott Hotel
Philadelphia, Pennsylvania, USA
Sponsored by
The Institute of Electrical and Electronics Engineers
Signal Processing Society
© IEEE
TABLE OF CONTENTS
Volume I
SP-L1: VOICE MORPHING
SP-L1.1: POLYGLOT SYNTHESIS USING A MIXTURE OF MONOLINGUAL CORPORA I -1
Javier Latorre, Koji Iwano, Sadaoki Furui, Tokyo Institute of Technology, Japan
SP-L1.2: INTRODUCING ROUGHNESS IN INDIVIDUALITY TRANSFORMATION I - 5
THROUGH JITTER MODELING AND MODIFICATION
Ashish Verma, IBM India Research Labs, India; Arun Kumar, CARE, Indian Institute of Technology Delhi, India
SP-L1.3: SPECTRAL CONVERSION BASED ON MAXIMUM LIKELIHOOD ESTIMATION I - 9
CONSIDERING GLOBAL VARIANCE OF CONVERTED PARAMETER
Tomoki Toda, Nagoya Institute of Technology, Japan; Alan W. Black, Carnegie Mellon University, United States; Keiichi Tokuda,
Nagoya Institute of Technology, Japan
SP-L1.4: A STUDY ON RESIDUAL PREDICTION TECHNIQUES FOR VOICE I -13
CONVERSION
David Suendermann, Antonio Bonafonte, Universitat Politecnica de Catalunya, Spain; Hermann Ney, RWTH Aachen University, Germany; Harald Hoege, Siemens AG, Germany
SP-L1.5: VOICE FORGERY USING ALISP: INDEXATION IN A CLIENT MEMORY I -17
Patrick Perrot, Guido Aversano, Raphael Blouet, Maurice Charbit, Gerard Chollet, Ecole Nationale Superieure des
Telecommunications, France
SP-L1.6: AN IMPROVED SPECTRAL AND PROSODIC TRANSFORMATION METHOD IN I - 21
STRAIGHT-BASED VOICE CONVERSION
Long Qin, Gaopeng Chen, Zhenhua Ling, Lirong Dai, University of Science and Technology of China, China
SP-L2: SPOKEN LANGUAGE UNDERSTANDING AND DIALOG
SP-L2.1: INCORPORATING DISCOURSE FEATURES INTO CONFIDENCE SCORING OF I - 25
INTENTION RECOGNITION RESULTS IN SPOKEN DIALOGUE SYSTEMS
Ryuichiro Higashinaka, Katsuhito Sudoh, Mikio Nakano, Nippon Telegraph and Telephone Corporation, Japan
SP-L2.2: SEMANTIC INTERPRETATION WITH ERROR CORRECTION I - 29
Christian Raymond, Frederic Bechet, Nathalie Camelin, Renato De Mori, University of Avignon, France; Geraldine Damnati,
France Telecom R&D, France
SP-L2.3: DIALOG ACT TAGGING USING GRAPHICAL MODELS I - 33
Gang Ji, Jeff Bilmes, University of Washington, Seattle, United States
SP-L2.4: A CLARIFICATION ALGORITHM FOR SPOKEN DIALOGUE SYSTEMS I - 37
Charles Lewis, Giuseppe Di Fabbrizio, AT&T Labs - Research, United States
SP-L2.5: MODEL ADAPTATION FOR SPOKEN LANGUAGE UNDERSTANDING I - 41
Gokhan Tur, AT&T Labs - Research, United States
SP-L2.6: UNSUPERVISED SEMANTIC INTENT DISCOVERY FROM CALL LOG I - 45
ACOUSTICS
Xiao Li, University of Washington, United States; Asela Gunewardana, Alex Acero, Microsoft Research, United States
SP-L3: SPEECH PERCEPTION AND PSYCHOACOUSTICS
SP-L3.1: PROPOSAL ON OBJECTIVE SPEECH QUALITY ASSESSMENT FOR WIDEBAND I - 49
IP TELEPHONY
Chiharu Morioka, Atsuko Kurashima, Akira Takahashi, NTT Service Integration Laboratories, Japan
SP-L3.2: NEURAL CELL TYPE RECOGNITION BETWEEN GLOBUS PALLIDUS I - 53
EXTERNUS AND GLOBUS PALLIDUS INTERNUS BY GAUSSIAN MIXTURE MODELING
Qiang Fu, Mark Clements, Georgia Institute of Technology, United States; Klaus Mewes, Emory University, United States
SP-L3.3: ANALYSIS OF RELATIONSHIP BETWEEN OVERALL QUALITY AND I - 57
PSYCHOLOGICAL FACTORS AFFECTING HIGH-QUALITY SPEECH COMMUNICATION
SERVICES
Hitoshi Aoki, Akira Takahashi, NTT, Japan
SP-L3.4: CAN YOU UNDERSTAND HIM? LET'S LOOK AT HIS WORD ACCURACY - I - 61
AUTOMATIC EVALUATION OF TRACHEOESOPHAGEAL SPEECH
Maria Schuster, Universitätsklinikum Erlangen, Germany; Elmar Nöth, Tino Haderlein, Stefan Steidl, Anton Batliner, Universität
Erlangen-Nürnberg, Germany; Frank Rosanowski, Universitätsklinikum Erlangen, Germany
SP-L3.5: A WARPED BANDWIDTH EXPANSION FILTER I - 65
Marc Boillot, University of Florida / Motorola, United States; John Harris, University of Florida, United States
SP-L3.6: RELATIVE ENERGY AND INTELLIGIBILITY OF TRANSIENT SPEECH I - 69
INFORMATION
Sungyub Yoo, J. Robert Boston, John Durrant, Kristie Kovacyk, Stacey Karn, Susan Shaiman, Amro El-Jaroudi, Ching-Chung Li,
University of Pittsburgh, United States
SP-L4: CONFIDENCE MEASURES AND REJECTION ALGORITHMS
SP-L4.1: REJECTION USING RANK STATISTICS BASED ON HMM STATE SHORTLISTS I - 73
Enrico Bocchieri, Sarangarajan Parthasarathy, AT&T Labs - Research, United States
SP-L4.2: SPEAKER ADAPTIVE CONFIDENCE SCORING USING BAYESIAN COMBINING I - 77
Tae-Yoon Kim, Hanseok Ko, Korea University, Republic of Korea
SP-L4.3: IMPROVING UTTERANCE VERIFICATION USING ADDITIONAL CONFIDENCE I - 81
MEASURES IN ISOLATED SPEECH RECOGNITION INTERFACES
Graham Greenland, Willy Wong, Hans Kunov, University of Toronto, Canada
SP-L4.4: GENERALIZED POSTERIOR PROBABILITY FOR MINIMUM ERROR I - 85
VERIFICATION OF RECOGNIZED SENTENCES
Wai Kit Lo, Frank K. Soong, Spoken Language Translation Research Labs, ATR, Japan
SP-L4.5: ROBUST SPEECH RECOGNITION BY INTEGRATING SPEECH SEPARATION I - 89
AND HYPOTHESIS TESTING
Soundararajan Srinivasan, DeLiang Wang, The Ohio State University, United States
SP-L4.6: COMBINATION OF MULTIPLE PREDICTORS TO IMPROVE CONFIDENCE I - 93
MEASURE BASED ON LOCAL POSTERIOR PROBABILITIES
Yuewen Fu, Limin Du, Chinese Academy of Sciences, China
SP-L5: DISCRIMINATIVE TRAINING
SP-L5.1: ADAPTATION OF PRECISION MATRIX MODELS ON LARGE VOCABULARY I - 97
CONTINUOUS SPEECH RECOGNITION
Khe Chai Sim, Mark J. F. Gales, Cambridge University, United Kingdom
SP-L5.2: DISCRIMINATIVE TRAINING OF CDHMMS FOR MAXIMUM RELATIVE I -101
SEPARATION MARGIN
Chaojun Liu, Hui Jiang, Xinwei Li, York University, Canada
SP-L5.3: STATISTICAL PERFORMANCE ANALYSIS OF MCE/GPD LEARNING IN GAUSSIAN I -105
CLASSIFIERS AND HIDDEN MARKOV MODELS
Mohamed Afify, BBN Technologies, United States; Xin-Wei Li, Hui Jiang, York University, Canada
SP-L5.4: DISCRIMINATIVE TRAINING OF ACOUSTIC MODELS APPLIED TO DOMAINS I -109
WITH UNRELIABLE TRANSCRIPTS
Lambert Mathias, Johns Hopkins University, United States; Girija Yegnanarayanan, Juergen Fritsch, Multimodal Technologies, Inc., United States
SP-L5.5: MINIMUM CLASSIFICATION ERROR FOR LARGE SCALE SPEECH I -113
RECOGNITION TASKS USING WEIGHTED FINITE STATE TRANSDUCERS
Erik McDermott, Shigeru Katagiri, NTT Corporation, Japan
SP-L5.6: DISCRIMINATIVE TRAINING BASED ON THE CRITERION OF LEAST PHONE I -117
COMPETING TOKENS FOR LARGE VOCABULARY SPEECH RECOGNITION
Bo Liu, University of Sci. & Tech. of China, China; Hui Jiang, York University, Canada; Jian-Lai Zhou, Microsoft Research Asia,
China; Ren-Hua Wang, University of Sci. & Tech. of China, China
SP-L6: QUANTIZATION AND QUALITY MEASUREMENT
SP-L6.1: MULTI-FRAME GMM-BASED BLOCK QUANTISATION OF LINE SPECTRAL I -121
FREQUENCIES FOR WIDEBAND SPEECH CODING
Stephen So, Kuldip K. Paliwal, Griffith University, Australia
SP-L6.2: NON-INTRUSIVE GMM-BASED SPEECH QUALITY MEASUREMENT I -125
Tiago Falk, Qingfeng Xu, Wai-Yip Chan, Queen's University, Canada
SP-L6.3: A MULTIPLE-DESCRIPTION PCM SPEECH CODER USING STRUCTURED I -129
DUAL VECTOR QUANTIZERS
Stephen Voran, Institute for Telecommunication Sciences, United States
SP-L6.4: A NEW SEGMENT QUANTIZER FOR LINE SPECTRAL FREQUENCIES USING I -133
LEMPEL-ZIV ALGORITHM
Minoru Kohata, Chiba Institute of Technology, Japan; Motoyuki Suzuki, Shozo Makino, Tohoku University, Japan
SP-L6.5: PREDICTIVE VQ FOR BANDWIDTH SCALABLE LSP QUANTIZATION I -137
Hiroyuki Ehara, Toshiyuki Morii, Masahiro Oshikiri, Koji Yoshida, Matsushita Electric Industrial Co., Ltd., Japan
SP-L6.6: CODING WITH SIDE INFORMATION TECHNIQUES FOR LSF I -141
RECONSTRUCTION IN VOICE OVER IP
Yannis Agiomyrgiannakis, Foundation of Research and Technology Hellas, Greece; Yannis Stylianou, University of Crete,
Greece
SP-L7: SPEECH ENHANCEMENT WITH NOISE REDUCTION
SP-L7.1: SIGNAL SUBSPACE SPEECH ENHANCEMENT FOR AUDIBLE NOISE I -145
REDUCTION
Changhuai You, Soo Ngee Koh, Nanyang Technological University, Singapore; Susanto Rahardja, Institute for Infocomm
Research, Singapore
SP-L7.2: A WAVELET KALMAN FILTER WITH PERCEPTUAL MASKING FOR SPEECH I -149
ENHANCEMENT IN COLORED NOISE
Ning Ma, Martin Bouchard, University of Ottawa, Canada; Rafik A. Goubran, Carleton University, Canada
SP-L7.3: ADAPTIVE TIME SEGMENTATION OF NOISY SPEECH FOR IMPROVED I -153
SPEECH ENHANCEMENT
Richard Christian Hendriks, Richard Heusdens, Jesper Jensen, Delft University of Technology, Netherlands
SP-L7.4: SPEECH ENHANCEMENT USING HARMONIC REGENERATION I -157
Cyril Plapous, Claude Marro, France Telecom, France; Pascal Scalart, ENSSAT, France
SP-L7.5: INSTANT NOISE ESTIMATION USING FOURIER TRANSFORM OF AMDF AND I -161
VARIABLE START MINIMA SEARCH
Zhong Lin, Rafik A. Goubran, Carleton University, Canada
SP-L7.6: SPEECH ENHANCEMENT BASED ON SPEECH SPECTRAL COMPLEX GAUSSIAN I -165
MIXTURE MODEL
Guo-Hong Ding, Xia Wang, Yang Cao, Feng Ding, Yuezhong Tang, Nokia Research Center, Beijing, China
SP-L8: SPEAKER RECOGNITION USING ACOUSTIC AND HIGHER LEVEL FEATURES
SP-L8.1: IMPROVED PHONETIC SPEAKER RECOGNITION USING LATTICE DECODING I -169
Andrew Hatch, Barbara Peskin, International Computer Science Institute, United States; Andreas Stolcke, SRI International,
United States
SP-L8.2: SRI'S 2004 NIST SPEAKER RECOGNITION EVALUATION SYSTEM I -173
Sachin Kajarekar, Luciano Ferrer, Elizabeth Shriberg, Kemal Sonmez, Andreas Stolcke, Anand Venkataraman, Jing Zheng, SRI
International, United States
SP-L8.3: THE 2004 MIT LINCOLN LABORATORY SPEAKER RECOGNITION SYSTEM I -177
Douglas Reynolds, William Campbell, Terry Gleason, Carl Quillen, Douglas Sturim, Pedro Torres-Carrasquillo, MIT Lincoln
Laboratory, United States; Andre Adami, Oregon Health & Science University, United States
SP-L8.4: SPEAKER VERIFICATION USING ADAPTED ARTICULATORY FEATURE-BASED I -181
CONDITIONAL PRONUNCIATION MODELING
Ka-Yee Leung, Man-Wai Mak, Hong Kong Polytechnic University, Hong Kong SAR of China; Manhung Siu, Hong Kong University of Science and Technology, Hong Kong SAR of China; Sun-Yuan Kung, Princeton University, United States
SP-L8.5: PROSODY MODELING AND EIGEN-PROSODY ANALYSIS FOR ROBUST I -185
SPEAKER RECOGNITION
Zi-He Chen, National Central University, Taiwan; Yuan-Fu Liao, National Taipei University of Technology, Taiwan; Yau-Tarng Juang, National Central University, Taiwan
SP-L8.6: PROSODIC MODELING FOR SPEAKER RECOGNITION BASED ON SUB-BAND I -189
ENERGY TEMPORAL TRAJECTORIES
Andre Adami, University of Caxias do Sul, Brazil
SP-L9: LARGE VOCABULARY ASR
SP-L9.1: SUB-PHONETIC POLYNOMIAL SEGMENT MODEL FOR LARGE VOCABULARY I -193
CONTINUOUS SPEECH RECOGNITION
Siu-Kei Au Yeung, Chak-Fai Li, Man-Hung Siu, Hong Kong University of Science and Technology, Hong Kong SAR of China
SP-L9.2: CONSTRUCTING ENSEMBLES OF ASR SYSTEMS USING RANDOMIZED I -197
DECISION TREES
Olivier Siohan, Bhuvana Ramabhadran, Brian Kingsbury, IBM T. J. Watson Research Center, United States
SP-L9.3: EFFICIENT GENERATION OF HIGH-ORDER CONTEXT-DEPENDENT I - 201
WEIGHTED FINITE STATE TRANSDUCERS FOR SPEECH RECOGNITION
Mike Schuster, Takaaki Hori, NTT Corporation, Japan
SP-L9.4: THE IBM 2004 CONVERSATIONAL TELEPHONY SYSTEM FOR RICH I - 205
TRANSCRIPTION
Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Geoffrey Zweig, IBM, United States
SP-L9.5: TRAINING LVCSR SYSTEMS ON THOUSANDS OF HOURS OF DATA I - 209
Gunnar Evermann, Ho Yin Chan, Mark J. F. Gales, Bin Jia, David Mrva, Phil Woodland, Kai Yu, Cambridge University, United
Kingdom
SP-L9.6: LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 JOHNS I - 213
HOPKINS SUMMER WORKSHOP
Mark Hasegawa-Johnson, University of Illinois, United States; James Baker, Carnegie Mellon University, United States; Sarah
Borys, University of Illinois, United States; Ken Chen, University of California, San Diego, United States; Emily Coogan,
University of Illinois, United States; Steven Greenberg, University of California, Berkeley, United States; Amit Juneja, University
of Maryland, United States; Katrin Kirchhoff, University of Washington, United States; Karen Livescu, Massachusetts Institute of
Technology, United States; Srividya Mohan, Johns Hopkins University, United States; Jennifer Muller, Department of Defense, United States; Kemal Sonmez, SRI International, United States; Tianyu Wang, Georgia Institute of Technology, United States
SP-L10: NOVEL METHODS FOR SPEECH ANALYSIS
SP-L10.1: SPEECH ANALYSIS BY ESTIMATING PERCEPTUALLY RELEVANT POLE I - 217
LOCATIONS
Venkatraman Atti, Andreas Spanias, Arizona State University, United States
SP-L10.2: COHERENT ENVELOPE DETECTION FOR MODULATION FILTERING OF I - 221
SPEECH
Steven Schimmel, Les Atlas, University of Washington, United States
SP-L10.3: SPEECH SIGNAL ANALYSIS WITH EXPONENTIAL AUTOREGRESSIVE MODEL I - 225
Kentaro Ishizuka, Hiroko Kato, Tomohiro Nakatani, NTT Corporation, Japan
SP-L10.4: COMPARISON OF AUTOREGRESSIVE PARAMETER ESTIMATION ALGORITHMS I - 229
FOR SPEECH PROCESSING AND RECOGNITION
Robert Morris, Jon Arrowood, Nexidia Inc., United States; Mark Clements, Georgia Institute of Technology, United States
SP-L10.5: AN ALGORITHM FOR LOCATING FUNDAMENTAL FREQUENCY MARKERS IN I - 233
SPEECH SIGNALS
Princy Dikshit, Stephen Zahorian, Shivaram Nagulapati, Old Dominion University, United States
SP-L10.6: AN AUTOREGRESSIVE, NON-STATIONARY EXCITED SIGNAL PARAMETER I - 237
ESTIMATION METHOD AND AN EVALUATION OF A SINGING-VOICE RECOGNITION
Akira Sasou, Masataka Goto, Natl. Inst. of Adv. Ind. Sci. & Technology (AIST), Japan; Satoru Hayamizu, Gifu University, Japan;
Kazuyo Tanaka, University of Tsukuba, Japan
SP-L11: NOISE ROBUST SPEECH RECOGNITION
SP-L11.1: STATIC AND DYNAMIC SPECTRAL FEATURES: THEIR NOISE ROBUSTNESS I - 241
AND OPTIMAL WEIGHTS FOR ASR
Chen Yang, The Chinese University of Hong Kong, Hong Kong SAR of China; Frank K. Soong, Spoken Language Translation
Labs, ATR, Japan; Tan Lee, The Chinese University of Hong Kong, Hong Kong SAR of China
SP-L11.2: LOG-ENERGY DYNAMIC RANGE NORMALIZATION FOR ROBUST SPEECH I - 245
RECOGNITION
Weizhong Zhu, INRS-EMT, University of Quebec, Canada; Douglas O'Shaughnessy, University of Quebec, Canada
SP-L11.3: A COMPANDING FRONT END FOR NOISE-ROBUST AUTOMATIC SPEECH I - 249
RECOGNITION
Jethran Guinness, Bhiksha Raj, Bent Schmidt-Nielsen, Mitsubishi Electric Research Laboratories, United States; Lorenzo
Turicchia, Rahul Sarpeshkar, Massachusetts Institute of Technology, United States
SP-L11.4: MULTI-RESOLUTION SPECTRAL ENTROPY FEATURE FOR ROBUST ASR I - 253
Hemant Misra, Shajith Ikbal, Sunil Sivadas, Herve Bourlard, IDIAP Research Institute, Switzerland
SP-L11.5: PARTICLE FILTER BASED NON-STATIONARY NOISE TRACKING FOR ROBUST I - 257
SPEECH RECOGNITION
Masakiyo Fujimoto, Satoshi Nakamura, ATR Spoken Language Translation Research Labs, Japan
SP-L11.6: ONLINE CEPSTRAL FILTERING USING A SEQUENTIAL EM APPROACH WITH I - 261
POLYAK AVERAGING AND FEEDBACK
Tor André Myrvoll, Norwegian University of Science and Technology, Norway; Satoshi Nakamura, SLT Laboratory, ATR, Japan
SP-P1: PROSODY AND SPEECH SYNTHESIS
SP-P1.1: IMPROVING THE UNDERSTANDABILITY OF SPEECH SYNTHESIS BY I - 265
MODELING SPEECH IN NOISE
Brian Langner, Alan W. Black, Carnegie Mellon University, United States
SP-P1.2: AN AUTOMATIC PROSODY RECOGNIZER USING A COUPLED MULTI-STREAM I - 269
ACOUSTIC MODEL AND A SYNTACTIC-PROSODIC LANGUAGE MODEL
Sankaranarayanan Ananthakrishnan, Shrikanth Narayanan, University of Southern California, United States
SP-P1.3: F0 CONTROL CHARACTERIZATION BY PERCEPTUAL IMPRESSIONS ON I - 273
SPEAKING ATTITUDES USING MULTIPLE DIMENSIONAL SCALING ANALYSIS
Yoko Kokenawa, Waseda University, Japan; Minoru Tsuzaki, Kyoto City University of Arts, Japan; Hiroaki Kato, ATR Human
Information Science Labs, Japan; Yoshinori Sagisaka, Waseda University, Japan
SP-P1.4: ADDITIVE MODELING OF ENGLISH F0 CONTOUR FOR SPEECH SYNTHESIS I - 277
Shinsuke Sakai, Massachusetts Institute of Technology, United States
SP-P1.5: PROSODY ANALYSIS AND MODELING FOR EMOTIONAL SPEECH SYNTHESIS I - 281
Dan-ning Jiang, Tsinghua University, China; Wei Zhang, Li-qin Shen, IBM China Research Lab, China; Lian-hong Cai,
Tsinghua University, China
SP-P1.6: SLIDING WINDOW SMOOTHING FOR MAXIMUM ENTROPY BASED I - 285
INTONATIONAL PHRASE PREDICTION IN CHINESE
Jian-Feng Li, Guo-Ping Hu, Ren-Hua Wang, Li-Rong Dai, University of Science and Technology of China, China
SP-P1.7: IDENTIFICATION AND SYNTHESIS OF CANTONESE TONES BASED ON THE I - 289
COMMAND-RESPONSE MODEL FOR F0 CONTOUR GENERATION
Wentao Gu, Shanghai Jiaotong University, China; Keikichi Hirose, Hiroya Fujisaki, University of Tokyo, Japan
SP-P1.8: COMPRESSION OF EXCEPTION LEXICONS FOR SMALL FOOTPRINT I - 293
GRAPHEME-TO-PHONEME CONVERSION
Joram Meron, Peter Veprek, Panasonic Digital Networking Lab, United States
SP-P1.9: PREDICTION OF PRONUNCIATION VARIATIONS FOR SPEECH SYNTHESIS: A I - 297
DATA-DRIVEN APPROACH
Christina Bennett, Alan W. Black, Carnegie Mellon University, United States
SP-P1.10: RECORDING SCRIPT DESIGN FOR CORPUS-BASED TTS SYSTEM BASED ON I - 301
COVERAGE OF VARIOUS PHONETIC ELEMENTS
Mitsuaki Isogai, Hideyuki Mizuno, Kazunori Mano, NTT Cyber Space Laboratories, NTT Corporation, Japan
SP-P1.11: OPTIMAL SUBSET SELECTION FROM TEXT DATABASES I - 305
Jilei Tian, Jani Nurminen, Imre Kiss, Nokia Research Center, Finland
SP-P1.12: COMPARATIVE STUDY OF AUTOMATIC PHONE SEGMENTATION METHODS I - 309
FOR TTS
Jordi Adell, Antonio Bonafonte, Universitat Politècnica de Catalunya, Spain; Jon Ander Gómez, Maria Jose Castro, Universitat
Politecnica de Valencia, Spain
SP-P2: GENERAL TOPICS IN ASR
SP-P2.1: INCREASED ROBUSTNESS AGAINST BIT ERRORS FOR DISTRIBUTED SPEECH I - 313
RECOGNITION IN WIRELESS ENVIRONMENTS
Brian Delaney, Georgia Institute of Technology, United States
SP-P2.2: "OF ALL THINGS THE MEASURE IS MAN": AUTOMATIC CLASSIFICATION OF I - 317
EMOTIONS AND INTER-LABELER CONSISTENCY
Stefan Steidl, Michael Levit, Anton Batliner, Elmar Nöth, Heinrich Niemann, University of Erlangen, Germany
SP-P2.3: DISORDERED SPEECH EVALUATION USING OBJECTIVE QUALITY MEASURES I - 321
Lingyun Gu, John Harris, Rahul Shrivastav, Christine Sapienza, University of Florida, United States
SP-P2.4: META-CLASSIFIERS IN ACOUSTIC AND LINGUISTIC FEATURE FUSION-BASED I - 325
AFFECT RECOGNITION
Bjorn Schuller, Raquel Jimenez Villar, Gerhard Rigoll, Manfred Lang, Technische Universitat Munchen, Germany
SP-P2.5: PACKET LOSS CONCEALMENT BASED ON VQ REPLICAS AND MMSE I - 329
ESTIMATION APPLIED TO DISTRIBUTED SPEECH RECOGNITION
Antonio M. Peinado, Angel M. Gomez, Victoria E. Sanchez, Jose L. Perez-Cordoba, Antonio J. Rubio, Universidad de Granada,
Spain
SP-P2.6: A COMPARISON OF SOFT-FEATURE DISTRIBUTED SPEECH RECOGNITION I - 333
WITH CANDIDATE CODECS FOR SPEECH ENABLED MOBILE SERVICES
Valentin Ion, Reinhold Haeb-Umbach, University of Paderborn, Germany
SP-P2.7: A HIDDEN TRAJECTORY MODEL WITH BI-DIRECTIONAL TARGET-FILTERING: I - 337
CASCADED VS. INTEGRATED IMPLEMENTATION FOR PHONETIC RECOGNITION
Li Deng, Xiang Li, Dong Yu, Alex Acero, Microsoft Research, United States
SP-P2.8: A COMPARISON OF CLASSIFIERS FOR DETECTING EMOTION FROM SPEECH I - 341
Izhak Shafran, Johns Hopkins University, United States; Mehryar Mohri, New York University, United States
SP-P2.9: SOFT DECODING OF TEMPORAL DERIVATIVES FOR ROBUST DISTRIBUTED I - 345
SPEECH RECOGNITION IN PACKET LOSS
Alastair James, Ben Milner, University of East Anglia, United Kingdom
SP-P2.10: DBN-BASED MULTI-STREAM MODELS FOR MANDARIN TONEME I - 349
RECOGNITION
Xin Lei, Gang Ji, Tim Ng, Jeff Bilmes, Mari Ostendorf, University of Washington, United States
SP-P2.11: SPARSE KPCA FOR FEATURE EXTRACTION IN SPEECH RECOGNITION I - 353
Amaro Lima, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura, Nagoya Institute of Technology, Japan;
Fernando Gil Resende, Federal University of Rio de Janeiro, Brazil
SP-P2.12: EFFECTS OF PHONEME CHARACTERISTICS ON TEO FEATURE-BASED I - 357
AUTOMATIC STRESS DETECTION IN SPEECH
Evan Ruzanski, University of Colorado, United States; John H. L. Hansen, University of Colorado, Boulder, United States; James
L. Meyerhoff, George Saviolakis, Michael Koenig, Walter Reed Army Institute of Research, United States
SP-P3: SPEECH ANALYSIS AND SYNTHESIS
SP-P3.1: SCALABLE CONCATENATIVE SPEECH SYNTHESIS BASED ON THE PLURAL I - 361
UNIT SELECTION AND FUSION METHOD
Masatsune Tamura, Tatsuya Mizutani, Takehiko Kagoshima, Toshiba Corporation, Japan
SP-P3.2: ADAPTIVE TRAINING FOR HIDDEN SEMI-MARKOV MODEL I - 365
Junichi Yamagishi, Takao Kobayashi, Tokyo Institute of Technology, Japan
SP-P3.3: PERCEPTUALLY WEIGHTED LONG TERM MODELING OF SINUSOIDAL I - 369
SPEECH AMPLITUDE TRAJECTORIES
Mohammad Firouzmand, Laurent Girin, INPG, France
SP-P3.4: SPEECH RECOGNITION IN THE BLIND CONDITION BASED ON MULTIPLE I - 373
DIRECTIVITY PATTERNS USING A MICROPHONE ARRAY
Toshiyuki Sekiya, Tetsunori Kobayashi, Waseda University, Japan
SP-P3.5: AN UNSUPERVISED QUANTITATIVE MEASURE FOR WORD PROMINENCE IN I - 377
SPONTANEOUS SPEECH
Dagen Wang, Shrikanth Narayanan, USC Viterbi School of Engineering, United States
SP-P3.6: SPEECH MODELLING BASED ON GENERALIZED GAUSSIAN PROBABILITY I - 381
DENSITY FUNCTIONS
Kostas Kokkinakis, Asoke K. Nandi, University of Liverpool, United Kingdom
SP-P3.7: BAYESIAN MODEL BASED NON-INTRUSIVE SPEECH QUALITY EVALUATION I - 385
Guo Chen, Vijay Parsa, University of Western Ontario, Canada
SP-P3.8: ROBUST PITCH ESTIMATION AT VERY LOW SNR EXPLOITING TIME AND I - 389
FREQUENCY DOMAIN CUES
Celia Shahnaz, Wei-Ping Zhu, M. Omair Ahmad, Concordia University, Canada
SP-P3.9: FUNDAMENTAL FREQUENCY ESTIMATION AND VOCAL TREMOR ANALYSIS BY I - 393
MEANS OF MORLET WAVELET TRANSFORMS
Laurence Cnockaert, Francis Grenez, Jean Schoentgen, Université Libre de Bruxelles, Belgium
SP-P3.10: AUTOMATIC SPEECH SEGMENTATION USING AVERAGE LEVEL CROSSING I - 397
RATE INFORMATION
Anindya Sarkar, T. V. Sreenivas, Indian Institute of Science, India
SP-P3.11: DWT-BASED PHONETIC GROUPS CLASSIFICATION USING NEURAL I - 401
NETWORKS
Van Tuan Pham, Gernot Kubin, University of Technology, Graz, Austria
SP-P3.12: A NOVEL KLT ALGORITHM OPTIMIZED FOR SMALL SIGNAL SETS I - 405
Francesco Gianfelici, Giorgio Biagetti, Paolo Crippa, Claudio Turchetti, Universita Politecnica delle Marche, Italy
SP-P3.13: VOICING-STATE CLASSIFICATION OF CO-CHANNEL SPEECH USING I - 409
NONLINEAR STATE-SPACE RECONSTRUCTION
Yasser Mahgoub, Richard Dansereau, Carleton University, Canada
SP-P3.14: SPEECH RATE ESTIMATION VIA TEMPORAL CORRELATION AND SELECTED I - 413
SUB-BAND CORRELATION
Shrikanth Narayanan, Dagen Wang, USC Viterbi School of Engineering, United States
SP-P4: MODEL-BASED ROBUST SPEECH RECOGNITION
SP-P4.1: CLOSELY COUPLED ARRAY PROCESSING AND MODEL-BASED I - 417
COMPENSATION FOR MICROPHONE ARRAY SPEECH RECOGNITION
Xianyu Zhao, Zhijian Ou, Minhua Chen, Zuoying Wang, Tsinghua University, China
SP-P4.2: CONTEXT-DEPENDENT DURATION MODELING I - 421
Daniel Willett, Temic Speech Dialog Systems, Germany
SP-P4.3: RECOGNISING SPEECH IN THE PRESENCE OF A COMPETING SPEAKER I - 425
USING A SPEECH FRAGMENT DECODER
André Coy, Jon Barker, University of Sheffield, United Kingdom
SP-P4.4: AN ENVIRONMENT COMPENSATED MAXIMUM LIKELIHOOD TRAINING I - 429
APPROACH BASED ON STOCHASTIC VECTOR MAPPING
Jian Wu, Microsoft Corp., United States; Qiang Huo, Donglai Zhu, University of Hong Kong, Hong Kong SAR of China
SP-P4.5: EFFECT OF PHASE-SENSITIVE ENVIRONMENT MODEL AND HIGHER ORDER I - 433
VTS ON NOISY SPEECH FEATURE ENHANCEMENT
Veronique Stouten, Hugo Van hamme, Patrick Wambacq, Katholieke Universiteit Leuven, Belgium
SP-P4.6: TOWARDS SPEECH RECOGNITION ORIENTED DEREVERBERATION I - 437
Pamornpol Jinachitra, Stanford University, United States; Ramon Prieto, Toyota InfoTechnology Center U.S.A., United States
SP-P4.7: NOISY SPEECH RECOGNITION BASED ON ROBUST END-POINT DETECTION I - 441
AND MODEL ADAPTATION
Zhipeng Zhang, NTT DoCoMo, Japan; Sadaoki Furui, Tokyo Institute of Technology, Japan
SP-P4.8: ANALYSIS OF A LARGE IN-CAR SPEECH CORPUS AND ITS APPLICATION TO THE I - 445
MULTIMODEL ASR
Hiroshi Fujimura, Chiyomi Miyajima, Katsunobu Itou, Kazuya Takeda, Nagoya University, Japan; Fumitada Itakura, Meijo
University, Japan
SP-P4.9: BUILDING AN EFFECTIVE CORPUS BY USING ACOUSTIC SPACE I - 449
VISUALIZATION (COSMOS) METHOD
Goshu Nagino, Makoto Shozakai, Asahi Kasei Corporation, Japan
SP-P4.10: HMM/ANN BASED SPECTRAL PEAK LOCATION ESTIMATION FOR NOISE I - 453
ROBUST SPEECH RECOGNITION
Shajith Ikbal, Herve Bourlard, Mathew Magimai.-Doss, IDIAP Research Institute, Switzerland
SP-P4.11: ACOUSTIC FEATURE COMBINATION FOR ROBUST SPEECH RECOGNITION I - 457
András Zolnay, Ralf Schlueter, Hermann Ney, RWTH-Aachen Germany, Germany
SP-P4.12: ACOUSTIC TRAINING FROM HETEROGENEOUS DATA SOURCES: I - 461
EXPERIMENTS IN MANDARIN CONVERSATIONAL TELEPHONE SPEECH TRANSCRIPTION
Stavros Tsakalidis, Johns Hopkins University, United States; William Byrne, Cambridge University, United Kingdom
SP-P5: SPEECH MINING AND AUDIO-VISUAL INFORMATION PROCESSING
SP-P5.1: DYNAMIC MATCH PHONE-LATTICE SEARCHES FOR VERY FAST AND I - 465
ACCURATE UNRESTRICTED VOCABULARY KEYWORD SPOTTING
Kishan Thambiratnam, Sridha Sridharan, Queensland University of Technology, Australia
SP-P5.2: A STREAM-WEIGHT OPTIMIZATION METHOD FOR MULTI-STREAM HMMS I - 469
BASED ON LIKELIHOOD VALUE NORMALIZATION
Satoshi Tamura, Koji Iwano, Sadaoki Furui, Tokyo Institute of Technology, Japan
SP-P5.3: LIP READING FOR ROBUST SPEECH RECOGNITION ON EMBEDDED I - 473
DEVICES
Jesus Fernando Guitarte Perez, Siemens AG, Corporate Technology, Germany; Alejandro F. Frangi, Pompeu Fabra University,
Spain; Eduardo Lleida Solano, University of Zaragoza, Spain; Klaus Lukas, Siemens AG, Corporate Technology, Spain
SP-P5.4: NOVEL TECHNIQUES FOR TIME-COMPRESSING SPEECH: AN I - 477
EXPLORATORY STUDY
Simon Tucker, Steve Whittaker, University of Sheffield, United Kingdom
SP-P5.5: FAST TWO-STAGE VOCABULARY-INDEPENDENT SEARCH IN SPONTANEOUS I - 481
SPEECH
Peng Yu, Frank Seide, Microsoft Research Asia, China
SP-P5.6: AN HMM-BASED TEXT SEGMENTATION METHOD USING VARIATIONAL BAYES I - 485
APPROACH AND ITS APPLICATION TO LVCSR FOR BROADCAST NEWS
Takafumi Koshinaka, Ken-ichi Iso, Akitoshi Okumura, NEC Corporation, Japan
SP-P5.7: DETECTING GROUP INTEREST-LEVEL IN MEETINGS I - 489
Daniel Gatica-Perez, Iain McCowan, Dong Zhang, Samy Bengio, IDIAP Research Institute, Switzerland
SP-P5.8: SEMANTIC DATA MINING OF SHORT UTTERANCES I - 493
Lee Begeja, AT&T Labs - Research, United States; Harris Drucker, Monmouth University, United States; David Gibbon, Patrick
Haffner, Zhu Liu, Bernard Renger, Behzad Shahraray, AT&T Labs - Research, United States
SP-P5.9: AUTOMATIC PROCESSING OF AUDIO LECTURES FOR INFORMATION I - 497
RETRIEVAL: VOCABULARY SELECTION AND LANGUAGE MODELING
Alex Park, Timothy Hazen, James Glass, MIT CSAIL, United States
SP-P5.10: BLIND CHANGE DETECTION FOR AUDIO SEGMENTATION I - 501
Mohamed Omar, Upendra Chaudhari, Ganesh Ramaswamy, IBM, United States
SP-P5.11: COMBINING MULTIPLE SUBWORD REPRESENTATIONS FOR I - 505
OPEN-VOCABULARY SPOKEN DOCUMENT RETRIEVAL
Shi-wook Lee, National Institute of AIST, Japan; Kazuyo Tanaka, University of Tsukuba, Japan; Yoshiaki Itoh, Iwate Prefectural
University, Japan
SP-P5.12: ROBUST LIP-MOTION FEATURES FOR SPEAKER IDENTIFICATION I - 509
Hasan Ertan Cetingul, Yucel Yemez, Engin Erzin, A. Murat Tekalp, Koc University, Turkey
SP-P6: FEATURE-BASED ROBUST SPEECH RECOGNITION
SP-P6.1: VARIATIONAL BAYESIAN FEATURE SALIENCY FOR AUDIO TYPE I - 513
CLASSIFICATION
Fabio Valente, Christian Wellekens, Eurecom Institute, France
SP-P6.2: PITCH-SYNCHRONOUS ZCPA (PS-ZCPA)-BASED FEATURE EXTRACTION WITH I - 517
AUDITORY MASKING
Muhammad Ghulam, Takashi Fukuda, Junsei Horikawa, Tsuneo Nitta, Toyohashi University of Technology, Japan
SP-P6.3: MFCC COMPENSATION FOR IMPROVED RECOGNITION OF FILTERED AND I - 521
BAND-LIMITED SPEECH
Nicolas Morales, Universidad Autonoma de Madrid, Spain; John H. L. Hansen, University of Colorado, Boulder, United States;
Doroteo T. Toledano, Universidad Autonoma de Madrid, Spain
SP-P6.4: SPEECH FEATURE SMOOTHING FOR ROBUST ASR I - 525
Chia-Ping Chen, Jeff Bilmes, University of Washington, United States; Daniel Ellis, Columbia University, United States
SP-P6.5: ON DESENSITIZING THE MEL-CEPSTRUM TO SPURIOUS SPECTRAL I - 529
COMPONENTS FOR ROBUST SPEECH RECOGNITION
Vivek Tyagi, Christian Wellekens, Institute Eurecom, France
SP-P6.6: TWO-STAGE NOISE SPECTRA ESTIMATION AND REGRESSION BASED IN-CAR I - 533
SPEECH RECOGNITION USING SINGLE DISTANT MICROPHONE
Weifeng Li, Katsunobu Itou, Kazuya Takeda, Nagoya University, Japan; Fumitada Itakura, Meijo University, Japan
SP-P6.7: MASK ESTIMATION BASED ON SOUND LOCALISATION FOR MISSING DATA I - 537
SPEECH RECOGNITION
Sue Harding, Jon Barker, Guy J. Brown, University of Sheffield, United Kingdom
SP-P6.8: SPEECH PROCESSING USING JOINT FEATURES DERIVED FROM THE I - 541
MODIFIED GROUP DELAY FUNCTION
Rajesh Hegde, Hema Murthy, Indian Institute of Technology, India; Gadde V. Ramana Rao, SRI International, United States
SP-P6.9: INFLUENCE OF AUTOCORRELATION LAG RANGES ON ROBUST SPEECH I - 545
RECOGNITION
Benjamin J. Shannon, Kuldip K. Paliwal, Griffith University, Australia
SP-P6.10: SUBSPACE-BASED SPEAKER-INDEPENDENT VOWEL RECOGNITION I - 549
R. Muralishankar, Douglas O'Shaughnessy, University of Quebec, Canada
SP-P6.11: ROBUST SPEECH RECOGNITION BASED ON SPECTRAL ADJUSTING AND I - 553
WARPING
Rui Zhao, Zuoying Wang, Tsinghua University, China
SP-P6.12: ROBUST SPEECH ACTIVITY DETECTION USING LDA APPLIED TO FF I - 557
PARAMETERS
Jaume Padrell, Dusan Macho, Climent Nadeu, Universitat Politecnica de Catalunya, Spain
SP-P7: LANGUAGE MODELING AND IDENTIFICATION
SP-P7.1: JOINT DISCRIMINATIVE LANGUAGE MODELING AND UTTERANCE I - 561
CLASSIFICATION
Murat Saraclar, AT&T Labs - Research, United States; Brian Roark, OGI at Oregon Health & Science University, United States
SP-P7.2: LANGUAGE MODEL ESTIMATION FOR OPTIMIZING END-TO-END I - 565
PERFORMANCE OF A NATURAL LANGUAGE CALL ROUTING SYSTEM
Vaibhava Goel, IBM, United States; Hong-Kwang (Jeff) Kuo, IBM T. J. Watson Research Center, United States; Sabine Deligne,
Cheng Wu, IBM, United States
SP-P7.3: LANGUAGE IDENTIFICATION USING PHONETIC AND PROSODIC HMMS WITH I - 569
FEATURE NORMALIZATION
Yasunari Obuchi, Nobuo Sato, Hitachi Ltd., Japan
SP-P7.4: RAPID LANGUAGE MODEL DEVELOPMENT USING EXTERNAL RESOURCES I - 573
FOR NEW SPOKEN DIALOG DOMAINS
Ruhi Sarikaya, IBM T. J. Watson Research Center, United States; Agustin Gravano, Columbia University, United States; Yuqing Gao, IBM T. J. Watson Research Center, United States
SP-P7.5: USING LOCAL & GLOBAL PHONOTACTIC FEATURES IN CHINESE DIALECT I - 577
IDENTIFICATION
Boon Pang Lim, Haizhou Li, Bin Ma, Institute for Infocomm Research, Singapore
SP-P7.6: RANDOM CLUSTERINGS FOR LANGUAGE MODELING I - 581
Ahmad Emami, Frederick Jelinek, Johns Hopkins University, United States
SP-P7.7: DIALECT/ACCENT CLASSIFICATION VIA BOOSTED WORD MODELING I - 585
Rongqing Huang, University of Colorado at Boulder, United States; John H. L. Hansen, University of Colorado, Boulder, United
States
SP-P7.8: WEB-DATA AUGMENTED LANGUAGE MODELS FOR MANDARIN I - 589
CONVERSATIONAL SPEECH RECOGNITION
Tim Ng, Mari Ostendorf, Mei-Yuh Hwang, University of Washington, United States; Manhung Siu, Hong Kong University of
Science and Technology, Hong Kong SAR of China; Ivan Bulyko, Xin Lei, University of Washington, United States
SP-P7.9: AN EFFICIENT ALGORITHM FOR CLUSTERING SHORT SPOKEN I - 593
UTTERANCES
Zhu Liu, AT&T Labs - Research, United States
SP-P7.10: MAXIMUM ENTROPY BASED GENERIC FILTER FOR LANGUAGE MODEL I - 597
ADAPTATION
Dong Yu, Milind Mahajan, Peter Mau, Alex Acero, Microsoft Research, United States
SP-P7.11: LANGUAGE IDENTIFICATION USING PITCH CONTOUR INFORMATION I - 601
Chi-Yueh Lin, Hsiao-Chuan Wang, National Tsing Hua University, Taiwan
SP-P7.12: INTEGRATING MULTIPLE LAYERS OF CONCEPT INFORMATION INTO I - 605
N-GRAM MODELING FOR SPOKEN LANGUAGE UNDERSTANDING
Nick J.-C. Wang, Delta Electronics, Inc., Taiwan
SP-P7.13: AUTOMATIC LANGUAGE IDENTIFICATION USING ERGODIC HMM I - 609
S. A. SantoshKumar, V. Ramasubramanian, Indian Institute of Science, India
SP-P8: TEXT-INDEPENDENT SPEAKER RECOGNITION
SP-P8.1: DISCRIMINATIVE POWER OF TRANSIENT FRAMES IN SPEAKER I - 613
RECOGNITION
Jérôme Louradour, Khalid Daoudi, Régine André-Obrecht, IRIT - University Paul Sabatier, France
SP-P8.2: SPEAKER IDENTIFICATION IN UNKNOWN NOISY CONDITIONS - A I - 617
UNIVERSAL COMPENSATION APPROACH
Ji Ming, Darryl Stewart, Queen's University Belfast, United Kingdom; Saeed Vaseghi, Brunel University, United Kingdom
SP-P8.3: EXTRACTING ADDITIONAL INFORMATION FROM GAUSSIAN MIXTURE MODEL I - 621
PROBABILITIES FOR IMPROVED TEXT-INDEPENDENT SPEAKER IDENTIFICATION
Balakrishnan Narayanaswamy, Carnegie Mellon University, United States; Rashmi Gangadharaiah, Indian Institute of Science, India
SP-P8.4: COMBINING SELECTION TREE WITH OBSERVATION REORDERING I - 625
PRUNING FOR EFFICIENT SPEAKER IDENTIFICATION USING GMM-UBM
Zhenyu Xiong, Thomas Zheng, Tsinghua University, China; Zhanjiang Song, Beijing d-Ear Technologies Co., Ltd., China; Wenhu
Wu, Tsinghua University, China
SP-P8.5: ADVANCES IN CHANNEL COMPENSATION FOR SVM SPEAKER RECOGNITION I - 629
Alex Solomonoff, William Campbell, Ian Boardman, MIT Lincoln Laboratory, United States
SP-P8.6: IMPROVED SPEAKER MODEL MIGRATION VIA STOCHASTIC SYNTHESIS I - 633
Jiri Navratil, Ganesh Ramaswamy, IBM T. J. Watson Research Center, United States
SP-P8.7: FACTOR ANALYSIS SIMPLIFIED I - 637
Patrick Kenny, Gilles Boulianne, Pierre Ouellet, Pierre Dumouchel, CRIM, Canada
SP-P8.8: MINIMUM CLASSIFICATION ERROR INTERACTIVE TRAINING FOR SPEAKER I - 641
IDENTIFICATION
Yusuke Kida, Kyoto University, Japan; Hiroyoshi Yamamoto, Nagoya Institute of Technology, Japan; Chiyomi Miyajima, Nagoya University, Japan; Keiichi Tokuda, Tadashi Kitamura, Nagoya Institute of Technology, Japan
SP-P8.9: A NEW COMMON COMPONENT GMM-BASED SPEAKER RECOGNITION I - 645
METHOD
Yih-Ru Wang, Chen-Yu Chiang, National Chiao Tung University, Taiwan
SP-P8.10: GMM-BASED BHATTACHARYYA KERNEL FISHER DISCRIMINANT ANALYSIS I - 649
FOR SPEAKER RECOGNITION
Yi-Hsiang Chao, Hsin-Min Wang, Ruei-Chuan Chang, Academia Sinica, Taiwan
SP-P8.11: A STUDY OF THE RELATIVE IMPORTANCE OF TEMPORAL CHARACTERISTICS I - 653
IN TEXT-DEPENDENT AND TEXT-CONSTRAINED SPEAKER VERIFICATION
James Nealand, RMIT, Australia; Jason Pelecanos, Ran Zilca, Ganesh Ramaswamy, IBM T. J. Watson Research Center, United
States
SP-P8.12: NOISE ROBUST SPEAKER VERIFICATION USING MEL-FREQUENCY I - 657
DISCRETE WAVELET COEFFICIENTS AND PARALLEL MODEL COMPENSATION
Zekeriya Tufekci, Izmir Institute of Technology, Turkey; Sabri Gurbuz, Harran University, Turkey
SP-P9: ACOUSTIC MODELING AND CLUSTERING ALGORITHMS
SP-P9.1: INITIALIZING SUBSPACE CONSTRAINED GAUSSIAN MIXTURE MODELS I - 661
Peder Olsen, Karthik Visweswariah, Ramesh Gopinath, IBM, United States
SP-P9.2: MULTI-RATE AND VARIABLE-RATE MODELING OF SPEECH AT PHONE AND I - 665
SYLLABLE TIME SCALES
Özgür Çetin, Mari Ostendorf, University of Washington, United States
SP-P9.3: OPTIMAL CLUSTERING AND NON-UNIFORM ALLOCATION OF GAUSSIAN I - 669
KERNELS IN SCALAR DIMENSION FOR HMM COMPRESSION
Xiao-Bing Li, Frank K. Soong, ATR Spoken Language Translation Research Labs, Japan; Tor Andre Myrvoll, Norwegian University of Science and Technology, Norway; Ren-Hua Wang, University of Science and Technology of China, China
SP-P9.4: HIERARCHICAL CORRELATION COMPENSATION FOR HIDDEN MARKOV I - 673
MODELS
Hui Lin, Tsinghua University, China; Ye Tian, JianLai Zhou, Microsoft Research Asia, China; Hui Jiang, York University,
Canada
SP-P9.5: CLUSTER-DEPENDENT ACOUSTIC MODELING I - 677
Bing Xiang, Long Nguyen, Spyros Matsoukas, Richard Schwartz, BBN Technologies, United States
SP-P9.6: FUZZY PARAMETER CLUSTERING METHOD IN SPEECH RECOGNITION I - 681
Xianghua Xu, Jie Zhu, Shanghai Jiaotong University, China
SP-P9.7: AUTOMATIC TRAINING SET SEGMENTATION FOR MULTI-PASS SPEECH I - 685
RECOGNITION
Mark Mao, Stanford University, United States; Vincent Vanhoucke, Brian Strope, Nuance Communications, United States
SP-P9.8: GENERALIZED STATISTICAL MODELING OF PRONUNCIATION VARIATIONS I - 689
USING VARIABLE-LENGTH PHONE CONTEXT
Yuya Akita, Tatsuya Kawahara, Kyoto University, Japan
SP-P9.9: ON INITIALIZATION OF GAUSSIAN MIXTURES: A HYBRID GENETIC EM I - 693
ALGORITHM
Franz Pernkopf, Graz University of Technology, Austria
SP-P9.10: ACOUSTIC MODEL TRAINING USING GREEDY EM I - 697
Rusheng Hu, Xiaolong Li, Yunxin Zhao, University of Missouri-Columbia, United States
SP-P9.11: MODELING SUCCESSIVE FRAME DEPENDENCIES WITH HYBRID HMM/BN I - 701
ACOUSTIC MODEL
Konstantin Markov, Satoshi Nakamura, ATR Spoken Language Translation Research Labs, Japan
SP-P9.12: IMPROVED COVARIANCE MODELING FOR MAXIMUM LIKELIHOOD I - 705
MULTIPLE SUBSPACE TRANSFORMATIONS
Xi Zhou, University of Science and Technology of China, China; Ye Tian, JianLai Zhou, Microsoft Research Asia, China;
Bei-qian Dai, University of Science and Technology of China, China
SP-P10: TOPICS IN SPEAKER RECOGNITION
SP-P10.1: A PROBABILISTIC MEASURE OF MODALITY RELIABILITY IN SPEAKER I - 709
VERIFICATION
Jonas Richiardi, Plamen Prodanov, Andrzej Drygajlo, Swiss Federal Institute of Technology (EPFL), Switzerland
SP-P10.2: A CORRELATION METRIC FOR SPEAKER TRACKING USING ANCHOR I - 713
MODELS
Mikael Collet, Delphine Charlet, France Telecom R&D, France; Frederic Bimbot, IRISA (CNRS & INRIA), France
SP-P10.3: ESTIMATING AND EVALUATING CONFIDENCE FOR FORENSIC SPEAKER I - 717
RECOGNITION
William Campbell, Douglas Reynolds, Joseph Campbell, Kevin Brady, MIT Lincoln Laboratory, United States
SP-P10.4: F-RATIO CLIENT-DEPENDENT NORMALISATION FOR BIOMETRIC I - 721
AUTHENTICATION TASKS
Norman Poh, Samy Bengio, IDIAP Research Institute, Switzerland
SP-P10.5: CLUSTERING SPEECH UTTERANCES BY SPEAKER USING I - 725
EIGENVOICE-MOTIVATED VECTOR SPACE MODELS
Wei-Ho Tsai, Shih-Sian Cheng, Yi-Hsiang Chao, Hsin-Min Wang, Academia Sinica, Taiwan
SP-P10.6: T-NORM FOR TEXT-DEPENDENT COMMERCIAL SPEAKER VERIFICATION I - 729
APPLICATIONS: EFFECT OF LEXICAL MISMATCH
Matthieu Hebert, Daniel Boies, Nuance Communication, Canada
SP-P10.7: A SESSION-GMM GENERATIVE MODEL USING TEST UTTERANCE GAUSSIAN I - 733
MIXTURE MODELING FOR SPEAKER VERIFICATION
Hagai Aronowitz, Bar-Ilan University, Israel; David Burshtein, Tel-Aviv University, Israel; Amihood Amir, Bar-Ilan University, Israel
SP-P10.8: ALIZE, A FREE TOOLKIT FOR SPEAKER RECOGNITION I - 737
Jean-Francois Bonastre, Frederic Wils, University of Avignon, France; Sylvain Meignier, University of Maine, France
SP-P10.9: SPEAKER ADAPTIVE COHORT SELECTION FOR TNORM IN I - 741
TEXT-INDEPENDENT SPEAKER VERIFICATION
Douglas Sturim, Douglas Reynolds, MIT Lincoln Laboratory, United States
SP-P10.10: HYBRID SPEAKER-BASED SEGMENTATION SYSTEM USING MODEL-LEVEL I - 745
CLUSTERING
Hyoung-Gook Kim, Daniel Ertelt, Thomas Sikora, Technical University of Berlin, Germany
SP-P10.11: ROBUSTNESS OF BIT-STREAM BASED FEATURES FOR SPEAKER I - 749
VERIFICATION
Antonio Moreno-Daniel, Biing-Hwang (Fred) Juang, Georgia Institute of Technology, United States;
Juan Arturo Nolazco-Flores, Instituto Tecnologico de Monterrey (ITESM), Mexico
SP-P10.12: TWO-WAY CLUSTER VOTING TO IMPROVE SPEAKER DIARISATION I - 753
PERFORMANCE
Sue Tranter, Cambridge University, United Kingdom
SP-P10.13: SPEAKER DETECTION WITHOUT MODELS I - 757
Daniel Gillick, Stephen Stafford, Barbara Peskin, Berkeley, United States
SP-P11: TOPICS IN SPEECH CODING AND ENHANCEMENT
SP-P11.1: IMPROVING THE 2.4 KB/S MILITARY STANDARD MELP (MS-MELP) CODER I - 761
USING PITCH-SYNCHRONOUS ANALYSIS AND SYNTHESIS TECHNIQUES
Ali Erdem Ertan, Thomas P. Barnwell III, Georgia Institute of Technology, United States
SP-P11.2: ULTRA LOW BIT RATE SPEECH CODING USING AN ERGODIC HIDDEN I - 765
MARKOV MODEL
Matthew Lee, Adriane Durey, Elliot Moore, Mark Clements, Georgia Institute of Technology, United States
SP-P11.3: TOWARDS ILBC SPEECH CODING AT LOWER RATES THROUGH A NEW I - 769
FORMULATION OF THE START STATE SEARCH
Christopher M. Garrido, Manohar N. Murthi, University of Miami, United States; Søren Vang Andersen, Aalborg University, Denmark
SP-P11.4: A MISSING-DATA APPROACH TO NOISE-ROBUST LPC EXTRACTION FOR I
VOICED SPEECH USING AUXILIARY SENSORS
Cenk Demiroglu, Thomas P. Barnwell III, Georgia Institute of Technology, United States
SP-P11.5: A TECHNIQUE OF MULTI-TAP LONG TERM PREDICTOR (LTP) FILTER I
USING SUB-SAMPLE RESOLUTION DELAY
Mark Jasiuk, Tenkasi Ramabadran, Udar Mittal, James Ashley, Michael McLaughlin, Motorola Labs, United States
SP-P11.6: VOICE ACTIVITY DETECTION BASED ON GENERALIZED GAMMA I
DISTRIBUTION
Jong Won Shin, Seoul National University, Republic of Korea; Joon-Hyuk Chang, University of California, Santa Barbara,
United States; Hwan Sik Yun, Nam Soo Kim, Seoul National University, Republic of Korea
SP-P11.7: INCREASING THE ROBUSTNESS OF CELP-BASED CODERS BY I
CONSTRAINED OPTIMIZATION
Mohamed Chibani, Philippe Gournay, Roch Lefebvre, University of Sherbrooke, Canada
SP-P11.8: JOINT OPTIMIZATION OF EXCITATION PARAMETERS IN I
ANALYSIS-BY-SYNTHESIS SPEECH CODERS HAVING MULTI-TAP LONG TERM PREDICTOR
Udar Mittal, James Ashley, Edgardo Cruz-Zeno, Mark Jasiuk, Motorola Labs, United States
SP-P11.9: BLOCK-BASED BANDWIDTH EXTENSION OF NARROWBAND SPEECH SIGNAL I
BY USING CDHMM
Sheng Yao, Cheung-Fat Chan, City University of Hong Kong, Hong Kong SAR of China
SP-P11.10: SEGMENTATION-BASED SPEECH ENHANCEMENT FOR INTELLIGIBILITY I
IMPROVEMENT IN MELP CODERS USING AUXILIARY SENSORS
Cenk Demiroglu, Sunil Kamath, David Anderson, Georgia Institute of Technology, United States
SP-P11.11: STOCHASTIC INTEGRATION AND LONG TERM PREDICTOR ESTIMATION I
UNDER NOISY CONDITIONS FOR SPEECH ENHANCEMENT
Marcin Kuropatwinski, Bastiaan Kleijn, KTH (Royal Institute of Technology), Sweden
SP-P11.12: A ROBUST NARROWBAND TO WIDEBAND EXTENSION SYSTEM FEATURING I
ENHANCED CODEBOOK MAPPING
Takahiro Unno, Texas Instruments, United States; Alan McCree, MIT Lincoln Laboratory, United States
SP-P11.13: ARTIFICIAL BANDWIDTH EXPANSION METHOD TO IMPROVE I
INTELLIGIBILITY AND QUALITY OF AMR-CODED NARROWBAND SPEECH
Laura Laaksonen, Nokia Research Center, Finland; Juho Kontio, Paavo Alku, Helsinki University of Technology, Finland
SP-P11.14: A SOFT DECISION BASED NOISE CROSS POWER SPECTRAL DENSITY I
ESTIMATION FOR TWO-MICROPHONE SPEECH ENHANCEMENT SYSTEMS
Xuefeng Zhang, Ying Jia, Intel China Research Center, China
SP-P12: LARGE VOCABULARY ASR
SP-P12.1: LATTICE SEGMENTATION AND SUPPORT VECTOR MACHINES FOR LARGE I
VOCABULARY CONTINUOUS SPEECH RECOGNITION
Veera Venkataramani, Johns Hopkins University, United States; William Byrne, Cambridge University, United Kingdom
SP-P12.2: FIRST STEPS IN FAST ACOUSTIC MODELING FOR A NEW TARGET I
LANGUAGE: APPLICATION TO VIETNAMESE
Viet-Bac Le, Laurent Besacier, CLIPS/IMAG, France
SP-P12.3: CROSS DOMAIN AUTOMATIC TRANSCRIPTION ON THE TC-STAR EPPS I
CORPUS
Christian Gollan, Maximilian Bisani, Stephan Kanthak, Ralf Schlüter, Hermann Ney, RWTH-Aachen Germany, Germany
SP-P12.4: USING RULE-BASED KNOWLEDGE TO IMPROVE LVCSR I - 829
Rene Beutler, Tobias Kaufmann, Beat Pfister, ETH, Switzerland
SP-P12.5: ADAPTATION STRATEGIES FOR THE ACOUSTIC AND LANGUAGE MODELS IN I - 833
BILINGUAL SPEECH TRANSCRIPTION
Javier Dieguez-Tirado, Carmen Garcia-Mateo, Laura Docio-Fernandez, Antonio Cardenal-Lopez, ETSI Telecomunicacion,
Spain
SP-P12.6: A STUDY ON KNOWLEDGE SOURCE INTEGRATION FOR CANDIDATE I - 837
RESCORING IN AUTOMATIC SPEECH RECOGNITION
Jinyu Li, Yu Tsao, Chin-Hui Lee, Georgia Institute of Technology, United States
SP-P12.7: DEVELOPMENT OF THE CUHTK 2004 MANDARIN CONVERSATIONAL I - 841
TELEPHONE SPEECH TRANSCRIPTION SYSTEM
Mark J. F. Gales, Bin Jia, Andrew Liu, Khe Chai Sim, Phil Woodland, Kai Yu, Cambridge University, United Kingdom
SP-P12.8: BAYESIAN MODEL COMBINATION (BAYCOM) FOR IMPROVED I - 845
RECOGNITION
Ananth Sankar, Nuance Communications, United States
SP-P12.9: INVESTIGATION OF ACOUSTIC MODELING TECHNIQUES FOR LVCSR I - 849
SYSTEMS
Xunying Liu, Mark J. F. Gales, Khe Chai Sim, Kai Yu, Cambridge University, United Kingdom
SP-P12.10: IMPROVED CONFUSION NETWORK ALGORITHM AND SHORTEST PATH I - 853
SEARCH FROM WORD LATTICE
Jian Xue, Yunxin Zhao, University of Missouri-Columbia, United States
SP-P12.11: THAI AUTOMATIC SPEECH RECOGNITION I - 857
Sinaporn Suebvisai, Paisarn Charoenpornsawat, Alan W. Black, Carnegie Mellon University, United States; Monika Woszczyna, Multimodal Technologies, Inc., United States; Tanja Schultz, Carnegie Mellon University, United States
SP-P12.12: DEVELOPMENT OF THE CU-HTK 2004 BROADCAST NEWS TRANSCRIPTION I - 861
SYSTEMS
Do Yeong Kim, Ho Yin Chan, Gunnar Evermann, Mark J. F. Gales, David Mrva, Khe Chai Sim, Phil Woodland, Cambridge University, United Kingdom
SP-P12.13: CROSS-LANGUAGE ACOUSTIC MODEL REFINEMENT FOR THE INDONESIAN I - 865
LANGUAGE
Terrence Martin, Sridha Sridharan, Queensland University of Technology, Australia
SP-P13: SPEECH ANALYSIS AND PRODUCTION
SP-P13.1: ANALYSIS OF SPECTRAL MEASURES FOR VOICED SPEECH WITH VARYING I - 869
NOISE AND PERTURBATION LEVELS
Eoin O'Leidhin, Peter Murphy, University of Limerick, Ireland
SP-P13.2: AUTOMATIC DYSPHONIA RECOGNITION USING BIOLOGICALLY-INSPIRED I - 873
AMPLITUDE-MODULATION FEATURES
Nicolas Malyska, Thomas Quatieri, Douglas Sturim, MIT Lincoln Laboratory, United States
SP-P13.3: VOICED/UNVOICED DETERMINATION OF SPEECH SIGNAL IN NOISY I - 877
ENVIRONMENT USING HARMONICITY MEASURE BASED ON INSTANTANEOUS FREQUENCY
Dhany Arifianto, Takao Kobayashi, Tokyo Institute of Technology, Japan
SP-P13.4: SNR AND LOCAL NOISE POWER ESTIMATIONS BASED ON GAUSSIAN I - 881
MIXTURE MODELING ON THE LOG-POWER DOMAIN
Kazuya Takeda, Tran Huy Dat, Hiroshi Fujimura, Fumitada Itakura, Nagoya University, Japan
SP-P13.5: DETECTION OF SYMBOLIC GESTURAL EVENTS IN ARTICULATORY DATA I - 885
FOR USE IN STRUCTURAL REPRESENTATIONS OF CONTINUOUS SPEECH
Alexander Gutkin, Simon King, University of Edinburgh, United Kingdom
SP-P13.6: MATHEMATICAL EVIDENCE OF THE ACOUSTIC UNIVERSAL STRUCTURE IN I - 889
SPEECH
Nobuaki Minematsu, University of Tokyo, Japan
SP-P13.7: MODELING OF THE FRONT CAVITY AND SUBLINGUAL SPACE IN AMERICAN I - 893
ENGLISH RHOTIC SOUNDS
Zhaoyan Zhang, Carol Espy-Wilson, University of Maryland, United States; Suzanne Boyce, University of Cincinnati, United
States; Mark Tiede, Haskins Laboratories, United States
SP-P13.8: OBJECTIVE QUALITY MEASURES FOR GLOTTAL INVERSE FILTERING OF I - 897
SPEECH PRESSURE SIGNALS
Tom Bäckström, Matti Airas, Laura Lehto, Paavo Alku, Helsinki University of Technology, Finland
SP-P13.9: EFFECTS OF GLOTTAL AND LIP BOUNDARY CONDITIONS ON VOCAL-TRACT I - 901
AREA FUNCTION ESTIMATES FROM SPEECH SIGNALS
Huiqun Deng, Rabab K. Ward, Michael Beddoes, Murray Hodgson, University of British Columbia, Canada
SP-P13.10: ADAPTIVE FILTERBANKS INSPIRED BY THE AUDITORY SYSTEM FOR I - 905
SPEECH FEATURE EXTRACTION
Ramdas Kumaresan, Gopi Krishna Allu, University of Rhode Island, United States; Peter Cariani, Tufts Medical School, United
States
SP-P13.11: MULTI-SPEAKER ARTICULATORY RECONSTRUCTION BASED ON AN EIGEN I - 909
ARTICULATORY HMM
Sadao Hiroya, Takemi Mochida, NTT Communication Science Laboratories, Japan
SP-P13.12: A GRAPHICAL MODEL FOR FORMANT TRACKING I - 913
Jonathan Malkin, Xiao Li, Jeff Bilmes, University of Washington, United States
SP-P13.13: DYSPHONIC SPEECH ANALYSIS USING GENERALIZED VARIOGRAM I - 917
Abdellah Kacha, Francis Grenez, Jean Schoentgen, Universite Libre de Bruxelles, Belgium; Khier Benmahammed, Universite de
Setif, Algeria
SP-P14: FEATURE EXTRACTION AND MODELING
SP-P14.1: TRAINING WIDEBAND ACOUSTIC MODELS USING MIXED-BANDWIDTH I - 921
TRAINING DATA VIA FEATURE BANDWIDTH EXTENSION
Michael Seltzer, Alex Acero, Microsoft Research, United States
SP-P14.2: MINIMUM PHONEME ERROR BASED HETEROSCEDASTIC LINEAR I - 925
DISCRIMINANT ANALYSIS FOR SPEECH RECOGNITION
Bing Zhang, Northeastern University, United States; Spyros Matsoukas, BBN Technologies, United States
SP-P14.3: A STUDY OF AUDITORY MODELING AND PROCESSING FOR SPEECH I - 929
SIGNALS
Woojay Jeon, Biing-Hwang (Fred) Juang, Georgia Institute of Technology, United States
SP-P14.4: A WAVELET AND FILTER BANK FRAMEWORK FOR PHONETIC I - 933
CLASSIFICATION
Ghinwa Choueiter, James Glass, Massachusetts Institute of Technology, United States
SP-P14.5: AUTOMATIC SYLLABLE STRESS DETECTION USING PROSODIC FEATURES I - 937
FOR PRONUNCIATION EVALUATION OF LANGUAGE LEARNERS
Joseph Tepperman, Shrikanth Narayanan, University of Southern California, United States
SP-P14.6: PREDICTING FORMANT FREQUENCIES FROM MFCC VECTORS I - 941
Jonathan Darch, Ben Milner, Xu Shao, University of East Anglia, United Kingdom; Saeed Vaseghi, Qin Yan, Brunel University,
United Kingdom
SP-P14.7: TONOTOPIC MULTI-LAYERED PERCEPTRON: A NEURAL NETWORK FOR I - 945
LEARNING LONG-TERM TEMPORAL FEATURES FOR SPEECH RECOGNITION
Barry Chen, Qifeng Zhu, Nelson Morgan, University of California Berkeley, United States
SP-P14.8: TOWARDS AN INTELLIGENT ACOUSTIC FRONT-END FOR AUTOMATIC I - 949
SPEECH RECOGNITION: BUILT-IN SPEAKER NORMALIZATION (BISN)
Umit Yapanel, University of Colorado at Boulder, United States; John H. L. Hansen, University of Colorado, Boulder, United
States
SP-P14.9: QUASI-CONTINUOUS LOCAL CODEBOOK FEATURES FOR MULTILINGUAL I - 953
ACOUSTIC PHONETIC MODELLING
Frank Diehl, Asunción Moreno, Universitat Politecnica de Catalunya, Spain
SP-P14.10: GARCH COEFFICIENTS AS FEATURE FOR SPEECH RECOGNITION IN I - 957
PERSIAN ISOLATED DIGIT
Mohamad Abdolahi, Hamidreza Amindavar, Amirkabir University of Technology, Iran (Islamic Republic of)
SP-P14.11: FMPE: DISCRIMINATIVELY TRAINED FEATURES FOR SPEECH I - 961
RECOGNITION
Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau, Geoffrey Zweig, IBM, United States
SP-P15: ADAPTATION AND NORMALIZATION
SP-P15.1: VARIATIONAL BAYESIAN ADAPTATION FOR SPEAKER CLUSTERING I - 965
Fabio Valente, Christian Wellekens, Institut Eurecom, France
SP-P15.2: AUTOMATIC DISFLUENCY REMOVAL ON RECOGNIZED SPONTANEOUS I - 969
SPEECH - RAPID ADAPTATION TO SPEAKER DEPENDENT DISFLUENCIES
Matthias Honal, Universität Karlsruhe, Germany; Tanja Schultz, Carnegie Mellon University, United States
SP-P15.3: AGGREGATE A POSTERIORI LINEAR REGRESSION FOR SPEAKER ADAPTATION I - 973
Chih-Hsien Huang, Jen-Tzung Chien, National Cheng Kung University, Taiwan
SP-P15.4: TWO-STAGE SPEAKER ADAPTATION OF HYBRID TIED-POSTERIOR ACOUSTIC I - 977
MODELS
Jan Stadermann, Gerhard Rigoll, Technische Universität München, Germany
SP-P15.5: VARIOUS REFERENCE SPEAKERS DETERMINATION METHODS FOR I - 981
EMBEDDED KERNEL EIGENVOICE SPEAKER ADAPTATION
Brian Mak, Simon Ho, Hong Kong University of Science and Technology, Hong Kong SAR of China
SP-P15.6: KERNEL EIGENSPACE-BASED MLLR ADAPTATION USING MULTIPLE I - 985
REGRESSION CLASSES
Roger Hsiao, Brian Mak, Hong Kong University of Science and Technology, Hong Kong SAR of China
SP-P15.7: AUTOMATICALLY TRANSCRIBING MEETINGS USING DISTANT I - 989
MICROPHONES
Florian Metze, Christian Fügen, Universität Karlsruhe (TH), Germany; Yue Pan, Waibel Alexander, Carnegie Mellon University, United States
SP-P15.8: A NOVEL METHOD FOR RAPID SPEAKER ADAPTATION BASED ON SUPPORT I - 993
SPEAKER WEIGHTING
Tie Cai, Jie Zhu, Shanghai Jiaotong University, China
SP-P15.9: ADAPTIVE TRAINING USING SIMPLE TARGET MODELS I - 997
Georg Stemmer, Fabio Brugnara, Diego Giuliani, ITC-irst, Italy
SP-P15.10: LEARNING PRONUNCIATION AND FORMULATION VARIANTS IN I -1001
CONTINUOUS SPEECH APPLICATIONS
Daniele Colibro, Luciano Fissore, Cosmin Popovici, Claudio Vair, Loquendo, Italy; Pietro Laface, Politecnico di Torino, Italy
SP-P15.11: ALTERNATE PHONE MODELS FOR CONVERSATIONAL SPEECH I -1005
Lori Lamel, Jean-Luc Gauvain, CNRS-LIMSI, France
SP-P15.12: WHISPERY SPEECH RECOGNITION USING ADAPTED ARTICULATORY I -1009
FEATURES
Szu-Chen Jou, Tanja Schultz, Alex Waibel, Carnegie Mellon University, United States
SP-P16: TOPICS IN SPEECH PROCESSING AND SYSTEMS
SP-P16.1: OPEN VOCABULARY ASR FOR AUDIOVISUAL DOCUMENT INDEXATION I -1013
Alexandre Allauzen, Jean-Luc Gauvain, LIMSI-CNRS, France
SP-P16.2: CONSTRAINED PHRASE-BASED TRANSLATION USING WEIGHTED FINITE I -1017
STATE TRANSDUCER
Bowen Zhou, Stanley Chen, Yuqing Gao, IBM T. J. Watson Research Center, United States
SP-P16.3: UNSUPERVISED VOCABULARY EXPANSION FOR AUTOMATIC I -1021
TRANSCRIPTION OF BROADCAST NEWS
Katsutoshi Ohtsuki, Nobuaki Hiroshima, Masahiro Oku, Akihiro Imamura, NTT Corporation, Japan
SP-P16.4: CLASSIFICATION OF STRUCTURED DESCRIPTIONS I -1025
Srinivas Bangalore, AT&T Labs - Research, United States; Owen Rambow, Columbia University, United States
SP-P16.5: MAXIMUM ENTROPY SEGMENTATION OF BROADCAST NEWS I -1029
Heidi Christensen, BalaKrishna Kolluru, Yoshihiko Gotoh, University of Sheffield, United Kingdom; Steve Renals, University of Edinburgh, United Kingdom
SP-P16.6: THE AT&T WATSON SPEECH RECOGNIZER I -1033
Vincent Goffin, Cyril Allauzen, Enrico Bocchieri, Dilek Hakkani-Tur, Andrej Ljolje, Sarangarajan Parthasarathy, Mazin Rahim,
Giuseppe Riccardi, Murat Saraclar, AT&T Labs - Research, United States
SP-P16.7: OPEN VOCABULARY CHINESE NAME RECOGNITION WITH THE HELP OF I -1037
CHARACTER DESCRIPTION AND SYLLABLE SPELLING RECOGNITION
Ching-Ho Tsai, Nick J.-C. Wang, Patrick Huang, Jia-Lin Shen, Delta Electronics, Inc., Taiwan
SP-P16.8: ERROR PREDICTION IN SPOKEN DIALOG: FROM SIGNAL-TO-NOISE RATIO I -1041
TO SEMANTIC CONFIDENCE SCORES
Dilek Hakkani-Tur, Gokhan Tur, Giuseppe Riccardi, AT&T Labs - Research, United States; Hong Kook Kim, Gwangju Institute
of Science and Technology, Republic of Korea
SP-P16.9: INCORPORATING DIALOGUE CONTEXT AND TOPIC CLUSTERING IN I -1045
OUT-OF-DOMAIN DETECTION
Ian Lane, Tatsuya Kawahara, Kyoto University, Japan
SP-P16.10: STRUCTURING BASEBALL LIVE GAMES BASED ON SPEECH RECOGNITION I -1049
USING TASK DEPENDENT KNOWLEDGE AND EMOTION STATE RECOGNITION
Atsushi Sako, Yasuo Ariki, Kobe University, Japan
SP-P16.11: A NEW ASR EVALUATION MEASURE AND MINIMUM BAYES-RISK DECODING I -1053
FOR OPEN-DOMAIN SPEECH UNDERSTANDING
Hiroaki Nanjo, Ryukoku University, Japan; Tatsuya Kawahara, Kyoto University, Japan
SP-P16.12: SPEECH RECOGNITION OF A NAMED ENTITY I -1057
Tatsuhiko Tomita, Waseda University, Japan; Yoshiyuki Okimoto, Matsushita Electric Industrial Co., Ltd., Japan; Hirofumi Yamamoto, ATR Spoken Language Translation Research Labs, Japan; Yoshinori Sagisaka, Waseda University, Japan
SP-P16.13: AUTOMATIC DIALOG ACT SEGMENTATION AND CLASSIFICATION IN I -1061
MULTIPARTY MEETINGS
Jeremy Ang, Yang Liu, Elizabeth Shriberg, International Computer Science Institute, United States
SP-P16.14: SENTENCE EXTRACTION-BASED PRESENTATION SUMMARIZATION I -1065
TECHNIQUES AND EVALUATION METRICS
Makoto Hirohata, Yousuke Shinnaka, Koji Iwano, Sadaoki Furui, Tokyo Institute of Technology, Japan
SP-P17: TOPICS IN SPEECH ENHANCEMENT, SEPARATION AND DEREVERBERATION
SP-P17.1: BLIND DEREVERBERATION BASED ON ESTIMATES OF SIGNAL I -1069
TRANSMISSION CHANNELS WITHOUT PRECISE INFORMATION OF CHANNEL ORDER
Takafumi Hikichi, Marc Delcroix, Masato Miyoshi, NTT Corporation, Japan
SP-P17.2: FAST ESTIMATION OF A PRECISE DEREVERBERATION FILTER BASED ON I -1073
SPEECH HARMONICITY
Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi, NTT Corporation, Japan
SP-P17.3: CODEBOOK-BASED BAYESIAN SPEECH ENHANCEMENT I -1077
Sriram Srinivasan, Jonas Samuelsson, Bastiaan Kleijn, KTH (Royal Institute of Technology), Sweden
SP-P17.4: OVERCOMING THE STATISTICAL INDEPENDENCE ASSUMPTION W.R.T I -1081
FREQUENCY IN SPEECH ENHANCEMENT
Tim Fingscheidt, Christophe Beaugeant, Suhadi Suhadi, Siemens AG, COM Mobile Phones, Germany
SP-P17.5: A TWO-STAGE ALGORITHM FOR ENHANCEMENT OF REVERBERANT SPEECH I -1085
Mingyang Wu, Fair Isaac Corporation, United States; DeLiang Wang, The Ohio State University, United States
SP-P17.6: MATRIX QUANTIZATION BASED TIME-VARYING FILTER SPEECH I -1089
ENHANCEMENT
Sharath Rao K, Boston University, United States; Sreenivas Thippur, Indian Institute of Science, India
SP-P17.7: LEAKAGE MODEL AND TEETH CLACK REMOVAL FOR AIR- AND I -1093
BONE-CONDUCTIVE INTEGRATED MICROPHONES
Zicheng Liu, Amar Subramanya, Zhengyou Zhang, Jasha Droppo, Alex Acero, Microsoft Research, United States
SP-P17.8: SPEECH ENHANCEMENT USING A MMSE SHORT TIME SPECTRAL I -1097
AMPLITUDE ESTIMATOR WITH LAPLACIAN SPEECH MODELING
Bin Chen, Philipos Loizou, University of Texas, Dallas, United States
SP-P17.9: SEPARATION OF FRICATIVES AND AFFRICATES I -1101
Guoning Hu, DeLiang Wang, The Ohio State University, United States
SP-P17.10: SPEECH ENHANCEMENT BASED ON FILTERING THE SPECTROTEMPORAL I -1105
MODULATIONS
Nima Mesgarani, Shihab Shamma, University of Maryland, United States
SP-P17.11: IMPROVED KALMAN FILTERING FOR SPEECH ENHANCEMENT I -1109
Volodya Grancharov, Jonas Samuelsson, Bastiaan Kleijn, KTH (Royal Institute of Technology), Sweden
SP-P17.12: ADAPTIVE DECORRELATION FILTERING ALGORITHM FOR SPEECH SOURCE I -1113
SEPARATION IN UNCORRELATED NOISES
Rong Hu, Yunxin Zhao, University of Missouri-Columbia, United States
SP-P17.13: AN IMPROVED ESTIMATION OF A PRIORI SPEECH ABSENCE PROBABILITY I -1117
FOR SPEECH ENHANCEMENT: IN PERSPECTIVE OF SPEECH PERCEPTION
Min Seok Choi, Hong-Goo Kang, Yonsei University, Republic of Korea
SP-P17.14: SPEECH ENHANCEMENT USING A SWITCHING KALMAN FILTER WITH A I -1121
PERCEPTUAL POST-FILTER
Jianping Deng, Martin Bouchard, Tet H. Yeap, University of Ottawa, Canada
Volume II
IMDSP-L1: WATERMARKING
IMDSP-L1.1: USING PERCEPTUAL MODELS TO IMPROVE FIDELITY AND PROVIDE II -1
INVARIANCE TO VALUMETRIC SCALING FOR QUANTIZATION INDEX MODULATION WATERMARKING
Qiao Li, Ingemar Cox, University College London, United Kingdom
IMDSP-L1.2: SCALAR SCHEME FOR MULTIPLE USER INFORMATION EMBEDDING II - 5
Abdellatif Zaidi, Pablo Piantanida, Pierre Duhamel, LSS/CNRS SUPELEC, France
IMDSP-L1.3: RANDOMIZED DETECTION FOR SPREAD-SPECTRUM WATERMARKING: II - 9
DEFENDING AGAINST SENSITIVITY AND OTHER ATTACKS
Ramarathnam Venkatesan, Mariusz Jakubowski, Microsoft Research, United States
IMDSP-L1.4: LINEAR COMBINATION COLLUSION ATTACK AND ITS APPLICATION ON AN II -13
ANTI-COLLUSION FINGERPRINTING
Yongdong Wu, Institute for Infocomm Research, Singapore
IMDSP-L1.5: PITCH AND DURATION MODIFICATION FOR SPEECH WATERMARKING II -17
Mehmet Celik, Gaurav Sharma, University of Rochester, United States; A. Murat Tekalp, University of Rochester, United States /
Koc University, Turkey
IMDSP-L1.6: MORPHOLOGICAL STEGANALYSIS OF AUDIO SIGNALS AND THE II - 21
PRINCIPLE OF DIMINISHING MARGINAL DISTORTIONS
Oktay Altun, Gaurav Sharma, Mehmet Celik, Mark Sterling, Edward Titlebaum, Mark Bocko, University of Rochester, United
States
IMDSP-L2: DENOISING
IMDSP-L2.1: IMAGE DENOISING BY NON-LOCAL AVERAGING II - 25
Antoni Buades, Bartomeu Coll, Universitat de les Illes Balears, Spain; Jean-Michel Morel, ENS Cachan, France
IMDSP-L2.2: IMAGE DENOISING FOR SIGNAL-DEPENDENT NOISE II - 29
Keigo Hirakawa, New England Conservatory of Music, United States; Thomas W. Parks, Cornell University, United States
IMDSP-L2.3: WAVELET DOMAIN PARTITION-BASED IMAGE DENOISING II - 33
Il Ryeol Kim, Kenneth E. Barner, University of Delaware, United States
IMDSP-L2.4: AN IMPROVED IMAGE DENOISING ALGORITHM BASED ON WEIGHTED II - 37
ADAPTIVE LOCAL BOUNDS
Qi Li, Tania Stathaki, Imperial College London, United Kingdom
IMDSP-L2.5: A SELF-CONSISTENT WAVELET METHOD FOR DENOISING IMAGES II - 41
WITH MISSING PIXELS
Thomas Lee, Colorado State University, United States; Xiao-Li Meng, Harvard University, United States