A3: Incremental Specification in Context
Cooperation: LSS, IMS
Grzegorz Dogil, Bin Yang, Wolfgang Wokurek
Stefan Uhlich, Andreas Madsack
Content
• Current research topics:
– Subglottal resonances
– Robust speech representation
– Landmark/Phoneme detection
• Future research direction
Subglottal Resonances
• Topic:
– Measurement of subglottal resonances
– Relationship to vowel chart
• Results: (in cooperation with Steven Lulich, MIT)
– Recording of 20 speakers with sensors
– Analysis of Swabian Diphthongs
• Future (until 2010):
– Measure 50+ speakers with new sensor
– Calculate the vowel space of these speakers using the first two sg resonances
[Figure: diagram with the labels VT, SG and G]
Robust Speech Representation
• How can we identify features that are important?
• So far: Measuring relevance with mean squared error
– A feature is more important if it allows a good reconstruction in the mean-squared-error sense
– Mathematically more tractable
• General question for the next project phase:
– How do we measure perceptual relevance?
[Diagram: Features s_1,...,s_N → Random Erasure Process → Subset of Features y_1,...,y_M → Reconstruction → Reconstr. Features ŝ_1,...,ŝ_N]
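The mean-squared-error criterion of the Robust Speech Representation slide can be illustrated with a small sketch. It is not the project's method: it assumes that only a single feature survives the erasure and that every other feature is reconstructed from it with a one-variable least-squares fit. The function name `reconstruction_mse` and the toy data are hypothetical.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def reconstruction_mse(data, kept):
    """Average MSE when only feature `kept` survives the erasure and each
    other feature is reconstructed from it by a least-squares line.
    (Assumed reconstructor, chosen only for illustration.)"""
    n_feat = len(data[0])
    y = [row[kept] for row in data]
    y_mean = mean(y)
    var_y = mean([(v - y_mean) ** 2 for v in y])
    total = 0.0
    for i in range(n_feat):
        if i == kept:
            continue
        s = [row[i] for row in data]
        s_mean = mean(s)
        cov = mean([(a - y_mean) * (b - s_mean) for a, b in zip(y, s)])
        slope = cov / var_y if var_y > 0 else 0.0
        # Squared error between the true feature and its reconstruction.
        total += mean([(b - (s_mean + slope * (a - y_mean))) ** 2
                       for a, b in zip(y, s)])
    return total / (n_feat - 1)


# Toy data: feature 1 depends strongly on feature 0, feature 2 is unrelated.
random.seed(0)
toy = []
for _ in range(500):
    x = random.gauss(0.0, 1.0)
    toy.append([x, 2.0 * x + random.gauss(0.0, 0.1), random.gauss(0.0, 1.0)])

# A lower value means the kept feature is more relevant in the MSE sense.
ranking = sorted(range(3), key=lambda j: reconstruction_mse(toy, j))
```

In this toy setting the unrelated feature 2 ends up last in `ranking`, because keeping it reconstructs neither of the other two features. Such a score says nothing yet about perceptual relevance, which is the open question named on the slide.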
Landmark/Phoneme Detection
• Topic:
– Find relevant features for landmarks/phonemes using statistical evaluation methods
– Identify characteristic temporal contour of relevant features
• Used features (only subset selected):
– En. envelopes (A2), Liu bands
– LPC (LSS), VQP (Wokurek), f0, MFCC, ...
• Results (for different tasks):
– Segment-wise detection
– Evaluation of performance difference for fixed and phoneme-based segmentation
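The difference between fixed and phoneme-based segmentation can be made concrete with a minimal sketch that pools a per-frame feature over segments under both schemes. All names (`fixed_segments`, `segments_from_boundaries`, `segment_means`) and the toy energy contour are assumptions for illustration; the slides do not describe the actual pipeline.

```python
def fixed_segments(n_frames, seg_len):
    """Consecutive windows of seg_len frames as (start, end) pairs."""
    return [(start, min(start + seg_len, n_frames))
            for start in range(0, n_frames, seg_len)]


def segments_from_boundaries(boundaries):
    """Turn sorted boundary frames [b0, b1, ..., bK] into (start, end) pairs."""
    return list(zip(boundaries[:-1], boundaries[1:]))


def segment_means(frames, segments):
    """Average a per-frame feature value over each (start, end) segment."""
    return [sum(frames[a:b]) / (b - a) for a, b in segments]


# Toy energy contour: silence, a short burst, then a weaker steady part.
energy = [0, 0, 0, 9, 9, 1, 1, 1]

fixed = segment_means(energy, fixed_segments(len(energy), 4))
# fixed == [2.25, 3.0]: the burst is split across both windows

aligned = segment_means(energy, segments_from_boundaries([0, 3, 5, 8]))
# aligned == [0.0, 9.0, 1.0]: each segment covers one homogeneous stretch
```

With fixed windows the short burst is averaged together with its neighbours, whereas segments aligned to (here hand-picked) boundaries keep it separate. This is one plausible reason why the two segmentation schemes may perform differently.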
Future Research Direction (I)
• Identify perceptually relevant regions of speech
Future Research Direction (II)
• Example: /ae/ of handbag
• Exemplars: Different versions of perceptually relevant regions for the same phoneme
[Diagram: Set of all /ae/'s in corpus and corresponding feature values → Feature Selection (e.g. find best five features) → Statistical Classifier → New Exemplar: covered (i.e. coverage of 80 %) / not covered (i.e. 20 %)]
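The chain on this slide (exemplar set, feature selection, statistical classifier, coverage) can be sketched as follows. The slides do not say which selection criterion or classifier is meant, so this sketch assumes a Fisher-style separation score and a nearest-centroid classifier; the function names and all numbers are hypothetical toy values.

```python
def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])


def fisher_score(pos, neg):
    """Squared mean difference divided by the summed variances (one feature)."""
    diff = (mean(pos) - mean(neg)) ** 2
    denom = variance(pos) + variance(neg)
    if denom > 0:
        return diff / denom
    return float("inf") if diff > 0 else 0.0


def select_best_features(target, others, k):
    """Indices of the k features that best separate target from others."""
    n_feat = len(target[0])
    scores = [fisher_score([r[i] for r in target], [r[i] for r in others])
              for i in range(n_feat)]
    return sorted(range(n_feat), key=lambda i: scores[i], reverse=True)[:k]


def coverage(target, others, new_exemplars, selected):
    """Fraction of new exemplars closer to the target centroid than to the
    centroid of the other tokens, using only the selected features."""
    def centroid(rows):
        return [mean([r[i] for r in rows]) for i in selected]

    def dist2(row, c):
        return sum((row[i] - cj) ** 2 for i, cj in zip(selected, c))

    ct, co = centroid(target), centroid(others)
    hits = sum(1 for r in new_exemplars if dist2(r, ct) < dist2(r, co))
    return hits / len(new_exemplars)


# Toy feature vectors: only feature 0 separates the two groups.
ae_tokens = [[1.0, 5.0, 0.0], [1.2, 3.0, 1.0], [0.8, 4.0, 0.5]]
other_tokens = [[3.0, 4.5, 0.4], [3.2, 3.5, 0.6], [2.8, 4.0, 0.5]]
new_ae = [[1.1, 4.0, 0.2], [0.9, 3.0, 0.9], [1.0, 5.0, 0.5],
          [0.7, 4.2, 0.1], [2.9, 4.0, 0.5]]

best = select_best_features(ae_tokens, other_tokens, k=1)
covered = coverage(ae_tokens, other_tokens, new_ae, best)
```

Here `k=1` keeps the toy data small; the slide's example of five best features would correspond to `k=5` on a realistic feature set. The uncovered share is simply `1 - covered`.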
Future Research Direction (III)
• Work packages:
– Regions: How to identify perceptually relevant regions in the (t,f)-plane?
  • Feature extraction: IMS, LSS (part already done), robustness
  • Feature selection: IMS (phonetically motivated), LSS (statistically motivated) + Combination of both + memory decay
– Evaluation: Are the identified regions relevant?
  • ... for speech representation in context? (IMS)
  • ... for usage-induced context information? (LSS)
– Transition to higher levels (pitch-accents (A1), syllables (A2), words (A4))
References
• Subglottal Resonances
– W. Wokurek and A. Madsack (2008), Messung subglottaler Resonanzen mit Beschleunigungssensoren, Fortschritte der Akustik--DAGA-2008 (Dresden) pp. 125-126
– A. Madsack, S. Lulich, W. Wokurek and G. Dogil (2008), Subglottal Resonances and Vowel Formant Variability: A Case Study of High German Monophthongs and Swabian Diphthongs, LabPhon11, Wellington
• Robust Speech Representation, Incremental Specification
– M. Lugger and B. Yang (2007), An incremental analysis of different feature groups in speaker independent emotion recognition, Proc. ICPhS 2007
– S. Uhlich and B. Yang (2008), A generalized optimal correlating transform for multiple description coding and its theoretical analysis, Proc. IEEE ICASSP 2008
– R. Blind, S. Uhlich, B. Yang and F. Allgöwer, Robustification and Optimization of a Kalman Filter with Measurement Loss using Linear Precoding, submitted to Proc. ACC 2009
– M. Lugger and B. Yang, “Psychological Motivated Multi-Stage Emotion Classification Exploiting Voice Quality Features”, to be published in: Speech Recognition, Publisher: I-Tech Education and Publishing, Vienna, Austria
• Landmark/Phoneme Detection
– A. Madsack, G. Dogil, S. Uhlich, Y. Zeng and B. Yang, On the Importance of Timing Information in Plosive Detection, submitted to Proc. ICASSP 2009