voice morphing 12
8/8/2019 Voice Morphing 12
STQ Workshop, Sophia-Antipolis, February 11th, 2003
Packet loss concealment using audio morphing
Franck Bouteille
Pascal Scalart
Balazs Kövesi
PRESCOM SA, Lannion, FRANCE
France Telecom R&D, Lannion, FRANCE
-
Motivation
In packet data networks, excess traffic leads to delays or loss in the delivery of information. In voice communication, long delays are intolerable, and network delay budgets have a strong influence on the design of packet voice systems.
Several techniques have been developed to increase the tolerance of packet voice systems to lost packets.
However, these techniques are poorly suited to long loss periods (> 15 ms), because the speech signal is not stationary over the long term.
Nor do they use the a posteriori information carried by the next packet, whose arrival reveals the loss of one or several frames.
This a posteriori information is generally available thanks to playout buffer management and the real-time network protocol.
The proposed technique uses knowledge of the frame received after the last lost one, models of the last received frames, and model interpolation to synthesize the missing signal.
-
Outline
Introduction
Morphing audio principle
Voiced / Unvoiced strategy
Modelling and interpolation
Block concatenation and smoothing
Some results of concealed signal
Comparisons and performances
Configuration
Results
Conclusion
-
Morphing audio principle
Voiced/Unvoiced strategy
Context of loss: Previous Frame (Frame A) | Missing Signal | Next Frame (Frame B)
Pitch estimation on Frame A gives P0; pitch estimation on Frame B gives P1, with 2.5 ms (400 Hz) <= P0, P1 <= 15 ms (67 Hz).
Frame A    Frame B    Pitch periods used
V          V          P0, P1
V          UV         P0, P1 = P0
UV         V          P0 = P1, P1
UV         UV         none (unvoiced signal)
When the missing signal is classified as unvoiced, Frame A is copied into the missing signal or comfort noise is generated.
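The decision table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_pitch_periods` is hypothetical, pitch periods are given in samples at 8 kHz, an unvoiced frame is represented by `None`, and a pitch estimate outside the 2.5 ms to 15 ms range is treated as unvoiced (the slides do not say how such estimates are handled).

```python
from typing import Optional, Tuple

FS_HZ = 8000                       # sampling rate quoted in the slides
MIN_PITCH = FS_HZ * 25 // 10_000   # 2.5 ms -> 20 samples (400 Hz)
MAX_PITCH = FS_HZ * 15 // 1_000    # 15 ms  -> 120 samples (about 67 Hz)


def select_pitch_periods(p0: Optional[int],
                         p1: Optional[int]) -> Optional[Tuple[int, int]]:
    """Return the (P0, P1) pair to morph between, or None if the gap is unvoiced.

    p0 / p1 are pitch periods in samples for Frame A / Frame B, or None when
    a (hypothetical) pitch estimator judged the frame unvoiced.
    """
    def voiced(p: Optional[int]) -> bool:
        # Assumption: out-of-range estimates count as unvoiced.
        return p is not None and MIN_PITCH <= p <= MAX_PITCH

    a_voiced, b_voiced = voiced(p0), voiced(p1)
    if a_voiced and b_voiced:      # V / V: use both estimates
        return p0, p1
    if a_voiced:                   # V / UV: P1 = P0
        return p0, p0
    if b_voiced:                   # UV / V: P0 = P1
        return p1, p1
    return None                    # UV / UV: copy Frame A or generate comfort noise
```

When the function returns `None`, the caller falls back to the unvoiced strategy (copy of Frame A or comfort noise); otherwise the returned pair drives the block sizing and interpolation.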
-
Morphing audio principle
Modelling and interpolation
P0 and P1 are used to estimate the number of intermediate blocks needed (NbBloc) and the size of these blocks (SizeBloc):
$$\mathrm{SizeBloc} = \max(P_0, P_1), \qquad \mathrm{NbBloc} = \mathrm{round}\left(\frac{\mathrm{NbSampleLoss}}{\mathrm{SizeBloc}}\right)$$
We model the last pitch period vector (X0) of Frame A (ModP0) and the first pitch period vector (X1) of Frame B (ModP1). The DCT (Discrete Cosine Transform) is used to model X0 and X1. The resolution is 120 points at a sampling frequency of 8 kHz.
Intermediate blocks, Block_i, are used to transform the model vector ModP0 continuously into the model vector ModP1 by linear interpolation of the model parameters:
$$\mathrm{Block}_i(n) = \mathrm{IDCT}_{120}\left[\mathrm{ModP}_0(k) + \frac{\mathrm{ModP}_1(k) - \mathrm{ModP}_0(k)}{\mathrm{NbBloc}}\, i\right]$$
$$0 \le k \le 120 - 1, \qquad 0 \le i \le \mathrm{NbBloc} - 1, \qquad 0 \le n \le \mathrm{SizeBloc} - 1$$
IDCT: Inverse Discrete Cosine Transform.
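A minimal Python sketch of the block sizing and model interpolation follows. It is not the authors' implementation: the helper names (`dct`, `idct`, `model`, `morph_blocks`) are hypothetical, the DCT is the orthonormal DCT-II written out in pure Python, each pitch period is assumed to be zero-padded to 120 points before modelling, each block is assumed to be the first SizeBloc samples of the inverse transform, and at least one block is always produced. The slides do not specify any of these details.

```python
import math

N_DCT = 120  # model resolution quoted on the slide (120 points at 8 kHz)


def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * (t + 0.5) * k / n) for k in range(1, n))
        out.append(s)
    return out


def model(period):
    """DCT model of one pitch period, zero-padded to N_DCT points (assumption)."""
    padded = list(period) + [0.0] * (N_DCT - len(period))
    return dct(padded)


def morph_blocks(x0, x1, nb_sample_loss):
    """Return NbBloc intermediate blocks morphing pitch period x0 into x1."""
    size_bloc = max(len(x0), len(x1))                     # SizeBloc = max(P0, P1)
    nb_bloc = max(1, round(nb_sample_loss / size_bloc))   # guard of 1 is an assumption
    mod_p0, mod_p1 = model(x0), model(x1)
    blocks = []
    for i in range(nb_bloc):
        # Linear interpolation of the model parameters, coefficient by coefficient.
        coeffs = [a + (b - a) * i / nb_bloc for a, b in zip(mod_p0, mod_p1)]
        blocks.append(idct(coeffs)[:size_bloc])
    return blocks
```

For a 30 ms loss at 8 kHz (240 samples) with P0 = 40 and P1 = 50 samples, this gives SizeBloc = 50 and NbBloc = round(4.8) = 5. The five blocks total 250 samples, slightly more than the gap; the slides do not say how this mismatch is reconciled. Note also that Python's `round` rounds exact halves to the nearest even integer.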
-
Morphing audio principle
Block concatenation and smoothing
Each block is then copied into the synthesis frame.
[Figure: the synthesis frame, made of Block_0, Block_1, ..., Block_i, ..., Block_(NbBloc-1), fills the gap between Frame A and Frame B; smoothing is applied at the junctions.]
Smoothing between blocks is performed according to:
$$x(0) = \alpha(0)\, x(-1) + (1 - \alpha(0))\, y(0)$$
x(-1): last sample of the previous block (or frame)
y(0): first sample of the current block (or frame)
$$x(i) = \alpha(i)\, x(i-1) + (1 - \alpha(i))\, y(i), \qquad 0 \le i \le \mathrm{NbPSmoothing} - 1$$
$\alpha(i)$: smoothing factor,
$$\alpha(i) = 1 - \frac{i + 1}{\mathrm{NbPSmoothing}}$$
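The smoothing recursion can be sketched as below. This is a rough illustration only: the function names `smooth_junction` and `concatenate` are hypothetical, and the linearly decreasing smoothing factor alpha(i) = 1 - (i + 1) / NbPSmoothing is taken here as an assumption.

```python
def smooth_junction(prev_last, block, nb_p_smoothing):
    """Smooth the first nb_p_smoothing samples of `block` against `prev_last`.

    Implements x(i) = alpha(i) * x(i-1) + (1 - alpha(i)) * y(i), where
    x(-1) = prev_last is the last sample of the previous block (or frame)
    and y is the current block. alpha(i) is an ASSUMED linear ramp.
    """
    out = list(block)
    x_prev = prev_last
    for i in range(min(nb_p_smoothing, len(out))):
        alpha = 1.0 - (i + 1) / nb_p_smoothing   # assumed smoothing factor
        out[i] = alpha * x_prev + (1.0 - alpha) * out[i]
        x_prev = out[i]
    return out


def concatenate(frame_a, blocks, nb_p_smoothing):
    """Copy each block into the synthesis frame, smoothing every junction."""
    synthesis = []
    last = frame_a[-1]
    for block in blocks:
        smoothed = smooth_junction(last, block, nb_p_smoothing)
        synthesis.extend(smoothed)
        last = smoothed[-1]
    return synthesis
```

With this ramp, the first smoothed sample stays close to the end of the previous block and the influence of the previous sample vanishes after NbPSmoothing samples. The same `smooth_junction` call could be applied to the start of Frame B, using the last sample of the synthesis frame as `prev_last`.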
-
Morphing audio principle
Some results of concealed signal
[Figure: original frame and concealed frame, each plotted against sample number.]
Case of voiced frames of a female speech signal (30 ms of missing signal).
-
Morphing audio principle
Some results of concealed signal
Behaviour of the morphing technique during a transition frame (30 ms) for a male speech signal.
[Figure: original frame and concealed frame, each plotted against sample number.]
We can notice that the concealed speech-to-noise transition is more voiced than in the original frame. In an enhanced morphing technique, the voiced duration could be controlled.
-
Comparisons and performances
Configuration
Two speech coders (G.711 and G.723.1) were tested independently. The frame size is 30 ms;
Five concealment techniques: Previous Frame Copy (PFC), Double Sided Periodic Substitution (DSPS) [1], the ITU-T recommended technique defined for each specific coder (G.711 and G.723.1), the GFEC technique [2], and Audio Morphing;
Two loss rates were defined: 5% and 10%. Losses can occur in bursts, but are usually isolated;
The number of sentences was 15 (8 female and 7 male speech files);
Ten subjects took part in an informal test: they were asked to listen to coded speech signals that had been corrected by the different concealment techniques.
[1] J. Tang, "Evaluation of Double Sided Periodic Substitution (DSPS) Method for Recovering Missing Speech in Packet Voice Communications," IEEE Computers and Communications, pp. 454-458, 1991.
[2] B. Kövesi, D. Massaloux, "Method of Packet Errors Cancellation Suitable for any Speech and Sound Compression Scheme," ETSI STQ Workshop, February 2003, Sophia-Antipolis.
-
Comparisons and performances
Results for G.711 codec
[Figure: bar chart of score (/15) for PFC, FEC G.711, DSPS, GFEC and Morphing at loss rates of 5% and 10%; vertical axis from 0.00 to 7.00.]
-
Comparisons and performances
Results for G.723.1 codec
[Figure: bar chart of score (/15) for PFC, FEC G.723.1, DSPS, GFEC and Morphing at loss rates of 5% and 10%; vertical axis from 0.00 to 7.00.]
-
Conclusion
The proposed technique improves the quality of frame correction at high loss rates (5% and 10%);
Audio morphing adds latency (Frame B is required), but this is acceptable for VoIP applications;
Other models are possible, and the voicing condition can be controlled to improve reconstruction quality.