Perceptual Evaluation of Singing Quality (PESnQ) · 2020. 12. 13.

Chitralekha Gupta 1,2, Haizhou Li 3, and Ye Wang 1
chitralekha@u.nus.edu, [email protected], [email protected]
1 School of Computing, 2 NUS Graduate School for Integrative Sciences and Engineering, 3 Department of Electrical and Computer Engineering, National University of Singapore


Page 1: Perceptual Evaluation of Singing Quality (PESnQ) · 2020. 12. 13. · Chitralekha Gupta 1,2, Haizhou Li 3, and Ye Wang 1, chitralekha@u.nus.edu


1. Introduction
•  Singing pedagogy is dependent on human music experts, and is not always accessible to the masses
•  A perceptually-valid automatic singing evaluation score could serve as a complement to singing lessons, and make singing training more accessible to learners

7. Conclusions
•  We propose perceptually relevant features to objectively evaluate singing quality
•  We adopt the cognitive modeling theory of PESQ to design a PESnQ score which performs better than distance features
•  PESnQ shows 96% improvement over baseline scores in correlating with the music-expert human judges

5. PESnQ Formulation

Sound & Music Computing Lab http://www.smcnus.org/

2. How do experts perceptually evaluate singing quality?

Experimental Dataset
•  20 audio recordings collected from 20 singers with varied singing abilities – professional to poor
•  Subjective evaluation for singing quality by 5 professionally trained musicians – inter-judge agreement was 0.82
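The poster does not say which statistic lies behind the 0.82 inter-judge agreement. As a minimal sketch, assuming it is an average pairwise Pearson correlation between judges, the computation could look like this. The function name and the score matrix are hypothetical and for illustration only.

```python
from itertools import combinations

import numpy as np


def mean_pairwise_correlation(scores):
    """Average Pearson correlation over all pairs of judges.

    scores: array of shape (n_judges, n_recordings).
    Assumption: agreement is measured as mean pairwise Pearson correlation;
    the poster does not confirm this choice.
    """
    scores = np.asarray(scores, dtype=float)
    corrs = [
        np.corrcoef(scores[a], scores[b])[0, 1]
        for a, b in combinations(range(len(scores)), 2)
    ]
    return float(np.mean(corrs))


# Hypothetical ratings: 3 judges scoring 4 recordings.
example_scores = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.5, 2.5, 3.5, 4.5],
]
agreement = mean_pairwise_correlation(example_scores)
```

In this toy matrix every judge ranks the recordings identically, so the agreement is 1; real ratings would give a lower value.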

[Figure: panels labeled Reference, Good, Poor]

[Figure: block diagram. Test signal and reference signal → Disturbance Features Computation → Cognitive Modeling → PESnQ score]

3. Objective Characterization of Singing Quality

Rhythm Consistency
•  Use DTW of MFCC vectors between frame-equalized reference and test. Uniformly faster or slower tempo shouldn't be penalized
[Figure: panels labeled Reference vs. Good, Reference vs. Poor]

Intonation Accuracy
•  Compare post-processed pitch contours from rhythm-aligned reference and test
•  Key transposition should be allowed → pitch derivative, and median-subtracted pitch

Appropriate Vibrato
•  Vibrato oscillations: Rate: 5-8 Hz; Extent: 30-150 cents
•  Features: vibrato likeliness, rate, extent

Voice Quality and Pronunciation

Pitch Dynamic Range
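The key-transposition point under Intonation Accuracy can be illustrated with a small sketch. On a log-frequency (cents) scale, a transposition adds a constant offset, so both the median-subtracted pitch and the frame-to-frame pitch derivative are unchanged by it. The function names, the 440 Hz reference, and the assumption that the input holds only voiced frames (f0 > 0) are illustrative choices, not details from the poster.

```python
import numpy as np


def hz_to_cents(f0_hz, ref_hz=440.0):
    """Convert a pitch contour in Hz to cents relative to ref_hz (assumed 440 Hz)."""
    f0 = np.asarray(f0_hz, dtype=float)
    return 1200.0 * np.log2(f0 / ref_hz)


def transposition_invariant_features(f0_hz):
    """Return (median-subtracted pitch, pitch derivative), both in cents.

    Assumption: f0_hz contains voiced frames only (all values > 0).
    """
    cents = hz_to_cents(f0_hz)
    median_subtracted = cents - np.median(cents)
    derivative = np.diff(cents)  # frame-to-frame change in cents
    return median_subtracted, derivative


# Hypothetical contour and the same contour sung 3 semitones higher.
melody = np.array([220.0, 247.0, 262.0, 294.0, 330.0])
transposed = melody * 2.0 ** (3.0 / 12.0)

ms_a, d_a = transposition_invariant_features(melody)
ms_b, d_b = transposition_invariant_features(transposed)
same_features = np.allclose(ms_a, ms_b) and np.allclose(d_a, d_b)
```

Both feature sets match for the two contours, which is why a singer who picks a different key than the reference would not be penalized by distances computed on these features.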

4. PESQ-based Feature Modeling
Combine frame-disturbances of these features with cognitive modeling inspired by telecommunication standard PESQ [Rix2001]: a localized error in time has a larger subjective impact than a distributed error
•  Localized error: L6-norm over split second intervals (320 ms)
•  Distributed error: L2-norm over all split second intervals
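The two-stage aggregation described above can be sketched as follows: an L6-norm inside each 320 ms interval, then an L2-norm across the interval values. This is a minimal sketch under stated assumptions, not the authors' implementation: the 10 ms frame period, non-overlapping intervals, and mean-normalized norms are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def lp_norm(x, p):
    """Mean-normalized Lp norm: (mean(|x|^p))^(1/p)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.abs(x) ** p) ** (1.0 / p))


def aggregate_disturbance(frame_disturbance, frame_period_s=0.010,
                          interval_s=0.320, p_local=6, p_global=2):
    """Aggregate frame-level disturbances into a single value.

    Assumptions: 10 ms frames and non-overlapping 320 ms intervals.
    """
    d = np.asarray(frame_disturbance, dtype=float)
    if d.size == 0:
        raise ValueError("frame_disturbance must not be empty")
    frames_per_interval = max(1, int(round(interval_s / frame_period_s)))
    # Stage 1: L6-norm inside each interval emphasizes localized bursts.
    local = [
        lp_norm(d[start:start + frames_per_interval], p_local)
        for start in range(0, len(d), frames_per_interval)
    ]
    # Stage 2: L2-norm across the interval-level values.
    return lp_norm(local, p_global)


# Same total disturbance, spread evenly versus packed into 4 frames.
spread = np.full(320, 0.1)
burst = np.zeros(320)
burst[:4] = 8.0

score_spread = aggregate_disturbance(spread)
score_burst = aggregate_disturbance(burst)
```

With equal summed disturbance, the burst signal yields the larger aggregate, which matches the stated idea that a localized error weighs more than a distributed one.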

System          Description
Baselines       Pitch distance [Tsai2012], pitch-aligned rhythm distance [Molina2013], volume distance [Chang2007, Tsai2012]
PESnQ systems   Combinations of L2-norm, L6+L2-norm and distance features for the various MFCC-aligned perceptual features

6. Results

System        Correlation of objective score with avg. overall human score   Leave-one-judge-out avg. correlation score
Human Judge   –                                                               0.87
Baseline      0.30                                                            0.38
PESnQ         0.59                                                            0.66
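As a quick arithmetic check on the table, the relative gain of PESnQ over the baseline can be computed per column. The first column gives roughly 0.967, which appears consistent with the 96% improvement quoted in the Conclusions. The helper name is hypothetical.

```python
def relative_improvement(baseline, system):
    """Relative gain of `system` over `baseline`, as a fraction of the baseline."""
    return (system - baseline) / baseline


# Correlation with the average overall human score: 0.30 -> 0.59
overall_gain = relative_improvement(0.30, 0.59)

# Leave-one-judge-out average correlation: 0.38 -> 0.66
leave_one_out_gain = relative_improvement(0.38, 0.66)

print(f"overall: {overall_gain:.3f}, leave-one-judge-out: {leave_one_out_gain:.3f}")
```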

•  Rhythm Consistency
•  Intonation Accuracy
•  Appropriate Vibrato
•  Voice Quality
•  Pitch Dynamic Range

[Figure labels: Baseline, PESnQ, Regression]

•  DTW distance between MFCC feature vectors
•  Comparison of difference between min and max pitch values

Disturbance Features
•  Frame-level deviation of the optimal path from the diagonal in DTW for rhythm and intonation features
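The disturbance feature above can be sketched with a plain dynamic-programming DTW followed by a per-frame measure of how far the optimal path strays from the diagonal. This is an illustrative sketch only: the function names are hypothetical, Euclidean frame distance and the three-step pattern are assumptions, and the inputs are taken to be precomputed feature matrices of shape (frames, dimensions), for example MFCCs.

```python
import numpy as np


def dtw_path(ref, test):
    """Optimal DTW warping path as a list of (ref_frame, test_frame) pairs."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    n, m = len(ref), len(test)
    # Pairwise Euclidean distance between every reference and test frame.
    dist = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end of both sequences to the start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def path_deviation(path, n_ref, n_test):
    """Mean absolute deviation from the diagonal for each reference frame."""
    slope = (n_test - 1) / max(n_ref - 1, 1)
    total = np.zeros(n_ref)
    count = np.zeros(n_ref)
    for i, j in path:
        total[i] += abs(j - slope * i)
        count[i] += 1
    return total / count


# Hypothetical 1-D "features": the test lingers early, then rushes.
ref = np.arange(6, dtype=float).reshape(-1, 1)
test = np.array([0.0, 0.0, 0.0, 1.0, 3.0, 5.0]).reshape(-1, 1)

dev_same = path_deviation(dtw_path(ref, ref), len(ref), len(ref))
dev_warped = path_deviation(dtw_path(ref, test), len(ref), len(test))
```

An identical test sequence gives zero deviation at every frame, while the locally warped one produces non-zero values that could then feed the L6 and L2 aggregation of Section 4.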