perceptual evalua-on of singing quality (pesnq) · 2020. 12. 13. · perceptual evalua-on of...
TRANSCRIPT
PerceptualEvalua-onofSingingQuality(PESnQ)ChitralekhaGupta1,2,HaizhouLi3,andYeWang1
[email protected],[email protected],[email protected],2NUSGraduateSchoolforIntegra0veSciencesandEngineering,3DepartmentofElectricalandComputerEngineering,
Na0onalUniversityofSingapore
1.Introduc0on• Singing pedagogy is dependent on
human music experts, and is notalwaysaccessibletothemasses
• Aperceptually-validautoma-csingingevalua-on score could serve as acomplement to singing lessons, andmake singing trainingmore accessibletolearners
7.Conclusions• We propose perceptually relevant features to objec0vely
evaluatesingingquality• Weadoptthecogni-vemodelingtheoryofPESQtodesigna
PESnQscorewhichperformsbeKerthandistancefeatures• PESnQ shows 96% improvement over baseline scores in
correla0ngwiththemusic-experthumanjudges
5.PESnQFormula0on
Sound&MusicCompu0ngLabhKp://www.smcnus.org/
2.Howdoexpertsperceptuallyevaluatesingingquality?
ExperimentalDataset• 20 audio recordings collected from 20 singers with varied
singingabili0es–professionaltopoor• Subjec0ve evalua0on for singing quality by 5 professionally
trainedmusicians–inter-judgeagreementwas0.82
Reference
Good
Poor
DisturbanceFeatures
Computa0on
Cogni0veModelingTestsignal
Referencesignal PESnQ
score
3.Objec0veCharacteriza0onofSingingQuality
• UseDTWofMFCCvectorsbetweenframe-equalizedreferenceandtest.Uniformlyfasterorslowertemposhouldn’tbepenalized
RhythmConsistency
ReferenceVs.Good ReferenceVs.PoorIntona-onAccuracy• Compare post-processed pitch contours from rhythm-
alignedreferenceandtest• Key transposi-on should be allowedà pitch deriva-ve,
andmedian-subtractedpitchAppropriateVibrato• Vibratooscilla0ons:Rate:5-8Hz;Extent:30-150cents• Features:vibratolikeliness,rate,extentVoiceQualityandPronuncia-on
PitchDynamicRange
4.PESQ-basedFeatureModelingCombine frame-disturbances of these features with cogni0vemodelinginspiredbytelecommunica0onstandardPESQ[Rix2001]:
alocalizederrorin,mehasalargersubjec,veimpactthanadistributederror
• Localizederror:L6-normoversplitsecondintervals(320ms)• Distributederror:L2-normoverallsplitsecondintervals
System Descrip-onBaselines Pitch distance [Tsai2012], pitch-aligned rhythm distance
[Molina2013],volumedistance[Chang2007,Tsai2012]PESnQsystems Combina0onsof L2-norm, L6+L2-normanddistance features
forthevariousMFCC-alignedperceptualfeatures
6.Results
System Correla-onobjec-vescorewithavg.overallhumanscore
Leave-one-judge-outavg.correla-onscore
HumanJudge – 0.87Baseline 0.30 0.38PESnQ 0.59 0.66
• RhythmConsistency• Intona0onAccuracy• AppropriateVibrato• VoiceQuality• PitchDynamicRange
Baseline PESnQ
Regression
• DTWdistancebetweenMFCCfeaturevectors
• ComparisonofdifferencebetweenminandmaxpitchvaluesDisturbanceFeatures• Frame-leveldevia-onoftheop0malpathfromthediagonal
inDTWforrhythmandintona0onfeatures