nokia_3gpp_evs_codec_white_paper.pdf

24
The 3GPP Enhanced Voice Services (EVS) codec Nokia Networks Nokia Networks white paper The 3GPP Enhanced Voice Services (EVS) codec

Upload: yusran-em

Post on 04-Sep-2015

7 views

Category:

Documents


1 download

TRANSCRIPT

  • The 3GPP Enhanced Voice Services (EVS) codec

    Nokia Networks

    Nokia Networks white paperThe 3GPP Enhanced Voice Services (EVS) codec

  • Page 2 networks.nokia.com

    Contents

    Executive summary 3

    1. Introduction 3

    2. Evolution of 3GPP mobile voice/audio codecs 4

    3. EVS Standardization 6

    4. EVS Features 6

    5. EVS Technology 7

    6. EVS Performance 9

    7. EVSCapacityBenefitsforLTE 17

    8. Beyond EVS 18

    9. Conclusion 19

  • Executive summaryThe 3rd Generation Partnership Project (3GPP) has developed a new voice codec that brings voice quality onto an unprecedented high level. The Enhanced Voice Services (EVS) codec provides equally high quality both for voice and generic audio (such as music) and has low latency enabling its use for real-time communication. The EVS codec was developed in particular for 3GPPLTE,butitwillbringbenefitstoanyVoIPandCSsystem.Comparedtoearliervoicecodecs,theEVScodecprovidessubstantiallyimprovedqualityatsimilarbitrates,qualityraisedtoanunprecedentedlevelathigherbitrates,andsignificantlyimprovednetworkcapacitywhilemaintainingthesame quality. The EVS codec is fully interoperable with HD Voice and forms a continuous evolution to higher quality and more natural communication with a realistic impression of being there.

    1. IntroductionVoice1 quality plays an important role in the evolution of mobile systems. Users are becoming accustomed to computer based Voice-over-IP (VoIP) services with extended audio bandwidth and also expect high quality from mobilevoiceservices.HD(HighDefinition)Voiceiscurrentlybeingrolledoutglobally,bringingsubstantialimprovementforvoicequalityinmobilecommunications [1]. HD Voice allows people to communicate with each other more clearly and easily. The wider audio bandwidth up to 8 kHz provided by the AMR-WB (Adaptive Multi-Rate Wideband) codec improves the intelligibility andnaturalnessofspeech,bringsafeelingoftransparentcommunicationand also makes speaker recognition easier [2]. HD Voice doubles the audio bandwidth compared to traditional telephony. It is therefore not surprising that HD Voice makes one-to-one voice calls much more intimate and conferencecallsmoreefficient.Theimprovedintelligibilityandnaturalsoundenable clear calls even in noisy environments.

    WhileHDVoiceisbeingintroduced,thenextstepofimprovementinvoicequality has already emerged from research into standards [3]. A new Enhanced Voice Services (EVS) codec has recently been standardized by The 3rd Generation Partnership Project (3GPP) to take voice quality beyond HD. It brings voice quality to an unprecedented level. By extending the audio bandwidthupto20kHz,itcoversthefullrangeofhumanhearing.TheEVScodecisthefirst3GPPconversationalcodec(i.e.withlatencysuitableforcommunication) that provides equally high quality both for voice and generic audio,suchasmusic(andmixedvoice/musiccontent).EVSenableshighquality live music sharing with mobile phones from concerts (or sharing any other important live events) with a realistic impression as if you were there. TheEVScodecwasdevelopedspecificallyforIP-basedcommunicationswithhigh robustness to delay jitter and packet losses. The codec therefore not only reaches the very limits of human audio perception but is also designed to cope with network impairments in IP-based communications.

    Page 3 networks.nokia.com

    1 Voice and speech are used interchangeably in this paper.

  • ThispaperdiscussestheEVScodec,presentsitsbenefitsanddescribeshowitwill revolutionize voice communications. Voice codec evolution in 3GPP is also discussed. Nokia is one of the developers of the new EVS codec.

    2. Evolution of 3GPP mobile voice/audio codecs

    The new EVS codec continues the strong contribution from Nokia in developing voice and audio codecs for mobile communication for GSM and3GPPsystems(3GWCDMAandLTE).Nokia,incollaborationwithothercompanies,hasdeveloped:

    1. theGSMEFR(EnhancedFull-Rate)codecin1996,

    2. theAMR-NB(AMRNarrowband)codec-knownalsosimplyasAMR-in1999,

    3. theAMR-WB(AMRWideband)codecin2001,

    4. theAMR-WB+(ExtendedAMR-WB)codecin2004,andnow

    5. the new EVS codec in 2014.

    The EFR codec [4] is used in the 2G GSM system. It is also the highest quality rate in the multi-rate AMR-NB codec. The AMR-NB codec [5] is the default codecinall3GPPvoiceservicesin3Gandbeyond(WCDMAandLTE).HDVoiceisbasedontheAMR-WBcodec[2],whichisawidebandaudioevolutionoftheAMR-NBcodec.TheExtendedAMR-WB(AMR-WB+)codec[6],withmodesforcodingstereosignals,isdesignedforgenericaudiofornon-conversationalapplications such as music streaming. The EVS codec is the next step in 3GPP codec evolution. See Table 1.

    Page 4 networks.nokia.com

    GSM/GERAN EFR (Enhanced Full-Rate)

    AMR-NB (Adaptive Multi-Rate - Narrowband)

    AMR-WB (Adaptive Multi-Rate - Wideband)

    Extended AMR-WB (AMR-WB+)

    EVS (Enhanced Voice Services)

    Standardized 1996 1999 2001 2004 2014

    Audio bandwidth

    Narrowband Narrowband Wideband Fullband Fullband Super wideband Wideband Narrowband

    Use in 3GPP Used in the GSM system.

    Default codec for voice in 3G and beyond(WCDMA,LTE).

    The HD Voice codec. Default codec for wideband voice in 3G and beyond (WCDMA,LTE).

    Recommended codec for generic audio in 3G and beyond (PS services over WCDMA,LTE).

    The evolved HD Voice codec for LTE.

    Bitrate(s) 12.2 kbit/s 4.75 12.2 kbit/s 6.6 23.85 kbit/s 6 48 kbit/s 5.9 128 kbit/s

    Table 1. 3GPP codecs developed by Nokia in collaboration with other companies.

  • Inadditionto3GPPspeechcodecstandardization,Nokiahasalsosignificantlycontributedtothefollowingspeechcodecstandards:PCS1900EFR(standardizedin1995),US-TDMAEFR(1996),CDMAEVRC(1996),ITU-TG.722.2(2001),CDMAVMR-WB(2004)andITU-TG.718(2007andsuperwideband extension in 2009).

    TheGSM/3GPPvoicecodecevolutionhassignificantlyimprovedvoicequalityandsystemefficiencyandhasbeenasignificantfactorinthesuccessof modern mobile communication. There has been continuous extension of audio bandwidth from narrowband voice to the whole range of human hearing,bringinghighqualitynotonlyforspeechbutforgenericaudioaswell(including music). At the same time robustness to transmission impairments andsystemefficiencyhavebeenimproved.

    ThenewEVScodecbringssubstantiallyenhancedvoicequality,improvederrorresilienceandincreasedcodingefficiencyfornarrowband(NB)andwideband (WB) audio bandwidths. Further quality enhancement is brought bytheintroductionofsuperwideband(SWB)andfullband(FB)audio,seeFigure1.ComparedtoHDVoice,theEVScodeccanprovide(a)substantiallyimprovedqualityatsimilarbitrates,(b)qualityraisedtoanunprecedentedlevelatincreasedbitrates,andalso(c)significantlyimprovednetworkcapacitywhile maintaining the same quality as in HD Voice.

    An important goal of the EVS codec development was to enable the further evolved HD Voice (brought by the EVS codec) to interact with the HD Voice service. The resulting EVS codec is fully interoperable with the AMR-WB codec ofHDVoice.TheEVScodecthereforerepresentsacontinuousevolutionofHD Voice to even higher quality and more natural communication; it is a complementary and not a competing solution for reaching even higher levels of voice and audio quality.

    Page 5 networks.nokia.com

    Figure1.Illustrationoftheoreticalaudiobandwidths:narrowband,wideband,super wideband and fullband. (The actual bandwidths in use may be somewhat narrowerthanthese,e.g.,fornarrowbandthebandwidthistypicallyabout300 3400 Hz.)

    Super wideband (SWB)

    Fullband (FB)

    0 4000 8000 16000 20000

    Wideband (WB)

    Narrowband (NB)

    Frequency (Hz)

  • 3. EVS Standardization The 3GPP EVS codec work started in 2007 with a pre-study to develop the EVS codec concept. The study was immediately followed by codec standardization during2010-2014.MostEVSspecificationswereapprovedinSeptember2014 and the standardization work was completed in December 2014 for 3GPPRelease12.TheEVScodecspecificationscanbefoundin[7-18].

    EVSstandardizationconsistedofthreephases:(1)Qualification(topre-selectthebestcandidatecodecs),(2)Selection(toselectthebestcodecmeetingallrequirements),and(3)Characterization(todeterminethefullperformanceofthe new codec). Codec selection was based on a rigorous process comparing the codec performance and other characteristics against requirements set for voicecodecsinLTE.

    Inparticular,theEVScodecwasdevelopedwiththeall-IP3GPPLTEinmind,butitwillbringbenefitstoanyVoIPorCSsystem.In3GPPRelease12,theEVScodecisrecommendedforvoiceinPSmultimediatelephonyoverLTEandWCDMA.Moreover,theAMR-WBinteroperablemodeofEVSisdefinedasan alternative implementation of the AMR-WB codec (when the EVS codec is supported)andEVSisdefinedasthedefaultcodecforSWBandFBspeech.InRelease13(targetedtobefinalizedbyendof2015)theuseofEVSisintended to be extended to CS telephony over WCDMA.

    TheEVScodecwasdevelopedjointlyby12companies:Ericsson,FraunhoferIIS,Huawei,Nokia,NTT,NTTDOCOMO,Orange,Panasonic,Qualcomm,Samsung,VoiceAgeandZTECorporation.

    4. EVS FeaturesThe EVS codec enhances perceptual audio quality and improves coding efficiencyforNBandWBaudiobandwidthsusingawiderangeofbitratesstartingfrom7.2kbit/s.Inadditiontofixed-ratecodingrates,asource-controlled variable bitrate (SC-VBR) mode at an average bitrate of 5.9 kbit/s is supportedforNBandWBaudio.TheEVScodecfurtherprovidesasignificantstepinvoicequalitywithSWBandFBoperation,startingfrom9.6and16.4kbit/s,respectively.ThemaximumbitrateoftheEVScodecis24.4kbit/sforNB and 128 kbit/s for all other audio bandwidths.

    TheEVScodecoffersinputandoutputsamplingat8,16,32and48kHz.Inordertooptimizetheperceptualcodingquality,anintegratedbandwidthdetectorautomaticallyadaptstotheactualbandwidthoftheinputsignal,which may be lower than the bandwidth indicated to the codec. The ability to switch the bit rate at every 20 ms frame further allows the codec to easily adapt to changes in channel capacity.

    The codec features discontinuous transmission (DTX) with algorithms for voice/sound activity detection (VAD) and comfort noise generation (CNG). In theSC-VBRcodingmode,DTX/CNGisalwaysusedforinactivespeechcoding.An advanced error concealment mechanism mitigates the quality impact of

    Page 6 networks.nokia.com

  • Page 7 networks.nokia.com

    channel errors resulting in lost packets. The codec also contains a system for jitterbuffermanagement(JBM)totacklethejitter,i.e.variationinthedelayofthereceivedpackets.Thereisalsoaspecialchannel-awaremode,achievingincreased robustness in particularly adverse channel conditions. The channel-aware coding mode operates at 13.2 kbit/s for both WB and SWB audio.

    InadditiontotheEVSPrimarymodesdescribedabove,theEVScodecenablesbackward compatibility with the Adaptive Multi-Rate Wideband (AMR-WB) codecthroughaninteroperable(IO)mode.Enhancedperceptualqualityover AMR-WB is achieved for all nine bit rates between 6.6 and 23.85 kbit/s thankstobothencoderanddecoderimprovementsintheIOmode,suchasoptimized pitch prediction and enhanced post-processing. Seamless switching isprovidedbetweenAMR-WBIOandtheEVSPrimarymodes.

    ThefeaturesetoftheEVScodecresultsinahighlyflexible,dynamicallyconfigurablecodecthatspansqualityrangesfromthehighestcompressionratesallthewayuptotransparentcoding.Furthermore,theAMR-WBIOmodemaybeusedforvoiceoverLTEasanalternativeimplementationofAMR-WBinterminals and gateways that support the EVS codec.

    5. EVS Technology Inordertoprovidegoodqualityforbothspeechandmusicsignals,acontent-dependenton-the-flyswitchingbetweenspeechandaudiocompressionisemployed.TheEVScodecisthefirstcodectoprovidecoreswitchingtechnology at the low algorithmic delay of 32 ms.

    A high level diagram of the structure of the codec is presented in Figure 2.

    EVS modes

    AMR-WB IO mode

    ACELP based encoder

    Core, and DTX

    switching

    EVS modes

    ACELP based decoder

    Core, and DTX

    switching

    AMR-WB IO encoder AMR-WB IO mode

    AMR-WB IO decoder

    POST

    -PRO

    CESS

    ING

    ENCODER DECODER

    PRE-

    PRO

    CESS

    ING

    BWE encoder

    DTX, CNG encoder CH

    ANN

    EL

    MDCT based encoder

    Jitt

    er b

    uer

    man

    agem

    ent

    (JBM

    )

    BWE decoder

    MDCT based decoder

    DTX, CNG decoder

    Figure 2. High-level diagram of the EVS codec.

  • Thepre-processingconsistsmainlyofhigh-passfilteringat20Hz,resampling,signalactivitydetection,noiseupdateandestimation,bandwidthdetector,signalanalysisandclassification.Theanalysisandprocessingfromthepre-processingunitenablesaveryfineclassificationofthesignaltypeandatargetedcodingmethodisappliedtoeachcase.Thisfinetuningtotheinputsignal characteristics makes the EVS codec extremely versatile content-wise. Eventhoughthesignalbandwidthissignaledasaninputparameter,thebandwidth detector is employed to detect lower than signaled bandwidth. This isdoneinordertoavoidinefficientcodingforband-limitedcontent.Basedonthebitrateandinputbandwidth,theencodingprocessintheEVScodecusestwodifferentinternalsamplingrates,therebyallowingbetterfidelityoftheencodedsignalwithrespecttotheoriginalinputsignal.WithintheEVSmodes,the core and DTX switching is performed based on information extracted duringpre-processing,aswellasontheoverallinputparameters(bitrate,useofdiscontinuoustransmission,inputsignalbandwidth).IfAMR-WBIOmodeisselected,encodingaccordingtotheinteroperableAMR-WBencoderisperformed.Theswitchingbetweenthecores,aswellasbetweendifferentmodes can be done at each 20 ms frame boundary.

    The speech core used in the EVS codec is based on the principles of Algebraic Code-ExcitedLinearPrediction(ACELP)inheritedfromtheAMR-WBstandard[2].ACELPreliesonthemodellingofspeechusinglinearpredictionwithinananalysis-by-synthesismethod.Thelinearprediction(LP)parametersandtheexcitation parameters are encoded and most of the bit budget of this model isallottedtotheLPparameters.Inordertomakeuseofthefinecontentdescriptionofthecoderfromthepreprocessingstage,differentcodingrepresentationsforeachsignaltypeareused.Thisdifferentiationrequireshighmemoryconsumption.Furthermore,thisdifferentiationand,implicitly,the memory requirements are further increased by the large range of bitrates that the codec supports. Nokias multi-stage structured quantizer technology [19],basedonmultiple-scalelattices,allowshighencodingefficiencyaccommodatingallsignaltypes,bandwidthsbitratesandinternalsamplingrates,whilekeepingtheencodingcomplexityandtheROMtableswithinpracticallimits.Inadditiontotheflexibilitybroughtbythequantizerstructure,controlled alternation between predictive and non-predictive modes for encodingtheLPparametersprovidesgoodresiliencetoframelosserrors.Forspeechsignals,thepartofthebandwidthnotcoveredbytheACELPmodelfor SWB signals is encoded using a time-domain bandwidth extension (BWE) technique.Multi-bandwidthlisteningtestresultsshowasignificantqualityimprovement for SWB compared to WB for all supported operating points.

    EncodingbasedontheModifiedDiscreteCosineTransform(MDCT)isbestsuited for music and various background type signals. Compared with other music-orientedcontentdistributioncodecssuchasAAC[20],theEVScodecoffershighqualitycompressionofmusicsignalsatlowdelayandlowbitrates.ThisisaccomplishedbyusingdifferentMDCT-basedmodes,dependingoncontent type and operating mode.

    Discontinuous transmission within the EVS modes is important for optimizationofbatterylifeinmobilecommunications.InDTXmode,

    Page 8 networks.nokia.com

  • Page 9 networks.nokia.com

    transmission of noise is replaced by comfort noise generation in the decoder. An improved voice activity detection helps to distinguish between active speech,activemusicandinactiveperiods(recordingnoise,backgroundnoise)andestimatesthelevelofthebackgroundnoise.Basedonthesedecisions,theEVScodecimplementstwoversionsofCNG,oneLP-basedandthesecondoneafrequencydomainCNG.Consequently,anappropriateDTXmodeisofferedbyallapplicableoperatingpointsupto24.4kbit/s.

    Post-processingtools,suchasmusicenhancer,inactivesignalpostprocessing,bass-boostfilterandformantpostfilter,furtherensurethehighfidelityofthedecodedsignal.ComparedwiththeAMR-WBcodec,themostnotable improvements due to post-processing are audible in noisy channel conditions and for mixed content.

    6. EVS PerformanceThe EVS codecs subjective speech and audio quality has been evaluated extensively by means of listening tests. During the EVS codec standardization processin3GPP,severalroundsoflisteningtestswereconducted.Themost recent rounds were the selection phase listening testing [21] and characterizationphaselisteningtesting[22].Additionally,Nokiahasconducted several listening tests to characterize the quality even further [23]. 3GPP has also published a technical report about the performance characteristics of the EVS codec [18].

    Themainattributesaffectingthesubjectivespeechandaudioqualityareaudiobandwidth,signaltype,channelconditions,andtheavailablebitrate.

    Wideraudiobandwidthprovidesadditionalnaturalnesstovoicequalityandalsoenableshighqualityforgenericaudio.Traditionally,narrowbandhasbeenusedfortelephony.Recently,widebandaudiousingtheAMR-WBcodec(HD Voice) has gained traction. EVS brings to the table a totally new quality level with super wideband and fullband bandwidths. Table 2 summarizes the sampling rates and audio bandwidths of the EVS codec. The EVS encoder hasaglobalhigh-passfiltersetatabout20Hzforallsamplingrates.Thisremoves very low-frequency noise outside the range of human hearing and improves coding.

    Sampling rate Audio bandwidth

    Narrowband (NB) 8 kHz up to 4 kHz

    Wideband (WB) 16 kHz up to 8 kHz

    Super wideband (SWB) 32 kHz up to 16 kHz

    Fullband (FB) 48 kHz up to 20 kHz

    Table 2. Sampling rates and audio bandwidths in the EVS codec.

  • Signaltypeaffectsthecodingquitesignificantly,especiallyifthecodecisnotoptimizedforbothspeechandmusic.Forexample,theolder3GPPcodecs AMR-NB and AMR-WB are traditional speech codecs optimized primarily for speech and thus provide only limited audio quality with music signals,evenwiththehighestavailablebitrates.EVS,ontheotherhand,isalso designed for music and to be signal independent. Therefore it provides reasonablequalityforgenericaudioevenatlowbitrates.Despiteitsname,EVS is much more than a voice codec.

    Channelconditionsreferheretothereal-timetransmissionchannel(e.g.LTEradionetwork)where,dependingontheradiochannelcharacteristicsandnetworkconditions(e.g.congestion),someaudiopacketsmaybelost,corruptedorreceivedtoolateforreal-timedecoding.Inthesecases,theaudio codec has to perform frame error concealment (FEC). The decoder usessomeveryspecialalgorithmstogenerateanartificialbutpleasantsounding signal to replace the lost signal frames - instead of muting or producingdisturbingsounds.Typically,speechcodecscanhandleonlyasmall amount of lost frames (up to about 3%) without severe artefacts. With the EVS codec this is further improved. With even more than 10% of lost frames,EVSstillprovidesunderstandablevoiceoutputtotemporarilycopewith such extreme conditions without severe artefacts.

    Bitratedefineshowmuchnetworkandradioresourcesareconsumed.Themost typical bitrates in 3GPP voice telephony are around 12-13 kbit/s. AMR-NB supports bitrates from 4.75 to 12.2 kbit/s and AMR-WB from 6.6 to 23.85 kbit/s. EVS supports a wide range of bitrates from 5.9 to 128 kbit/s.Naturally,withincreasingbitrate,onecanexpectimprovedsubjectivequality. EVS can provide (depending on the choice of modes that are used)significantlyimprovedqualityoverAMR-NBandAMR-WBatsimilarbitrates,qualityraisedtoanunprecedentedlevelatincreasedbitrates,orsignificantlyimprovednetworkcapacitywhilemaintainingthesamequality.

    ThefollowingfigurescontainresultsfromsubjectivelisteningtestsconductedbyNokia.Theyarerepresentedonameanopinionscore(MOS)scale.TheMOSscoreiscalculatedbasedonhownon-expertlistenerscastvotesfordifferenttestsignalsegmentscontainingspeechsentences,noisyspeechexcerptsormusicsequences.Typically,inasinglelisteningtest,eachconditionisheardby24-32differentlistenersandeachlistenercastsvotesfor4-8differentsignalsegmentswiththesameconditions.ThustheindividualMOSscoreistypicallyan average of 96-256 casted votes.

    The quality scale in standardized P.800 test is from 1 to 5 [31]. In the Nokia tests,thiswasextendedto9,wheretheendpointshaveverbaldescriptionfrom 1 (poor) to 9 (excellent). The range extension gives better resolution in the results. The tests used ACR (Absolute Category Rating) with a 9-point scale,i.e.ACR9MOSmethodology.

    InadditiontotheEVScodecitself,thetestscoveredseveralotherstandardized codecs and reference conditions to provide insight on the level of improvement. The following list contains the reference conditions and codecs:

    Page 10 networks.nokia.com

  • Directreferenceconditionswithlimitedaudiobandwidthbutnocoding.Thefollowinglowpasscut-offfrequencieswereused:20kHzforFB,16kHzforSWB,8kHzforWBand4kHzforNB.

    AMR-NBcodec,alsoknownasAMR[25].

    AMR-WBwidebandcodec[26].

    ITU-TG.722.1AnnexC[27].Thisisalow-complexitysuperwidebandvoicecodec with an audio bandwidth of 14 kHz used in teleconferencing systems.

    ITU-TG.718AnnexB[28].Thisisanembedded(864kbit/s)speechcodecfornarrowband,widebandandsuperwidebandaudio.Superwidebandaudio was used in the tests.

    ITU-TG.719[29].Thisisafullbandvoiceandaudiocodecusedinteleconferencing systems.

    IETFOpus,versionopus-tools-0.1.9-win32.zip[30].TheCBR(Constant Bit-Rate)configurationwasused.

    Clean speech resultsThe clean speech (i.e. no background noise) results in Figure 3 show that EVSissignificantlybetterthaneitherAMRorAMR-WBatalloperatingpoints.

    Page 11 networks.nokia.com

    Figure3.CleanspeechMOSvalues.

    NB

    WB

    SWB FB

    2

    3

    4

    5

    6

    7

    8

    0 8 16 24 32

    G.722.1C

    G.718B

    G.719

    Opus CBR

    AMR

    AMR-WB

    EVS-NB

    EVS-WB

    EVS-SWB/FB

    kbit/s

    ACR9

    MOS

  • InSWBandFBmodes,EVSissignificantlybetterthanG.718B,G.722.1CorG.719.Atbitratesbelow32kbit/s,EVSinSWBandFBmodesisalsobetterthanOpusCBR,andbelow24.4kbits/sthedifferencebecomesverysubstantial.Notably,EVS-SWB9.6kbit/sisbetterthanAMR-WB23.85kbit/sandOpusCBR20kbit/s,providingbettervoicequalityatlessthanhalfthebitrate.

    Speech with background noise resultsThe noisy speech (i.e. speech with background noise) results in Figure 4 are verysimilartothecleanspeechresults,withtheexceptionthatG.722.1Cand G.719 perform somewhat better in noisy speech in relation to the other codecs in the test. This phenomenon is well known from previous listening tests[24].Nevertheless,EVSintheSWBandFBmodesremainssignificantlybetterthaneitheroftheseothercodecs.Office,street,carandcafeterianoisetypes were used in the test.

    Music and mixed content resultsThe music and mixed content results in Figure 5 are quite similar to the clean speechresults,butnowthelackofqualityimprovementwhenincreasingthebitrateismorevisibleat20kbit/sforOpus.Withmusicandmixedcontent,

    Page 12 networks.nokia.com

    NB

    WB

    SWB FB

    2

    3

    4

    5

    6

    7

    8

    0 8 16 24 32

    G.722.1C

    G.718B

    Opus CBR

    AMR

    AMR-WB

    EVS-NB

    EVS-WB

    G.719

    EVS-SWB/FB

    kbit/s

    ACR9

    MOS

    Figure4.NoisyspeechMOSvalues.

  • EVSreallyshowsexcellentperformancecomparedtoOpus,AMRandAMR-WB.Ateachbitrateintherange5.9-24.4kbit/s,EVSisupto2MOSscoresbetterthan any other codec. The reason for the good performance at low bitrates is duetotheswitchedACELP/transformcodingcoresusedinEVS.Themixedcontent used in the test contained speech and music signals intermixed (e.g. radio advertisements and telephone on-hold messages/music).

    Channel condition resultsFigures 6 and 7 show how EVS performs with varying amounts of channel errors. The frame error rate (FER) is shown as a percentage on the X-axis and theMOSscoreisshownontheY-axis.DirectNB,WBandFBsignalsobtainedMOSscoresof4.91,7.04and8.00inthislisteningtest.Allsignaltypeswereused:cleanspeech,noisyspeech,musicandmixedcontent.

    Figure6showsresultsforAMR,AMR-WBandEVSatrespectivebitratesof12-13kbit/s(13.2kbit/sforEVS,12.65kbit/sforAMR-WBand12.2kbit/sforAMR).The inherent robustness of the EVS codec is clearly visible in the curves when comparing EVS-SWB and EVS-WB to AMR-WB and EVS-NB to AMR. EVS showssignificantimprovementatalloperatingpoints.Moreover,thehighertheFER,thegreateristheperformanceimprovement.

    Page 13 networks.nokia.com

    NB

    WB

    SWB FB

    2

    3

    4

    5

    6

    7

    8

    0 8 16 24 32

    G.722.1C

    G.718B

    G.719

    Opus CBR

    AMR

    AMR-WB

    EVS-NB

    EVS-WB

    EVS-SWB/FB

    kbit/s

    ACR9

    MOS

    Figure5.MusicandmixedcontentMOSvalues.

  • Ascanbeseen,EVS-SWB,EVS-WBandEVS-NBshowverylittleornoqualitydegradationat3%FERwhencomparedtoacleanchannel.Evenat4-6%FER,EVS-WB and EVS-SWB provide quality comparable to AMR-WB without errors andEVS-NBprovidesqualitycomparabletoAMRwithouterrors.At10%FER,EVS-WB and EVS-SWB provide useable audio quality (similar to AMR-WB at 4-5%FER),asatemporarymeasuretocopewitherrors,thusdoublingtheframe error robustness for similar quality. At the highest measured FER of 15%,EVS-WBprovidesbetterqualitythanAMRat3%FER.Athigherrorratesabove8%,FEREVS-NBprovideshigheraudioqualitythanAMR-WB,althoughit is limited to a narrower signal bandwidth.

    Figure 7 shows how EVS performs compared to the highest bitrate of AMR-WB i.e. 23.85 kbit/s. The inherent robustness of EVS is also evident from these results.Ascanbeseen,EVS-WBandEVS-SWBat13.2kbit/sbothperformsignificantlybetterthanAMR-WBat23.85kbit/satallframeerrorrates,eventhough the bitrate is almost halved. If we compare EVS-SWB at 32 kbit/s to AMR-WBat23.85kbit/s,weseeanaverageimprovementof1.4MOSpointsoveralloperatingpoints.Evenat6%frameerrorrate,EVS-SWBat32kbit/sprovidessignificantlybetterqualitythanAMR-WBat23.85kbit/sincleanchannel conditions. EVS-SWB at 13.2 kbit/s with 6% FER provides similar quality to AMR-WB at 23.85 kbit/s in clean channel conditions. EVS-FB at 48 kbit/s provides consistently the highest quality at all operating points (1.8 MOSpointaverageimprovementoverAMR-WBat23.85kbit/s).EVS-FBthusachieves unprecedented quality for communications.

    Page 14 networks.nokia.com

    1

    2

    3

    4

    5

    6

    7

    0 3 6 10 15

    ACR9

    MOS

    AMR - 12,2

    EVS-NB - 13,2

    AMR-WB - 12,65

    EVS-WB - 13,2

    EVS-SWB - 13,2

    FER%

    Figure 6. Audio quality in the presence of channel errors. EVS codec compared to AMR-WB at 12.65 kbit/s. (All codecs in test have about the same bitrate.)

  • Summary of EVS performanceFinally,allpreviousresultsarecombinedintoasinglegraph(Figure8).Someadditional results for higher bitrates up to 128 kbit/s are also included. FromthesecombinedresultsitcanbeseenthatEVSisbetterthan,orstatisticallyequivalentto,anyothercodecatallbitrates.EVS-NBat24.4kbit/s is statistically equivalent to NB direct. EVS-WB at 32 kbit/s is statistically equivalent to wideband direct. Also EVS-SWB and EVS-FB reach statistical equivalencetodirectat64and96kbit/s,respectively.

    Page 15 networks.nokia.com

    1

    2

    3

    4

    5

    6

    7

    8

    9

    0 3 6 10 15

    ACR9

    MOS

    AMR-WB - 23,85

    EVS-WB - 13,2

    EVS-SWB - 13,2

    EVS-SWB - 32

    EVS-FB - 48

    FER%

    Figure 7. Audio quality in the presence of channel errors. EVS codec compared to the highest bitrate mode of AMR-WB at 23.85 kbit/s.

  • NB

    WB

    SWB FB

    2

    3

    4

    5

    6

    7

    8

    0 8 16 24 32 40 48 64 96 128

    G.722.1C

    G.718B

    G.719

    Opus CBR

    AMR

    AMR-WB

    EVS-NB

    EVS-WB

    EVS-SWB/FB

    kbit/s

    ACR9 MOS

    Figure8.CombinedMOSvalues.

    Page 16 networks.nokia.com

    From the results it can be seen that the 3GPP EVS codec produces cutting-edge voice and audio quality across all tested bitrates and bandwidths. The codec excels especially at bitrates up to 32 kbit/s which are of utmost importanceforthedeploymentofcost-effectivemobileservices.ComparedtoOpusandothertestedcodecs,atlowbitrateEVSprovidesthesamequalityatabouthalfthebitrate.Forexample,EVS-SWBat9.6kbit/sissignificantlybetterthanOpusCBRat20kbit/sandEVS-SWBat16.4kbit/sprovidesthesame quality as G.722.1C at 48 kbit/s.

    IfweconsiderthatEVS-SWBat13.2kbit/sprovidesanMOSscoreof6.15,AMR-WBat12.65kbit/sprovidesMOSof4.95andAMRat12.2kbit/sprovidesMOSof3.51,theimprovementfromAMR-WBtoEVS-SWB(1.2points)isalmost as large as the improvement is from AMR to AMR-WB (1.44 points). Further,ifweconsiderslightlyhigheroperatingpoints,EVS-SWBat24.4kbit/sversusAMR-WBat23.85kbit/s,theimprovementisalmost1.7MOSpointsand thus a larger improvement than from AMR at 12.2 to AMR-WB at 12.65 kbit/s.

    The SWB and FB modes of EVS at the same bit-rates show similar performance and are therefore shown with a single curve. For audio with frequencies above 16 kHz (and for listeners capable of hearing such high frequencies) the FB modeprovidessignificantqualityimprovementovertheSWBmode.

  • Page 17 networks.nokia.com

    7. EVSCapacityBenefitsforLTEThevoicecapacityimpactofusingtheEVScodecina3GPPLTEnetworkhas also been studied. Based on the results of the subjective listening tests presentedintheprevioussection,EVSaudiobandwidthsandbitratesthatproduce subjective quality equal to or better than the most relevant use cases of AMR and AMR-WB codecs were chosen for analysis. The most relevant referenceusecasesareasfollows:

    AMR12.2kbit/s.Thisisusedwhennarrowbandcommunicationsisneeded,e.g. interworking with CS network supporting only narrowband voice.

    AMR-WB12.65kbit/s.Thiswidebandcodecbitratecanbeusedwhenoptimal interworking with HD voice towards a 3GPP 2G or 3G network is desired. This bitrate ensures optimal voice quality by enabling transcoder-freeoperation(TrFO)betweenLTEandCSnetworks.

    AMR-WB23.85kbit/s.ThisisthedefaultHDvoicecodecbitrateforLTE-to-LTEcalls.

    Note that lower AMR/AMR-WB bitrates can also be used if rate control is active on the CS side.

    Table 3 shows the EVS audio bandwidths and bitrates that produce subjective quality equal to or better than AMR or AMR-WB for typical conversational voice scenarios,includingbothcleanspeechandspeechwithbackgroundnoise.

    EVSvoicecapacityestimationsinLTEarebasedonexistingLTEvoicecapacitysimulations for the AMR codec using 5.9 kbit/s and 12.2 kbit/s as reference points [32]. The main system simulation parameters were aligned with [33]. The reference AMR capacities were 410 and 240 users for 5.9 kbit/s and 12.2kbit/s,respectively,at5MHzsystembandwidthintheuplinkdirectionusing a semi-persistent scheduling scheme. The uplink direction was chosen becauseLTEvoicecapacityisuplinklimitedandthusdefinestheend-to-endsystemcapacity.Additionally,moreoptimumvoicecapacitycanbeachievedby using semi-persistent scheduling over the air interface rather than dynamic scheduling.

    The voice capacities of AMR-WB 23.85 kbit/s and EVS bitrates were interpolated/extrapolated from the capacities of the reference points. A regression analysis tool was used to perform the interpolation/extrapolation.

    Reference Equal bandwidth Wider bandwidth

    AMR 12.2 kbit/s EVS-NB 8.0 kbit/s EVS-WB 5.9 kbit/s

    AMR-WB 12.65 kbit/s EVS-WB 9.6 kbit/s EVS-SWB 9.6 kbit/s

    AMR-WB 23.85 kbits/s EVS-WB 13.2 kbit/s EVS-SWB 9.6 kbit/s

    Table 3. EVS having at least equal subjective quality to reference scenarios

  • Page 18 networks.nokia.com

    Figure 9 illustrates estimated EVS capacity gains compared to the AMR and AMR-WB reference cases shown in Table 3. It can be seen that the highest capacity gain is achieved when the EVS-SWB 9.6 kbit/s codec is used instead ofAMR-WB23.85kbit/s.Almostadoublingofcapacityispossiblehere.Also,significantcapacitygainsareachievableagainstthenarrowbandreferencescenario,whereEVS-WB5.9kbit/sprovidesacapacityimprovementofabout70% over AMR 12.2 kbit/s. The lowest capacity gain is expected when EVS is used instead of AMR-WB 12.65 kbit/s. Here both EVS-WB 9.6 kbit/s and EVS-SWB 9.6 kbits/s provide a capacity gain of about 20%. The capacity gain is the same for WB and SWB modes because 9.6 kbit/s is the lowest available bitrate for EVS-SWB mode.

    8. Beyond EVSImproving naturalness of voice communications and robustness against transmission errors have been the main drivers for new voice codecs. Natural voice quality is particularly important in multi-party conference calls and in telepresence for improved speaker recognition. It is also important to avoid listener fatigue during long sessions. The EVS codec currently supports coding ofstereosignals,butonlybymeansofcodingtwoseparatemonochannels.Extending EVS in future 3GPP releases to stereo and multi-channel coding (withspatiallocalisationoftheparticipants)willbeanaturalevolutionofEVS,improvingbothuserexperienceandsystemefficiencyevenfurther.

    0%

    20%

    40%

    60%

    80%

    100%

    120%

    140%

    160%

    180%

    200%

    EVS vs. AMR 12.2 EVS vs. AMR-WB 12.65 EVS vs. AMR-WB 23.85

    EVS with the same bandwidth

    EVS with wider bandwidth

    EVS-

    NB

    8.0

    EVS-

    WB

    5.9

    EVS-

    WB

    9.6

    EVS-

    SWB

    9.6

    EVS-

    WB

    13.2

    EVS-

    SWB

    9.6

    Figure 9. EVS voice capacity compared to AMR and AMR-WB.

  • Page 19 networks.nokia.com

    9. ConclusionA new codec for Enhanced Voice Services (EVS) has been standardized in 3GPP.TheEVScodecisthesuccessorfortheHighDefinition(HD)mobilevoice codec AMR-WB and it provides full interoperability with HD Voice. EVS offerscutting-edgeperformance;itoutperformsothercodecstandardsfromITU-T,IETFand3GPP.EVSexcelswithhighaudioquality(forbothspeechandmusic),outstandingcompressionefficiencyandsuperiorrobustnessagainst transmission impairments. These characteristics are all essential for cost-effectivemobileservicesandforsuccessfulmobileoperatorbusiness.ComparedtoHDVoice,theEVScodecprovidessubstantiallyimprovedqualityatsimilarbitrates,qualityraisedtoanunprecedentedlevelathigherbitrates,andalsosignificantlyimprovednetworkcapacitywhilemaintainingthesamequality as HD Voice.

    Listofabbreviations2G 2nd Generation

    3GPP The 3rd Generation Partnership Project

    AAC Advanced Audio Coding

    ACELP AlgebraicCode-ExcitedLinearPrediction

    ACR Absolute Category Rating

    ACR9 Absolute Category Rating with 9-point scale

    AMR Adaptive Multi-Rate (same codec as AMR-NB)

    AMR-NB Adaptive Multi-Rate Narrowband

    AMR-WB Adaptive Multi-Rate Wideband

    AMR-WB+ Extended AMR-WB

    BWE Bandwidth Extension

    CBR Constant Bit-Rate

    CDMA Code Division Multiple Access

    CNG Comfort Noise Generation

    CS Circuit-Switched

    DTX Discontinuous Transmission

    EDGE Enhanced Data rates for GSM Evolution

    EFR Enhanced Full-Rate

    EVRC Enhanced Variable Rate Codec

    EVS Enhanced Voice Services

  • Page 20 networks.nokia.com

    FB Fullband

    FEC Frame Error Concealment

    FER Frame Error Rate

    GERAN GSM EDGE Radio Access Network

    GSM Global System for Mobile communications

    HD HighDefinition

    IETF Internet Engineering Task Force

    IO Interoperable

    ITU-T International Telecommunication Union - Telecommunication standardization sector

    JBM JitterBufferManagement

    LP LinearPrediction

    LTE LongTermEvolution

    MDCT ModifiedDiscreteCosineTransform

    MOS MeanOpinionScore

    NB Narrowband

    PCS Personal Communications Service

    PS Packet-Switched

    SC-VBR Source-Controlled Variable Bitrate

    SWB Super wideband

    TrFO Transcoder-freeoperation

    US-TDMA United States - Time Division Multiple Access

    VAD Voice/sound Activity Detection

    VMR-WB Variable-Rate Multimode Wideband

    VoIP Voice over Internet Protocol

    WB Wideband

    WCDMA Wideband Code Division Multiple Access

  • Page 21 networks.nokia.com

    References[1] http://www.gsacom.com/hdvoice/

    [2] B.Bessetteetal.,TheAdaptiveMulti-RateWidebandSpeechCodec(AMR-WB),IEEETransactionsonSpeechandAudioSignalProcessing,Vol.10,No.8,November2002.

    [3] S.Bruhnetal.,Standardizationofthenew3GPPEVSCodec,IEEEInternationalConferenceonAcoustics,SpeechandSignalProcessing(ICASSP)2015,Brisbane,Australia,April2015.

    [4] K.Jrvinenetal,GSMEnhancedFullRateCodec,Proc.ofIEEEInternationalConferenceonAcoustics,SpeechandSignalProcessing,Munich,Germany,April1997.

    [5] K.Jrvinen,StandardisationoftheAdaptiveMulti-RateCodec,Proc.ofXEuropeanSignalProcessingConference(EUSIPCO),Tampere,Finland,September 2000.

    [6] J.Mkinenetal.AMR-WB+:anewaudiocodingstandardfor3rdgenerationmobileaudioservices,Proc.ofIEEEInternationalConferenceonAcoustics,SpeechandSignalProcessing,Philadelphia,USA, March 2005.

    [7] 3GPPTS26.441,CodecforEnhancedVoiceServices(EVS);Generaloverview.Online:http://www.3gpp.org/DynaReport/26441.htm

    [8] 3GPPTS26.442,CodecforEnhancedVoiceServices(EVS);ANSICcode(fixed-point).Online:http://www.3gpp.org/DynaReport/26442.htm

    [9] 3GPPTS26.443,CodecforEnhancedVoiceServices(EVS);ANSICcode(floating-point).Online:http://www.3gpp.org/DynaReport/26443.htm

    [10] 3GPPTS26.444,CodecforEnhancedVoiceServices(EVS);Testsequences.Online:http://www.3gpp.org/DynaReport/26444.htm

    [11] 3GPPTS26.445,CodecforEnhancedVoiceServices(EVS);Detailedalgorithmic description. Online:http://www.3gpp.org/DynaReport/26445.htm

    [12] 3GPPTS26.446,CodecforEnhancedVoiceServices(EVS);AdaptiveMulti-Rate - Wideband (AMR-WB) backward compatible functions. Online:http://www.3gpp.org/DynaReport/26446.htm

    [13] 3GPPTS26.447,CodecforEnhancedVoiceServices(EVS);Errorconcealment of lost packets. Online:http://www.3gpp.org/DynaReport/26447.htm

    [14] 3GPPTS26.448,CodecforEnhancedVoiceServices(EVS);Jitterbuffermanagement.Online:http://www.3gpp.org/DynaReport/26448.htm

    [15] 3GPPTS26.449,CodecforEnhancedVoiceServices(EVS);ComfortNoise Generation (CNG) aspects. Online:http://www.3gpp.org/DynaReport/26449.htm

  • Page 22 networks.nokia.com

    [16] 3GPPTS26.450,CodecforEnhancedVoiceServices(EVS);Discontinuous Transmission (DTX). Online:http://www.3gpp.org/DynaReport/26450.htm

    [17] 3GPPTS26.451,CodecforEnhancedVoiceServices(EVS);VoiceActivity Detection (VAD). Online:http://www.3gpp.org/DynaReport/26451.htm

    [18] 3GPPTR26.952,CodecforEnhancedVoiceServices(EVS);PerformanceCharacterization.Online:http://www.3gpp.org/DynaReport/26952.htm

    [19] A.Vasilacheetal.,Flexiblespectrumcodinginthe3GPPEVScodec,IEEEInternationalConferenceonAcoustics,SpeechandSignalProcessing(ICASSP)2015,Brisbane,Australia,April2015.

    [20]M.Bosietal.,ISO/IECMPEG-2advancedaudiocoding,JournaloftheAudioengineeringsociety45.10,pp.789-814,1997.

    [21] 3GPPTdocS4-141065,ReportoftheGlobalAnalysisLabforEVSSelectionPhase,August2014. Online:http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_80bis/Docs/S4-141065.zip

    [22] 3GPPTdocS4-141161,ReportoftheGlobalAnalysisLabfortheEVSCharacterizationPhase,3GPP,November2014. Online:http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_81/Docs/S4-141161.zip

    [23] A.RmandH.Toukomaa,SubjectiveQualityEvaluationofthe3GPPEVSCodec,IEEEInternationalConferenceonAcoustics,SpeechandSignalProcessing(ICASSP)2015,Brisbane,Australia,April2015.

    [24] A.RmandH.Toukomaa,VoicequalitycharacterizationofIETF Opuscodec,inProc.Interspeech,pp.25412544,Florence,Italy,August 2011.

    [25] 3GPPTS26.090,Adaptivemulti-rate(AMR)speechcodec;Transcodingfunctions.Online:http://www.3gpp.org/DynaReport/26090.htm

    [26] 3GPPTS26.190,Adaptivemulti-ratewideband(AMR-WB)speechcodec; Transcoding functions. Online:http://www.3gpp.org/DynaReport/26190.htm

    [27] ITU-TG.722.1,Low-complexitycodingat24and32kbit/sforhands-freeoperationinsystemswithlowframeloss,ITU,May2005. Online:http://www.itu.int/rec/T-REC-G.722.1/en

    [28] ITU-TG.718,Amendment2,Frameerrorrobustnarrow-bandandwideband embedded variable bit-rate coding of speech and audio from 832kbit/s;Amendment2:NewAnnexBonsuperwidebandscalableextensionforITU-TG.718andcorrectionstomainbodyfixed-pointC-codeanddescriptiontext,ITU,March2010. Online:http://www.itu.int/rec/T-REC-G.718/en

  • Page 23 networks.nokia.com

    [29] ITU-TG.719,Low-complexity,full-bandaudiocodingforhigh-quality,conversationalapplications,ITU,June2008. Online:http://www.itu.int/rec/T-REC-G.719/en

    [30] J.-M.Valinetal.,DefinitionoftheOpusaudiocodec,IETFRFC6716,September2012.Online:http://opus-codec.org/

    [31] ITU-TP.800,Methodsforsubjectivedeterminationoftransmissionquality,ITU,August1996. Online:https://www.itu.int/rec/T-REC-P.800-199608-I/en

    [32] H.HolmaandA.Toskala,LTEforUMTS:EvolutiontoLTE-Advanced,SecondEdition,JohnWiley&Sons,2011.

    [33] NGMNAlliance,NextGenerationMobileNetworksRadioAccessPerformanceEvaluationMethodology,January2008. Online:http://www.ngmn.org/uploads/media/NGMN_Radio_Access_Performance_Evaluation_Methodology.pdf

  • networks.nokia.com

    NokiaisaregisteredtrademarkofNokiaCorporation.Otherproductandcompanynamesmentionedhereinmaybetrademarksortradenamesoftheir respective owners.

    Nokia NokiaSolutionsandNetworksOy P.O.Box1 FI-02022 Finland

    Visitingaddress: Karaportti3, ESPOO, Finland Switchboard +358 71 400 4000

    Product code C401-011896-WP-201506-1-EN

    Nokia Solutions and Networks 2015