ISSN: xxx-xxxx (Print) ; xxxx-xxxx (Online)
Journal of Advanced Sciences and Engineering Technologies
available online at: http://www.isnra.com/ojs/index.php/JASET/
Takialddin Al Smadi
1Department of Communications and Electronics Engineering, College of
Engineering, Jerash University, Jerash, Jordan.
A R T I C L E  I N F O
Article history: Received 01 April 2018; Accepted 20 April 2018; Available online 05 xxx 2018.
DOI: http://dx.doi.org/xxx.xxx.xxx
Keywords: voice signal; automatic detection; inter-disciplinary; speech recognition
Automatic detection technique for voice quality: inter-disciplinary methodologies
A B S T R A C T
This paper studies the process of dynamic routing, from a multi-level perspective, in a mobile radio network. New-generation radio networks that enhance mobility require a new, higher quality of service for different types of traffic. A routing protocol in data networks is a formal set of rules and agreements for sharing network information between routers, used to determine a data-transmission route that satisfies given quality-of-service requirements and provides a balanced load across the mobile radio network as a whole; routing issues have been the subject of much scientific work. For modern computer networks of large dimension, multilevel routing is typical: the network is divided in some way into routing domains (subnets), with interior gateway protocols (the IGP group) used within subnets and exterior gateway protocols (the EGP group) between networks. It is proposed to use the well-known proactive routing protocol OLSR, with its multipoint relay service packets, as part of the hybrid wireless mesh protocol (HWMP). The description, the algorithm, and the implementation features of the proactive protocol (OLSR) are given.
© 201x JASET, International Scholars and Researchers Association
Introduction
Artificial neural networks are the simplest mathematical models of the brain. To understand the basic principles of their construction, one can consider them as a network of individual structures, neurons. Very roughly, the structure of a biological neuron can be described as follows. The neuron has a soma (a body), a tree of inputs (dendrites), and an output (an axon) [1]. On the soma and on the dendrites are located the endings of the axons of other neurons, called synapses. Signals received by the synapses tend either to excite the neuron or to inhibit it. When the total excitation reaches a certain threshold, the neuron fires and sends a signal to the other neurons along the axon. Each synapse has a unique synaptic strength that changes the input signal to the neuron in proportion to its value. In accordance with the above description, the mathematical model of the neuron is a summing threshold element. Signals propagate forward layer by layer, starting with the input layer: for each neuron the weighted sum of the input signals is computed, and the activation function generates the neuron's response, which is distributed to the next layer, taking into account the weights of the neural connections (Fig. 1). As a result of this step we obtain a vector of output values of the neural network.
Fig. 1. An artificial neuron. The formula for the triggering of the neuron:

$$O = F(\langle W^{T}, X \rangle) = F\left(\sum_{i=1}^{n} w_i x_i\right), \qquad F(s) = \begin{cases} 1, & s \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$
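As a minimal illustration, the summing threshold element defined by this formula can be written directly (the weight and input values here are hypothetical):

```python
import numpy as np

def neuron_output(w, x):
    """Summing threshold element: fires (outputs 1) when the weighted
    sum <W, X> reaches the threshold 0, otherwise outputs 0."""
    s = np.dot(w, x)  # total excitation gathered through the synapses
    return 1 if s >= 0 else 0

# Hypothetical synaptic strengths and inputs:
fired = neuron_output([1.0, -1.0], [2.0, 1.0])      # excitation wins -> 1
quiet = neuron_output([1.0, -1.0], [0.0, 1.0])      # inhibition wins -> 0
```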
Neural network (NN) learning occurs on a training sample: for each sample, all current outputs are determined and compared with the desired values. If the difference is unacceptable, the weights change. Training ends when the common error over all samples is permissible. All algorithms for learning neural networks are varieties of learning algorithms based on the method of error correction, which is carried out in different ways. The idea of changing the NN weights is to find a general measure of the quality of the network, which is usually chosen as the network error function [2,3]; to find the right weights, this error function must be minimized. The most common method of finding the minimum is gradient descent: for the case of a function of one variable, the weights change in the direction opposite to the derivative. The error backpropagation algorithm involves computing the error at the output layer and at each neuron of the network, as well as correcting the weights of the neurons in accordance with their current values. In the first step of the algorithm, the weights of all connections are initialized with small random values (0 to 1). After the initialization of the weights, the learning process of the neural network performs the following steps:
• Direct (forward) distribution of the signal;
• Error calculation for the neurons of the last layer;
• Inverse (backward) distribution of errors.
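A minimal NumPy sketch of these three steps, assuming a small two-layer network, a single hypothetical training pair, and small random initial weights in (0, 1) as described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights initialized with small random values in (0, 1), as in the text.
W1 = rng.random((4, 2))   # hidden layer: 4 neurons, 2 inputs
W2 = rng.random((1, 4))   # output layer: 1 neuron

def train_step(x, y, lr=0.5):
    """One iteration: forward pass, output-layer error, backward
    propagation of the error with a gradient-descent weight update."""
    global W1, W2
    # 1. Direct (forward) distribution of the signal
    h = sigmoid(W1 @ x)
    o = sigmoid(W2 @ h)
    # 2. Error of the neurons of the last layer
    delta_o = (o - y) * o * (1 - o)
    # 3. Inverse distribution of errors, then weight correction
    delta_h = (W2.T @ delta_o) * h * (1 - h)
    W2 -= lr * np.outer(delta_o, h)
    W1 -= lr * np.outer(delta_h, x)
    return float(0.5 * np.sum((o - y) ** 2))

# A single hypothetical training pair; the error should shrink.
x, y = np.array([0.3, 0.9]), np.array([0.0])
errors = [train_step(x, y) for _ in range(200)]
```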
Recognition based on neural networks: inter-disciplinary
Speech recognition is an interdisciplinary subfield of computational linguistics that develops methodologies and technologies for recognizing and translating spoken language into text by computers. It is also known as automatic speech recognition, computer speech recognition, or simply "speech to text". It draws on knowledge and research in the fields of linguistics, informatics, and electrical engineering.
The input signal is divided into 20 frames, each of which contains 512 samples. Each frame yields 255 spectral features, which are fed to the input neurons of the neural network. The network is built directly on the basis of these input data and the output requirements.
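One plausible reading of this framing scheme is that the 255 spectral features are the magnitudes of the non-redundant FFT bins of a 512-point frame, excluding the DC and Nyquist bins; the sketch below assumes exactly that, along with a synthetic test signal and sampling rate not taken from the paper:

```python
import numpy as np

def spectral_features(signal, n_frames=20, frame_len=512):
    """Split the signal into n_frames frames of frame_len samples and
    return 255 FFT-magnitude features per frame (bins 1..255 of a
    512-point real FFT, skipping the DC bin and the Nyquist bin)."""
    feats = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spectrum = np.fft.rfft(frame)          # 257 complex bins for 512 samples
        feats.append(np.abs(spectrum[1:256]))  # 255 spectral features
    return np.array(feats)

# Synthetic 440 Hz tone at an assumed 16 kHz sampling rate.
signal = np.sin(2 * np.pi * 440 * np.arange(20 * 512) / 16000.0)
X = spectral_features(signal)                  # shape (20, 255)
```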
Dynamic time warping (DTW) algorithm in speech recognition: definition of words
Word determination can be performed by comparing the numerical forms of the signals or by comparing the spectrograms of the signals. In both cases the comparison process must compensate for the different lengths of the sequences and the non-linear nature of the sound. The DTW algorithm manages to resolve these problems by finding the deformation corresponding to the optimal distance between two series of different lengths. There are two variants of the algorithm:
1. Direct comparison of numerical waveforms. In this case, for each numerical sequence a new sequence is created whose dimensions are much smaller. A numerical sequence can have several thousand numeric values [4], while a subsequence can have several hundred values. Reducing the number of numerical values can be accomplished by removing values between corner points. This process of reducing the length of a numerical sequence must not change its representation. Undoubtedly, the process leads to a decrease in recognition accuracy; however, taking into account the increase in speed, accuracy is in fact recovered by increasing the number of words in the dictionary.
2. Representation of the signals as spectrograms and application of the DTW algorithm for comparison of two spectrograms.
The method consists in dividing the digital signal into some number of overlapping intervals. For each pulse (an interval of real numbers, sound frequencies), the fast Fourier transform is calculated and stored in the matrix of the sound spectrogram. The parameters are the same for all computational operations: the pulse lengths, the Fourier transform lengths, and the overlap lengths for two consecutive pulses. The Fourier transform is symmetric about the center, and the complex numbers on one side are conjugates of the numbers on the other; in this regard, only the values from the first half of the symmetry need be saved. The spectrogram is thus a matrix of complex numbers: the number of rows in the matrix equals half the length of the Fourier transform, and the number of columns is determined by the length of the sound. DTW is applied to the matrix of real numbers obtained from the spectrogram values by conjugation; such a matrix is called the energy matrix [5].
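The DTW comparison itself can be sketched with the textbook dynamic-programming recurrence (a generic formulation, not the authors' implementation):

```python
import numpy as np

def dtw_distance(a, b):
    """Optimal alignment cost between two sequences of feature vectors
    of possibly different lengths, via dynamic programming."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.atleast_1d(a[i - 1]) - np.atleast_1d(b[j - 1]))
            # The warping path may match, stretch, or compress a frame.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A stretched copy of a sequence aligns perfectly, which is precisely the compensation for different lengths described in the text: `dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3])` is zero, while distinct sequences give a positive cost.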
The developed neural network demonstrated the expected behavior with respect to training and generalization error. It was found that even though the synthesis error decreases with an increasing training sequence, the error starts oscillating regardless of the introduction of a dynamic learning rate. The trained networks were sufficient to meet the requirements for the generalization error, but, nevertheless, there is still a possibility to improve the aggregate error, as shown in Fig. 2.
Figure 2. Block scheme of speech recognition.
Signs of nonlinear dynamics
The maximal characteristic exponent is an index of the emotional state of a person, to which corresponds a certain geometry of the attractor phase portrait; as the person's emotional state moves from "calmness" to "anger", the speech signal spectrum is deformed and subsequently shifted [6].
For the group of signs of non-linear dynamics, the speech signal is considered as a scalar quantity observed in the human vocal tract system. The process of speech formation can be considered nonlinear and analyzed by the methods of nonlinear dynamics. The problem of nonlinear dynamics consists in finding and studying in detail the basic mathematical models of real systems, proceeding from the most typical assumptions about the properties of the individual elements making up the system and the laws of interaction between them. Currently, the methods of nonlinear dynamics rest on a fundamental mathematical theory based on Takens' theorem, which provides a rigorous mathematical basis for the ideas of nonlinear autoregression and proves the possibility of reconstructing the phase portrait of an attractor from a time series or from one of its coordinates. An attractor is defined as a set of points or a subspace in the phase space to which the phase trajectory approaches after decay of transient processes. Estimates of signal characteristics from reconstructed speech trajectories are used in constructing nonlinear deterministic phase-space models of the observed time series. The revealed differences in the form of attractors can be used to form diagnostic rules and signs allowing one to recognize and correctly identify various emotions in an emotionally colored speech signal [7].
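The attractor reconstruction that Takens' theorem licenses amounts to time-delay embedding of the scalar series; a minimal sketch (the delay and embedding dimension here are illustrative choices):

```python
import numpy as np

def delay_embed(x, dim=3, tau=5):
    """Reconstruct a phase-space trajectory from the scalar series x
    using time-delay coordinates (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

# A pure sine reconstructs to a closed loop (a limit-cycle attractor).
t = np.linspace(0, 20 * np.pi, 2000)
trajectory = delay_embed(np.sin(t), dim=3, tau=25)   # shape (1950, 3)
```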
Related Works
Usually the voice signal is split into smaller pieces, frames (segments); each frame is subjected to preliminary processing, for example using the Fourier transform. This is done in order to reduce the dimensionality and increase the separability of the attribute classes. The structure of the process in the neural network is shown in Fig. 3.
Figure 3. The process of the structure of the neural network.
The network at time t accepts an input vector x_t and the latent state of the previous step s_{t-1}, and calculates the output vector. After this, the new state s_t is transferred to the next iteration of the process. Such networks allow processing a sequence of unknown length, given the connection between the present and the past. The main method of teaching such networks is backpropagation through time [8]. It works as follows. All of the learning algorithms of neural networks are varieties of learning algorithms based on the method of error correction, which is carried out in different ways. The idea of changing the weights is to find a common measure of the quality of the network, usually chosen as the network error function. Then, in order to find the right weights, it is necessary to minimize the error function. The most common method of finding a minimum is gradient descent. In the case of a function with one variable, the weight changes in the direction opposite to the derivative, per the following formulas.
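The recurrence just described — output computed from the input x_t and the previous state s_{t-1}, with the new state s_t carried forward — can be sketched as follows (dimensions and random weights are placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_state, d_out = 3, 8, 2
W_x = rng.standard_normal((d_state, d_in)) * 0.1
W_s = rng.standard_normal((d_state, d_state)) * 0.1
W_o = rng.standard_normal((d_out, d_state)) * 0.1

def rnn_step(x_t, s_prev):
    """One step of a simple recurrent network: new state from the input
    and the previous state, output read off the new state."""
    s_t = np.tanh(W_x @ x_t + W_s @ s_prev)
    y_t = W_o @ s_t
    return y_t, s_t

# Process a sequence of arbitrary length, carrying the state forward.
s = np.zeros(d_state)
for x_t in rng.standard_normal((5, d_in)):
    y, s = rnn_step(x_t, s)
```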
The gradient is defined through the partial differential

$$\frac{\partial F(W)}{\partial e} = \lim_{t \to 0} \frac{F(W + te) - F(W)}{t}, \qquad e = (0, 0, \dots, 1, \dots, 0),$$

so that for the $i$-th coordinate

$$\frac{\partial F(W)}{\partial w_i} = \lim_{t \to 0} \frac{F(w_1, w_2, \dots, w_i + t, \dots, w_n) - F(W)}{t},$$

and the direction of steepest descent is the anti-gradient

$$-\nabla F(W) = \left(-\frac{\partial F}{\partial w_1}, -\frac{\partial F}{\partial w_2}, \dots, -\frac{\partial F}{\partial w_n}\right)^{T}. \quad (1)$$
To determine the generalized functions, consider the training sample $\{(x_k, Y_k)\}$, $k = 1, \dots, K$. The error accumulated over all epochs is

$$E = \sum_{k=1}^{K} E_k = \frac{1}{2} \sum_{k=1}^{K} \sum_{i=1}^{m} \left( O_i - Y_i \right)^2. \quad (2)$$
The formula for the modification of the weights of the NN is

$$W_{n+1} = W_n - h \cdot \partial E / \partial W, \qquad O_i = \langle W_i, X_i \rangle,$$

$$\partial E / \partial W = -(Y_i - O_i) \cdot X \;\Rightarrow\; W_{n+1} = W_n + h \cdot (Y_i - O_i) \cdot X,$$

or, with the error term $\delta = O_i - Y_i$,

$$W_{n+1} = W_n - h \cdot \delta \cdot X. \quad (3)$$
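Update rule (3) can be exercised numerically; the sketch below iterates the delta rule on a single hypothetical training pair until the output ⟨W, X⟩ matches the target:

```python
import numpy as np

def delta_rule_step(W, X, Y, h=0.1):
    """One gradient-descent step of formula (3): O = <W, X>,
    delta = O - Y, and W_{n+1} = W_n - h * delta * X."""
    O = float(np.dot(W, X))
    delta = O - Y
    return W - h * delta * X

# Hypothetical training pair: learn W so that <W, X> = Y.
W = np.zeros(2)
X, Y = np.array([1.0, 2.0]), 3.0
for _ in range(100):
    W = delta_rule_step(W, X, Y)
error = abs(float(np.dot(W, X)) - Y)   # shrinks geometrically
```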
Properties of the function $P(t, \Omega)$:

$$P(t, \Omega) \ge 0, \quad P(t, \Omega) < \varepsilon, \quad P(t, \Omega) \in (0, 1). \quad (4)$$

An example similarity function is shown in Figure 4.

Figure 4. The similarity function.

$$\sup\{P(t_{i+1}) - P(t_i)\} < \sup\{P(t_i)\}.$$

Since the function $P(t_i) \in [0, 1]$, $\sup\{P(t_i)\} = 1$, and the above can be rewritten as

$$\sup\{P(t_{i+1}) - P(t_i)\} < 1. \quad (5)$$
The standard configuration of the neural network is extended so that it predicts a vector of $M + 1$ values at once,

$$y(n_0), \; y(n_0 + 1), \; \dots, \; y(n_0 + M),$$

corresponding to the times $n_0 \Delta t, \; (n_0 + 1)\Delta t, \; \dots, \; (n_0 + M)\Delta t$. The number of input values increases correspondingly:

$$x(n_0 - n), \; x(n_0 - n + 1), \; \dots, \; x(n_0), \; \dots, \; x(n_0 + M - 1).$$
Figure 5. Neural networks for recognition of phonemic tasks.
These layers are referred to as linear, as each is the multiplication of the input vector by a matrix of weights W(k) for the k-th layer. In practice, such a transformation usually adds a bias b_k; i.e., the output vector of the k-th layer, u(k), is calculated through the previous layer:
$$u^{(k)} = \varphi^{(k)}\left(A^{(k)} u^{(k-1)} + b_k\right), \qquad A^{(k)} \in \mathbb{R}^{d_1 \times d_2}, \; b_k \in \mathbb{R}^{d_1}$$

[10, 11, 12].
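The layer formula can be written out directly; the dimensions, the random weights, and the choice of ReLU for φ(k) below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def layer(u_prev, A, b, phi):
    """Output of the k-th layer: u(k) = phi(A(k) @ u(k-1) + b_k)."""
    return phi(A @ u_prev + b)

relu = lambda z: np.maximum(z, 0.0)

# A(k) in R^{d1 x d2}, b_k in R^{d1}, as in the formula above.
A1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
A2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

u0 = np.array([0.5, -1.0, 2.0])
u1 = layer(u0, A1, b1, relu)   # first linear layer, 3 -> 4
u2 = layer(u1, A2, b2, relu)   # second linear layer, 4 -> 2
```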
Ø $P(t_{2i+1}) - P(t_{2i}) < \tfrac{1}{2}$, for $i = 1, 2, 3, \dots$;

Ø the pair $P(t_{2i}), \, P(t_{2i+1})$ is mapped onto $o(t_{2i}), \, o(t_{2i+1})$, where

$$o(t_{2i+1}) = \frac{P(t_{2i+1}) + P(t_{2i})}{2}, \qquad o(t_{2i}) = \frac{P(t_{2i+1}) - P(t_{2i})}{2}; \quad (6)$$

Ø these convert easily back to the formulas

$$P(t_{2i+1}) = o(t_{2i+1}) + o(t_{2i}), \qquad P(t_{2i}) = o(t_{2i+1}) - o(t_{2i}); \quad (7)$$

Ø $\sup o(t_{2i}) = \tfrac{1}{k}$, with $k \gg 1$; the redundant interval is $\left[\tfrac{1}{k}; 1\right)$, where

$$k_i = \frac{2}{\sup\{P(t_{i+1}) - P(t_i)\}}. \quad (8)$$

With unscaled input values, the neural network needs a few more iterations, scaled by $\sup\{2 / k_i\}$, to achieve the same accuracy.
Find the errors of the neural network results at this iteration; in this case, for the first and second neural networks:

$$E_1 = 2\varepsilon^2, \quad (9)$$

$$E_2 = \varepsilon^2 + \left(\frac{\varepsilon}{k_i}\right)^2 = \left(1 + \frac{1}{k_i^2}\right)\varepsilon^2. \quad (10)$$
The model interprets a short sequence of words
Classic recurrent neural networks take an input of arbitrary length, but the output is still a vector of fixed size. One possible solution is the encoder-decoder architecture: the main idea of this method is first to form a vector of fixed size describing the input sequence, and then to unroll it into the output sequence [13]. In more detail, the process usually occurs as follows:
Ø The input is passed through the neural network, and the cell state (rather than the hidden state) after passing the whole sequence is considered to be a vector describing the input.
Ø This phase encodes the data into a vector; therefore, this neural network is called the encoder.
Ø At the second stage (decoding), the task is to unfold the output sequence from this vector. For this, the cell state is used as the initial state of the following network, which consecutively generates symbols from the output alphabet until it generates a special terminal symbol. Often this architecture is used in machine translation tasks, which is where the concept of the output alphabet arose; in the classical setting the symbol generated at time t is fed to the network as input at the next moment of time, t + 1. But for the task in this work it is not necessary: the output alphabet (the set of encoded words) has a small volume, and the received symbols, which in turn are words, do not form any relationships between one another. Therefore, the architecture described is used without feeding the generated character back into the decoder. The structure of the procedure can be seen in Figure 6.
Figure 6. Model of the encoder-decoder LSTM.
As output, the expected sequence of word marks spoken in the video was produced; too-short words were treated as an empty word, which actually means that all the short words fell into one large class.
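The encoder-decoder procedure described above can be sketched schematically with untrained random weights; the dimensions, the toy output alphabet, and the terminal symbol are invented for illustration, and, as in the text, generated symbols are not fed back into the decoder:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_state, vocab = 4, 16, 6        # last symbol plays the role of <end>
END = vocab - 1

W_enc_x = rng.standard_normal((d_state, d_in)) * 0.1
W_enc_s = rng.standard_normal((d_state, d_state)) * 0.1
W_dec_s = rng.standard_normal((d_state, d_state)) * 0.1
W_out = rng.standard_normal((vocab, d_state)) * 0.1

def encode(sequence):
    """Pass the whole input through the network; the final state is the
    fixed-size vector describing the input sequence."""
    s = np.zeros(d_state)
    for x_t in sequence:
        s = np.tanh(W_enc_x @ x_t + W_enc_s @ s)
    return s

def decode(s, max_len=10):
    """Unfold the output sequence from the encoder state; generated
    symbols are NOT fed back, as in the architecture described."""
    out = []
    for _ in range(max_len):
        s = np.tanh(W_dec_s @ s)
        symbol = int(np.argmax(W_out @ s))
        out.append(symbol)
        if symbol == END:              # terminal symbol ends generation
            break
    return out

seq = rng.standard_normal((7, d_in))   # input of arbitrary length
code = encode(seq)
tokens = decode(code)
```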
THE RESULTS
The example uses a two-layer perceptron with two nonlinear neurons in the first layer and one in the second layer. The operation of the error backpropagation algorithm is broken down into the following steps: the assignment of the input and the desired output, the direct passage of the input signal to the output, the backward propagation of the error, and the change in the weights. Variables permitting one to trace the operation of the backpropagation algorithm are given in Table 1.
Table 1. The results of the phased implementation of the backpropagation algorithm.

| Stage | Forward distribution of the input signal | Backward distribution of the error | Changing weights |
|---|---|---|---|
| A1(1), A1(2) | logsig(W1·P + B1) = [0.321, 0.368] | Not running | Not running |
| A2 | purelin(W2·A1 + B2) = 0.446 | » | » |
| e | t − A2 = 1.261 | » | » |
| N1(1), N1(2) | Not running | ∂logsig(N1)·W2·N2 = [−0.049, 0.100] | » |
| N2 | The same | ∂purelin(N2)·(−2)·e = −2.522 | » |
| W1(1), W1(2) | » | Not running | W1 = W1 − lr·N1×P = [−0.265, −0.420] |
| B1(1), B1(2) | » | » | B1 = B1 − lr·N1 = [−0.475, −0.140] |
| B2 | » | » | B2 = B2 − lr·N2 = 0.732 |
| W2(2) | » | » | W2 = W2 − lr·N2×A1 = [0.171, −0.077] |
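The cycle in Table 1 can be reproduced step by step. The initial weights, input P, and target t below are assumptions (the paper does not list them), chosen so that the intermediate values match the table's numbers:

```python
import numpy as np

logsig = lambda z: 1.0 / (1.0 + np.exp(-z))

# Assumed initial parameters: two logsig neurons fed by the scalar
# input P, one purelin output neuron, learning rate lr = 0.1.
W1, B1 = np.array([-0.27, -0.41]), np.array([-0.48, -0.13])
W2, B2 = np.array([0.09, -0.17]), 0.48
P, lr = 1.0, 0.1
t = 1.0 + np.sin(np.pi * P / 4.0)       # desired output, ~1.707

# Forward distribution of the input signal
A1 = logsig(W1 * P + B1)                # -> [0.321, 0.368]
A2 = float(W2 @ A1) + B2                # purelin output -> 0.446
e = t - A2                              # error -> 1.261

# Backward distribution of the error (sensitivities)
N2 = -2.0 * e                           # purelin layer -> -2.522
N1 = A1 * (1.0 - A1) * W2 * N2          # logsig layer -> [-0.049, 0.100]

# Changing weights
W2 = W2 - lr * N2 * A1                  # -> [0.171, -0.077]
B2 = B2 - lr * N2                       # -> 0.732
W1 = W1 - lr * N1 * P                   # -> [-0.265, -0.420]
B1 = B1 - lr * N1                       # -> [-0.475, -0.140]
```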
Fig. 7. Result of approximation of vectors by a two-layer perceptron.
Fig. 8. Radial basis function.
Fig. 9. Approximation by means of a radial basis neural network test.
Acknowledgement
The Deanship of Scientific Research, Jerash University, supported this work.
Conclusion
The article describes a mathematical model of the human middle ear using a psychoacoustic pitch-perception approach, and the image classification obtained from it. The results of the experiments on speech recognition based on neural networks are given. Among the advantages of this method are its sufficient simplicity of implementation, as well as the very obvious analogy with the processes taking place in the real organ of hearing. A disadvantage is the level of errors in recognition (13-23%), which it is proposed to reduce by the use of contextual recognition of individual voice commands and of automated keywords from the stream of speech, which are associated with the processing of telephone calls or with the sphere of security. With its ability to learn and to summarize the accumulated knowledge, the neural network has the features of artificial intelligence. A network trained on a limited set of data is able to generalize the information obtained and to show good results on data not used in the learning process.
A characteristic feature of the network is also the possibility of its implementation with very-large-scale integration technology. The difference between network elements is small, and their repeatability is enormous. This opens the prospect of creating a universal processor with a homogeneous structure, capable of processing a variety of information.
References

[1]. Mohammad, A. L., and A. L. Abdusamad. "Artificial Intelligence Technique for Speech Recognition based on Neural Networks." (2014).
[2]. Al Smadi, T., Al Issa, H. A., Trad, E., & Al Smadi, K. A. (2015). Artificial intelligence for speech recognition based on neural networks. J. Signal Inform. Process., 6, 66-72.
[3]. Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. "Speech recognition with deep recurrent neural networks." Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
[4]. Abdel-Hamid, Ossama, et al. "Convolutional neural networks for speech recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 22.10 (2014): 1533-1545.
[6]. Abdel-Hamid, O., Mohamed, A.-R., Jiang, H., and Penn, G. (2012). Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In ICASSP, pages 4277-4280. IEEE.
[7]. Dede, G., Sazlı, M. H.: Speech recognition with artificial neural networks. Digital Signal Processing 20, 763-768 (2010).
[8]. Dahl, G., Yu, D., Li, D., and Acero, A. (2011). Large vocabulary continuous speech recognition with context-dependent DBN-HMMs. In ICASSP.
[9]. Graves, A., Jaitly, N., and Mohamed, A. (2013). Hybrid speech recognition with deep bidirectional LSTM. In ASRU.
[10]. Toscano, J. C., McMurray, B.: Cue Integration With Categories: Weighting Acoustic Cues in Speech Using Unsupervised Learning and Distributional Statistics. Cognitive Science 34, 434-464 (2010).
[11]. Al Smadi, T. An Improved Real-Time Speech In Case Of Isolated Word Recognition. Int. Journal of Engineering Research and Application, Vol. 3, Issue 5, Sep-Oct 2013, pp. 01-05.
[12]. Siniscalchi, Sabato Marco, et al. "Exploiting deep neural networks for detection-based speech recognition." Neurocomputing 106 (2013): 148-157.
[13]. Smadi, Kalid A., and Takialddin Al Smadi. "Automatic System Recognition of License Plates using Neural Networks." (2017).