ISSN: xxx-xxxx (Print) ; xxxx-xxxx (Online)
Journal of Advanced Sciences and Engineering Technologies
available online at: http://www.isnra.com/ojs/index.php/JASET/
Takialddin Al Smadi
1Department of Communications and Electronics Engineering, College of
Engineering, Jerash University, Jerash, Jordan.
A R T I C L E  I N F O
Article history: Received 01 April 2018; Accepted 20 April 2018; Available online 05 xxx 2018.
DOI: http://dx.doi.org/xxx.xxx.xxx
Keywords: voice signal; automatic detection; inter-disciplinary; speech recognition
Automatic detection technique for voice quality: inter-disciplinary methodologies
A B S T R A C T
This paper studies the process of dynamic routing, from a multi-level perspective, in a mobile radio network. New-generation radio networks that enhance mobility require a new, higher quality of service for different types of traffic. A routing protocol in data networks is a formal set of rules and agreements for sharing network information between routers, used to determine a data-transmission route that satisfies given quality-of-service requirements and provides a balanced load across the mobile radio network as a whole; routing issues have been the subject of much scientific work. For modern computer networks of large dimension, multilevel routing is typical: the network is divided in some way into routing domains (subnets), with interior gateway protocols (the IGP group) used within subnets and exterior gateway protocols (the EGP group) between networks. It is proposed to use the well-known proactive routing protocol OLSR, with its multipoint relay service packets, as part of the hybrid wireless mesh protocol (HWMP). The description, the algorithm, and the implementation features of the proactive protocol (OLSR) are given.
© 201x JASET, International Scholars and Researchers Association
Introduction
Artificial neural networks are the simplest mathematical models of the brain. To understand the basic principles of their construction, one can consider them as a network of individual structures, neurons. Very roughly, the structure of a biological neuron can be described as follows. The neuron has a soma (a body), a tree of inputs (dendrites), and an output (an axon) [1]. On the soma and on the dendrites are located the endings of the axons of other neurons, called synapses. Signals received by the synapses tend either to excite the neuron or to inhibit it. When the total excitation reaches a certain threshold, the neuron fires and sends a signal to the other neurons along the axon. Each synapse has a unique synaptic strength that changes the input signal to the neuron in proportion to its value. In accordance with the above description, the mathematical model of the neuron is a summing threshold element. Signals propagate forward layer by layer, starting with the input layer: for each neuron the weighted sum of the input signals is computed, and the activation function generates the neuron's response, which is distributed to the next layer, taking into account the weights of the neural connections (Fig. 1). As a result of this step we obtain a vector of output values of the neural network.
Fig. 1. An artificial neuron. The formula for the triggering of the neuron:

$$O = F(\langle W^{T}, X \rangle) = F\left(\sum_{i=1}^{n} w_i x_i\right), \qquad F(s) = \begin{cases} 1, & s \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$
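As a minimal illustration, the summing threshold element defined by this formula can be written directly (the weight and input values here are hypothetical):

```python
import numpy as np

def neuron_output(w, x):
    """Summing threshold element: fires (outputs 1) when the weighted
    sum <W, X> reaches the threshold 0, otherwise outputs 0."""
    s = np.dot(w, x)  # total excitation gathered through the synapses
    return 1 if s >= 0 else 0

# Hypothetical synaptic strengths and inputs:
fired = neuron_output([1.0, -1.0], [2.0, 1.0])      # excitation wins -> 1
quiet = neuron_output([1.0, -1.0], [0.0, 1.0])      # inhibition wins -> 0
```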
Neural network (NN) learning occurs on a training sample: for each sample, all current outputs are determined and compared with the desired values. If the difference is unacceptable, the weights change. Training ends when the common error over all samples is permissible. All algorithms for learning neural networks are varieties of learning algorithms based on the method of error correction, which is carried out in different ways. The idea of changing the NN weights is to find a general measure of the quality of the network, which is usually chosen as the network error function [2,3]; to find the right weights, this error function must be minimized. The most common method of finding the minimum is gradient descent: for the case of a function of one variable, the weights change in the direction opposite to the derivative. The error backpropagation algorithm involves computing the error at the output layer and at each neuron of the network, as well as correcting the weights of the neurons in accordance with their current values. In the first step of the algorithm, the weights of all connections are initialized with small random values (0 to 1). After the initialization of the weights, the learning process of the neural network performs the following steps:
• Direct (forward) distribution of the signal;
• Error calculation for the neurons of the last layer;
• Inverse (backward) distribution of errors.
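A minimal NumPy sketch of these three steps, assuming a small two-layer network, a single hypothetical training pair, and small random initial weights in (0, 1) as described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights initialized with small random values in (0, 1), as in the text.
W1 = rng.random((4, 2))   # hidden layer: 4 neurons, 2 inputs
W2 = rng.random((1, 4))   # output layer: 1 neuron

def train_step(x, y, lr=0.5):
    """One iteration: forward pass, output-layer error, backward
    propagation of the error with a gradient-descent weight update."""
    global W1, W2
    # 1. Direct (forward) distribution of the signal
    h = sigmoid(W1 @ x)
    o = sigmoid(W2 @ h)
    # 2. Error of the neurons of the last layer
    delta_o = (o - y) * o * (1 - o)
    # 3. Inverse distribution of errors, then weight correction
    delta_h = (W2.T @ delta_o) * h * (1 - h)
    W2 -= lr * np.outer(delta_o, h)
    W1 -= lr * np.outer(delta_h, x)
    return float(0.5 * np.sum((o - y) ** 2))

# A single hypothetical training pair; the error should shrink.
x, y = np.array([0.3, 0.9]), np.array([0.0])
errors = [train_step(x, y) for _ in range(200)]
```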
Recognition based on neural networks: inter-disciplinary
Speech recognition is an interdisciplinary subfield of computational linguistics that develops methodologies and technologies for recognizing and translating spoken language into text by computers. It is also known as automatic speech recognition, computer speech recognition, or simply "speech to text". It draws on knowledge and research in the fields of linguistics, informatics, and electrical engineering.
The input signal is divided into 20 frames, each of which contains 512 samples. Each frame yields 255 spectral features, which are fed to the input neurons of the neural network. The network is built directly on the basis of these input data and the output requirements.
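One plausible reading of this framing scheme is that the 255 spectral features are the magnitudes of the non-redundant FFT bins of a 512-point frame, excluding the DC and Nyquist bins; the sketch below assumes exactly that, along with a synthetic test signal and sampling rate not taken from the paper:

```python
import numpy as np

def spectral_features(signal, n_frames=20, frame_len=512):
    """Split the signal into n_frames frames of frame_len samples and
    return 255 FFT-magnitude features per frame (bins 1..255 of a
    512-point real FFT, skipping the DC bin and the Nyquist bin)."""
    feats = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spectrum = np.fft.rfft(frame)          # 257 complex bins for 512 samples
        feats.append(np.abs(spectrum[1:256]))  # 255 spectral features
    return np.array(feats)

# Synthetic 440 Hz tone at an assumed 16 kHz sampling rate.
signal = np.sin(2 * np.pi * 440 * np.arange(20 * 512) / 16000.0)
X = spectral_features(signal)                  # shape (20, 255)
```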
Dynamic time warping (DTW) algorithm in speech recognition: definition of words
Word determination can be performed by comparing the numerical forms of the signals or by comparing the spectrograms of the signals. In both cases the comparison process must compensate for the different lengths of the sequences and the non-linear nature of the sound. The DTW algorithm manages to resolve these problems by finding the deformation corresponding to the optimal distance between two series of different lengths. There are two variants of the algorithm:
1. Direct comparison of numerical waveforms. In this case, for each numerical sequence a new sequence is created whose dimensions are much smaller. A numerical sequence can have several thousand numeric values [4], while a subsequence can have several hundred values. Reducing the number of numerical values can be accomplished by removing values between corner points. This process of reducing the length of a numerical sequence must not change its representation. Undoubtedly, the process leads to a decrease in recognition accuracy; however, taking into account the increase in speed, accuracy is in fact recovered by increasing the number of words in the dictionary.
2. Representation of the signals as spectrograms and application of the DTW algorithm for comparison of two spectrograms.
The method consists in dividing the digital signal into some number of overlapping intervals. For each pulse (an interval of real numbers, sound frequencies), the fast Fourier transform is calculated and stored in the matrix of the sound spectrogram. The parameters are the same for all computational operations: the pulse lengths, the Fourier transform lengths, and the overlap lengths for two consecutive pulses. The Fourier transform is symmetric about the center, and the complex numbers on one side are conjugates of the numbers on the other; in this regard, only the values from the first half of the symmetry need be saved. The spectrogram is thus a matrix of complex numbers: the number of rows in the matrix equals half the length of the Fourier transform, and the number of columns is determined by the length of the sound. DTW is applied to the matrix of real numbers obtained from the spectrogram values by conjugation; such a matrix is called the energy matrix [5].
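The DTW comparison itself can be sketched with the textbook dynamic-programming recurrence (a generic formulation, not the authors' implementation):

```python
import numpy as np

def dtw_distance(a, b):
    """Optimal alignment cost between two sequences of feature vectors
    of possibly different lengths, via dynamic programming."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.atleast_1d(a[i - 1]) - np.atleast_1d(b[j - 1]))
            # The warping path may match, stretch, or compress a frame.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A stretched copy of a sequence aligns perfectly, which is precisely the compensation for different lengths described in the text: `dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3])` is zero, while distinct sequences give a positive cost.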
The developed neural network demonstrated the expected behavior with respect to training and generalization error. It was found that even though the synthesis error decreases with an increasing training sequence, the error starts oscillating regardless of the introduction of a dynamic learning rate. The trained networks were sufficient to meet the requirements for the generalization error, but, nevertheless, there is still a possibility to improve the aggregate error, as shown in Fig. 2.
Figure 2. Block scheme of speech recognition.
Signs of nonlinear dynamics
The maximal characteristic exponent is an index of the emotional state of a person, to which corresponds a certain geometry of the attractor phase portrait; as the person's emotional state moves from "calmness" to "anger", the speech signal spectrum is deformed and subsequently shifted [6].
For the group of signs of non-linear dynamics, the speech signal is considered as a scalar quantity observed in the human vocal tract system. The process of speech formation can be considered nonlinear and analyzed by the methods of nonlinear dynamics. The problem of nonlinear dynamics consists in finding and studying in detail the basic mathematical models of real systems, proceeding from the most typical assumptions about the properties of the individual elements making up the system and the laws of interaction between them. Currently, the methods of nonlinear dynamics rest on a fundamental mathematical theory based on Takens' theorem, which provides a rigorous mathematical basis for the ideas of nonlinear autoregression and proves the possibility of reconstructing the phase portrait of an attractor from a time series or from one of its coordinates. An attractor is defined as a set of points or a subspace in the phase space to which the phase trajectory approaches after decay of transient processes. Estimates of signal characteristics from reconstructed speech trajectories are used in constructing nonlinear deterministic phase-space models of the observed time series. The revealed differences in the form of attractors can be used to form diagnostic rules and signs allowing one to recognize and correctly identify various emotions in an emotionally colored speech signal [7].
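The attractor reconstruction that Takens' theorem licenses amounts to time-delay embedding of the scalar series; a minimal sketch (the delay and embedding dimension here are illustrative choices):

```python
import numpy as np

def delay_embed(x, dim=3, tau=5):
    """Reconstruct a phase-space trajectory from the scalar series x
    using time-delay coordinates (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

# A pure sine reconstructs to a closed loop (a limit-cycle attractor).
t = np.linspace(0, 20 * np.pi, 2000)
trajectory = delay_embed(np.sin(t), dim=3, tau=25)   # shape (1950, 3)
```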
Related Works
Usually the voice signal is split into smaller pieces, frames (segments); each frame is subjected to preliminary processing, for example using the Fourier transform. This is done in order to reduce the dimensionality and increase the separability of the attribute classes. The structure of the process in the neural network is shown in Fig. 3.
Figure 3. The process of the structure of the neural network.
The network at time t accepts an input vector x_t and the latent state of the previous step s_{t-1}, and calculates the output vector. After this, the new state s_t is transferred to the next iteration of the process. Such networks allow processing a sequence of unknown length, given the connection between the present and the past. The main method of teaching such networks is backpropagation through time [8]. It works as follows. All of the learning algorithms of neural networks are varieties of learning algorithms based on the method of error correction, which is carried out in different ways. The idea of changing the weights is to find a common measure of the quality of the network, usually chosen as the network error function. Then, in order to find the right weights, it is necessary to minimize the error function. The most common method of finding a minimum is gradient descent. In the case of a function with one variable, the weight changes in the direction opposite to the derivative, per the following formulas.
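The recurrence just described — output computed from the input x_t and the previous state s_{t-1}, with the new state s_t carried forward — can be sketched as follows (dimensions and random weights are placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_state, d_out = 3, 8, 2
W_x = rng.standard_normal((d_state, d_in)) * 0.1
W_s = rng.standard_normal((d_state, d_state)) * 0.1
W_o = rng.standard_normal((d_out, d_state)) * 0.1

def rnn_step(x_t, s_prev):
    """One step of a simple recurrent network: new state from the input
    and the previous state, output read off the new state."""
    s_t = np.tanh(W_x @ x_t + W_s @ s_prev)
    y_t = W_o @ s_t
    return y_t, s_t

# Process a sequence of arbitrary length, carrying the state forward.
s = np.zeros(d_state)
for x_t in rng.standard_normal((5, d_in)):
    y, s = rnn_step(x_t, s)
```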
The gradient is defined through the partial differential

$$\frac{\partial F(W)}{\partial e} = \lim_{t \to 0} \frac{F(W + te) - F(W)}{t}, \qquad e = (0, 0, \dots, 1, \dots, 0),$$

so that for the $i$-th coordinate

$$\frac{\partial F(W)}{\partial w_i} = \lim_{t \to 0} \frac{F(w_1, w_2, \dots, w_i + t, \dots, w_n) - F(W)}{t},$$

and the direction of steepest descent is the anti-gradient

$$-\nabla F(W) = \left(-\frac{\partial F}{\partial w_1}, -\frac{\partial F}{\partial w_2}, \dots, -\frac{\partial F}{\partial w_n}\right)^{T}. \quad (1)$$
To determine the generalized functions, consider the training sample $\{(x_k, Y_k)\}$, $k = 1, \dots, K$. The error accumulated over all epochs is

$$E = \sum_{k=1}^{K} E_k = \frac{1}{2} \sum_{k=1}^{K} \sum_{i=1}^{m} \left( O_i - Y_i \right)^2. \quad (2)$$
The formula for the modification of the weights of the NN is

$$W_{n+1} = W_n - h \cdot \partial E / \partial W, \qquad O_i = \langle W_i, X_i \rangle,$$

$$\partial E / \partial W = -(Y_i - O_i) \cdot X \;\Rightarrow\; W_{n+1} = W_n + h \cdot (Y_i - O_i) \cdot X,$$

or, with the error term $\delta = O_i - Y_i$,

$$W_{n+1} = W_n - h \cdot \delta \cdot X. \quad (3)$$
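Update rule (3) can be exercised numerically; the sketch below iterates the delta rule on a single hypothetical training pair until the output ⟨W, X⟩ matches the target:

```python
import numpy as np

def delta_rule_step(W, X, Y, h=0.1):
    """One gradient-descent step of formula (3): O = <W, X>,
    delta = O - Y, and W_{n+1} = W_n - h * delta * X."""
    O = float(np.dot(W, X))
    delta = O - Y
    return W - h * delta * X

# Hypothetical training pair: learn W so that <W, X> = Y.
W = np.zeros(2)
X, Y = np.array([1.0, 2.0]), 3.0
for _ in range(100):
    W = delta_rule_step(W, X, Y)
error = abs(float(np.dot(W, X)) - Y)   # shrinks geometrically
```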
Properties of the function $P(t, \Omega)$:

$$P(t, \Omega) \ge 0, \quad P(t, \Omega) < \varepsilon, \quad P(t, \Omega) \in (0, 1). \quad (4)$$

An example similarity function is shown in Figure 4.

Figure 4. The similarity function.

$$\sup\{P(t_{i+1}) - P(t_i)\} < \sup\{P(t_i)\}.$$

Since the function $P(t_i) \in [0, 1]$, $\sup\{P(t_i)\} = 1$, and the above can be rewritten as

$$\sup\{P(t_{i+1}) - P(t_i)\} < 1. \quad (5)$$
The standard configuration of the neural network is extended so that it predicts a vector of $M + 1$ values at once,

$$y(n_0), \; y(n_0 + 1), \; \dots, \; y(n_0 + M),$$

corresponding to the times $n_0 \Delta t, \; (n_0 + 1)\Delta t, \; \dots, \; (n_0 + M)\Delta t$. The number of input values increases correspondingly:

$$x(n_0 - n), \; x(n_0 - n + 1), \; \dots, \; x(n_0), \; \dots, \; x(n_0 + M - 1).$$
Figure 5. Neural networks for recognition of phonemic tasks.
These layers are referred to as linear, as each is the multiplication of the input vector by a matrix of weights W(k) for the k-th layer. In practice, such a transformation usually adds a bias b_k; i.e., the output vector of the k-th layer, u(k), is calculated through the previous layer:
$$u^{(k)} = \varphi^{(k)}\left(A^{(k)} u^{(k-1)} + b_k\right), \qquad A^{(k)} \in \mathbb{R}^{d_1 \times d_2}, \; b_k \in \mathbb{R}^{d_1}$$

[10, 11, 12].
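The layer formula can be written out directly; the dimensions, the random weights, and the choice of ReLU for φ(k) below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def layer(u_prev, A, b, phi):
    """Output of the k-th layer: u(k) = phi(A(k) @ u(k-1) + b_k)."""
    return phi(A @ u_prev + b)

relu = lambda z: np.maximum(z, 0.0)

# A(k) in R^{d1 x d2}, b_k in R^{d1}, as in the formula above.
A1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
A2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

u0 = np.array([0.5, -1.0, 2.0])
u1 = layer(u0, A1, b1, relu)   # first linear layer, 3 -> 4
u2 = layer(u1, A2, b2, relu)   # second linear layer, 4 -> 2
```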
Ø $P(t_{2i+1}) - P(t_{2i}) < \tfrac{1}{2}$, for $i = 1, 2, 3, \dots$;

Ø the pair $P(t_{2i}), \, P(t_{2i+1})$ is mapped onto $o(t_{2i}), \, o(t_{2i+1})$, where

$$o(t_{2i+1}) = \frac{P(t_{2i+1}) + P(t_{2i})}{2}, \qquad o(t_{2i}) = \frac{P(t_{2i+1}) - P(t_{2i})}{2}; \quad (6)$$

Ø these convert easily back to the formulas

$$P(t_{2i+1}) = o(t_{2i+1}) + o(t_{2i}), \qquad P(t_{2i}) = o(t_{2i+1}) - o(t_{2i}); \quad (7)$$

Ø $\sup o(t_{2i}) = \tfrac{1}{k}$, with $k \gg 1$; the redundant interval is $\left[\tfrac{1}{k}; 1\right)$, where

$$k_i = \frac{2}{\sup\{P(t_{i+1}) - P(t_i)\}}. \quad (8)$$

With unscaled input values, the neural network needs a few more iterations, scaled by $\sup\{2 / k_i\}$, to achieve the same accuracy.
Find the errors of the neural network results at this iteration; in this case, for the first and second neural networks:

$$E_1 = 2\varepsilon^2, \quad (9)$$

$$E_2 = \varepsilon^2 + \left(\frac{\varepsilon}{k_i}\right)^2 = \left(1 + \frac{1}{k_i^2}\right)\varepsilon^2. \quad (10)$$
The model interprets a short sequence of words
Classic recurrent neural networks take an input of arbitrary length, but the output is still a vector of fixed size. One possible solution is the encoder-decoder architecture: the main idea of this method is first to form a vector of fixed size describing the input sequence, and then to unroll it into the output sequence [13]. In more detail, the process usually occurs as follows:
Ø The input is passed through the neural network, and the cell state (rather than the hidden state) after passing the whole sequence is considered to be a vector describing the input.
Ø This phase encodes the data into a vector; therefore, this neural network is called the encoder.
Ø At the second stage (decoding), the task is to unfold the output sequence from this vector. For this, the cell state is used as the initial state of the following network, which consecutively generates symbols from the output alphabet until it generates a special terminal symbol. Often this architecture is used in machine translation tasks, which is where the concept of the output alphabet arose; in the classical setting the symbol generated at time t is fed to the network as input at the next moment of time, t + 1. But for the task in this work it is not necessary: the output alphabet (the set of encoded words) has a small volume, and the received symbols, which in turn are words, do not form any relationships between one another. Therefore, the architecture described is used without feeding the generated character back into the decoder. The structure of the procedure can be seen in Figure 6.
Figure 6. Model of the encoder-decoder LSTM.
As output, the expected sequence of word marks spoken in the video was produced; too-short words were treated as an empty word, which actually means that all the short words fell into one large class.
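The encoder-decoder procedure described above can be sketched schematically with untrained random weights; the dimensions, the toy output alphabet, and the terminal symbol are invented for illustration, and, as in the text, generated symbols are not fed back into the decoder:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_state, vocab = 4, 16, 6        # last symbol plays the role of <end>
END = vocab - 1

W_enc_x = rng.standard_normal((d_state, d_in)) * 0.1
W_enc_s = rng.standard_normal((d_state, d_state)) * 0.1
W_dec_s = rng.standard_normal((d_state, d_state)) * 0.1
W_out = rng.standard_normal((vocab, d_state)) * 0.1

def encode(sequence):
    """Pass the whole input through the network; the final state is the
    fixed-size vector describing the input sequence."""
    s = np.zeros(d_state)
    for x_t in sequence:
        s = np.tanh(W_enc_x @ x_t + W_enc_s @ s)
    return s

def decode(s, max_len=10):
    """Unfold the output sequence from the encoder state; generated
    symbols are NOT fed back, as in the architecture described."""
    out = []
    for _ in range(max_len):
        s = np.tanh(W_dec_s @ s)
        symbol = int(np.argmax(W_out @ s))
        out.append(symbol)
        if symbol == END:              # terminal symbol ends generation
            break
    return out

seq = rng.standard_normal((7, d_in))   # input of arbitrary length
code = encode(seq)
tokens = decode(code)
```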
THE RESULTS
The example uses a two-layer perceptron with two nonlinear neurons in the first layer and one in the second layer. The operation of the error backpropagation algorithm is broken down into the following steps: the assignment of the input and the desired output, the direct passage of the input signal to the output, the backward propagation of the error, and the change in the weights. Variables permitting one to trace the operation of the backpropagation algorithm are given in Table 1.
Table 1. The results of the phased implementation of the backpropagation algorithm.

| Stage | Forward distribution of the input signal | Backward distribution of the error | Changing weights |
|---|---|---|---|
| A1(1), A1(2) | logsig(W1·P + B1) = [0.321, 0.368] | Not running | Not running |
| A2 | purelin(W2·A1 + B2) = 0.446 | » | » |
| e | t − A2 = 1.261 | » | » |
| N1(1), N1(2) | Not running | ∂logsig(N1)·W2·N2 = [−0.049, 0.100] | » |
| N2 | The same | ∂purelin(N2)·(−2)·e = −2.522 | » |
| W1(1), W1(2) | » | Not running | W1 = W1 − lr·N1×P = [−0.265, −0.420] |
| B1(1), B1(2) | » | » | B1 = B1 − lr·N1 = [−0.475, −0.140] |
| B2 | » | » | B2 = B2 − lr·N2 = 0.732 |
| W2(2) | » | » | W2 = W2 − lr·N2×A1 = [0.171, −0.077] |
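The cycle in Table 1 can be reproduced step by step. The initial weights, input P, and target t below are assumptions (the paper does not list them), chosen so that the intermediate values match the table's numbers:

```python
import numpy as np

logsig = lambda z: 1.0 / (1.0 + np.exp(-z))

# Assumed initial parameters: two logsig neurons fed by the scalar
# input P, one purelin output neuron, learning rate lr = 0.1.
W1, B1 = np.array([-0.27, -0.41]), np.array([-0.48, -0.13])
W2, B2 = np.array([0.09, -0.17]), 0.48
P, lr = 1.0, 0.1
t = 1.0 + np.sin(np.pi * P / 4.0)       # desired output, ~1.707

# Forward distribution of the input signal
A1 = logsig(W1 * P + B1)                # -> [0.321, 0.368]
A2 = float(W2 @ A1) + B2                # purelin output -> 0.446
e = t - A2                              # error -> 1.261

# Backward distribution of the error (sensitivities)
N2 = -2.0 * e                           # purelin layer -> -2.522
N1 = A1 * (1.0 - A1) * W2 * N2          # logsig layer -> [-0.049, 0.100]

# Changing weights
W2 = W2 - lr * N2 * A1                  # -> [0.171, -0.077]
B2 = B2 - lr * N2                       # -> 0.732
W1 = W1 - lr * N1 * P                   # -> [-0.265, -0.420]
B1 = B1 - lr * N1                       # -> [-0.475, -0.140]
```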
Fig. 7. Result of approximation of vectors by a two-layer perceptron.
Fig. 8. Radial basis function.
Fig. 9. Approximation by means of a radial basis neural network test.
Acknowledgement
The Deanship of Scientific Research, Jerash University, supported this work.
Conclusion
The article describes a mathematical model of the human middle ear using a psychoacoustic pitch-perception approach, and the image classification obtained from it. The results of the experiments on speech recognition based on neural networks are given. Among the advantages of this method are its sufficient simplicity of implementation, as well as the very obvious analogy with the processes taking place in the real organ of hearing. A disadvantage is the level of errors in recognition (13-23%), which it is proposed to reduce by the use of contextual recognition of individual voice commands and of automated keywords from the stream of speech, which are associated with the processing of telephone calls or with the sphere of security. With its ability to learn and to summarize the accumulated knowledge, the neural network has the features of artificial intelligence. A network trained on a limited set of data is able to generalize the information obtained and to show good results on data not used in the learning process.
A characteristic feature of the network is also the possibility of its implementation with very-large-scale integration technology. The difference between network elements is small, and their repeatability is enormous. This opens the prospect of creating a universal processor with a homogeneous structure, capable of processing a variety of information.
References

[1]. Mohammad, A. L., and A. L. Abdusamad. "Artificial Intelligence Technique for Speech Recognition based on Neural Networks." (2014).
[2]. Al Smadi, T., Al Issa, H. A., Trad, E., & Al Smadi, K. A. (2015). Artificial intelligence for speech recognition based on neural networks. J. Signal Inform. Process., 6, 66-72.
[3]. Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. "Speech recognition with deep recurrent neural networks." Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
[4]. Abdel-Hamid, Ossama, et al. "Convolutional neural networks for speech recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 22.10 (2014): 1533-1545.
[6]. Abdel-Hamid, O., Mohamed, A.-R., Jiang, H., and Penn, G. (2012). Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In ICASSP, pages 4277-4280. IEEE.
[7]. Dede, G., Sazlı, M. H.: Speech recognition with artificial neural networks. Digital Signal Processing 20, 763-768 (2010).
[8]. Dahl, G., Yu, D., Li, D., and Acero, A. (2011). Large vocabulary continuous speech recognition with context-dependent DBN-HMMs. In ICASSP.
[9]. Graves, A., Jaitly, N., and Mohamed, A. (2013). Hybrid speech recognition with deep bidirectional LSTM. In ASRU.
[10]. Toscano, J. C., McMurray, B.: Cue Integration With Categories: Weighting Acoustic Cues in Speech Using Unsupervised Learning and Distributional Statistics. Cognitive Science 34, 434-464 (2010).
[11]. Al Smadi, T. An Improved Real-Time Speech In Case Of Isolated Word Recognition. Int. Journal of Engineering Research and Application, Vol. 3, Issue 5, Sep-Oct 2013, pp. 01-05.
[12]. Siniscalchi, Sabato Marco, et al. "Exploiting deep neural networks for detection-based speech recognition." Neurocomputing 106 (2013): 148-157.
[13]. Smadi, Kalid A., and Takialddin Al Smadi. "Automatic System Recognition of License Plates using Neural Networks." (2017).