matlab15: audio processing
TRANSCRIPT
Matlab 15:AudioProcessing
Cheng-Hsin HsuNationalTsingHuaUniversity
DepartmentofComputerScience
SlidesandsamplecodesarebasedonthematerialsfromProf.RogerJang
CS3330ScientificComputing 1
WhatAreAudioSignals?
• Audiosignalsare…– Signalsthatareaudibletohuman,suchasspeechandmusic
– Therangeoffundamentalfrequenciesofaudiblesignalsisabout20~20000Hz.• Therangeiswiderfortheyoungpeople,narrowerfortheelderly.
VoiceGeneration&Reception
• Stepsinvoicegeneration&reception– Vibrationofvoicesource– Resonancebysurroundingobjects
– Travelingthroughair(orothermedia)
– Receptionofmembranesandneuronsatinnerears
– Recognitionbybrains
• Examples– Singing– Whistling– Guitar– Flute
CategorizationofAudioSignals• Numberofsources
– Monophonic– Polyphonic
• Waveform– Quasi-periodicsound
• voicedsoundofspeech– Aperiodicsound
• Unvoicedsoundofspeech
• Sourcetypes– Soundsfromanimals
(bioacoustics)• Dogbarking,catmeowing,frogcroaking,duckquacking
– Soundsfromnon-animals• Carengines,thunders,musicinstruments
S/U/VinSpeech
❚ SpeechsignalscanbedividedintoS,U,V❙ S(silence):nospeechactivity❙ U(unvoiced):speechactivitywithoutvibrationfromvocalchords
❙ V(voiced):speechactivitywithvibration❚ HowtodetectS,U,V?❙ Byputtingyourhandonyourthroattofeelthevibration
❙ Bywaveformobservation
ToolsforGeneralAudioProcessing
• Toolsforrecordingandwaveformobservation– CoolEdit– GoldWave– Audacity–MATLAB
• Quizquestion!–Whatisthemajordifferencebetweenthewaveformsofspeechandwhistle?
SpeechSignalof“Sunday”
• Unvoicedvs.voicedframes
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9-1
-0.5
0
0.5
1
Time (sec)
Ampl
itude
Details waveform of "Sunday"
0.18 0.2 0.22-1
-0.5
0
0.5
1
0.54 0.56 0.58-1
-0.5
0
0.5
1
Silence,UnvoicedandVoicedSounds
• ExamplesofS,U,V– “Six”
– “資訊系”s u v u sv u v
s u v s u s
Quiz candidate!
HumanSpeechProduction
Source-filterModelforHumanSpeechProduction
Speechissplitintoarapidlyvaryingexcitationsignalandaslowlyvaryingfilter.Theenvelopeofthepowerspectracontainsthevocaltractinfo.
Twoimportantcharacteristicsofthemodelarefundamentalfrequency (f0)andformants (F1,F2,F3,…)
unvoiced
voiced
TheVocalTract
GlottalVolumeVelocity&ResultingSoundPressure(Voiced)
SpeechProduction
Glottal Pulses Vocal Tract Speech Signal
(a) Source Spectrum (c) Output Energy Spectrum
+
+ =
=
(b) Filter Function
VideosforVocalCordsMovement
• Movementofvocalcords– http://www.youtube.com/watch?v=mJedwz_r2Pc– http://www.youtube.com/watch?v=v9Wdf-RwLcs
ParametersforRecording
• Threemajorparametersforrecordingaudiofiles– Samplerate:no.ofsamplespersec
• 8kHz(phonequality)• 16KHz(forcommonspeechrecognition)• 44.1KHz(CDquality)
– Bitresolution:no.ofbitsforrepresentingasample• 8-bit (uint8withrange:0~255)• 16-bit(int16withrange:-32768~32767)
– Noofchannels• Mono• Stereo
StorageforAudioFiles
• Examplesofstoragerequirement– 1min.ofrecordingwithfs=16000,nbits=16,#channel=1è 60(sec)*16(KHz)*2(byetes)*1(channel)=1920KB=1.92MB
– 3-minsofCDmusicwithfs=44.1KHz,nbits=16,#channel=2è 180(sec)*44.1(KHz)*2(bytes)*2(channels)=31752KB=32MB
HowtoGenerateaSineWaveSignal
–Mathformula:–MATLABcode:
duration=3;f=440;fs=16000;time=(0:duration*fs-1)/fs;y=0.8*sin(2*pi*f*time);plot(time,y);sound(y,fs);
)2sin(* qp += ftay
18
• Analog:continuousphenomenon,betweenanytwopointsthereexistinfinitenumberofpoints– Mostnaturalphenomena
• Discrete:points(eitherintimeorspace)areclearlyseparated
• Computersworkwithdiscretevaluesè analog-to-digitalconversion
• Digitalmedia:– Betterquality,lesssusceptibletonoise– Morecompacttostoreandtransmit(high
compressionratios)
Analog&DiscretePhenomena
19
• Sampling:– choosediscretepointsatwhichwemeasure(sample)thecontinuoussignal
– Samplingrateisimportanttorecreatetheoriginalsignal
• Quantization:– Representeachsampleusingafixednumberofbits
– Bitdepth(orsamplesize)specifiestheprecisiontorepresentavalue
Analog-to-DigitalConversion:TwoSteps
20
• Nyquist frequency– Theminimumsamplingratetoreconstructtheoriginalsignal: r=2f
– f isthefrequencyofthesignal
• Undersamplingcanproducedistorted/differentsignals(aliasing)
Sampling
21
• f=637 Hz
• Samplingat770(<2f)producesadifferentwave
UnderSampling:Example
22
• n bitstorepresentadigitalsampleèmaxnumberofdifferentlevelsism=2n
• è real,continuous,samplevaluesarerounded(approximated)tothenearestlevels
• è Someinformation(precision)couldbelost
Quantization
PropertiesofAudioSignals• Volume:amplitude,loudness,intensity,orenergy• Pitch:fundamentalfrequency• Timbre:tonecolororquality
0.5 1 1.5 2 2.5 3 3.5
x 104
-1
-0.5
0
0.5
1
Sample index
Amplit
ude
taiwan.wav
50 100 150 200 250 300 350 400 450 500-1
-0.5
0
0.5
1
Sample index within the frame
Amplit
ude
Time-domainFeatures
n Time-domainaudiofeaturespresentedinaframe(analysiswindow)
Intensity
Fundamental period
Timbre: Waveform within an FP
Frequency-domainFeatures• Frequency-domainaudiofeaturesinaframe– Energy:Sumofpowerspectrum– Pitch:Distancebetweenharmonics– Timber:Smoothedspectrum Second formant
F2First formantF1
Pitch freq
Energy
Read,Play,andVisualizeAudioFiles
• Useaudioread toreadawavfile• Usesoundtoplaythesound• Plotthewaveform[y, fs]=audioread('bear.wav');sound(y, fs);time=(1:length(y))/fs;plot(time, y);
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
ReadtheMetadatainAudioFiles
• Reading metadata– info=audioInfo(‘file’);– Different types of
audio files may returndifferent fields of info.
• Two types of readingdata from audio files– For audio itself
• y= audioread(‘file’)– For metadata
• info=audioinfo(‘file’)
fileName='bear.wav';info=audioinfo(fileName);fprintf('Filename = %s\n', info.Filename);fprintf('Compression = %s\n', info.CompressionMethod);fprintf('No. Channels = %g\n', info.NumChannels);fprintf('Smpling Rate = %g Hz\n', info.SampleRate);fprintf('Samples = %g\n', info.TotalSamples);fprintf('Duration = %g secs \n', info.Duration);fprintf('Resolution = %g bits/sample\n', info.BitsPerSample);
NormalizationonAudioSignals
• Dataisaudiofiles– 8bitsè uint8,[0,28-1]– 16bitsè int16,[-215,
215-1]
• MATLAB’smethodtoscaletorange[-1,1]– 8bitsè (y-128)/128– 16bitsè y/32768
• CheckMATLABs’scaling
fileName='bear.wav';[y, fs]=audioread(fileName);info=audioinfo(fileName);nbits=info.BitsPerSample;y0=round(y*32768)
StereoAudioFiles
• audioread canalsoreadstereowavfiles
• EachcolumnrepresentsaL-Rchannel
fileName= 'bear.wav';[y, fs]=audioread(fileName);sound(y, fs);left=y(:,1);right=y(:,2);subplot(2,1,1), plot((1:length(left))/fs, left);subplot(2,1,2), plot((1:length(right))/fs, right);
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5-1
-0.5
0
0.5
1
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5-1
-0.5
0
0.5
1
ReadPartofanAudioFile
– Ifweonlywanttoreadpartsofanaudiofile– audioRead05.m
[y,fs]=audioread('bear.wav', [5001 7000]); figure; plot(y)
0 200 400 600 800 1000 1200 1400 1600 1800 2000-0.03
-0.02
-0.01
0
0.01
0.02
0.03
PlayAudioinMatlab
• AfterreadingaudiofilesintoMatlab• Wecanalsoprocesstheaudiodata– Increase/decreasevolume– increase/reducepitches– De-noise
• Antthenplayouttheresultaudiosignals
PlayAudio (1/2)
• Playasinglesound
• Playmultiplesounds
load bear.matsound(y, fs);
[y, fs]=audioread('bear.wav'); sound(y, fs);pause(1);sound(3*y, fs);pause(1);sound(3*y, fs*0.8);
PlayAudio (2/2)
n Createaaudioobjectn audioplayern playn playblocking
• Playasinglesound
• SequentiallyPlaymultiplesounds
load bear.matp=audioplayer(y, fs);play(p);
[y, fs]=audioread('bear.wav'); p=audioplayer(y, fs);playblocking(p);load bear.matp=audioplayer(y, fs * 1.2);playblocking(p);
ChangingtheAmplitudes
– Adjust volumes– Question: do you think the volume goes up for 3,
5, and 7 times in the following example?
[y, fs]=audioread('bear.wav');p=audioplayer(1*y, fs); playblocking(p);p=audioplayer(3*y, fs); playblocking(p);p=audioplayer(5*y, fs); playblocking(p);p=audioplayer(7*y, fs); playblocking(p);
ChangingtheSamplingRates(1/2)
– New sampling rates à new play duration *and*new pitches
– In the following example, the play durations getshorter and the pitches go higher à sounds likeDonald Duck
[y, fs]=audioread('bear.wav');p=audioplayer(y, fs);p.SampleRate=1.0*fs; playblocking(p);p.SampleRate=1.2*fs; playblocking(p);p.SampleRate=1.5*fs; playblocking(p);p.SampleRate=2.0*fs; playblocking(p);
ChangingtheSamplingRates(2/2)
– In the following example, the play durations getlonger and the pitches go lowerà sounds likecows
[y, fs]=audioread('bear.wav');p=audioplayer(y, fs);p.SampleRate=1.0*fs; playblocking(p);p.SampleRate=0.8*fs; playblocking(p);p.SampleRate=0.6*fs; playblocking(p);p.SampleRate=0.4*fs; playblocking(p);
ObservationsnObservations
nHighersamplerateforplaybackleadsto…nShorterdurationandhigherpitch
nLowersamplerateforplaybackleadsto...nLongerdurationandlowerpitch
nHowto…nGeneratehigherpitchwithoutdurationchange?
è PitchmodificationnCreatelongerdurationwithoutpitchchange?
è Durationmodification
ChangetheAudioSignals
– 0) Play the wav as-is– 1) Change the sign of audio signals– 2) Reverse the signals (along the time domain)– What will happen?
[y, fs]=audioread('bear.wav');p=audioplayer(y, fs); playblocking(p);p=audioplayer(-y, fs); playblocking(p);p=audioplayer(flipud(y), fs); playblocking(p);
VolumeAdjustment
– Soundsc scales the data so that the sound isplayed as loud as possible without clipping
[y, fs]=audioread('bear.wav');sound(y, fs);fprintf('Press any key to continue...\n'); pausesoundsc(y, fs);
RecordAudioFiles
• Wehaveseenhowtoreadaudiofiles• Wehavelearnedhowtoplayaudiofiles• Let’screatenewaudiofiles– audiorecorder– recordblocking
AudioRecordingExample(1/2)
• Record3secondsusingdefaultsettings
– Defaultsettings• Samplingrate:8000Hz• Persampleresolution:8bits• Mono
duration=3;recObj=audiorecorder;recordblocking(recObj, duration);fprintf('Press any key to play out:'); pauseplay(recObj);
AudioRecordingExample(2/2)
• Usenon-defaultsettings
fs=16000; % sampling ratenBits=16; % bit resolutionnChannel=1; % no. channelduration=3; % duration in secondsrecObj=audiorecorder(fs, nBits, nChannel);fprintf('Press any key to start recording for %g seconds:', duration); pausefprintf('recording...');recordblocking(recObj, duration);fprintf('Press any key to playout...'); pauseplay(recObj);y = getaudiodata(recObj, 'double');% get data as a double arrayplot((1:length(y))/fs, y);xlabel('Time (sec)'); ylabel('Amplitude');
WriteAudioRecordsasFiles(1/2)
• Matlab alsoallowsustosaverecordingsasfiles– audiowrite(audioFile,y,fs)• audioFile isthefilename,yistheaudiosample,fsisthesamplingrate
load bear.mataudioFile='bear2.wav';fprintf('Saving to %s...\n', audioFile);audiowrite(audioFile, y, round(1.5*fs));fprintf('Press any key to play %s...\n', audioFile);dos(['open ', audioFile]);
WriteAudioRecordsasFiles(2/2)
• Combinerecording,playing,andsavingintothefollowingcode
fs=16000;nBits=16;nChannel=1;duration=3;recObj=audiorecorder(fs, nBits, nChannel);fprintf('Press any key to record for %g seconds:', duration); pauserecordblocking(recObj, duration);y = getaudiodata(recObj, 'double');plot((1:length(y))/fs, y); xlabel('Time (sec)'); ylabel('Amplitude');sound(y, fs);audioFile='myRecording.wav';fprintf('Saving to %s...\n', audioFile);audiowrite(audioFile, y, fs);system('open myRecording.wav');
Matlab #13Homework(M13)1. Waverecording(2%):WriteaMATLABscriptrecordMyVoice01.mto
record10secondsofyourspeech,sayintroduceyourself.SaveyourrecordingasmyVoice.wav.Otherrecordingparametersare:samplerate=16KHz,bitresolution=16bits.PleaseusethescripttoprintoutanswerstothefollowingquestionswithintheMATLABwindow.– HowmuchspaceistakenbytheaudiodataintheMATLABworkspace?– Whatthedatatypeoftheaudiodata?– Howdoyoucomputetheamountoftherequiredmemoryfromthe
recordingparameters?– WhatisthesizeofmyVoice.wav?– HowmanybytesisusedinmyVoice.wav torecordoverheadsotherthan
theaudiodataitself?
CS3330ScientificComputing 45