matlab15: audio processing

Matlab 15:AudioProcessing

Cheng-Hsin HsuNationalTsingHuaUniversity

DepartmentofComputerScience

SlidesandsamplecodesarebasedonthematerialsfromProf.RogerJang

CS3330ScientificComputing 1

WhatAreAudioSignals?

• Audiosignalsare…– Signalsthatareaudibletohuman,suchasspeechandmusic

– Therangeoffundamentalfrequenciesofaudiblesignalsisabout20~20000Hz.• Therangeiswiderfortheyoungpeople,narrowerfortheelderly.

VoiceGeneration&Reception

• Stepsinvoicegeneration&reception– Vibrationofvoicesource– Resonancebysurroundingobjects

– Travelingthroughair(orothermedia)

– Receptionofmembranesandneuronsatinnerears

– Recognitionbybrains

• Examples– Singing– Whistling– Guitar– Flute

CategorizationofAudioSignals• Numberofsources

– Monophonic– Polyphonic

• Waveform– Quasi-periodicsound

• voicedsoundofspeech– Aperiodicsound

• Unvoicedsoundofspeech

• Sourcetypes– Soundsfromanimals

(bioacoustics)• Dogbarking,catmeowing,frogcroaking,duckquacking

– Soundsfromnon-animals• Carengines,thunders,musicinstruments

S/U/VinSpeech

❚ SpeechsignalscanbedividedintoS,U,V❙ S(silence):nospeechactivity❙ U(unvoiced):speechactivitywithoutvibrationfromvocalchords

❙ V(voiced):speechactivitywithvibration❚ HowtodetectS,U,V?❙ Byputtingyourhandonyourthroattofeelthevibration

❙ Bywaveformobservation

ToolsforGeneralAudioProcessing

• Toolsforrecordingandwaveformobservation– CoolEdit– GoldWave– Audacity–MATLAB

• Quizquestion!–Whatisthemajordifferencebetweenthewaveformsofspeechandwhistle?

SpeechSignalof“Sunday”

• Unvoicedvs.voicedframes

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9-1

-0.5

0

0.5

1

Time (sec)

Ampl

itude

Details waveform of "Sunday"

0.18 0.2 0.22-1

-0.5

0

0.5

1

0.54 0.56 0.58-1

-0.5

0

0.5

1

Silence,UnvoicedandVoicedSounds

• ExamplesofS,U,V– “Six”

– “資訊系”s u v u sv u v

s u v s u s

Quiz candidate!

HumanSpeechProduction

Source-filterModelforHumanSpeechProduction

Speechissplitintoarapidlyvaryingexcitationsignalandaslowlyvaryingfilter.Theenvelopeofthepowerspectracontainsthevocaltractinfo.

Twoimportantcharacteristicsofthemodelarefundamentalfrequency (f0)andformants (F1,F2,F3,…)

unvoiced

voiced

TheVocalTract

GlottalVolumeVelocity&ResultingSoundPressure(Voiced)

SpeechProduction

Glottal Pulses Vocal Tract Speech Signal

(a) Source Spectrum (c) Output Energy Spectrum

+

+ =

=

(b) Filter Function

VideosforVocalCordsMovement

• Movementofvocalcords– http://www.youtube.com/watch?v=mJedwz_r2Pc– http://www.youtube.com/watch?v=v9Wdf-RwLcs

ParametersforRecording

• Threemajorparametersforrecordingaudiofiles– Samplerate:no.ofsamplespersec

• 8kHz(phonequality)• 16KHz(forcommonspeechrecognition)• 44.1KHz(CDquality)

– Bitresolution:no.ofbitsforrepresentingasample• 8-bit (uint8withrange:0~255)• 16-bit(int16withrange:-32768~32767)

– Noofchannels• Mono• Stereo

StorageforAudioFiles

• Examplesofstoragerequirement– 1min.ofrecordingwithfs=16000,nbits=16,#channel=1è 60(sec)*16(KHz)*2(byetes)*1(channel)=1920KB=1.92MB

– 3-minsofCDmusicwithfs=44.1KHz,nbits=16,#channel=2è 180(sec)*44.1(KHz)*2(bytes)*2(channels)=31752KB=32MB

HowtoGenerateaSineWaveSignal

–Mathformula:–MATLABcode:

duration=3;f=440;fs=16000;time=(0:duration*fs-1)/fs;y=0.8*sin(2*pi*f*time);plot(time,y);sound(y,fs);

)2sin(* qp += ftay

18

• Analog:continuousphenomenon,betweenanytwopointsthereexistinfinitenumberofpoints– Mostnaturalphenomena

• Discrete:points(eitherintimeorspace)areclearlyseparated

• Computersworkwithdiscretevaluesè analog-to-digitalconversion

• Digitalmedia:– Betterquality,lesssusceptibletonoise– Morecompacttostoreandtransmit(high

compressionratios)

Analog&DiscretePhenomena

19

• Sampling:– choosediscretepointsatwhichwemeasure(sample)thecontinuoussignal

– Samplingrateisimportanttorecreatetheoriginalsignal

• Quantization:– Representeachsampleusingafixednumberofbits

– Bitdepth(orsamplesize)specifiestheprecisiontorepresentavalue

Analog-to-DigitalConversion:TwoSteps

20

• Nyquist frequency– Theminimumsamplingratetoreconstructtheoriginalsignal: r=2f

– f isthefrequencyofthesignal

• Undersamplingcanproducedistorted/differentsignals(aliasing)

Sampling

21

• f=637 Hz

• Samplingat770(<2f)producesadifferentwave

UnderSampling:Example

22

• n bitstorepresentadigitalsampleèmaxnumberofdifferentlevelsism=2n

• è real,continuous,samplevaluesarerounded(approximated)tothenearestlevels

• è Someinformation(precision)couldbelost

Quantization

PropertiesofAudioSignals• Volume:amplitude,loudness,intensity,orenergy• Pitch:fundamentalfrequency• Timbre:tonecolororquality

0.5 1 1.5 2 2.5 3 3.5

x 104

-1

-0.5

0

0.5

1

Sample index

Amplit

ude

taiwan.wav

50 100 150 200 250 300 350 400 450 500-1

-0.5

0

0.5

1

Sample index within the frame

Amplit

ude

Time-domainFeatures

n Time-domainaudiofeaturespresentedinaframe(analysiswindow)

Intensity

Fundamental period

Timbre: Waveform within an FP

Frequency-domainFeatures• Frequency-domainaudiofeaturesinaframe– Energy:Sumofpowerspectrum– Pitch:Distancebetweenharmonics– Timber:Smoothedspectrum Second formant

F2First formantF1

Pitch freq

Energy

Read,Play,andVisualizeAudioFiles

• Useaudioread toreadawavfile• Usesoundtoplaythesound• Plotthewaveform[y, fs]=audioread('bear.wav');sound(y, fs);time=(1:length(y))/fs;plot(time, y);

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

ReadtheMetadatainAudioFiles

• Reading metadata– info=audioInfo(‘file’);– Different types of

audio files may returndifferent fields of info.

• Two types of readingdata from audio files– For audio itself

• y= audioread(‘file’)– For metadata

• info=audioinfo(‘file’)

fileName='bear.wav';info=audioinfo(fileName);fprintf('Filename = %s\n', info.Filename);fprintf('Compression = %s\n', info.CompressionMethod);fprintf('No. Channels = %g\n', info.NumChannels);fprintf('Smpling Rate = %g Hz\n', info.SampleRate);fprintf('Samples = %g\n', info.TotalSamples);fprintf('Duration = %g secs \n', info.Duration);fprintf('Resolution = %g bits/sample\n', info.BitsPerSample);

NormalizationonAudioSignals

• Dataisaudiofiles– 8bitsè uint8,[0,28-1]– 16bitsè int16,[-215,

215-1]

• MATLAB’smethodtoscaletorange[-1,1]– 8bitsè (y-128)/128– 16bitsè y/32768

• CheckMATLABs’scaling

fileName='bear.wav';[y, fs]=audioread(fileName);info=audioinfo(fileName);nbits=info.BitsPerSample;y0=round(y*32768)

StereoAudioFiles

• audioread canalsoreadstereowavfiles

• EachcolumnrepresentsaL-Rchannel

fileName= 'bear.wav';[y, fs]=audioread(fileName);sound(y, fs);left=y(:,1);right=y(:,2);subplot(2,1,1), plot((1:length(left))/fs, left);subplot(2,1,2), plot((1:length(right))/fs, right);

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5-1

-0.5

0

0.5

1

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5-1

-0.5

0

0.5

1

ReadPartofanAudioFile

– Ifweonlywanttoreadpartsofanaudiofile– audioRead05.m

[y,fs]=audioread('bear.wav', [5001 7000]); figure; plot(y)

0 200 400 600 800 1000 1200 1400 1600 1800 2000-0.03

-0.02

-0.01

0

0.01

0.02

0.03

PlayAudioinMatlab

• AfterreadingaudiofilesintoMatlab• Wecanalsoprocesstheaudiodata– Increase/decreasevolume– increase/reducepitches– De-noise

• Antthenplayouttheresultaudiosignals

PlayAudio (1/2)

• Playasinglesound

• Playmultiplesounds

load bear.matsound(y, fs);

[y, fs]=audioread('bear.wav'); sound(y, fs);pause(1);sound(3*y, fs);pause(1);sound(3*y, fs*0.8);

PlayAudio (2/2)

n Createaaudioobjectn audioplayern playn playblocking

• Playasinglesound

• SequentiallyPlaymultiplesounds

load bear.matp=audioplayer(y, fs);play(p);

[y, fs]=audioread('bear.wav'); p=audioplayer(y, fs);playblocking(p);load bear.matp=audioplayer(y, fs * 1.2);playblocking(p);

ChangingtheAmplitudes

– Adjust volumes– Question: do you think the volume goes up for 3,

5, and 7 times in the following example?

[y, fs]=audioread('bear.wav');p=audioplayer(1*y, fs); playblocking(p);p=audioplayer(3*y, fs); playblocking(p);p=audioplayer(5*y, fs); playblocking(p);p=audioplayer(7*y, fs); playblocking(p);

ChangingtheSamplingRates(1/2)

– New sampling rates à new play duration *and*new pitches

– In the following example, the play durations getshorter and the pitches go higher à sounds likeDonald Duck

[y, fs]=audioread('bear.wav');p=audioplayer(y, fs);p.SampleRate=1.0*fs; playblocking(p);p.SampleRate=1.2*fs; playblocking(p);p.SampleRate=1.5*fs; playblocking(p);p.SampleRate=2.0*fs; playblocking(p);

ChangingtheSamplingRates(2/2)

– In the following example, the play durations getlonger and the pitches go lowerà sounds likecows

[y, fs]=audioread('bear.wav');p=audioplayer(y, fs);p.SampleRate=1.0*fs; playblocking(p);p.SampleRate=0.8*fs; playblocking(p);p.SampleRate=0.6*fs; playblocking(p);p.SampleRate=0.4*fs; playblocking(p);

ObservationsnObservations

nHighersamplerateforplaybackleadsto…nShorterdurationandhigherpitch

nLowersamplerateforplaybackleadsto...nLongerdurationandlowerpitch

nHowto…nGeneratehigherpitchwithoutdurationchange?

è PitchmodificationnCreatelongerdurationwithoutpitchchange?

è Durationmodification

ChangetheAudioSignals

– 0) Play the wav as-is– 1) Change the sign of audio signals– 2) Reverse the signals (along the time domain)– What will happen?

[y, fs]=audioread('bear.wav');p=audioplayer(y, fs); playblocking(p);p=audioplayer(-y, fs); playblocking(p);p=audioplayer(flipud(y), fs); playblocking(p);

VolumeAdjustment

– Soundsc scales the data so that the sound isplayed as loud as possible without clipping

[y, fs]=audioread('bear.wav');sound(y, fs);fprintf('Press any key to continue...\n'); pausesoundsc(y, fs);

RecordAudioFiles

• Wehaveseenhowtoreadaudiofiles• Wehavelearnedhowtoplayaudiofiles• Let’screatenewaudiofiles– audiorecorder– recordblocking

AudioRecordingExample(1/2)

• Record3secondsusingdefaultsettings

– Defaultsettings• Samplingrate:8000Hz• Persampleresolution:8bits• Mono

duration=3;recObj=audiorecorder;recordblocking(recObj, duration);fprintf('Press any key to play out：'); pauseplay(recObj);

AudioRecordingExample(2/2)

• Usenon-defaultsettings

fs=16000; % sampling ratenBits=16; % bit resolutionnChannel=1; % no. channelduration=3; % duration in secondsrecObj=audiorecorder(fs, nBits, nChannel);fprintf('Press any key to start recording for %g seconds：', duration); pausefprintf('recording...');recordblocking(recObj, duration);fprintf('Press any key to playout...'); pauseplay(recObj);y = getaudiodata(recObj, 'double');% get data as a double arrayplot((1:length(y))/fs, y);xlabel('Time (sec)'); ylabel('Amplitude');

WriteAudioRecordsasFiles(1/2)

• Matlab alsoallowsustosaverecordingsasfiles– audiowrite(audioFile,y,fs)• audioFile isthefilename，yistheaudiosample，fsisthesamplingrate

load bear.mataudioFile='bear2.wav';fprintf('Saving to %s...\n', audioFile);audiowrite(audioFile, y, round(1.5*fs));fprintf('Press any key to play %s...\n', audioFile);dos(['open ', audioFile]);

WriteAudioRecordsasFiles(2/2)

• Combinerecording,playing,andsavingintothefollowingcode

fs=16000;nBits=16;nChannel=1;duration=3;recObj=audiorecorder(fs, nBits, nChannel);fprintf('Press any key to record for %g seconds：', duration); pauserecordblocking(recObj, duration);y = getaudiodata(recObj, 'double');plot((1:length(y))/fs, y); xlabel('Time (sec)'); ylabel('Amplitude');sound(y, fs);audioFile='myRecording.wav';fprintf('Saving to %s...\n', audioFile);audiowrite(audioFile, y, fs);system('open myRecording.wav');

Matlab #13Homework(M13)1. Waverecording(2%):WriteaMATLABscriptrecordMyVoice01.mto

record10secondsofyourspeech,sayintroduceyourself.SaveyourrecordingasmyVoice.wav.Otherrecordingparametersare:samplerate=16KHz,bitresolution=16bits.PleaseusethescripttoprintoutanswerstothefollowingquestionswithintheMATLABwindow.– HowmuchspaceistakenbytheaudiodataintheMATLABworkspace?– Whatthedatatypeoftheaudiodata?– Howdoyoucomputetheamountoftherequiredmemoryfromthe

recordingparameters?– WhatisthesizeofmyVoice.wav?– HowmanybytesisusedinmyVoice.wav torecordoverheadsotherthan

theaudiodataitself?

CS3330ScientificComputing 45

matlab15: audio processing

Documents