automatic measurement and modelling of contact sounds · 2011-11-22 · this thesis documents the...

Automatic Measurement and Modelling of Contact Sounds

by

Joshua L. Richmond

B.A.Sc., University of Waterloo, 1998

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE STUDIES

(Department of Computer Science)

We accept this thesis as conforming to the required standard

The University of British Columbia

August 2000

© Joshua L. Richmond, 2000

Abstract

Sound plays an important role in our everyday interactions with the environment. Sound models enable virtual objects to produce realistic sounds. The manual creation of sound models from real objects is tedious and inaccurate. A brief review of sound models is presented, with details of a sound model for contact sounds.

This thesis documents the development of a system for the automatic acquisition of sound models. The system is composed of four modules: a sound acquisition device, an asynchronous data server, an algorithm for computing prototypical sound models and an adaptive sampling algorithm. A description of each module and its requirements is included. Implementations of each module are tested and explained.

Results of typical data collections are discussed. Sound models for a calibration object, brass vase, plastic speaker and toy drum are constructed using the system. Comparisons of the sound models to the original recordings are displayed for each object.

Under ideal circumstances the system produces accurate sound models. Environmental noise, however, decreases the accuracy of the estimation technique. An evaluation of the parameter estimation algorithm confirms this observation.

Many opportunities exist for future work on this system. Ideas for improvements and future investigations are suggested.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction

2 Contact Sounds
2.1 Overview
2.2 Related Work
2.3 A Contact Sound Model
2.4 Empirical Parameter Estimation
2.4.1 Performance Evaluation

3 A System for Automatic Measurement of Contact Sounds
3.1 Objectives
3.2 System Requirements
3.3 System Overview

4 A Sound Acquisition Device
4.1 Overview
4.2 Related Work
4.3 Requirements
4.4 The Active Measurement Facility
4.5 Sound Effector
4.6 Sound Capture Hardware
4.7 Sound Effector Software

5 An Asynchronous Data Server
5.1 Overview
5.2 Requirements
5.2.1 Generic Data Server Requirements
5.2.2 Sound Server Requirements
5.3 Server Architecture
5.3.1 Server Execution
5.3.2 Data Flow
5.3.3 Specialisation to Sound Data
5.4 Implementation Details
5.4.1 Performance Evaluation
5.4.2 Limitations

6 Building a Prototypical Model
6.1 Overview
6.2 Related Work
6.3 Spectrogram Averaging
6.4 Performance Evaluation

7 An Adaptive Sampling Algorithm
7.1 Overview
7.2 Related Work
7.3 Requirements
7.4 Surface Representation
7.5 Acoustic Distance Metrics
7.5.1 Frequency-Independent Damping Coefficient
7.5.2 Frequency Similarity
7.6 Perceptual Thresholds

8 Sample Data Collections
8.1 Overview
8.2 Tuning Fork
8.2.1 Estimation Results
8.3 Brass Vase
8.3.2 Refinement Results
8.4 Plastic Speaker
8.5 Toy Drum

9 Conclusions
9.1 Overview
9.2 Future Work

Bibliography

Appendix A Sound Effector Specifications
A.1 Mounting Bracket
A.2 Control circuit

Appendix B Effect of White Noise on Spectrograms

Appendix C Details of Unique Frequency-Mapping Algorithm

List of Tables

5.1 Results of sound server performance evaluation.
7.1 Regression of similarity on frequency and decay difference.
7.2 Examples of calculated similarity factors.
8.1 Summary of setup parameters for test objects.
8.2 Summary of refinement results.

List of Figures

2.1 Attributes of contact sounds.
2.2 Parameter Estimation Algorithm.
2.3 Mean errors of the parameter estimation algorithm on synthetic data.
3.1 Sound Measurement-to-Production Pipeline.
3.2 General procedure for sound model creation.
3.3 A test object on the ACME test station.
4.1 The Sound Effector.
4.2 Digital Output Connection server architecture.
5.1 Data server architecture.
5.2 Data server state diagram.
5.3 Control loop for RUNNING state.
5.4 Control loop for CAPTURING state.
5.5 Server data flow.
5.6 One AudioPacket.
5.7 Spectrogram of sound containing “pops”.
6.1 Mean errors of the parameter estimation algorithm on spectrogram-averaged synthetic data.
6.2 Standard deviation of error for estimated damping parameters.
7.1 Adaptive sampling algorithm.
7.2 Result of adaptive sampling.
7.3 Loop scheme refinement masks for subdivision.
7.4 Algorithm to find unique frequency mapping between two models.
8.1 Setup for acquiring model of tuning fork.
8.2 Recorded spectrogram.
8.3 Results of tuning fork collection.
8.4 A photo of the brass vase and the subdivision surface which represents it.
8.5 Setup for acquiring sound model of brass vase.
8.6 Results of brass vase experiment.
8.7 Detail of narrow-band low-frequency noise
8.8 Effect of noise on low-frequency modes.
8.9 Refinement results of brass vase experiment.
8.10 A photo of the plastic speaker and the subdivision surface which represents it.
8.11 Results of plastic speaker experiment.
8.12 Refinement results of plastic speaker experiment.
8.13 A photo of the toy drum and the subdivision surface which represents it.
8.14 Pivot of the drum’s metal bars.
8.15 Results of toy drum experiment (metal).
8.16 Results of toy drum experiment (plastic).
8.17 Refinement results of toy drum.
A.1 Bending schedule for sound effector mounting bracket.
A.2 Schematic for solenoid control circuit.
B.1 Effect of white noise on spectrograms.
C.1 Algorithm to find unique frequency mapping between two models.
C.2 Difference matrix.
C.3 Order array.
C.4 Index array.
C.5 Matched array
C.6 Mapping array.
C.7 FindUniqueMapping Pseudo-code.

Acknowledgements

“Great discoveries and improvements invariably involve the cooperation of many minds.” - Alexander Graham Bell

This thesis is the result of encouragement and support of dozens of my friends. Without the contributions listed below, this thesis may have never proceeded beyond the course project from which it began.

I owe ACME members past and present a huge thank you! In particular, Jochen Lang and John Lloyd contributed the bulk of the ACME software within which my experiments resided. Thank you for the seemingly endless discussions regarding the sensor class, mysterious aborts and assorted odd ACME behaviours.

Derek DiFilippo, Paul Kry and Doug James contributed many focused ideas when mine were fuzzy. Thanks for all the insight into sound, surfaces and life!

My supervisor, Dinesh Pai, played an enormous role in the development of this thesis. His thoughtful suggestions and advice steered me out of dark corners whenever I got lost. Thank you for being a fantastic supervisor and friend!

Many other friends supplied advice and support when I needed it most over these past two years. Jim Green and Jacob Ofir were constant barometers of my progress. Bonnie Taylor, Kristen Playford and Andrea Bunt kept me going through tough times. Kelly Cronin gave me a good reason to finish. And of course, the rest of the faculty, staff and students of the department always made me feel welcome.

This thesis would not have been possible without the generous financial contributions of NSERC, BC Advanced Systems Institute and IRIS.

JOSHUA L. RICHMOND

The University of British Columbia
August 2000

Dedicated to my family: Gary, Carol and Erica Richmond.

Thank you for your unwavering encouragement and support. You are terrific examples of success in work, love and life.

Chapter 1

Introduction

When radio dramatists began adding sound effects to their productions in the 1920’s, they demonstrated the important role of sound in our perception of everyday events and environments [20]. Without visual aid, listeners could easily identify unspoken actions, e.g., a character’s entrance into a room by the sound of a door opening, followed by the sound of footsteps. Even with the advent of motion pictures in 1896, it was quickly realized that the rattles, pings and knocks of events were needed to cement the realism of the viewing experience.¹ What the entertainment industry has learned is that sound is a critical component of our everyday lives.

¹ The emergence of sound in film in 1926 was delayed only by the technical difficulties of synchronising recorded sound and film. In fact, Thomas Edison viewed moving pictures as an accessory to the phonograph! [29]

The ubiquity of sound in our lives is not surprising when the physics of contact is considered. When contact occurs between two objects, the energy of the impact is transferred to each object. This energy propagates through the objects, producing vibrations of their surfaces. The frequency, amplitude and decay of these vibrations depend, in part, on the shape and material of the object [31]. These surface vibrations create fluctuations in the air pressure around each object and are perceived as sound. Such “contact sounds” provide a listener with much information: location of contact, contact force, material composition of the object as well as its shape, size and surface texture. Humans can use these audio cues to recognise events and discriminate between materials [10, 12, 16].

Given the importance of sound to real-life interactions, it is clear that sound is a required component of any successful virtual environment or simulation. One approach to including sound in virtual environments is synthesis from physics-based models [30]. A sound model of an object can synthesise appropriate contact sounds, given a contact force and contact location, by direct computation. By “sound model” we mean a mathematical representation that can be used for simulation of the object’s sound. This is in contrast to an approach where recorded samples are simply replayed, or modulated to account for various forces and locations. An advantage of the model-based approach is that one model can be used to generate sounds for any number of different interactions (e.g., scraping, pinging, rolling) by substituting the appropriate force profile.

The parameters of a sound model can be derived analytically for objects with simple geometry and material composition [30]. Most everyday objects, however, have complex geometry and material composition which complicate analytical solutions. For such objects, model parameters can be estimated from empirical measurements [30].

Recent work on robotic perception has determined that sound models can also be used to automatically identify materials [8, 17].

A sound model parameterised over the surface of an object can be viewed as an acoustic map of the object, analogous to a texture map in graphics. To create such a sound model of an object from empirical measurements requires recording sounds at hundreds of locations over the object’s surface. To populate even a simple virtual environment with sonified objects could therefore require thousands of measurements. The system described in this thesis automates the sound measurement procedure. To our knowledge, this system is the first to automatically create complete sound models of objects. This sound model can be registered to other models of an object (e.g., surface, deformation) to represent a complete reality-based model. Parts of this work were published previously: [25], [26].

A review of sound models and measurement techniques is presented in Chapter 2. The requirements of a sound measurement system and an overview of our system are included in Chapter 3. The components of the system are subsequently described: the sound acquisition device (Chapter 4), asynchronous data server (Chapter 5), prototypical model generation (Chapter 6) and adaptive sampling algorithm (Chapter 7). Results of typical data collections are presented in Chapter 8, and conclusions on the work are stated in Chapter 9.

Chapter 2

Contact Sounds

2.1 Overview

Contact sounds are the sounds produced by our everyday interactions with the environment. We use the phrase “everyday” in the context of Gaver’s definition of everyday listening: the perception of events from the sounds they make [9]. Just as he contrasts everyday listening to musical listening, we distinguish contact sounds from musical sounds. Musical sounds are themselves the intended result of some action; contact sounds are the unintentional consequence of an action. In general, musical sounds are harmonic while contact sounds are inharmonic. These distinctions are not strict, but serve to separate the domain of our task from the modelling of musical instruments.

As an illustrative example of contact sounds, imagine the sound produced by setting your coffee mug onto your desk, or pushing it across the desk. The sound of the mug striking the desk, and the scraping sound of it being pushed, are both contact sounds. The ease with which these sounds are brought to mind emphasises the importance of sound to our everyday experiences; though you may have never consciously attended to the sounds in real life, they can be easily recalled.

Much useful information is conveyed by contact sounds. Gaver was the first to enumerate this information in [9]. He classified the information into three categories: material, interaction and configuration. These categories are expanded in Figure 2.1. Such information is beneficial to many applications. For instance, it is useful feedback in teleoperative systems. As mentioned previously, the addition of contact sounds to virtual environments and simulations enhances the realism of the environment. Durst and Krotkov also demonstrated the use of contact sound models for automatic material classification [8]. Furthermore, recent work has related the parameters of a contact sound model to human perception of material [16, 12].

Figure 2.1: Attributes of contact sounds. Gaver identified three categories of information conveyed by contact sounds: material, interaction and configuration. Each category is composed of more primitive dimensions as shown. Adapted from [9]. [Diagram labels: Restoring Force, Density, Damping, Internal Structure, Type, Force, Shape, Size, Resonating Cavities.]

The system described by this thesis automatically acquires the measurements required to create contact sound models. By “contact sound model”, we mean a mathematical representation that can be used to synthesise the sound produced by contact between two objects. The model is parameterised over the surface of an object, much like a texture map in graphics.

This chapter is an overview of the mathematical model we use to represent contact sounds. A brief description of the model and its parameters is given. The algorithm for estimating the model’s parameters from recorded samples is also explained.

2.2 Related Work

Much research has been conducted in the field of sound modelling. In particular, many people have modelled musical instruments. Physical modelling is possible for a variety of musical instruments [1]. As one example, Chaigne and Doutaut modelled the acoustic response of wooden xylophone bars using a one-dimensional Euler-Bernoulli equation with the addition of two damping terms and a restoring force [3]. This formulation is derived from the geometry of the xylophone bars, with empirical values for the damping parameters. Another example is the work of Cook and Trueman, who used principal component analysis, Infinite Impulse Response (IIR) filter estimation and warped linear prediction methods to model the directional impulse response of stringed instruments from empirical data [4].

Gaver introduced the modelling of everyday sounds [9]. His model of metal and wood bars is based on the wave equation with an exponential damping term for each frequency mode. The fundamental mode, its initial amplitude and damping factor are determined empirically. Partial modes are calculated as ratios of the fundamental mode.

Recently, a physics-based model for contact sounds of everyday objects was developed by van den Doel [30, 31]. The model of van den Doel was selected for use in our work, and will be described in the subsequent sections of this chapter.

The model of van den Doel is appealing for several reasons. First, it is similar in structure to models used by Gaver [9], Hermes [12] and Klatzky et al. [16] in their perception experiments. Such perceptual studies may yield results useful to evaluating the performance of our system. Furthermore, these studies provide information applicable to our adaptive sampling algorithm as described in Chapter 7. Secondly, the model is suitable for synthesising sounds in real-time, a necessary component of interactive simulations. Finally, it was selected for its effectiveness at representing contact sounds of everyday objects.

It should be noted that the measurement system described herein is not restricted to producing one specific sound model. Any sound model whose parameters can be estimated from recordings of acoustic impulse responses may be created using this system.

2.3 A Contact Sound Model

Formulating a general sound model is complex because it depends on many parameters: material, geometry, mass distribution, etc. This section follows the development of the contact sound model presented in [30, 31].

If we assume linear-elastic behaviour¹, the vibration of an object’s surface is characterised by the function $\mu(\mathbf{x}, t)$. Here $\mu$ represents the deviation of the surface from equilibrium at point $\mathbf{x}$ at time $t$. $\mu$ obeys a wave equation of the form in Equation 2.1, where $A$ is a self-adjoint differential operator and $c$ is a constant related to the speed of sound in the material [31].

$$\left(A - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right)\mu(\mathbf{x}, t) = F(\mathbf{x}, t) \qquad (2.1)$$

In the absence of external forces, Equation 2.1 can be solved by the expression

$$\mu(\mathbf{x}, t) = \sum_{i=1}^{\infty}\left(a_i \sin(\omega_i c t) + b_i \cos(\omega_i c t)\right)\Psi_i(\mathbf{x}) \qquad (2.2)$$

where $a_i$ and $b_i$ are determined by boundary conditions, $\omega_i$ are related to the eigenvalues of operator $A$ and $\Psi_i$ are the corresponding eigenfunctions.

When the object is sufficiently far from the listener, the sound pressure due to this solution can be approximated by the impulse response function in Equation 2.3. The exponential term is added to model material damping. Complete details of this derivation are provided in [30].

$$p(\mathbf{x}, t) = \sum_{n=1}^{N_f} a_{\mathbf{x},n}\, e^{-d_{\mathbf{x},n} t} \sin(2\pi f_{\mathbf{x},n} t) \qquad (2.3)$$

This model expresses the sound pressure $p$ at time $t$ as the sum of $N_f$ frequency modes. Here $\mathbf{x}$ is the location of the force impulse. Each mode $n$ in the model has a frequency $f_{\mathbf{x},n}$, initial amplitude $a_{\mathbf{x},n}$ and exponential damping factor $d_{\mathbf{x},n}$. The damping factor models material damping due to internal friction; in the literature [32], this is parameterised by an internal friction parameter $\phi$ as expressed in Equation 2.4.

$$d_{\mathbf{x},n} = \pi f_{\mathbf{x},n} \tan\phi \qquad (2.4)$$

¹ A reasonable assumption for light contacts.


1. Compute the windowed DFT spectrogram of the recorded sample.

2. Identify the signal.

3. Estimate the frequency modes.

4. Estimate the damping parameters.

5. Estimate the initial amplitudes.

Figure 2.2: Parameter Estimation Algorithm.

For each location $\mathbf{x}$ on the surface of an object, the parameters $f_{\mathbf{x},n}$, $a_{\mathbf{x},n}$ and $d_{\mathbf{x},n}$ must be estimated for all $N_f$ modes.

Because the model is an impulse response model, sounds produced by any linear force interaction can be synthesised by a simple convolution of the force and the model.
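The convolution step can be illustrated with a minimal sketch. It assumes the impulse response at one location is a sum of exponentially damped sinusoids, one per mode; the function names, the sampling rate and the mode values below are hypothetical and chosen only for illustration.

```python
import math

def modal_impulse_response(modes, fs, n_samples):
    """Sum of damped sinusoids; modes is a list of (frequency_hz, amplitude, damping)."""
    out = []
    for k in range(n_samples):
        t = k / fs
        out.append(sum(a * math.exp(-d * t) * math.sin(2 * math.pi * f * t)
                       for f, a, d in modes))
    return out

def convolve(force, response):
    """Direct (full) discrete convolution of a force profile with an impulse response."""
    out = [0.0] * (len(force) + len(response) - 1)
    for i, fi in enumerate(force):
        if fi == 0.0:
            continue  # skip samples that contribute nothing
        for j, rj in enumerate(response):
            out[i + j] += fi * rj
    return out

fs = 8000                                         # assumed sampling rate (Hz)
modes = [(440.0, 1.0, 8.0), (1210.0, 0.5, 20.0)]  # illustrative values only
ir = modal_impulse_response(modes, fs, 400)
unit_impulse = [1.0, 0.0, 0.0, 0.0]
sound = convolve(unit_impulse, ir)                # a unit impulse reproduces the model
```

Replacing `unit_impulse` with a longer force profile (for example, a scraping force) would yield the corresponding sound from the same model, which is the advantage of the model-based approach.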

2.4 Empirical Parameter Estimation

The parameters for the sound model as it is expressed in Equation 2.1 can be determined analytically by Equation 2.2 for objects of simple geometry and material composition [30]. Of course, to solve this expression for arbitrary everyday objects is not feasible. The alternative is to estimate the parameters of a similar model (i.e., Equation 2.3) from empirical measurements. That is, the object is struck at position $\mathbf{x}$, the resulting sound recorded, and the parameters estimated from the recording. Examples of such techniques are [4, 8, 30].

Since we are using the model of van den Doel, we will use the parameter estimation algorithm described in the same work. An overview of the algorithm is described in this section. Figure 2.2 lists the steps of the algorithm; a brief explanation of each step follows. Readers requiring more detail are referred to [30].

In the first step, a spectrogram is computed for the entire recorded sample. The spectrogram is computed by calculating the discrete Fourier transform (DFT) on fixed-width (in time) segments of the recording. The segments are selected using overlapping Hanning windows. For details on Hanning windows and the DFT, a good introduction is [28].
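As a rough illustration of the first step, the sketch below computes a magnitude spectrogram with overlapping Hanning windows using only the Python standard library. It assumes that an overlap factor of o means consecutive windows start N/o samples apart; the function names and the test tone are hypothetical, and a practical implementation would use an FFT rather than this direct DFT.

```python
import cmath
import math

def hann(n):
    """Symmetric Hanning (Hann) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * b * k / n)
                    for k, x in enumerate(frame)))
            for b in range(n // 2 + 1)]

def spectrogram(samples, window_size, overlap_factor):
    """One row of DFT magnitudes per Hanning-windowed segment."""
    hop = window_size // overlap_factor  # assumed meaning of the overlap factor
    window = hann(window_size)
    return [dft_magnitudes([samples[start + k] * window[k]
                            for k in range(window_size)])
            for start in range(0, len(samples) - window_size + 1, hop)]

fs, window_size = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * k / fs) for k in range(256)]
spec = spectrogram(tone, window_size, overlap_factor=2)
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
peak_hz = peak_bin * fs / window_size  # bin index converted to Hz
```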

The signal must now be isolated from the background noise (Step two). The segment of the spectrogram with maximum intensity is the start of the signal. This peak corresponds to the onset of the impact. The end of the signal is the first segment $s$ whose intensity $I_s$ falls below $\bar{I} + k\,\sigma(I)$, where $\bar{I}$ is the average intensity of the region before the signal’s start, $\sigma(I)$ is the standard deviation of that region and $k$ is a constant with a typical value of 10.
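The thresholding rule of Step two can be sketched as follows. The sketch assumes one intensity value per spectrogram segment and uses the population standard deviation of the pre-impact region; both are assumptions, as is the function name `find_signal`.

```python
import statistics

def find_signal(intensities, k=10.0):
    """Return (start, end) segment indices of the signal region.

    start: segment with maximum intensity (onset of the impact).
    end:   first later segment whose intensity falls below mean + k * std
           of the region before the start.
    """
    start = max(range(len(intensities)), key=intensities.__getitem__)
    noise = intensities[:start]
    if len(noise) < 2:
        raise ValueError("need at least two pre-impact segments to characterise the noise")
    threshold = statistics.mean(noise) + k * statistics.pstdev(noise)
    for s in range(start + 1, len(intensities)):
        if intensities[s] < threshold:
            return start, s
    return start, len(intensities)  # signal never decays below the threshold

# Illustrative per-segment intensities: quiet room, impact, decay.
region = find_signal([1.0, 1.2, 0.8, 1.0, 50.0, 25.0, 12.0, 6.0, 3.0, 1.5, 1.0])
```

The recording therefore needs a stretch of background noise before the impact; without it the noise statistics are undefined.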

To estimate the frequency modes (Step three), a histogram is created. Each segment (in time) of the spectrogram within the signal region casts votes for the $N_f$ frequencies with greatest amplitude within that segment. The $N_f$ frequencies that obtain the most votes over all the segments are selected as the dominant frequency modes ($f_{\mathbf{x},n}$) of the model at that location.
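The voting scheme of Step three can be sketched as below. The sketch votes on raw DFT bin indices, which is a simplification: neighbouring bins of one spectral peak may each collect votes, so a practical implementation would probably vote on local maxima instead. The name `dominant_bins` and the toy data are hypothetical.

```python
from collections import Counter

def dominant_bins(segments, n_modes):
    """Each segment votes for its n_modes strongest bins; return the winners."""
    votes = Counter()
    for magnitudes in segments:
        strongest = sorted(range(len(magnitudes)),
                           key=lambda b: magnitudes[b], reverse=True)[:n_modes]
        votes.update(strongest)
    return sorted(b for b, _ in votes.most_common(n_modes))

# Three toy segments with five bins each (rows of a spectrogram).
segments = [[0, 9, 1, 7, 0],
            [0, 8, 2, 6, 1],
            [1, 7, 0, 1, 5]]
bins = dominant_bins(segments, n_modes=2)
```

A winning bin index b corresponds to the frequency b * fs / N for a DFT window of N samples at sampling rate fs.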

For each mode, the $\log$ of the spectrogram is fit to the linear function $-\alpha_n s + \beta_n$, where $s$ is an index into the segments of the spectrogram, and the signal starts at segment $s = 0$. The damping coefficients are then calculated by Equation 2.5.

$$d_{\mathbf{x},n} = \frac{o\, f_s\, \alpha_n}{N}, \qquad (2.5)$$

where $o$ is the window overlap factor of the DFT, $f_s$ is the sampling frequency and $N$ is the size of the DFT window.
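The damping estimate of Step four amounts to a straight-line fit to the log-magnitude of one mode across segments. The sketch below assumes that consecutive segments are N/o samples apart, so a log-magnitude that falls by alpha per segment corresponds to a damping of o * fs * alpha / N per second; that conversion, the function names and the numbers are assumptions made for illustration. The returned amplitude is simply the exponential of the fitted intercept, with no correction for the window length.

```python
import math

def fit_line(ys):
    """Least-squares fit ys[s] ~ slope * s + intercept, with s = 0, 1, 2, ..."""
    n = len(ys)
    mean_s = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((s - mean_s) ** 2 for s in range(n))
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_s

def damping_from_track(magnitudes, fs, window_size, overlap_factor):
    """Estimate damping (1/s) and uncorrected amplitude from one mode's magnitudes."""
    slope, intercept = fit_line([math.log(m) for m in magnitudes])
    alpha = -slope  # decay of the log-magnitude per segment
    damping = overlap_factor * fs * alpha / window_size
    return damping, math.exp(intercept)

fs, window_size, overlap = 8000, 64, 2
hop_seconds = (window_size // overlap) / fs
track = [3.0 * math.exp(-20.0 * s * hop_seconds) for s in range(10)]  # true damping 20
d_est, a_est = damping_from_track(track, fs, window_size, overlap)
```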

The initial amplitude of each frequency mode can then be calculated by Equation 2.6.

$$a_{\mathbf{x},n} = \frac{e^{\beta_n}\,\gamma_{\mathbf{x},n}}{1 - e^{-\gamma_{\mathbf{x},n}}}, \qquad (2.6)$$

with $\gamma_{\mathbf{x},n} = d_{\mathbf{x},n} N / f_s$.

As stated, the algorithm assumes the recording is the response to an impulsive impact. The algorithm can be used with responses to other forces by de-convolving the signal with the force profile.
with theforceprofile.

Again, themeasurementsystemdescribedby this thesiscoulduseany suitablepa-

rameterestimationtechnique,and is not restrictedto the methodoutlined above. This

techniquewaschosenbecauseit is designedspecificallyfor theselectedsoundmodel.

2.4.1 Performance Evaluation

An evaluation of the estimation algorithm was performed using synthetic data to determine its robustness to noise. Since the real samples will be recorded in a relatively noisy environment, this evaluation is necessary to estimate the expected performance of the algorithm with real data.

To evaluate the estimation algorithm, a test signal is synthesised using the model in Equation 2.3. Fifty frequency modes are randomly selected from a Gaussian distribution ($\mu$ = 7 kHz, $\sigma$ = 4 kHz). For each frequency mode, damping factors ($d$) and initial amplitudes ($a$) are randomly selected from uniform distributions of ranges […] and […] respectively. The initial amplitudes are scaled so that […].

Noise is added to this test signal at a specified signal-to-noise ratio. Most of the noise in the ACME environment originates from fans used to cool equipment. A quick spectral analysis of the room revealed the highest concentration of noise in a band from 0 to 200 Hz. Ambient white noise was present, though at lower energy levels. This room noise is approximated in our simulation by low-pass Gaussian noise, band-limited at 200 Hz by a fourth-order filter. Broadband white noise is added at […] of the normalised amplitude of the low-frequency noise.

One hundred trials of the evaluation were executed. For each trial, a test signal was synthesised with noise added at eight signal-to-noise levels: $\infty$ (i.e., no noise), 100, 50, 30, 20, 15, 10 and 5.² The noisy signal was then processed by the estimation algorithm outlined in Section 2.4. The entire test was implemented in Matlab.

² The signal-to-noise ratio is calculated as the maximum signal amplitude divided by the maximum noise amplitude. This yields an appropriate measure of signal-to-noise for our experiments since the signal decays to zero amplitude.
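The peak-based signal-to-noise definition can be made concrete with a short sketch that scales a noise sequence before adding it to a signal. It uses plain white Gaussian noise rather than the band-limited room-noise approximation described above, and the function name `add_noise`, the seed and the test signal are hypothetical.

```python
import math
import random

def add_noise(signal, noise, snr):
    """Scale noise so that max|signal| / max|scaled noise| == snr, then add it."""
    if math.isinf(snr):
        return list(signal)  # infinite SNR means no noise at all
    gain = max(abs(v) for v in signal) / (snr * max(abs(v) for v in noise))
    return [s + gain * n for s, n in zip(signal, noise)]

rng = random.Random(0)  # fixed seed so the example is repeatable
signal = [math.exp(-0.01 * k) * math.sin(0.3 * k) for k in range(500)]
noise = [rng.gauss(0.0, 1.0) for _ in signal]
noisy = add_noise(signal, noise, snr=20)
```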


The estimated model parameters are compared by the following metrics. Each estimated frequency mode $\hat{f}_i$ is assumed to be the estimation of the generated frequency mode $f_j$ that minimises the difference $|\hat{f}_i - f_j|$. A logarithmic error ratio (Equation 2.7) was computed for each estimated model parameter. The means of these error ratios are plotted in Figure 2.3.

[Equation 2.7: logarithmic error ratios $E_f$, $E_a$ and $E_d$ for frequency, initial amplitude and damping] (2.7)

where $j$ is chosen to minimise $|\hat{f}_i - f_j|$.

As illustrated by Figure 2.3, the parameter estimation algorithm is extremely sensitive to noise. For estimates of initial amplitude and frequency, the error increases consistently with increasing levels of noise (Figure 2.3 (a, b)). The trend is not as consistent for the damping parameter estimate (Figure 2.3 (c)). However, as the dashed line in Figure 2.3 (c) indicates, the variance of estimation error increases dramatically with increasing noise. Most importantly, it should be noted that frequency estimates have an error ratio of almost 30 at even modest levels of noise (SNR = 100).
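The pairing of estimated and generated modes used by these metrics is a nearest-neighbour match on frequency, which can be sketched as below. Only the matching is shown; the error ratio itself is not reproduced here. The name `match_modes` and the frequencies are hypothetical.

```python
def match_modes(estimated, generated):
    """Pair each estimated frequency with the closest generated frequency."""
    return [(f_hat, min(generated, key=lambda f: abs(f - f_hat)))
            for f_hat in estimated]

pairs = match_modes([995.0, 4100.0], [1000.0, 2500.0, 4000.0])
```

Note that nothing in this rule prevents two estimated modes from being paired with the same generated mode.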


Figure 2.3: Mean errors of the parameter estimation algorithm on synthetic data. Panels: (a) mean initial amplitude errors ($\bar{E}_a$); (b) mean frequency errors ($\bar{E}_f$); (c) mean damping errors ($\bar{E}_d$). The mean error over 100 trials is plotted for eight signal-to-noise ratios: $\infty$, 100, 50, 30, 20, 15, 10 and 5. For convenience, the inverse ratio (noise/signal) is used as the abscissa. Sub-figure (c) also plots the standard deviation of error in the damping parameter estimate. See Equation 2.7 for definitions of $E_f$, $E_a$ and $E_d$.

Chapter 3

A System for Automatic Measurement of Contact Sounds

3.1 Objectives

The goal of this work is to produce a system which automatically acquires contact sound measurements over the surface of an arbitrary object for the purpose of creating a sound model. This system is one component of a larger project focused on creating complete reality-based models. A reality-based model is one whose parameters are calculated from empirical measurements of real objects. A complete reality-based model is inherently multi-modal, with sound comprising only one part. Other modes could include surface texture, shape and deformation.

As part of the reality-based modelling project, our sound measurement system shares its development platform: the Active Measurement (ACME) facility at the University of British Columbia (UBC). The ACME facility is a fifteen degree-of-freedom (DOF) robot designed to acquire measurements for reality-based model creation [21]. The facility is equipped with a 5-DOF gantry, 6-DOF robot arm on a linear stage and a 3-DOF test stage. These actuators are used to position sensors and make measurements of objects mounted on the test station. Sensors include a 3-CCD colour camera, Triclops [22] trinocular vision system and a 6-DOF force/torque sensor.

Figure 3.1: The sound measurement-to-production pipeline. Measurements at many locations on an object’s surface (x, y, z) are acquired to create a model at each location. Force and position input from a simulation are used to synthesise sounds produced by interactions with the object. A mixer algorithm interpolates between sound models at different locations. Environmental effects such as reverb and spatialisation enhance the realism of the synthesised sound. Our scope is restricted to measurement and modelling (shaded box).

The work that is the subject of this thesis includes the design and implementation of a contact sound measurement system for ACME. As described in Section 3.3, the sound measurement system includes all the hardware and software needed to meet the design requirements listed in Section 3.2.

The scope of the project is restricted to a system for the measurement of contact sounds. Related research, including sound synthesis, interpolation between sound models and modelling of the environment, is outside the scope of this thesis. Figure 3.1 places our project in the context of the sound measurement-to-production pipeline.

3.2 System Requirements

To be successful, a sound measurement system must be able to deliver low-inertia, near-impulsive impacts over the surface of an arbitrary object. Models generated by the system must be registered to the surface of the object for integration with other modes of the complete reality-based model.

The force profile of each impact must be known so that the parameters of a sound model can be estimated as described in Section 2.4.

The system should acquire models automatically. Enough samples should be collected to adequately represent the variation in sound over the entire outer surface of the object.

Since the system will be a component of ACME, all devices used by the system must be controllable under the ACME framework. That is, an interface to every device must exist in the ACME Java architecture and be operable from a remote workstation.

It is assumed that a surface representation of the test object is provided to the sound measurement system by the ACME trinocular camera. This aspect of the system is not an objective of the current research and is not discussed herein.

More detailed requirements of each component of the sound measurement system are presented in the respective chapters.

3.3 System Overview

A sound measurement system was designed and implemented to meet the requirements listed in Section 3.2. The system can be viewed as four modules: a sound acquisition device, an asynchronous data server, an algorithm for computing prototypical sound models and an adaptive sampling algorithm. Each of these modules will be discussed in the following four chapters.

Chapter 4 outlines the requirements and implementation of the sound acquisition device. This includes the design of a sound effector for the ACME robot arm, the software to control it and the selection and location of microphones. The sound effector is a solenoid that mounts on the end of the Puma 260 robot arm and delivers near-impulsive impacts to objects under computer control.

A description of the asynchronous data server is provided in Chapter 5. The data server architecture is adaptable to many types of data. Its design and specialisation to sound data are discussed.

One advantage of using a robotic system is the ability to record multiple samples at the same surface location. A collection of samples can be used to generate a prototypical model which best represents the sound at that location. One algorithm for generating prototypical models is explained in Chapter 6.

The final module is an adaptive sampling algorithm for selecting points on the object’s surface at which to sample. The sampling algorithm uses the object’s surface representation to create a mesh of sample locations. It is adaptive because it uses differences in sound models to adjust the granularity of the sampling mesh. The surface representation and adaptive algorithm are described in Chapter 7.

The general procedure for creating a complete sound model using this system is diagrammed in Figure 3.2. First, a test object is positioned at the center of the test station (Figure 3.3). Typically the object is attached such that it does not move from light contact, yet remains free to vibrate. A surface model of the object is then acquired using the trinocular stereo camera. From this surface model, a coarse sampling mesh is created using the surface’s vertices as sample locations. Next, the sound effector is positioned 5 mm away from each sample location. This is done with a guarded move: the sound effector approaches the sample location until contact is sensed, then retracts 5 mm. Once in position, the sound effector is actuated multiple times, the resultant sound samples are recorded, and models are computed. From the multiple samples, a prototypical model is produced for each sample location. If the acoustic distance between two prototypical models at adjoining vertices of the surface mesh exceeds a threshold, the sampling mesh is refined, and a new model is acquired at a position between the two coarse locations. This procedure continues until no further refinements are required.
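The refinement step of this procedure can be illustrated with a small sketch. It is a one-dimensional simplification written for this overview, not the algorithm of Chapter 7: sample locations are positions along a line, each "model" is reduced to a single number returned by a hypothetical `model` function, and the acoustic distance is taken to be the absolute difference of those numbers. The class name, the method name and the `minSpacing` stopping rule are all assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Hypothetical 1-D illustration of adaptive mesh refinement.
// Not the ACME implementation; the real mesh lies on a 3-D surface.
class AdaptiveSamplingSketch {

    // coarse: sample locations in ascending order.
    // model: stands in for "acquire a prototypical model at this location".
    // A midpoint is inserted between two neighbours when their models differ
    // by more than threshold and they are further apart than minSpacing.
    static List<Double> refine(List<Double> coarse, DoubleUnaryOperator model,
                               double threshold, double minSpacing) {
        if (minSpacing <= 0.0) {
            throw new IllegalArgumentException("minSpacing must be positive");
        }
        List<Double> mesh = new ArrayList<>(coarse);
        int i = 0;
        while (i < mesh.size() - 1) {
            double a = mesh.get(i);
            double b = mesh.get(i + 1);
            double distance = Math.abs(model.applyAsDouble(a) - model.applyAsDouble(b));
            if (distance > threshold && (b - a) > minSpacing) {
                // Refine: the pair (a, midpoint) is examined on the next pass.
                mesh.add(i + 1, (a + b) / 2.0);
            } else {
                i++; // this pair needs no further refinement
            }
        }
        return mesh;
    }

    public static void main(String[] args) {
        // A "sound" that changes abruptly at x = 0.5 attracts extra samples there.
        DoubleUnaryOperator step = x -> x < 0.5 ? 0.0 : 1.0;
        System.out.println(refine(Arrays.asList(0.0, 1.0), step, 0.5, 0.2));
    }
}
```

The minimum spacing is included only so that the sketch terminates when the model changes discontinuously; the thesis states only that refinement continues until none is required.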

The complete sound model produced by this procedure is easily registered to other modes of the object model since it creates sound models at vertices of the object’s surface model.

Results of some typical data collections are presented in Chapter 8. The accompanying CD contains audio tracks comparing recorded sounds to sounds synthesised from a model, and a Java application for browsing the sound model of an object.


[Flowchart boxes: Start; Position test object; Acquire surface model; Select next sample location; Move sound effector to sample location; Is contact sensed?; Move sound effector back 5 mm; Actuate sound effector and record sample; Create sound model; Repeat M times; Create prototypical model; Compute acoustic distance to neighbouring sample locations; Is distance > threshold?; Refine sample grid; Is grid completely sampled?; End.]

Figure 3.2: General procedure for sound model creation.

Figure 3.3: A test object on the ACME test station.

Chapter 4

A Sound Acquisition Device

4.1 Overview

In order to estimate the parameters of an object’s sound model, a recording of the object being struck with a known force is required. The process of estimating parameters from the recording is described in Section 2.4. Since our goal is to create sound models automatically, we require a device which is capable of striking arbitrary objects in a manner suitable for the estimation process. A device is also required to record the resultant sounds. More detailed requirements of these devices are listed in Section 4.3.

This chapter describes a device that was designed to strike objects for the purpose of sound measurement. The device is an end effector for the Puma 260 robot arm which is part of the Active Measurement facility at UBC. The Active Measurement facility is described in Section 4.4. Details of the end effector and sound capturing hardware are presented in Sections 4.5 and 4.6.

Section 4.7 describes the software required to control the end effector. The software used to record the sound samples is the topic of Chapter 5.



4.2 Related Work

Although this is the first work to automatically create sound models over the entire surface of an object, other people have investigated sound model creation from recordings of real objects. The mechanisms by which these recordings were produced are varied.

Cook and Trueman recorded directional impulses from a number of acoustic instruments by striking the instruments’ strings with a Modal Shop Model 086C80 miniature force hammer [4]. The sound was recorded using a twelve-microphone icosahedral grid assembly. Recordings were stored using two Tascam DA-88 digital audio tape recorders. It was not mentioned how precisely the impact locations were registered in space, since registration was not important to the study. Impacts were performed manually, and the force profile of each impact was used to filter “bad” impacts. The commercial force hammer used by Cook and Trueman is not suited to our task because it cannot effect an impact under its own power. While it could be mounted at the end of a robot arm, the swinging of a hammer is difficult to control.

Durst and Krotkov created impulse response models of different materials by dropping an aluminum cane with a plastic tip onto each material [8]. The cane was dropped from a constant height through a cylindrical guide [17]. Recordings were made using an omni-directional condenser microphone connected to a PC-based A/D board. This method is clearly not suitable for our task.

van den Doel created sound models from impacts with a non-instrumented hammer [30]. Again, the hammer mechanism is not suited to our application; this experiment is mentioned because satisfactory results were obtained with only a crude approximation of the impact force.

Huang uses a tapping device to position and orient objects in a plane [13]. His requirements of the device are similar to ours, except he does not require the device to operate off the horizontal plane. The mechanism he proposes is similar to a pinball plunger; a spring-loaded rod is released by an electric latch, then reloaded automatically [W. Huang, personal communication, April 28, 2000]. Unfortunately, his device may not operate correctly with any vertical inclination.

4.3 Requirements

The striking device must deliver low-inertia impacts to objects at any position on their surface. This requires operation through a µY¶ · ¸ vertical range.

The force of impact should not be so strong as to move the object, yet strong enough to produce a recordable sound. The quantity of force is dependent on the material being measured.

The force must be near-impulsive. A true impulse is not realisable, but may be approximated by a force that is well localised in space and time. As demonstrated by van den Doel, such approximate impulses yield satisfactory sound models [30].

To permit future experiments, the device should be usable for scraping objects. This will permit acquisition of sounds for granular synthesis, another model for sound synthesis.

For integration with other models produced using the Active Measurement facility, the locations of impact must be registered to a common frame-of-reference.

4.4 The Active Measurement Facility

As mentioned in the first chapter, our system is a component of a larger project at UBC: the Active Measurement facility (ACME). The ACME facility provides a rich environment for our measurement system. Currently, ACME consists of three main subsystems: the field measurement system (FMS), test station and contact measurement system (CMS) [21]. These subsystems are used as a base platform for the hardware required by the sound measurement system. The roles of each subsystem are expanded below.

The field measurement system is a five degree-of-freedom (DOF) robot consisting of a 3-DOF gantry and a pan/tilt unit. Currently a 3-CCD colour camera and Triclops trinocular stereo camera are attached to the pan/tilt unit. The FMS is used to acquire measurements at a distance from the test object (i.e., in “the field”). A microphone is added to the FMS for making sound measurements.

The test station is a 3-DOF robot which can position (x, y) and orient objects being measured. Its accuracy is ¹Yº.» º º º ¼ ½ ¾ ¾ and ¹L¿ º arc-min [21].

The contact measurement system consists of a 6-DOF Puma 260 robot arm with an ATI force/torque sensor mounted at its tip. The CMS can move an end effector into contact with the test object in a working volume around the test station. To prepare the CMS for sound measurement, the three fans of the Puma control box were replaced by whisper fans. The Puma arm cannot be used to directly strike the objects because its inertia prevents light, impulsive impacts. Instead, a special end effector is attached. The end effector is described in Section 4.5.

Each of these subsystems is controlled remotely using Java-based control software. For more details on this architecture, refer to [21].

Using the ACME facility as a development platform provides registration of sound samples to other models (e.g., deformation models) since all ACME sensors and actuators share a common frame-of-reference. The sound model is specifically registered to the surface model of the test object.

4.5 Sound Effector

We refer to the end effector designed to strike objects for sound measurement as the sound effector (see Figure 4.1). This device is centered on an electric push-solenoid (Ledex STA model 195025-227). The solenoid is augmented with a return-spring and mounting bracket. A retaining mechanism is designed into the mounting bracket to provide some rigidity in the de-energised state of the solenoid. This rigidity enables the sound effector to be used for scraping objects. The mounting bracket is constructed from aluminum plate.

The current design is minimal so as to reduce the weight of the effector. For example, given that a threaded rod is available on the robot’s interface plate, the design incorporates the rod both for attachment purposes and as the aforementioned retention device. The total torque applied to the rod by the effector is approximately 0.0653 N·m (total weight: 106 g), which is within acceptable limits of the force/torque sensor (500 g). A blueprint of the mounting bracket and details of its construction are included in Appendix A.

Figure 4.1: The sound effector is a spring-return push solenoid mounted on an aluminum bracket. A condenser microphone is attached to the bottom of the bracket.

An interface circuit is required to activate the solenoid from software. The circuit schematic is included in Appendix A. The interface circuit is connected to a digital output of a Precision MicroDynamics MC8 board. The board provides a 5 VDC output controllable from its on-board SHARC DSP or the host computer. Currently the MC8 board is also used to run a PID control loop for the FMS and test station. A description of the software used to control the sound effector is presented in Section 4.7.

4.6 Sound Capture Hardware

The sound capture hardware consists of two condenser microphones and a PC sound card.

The microphones are Optimus omni-directional lapel microphones with a flat frequency response from 70 to 16 000 Hz [14]. One microphone is attached to the bottom of the sound effector’s mounting bracket (Figure 4.1); the other to the pan/tilt unit of the FMS. This placement enables near- and far-field recording of impact sounds. The microphone on the FMS can be moved to any location around the object for experiments evaluating directional impulse responses.

A Creative Sound Blaster Live! card is used to record the sounds digitally. This card is commercially available and can sample at up to 48 kHz [5]. Using a PC sound card facilitates easy and affordable upgrades as technology improves. One disadvantage of the card is that only two channels of sound may be recorded simultaneously. Furthermore, both channels must use the line-in connection, since the microphone connection is single channelled. This restricts our system to using two microphones, both of which must be pre-amplified to line levels. If more input channels are required, a high-end sound card could be purchased.

4.7 Sound Effector Software

Activation of the sound effector requires a unit step signal from the digital output of the MC8 board. The step width determines the stroke distance of the solenoid. Initially, the step function was to be generated in the interface electronics. A software solution, however, enables us to change the width of the unit step, and hence stroke length, at runtime. To control this output using ACME requires a Java interface compliant with the ACME Device interface. This section describes the design of the ACME DigitalOutputConnectionServer (DOCS), an interface between ACME and the digital outputs of the MC8 board. This interface is used by the sound effector, but can also be used by other devices requiring similar control (e.g., lights).

The DigitalOutputConnectionServer has three main components (Figure 4.2): a ConnectionServer, OutputServer and one or more DigitalOutputDevices (e.g., sound effector, or spotlight). The MC8 board has 32 digital output lines available for external devices [23]. The ConnectionServer manages the allocation of each output line to a specific DigitalOutputDevice. The OutputServer is a ‘C’ program (with a Java native interface) which controls the initialisation, timing and output of the signal to the MC8 board. The DigitalOutputDevice is an abstract implementation of the ACME Device interface. Each device using the DigitalOutputConnectionServer is controlled by a class extending DigitalOutputDevice. Each device must lock one channel of the MC8 for its exclusive use by registering with the ConnectionServer at initialisation.

Figure 4.2: Digital Output Connection server architecture. [Diagram components: ACME Server with Experiment and DigOut Device; RMI link; Connect’n Server; Output Server; MC8 hardware.]
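The exclusive allocation of output lines to devices can be sketched as follows. This is a minimal illustration under stated assumptions, not the ACME code: the class `LineAllocator` and its `lock` and `release` methods are hypothetical names, and the RMI layer is omitted. Only the count of 32 output lines is taken from the text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of exclusive output-line allocation.
// Names are illustrative and are not the ACME API.
class LineAllocator {
    static final int NUM_LINES = 32; // digital output lines on the MC8 board

    // Maps each locked line to the name of the device that holds it.
    private final Map<Integer, String> owners = new HashMap<>();

    // Locks a line for one device; returns false if it is already taken.
    synchronized boolean lock(int line, String deviceName) {
        if (line < 0 || line >= NUM_LINES) {
            throw new IllegalArgumentException("no such output line: " + line);
        }
        if (owners.containsKey(line)) {
            return false;
        }
        owners.put(line, deviceName);
        return true;
    }

    // Releases a line, but only if the caller is the device holding it.
    synchronized boolean release(int line, String deviceName) {
        return owners.remove(line, deviceName);
    }

    public static void main(String[] args) {
        LineAllocator allocator = new LineAllocator();
        System.out.println(allocator.lock(0, "soundEffector")); // first request succeeds
        System.out.println(allocator.lock(0, "spotlight"));     // line 0 is already held
    }
}
```

Synchronising the two methods reflects the fact that several devices may register concurrently; how the real ConnectionServer serialises requests is not stated in the text.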

The OutputServer and ConnectionServer run as separate processes from the ACME server. Although the OutputServer and ConnectionServer reside on the Solaris computer hosting the MC8 board, devices can be controlled from ACME experiments running on any computer because the DigitalOutputDevices communicate with the ConnectionServer using the Java Remote Method Invocation (RMI) interface.

As mentioned above, timing of the unit step is controlled by the OutputServer. This ‘C’ program provides resolution better than a millisecond, limited only by the Solaris operating system. Isolating the timing from the ACME experiment ensures consistent timing of the output signal.

Chapter 5

An Asynchronous Data Server

5.1 Overview

The next module of the system is the software used to record sounds produced by striking the test object. Previously, no generic architecture existed within ACME for capturing streaming data. As with the DigitalOutputConnectionServer (see Section 4.7), a generic data server was designed, and then specialised to sound data for this research. The generic data server framework became the Sensor class of the ACME project.

This chapter describes the design and implementation of an asynchronous data server for ACME, including its specialisation to sound data. The next section lists the requirements of such software. Section 5.3 describes the architecture of the data server. Implementation details are discussed in Section 5.4.

5.2 Requirements

This section outlines the requirements of a generic data server (Section 5.2.1) and its specialisation to sound data (Section 5.2.2).



5.2.1 Generic Data Server Requirements

One data server must exist for each sensor device. If multiple sensor devices of the same data type exist, each sensor must have its own data server process.

Each data server will run as a separate process. This division enables distribution of the server processes over multiple computers. If the sensor hardware resides on a computer that is not the main ACME host, the sensor process should also reside on that computer. Since data collection can be an intensive operation, distribution increases the data servers’ ability for real-time collection by allocating more processing resources. Distribution also reduces the amount of data which must be streamed over the network in real-time.

If platform-dependent software is required for a data server, it must not prevent the ACME experiment from accessing the data from a different operating system.

The data server must be asynchronous. That is, the data server process controls the starting and stopping of data collection independently of the main ACME server. This autonomy eliminates the need for the data server to stream data to the ACME server in real-time. Since sensor data can be large (e.g., image data from cameras) or frequent (e.g., 44.1 kHz for sound), it is impractical to transmit each frame of data to the ACME server for real-time monitoring.

The criteria for starting and stopping data collection must be definable by the ACME experiment.

A method must be provided to the ACME experiment that indicates when data is being collected and when it has finished.

5.2.2 Sound Server Requirements

The sound server must capture data at a minimum of two sampling rates: 44 100 Hz and 22 050 Hz. If the capturing hardware supports higher sampling rates, the sound server should accommodate them through user-definable properties. The format of the data may be 8 or 16-bit, with one or more channels (to the limitations of the capturing hardware).

Since PC sound cards are readily available and of sufficient quality for our purposes, the sound server will capture data from a commercial sound card.

5.3 Server Architecture

The data server architecture has four main components: sensor hardware, a SensorServer, a SensorDevice, and a data stream connecting the SensorServer and SensorDevice (Figure 5.1). The sensor hardware component is an abstraction of both the physical hardware and device drivers. Typically, a native interface is created to allow the device to communicate with other Java components. Also, although incoming data is typically buffered at the hardware level, frames of data may be dropped if the buffer is read too slowly. The SensorServer is a process which resides on the computer hosting the sensor hardware. This process queries the sensor hardware at a prescribed rate, starts and stops data capture, and buffers the incoming data. The SensorDevice is a remote interface to the SensorServer. It communicates with the SensorServer via the Java Remote Method Invocation (RMI) protocol. The SensorDevice resides in the ACME server and is the ACME experiment’s link to the SensorServer. A data stream connects the SensorServer and SensorDevice. The stream uses socket communication to pass data from the SensorServer to the SensorDevice. Data is written to the stream by the SensorServer and buffered until it is read by the SensorDevice. Thus, data can be reliably streamed to the ACME experiment at a rate slower than its acquisition at the hardware device.

This architecture enables us to acquire data from any computer, regardless of its platform. Only the SensorServer uses platform-dependent native code; the SensorDevice can reside on any remote computer that hosts a Java Virtual Machine. Since each SensorServer is written for a specific device operating on a fixed platform, this dependence is not restrictive.

Not all data is passed from the sensor hardware to the ACME experiment. Users define the criteria for starting and stopping data capture by creating custom SensorTriggers. One SensorTrigger is created to start capturing, and one to stop capturing. These SensorTriggers can be monitored by other processes at runtime by registering a TriggerListener object. The TriggerListener object will be notified when a SensorTrigger is activated. This mechanism provides a rough synchronisation tool to the ACME experiment. The next section clarifies the role of the SensorTriggers in server execution.

Figure 5.1: The architecture of the data server is divided into four main components: sensor hardware, SensorServer, SensorDevice and a data stream. The SensorServer runs as a separate process and is not necessarily run on the same computer as the ACME server.

5.3.1 Server Execution

This section describes the operation of the data server. Its operation can best be viewed as the state diagram in Figure 5.2. The server begins in a LIMBO state. This is a default state indicating that it has been created, but not yet initialised. The ACME server initialises the data server at startup. The data server then initialises sensor hardware and creates required data buffers. After initialisation, the data server waits in a STOPPED state.

The RUNNING state is entered by a request from the ACME experiment to start spooling data. In the RUNNING state, the data server executes a control loop at a rate prescribed by the ACME experiment. The control loop is listed in Figure 5.3. At each iteration, the next frame of data is requested from the sensor hardware. Here, a frame is defined as a data measurement for one time period (e.g., one image from a camera, or one 6-value reading from a force/torque sensor). The current frame is passed with a window of previous data to the START trigger. If the conditions for the START trigger are satisfied by the current frame, the data server is placed into the CAPTURING state. Otherwise, the RUNNING control loop repeats.

[State diagram: LIMBO to STOPPED on init(); STOPPED to RUNNING on startSpooling(); RUNNING to CAPTURING on START trigger; CAPTURING to STOPPED on STOP trigger.]

Figure 5.2: Data server state diagram.

1. Read next frame from sensor hardware.

2. Evaluate frame in START trigger.

3. If START trigger fires, change state to CAPTURING.

4. Otherwise, loop to step 1.

Figure 5.3: Control loop for RUNNING state.

Once the START trigger has been activated, the data server is in the CAPTURING state. The CAPTURING state runs a control loop similar to that for the RUNNING state, with the exception that data is passed on to the data stream (Figure 5.4). Data is read one frame at a time from the sensor hardware, then analysed by the STOP trigger. If the conditions of the STOP trigger are met, the data server enters the STOPPED state. Otherwise, the data is added to the stream and the control loop repeats. Section 5.3.2 elaborates on the flow of data throughout this process.

1. Read next frame from sensor hardware.

2. Evaluate frame in STOP trigger.

3. If STOP trigger fires, change state to STOPPED.

4. Otherwise, send frame to data stream.

5. Loop to step 1.

Figure 5.4: Control loop for CAPTURING state.

Figure 5.5: Server data flow.
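The state transitions and the two control loops described in this section can be condensed into a small sketch. It is an illustration under simplifying assumptions, not the ACME Sensor class: a frame is reduced to a single `double`, triggers are plain predicates on the current frame (no window of previous data), the data stream is an in-memory list, and the names `DataServerSketch`, `onFrame` and `captured` are hypothetical. As in the loop of Figure 5.3, the frame that fires the START trigger is not itself written to the stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoublePredicate;

// Hypothetical sketch of the four-state data server; not the ACME implementation.
class DataServerSketch {
    enum State { LIMBO, STOPPED, RUNNING, CAPTURING }

    private State state = State.LIMBO;
    private final DoublePredicate startTrigger;
    private final DoublePredicate stopTrigger;
    // Stands in for the socket-based data stream.
    private final List<Double> dataStream = new ArrayList<>();

    DataServerSketch(DoublePredicate startTrigger, DoublePredicate stopTrigger) {
        this.startTrigger = startTrigger;
        this.stopTrigger = stopTrigger;
    }

    // LIMBO -> STOPPED: hardware and buffers would be initialised here.
    void init() {
        if (state == State.LIMBO) {
            state = State.STOPPED;
        }
    }

    // STOPPED -> RUNNING: the experiment asks the server to start spooling.
    void startSpooling() {
        if (state == State.STOPPED) {
            state = State.RUNNING;
        }
    }

    // One control-loop iteration for a frame just read from the sensor hardware.
    void onFrame(double frame) {
        switch (state) {
            case RUNNING:
                if (startTrigger.test(frame)) {
                    state = State.CAPTURING;
                }
                break;
            case CAPTURING:
                if (stopTrigger.test(frame)) {
                    state = State.STOPPED;
                } else {
                    dataStream.add(frame);
                }
                break;
            default:
                break; // LIMBO and STOPPED ignore incoming frames
        }
    }

    State state() {
        return state;
    }

    List<Double> captured() {
        return dataStream;
    }
}
```

In use, an experiment would call `init()`, then `startSpooling()`, and the server would call `onFrame` once per frame; capture ends on its own when the STOP trigger fires.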

5.3.2 Data Flow

The preceding discussion of server control loops presented a simplified view of the flow of data through the system. More details are included in this section. Figure 5.5 illustrates the flow of data through the server. Both the RUNNING and CAPTURING control loops use the same data flow structure to the point indicated in Figure 5.5.

At each iteration of the control loops, one frame of data is read from the sensor hardware. As mentioned in Section 5.3, the sensor hardware typically provides a data buffer; this occurs at either the hardware or device driver level. This buffer is assumed to overwrite old data as the buffer fills. Thus, if data is not read quickly enough, frames may be lost. It is assumed that each frame read has not been read previously.

This frame of data is copied into a ring buffer that resides in the SensorServer. The SensorServer guarantees that the data in the ring buffer will not be overwritten until it has been examined.

The ring buffer is passed to the trigger object (either START or STOP depending on the system state). The trigger has a defined window size for looking at the data. This window size must be smaller than the size of the ring buffer and greater than or equal to one frame. By defining a window size greater than one frame, the trigger can perform time-domain filtering of the signal, or use a time-dependent trigger condition. An example of such a trigger is one that is activated by a signal that is greater than the average of the past five frames.
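The example trigger mentioned above might look like the following sketch. It assumes, for illustration only, that each frame is a single amplitude value and that the ring buffer is handed to the trigger as an array ordered oldest first; the class name and the `fires` method are hypothetical and are not the ACME Trigger interface.

```java
// Hypothetical sketch of a windowed trigger; not the ACME Trigger interface.
class AboveAverageTrigger {
    private final int windowSize; // number of past frames to average

    AboveAverageTrigger(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("window must hold at least one frame");
        }
        this.windowSize = windowSize;
    }

    // frames: buffered history, oldest first, with the newest frame last.
    // Fires when the newest frame exceeds the mean of the windowSize frames
    // that immediately precede it.
    boolean fires(double[] frames) {
        int newest = frames.length - 1;
        if (newest < windowSize) {
            return false; // not enough history buffered yet
        }
        double sum = 0.0;
        for (int i = newest - windowSize; i < newest; i++) {
            sum += frames[i];
        }
        return frames[newest] > sum / windowSize;
    }

    public static void main(String[] args) {
        AboveAverageTrigger trigger = new AboveAverageTrigger(5);
        System.out.println(trigger.fires(new double[] {1, 1, 1, 1, 1, 2}));
    }
}
```

A trigger of this kind adapts to a slowly varying background level, which a fixed threshold cannot do.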

If the server is in the RUNNING state, the data flow ends here. In the CAPTURING state, the new frame of data is appended to a transfer buffer. Once the transfer buffer is full, it is written to the data stream. The size of the transfer buffer is controlled by the user. For data which is sampled at a high rate, it may be desirable to buffer several hundred frames before writing to the stream. This reduces the amount of communication overhead needed to transmit each frame over the network.

The ACME experiment can monitor the data capture by querying the data stream. Once the data capture is complete, the data stream’s end-of-file flag is raised.

5.3.3 Specialisation to Sound Data

The ACME sound sensor server is a specialisation of the generic data server model. The specialisation is straightforward with one exception that is explained below.

Sound data is typically captured from the sound card at a frame rate of 44 100 Hz. One frame of sound data contains one or two channels of 16 or 8-bit sound samples (Figure 5.6).


Figure 5.6: An AudioPacket contains multiple frames of audio data. Each audio frame contains one or more channels of data. Each channel contains one audio sample. Here, two channels (L, R) of 16-bit (2-byte) samples are illustrated.

As explained in Section 5.3.2, a lot of processing and data movement occurs with the receipt of each frame of data. Although data is passed by reference where possible, there are inevitably data copies and memory allocations when new data is passed on to the data stream. There is also some computational overhead each time the trigger analyses the data.

The generic server model cannot process incoming data at audio frame rates; because data is not read quickly enough from the hardware, frames are dropped. To achieve desired frame rates, audio data must be processed in groups of more than one audio frame. A simple redefinition of “frame” suffices to achieve real-time performance with the generic data server model. Here we define an AudioPacket to be an array of one or more audio frames (Figure 5.6). The AudioPacket becomes the conceptual “frame” that is read from the sound sensor hardware by the sound server. Performance of the sound server using AudioPackets is discussed in Section 5.4.1.
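The effect of grouping audio frames can be made concrete with a little arithmetic. The sketch below is only a worked example: the class and method names are hypothetical, and the packet size of 100 frames is chosen for illustration. It computes the size of one packet and the number of control-loop iterations per second that result.

```java
// Hypothetical helper for AudioPacket arithmetic; names are illustrative only.
class AudioPacketMath {
    // Bytes in one audio frame: one sample per channel.
    static int bytesPerFrame(int channels, int bitsPerSample) {
        return channels * (bitsPerSample / 8);
    }

    // Bytes in one packet of the given number of frames.
    static int packetBytes(int framesPerPacket, int channels, int bitsPerSample) {
        return framesPerPacket * bytesPerFrame(channels, bitsPerSample);
    }

    // Control-loop iterations per second when each iteration reads one packet.
    static double packetsPerSecond(double frameRateHz, int framesPerPacket) {
        return frameRateHz / framesPerPacket;
    }

    public static void main(String[] args) {
        // Two-channel, 16-bit sound at 44 100 Hz in packets of 100 frames.
        System.out.println(packetBytes(100, 2, 16) + " bytes per packet");
        System.out.println(packetsPerSecond(44100.0, 100) + " packets per second");
    }
}
```

With these settings the control loop runs 441 times per second on 400-byte packets, instead of 44 100 times per second on 4-byte frames, so the fixed per-iteration cost of copying and trigger evaluation is paid far less often.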

5.4 Implementation Details

The generic data server model is an abstract Java class, with interfaces defined for Sensor and Trigger objects. The sound server is also written in Java with the exception of the code that reads data from the sound card.

Originally, data was to be read from the sound card using the JavaSound API from Sun Microsystems. At the time of development, JavaSound was a Beta release (version 0.86). Not all required functionality was available, and frequent changes to the API rendered this option undesirable.

Currently, data is read from the sound card using the Microsoft DirectX API (version 6.0). A Java wrapper was written to interface the DirectX ‘C’ code with the rest of the sound server. Every effort was taken to create a Java wrapper that paralleled the JavaSound API so that it could be substituted at a later date.

Two triggers are used to capture sound for modelling: a ThresholdTrigger and a FixedDurationTrigger. The ThresholdTrigger is used as a START trigger to begin capturing data after the amplitude of the signal exceeds a threshold. The FixedDurationTrigger is used as a STOP trigger to make each recording the same length. This combination of triggers yields recordings of the impulse responses that are identical in length and contain approximately the same length of silence before each impact.
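The behaviour of these two triggers can be sketched as follows. The classes below are assumptions made for illustration and do not reproduce the ACME ThresholdTrigger or FixedDurationTrigger: a packet is taken to be an array of mono samples normalised to the range -1 to 1, and each trigger exposes a single hypothetical `fires` method that is called once per packet.

```java
// Hypothetical START trigger: fires when a packet contains a loud sample.
class ThresholdTriggerSketch {
    private final double threshold;

    ThresholdTriggerSketch(double threshold) {
        this.threshold = threshold;
    }

    // Fires if any sample in the packet exceeds the threshold in magnitude.
    boolean fires(double[] packet) {
        for (double sample : packet) {
            if (Math.abs(sample) > threshold) {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical STOP trigger: fires once a fixed duration has been seen.
class FixedDurationTriggerSketch {
    private final long framesToCapture;
    private long framesSeen = 0;

    FixedDurationTriggerSketch(double seconds, double frameRateHz) {
        this.framesToCapture = Math.round(seconds * frameRateHz);
    }

    // Counts the frames in each packet (one sample per frame, mono assumed)
    // and fires when the requested duration has been reached.
    boolean fires(double[] packet) {
        framesSeen += packet.length;
        return framesSeen >= framesToCapture;
    }
}
```

Because both triggers operate on whole packets in this sketch, the start and end of a recording are resolved only to the nearest packet boundary.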

5.4.1 Performance Evaluation

The sound server was subjected to a performance evaluation to guarantee the integrity of the recordings, and estimate bounds on the size of the AudioPackets and transfer buffer. A one-second 1 kHz sinusoidal tone was used as the test stimulus. A cable connected the line-level output of one computer (a dual PII 350 MHz running Windows NT with a Creative Sound Blaster Live! sound card) to the line-level input of a second computer (a Pentium 120 MHz running Windows 98 with a Creative Sound Blaster Live! sound card). The sound server ran on the second computer and transferred data to a client running on the first computer over an Ethernet connection. A FixedDurationTrigger was used as a STOP trigger to record 2.5 seconds of sound for each trial. In each trial, the size of either the transfer buffer or AudioPackets was varied. The test was run five times for each parameter setting with the one-second tone played at a random start time within the 2.5 second recording window. The recorded sound was saved as a PCM wave file and examined visually and audibly for evidence of degradation. Recordings were captured at a 44.1 kHz sampling rate.

Figure 5.7: Spectrogram (time against frequency) of sound containing “pops”. Three “pops” are visible in the spectrogram as vertical lines at approximately 1.4, 1.5 and 1.65 seconds. These “pops” are caused by dropped audio frames. This sample was recorded using a transfer buffer of one second, and an AudioPacket size of 200. The recording parameters were two-channel, 16-bit sound sampled at 44.1 kHz.

Two criteria are used to subjectively measure performance: the length of the recorded tone, and the presence or absence of “pops”. The length of the recorded tone is computed manually using a graphical sound editor. If the recorded tone is less than one second, the parameter settings are designated unsatisfactory. Smaller gaps in the recording are experienced as “pops” when heard, and observed as spikes in the spectrogram (Figure 5.7). Parameter settings producing “pops” are also unsatisfactory.

Results of this evaluation are tabulated in Table 5.1. From this table, it is evident that the audio signal will lose large chunks of data if the AudioPacket size is not great enough. Similarly, “pops” will be present in the recorded data unless the transfer buffer is large enough. From this data, it is concluded that a transfer buffer of two seconds and an AudioPacket size of 100 frames are required for adequate performance.


Channels   Bits   AudioPacket size   Transfer buffer size (secs)   Too short   “Pop”s
   1        16            1                      1                     Y          N
   1        16           10                      1                     Y          N
   1        16          100                      1                     N          Y
   1        16          200                      1                     N          N     *
   2        16            1                      1                     Y          N
   2        16           10                      1                     Y          N
   2        16          100                      1                     Y          N
   2        16          200                      1                     N          Y
   2        16         1000                      1                     N          Y
   2        16            1                      2                     Y          N
   2        16           10                      2                     Y          N
   2        16          100                      2                     N          N     *
   2        16          200                      2                     N          N     *

Table 5.1: Results of the sound server performance evaluation. Rows marked * (highlighted in the original) indicate parameter settings that are successful. Recordings were captured at a sampling rate of 44.1 kHz on a Pentium 120 MHz running Windows 98.

5.4.2 Limitations

While performance of the sound server is acceptable, one limitation remains. Using the DirectX v6.0 API prevents us from running the sound sensor server on operating systems other than Microsoft Windows 95/98/2000. As mentioned in Section 5.3, this does not prevent us from streaming data to clients on other computing platforms, but restricts us to Windows-compatible sound cards. Currently, Windows is the most widely-supported operating system for sound cards, so this is not a major limitation. Also, the DirectX Java interface maintains the JavaSound design and class structure to enable easy substitution once the JavaSound API is complete.

Chapter 6

Building a Prototypical Model

6.1 Overview

The hardware and software described in the previous two chapters enable us to acquire multiple samples at any location on the surface of an object. These samples are each represented by a sound model. The sound model may be any one of the impulse response models discussed in Chapter 2. The parameters of the model are computed for each sample using the appropriate estimation technique. In our current implementation, the model of van den Doel [30] represents each sample; the parameters of the model are computed by the technique presented in [30]. Refer to Section 2.4 for a brief description of this technique.

The advantage of acquiring multiple samples at the same location is that instrumentation and background noise can be minimised by averaging the samples. A prototypical model, representative of multiple models, can be created at each sample location. The prototypical model should be less affected by noise than each individual model. Such a prototypical model is used for comparing the acoustic distance between two sample locations; this use is discussed in Chapter 7. A prototypical model is also used for synthesis when the object is simulated.

This chapter outlines one approach to generating a prototypical model from the available data. The approach is an intuitive one and produces satisfying results. While there are other solutions to this problem which may produce better results, our intention is to provide one example by which to demonstrate the utility of such models.

6.2 Related Work

Although our exact problem appears to be unique, two other fields of audio research have produced related work: speech and speaker recognition and audio morphing. Neither of these fields shares our exact goal, but each is similar in some respect.

Speech and speaker recognition researchers have investigated methods for clustering sets of audio data. Their problem is one of classic pattern recognition: given a set of N categories (e.g., different speakers or words), and M training exemplars for each category, classify a new data sample into one of the categories. At a high level, this is most often accomplished by creating a prototype for each category from the training set, then computing a distance from the new data to each of the prototypes. These prototypes are typically formed using a modified K-means algorithm on the vectors resulting from Linear Predictive Coding (LPC) analysis [24]. Similar methods were implemented on our sound models, but the variability of our models (due to noise) caused unsatisfactory results.

Audio morphing is a process that smoothly blends one sound into another. If you imagine a continuum between two sounds, an audio morphing algorithm can generate a sound at any point on the continuum. This sound contains a proportion of each sound, yet is perceived as a single new sound. Slaney et al. describe one such method in [27]. They claim that cross-fading conventional spectrograms is not convincing if the two source sounds are not similar in pitch. Their approach is to represent sounds using a smooth spectrogram (derived from the mel-frequency cepstral coefficients [24]) and a residual spectrogram which encodes the pitch. Interpolation occurs in this higher-order spectral representation, which is then inverted to produce the resultant sound. If their morphing algorithm could be extended to morph between more than two sounds, this approach could be used to generate our prototypical sound models by estimating model parameters from the composite sound produced by the morph. Indeed, a morphing algorithm is a powerful tool in its ability to weight the contribution of each sound to the morphed result.

6.3 Spectrogram Averaging

The approach we use is similar in spirit to the morphing algorithm of [27], but much less sophisticated. We are able to use a simpler algorithm because we are dealing with a narrow class of sounds: single impact sounds which decay exponentially and are similar in pitch. Our approach is to compute the “average” spectrogram of M samples at one sample location, then estimate the model parameters from this “average” spectrogram. The approach is intuitively satisfying and produces reasonable sound models.

Computing the average time signal of sound samples is non-trivial. Because the modes of the impulse response may not be in phase across samples, exact alignment in time is difficult. Phase differences introduced by inexact alignment will create a signal which sounds like many separate sounds played together.

In contrast, computing the average spectrogram is easier since the spectrogram contains no phase information. Furthermore, the average spectrogram is a natural representation for our task, given that the model parameters are estimated from spectrograms. Since the samples are recorded at the same sample location, we assume the samples have the same pitch, thereby avoiding the problems indicated by Slaney et al. [27].

Aligning the M spectrograms is straightforward. Since each spectrogram represents an impact sound, they can be aligned by matching the onset of impact. This onset is represented in the spectrogram as the time frame with the maximum total amplitude. Once aligned, the average spectrogram is computed by the mean amplitude of each frequency in each time frame.
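The alignment and averaging steps can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes each spectrogram is a NumPy magnitude array with frequencies along the rows and time frames along the columns, and it assumes that frames before the onset can simply be discarded. The function names are hypothetical.

```python
import numpy as np


def onset_frame(spec):
    """Index of the time frame with the maximum total amplitude."""
    return int(np.argmax(spec.sum(axis=0)))


def average_spectrogram(specs):
    """Align magnitude spectrograms (freq x time) at their onsets and average."""
    onsets = [onset_frame(s) for s in specs]
    # Keep only as many post-onset frames as every sample can provide.
    length = min(s.shape[1] - o for s, o in zip(specs, onsets))
    aligned = [s[:, o:o + length] for s, o in zip(specs, onsets)]
    # Mean amplitude of each frequency in each time frame.
    return np.mean(aligned, axis=0)
```

Truncating every sample to the shortest post-onset length keeps the mean well defined in every frame; padding with zeros instead would bias the tail of the average towards silence.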

Since the energy of each impact is not exact, the energy of each signal is normalised prior to computing its spectrogram (Equation 6.1). Energy normalisation ensures that $\sum_t \hat{s}(t)^2 = 1$.

\[ \hat{s}(t) = \frac{s(t)}{\sqrt{\sum_t s(t)^2}} \qquad (6.1) \]

Mathematically, our approach is also satisfying. If we assume we are recording a signal $y_i(t)$ which is composed of the true signal $s(t)$ and a random additive noise process $n_i(t)$, the development in Equation 6.2 shows that the average spectrum $\bar{Y}(\omega)$ is identically equal to the true spectrum $S(\omega)$ if we also assume a zero-mean noise process¹, so that the averaged noise term $\bar{N}(\omega)$ vanishes.

\[
\begin{aligned}
y_i(t) &= s(t) + n_i(t) \\
Y_i(\omega) &= S(\omega) + N_i(\omega) \\
\bar{Y}(\omega) &= \frac{1}{M} \sum_{i=1}^{M} Y_i(\omega) \\
 &= \frac{1}{M} \sum_{i=1}^{M} S(\omega) + \frac{1}{M} \sum_{i=1}^{M} N_i(\omega) \\
 &= S(\omega) + \bar{N}(\omega)
\end{aligned}
\qquad (6.2)
\]
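The effect described by Equation 6.2 can be checked numerically. The sketch below is only an illustration under assumed values: a single decaying sinusoid stands in for the true signal, Gaussian white noise stands in for the noise process, and energy normalisation is taken to mean scaling the signal so that its squared samples sum to one. None of these choices come from the experiments in this thesis.

```python
import numpy as np


def normalise_energy(signal):
    """Scale a signal so that the sum of its squared samples is 1."""
    return signal / np.sqrt(np.sum(signal ** 2))


rng = np.random.default_rng(0)
t = np.arange(2048) / 44_100.0
# Hypothetical "true" signal: one exponentially decaying mode at 1 kHz.
true_signal = np.exp(-30.0 * t) * np.sin(2 * np.pi * 1000.0 * t)
true_spectrum = np.fft.rfft(true_signal)


def mean_spectrum_error(m, noise_std=0.05):
    """Distance between the true spectrum and the mean of m noisy spectra."""
    noisy = [true_signal + noise_std * rng.standard_normal(t.size)
             for _ in range(m)]
    mean_spectrum = np.mean([np.fft.rfft(y) for y in noisy], axis=0)
    return float(np.linalg.norm(mean_spectrum - true_spectrum))
```

Calling `mean_spectrum_error` with increasing `m` shows the residual noise term shrinking as more samples are averaged, which is the behaviour the zero-mean assumption predicts.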

6.4 Performance Evaluation

To evaluate the effectiveness of spectrogram averaging, a test was conducted using synthetic data. This test is similar to the evaluation of the parameter estimation algorithm (Section 2.4.1).

For this evaluation, M samples were created for each trial by adding noise to sounds synthesised using fifty random modes. A description of the noise and synthesised sounds is found in Section 2.4.1. Eight signal-to-noise ratios were used: ∞ (i.e., no noise), 100, 50, 30, 20, 15, 10 and 5. Averaging over the M samples produced a spectrogram from which model parameters were estimated. The experiment was conducted using five values of M: 1 (control), 2, 5, 10 and 20. One hundred trials were conducted for each pairing of SNR and M values.

The resulting sound models are evaluated by the same measures as Section 2.4.1; the metrics are repeated in Equation 6.3.

¹ This characterisation of noise is a common assumption, though unlikely to be correct in our situation.


ó¤ôöõ¦÷ ø ùjúô ûô üó¤ýöõ¦÷ ø ù úý ûý üó¤þöõ¦÷ ø ù úþ ûþ ü

(6.3)

where ÿ is chosen to minimise ÿ õ .

The mean error for each signal-to-noise ratio (SNR) is plotted by the dashed lines in Figure 6.1. The solid lines are the mean error over all eight SNRs. The colour of each line indicates the number of samples (M) used in the spectrogram average. As Figure 6.1 shows, spectrogram averaging substantially reduces the mean error for initial amplitude and frequency estimates. Even averaging two samples yields improvements of 20% and 12% on $E_a$ and $E_f$ respectively.

The mean error of the damping parameter is not significantly reduced by spectrogram averaging. The standard deviation of the error is, however, dramatically reduced with increasing values of M. As Figure 6.2 illustrates, the standard deviation of error is reduced by an average of 61% using only five samples. This result implies that spectrogram averaging yields a more consistent estimation of the damping parameter.


[Figure 6.1 plots: three panels of error against noise/signal (0 to 0.2), with one line per number of samples (1, 2, 5, 10 and 20 samples). (a) Mean initial amplitude errors ($E_a$). (b) Mean frequency errors ($E_f$). (c) Mean damping errors ($E_d$).]

Figure 6.1: Mean errors of the parameter estimation algorithm on spectrogram-averaged synthetic data. The mean error over 100 trials is plotted for eight signal-to-noise ratios (∞, 100, 50, 30, 20, 15, 10 and 5) by the dashed lines. A solid line is the mean error over all signal-to-noise ratios. The colour of each line signifies the number of samples (M) used in the average. For convenience, the inverse ratio (noise/signal) is used as the abscissa. See Equation 6.3 for definitions of $E_a$, $E_f$ and $E_d$.


[Figure 6.2 plot: standard deviation of $E_d$ (0 to 450) against noise/signal (0 to 0.2), with one line per number of samples (1, 2, 5, 10 and 20 samples).]

Figure 6.2: Standard deviation of error for estimated damping parameters. The standard deviation of the mean error over 100 trials is plotted for eight signal-to-noise ratios (∞, 100, 50, 30, 20, 15, 10 and 5) by the dashed lines. A solid line is the mean standard deviation over all signal-to-noise ratios. The colour of each line signifies the number of samples (M) used in the average. For convenience, the inverse ratio (noise/signal) is used as the abscissa. See Equation 6.3 for the definition of $E_d$.

Chapter 7

An Adaptive Sampling Algorithm

7.1 Overview

To completely automate the creation of sound models from measurements, the selection of sampling locations must also be automatic. By sampling location we mean the location on the surface of an object where it is to be struck by the sound effector. The most intuitive approach is to select a uniform grid over the surface of the object. Without knowledge of the object's shape, a uniform grid in Cartesian space could also be considered. This was demonstrated in [25] and [26].

As part of the Active Measurement facility, we assume that a surface model of test objects is attainable using the Triclops trinocular camera¹. With a surface model available, a uniform grid can be projected, and the sample locations uniformly distributed.

One question inevitably arises when generating the uniform grid: how finely should we sample? Many everyday objects are composed of different materials (e.g., a glass jar with a metal lid), or have density, volume and thickness variations across their surface. These properties will cause the sound of the object to change perceptually over small regions of the surface. Other objects (e.g., an eraser) may have a relatively constant sound over the entire surface. Consequently, the selection of a multi-purpose sampling density is non-trivial. It cannot be constant, nor can it be pre-determined by object geometry alone.

¹ Currently, software to generate surface models from the Triclops' range data is not available. This is ongoing work of the ACME project. For the examples in this thesis, surface models of test objects are constructed manually.

1. Select an unsampled vertex as the next sample location.

2. Strike the object at the sample location and create a sound model.

3. Compare the new model to the models at all adjoining sample locations.

4. If the acoustic distance between two adjoining models is greater than a perceptual threshold, add a new vertex between these two.

5. Otherwise, repeat from Step 1 until no vertices are unsampled.

Figure 7.1: Adaptive sampling algorithm.

The algorithm presented in this chapter changes the density of a sampling mesh based on perceived differences in sound over the surface of an object. This algorithm is straightforward and is listed in Figure 7.1. Here, the sample locations are chosen as the vertices of a surface model representing the test object. At each vertex, a sound model is created. If the acoustic distance between this model and one at an adjoining sample location is greater than a perceptual threshold, a new sample location is added between the two (see Figure 7.2). The new sample location should be a vertex in the refinement of the surface. This procedure continues until the acoustic distance between all adjoining models is less than a perceptual threshold.
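The loop of Figure 7.1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system's code: `measure`, `distance` and `midpoint` are hypothetical callbacks standing in for the robotic measurement, the acoustic distance metric and the surface refinement rule, vertices are visited in simple first-in first-out order, and `max_samples` is an added guard so that the sketch always terminates.

```python
from collections import deque


def adaptive_sampling(vertices, edges, measure, distance, threshold,
                      midpoint, max_samples=1000):
    """Sample every vertex and refine edges whose end models differ too much."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    models = {}
    pending = deque(vertices)
    while pending and len(models) < max_samples:
        v = pending.popleft()                  # Step 1: next unsampled vertex
        if v in models:
            continue
        models[v] = measure(v)                 # Step 2: strike and build a model
        for u in list(adjacency[v]):           # Step 3: compare with neighbours
            if u not in models:
                continue
            if distance(models[v], models[u]) > threshold:
                w = midpoint(u, v)             # Step 4: refine the edge (u, v)
                if w in adjacency:
                    continue
                adjacency[u].discard(v)
                adjacency[v].discard(u)
                adjacency[w] = {u, v}
                adjacency[u].add(w)
                adjacency[v].add(w)
                pending.append(w)
    return models                              # Step 5: stop when nothing is pending
```

For example, on a one-dimensional "surface" whose sound changes linearly with position, a threshold of 0.3 refines the single edge from 0 to 1 until neighbouring samples are 0.25 apart.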

Selection of unsampled vertices in the first step of the algorithm is conducted by any heuristic rule. Currently, the vertex nearest the previously sampled vertex is selected. Other possible heuristics include selecting vertices by their connectivity in the mesh or selecting vertices to minimise the amount of movement by the robots.

The algorithm in Figure 7.1 is simplistic, but relies on three complex components: a procedure for adding new vertices to the surface model, an acoustic distance metric and a perceptually-relevant threshold for this metric. This chapter describes one implementation of each of these components. Note that these components are research topics on their own and it is not presumed that the implementations discussed herein are optimal solutions. The algorithm in Figure 7.1 is applicable to any vertex insertion rule, distance metric and threshold, and should be considered the principal result of this chapter.

Figure 7.2: Result of adaptive sampling. The sound models at two adjoining sample locations are compared (left). If the acoustic distance between the vertices is too great, the surface edge is refined by inserting a new vertex (right).

7.2 Related Work

To our knowledge, this is the first algorithm for adaptively selecting locations on the surface of an arbitrary object for purposes of sound modelling. Previous work on modelling of musical instruments from empirical measurements is vague in its selection of sample locations; two examples are [2] and [4]. In [2] a guitar is “thumped” on the bridge by a “sharp instrument” to record measurements for estimates of a body model. A physics-based model was used to estimate the sample location of plucked strings from the recordings. In [4], strings of musical instruments were struck using a force hammer at the point on the instruments' bridge where the strings made contact.

Rules for the addition of new vertices to a surface have been explored in multi-resolution surface research. Multi-resolution surfaces are appealing in computer graphics because their level-of-detail nature simplifies editing and provides scalable render quality which can speed computer animation development. We have selected a subdivision surface representation, using the Loop [19] scheme for vertex insertion. A brief description is provided in Section 7.4.

Distance metrics for sound have been explored in several domains. Early efforts in this research were in the field of speech recognition. A classic survey of techniques is [11].

Dubnov and Tishby explored comparing musical sounds using a spectral and bispectral acoustic distortion measure [7]. Their approach is to measure statistical similarity using the Kullback-Leibler divergence between models.

More recent work, which focused on creating a “content-based” sound browser (MuscleFish), uses two levels of features to form a complex parameter space within which distances are computed [15, 33]. One set of features is extracted from each frame of audio data: loudness, pitch, brightness, bandwidth and mel-filtered cepstral coefficients (MFCCs). The time series of these frame values provides a second set of features: the mean, standard deviation and derivative of each frame-level parameter.

To our knowledge, none of the metrics listed above have been used to determine perceptual thresholds on similarity. By “similarity” we mean in the context of perceiving two sounds as being produced by the same object. In our application, we require a metric by which we may decide if two similar sounds are perceived as separate objects and/or materials. Some research has examined the factors by which differences in material and geometry are perceived from audition [8, 9, 12, 16]. While this research does not examine acoustic distances directly, it does provide empirical results indicating which parameters of a sound model are relevant to material and geometric perception. Specifically, the analysis in [16] suggests a metric by which model parameters are related to perceptual similarity. Because empirical thresholds are available for this metric, it was selected for use in this implementation of the adaptive sampling algorithm. Details of the metric are stated in Section 7.5; Section 7.6 contains a discussion relating the results of [16] to thresholds on this metric useful to our application.


7.3 Requirements

The sampling algorithm must select points over the surface of the test object in a dense-enough mesh so as to capture perceptually relevant differences in sound over the entire surface. Satisfaction of this requirement is difficult to quantify and requires future perceptual studies.

7.4 Surface Representation

As mentioned in Section 7.2, a subdivision surface representation is used to facilitate vertex insertion. A subdivision surface represents a smooth surface as the limit of a sequence of successive refinements of a coarse mesh [6]. That is, it represents a coarse approximation to the true (smooth) surface with a finite number of vertices, yet can be systematically refined by adding new vertices until, in the limit, the smooth surface is produced.

Several different schemes exist for specifying the location of vertices added during refinement. Typically, the locations of new vertices are weighted combinations of neighbouring coarse vertices' positions. We use the Loop scheme as first proposed by Charles Loop [19]. The Loop scheme applies the vertex masks in Figure 7.3 to a coarse set of vertices to calculate the position of vertices in the next level of refinement. The mask on the left is used to insert a new vertex on the edge between two coarse vertices. The mask on the right is used to refine the position of each coarse vertex. The limit position of interior vertices is computed by replacing $\beta$ in Figure 7.3 with $\chi = \left( \frac{3}{8\beta} + k \right)^{-1}$ [6]. For vertices on the boundary of a surface, the limit position is computed by changing the coefficients of the even-vertex boundary mask to the limit values given in [6].
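As an illustration of how such masks are applied, the sketch below evaluates Loop's original weight for an interior vertex of valence k and the standard interior edge mask (3/8 for the two edge endpoints, 1/8 for the two opposite vertices). It assumes these standard Loop weights are the ones intended in Figure 7.3; the function names are hypothetical and vertices are plain coordinate arrays.

```python
import math

import numpy as np


def loop_beta(k):
    """Loop's original neighbour weight for an interior vertex of valence k."""
    return (5.0 / 8.0
            - (3.0 / 8.0 + 0.25 * math.cos(2.0 * math.pi / k)) ** 2) / k


def interior_edge_vertex(a, b, c, d):
    """New (odd) vertex on interior edge (a, b); c and d are the opposite vertices."""
    a, b, c, d = (np.asarray(p, dtype=float) for p in (a, b, c, d))
    return 3.0 / 8.0 * (a + b) + 1.0 / 8.0 * (c + d)


def boundary_edge_vertex(a, b):
    """New (odd) vertex on a crease or boundary edge: the edge midpoint."""
    return 0.5 * (np.asarray(a, dtype=float) + np.asarray(b, dtype=float))
```

For a regular interior vertex (k = 6) the weight evaluates to 1/16, and on a flat, symmetric neighbourhood the new edge vertex falls on the edge midpoint, as expected.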

The details of the refinement scheme, and its effect on the geometry of the mesh, are not important to our application; the refinement scheme simply defines a hierarchy of successive edge refinements. Interested readers should consult [6] for a thorough introduction to subdivision surfaces.


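[Figure 7.3 diagram: Loop subdivision masks. Masks for edge refinement (odd vertices): interior 3/8, 3/8, 1/8, 1/8; crease and boundary 1/2, 1/2. Masks for vertex refinement (even vertices): interior 1 - kβ at the centre with β on each of the k neighbours; crease and boundary 1/8, 3/4, 1/8.]

Figure 7.3: Loop scheme refinement masks for subdivision. Masks on the left are used for edge refinement; masks on the right are used for vertex refinement. $\beta$ is chosen to be $\frac{1}{k} \left( \frac{5}{8} - \left( \frac{3}{8} + \frac{1}{4} \cos \frac{2\pi}{k} \right)^2 \right)$.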


In our implementation, we begin by sampling the sound at the limit position of each vertex in the coarsest representation of the surface mesh. The acoustic distance between vertices joined by an edge of the mesh is then computed. If the acoustic distance is too great, the vertex which is the refinement of the joining edge is added to the list of vertices to be sampled.

The minimum vertex-spacing required for adequate surface representation and refinement dictates the maximum spacing of the sound sampling mesh. For some simple-sounding objects, this may produce an overly dense sampling mesh. This is not a large concern for applications such as simulation since the denser mesh will rely less on interpolation between sound models during synthesis.

7.5 Acoustic Distance Metrics

As prescribed by the adaptive sampling algorithm (Figure 7.1), an acoustic distance must be calculated between each pair of vertices joined by an edge in the surface. The calculated distance will be compared to a perceptual threshold to make a refinement decision. The selected distance metric must therefore be indicative of the perceptual proximity of two sounds.

Most applications of acoustic distance metrics discussed by the literature in Section 7.2 use distance to make relative comparisons, for example, classifying a new sound by choosing the group of exemplar sounds to which it is nearest. Our application is different because we need to determine if two models will be perceived as different materials or shapes once synthesised. The realism of a sound model requires that the sound produced over the surface of an object is not discontinuous in regions where it should vary smoothly. The distance metric must therefore be applicable to an appropriate perceptual threshold. Further discussion of such a threshold is delayed until Section 7.6.

Most sound metrics calculate a distance directly from waveforms. We chose to use a metric derived from model parameters instead. Much of the work on material perception and discrimination discusses discriminants in terms of model parameters. It is therefore convenient to express the distance metric on these parameters as well.

Specifically, we use a logarithmic ratio of the frequency-independent damping coefficient and a ratio of modal frequencies as metrics. These metrics were suggested by the research of Klatzky et al. in their paper on auditory material perception [16]. Sections 7.5.1 and 7.5.2 describe these metrics in greater detail.

7.5.1 Frequency-Independent Damping Coefficient

As stated previously, and repeated in Eq. 7.1, the sound model is composed of $N_f$ frequency modes, each with a distinct frequency ($f_{k,n}$), initial amplitude ($a_{k,n}$) and damping coefficient ($d_{k,n}$).

\[ y_k(t) = \sum_{n=1}^{N_f} a_{k,n} \, e^{-d_{k,n} t} \sin(2\pi f_{k,n} t) \qquad (7.1) \]

It is theorised that the damping coefficient in Eq. 7.1 ($d_{k,n}$) is related to a material's internal friction parameter ($\phi$) by Equation 7.2 [9, 12, 16].

\[ d_{k,n} = \pi f_{k,n} \tan(\phi) \qquad (7.2) \]

Furthermore, it has been demonstrated that the internal friction parameter ($\phi$) is an approximately shape- and frequency-independent material property [18]. A recent study [16] has shown that a frequency-independent damping factor ($\rho = \tan(\phi)$) is a perceptually useful discriminant of material. An estimate of this parameter ($\hat{\rho}_{k,n}$) can be calculated from the estimated damping coefficient ($\hat{d}_{k,n}$) of each frequency mode:

\[ \hat{\rho}_{k,n} = \frac{\hat{d}_{k,n}}{\pi f_{k,n}} \qquad (7.3) \]

$\hat{d}_{k,n}$ is only an estimate at one frequency and is subject to noise and variability. A better estimate of the frequency-independent damping coefficient ($\hat{\rho}_k$) is computed as the median of $\hat{\rho}_{k,n}$ for all frequency modes $n$. Given multiple samples at a single vertex, the estimate of the damping coefficient is improved by calculating $\rho_k$ as the median of $\hat{\rho}_k$ over all samples.

The distance between two vertices' damping coefficients ($\rho_A$ and $\rho_B$) is expressed as a logarithmic ratio:

\[ \mathrm{distance}_{\rho}(A, B) = \left| \log \frac{\rho_A}{\rho_B} \right| \qquad (7.4) \]
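The damping metric can be sketched as follows. The sketch rests on several assumptions that should be checked against the equations above: the per-mode estimate is taken as the damping coefficient divided by π times the modal frequency in Hz, the logarithm is natural, and the absolute value is used so that the distance is symmetric. The function names are illustrative only.

```python
import math
import statistics


def damping_factor(freqs_hz, dampings):
    """Median over modes of the per-mode frequency-independent damping estimate."""
    # Assumed per-mode form: damping coefficient / (pi * frequency in Hz).
    rhos = [d / (math.pi * f) for f, d in zip(freqs_hz, dampings)]
    return statistics.median(rhos)


def damping_distance(rho_a, rho_b):
    """Magnitude of the logarithmic ratio of two damping factors."""
    return abs(math.log(rho_a / rho_b))
```

The median makes the estimate robust to a single badly estimated mode, and any constant factor in the definition of the damping factor cancels in the ratio, so the distance does not depend on that convention.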

7.5.2 Frequency Similarity

Using frequency similarity as a distance metric is supported by recent perceptual studies. Klatzky et al. found frequency and damping to be “independent determinants of similarity” [16]. The results of their study show that frequency and decay are used independently when judging the similarity of sounds, but in combination when classifying material. Additionally, experiments by Hermes [12] and Gaver [9] support the theory that frequency is an important perceptual feature for material, shape and size estimation.

We define the frequency distance between two models by Equation 7.5. Modes are matched between models using the algorithm in Figure 7.4. This algorithm guarantees that each mode in model A will be uniquely matched to a mode in model B. Details of the implementation are included in Appendix C.

oqp r s t uPv w Ay z| q~9P E n Py E n ~

P E P n (7.5)

where $f_{A,n}$ and $f_{B,n}$ are the frequencies of modes matched in models A and B by the algorithm in Figure 7.4.

The frequency distance between one model and a collection of models is computed as the frequency distance (Equation 7.5) between the single model and the prototypical model of the collection. Chapter 6 describes the process of creating a prototypical model.


For each frequency mode $f_{A,i}$ in model A:

1. Find the frequency mode $f_{B,j}$ nearest to $f_{A,i}$.

2. If $f_{B,j}$ is unmatched, match it to $f_{A,i}$.

3. Or, if $f_{B,j}$ is matched to another mode which is farther than $f_{A,i}$, match $f_{B,j}$ to $f_{A,i}$ instead. Mark $f_{A,i}$ as matched, and the mode it replaces as unmatched.

4. Otherwise, loop to step 1, picking the next nearest frequency mode.

Repeat until all frequency modes in model A are matched.

Figure 7.4: Algorithm to find unique frequency mapping between two models.
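A minimal sketch of the matching procedure in Figure 7.4 follows. It is an illustration of the idea rather than the implementation in Appendix C: it assumes the modes are given as plain lists of frequencies, that model B has at least as many modes as model A, and that a displaced mode simply re-enters the queue of unmatched modes. The function name is hypothetical.

```python
def match_modes(freqs_a, freqs_b):
    """Map each mode index of model A to a unique mode index of model B."""
    if len(freqs_b) < len(freqs_a):
        raise ValueError("model B needs at least as many modes as model A")
    owner = {}                              # index in B -> matched index in A
    unmatched = list(range(len(freqs_a)))
    while unmatched:
        i = unmatched.pop()
        # Candidate modes of B ordered from nearest to farthest (Steps 1 and 4).
        candidates = sorted(range(len(freqs_b)),
                            key=lambda j: abs(freqs_b[j] - freqs_a[i]))
        for j in candidates:
            if j not in owner:              # Step 2: nearest mode is free
                owner[j] = i
                break
            k = owner[j]
            if abs(freqs_b[j] - freqs_a[k]) > abs(freqs_b[j] - freqs_a[i]):
                owner[j] = i                # Step 3: replace the farther mode
                unmatched.append(k)
                break
    return {i: j for j, i in owner.items()}
```

Because a mode of B only ever changes hands to a strictly nearer mode of A, the loop cannot cycle, and the size check guarantees that every mode of A eventually finds a free partner.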

7.6 Perceptual Thresholds

As mentioned in Sections 7.2 and 7.5, the distance metrics selected for this implementation were suggested by the analysis in [16]. The motivation for this selection was the availability of empirical data rating the perceptual similarity of sounds by these metrics.

The study of Klatzky et al. asked subjects to rate the similarity of two synthesised sounds. In their first two experiments, subjects rated how likely two sounds were to have been produced by the same material, regardless of shape. In the third experiment, subjects rated the relative length of the bars that were purported to have produced the synthesised sounds. While these tasks are not identical to ours, we feel they are similar enough to provide suitable thresholds in the absence of more appropriate data. The appropriateness of the metrics and thresholds to our application can be properly evaluated only by a perceptual study of the synthesised models.

The results of Klatzky et al.'s first two experiments are summarised in Table 7.1. The average of the two experiments is listed in the last row of the table. The regression coefficients express the change in perceived similarity for a unit change in the logarithmic ratios of fundamental frequency and decay constant. The similarity ratings are on a scale of 0 to 100, but the perceived rating of identical sounds was found to be approximately 92.45 (mean intercept). Refer to [16] for more details of the experiments and results.

Experiment  Intercept  Freq. Diff. Coeff.  Decay Diff. Coeff.  Product Coeff.  R^2
1           88.7       -0.56               -1.02               0.30            0.76
2           96.2       -0.50               -1.06               0.31            0.80
mean        92.45      -0.53               -1.04               0.305           0.78

Table 7.1: Regression of similarity on frequency and decay difference [16]. The mean of the values for the two experiments is also listed.

The mean values in Table 7.1 are scaled by a constant similarity factor (S) to form the thresholds for our distance metrics. The similarity factor adjusts the amount of perceived material dissimilarity permitted before a refinement is required. Table 7.1 suggests three values for a threshold: $\mathrm{distance}_{\rho}(A, B) < 1.04\,S$, $\mathrm{distance}_{f}(A, B) < 0.53\,S$, and $\mathrm{distance}_{\rho}(A, B) \times \mathrm{distance}_{f}(A, B) < 0.305\,S$. In our implementation, if any of these thresholds is exceeded, a refined vertex is added.
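The refinement decision can be sketched as below. The sketch assumes that each threshold is the magnitude of the corresponding mean coefficient in Table 7.1 multiplied by S, so that dividing a distance by its coefficient gives the smallest S at which the pair would just pass; the names are illustrative, and the default of 0.75 is the value used in this implementation.

```python
# Magnitudes of the mean regression coefficients from Table 7.1.
MEAN_COEFFICIENTS = {"frequency": 0.53, "decay": 1.04, "product": 0.305}


def similarity_factors(dist_f, dist_rho):
    """Per-metric similarity factors for a pair of models."""
    return {
        "frequency": dist_f / MEAN_COEFFICIENTS["frequency"],
        "decay": dist_rho / MEAN_COEFFICIENTS["decay"],
        "product": dist_f * dist_rho / MEAN_COEFFICIENTS["product"],
    }


def needs_refinement(dist_f, dist_rho, s=0.75):
    """True if any of the three thresholds is exceeded for similarity factor s."""
    return any(v > s for v in similarity_factors(dist_f, dist_rho).values())
```

Expressing each comparison as a single factor makes the three tests directly comparable and lets one tuning parameter control how aggressively the mesh is refined.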

Currently, an empirical value of S = 0.75 is used as the similarity factor. This value was estimated by reviewing coarse data collections and manually selecting which models required refinement. As an example, Table 7.2 summarises the calculated similarity factors of models collected from three objects: a brass vase, glass wine bottle and plastic speaker. The last row of Table 7.2 compares two locations on the brass vase. Only ten frequency modes were used to model the wine bottle and plastic speaker. Forty modes were used to model the brass vase. Most of the distances reported agree with expectation. Although the distance between the brass vase and wine bottle is low, it is not a surprising result when the brass vase is synthesised with only ten modes, since it then sounds very similar to the wine bottle. Future perceptual studies could be used to determine a less ad hoc value of S.

It should be noted that the frequency distance of Equation 7.5 is not identical to the expression used to compute the coefficients in Table 7.1. In their experiments, Klatzky et al. compared only the fundamental frequency of their synthesised stimuli. Here, however, we compare the average of all frequency modes of the model, but weight their contribution to the average by their initial amplitude. This approximation has proved successful, as will be illustrated by the sample collections in Chapter 8.

                                     Similarity Factor (S)
Object A         Object B            Frequency  Decay     Product
Brass vase       Plastic speaker     1.14       0.781     1.61
Plastic speaker  Wine bottle         1.05       0.585     1.11
Wine bottle      Brass vase          0.267      0.196     0.0943
Brass vase (I)   Brass vase (II)     0.0919     0.000125  0.0000208

Table 7.2: Examples of calculated similarity factors.

Chapter 8

Sample Data Collections

8.1 Overview

This chapter presents the results of four sample data collections. The first collection, a tuning fork, is meant as a calibration experiment. A surface model of the tuning fork is not used, nor is the adaptive sampling algorithm of Chapter 7. For the other three objects, a brass vase, plastic speaker and toy drum, the entire system is used to build a sound model.

The objects were selected to provide examples of a variety of materials. Also, since the shape-acquisition component of ACME is not completed, we required objects with geometries that could be easily modelled manually. Each surface model was constructed from manual measurements using 3D modelling software.

Each test object was mounted on the ACME test station. Objects whose diameters are smaller than the diameter of the test station could not be sampled below heights of 30 mm due to collisions between the sound effector and the test station. For future collections requiring complete coverage, objects could be raised on a narrow pedestal.

The following four sections discuss the test objects, experimental setup, and the results of the data collections.


Object Name      Coarse Vertices  Microphone Distance (mm)  Number of Modes  Similarity Threshold  Number of Samples
Tuning fork      1                5                         5                N/A                   5
Brass vase       136              190                       40               0.75                  5
Plastic speaker  98               40                        10               0.75                  5
Toy drum         31               130                       40               0.75                  5

Table 8.1: Summary of setup parameters for test objects.

Figure 8.1: Setup for acquiring model of tuning fork.

[Figure 8.2 plot: spectrogram; Time (0 to 3) versus Frequency (0 to 2 x 10^4).]

Figure 8.2: Recorded spectrogram.

[Figure 8.3 plots: (a) Synthesised spectrogram, Time (sec, 0 to 3) versus Frequency (Hz, 0 to 2 x 10^4). (b) Closeup, Time (sec, 0.05 to 0.5) versus Frequency (Hz, 50 to 550).]

Figure 8.3: The figure on the left (a) shows the spectrogram of a sound synthesised from the prototypical model of the tuning fork. The figure on the right (b) is a closeup of the spectrogram showing the estimation of the fundamental frequency at 430.7 Hz and a harmonic at 215.3 Hz.


8.2 Tuning Fork

An A-440 tuning fork was used as a calibration object. The tuning fork was mounted on the ACME test station as illustrated in Figure 8.1. As is also shown, the microphone was located 5 mm from the nearest tine. The tuning fork was struck five times at one location to produce five three-second recordings. A five-mode prototypical model was created from the five recordings. Table 8.1 summarises the setup parameters for all of the test objects.

8.2.1 Estimation Results

Figure 8.2 displays the spectrogram of one recording. The 440 Hz tone is present, as is a harmonic at 220 Hz and several overtones at 9430 Hz and 10120 Hz. When the tuning fork is struck, only the overtones are audible from a distance. At its close proximity, however, the microphone was able to record the low amplitude 440 Hz tone.

Figure 8.3 contains two spectrograms produced by a sound synthesised from the prototypical model. As both spectrograms show, the prototypical model contains relatively accurate estimates of the two harmonics and the overtones present in the original spectrogram. The mode representing the 440 Hz tone was estimated at 430.7 Hz. The frequency was estimated from a 1024-point Discrete Fourier Transform (DFT), with a frequency resolution of 43 Hz. Since the estimation is within 43 Hz of the true frequency, the test was a success.
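The quoted resolution and estimates follow directly from the DFT bin spacing, as the short sketch below shows. It assumes the 44.1 kHz sampling rate used elsewhere in this thesis and that each estimate falls on the centre of the nearest bin; the helper names are illustrative.

```python
def dft_resolution_hz(sample_rate_hz, n_points):
    """Spacing between adjacent DFT bins, in Hz."""
    return sample_rate_hz / n_points


def nearest_bin_frequency(freq_hz, sample_rate_hz, n_points):
    """Centre frequency of the DFT bin nearest to freq_hz."""
    resolution = dft_resolution_hz(sample_rate_hz, n_points)
    return round(freq_hz / resolution) * resolution


if __name__ == "__main__":
    print(dft_resolution_hz(44_100, 1024))             # about 43.07 Hz
    print(nearest_bin_frequency(440.0, 44_100, 1024))  # about 430.7 Hz
    print(nearest_bin_frequency(220.0, 44_100, 1024))  # about 215.3 Hz
```

Under these assumptions the 440 Hz tone lands in bin 10 and the 220 Hz component in bin 5, which matches the estimated values of 430.7 Hz and 215.3 Hz.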

8.3 Brass Vase

The brass vase displayed in Figure 8.4 (a) was the first test object for which a complete sound model was generated. The subdivision surface in Figure 8.4 (b) represented the vase for the adaptive sampling algorithm. This coarse mesh contains 136 vertices. Forty frequency modes were estimated at each sample location, and prototypical models were produced using five recordings at each location.

The vase was secured to the ACME test station using thin double-sided tape. The microphone on the field measurement system (FMS) was used to record the samples, and was located 190 mm behind the vase (see Figure 8.5). Early experiments using the microphone mounted on the sound effector produced poor sound models due to the transient effects of clipping and echoes. These effects are diminished by recording in the far field.

At each sample location, the vase was struck normal to the surface by the sound effector.

Figure 8.4: A photo (a) of brass vase and the subdivision surface which represents it (b).

Figure 8.5: Setup for acquiring sound model of brass vase.


Figure 8.6 is a comparison of spectrograms of synthesised sounds and recorded samples at three positions on the vase. White noise was added to the synthesised sounds at a signal-to-noise ratio approximating the signal-to-noise ratio of the recording. The addition of noise creates spectrograms that are more comparable to the originals. Appendix B discusses this technique with examples.

The frequency modes were estimated quite accurately at each location in Figure 8.6. Audibly, the synthesised sounds at most sample locations on the vase were comparable to the recordings.
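For readers who wish to reproduce such comparisons, a sound can be synthesised from a list of estimated modes. The sketch below assumes the common modal form, a sum of exponentially damped sinusoids, each described by a frequency, a damping constant and an amplitude. The exact model used in this thesis is defined in Chapter 2, so the function name, the tuple layout and the default sampling rate are assumptions made for illustration.

```python
import math


def synthesise(modes, duration, sample_rate=44100):
    """Synthesise a sum of damped sinusoids.

    modes: iterable of (frequency_hz, damping, amplitude) tuples
           (an assumed layout, not the thesis's data format).
    """
    n_samples = round(duration * sample_rate)
    samples = []
    for k in range(n_samples):
        t = k / sample_rate
        value = sum(
            a * math.exp(-d * t) * math.sin(2.0 * math.pi * f * t)
            for f, d, a in modes
        )
        samples.append(value)
    return samples


# Half a second of a single 440 Hz mode with a moderate damping constant.
tone = synthesise([(440.0, 5.0, 1.0)], duration=0.5, sample_rate=1000)
```

A mode with a small damping constant dominates the tail of such a signal, which is why a single poorly estimated damping value can change the perceived sound.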

Most often, any difference in the sounds was a lower perceived pitch. Evidence of this is present in the spectrograms of Figure 8.6, particularly at Z = 90 mm. A narrow band of high-energy noise is visible in the recorded spectrogram from approximately 0 to 300 Hz (Figure 8.7). This band of noise was estimated as a mode in the model at 215 Hz with a very small damping constant (0.452). In fact, this mode’s damping constant is smaller than that of any other mode by at least one order of magnitude.

Though background noise is concentrated between 0 and 300 Hz, a moderate level of noise is also present in a band from 300 to 600 Hz (Figure 8.7). This band of noise artificially reduced the estimated damping constants of frequency modes within that range. As an example, two modes were estimated at 646.0 Hz and 473.7 Hz with damping constants of 8.6 and 4.5 respectively. Though recorded modes in this range typically lasted approximately 0.1 seconds, these estimated modes remain at significant amplitude (> -3


Figure 8.6: Results of brass vase experiment. Spectrograms of recorded samples and those synthesised from prototypical models are compared at three positions on the brass vase: Z = 90, 61 and 45 mm. White noise was added to the synthetic sounds to produce more comparable spectrograms. Panels: (a) recorded spectrogram (Z = 90 mm); (b) synthesised spectrogram (Z = 90 mm); (c) recorded spectrogram (Z = 61 mm); (d) synthesised spectrogram (Z = 61 mm); (e) recorded spectrogram (Z = 45 mm); (f) synthesised spectrogram (Z = 45 mm). Axes: time (sec) against frequency (Hz).


Figure 8.7: Detail of narrow-band low-frequency noise. A high-energy band of noise is visible between 0 and 300 Hz. Moderate levels of noise are also visible from 300 to 600 Hz. Axes: time (sec) against frequency (Hz).

Figure 8.8: Effect of noise on low-frequency modes. Though recorded low-frequency modes are audible for approximately 0.1 seconds, moderate noise levels sustain the estimated modes to approximately 0.44 and 0.75 seconds. Axes: time (sec) against frequency (Hz).

dB) for 0.44 and 0.75 seconds (Figure 8.8). This artificial sustain of low-frequency modes also contributes to the lower perceived pitch.

The signal-to-noise ratio for these recordings was in the range of 30 to 40.
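The link between an underestimated damping constant and an artificially long sustain can be illustrated with a small calculation. The sketch assumes each mode's envelope has the form exp(-d t), where d is the damping constant. That form and the function name are assumptions for illustration, and the thesis's own amplitude threshold may be defined differently.

```python
import math


def decay_time(damping, drop_db):
    """Time in seconds for an envelope exp(-damping * t) to fall by drop_db decibels.

    Assumes a purely exponential envelope; 20*log10(exp(-d*t)) = -drop_db
    gives t = drop_db * ln(10) / (20 * d).
    """
    return drop_db * math.log(10.0) / (20.0 * damping)


# Halving the damping constant doubles the time needed for the same drop.
slow = decay_time(4.5, 60.0)
fast = decay_time(9.0, 60.0)
print(slow / fast)
```

Because the decay time is inversely proportional to the damping constant, noise that lowers the estimate by a given factor lengthens the synthesised mode by the same factor.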

                   Number of vertices
Object Name        Coarse   Rejected   Missed   Added   Sampled
Tuning fork             1          0        0     N/A         1
Brass vase            136         56       14      93       173
Plastic speaker        98         41       14      28        85
Toy drum              31*          0        4     163      190*

* Number of vertices of the top surface.

Table 8.2: Summary of refinement results. This table summarises the number of coarse, rejected, missed, added and sampled vertices for all of the test objects. A rejected vertex is one which is out of the working envelope of the Puma arm. Missed vertices were counted when no force was sensed at a sample location (i.e., a hole).


Figure 8.9: Refinement results of brass vase experiment. The refined sample locations are plotted on the surface model. Despite geometric symmetry, refinement patterns are not symmetric, as the two views (a) and (b) illustrate. Vertices are colour-coded by revision number; black, grey and white represent coarse, first and second refinements.

8.3.2 Refinement Results

Refinement of the sampling mesh by the adaptive sampling algorithm is illustrated in Figure 8.9. Table 8.2 summarises the results of refinement for all the test objects. An interesting result of the vase’s refinement is its unusual asymmetry. Since the vase is approximately circularly symmetric, it was expected that the refinement pattern would also be symmetric. As shown in Figure 8.9, however, refinement varied greatly. The image on the left (a) shows many refined sampling locations, while the image on the right (b) is almost void of refinements. There are two possible explanations. First, it is possible that the vase is not acoustically symmetric. If so, this example is a strong argument for the necessity of an adaptive sampling algorithm. Alternatively, it may be that the acoustic distance between coarse models is very close to the threshold. If so, the variability of parameter estimation may be sufficient to occasionally increase acoustic distances above the threshold. Given the regular pattern of revision apparent in Figure 8.9 (a), this explanation is unlikely.

One aspect of the vase’s geometry introduced a difficulty for the system: there are three rows of small holes around the mouth of the vase. The experiment is programmed to identify missed sample locations if no contact is sensed. With this particular object, though, the outer case of the solenoid often contacted the sides of a hole even though the plunger passed through. When this occurred, the system acquired samples of nothing. Of course, these degenerate samples introduced refinements around these holes. Although this may result in overly dense sampling of the top rim, it also increases the likelihood that the surfaces surrounding the holes will be sampled.

8.4 Plastic Speaker

A complete sound model was also generated for the small speaker shown in Figure 8.10 (a). The speaker is completely plastic, with the exception of a metal grill covering the front face. A cube mesh with 98 vertices represented the speaker for the adaptive sampling algorithm (Figure 8.10 (b)). Although the speaker’s surface could be adequately described by fewer vertices, interior vertices were added to seed the refinement of the adaptive sampling algorithm. Since the sounds at the corners of the speaker are similar, refinement would be unlikely if only corner vertices were used to represent the surface.

Ten frequency modes were estimated at each sampling location. Preliminary experiments determined that ten modes sufficiently represent the sound of the speaker at most locations. Five recordings at each sample location were used to create prototypical models. The sound effector struck the speaker normal to the surface at each sample location.

The speaker was secured to the ACME test station using double-sided tape along the bottom edges. The microphone on the FMS recorded the samples from a distance of approximately 90 mm. Because of the low amplitude of the impact sounds, the signal would be dominated by room noise at larger distances.


The speaker was a problematic test object for two reasons. First, the contact sounds it produces are quiet and decay quickly. Low amplitude is a concern since it decreases the signal-to-noise ratio. As shown by the evaluation of the estimation algorithm (Section 2.4.1), estimation accuracy degrades dramatically with increasing noise levels. Also, because the sound decays quickly, the sound of the solenoid’s return is sometimes present in the recordings. When the solenoid returns after impact, the plunger often strikes the side of its exit hole. Normally, this chatter is quiet enough, or the microphone far enough away, that it is not recorded. Because the microphone needs to be so close to the speaker, however, the solenoid’s sound is recordable. Choice of an acceptable recording distance is therefore a trade-off between obtaining good signal amplitude and recording the sound of the solenoid.

Figure 8.10: A photo (a) of the plastic speaker and the subdivision surface which represents it (b).

The problem of microphone distance was further complicated by the hardware surrounding the microphone on the FMS (i.e., the camera, Triclops and pan/tilt unit). Frequently, the microphone could not be moved closer to the strike location because the Puma arm would collide with the FMS hardware.

The second problem with using the speaker as a test object is registration. Because surface model creation is not yet automatic, the object must be manually registered to the stage for correspondence with the surface model. With objects that are circularly symmetric, small imprecision is tolerable. A square object, however, must be more precisely positioned. During the test it was noted that the speaker was not accurately positioned and the edges of each face were not reliably struck at a normal angle. It is hoped that this


Figure 8.11: Results of plastic speaker experiment. A spectrogram of a sample recorded at the top of the speaker (a) is compared to a spectrogram of a sound synthesised from a ten-mode prototypical model (b). Axes: time (sec) against frequency (Hz).

problem will be eliminated by the development of a shape acquisition module for ACME.

Despite these difficulties, good sound models were produced at many sample locations. The spectrograms in Figure 8.11 illustrate the similarity between the recorded and synthesised sounds. The model is clearly an accurate representation of the recorded sample. Some models suffered the same pitch-lowering effects as discussed in Section 8.3.1. Additionally, the sound models of the metal grill were generally not audibly similar to the recordings, since their signal-to-noise ratios were poorer.

In comparison to the brass vase, the speaker’s sound model has a narrower bandwidth and sharper decay. This result supports the perceptual studies of Klatzky et al. [16].


The sound of the plastic speaker is mostly uniform, though it does vary from the edge to the middle of each face. Since the middle of each face is unsupported, the sound is generally lower in frequency than at the edges. We expected this variation to trigger some refinement of the sampling mesh. As is illustrated in Figure 8.12, very little refinement was required. Future experiments could investigate lower similarity thresholds and a coarser surface model as stimulants of refinement.

Figure 8.12: Refinement results of plastic speaker experiment. Very few refinements were made on the speaker, with most refinements occurring near the edges. The refinement pattern for the top of the speaker is shown here. Vertices are colour-coded by revision number; black, grey and white represent coarse, first and second refinements.

8.5 Toy Drum

The fourth test object, a toy drum, was selected for its diversity in sound across its surface. The drum (Figure 8.13 (a)) is a child’s toy, made of plastic with three metal bars suspended across a slot in the top face. Each metal bar has a different length and therefore a different frequency.

A complete model of the drum could not be created for the experiment due to restrictions of our subdivision surface loader. Instead, the drum is approximated by a simple cylindrical mesh (Figure 8.13 (b)). Because the handle of the drum is not modelled by the surface, we were unable to create a sound model for the entire drum. We instead created a sound model of only the top face in order to show the results of refinement on an acoustically complex object.

Five samples were recorded at each sample location, and forty modes were estimated for each model. Similarly to the brass vase and speaker, the drum was affixed to the ACME test station using double-sided tape on its bottom edges. The microphone on


Figure 8.13: A photo (a) of the toy drum and the subdivision surface which represents it (b).

Figure 8.14: When measuring near the edge of the bars, the plunger often caused the bars to pivot on their supports. When the plunger retracted, the bars would return to their nominal positions, reducing the distance between the plunger and the bar.

the FMS was again used to record the samples at a distance of 130 mm, and the drum was struck normal to its surface.


The toy drum’s construction introduced two problems which affected the quality of the sound models. Since the metal bars are supported only along their central axis, they are able to pivot around that axis (Figure 8.14). Unfortunately, when the sound effector measured locations near a bar’s edge, the bar moved a few millimetres upon contact, then returned once the sound effector was retracted to strike. This movement reduced the distance between the plunger and the metal bars and produced uncharacteristically damped sounds. The second difficulty arose from the spaces between the metal bars. If a sample location


Figure 8.15: Results of toy drum experiment (metal). The spectrogram of a recorded sample of the middle metal bar is shown on the left (a). The spectrogram of a sound synthesised from the prototypical model at the same location is shown on the right (b). Axes: time (sec) against frequency (Hz).

lay in one of those spaces, no sound model was created. This absence prevented any refinement between that sample location and adjoining vertices. Since these holes lay between the bars, adaptive refinement between the bars did not always occur as expected.
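The interaction between missed locations and refinement can be sketched as follows. This is a minimal illustration rather than the algorithm of Chapter 7: it assumes that refinement is requested for a mesh edge when the acoustic distance between the models at its two endpoints exceeds a threshold, and that an edge is skipped when either endpoint has no model. The function name, the dictionary layout and the `distance` callable are hypothetical.

```python
def edges_to_refine(models, edges, distance, threshold):
    """Return the mesh edges whose endpoint models differ by more than threshold.

    models:   dict mapping vertex id -> sound model, or None for a missed location
    edges:    iterable of (vertex_u, vertex_v) pairs
    distance: callable returning an acoustic distance between two models
              (a stand-in for the thesis's metric, which is not reproduced here)
    """
    selected = []
    for u, v in edges:
        model_u, model_v = models.get(u), models.get(v)
        if model_u is None or model_v is None:
            # A missed sample location leaves nothing to compare,
            # so no refinement can be triggered along this edge.
            continue
        if distance(model_u, model_v) > threshold:
            selected.append((u, v))
    return selected


# Toy example: each "model" is a single frequency and the distance is its difference.
toy_models = {0: 100.0, 1: 105.0, 2: 400.0, 3: None}
toy_edges = [(0, 1), (1, 2), (2, 3)]
print(edges_to_refine(toy_models, toy_edges, lambda a, b: abs(a - b), 50.0))
```

In the toy example only the edge between vertices 1 and 2 is selected; the edge touching the missed vertex 3 is ignored even though its neighbour sounds very different from the rest.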

Apart from the effects just listed, most of the sound models were successful. Exceptionally good models were produced for the metal bars when they were struck near their centres. For example, Figure 8.15 illustrates the fidelity of the model for the middle bar. With the exception of the noise effects mentioned previously, the spectrograms are nearly identical.

Results of modelling the plastic surface were acceptable, though not as successful as those for the metal bars. As Figure 8.16 demonstrates, the frequency spectrum was typically correct, but the damping parameters were often inaccurate. One additional consequence of the construction of the drum is that the metal bars often resonated when the plastic was struck. Though a minor effect, it may have contributed to the sustain of some modes. More likely, the primary reason for poorer estimation is the lower amplitude response of the plastic.


Figure 8.16: Results of toy drum experiment (plastic). The spectrogram of a recorded sample of the drum’s plastic surface is shown on the left (a). The spectrogram of a sound synthesised from the prototypical model at that location is shown on the right (b). Axes: time (sec) against frequency (Hz).


Results of the refinement are plotted in Figure 8.17. As expected, the model was refined substantially, especially around the interface between the metal bars and plastic (Figure 8.17 (b)). Unfortunately, several missed sample locations occurred at gaps between metal bars on the right side. Slight inaccuracies in the manual registration of the drum caused some sample locations on the right side to lie between bars, but not on the left side. Regardless, it is clear from the left side of the diagrams that the sampling mesh is denser near material boundaries. This result is convincing evidence of the success of the adaptive sampling algorithm.


Figure 8.17: Refinement results of toy drum. The refined sample locations are plotted on the surface model in (a). Vertices are colour-coded by revision number; black, grey and white represent coarse, first and second refinements. A simplified diagram (b) also shows sample locations that were missed (diamonds). The location of the metal bars is indicated by the vertical rectangle. Here refinement levels are represented as ’o’, ’+’ and ’x’ in ascending order of refinement level. Panels: (a) sample locations plotted on surface; (b) diagram of refinement.

Chapter 9

Conclusions

9.1 Overview

A system to automatically create sound models of everyday objects was designed and constructed. This system uses a surface model of a test object to adaptively select sample locations. At each of these locations a device, called a sound effector, strikes the object to elicit an acoustic impulse response. Multiple impacts are used to create a prototypical sound model which best represents the sound at that location. The hardware, software and algorithms required for this system were implemented and tested.

Overall, the sound models produced of the tuning fork, brass vase, speaker and toy drum are encouraging. These results demonstrate that excellent sound models can be constructed under favourable noise and impact conditions. For example, the fundamental frequency of the tuning fork was accurately estimated within the limits of the discrete Fourier transform.

Constructions such as the drum’s metal bars pose a difficult problem. Still, the system performs reliably for “rigid” objects – a characteristic of many everyday objects.

The Achilles heel of the system is noise. Evaluation of the parameter estimation algorithm in Chapter 2 revealed large errors in estimation for even modest levels of background noise. Even models with relatively accurate frequency estimates contained spurious low-frequency modes and incorrect decay constants due to noise. These effects are perceived as a lower pitch of the synthesised sounds.

Prototypical models are one way to reduce the effect of noise. Evaluation of the spectrogram averaging technique showed a significant reduction in mean estimation error.
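The idea behind averaging can be sketched in a few lines. The example assumes that each recording at a location has already been converted to a magnitude spectrogram of the same shape, and that the prototype is the element-wise mean of those spectrograms. The precise procedure used in this thesis is described in an earlier chapter and may differ; the function name is illustrative.

```python
def average_spectrograms(spectrograms):
    """Element-wise mean of equally sized spectrograms (lists of rows of magnitudes)."""
    if not spectrograms:
        raise ValueError("at least one spectrogram is required")
    count = len(spectrograms)
    n_rows = len(spectrograms[0])
    n_cols = len(spectrograms[0][0])
    return [
        [sum(s[r][c] for s in spectrograms) / count for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Two tiny 2x2 "spectrograms" whose uncorrelated fluctuations cancel in the mean.
first = [[1.0, 2.0], [3.0, 4.0]]
second = [[3.0, 2.0], [1.0, 0.0]]
print(average_spectrograms([first, second]))
```

Components that recur in every recording survive the mean, while fluctuations that differ between recordings tend to be reduced, which is the motivation for estimating parameters from the averaged data.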

The adaptive sampling algorithm presented in Chapter 7 is sensitive to noisy models and missed sample locations. Its refinement of the vase’s and drum’s sampling meshes, however, is evidence of its potential. Improvement of the models and acoustic distance metric should improve future results.

9.2 Future Work

As with all research, this thesis has created many new opportunities for future work. This section suggests avenues for future research and development of the system.

Ultimately, environmental noise must be removed at its source. Either a soundproof enclosure must be constructed around ACME, or it must be located in a room without machines.

Solenoid noise also presented a problem for “low-amplitude” materials. Since the sound is produced by chatter between the plunger and the solenoid casing, it is presumed that the solenoid must be replaced by a special-purpose device. An alternate approach is to coat the solenoid plunger with rubber.

The original design of the sound effector suggested a replaceable tip. Currently, a hard steel tip is used to produce a sufficiently impulsive impact. Other tips could be constructed from other materials to investigate their effect.

Another useful addition to the system would be a force sensor or load cell for the tip of the effector’s plunger. Knowing the force profile of the impact could yield more accurate sound models. Recording the back-current through the solenoid coil may also be a solution.

The problem of microphone placement could be addressed by repositioning the microphone on the FMS for each impact. A sophisticated motion planner would be required to prevent collisions with the object, test station and contact measurement system.

One issue not addressed by the current design is the effect that attaching objects to the ACME test station has on the boundary conditions of the sound model. By attaching one side of an object to the test station, that side is prevented from vibrating freely. The resulting sound model is therefore applicable only to synthesis of the object’s sound in this configuration. One useful configuration is an “all-free” boundary condition where all but the impulse location are free to vibrate. This is useful for simulations of dropped objects. One possible solution is to hold the object in place by a minimum number of point contacts. A set of rubber cones may be used for this purpose.

Thorough evaluation of the system, particularly the adaptive sampling algorithm, requires a playback device. Intelligent audio morphing algorithms for synthesis may reduce a model’s dependence on adaptive sampling. At the very least, software which morphs between sample locations to provide a continuous audio map of objects will encourage perceptual studies evaluating the effectiveness of the acoustic distance metric and perceptual thresholds. This research will hopefully result in an iterative improvement of the adaptive sampling algorithm.

Other experiments should also be conducted to investigate the effect of strike angle relative to surface normal on the sound model. It is clear that the amplitude of the sound will change as the strike angle approaches 0°. It is not immediately clear whether objects require anisotropic sound models. The current system is easily programmed to perform these experiments.

Bibliography

[1] Various Authors. Computer Music Journal Special Issues on Physical Modeling. 16(4) and 17(1), MIT Press, 1996 and 1997.

[2] Kevin Bradley. Synthesis of an acoustic guitar with a digital string model and linear prediction. Master’s thesis, Carnegie Mellon University, 1995.

[3] Antoine Chaigne and Vincent Doutaut. Numerical simulations of xylophones. Journal of Acoustical Society of America, 101(1):539–557, 1997.

[4] Perry R. Cook and Dan Trueman. A database of measured musical instrument body radiation impulse responses, and computer applications for exploring and utilizing the measured filter functions. Available online: http://www.cs.princeton.edu/prc/ism98fin.pdf, 1998.

[5] Creative Technology Ltd. Sound Blaster Live! Hardware Specifications, 2000. Available online: http://www.soundblaster.com.

[6] Tony DeRose. Subdivision surface course notes. SIGGRAPH Course Notes, 1998.

[7] Shlomo Dubnov, Naftali Tishby, and Dalia Cohen. Clustering of musical sounds using polyspectral distance measures. In Proceedings of the 1995 International Computer Music Conference, pages 460–463, 1995.

[8] Robert S. Durst and Eric P. Krotkov. Object classification from analysis of impact acoustics. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, pages 90–95, 1995.

[9] W. W. Gaver. Everyday Listening and Auditory Icons. PhD thesis, University of California in San Diego, 1988.

[10] W. W. Gaver. Synthesizing auditory icons. In Proceedings of the ACM INTERCHI ’93, pages 228–235, 1993.

[11] Augustine H. Gray, Jr. and John D. Markel. Distance measures for speech processing. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-24(5):380–391, October 1976.

[12] D. J. Hermes. Auditory material perception. IPO Annual Progress Report 33, Technische Universiteit Eindhoven, 1998.

[13] Wesley H. Huang. A tapping micropositioning cell. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2153–2158, 2000.

[14] InterTAN Inc. Optimus ultra-miniature tie-clip microphone specifications, 1996.

[15] Douglas Keislar, Thom Blum, James Wheaton, and Erling Wold. A content-aware sound browser. In Proceedings of the 1999 International Computer Music Conference, pages 457–459, 1999.

[16] Roberta Klatzky, Dinesh K. Pai, and Eric Krotkov. Perception of material from contact sounds. Presence, (in press).

[17] Eric Krotkov. Robotic perception of material. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 88–94, 1995.

[18] Eric Krotkov, Roberta Klatzky, and Nina Zumel. Robotic perception of material: Experiments with shape-invariant acoustic measures of material type. In O. Khatib and J. K. Salisbury, editors, Experimental Robotics IV, number 223 in Lecture Notes in Control and Information Sciences, pages 204–211. Springer-Verlag, 1996.

[19] Charles Loop. Smooth subdivision surfaces based on triangles. Master’s thesis, University of Utah, 1987.

[20] Robert L. Mott. Sound Effects: Radio, TV, and Film. Butterworth Publishers, 1990.

[21] Dinesh K. Pai, Jochen Lang, John E. Lloyd, and Robert J. Woodham. Acme, a telerobotic active measurement facility. In Proceedings of the Sixth International Symposium on Experimental Robotics, 1999.

[22] Point Grey Research, Vancouver, Canada. Triclops On-line Manual. Available online: http://www.ptgrey.com.

[23] Precision MicroDynamics Inc. Precision MicroDynamics Inc. MC8-DSP-ISA Register Access Library and User’s Manual, 1.3 edition, 1998.

[24] Lawrence Rabiner and Biing-Hwang Juang. Fundamentals of Speech Recognition. PTR Prentice-Hall, Inc., 1993.

[25] Joshua L. Richmond and Dinesh K. Pai. Active measurement of contact sounds. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2146–2152, 2000.

[26] Joshua L. Richmond and Dinesh K. Pai. Robotic measurement and modeling of contact sounds. In Proceedings of the International Conference on Auditory Display, 2000.

[27] Malcolm Slaney, Michele Covell, and Bud Lassiter. Automatic audio morphing. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1001–1004, 1996.

[28] Ken Steiglitz. A Digital Signal Processing Primer with applications to Digital Audio and Computer Music. Addison-Wesley, 1996.

[29] Mark Ulano. Moving pictures that talk – the early history of film sound. Available online: http://www.filmsound.org/ulano/index.html.

[30] K. van den Doel. Sound Synthesis for Virtual Reality and Computer Games. PhD thesis, University of British Columbia, May 1999.

[31] Kees van den Doel and Dinesh K. Pai. The sounds of physical shapes. Presence, 7(4):382–395, 1998.

[32] Richard P. Wildes and Whitman A. Richards. Recovering material properties from sound. In Whitman Richards, editor, Natural Computation. The MIT Press, 1988.

[33] Erling Wold, Thom Blum, Douglas Keislar, and James Wheaton. Content-based classification, search and retrieval of audio. IEEE Multimedia, 3(3):27–36, 1996. Also available online (www.musclefish.com).

Appendix A

Sound Effector Specifications

A.1 Mounting Bracket

The construction of the sound effector’s mounting bracket deserves a brief description here for future reference. Constructed from a single piece of Á|ÂÃÄ ” x ÃÅ ” aluminum, it was formed on a bending-bar following the schedule in Figure A.1. To allow for the finite bending radius of the material, an additional ÃÄ ” ( ÃÅ ” x 2) was added to the length of the material. Following the bending specifications in Figure A.1, the specified spacing between the two ends (i.e., 2.5”) was maintained. Unfortunately, two artifacts of the bending process are present in the bracket: fatigue marks and off-centre alignment. The fatigue marks were produced on the exterior radius of each bend. These occurred because the metal was bent beyond its permissible stress limit. This might be avoided in future constructions if the metal were first heated. The alignment of the solenoid is also slightly off-centre following the bending. This is a flaw of the alignment of the material in the bending vise.


Figure A.1: Bending schedule for sound effector mounting bracket. Drawing labels: bend lines, nominal spacing; dimensions shown: 1/2", 3/4", 3/4", 1-1/4", 5/8", 2-1/2", 2-1/4", 1/2", 0.25", 0.5625".


A.2 Control circuit

The circuit interfacing the sound effector to the Precision MicroDynamics MC8 board is diagrammed in Figure A.2. It is a simple switching circuit, with a 74F245 octal buffer to isolate the MC8 from the relay. This circuit may be duplicated to control other digital output devices such as lights.

Figure A.2: Schematic for solenoid control circuit. Component key: Q1: 74F245 Octal Buffer; T1: P2N2222 NPN. Schematic labels: +5 VDC, Pin J4-37, +12 VDC, To Solenoid, 100 Ω, 1 kΩ, 1/8 Q1, T1.

Appendix B

Effect of White Noise on Spectrograms

The two spectrograms in Figure B.1 are presented to compare the effect of white noise on the appearance of a spectrogram. Without background noise (Figure B.1 (a)), the frequency modes appear as wide bands, and appear to be sustained longer. It therefore becomes difficult to compare synthesised spectrograms to measured ones. By adding low-amplitude (e.g., SNR = 100) white noise to the synthesised sound, the mapping of colours to intensity values is scaled more comparably to the original recording. For this reason, all spectrograms of synthesised sounds in Chapter 8 include white noise added at a signal-to-noise ratio approximately the same as the measured samples. The white noise is added to the synthesised signal prior to computing its spectrogram.
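A minimal sketch of this step is given below. It assumes that the signal-to-noise ratio is expressed as a ratio of average signal power to average noise power; the text does not state whether the quoted values are power ratios, amplitude ratios or decibels, so this convention, like the function name, is an assumption.

```python
import math
import random


def add_white_noise(signal, snr, rng=None):
    """Return a copy of signal with Gaussian white noise added.

    snr is interpreted as (mean signal power) / (mean noise power),
    which is an assumed convention.
    """
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_sigma = math.sqrt(signal_power / snr)   # noise standard deviation
    return [x + rng.gauss(0.0, noise_sigma) for x in signal]


# Add noise at SNR = 100 to a synthetic sine wave, using a fixed seed.
clean = [math.sin(0.01 * n) for n in range(20000)]
noisy = add_white_noise(clean, 100.0, random.Random(0))
```

The noisy signal would then be passed to whatever spectrogram routine is used for the recordings, so that both images share a comparable noise floor.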



Figure B.1: Effect of white noise on spectrograms. The spectrogram in (a) is produced by a sound synthesised from a forty-mode sound model. The spectrogram in (b) is the same spectrogram, but with white noise added at an SNR of 100. White noise scales the mapping of colour to intensity more comparably to spectrograms of recorded sounds. Panels: (a) pure spectrogram; (b) spectrogram with additive white noise. Axes: time (sec) against frequency (Hz).

Appendix C

Details of Unique Frequency-Mapping Algorithm

A simple algorithm was designed to match frequency modes between two sound models such that a one-to-one mapping exists. That is, given two models (A and B), each with $N$ frequencies, a mapping function $M(i)$ is produced such that $f_{A,i}$ maps to $f_{B,j}$ when $j = M(i)$. The algorithm is summarised in Chapter 7 (Figure 7.4) and repeated in Figure C.1 for convenience. This appendix describes the implementation of the algorithm in greater detail.

Five arrays are used to track the matching of modes. The first is a matrix of frequency differences between all modes in the two models (Figure C.2). This difference matrix is used to create the order array. This array orders the indices of modes in model B from nearest to farthest for each mode in model A (Figure C.3). The third array is a list of indices indicating the next mode to check (Figure C.4). A fourth array (Figure C.5) is used to track which modes in model A are currently matched. The fifth array is the result of the algorithm: an array relating the mapping of mode indices in model A to mode indices in model B (Figure C.6). The index of the mode in model A matched to mode j in model B is stored as mapping[j]. The elements of the mapping array are initialised to ’-1’, indicating unmatched modes.


• For each frequency mode $f_{A,i}$ in model A . . .

  1. Find the frequency mode $f_{B,j}$ nearest to $f_{A,i}$.
  2. If $f_{B,j}$ is unmatched, match it to $f_{A,i}$.
  3. Or, if $f_{B,j}$ is matched to another mode which is farther than $f_{A,i}$, match $f_{B,j}$ to $f_{A,i}$ instead. Mark $f_{A,i}$ as matched, and the mode it replaces as unmatched.
  4. Otherwise, loop to step 1, picking the next nearest frequency mode.

• Repeat until all frequency modes in model A are matched.

Figure C.1: Algorithm to find unique frequency mapping between two models.

Figure C.2: difference matrix. Differences in frequencies between all modes in Model A and all modes in Model B are stored in this array. For example, difference[3][4] is the difference between $f_{A,3}$ and $f_{B,4}$. (Illustrated example: the difference between modes A3 and B4 is 6.)

Figure C.3: order array. Each column of the order array contains the indices of modes in Model B from nearest to farthest of a mode in Model A. For example, order[3][4] is the index of the 3rd nearest mode in Model B to $f_{A,4}$. (Illustrated example: mode B1 is the third nearest to mode A4.)

Figure C.4: index array. Each element contains an index into the order array at which to select the next mode from Model B for comparison. The elements are non-decreasing, thus eliminating cycles where two modes are repeatedly compared against another pair. (Illustrated example: check A1 against the 3rd closest mode next iteration.)

Figure C.5: matched array. Indicates which modes of Model A have been matched to a mode in Model B. The algorithm terminates when all elements of this array are true.

Figure C.6: mapping array. When the algorithm terminates, this array maps modes in Model A to modes in Model B. For example, mapping[1] contains the index of the mode in Model A which maps to $f_{B,1}$. (Illustrated example: mode B1 is mapped to mode A2.)


The pseudo-code in Figure C.7 uses these five arrays to implement the algorithm of Figure C.1. Each unmatched mode i in model A is examined in sequence. The j = index[i]-th mode listed in column i of the order matrix is checked against the mapping array. If no mapping exists (i.e., mapping[b] = -1, where b = order[i][j]), the mapping is set to mode i, mode i is marked as matched (i.e., matched[i] = true) and the next mode in model A is examined (i.e., i = i + 1). If a mapping exists, the difference between the currently mapped modes is compared to the difference between modes i and b (using the difference matrix). If the current mapping's difference is less than that of the proposed new mapping, index[i] is incremented, and the next nearest mode j is examined. Otherwise, mapping[b] is set to i, and the previously mapped mode is marked as unmatched.

This process continues until all modes in model A are matched. If the last mode i in model A is examined before the mapping is complete, i is reset to the first unmapped mode in model A and the loop continues.

Since models A and B contain the same number of modes, each of which is uniquely mapped to one other mode, the algorithm is guaranteed to terminate. The index array prevents two pairs of modes from being compared twice, thereby eliminating potential cycles.


int[] FindUniqueMapping(double[] modesOfA, double[] modesOfB)
{
    // difference[a][b] is the difference between frequency
    // modes a (from Model A) and b (from Model B)
    difference = calculateDistance(modesOfA, modesOfB);

    // order[a][i] is the index of the ith nearest mode in Model B
    // to mode a in Model A
    order = sort(difference);

    // index[a] is the index of the next mode in the order
    // array for mode a
    int[] index;

    // Initialise all elements of index to 0
    index is all 0;

    // matched[a] indicates whether mode a in Model A has
    // been matched yet
    boolean[] matched;

    // Initialise all elements of matched to false
    matched is all false;

    // mapping[b] is the index of the mode in Model A that best
    // maps to mode b of Model B.
    int[] mapping;

    // Initialise all elements of mapping to -1
    mapping is all -1;

    // Iterate until all modes in Model A are matched
    while (matched is not all true)
    {
        // For each unmatched mode in Model A...
        for (i = 0 to modesOfA.length)
        {
            if (matched[i] == false)
            {
                // Find nearest unmapped mode in Model B
                for (j = index[i] to order[i].length)
                {
                    // b is the index of the next nearest mode in Model B
                    b = order[i][j];

                    // a is the index of the mode in Model A to which b
                    // is mapped (if any)
                    a = mapping[b];

                    // if mode b is unmapped, or if it is mapped
                    // to a mode in Model A which is farther, map it to
                    // mode i
                    if ((a == -1) OR (difference[a][b] > difference[i][b]))
                    {
                        mapping[b] = i;
                        if (a > -1) then matched[a] = false;
                        matched[i] = true;
                        index[i] = j + 1;
                        break j loop;
                    }
                } // end j loop
            }
        } // end i loop
    }

    return mapping;
}

Figure C.7: FindUniqueMapping pseudo-code
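The pseudo-code of Figure C.7 could be rendered in runnable form roughly as follows. This is a minimal Python sketch under stated assumptions, not the thesis implementation: the function name `find_unique_mapping`, the use of absolute frequency differences, 0-based indices, the tie-breaking behaviour of the sort and the example frequencies are all chosen here for illustration.

```python
def find_unique_mapping(modes_a, modes_b):
    """Return mapping, where mapping[b] is the Model A index matched to mode b of Model B."""
    n = len(modes_a)
    if len(modes_b) != n:
        # The appendix assumes both models contain the same number of modes.
        raise ValueError("both models must contain the same number of modes")

    # difference[a][b]: assumed to be the absolute frequency difference.
    difference = [[abs(fa - fb) for fb in modes_b] for fa in modes_a]
    # order[a][k]: index of the k-th nearest Model B mode to mode a.
    order = [sorted(range(n), key=lambda b: row[b]) for row in difference]
    index = [0] * n          # next position in order[a] to try for mode a
    matched = [False] * n    # whether mode a of Model A is currently matched
    mapping = [-1] * n       # mapping[b] = Model A index, or -1 if unmapped

    while not all(matched):
        for i in range(n):
            if matched[i]:
                continue
            for j in range(index[i], n):
                b = order[i][j]
                a = mapping[b]
                # Claim b if it is free, or if its current partner is farther away.
                if a == -1 or difference[a][b] > difference[i][b]:
                    mapping[b] = i
                    if a != -1:
                        matched[a] = False
                    matched[i] = True
                    index[i] = j + 1
                    break
    return mapping


# Hypothetical frequencies in Hz, chosen only for this example.
simple = find_unique_mapping([100.0, 220.0, 450.0], [110.0, 440.0, 230.0])
contested = find_unique_mapping([100.0, 105.0, 300.0], [104.0, 120.0, 310.0])
```

In the second call, mode 0 of Model A first claims mode 0 of Model B, is displaced by the closer mode 1 of Model A, and then falls back to its next nearest candidate. This is the displacement behaviour described in step 3 of Figure C.1.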
