A Case Study of Improving Memory Locality in Polygonal
Model Simplification: Metrics and Performance
Victor Salamon
Paul Lu
Ben Watson
Dima Brodsky
Dave Gomboc
Dept. of Computing Science
University of Alberta
Edmonton, Alberta, Canada
{salamon,paullu,dave}@cs.ualberta.ca
Dept. of Computer Science
Northwestern University
Evanston, IL, USA
Dept. of Computer Science
University of British Columbia
Vancouver, BC, Canada
Abstract
Trade-offs between quality and performance are important issues in real-time computer graphics and
visualization. Polygonal model simplification algorithms take a full-sized polygonal model as input and
output a less-detailed version of the model with fewer polygons. For real-time display, fewer polygons
result in faster rendering. For other applications, the speed of rendering may be traded off for improved
image quality. Even though the simplified models may be pre-computed, some full-sized models are
large enough to require hours to simplify, due to paging. When computing multiple versions of the same
model, with varying levels of detail, the performance of the simplification process becomes critical.
R-Simp is a recent simplification algorithm that offers low run times, easy control of the output model
size, and good model quality. However, when the internal data structures for the input model are larger
than main memory, many simplification algorithms, including R-Simp, suffer from poor performance
due to paging. We present a case study of the R-Simp algorithm and how its data locality and performance
can be substantially improved through an off-line spatial sort and an on-line reorganization of its internal
data structures.
We empirically characterize the data access pattern of R-Simp on three large models and present an
application-specific metric, called cluster pagespan, of R-Simp's locality of memory reference. We also
experimentally show that both spatial sorting and reorganization can independently improve R-Simp's
performance by a factor of 2 to 6. When both techniques are used, R-Simp's performance improves
by up to 7-fold.
Keywords: graphics, visualization, performance, polygonal model simplification, memory locality,
paging, data structure reorganization, spatial sorting
Length: 15 pages of text and 6 pages of figures/tables
Model    Faces (Polygons)   Vertices   Initial Virtual Memory Size (MB)   File Size for PLY Model (MB)
hand              654,666    327,323                              123.6                           31.1
dragon            871,306    435,545                              164.1                           32.2
blade           1,765,388    882,954                              331.3                           80.1
Table 1: Summary of Full-Sized Polygonal Models
1 Introduction
In general, the more polygons in a three-dimensional (3D) computer graphics model, the more detailed and
realistic is the rendered image. For example, models of complicated machinery and scanned objects can
contain millions or billions of polygons in their most detailed form. However, the same level of detail and
realism may not be required in all computer graphics applications. For real-time display, fewer polygons
result in faster rendering; performance may be the most important criterion. For other applications, the speed
of rendering may be traded off for improved image quality. The flexibility to select a version of the same
model with a different level of detail (i.e., a different number of polygons) can be important when designing
a graphics system.
Polygonal model simplification algorithms take a full-sized polygonal model as input and output a
version of the model with fewer polygons. Although the simplified models are of high quality, it is also clear
that, upon close examination, some details have been sacrificed to reduce the size of the model (Figure 1). A
number of model simplification algorithms have been proposed [3], each with different strengths and
weaknesses in terms of execution time and the resulting image quality. In general, the algorithms are used
to pre-compute simplified versions of the models that can be selected for use at run-time.
However, models that have over a million polygons require tens of megabytes of disk storage, and
require hundreds of megabytes of storage for their in-memory data structures (Table 1). For example, the
blade model has almost 1.8 million polygons and requires 331.3 megabytes (MB) of virtual memory after
being read in from an 80.1 MB disk file (see footnote 1). Pointers and lists associating vertices to faces and
other data structures account for the substantial growth in a model's storage requirements as it is read into
main memory. Furthermore, as the computation progresses, the virtual memory footprint of the process
increases as other data structures are dynamically allocated.
Footnote 1: Initial virtual memory size is read from the /proc filesystem's stat device under Linux 2.2.12 before any simplification computation is performed.
(a) Hand (Original, 654,666 polygons)       (b) Hand (Simplified, R-Simp, 40,000 polygons)
(c) Dragon (Original, 871,306 polygons)     (d) Dragon (Simplified, R-Simp, 40,000 polygons)
(e) Blade (Original, 1,765,388 polygons)    (f) Blade (Simplified, R-Simp, 40,000 polygons)
Figure 1: Full-Sized Original and Simplified Versions of Polygonal Models
As the size of the input model increases, the data structures representing the model become too large to
reside in main memory. Of course, the virtual memory subsystem in contemporary operating systems allows
for data structures that do not fit in main memory. Pages of data can be swapped to and from secondary
storage, such as a hard disk, in response to the data access patterns of the user's process. However, if there
is poor locality and re-use in the data access patterns, performance is greatly reduced as paging becomes the
performance bottleneck.
Consequently, we examine the problem of data access locality and its effect on performance for a model
simplification algorithm. In particular, we make a case study of the recently-introduced R-Simp
simplification algorithm. R-Simp offers low execution times, easy control of the output model size, and good
model quality [1]. We focus on systems-oriented run-time performance and related metrics of R-Simp, as
opposed to measures of model quality. A detailed study of other model simplification algorithms is part of
our on-going research and is beyond the scope of this paper.
1.1 Motivation and Related Work: Large Models
As new model acquisition technologies have developed, the complexity and size of 3D polygonal models
have increased. First, model producers have begun to use 3D scanners (for example, [5]). Second, as the
speed and resolution of scientific instrumentation increase, so has the size of the data sets that must be
visualized. Both of these technology changes have resulted in models with hundreds of millions to billions
of polygons. Current hardware cannot come close to displaying these models in real time. Consequently,
there is a large body of "model simplification" research addressing this problem [3].
Most of these model simplification algorithms (e.g., [8, 4, 2]) use a greedy search approach with a
time complexity of $O(n \log n)$, where $n$ is the number of polygons in the original model. Also, the greedy
algorithms have poor locality of memory access, jumping around the surface of the input model, from puzzle
piece to puzzle piece. One exception to this trend is the simplification algorithm described by Lindstrom
[6], based on the algorithm by Rossignac and Borrel [7]. Lindstrom's algorithm is fast, but it produces
simplified models of poor quality and, by its nature, it is difficult to control the exact number of polygons in
the simplified model.
In contrast, the recently-introduced R-Simp algorithm [1] produces approximated models of substantially
higher quality than those produced by Rossignac and Borrel's approach, and R-Simp allows very exact
control of output size, all without a severe cost in execution speed. R-Simp is unique in that it iteratively
refines an initial and very poor approximation, rather than simplifying the full input model. Consequently,
R-Simp has a time complexity of $O(n \log m)$, rather than $O(n \log n)$, where $m$ is the number of polygons in
the simplified model. Since $m \ll n$ in practice, R-Simp's advantage can be substantial. Furthermore, as we
will see in Section 3, R-Simp's approach of refining a large piece of the model into smaller pieces provides
for reasonable locality of memory accesses, if the "piece" can fit into main memory.
1.2 Overview
We begin by providing empirical evidence for the poor memory locality of R-Simp when the model does
not fit in main memory. Other simplification algorithms would also have poor memory locality, but we
focus on R-Simp. We introduce cluster pagespan as an application-specific metric for locality and show how
even small improvements in cluster pagespan can greatly improve the resident working set of R-Simp. In
particular, we describe how two simple techniques can reduce cluster pagespan and improve performance
by up to 7-fold. First, the off-line technique of spatial sorting adds an algorithm-independent preprocessing
step to model simplification. Second, the on-line technique of cluster data structure reorganization (or
reorganization) is application-specific. We use optimized and instrumented versions of R-Simp to demonstrate
the performance of our techniques and to provide graphs that give intuition as to the nature of R-Simp's
memory locality and other performance metrics.
2 The Problem: Locality of Memory Accesses
To understand the data access patterns of R-Simp, we instrumented a version of the code such that an on-line
trace is produced of all the model-related memory accesses. Each memory access is timestamped using the
value returned by the Unix system call gettimeofday(), where the first access is fixed at time 0. The
resulting trace file is processed off-line. A scatter plot of the virtual address of each memory access versus
the timestamp shows the relatively poor locality of reference for R-Simp (Figure 2). Note the different scales
on the X and Y-axes of each graph, even though they are of the same size. The hardware platform is a 500
MHz Pentium III system, running Linux 2.2.12, with 128 MB of RAM, and an IDE swap disk. Although
there are dual processors, R-Simp is a single-threaded application.
The instrumentation and tracing add significant overhead to R-Simp's execution time. Consequently,
we have scaled the X-axis. For example, the non-instrumented R-Simp requires 2,597 seconds to compute
the simplified model for hand. The instrumented R-Simp requires much more time to execute, but we have
post-processed the trace information so that the X-axis appears to be 2,597 seconds. The same scaling of
the X-axis appears for the other models. Although the scaling may introduce some distortions to the figures,
it still captures the basic nature of the memory access patterns.
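The X-axis scaling described above amounts to a linear rescaling of the trace timestamps. The following is a minimal sketch of that idea, not the authors' post-processing code; the names `TraceRecord` and `rescale_trace` are hypothetical, and the trace is assumed to be held in memory in time order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical trace record: time since the first access (ms) and the
// virtual address that was touched.
struct TraceRecord {
    double time_ms;
    std::uintptr_t address;
};

// Linearly rescale all timestamps so that the last record falls at
// target_ms (for example, the run time of the non-instrumented program).
void rescale_trace(std::vector<TraceRecord>& trace, double target_ms) {
    if (trace.empty()) return;
    const double last = trace.back().time_ms;
    if (last <= 0.0) return;  // a single instant cannot be stretched
    const double factor = target_ms / last;
    for (TraceRecord& r : trace) r.time_ms *= factor;
}
```

Because the first access is fixed at time 0, a single multiplicative factor is enough; no offset is needed.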
As R-Simp computes the simplified model, it accesses parts of the original model with poor locality of
reference. The graphs show a "white noise" pattern (Figure 2). If the model fits in main memory, paging
would not be a problem. However, the graphs intuitively explain why paging is a performance problem
for models that do not fit in main memory. As will be discussed further in Section 4.2, the poor locality of
reference is not due to a poor algorithm, but is due to poor spatial locality in the original model.
Every graph in Figure 2 has some common patterns. The darker horizontal band, between address [...]
and [...] on the LHS Y-axis, is the region of virtual memory where the data structure
for the vertex list is stored. The horizontal region above the vertices is where the face list is stored. The
data structures and details of R-Simp are discussed in Section 3. In each graph, there is a vertical line (for
example, at time [...] milliseconds for dragon) which is when R-Simp finishes the simplification process
and begins to create the new simplified model. The band of memory accesses at the top of the graph, and to
the immediate right of the vertical line, is where the new model is stored.
Superimposed on each graph in Figure 2 is a solid curve representing the cumulative major page faults
of R-Simp over time, as reported by the Unix function getrusage(). A major page fault requires a disk
access to swap out the victim page and another disk access to swap in the needed page of data. Note that
there is some discrepancy between the major page faults incurred by the instrumented R-Simp (Y-axis on the
RHS) and the major page faults incurred by the non-instrumented R-Simp (noted in the captions of Figure 2
and Table 3). We have purposely not scaled the RHS Y-axis since that is more problematic than scaling real
time. Again, the figures are only meant to give an intuitive picture of the data access patterns inherent to
R-Simp.
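For readers who want to collect the same counter, the sketch below shows one way to read the cumulative major page fault count through getrusage(). The wrapper name `major_page_faults` is hypothetical, and how often the original instrumentation sampled the counter is not stated in the text.

```cpp
#include <sys/resource.h>

// Return the cumulative number of major page faults incurred so far by the
// calling process, or -1 if getrusage() fails.
long major_page_faults() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
    return usage.ru_majflt;  // faults that required I/O to service
}
```

Sampling this value at intervals and plotting it against elapsed time would give a cumulative curve of the kind superimposed on the scatter plots.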
Since R-Simp incurs so many page faults on these models, and since there appears to be poor locality of
reference for the model's in-memory data structures, we can focus our attention on improving locality and
reducing page faults. However, we must first provide an overview of the R-Simp algorithm.
X-axis is scaled real time in milliseconds, LHS Y-axis is virtual address (scatter plot),
RHS Y-axis is cumulative major page faults (solid curve) in the instrumented version.
NOTE: X and Y-axes have different ranges for each graph.
(a) Hand: 2,597 seconds run-time, 1,812,649 major page faults in non-instrumented version
(b) Dragon: 7,736 seconds run-time, 6,410,393 major page faults in non-instrumented version
(c) Blade: 14,312 seconds run-time, 13,063,493 major page faults in non-instrumented version
Figure 2: Memory Access Pattern for Original R-Simp: Computing 40,000 polygon simplified model
3 The R-Simp Algorithm
The R-Simp algorithm begins by reading the original model into main memory. The model is contained
within a single cluster, which represents the root node of an n-ary tree. A cluster is a collection of vertices and
faces from the original model. The initial cluster is then subdivided into eight sub-clusters. Ultimately, each
cluster will be transformed into a new vertex in the simplified model. Therefore, these sub-clusters are then
iteratively subdivided until the required number of clusters is reached. The decision to subdivide a cluster is
based on the amount of variation in the orientation of the faces in the cluster (i.e., curvature). The final set
of clusters represents vertices that lie on the simplified surface. Next, these vertices are triangulated to form
the faces of the new surface. Finally, the new vertices and faces are written to an output file.
This whole process can be partitioned into six phases: input, initialization, simplification, post-simplification,
triangulation, and output. Below, we discuss the main data structures in R-Simp and provide more details
about each phase of the computation.
3.1 Data Structures
R-Simp uses three main data structures to perform model simplification. The first two data structures are
relatively static and are used to store the original model. The last data structure, the cluster, is more dynamic
and represents a node in the n-ary tree.
First, the vertex list is a global data structure that stores the vertices from the input model. The vertex
list is an array of Vertex structures. A Vertex structure contains three main data fields. The first field is
the $(x, y, z)$ coordinates of the vertex in 3D space, represented as floating point values. The second field is
an adjacency list of vertices. The third field is an adjacency list of faces. The adjacency lists are constructed
during the initialization phase and hold indices into the vertex list and face list. All references to vertices
and faces in R-Simp are done this way. Therefore, the global vertex and face lists are accessed throughout
R-Simp's execution. Note that the adjacency lists do not exist in the version of the model on secondary
storage, which partly accounts for why the model requires more storage when read into main memory.
Second, the face list is also a global data structure that stores the faces from the input model. It is an
array of structures of type Face, each of which contains three fields. The first field contains the vertices that
make up the face. The second and third fields are the face's normal and its area, respectively. This information
is computed during the initialization phase and used during the simplification phase.
Lastly, the cluster structure represents a portion of the original model and is a node in the n-ary tree. A
cluster contains a list of vertices from the original model, a representative normal to the patch, the amount
of surface variation, and the area of the patch. The patch area and the representative normal are used to
compute the surface variation.
Iteratively, a leaf cluster is selected from a priority queue and subdivided into 2, 4, or 8 sub-clusters. At
the end of the simplification phase, each cluster represents a single vertex in the simplified model. Therefore,
many vertices from the original model are simplified into a single vertex. The new vertex is computed in the
post-simplification phase and used in the triangulation phase to compute the simplified surface.
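The three data structures described in this section might be declared roughly as follows. This is an illustrative sketch only: the type names `Vertex` and `Face` come from the text, but every field name, the `Index` and `Vec3` types, the `Model` wrapper, and the choice of 32-bit indices and single-precision floats are assumptions.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Index = std::uint32_t;  // index into the global vertex or face list (assumed width)

struct Vec3 { float x, y, z; };

struct Vertex {
    Vec3 position;                         // (x, y, z) coordinates
    std::vector<Index> adjacent_vertices;  // indices into the global vertex list
    std::vector<Index> adjacent_faces;     // indices into the global face list
};

struct Face {
    std::array<Index, 3> vertices;  // the vertices that make up the (triangular) face
    Vec3 normal;                    // computed during initialization
    float area;                     // computed during initialization
};

struct Cluster {
    std::vector<Index> vertices;  // vertices of the original model in this patch
    Vec3 normal;                  // representative normal of the patch
    float area;                   // area of the patch
    float variation;              // surface variation, used as the queue priority
};

struct Model {
    std::vector<Vertex> vertices;  // global vertex list
    std::vector<Face> faces;       // global face list
};
```

Note that the adjacency lists live outside the `Vertex` array itself, which is one reason the in-memory model is so much larger than the file on disk.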
3.2 Phases of R-Simp
The first phase is the input phase. The original model is read in from the file and the initial vertex list and
face list are created. These data structures require a lot of virtual memory (Table 1). The blocks of memory
that are allocated in this phase are used throughout the entire simplification process (Section 2, Figure 2) and
are only freed at the end. These blocks of memory do not increase or decrease in size during the process,
but they are accessed throughout the computation.
In the second phase, the in-memory data structures are initialized. The initialization phase creates the
vertex and face adjacency lists in the vertex list and computes the normal and area for each face in the face
list. The adjacency lists are used in the simplification phase to determine and enforce surface topology.
The initialization phase sequentially iterates through the face list. For each face $f_i$, the vertices ($v_1$, $v_2$, and
$v_3$) are extracted. Then, for each vertex $v_j$, the face $f_i$ is added to $v_j$'s face adjacency list. The vertex
adjacency lists are updated for each vertex $v_j$ by adding the other vertices (i.e., $v_k$ where $k \neq j$) to the list if
they are not already present. Finally, the face normal and the face area are computed.
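The initialization pass can be sketched as a single loop over the face list. The code below is an illustration under stated assumptions rather than R-Simp's source: the struct layouts and the name `initialize` are hypothetical, and the normal and area are derived from the cross product of two triangle edges, which is a standard choice that the text does not spell out.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

struct Vertex {
    Vec3 position;
    std::vector<std::size_t> adjacent_vertices;
    std::vector<std::size_t> adjacent_faces;
};

struct Face {
    std::array<std::size_t, 3> vertices;
    Vec3 normal;
    double area;
};

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// One sequential pass over the face list: fill both adjacency lists, then
// compute each face's unit normal and area.
void initialize(std::vector<Vertex>& vertices, std::vector<Face>& faces) {
    for (std::size_t i = 0; i < faces.size(); ++i) {
        Face& f = faces[i];
        for (std::size_t j : f.vertices) {
            Vertex& v = vertices[j];
            v.adjacent_faces.push_back(i);
            for (std::size_t k : f.vertices) {
                if (k == j) continue;
                auto& adj = v.adjacent_vertices;
                if (std::find(adj.begin(), adj.end(), k) == adj.end()) adj.push_back(k);
            }
        }
        const Vec3& p0 = vertices[f.vertices[0]].position;
        const Vec3 n = cross(sub(vertices[f.vertices[1]].position, p0),
                             sub(vertices[f.vertices[2]].position, p0));
        const double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
        f.area = 0.5 * len;  // half the parallelogram spanned by the two edges
        if (len > 0) f.normal = {n.x / len, n.y / len, n.z / len};
    }
}
```

The loop walks the face list in order, but each face touches three vertices that may sit anywhere in the vertex list, so even this phase depends on how well the file order matches spatial order.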
The third phase is the heart of R-Simp: the simplification phase. In this phase the original set of vertices
is reduced to the desired number. The simplification phase starts with all the vertices in a single cluster.
This cluster is then subdivided into eight sub-clusters by computing a bounding box around the vertices and
subdividing the bounding box using three axis-aligned planes positioned at its centre. These eight clusters
are inserted into a priority queue and the main loop of R-Simp begins. The pseudo-code is shown in Figure 3.
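The initial eight-way split can be illustrated with a small function that classifies each vertex by the octant of the bounding box it falls in. This is a sketch, assuming vertices are addressed by index into a position array; the name `split_into_octants` and the convention that points on a dividing plane go to the upper side are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Partition the vertex indices of a cluster into the eight octants defined by
// three axis-aligned planes through the centre of the cluster's bounding box.
std::array<std::vector<std::size_t>, 8>
split_into_octants(const std::vector<Vec3>& positions, const std::vector<std::size_t>& cluster) {
    std::array<std::vector<std::size_t>, 8> octants;
    if (cluster.empty()) return octants;
    Vec3 lo = positions[cluster[0]], hi = lo;
    for (std::size_t i : cluster) {
        const Vec3& p = positions[i];
        lo = {std::min(lo.x, p.x), std::min(lo.y, p.y), std::min(lo.z, p.z)};
        hi = {std::max(hi.x, p.x), std::max(hi.y, p.y), std::max(hi.z, p.z)};
    }
    const Vec3 c = {(lo.x + hi.x) / 2, (lo.y + hi.y) / 2, (lo.z + hi.z) / 2};
    for (std::size_t i : cluster) {
        const Vec3& p = positions[i];
        // One bit per axis: bit 0 for x, bit 1 for y, bit 2 for z.
        const int octant = (p.x >= c.x ? 1 : 0) | (p.y >= c.y ? 2 : 0) | (p.z >= c.z ? 4 : 0);
        octants[octant].push_back(i);
    }
    return octants;
}
```

Some octants may come back empty for a thin or lopsided model; a caller would presumably skip those rather than enqueue empty clusters.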
The priority queue ($PQ$) holds references to the clusters and orders the clusters based on the surface
variation. A topology check, based on breadth-first search, is used to determine whether the surface in the
cluster is connected. If the surface is disconnected, the cluster is partitioned along component boundaries
and each component is put into its own sub-cluster. If the surface is connected then a statistical approach
is used to approximate the optimal partitioning for the cluster. The cluster is partitioned into 2, 4, or 8
sub-clusters. Once the set of sub-clusters ($\{c_1, \ldots, c_n\}$) has been created, the surface variation is computed
for each sub-cluster ($s$) as they are inserted into the priority queue.

    s, c, c_i are clusters
    {c_1, ..., c_n} is the set of sub-clusters created from c
    PQ is a priority queue

    do
        c <- front of priority queue PQ
        Compute face list for c
        Run topology check on c
        if c is disconnected then
            Split c along component boundaries -> {c_1, ..., c_n}
        else
            Using c compute quadric Q
            Split c using Q -> {c_1, ..., c_n}
        foreach s in {c_1, ..., c_n}
            Compute surface variation for s
            insert(s, PQ)
    until sizeof(PQ) = desired number of vertices

Figure 3: Simplification Phase
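The control flow of Figure 3 maps naturally onto a standard priority queue. The sketch below shows only that control flow: the topology check, quadric computation, and surface variation are hidden behind a caller-supplied `split` function, and all names (`Cluster`, `ClusterQueue`, `SplitFn`, `simplify`) are hypothetical. Unlike the figure, it stops as soon as the cluster count reaches or passes the target, and it sets aside clusters that cannot be split so the loop always terminates.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Cluster {
    std::vector<std::size_t> vertices;  // indices into the global vertex list
    double variation = 0.0;             // higher variation is split sooner
};

struct LowerVariation {
    bool operator()(const Cluster& a, const Cluster& b) const { return a.variation < b.variation; }
};

using ClusterQueue = std::priority_queue<Cluster, std::vector<Cluster>, LowerVariation>;
using SplitFn = std::function<std::vector<Cluster>(const Cluster&)>;

// Repeatedly split the highest-variation cluster until `target` clusters exist.
// `split` stands in for the topology check and partitioning, and must set
// `variation` on every sub-cluster it returns.
std::vector<Cluster> simplify(ClusterQueue pq, std::size_t target, const SplitFn& split) {
    std::vector<Cluster> finished;  // clusters that could not be split further
    while (!pq.empty() && pq.size() + finished.size() < target) {
        Cluster c = pq.top();
        pq.pop();
        std::vector<Cluster> subs = split(c);
        if (subs.size() < 2) {
            finished.push_back(std::move(c));
            continue;
        }
        for (Cluster& s : subs) pq.push(std::move(s));
    }
    while (!pq.empty()) {
        finished.push_back(pq.top());
        pq.pop();
    }
    return finished;
}
```

Because a split can yield up to eight sub-clusters, the final count may overshoot the target slightly in this sketch; how R-Simp achieves its exact control of output size is not shown here.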
The greater the surface variation, the higher the priority of the cluster. Therefore, the simplified model
will have more vertices in regions of high surface variation. When a cluster is partitioned, the surface
represented by the cluster is subdivided into sub-patches that are potentially flatter. This process iterates
until the priority queue contains the required number of clusters, which determines the number of polygons
in the simplified model. Each split causes more clusters to be created; thus the amount of memory used by
this phase depends on the level of detail required. In general, the coarser the level of detail required, the less
memory is needed.
The fourth phase is the post-simplification phase: the final set of vertices for the simplified model is
computed. Initially, the algorithm examines all the clusters in the priority queue to ensure that each cluster
only represents a single connected component; this is accomplished by running the topology check on
the cluster. Each cluster represents a single vertex, $v_{new}$, in the simplified model; for each cluster R-Simp
computes the optimal position of $v_{new}$. Finally, the algorithm changes all of the pointers from the original
vertices in a cluster to $v_{new}$.

                      Original R-Simp      w/Spatial Sort       w/Reorganization     w/Both
Model   Output Size   H:M:S      seconds   H:M:S      seconds   H:M:S      seconds   H:M:S      seconds
        (polygons)
hand         10,000   00:23:21     1,401   00:14:14       854   00:05:05       305   00:04:30       270
hand         20,000   00:31:45     1,905   00:19:00     1,140   00:05:42       342   00:04:53       293
hand         40,000   00:43:17     2,597   00:26:22     1,582   00:06:50       410   00:05:53       353
dragon       10,000   01:12:55     4,375   00:25:24     1,524   00:23:29     1,409   00:14:52       892
dragon       20,000   01:37:09     5,829   00:34:18     2,058   00:29:38     1,778   00:19:43     1,183
dragon       40,000   02:08:56     7,736   00:46:21     2,781   00:41:24     2,484   00:26:03     1,563
blade        10,000   02:18:26     8,306   01:07:32     4,052   01:15:39     4,539   00:59:05     3,545
blade        20,000   03:01:17    10,877   01:28:09     5,289   01:35:00     5,700   01:19:52     4,792
blade        40,000   03:58:32    14,312   01:58:04     7,084   02:08:40     7,720   01:51:53     6,713
Table 2: Performance of Polygonal Model Simplification: Various Strategies

The fifth phase is the triangulation phase, where the algorithm creates the faces (i.e., polygons) of the
new simplified model. The algorithm iterates through all the faces in the original model and examines where
the vertices of these faces lie. If two vertices point to the same new vertex (i.e., two original vertices are
within the same cluster) then the face has degenerated into a line and the face is thrown away. Likewise, if
all three vertices point to the same new vertex, then the face has degenerated into a point, and
that face is thrown away as well. Only if all three original vertices point to different new vertices do we keep
the face and add it to the new face list. The vertices of this new face point to the vertices in the new vertex
list.
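The triangulation rule reduces to a filter over the original face list. A minimal sketch follows, assuming the pointer update of the post-simplification phase is available as an array mapping each original vertex index to its new vertex index; `Triangle`, `triangulate`, and `new_vertex_of` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Triangle = std::array<std::size_t, 3>;

// new_vertex_of[i] is the index (in the new vertex list) of the vertex that
// original vertex i was collapsed into.
std::vector<Triangle> triangulate(const std::vector<Triangle>& original_faces,
                                  const std::vector<std::size_t>& new_vertex_of) {
    std::vector<Triangle> new_faces;
    for (const Triangle& f : original_faces) {
        const std::size_t a = new_vertex_of[f[0]];
        const std::size_t b = new_vertex_of[f[1]];
        const std::size_t c = new_vertex_of[f[2]];
        if (a == b || b == c || a == c) continue;  // degenerated into a line or a point
        new_faces.push_back(Triangle{{a, b, c}});
    }
    return new_faces;
}
```

The sketch keeps every surviving face as-is; whether R-Simp also removes duplicate faces that map to the same three new vertices is not stated in the text.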
In the last phase, the output phase, the algorithm writes the new vertex list and the new face list to the
output file.
4 Improving Memory Locality and Performance in R-Simp
In addition to experiments with an instrumented version of R-Simp, we have also benchmarked versions of
R-Simp without the instrumentation and with compiler optimizations (-O) turned on. Again, our hardware
platform is a 500 MHz Pentium III system, running Linux 2.2.12, with 128 MB of RAM, and an IDE swap
disk. Although there are dual processors, R-Simp is a single-threaded application. R-Simp is written in
C++ and we used the egcs compiler, version 2.91.66, on our Linux platform.
When the data structures of a model fit within main memory, R-Simp is known to have low run times [1].
However, the hand, dragon, and blade models are large enough that they require more memory than is
physically available (Table 1; see footnote 2). And, as the computation progresses, data structures are
dynamically allocated and the size of the virtual memory used by the process increases. Consequently, the
baseline R-Simp (called "Original R-Simp") experiences long run times (Table 2) due to the high number of
page faults that it incurs (Table 3). As expected, the larger the input model, the longer the run time and the
higher the number of page faults. As the output model's size increases, the run time and page faults also increase.
Footnote 2: The hand model by itself would initially fit within 128 MB, but the operating system also needs memory. Consequently, paging occurs.

                      Major page faults
Model   Output Size   Original R-Simp   w/Spatial Sort   w/Reorganization      w/Both
        (polygons)
hand         10,000         1,042,444          701,789            265,328     249,849
hand         20,000         1,361,492          880,498            281,338     254,083
hand         40,000         1,812,649        1,167,593            297,028     268,952
dragon       10,000         3,670,516        1,409,489          1,070,466     702,050
dragon       20,000         4,892,932        1,808,060          1,329,220     875,622
dragon       40,000         6,410,393        2,386,238          1,767,490   1,084,346
blade        10,000         7,678,929        3,878,566          4,127,079   3,125,348
blade        20,000         9,988,417        5,002,567          5,195,597   4,149,098
blade        40,000        13,063,493        6,599,763          6,997,380   5,717,086
Table 3: Page Fault Count of Polygonal Model Simplification: Various Strategies
As previously discussed (Section 3), the global vertex and face lists are large data structures that are
accessed throughout the computation. Therefore, they are natural targets of our attempts to improve the
locality of memory access. In particular, we have developed two different techniques that independently
improve the memory locality of R-Simp and improve its real time performance by a factor of 2 to 6,
depending on the model and the size of the simplified model. When combined, the two techniques can
improve performance by up to 7-fold.
4.1 Metrics: Cluster Pagespan and Resident Working Set
We introduce cluster pagespan as an application-specific metric of the expected locality of reference to a
model's in-memory data structure. Cluster pagespan is defined as the number of unique virtual memory
pages that have to be accessed in order to touch all of the vertices and faces, in the global vertex and
face lists, in the cluster. For each iteration of the simplification phase (Figure 3), R-Simp's computation is
focussed on a single cluster from the front of the priority queue. The topology check and split computations
are localized to one cluster, but they have to access all of the cluster's vertices and faces. Therefore, if the
pagespan of the cluster is large, there is a greater chance that a page fault will be incurred. The smaller the
cluster pagespan, the lower the chance that one of its pages has been paged out by the operating system.
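The definition of cluster pagespan can be made concrete with a short sketch. It assumes the global vertex and face lists are contiguous arrays of fixed-size elements and a 4096-byte page; the names `ArrayLayout`, `add_pages`, and `cluster_pagespan` are hypothetical, and only the array elements themselves are counted (not any separately allocated adjacency lists).

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

using PageSet = std::set<std::uintptr_t>;

struct ArrayLayout {
    std::uintptr_t base;       // virtual address of element 0
    std::size_t element_size;  // size of one element, in bytes
};

// Record every page overlapped by the selected elements of one array.
void add_pages(PageSet& pages, const ArrayLayout& array,
               const std::vector<std::size_t>& indices, std::size_t page_size) {
    for (std::size_t i : indices) {
        const std::uintptr_t first = array.base + i * array.element_size;
        const std::uintptr_t last = first + array.element_size - 1;
        for (std::uintptr_t p = first / page_size; p <= last / page_size; ++p) pages.insert(p);
    }
}

// Number of unique pages touched when visiting all of a cluster's vertices and faces.
std::size_t cluster_pagespan(const ArrayLayout& vertex_list, const std::vector<std::size_t>& vertex_indices,
                             const ArrayLayout& face_list, const std::vector<std::size_t>& face_indices,
                             std::size_t page_size = 4096) {
    PageSet pages;
    add_pages(pages, vertex_list, vertex_indices, page_size);
    add_pages(pages, face_list, face_indices, page_size);
    return pages.size();
}
```

With 32-byte elements, 128 neighbouring elements share one page, so a cluster whose indices are contiguous has a pagespan close to its size divided by 128, while a cluster with scattered indices can approach one page per element.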
Cluster Pagespan: X-axis is scaled wallclock time, Y-axis is pagespan of cluster at front of priority queue
Resident Working Set of Original Model: X-axis is scaled wallclock time, Y-axis is number of clusters in the model with ≥ 95% of pages in main memory
(a) Original R-Simp, 40,000 polygons in output
(b) R-Simp with Spatial Sorting Only, 40,000 polygons in output
Figure 4: Cluster Pagespan and Resident Working Set for blade: Part 1
Cluster Pagespan: X-axis is scaled wallclock time, Y-axis is pagespan of cluster at front of priority queue
Resident Working Set of Original Model: X-axis is scaled wallclock time, Y-axis is number of clusters in the model with ≥ 95% of pages in main memory
(c) R-Simp with Reorganization Only, 40,000 polygons in output
(d) R-Simp with both Spatial Sorting and Reorganization, 40,000 polygons in output
Figure 5: Cluster Pagespan and Resident Working Set for blade: Part 2
Figure 4(a) shows the cluster pagespan of the cluster at the front of the priority queue during an execution
of R-Simp with the blade model. Since the initial clusters are very large, the cluster pagespan is also large at
the beginning of the execution. As clusters are split, the cluster pagespan decreases over time. The cluster
pagespan data points are joined by lines in order to more clearly visualize the pattern. An instrumented
version of R-Simp was used to gather the data. Furthermore, for each iteration of the simplification loop, all
of the clusters in the priority queue are examined. If at least 95% of the pages containing the vertices and faces
of a cluster are in physical memory (as opposed to being swapped out onto the swap disk), that cluster is
considered to be resident in main memory. Therefore, Figure 4(a) also shows the count of how many of the
clusters in the priority queue are considered to be in memory and part of the resident working set, over time.
We modified the Linux kernel so that it is possible to check if a specific virtual page of data is in memory or
on the swap disk. That page of data is not paged back into memory as part of the check.
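The residency test can be expressed independently of how page residency is obtained. In the sketch below the set of pages currently in physical memory is simply an input, standing in for the information the authors obtained from their modified kernel; the names `PageId`, `is_resident`, and `resident_working_set` are hypothetical, and each cluster's page list is assumed to contain no duplicates.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

using PageId = std::size_t;

// A cluster counts as resident when at least threshold_percent of its pages
// are in physical memory. Integer arithmetic avoids rounding at the boundary.
bool is_resident(const std::vector<PageId>& cluster_pages,
                 const std::unordered_set<PageId>& pages_in_memory,
                 std::size_t threshold_percent = 95) {
    std::size_t present = 0;
    for (PageId p : cluster_pages) present += pages_in_memory.count(p);
    return present * 100 >= threshold_percent * cluster_pages.size();
}

// Count how many clusters are resident, i.e. the resident working set count.
std::size_t resident_working_set(const std::vector<std::vector<PageId>>& clusters,
                                 const std::unordered_set<PageId>& pages_in_memory) {
    std::size_t count = 0;
    for (const std::vector<PageId>& pages : clusters)
        if (is_resident(pages, pages_in_memory)) ++count;
    return count;
}
```

The important property of the real measurement, which this sketch cannot show, is that querying residency must not itself fault the page back in.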
The more clusters in the resident working set, the less likely it is that a computation performed on a
cluster will result in page faults. As the clusters decrease in size, more clusters can simultaneously fit into
main memory. Note that the vertices and faces are from the original (not simplified) model. The graph
of the resident working set is not monotonically increasing because the operating system periodically (not
continuously) reclaims pages, so the resident working set count can drop significantly within a short period
of time. Also, the vertical lines represent important phase transitions. The resident working set count
begins to decrease after the simplification phase because the post-simplification and triangulation phases
dynamically create new vertex and face lists, which displace from memory the lists from the original model.
The cluster pagespan graph in Figure 4(a) indicates that there is poor memory locality in how the model
is accessed for the initial portion of R-Simp's execution. Consequently, the total number of clusters that
reside in main memory remains under 2,000 for most of the simplification phase of the algorithm.
We now describe two techniques that measurably improve memory locality, according to cluster
pagespan and the resident working set count, and also improve real time performance.
4.2 Off-Line Spatial Sorting
The models used in this case study are stored on disk in the PLY file format. The file consists of a header,
a global list of the vertices, and a list of the faces. Each polygonal face is a triangle and is defined by a
per-face list of three integers, which are the index locations of the vertices in the global vertex list. There is
no requirement that the order in which vertices appear in the global list corresponds to their spatial locality.
Two vertices that are spatial neighbours in the 3D geometry space can be in contiguous indices in the global
vertex list, or they can be separated by an arbitrary number of other vertices. There is also no spatial locality
constraint on the order of faces in the file. In R-Simp, the vertices and faces are stored in main memory in
the same order in which they appear in the file; therefore the order of the vertices and faces in the file has
a direct impact on the layout of the data structures in memory.
The large cluster pagespan values seen in the early portion of R-Simp's execution (Figure 4(a)) suggest
that perhaps the PLY models have not been optimized for spatial locality. Therefore, we decided to spatially
sort the PLY file. The model itself is unchanged; it has the same number of vertices and faces at the same
locations in geometry space, but we change the order in which the vertices and faces appear in the file. Our
spatial sort reads in the model from the file, sorts the vertices and faces, and then writes the same model
back to disk in the PLY format. Therefore, the spatial sort is a preprocessing step that occurs before model
simplification. The spatially sorted version of the PLY file can then be re-used for different runs of the
simplification program.
The spatial sort is a recursive algorithm. Excluding input and output, there are five steps to the algorithm.
After reading the model into memory, the first step is to identify a 3D bounding box for the model. The
second step is to select three orthogonal dividing planes to partition the bounding box into eight sub-boxes.
Each dimension of a sub-box is half the dimension of the original bounding box. Then, each sub-box is
recursively partitioned into eight sub-boxes. The stopping condition for the recursion is when a sub-box
contains fewer than two vertices. The third step is to reorder the vertices in the vertex list such that vertices in
the same sub-box, which are spatial neighbours, have indices that are contiguous.
Since the vertices have been reordered in the global vertex list, the fourth step is to update the per-face
vertex lists to reflect the new index order. For each face, the three vertices that define the face are listed in
monotonically increasing order of the vertex's index in the global vertex list. The fifth step is to spatially
sort the faces. The first vertex's global index in each per-face list is used as the primary sort key, and the
other indices are the secondary and tertiary sort keys, respectively. Consequently, faces that are neighbours
in geometry space are also neighbours in the face list. Finally, the model is written back to disk in the PLY
format with the vertex and face lists in the new spatially sorted order.
Our implementation of the spatial sort is written in C++ and uses the standard C++ library's efficient
std::partition() and std::sort() functions to reorder the vertices and sort the faces. Spatially
sorting the hand, dragon, and blade models requires 28, 28, and 89 seconds, respectively, on our 500 MHz
Pentium III-based Linux platform. Again, the models only have to be sorted once since the new PLY files
are retained on disk.
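The five steps of the spatial sort can be sketched with std::partition() and std::sort() as follows. This is an illustration, not the authors' implementation: the names `Point`, `Triangle`, `partition_box`, and `spatial_sort` are hypothetical, PLY input and output are omitted, and the eight-way split of each box is realized as three successive midpoint splits (x, then y, then z), which groups vertices into the same sub-boxes. The depth limit is an added assumption to guard against coincident vertices, which would otherwise never be separated.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

using Point = std::array<double, 3>;
using Triangle = std::array<std::size_t, 3>;

// Recursively reorder order[first, last) so that vertices in the same sub-box
// end up contiguous. Three consecutive levels (x, y, z) form one octree level.
static void partition_box(const std::vector<Point>& pos, std::vector<std::size_t>& order,
                          std::size_t first, std::size_t last, Point lo, Point hi, int depth) {
    if (last - first < 2 || depth >= 96) return;  // fewer than two vertices, or depth guard
    const int axis = depth % 3;
    const double mid = (lo[axis] + hi[axis]) / 2;
    auto split = std::partition(order.begin() + first, order.begin() + last,
                                [&](std::size_t v) { return pos[v][axis] < mid; });
    const std::size_t m = static_cast<std::size_t>(split - order.begin());
    Point lower_hi = hi, upper_lo = lo;
    lower_hi[axis] = mid;
    upper_lo[axis] = mid;
    partition_box(pos, order, first, m, lo, lower_hi, depth + 1);
    partition_box(pos, order, m, last, upper_lo, hi, depth + 1);
}

void spatial_sort(std::vector<Point>& pos, std::vector<Triangle>& faces) {
    if (pos.empty()) return;
    // Step 1: bounding box of the model.
    Point lo = pos[0], hi = pos[0];
    for (const Point& p : pos)
        for (int a = 0; a < 3; ++a) { lo[a] = std::min(lo[a], p[a]); hi[a] = std::max(hi[a], p[a]); }
    // Steps 2 and 3: order[k] is the old index of the vertex placed at new index k.
    std::vector<std::size_t> order(pos.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    partition_box(pos, order, 0, order.size(), lo, hi, 0);
    std::vector<Point> sorted(pos.size());
    std::vector<std::size_t> new_index(pos.size());
    for (std::size_t k = 0; k < order.size(); ++k) { sorted[k] = pos[order[k]]; new_index[order[k]] = k; }
    pos.swap(sorted);
    // Step 4: remap each face, listing its vertices in increasing index order
    // (a plain sort like this one does not preserve the original winding order).
    for (Triangle& f : faces) {
        for (std::size_t& v : f) v = new_index[v];
        std::sort(f.begin(), f.end());
    }
    // Step 5: sort faces by first, then second, then third vertex index.
    std::sort(faces.begin(), faces.end());
}
```

std::array compares lexicographically, so the final std::sort() call gives exactly the primary, secondary, and tertiary key ordering described in the text without a custom comparator.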
When R-Simp is given a spatially sorted model for the input, cluster pagespan is reduced throughout
the process's execution with a resulting improvement in the resident working set of the original model. For
blade, there are significant improvements in both cluster pagespan and resident working set (Figure 4(b)),
which result in more than a 50% reduction in R-Simp's execution time (Figure 4(b) and Table 2). Spatial
sorting also benefits the hand and dragon models (Table 2).
Since spatial sorting dramatically improves the memory locality of R-Simp without any changes to the
algorithm itself, it suggests that R-Simp does exhibit good memory locality if the input model also has good
spatial locality. If the input model has poor locality, which is the case for the commonly-used PLY files for
these models, then R-Simp will suffer a performance hit. Although the spatial sort is currently an off-line
preprocessing phase separate from R-Simp, we may integrate it into our implementation of R-Simp in the
future. In the meantime, our experiments and measurements suggest that other researchers should consider
spatially sorting the input models for their graphics systems. Spatial sorting is a simple and fast procedure with
substantial performance benefits for R-Simp (Table 2). Whether other simplification algorithms will also
benefit from spatial sorting is unclear and is left for future work.
4.3 On-Line Reorganization of Data Structures
The performance improvements due to a static spatial sort are substantial. However, as a model is iteratively
simplified, there may be an opportunity to dynamically improve memory locality above and beyond the
benefits of the spatial sort. In particular, as a cluster in the model is split and refined by R-Simp, the size
of the resulting clusters is reduced, but the vertices and faces referenced by the cluster are the same data
structures as when the model was first read into main memory. A cluster that contains vertices and faces
from disparate parts of the original model will have a high cluster pagespan even if the number of vertices
and faces is low.
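To make this effect concrete, the sketch below counts the distinct pages touched by a set of array elements. It assumes that cluster pagespan counts the virtual memory pages holding a cluster's vertices and faces, that the array starts on a page boundary, and that pages are 4 KB; the function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumed 4 KB pages

// Count the distinct pages touched by the elements at the given indices of an
// array whose elements are elem_size bytes each. The array is assumed to
// start on a page boundary.
std::size_t pages_touched(const std::vector<std::size_t>& indices,
                          std::size_t elem_size) {
    assert(elem_size > 0);
    std::set<std::size_t> pages;
    for (std::size_t i : indices) {
        const std::size_t first_byte = i * elem_size;
        const std::size_t last_byte = first_byte + elem_size - 1;
        // An element may straddle a page boundary, so record every page it covers.
        for (std::size_t p = first_byte / kPageSize; p <= last_byte / kPageSize; ++p) {
            pages.insert(p);
        }
    }
    return pages.size();
}
```

With 16-byte elements, three elements at indices 0, 1, and 2 touch one page, whereas three elements at indices 0, 256, and 512 touch three pages. The element count is the same in both cases; only the placement differs.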
We have implemented a version of R-Simp that dynamically reorganizes its internal data structures in
order to reduce cluster pagespan. Specifically, before a sub-cluster or cluster is inserted into the priority
queue (see Figure 3), the cluster may be selected for cluster data structure reorganization (or simply,
reorganization). If selected for reorganization, the vertices and faces associated with the cluster are copied from
the global vertex and face lists into new lists on new pages of virtual memory. The basic idea is similar to
compacting memory to reduce fragmentation in memory management. Internal to the cluster data structure,
the lists of vertices and faces now refer to the new vertex and face lists, thereby guaranteeing the minimal
possible cluster pagespan. If the cluster makes it to the front of the priority queue again, the topology check
and other computations performed on the cluster will have fewer page faults due to the improved locality of
how the data is stored.
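The copying step can be sketched as below. This is a simplified illustration, not R-Simp's actual data structures: it assumes a cluster is described by a list of indices into the global face list, and that every vertex referenced by those faces is copied. The types and the function name are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Vertex { float x, y, z; };
using Face = std::array<std::size_t, 3>;  // indices into a vertex list

// A cluster's private copy of its data, stored contiguously.
struct ClusterData {
    std::vector<Vertex> vertices;
    std::vector<Face> faces;  // indices refer to `vertices` above
};

// Copy the faces listed in cluster_faces, and the vertices they use, out of
// the global lists into new compact lists, re-indexing the faces to match.
ClusterData reorganize(const std::vector<Vertex>& global_vertices,
                       const std::vector<Face>& global_faces,
                       const std::vector<std::size_t>& cluster_faces) {
    ClusterData out;
    std::unordered_map<std::size_t, std::size_t> local;  // global -> local index
    out.faces.reserve(cluster_faces.size());
    for (std::size_t fi : cluster_faces) {
        assert(fi < global_faces.size());
        Face f = global_faces[fi];
        for (std::size_t& v : f) {
            // Assign the next free local index the first time a vertex is seen.
            auto result = local.emplace(v, out.vertices.size());
            if (result.second) out.vertices.push_back(global_vertices[v]);
            v = result.first->second;
        }
        out.faces.push_back(f);
    }
    return out;
}
```

After the copy, the cluster's data occupies two contiguous arrays, so the number of pages it touches depends only on its size and not on where its vertices and faces sat in the global lists.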
Since there are copying and memory allocation overheads associated with reorganization, it is not done
indiscriminately. Two criteria must be met before reorganization is performed:
1. The cluster pagespan after reorganization must be less than a threshold number of pages.
2. The cluster is about to be inserted into the priority queue within the front 50% of clusters in the queue
(i.e., the cluster is in the front half of the queue).
The first criterion controls at what point reorganization is performed, in terms of cluster size.
Reorganizing when clusters are large is expensive and inherently preserves large cluster pagespans. Reorganizing
when clusters are small may delay reorganization until most of the simplification phase is completed, thus
reducing the chances of benefiting from the reorganization. A simple count of the number of vertices and
faces in the cluster, multiplied by the amount of storage required for each, can determine the potential cluster
pagespan after reorganization. The specific threshold value chosen for the first criterion has, so
far, been determined experimentally. For example, as we increase the threshold, the benefit of reorganization for blade
increases until it plateaus and starts to decrease (Figure 9). The optimal threshold is found to be 1,024
pages (i.e., 4 MB given the 4 KB pages on our platform) for blade. For the three models, the optimal
threshold was empirically between 1,024 and 4,096 pages.
The second criterion tries to maximize the chances that a reorganized cluster will be accessed again (i.e.,
it will reach the front of the priority queue again and be re-used). Reducing the pagespan of a cluster that
is not accessed again until the post-simplification phase produces fewer benefits. The value of “50%” was
also empirically determined. Since the priority value of the cluster is computed before it is inserted into the
queue, this is a cheap criterion to test.
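The two criteria can be combined into a single predicate, sketched below. The function names, the parameters, and the way the queue position is expressed (queue_rank, the number of clusters that would precede this one) are assumptions for illustration; how the front half of the queue is actually determined is not specified here.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPageSize = 4096;  // 4 KB pages, as on the platform described

// 1,024 pages of 4 KB each amount to 4 MB.
static_assert(1024 * kPageSize == 4u * 1024u * 1024u, "1,024 pages of 4 KB are 4 MB");

// Estimate the pagespan after reorganization: the element counts multiplied by
// the storage required for each, rounded up to whole pages.
std::size_t estimated_pagespan(std::size_t n_vertices, std::size_t vertex_bytes,
                               std::size_t n_faces, std::size_t face_bytes) {
    const std::size_t bytes = n_vertices * vertex_bytes + n_faces * face_bytes;
    return (bytes + kPageSize - 1) / kPageSize;
}

// queue_rank: number of clusters that would be ahead of this one (0 = front).
// queue_size: number of clusters in the queue, counting this one.
bool should_reorganize(std::size_t pagespan_after, std::size_t threshold_pages,
                       std::size_t queue_rank, std::size_t queue_size) {
    assert(queue_rank < queue_size);
    const bool small_enough = pagespan_after < threshold_pages;  // criterion 1
    const bool in_front_half = 2 * queue_rank < queue_size;      // criterion 2
    return small_enough && in_front_half;
}
```

For example, a cluster with 1,000 vertices and 2,000 faces at 12 bytes each needs 36,000 bytes, or 9 pages after rounding up, which is well under a 1,024-page threshold. Both tests use only values that are already known before the cluster is inserted, so the check adds little overhead.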
The benefits of reorganization are reflected in the reduced cluster pagespan and increased resident working
set (Figure 5(c)), reduced number of page faults (Table 3), and most importantly, in the lower run times
(Table 2). By itself, reorganization can substantially improve performance. In fact, for hand (Figure 6),
reorganization alone produces a substantially greater improvement than spatial sorting alone. And, for blade
(Figure 8), spatial sorting alone produces a slightly greater improvement than reorganization alone. Both
techniques are useful. But, when both spatial sorting and reorganization are applied to R-Simp, there is an
additional benefit (Table 2, Figure 6, Figure 7, Figure 8). Of course, the two techniques have some overlap,
so the benefits are not completely cumulative.
[Bar chart. x-axis: Output size (polygons): 10,000, 20,000, 40,000. y-axis: Normalized Execution Time (lower is better). Series: R-Simp, Spatial Sort, Reorganization, Both.]
Figure 6: Hand: Normalized Execution Time
[Bar chart. x-axis: Output size (polygons): 10,000, 20,000, 40,000. y-axis: Normalized Execution Time (lower is better). Series: R-Simp, Spatial Sort, Reorganization, Both.]
Figure 7: Dragon: Normalized Execution Time
[Bar chart. x-axis: Output size (polygons): 10,000, 20,000, 40,000. y-axis: Normalized Execution Time (lower is better). Series: R-Simp, Spatial Sort, Reorganization, Both.]
Figure 8: Blade: Normalized Execution Time
[Bar chart. x-axis: Threshold for Reorganization (in pages): 64, 128, 256, 512, 1024, 2048, 4096. y-axis: Normalized Execution Time (lower is better).]
Figure 9: Blade: Varying Reorganization Threshold, Normalized Execution Time, 40,000 polygon output
5 Concluding Remarks and Future Work
In computer graphics and visualization, the complexity of the models and the size of the data sets have been
increasing. Modern 3D scanners and rising standards for image quality have fueled a trend towards larger
3D polygonal models and have also spurred work on model simplification algorithms. Such algorithms reduce the number of
polygons in a model while attempting to maximize the quality of the simplified model. By pre-computing
simplified models, a model with the appropriate level of detail for a given use can be selected
and effective trade-offs can be made between quality and performance.
R-Simp is a new model simplification algorithm with low run times, easy control of the output model
size, and good model quality. However, R-Simp and similar algorithms suffer from paging bottlenecks when
the model is too large to fit in physical memory, and the run times can extend into hours. No matter the
amount of RAM that one can afford, there may be a model that is too large to fit in memory. We have
developed spatial sorting and data structure reorganization techniques to improve the memory locality of
R-Simp and experimentally shown them to improve performance by up to 7-fold. We have also introduced the
cluster pagespan metric as one measure of memory locality in model simplification.
From a systems point of view, even though the simplified models can be pre-computed, it is important
that the simplification process be as fast as possible. Often, many different simplified versions of the same
full-sized model must be pre-computed; the more versions, the more likely that a model with a certain level
of detail is available for a given display situation. If simplification takes too long, the designer of the graphics
or visualization system may compute fewer versions of the model, resulting in a poorer trade-off between
image quality and rendering performance in a given situation. Since memory locality and paging appear
to be first-order determinants of performance in the simplification of large models, we have characterized
locality through metrics and developed techniques to significantly improve performance in practice.
For future work, we plan to study memory locality in other simplification algorithms and to determine if
spatial sorting and reorganization can also improve their performance. More experimentation and analysis
are also required to understand the specific thresholds and criteria used to selectively reorganize. Finally, we
need to repeat our experiments on different hardware platforms, with different ratios of physical memory to
model size, to better understand that parameter space.
References
[1] Dmitry Brodsky and Benjamin Watson. Model simplification through refinement. In Sidney Fels and Pierre
Poulin, editors, Graphics Interface ’00, pages 221–228. Canadian Information Processing Society, Canadian
Human-Computer Communications Society, May 2000.
[2] Michael Garland and Paul S. Heckbert. Surface simplification using quadric error metrics. In SIGGRAPH 97
Conference Proceedings, Annual Conference Series, pages 209–216. ACM SIGGRAPH, Addison Wesley, August
1997.
[3] Paul S. Heckbert and Michael Garland. Survey of polygonal surface simplification algorithms. Technical report,
Carnegie Mellon University, 1997. Draft Version.
[4] Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, and Werner Stuetzle. Mesh optimization. In
SIGGRAPH 93 Conference Proceedings, volume 27 of Annual Conference Series, pages 19–26. ACM SIGGRAPH,
Addison Wesley, August 1993.
[5] M. Levoy, K. Pulli, B. Curless, S. Rusinkiewicz, D. Koller, L. Pereira, M. Ginzton, S. Anderson, J. Davis,
J. Ginsberg, J. Shade, and D. Fulk. The digital Michelangelo project: 3D scanning of large statues. In Proceedings ACM
SIGGRAPH 2000, pages 131–144. ACM, 2000.
[6] Peter Lindstrom. Out-of-core simplification of large polygonal models. In Proceedings ACM SIGGRAPH 2000,
pages 259–262. ACM, 2000.
[7] Jarek Rossignac and Paul Borrel. Multi-resolution 3D approximations for rendering complex scenes. In Modeling
in Computer Graphics: Methods and Applications, pages 455–465, Berlin, 1993. Springer-Verlag.
[8] William J. Schroeder, Jonathan A. Zarge, and William E. Lorensen. Decimation of triangle meshes. Computer
Graphics, 26(2):65–70, July 1992.