


EUROGRAPHICS Symposium on Sketch-Based Interfaces and Modeling (2010)
M. Alexa and E. Do (Editors)

Sketch-Based Recognition System for General Articulated Skeletal Figures

S. Zamora1 and T. Sherwood1

1Department of Computer Science, University of California, Santa Barbara

Abstract
We present a new recognition system for detecting general articulated skeletal figures in sketch-based applications. We abstract drawing style from recognition by defining figures using two data models: templates and figure targets. Our system recognizes general skeletal figures consisting of lines and ellipses which may be drawn partially off the canvas, with the system automatically completing the figure. Figures are modeled as graphs and are allowed to contain cycles. Subgraph matching on a graph built from the stroke input is used to perform recognition. This paper outlines our system design, key details to its proper implementation, and proposes its application to various domains of sketch recognition.

Categories and Subject Descriptors (according to ACM CCS): I.4.8 [Image Processing and Computer Vision]: Scene Analysis—Object recognition

1. Introduction

Stick figures are a universal staple of sketching that effectively demonstrate the sort of expressiveness that motivates research into sketch recognition. Each figure conveys a great deal of information – an entity is represented (such as a human being) in a given position and orientation on the canvas. The strokes of a sketched figure outline a structure of bones and joints in a specific pose, with each bone in its own position and orientation. The resulting composition is an articulated skeletal structure that we can define using a graph of connected strokes.

However, stick figures may be drawn in different styles; some may draw a torso with an ellipse while others may use a single line. They are also often drawn roughly; lines may not touch or may cross (undershoot and overshoot), cross junctions can be expressed using two, three, or four lines, etc. Figures may even be drawn partially offscreen, as is often the case with human stick figures where only the torso is important and the rest of the body is ‘off camera’, so to speak. But even when drawn partially offscreen, these figures remain recognizable by a human observer. While sketch recognition systems strive to be robust under noisy input, robustness to drawing style and abstraction is a different class of problem.

Figure 1: Two human stick figures sketched and detected by our skeletal figure recognizer. Because the templates for the recognized figures identify both as ‘Human’, the two figures are annotated with labels from the ‘Human’ figure target.

In order to design a stick figure recognizer that handles style and abstraction, we generalize our approach to the skeletal figure, a general articulated skeleton of lines and ellipses. To abstract from the application the style by which a figure is drawn, we separate drawing styles (modeled as templates, which outline a general skeletal figure) from the class of figure to be recognized (such as Human, Cat, Elephant, etc., modeled as figure targets).


Figure 2: The six graph elaboration strategies – (a) proximity linking, (b) link closure, (c) virtual junction splitting, (d) spurious segment jumping, (e) continuity tracing, and (f) crossed segment co-splitting. Grey/blue lines denote bonds/links present before the strategy is applied. Green/red lines denote new bodied/non-bodied links created by the strategy. In (f), the right structure replaces that on the left.

We then extend previous work in the field that has been shown to be very effective in recognizing skeletal figures [MF02b]. The contributions of our paper include a new recognition system for general articulated skeletal figures consisting of lines and ellipses, detection of partially drawn figures that extend off the canvas, and support for skeletal figures that include cycles.

2. Related Work

Hammond and Davis developed LADDER [HD05], a cross-domain framework for creating recognizers from descriptions written in the LADDER language. While many sketch applications have been developed using LADDER across a variety of domains [TH10] [PEW∗08] [TPH09], its language is geared for diagrams with spatial (above, left of, etc.) and angular (acute, vertical, etc.) constraints and is not suited for general articulated figures that could be drawn irregularly (e.g. a T junction drawn with two lines versus three). Examples of stick figures in LADDER can be found using such constraints [HD03]. LADDER also does not support rotational constraints on ellipses as described in Section 3.

Alvarado and Davis developed SketchREAD [AD04], a recognition system similar to LADDER that also uses a template language to define recognition. SketchREAD also uses relative and angular constraints that make it unsuitable for detection of articulated skeletal figures. Both of these projects require templates to be hand-coded, although a graphical tool for finding bugs was developed for LADDER [HD06]. The Electronic Cocktail Napkin project [GD96] allows templates to be generated from sketched examples but only builds constraints from relative and loose spatial relations (concentric, contains, overlaps, etc.).

Lee et al. [LBKS07] developed a graph-based recognizer that functions similarly to the model and data graph approach used in our system. Their recognizer uses offline graph matching with training data to build a template to recognize, and uses subgraph matching in order to recognize the stroke input. However, their matching algorithm takes into account relative angles between primitives, which would not support articulated figures. Sezgin and Davis [SD05] used Hidden Markov Models to build a recognizer that takes advantage of the consistency of an individual's stroke drawing order. While their examples and user studies involved recognizing stick figures, recognized lines are identified specifically as horizontal, vertical, and positively or negatively sloped, making the recognizer sensitive to rotation and unable to handle articulated figures. The recognizer used by Yang et al. [YSvdP05] also uses graph-based template matching. Their system does not require figures to be fully connected and uses a hybrid spatial and structural approach for matching sketches to templates. However, their system does not fully discern the structure of the sketched scene and is not designed for articulated, rotationally-invariant figures.

Our work extends Mahoney and Fromherz's stick figure recognition system [MF02a] [MF02b] [MF01], which models stick figures as graphs that are matched to a graph generated from the stroke input. However, their implementation only recognizes a single figure consisting only of lines and does not have support for ellipses. Their implementation does not support partially offscreen figures, and does not explore the extension to generalized skeletal figures that effectively applies the recognizer to a variety of domains.

To summarize the work we are extending: Mahoney first represents the templated figure to be recognized as the model graph, where bones of the figure are nodes and joints between bones are edges. The sketched input data is represented as the data graph, where the bodies of lines are edges called bonds and their endpoints serve as nodes. Relations between strokes in the data graph are edges called links. A process called graph elaboration is used to add the spatial and visual structure of the sketched scene to the data graph. This process adds links, alternate line segments, and mutual exclusion constraints (to prevent a stroke and its alternate interpretations from participating in a match together) to the data graph. Figure 2(a–e) outlines the original elaboration strategies used in Mahoney's algorithm. These elaboration strategies also make the recognizer tolerant to noise in the sketched figure, including line overshoot and undershoot at junctions. These strategies are discussed in Section 4.1.

Finally, Mahoney's system performs subgraph matching to find the best match for the model graph in the data graph. Considerations of mutual exclusion constraints and components marked as optional are applied. The best match that maximizes the number of matched parts and minimizes an energy function supplied with the model graph (such as the one described in Section 5.3) is then returned. A key benefit of this approach is that new strokes only add structure to the data graph and do not destroy substructures that could produce matches.


Figure 3: Overview of our system for abstracting drawing style from figure class. In (a) we model the type of figure to recognize (Human, Cat, etc.) as a Figure Target and each style of drawing the figure target as a Template. When a templated figure is sketched, as shown in (b), the sketch is annotated with the skeletal structure of the figure target.

3. System Design

Our key motivation lies in handling the variety of styles used for sketching skeletal figures. Even though human stick figures are drawn in a number of styles, with different configurations of lines and ellipses for heads, torsos, and limbs, all are recognizable by an observer as a ‘human’ figure. Our system mirrors this sort of polymorphism using two models: templates and figure targets. Figure targets outline a class of figure the system will match (e.g. Human, Cat, Arrow), while templates define each style of drawing the given figure, as illustrated in Figure 3(a). When recognition occurs in 3(b), the sketched figure that matches a given template is annotated with data points from the figure target. Our motivation is satisfied by abstracting from the application the styles by which the user can draw, for instance, a human stick figure, and exposing that they have simply drawn a human.

Figure 4: This figure motivates rotational constraints on ellipses. (1a) and (1b) are a control and posed human with a point as a torso. (2a) and (2b) are a control and posed human with an ellipse as a torso. Because junctions consisting entirely of lines collapse to a single point (shown by a single green circle in (1a) and (1b)) while junctions involving ellipses do not (shown by many green circles in (2a) and (2b)), (2b) should not be recognized, as its right leg and arm are swapped.

The system performs the following steps in order to recognize sketched figures (a high-level sketch follows the list):

• Drawn strokes are classified as lines and ellipses and added to our data graph. Special care is given to strokes drawn near the edge of the canvas, which can be identified as extending "offscreen" (discussed in Section 4).

• After each stroke is drawn, graph elaboration is used to add connectivity relations and alternate interpretations of the sketched scene's structure to the data graph. These alternate interpretations are kept separate using mutual exclusion constraints so that no two interpretations of the same set of strokes can be used in the same match.

• Each template loaded into the recognizer (each used as a separate model graph) is checked against the data graph using subgraph matching. If an instance of the template is found in the data graph, it is added as a candidate figure.

• A fitness metric for each candidate is used to sort the set of candidate figures, and the best fitting, distinct candidates (not sharing any line segments or ellipses) are chosen as the set of recognized figures in the scene.

• The strokes for each recognized figure are annotated with the set of key points defined by its template's figure target. Applications that work only with these key points can then ignore the particular template (i.e. drawing style) used to recognize the figure.
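
The list above can be read as a small pipeline. The following is a minimal orchestration sketch; the types and methods it assumes (Stroke, DataGraph, Template, Candidate, and their members) are hypothetical stand-ins, not the actual implementation classes.

    import java.util.*;

    // Minimal orchestration sketch of the recognition pipeline above. All types and
    // methods here are hypothetical stand-ins, not the actual implementation.
    final class RecognitionPipelineSketch {
        interface Stroke {}
        interface Candidate {
            double fitness();                              // lower = better fit in this sketch
            Set<Stroke> strokes();
            void annotateWithFigureTargetAnchors();        // step 5: label the key points
        }
        interface Template {
            List<Candidate> matchAgainst(DataGraph graph); // step 3: subgraph matching
        }
        interface DataGraph {
            void addClassifiedStroke(Stroke s);            // step 1: lines/ellipses, offscreen flags
            void elaborate();                              // step 2: links, alternates, mutexes
        }

        static List<Candidate> recognize(List<Stroke> strokes, DataGraph graph, List<Template> templates) {
            for (Stroke s : strokes) {
                graph.addClassifiedStroke(s);              // step 1
                graph.elaborate();                         // step 2 (run after each stroke)
            }
            List<Candidate> candidates = new ArrayList<>();
            for (Template t : templates) {
                candidates.addAll(t.matchAgainst(graph));  // step 3
            }
            candidates.sort(Comparator.comparingDouble(Candidate::fitness)); // step 4: best first
            List<Candidate> recognized = new ArrayList<>();
            Set<Stroke> used = new HashSet<>();
            for (Candidate c : candidates) {
                if (Collections.disjoint(used, c.strokes())) {  // distinct: no shared strokes
                    recognized.add(c);
                    used.addAll(c.strokes());
                    c.annotateWithFigureTargetAnchors();        // step 5
                }
            }
            return recognized;
        }
    }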

Figure targets define these key points as anchor points. A human figure target may have anchor points such as ‘Top of Head’, ‘Shoulder 1’, ‘Elbow 1’, etc. Each figure target is a graph where anchor points serve as nodes. These nodes are connected in a way that is meaningful to the figure's skeleton – hips connect to knees, knees connect to ankles, etc. Anchor points may also be flagged as optional, identifying components of a figure that may not be present in all drawing styles, such as a human figure's hands and feet.
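
A figure target can therefore be represented as a small labeled graph. The sketch below is an assumed data model (class and field names are ours) capturing anchor points, their optional flag, and the skeleton edges between them.

    import java.util.*;

    // Hypothetical data model for a figure target: anchor points as nodes,
    // skeleton connectivity as undirected edges. Names are illustrative only.
    final class FigureTarget {
        static final class AnchorPoint {
            final String label;      // e.g. "Top of Head", "Shoulder 1", "Elbow 1"
            final boolean optional;  // true for parts absent in some styles (hands, feet)
            AnchorPoint(String label, boolean optional) { this.label = label; this.optional = optional; }
        }

        final String name;                                          // e.g. "Human"
        final Map<String, AnchorPoint> anchors = new LinkedHashMap<>();
        final Map<String, Set<String>> skeleton = new HashMap<>();  // adjacency by label

        FigureTarget(String name) { this.name = name; }

        void addAnchor(String label, boolean optional) {
            anchors.put(label, new AnchorPoint(label, optional));
            skeleton.put(label, new HashSet<>());
        }

        // Anchors must be added before connecting them: hip-knee, knee-ankle, ...
        void connect(String a, String b) {
            skeleton.get(a).add(b);
            skeleton.get(b).add(a);
        }
    }

For a human target one would, for instance, connect ‘Hip 1’ to ‘Knee 1’ and ‘Knee 1’ to ‘Ankle 1’, and mark ‘Hand 1’ as optional.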


Figure 5: Rotational constraints are not imposed within child groups of an ellipse. The template definition for a skeletal figure is given in the top left. In the remaining three figures we see three valid matches for the given template.


Templates serve as the model graphs used during subgraph matching. Templates are mapped to a given figure target by labeling the template's nodes with anchor labels from the figure target. When a sketched figure is matched against a template, these labels are used to annotate positions on the line segments and ellipses of the sketched figure. Since all templates that correspond to a figure target share the same label set, if an application only works with the set of labels on detected figures, it will be able to ignore the style by which a figure has been drawn.

Our algorithm uses both lines and ellipses as primitive elements of the data and model graphs. We support ellipses by introducing rotational constraints that enforce a rotational ordering of components along the contour of an ellipse. As we have modeled ellipses as nodes in the model and data graph, these rotational constraints are necessitated by the distinctions between line endpoint and ellipse junctions, as illustrated in Figure 4. But since some figures may not require rotational constraints between all of their components (exemplified in Figure 5), we organize primitives connected to an ellipse in the model graph into n child groups and impose rotational constraints between (but not within) these groups. These constraints are necessary but significantly complicate the structure of the graphs as well as the matching process.

There are five node types in our model graph: line, ellipse, anchor, reference, and ‘node’. Line and ellipse nodes represent the line and ellipse primitives the model graph will match in the sketched scene. The connectivity of the skeletal figure to be matched is mirrored by the edge connections between these primitive nodes in the model graph. Ellipses also define 8 child groups in counter-clockwise order at 45 degree angles across the ellipse contour. As illustrated in Figure 6, this allows for anchor labels to be positioned at different locations on the ellipse using anchor nodes.

Figure 6: The placement of leaf anchor nodes in an ellipse's set of child groups determines the position of the anchor point on matched figures.


Anchor nodes are placed in the model graph at joints between a set of connected lines and ellipses. Labels applied to anchor nodes will then position the anchor point on a matched figure on the joint between the set of corresponding strokes. Along with anchor nodes, labels may also be applied to lines. Labels applied to lines will position the anchor point on a matched stroke's midpoint. This adds additional flexibility – a human stick figure using a single line to represent a leg can place a knee anchor point at the line's midpoint.

All templates use another label not inherited from their figure target called "Start". The node labeled start defines the start point on the template where subgraph matching will begin. Our model graph is then modeled as a model graph tree rooted at the node labeled start. If a line is labeled start, its node will be the root of the model graph tree. If a joint is labeled start, then a ‘node’ node is set as the root of the tree, indicating that subgraph matching should begin at clusters, as defined in Section 5. An example template and corresponding model graph tree is illustrated in Figure 7(a) and 7(c). Note also that as ellipses will always have a parent in the model graph tree, only 7 of their child groups will be used for children. A line at the root of the model graph tree has 2 child groups, one for each of its endpoints.

Reference nodes are used to represent cycles in the model graph tree. This allows templates and the drawing styles they capture to include closed shapes. These reference nodes target an ellipse or line node elsewhere in the model graph tree. During subgraph matching a candidate figure must be able to satisfy the cycles denoted by reference nodes in order to be a valid candidate.

Finally, the arrangement of optional anchor labels may allow some components of a matched figure to be omitted. Determining which line and ellipse nodes in the model graph are optional for a general skeletal figure requires caution, as optional anchor labels can be placed anywhere in the template. We identify nodes in three ways – non-optional, partially optional, and pure optional. Non-optional nodes must be matched, while pure optional nodes may be absent.


Figure 7: Our editor for creating figure targets and templates. (a) shows a human figure target. (b) shows a human template and its matching model graph tree is shown in (c). (d) shows our skeletal animation previewer, allowing the user to drag around anchor points and deform the model.

Partially optional nodes, which are assigned both optional and non-optional anchor labels, are also required to be present for a match. Using this categorization, a recursive operation starting at the root node of the model graph tree can be used to determine the optionality of a template's line and ellipse components.
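
The recursion itself is not spelled out above, so the following is one assumed reading: a primitive node is pure optional only if every anchor label on it and on all of its descendants is optional; otherwise it must be present in a match. All names are hypothetical.

    import java.util.*;

    // Assumed sketch of optionality propagation over the model graph tree.
    final class OptionalitySketch {
        enum Optionality { NON_OPTIONAL, PARTIALLY_OPTIONAL, PURE_OPTIONAL }

        static final class ModelNode {
            final List<Boolean> anchorLabelOptional = new ArrayList<>(); // one flag per anchor label on this node
            final List<ModelNode> children = new ArrayList<>();
            Optionality optionality;
        }

        // Post-order walk from the root: a node is PURE_OPTIONAL only if every label
        // on it and in its whole subtree is optional; otherwise it must be matched.
        // Returns true if this subtree must be present in a match.
        static boolean computeOptionality(ModelNode node) {
            boolean hasRequired = node.anchorLabelOptional.contains(Boolean.FALSE);
            boolean hasOptional = node.anchorLabelOptional.contains(Boolean.TRUE);
            boolean subtreeRequired = false;
            for (ModelNode child : node.children) {
                subtreeRequired |= computeOptionality(child);
            }
            if (hasRequired || subtreeRequired) {
                node.optionality = hasOptional ? Optionality.PARTIALLY_OPTIONAL : Optionality.NON_OPTIONAL;
                return true;
            }
            node.optionality = Optionality.PURE_OPTIONAL;
            return false;
        }
    }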

4. Generating the Data Graph

Strokes are categorized as lines, arcs, polylines, ellipses (including circles), or complex, using shape recognizers described by Paulson and Hammond [PH08] and a corner detection scheme described by Sezgin [SD04]. Lines and arcs are added to the data graph as two line endpoint nodes connected by a bond edge. Ellipses are added as an ellipse node. Polylines are decomposed and added as line segments. Complex strokes are treated like polylines, but adjacent substrokes are connected with direct links as described in Section 4.1.
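
As an illustration of how classified strokes populate the data graph, the sketch below (assumed names, not the actual classes) adds a line as two endpoint nodes joined by a bond and an ellipse as a single node; polyline decomposition and the direct links of complex strokes are omitted.

    import java.util.*;

    // Minimal, assumed data-graph skeleton: line endpoints and ellipses are nodes,
    // bonds connect the two endpoints of a line. Links and mutex constraints are
    // added later by graph elaboration (Section 4.1).
    final class DataGraphSketch {
        static class Node {}
        static final class LineEndpoint extends Node {
            final double x, y;
            LineEndpoint(double x, double y) { this.x = x; this.y = y; }
        }
        static final class EllipseNode extends Node {
            final double cx, cy, rx, ry;
            EllipseNode(double cx, double cy, double rx, double ry) { this.cx = cx; this.cy = cy; this.rx = rx; this.ry = ry; }
        }
        static final class Bond {
            final LineEndpoint a, b;
            Bond(LineEndpoint a, LineEndpoint b) { this.a = a; this.b = b; }
        }

        final List<Node> nodes = new ArrayList<>();
        final List<Bond> bonds = new ArrayList<>();

        // Lines and arcs: two endpoint nodes connected by a bond edge.
        Bond addLine(double x1, double y1, double x2, double y2) {
            LineEndpoint p = new LineEndpoint(x1, y1);
            LineEndpoint q = new LineEndpoint(x2, y2);
            nodes.add(p); nodes.add(q);
            Bond bond = new Bond(p, q);
            bonds.add(bond);
            return bond;
        }

        // Ellipses (including circles): a single ellipse node.
        EllipseNode addEllipse(double cx, double cy, double rx, double ry) {
            EllipseNode e = new EllipseNode(cx, cy, rx, ry);
            nodes.add(e);
            return e;
        }
    }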

Graph elaboration may create bodied links in the data graph which function like bonds and can be matched during recognition. Bodied links are alternate interpretations of strokes in the sketched scene. For example, two adjacent collinear strokes are merged as a bodied link. Mutex constraints placed between the bodied link and its source components ensure that a stroke is not matched multiple times in a template under different interpretations.

To support partially offscreen figure detection we identify lines and ellipses that extend offscreen. If a line endpoint is within a threshold of the canvas edge we identify that endpoint and line as variable extension. If both endpoints of an arc are near the canvas edge we classify it as a virtual ellipse.
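
A minimal sketch of these checks, assuming a rectangular canvas of width w and height h; the names and the threshold value are ours, not the actual parameters.

    // Assumed sketch of the offscreen classification described above.
    final class OffscreenSketch {
        static final double EDGE_THRESHOLD = 15.0; // pixels; illustrative value only

        static boolean nearCanvasEdge(double x, double y, double w, double h) {
            return x < EDGE_THRESHOLD || y < EDGE_THRESHOLD
                    || x > w - EDGE_THRESHOLD || y > h - EDGE_THRESHOLD;
        }

        // A line endpoint near the canvas edge marks that endpoint (and its line)
        // as "variable extension".
        static boolean isVariableExtension(double[] endpoint, double w, double h) {
            return nearCanvasEdge(endpoint[0], endpoint[1], w, h);
        }

        // An arc with both endpoints near the canvas edge is treated as a
        // "virtual ellipse".
        static boolean isVirtualEllipse(double[] arcStart, double[] arcEnd, double w, double h) {
            return nearCanvasEdge(arcStart[0], arcStart[1], w, h)
                    && nearCanvasEdge(arcEnd[0], arcEnd[1], w, h);
        }
    }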

4.1. Elaboration Strategies

The graph elaboration strategies we employ are outlined in Figure 2. With the exception of CSC, these strategies are detailed in [MF02b]. Our system's utilization of these strategies differs from Mahoney's work in the following ways:

• We add a new elaboration strategy called crossed segment co-splitting (CSC). Because our recognizer is designed for tablet input (as opposed to a scanned document, as per Mahoney's approach), CSC is needed to decompose pairs of crossed strokes into four separate line segments. Four new line endpoint nodes are added at the intersection point and the original bonds are replaced with four new bonds between the new and original line endpoints. Direct links are placed between subsegments of the original bonds, and regular links are placed to fully connect the new endpoints. Lines are similarly split when they pass through ellipses.

• Proximity linking (PL) adds links between free ends, defined as line endpoints not connected to any non-bodied links. In order to handle cycles induced by sketched closed structures, we also allow PL to link any line endpoints with distances that lie below a threshold.

• Link closure (LC) does not need to be performed due to our explicit handling of clusters as defined in Section 5.

• While Mahoney only describes handling instances of virtual junction splitting (VJS) independently, we locate all endpoints that split a bond B. If n free endpoints are found that create virtual junctions, n+1 bodied links are created between the n free endpoints and the two endpoints of B, as sketched below. A mutex constraint is made between all the generated bodied links and B.
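
The batched VJS step can be sketched as follows, under our assumption that the free endpoints splitting bond B are first ordered by their projection onto B; the type names are hypothetical.

    import java.util.*;

    // Assumed sketch: given bond B = (p, q) and the n free endpoints that create
    // virtual junctions on it, create the n+1 bodied links p-e1, e1-e2, ..., en-q.
    // The caller then records one mutex constraint over all of these links and B,
    // so the bond and its subdivided interpretation never co-occur in a match.
    final class VirtualJunctionSplittingSketch {
        static final class Endpoint {
            final double x, y;
            Endpoint(double x, double y) { this.x = x; this.y = y; }
        }
        static final class BodiedLink {
            final Endpoint a, b;
            BodiedLink(Endpoint a, Endpoint b) { this.a = a; this.b = b; }
        }

        static List<BodiedLink> splitBond(Endpoint p, Endpoint q, List<Endpoint> freeEndpoints) {
            // Order the splitting endpoints along the bond by their projection onto p->q.
            List<Endpoint> ordered = new ArrayList<>(freeEndpoints);
            final double dx = q.x - p.x, dy = q.y - p.y;
            ordered.sort(Comparator.comparingDouble((Endpoint e) -> (e.x - p.x) * dx + (e.y - p.y) * dy));

            List<BodiedLink> links = new ArrayList<>();
            Endpoint prev = p;
            for (Endpoint e : ordered) {
                links.add(new BodiedLink(prev, e));  // p-e1, e1-e2, ...
                prev = e;
            }
            links.add(new BodiedLink(prev, q));      // en-q: n+1 links in total
            return links;
        }
    }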

4.2. Applying the Strategies

Close consideration of graph elaboration shows that care must be taken in applying these strategies. Improper ordering will result in a mis-elaborated graph, which will not recognize figures that should be recognized. For example, if PL is applied before VJS, two lines drawn at a T junction will fail to create a virtual junction and instead connect the lines at their closest endpoints.

We apply an elaboration step each time a stroke is added to the scene. This is run once for lines and ellipses and once for each line segment in polylines and complex strokes (for complex strokes, the direct links are created between the line segments after all of these elaboration steps; elaboration is then run once more without adding any new primitives).

We begin an elaboration step by removing all of the bodied links, non-bodied links, and mutex constraints from the data graph, with the exception of links created during CSC. We then perform CSC using the added primitive, checking if an added line is to be split by other bonds or if an added ellipse should split existing bonds.

We next combine VJS, spurious segment jumping (SSJ), and what we call round 1 continuity tracing (R1CT). As VJS and SSJ can be easily combined into a single pass on a candidate line, this pass is augmented with continuity tracing.


Figure 8: Generated R1CT bodied links are shown in this figure. Six bodied links are created by VJS, amongst which all contiguous subsets of lengths [2,5] form R1CT bodied links. The blue R1CT bodied link is mutexed against all links within the green box (highlighted in red) and the bond.

If a bond B is split into n bodied link segments, then n(n−1)/2 − 1 bodied links should be made to cover all sets of i adjacent segments, where i ∈ [2, n−1], as illustrated in Figure 8. Mutex constraints (also highlighted in Figure 8) are carefully placed to enforce all appropriate constraints within the set of new bodied links and the original bond. We apply this VJS/SSJ/R1CT process to every bond in the data graph.
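
The count follows because there are n − i + 1 contiguous runs of i adjacent segments, and summing n − i + 1 over i = 2 … n−1 gives n(n−1)/2 − 1. A small sketch of the enumeration over the ordered VJS segments (names are ours):

    import java.util.*;

    // Assumed sketch: given the n bodied-link segments produced by VJS on a bond,
    // indexed 0..n-1 in order, emit one R1CT bodied link per contiguous run of
    // i adjacent segments for every i in [2, n-1]. Each run is reported as the
    // pair (firstSegment, lastSegment); there are n(n-1)/2 - 1 of them.
    final class R1ctEnumerationSketch {
        static List<int[]> contiguousRuns(int n) {
            List<int[]> runs = new ArrayList<>();
            for (int len = 2; len <= n - 1; len++) {              // run lengths 2 .. n-1
                for (int start = 0; start + len <= n; start++) {
                    runs.add(new int[] { start, start + len - 1 });
                }
            }
            return runs;                                           // runs.size() == n*(n-1)/2 - 1
        }

        public static void main(String[] args) {
            // For the six VJS bodied links in Figure 8: 6*5/2 - 1 = 14 R1CT links.
            System.out.println(contiguousRuns(6).size());          // prints 14
        }
    }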

We then perform PL, followed by round 2 continuity tracing (R2CT), which functions as described in Section 4.1. R2CT is applied to all of the bonds and bodied links in the data graph, including those created during R2CT.

5. Subgraph Matching

After graph elaboration we recognize figures using subgraph matching. We initialize a candidate match for template T at appropriate locations based on the root node of T's model graph tree. Matching can begin at free endpoints, ellipses, and clusters – subgraphs in the data graph of non-bodied links and line endpoint nodes. For each candidate we walk through the data graph and assign primitives to nodes in T's model graph tree with regard to mutex constraints. Anchor points for anchor nodes in the model graph tree are placed at the centroids of the set of line endpoints and ellipse connection points that participate in the joint. Anchor points for labels placed on lines are positioned at the midpoint between the line's endpoints. We also make sure that primitives assigned to reference nodes match the assignment to the targeted model graph node to support cycles.
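
Anchor placement thus reduces to two small geometric rules; the sketch below (hypothetical names) places a joint anchor at the centroid of the participating points and a line-label anchor at the line's midpoint.

    import java.awt.geom.Point2D;
    import java.util.List;

    // Assumed sketch of anchor point placement during matching.
    final class AnchorPlacementSketch {
        // Joint anchors: centroid of the line endpoints and ellipse connection
        // points participating in the joint.
        static Point2D.Double jointAnchor(List<Point2D.Double> jointPoints) {
            double sx = 0, sy = 0;
            for (Point2D.Double p : jointPoints) { sx += p.x; sy += p.y; }
            return new Point2D.Double(sx / jointPoints.size(), sy / jointPoints.size());
        }

        // Line-label anchors: the midpoint between the matched line's endpoints.
        static Point2D.Double lineAnchor(Point2D.Double a, Point2D.Double b) {
            return new Point2D.Double((a.x + b.x) / 2.0, (a.y + b.y) / 2.0);
        }
    }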

Once the model graph tree has been satisfied, we have found a match. Our implementation finds all possible matches in the scene, sorts them by fitness, and returns the best distinct figures (that do not share strokes).

5.1. Enumerating All Possible Matches

At each step during the matching process, there exists a set S of all lines and ellipses that can be used from our current position in the data graph.

Figure 9: Template and sketched figure exemplifying a factorial expansion of the search space. 28,304,640 possible matches exist between the sketch and the template.

All permutations and combinations of S that can satisfy the next group of primitives to be matched (with respect to rotational ordering along ellipses, mutex constraints, etc.) must be explored in order to find all possible matches. As shown in Figure 9, exploring all valid subsets of S can result in factorial growth of the search space, but this must be done to locate the user's intended figure.
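
To make the combinatorics concrete, the sketch below enumerates every ordered assignment of k primitives drawn from the current set S; as a simplification on our part, the rotational-ordering and mutex checks are collapsed into a single validity predicate, and all names are hypothetical.

    import java.util.*;
    import java.util.function.Predicate;

    // Assumed sketch: enumerate all ordered selections (k-permutations) of the
    // candidate primitives in S that satisfy a constraint predicate. With no
    // pruning this is |S|!/(|S|-k)! assignments per step, which is the source
    // of the factorial growth illustrated in Figure 9.
    final class AssignmentEnumerationSketch {
        static <T> void enumerate(List<T> s, int k, Predicate<List<T>> valid,
                                  List<T> current, List<List<T>> out) {
            if (current.size() == k) {
                if (valid.test(current)) out.add(new ArrayList<>(current));
                return;
            }
            for (T primitive : s) {
                if (current.contains(primitive)) continue;   // each primitive used at most once
                current.add(primitive);
                enumerate(s, k, valid, current, out);
                current.remove(current.size() - 1);
            }
        }

        public static void main(String[] args) {
            List<List<String>> out = new ArrayList<>();
            enumerate(Arrays.asList("a", "b", "c"), 2, x -> true, new ArrayList<>(), out);
            System.out.println(out.size());                  // 3 * 2 = 6 ordered assignments
        }
    }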

5.2. Partially Offscreen Figures / Optional Components

If variable extension lines are matched or optional components are not matched, the subtree of the template rooted at that element is ignored and affected anchor point locations are extrapolated using relative angles and edge distances in the figure target and the scaling factor of the drawn figure (as defined in Section 5.3). If a virtual ellipse is matched, we observe that any of the virtual ellipse's children in the model graph tree may be matched to an imaginary primitive on the contour of the virtual ellipse, requiring additional consideration to properly enumerate all possible assignments.

Note that because matching begins at the root of the template's model graph tree, the joint or line labeled start must be present in the canvas for an offscreen figure to be detected. This should be kept in mind when determining what element should be labeled start on a template.

5.3. Determining Fitness

We define fitness using the least squares error of each primitive's scale in the drawn figure against its matching element in the template. To support scale invariance, we use the match's scaling factor. Given n elements that make up the template, t_i gives the size of element i in the template (defined as the length of a line and the diameter of the circle representing an ellipse), and d_i gives the size of its matched primitive in the scene (the distance between endpoints for lines and twice the average distance of points along the stroke from the stroke centroid for ellipses). Our error is computed as:

scaling factor = (1/n) · Σ_{i=1}^{n} (t_i / d_i)

error = Σ_{i=1}^{n} (t_i − scaling factor · d_i)²


Figure 10: Nine symbols from various domains modeled, identified, and annotated by our recognizer. The template for the symbol is given at the top, and below are 3 successful recognitions (the third being partially offscreen) and 1 failure case. These failures occur for different reasons, such as wrong primitives being matched (1,8), mis-elaboration (2), template mismatching (3,4,5,7), and illogically articulated yet positive results (6,9). Computation times are given for the combined elaboration and recognition step after the final stroke added to each figure. All 9 templates were loaded in the system during the matching phase for each trial. These times are only given for reference – our implementation is non-optimal.

If i is not present in the match due to being completely offscreen, d_i is set to t_i. If i is a variable extension line then d_i is set to max(d_i, t_i). If i is a virtual ellipse then d_i is set to the diameter of the circle found using the two endpoints and center point of the stroke of the virtual ellipse. A fitness score is determined using this error, the percentage of matched non-optional components, and the percentage of matched optional components.
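
Putting Section 5.3 together, a minimal sketch of the error computation, including the substitutions for offscreen elements described above (names are ours; combining the error with the matched-component percentages into the final fitness score is left out):

    // Assumed sketch of the scale-invariant least-squares error from Section 5.3.
    // t[i] is the size of template element i, d[i] the size of its matched
    // primitive; offscreen[i] and variableExtension[i] flag the special cases.
    final class FitnessSketch {
        static double error(double[] t, double[] d, boolean[] offscreen, boolean[] variableExtension) {
            // d[i] for a virtual ellipse is assumed to already hold the diameter of
            // the circle fitted to the arc's two endpoints and center point.
            int n = t.length;
            double[] dAdj = new double[n];
            for (int i = 0; i < n; i++) {
                if (offscreen[i]) dAdj[i] = t[i];                               // completely offscreen: d_i := t_i
                else if (variableExtension[i]) dAdj[i] = Math.max(d[i], t[i]);  // partially offscreen line
                else dAdj[i] = d[i];
            }
            double scalingFactor = 0;
            for (int i = 0; i < n; i++) scalingFactor += t[i] / dAdj[i];
            scalingFactor /= n;                                                 // (1/n) * sum(t_i / d_i)

            double sum = 0;
            for (int i = 0; i < n; i++) {
                double diff = t[i] - scalingFactor * dAdj[i];
                sum += diff * diff;                                             // sum((t_i - sf * d_i)^2)
            }
            return sum;
        }
    }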

6. Implementation

We implemented our skeletal figure recognizer in Java 1.6. Matched figures are annotated with anchor points, as seen in Figure 1. Our application also uses the anchor points to pose a skeletal mesh built on our figure targets using traditional skeletal animation methods. Java3D was used for texture mapping our posed skeletons.

Figure 7 shows our GUI-based editor for generating figure targets and templates. Figure targets are built by defining each anchor point and noting if they are optional. Anchor points are then dragged into place in the canvas and edges are made between them. When the graph is connected the figure target is marked as ready. To support skeletal animation and texture mapping, a paint brush is given to draw a caricature of the figure on the canvas. Vertices of the mesh are added to the canvas and manually triangulated by the user. An anchor point is chosen as the ‘root’ of the figure target's skeleton and a hierarchy of bones is generated. Vertices can be assigned to two bones with a weighting factor. A previewer, seen in Figure 7(d), allows the user to drag anchor points and preview how their skeleton deforms.

Building templates involves adding lines and ellipses to a canvas. Two nodes are created on the endpoints of each line, and eight nodes are created along the contour of each ellipse. Lines and ellipses are connected by being drawn so that their nodes merge. The figure target the user intends the template to reference is chosen from a drop-down list of ready figure targets. Choosing a figure target presents a list of its anchor labels. Once the user assigns each label to nodes and line midpoints, and if the graph of components is connected, the template is marked as ready. The optionality of primitives is automatically inferred based on the distribution of the anchor labels. All ready templates are then used in the main application's recognition step.

7. Conclusions

Our recognizer works as intended and has a good recognition rate of sketched symbols. Although the system is designed to recognize articulated figures, the generality of our approach and its support for templates that contain closed shapes allows the recognizer to accept a wide variety of symbols, including those which are not articulated. Any symbol that is connected and consists of lines and ellipses can be found by our recognizer.


Figure 10 shows our system recognizing a variety of symbols, from humans and animals to geometric shapes, houses, cars and arrows. Although rigidly defined symbols could be recognized by our system as inappropriately articulated, our system is easily extended with high level sanity checking to cull these matches.

Although our system can be used for a variety of general recognition tasks, its focus on articulated figures can be valuable to many domains with more specified needs. For example, artists who resort to motion capture or keyframing to create animations for characters could instead draw a series of stick figures from which an animation could be inferred (such as in [DAC∗03]). In a storyboarding application, figures can be added and posed into storyboard cells quickly and naturally. The support for partially offscreen figures would allow artists to explore shot composition.

As for its limitations, our recognizer will not work on unconnected symbols (e.g. analog circuit symbols such as capacitors and batteries). Due to the rotational constraints placed on ellipses, mirroring across ellipses cannot be handled without making a second, mirrored template. Without sanity checking, inappropriate matches with superior fitness metrics may be favored by the recognizer. These can result from matching unintended primitives, wrong templates, mirroring (e.g. stick figures with left and right limbs may be detected as reversed), etc. Unless the elaboration steps are properly tuned with the right parameters, the data graph may be improperly elaborated such that a valid sketch may not be recognized. Finally, our recognizer is a proof of concept and uses many brute force solvers that recompute large amounts of data each time a stroke is added. We present some failure cases and computation times in Figure 10, which could be improved upon by an optimal implementation, tuned parameters and fitness metrics, and sanity checking.

For further work, developing an implementation of our recognizer with acceptable time performance is critical. Better heuristics for exploring the search space, determining optimal thresholds and parameters for our elaboration strategies, and holding a user study to assess our recognizer's effectiveness are important for preparing our recognizer for real world scenarios. We would also like to explore additional elaboration strategies such as offscreen junctions, where pairs of variable extension lines or virtual ellipses are extrapolated and checked for intersection points that lie offscreen. Finally, we would like to integrate the sorts of spatial constraints found in other recognition frameworks, with care taken to uphold our generality and focus on articulation.

References

[AD04] ALVARADO C., DAVIS R.: SketchREAD: a multi-domain sketch recognition engine. In UIST '04: Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology (2004), pp. 23–32.

[DAC∗03] DAVIS J., AGRAWALA M., CHUANG E., POPOVIĆ Z., SALESIN D.: A sketching interface for articulated figure animation. In Eurographics/SIGGRAPH Symposium on Computer Animation, SCA (2003).

[GD96] GROSS M., DO E.: Demonstrating the Electronic Cocktail Napkin: a paper-like interface for early design. In CHI '96: Conference Companion on Human Factors in Computing Systems (1996), pp. 5–6.

[HD03] HAMMOND T., DAVIS R.: LADDER: a language to describe drawing, display, and editing in sketch recognition. In IJCAI '03: Proceedings of the 18th International Joint Conference on Artificial Intelligence (2003), pp. 461–467.

[HD05] HAMMOND T., DAVIS R.: LADDER, a sketching language for user interface developers. Computers & Graphics 29, 4 (2005), 518–532.

[HD06] HAMMOND T., DAVIS R.: Interactive learning of structural shape descriptions from automatically generated near-miss examples. In IUI '06: Proceedings of the 11th International Conference on Intelligent User Interfaces (2006), pp. 210–217.

[LBKS07] LEE W., BURAK KARA L., STAHOVICH T. F.: An efficient graph-based recognizer for hand-drawn symbols. Computers & Graphics 31, 4 (2007), 554–567.

[MF01] MAHONEY J. V., FROMHERZ M. P. J.: Handling ambiguity in constraint-based recognition of stick figure sketches. In Document Recognition and Retrieval IX (2001), vol. 4670, pp. 89–100.

[MF02a] MAHONEY J. V., FROMHERZ M. P. J.: Interpreting sloppy stick figures by graph rectification and constraint-based matching. In GREC '01: Selected Papers from the Fourth International Workshop on Graphics Recognition Algorithms and Applications (2002), pp. 222–235.

[MF02b] MAHONEY J. V., FROMHERZ M. P. J.: Three main concerns in sketch recognition and an approach to addressing them. In Sketch Understanding, Papers from the 2002 AAAI Spring Symposium (2002), pp. 105–112.

[PEW∗08] PAULSON B., EOFF B., WOLIN A., JOHNSTON J., HAMMOND T.: Sketch-based educational games: "drawing" kids away from traditional interfaces. In IDC '08: Proceedings of the 7th International Conference on Interaction Design and Children (2008), pp. 133–136.

[PH08] PAULSON B., HAMMOND T.: PaleoSketch: accurate primitive sketch recognition and beautification. In IUI '08: Proceedings of the 13th International Conference on Intelligent User Interfaces (2008), pp. 1–10.

[SD04] SEZGIN T., DAVIS R.: Scale-space based feature point detection for digital ink. In Making Pen-Based Interaction Intelligent and Natural, AAAI Spring Symposium (2004).

[SD05] SEZGIN T., DAVIS R.: HMM-based efficient sketch recognition. In IUI '05: Proceedings of the 10th International Conference on Intelligent User Interfaces (2005), pp. 281–283.

[TH10] TAELE P., HAMMOND T.: LAMPS: a sketch recognition-based teaching tool for Mandarin Phonetic Symbols I. Journal of Visual Languages & Computing 21, 2 (2010), 109–120.

[TPH09] TAELE P., PESCHEL J., HAMMOND T.: A sketch interactive approach to computer-assisted biology instruction. In IUI '09 Workshop on Sketch Recognition (2009).

[YSvdP05] YANG C., SHARON D., VAN DE PANNE M.: Sketch-based modeling of parameterized objects. In SBIM (2005), pp. 63–72.
