An agent-based framework for sketched symbol interpretation

Giovanni Casella (a,b), Vincenzo Deufemia (b,*), Viviana Mascardi (a), Gennaro Costagliola (b), Maurizio Martelli (a)

a Dipartimento di Informatica e Scienze dell'Informazione—Università di Genova, Via Dodecaneso 35, 16146 Genova, Italy
b Dipartimento di Matematica e Informatica—Università di Salerno, Via Ponte don Melillo, 84084 Fisciano (SA), Italy

Journal of Visual Languages & Computing (2007). doi:10.1016/j.jvlc.2007.04.002
Received 16 March 2007; received in revised form 12 April 2007; accepted 24 April 2007

Abstract

Recognizing hand-sketched symbols is a decidedly complex problem. The input drawings are often intrinsically ambiguous and require context to be interpreted correctly. Many existing sketch recognition systems avoid this problem by recognizing single segments or simple geometric shapes in a stroke. However, for a recognition system to be effective and precise, context must be exploited, and both the simplifications of the sketch features and the constraints under which recognition may take place must be reduced to a minimum. In this paper, we present an agent-based framework for sketched symbol interpretation that heavily exploits contextual information for ambiguity resolution. Agents manage the activity of low-level hand-drawn symbol recognizers, which may be heterogeneous so as to better adapt to the characteristics of each symbol to be recognized, and coordinate themselves to exchange contextual information, thus leading to an efficient and precise interpretation of sketches. We also present AgentSketch, a multi-domain sketch recognition system implemented according to the proposed framework. A first experimental evaluation has been performed on the domain of UML Use Case Diagrams to verify the effectiveness of the proposed approach.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Sketch understanding; Agent-based systems; Diagram recognition; Visual language parsing

* Corresponding author. E-mail addresses: [email protected] (G. Casella), [email protected] (V. Deufemia), [email protected] (V. Mascardi), [email protected] (G. Costagliola), [email protected] (M. Martelli).



1. Introduction

The recognition of hand-sketched symbols is a very active research field, since it finds a natural application in a wide range of domains, such as engineering, medicine, and architecture [1–7]. However, it is a particularly difficult task, since symbols can be drawn using different stroke orders, numbers, and directions. The difficulties in the recognition process are often made harder by the lack of precision and by the presence of ambiguities in messy hand-drawn sketches [8,9]. In fact, hand-sketched symbols are imprecise by nature: corners are not always sharp, lines are not perfectly straight, and curves are not necessarily smooth.

Usually, hand-drawn sketches consist of shapes whose meaning depends heavily on context. For example, a single line fragment could constitute the side of a box or a connector between boxes, and its role can be disambiguated only by looking at neighboring fragments. This means that, when a recognized symbol is unique to a context, the recognizer may exploit this information to determine the context and thereby resolve pending recognition ambiguities. The context can also be used to recover from low-level interpretation errors by reclassifying low-level shapes, significantly reducing recognition errors [10].

Besides this, the literature describes many proposals where sketch interpretation is carried out by applying the same recognition approach to every symbol in the sketch. To the best of our knowledge, no framework exists that allows the adoption of different techniques for recognizing different symbols. Nonetheless, this is a very useful feature for an effective recognition process, since each symbol shows its own peculiar characteristics, which make a particular recognition technique more or less suitable for it.

In this paper, we present an agent-based framework for sketched symbol interpretation. The reasoning process performed by the intelligent agents devoted to symbol recognition (Symbol Recognition Agents, SRAs for short) and to the correct interpretation of the sketch (Sketch Interpretation Agent, SIA for short) is based on knowledge about the domain context, which is used for disambiguating the recognized symbols. SRAs exchange contextual information by cooperating with the other SRAs in the system. The contextual information obtained in this way is sent to the SIA, which solves possible conflicts and gives an interpretation of the sketch drawn so far. At the lowest level of our framework, the symbols of the domain language are recognized by applying suitable hand-drawn symbol recognizers (HDSRs, for short) to the input strokes. The execution of these HDSRs is coordinated by the SRAs. In spite of the differences among existing HDSRs, several of them could be profitably integrated into our system: as long as there is one SRA that correctly integrates a set of HDSRs by managing their execution as well as data-conversion issues, the actual implementation of the HDSRs and the recognition approach they adopt do not matter. For this reason, our framework has the potential to seamlessly integrate symbols recognized by heterogeneous HDSRs.

Moreover, we describe the AgentSketch system, an implemented instance of the general framework, which can be applied to a variety of domains by simply integrating the proper symbol recognizers. We have applied AgentSketch to the domain of UML Use Case Diagrams and performed a preliminary evaluation study. The obtained results have highlighted the effectiveness of the proposed context-based recognition approach.

The paper is organized as follows. Section 2 illustrates three scenarios that, despite their differences, would all benefit from a multi-agent approach to sketch recognition.


Section 3 explains why we claim that the multi-agent approach is suitable for sketch recognition, and outlines the agent-based general framework we propose. Sections 4–6 describe the three main components of our general framework, namely HDSRs, SRAs, and the SIA, respectively. Section 7 describes the AgentSketch system and its preliminary evaluation. Related work is discussed in Section 8. Finally, conclusions and further research are presented in Section 9.

A preliminary version of this paper was published in the proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing, 2006 [11].

2. Motivating examples

To motivate the agent-based approach to hand-drawn sketch recognition discussed in Section 3, we analyze three types of sketches taken from three different domains: Use Case Diagrams (software engineering domain), ancient Egyptian hieroglyphs (hand-written patterns domain), and anatomical sketches (medical domain). We have chosen these types of sketches because they raise different challenges that could be suitably faced following an approach based on intelligent software agents. In the rest of this section we discuss each of them; we pay more attention to Use Case Diagrams, since they are used throughout the paper to show the design, implementation, and use of our system.

2.1. Use Case Diagrams

Description. The first type of sketch is taken from the software engineering area, where sketching is mainly used to represent software design ideas in terms of diagrams. It is a common practice for designers to model software systems using the UML diagrammatic notation [12], the de-facto standard for engineering object-oriented software. In particular, during the requirement analysis phase, designers draw Use Case Diagrams to describe the available functionalities and the main usage scenarios of an application. Such diagrams are composed of use cases (represented by ovals), actors (represented by stick figures), and connectors among them (Fig. 1).

There is only one type of relationship that may occur between actors and use cases; it is visualized as a solid line, named communication link, and means that an actor participates in a use case. Instead, four types of relationships between use cases are supported by UML: communication, inclusion (visualized as a dashed arrow from the including to the included use case, with label "include"), extension (a dashed arrow from the extending to the extended use case, with label "extend"), and generalization (a solid line ending in a hollow triangle, drawn from the specialized to the more general use case). The only type of relationship that may hold among actors is generalization.

Fig. 1. Symbols used in a Use Case Diagram: use case, actor, communication, include (<<include>>), extend (<<extend>>), generalization.

Being able to correctly interpret hand-drawn Use Case Diagrams would allow software engineers to converge more quickly towards an agreement in situations like brainstorming


meetings and teleconferences, where manually sketching the ideas that emerge during the conversation is easier, more convenient, and more rapid than drawing them by means of a CASE tool. An example of a UML Use Case Diagram is shown in Fig. 2. Fig. 2(a) shows a diagram drawn using a UML editor, modeling the functionality of account creation for the Administrator of a blog. Fig. 2(b) shows a hand-drawn version of the same diagram, without considering textual descriptions, whose recognition is out of the scope of this paper. Note that extend and include symbols have been drawn with solid lines to simplify the drawing process.

Challenges raised by Use Case Diagram recognition. The recognition of the hand-drawn Use Case Diagram in Fig. 2(b) presents several challenges. In fact, users should be able to draw without indicating where one symbol ends and the next one begins, and should be free to draw a symbol with any number of strokes and in any order. Indeed, use case symbols might be drawn as a single pen stroke, or as two or more separate strokes. Alternatively, a single pen stroke might contain multiple shapes, as for the communication symbol and a use case in Fig. 2(b). Moreover, another problematic condition occurs when two different symbols present similar parts. Considering the example of Fig. 2(b), how is it possible to distinguish the communication links from a line composing the actor or another relationship symbol? Finally, a last problem is due to the on-line processing of the sketch. In fact, new strokes become available while the user is drawing, and these newly disclosed strokes may change the interpretation previously given to a symbol. Thus, solving ambiguities must follow an incremental approach, where any new information coming from the user is exploited for tuning the sketch interpretation given so far.

The variation in drawing style may be managed, for example, by an ink parsing process that establishes which strokes are part of which shapes by grouping and segmenting the user's strokes into clusters of intended symbols [13]. It may also be useful to integrate, within the system, symbol recognizers implemented following different approaches; in fact, recognizing a symbol composed of many strokes, like the actor, might benefit from the adoption of techniques different from those used to recognize a communication symbol, usually made of just one stroke.
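As a concrete (and deliberately simplified) illustration of the kind of primitive-shape classification such an ink-parsing step relies on, the following sketch distinguishes straight lines from closed curves using only the stroke's path length and endpoint distance. The thresholds, class, and method names are our own illustrative choices, not those of the system cited as [13].

```java
// Illustrative primitive-shape classifier. A stroke is a sequence of sampled
// pen points; we compare the distance between its endpoints (the chord) with
// the total ink length to decide what it most likely is.
public class PrimitiveClassifier {

    public enum Shape { LINE, CLOSED_CURVE, UNKNOWN }

    /** Classify a stroke given its sampled pen points as {x, y} pairs. */
    public static Shape classify(double[][] pts) {
        if (pts.length < 3) return Shape.UNKNOWN;
        double pathLen = 0;
        for (int i = 1; i < pts.length; i++) {
            pathLen += dist(pts[i - 1], pts[i]);
        }
        double chord = dist(pts[0], pts[pts.length - 1]);
        if (chord / pathLen > 0.95) return Shape.LINE;        // nearly straight
        if (chord < 0.1 * pathLen) return Shape.CLOSED_CURVE; // endpoints meet
        return Shape.UNKNOWN;
    }

    private static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }
}
```

A real segmenter would of course also split strokes at corners and fit arcs and ellipses; this fragment only shows the flavor of the feature-based tests involved.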

Fig. 2. A Use Case Diagram drawn with a UML editor (a) and using a sketch-based interface (b): the Administrator actor and the use cases Create Blog Account, Create Regular Blog Account, Create Editorial Blog Account, Check Identity, and Record Application Failure, connected by communication, <<include>>, and <<extend>> relationships.


The ambiguities due to the possible membership of one stroke in more than one symbol can be solved by analyzing the objects surrounding the ambiguous parts, i.e., the context around them. As soon as the context changes, or new strokes become available for recognizing new symbols or for correcting previously given interpretations, the system must be able to use this new information in an incremental way, namely, without re-interpreting portions of the sketch whose interpretation does not raise any conflict, and working only on the ambiguous portions.
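The incremental policy just described can be sketched as follows: only the accepted interpretations whose occupied region intersects a newly arrived stroke are discarded for re-examination, while the rest of the sketch keeps its interpretation. All names here are hypothetical, and axis-aligned bounding boxes stand in for whatever spatial context test an actual implementation would use.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Keeps the set of currently accepted symbol interpretations and, on a new
// stroke, invalidates only those whose bounding box overlaps the stroke.
public class IncrementalInterpreter {

    /** A recognized symbol with the bounding box (x1, y1, x2, y2) it occupies. */
    public static class Interpretation {
        public final String symbol;
        public final double x1, y1, x2, y2;
        public Interpretation(String symbol, double x1, double y1, double x2, double y2) {
            this.symbol = symbol; this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
        boolean intersects(double a1, double b1, double a2, double b2) {
            return x1 <= a2 && a1 <= x2 && y1 <= b2 && b1 <= y2;
        }
    }

    private final List<Interpretation> accepted = new ArrayList<>();

    public void accept(Interpretation in) { accepted.add(in); }

    /** Remove and return interpretations touched by a new stroke's bounding box. */
    public List<Interpretation> invalidate(double x1, double y1, double x2, double y2) {
        List<Interpretation> affected = new ArrayList<>();
        for (Iterator<Interpretation> it = accepted.iterator(); it.hasNext();) {
            Interpretation in = it.next();
            if (in.intersects(x1, y1, x2, y2)) { affected.add(in); it.remove(); }
        }
        return affected;
    }

    public int size() { return accepted.size(); }
}
```

The non-conflicting portion of the sketch is never revisited: only the interpretations returned by `invalidate` need to be recomputed against the new ink.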

2.2. Ancient Egyptian hieroglyphs

Description. Ancient Egyptian hieroglyphs consist of more than 2000 logographic, alphabetic, and ideographic elements. Logographs may represent either a word or a morpheme (a meaningful unit of language). The logographic approach stands in contrast to other writing systems, such as alphabets, where each symbol (letter) primarily represents a sound or a combination of sounds. Ideographic elements are similar to (and sometimes used as synonyms of) logographic ones, but they represent pure ideas rather than words or morphemes. Examples of hieroglyphs are shown in Fig. 3.

Challenges raised by hieroglyph recognition. The challenges raised by recognizing Egyptian hieroglyphs, as well as similar pattern-based writing systems such as the ancient logographic languages of the Near East, India, China, and Central America, are quite different from those raised by diagrammatic sketch recognition. In fact, although a hieroglyph, like a symbol in a diagrammatic language, can be divided into a set of primitive shapes with relationships among them, there is no global "sketch" in the written document, and the recognition process works on each symbol of the language (each hieroglyph, in this case) in an independent manner. Besides this, although all ancient hieroglyphs are, of course, "hand-written", the hand-writing process does not take place on-line. It is in fact reasonable to assume that hieroglyphs are supplied to the recognition system as scanned images, thus making recognition an off-line process. This means that the primitive patterns composing each hieroglyph are not produced by a sketch editor, but by a static image segmentation system. Thus, no information on the temporal succession of pattern drawing may be exploited to help the recognition. Also, the same hieroglyph may have a different appearance in different documents, due to differences in the period, place, and style of the document composition: the recognition process must be able to recognize the same symbol despite its different representations and the different supports on which it has been engraved or painted (see Fig. 4).

Finally, the set of known hieroglyphs is still evolving: as new historical documents are discovered by archaeologists, new hieroglyphs are classified and interpreted. Thus, the system should be able to easily integrate new hieroglyph recognizers without requiring any substantial change to its code and behavior.

Based on the challenges identified so far, a system for the recognition of ancient Egyptian hieroglyphs should be able to: work off-line, taking scanned images as input; do

Fig. 3. Some Egyptian hieroglyphs: soul, mountain, house, sun, scarab (the "hpr" triliteral sound) (from Wikipedia, http://en.wikipedia.org/wiki/Egyptian_hieroglyph).


not make assumptions on the temporal succession of drawn primitive patterns; be open to the insertion of new hieroglyph recognizers as new hieroglyphs are discovered; and be flexible enough to recognize the same symbol even when represented in different fashions, like the system described in [14], which uses Fuzzy Hierarchical Attributed Graphs for the recognition of hieroglyphs.

2.3. Anatomical sketches

Description. As observed by Haddawy et al. in [3], drawing anatomical sketches is a common practice in medicine. Physicians use sketches when taking notes in patient records and for conveying diagnoses and treatments to patients, and medical students use them for supporting reasoning about clinical problems in individual and group problem solving.

The top right box of Fig. 5 shows a template of the external view of the lungs, while in the left part of the figure, the first column shows the sketches drawn by different students, the second column is the physician's segmentation of the sketch, and the last column is the segmentation performed by UNAS, the automatic system described in [3]. By looking at the sketches, it is clear that they show deep differences in line style and width; also, they may either miss strokes or contain more strokes than expected.

Challenges raised by anatomical sketch recognition. Understanding an anatomical sketch requires the ability to recognize what anatomical structure has been sketched and from

Fig. 4. Different representations of the scarab symbol.

Fig. 5. A template and three hand sketches of the lungs' external view, in panels (i)–(ix) (courtesy of Haddawy et al. [3]). Template legend: 1. Superior lobe of the right lung; 2. Middle lobe of the right lung; 3. Inferior lobe of the right lung; 4. Superior lobe of the left lung; 5. Inferior lobe of the left lung; 6. Trachea.


what view (e.g., parietal view of the brain), as well as the ability to identify the anatomical parts and their locations in the sketch (e.g., parts of the brain), even if they have not been explicitly drawn. The lowest-level objects of the sketches are not geometric primitives, such as lines; rather, they are symbols that must be recognizable in isolation.

Thus, an automatic sketch recognizer should be able to exploit symbol recognizers that perform an image-based analysis in order to avoid many problematic issues such as segmentation and feature extraction. Symbol recognizers should be tolerant of overstroking, missing and extra pen strokes, variations in line style, and variations in drawing order, since anatomical sketches are usually characterized by all these features.

In this domain, an automatic sketch recognition system might be used both on-line and off-line. Its main requirement would be, once again, the ability to seamlessly integrate heterogeneous symbol recognizers, since, due to their different complexity, the sketches of some anatomical organs may be better recognized using different techniques.

2.4. Features required by a recognition system for facing the above challenges

If we consider the challenges arising in the three above scenarios, the features that an automatic sketch recognizer should have in order to face them may be summarized as being able to:

- exploit contextual information (when the graphical language of the sketch allows it) for correctly interpreting symbols;
- coordinate the behavior of the symbol recognizers in such a way as to detect and solve conflicting interpretations of symbols;
- integrate heterogeneous symbol recognizers in a seamless way, in order to exploit different techniques for recognizing different symbols starting from primitive shapes;
- add new symbol recognizers as soon as they become available, without needing to change any other component of the system;
- work both on-line and off-line;
- according to the working modality (on-line vs. off-line), either exploit or ignore information on the temporal succession of drawn primitive patterns;
- receive input from different kinds of devices and in different formats.

We claim that a multi-agent approach to sketch recognition may help in designing and implementing a system having the above features. The next section introduces agents and multi-agent systems, and discusses the reasons on which our claim is grounded.

3. The multi-agent approach to sketch recognition

The AgentLink III Technology Roadmap [15] defines an agent as:

"a computer system that is capable of flexible autonomous action in dynamic, unpredictable, typically multi-agent domains."

According to [16], agents should be:

1. autonomous: they should operate without the direct intervention of humans or others, and have some kind of control over their actions and internal state;
2. responsive: they should perceive their environment and respond in a timely fashion to changes that occur in it;
3. pro-active: they should not simply act in response to their environment, but should exhibit opportunistic, goal-directed behavior and take the initiative where appropriate;
4. social: they should be able to interact, when appropriate, with other artificial agents and humans in order to complete their own problem solving and to help others with their activities.

Another characterizing feature of agents is situatedness: the agent receives sensory input from its environment and can perform actions which change the environment in some way [17].

As far as sociality is concerned, it is now widely recognized that interaction is probably the single most important characteristic of today's complex software. Two good reasons for agents to interact, and eventually cooperate, are to solve conflicts [18,19] and to disambiguate the interpretation of objects in some domain [20].

A multi-agent system, or MAS for short, is a system composed of many interacting agents. In a MAS, each agent has incomplete information or capabilities for solving the problem; thus each agent has a limited viewpoint, there is no global system control, data are decentralized, and computation is asynchronous. Also, due to the dynamicity and unpredictability of the scenarios where agents live, MASs are open to change. This means that the topology of a MAS cannot be fixed a priori, but dynamically changes as agents enter and leave the MAS.

If we keep the above features in mind while considering the problem of recognizing hand-drawn sketches in the three scenarios described in Section 2, we soon realize that an architecture based on agents might be a proper solution.

In fact, when drawing takes place as an on-line process, the "virtual blank sheet" where the user draws represents a dynamic and unpredictable environment. An "entity" devoted to recognizing the sketch drawn by the user must then be situated in it, in order to properly perceive the user's actions. Also, this entity must react to changes that take place in the virtual blank sheet, i.e., new strokes drawn by the user. Situatedness and reactivity are not crucial issues if the system processes the input in an off-line way, as when recognizing an Egyptian hieroglyph stored in an image file.

However, regardless of the on-line vs. off-line processing modality, the entities working to recognize the sketch must have a complex long-term goal, i.e., giving a correct interpretation to what the user is drawing or to what the image represents, and must operate in an autonomous way to reach this goal, since no explicit input or suggestions must be required of the user of the system. Although each single entity may be able to perform some task useful for recognizing the sketch (for example, recognizing one specific symbol of the language with a certain degree of confidence), by working alone it cannot easily resolve ambiguities ("Is this symbol an arrow or a line?") and conflicts ("In order to recognize my symbol, I am using a primitive shape that is also used by another entity; to which symbol does the shape really belong?"). Thus, a social behavior is required to reach the final goal of each entity, which consists in overcoming conflicts and ambiguities, and providing the right interpretation of the sketch to the user. Finally, since the number of graphic elements likely to be recognized is almost unlimited, a modular software architecture is necessary, so that entities can be connected or disconnected with the least possible repercussions on the system [21].


In the end, each "entity" must be responsive, pro-active, situated, autonomous, and social, and must live in an open environment. In other words, each entity must be an intelligent agent living in a MAS.

Our proposal exploits the suitability of an agent-oriented conceptualization of the hand-drawn sketch recognition problem, and the advantages of an agent-oriented approach to engineering complex systems [22,23], for modeling and designing the agent-based framework depicted in Fig. 6. An implemented instance of this framework is discussed in Section 7 and demonstrates, in the domain of diagrammatic sketch recognition, the feasibility of our approach. In this section, we provide a high-level overview of the general framework, whereas in Section 7 we discuss all the details of the implemented instance.

Our framework is composed of four kinds of agents.

Interface Agent. It represents an interface between the agent-based framework and the generic "Input Suppliers", which are not included inside the framework. The nature of these input suppliers may vary according to the type of sketch to be interpreted and to the drawing process (on-line vs. off-line). For example, they might be editors suitable for on-line drawing of diagrammatic and anatomical sketches, as well as interfaces that allow the user to select an existing image to be interpreted (useful for an off-line recognition process like that of hieroglyph interpretation). The Interface Agent informs the "Sketch Interpretation Agent" (SIA) and the "Input Pre-Processing Agent" (which, in turn, informs the "Symbol Recognition Agents") about the nature of the recognition process (off-line or on-line), and converts the information produced by the input suppliers into a format suitable for these agents. It sends each new available piece of input (or the whole input, in the case of an off-line recognition process) to the Input Pre-Processing Agent, and interacts with the SIA to send it the sketch interpretation requests and to deliver its answer to the user. While for on-line recognition the user can request the sketch interpretation several times while he/she is drawing, for off-line recognition the sketch interpretation is requested by the user, or automatically by the input supplier, just once, when all the input has been acquired. For any new input supplier to be linked to the framework, a new "stub" inside the Interface Agent must be created ad hoc. For example,

Fig. 6. The agent-based framework for sketch recognition.

Please cite this article as: G. Casella, et al., An agent-based framework for sketched symbol interpretation, Journal of Visual Languages and Computing (2007), doi:10.1016/j.jvlc.2007.04.002


the implemented instance of the Interface Agent discussed in Section 7 provides a stub for an on-line editor that takes advantage of Java Swing components and Satin [24].

Input Pre-Processing Agent: It processes the input received from the Interface Agent and sends the obtained results to the "Symbol Recognition Agents" (SRAs) described below, using a format compliant with the recognition approach they apply. For example, if the input to pre-process is a Use Case Diagram sketched on-the-fly, the main activity of this agent is to segment and classify the user strokes into a sequence of domain-independent primitive shapes. It receives from the Interface Agent the sequence of strokes drawn by the user, and sends their classification results, together with their attributes, to the SRAs. In the case of the hieroglyph recognition problem, the pre-processing agent behaves differently: it is devoted to identifying and isolating each single hieroglyph within the sketch, and sending it to one or more SRAs, according to some strategy based on the hieroglyph's features.
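For on-line diagrammatic input, the segment-and-classify step can be sketched as follows. This is a minimal, hypothetical classifier for already-segmented substrokes; the name `classify_substroke` and the thresholds are illustrative assumptions, not the system's actual pre-processor, which relies on key-point detection and shape fitting.

```python
import math

def classify_substroke(points):
    """Classify a substroke (list of (x, y) points) as a primitive shape.

    Illustrative heuristic: a substroke is an ELLIPSE if it closes on
    itself, a LINE if all points lie near the chord between its
    endpoints, and an ARC otherwise.
    """
    (x0, y0), (xn, yn) = points[0], points[-1]
    chord = math.hypot(xn - x0, yn - y0)
    length = sum(math.hypot(b[0] - a[0], b[1] - a[1])
                 for a, b in zip(points, points[1:]))

    if chord < 0.1 * length:              # endpoints nearly coincide
        return "ELLIPSE"

    def deviation(p):
        # Distance from p to the chord (x0, y0)-(xn, yn).
        px, py = p
        if chord == 0:
            return math.hypot(px - x0, py - y0)
        t = ((px - x0) * (xn - x0) + (py - y0) * (yn - y0)) / chord ** 2
        cx, cy = x0 + t * (xn - x0), y0 + t * (yn - y0)
        return math.hypot(px - cx, py - cy)

    if max(deviation(p) for p in points) < 0.05 * length:
        return "LINE"
    return "ARC"
```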

Symbol Recognition Agents: Each SRA is devoted to recognizing a particular symbol of the domain. SRAs may collaborate with other SRAs in order to apply context knowledge to the symbols they are recognizing, and with the SIA, which deals with the sketch interpretation activity. In our running example based on UML Use Case Diagrams, there is an SRA devoted to recognizing "generalize" symbols by managing the execution of the related "hand-drawn symbol recognizers" (HDSRs), and by collaborating with other SRAs to obtain contextual feedback. Each SRA tries to recognize a domain symbol by applying suitable HDSRs to the stroke classifications produced by the Input Pre-Processing Agent. The Input Pre-Processing Agent sends a stroke classification to an SRA only if the SRA may use it to recognize new domain symbols (e.g., strokes classified as ellipses are not sent to the SRA devoted to recognizing the generalize symbol, which does not include ellipses). When a new symbol has been recognized by the HDSR, the corresponding SRA starts

the collaboration process with other SRAs to obtain contextual information for the recognized symbol. The collaboration consists of sending a feedback request message containing the attributes of the recognized symbol to all the SRAs that recognize related symbols (which are known "a priori" by each SRA). When an SRA receives a feedback request message, it checks its set of recognized symbols to give a response. If it finds a symbol that satisfies the language relationship with the symbol in the feedback request, it sends a positive response to the requester; otherwise, it sends a negative response. For example, due to the rules that govern the definition of well-formed UML Use Case Diagrams, an SRA that recognizes an actor symbol should ask for feedback from the SRA that recognizes communication symbols, since actors always participate in the use cases to which they are connected via a communication symbol.

Sketch Interpretation Agent: The SIA provides the correct interpretation either of the sketch drawn so far (in the case of an on-line drawing process) or of the entire sketch (in the case of an off-line recognition process) to the Interface Agent. In particular, it analyzes the information received from the SRAs and solves conflicts between symbols that might arise. When all the conflicts have been solved, the SIA proposes the sketch interpretation to the user, interacting with the Interface Agent. The SIA looks for conflicts by checking whether there are symbols that share one or more strokes. For example, conflicts may take place either because a stroke is classified as two different shapes (for example, as a line and as an arc) due to sketch inaccuracy, or because the same stroke, although correctly classified, is used by two SRAs to recognize two different symbols. By solving the "easiest" conflicts first (namely, those conflicts where it is easier to identify the right interpretation), the SIA


is able to free some symbols from the conflicts they were involved in. These symbols are then used to solve other conflicts, and the process goes on until no conflicts are left. The SIA also applies some heuristics to select a set of unlikely symbol interpretations, which are pruned from the recognized symbol sets of the SRAs, improving their recognition efficiency.
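The iterative resolution strategy described above can be sketched as follows. This is a hypothetical implementation: the names `resolve_conflicts` and `score`, and the use of the score gap to rank conflicts, are our illustration of the "easiest first" idea, not the SIA's actual algorithm.

```python
def resolve_conflicts(symbols, conflicts, score):
    """Iteratively resolve conflicts, easiest first.

    symbols: iterable of symbol ids; conflicts: list of (a, b) pairs of
    symbols sharing strokes; score(sym, unambiguous): rates how likely
    sym is correct given the current set of unambiguous symbols.
    Each round the conflict with the largest score gap is resolved; the
    loser is discarded, and dropping its remaining conflicts may free
    other symbols for the following rounds.
    """
    alive = set(symbols)
    pending = list(conflicts)
    while pending:
        # Symbols currently involved in no conflict are unambiguous.
        unamb = {s for s in alive if not any(s in c for c in pending)}
        a, b = max(pending,
                   key=lambda c: abs(score(c[0], unamb) - score(c[1], unamb)))
        loser = a if score(a, unamb) < score(b, unamb) else b
        alive.discard(loser)
        pending = [c for c in pending if loser not in c]
    return alive
```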

Hand-drawn symbol recognizers: HDSRs are not agents, since they lack most of the agent-characterizing features. They are just software modules managed by SRAs. Many approaches have been proposed for recognizing free-hand drawings [25–30]. Each of these proposals has the potential of being integrated into our system, in spite of the fact that they differ from one another under several aspects, ranging from the identification of the shape of the symbols to the approach used to construct them. As we have already pointed out, as long as there is one SRA that correctly integrates HDSRs by managing their execution as well as data conversion issues, the actual implementation of the HDSRs and the approach to recognition that they adopt do not matter. The framework that we propose can seamlessly integrate symbols that have been recognized by heterogeneous HDSRs managed by ad-hoc SRAs.

4. Hand-drawn symbol recognizers

In this section, we describe the main features of three HDSRs that can be exploited by SRAs to identify the symbols of a given domain: LADDER [25], Sketch Grammars [27], and CALI [28].

LADDER: In LADDER, symbol recognition is performed using the rule-based system Jess [31]. In particular, for each symbol of the domain, a Jess rule is automatically generated from a LADDER structural shape description, which mainly contains information on the shape of the symbol, but may include other information helpful to the recognition process, such as stroke order or stroke direction. As an example, the LADDER description and the generated Jess rule for the generalize symbol of UML Use Case Diagrams are described in the boxes below.


The LADDER rule defines a generalize shape as a list of four lines from which the shape is built, plus the geometric constraints defining the relationships among these elements. The generated Jess rule, like all Jess rules, is composed of two parts. The part on the left of the "=>" symbol contains the name of the rule ("GeneralizeCheck") and the preconditions that enable the rule to fire, namely: getting four lines, making sure that the four lines are unique, getting the components of each line, and finally checking that the lines' components meet the topological and geometric constraints that allow an arrow to be composed from them. The part on the right of the "=>" symbol defines what to do when the precondition is met; in this case, the symbol, together with its constituent parts and its accuracy, computed from the accuracy values produced by the primitive shape recognizers, is given in output. Thus, when the working memory of the HDSR contains four lines that respect the precondition of the rule, the rule fires and a generalize symbol is recognized.

The shapes described with LADDER must be diagrammatic or iconic, since they have to be drawn using a predefined set of primitive shapes and composed using a predefined set of constraints.
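As an illustration of the precondition/action split of such a rule, the following Python analogue (not Jess syntax) recognizes a four-line arrow under strongly simplified constraints. The `touches` test and the pessimistic accuracy combination are assumptions for illustration, not LADDER's actual constraint set.

```python
def generalize_check(lines):
    """Python analogue of a generated recognition rule (illustrative).

    Precondition: four distinct lines, a shaft plus a three-line
    arrowhead, at least two of whose lines touch the shaft's tip.
    Action: emit the recognized symbol with an accuracy combined from
    the primitives' accuracies.
    Each line is ((x1, y1), (x2, y2), accuracy).
    """
    if len(lines) != 4:
        return None
    shaft, *head = lines
    tip = shaft[1]  # assume the shaft's second endpoint is the arrow tip

    def touches(line, p, eps=0.2):
        # True if either endpoint of `line` is within eps of point p.
        return any(abs(q[0] - p[0]) + abs(q[1] - p[1]) < eps
                   for q in line[:2])

    if sum(touches(l, tip) for l in head) >= 2:
        return {"symbol": "generalize",
                "accuracy": min(l[2] for l in lines)}  # pessimistic combination
    return None
```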

Sketch Grammars: Sketch Grammars (SkGs) are a direct extension of string grammars, where more general relations other than concatenation are allowed [27]. The symbol recognizers automatically generated from SkGs try to cluster stroke interpretations into symbols of the domain language. The parsing technique extends the approaches proposed in [32]: the parsers scan the input in an incremental and non-sequential way, driven by the spatial relations specified by the grammar productions.

An SkG G can be seen as a context-free string attributed grammar where the productions have the following format:

A → x1 R1 x2 R2 … xm−1 Rm−1 xm, {Act}

where A is a nonterminal symbol, each xj is a terminal or nonterminal symbol, and each Rj is a sequence of spatial and/or temporal relations [27]. Act specifies the actions that have to be executed when the production is reduced during the parsing process. These may include a set of rules used to synthesize the values of the attributes of A from those of x1, x2, …, xm. Actions are enclosed in braces { }. G is used to dynamically insert new terminal shapes in the input during the parsing process, enhancing the expressive power of the formalism.

To continue our running example based on UML Use Case Diagram recognition, the

Actor SRA might manage an "Actor HDSR" implemented using an SkG. This HDSR would use the following production to recognize the Actor symbol:


Actor → Ellipse ⟨joint1_1(t1)⟩ Line1 ⟨near(t2), near(t3), rotate(45, t4)⟩ Line2 ⟨joint2_1(t5), near1(t5), near2(t6), rotate1(−45, t4)⟩ Line3 ⟨joint2_1^2(t7), rotate2(135, t4)⟩ Line4 ⟨joint2_1^3(t9), rotate3(−135, t4)⟩ Line5,
{Actor.attach(1) = Ellipse.attach(1) ∪ Line1.attach(1);
Actor.accuracy = ComputeAccuracy();}

The Actor symbol is composed of an ellipse and five lines, as shown in Fig. 7 (the attributes are represented with bullets). The non-terminals Ellipse and Line cluster the


Fig. 7. The Actor Symbol.

single-stroke arcs that form an ellipse and the parallel single-stroke lines, respectively. Attribute 1 of Ellipse, which represents its borderline, is joined to attribute 1 of Line1, Line2, and Line3. The latter are rotated with respect to the former by 45° and −45°, respectively. The values t1, …, t9 specify the error margins allowed in the satisfaction of the relations. Finally, attribute 1 of Actor is calculated from the values of the attributes of Ellipse and Line1, and the accuracy of Actor is computed by the ComputeAccuracy function, which combines the accuracies of the strokes forming the sketch and of their spatial relations.
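A production of this kind can be represented as a grammar symbol sequence plus relation predicates. The sketch below is a deliberately simplified matcher (hypothetical names): it checks symbols in a fixed order between consecutive elements, whereas real SkG parsers are incremental and non-sequential, and it omits attribute synthesis from the Act part.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Production:
    """A Sketch Grammar production, simplified for illustration.

    lhs: the nonterminal A; rhs: the symbol names x1..xm;
    relations[i]: the predicates Ri that must hold between the
    attributes of x_{i+1} and x_{i+2}.
    """
    lhs: str
    rhs: List[str]
    relations: List[List[Callable]] = field(default_factory=list)

    def reduce(self, symbols):
        """Try to reduce `symbols` (list of (name, attrs)) to lhs."""
        if [name for name, _ in symbols] != self.rhs:
            return None
        for i, rels in enumerate(self.relations):
            if not all(r(symbols[i][1], symbols[i + 1][1]) for r in rels):
                return None
        return (self.lhs, {})  # attribute synthesis omitted
```

For example, a toy "Actor" production with a single `near` relation between an ellipse and a line reduces only when the relation holds within its error margin.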

CALI: CALI is a system for recognizing multi-stroke geometric shapes based on a naïve Bayesian classifier [28]. It is able to identify shapes of different sizes, rotated at arbitrary angles, and drawn with dashed, continuous, or overlapping strokes. It detects not only the most common shapes in drawings, such as triangles, lines, rectangles, circles, diamonds, and ellipses, using multiple strokes, but also other useful shapes such as arrows, crossing lines, and wavy lines. A training process has to be performed to add new shapes to the set of symbols to be recognized.

Shapes are recognized through statistical analysis of various ratios of values, such as the convex hull around the area designated as containing a single shape, the largest triangle and largest quadrilateral that can be inscribed within the hull, and the smallest enclosing rectangle that can be fitted around the shape. To handle ambiguities naturally, fuzzy logic is used to associate degrees of certainty with recognized shapes. Thus, the recognizer works by looking up the values of specific features in fuzzy sets associated with each shape and command. This process yields a list of plausible shapes ordered by degree of certainty. Nevertheless, the filters applied during recognition are ineffective on ambiguous shapes such as pentagons and hexagons.

To provide on-line recognition of sketches, the systems constructed on top of CALI submit the strokes drawn by the user to the recognizer when the pause between strokes exceeds a given time [33].
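The fuzzy lookup can be illustrated as follows, assuming trapezoidal membership functions; the feature names and fuzzy-set shapes here are hypothetical, since CALI's actual sets are derived from training data.

```python
def fuzzy_classify(features, shape_sets):
    """CALI-style classification sketch (hypothetical membership sets).

    features: maps feature names (e.g. area ratios) to measured values.
    shape_sets: maps each shape to, per feature, a trapezoidal fuzzy set
    (a, b, c, d) with full membership on [b, c]. The degree of certainty
    of a shape is the minimum membership over its features; shapes are
    returned ordered by decreasing certainty.
    """
    def membership(x, fuzzy):
        a, b, c, d = fuzzy
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    ranked = [(shape, min(membership(features[f], fz)
                          for f, fz in sets.items()))
              for shape, sets in shape_sets.items()]
    ranked.sort(key=lambda s: -s[1])
    return ranked
```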

5. Symbol recognition agent

An SRA is an autonomous software agent able to control an HDSR and to cooperatewith other agents in order to recognize hand-drawn symbols. The SRA behavior is mainlyaffected by received messages, as highlighted in the state diagram shown in Fig. 8.

Initializing, terminating and reading new messages: In the Initializing state each SRAreads the MAS configuration in order to correctly initialize itself. In particular, the SRAlearns how to communicate (i.e., address, supported communication protocols, languages,


Fig. 8. The state diagram describing the behavior of Symbol Recognition Agents.

Fig. 9. SRA internal structure.

interaction protocols, and so on) with the SRAs that recognize related symbols. This information is stored by the SRA in the related SRAs data structure shown in Fig. 9. Moreover, the SRA loads some parameters to correctly initialize its HDSR. The operations performed to initialize an HDSR depend on the particular HDSR considered. For example, if the SRA uses an HDSR based on Jess, then suitable rules must be loaded into the Jess engine during the initialization stage; if the HDSR is based on SkGs, then the parser must be instantiated with the proper rules; and if a CALI HDSR is used, the training samples for the domain symbol must be provided. For other kinds of HDSRs, suitable operations are carried out.

Once the initialization phase has been completed, the SRA enters the Reading New

Messages state, where it simply waits to receive a message. A terminate message is sent to all the SRAs by the Interface Agent if the user stops the execution of the external input supplier, be it a sketch editor or another kind of device. When the terminate message is received, each SRA moves to the Terminating state, releases the allocated resources, and


terminates. When an SRA is in a state different from Reading New Messages, any new message is stored in the incoming message queue and served when the SRA comes back to the Reading New Messages state. On the other hand, when the SRA is in the Reading New Messages state and no message is available, the agent stays idle, waiting.

Recognizing: When a message containing a new input is received from the Input Pre-Processing Agent, the SRA moves from the Reading New Messages state to the Recognizing state. With the first message, the Input Pre-Processing Agent also informs the SRA of the sketch recognition mode (on-line or off-line). The SRA uses this information to correctly configure its HDSR. In the Recognizing state, for each received message, the SRA builds a representation of the received data suitable for its HDSR (i.e., a fact for a Jess-based HDSR, a new token symbol for the sketch parser of an HDSR based on SkGs, etc.). Then the SRA tries to recognize a new domain symbol by giving the newly available information as input to its HDSR. If a new domain symbol is recognized, the HDSR outputs the strokes used to recognize it and the symbol attributes (e.g., starting point, ending point, radius, etc.). For each recognized symbol, the SRA computes a confidence value, which represents a sort of probability that the symbol has been correctly recognized. The confidence values obtained from the HDSRs require a normalization process, since HDSRs can output values belonging to different ranges. The recognized symbols and their features are stored by the SRA in the set of recognized symbols in such a way that they can be easily retrieved using their IDs, as shown in Fig. 9. If a new symbol is recognized while the SRA is in the Recognizing state, it moves to the Sending Feedback Request state; otherwise, it moves back to the Reading New Messages state.
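Since the paper does not fix a normalization scheme, min-max rescaling is one plausible choice for mapping recognizer-specific confidence ranges onto a common scale; the function below is our illustrative sketch.

```python
def normalize_confidence(raw, lo, hi):
    """Map a recognizer-specific confidence from [lo, hi] onto [0, 1].

    HDSRs report confidence on different scales (rule accuracies, parse
    accuracies, fuzzy degrees of certainty), so an SRA rescales them to
    a common range before storing recognized symbols. Out-of-range
    values are clipped; a degenerate range maps to full confidence.
    """
    if hi == lo:
        return 1.0
    return max(0.0, min(1.0, (raw - lo) / (hi - lo)))
```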

Sending feedback request: When the SRA recognizes a new symbol, it sends a feedback request to all the SRAs that recognize related symbols. The technical details necessary to send the message are contained in the related SRAs set. The feedback request message contains the attributes of the recognized symbol and its ID. SRAs always include in feedback request and response messages their name (unique in the MAS) and the identifiers of the involved symbols, in order to better trace related recognized symbols. The information on related recognized symbols will be used by the SIA to solve conflicts and to prune misleading interpretations.

Checking recognized symbols: When a feedback request is received, the SRA checks its recognized symbols set to verify whether the symbol in the feedback request message is related through a given relationship to one or more recognized symbols. If some relation holds, the SRA sends a positive response to the requester; otherwise, it sends a negative response. As shown in Fig. 9, a feedback vector is associated with each recognized symbol in the recognized symbols set, containing all its related recognized symbols. If the SRA is able to give a positive feedback response, it updates the feedback vectors associated with the symbols involved in the positive feedback response.
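The request-handling step can be sketched as follows; the message fields and the `related` predicate are hypothetical names standing in for the relationship test of the domain language.

```python
def handle_feedback_request(request, recognized, related):
    """Sketch of the Checking Recognized Symbols step.

    request: the sender SRA's name, the new symbol's id and attributes.
    recognized: this SRA's recognized-symbol table (id -> attributes).
    related(req_attrs, attrs): the language relationship that must hold
    (e.g. a communication line touching an actor).
    Returns the response message: positive when some recognized symbol
    satisfies the relationship, listing the related symbol ids.
    """
    matches = [sid for sid, attrs in recognized.items()
               if related(request["attributes"], attrs)]
    return {
        "to": request["sender"],
        "symbol_id": request["symbol_id"],
        "positive": bool(matches),
        "related_symbols": matches,
    }
```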

Updating recognized symbols feedbacks: When a positive feedback response is received,the SRA updates the feedback vector of the symbol that started the feedback requestprocess.

Sending recognized symbol: When the Interface Agent sends an interpretation request to the SIA, the SIA forwards the request to all the SRAs. Then, each SRA sends to the SIA the symbols it was able to recognize, together with the sets of input drawings used to recognize them, the symbols' confidence values, and the symbols' feedback vectors.

Pruning recognized symbols: HDSRs may produce wrong symbol interpretations, mainly due to input inaccuracy. In particular, the input components may not be correctly


classified, or may be used to recognize more than one symbol. For example, if we consider Use Case Diagrams, a line belonging to an actor might also be used by the Communication SRA to recognize a communication symbol. The SIA, as described in the following, is responsible for detecting and solving this kind of error. For on-line recognition, when the SIA terminates the interpretation process, it sends to all the SRAs a message containing the list of all the misrecognized symbols. When this message is received, the SRA enters the Pruning Recognized Symbols state, in which it updates its recognized symbols set. In particular, it deletes all the wrong symbols and updates the feedback vectors of the symbols with feedbacks from the wrong symbols. Moreover, to improve the efficiency of the on-line recognition process, if the recognition process is stroke-based, the SRAs may perform a pruning operation on the unrecognized strokes. As an example, the arrow symbol is often drawn with a single stroke (like the include symbol of Fig. 2(b)), which is segmented into four lines, three of which are used in the recognition of the arrow while one remains as unprocessed input. The elimination of these strokes allows the SRAs to improve the performance of the recognition process. However, the selection of the strokes to be removed is not a trivial task and can depend on the domain language.

Considering our running example based on UML Use Case Diagrams, Fig. 10 shows the

incremental editing of a diagram and the recognition process performed by SRAs whose associated HDSRs are constructed with Jess rules. For each edited symbol, the figure shows the result of input pre-processing, the symbol recognized by each SRA, the exchanged messages, and the obtained feedbacks. In particular, the drawings in the first column show the newly edited strokes in black, whereas the previous ones are in gray; the numbers associated with the strokes denote the temporal sequence of the drawing process. The second column shows the result of the pre-processing on the newly edited strokes. This process consists of the segmentation of the strokes (the identified key points¹ are visualized with a dot) and the classification of the obtained substrokes. We consider three primitive shapes in the classification process: line, arc, and ellipse. The label associated with the strokes indicates the result of the classification process: L, A, and E denote a line, an arc, and an ellipse classification, respectively. The third column shows for each symbol to be recognized (Actor, Communication, Use Case, Generalize, Extend, Include) an appropriate SRA, which includes an appropriate HDSR, and the messages exchanged to compute feedbacks. The recognized symbols are listed on the right side; the solid arrows show the messages producing positive feedbacks, while the dashed arrows show those producing negative feedbacks. The recognized symbols that have obtained a positive feedback are in boldface.

When the strokes from 1 to 4 in Fig. 10(a) are drawn, strokes 3 and 4 are segmented

into two substrokes, and then classified into an ellipse E1 and five lines (L1, …, L5). Note that stroke 3 is classified both as a line and as an arc, and that the former has a greater accuracy value. The HDSR associated with the Actor SRA recognizes these primitive shapes as an actor symbol. Moreover, stroke 1 is also recognized as the Use Case symbol u1 by the HDSR associated with the Use Case SRA, whereas the line interpretations L1 to L5 are recognized as the communication symbols c1, c2, c3, c4, and c5 by the HDSR associated with the Communication SRA. As previously described, when an SRA recognizes a symbol, it starts to collaborate with SRAs recognizing related symbols to obtain contextual information. In Use Case Diagrams, the Use Case symbol is related to

¹ A key point is a point that contains the most characterizing geometric features of a sketch: for example, a high-curvature point, a tangency point, a corner point, or an inflexion point.


Fig. 10. The recognition of the symbols of a Use Case Diagram by SRAs.


communication, generalize, include, and extend. Thus, when the Use Case SRA recognizes u1, it sends a feedback request to the Communication SRA, Generalize SRA, Extend SRA, and Include SRA. All of them reply with a negative response, since no other symbol has been recognized yet. These exchanged messages are not shown in the figure. When the Communication SRA recognizes c1, …, c5, it sends a feedback request to the Use Case and


Actor SRAs. The first replies with a positive response for c1, c2, and c3, since u1 is correctly related to them, while the second replies with a negative response. Finally, when the Actor SRA recognizes a1, it sends a feedback request to the Communication and Generalize SRAs, which reply with a negative response.

When stroke 5 in Fig. 10(b) is drawn, it is classified as a primitive line shape (L6) and used by the Communication SRA to recognize c6. Once c6 is recognized, the Communication SRA sends a feedback request to the Use Case SRA and the Actor SRA, and the latter replies with a positive response.

Finally, strokes 6 and 7 in Fig. 10(c) are classified as two arcs, and recognized by the Use Case SRA as symbol u2. As before, the Use Case SRA sends a feedback request to the related SRAs, and receives a positive reply from the Communication SRA, since u2 is related to c5.

6. The Sketch Interpretation Agent

The goal of the SIA is to provide, every time the user requests it, an interpretation of the sketch drawn so far. In order to build the sketch interpretation, the SIA requests from the SRAs the information they have computed. In particular, it receives the set of recognized symbols and their collected feedbacks, which are exploited to obtain the correct sketch interpretation (detecting wrongly recognized symbols).

The SIA behavior is represented by the state diagram in Fig. 11 and described in the following.

Initializing, terminating and waiting for interpretation requests: In the Initializing state, the SIA initializes itself and reads some configuration files to learn how to communicate with the SRAs (i.e., address, supported communication protocols, languages, interaction protocols, and so on). The SRA information is stored in the SRA table shown in Fig. 12. When the SIA is in the Waiting for Interpretation Request state, it simply waits until a request message is received. On the other hand, if a sketch interpretation request is received while the SIA is not in the Waiting for Interpretation Request state, the message is stored in the incoming message queue.

Fig. 11. The state diagram describing the behavior of the SIA.


Fig. 12. The SIA internal structure.


If a terminate message is received (sent by the Interface Agent when the user closes the editor), the SIA, independently of its state, transits to the Terminating state, releases the allocated resources, and terminates.

Collecting Recognized Symbols: When a sketch interpretation request message is received, the SIA transits from the Waiting for Interpretation Request state to the Collecting Recognized Symbols state. Here, the SIA sends an interpretation request message to all the SRAs and collects their responses. Using this information, the SIA builds the recognized symbol graph. Each node of the graph is associated with a recognized symbol and some information about it, such as the SRA that recognized the symbol, the symbol's confidence, and the strokes belonging to the symbol. The graph edges can be of two types: conflict edges link conflicting symbols, whereas feedback edges link symbols that have produced a positive feedback during their recognition. A symbol without conflicts is called an unambiguous symbol. The SIA is able to build the graph thanks to the information sent by the SRAs, as described in the previous section. When all the SRA responses have been received and the graph has been built, the SIA transits to the Interpreting Sketch state.
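The construction of the recognized symbol graph can be sketched as follows; the report tuple layout is a hypothetical stand-in for the actual SRA responses.

```python
def build_symbol_graph(reports):
    """Build the SIA's recognized symbol graph (illustrative structure).

    Each report is (symbol_id, sra_name, confidence, strokes, feedbacks),
    as sent by an SRA on an interpretation request. Nodes keep the symbol
    data; a conflict edge joins two symbols sharing a stroke; a feedback
    edge joins two symbols that gave each other positive feedback.
    Edges are stored as frozensets so each undirected pair appears once.
    """
    nodes = {sid: {"sra": sra, "conf": conf, "strokes": set(strokes)}
             for sid, sra, conf, strokes, _ in reports}
    conflicts, feedbacks = set(), set()
    for sid, _, _, strokes, fb in reports:
        for other in fb:
            feedbacks.add(frozenset((sid, other)))
        for oid, node in nodes.items():
            if oid != sid and node["strokes"] & set(strokes):
                conflicts.add(frozenset((sid, oid)))
    return nodes, conflicts, feedbacks
```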

Interpreting Sketch: In this state the SIA looks for conflicts by inspecting the recognized symbol graph. Its goal is to solve all conflicts by identifying the wrongly recognized symbols and, consequently, to delete all the conflict edges in the graph. This task is accomplished by the Reasoning Component.

The conflict between two symbols is solved in favor of the one having the higher truthful value:

tr = w1 · conf + w2 · (#rn / #n),

where conf is the confidence value of the symbol, #n is the total number of nodes, #rn is the number of unambiguous symbols reachable by following a feedback edge from the symbol, and w1 and w2 are values between 0 and 1 that depend on the domain language. In particular, for languages whose diagrams may have symbols involved in many relations with each other, w2 must be greater than w1, in order to weight the existence of feedback more than the accuracy of the symbol. Vice versa, for languages with few relations between symbols, it is more important to consider the accuracy associated with the symbol, and thus


w1 must be greater than w2. Unambiguous symbols are used to solve conflicts because they represent stable, non-conflicting elements in the current sketch interpretation. Conflicts are solved starting from:

1. those that involve one symbol with feedback from unambiguous symbol(s) (unambiguous feedback) and one symbol without unambiguous feedback;
2. those that involve symbols with the highest difference between their numbers of unambiguous feedbacks;
3. those that involve symbols with the highest difference between their accuracy values.

This criterion helps in solving the "easiest" conflicts first, in order to obtain new unambiguous symbols that can be used to solve other conflicts. When a conflict is solved, the graph is updated by deleting the node associated with the losing symbol.

In addition to solving conflicts, the SIA may apply heuristics dependent on the domain language to remove some symbol interpretations. For example, if there are two symbols s1 and s2 such that s1 graphically includes s2 (e.g., in Use Case Diagrams the actor includes a Use Case symbol), then the SIA can solve a priori all conflicts that arise between these two symbols.

When the conflict resolution phase terminates, the SIA sends the sketch interpretation to the Interface Manager, which interacts with the Editor to show the sketch interpretation to the user.

When the conflict resolution ends, the SIA behavior depends on the nature of the drawing process (on-line or off-line) communicated by the Interface Manager with the sketch interpretation request. For on-line recognition, the SIA stores the graph before going into the Communicating symbols to be pruned state. In this way, the next time the SIA has to interpret the input sketch it only updates the stored graph and solves the newly arising conflicts, avoiding solving the same conflicts more than once. For off-line recognition, the SIA just goes into the Waiting for Interpretation Request Message state, where it will receive a terminating message from the Interface Manager.
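The truthful-value computation used to decide a conflict can be sketched in a few lines of Python (the actual implementation is in Java; the variable names and the toy weights and counts below are ours):

```python
# Truthful value from the formula above: tr = w1*conf + w2*(#rn/#n),
# where rn is the number of unambiguous symbols reachable over feedback
# edges and n the total number of nodes in the recognized symbol graph.
def truthful_value(conf, rn, n, w1, w2):
    return w1 * conf + w2 * (rn / n)

def resolve_conflict(sym_a, sym_b, n, w1, w2):
    """Return the winning symbol; each symbol is (name, conf, rn)."""
    tr_a = truthful_value(sym_a[1], sym_a[2], n, w1, w2)
    tr_b = truthful_value(sym_b[1], sym_b[2], n, w1, w2)
    return sym_a if tr_a >= tr_b else sym_b

# Relation-rich language (w2 > w1): a symbol backed by unambiguous
# feedback beats one with higher raw confidence.
a1 = ("actor a1", 0.70, 3)          # 3 reachable unambiguous symbols
c1 = ("communication c1", 0.90, 0)  # no unambiguous feedback
winner = resolve_conflict(a1, c1, n=10, w1=0.3, w2=0.8)
# truthful values: a1 -> 0.3*0.70 + 0.8*0.3 = 0.45, c1 -> 0.27, so a1 wins
```

With w1 > w2 instead (few relations between symbols), the same call would favor the higher-confidence c1, which is exactly the language-dependent trade-off described above.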

Communicating symbols to be pruned: In this state the SIA selects and communicates to the SRAs the recognized symbols to be pruned. Many heuristics can be chosen: for example, pruning could be applied to HDSRs that have recognized symbols without feedback and are involved in conflicts with symbols having feedback, or to HDSRs recognizing symbols whose constituent strokes all belong to another symbol with more positive feedback, and so on. The choice of the heuristics to be applied also depends on the diagrammatic language.

As an example, if the user requests the interpretation of the sketch in Fig. 10(c), the SIA constructs the graph in Fig. 13 using the symbols communicated by the SRAs. In the figure, solid lines represent conflict edges, dashed lines represent feedback edges, and filled nodes represent unambiguous symbols.

The actor symbol a1 is in conflict with several symbols (the communications c1, c2, c3, c4, c5, and the Use Case u1). The first conflict to be solved is the one between a1 and c1. Indeed, a1 collected one unambiguous feedback from c6, while c1 did not receive unambiguous feedback (the one from u1 is not unambiguous). Supposing that a1 has a greater truthful value than c1, a1 wins the conflict. The conflict resolution goes on and, since a1 wins all its conflicts, it becomes an unambiguous symbol providing unambiguous feedback.


Fig. 13. The graph constructed by the SIA on the Use Case Diagram of Fig. 10(c).


7. The AgentSketch system

AgentSketch is a system for recognizing freely hand-drawn diagrammatic sketches that exploits the multi-agent approach presented in Section 3. It supports both on-line and off-line recognition modes and can be applied to a variety of domains by providing suitable recognizers. In the following, we describe the implementation details and then present an experimental evaluation of the system on the domain of Use Case Diagrams.

7.1. Implementation

The architecture of AgentSketch for recognizing UML Use Case Diagrams, which instantiates the general one presented in Fig. 6, is shown in Fig. 14. AgentSketch has been implemented on top of the Jade agent-based platform [34] using Java 1.5. Agents in AgentSketch communicate by exchanging FIPA-ACL messages [35], a communication language natively supported by Jade. FIPA-ACL specifies both the message fields and the message "performatives" (communicative acts such as requesting, informing, making a proposal, accepting or refusing a proposal, and so on). The content language of the messages, instead, is not fixed by the FIPA-ACL specification and may be chosen by the MAS developer. We express it in XML, according to a set of XML Schemas that we have defined. The messages within AgentSketch have various purposes, such as communicating the availability of new strokes, requesting/providing feedback, requesting/providing the sketch interpretation, and so on. Jade offers the means for sending and receiving messages, also to and from remote computers, in a way that is transparent to the agent developer: for each agent, it maintains a private queue of incoming ACL messages that our agents access in blocking mode. Jade also allows the MAS developer to monitor the exchange of messages using the built-in "sniffer" agent.
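As an illustration of the XML content carried by such messages: FIPA-ACL fixes the performatives, but the content schema is defined by the MAS developer, so every element name in this Python sketch is an invented stand-in for the actual XML-Schemas (which are not reproduced here). The snippet builds the content of a hypothetical inform message announcing a newly classified stroke.

```python
import xml.etree.ElementTree as ET

def new_stroke_content(stroke_id, shape, confidence):
    # All element names below are hypothetical, for illustration only.
    root = ET.Element("newStroke")
    ET.SubElement(root, "id").text = str(stroke_id)
    ET.SubElement(root, "shape").text = shape
    ET.SubElement(root, "confidence").text = f"{confidence:.2f}"
    return ET.tostring(root, encoding="unicode")

content = new_stroke_content(42, "line", 0.87)
# A Jade ACLMessage with performative INFORM would then carry this
# string in its content field.
```

Encoding the content as schema-validated XML keeps the agents loosely coupled: any component that can parse the schema can join the conversation.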

A screenshot of the Jade sniffer agent captured during the execution of AgentSketch is shown in Fig. 15. The boxes represent the agents, while the arrows represent the exchanged messages. In particular, the first box on the left represents the set of agents included in the Jade platform but not monitored by the sniffer agent, while the other boxes (from left to right) represent the Interface Agent, the Input Pre-processing Agent, the SRAs, and the SIA. The figure shows the messages exchanged in the following scenario:

(1)


A new line is drawn. The Interface Agent sends an inform message containing the stroke attributes to the Input Pre-processing Agent (message 1) and the Input Pre-processing


Fig. 14. The AgentSketch architecture.

Fig. 15. Agent message exchange in AgentSketch.


Agent sends to all the SRAs the new available input in a format that they can understand (messages 2–7). The Communication SRA recognizes, using the new line, a new Communication symbol and sends a feedback request to the Actor SRA and to the Use Case SRA (messages 8 and 9). The Actor SRA and the Use Case SRA check their recognized symbol sets and answer the feedback request (messages 10 and 11).


(2)


The interpretation of the sketch is requested. The Interface Agent sends a request message to the Sketch Interpreter Agent (message 12), which forwards it to all the SRAs (messages 13–18) in order to collect their recognized symbols. Each SRA sends its set of recognized symbols to the Sketch Interpreter Agent (messages 19–24). When the Sketch Interpreter Agent has collected all the recognized symbols and elaborated a sketch interpretation, it sends the interpretation to the Interface Agent (message 25).

The Input Pre-processing Agent is composed of two modules, the Stroke Segmenter and the Stroke Classifier. Both modules have been implemented using third-party software and, thanks to their highly modular implementation, might easily be replaced with other pieces of software with the same purpose. The same consideration holds for the Graph Manager that is part of the Sketch Interpreter Agent.

The Stroke Segmenter segments strokes into sub-strokes by identifying key points. In this way, symbols can be drawn with multiple pen strokes, and a single pen stroke can contain multiple symbols. Each sub-stroke may be classified by the Stroke Classifier either as a line, an arc, or an ellipse. However, if the user stroke is imprecise, the Stroke Classifier may classify it as more than one primitive shape (for example, both as a line and as an arc).

In our future work we want to exploit Web Services technology to implement replaceable, independent, self-consistent HDSRs. This would allow us to achieve a complete decoupling between agents and HDSRs, a decoupling that is not fully achieved in our current architecture. In fact, by publishing their functionalities as Web Services, HDSRs might be implemented by anyone and in any language, might be physically stored anywhere, might run on any platform, and might be replaced with minimal effort. In order to access them, agents should only know their physical address and the specification of the offered services. These services would consist of operations that, given a set of stroke classifications, return a recognized symbol, if any.
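Since this is only envisioned future work, the service contract is open; the following Python sketch merely illustrates the shape of the operation described above (all names and the trivial stand-in recognizer are ours):

```python
from typing import Optional

# Hypothetical HDSR service contract: given a set of stroke
# classifications, return a recognized domain symbol, if any.
# A trivial "use case" recognizer firing on a single ellipse
# serves as a stand-in implementation.
class UseCaseHDSRService:
    def recognize(self, classifications: list) -> Optional[str]:
        shapes = [c["shape"] for c in classifications]
        if shapes == ["ellipse"]:
            return "use_case"
        return None

svc = UseCaseHDSRService()
```

A real deployment would expose `recognize` as a web-service operation, letting an SRA invoke it knowing only the endpoint address and the service specification.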

In our current implementation of AgentSketch, each SRA checks symbol relations for feedback exchange purposes. In particular, by exploiting the attributes of the symbols recognized by the HDSRs, SRAs are able to check a set of relations between symbols (or between symbol portions/points) such as near, intersect, touch, parallel, and so on.
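Relations of this kind reduce to simple geometric tests. The Python sketch below shows plausible implementations for three of the named relations; the bounding-box representation and the thresholds are our assumptions, not the system's actual code.

```python
import math

def near(box_a, box_b, eps=10.0):
    """True if two (xmin, ymin, xmax, ymax) boxes are within eps pixels."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    dx = max(bx0 - ax1, ax0 - bx1, 0.0)   # horizontal gap, 0 if overlapping
    dy = max(by0 - ay1, ay0 - by1, 0.0)   # vertical gap, 0 if overlapping
    return math.hypot(dx, dy) <= eps

def intersect(box_a, box_b):
    """True if the two bounding boxes overlap."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def parallel(line_a, line_b, tol_deg=5.0):
    """True if two ((x1, y1), (x2, y2)) segments have nearly equal slopes."""
    def angle(line):
        (x1, y1), (x2, y2) = line
        return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
    diff = abs(angle(line_a) - angle(line_b))
    return min(diff, 180.0 - diff) <= tol_deg
```

For example, an Actor SRA could answer a Communication SRA's feedback request positively when `near` holds between the actor's bounding box and one endpoint of the communication line.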

The behavior and the structure of the AgentSketch components (both agents andHDSRs) are those described in Sections 4, 5 and 6. In the following, we provide somedetails on their actual implementation.

Sketch-based user interface. The graphical user interface of AgentSketch has been developed using the Java Swing components and Satin, a toolkit designed to support the creation of 2D pen-based applications [24]. Satin provides a Java Swing component (namely, the Satin sheet) that can be easily included in any Java-based GUI and that represents a canvas where the user can create strokes based on the path drawn by a pen or mouse.

Strokes are represented as lists of (x, y, time) tuples. A Satin interpreter, associated with the sheet, is automatically notified when a stroke is drawn on the sheet and receives a stroke reference that can be used to obtain information about the stroke (i.e., characterizing points, length, height, width, etc.) and/or to modify it (for example, to change its color or to execute beautification operations). We have developed a customized interpreter that converts the available stroke information into a representation suitable for our system.
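The (x, y, time) representation and the derived stroke attributes can be sketched as follows (a Python illustration; the function names are ours, not Satin's API):

```python
import math

def stroke_length(stroke):
    """Total polyline length of a stroke given as (x, y, time) tuples."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1, _), (x2, y2, _) in zip(stroke, stroke[1:]))

def bounding_box(stroke):
    """(xmin, ymin, xmax, ymax) of the stroke, ignoring timestamps."""
    xs = [x for x, _, _ in stroke]
    ys = [y for _, y, _ in stroke]
    return min(xs), min(ys), max(xs), max(ys)

stroke = [(0, 0, 0), (3, 4, 15), (3, 10, 31)]   # time in milliseconds
# stroke_length(stroke) -> 11.0  (5.0 + 6.0)
# bounding_box(stroke)  -> (0, 0, 3, 10)
```

Height and width, as mentioned above, follow directly from the bounding box; the timestamps additionally make temporal features (drawing speed, pauses) available to the recognizers.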


The AgentSketch editor is also able to load and save sketches in the InkML format [36]. InkML is an XML data format, designed by a W3C working group, suitable for transferring digital ink data between devices and software components, and for storing hand-input traces for handwriting recognition, signature verification, or gesture interpretation. By using InkML, AgentSketch is able to load previously saved user sketches and to import sketches created with other sketch-based editors.

Stroke Segmenter and Stroke Classifier. The purpose of the Stroke Segmenter is to split each stroke into a set of meaningful sub-strokes, enabling the user to draw each symbol with a single stroke or with a set of strokes. Currently, we use the segmentation algorithm provided by Satin [24]. In the future, we intend to use the dynamic programming algorithm for stroke segmentation proposed in [37], which is able to split a freehand stroke into the optimal number of line segments and elliptical arcs. When a new stroke is received by the Input Pre-processing Agent it is segmented into a set of sub-strokes (possibly one). The classification process performed by the Stroke Classifier is then applied to each sub-stroke.

The Stroke Classifier analyzes each sub-stroke received and classifies it into one or more domain-independent primitive shapes with a set of attributes that the SRAs can use to build the input for the AgentSketch HDSRs. In the current version of AgentSketch, the Stroke Classifier is based on HHreco [38], a software library written in Java that provides multi-stroke symbol recognition capabilities. HHreco employs a statistical approach to sketched symbol recognition that uses the Zernike moments of strokes as features and enables the user to choose between various classification techniques. We have configured HHreco to use a Nearest Neighbor classifier, which compares each user stroke with every sample in the training set by computing the normalized Euclidean distance. To classify single strokes into three categories (lines, arcs, and ellipses) we have created a training set composed of about fifteen samples per category. The statistical approach is quite suitable for our problem because users usually draw primitive shapes (arc, line, ellipse) in a similar way.
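The nearest-neighbor step can be sketched as follows. This is a hedged Python illustration of the classification scheme, not HHreco's code: in HHreco the feature vectors are Zernike moments, whereas the two-dimensional toy features, the training samples, and the normalization factors below are invented.

```python
import math

def normalized_euclidean(a, b, spreads):
    """Euclidean distance with each feature scaled by its spread."""
    return math.sqrt(sum(((x - y) / s) ** 2
                         for x, y, s in zip(a, b, spreads)))

def classify(features, training_set, spreads):
    """Label of the nearest training sample (feature_vector, label)."""
    return min(training_set,
               key=lambda sample: normalized_euclidean(features,
                                                       sample[0],
                                                       spreads))[1]

# Toy training set: one invented 2-D feature vector per category.
training = [((0.9, 0.1), "line"),
            ((0.5, 0.6), "arc"),
            ((0.1, 0.9), "ellipse")]
spreads = (1.0, 1.0)   # per-feature normalization factors
# classify((0.85, 0.15), training, spreads) -> "line"
```

Because users draw the three primitive shapes consistently, even a small training set (about fifteen samples per category, as noted above) keeps each query close to samples of the correct class.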

HDSR. The HDSRs included in AgentSketch have been implemented following the approach proposed by LADDER, described in Section 4, and using the Jess engine [31]. When an SRA receives a new stroke classification, it builds a new fact representing the classification and adds it to the working memory of the Jess recognizer. Then the SRA queries the engine to discover whether a new domain symbol has been recognized thanks to the newly available knowledge. Similarly, when an SRA receives a feedback request, it executes a Jess query on the working memory of the Jess recognizer in order to check whether the symbol in the feedback request message is related through some relationship to one of the symbols it has already recognized. Only the HDSR of the communication symbol has been implemented in Java, since it simply tests whether one or more received stroke interpretations represent a line longer than a given threshold.
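The communication HDSR's test is simple enough to sketch in full. The original is Java; this Python version preserves the logic, with the threshold value and all names being our assumptions:

```python
import math

MIN_COMM_LENGTH = 40.0   # hypothetical threshold in pixels

def is_communication(classification, p1, p2):
    """A communication symbol is a classified line longer than the
    threshold; p1 and p2 are the (x, y) endpoints of the stroke."""
    if classification != "line":
        return False
    length = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    return length > MIN_COMM_LENGTH

# is_communication("line", (0, 0), (60, 0)) -> True
# is_communication("line", (0, 0), (10, 0)) -> False   (too short)
# is_communication("arc",  (0, 0), (60, 0)) -> False   (not a line)
```

Such a check needs no rule engine, which is why this one recognizer bypasses Jess entirely.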

Graph manager and reasoning module. The SIA of AgentSketch contains a reasoning module and a graph manager. The former analyzes the graph of the recognized symbols and determines how to interpret the input drawings. Currently, the interpretation is chosen by considering the contextual information on the conflicting symbols, as described in the previous section. If this module is not activated, conflicts are solved by considering the information on symbol accuracy only.

The graph manager is based on JUNG [39], a software library that provides a common and extensible language for the modeling, analysis, and visualization of data that can be


represented as a graph. Moreover, it includes implementations of many algorithms from graph theory.

7.2. An experimental evaluation

In order to evaluate the effectiveness of the proposed recognition system, we ran an experiment in the domain of Use Case Diagrams. We recruited twenty subjects (15 students and 5 teachers of computer science) with basic knowledge of Use Case Diagrams, to whom we briefly introduced AgentSketch, including its multi-stroke symbol recognition capabilities.

We asked the subjects to use the system for some minutes, until they felt comfortable with it and with the pen-based input device. Then we gave them two diagrams of real software systems, which they had to draw using AgentSketch. They knew that they were not being timed and that they could not erase strokes during the editing process. The two diagrams contained 16 and 23 symbols, respectively; we thus obtained 780 symbols drawn by the subjects. Fig. 16 shows one of the diagrams.

To analyze the importance of contextual information for the interpretation of sketched symbols, we compared the results obtained by the context-based recognizer, denoted CR, with those obtained by deactivating the SIA contextual reasoning module, denoted BR (baseline recognizer). As said above, with the latter approach conflicts are solved by considering the shape accuracy only.

Table 1 contains some statistics on the recognition performance of both systems. In particular, for each domain symbol we report: the number of instances drawn by the subjects; the number of instances correctly recognized by BR and CR, with the relative percentages; the percentage of true negatives (TN); the number of false positives (FP); and the percentage of instances not recognized by any HDSR (US).

The results show that the context-based conflict resolution technique implemented by the SIA considerably improves the overall recognition precision. On average, the

Fig. 16. AgentSketch user interface.


Table 1

Recognition statistics by symbol

| Symbols       | #Instances | BR #Correct | BR %Correct | BR %TN | BR #FP | CR #Correct | CR %Correct | CR %TN | CR #FP | %US   |
|---------------|-----------|-------------|-------------|--------|--------|-------------|-------------|--------|--------|-------|
| Actor         | 100       | 73          | 73          | 27     | 5      | 84          | 84          | 16     | 1      | 0     |
| Use Case      | 260       | 206         | 79.23       | 13.46  | 15     | 241         | 92.69       | 0      | 10     | 7.31  |
| Communication | 280       | 259         | 92.50       | 6.07   | 76     | 274         | 97.86       | 0.71   | 52     | 1.43  |
| Include       | 60        | 37          | 61.67       | 26.66  | 28     | 40          | 66.67       | 21.66  | 12     | 11.67 |
| Extend        | 40        | 24          | 60          | 30     | 0      | 27          | 67.5        | 22.5   | 0      | 10    |
| Generalize    | 40        | 26          | 65          | 27.5   | 4      | 35          | 87.5        | 5      | 4      | 7.5   |
| Total         | 780       | 625         | 80.13       | 15.13  | 128    | 701         | 89.87       | 5.39   | 79     | 4.74  |

%BR = percentage of symbol instances correctly interpreted without using contextual information.
%CR = percentage of symbol instances correctly interpreted using contextual information.
%TN = percentage of real instances of a symbol S recognized as another symbol (true negatives).
#FP = number of symbol instances erroneously recognized as a symbol S (false positives).
%US = percentage of symbol instances unrecognized.


baseline system correctly identified 80% of the symbols, while CR correctly identified 90%, with a 39% reduction in the number of recognition errors (i.e., the false positives decrease from 128 to 79). This is particularly evident for the generalize symbol. This symbol is often recognized as both generalize and communication, and by considering only the shape accuracy, communication wins almost all the conflicts. On the other hand, the generalize symbol is the only one that can link two actors; thus, in many cases the use of contextual information allows the generalize symbol to be disambiguated properly.

We can also notice that include and extend are the symbols with the worst performance for both BR and CR. This is mainly due to the symbol recognition approach, which is unsuitable for recognizing letters.

Table 2 shows the statistics of each diagram for CR only. We can observe that as the number of symbols to be recognized in a diagram increases, the number of true negatives and false positives decreases. This is mainly due to the increase in the contextual information exchanged between SRAs. On the contrary, the number of unrecognized symbols depends only on the imprecision of the symbol drawing.

Table 3 reports the confusion matrix obtained for the CR recognizer. The rows contain the interpretations obtained for the drawn symbols, while the columns indicate the instances (mis-)recognized as a given symbol. As an example, the Generalize row indicates that 35 out of the 40 generalize symbols drawn are correctly recognized and 2 have been misrecognized as include (true negatives), whereas the Use Case column indicates that 251 Use Case symbol instances have been recognized, 10 of which are false positives since they should be part of an actor symbol. Thus, a number in the matrix indexed by (row, column) indicates how many times a row symbol is misclassified as a column symbol. The correctly recognized symbols appear on the diagonal.

As shown in the matrix, when an actor symbol is misrecognized, more than one symbol interpretation can be produced. This explains why the first row exceeds the total number of actor symbols (100). Indeed, the actor head can be interpreted as a Use Case and the actor body as a set of communication symbols, which obtain feedback from the Use Case. For the same reason, some include symbols are misrecognized as communication symbols.


Table 2

Recognition statistics of CR by diagram

|           | #Symbols | Total no. of symbols | #Correct | %CR   | %TN  | %FP  | %US  |
|-----------|----------|----------------------|----------|-------|------|------|------|
| Diagram 1 | 16       | 320                  | 277      | 86.56 | 8.75 | 11.9 | 4.69 |
| Diagram 2 | 23       | 460                  | 424      | 92.17 | 3.04 | 7.3  | 4.78 |

Table 3

Confusion matrix for symbols recognized by CR.

| Drawn \ Recognized | Actor | Use Case | Communication | Include | Extend | Generalize |
|--------------------|-------|----------|---------------|---------|--------|------------|
| Actor              | 84    | 10       | 39            | 4       | –      | –          |
| Use Case           | –     | 241      | –             | –       | –      | –          |
| Communication      | 1     | –        | 274           | 1       | –      | –          |
| Include            | –     | –        | 9             | 40      | –      | 4          |
| Extend             | –     | –        | 4             | 5       | 27     | –          |
| Generalize         | –     | –        | –             | 2       | –      | 35         |


Finally, analyzing the columns of the confusion matrix, we notice that the communication and include symbols are the ones most involved in false positives. This depends on the fact that their shape is contained, at least partially in the case of include, in all the other symbols except for the Use Case. On the contrary, the extend symbols are never involved in false positives, because the letter E is unlikely to be contained in the other drawn symbols.

8. Related work

In the last two decades, several approaches have been proposed for the recognition of freehand drawings. The novelty of our work consists in the exploitation of intelligent agents for the integration and coordination of heterogeneous hand-drawn symbol recognizers.

If we consider the agent technology, which mainly characterizes our approach, we find that very few approaches are based on it. One of the oldest systems we are aware of is QuickSet, a suite of agents for multimodal human-computer communication [40]. Underneath the QuickSet suite of agents lies a distributed, blackboard-based, multi-agent architecture. The blackboard acts as a repository of shared information and as a facilitator. The agents rely on this facilitator for brokering, message distribution, and notification. The gesture recognition agent recognizes gestures from strokes drawn on the map. Along with the coordinate values, each stroke from the user interface also provides contextual information about objects touched or encircled by the stroke. Recognition results are an n-best list of interpretations and an associated probability estimate for each interpretation. This list is then passed to the multimodal integrator, which accepts typed feature structures from both the gesture and the parser agents (the latter interpret natural language sentences), and unifies them. A very similar, but more recent, agent-based multimodal system is Demo, described in [41]. In [42], Achten and Jessurun discuss how graphic unit recognition in drawings can take place using a multi-agent systems approach,


where singular agents may specialize in graphic unit recognition, and multi-agent systems can address problems of ambiguity through negotiation mechanisms. In [43], Mackenzie and Alechina propose an agent-based technique for the classification and understanding of child-like sketches of animals, using a live pen-based input device. Once the segmentation stage is completed, an agent-based search of the resultant data begins, in an attempt to isolate recognizable high-level features within the sketch. Newly recognized strokes are inserted into an "arena" (a sort of blackboard shared by all the agents, where agents can move), and every single agent is in charge of recognizing a specific high-level feature. Agents are mobile, and they wander around the arena looking for strokes that they can use for recognizing a symbol. The mobile agents gradually become more and more restless over time if they are unsuccessful in their task. The more restless an agent is, the less reputable it is considered by other agents; agents that are happy with their position are instead considered more reliable, since they are likely to have correctly recognized their symbol, and they exert a major influence on the other agents. Agents can cooperate in an implicit way if they are close enough and if the strokes that they are recognizing are close enough in the sketch. In this case, the close agents become more confident in their recognition. In [21], Juchmes et al. describe EsQUIsE, an interactive tool for free-hand sketches to support early architectural design. The EsQUIsE environment uses pen computer technologies featuring the "virtual blank sheet". Lines drawn on the screen are captured and interpreted in real time thanks to the activity and collaboration of different types of agents. "One stroke" agents recognize single strokes, while "multi-stroke" agents recognize groups of strokes with chronological, topological, and geometric relations, such as dotted lines. The system also involves agents that recognize individual characters, and a dictionary agent. Each stroke can be marked with a flag by the agents (apart from the dictionary one) that are sure enough to have recognized a symbol by using that stroke. When many different interpretations of the drawing are possible, the agents must work together to propose a pertinent interpretation of the sketch. In particular, agents that put a flag on the same stroke collaborate to decide which of them is wrong, based on the probability that each of them assigns to a successful recognition. If, for example, a long sequence of vertical lines is drawn, the character recognizer agent puts flags on them, since it is almost sure that they are "i"s, and the multi-stroke agent recognizing "hatch marks" also puts flags on them. Since the dictionary agent cannot find any word composed only of "i"s, the text recognizer agent realizes that it was wrong and removes its flags from the vertical lines. Thus, the sequence of vertical lines is recognized as a hatch mark. In [44], Azar et al. extend the previous multi-agent architecture with the possibility of also interpreting vocal information. The graphical inputs are interpreted by either rule-based agents or model-based agents, while the spoken inputs are interpreted by model-based vocal agents.

When we compare our proposal with those using the agent technology, we find that the

main differences lie in the technique exploited for recognizing symbols from stroke classifications, which is established once and for all by the other approaches, and in the intended usage domain of the system, which is very specific for all the implemented systems. In fact, QuickSet and Demo have been developed for interacting with maps and for drawing Gantt charts, respectively. The system described in [43] classifies child-like sketches of animals, and EsQUIsE is designed for architectural sketches. The only general-purpose view is provided by Achten and Jessurun. However, their paper is more similar to a feasibility analysis of the adoption of multi-agent techniques for sketch recognition in the


very general case, rather than to the proposal of a concrete MAS architecture, like the other papers and ours are.

Regarding the construction of symbol recognizers, besides the ones described in Section 4, many other approaches have been proposed [26,29,30,45,46]. We believe that each of these proposals can be integrated into our framework, even though they differ from one another in several respects, ranging from how the shape of the symbols is identified to the approach used to construct the recognizers. For instance, the Rubine recognition engine is a trainable recognizer for single-stroke gestures [30]. Gestures are represented by global features and are classified according to a linear function of the features. However, the recognizer is applicable only to single-stroke sketches and is sensitive to drawing direction and orientation. In [26], Apte et al. developed a hard-coded recognizer that examines the geometric properties of the convex hull of a symbol. The recognizer also makes use of special geometric properties of particular shapes. In [29], Kara and Stahovich developed a symbol recognizer that is capable of learning new definitions from single prototype examples. Moreover, since it is based on a down-sampled bitmap representation, it is particularly useful for drawings with heavy over-stroking and erasing. In [45], Gennari et al. present a circuit diagram recognition system that runs isolated symbol recognizers to generate an interpretation. The symbols are located by considering the areas with a high density of pen strokes and the temporal information associated with the segmented input strokes. Then, the candidate symbols are classified using a statistical model constructed on training examples. Their system has a number of strengths, such as fast recognition, support for multi-stroke objects and multi-object strokes, and arbitrary stroke orderings within each object. However, the approach constrains users to draw non-overlapping symbols and to complete one symbol before beginning the next. Moreover, to reduce the size of the search space, the parsing algorithm sets a limit on the number of segments that a symbol may contain. Finally, in [46] Krishnapuram et al. present a generative probabilistic framework for symbol recognition. Their approach is able to recognize messy drawings from single examples by assuming a Gaussian noise model. However, to keep the search for the optimal segmentation tractable, the authors assume that each subset of stroke fragments considered during segmentation contains no more than seven straight line segments. This might be a severe limitation in domains with moderately complex objects.
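The feature-based linear classification at the heart of Rubine-style recognizers [30] can be sketched as follows. This is only an illustration: the three features computed here are a small subset of Rubine's full feature set, and the class/weight names are hypothetical.

```python
import math

def global_features(stroke):
    """Compute a few Rubine-style global features of a single stroke.

    `stroke` is a list of (x, y) points. The feature subset here
    (total path length, bounding-box angle, end-to-end distance) is an
    illustrative simplification of Rubine's 13 features.
    """
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    length = sum(math.dist(a, b) for a, b in zip(stroke, stroke[1:]))
    bbox_angle = math.atan2(max(ys) - min(ys), max(xs) - min(xs))
    end_dist = math.dist(stroke[0], stroke[-1])
    return [length, bbox_angle, end_dist]

def classify(stroke, classes):
    """Pick the gesture class whose linear score w·f + b is largest.

    `classes` maps a gesture name to (weights, bias), as would be
    produced by training a linear discriminant on labelled examples.
    """
    f = global_features(stroke)
    return max(classes, key=lambda c: sum(w * x for w, x in zip(classes[c][0], f))
                                      + classes[c][1])
```

Because the score is a plain linear function of global features, training reduces to estimating one weight vector per gesture class, which is what makes this family of recognizers trainable from few examples but sensitive to drawing direction and orientation.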

Regarding the systems that take contextual information into account for sketch recognition, several interesting approaches have been proposed [8,13,47,48]. We believe that the rationale underlying these proposals can be integrated into our SIA reasoning module. Mahoney and Fromherz [47] use a structural language to model and recognize stick figures. The recognition process starts with the generation of an attributed graph representing the input sketch, built from the set of lines in the sketch and their relationships. Then, by applying perceptual organization principles such as good continuation and proximity, the scene graph is augmented with subgraphs corresponding to possible ambiguities in the interpretation of line relations. Next, the algorithm looks for instances of the model graph in the scene graph by performing subgraph matching. In [13], Kara et al. present a multi-level parsing scheme based on a "mark-group-recognize" architecture. The recognition process is guided by "marker symbols", which are symbols that are easy to recognize. The identified marker symbols are used to efficiently cluster the remaining strokes into individual symbols using a trainable symbol recognizer that has the advantage of learning new definitions from single prototype examples [29]. The parser uses contextual

Please cite this article as: G. Casella, et al., An agent-based framework for sketched symbol interpretation,

Journal of Visual Language and Computing (2007), doi:10.1016/j.jvlc.2007.04.002


knowledge to both improve accuracy and reduce recognition times. The applicability of this approach depends on the presence of marker symbols in the domain language. In [49], Costagliola et al. present a multi-layer parsing strategy for the on-line recognition of hand-drawn diagrams based on the grammar formalism of SkGs [27]. The recognition system consists of three hierarchically arranged layers that include context-based disambiguation and ink parsing. The recognizer of the diagrammatic language, which is at the top of the hierarchy, applies its context knowledge to prune some of the symbol interpretations, to force the recognition of incomplete symbols, and to select the most suitable interpretation. A similar approach has been developed by Alvarado and Davis [8,10]. They present a parsing approach based on dynamically constructed Bayesian networks that combines bottom-up and top-down recognition: it generates the most likely interpretations first, then actively seeks out parts of those interpretations that are still missing. A major strength of this system is its ability to model low-level errors explicitly and to use top-down and bottom-up information together to fix errors that can be avoided using context. Finally, based on the observation that in certain domains people draw objects using consistent stroke orderings, Sezgin and Davis have presented a hierarchical recognition model that uses both stroke- and object-level temporal ordering information (i.e., the temporal context) gathered automatically from training data [48]. By exploiting knowledge of how people characteristically follow specific patterns when drawing certain sketches, segmentation and recognition can be performed efficiently and some recognition errors can be avoided. However, the recognition algorithm requires each object to be completed before the next one is drawn.
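The contextual pruning that these systems share can be sketched as follows. All names, rules, and confidence values here are hypothetical illustrations, not the actual SIA reasoning module.

```python
def disambiguate(candidates, neighbors, compatible):
    """Re-rank ambiguous interpretations of a symbol using context.

    `candidates` maps interpretation names to recognizer confidences,
    `neighbors` lists interpretations already accepted nearby, and
    `compatible(a, b)` says whether interpretation `a` may legally
    appear next to `b` in the domain language. Interpretations that
    conflict with some neighbor are pruned; the best survivor wins.
    """
    surviving = {
        name: conf for name, conf in candidates.items()
        if all(compatible(name, n) for n in neighbors)
    }
    if not surviving:  # context rules out everything: fall back to raw scores
        surviving = candidates
    return max(surviving, key=surviving.get)

# In a use-case diagram, an oval next to an accepted actor is more
# plausibly a use case than the head of a second actor, even if the
# shape recognizer alone slightly prefers the latter.
rules = {("use_case", "actor"): True, ("actor_head", "actor"): False}
compatible = lambda a, b: rules.get((a, b), True)
best = disambiguate({"actor_head": 0.55, "use_case": 0.45}, ["actor"], compatible)
```

The design choice worth noting is the fallback: when context contradicts every candidate, the raw recognizer scores are used rather than rejecting the stroke outright, which mirrors how these systems degrade gracefully when the sketch violates the domain language.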

9. Conclusions and future work

In this paper we have presented an agent-based framework for interpreting hand-sketched symbols in a context-driven fashion, exploiting heterogeneous techniques for the recognition of each symbol. The recognition process is supported by intelligent agents (SRAs) that manage the activity of hand-drawn symbol recognizers and coordinate themselves in order to provide efficient and precise interpretations of the sketch to the user. In a certain sense, SRAs can be seen as mediators that implement the middle layer for integrating information provided by different data sources (the HDSRs). The approach to information mediation based on intelligent agents has a long tradition [50,51]. In our approach we apply the ideas behind mediation to a new research field.

We have also presented AgentSketch, a sketch recognition system implemented

according to the framework exemplified in Section 3. It supports both on-line and off-line recognition modes and can be applied to a variety of domains by providing suitable recognizers. We have performed a first experimental evaluation of the system on the Use Case Diagrams domain, and the results have indicated that the use of context to disambiguate symbol shapes significantly reduced recognition errors over a baseline system that did not consider contextual information.

We are investigating how to make our framework a fully open and dynamic MAS,

where new SRAs and new SIAs can be integrated at run-time, and exploring ways to improve the effectiveness of the recognition process, such as the integration of a parser into the SIA for analyzing the syntax of the domain language. This will allow us to improve the accuracy in computing feedback information and to reduce the number of active recognition processes.


We also intend to investigate user interface issues, such as how to visualize feedback on the recognition results. In the current version of AgentSketch it is the user who decides when to interpret the input drawings, but this can lead to compound errors that can be difficult to detect and correct. On the other hand, recognizing and adjusting shapes as they are drawn can distract the user during the editing process. Thus, sketch-based interfaces should balance these two modes of showing feedback [9].
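One simple compromise between the two feedback modes is to trigger recognition only after the pen has been idle for a short interval. The sketch below is a hypothetical policy for illustration, not AgentSketch's actual behavior; the class name and threshold are our own.

```python
import time

class FeedbackScheduler:
    """Toy policy for when to run recognition: neither purely on-demand
    (compound errors accumulate) nor after every stroke (distracting),
    but once the pen has been idle for `idle_s` seconds.
    """

    def __init__(self, idle_s=1.5):
        self.idle_s = idle_s
        self.last_stroke = None

    def stroke_drawn(self, t=None):
        """Record the time of the latest completed stroke."""
        self.last_stroke = time.monotonic() if t is None else t

    def should_recognize(self, t=None):
        """True once the idle interval has elapsed since the last stroke."""
        if self.last_stroke is None:
            return False
        now = time.monotonic() if t is None else t
        return now - self.last_stroke >= self.idle_s
```

An interface polling `should_recognize()` would show interpretations only during natural pauses in drawing, approximating the balance discussed above.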

Finally, to improve the recognition performance and the usability of the system, it would be useful to incorporate information from external knowledge sources into the recognition process. As an example, the user could explicitly indicate the interpretation of a set of strokes by interacting with the user interface or by giving voice commands. The proposed agent-based framework has the advantage of being easily extensible, so external information can be incorporated into the system independently of the nature of the supplied information.

References

[1] D. Blostein, L. Haken, Using diagram generation software to improve diagram recognition: a case study of music notation, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (11) (1999) 1121–1136.

[2] M.D. Gross, The electronic cocktail napkin—a computational environment for working with design diagrams, Design Studies 17 (1) (1996) 53–69.

[3] P. Haddawy, M.N. Dailey, P. Kaewruen, N. Sarakhette, L.H. Hai, Anatomical sketch understanding: recognizing explicit and implicit structure, Artificial Intelligence in Medicine 39 (2) (2007) 165–177.

[4] T. Hammond, R. Davis, Tahuti: a geometrical sketch recognition system for UML class diagrams, in: Proceedings of the AAAI Symposium on Sketch Understanding, AAAI Press, 2002, pp. 59–66.

[5] J.A. Landay, B.A. Myers, Sketching interfaces: toward more human interface design, IEEE Computer 34 (3) (2001) 56–64.

[6] M.W. Newman, J. Lin, J.I. Hong, J.A. Landay, DENIM: an informal web site design tool inspired by observations of practice, Human–Computer Interaction 18 (3) (2003) 259–324.

[7] T.F. Stahovich, R. Davis, H. Shrobe, Generating multiple new designs from a sketch, Artificial Intelligence 104 (1–2) (1998) 211–264.

[8] C. Alvarado, R. Davis, SketchREAD: a multi-domain sketch recognition engine, in: Proceedings of User Interface and Software Technology (UIST'04), Santa Fe, NM, USA, October 24–27, ACM Press, New York, 2004, pp. 23–32.

[9] T. Igarashi, B. Zeleznik, Sketch-based interaction, IEEE Computer Graphics and Applications 27 (1) (2007) 26–27.

[10] C. Alvarado, R. Davis, Dynamically constructed Bayes nets for multi-domain sketch understanding, in: Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI'05), Edinburgh, Scotland, UK, July 30–August 5, 2005, pp. 1407–1412.

[11] G. Casella, G. Costagliola, V. Deufemia, M. Martelli, V. Mascardi, An agent-based framework for context-driven interpretation of symbols in diagrammatic sketches, in: Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'06), Brighton, UK, IEEE CS Press, 2006, pp. 73–80.

[12] Object Management Group, UML Specification version 2.0, http://www.omg.org/technology/documents/formal/uml.htm, 2005.

[13] L.B. Kara, T.F. Stahovich, Hierarchical parsing and recognition of hand-sketched diagrams, in: Proceedings of User Interface and Software Technology (UIST'04), ACM Press, 2004, pp. 13–22.

[14] D. Arrivault, N. Richard, P. Bouyer, Fuzzy hierarchical attributed graph approach for handwritten hieroglyphs description, in: Proceedings of 11th International Conference on Computer Analysis of Images and Patterns (CAIP'05), Lecture Notes in Computer Science, vol. 3691, Springer, Berlin, 2005, pp. 748–755.

[15] M. Luck, P. McBurney, O. Shehory, S. Willmott, The AgentLink Community, Agent technology: computing as interaction—a roadmap for agent-based computing, AgentLink III, 2005.

[16] M. Wooldridge, N.R. Jennings, Intelligent agents: theory and practice, The Knowledge Engineering Review 10 (2) (1995) 115–152.

[17] N.R. Jennings, K.P. Sycara, M. Wooldridge, A roadmap of agent research and development, Journal of Autonomous Agents and Multi-Agent Systems 1 (1) (1998) 7–36.

[18] M. Amer, A. Karmouch, T. Gray, S. Mankovski, An agent model for resolution of feature conflicts in telephony, Journal of Network and Systems Management 8 (3) (2000).

[19] J. Chu-Carroll, S. Carberry, Communication for conflict resolution in multi-agent collaborative planning, in: Proceedings of the International Conference on Multi-Agent Systems (ICMAS'95), 1995, pp. 49–56.

[20] J.L.T. da Silva, V.L. Strube de Lima, Lexical categorical disambiguation using a multi-agent systems architecture, in: Proceedings of the International Conference on Multi-Agent Systems (ICMAS'98), 1998, pp. 417–418.

[21] R. Juchmes, P. Leclercq, S. Azar, A freehand-sketch environment for architectural design supported by a multi-agent system, Computers & Graphics 29 (6) (2005) 905–915.

[22] M. Luck, P. McBurney, C. Preist, A manifesto for agent technology: towards next generation computing, Journal of Autonomous Agents and Multi-Agent Systems 9 (3) (2004) 203–252.

[23] F. Zambonelli, A. Omicini, Challenges and research directions in agent-oriented software engineering, Journal of Autonomous Agents and Multi-Agent Systems 9 (3) (2004) 253–283.

[24] J.I. Hong, J.A. Landay, SATIN: a toolkit for informal ink-based applications, in: Proceedings of User Interface and Software Technology (UIST'00), San Diego, California, November 5–8, ACM Press, New York, 2000, pp. 63–72.

[25] T. Hammond, R. Davis, LADDER: a sketching language for user interface developers, Computers & Graphics 29 (4) (2005) 518–532.

[26] A. Apte, V. Vo, T.D. Kimura, Recognizing multistroke geometric shapes: an experimental evaluation, in: Proceedings of User Interface and Software Technology (UIST'93), ACM Press, New York, 1993, pp. 121–128.

[27] G. Costagliola, V. Deufemia, M. Risi, Sketch grammars: a formalism for describing and recognizing diagrammatic sketch languages, in: Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR'05), Seoul, South Korea, IEEE CS Press, 2005, pp. 1226–1230.

[28] M.J. Fonseca, C. Pimentel, J.A. Jorge, CALI—an online scribble recognizer for calligraphic interfaces, in: Proceedings of the AAAI Symposium on Sketch Understanding, AAAI Press, 2002, pp. 51–58.

[29] L.B. Kara, T.F. Stahovich, An image-based trainable symbol recognizer for hand-drawn sketches, Computers & Graphics 29 (4) (2005) 501–517.

[30] D. Rubine, Specifying gestures by example, Computer Graphics 25 (4) (1991) 329–337.

[31] E. Friedman-Hill, Jess, the Java Expert System Shell, Technical Report, Sandia National Laboratories, 2000, http://herzberg.ca.sandia.gov/jess.

[32] G. Costagliola, V. Deufemia, G. Polese, A framework for modeling and implementing visual notations with applications to software engineering, ACM Transactions on Software Engineering and Methodology 13 (4) (2004) 431–487.

[33] A. Ferreira, M. Vala, J.A. Madeiras Pereira, J.A. Jorge, A. Paiva, A calligraphic interface for managing agents, in: Proceedings of the 14th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG'06), 2006, pp. 25–31.

[34] F. Bellifemine, A. Poggi, G. Rimassa, Developing multi-agent systems with JADE, in: Seventh International Workshop on Agent Theories, Architectures and Languages (ATAL2000), Lecture Notes in Computer Science, vol. 1986, Springer, Berlin, 2000, pp. 89–103.

[35] FIPA ORG, FIPA ACL Message Structure Specification, Document no. SC00061G, 2002.

[36] InkML, http://www.w3.org/TR/InkML/, October 2006.

[37] Y. Liu, Y. Yu, W. Liu, Online segmentation of freehand stroke by dynamic programming, in: Proceedings of International Conference on Document Analysis and Recognition (ICDAR'05), Seoul, South Korea, IEEE CS Press, 2005, pp. 197–201.

[38] H. Hse, A.R. Newton, Sketched symbol recognition using Zernike moments, in: Proceedings of International Conference on Pattern Recognition (ICPR'04), Cambridge, UK, August, IEEE CS Press, 2004, pp. 367–370.

[39] JUNG, the Java Universal Network/Graph Framework, ver. 1.7.5, http://jung.sourceforge.net/, 2006.

[40] P.R. Cohen, M. Johnston, D. McGee, I. Smith, J. Pittman, L. Chen, J. Clow, Multimodal interaction for distributed interactive simulation, in: Proceedings of Innovative Applications of Artificial Intelligence Conference (IAAI'97), AAAI Press, 1997, pp. 978–985.

[41] E. Kaiser, D. Demirdjian, A. Gruenstein, X. Li, J. Niekrasz, M. Wesson, S. Kumar, Demo: a multimodal learning interface for sketch, speak and point creation of a schedule chart, in: Proceedings of the Sixth International Conference on Multimodal Interfaces (ICMI'04), State College, Pennsylvania, USA, October 14–15, 2004, pp. 329–330.

[42] H.H. Achten, A.J. Jessurun, An agent framework for recognition of graphic units in drawings, in: Proceedings of 20th International Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe'02), Warsaw, 2002, pp. 246–253.

[43] G. Mackenzie, N. Alechina, Classifying sketches of animals using an agent-based system, in: Proceedings of 10th International Conference on Computer Analysis of Images and Patterns (CAIP'03), Lecture Notes in Computer Science, vol. 2756, Springer, Berlin, 2003, pp. 521–529.

[44] S. Azar, L. Couvreury, V. Delfosse, B. Jaspartz, C. Boulanger, An agent-based multimodal interface for sketch interpretation, in: Proceedings of International Workshop on Multimedia Signal Processing (MMSP-06), British Columbia, Canada, 2006.

[45] L. Gennari, L.B. Kara, T.F. Stahovich, Combining geometry and domain knowledge to interpret hand-drawn diagrams, Computers & Graphics 29 (4) (2005) 547–562.

[46] B. Krishnapuram, C. Bishop, M. Szummer, Generative Bayesian models for shape recognition, in: Proceedings of Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR-9), Japan, IEEE CS Press, 2004, pp. 20–25.

[47] J.V. Mahoney, M.P.J. Fromherz, Three main concerns in sketch recognition and an approach to addressing them, in: Proceedings of AAAI Spring Symposium on Sketch Understanding, AAAI Press, 2002, pp. 105–112.

[48] T.M. Sezgin, R. Davis, Sketch interpretation using multiscale models of temporal patterns, IEEE Computer Graphics and Applications 27 (1) (2007) 28–37.

[49] G. Costagliola, V. Deufemia, M. Risi, A multi-layer parsing strategy for on-line recognition of hand-drawn diagrams, in: Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'06), Brighton, UK, IEEE CS Press, 2006, pp. 103–110.

[50] S. Bergamaschi, Extraction of information from highly heterogeneous sources of textual data, in: Proceedings of First International Workshop on Cooperative Information Agents (CIA'97), Lecture Notes in Computer Science, vol. 1997, Springer, Berlin, pp. 42–63.

[51] V.S. Subrahmanian, S.-S. Chen, J.A. Hendler, R. Hull, V. Tannen, Smart mediators and intelligent agents (panel), in: Proceedings of the Fifth International Conference on Information and Knowledge Management (CIKM'96), 1996, p. 343.