Multi-paradigm software environment for the real-time processing of sound, music and multimedia


Antonio Camurri, Carlo Innocenti and Claudio Massucco

The paper introduces a system and a software architecture for the representation and real-time processing of sound, music, and multimedia based on artificial intelligence techniques. This system, called WinProcne/HARP, is able to represent objects in a two-fold formalism (symbolic and analogical) at different levels of abstraction, and to carry out plans according to the user's goals. It also provides both formal and informal analysis capabilities for extracting information. In WinProcne/HARP the user can build, update, browse, and merge various knowledge bases of sound, music, and multimedia material, as well as enter queries and start and manage real-time performance, using a high-level graphical user interface. The system is currently used by researchers and composers in various experiments, including (a) advanced robotics projects, in which the system is used as a tool for interacting with, controlling and simulating robot movements, and (b) theatrical automation, where the system is delegated to manage and integrate sound, music, and three-dimensional computer animations of humanoid figures. The paper explicitly refers to some applications in the music field.

Keywords: AI and music, multimedia knowledge representation, hybrid systems, analogical reasoning

Computer Music Laboratory, Department of Communication, Computer and System Sciences (DIST), University of Genoa, Via Opera Pia 11A, I-16145 Genoa, Italy. Paper received 1 April 1993. Revised paper received 18 May 1993. Accepted 4 May 1993

Systems for sound and music processing are generally based on tools and languages designed for low-level manipulation of music scores and composition algorithms (Loy and Abbott, 1985). Music V (Mathews, 1969) and cmusic (Moore, 1990) are two well-known examples of this class of traditional software systems for sound synthesis.

In the last few years, both researchers and industries have shown a growing interest in such kinds of systems. In fields such as user-interface design, and data and knowledge base management systems, the integration of capabilities encompassing graphics, animation, voice, sound and music into a flexible, high-level interactive multimedia computing system is one of the major goals. In the human-computer interaction field, important aspects are the spatial and temporal management and processing of integrated multimedia objects.

In this scenario, research in AI and music (Roads, 1985; Leman, 1989; Camurri, 1990) plays a significant role, from both the academic and industrial viewpoints. On the one hand, research efforts are directed toward the definition of models for the manipulation of sound and music in multimedia information systems; on the other hand, research is focused on the definition of intelligent systems for music processing (Camurri and Canepa, 1991), with the aims of achieving a deeper understanding of the cognitive aspects of the 'musical mind', and providing composers, musicologists, and researchers in general with more powerful computer tools. For example, significant studies are currently in progress in several areas: real-time performance including interpretation (Bresin, De Poli and Vidolin, 1991), computer assisted music composition (Dannenberg et al., 1991; Pope, 1991), listening (Leman, 1991), tutoring, music analysis and musicological applications.

In this paper, we introduce a software architecture for the representation and real-time processing of sound, music, and multimedia that is useful for both specific artistic applications and multimedia systems. The specific experimental paradigm adopted in this paper to show the system's features is in the field of computer aided composition. This choice does not restrict the system's role.

0950-7051/94/02/0114-13 © 1994 Butterworth-Heinemann Ltd. Knowledge-Based Systems Volume 7 Number 2 June 1994

Our system, called WinProcne/HARP (WINdows PROlog tool Combining logic and semantic NEts for Hybrid Action Representation and Planning), is the core module of the Intelligent Musical Workstation project (Camurri and Haus, 1991) of the National Council of Research. WinProcne/HARP is a knowledge-based system that is able to store and process music and sounds, and to carry out plans for manipulating this material according to composers' goals. The system is also designed to control the generation of new pieces as well as the manipulation of existing ones at different levels of detail, on the basis of composer-supplied material and requests. Several capabilities of the system are provided by the knowledge representation environment WinProcne (Ardizzone et al., 1993; Costa et al., 1988; Camurri et al., 1993), on which the system is based.

From a knowledge representation viewpoint, the system is grounded on a two-fold formalism: analogical and symbolic.

The low-level sound representation and data on real-time performance, as well as particular recognition and synthesis algorithms, are managed by classes included in an object-oriented concurrent environment: this is the analogical level of representation. The analogical level is grounded on the metaphor of a mental model (Richards, 1990), the importance of which in music representation has been pointed out by Krumhansl (1990) and Leman (1991): music imagery is one of the more investigated concepts in the fields of cognitive musicology and AI and music. The analogical level is implemented as a set of actors (Agha and Hewitt, 1990): each actor is hooked to an object in the symbolic level, which is the repository of high-level entities, scores, composition rules, high-level descriptions and definitions in general. Therefore a mental model can be built as a set of concurrent agents, hooked to and controlled by high-level symbolic entities. The activation of such a network of actors produces a simulation/execution of the mental model we are representing. The mental model we refer to is a representation of a simulation of actions in a three-dimensional simulated world, based on metaphors (see below in the paper) and on music imagery representations, i.e. the mental representation of music entities based on psychoacoustics. Note that the system can also operate on the real world itself by replacing the mental representation with a set of real sensors and actuators in real-world situations (e.g. choreographic situations on which the system acts).

The symbolic level of representation consists of a declarative symbolic environment, including a multiple-inheritance semantic network formalism derived from KL-ONE (Brachman and Schmolze, 1985; Woods and Schmolze, 1992), extended with temporal primitives and a typing mechanism.

Summing up, two different types of inference mechanism are available in the system: a symbolic one, to derive new facts, based on classification and inheritance, and an analogical inference mechanism, which allows the system to find out new facts by simulation and measurement on a mental model. A formal link between the two mechanisms exists, so that, for example, asking the semantic network to generate a particular instance of a symbol, say a music object, automatically 'fires' the appropriate methods of the classes at the analogical level which are in charge of generating such an object. A scheduler has been introduced for managing and activating sets of processes (actors) in the analogical level, in the proper temporal order, starting from high-level descriptions and queries at the symbolic level. Another interesting feature of this model is the feedback mechanism from the analogical level to the symbolic level: the execution of actors in the analogical level can modify the symbolic database. This allows the system to delegate to the analogical level some cases of inference that are cryptic for the symbolic level and in general for symbolic deductive systems. The whole model is therefore hybrid, since it combines symbolic and analogical representations and inference mechanisms. A global scheme of the system is shown in Figure 1.

Figure 1 WinProcne/HARP architecture [Diagram labels: Symbolic Level (WinProcne symbolic DB), Interface, Analogical Level (experts).]

The system's initial architecture and the basic features of the representation formalism can be found in (Camurri, Canepa, Frixione and Zaccaria, 1991; 1992). A formal presentation of the AI model is given in (Camurri et al., 1993; Ardizzone et al., 1993). In this paper we present an overview of the model and its latest developments, with a particular focus on the following issues:

• The software architecture and the implementation scheme of the whole system, as part of a workstation for multimedia processing,

• The scheduling mechanism for the management and the activation of processes, according to the descriptions defined in the symbolic level of representation,

• The symbolic/analogical communication protocol, which allows the actors defined in the analogical level to receive data from the symbolic knowledge base (KB), and, in their turn, to send back results, which can influence or modify the structure of the symbolic KB.


Figure 2 Simple HARP net showing the definition of an imitation as a specialization (i.e. subclass) of the more general class transformation

Figure 3 Refinement of net of Figure 2 [The subclass inverse_imitation has been added to specify a particular specialization of imitation.]

REPRESENTATION FORMALISM OF SYMBOLIC SUBSYSTEM

A brief introduction to the representation formalism on which the symbolic component of the system is based is given in this section. The reader can find more detailed introductory material on the KL-ONE family in (Brachman and Schmolze, 1985; MacGregor, 1988; Woods and Schmolze, 1992).

The symbolic subsystem consists of two parts (see Figure 1): the T box (where 'T' stands for 'terminological') and the A box (where 'A' stands for 'assertional').

The T box is the 'definitional' part of the WinProcne/HARP knowledge base, i.e. it is the repository for the definitions of the entities and their relations manipulated by the system. It is a sort of long term memory of the system; no individual instance is represented at this level.

Instances of the concepts and relations defined in the T box can be introduced in the A box, a sort of short term memory containing individual instances. Indeed, the A box can have a further role, besides that of short term memory; if necessary, it can also contain knowledge and definitions which cannot be modeled in the T box (e.g. rules), as we discuss further in the paper.

In the T box we use a term subsumption language similar to that of KL-ONE, extended with temporal primitive constructs called tense maps (TMs) (Ardizzone et al., 1993). Such a formalism allows a taxonomic representation of objects in a frame-based semantic network formalism (SI net), and is formally equivalent to a subset of first order predicate calculus.

As for the A box, we use a subset of first order predicate calculus corresponding to Horn clauses. The A box is therefore considerably richer than is the case in KL-ONE, e.g. allowing certain kinds of universally quantified formula. As in Krypton (Brachman, Fikes, and Levesque, 1985), the T box gives the definition for one-place and two-place predicates used in the A box. The TMs correspond, at the assertional level, to order relations among objects of the instant type.

Let us examine the very simple fragment of T box in Figure 2. This net shows the relations (roles in KL-ONE) between the music_object and the imitation classes (concepts in KL-ONE, shown in Figure 2 as ellipses). This simple net defines an imitation as a specialization (i.e. subclass) of the more general object transformation.

Imitation has, among other features, exactly one (1/1) input and one output, both of type music_object. The graphic notation is the same as in KL-ONE: ellipses represent classes, double arrows represent is-a specialization links (imitation is-a transformation), and single edges connected to boxes represent relations (roles in KL-ONE, slots in object-oriented programming languages). This example shows only the 'definitional' part of the HARP knowledge base (the T box), that is, only the definitions of the entities and their relationships manipulated by the system. Instances of the concepts defined in the T box can be introduced in the A box.

Symbolic entities can be 'hooked' to actors (Agha and Hewitt, 1990) at the analogical level. This is a sort of symbol grounding mechanism, driven by a scheduler (or simulative engine, a high-level special actor). Actors can be either experts (analogical experts (AEs), which embed some skill, i.e. they are experts, on a part of the domain) or icons (i.e. they are low-level representations of 'snapshots' of the world domain, e.g. fragments of music scores, sounds, images, animations). In both cases, they are structured as C++-like classes, implementing low-level music or multimedia objects. The communication between actors (the network topology) is defined on the basis of the existing I/O roles (a special type of relation in the symbolic part) connecting their symbolic counterparts. The topology of actors' links is therefore built by the scheduler on the basis of the information contained in the symbolic level.

In the example of Figure 2, we can specify the expert hooked to imitation, e.g. the class which has a method able to take a music object as input (this means that there is a link from the expert hooked to imitation to the icon actor hooked to music_object), and generate a new instance (and a new icon) of music object as an imitation of the first. Here, the kind and the shape of imitation is not important; however, it is a simple task to specialize different kinds of imitation, as shown in Figure 3 in the case of inverse imitation. If we define a particular instance i1 of imitation, which has as input an instance m1 of music_object, corresponding to a music score, the user can ask the system to determine the output of i1. Note that m1 should be hooked to an icon object (for example, a score object class), and i1 should be hooked to a proper expert able to perform that transformation. In the simplest case, it is a sort of procedural attachment to the symbols in the T box and in the A box; the activation of the experts is driven by the scheduler, described below.

Figure 3 shows a refinement of the net of Figure 2. The subclass inverse_imitation has been added to specify a particular specialization of imitation. The other refinement in the example is on the role definitions; the roles input and output are now defined for the concept transformation. This means that a transformation is an object that, among other things, has at least one input and at least one output, both of type music_object (1/nil means 'at least 1', and 1/1 means 'exactly 1', i.e. at least 1 and at most 1). Imitation inherits these roles, with a restriction on the number of input/output music objects. It is worth noting that inverse_imitation automatically inherits these restricted roles from imitation.
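The number restrictions and their inheritance can be illustrated with a small sketch. It is not the WinProcne/HARP implementation: the types `Role` and `Concept` and the functions `find_role` and `admits` are hypothetical names introduced here, and the net is reduced to single inheritance for brevity. A role is looked up on the concept itself and then along the is-a links, so the local 1/1 restriction of imitation shadows the 1/nil restriction of transformation, and inverse_imitation obtains the restricted roles without declaring anything.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified model of T box roles (not the actual HARP code).
// A role has a value restriction and a number restriction min/max,
// where an empty max stands for 'nil' (no upper bound).
struct Role {
    std::string value_restriction;   // e.g. "music_object"
    int min;                         // minimum number of fillers
    std::optional<int> max;          // std::nullopt means 'nil'
};

struct Concept {
    std::string name;
    const Concept* parent;               // is-a link (single inheritance here)
    std::map<std::string, Role> roles;   // roles defined or restricted locally
};

// Look a role up on the concept, then on its ancestors along the is-a chain.
// The most specific definition wins, which models role restriction.
const Role* find_role(const Concept& c, const std::string& role) {
    for (const Concept* p = &c; p != nullptr; p = p->parent) {
        auto it = p->roles.find(role);
        if (it != p->roles.end()) return &it->second;
    }
    return nullptr;
}

// True if 'count' fillers satisfy the number restriction of the role.
bool admits(const Role& r, int count) {
    return count >= r.min && (!r.max || count <= *r.max);
}

// The net of Figure 3.
const Concept transformation{"transformation", nullptr,
    {{"input",  {"music_object", 1, std::nullopt}},    // 1/nil
     {"output", {"music_object", 1, std::nullopt}}}};  // 1/nil
const Concept imitation{"imitation", &transformation,
    {{"input",  {"music_object", 1, 1}},               // 1/1
     {"output", {"music_object", 1, 1}}}};             // 1/1
const Concept inverse_imitation{"inverse_imitation", &imitation, {}};

// Usage: inverse_imitation inherits the restricted 1/1 role from imitation.
void demo_role_inheritance() {
    const Role* in = find_role(inverse_imitation, "input");
    assert(in != nullptr && admits(*in, 1) && !admits(*in, 2));
    assert(admits(*find_role(transformation, "input"), 2));   // 1/nil
}
```

A real T box classifier does considerably more (multiple inheritance, subsumption checks between value restrictions); the sketch covers only the lookup order that makes a restriction override an inherited role.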

The symbolic language of the T box shown in these two examples of Figures 2 and 3 has several advantages: it is based on KL-ONE, 'one of the most influential and imitated knowledge representation systems in the Artificial Intelligence community' (Woods and Schmolze, 1992). It is characterized by formally effective and useful inference algorithms. For example, the classifier is able to find the right place in the taxonomy for a new concept, given its properties (roles); the recognizer is a similar algorithm that works for individuals (MacGregor, 1988), which is heavily used by the scheduler each time a new assertion is generated by the analogical level. Furthermore, this formalism is object-oriented, and has an intuitive graphical representation.

As for the A box (the part of the knowledge base delegated to store instances), as a minimum requirement it contains one-place and two-place predicates. This minimum definition is sufficient to support the embodiment of analogical entities within the symbolic level. Nevertheless, the representation in the A box of more complex multimedia knowledge might require deduction abilities (e.g. rule-based inferences) available only in a more complex A box architecture, for example one that includes Horn clauses. In WinProcne/HARP the expressiveness of the symbolic level can be decided on the basis of the class of problems; those requiring intensive analogical inference (measurement or simulation) and low deductive power can adopt a simple A box as described above. More complex problems, such as those described in this paper, require more powerful models. In our hybrid architecture, the system allows the definition of independent, multiple A boxes with different deductive powers. This means, for example, that we can have at our disposal A boxes with the power of the full Prolog language, if necessary.

In WinProcne/HARP there is a close dialogue between the actors in the analogical level and the symbolic level (both T box and A box). As previously mentioned, symbolic queries can cause calls to experts (controlled by the scheduler), possibly producing additional hooks. It is also possible for the analogical level to modify the symbolic subsystem (add or delete objects). For example, an expert, once activated, can ask the scheduler to generate new links among symbols and experts or icons, or to produce new instances in the symbolic knowledge base.

Experts can send messages to the scheduler with their requests, and then receive system replies. The basic restrictions in our system are on the number of messages and on the experts themselves: an expert can neither generate new experts nor possess knowledge about the other experts. This is permitted only to the scheduler, the higher-level, first-class actor described in the third section of the paper.

DEFINING SYMBOLIC MUSIC ENTITIES: HIGH-LEVEL ONTOLOGY

It is useful to define in the T box a high-level taxonomy as the starting point of any music and multimedia KB. This implies the definition of the general vocabulary, i.e. the basic entities and their relations, once and for all. In AI terms, these definitions constitute the basic ontology of the domain. A portion of such a high-level taxonomy is shown in Figure 4. The concepts action and situation subsume all concepts representing entities characterized by a duration or by a temporal location (their begin and end roles, not shown in Figure 4 for the sake of clarity). Actions are hooked to processes executing some change in the represented world (AEs); situations represent states of the world in which there are no significant changes, and they are hooked to icons. The role participant, with object as a value restriction (V/R), represents the 'physical' objects involved in a particular situation or action, e.g. a musical instrument (in computer music languages it corresponds to a sound synthesis technique) or a three-dimensional object on the stage.

The sub-concepts compound_situation and compound_action describe situations and actions which can be decomposed in terms of sub-parts, which are, in turn, situations and actions, respectively. A situation holds for a certain interval without relevant changes. An action produces some kind of change in the world. For this concept, the roles initial_situation, state, and final_situation are defined. For each action, the filler of initial_situation is the state of the domain before the action is performed, while the filler of final_situation corresponds to the state of the domain after the action is performed. The fillers of state correspond to significant intermediate situations holding during the action's performance. Since a compound_action has one or more actions as subparts (the fillers of the role part_of), its initial instant (the filler of its role begin) precedes or coincides with the initial instants of its subactions. Analogously, its final instant follows or coincides with the final instants of its subparts. Suitable TMs, not shown in Figure 4, represent the above relations.
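The temporal constraint between a compound action and its subparts can be checked mechanically. The sketch below is an illustration only: `Action`, its members and `respects_part_of_constraints` are names assumed here, instants are reduced to integers, and the TMs of the actual system are not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: instants are plain integers (e.g. milliseconds).
struct Action {
    long begin;                  // filler of the role begin
    long end;                    // filler of the role end
    std::vector<Action> parts;   // fillers of the role part_of (subactions)
};

// A compound action must begin no later than each of its subactions begins
// and end no earlier than each of them ends. The check is applied
// recursively, so nested compound actions are covered as well.
bool respects_part_of_constraints(const Action& a) {
    for (const Action& p : a.parts) {
        if (a.begin > p.begin || a.end < p.end) return false;
        if (!respects_part_of_constraints(p)) return false;
    }
    return true;
}

// Usage: the second action is rejected because a subaction starts before it.
void demo_compound_action() {
    Action ok{0, 100, {Action{0, 40, {}}, Action{40, 100, {}}}};
    Action bad{10, 100, {Action{0, 40, {}}}};
    assert(respects_part_of_constraints(ok));
    assert(!respects_part_of_constraints(bad));
}
```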

One of the main differences of our symbolic formalism with respect to traditional KL-ONE formalisms is that concepts and roles are typed; types reflect the basic concepts and roles forming the high-level taxonomy described above. Some basic role types are presented in Appendix B. This extension of the KL-ONE language allows the system to integrate the symbolic level better with the analogical level; different types of object correspond to different types of analogical entity.

Figure 4 Excerpt of high-level part of T box

BRIDGING GAP BETWEEN SYMBOLS AND ACTIONS

The analogical level is interfaced to the symbolic level by means of a distributed interface module, visible to both the symbolic and the analogical level. The global scheme is shown in Figure 1. The interface embeds the symbolic information necessary to the scheduler, which cannot be represented in the SI net; symbolic entities (concepts, roles and their assertions) have associated headers, which specify 'how' and 'where' these entities can be hooked to particular components in the analogical level. Therefore, the interface is a distributed component whose parts (the headers) are associated with SI net entities; headers follow the inheritance rules defined by the is-a links in the SI net. However, inherited headers can be overridden by resident local information.

From the analogical viewpoint, the scheduler uses the interface for deciding which actors must be considered for execution and when they must be fired, suspended, or killed. Its decisions are guided by the symbolic level.

The interface specifies three main parts:

• Hooks connect symbols to analogical entities (symbol grounding).

• ActorsTopology connects actors to one another (actor I/O links, or actor topology).

• FiringConditions are conditions that cause the start of the execution (firing) of an actor.

The hooks part specifies the 'hooks' between symbols and actors, i.e. the grounding of symbols in analogical classes corresponding to executable (sequential) 'bodies' at the analogical level.

An expert is an actor which is an 'expert' of (a sub-set of) the domain, and which performs actions upon it. Icons are actor classes as well, and can be manipulated and investigated by experts by sending messages to their icon methods, i.e. entry points to the icon classes. The icon class is hooked to the high-level concept situation, the expert class is hooked to action, and they are properly inherited by all the subsumed concepts.
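The inheritance of hooks along the subsumption links can be pictured as a lookup that walks up the is-a chain until a concept with a locally hooked actor class is found. This is a sketch under assumptions: `Concept`, `hooked_class` and `resolve_hook` are hypothetical names, multiple inheritance is reduced to a single parent, actor classes are represented by their names only, and the concepts `score` and `object` and the class name `ImitationExpert` are invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of a concept with an optional locally hooked actor class.
struct Concept {
    std::string name;
    const Concept* parent;      // is-a link (single inheritance here)
    std::string hooked_class;   // empty if no actor class is hooked locally
};

// Return the actor class hooked to the concept, or to its nearest
// superconcept; an empty string means that no hook was found.
std::string resolve_hook(const Concept& c) {
    for (const Concept* p = &c; p != nullptr; p = p->parent)
        if (!p->hooked_class.empty()) return p->hooked_class;
    return "";
}

// Usage: the generic classes are hooked to action and situation,
// and a local hook on imitation overrides the inherited one.
void demo_hooks() {
    Concept action{"action", nullptr, "Expert"};
    Concept transformation{"transformation", &action, ""};
    Concept imitation{"imitation", &transformation, "ImitationExpert"};
    Concept situation{"situation", nullptr, "Icon"};
    Concept score{"score", &situation, ""};
    assert(resolve_hook(transformation) == "Expert");
    assert(resolve_hook(imitation) == "ImitationExpert");
    assert(resolve_hook(score) == "Icon");
}
```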

An actor can be specific in the related concept (that is, it is directly hooked to its concept), inherited (it is inherited from superconcepts), or absent (it must be searched for in more specialized concepts, according to the subsumption of the terminological level, or it must be searched for in yet another concept via the part_of roles linked to the initial concept). AEs communicate with each other only by requests to the scheduler; an AE a1 can tell another AE a2 to start its action only by asking the scheduler to add one or many assertions, which are in the firing conditions of the header of a2. The firing conditions for AEs specify that an AE (hooked to a concept of type action) must be executed only when a particular (set of) assertion(s) occurs in the symbolic KB.
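The firing-condition mechanism can be sketched as follows. The code is an illustration under stated assumptions, not the WinProcne/HARP scheduler: assertions are plain strings, `Header`, `KnowledgeBase` and `fireable` are hypothetical names, and an AE is taken to be fireable when every assertion listed in its header is present in the symbolic KB. It shows how a1 starts a2 indirectly, by having an assertion added, rather than by calling it.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Assertion = std::string;   // e.g. "done(a1)"; the syntax is illustrative

// Hypothetical header: the AE name and the assertions that must hold
// in the A box before the AE may be fired.
struct Header {
    std::string expert;
    std::vector<Assertion> firing_conditions;
};

// Stand-in for the symbolic KB, reduced to a set of assertions.
struct KnowledgeBase {
    std::set<Assertion> abox;
    void add(const Assertion& a) { abox.insert(a); }
    bool holds(const Assertion& a) const { return abox.count(a) != 0; }
};

// Names of the AEs whose firing conditions are all satisfied.
// Note: in this sketch an AE with no conditions is always fireable.
std::vector<std::string> fireable(const KnowledgeBase& kb,
                                  const std::vector<Header>& headers) {
    std::vector<std::string> out;
    for (const Header& h : headers) {
        bool all = true;
        for (const Assertion& a : h.firing_conditions)
            if (!kb.holds(a)) { all = false; break; }
        if (all) out.push_back(h.expert);
    }
    return out;
}

// Usage: a2 becomes fireable only after a1 has had "done(a1)" asserted.
void demo_firing() {
    KnowledgeBase kb;
    std::vector<Header> headers{{"a1", {"start(a1)"}}, {"a2", {"done(a1)"}}};
    kb.add("start(a1)");
    assert((fireable(kb, headers) == std::vector<std::string>{"a1"}));
    kb.add("done(a1)");   // requested by a1 through the scheduler
    assert((fireable(kb, headers) == std::vector<std::string>{"a1", "a2"}));
}
```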

The communication between AEs can also be implemented in a different way; the scheduler can build a sort of Unix pipe between experts. These pipes are the analogical counterpart of particular types of I/O roles (the signal roles (see Appendix B)). Once the scheduler has set these pipes, the actors are able to exchange messages and data autonomously. This form of communication is more efficient in comparison with the firing condition mechanism, but the scheduler loses control of the communication, now entirely managed at the analogical level by the AEs involved (and therefore this information does not emerge automatically at the symbolic level).

A role can be of one of the following types, whose semantics are defined either symbolically (e.g. in the aggregate type) or analogically (e.g. in the perceptor type):

• Perceptor: An AE can either access an icon class to extract data from it, or communicate with other AEs by means of roles of type perceptor. In its turn, a perceptor can be of three types: input or output signal, event, and feature. They are explained in detail in Appendix B.

• Aggregate or part_of: These denote a concept as an aggregation of several other concepts.

• Use: This is a generic use relation.

• Temporal: Temporal roles are consulted only via TMs, and their fillers are only of type instant.

The ActorsTopology part of the interface specifies the I/O information, i.e. the input and output links of the actor from/to other actors (the topology of the connections between actors).

The scheduler uses the FiringConditions part of the interface to manage the firing conditions of the AEs. These conditions are expressed in terms of assertions in the A box; an AE can ask the scheduler to add an assertion which can be used to fire other AEs.

Experts hooked to subconcepts of the same ancestor are likely to have large parts in common. These parts (the interface and the skeleton) are usually hooked to the ancestor. Of course, it is possible to override or refine them. When a new expert is added, only the specialization part (the sequential body) is usually re-written, and this is automatically joined to the skeleton inherited by default. In this way, we define classes of experts in an object-oriented architecture whose structure and inheritance are driven by the symbolic level.

The scheduler can send the following main kinds of message to an actor:

• Specify the inputs and outputs. This means that the topology of links among actors is dynamically specified by the scheduler, by initializing the input and output ActorList data structures (defined in the private part of the actor class). This is an approach that contrasts with the static topology definition used for example in CSP. In our model, the topology is initially defined using the actor class constructor and it can then be modified by messages from the scheduler, specifying new links.

• Run/resume the actor (sequential) body (that is, the true actor code).

• Suspend the actor running process.

• Kill the actor running process.

Actors are synchronous sequential objects, and can use receive/send primitives as well as a select guard construct; they are activated concurrently by the SimulativeEngine::Scheduler() method, which makes use of synchronous message passing primitives. In the following pseudo-code description, we give a sketch of the structure of the analogical subsystem. We use a C++ notation, extended with primitives to manage concurrency.

The roles of type signal correspond, at this level, to the input and output ActorList. Their associated data flow is implemented by using the receive/send primitives.

    class Actor {
    private:
        // Data structure for internal status control, managed by
        // ::SequentialBody()
        ActorList Input_link;     // Input mail addresses
        ActorList Output_link;    // Output mail addresses
    protected:
        Event receive();          // Synchronous (blocking) receive primitive
        void send(Event);         // Synchronous (blocking) send primitive
        void Init_IO();           // Initialize the topology
    public:
        virtual void SequentialBody();
        Actor() { Init_IO(); }
        ~Actor() { send(END); }
    };

    class Expert : public Actor {
    public:
        Expert() : Actor() { SequentialBody(); }
        void SequentialBody();
    };

    void Expert::SequentialBody() {
        do event = receive(); while (event != START);
        // Sequential specialization code; it may contain:
        // send primitives to process the output mail address list
        // and to ask the scheduler to invoke symbolic primitives
        // (e.g., add a new assertion);
        // receive primitives to manage input from actors in the
        // input mail address list or from the scheduler.
    }

    class Perceptor : public Expert;

    class Icon : public Actor {
    private:
        // Icon data structure
    protected:
        virtual void BuildIcon();
        virtual void DestroyIcon();
        // here follow icon access methods,
        // available only to SequentialBody():
    public:
        void SequentialBody();
        Icon();
        ~Icon();
    };

    Icon::Icon() {
        Actor::Actor();
        BuildIcon();          // Initialize the icon
        SequentialBody();     // Process input messages from experts
    }

    Icon::~Icon() {
        Actor::~Actor();
        DestroyIcon();
        send(DEAD);
    }

    void Icon::SequentialBody()    // the icon input messages manager
    {
        // Sequential specialization code; it contains:
        // send primitives to process the output mail address list;
        // receive primitives to manage input from actors in the
        // input mail address list or from SimulativeEngine::Scheduler().
        // Every message returned by receive() specifies the protected
        // method to be used by the actor using the icon. A possible
        // general scheme for the part of the SequentialBody() which
        // handles received messages is the following:
        for (EVER) {
            select message = receive() {
                message == Message_1:
                    send(::handle_message_1(message));
                message == Message_2:
                    send(::handle_message_2(message));
                // ...
                message == Message_N:
                    send(::handle_message_N(message));
            }
        }
    }

Figure 5 Use of an icon class [Diagram annotations: the perceptor extracts an event from the icon by sending it messages in which it specifies the proper icon method to be invoked; the icon replies to the perceptor's requests by sending the reply message of the selected method.]

The use of an icon class is sketched in the example of Figure 5. A role of type perceptor has a hooked body which is able to extract an event from the icon. It performs its task by sending messages to the icon class hooked to an assertion of the room concept.
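The message handling of an icon can be approximated in standard C++, without the concurrency extensions of the pseudo-code. The sketch below is an assumption-laden illustration: `Message`, `IconSketch`, `add_method` and `dispatch` are hypothetical names, the blocking receive/send pair is replaced by an ordinary function call, and the icon methods `echo` and `size` are invented for the example. One call to `dispatch` plays the role of one iteration of the select loop.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical message sent by a perceptor: the name of the icon method
// to be invoked and a single argument.
struct Message {
    std::string method;
    std::string argument;
};

class IconSketch {
public:
    using Handler = std::function<std::string(const std::string&)>;

    // Register an icon access method under a name.
    void add_method(const std::string& name, Handler h) {
        methods_[name] = std::move(h);
    }

    // Select the handler named in the message and return its reply;
    // unknown methods yield a fixed error reply.
    std::string dispatch(const Message& m) const {
        auto it = methods_.find(m.method);
        if (it == methods_.end()) return "UNKNOWN_METHOD";
        return it->second(m.argument);
    }

private:
    std::map<std::string, Handler> methods_;
};

// Usage: a perceptor asks the icon for data by naming a method.
void demo_icon() {
    IconSketch room;
    room.add_method("echo", [](const std::string& a) { return a; });
    room.add_method("size", [](const std::string&) { return std::string("3x4"); });
    assert(room.dispatch({"echo", "hello"}) == "hello");
    assert(room.dispatch({"size", ""}) == "3x4");
    assert(room.dispatch({"colour", ""}) == "UNKNOWN_METHOD");
}
```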

class SimulativeEngine : public Actor{ l/the scheduler class

private: // Data-structure for the control of the status and the //interface."

Context CurrentContext; ExpertList Experts; IconList Icons; PerceptorList Perceptors;

// the Procne class is the symbolic KB, accessible from / / the scheduler (see appendix A):

friend class Procne; protected:

Boolean isEmpty(Context); / / . . . and all the other methods used in Scheduler()

public: virtual void Scheduler(); SimulativeEngine0;

SimulativeEngine0; }

void SimulativeEngine::Scheduler()
{
    // Identify a context (the CurrentContext): the assertions involved,
    // and the set of fireable AEs, ordered by their TMs and firing
    // conditions...
    CurrentContext = CalculateContext(EmptyContext());
    while(!isEmpty(CurrentContext)) {
        // Fire or resume the AEs whose firing conditions and
        // TMs are satisfied:
        RunOrResumeAll(CurrentContext);
        select event = receive() {
            event.message == NewSymbolicAssertion:
                // Manage possible new assertions requested by
                // AEs (verify and add assertions),
                // then call the Recognizer on each new assertion:
                RecognizeAssertion(event.Data);
                // If new assertions have been added by some AE,
                // and some new assertions are out of the context...
                if(isOutOfContext(event.Data))
                    // ...then update the context to contain those assertions:
                    CurrentContext = CalculateContext(CurrentContext);
            event.message == SymbolicQuery:
                // Manage possible symbolic queries requested by AEs
                reply.message = event.message;
                reply.Actor_ID = event.Actor_ID;
                reply.Data = Procne::Query(event.Data);
                send(reply);
            event.message == DEAD:
                // If some actors or icons are dead, they must be destroyed
                KillActor(event.Actor_ID);
        }
        // Create the list of actors which must be killed, and kill them:
        KillActor(GetKillList());
        // Create the list of actors which must be suspended,
        // and suspend them:
        SuspendActor(GetSuspendList());
    } // repeat until the context is empty
}

Figure 6 Detail of terminological component of WinProcne/HARP knowledge base for diminution example
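The control flow of the scheduler can be illustrated with a small, self-contained C++ sketch. It makes strong simplifying assumptions that are not taken from the paper: the context is reduced to a set of live actor identifiers, a new assertion simply brings one more actor into the context, and queries are only counted. All type and method names below are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <utility>
#include <vector>

// Hypothetical event kinds mirroring the three cases of the listing.
enum class EventKind { NewSymbolicAssertion, SymbolicQuery, Dead };

struct Event {
    EventKind kind;
    int actorId;
};

class SchedulerSketch {
public:
    explicit SchedulerSketch(std::set<int> actors)
        : context_(std::move(actors)) {}

    void post(const Event& e) { events_.push_back(e); }

    // Process events until the context is empty or no event is left.
    // Returns the actors destroyed, in order of destruction.
    std::vector<int> run() {
        std::vector<int> killed;
        while (!context_.empty() && !events_.empty()) {
            Event e = events_.front();
            events_.pop_front();
            switch (e.kind) {
            case EventKind::NewSymbolicAssertion:
                // Assumption: a new assertion enlarges the context.
                context_.insert(e.actorId);
                break;
            case EventKind::SymbolicQuery:
                // Assumption: answering a query is just counted here.
                ++queriesAnswered_;
                break;
            case EventKind::Dead:
                // Dead actors leave the context and are destroyed.
                if (context_.erase(e.actorId) > 0)
                    killed.push_back(e.actorId);
                break;
            }
        }
        return killed;
    }

    bool contextEmpty() const { return context_.empty(); }
    int queriesAnswered() const { return queriesAnswered_; }

private:
    std::set<int> context_;
    std::deque<Event> events_;
    int queriesAnswered_ = 0;
};
```

The loop terminates exactly when the context becomes empty, which corresponds to the "repeat until the context is empty" comment in the listing.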

MUSIC EXAMPLE: DIMINUTION ACTION

In this section we show an example of the diminution HARP music action, an expert loosely based on the metaphor of force fields.

The class of experts based on the metaphor of force fields is a relevant one. These descriptions let the user think of and perform a set of actions in terms of the intuitive natural dynamics of navigation in attractor fields, instead of rules. Force fields make it easy to model complex behaviours that are very difficult to express as rules. They also give a different viewpoint and provide powerful manipulation primitives. The recent work of Todd and Leman (McAngus Todd, 1992; Leman, 1992) relates musical expression and perception to physical motion, including concepts of energy and mass. For example, they argue that musical phrasing and rhythm have their origin in the kinematic and dynamic variations involved in single motor actions. From the cognitive point of view, this is based on the psycho-physical structure of the human auditory system. Our WinProcne/HARP model is based on these fundamental concepts.

Let us turn to our example. The diminution action selects a music fragment from an existing database on the basis of attraction criteria. An excerpt of the general symbolic KB for this problem is shown in Figure 6; a detail of the part of this KB that stores and uses the database of music fragments is depicted in Figure 7.

Figure 7 Diminution example; fragments database in more detail [A set of roots and their corresponding possible diminutions are sketched. Given a root, it is possible to derive a suitable diminished fragment according to the desired features.]

Diminution is a compound music action which tries to replace a fragment of an existing compound music object (e.g. a piece) with another fragment, which is a 'diminution' (i.e. a sort of variation in music) of the original one. We suppose that we have a database of root fragments, i.e. starting fragments which can be matched in the input piece, and each of which can possibly be replaced by one of its diminished fragments in the database. For example, in Figure 7, there is a root called root_a, which can be replaced by a001, a002, and so on. Each diminished fragment has a different level of embellishment_degree, expressed by the feature role specified in the net, i.e. a001 is formed by a group of notes that differ from those of a002 both in values and in number, even if their duration is the same. A diminished fragment with a higher number of notes per second has a higher value of embellishment_degree; it is a sort of measure of granularity.

As can be seen from Figure 6, diminution is formed by three atomic actions: fragment_match, fragment_view, and fragment_substitution. Each of these atomic action concepts is hooked to a proper expert in the analogical level. The expert hooked to fragment_match either selects a diminished fragment from the fragments_database (whose detailed net is partially expanded in Figure 7), or generates a new diminished fragment based on the existing ones in the database. It starts from a specification of the fragment of the piece to be substituted, and its features. The selection of the fragment to be substituted in the original piece is also done by the fragment_match expert. It is the result of the execution of a particular analysis (segmentation) task operating on the input piece and on the history of previous substitutions. This task identifies significant fragments in the piece to be matched with roots in the database for applying a diminution.
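As a rough illustration of the selection step performed by the fragment_match expert, the following C++ sketch picks, among the diminutions stored for a root, the one whose embellishment degree is closest to a desired value. It rests on assumptions that do not come from the paper: the embellishment degree is taken to be exactly the number of notes per second, the attraction criteria are replaced by a nearest-value rule, and the Fragment and FragmentsDatabase types are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a diminished fragment.
struct Fragment {
    std::string name;
    int noteCount;
    double durationSeconds;

    // Assumption: embellishment degree = notes per second.
    double embellishmentDegree() const {
        return noteCount / durationSeconds;
    }
};

// Hypothetical database: root name -> its possible diminutions.
using FragmentsDatabase = std::map<std::string, std::vector<Fragment>>;

// Return the diminution of `root` whose embellishment degree is
// closest to `desired`, or nullptr if the root is unknown or has
// no diminutions. On ties the first candidate is kept.
const Fragment* chooseDiminution(const FragmentsDatabase& db,
                                 const std::string& root,
                                 double desired) {
    auto it = db.find(root);
    if (it == db.end()) return nullptr;
    const Fragment* best = nullptr;
    for (const Fragment& f : it->second) {
        assert(f.durationSeconds > 0.0);
        if (best == nullptr ||
            std::fabs(f.embellishmentDegree() - desired) <
                std::fabs(best->embellishmentDegree() - desired)) {
            best = &f;
        }
    }
    return best;
}
```

With three diminutions of root_a lasting two seconds each and containing 4, 8, and 16 notes, a desired degree of 5 notes per second selects the 8-note fragment.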

The output fragment derived by fragment_match is sent (via the matched_fragm role) as an input to the second expert (fragment_view), which shows it to the user in a music notation format (see Figure 8). Note that the role matched_fragm, as well as chosen_fragm, works as a 'pipe' communication mechanism between the two experts hooked to the concepts. They are roles of type signal (see Appendix B).



Figure 8 Expert running [Screen shot of the running expert: the panel shows the root fragment, its tonality, the transposition in semitones, the time quotient, the chosen diminution, and the list of available diminutions.]

SOFTWARE ENVIRONMENT

WinProcne/HARP is implemented under Microsoft Windows 3.1 on a 386 (486) personal computer, using Arity Prolog 6.x and Microsoft C/C++ 7.0.

The implemented software system is based on a high-level user interface, as shown in Figure 9. In particular, Figure 9a-d show the basic menus for storing/retrieving KBs, editing, and performing management and utilities operations. A symbolic KB is stored in three types of file (see Figure 9a): a .net file contains the T box, an .ass file contains the A box, and a .lay file contains the layout information. Several utilities allow the user to easily manage, browse, and edit the KB. For example, it is possible to control zoom in/out and fonts, as well as graphic 'aliases' for concepts, for better graphic management of the KB. The user can also define multiple layers, i.e. the KB can be arranged in a sort of three-dimensional space made of multiple levels. Some concepts and roles can be defined in a given layer, and others in another layer, and they can be visible or not from the other layers. All the previous information can be stored in the layout file (.lay) associated with the KB.

The ask menu includes explicit calls to the classifier, the recognizer, the scheduler, the merge module (which loads a KB and merges it with the current KB), a set of standard queries on the T box and A box, and a generic query module, which allows the user to define, store, and load general purpose Prolog queries, including calls to the Procne primitives (see Appendix A), and to add rules to the A box. Instances of a given concept or role can also be requested by clicking with the mouse on the desired concept or role. A menu appears on the selected node (see Figure 9e) and the user can either issue requests on the assertions in the A box for that node, or start interacting with the interface mechanism and the analogical level for that node (explore or execute the hooked actor). Calls to the analogical level can be invoked to start executions and simulations, i.e. executions of the actors (experts or icons) hooked to the symbolic nodes. An example of the activation of an expert is shown in Figure 8. Adding a new expert program is a simple operation; it can be done at run time (no system re-compilation is needed), and only requires limited C++ programming expertise and familiarity with WinProcne concepts. Sound output can be obtained either off-line, using cmusic experts, whose output scores must be compiled by the cmusic software system (Moore, 1990), or in a real-time MIDI environment.

Figure 9 Overview of system user interface [Panels (a)-(e): menus and details of the fragments net. Menu entries visible include Add Role, Add DRole, Add VRRole, Add Isa, Delete Net Item, Edit Net Item, Move Net Item, Stop editing, Ask, Help, Create Alias, Hijack, Link to Alias, and Disable Layout.]

Knowledge-Based Systems Volume 7 Number 2 June 1994

Another more powerful version of the system is under development under Unix on a Silicon Graphics Indigo workstation. In this version we are experimenting with more complex case studies relating to advanced robotics and animations (humanoid reconstruction, animation etc.). We plan to use both versions of the system integrated in a theatrical project in which the music tasks are delegated to WinProcne/HARP on PC Windows, and the main computer animation, reconstruction etc. are delegated to the version running on the SGI workstation. Both the systems strictly interact by sharing sensor information and exchanging information on KB updating and AE activation.

CONCLUSIONS

Several applications have been developed using WinProcne/HARP, with the aim of both refining the representation model and producing real working systems for the analysis and composition of music and multimedia. In particular, the system has been used in the following projects:

• by the composers Corrado Canepa and Giuliano Palmieri for the development of their own music and multimedia composition projects (Camurri, Canepa, Frixione and Zaccaria, 1992; Massucco, Mercurio and Palmieri, 1993),

• in a multimedia theatrical automation project under development (Camurri, Giuffrida and Vercelli, 1993),

• in the implementation of the model of a recognizer of tonal features (harmonic analysis), in collaboration with Marc Leman (University of Ghent, Belgium) (Camurri and Leman, 1992),

• in the modeling and simulation of advanced robotics problems (Ardizzone et al., 1993).

The WinProcne/HARP software prototype version running on a PC is freeware; it can be obtained by sending an e-mail request to the authors (music@dist.dist.unige.it).

ACKNOWLEDGEMENTS

This work has been supported by the Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo of the Italian Research Council, under grant 90.00716.PF69. We thank Marcello Frixione and Renato Zaccaria for their invaluable contribution to the system design. Others who helped design, implement and test the system include Marco Mercurio, the composers Corrado Canepa and Giuliano Palmieri, and the students Alessandro Catorcini and Alberto Massari.

REFERENCES

Agha, G and Hewitt, C 'Actors: a conceptual foundation for concurrent object-oriented programming' in Shriver, B and Wegner, P (Eds.) Research Directions in Object Oriented Programming MIT Press, USA (1990)

Ardizzone, E, Camurri, A, Frixione, M and Zaccaria, R 'A hybrid scheme for action representation' International Journal of Intelligent Systems Vol. 8, No. 3 (1993) pp. 371-403

Brachman, R J and Schmolze, J G 'An overview of the KL-ONE knowledge representation system' Cognitive Science Vol. 9 (1985) pp. 171-216

Brachman, R J, Fikes, R E and Levesque, H J 'Krypton: a functional approach to knowledge representation' in Brachman, R J and Levesque, H J (Eds.) Readings in Knowledge Representation Morgan Kaufmann, USA (1985) pp. 411-429

Bresin, R, De Poli, G and Vidolin, A 'Reti neurali per il controllo delle deviazioni temporali nell'esecuzione musicale' Proceedings of the IX Colloquium on Musical Informatics Genoa, Italy (13-16 Nov 1991) pp. 88-102

Camurri, A 'On the role of artificial intelligence in music research' Interface Vol. 19, No. 2-3 (1990) pp. 219-248

Camurri, A and Canepa, C (Eds.) Proceedings of the IX Colloquium on Musical Informatics Genoa, Italy (13-16 Nov 1991)

Camurri, A, Canepa, C, Frixione, M and Zaccaria, R 'HARP: a system for intelligent composer's assistance' IEEE Computer Vol. 24, No. 7 (1991) pp. 64-67

Camurri, A, Canepa, C, Frixione, M and Zaccaria, R 'HARP: a framework and a system for intelligent composer's assistance' in Baggi, D (Ed.) Readings in Computer Generated Music IEEE Computer Society Press (1992)

Camurri, A, Giuffrida, F and Vercelli, G 'A system for the real-time control of human models on stage' Proc. X Colloquium on Musical Informatics Milan, Italy (Dec 1993)

Camurri, A and Haus, G 'Architettura e ambienti operativi della Stazione di Lavoro Musicale Intelligente' Proceedings of the IX Colloquium on Musical Informatics Genoa, Italy (13-16 Nov 1991)

Camurri, A, Frixione, M, Innocenti, C and Zaccaria, R 'A model of representation and communication of music and multimedia knowledge' Proc. ECAI-92 Vienna, Austria (1992)

Camurri, A and Leman, M 'Hybrid representation of music knowledge - a case study on the automatic recognition of tone centers' Proc. International Workshop on Models and Representations of Musical Signals Capri, Italy (5-7 Oct 1992)

Camurri, A, Frixione, M, Innocenti, C and Zaccaria, R 'SI-nets and analogical models for reasoning in dynamic domains' DIST Technical Report University of Genoa, Italy (Jan 1993)

Costa, M, Frixione, M, Gaglio, S, Palladino, D, Spinelli, G, Traversa, M and Zolezzi, M 'PROCNE: a PROlog tool Combining logic and semantic NEts' DIST Technical Report 881 University of Genoa, Italy

Dannenberg, R, Lee Fraley, C and Velikonja, P 'Fugue: a functional language for sound synthesis' IEEE Computer Vol. 24, No. 7 (1991) pp. 36-41

Krumhansl, C L Cognitive Foundations of Musical Pitch Oxford University Press, USA (1990)

Leman, M 'Symbolic and subsymbolic information processing in models of musical communication and cognition' Interface Vol. 18, No. 1-2 (1989) pp. 141-160

Leman, M 'The ontogenesis of tonal semantics: results of a computer study' in Todd, P and Loy, G (Eds.) Music and Connectionism MIT Press, USA (1991)

Leman, M 'Tone context by pattern-integration over time' in Baggi, D (Ed.) Readings in Computer Generated Music IEEE Computer Society Press (1992) pp. 117-137

Loy, G and Abbott, C 'Programming languages for computer music synthesis, performance and composition' ACM Computing Surveys Vol. 17, No. 2 (1985) pp. 235-265

MacGregor, R 'A deductive pattern matcher' Proc. AAAI-88 St. Paul, MN, USA (1988)

Massucco, C, Mercurio, M and Palmieri, G 'Real-time processing and performance using WinProcne/HARP' Proc. X Colloquium on Musical Informatics Milan, Italy (Dec 1993)

Mathews, M V The Technology of Computer Music MIT Press, USA (1969)

McAngus Todd, N P 'The dynamics of dynamics: a model of musical expression' J. Acoust. Soc. Am. Vol. 91, No. 6 (1992) pp. 3540-3550

Moore, F R Elements of Computer Music Prentice Hall, USA (1990)

Pope, S T 'The Interim DynaPiano: an integrated computer tool and instrument for composers' Proceedings of the IX Colloquium on Musical Informatics Genoa, Italy (13-16 Nov 1991) pp. 70-79

Richards, W 'Representations in mental models' Technical Report Department of Brain and Cognitive Science, Massachusetts Institute of Technology, USA (Sep 1990)

Roads, C 'Research in music and artificial intelligence' ACM Computing Surveys Vol. 17, No. 2 (1985) pp. 163-190

Woods, W A and Schmolze, J G 'The KL-ONE family' Computers Math. Applic. Vol. 23, No. 2-5 (1992) pp. 133-177

APPENDIX A

Procne class

Procne is a language of the KL-ONE family for defining the T box, the A box, and the rules in the symbolic KB. It is an extension of the language presented in (Costa et al., 1988), and it is the kernel of the symbolic subsystem of WinProcne/HARP. The formal definition of the language can be found in (Camurri et al., 1993). Here we describe the main features of the implemented language kernel and its primitive calls.

The Procne class is an abstraction of the WinProcne/HARP symbolic database, which is written in Prolog (Arity Prolog 6.x in the version for the PC, and Sicstus Prolog in the version for Silicon Graphics). The simulative engine and the experts can use the Procne class and enjoy the qualities of a Prolog machine, as if they were written directly in Prolog.

For example, let us suppose that an expert wants to retrieve all the relations which in the current semantic network are defined for people. In the Procne/Prolog environment, it is sufficient to ask for which Xs the following query is satisfied:

role(X, people, _)

We developed an appropriate interface between C++ and the Procne/Prolog environment so that the above query could be formulated using the following member of the Procne class:

Result = Procne::cRole("","people","")

If Result is not zero, the answer is yes; otherwise, it is no.

Backtracking is also available in C++; to correctly answer our query, we should find all the roles defined for people. The following three lines answer this query:

RoleName = SecndArg = EmptyString;
for(Result = Procne::cRole(RoleName, "people", SecndArg); Result != 0; Procne::redo())
    printf("people relation: %s\n", RoleName);

The redo() function (provided by Arity Prolog) enables the backtracking handling in the C++ environment. Since Arity Prolog is not able to manage simultaneous backtracking on different queries, we serialized the Procne/Prolog requests inside the Procne class.

We also defined a protocol which lets the user handle backtracking and uninstantiated variables from C++:

• A null length string is an uninstantiated variable;

• An integer equal to not_inst (a macro defined in the Procne class) is an uninstantiated integer variable;

• Everything else is an instantiated term.
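The protocol above can be mimicked in plain C++ without any Prolog engine. The sketch below is an assumption-laden illustration, not the actual Procne class: ProcneSketch and RoleFact are hypothetical names, the knowledge base is a simple list of role facts, and backtracking is emulated by resuming a linear scan. As in the protocol, an empty string plays the part of an uninstantiated variable and is bound on success; only one query is active at a time, in line with the serialization of requests described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical role fact: role(name, domain, range).
struct RoleFact {
    std::string name, domain, range;
};

class ProcneSketch {
public:
    explicit ProcneSketch(std::vector<RoleFact> facts)
        : facts_(std::move(facts)) {}

    // Start a query. Empty arguments are uninstantiated variables and
    // are overwritten with the first solution. Returns 1 on success,
    // 0 on failure. The arguments must outlive later redo() calls.
    int cRole(std::string& name, std::string& domain, std::string& range) {
        pattern_ = RoleFact{name, domain, range};
        out_[0] = &name;
        out_[1] = &domain;
        out_[2] = &range;
        next_ = 0;
        return redo();
    }

    // Ask for the next solution of the current query (emulated
    // backtracking). Returns 0 when no further solution exists.
    int redo() {
        if (out_[0] == nullptr) return 0;  // no query started yet
        while (next_ < facts_.size()) {
            const RoleFact& f = facts_[next_++];
            if (unifies(pattern_.name, f.name) &&
                unifies(pattern_.domain, f.domain) &&
                unifies(pattern_.range, f.range)) {
                *out_[0] = f.name;
                *out_[1] = f.domain;
                *out_[2] = f.range;
                return 1;
            }
        }
        return 0;
    }

private:
    // An empty pattern matches anything; otherwise values must be equal.
    static bool unifies(const std::string& pattern, const std::string& value) {
        return pattern.empty() || pattern == value;
    }

    std::vector<RoleFact> facts_;
    RoleFact pattern_;
    std::string* out_[3] = {nullptr, nullptr, nullptr};
    std::size_t next_ = 0;
};
```

In this sketch the loop over solutions assigns the return value of redo() back to the result variable, so that it stops when the solutions are exhausted.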

The principal services provided by the Procne class to the net browser, to the scheduler and, partially, to the experts are the following:

• Procne I/O,
    o loading nets and assertions,
    o saving nets and assertions,

• adding elements to the T box,
    o concepts,
    o roles,
    o temporal elements (tense maps),

• retracting elements from the T box,
    o generic elements,
    o concepts,
    o roles,
    o temporal elements (tense maps),

• modifying T box elements,

• graphical user interfaces,
    o concepts and roles screen positions,

• adding elements to the A box,
    o concept instances,
    o role instances,

• retracting elements from the A box,
    o generic instances,
    o concept instances,
    o role instances,

• modifying A box elements,

• merging two different KBs,

• querying Procne KB,

• scheduler special-purpose primitives,
    o examine firing conditions,
    o examine concepts and roles types,
    o examine concepts and roles symbolic/analogical interface,
    o expert type,
    o inheritance handling,
    o expert body search,
    o expert I/O search,
    o examine concepts and roles links.

In the backtracking example above, RoleName and SecndArg are two C++ strings initialized to be empty; the Procne query is managed by the Procne::cRole() method.

APPENDIX B

Perceptor role types

All the intrinsic role types supported by the system are listed in Table 1. The description of the use, aggregate and temporal roles is detailed elsewhere in the paper. Here we analyse the perceptor roles. In Figure 10 the graphical representation of the types of roles allowed in the system is shown.

Perceptor roles can be of three different types: event, signal, and feature.

The event roles, in their turn, can be of two types:

• In the first case they can be defined only between an expert and an icon (this means that an actor must be hooked to the role); the expert may be fired when the actor hooked to the event role detects a particular situation. For instance, suppose that an icon describes a semaphore. We need to fire an expert when the semaphore becomes green (the car must start!); this problem can be solved by the net in Figure 11. The skill embedded in the actor body hooked to the perceptor is responsible for the activation of the move_car expert.

• In the second case event roles connect couples of experts (no actors hooked to the role are needed). Their goal is the same as that of the event perceptors, i.e. they can be part of the experts' firing conditions. For example, given a compound_transformation t made of two component atomic transformations tr1 and tr2, an event role can be defined to fire tr2 on the basis of some output generated by tr1 (see Figure 12). This kind of event role does not allow any flow of data (signals) between the connected experts; it sends only a binary activation.
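The semaphore example can be sketched in C++ as follows. The sketch is only indicative: SemaphoreIcon, ExpertSketch, and EventRoleSketch are hypothetical names, polling replaces the actor-based message passing of the system, and firing only on the transition to the detected situation is a design assumption made here, not something stated in the paper.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical icon: a semaphore with a colour state.
struct SemaphoreIcon {
    std::string colour = "red";
};

// Hypothetical expert: it only counts how many times it was fired.
struct ExpertSketch {
    std::string name;
    int timesFired = 0;
    void fire() { ++timesFired; }
};

// Hypothetical event role: the hooked body is a predicate on the icon,
// and the only thing sent to the expert is a fire/no-fire activation.
class EventRoleSketch {
public:
    EventRoleSketch(std::function<bool(const SemaphoreIcon&)> detect,
                    ExpertSketch& target)
        : detect_(std::move(detect)), target_(&target) {}

    // Inspect the icon; fire the expert when the situation is newly
    // detected (assumption: fire on the transition, not continuously).
    // Returns true if the expert was fired by this call.
    bool poll(const SemaphoreIcon& icon) {
        bool now = detect_(icon);
        bool fired = now && !last_;
        if (fired) target_->fire();
        last_ = now;
        return fired;
    }

private:
    std::function<bool(const SemaphoreIcon&)> detect_;
    ExpertSketch* target_;
    bool last_ = false;
};
```

Here an is_green role would be built with a predicate testing whether the colour equals "green", and the target would be the move_car expert.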

The input and output signal roles can be of two types:

Table 1 Perceptor role types

Role type                             Role description

Use                                   Generic roles

Aggregate                             Part_of roles (no analogical counterpart)

Temporal                              Troles (no analogical counterpart in the current model implementation)

Feature (perceptor)                   These are feature extractors, and must be defined between an icon and a concept of type attribute

Input and output signal (perceptor)   These correspond to perceptors connecting either an icon and an expert, or two experts; in both cases they establish a continuous data flow between two objects

Event (perceptor)                     These correspond to perceptors connecting an icon and an expert, or two experts; in both cases their output is a fire/nofire message toward a waiting expert

Figure 10 Types of role allowed [Legend of the graphical symbols for the role types: use, aggregate, event, feature, output signal, input signal, and temporal.]

Figure 11 Car example [The semaphore icon is linked to the move_car expert through the is_green event role.]

Figure 12 Event roles connecting couples of experts [The compound transformation t has the parts tr1 and tr2; an event role links tr1 to tr2.]

Figure 13 Loudness of sound example

Figure 14 Synchronized data flow [tr1 and tr2 are connected by a stream signal role and an activate event role.]

Figure 15 Feature role [Labels visible in the net: intensity, loudness, music object, pitch.]

• In the first case these roles must be defined between an icon and an expert. This means that an actor must be hooked to the role. This actor extracts from the icon a continuous flow of signals that are usable by the connected expert. For instance, let us consider an expert acoustic_attention which controls the robot acoustic sensor, keeping it more or less open. The expert must know at any instant the loudness of a sound extracted by the perceptor room_loudness connected to the noisy_room icon. The net implementing this example is detailed in Figure 13. The acoustic_attention expert takes the room_loudness as an input parameter and controls the acoustic_sensor. In more detail, there is a role of type input signal named room_loudness, which extracts some particular data from the noisy_room icon and sends its output (a continuous flow of signals, i.e. the loudness in the room) to the acoustic_attention expert. The acoustic_attention expert sends its output, via the output signal role acoustic_sensitivity, to the proper instance of the acoustic_sensor.

• In the second case, signal roles are defined between two experts, and work as Unix pipes. Their goal is to manage the stream of data between two experts. In the compound_transformation example, a signal role can be used to transfer data from tr1 to tr2. Figure 14 illustrates the use of an event and a signal role to specify a synchronized data flow between tr1 and tr2.
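The combination of a signal role and an event role between two experts can be sketched as a queue plus an activation flag. Everything in this C++ sketch is a hypothetical illustration: SignalRoleSketch, runTr1, and runTr2 are invented names, the two transformations (doubling and summing) are arbitrary, and the experts run one after the other instead of concurrently.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical signal role: a FIFO stream of values between two
// experts, in the spirit of a Unix pipe.
class SignalRoleSketch {
public:
    void write(double v) { buffer_.push_back(v); }

    // Read the oldest value; returns false if the stream is empty.
    bool read(double& v) {
        if (buffer_.empty()) return false;
        v = buffer_.front();
        buffer_.pop_front();
        return true;
    }

    bool empty() const { return buffer_.empty(); }

private:
    std::deque<double> buffer_;
};

// Hypothetical tr1: doubles each input value and writes it to the
// stream. The returned flag stands for the event role: it is the
// binary activation sent to tr2 once some output is available.
bool runTr1(const std::vector<double>& input, SignalRoleSketch& stream) {
    for (double v : input) stream.write(2.0 * v);
    return !input.empty();
}

// Hypothetical tr2: runs only when activated, then consumes the whole
// stream and returns the sum of the values read.
double runTr2(bool activated, SignalRoleSketch& stream) {
    double sum = 0.0;
    if (!activated) return sum;
    double v = 0.0;
    while (stream.read(v)) sum += v;
    return sum;
}
```

The data travel only through the signal role, while the event role carries no data at all, which matches the separation between the two role types described above.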

The feature roles are defined between an icon and an icon attribute. A feature extracts a particular piece of information from an icon and puts it in the icon's attribute (see Figure 15). An attribute is represented either fully in the A box if it is a purely symbolic entity, or in the analogical KB.

