An interaction approach to computer animation
Benjamin Walther-Franks, Rainer Malaka
Research Group Digital Media, Universität Bremen, FB3, Bibliothekstr. 1, 28359 Bremen, Germany
Article info
Article history:
Received 31 March 2014
Revised 15 July 2014
Accepted 19 August 2014
Available online 2 September 2014
Keywords:
Motion design interfaces
Performance animation
Human-computer interaction
Design space
Abstract
Design of and research on animation interfaces rarely uses methods and theory of human-computer
interaction (HCI). Graphical motion design interfaces are based on dated interaction paradigms, and novel
procedures for capturing, processing and mapping motion are preoccupied with aspects of modeling and
computation. Yet research in HCI has come far in understanding human cognition and motor skills and
how to apply this understanding to interaction design. We propose an HCI perspective on computer ani-
mation that relates the state of the art in motion design interfaces to the concepts and terminology of
this field. The main contribution is a design space of animation interfaces. This conceptual framework
aids in relating strengths and weaknesses of established animation methods and techniques. We demon-
strate how this interaction-centric approach can be put into practice in the development of a multi-touch
animation system.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Moving images are omnipresent in cinema, television, com-
puter games and online entertainment. Digital media such as text,
images and film are nowadays produced by a diverse crowd of
authors, ranging from beginners and laymen to professionals. Yet
animation is still seen by most people as a highly sophisticated
process that only experts can master, using complex interfaces
and expensive equipment. However, consumer motion capture
technology has recently enabled and created a mass-market for
easy-to-use animation tools: computer games. In contrast to most
professional animation tools, recent games employ full-body inter-
action for instance via Kinect, allowing users to control a virtual
character instantaneously through their body. This trend is feeding
back into the area of the experts, with researchers investigating
time-efficient interfaces for computer puppetry using the Kinect
(e.g. [61,55]). Computer animation is currently seeing an influx of
ideas coming from the world of easy-to-use game interfaces made
for players with no prior training. Game designers in turn are
informed by design knowledge and methods developed over
decades of research in human-computer interaction (HCI).
It is thus time that computer animation be approached from an
HCI perspective. This could aid describing and analyzing the vast
spectrum of animation techniques ranging from very intuitive
puppetry interfaces for computer games to highly sophisticated
control in advanced animation tools. Our goal is to understand
principles that underlie human-machine interactions in computer
animation. With new ways of thinking about interactions with
continuous visual media and a thorough investigation of new ani-
mation interfaces on a theoretical foundation, motion design inter-
faces can be made more beginner and expert friendly.
This can be achieved by embedding computer animation meth-
ods and interfaces in an HCI context. Trends in motion design
interfaces can be connected with discussions on next generation
interfaces in HCI. Theoretical frameworks can aid us in tackling
the concrete user interface issues by a profound analysis, which
can aid the process of designing new mechanisms for more natural
and intuitive means of motion creation and editing.
This article approaches this goal in three main steps. We will
first review related work from computer graphics, human-computer
interaction and entertainment computing from a user- and inter-
face-centric perspective with a focus on methods, mappings and
metaphors. In the second step we construct a design space for inter-
faces that deal with spatiotemporal media. In the third step, the
utility of this conceptual framework is illustrated by applying it in
designing a multi-touch interactive animation system.
2. Animation techniques: an interaction view
Computer-based frame animation is the direct successor of tra-
ditional hand-drawn animation, and still the main method.
Advances in sensing hardware and processing power have brought
http://dx.doi.org/10.1016/j.entcom.2014.08.007
1875-9521/© 2014 Elsevier B.V. All rights reserved.
This paper has been recommended for acceptance by Andrea Sanna. Corresponding author. Tel.: +49 421 218 64414.
E-mail addresses: [email protected] (B. Walther-Franks), [email protected] (R. Malaka). Tel.: +49 421 218 64402.
Entertainment Computing 5 (2014) 271-283
entirely new possibilities. Motion capture records the live perfor-
mance of actors, introducing a new form of animation more akin
to puppetry than traditional animation. Programmed animation
enables realistic simulations to provide interesting secondary
motion and create more believable worlds.
Traditionally, in computer-based keyframe animation, only
extreme poses or key frames need to be manually established by
the animator. Each keyframe is edited using manipulation tools,
which can be specialized for the target domain, e.g. character
poses. Some manipulation tools allow influencing dynamics
directly in the scene view. The most common means of specifying
dynamics is by using global descriptions, such as time plots or
motion paths. Spatial editing between keyframes can be achieved
indirectly by editing interpolation functions or by defining a new
key pose.
Motion timing is usually done via global descriptions of dynam-
ics. However, some temporal control techniques directly operate
on the target. Snibbe [58] suggests timing techniques that do not
require time plots but can be administered by directly manipulat-
ing the target or its motion path in the scene view. As with spatial
editing, the practicality of temporal editing with displacement
functions depends heavily on the underlying keyframe distribu-
tion. Timing by direct manipulation in the scene view is also sup-
ported by the latest animation software packages. Tweaking
motion trail handles allows for temporal instead of spatial transla-
tion; visual feedback can be given by changing frame numbers
adjacent to the handle. Spatial control of time has also been pro-
posed for video navigation [15].
Motion graphs are two-dimensional plots that map transforma-
tion values (vertical axis) against time (horizontal axis). With a
2DOF input device, such a graph thus allows integrated, simulta-
neous spatiotemporal control. In keyframe animation the motion
editor is the standard way to manage keyframe value interpolation,
typically by means of Bézier curve handles.
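Such a motion-graph segment can be sketched in a few lines. The following is an illustrative reconstruction, not code from any cited system; all names are ours, and real motion editors additionally solve for the curve parameter that corresponds to a given playback time, whereas we simply sample the parameter uniformly:

```python
def cubic_bezier(p0, p1, p2, p3, u):
    """Evaluate a cubic Bezier at parameter u in [0, 1], per component."""
    v = 1.0 - u
    return tuple(v**3 * a + 3*v**2*u * b + 3*v*u**2 * c + u**3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

def sample_segment(key0, handle_out, handle_in, key1, steps=8):
    """Sample (time, value) points of the motion-graph segment between two
    keyframes, shaped by their Bezier handles (absolute coordinates)."""
    return [cubic_bezier(key0, handle_out, handle_in, key1, i / steps)
            for i in range(steps + 1)]

# Ease-in/ease-out between value 0.0 at t = 0 and 1.0 at t = 1,
# produced by flat (horizontal) handles at both keys:
curve = sample_segment((0.0, 0.0), (0.4, 0.0), (0.6, 1.0), (1.0, 1.0))
```

Dragging a handle in the motion editor corresponds to changing `handle_out` or `handle_in`, which reshapes all in-between values without touching the keys themselves.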
In contrast to keyframe animation, performance animation uses
motion capturing of live performance of an actor or puppeteer by
tracking a number of key points in space over time and combining
them to obtain a representation of the performance. The recorded
data then drives the motion of a digital character. The entire proce-
dure of applying motion capture data to drive an animation is
referred to as performance animation [44]. In a typical setup, an
actor's motion is first recorded, then the data is cleaned, processed
and applied to a digital character. Since the digital character can
have quite different proportions than the performer, retargeting
the motion data is a non-trivial task [24]. In this form of perfor-
mance animation, capture and application of motion data to an
animation are two separate processes; data handling is done off-
line. Online performance animation immediately applies captured
data to a digital character, creating animation instantly, allowing
the performer to react immediately to the results or to interact
with an audience [59,24]. Processing limitations sometimes entail
that performers can only see a low-fidelity pre-visualization
of the final rendering [44].
Many performance animation efforts aim to represent human
motion accurately, limiting abstraction to a minimum; the
motion capture performers use only the senses with which
they have learned to act (e.g. kinaesthetic and proprioceptive feed-
back). For performance animation of stylized or non-humanoid
characters it is desirable to control them in a less literal fashion.
Such a style of performance control is often referred to as computer
or digital puppetry [3,59]. Just as traditional puppeteers would rely
on mirrors or camera feeds to adjust their performance, computer
puppetry requires instant renderings of the applied input to allow
performers to adjust their motions. Real-time mappings either use
high-bandwidth devices for coordinated control of all character
DOF, or employ models based on example data or a physical
simulation. One challenge is to control a high number of degrees
of freedom (DOF) at the same time.
Real-time control of humanoid characters suggests literal map-
pings from the puppeteer's physique to the character's skeleton.
Non-humanoid characters such as animals, monsters or animate
objects are difficult since they have a vastly different morphology
and motion style to humans. Seol et al. [55] address this by learn-
ing mappings through users mimicking creature motion during a
design phase. These learnt mappings can then be used and com-
bined during online puppetry. In similar work, Yamane et al. [66]
propose matching human motion data to non-humanoid charac-
ters with a statistical model created on the basis of a small set of
manually selected and created human-character pose pairs; how-
ever, this process is conducted offline. The technique for optimal
mapping of a human input skeleton onto an arbitrary character
skeleton proposed by Sanna et al. [67] manages without any man-
ual examples and finds the best match between the two based
solely on structural similarities.
For animation techniques on desktop input devices, however,
typically fewer DOF are available. Recently this has been addressed
by multi-touch input devices, which enable techniques for simulta-
neous rotation, scaling and translation (RST) for 4DOF control of a
2D target [26]. Reisman et al. [52] developed a technique for inte-
grated rotation and translation of 3D content using an arbitrary
number of contact points on an interactive surface.
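The core arithmetic behind two-finger RST control can be sketched as follows. This is a generic derivation from the motion of two contact points, not the implementation of any cited technique, and the function name is ours:

```python
import math

def rst_from_two_touches(a0, b0, a1, b1):
    """Recover rotation angle, uniform scale and 2D translation (4 DOF)
    from the motion of two contact points a and b: positions (a0, b0)
    at touch-down, (a1, b1) now."""
    v0 = (b0[0] - a0[0], b0[1] - a0[1])   # old inter-finger vector
    v1 = (b1[0] - a1[0], b1[1] - a1[1])   # new inter-finger vector
    scale = math.hypot(*v1) / math.hypot(*v0)
    angle = math.atan2(v1[1], v1[0]) - math.atan2(v0[1], v0[0])
    # Translation maps finger a's old position (rotated and scaled
    # about the origin) onto its new position.
    c, s = math.cos(angle), math.sin(angle)
    ax = scale * (c * a0[0] - s * a0[1])
    ay = scale * (s * a0[0] + c * a0[1])
    return angle, scale, (a1[0] - ax, a1[1] - ay)
```

Applying the returned transform to the touched object keeps both contact points "glued" under the fingers, which is precisely the integrated 4DOF control the text describes.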
When input devices of lesser DOF than the object parameters
are used, integrated control is not possible. This is a common prob-
lem in desktop interaction for navigating and editing 3D media,
since most desktop input and display devices only have two DOF.
Interface designers thus often face the problem of mapping two
control DOF to a higher-dimensional target parameter space. A
solution is to separate the degrees of control, i.e. splitting object
DOF into manageable subsets [4]. With single-pointer input
devices, this necessitates a sequential control of such subsets, e.g.
through displays of multiple orthographic projections of the scene
in one split screen or through spatial handles that are overlaid on
top of the target object [4]. If high-DOF devices are not available and temporal multiplexing
is not desired, interface designers can choose to constrain the
interaction to reduce required control DOF. A challenge for design-
ers is that the model behind the constraint must be understood by
the user, for instance by basing them on mechanisms already
known from other contexts.
Yamane and Nakamura [64] present a pin-and-drag interface
for posing articulated figures. By pinning down parts of the figure,
such as the end-effectors (feet or hands) and dragging others, the
whole character can be controlled with relative ease. Joint motion
ranges, the current joint configuration and the user-set joint con-
straints (pins) thus allow constrained control of several character
DOF with as few as two position input DOF for a 2D character.
The various constraints are prioritized so that dragging constraints
are always fulfilled, and solved by differential kinematics that gives
a linear relationship between the constraints and the joint
velocities.
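The differential-kinematics idea can be illustrated with a much simpler stand-in: a Jacobian-transpose update for dragging the end effector of a planar two-link arm. This is not the prioritized solver of Yamane and Nakamura, only a minimal sketch of how a drag target translates linearly into joint updates:

```python
import math

def fk(th1, th2, l1=1.0, l2=1.0):
    """Forward kinematics of a planar two-link arm."""
    x = l1 * math.cos(th1) + l2 * math.cos(th1 + th2)
    y = l1 * math.sin(th1) + l2 * math.sin(th1 + th2)
    return x, y

def drag_step(th1, th2, target, gain=0.05, l1=1.0, l2=1.0):
    """One Jacobian-transpose update pulling the end effector toward the
    dragged target: joint changes are linear in the position error."""
    x, y = fk(th1, th2, l1, l2)
    ex, ey = target[0] - x, target[1] - y
    s1, c1 = math.sin(th1), math.cos(th1)
    s12, c12 = math.sin(th1 + th2), math.cos(th1 + th2)
    j11, j12 = -l1 * s1 - l2 * s12, -l2 * s12   # Jacobian of (x, y)
    j21, j22 = l1 * c1 + l2 * c12, l2 * c12     # w.r.t. (th1, th2)
    return (th1 + gain * (j11 * ex + j21 * ey),
            th2 + gain * (j12 * ex + j22 * ey))
```

Iterating `drag_step` while the user drags makes the chain follow the cursor; pinned joints would enter such a solver as additional, higher-priority constraints.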
Several research projects have attempted to leave the world of
explicit mappings and enable low-to-high-dimensional control,
bimanual interaction and multi-user interaction implicitly by sim-
ulating real-world physics. Fröhlich et al. [20] let users kinemati-
cally control intermediate objects that are attached to target
objects by springs. The spring attachment is also used by Agrawala
and Balakrishnan [1] to enable interaction with a physically simu-
lated virtual desktop, the BumpTop.
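The spring-attachment principle reduces to a damped spring pulling the controlled object toward a user-dragged handle. The following one-dimensional step function is a minimal sketch with arbitrary illustrative constants, not code from either cited system:

```python
def spring_step(pos, vel, target, k=40.0, damping=8.0, mass=1.0, dt=0.01):
    """One semi-implicit Euler step: the object at `pos` is pulled toward
    the user-dragged handle at `target` by a linearly damped spring."""
    force = k * (target - pos) - damping * vel
    vel = vel + (force / mass) * dt
    pos = pos + vel * dt
    return pos, vel

# Drag the handle to x = 1 and let the attached object follow:
pos, vel = 0.0, 0.0
for _ in range(3000):        # ~30 s of simulated time
    pos, vel = spring_step(pos, vel, 1.0)
```

The spring decouples user input from object dynamics: the handle moves kinematically with the input device, while the object responds with physically plausible lag and overshoot.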
Limitations in the motion capture system or in the performer's
physiology to produce certain desired motions can be overcome
by simulating parts of the body and their interaction with the envi-
ronment. Ishigaki et al. [31] combine real-time full-body motion
capture data, physical simulation and a set of motion examples to
create character movement that a user cannot easily perform, such
as climbing or swimming. The virtual environment contains prede-
fined interaction points such as the handles of a monkey bar or a
rope. Once the character's end-effectors are brought into proximity
of an interaction point, control changes so that the character
motion is no longer fully controlled by the motion capture. A sim-
plified simulation that treats the intentional contact as a universal
joint connected to the character's centre of mass by a linear damped
spring enables the calculation of the overall dynamics of the
character.
Even when input and output degrees of freedom match, physi-
cal interdependencies of input DOF can still limit a mapping. In full
body tracking, the joint locations are dependent on actor size and
body proportions. If the performer's proportions significantly differ
from character proportions, this can lead to problems with the
character interacting with objects in the scene, such as the floor
or props. For this problem of retargeting of motion capture data
to a new character, Shin et al. [57] propose an approach that maps
input based on a few simple heuristics, e.g., considering the dis-
tance of end-effectors to an object in the scene.
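A heuristic in this spirit might blend between preserving the performer's absolute end-effector position near scene objects (so contact is kept) and plainly scaling positions by the size ratio elsewhere. The sketch below is our own illustration, not Shin et al.'s method; all names and thresholds are assumptions:

```python
def retarget_end_effector(p_world, dist_to_prop, size_ratio,
                          near=0.1, far=0.5):
    """Distance-based retargeting heuristic: close to a prop, keep the
    performer's absolute end-effector position; far from props, scale it
    by the character/performer size ratio; blend linearly in between."""
    if dist_to_prop <= near:
        w = 1.0                 # keep absolute position (contact matters)
    elif dist_to_prop >= far:
        w = 0.0                 # purely scaled position
    else:
        w = (far - dist_to_prop) / (far - near)
    return tuple(w * c + (1.0 - w) * size_ratio * c for c in p_world)
```

The weight acts as an "importance" of the nearby object: the closer the hand or foot is to it, the more the retargeted pose prioritizes the interaction over proportional similarity.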
For live performances, control needs to be addressed with high-
bandwidth input devices or performers acting in parallel. With
recorded performances the puppeteer has more options. Capture
sequences or just parts of them can be retaken, or slightly modi-
fied, and complex motion can be built up in passes. Layered or
multi-track animation allows the performer to concentrate on only
a small amount of action at a time and create articulated motions
step by step. Oore et al. [50] employ layered motion recording for
controlling subsets of a character's DOF. For the animation of a
humanoid, they divide the character DOF into three parts and ani-
mate these sequentially: Two 6DOF devices are used to control the
motion of both legs, both arms, and torso and head in three passes.
Dontcheva et al. [13] make motion layering the principle of their
live animation system.
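At its simplest, the multi-track idea amounts to later passes overriding the DOF subsets they re-record while leaving all other layers untouched. The following is a deliberately minimal sketch of that merge, not the actual systems of Oore et al. or Dontcheva et al.:

```python
def layer_passes(passes):
    """Merge sequential recording passes into one clip. Each pass maps
    the DOF it animates to a list of per-frame values; a later pass
    overrides any DOF it re-records."""
    clip = {}
    for recording in passes:
        clip.update(recording)
    return clip

legs = {"hip": [0.0, 0.1], "knee": [0.2, 0.3]}    # pass 1
arms = {"shoulder": [0.4, 0.5]}                    # pass 2
knee_retake = {"knee": [0.9, 0.9]}                 # pass 3: retake one DOF
clip = layer_passes([legs, arms, knee_retake])
```

During an actual recording pass the previously layered tracks would play back, so the performer can time the new subset against the existing motion.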
Video games have a strong connection to animation. Most mod-
ern video games make heavy use of animation in order to breathe
life into the game world. In this sense, games are one application
area amongst many others, such as film, television, or education.
But animation is also created with and in video games. The actions
taken by players and the responses of game elements constitute a form
of motion design, often conveying a certain story. This is most evi-
dent in game genres where players control characters in a virtual
world, like a puppeteer controls a puppet. However, animating
for video games differs significantly from animating for film or televi-
sion. While in film characters and objects are only viewed from a
specific camera angle, in interactive media such as video games,
the behavior and the view are spontaneously defined by the player.
The animator cannot foresee the decisions of the player, and must
therefore create animations for all possible player actions that
meet certain criteria of completeness and realism. Such
motion libraries contain elementary animation sequences that can then
be looped, blended and combined in real time by the game engine
[37]. By interactively directing pre-defined animations, players
thus essentially perform a kind of digital puppetry with indirect
control.
Motion control through high-DOF input devices extends the
degree of control, further blurring the lines between gaming and
puppetry: as players are able to influence more character DOF,
their possibilities for expression are increased. However, while all
games use some form of motion capture, few offer motion editing
required in animation practice: if players are not satisfied with their
performance, they will have to do it again. Most games lack tech-
niques for even the basic task of time control, with notable excep-
tions such as Prince of Persia: Sands of Time [60], Zeit2 [5] and
Braid [49], in which the player must navigate time as well as space.
Yet while these games incorporate time control in innovative ways,
they do not provide the degree of control and editing required for
professional animation.
In Machinima, the art of 3D game-based filmmaking, animation
and video games ultimately come together to form a novel means
of creating animated movies [37]. Using game engines for anima-
tion or virtual filming has benefits as well as limitations. Modern
3D games provide a complete game world with physics, animated
models, and special effects while offering comparatively simple
controls for puppeteering game characters. This gives authors a
lot to build upon, as opposed to other methods where animations
must be created from scratch. The limitations lie in the depen-
dency on the game developer with their short product cycles, their
game engine and assets, and the legal issues involved in using
these. Computer puppetry in games remains limited, as is any per-
formance control interface that merely activates and blends pre-
defined animations.
Viewing the state-of-the-art in animation with a coherent focus
on the user, mappings and control DOF is a first step in analyzing
the current generation and developing for the next generation of
interfaces. The next step is to further structure this treatment: a
theoretical framework identifies explicit aspects of interaction in
computer animation tools.
3. A design space for computer animation interfaces
Even though there is an increasing trend in computer graphics
research to consider the needs of the artist (e.g. [51,54]), most work
on animation interfaces does not consider aspects of HCI. An inter-
action perspective on computer animation can help to construct a
design space of user interfaces for spatiotemporal media. Such a
design space can structure the designer's options and aid research-
ers in analyzing the state of the art.
Existing interface design frameworks cannot be readily used for
animation interfaces, as they are either too general or too specific.
General frameworks [21,48] span too large a space or only analyze
certain aspects of interaction like input devices but not their map-
ping to output [9], while domain-specific frameworks [8,14] are
too focused.
Jacob et al. [33] present a framework for reality-based interac-
tion (RBI) that includes four themes: Naive physics (NP) reflects
the innate human understanding of basic concepts of physics, such
as gravity, forces and friction; body awareness and skills (BAS)
describes our sense of our own body and what we can do with
it; environment awareness and skills (EAS) covers how humans per-
ceive and mentally model their environment and place themselves
in relation to it; social awareness and skills (SAS) stands for humans
as social animals, who generate meaning by relating to other
human beings. Considering the four RBI themes for computer ani-
mation, many techniques aim to tap the artist's innate understanding
of space-time processes, relating to the theme of naive physics
(NP). The environment awareness and skills theme (EAS) comes
into play as soon as humans interact with these real-world
space-time processes. For instance, multi-finger deformation tech-
niques for 2D puppetry on interactive surfaces [45] rely on our nat-
ural sense of timing and real-world experience with objects (NP,
EAS). In fact any technique based on motion capture for defining
dynamics relies on users' intuitive sense of space and time (NP,
EAS). Performance controls for digital puppetry use the performer's
understanding of their body (BAS). As Kipp and Nguyen [39] illus-
trate, a puppeteer uses complex coordinated hand movements to
bring a wooden puppet to life. Even the technique for low-fidelity
input via mouse and keyboard of Laszlo et al. [40] exploits both an
animator's motor learning skills and their ability to reason about
motion planning (BAS). Collaboration in computer animation is
common, as large productions require teams to work together, but
does not usually involve close coordination during a single task.
Multi-user puppetry interfaces are different in that they tap the
ability of humans to relate to other human beings (SAS). The four
qualities must be traded off against other desirable qualities, such
as expressive power, efficiency, versatility, ergonomics, accessibil-
ity and practicality, on a case-by-case basis.
While these themes are relevant for designing any kind of novel
interactive systems aiming at reality-based interaction, they are
rather general. For a conceptual framework specific to animation
it is thus necessary to define a new design space. In the following
we discuss the aspects we have identified in our work as relevant
to such a framework. We motivate their inclusion and relate them
to each other. We will also relate our framework to the RBI
framework.
3.1. Aspects of design
Analogous to general models of human-computer interaction,
computer animation involves a dialog between a human artist
(animator, actor or puppeteer) and the application, a virtual artifact
(the animation). This occurs through a hardware-software machine
(the animation software and the hardware running it, including
input and output periphery). A design framework should consider
aspects of these entities and their relations. Fig. 1 shows this basic
triangular structure that describes two views of this human-arti-
fact dialog, one that takes the machine as a mediator into account
(left and lower edge: artist-machine-artifact) and one that
abstracts from it (right edge: artist-artifact). Seven aspects charac-
terize these entities and their relations: task, integration, corre-
spondence, metaphor, directness, orchestration and spacetime.
In the following we will discuss these seven design aspects and
their relevance for HCI and animation tools.
Animation tools for productive use are designed around the
task for which they are intended. Decomposition breaks down tasks
into further subtasks, which can in turn be repeatedly broken
down until one arrives at basic tasks at the desired level of decom-
position; this decomposition is frequently used to structure interaction tech-
niques [17,4,29]. At the top level, the main tasks in animation
design are motion creation (generating from scratch), motion edit-
ing (adapting an existing design) and viewing (for visual feedback
on spatial and temporal design). At a lower level, task decomposi-
tion structure varies highly with the type of animation artifact, i.e.
character animation or environment effects. Tool generality [53] or
versatility [33] characterizes the variety of interaction tasks that
can be performed with an interface. This can range from support-
ing a large number of tasks from varied application domains to
only supporting a single, domain-specific task. Tasks are the goal
of interaction and aim at creating the animation. Therefore, our
design space links the aspects of tasks to the virtual artifact (Fig. 1).
An input device defines the integration of control: how many
DOF can be changed at the same time from the same input source
[2]. Performance controls are traditionally very specialized, e.g.
using full-body motion capture suits or special hand-puppet input
devices [59,34]. Yet research has also brought forward more gen-
eral controls, such as the 2D multi-point deformation technique
of Igarashi et al. [30]. Since computer animation often involves
domain objects with large numbers of degrees of freedom (even
a simple 3D articulated biped will have around 30 DOF), special-
ized high-DOF input devices allow for a high level of integration.
Ideally the input device should match the structure of the task
(Jacob et al. [32]). In most situations the DOF of the input device
are not sufficient and solutions like artificial separation or con-
straining mappings based on a certain model have to be found. If
other considerations lead to using lower-DOF input devices, tasks
should be adapted accordingly, e.g. by separating translation and
orientation [43]. The aspect of integration is mostly construed from
the set-up of the input device. We thus locate the aspect of integra-
tion next to the machine in the design space (Fig. 1).
Correspondence describes how the morphology of the physical
input through the input device and the resulting response of the
artifact relate [29]. Bodenheimer et al. [3] distinguish performance
animation controls by the degree of abstraction in the sense of cor-
respondence. At the one end of the spectrum, mappings are pri-
marily concerned with the character or style of the motion rather
than literal mappings between performer and target. Such map-
pings are more commonly used in computer puppetry. At the other
end of the spectrum are efforts to accurately represent motion that
strive to limit the degree of abstraction to a minimum. A high spa-
tial correspondence between input and output requires less mental
effort since it draws on our experience in using our own body and
encountering real-world objects (BAS, EAS). UI designers must face
the tradeoffs between better learnability through high correspon-
dence and the range of motions that can be expressed. The aspect
of correspondence bridges the virtual artifact and the machine
characteristics (machine-artifact edge in Fig. 1).
The metaphor is a notion for describing the mapping of cogni-
tive intentions to physical device interaction using concepts
known from other domains [47,4]. In the conversation metaphor
the user engages in a written or spoken dialogue with the machine.
Such interfaces are well suited for high-level operations, but less suited for
spatial precision and expression. Today, graphical user interfaces
represent the dominating manipulation metaphor, where the user
acts upon a virtual world rather than using language as an interme-
diary. Manipulation interfaces tap our naive understanding of the
laws of physics (NP), our motor memories (BAS) and how we per-
ceive and interact with our surroundings (EAS). Manipulation
using instruments requires more learning and mental resources,
as well as introducing indirection [65,22]. Sensors tracking the
user's body promote an embodiment metaphor where the user
identifies with parts of a virtual world in a more literal way. For
avatar control, embodied interaction builds on our proprioceptive
and kinaesthetic senses (BAS), and can aid our feeling of presence
in virtual environments (EAS). Embodiment has been picked up in
current trends in computer animation that criticize the complex
and abstract nature of motion design tools based on the WIMP par-
adigm. Since the aspect of metaphors is central to the artist's cog-
nitive understanding of his or her activity, our design space links it
to the artist in Fig. 1.
Fig. 1. The design space of animation interfaces characterizes the entities involved
in the interaction and their relations.
Directness characterizes the physical distance between user
and the target. This includes both the spatial and the temporal
offset from input to output [2]. In our understanding of directness
we consider the relation between user (artist) and the physical rep-
resentation of the animation through the machine (as illustrated
on the triangular design space in Fig. 1). Cognitive facets of direct-
ness have also been considered in other definitions [22,65], but
these can be covered in interaction metaphors.
Since computer animation interfaces deal with continuous or
time-based media with multiple spatial and one temporal dimen-
sion, interfaces need to support viewing and modeling not only of
static spaces but of their dynamics as well. As humans inhabit a
spacetime continuum, and all our actions always have a temporal
dimension, any kind of interaction between a human and a com-
puter to create, edit or view dynamic content relates the human's
space-time to the medium's space-time. User time is generally
referred to as real time, which is continuous; the data time is called
virtual or stream time, which is discrete [42,12]. Depending on
animation method and technique, the real time of user input may
or may not affect the virtual time, or only spatial or only temporal
parameters of the animation may be changed. This suggests that
there are different ways in which real space-time can be mapped
to virtual space-time. So far the literature lacks a structured
approach to characterizing the relations of user and artifact space
and time. We will therefore propose a taxonomy in the next sec-
tion that sorts interaction techniques based on which components
of real and virtual space-time are involved. This space-time aspect
abstracts the relation of user and application from the device level,
which is why it is located on the artist-artifact edge of our design
space diagram (Fig. 1).
As a central element of our design space, orchestration
describes in which order which parts of the user's body perform
which sub-task through which input device. Since humans are
most adept at crafting with their hands, and for a long time
human-computer interfaces were optimized for manual control,
orchestration has been best studied for hand-based interaction.
Findings from behavioral psychology show that the dominant
and non-dominant hands are optimized for distinct roles in most
tasks. For instance, in the task of writing the non-dominant hand
first establishes a reference frame relative to which the dominant
hand then operates. Using this knowledge in devising bimanual
interaction techniques can have benefits for efficiency ([6], Hinckley
et al. [68], Balakrishnan and Kurtenbach [69]) and cognition, by
changing how users think about a task [35,41]. Many everyday
activities also show complex orchestrations of more than just the
hands, such as driving a car where feet control speed, hands the
steering, and fingers additional controls such as lights. Since
orchestration considers human, application and the mediating
device to an equal degree, it is situated at the center of the triangle
relation diagram representing the design space (Fig. 2).
3.2. Space–time: a new design aspect
The concept of space–time control mappings considers any navigation, creation or editing operation on a continuous visual medium as a mapping from the real space–time of the input device (the control dimensions) to the virtual space–time of the presentation medium (the presentation dimensions). The output medium's presentation dimensions can be viewed and edited integrally or separately regarding space and time. For instance, while frame-based animation edits poses and the time instants at which they occur separately, performance-based or procedural approaches usually define motion in an integrated fashion. Both real space and time can control either or both virtual space and time. A first step in structuring these relations is to collapse the individual spatial dimensions to a single abstract space dimension, so that we need only consider the two dimensions, space and time, on the user and medium side. The next step is to consider how these two abstract input dimensions (control) affect the output dimensions (presentation). The central idea underlying the construction of categories is that one or both control dimensions can affect one or both presentation dimensions.
Four basic space–time categories of mappings can be constructed from the possible combinations of the two sets (control space, control time) and (presentation space, presentation time):

space → space
space → time
time → space
time → time

Often presentation space and time will be modified in an integrated fashion, or spatial and temporal control will both figure into the input–output relation. For this we introduce two control-integrated space–time categories that cover input–output mappings in which both control dimensions contribute to the relation:

space–time → space (i.e., space → space and time → space)
space–time → time (i.e., space → time and time → time)

and two presentation-integrated space–time categories in which both presentation dimensions are affected by the interaction:

space → space–time (space → space and space → time)
time → space–time (time → space and time → time)

The final cases are the fully integrated space–time categories

space–time → space–time (space → space and time → time)
space–time → time–space (space → time and time → space)

which reflect that integrated control dimensions affecting presentation dimensions in an integrated way can be matched in two ways. These ten space–time categories cover all variants of mapping user space–time to medium space–time. A simple means of visualizing this is a 3 × 3 matrix, where the central cell is compartmented into two, since relating both control and presentation space and time is ambiguous (Fig. 2).
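The construction of these ten categories can be sketched in a few lines of code; the following is a minimal Python enumeration with our own naming (ASCII arrows stand in for the mapping arrows), not part of the original work:

```python
from itertools import chain, combinations

DIMS = ("space", "time")

def nonempty_subsets(dims):
    # -> [("space",), ("time",), ("space", "time")]
    return list(chain.from_iterable(combinations(dims, k) for k in (1, 2)))

def categories():
    """Enumerate all control -> presentation mappings; the fully
    integrated central cell splits into two matching variants."""
    cats = []
    for ctrl in nonempty_subsets(DIMS):
        for pres in nonempty_subsets(DIMS):
            if len(ctrl) == 2 and len(pres) == 2:
                cats.append("space-time -> space-time")
                cats.append("space-time -> time-space")
            else:
                cats.append("-".join(ctrl) + " -> " + "-".join(pres))
    return cats
```

The 3 × 3 combinations plus the split central cell yield exactly the ten categories listed above.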
The first row of the matrix describes control mappings that only look at the spatial component of the input and do not consider the timing of the user's input. The third row describes control mappings where input has no spatial component, and the user only administers state changes with temporal triggers via controls such as buttons. The second row describes control mappings where spatial input stands in a temporal context.

Fig. 2. The taxonomy of space–time mappings is structured based on how user input in real space–time controls medium output in virtual space–time. Fig. 3 gives examples of these categories.

There are borderline
B. Walther-Franks, R. Malaka / Entertainment Computing 5 (2014) 271–283
cases between temporal and spatiotemporal control: if trigger controls exert spatial changes (such as moving a step in a certain direction), we speak of spatial control.
While some mappings can be easily sorted into these categories, for others it may appear less clear. In the following we consider each category individually and show that it is possible to find examples of actual interfaces for all of them (see also Fig. 3).
Controls in the space → space category use the spatial component of user actions to affect the spatial dimensions of the medium. Most kinds of interactive editing techniques in computer-aided design fall into this category. A straightforward one-to-one mapping of viewer time to medium time (time → time) is video playback. Examples of space → time mappings are timelines that employ a linear spatial representation of time for navigating or altering time-dependent media. Software packages for frame-based animation make heavy use of linear time plots for temporal navigation and timing transformations. Less common are examples for the time → space category. Passive navigation techniques for virtual environments make use of such mappings [4]. After choosing a target or route, either automatically or with the user in the loop, the system navigates the user along the route or to the target, mapping user time to medium space. Editing operations are rare in this category, since the single input DOF is insufficient for most editing tasks.
In mapping input space–time for manipulating space only, the redundant DOFs can be used either for enhanced robustness or for controlling further parameters. For editing a static image, the temporal component of the user input can, for instance, be used to control the stroke type of the virtual brush (space–time → space). Velocity-based spatial navigation techniques include input space and time in the traversal of virtual space. The presentation time can also be steered: interactive continuous adjustment of playback speed (e.g. via a slider or wheel) changes video or animation playback during playback; here spatiotemporal input affects the viewing of medium time (space–time → time).
The category space → space–time can be found in time plots, a common means of graphically representing a variable changing over time. Animation packages usually feature a graph editor that enables integrated shifting of key positions and the values they represent in time and one (spatial) dimension. Three-dimensional representations of a video stream, video streamers, even allow space–time video editing [56]. The mapping category time → space–time is realized in automated navigation through a dynamic medium: scripted camera movement through animated scenes navigates both the time and the space of the target medium. It is often used for cut-scenes in video games, so-called cinematics, when interactive control is taken from the player for a short time in favor of progressing the narrative with pre-defined camera movement. This is different from video playback, where the spatial component of the medium (the video frame) is not navigated during playback. While the result is essentially the same, this distinction is down to the fundamental difference in the medium data: for video, the projection from 3D to 2D is already integrated into the visual data (the video frames), while in 3D the projection is determined at run-time.
The space–time → space–time mappings can be found in many examples of user interfaces for virtual worlds. Spatial actions browse or alter the medium's space, and user and medium time are linearly related. Such mappings are common for interfaces that require high user immersion. Most performance controls for integrated motion creation also fall into this category, e.g. in interactive video games or in performance animation. The remaining inverse mapping of user space–time to virtual time–space does not seem to be used in practical implementations. It could, however, be related to temporal triggers of a user (such as releasing some event) that influence some graphical representation, where the user's spatial input controls temporal parameters of the event.
The space–time view of operations on continuous visual media gives a new perspective on the types of such operations: whether they are invasive (editing) or non-invasive (viewing) and whether
[Fig. 3 matrix contents, reconstructed: for each space–time mapping category, an example application, technique, and scenario of use.]
Space → Space: Spatial Manipulation; Manipulators/Gizmos/Handles; posing a character.
Space → Space–Time: Motion Editing; Graph Editor; adjusting ease-in/ease-out.
Space → Time: Time Control; Timeline Bar; browsing a video.
Space–Time → Space: Interactive Travel in Static Virtual Environments; Steering; browsing a 3D information space.
Space–Time → Space–Time: Performance Animation, Video Games; Computer Puppetry; animating a character.
Space–Time → Time: Time Control; Jog Shuttle; browsing a video.
Time → Space: Passive Travel in Static Virtual Environments; Target-based Navigation/Fly-Throughs; exploring architectural models.
Time → Space–Time: Passive Travel in Dynamic Virtual Environments; Target-based Navigation/Fly-Throughs; watching a cut-scene in a 3D video game.
Time → Time: Playback; Triggers/Buttons; watching a video.
Fig. 3. Nine categories of space–time mappings with example applications, techniques and scenarios of use. (Figure contains cropped stills of third party material licensed under CC BY 3.0. Top left, top right and bottom left images attributed to the Durian Blender Open Movie Project; bottom left image attributed to Frontop Technology Co., Ltd; bottom center image attributed to the Apricot Blender Open Game Project.)
they involve creating new designs from scratch or refining existing designs. Firstly, collapsing all spatial parameters into one abstract space dimension hides the fact that, as a rule, both control and medium space involve multiple spatial parameters, while time only constitutes a single quantity on each side. This has an impact on the distribution of invasive versus non-invasive operations in the matrix: techniques employing time as input (third row) are mainly used for passive navigation, rather than for spatial manipulation. This is because space offers more input dimensions and we can navigate space more easily than time. This asymmetry has shaped how we mentally model the abstract dimension of time: we rather think of time in terms of space than vice versa [10]. Secondly, the columns sort mappings into refinement through spatial and temporal editing (left and right columns), and creation through integrated influence on medium space–time (center column). Thirdly, in many cases the distinction between non-invasive and invasive operations is a theoretical one. A fly-through of a 3D scene can either be seen as a navigation that does not change the dataset or as a camera animation that does. The criteria for distinction should come from the application: is the camera animation being created a part of the medium, or is it an ephemeral product of the viewing operation? This distinction has an effect on categorization, too.
3.3. Limitations
The aspects characterizing the design space of animation interfaces constitute a high-level framework. As such they provide structure and cues for design reasoning and analysis, rather than concrete guidelines. In the following we will illustrate its utility by showing how we used the design space in developing novel animation techniques. More case studies and examples are required to illustrate its application in the multitude of animation-related issues.
The design space does not offer a set of orthogonal dimensions; rather, its aspects are interrelated. For example, the nature of the task is linked to the type of space–time mapping: automation can take control away from the user up to the point that spatiotemporal input (e.g. continuous control of a puppet's legs) can be reduced to temporal input (e.g. triggering puppet walk cycles with a button). Another example of such dependencies is that the choice of metaphor determines the magnitude of directness: from indirect manipulation over direct manipulation to embodiment. The interrelation between the seven design aspects may not be surprising, as each can be seen as a perspective on the same issue: designing user interfaces for controlling spatiotemporal phenomena.
The design space presented in this section is a conceptual framework for analyzing and designing animation interfaces. It uses established design aspects identified in the HCI literature. For describing relations of input and animation space–time, which are central to this class of interface, we could not rely on any prior work. For this aspect we developed a taxonomy for sorting mappings into categories based on how they relate input and output space–time. Next we will show how we have used these design aids in practice, both evaluating them as design tools and using them to propose novel animation interfaces.
4. A multi-touch animation system
In order to illustrate the utility of the design space as an aid for designing animation interfaces, we explain how it was employed in the development of a novel animation system that we have presented in prior work (Walther-Franks et al. [70]). We go beyond the original work by explicating the design approach underlying it. The design space-driven approach was chosen in lieu of the first iterations of a human-centered design process. In our experience with proposing novel interaction paradigms, these stages of an iterative design approach have the issue that users are unfamiliar with the possibilities of novel technologies and are strongly biased by existing solutions. The design space can help to guide the first phase of design until users can be provided with artifacts to experience.
Even though free-space 3D input devices have recently become highly popular, in particular in combination with game consoles, they still lack the accuracy and precision of control needed for serious animation editing. Systems like the Kinect are good for high-level avatar control with predefined animations. For more accurate editing, these systems are not yet feasible. Direct-touch interactive surfaces provide better precision for animation tasks, and have the best makings for high directness and correspondence of interaction. The potential of interactive surfaces has been explored for various applications, but only a few consider animation [45,39]. Most surface-based 3D manipulation techniques are not developed and evaluated for motion capture. Furthermore, most projects only look at individual techniques and lack a system perspective. However, this perspective is necessary to shed light on real-world problems such as integrating tools into whole workflows or dealing with the realities of software engineering.
4.1. Design approach
Going through the design aspects of our framework, we consider options and make decisions, building up a design approach to follow in the implementation.
4.1.1. Task
As a typical animation task we chose performance animation of 3D rigid body models. Working with three-dimensional content poses the challenge of a discrepancy between input space (2D) and output space (3D). In recent years researchers have started investigating 3D manipulation on interactive surfaces, from shallow-depth manipulation [27] to full 6DOF control [28,52]. The problem for surface-based motion capture is to design spatial mappings that allow expressive, direct performance control by taking into account the unique characteristics of multi-touch displays.
Many performance control interfaces are designed to optimally suit a specific task, such as walk animation or head motion. This means that for each type of task the performer must learn a new control mapping. This is somewhat supported by specialized devices that afford a certain type of control. For 2DOF input devices like the mouse, this is transferred to digital affordances like the handles of a rig. These map more complex changes in character parameters to the translation of manipulators. The specialization is designed into the rig, equating control operations to general translation tasks. Since interactive surfaces have a 2DOF integrated input structure, we adopt this approach for our system.
An important secondary task is defining the view on the scene.
Since direct-touch performance controls are defined by the current projection, this puts a high demand on view controls regarding flexibility, efficiency and precision. With few exceptions [16,23], research on surface-based 3D interaction has not dealt much with view control. Yet 3D navigation is essential for editing complex scenes in order to acquire multiple perspectives on the target or zoom in on details. Some surface-based virtual reality setups use implicit scene navigation by tracking user head position and orientation. However, this limits the range of control. For unconstrained access to all camera degrees of freedom, a manual approach offers the highest degree of control. A common solution is to introduce different modes for object transformation and view transformation (camera panning, zooming, rotation/orbiting). This is prevalent in desktop 3D interaction, where virtual buttons, mouse buttons or
modifier keys change between object and view transformations. While zooming and panning cover the camera's three translational DOF, the third rotational DOF, camera roll, is less essential, since the camera up vector usually stays orthogonal to a scene ground plane. While in desktop environments this DOF separation is mainly owed to low-DOF input devices, it can also be employed on devices that allow more integrated transformation techniques, in order to allow more precise control [46]. We opt for separated control of camera parameters to enable precise view adjustments.
4.1.2. Integration
Multi-touch interactive surfaces provide two control DOF per
contact. The combination of multiple points can be used to create
integrated controls for 2D and 3D rotation and translation. Yet
Martinet et al. [43] point out that multi-touch-based surface
interaction cannot truly support integrated 6DOF control. They
propose the depth-separated screen-space (DS3) technique which
allows translation separate from orientation. Like the Sticky Tools
technique of Hancock et al. [28], the number of fingers and where
they touch the target (direct) or not (indirect) determines the
control mode. Full 3D control can also be achieved by additive
motion layering: changing the control-display mapping (e.g. by
navigating the view) between takes allows control of further
target DOF.
Other important factors for efficiency are easy switching between capture and view operations and dedicating hands to tasks. This requires that a single hand be able to activate different input modes with as little effort as possible. Widgets as an obvious solution produce clutter and interfere with performance controls that already require visual handles. Modal distinction by on- or off-target hit testing can be problematic if the target has an unusual shape or dimensions. In order to separate between capture and view control, we employ multi-finger chording, in which the number of fingers switches between modes.
4.1.3. Correspondence
Interactive surfaces promote motor and perceptual correspondence between input and output. However, this correspondence is difficult to maintain when planar input space and a higher-dimensional parameter space have to be matched. For a start, users only interact with two-dimensional projections of three-dimensional data. For instance, to translate a handle in the screen z-dimension, one cannot perform the equivalent motion with standard sensing hardware. The problem with the third dimension on interactive surfaces is that, barring above-the-surface input, manipulations in the screen z-dimension cannot maintain this correspondence, since input motions can only occur in a plane. Following the integrality of touch input, this means that the 2 input DOF need to be mapped to 2 translation parameters of the target (e.g. the handle of a character rig) so that they follow the same trajectory.
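This mapping can be sketched as projecting a 2-DOF screen drag onto the two view-plane axes; the following Python illustration uses our own function name and vector layout, not the authors' implementation:

```python
def drag_to_translation(dx, dy, cam_right, cam_up, scale=1.0):
    """Map a screen-space drag (dx, dy) to a 3D offset along the camera's
    right and up vectors, so the target follows the finger's trajectory."""
    return tuple(scale * (dx * r + dy * u) for r, u in zip(cam_right, cam_up))

# Dragging right on screen moves the target along the camera's right axis.
offset = drag_to_translation(10, 0, cam_right=(1, 0, 0), cam_up=(0, 0, 1))
```

With unit camera axes and scale 1, input and output motion coincide, which is exactly the 2-to-2 DOF correspondence described above.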
4.1.4. Metaphor
The congruent input and output space of direct input devices promotes a manipulation style of interaction. Most manipulation techniques for interactive surfaces are kinematic mappings, where individual surface contacts exert a pseudo friction force by sticking to objects or pinning them down. As an alternative to kinematic control, Cao et al. [7] and Wilson et al. [63] propose surface-based manipulation through virtual forces. This offers a more comprehensive and realistic simulation of physical forces and is also used in desktop-based and immersive virtual environments. Different metaphors in the same system can enhance the distinction between controls that otherwise have much in common. For instance, in the example of desktop 3D interaction, editing usually employs the direct or instrumented interaction metaphors, while view controls bear more resemblance to steering. This could also support the mental distinction between phenomenologically similar spatial editing and navigation operations on interactive surfaces.
Manipulation is the most general metaphor for puppet control. Through manipulation the puppeteer can flexibly create and release mappings with a drag-and-drop style of interaction; directness minimizes mediation between user and target domain. For complex transformations, as are often necessary in character animation, rigs should be designed so that handles promote as direct a manipulation as possible, meaning that handles should be co-located with the features they influence and the handle-feature mapping designed to support maximal correspondence. Regarding kinematic versus physics-based manipulation mappings, realism and emergent control styles stand against precision, predictability and reliability. In animation, full control has a higher priority than realism, which is why we opt for purely kinematic controls.
4.1.5. Directness
Interactive surfaces can reduce the distance between the user and the target to a minimum. However, touch input also has potential disadvantages such as imprecision (when mapping the finger contact area to a single point) and occlusion of on-screen content by the user's fingers, hands and arms [62]. Re-introducing indirection can alleviate the occlusion problem. Since absolute input techniques require reaching every part of the screen, which may become difficult when the display exceeds a certain size, limiting the area of interaction to a part of the screen or introducing indirection mechanisms can help [18]. The spatial distance between input and target can also be used as a parameter for interaction design. For instance, fingers or pens touching the target can control different DOF than off-target contacts (mode change). Layered motion recording can involve manipulating moving targets after the initial capture pass. Relative mapping applies transformations relative to the initial input state. This allows arbitrary input location, and clutching can increase the comfort of use. Both absolute and relative input can be applied locally and globally, which makes a significant difference when controlling the behavior of a feature that inherits motion from its parents. Local mapping allows the user to ignore motion of parent features and concentrate on local transformations. By default, performance control of a feature overwrites any previous recordings made for it. In this way, performers can practice and test a motion until they get it right. They might, however, want to keep aspects of an original recording and change others. Blending a performance with a previous recording expands the possibilities for control. It allows performance-based editing of existing animations.
4.1.6. Orchestration
Studies by Forlines et al. [19] and Kin et al. [38] demonstrated that the benefits of two-handed (symmetric) input also transfer to interactive surfaces for basic selection and dragging tasks. The difficulty is to get users to use both hands, since single-handed controls in typical UIs can prime them. To maximize the options, our system should allow one-handed as well as symmetrical and asymmetrical bimanual input. The 2D capture approach implies that no single spatial manipulation requires more than a single hand. Consequently, two single-handed operations can easily be combined to enable parallel operation, for instance one hand per character limb, allowing emergent asymmetric and symmetric control (cf. [11]).
If individual sets of camera parameters are controlled with a single hand, this allows emergent styles of interaction. Combining two different camera operations, one with each hand, allows asymmetric view control. For instance, left-hand panning and right-hand zooming can be combined for simultaneous 3DOF view control. A combination of left-handed view control with right-handed
performance control even enables interaction styles that follow principles of asymmetric bimanual behavior [25]: the left hand can operate the view, acting at a lower spatial and temporal frequency and in precedence of the right hand, which acts in the reference frame provided by the left. This approach can be used to simplify view attachment for editing in dynamic reference frames: attaching the camera to the current reference frame for all camera operations provides the benefits of kinaesthetic reference frames and solves the issue of direct control with dynamic targets.
4.1.7. Space–time
Direct-touch spatial editing is almost exclusively evaluated in the scope of basic object editing in static environments (space → space). Non-spatial trigger input by tapping the screen (time → time) is commonly employed for discrete navigation of image sequences or videos, e.g. TV sports presenters reviewing video recordings of a game. With the exception of Moscovich et al. [45] and Kipp and Nguyen [39], the potential of direct touch for motion capture (space–time → space–time) has received little attention in prior research. Surface-specific techniques thus seem mainly aligned along symmetric space–time categories. The absence of passive, time-based mappings or graphical depictions of time might be simply because the coupling of input and output so strongly affords direct, continuous manipulation as opposed to tool use or automation. While it is still pure conjecture, it is possible that direct touch promotes symmetric space–time mappings, which couple user and medium space and time more literally, while indirect input might be better suited for more mediated space–time controls.
4.2. Prototype system
We implemented the design approach in a working prototype of a multi-touch animation system (Walther-Franks et al. [70]). We decided to build upon the existing 3D modelling and animation software Blender. The animation system is built around a core of performance controls. View controls and a time control interface complete the basic functionality. Each control can be operated with a single hand. This allows the user to freely combine two operations, e.g. capturing the motion of two features at once or wielding the view and the puppet at the same time. Since Blender neither supports multi-touch input nor concurrent operations, changes were necessary to its user interface module, especially the event system. We established a TUIO-based multi-touch interface. TUIO is an open, platform-independent framework that defines a common protocol and API for tangible interfaces and multi-touch surfaces [36]. It is based on the Open Sound Control (OSC) protocol, an emerging standard for interactive environments. We implemented chording techniques for mouse emulation by mapping multiple finger cursors to single 2-DOF input events. This suffices for single-hand input. For bimanual interaction the contacts are clustered using a spatial as well as a temporal threshold. Fingers are only added to a gesture if they are within a certain distance of the centroid of the gesture's cursor cluster; otherwise they create a new multi-finger gesture. After initial registration the gesture can be relaxed, i.e. the finger constellation required for detection need not be maintained during the rest of the continuous gesture. This means that adding or removing a finger from the cluster will not change the gesture, making continuous gestures resistant to tracking interruptions or touch pressure relaxation. This multi-touch integration already enables the use of tools via multi-touch gestures with one hand at a time. For two-handed control it was necessary to extend the single-pointer UI paradigm implemented in Blender such that two input sources (two mice or two hands) can operate independently and in parallel.
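The spatial part of the contact clustering described above can be sketched as follows; class and function names, the threshold value, and the data layout are our own assumptions, not the original implementation:

```python
import math

SPATIAL_THRESHOLD = 120.0  # assumed max distance (px) from a cluster centroid

class GestureCluster:
    def __init__(self, first_contact):
        self.contacts = [first_contact]

    def centroid(self):
        xs, ys = zip(*self.contacts)
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    def accepts(self, contact):
        cx, cy = self.centroid()
        return math.hypot(contact[0] - cx, contact[1] - cy) <= SPATIAL_THRESHOLD

def assign_contact(clusters, contact):
    """Add a new finger contact to the nearest accepting cluster,
    or start a new multi-finger gesture (e.g. for the other hand)."""
    for cluster in clusters:
        if cluster.accepts(contact):
            cluster.contacts.append(contact)
            return cluster
    cluster = GestureCluster(contact)
    clusters.append(cluster)
    return cluster
```

Two contacts close together thus join one gesture, while a distant contact (such as a finger of the other hand) starts a second cluster; a full implementation would add the temporal threshold and the relaxation behavior mentioned above.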
Performance controls use selection and translation operators (Fig. 4). The translation operator works along the two axes defined by the view plane. Single-finger input maps to selection (tap) and translation (drag). In linked feature hierarchies such as skeleton rigs, the translation is applied to the distal bone end, rotating the bone around the screen z axis. Dragging directly on a target enables selection and translation in a single fluid motion. Alternatively, the drag gesture can be performed anywhere on screen, also allowing indirect control of a previously selected target. Indirect dragging thus requires prior selection to determine the input target. Selection is the only context-dependent operator, as it determines the target by ray casting from the tapped screen coordinates.
Layered animation is supported via absolute and additive mappings. Absolute mode is the standard mapping; additive mode must be activated via the GUI. The standard absolute mapping overwrites any previous transformation at the current time. In the absence of parent motion this ensures 1:1 correspondence between input and output. With parent motion, control becomes relative to the parent frame of reference (local). Additive layering preserves existing motion and adds the current relative transformation to it. By changing the view between takes so that the input–output mapping affects degrees of freedom that could not be affected in previous takes (e.g. by orbiting the view 90 degrees around screen y), the animator can add depth and thus create more three-dimensional motion.
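The difference between the two layering modes can be illustrated on a single animation channel (one value per frame); function names and the data layout are illustrative, not from the system itself:

```python
def record_absolute(track, take):
    """Absolute mode: the new take overwrites prior values at each frame."""
    out = dict(track)
    out.update(take)
    return out

def record_additive(track, take):
    """Additive mode: the new take's offsets are added on top of existing motion."""
    out = dict(track)
    for frame, delta in take.items():
        out[frame] = out.get(frame, 0.0) + delta
    return out

base = {1: 0.0, 2: 1.0, 3: 2.0}   # first capture pass
take = {2: 0.5, 3: 0.5}           # second pass over frames 2 and 3
absolute = record_absolute(base, take)   # frames 2 and 3 replaced
additive = record_additive(base, take)   # offsets layered on top
```

In absolute mode the second pass discards the earlier motion at the recorded frames; in additive mode it preserves and offsets it, which is what enables adding depth across takes.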
The three camera operators pan, orbit and zoom map to two-, three- and four-finger gestures (Fig. 5). Assigning chorded multi-finger gestures to view operators does not have any precedent in the real world or prior work, and there are good arguments for different choices. A sensible measure is the frequency of use of a certain view control, and thus one could argue that the more commonly used functions should be mapped to the gestures with the smaller footprint, i.e. fewer fingers. The camera dolly move or zoom is probably the least used view control, which is why we decided to map it to the four-finger gesture: users can zoom in and out by moving four fingers up or down screen y. Three fingers allow camera orbit via the turntable metaphor: movement along the screen x axis controls turntable azimuth, while motion along screen y controls camera altitude. Two fingers pan the view along the view plane x and y axes. Like transformation controls, camera controls are context-free, meaning they can be activated anywhere on the camera view.
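The finger-count dispatch for these view operators can be sketched as follows; operator names, the camera-state layout and the gain value are assumptions for illustration:

```python
def apply_view_gesture(camera, finger_count, dx, dy, gain=0.01):
    """Update a simple camera dict from a chorded multi-finger drag (dx, dy)."""
    if finger_count == 2:                  # two fingers: pan along view plane
        camera["pan_x"] += dx * gain
        camera["pan_y"] += dy * gain
    elif finger_count == 3:                # three fingers: turntable orbit
        camera["azimuth"] += dx * gain     # screen x -> turntable azimuth
        camera["altitude"] += dy * gain    # screen y -> camera altitude
    elif finger_count == 4:                # four fingers: dolly/zoom
        camera["distance"] -= dy * gain    # move up screen y to zoom in
    return camera

cam = {"pan_x": 0.0, "pan_y": 0.0, "azimuth": 0.0,
       "altitude": 0.0, "distance": 5.0}
apply_view_gesture(cam, 3, dx=100, dy=0)   # a three-finger drag orbits the view
```

Because the dispatch keys only on finger count, each operator stays context-free and can be triggered anywhere on the camera view, as described above.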
A view attachment mode, when active, fixes the view camera to the currently selected feature during all camera operations, moving the camera along with dynamic targets (Fig. 6). The camera–feature offset is maintained and can be continuously altered depending on the camera operator, as described above. After establishing the attachment by starting a view control gesture, new targets can be selected and manipulated. Releasing the camera control immediately ends the attachment, rendering the camera static. By combining one-handed view control and capture in an asymmetric manner, this approach can solve indirection in control of dynamic targets.
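The offset-preserving behavior of view attachment can be sketched in a few lines; names and the vector layout are our own illustrative assumptions:

```python
def update_attached_camera(feature_pos, offset):
    """While attached, the camera position is the feature position plus
    the camera-feature offset stored when the attachment was established."""
    return tuple(f + o for f, o in zip(feature_pos, offset))

# Establish attachment: remember the current camera-feature offset.
feature = (2.0, 0.0, 1.0)
camera = (2.0, -5.0, 3.0)
offset = tuple(c - f for c, f in zip(camera, feature))   # (0.0, -5.0, 2.0)

# The feature moves during playback; the attached camera follows it.
feature = (4.0, 1.0, 1.0)
camera = update_attached_camera(feature, offset)
```

Camera operators would then modify the stored offset, while releasing the control simply stops updating the camera, leaving it static.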
The time control interface features several buttons and a time-line. Simple play/pause toggle buttons start and stop the playback
within a specified time range. A timeline gives the animator visual
feedback on the remaining loop length in multi-track capture, sup-
porting anticipation. It also enables efficient temporal navigation:
with a one-finger tap the animator can set the playhead to a spe-
cific frame. A continuous horizontal gesture allows for interactive
playback, allowing direct control of playback speed.
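The absolute tap and continuous scrub behaviors might be sketched as follows; the Timeline class and its frame/pixel mapping are assumptions for illustration:

```python
class Timeline:
    """Playhead control on a timeline widget (illustrative sketch)."""
    def __init__(self, length_frames, width_px):
        self.length = length_frames  # frames in the loop range
        self.width = width_px        # on-screen width of the timeline
        self.playhead = 0

    def tap(self, x_px):
        """One-finger tap: absolute positioning of the playhead."""
        self.playhead = round(x_px / self.width * (self.length - 1))

    def scrub(self, dx_px):
        """Continuous horizontal gesture: interactive playback, where drag
        speed directly controls playback speed (clamped to the range)."""
        frames_per_px = (self.length - 1) / self.width
        self.playhead = min(self.length - 1,
                            max(0, self.playhead + round(dx_px * frames_per_px)))
```

In this sketch a fast drag advances more frames per update than a slow one, which is what gives the animator direct control of playback speed while scrubbing.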
5. Evaluation
The design framework was a powerful aid for structuring design
options for the novel multi-touch animation system presented
above. We have also used it in the design of a performance-based animation timing technique (Walther-Franks et al. [71]) and are
B. Walther-Franks, R. Malaka / Entertainment Computing 5 (2014) 271283 279
Fig. 4. Direct and indirect performance control.
Fig. 5. Basic view transformations with continuous multi-finger gestures.
employing it in ongoing projects. A design framework as presented in this paper cannot be evaluated directly. Its usefulness and appropriateness are rather demonstrated indirectly through evaluations of prototypical systems built on its theoretical foundation. For this reason we next summarize the evaluation of the multi-touch animation system.
We evaluated the resulting system in an informal user study.
Aspects of interest were the reception and use of single- and
multi-track capture and camera controls, specifically to what extent two-handed interaction strategies would be employed. Since the
direct animation system is highly novel and still at the prototype stage, a formative evaluation was chosen in order to guide further
research. Formative evaluations are common in research and
development of 3D user interfaces [4]. Six right-handed individuals
aged between 23 and 31 years, four male, two female, took part in
our study. All came from a computer science and/or media produc-
tion background. Two of these judged their skill level as frequent
users of animation software, one as an occasional user and three
as rarely using such software. In sessions of about 30 min, the users created free animations of a stylized human puppet. An articulated
mannequin was rigged with seven handles that provided puppetry
controls (three bones for control of the body and four inverse kinematic handlers for hand and foot end effectors). The inverse kinematics handlers allowed expressive control of the multi-joint limbs while keeping complexity at a minimum. The goal was to explore
what animation goals users would come up with on their own given the
digital puppet. The study ran the prototype on a rear-projected
horizontal interactive tabletop employing the diffuse illumination
technique, with a height of 90 cm, a screen diagonal of 52 inches and a resolution of 1280 × 800 pixels.
The results of the study revealed that participants took to the
controls easily. Most stated that they enjoyed using our system.
The performance control interface was straightforward for initial
animations. Multi-track animation was mainly used to animate
separate features in multiple passes, less to adjust existing anima-
tion. The more complex additive mapping was hardly used and
met with initial confusion, although explanation and experimenting usually solved this. The view controls were quickly understood and were used without difficulty. The most commonly used camera operation was orbit. As all participants were familiar with
the timeline metaphor they had no problems understanding it.
Most subjects easily employed the absolute positioning of the
playhead to jump to a frame and to scrub along the timeline to
review the animation they had created. One participant used the
timeline for a method of animation somewhere between performance and frame-based animation: using the left hand for playhead control and the right for pose control, he achieved a fast, efficient pose-to-pose animation style. Five out of six participants manifested asymmetric bimanual styles of interaction. An emergent
strategy of half of our study's participants was to dedicate the left
hand for view or time controls and the right for capture. Further,
one participant controlled two puppet features simultaneously. Three used their left hand to attach the view to the mannequin
for animating its limbs once they had created animation for the
root bone. The benefit of locking the view to a frame of reference
in this way seemed immediately apparent to them, and was
greeted with enthusiasm in two cases.
Given the short timeframe and their lack of experience in performance animation, participants were able to create surprisingly refined character motion. Four created expressive character animations within the 10 min of the free animation task; these included walk, jump and squat motions as well as dance moves.
Inexperienced users had a harder time comprehending spatial
relationships, while those with more experience in 3D animation
notably picked up controls more fluently. This comes as no sur-
prise, as using and controlling software takes time and practice,
regardless of interface. For novice and casual users, our 2DOF strategy seems appropriate, since it constrains manipulation by excluding the depth dimension. However, the interface might need improvement in visualizing these constraints and in giving more hints on depth cues.
6. Conclusion and discussion
Current animation systems are too complex and inefficient for today's high demand for animated content. In order to make them more efficient and accessible to a broad range of users we have to
look at such tools from an HCI perspective. Our work has taken
steps in this direction. A review summarized related work in com-
puter animation interfaces regarding issues of control and use. A
design space characterized important aspects of animation interfaces on varying levels of abstraction. A taxonomy for space-time interactions with spatiotemporal media described how user and
medium space and dynamics relate in animation interfaces. The
use of this conceptual framework was demonstrated in the design
of a multi-touch animation system. For this proof-of-concept proto-
type we used interactive surfaces as high-bandwidth direct input
devices. It features robust, easy to understand, and conflict-free
unimanual mappings for performance and view control that can
be combined for efficient bimanual interaction. A user study verified the design approach by showing largely positive user reactions.
The majority of users employed both hands in emergent asymmet-
ric and symmetric bimanual interaction.
Animations are created by people for people in order to inform,
educate or entertain. Striving for higher usability by applying
knowledge on physiological and psychological human factors is
the foundation of human-computer interaction, and one of the
main points of our work. However, animation is primarily still an
art and a craft. Just as good animations have always been created
by artists with capability and skill, next generation animation
interfaces will still require talent and training on behalf of the user.
But in contrast to current mainstream tools they can help to ease
the effort in training and allow animators to express their creativity more efficiently. While animation tools cannot enable completely uninitiated people to create stunning motion designs
Fig. 6. The view attaching technique. Features can inherit motion from parents animated in previous motion layers. In such cases direct control is not possible. By attaching the view to the feature's frame of reference, direct control is reintroduced.
without significantly constraining creativity, they can do a lot more
to make the learning curve less steep. We believe that next generation tools should accommodate everyone from beginners to experienced professionals by being easy to learn but hard to master. In
this we side with voices in the community that, rather than making systems easy to use, aim to accelerate the progress from novices to experts [35], by letting users feel like "naturals" [62].
Acknowledgement
This work was funded in part by the Klaus Tschira Stiftung.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in
the online version, at http://dx.doi.org/10.1016/j.entcom.2014.
08.007.
References
[1] Anand Agarawala, Ravin Balakrishnan, Keepin' it real: pushing the desktop metaphor with physics, piles and the pen, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '06, ACM, New York, NY, USA, 2006, pp. 1283–1292.
[2] Michel Beaudouin-Lafon, Instrumental interaction: an interaction model for designing post-WIMP user interfaces, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '00, ACM, New York, NY, USA, 2000, pp. 446–453.
[3] B. Bodenheimer, C. Rose, S. Rosenthal, J. Pella, The process of motion capture: dealing with the data, in: D. Thalmann, M. van de Panne (Eds.), Computer Animation and Simulation '97, Eurographics/ACM SIGGRAPH, 1997.
[4] Doug A. Bowman, Ernst Kruijff, Joseph J. LaViola, Ivan Poupyrev, 3D User Interfaces: Theory and Practice, Addison-Wesley, 2004.
[5] Brightside Games, Zeit2, Ubisoft, 2011.
[6] W. Buxton, B. Myers, A study in two-handed input, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '86, ACM, New York, NY, USA, 1986, pp. 321–326.
[7] Xiang Cao, Andrew D. Wilson, Ravin Balakrishnan, Ken Hinckley, Scott E. Hudson, ShapeTouch: leveraging contact shape on interactive surfaces, in: 2008 IEEE International Workshop on Horizontal Interactive Human Computer Systems (TABLETOP), IEEE, October 2008, pp. 129–136.
[8] S.K. Card, J. Mackinlay, The structure of the information visualization design space, in: Proceedings of the IEEE Symposium on Information Visualization, IEEE, Los Alamitos, CA, USA, October 1997, pp. 92–99.
[9] Stuart K. Card, Jock D. Mackinlay, George G. Robertson, A morphological analysis of the design space of input devices, ACM Trans. Inf. Syst. 9 (2) (April 1991) 99–122.
[10] Daniel Casasanto, Lera Boroditsky, Time in the mind: using space to think about time, Cognition 106 (2) (February 2008) 579–593.
[11] Lawrence D. Cutler, Bernd Fröhlich, Pat Hanrahan, Two-handed direct manipulation on the responsive workbench, in: SI3D '97: Proceedings of the 1997 Symposium on Interactive 3D Graphics, ACM, New York, NY, USA, 1997, pp. 107–114.
[12] J.D.N. Dionisio, A.F. Cardenas, A unified data model for representing multimedia, timeline, and simulation data, IEEE Trans. Knowledge Data Eng. 10 (5) (September 1998) 746–767.
[13] Mira Dontcheva, Gary Yngve, Zoran Popović, Layered acting for character animation, ACM Trans. Graph. 22 (3) (July 2003) 409–416.
[14] Tanja Döring, Axel Sylvester, Albrecht Schmidt, A design space for ephemeral user interfaces, in: Proceedings of the 7th International Conference on Tangible, Embedded and Embodied Interaction, TEI '13, ACM, New York, NY, USA, 2013, pp. 75–82.
[15] Pierre Dragicevic, Gonzalo Ramos, Jacobo Bibliowitcz, Derek Nowrouzezahrai, Ravin Balakrishnan, Karan Singh, Video browsing by direct manipulation, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '08, ACM, New York, NY, USA, 2008, pp. 237–246.
[16] J. Edelmann, A. Schilling, S. Fleck, The DabR – a multitouch system for intuitive 3D scene navigation, in: 3DTV Conference: The True Vision – Capture, Transmission and Display of 3D Video, 2009, IEEE, May 2009, pp. 1–4.
[17] James D. Foley, Andries van Dam, Steven K. Feiner, John F. Hughes, Computer Graphics: Principles and Practice, Addison-Wesley, 1996.
[18] Clifton Forlines, Daniel Vogel, Ravin Balakrishnan, HybridPointing: fluid switching between absolute and relative pointing with a direct input device, in: UIST '06: Proceedings of the 19th Annual ACM Symposium on User Interface Software and Technology, ACM, New York, NY, USA, 2006, pp. 211–220.
[19] Clifton Forlines, Daniel Wigdor, Chia Shen, Ravin Balakrishnan, Direct-touch vs. mouse input for tabletop displays, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, ACM, New York, NY, USA, 2007, pp. 647–656.
[20] B. Fröhlich, H. Tramberend, A. Beers, M. Agrawala, D. Baraff, Physically-based manipulation on the responsive workbench, in: IEEE Virtual Reality 2000, IEEE Comput. Soc., Los Alamitos, CA, USA, 2000, pp. 5–11.
[21] David M. Frohlich, The design space of interfaces, in: Lars Kjelldahl (Ed.), Multimedia, Eurographic Seminars, Springer, Berlin Heidelberg, 1992, pp. 53–69.
[22] David M. Frohlich, Direct manipulation and other lessons, in: Martin G. Helander, Thomas K. Landauer, Prasad V. Prabhu (Eds.), Handbook of Human-Computer Interaction, Elsevier, North-Holland, 1997, pp. 463–488.
[23] Chi W. Fu, Wooi B. Goh, Junxiang A. Ng, Multi-touch techniques for exploring large-scale 3D astrophysical simulations, in: Proceedings of the 28th International Conference on Human Factors in Computing Systems, CHI '10, ACM, New York, NY, USA, 2010, pp. 2213–2222.
[24] Michael Gleicher, Animation from observation: motion capture and motion editing, SIGGRAPH Comput. Graph. 33 (4) (November 1999) 51–54.
[25] Y. Guiard, Asymmetric division of labor in human skilled bimanual action: the kinematic chain as a model, J. Motor Behav. 19 (4) (December 1987) 486–517.
[26] Mark S. Hancock, F.D. Vernier, Daniel Wigdor, Sheelagh Carpendale, Chia Shen, Rotation and translation mechanisms for tabletop interaction, in: First IEEE International Workshop on Horizontal Interactive Human-Computer Systems (TableTop 2006), IEEE, January 2006, 8 pp.
[27] Mark Hancock, Sheelagh Carpendale, Andy Cockburn, Shallow-depth 3D interaction: design and evaluation of one-, two- and three-touch techniques, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, ACM, New York, NY, USA, 2007, pp. 1147–1156.
[28] Mark Hancock, Thomas ten Cate, Sheelagh Carpendale, Sticky tools: full 6DOF force-based interaction for multi-touch tables, in: Proceedings of Interactive Tabletops and Surfaces 2009, 2009.
[29] Ken Hinckley, Daniel Wigdor, Input Technologies and Techniques, Taylor & Francis, 2012, Chapter 9.
[30] Takeo Igarashi, Tomer Moscovich, John F. Hughes, As-rigid-as-possible shape manipulation, ACM Trans. Graph. 24 (3) (2005) 1134–1141.
[31] Satoru Ishigaki, Timothy White, Victor B. Zordan, C. Karen Liu, Performance-based control interface for character animation, ACM Trans. Graph. 28 (3) (July 2009) 1–8.
[32] Robert J.K. Jacob, Linda E. Sibert, Daniel C. McFarlane, M. Preston Mullen, Integrality and separability of input devices, ACM Trans. Comput. Hum. Interact. 1 (1) (March 1994) 3–26.
[33] Robert J.K. Jacob, Audrey Girouard, Leanne M. Hirshfield, Michael S. Horn, Orit Shaer, Erin T. Solovey, Jamie Zigelbaum, Reality-based interaction: a framework for post-WIMP interfaces, in: Proceedings of the Twenty-sixth Annual SIGCHI Conference on Human Factors in Computing Systems, CHI '08, ACM, New York, NY, USA, 2008, pp. 201–210.
[34] John Jurgensen, From muppets to digital puppets, August 2008. URL: http://www.youtube.com/watch?v=GN8WbHomQJg.
[35] Paul Kabbash, William Buxton, Abigail Sellen, Two-handed input in a compound task, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '94, ACM, New York, NY, USA, 1994, pp. 417–423.
[36] Martin Kaltenbrunner, Till Bovermann, Ross Bencina, Enrico Costanza, TUIO – a protocol for table based tangible user interfaces, in: Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation (GW 2005), Vannes, France, 2005.
[37] Matt Kelland, Dave Morris, Dave Lloyd, Machinima: Making Animated Movies in 3D Virtual Environments, Ilex, Lewes, 2005.
[38] Kenrick Kin, Maneesh Agrawala, Tony DeRose, Determining the benefits of direct-touch, bimanual, and multifinger input on a multitouch workstation, in: Proceedings of Graphics Interface 2009, GI '09, Canadian Information Processing Society, Toronto, Ontario, Canada, 2009, pp. 119–124.
[39] Michael Kipp, Quan Nguyen, Multitouch puppetry: creating coordinated 3D motion for an articulated arm, in: ACM International Conference on Interactive Tabletops and Surfaces, ITS '10, ACM, New York, NY, USA, 2010, pp. 147–156.
[40] Joseph Laszlo, Michiel van de Panne, Eugene Fiume, Interactive control for physically-based animation, in: SIGGRAPH '00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 2000, pp. 201–208.
[41] Andrea Leganchuk, Shumin Zhai, William Buxton, Manual and cognitive benefits of two-handed input: an experimental study, ACM Trans. Comput. Hum. Interact. 5 (4) (December 1998) 326–359.
[42] Thomas D.C. Little, Time-based Media Representation and Delivery, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1994, pp. 175–200.
[43] A. Martinet, G. Casiez, L. Grisoni, Integrality and separability of multitouch interaction techniques in 3D manipulation tasks, IEEE Trans. Vis. Comput. Graph. 18 (3) (March 2012) 369–380.
[44] Alberto Menache, Understanding Motion Capture for Computer Animation, 2011.
[45] T. Moscovich, T. Igarashi, J. Rekimoto, K. Fukuchi, J.F. Hughes, A multi-finger interface for performance animation of deformable drawings, in: UIST 2005 Symposium on User Interface Software and Technology, October 2005.
[46] Miguel A. Nacenta, Patrick Baudisch, Hrvoje Benko, Andrew D. Wilson, Separability of spatial manipulations in multi-touch interfaces, in: GI '09: Proceedings of Graphics Interface 2009, Canadian Information Processing Society, Toronto, Ontario, Canada, 20