
Visuomotor Learning Using Image Manifolds

A Thesis Submitted in Partial Fulfilment of the Requirements

for the Degree of

Master of Technology

by

Sunakshi Gupta

Roll Number : 12111044

under the guidance of

Dr. Amitabha Mukerjee

Department of Computer Science and Engineering

Indian Institute of Technology Kanpur

July, 2014

Abstract

Vision provides us with much information about distal spaces, and enables us to

formulate expectations about how objects move. In this work, we consider how

images, together with motor and performance data, can be used to form "maps",

or holistic views, of a motion. For example, if the learner knows how to throw

a ball, it may be able to use the map to see that it also applies to kicking, or

catching, or throwing darts. These maps, we suggest, are manifolds that constitute

dense latent spaces for these motions. While earlier approaches to learning such

systems (e.g. discovering laws of physics) used prior knowledge of the set of system

variables, we assume no such knowledge and discover an equivalent representation

with alternate parameters (the latent parameters of the non-linear manifold). We

show how such a system can be built without any explicit parametrization for a

number of motion systems from rolling on an incline to pulleys to projectile motion.

We then demonstrate how such a map (for projectiles) can be used for throwing

darts, and how practice can improve the relevant part of the map, and hence the

throwing capability.


Dedicated to

my parents and my sister for their love and support.


Acknowledgement

I would like to express my sincere gratitude to my thesis advisor Dr. Amitabha

Mukerjee for his constant support and guidance throughout the tenure of my thesis.

He has been an amazing mentor in each and every aspect of my life. I would also like

to thank M.S.Ram for helping me with all my queries and providing great support

in implementation. I am also thankful to all my friends who gave me strength and

hope whenever I needed it. I thank Deepali and Nirmal for the insightful discussions

regarding the direction of my work. Last but not least, I would like to thank

my parents and my sister Medhavi who were a constant driving force. This thesis

would not have been possible without their love and encouragement.

Sunakshi Gupta


Contents

Abstract

List of Figures

List of Algorithms

List of Tables

1 Introduction

1.1 Past Work

1.2 Manifolds and Dimensionality Reduction

1.3 Thesis Organisation

2 Inferring laws of mechanics from Image manifolds

2.1 Problem Statement

2.2 Algorithm and Approach

2.3 Result Analysis

3 Throwing: Acquire motor skills from perception

3.1 Comparison between Hausdorff and Euclidean Distance

3.2 Learning to Throw: Incremental Approach

3.3 Goal Oriented Skills: Dart Throwing

4 Conclusions and Future Work

Bibliography

List of Figures

1.1 The baby is able to predict that the left slope is the sharper one and the ball will move faster on this.

1.2 A schematic representation of an experimental layout (based on Rieser, Guth, and Hill’s descriptions, 1982, 1986). The dotted arrow indicates one example of a spatial link, which implies an inferential process.

1.3 Common visual parameters in various skills. Black box depicts area of interest for our model

1.4 Sample Input for the system built by Rajagopalan and Kuipers. Figure obtained from [1]

1.5 Set of trajectory images. Each trajectory is associated with the motor parameters of the throw.

2.1 Sample Images for the Mechanics Systems Used

2.2 Sample Images from Dataset for Ball on Incline problem.

2.3 Result of Isomap on Ball on Incline Images

2.4 Neighbourhood graph with low Sampling Density showing the bands formed based upon the ball position. Colours show the change in ball position

2.5 Result of Isomap on Ball on Incline Images

2.6 Sample Images from Dataset for Crank and Piston problem for single radius crank.

2.7 Result of Isomap on Crank and Piston Images

2.8 Two-dimensional manifold of Crank and Piston system coloured by (a) piston location from the top and (b) crank angle depicting the relationship between the two.

2.9 Sample Images from Dataset for Crank and Piston problem for multiple crank radii.

2.10 Neighbourhood Graph of Crank and Piston system showing that getting a larger radius increases the mobility of the piston.

2.11 The result of the interpolation for 130 query images with crank radius = 7 pt. Red dots mark the interpolated query images while blue dots are the original images in the manifold.

2.12 Sample Images from Dataset for Box and Pulley problem.

2.13 Two-dimensional manifold of Box and Pulley system coloured by (a) size of the red box and (b) height attained by the red box from the top depicting the relationship between the two.

2.14 Two-dimensional manifold of Box and Pulley system showing the acceleration of the box depends only on the weights of the boxes whereas the velocity at any point also takes note of the height travelled.

2.15 The predicted fit for the 2 sets of query images displayed in red with the original manifold points in blue.

2.16 Sample Images from Dataset for Ball Projectile problem.

2.17 Comparison of results of Isomap for Euclidean distance (top) and Hausdorff Distance (bottom) showing how the cluster formed in Euclidean neighbourhood is resolved with Hausdorff.

2.18 Variations in the manifold according to different parameters shown by colouring the neighbourhood graph

2.19 Manifold depicting relationship between the neighbourhoods in the image space and the parameter space (θ).

3.1 Variation in SAE in velocity of throw for different sets of images

3.2 Throws with unrestricted velocities (Left) cover larger area and attain more number of heights than the ones having restriction on the velocity range (Right)

3.3 Sample Image showing the desired trajectories hitting the dart at the centre.

3.4 Two dimensional manifold for projectile images showing variation in epsilon

3.5 Two dimensional manifold showing interpolated curve.

List of Algorithms

1 Mechanics Law Discovery Algorithm

2 Dart Throwing Algorithm

List of Tables

2.1 Real Time Examples of Mechanics Problems considered

2.2 Sum Absolute Errors for different sets of query images

2.3 Errors for different sets of query images

Chapter 1

Introduction

Robots lack the ability to learn from observation and experience in the way that babies do. Consider a baby robot that observes how things move in reaction to its own actions or those of others. For example, on sloping surfaces a ball rolls down faster on the steeper slopes, and a ball thrown at an angle follows a curve that rises to a high point and then comes down. After observing such events for some time, the baby comes to understand the behaviour of the system in some way. For instance, the baby is able to tell which of the two slopes shown in Figure 1.1 will make the ball roll down more quickly. A child of, say, six years is able to determine which slide is steeper and hence faster, even though the child may never have been on such a big slide before. The human brain has an implicit understanding of such systems from an early age.

Figure 1.1: The baby is able to predict that the left slope is the sharper one and the ball will move faster on this.

Smith and Gasser state that the learning process of babies starts from observing events in their environment and gathering information [2]. The sensory system collects the required information by moving body parts and recording the experience in terms of perception or sensory input. Of all the senses that contribute to gathering information about a system, vision is our most important. It appears to have a key role in forming spatial knowledge in the brain.

It has also been shown in [3] that congenitally blind people have difficulty in forming a mental map of the places they have visited. Participants were taken from an origin point to several locations one by one and then asked to indicate the direction of one location from another (Figure 1.2). Congenitally blind people made more errors in this task than blindfolded sighted people. Hence, people without vision appear to be less able to learn the overall map of an area they have been to. However, they show little error in determining the path they took from a starting point to another point. Blind people are thus as good as sighted people at understanding the ’route’ between two points, but they are unable to formulate an overall ’map’ when given many such routes. This shows that vision plays an important role in gathering spatial knowledge.

Figure 1.2: A schematic representation of an experimental layout (based on Rieser, Guth, and Hill’s descriptions, 1982, 1986). The dotted arrow indicates one example of a spatial link, which implies an inferential process.

In our work, we try to learn such a basic map behind various systems using visual input. We build a generalised system that is able to understand the overall functioning of any physical system or motor action observed. All skills are correlated with each other in some way. The brain holds an overall general map for them, such that a person possessing one particular skill is able to perform an action that involves another, related skill. For example, a baby who knows how to throw is also able to catch, kick or shoot with some precision. We try to learn that general map. Figure 1.3 shows how the two actions of throwing and kicking can be mapped to the same visual input. The patterns in both actions are the same, and we try to discover such patterns in our model, which can then be fine-tuned to gain expertise in one particular skill.

(a) Action of throwing a ball (b) Action of kicking a ball

Figure 1.3: Common visual parameters in various skills. Black box depicts area of interest for our model

Consider the following situations:

(a) The agent recognizes [Sam throwing a ball to Sita].

(b) The agent executes [throwing the ball to Sita].

(c) The agent understands the sentence “Sam threw the ball to Sita”.

In classical AI, (a) would be modeled by a function whose input is visual and which classifies it as an action of class throw(); it would be trained on a large set of tagged videos. For (b), the robotic agent would command a sequence of motor torques at its joints. To learn to project the ball so that it goes to a target, it would train by throwing the ball repeatedly until it is able to achieve a desired path. For the language input (c), one would parse the sentence into constituent phrases, and a semantic lexicon might map these to formal structures such as throw(agent: Sam, object: Ball, path: $p) ∧ goal($p, Sita).

This process makes it hard to correlate knowledge between these modalities. For example, after watching another person throw short, it is hard to tell her to throw higher. It is also inefficient, requiring three times the circuitry of a unified system.

Indeed, primate brains seem to be operating in a more integrated manner. Motor

behaviour invokes visual simulation to enable rapid detection of anomalies, while

visual recognition invokes motor responses to check if they match. Each time the

robot throws the ball for motor learning, it also generates a visual trajectory from

which it can learn a visual model of the action. Further, the motion parameters are

correlated with the visual feedback - both lie on matched low-dimensional curved

manifolds which can be aligned.

We develop a unified approach for modeling actions, based on visual inputs (say

trajectories or paths) resulting from given motor commands. The model constitutes

a low-dimensional manifold that discovers the correlations between visual and motor

inputs. The model can be applied to either visual recognition or to motor tasks.

This thesis attempts to build a model, similar to that of the human brain, for skills that combine vision and motor actions. The model is then used as a starting point for learning specific tasks. For example, the projectile motion system can be used to learn aiming tasks such as throwing darts, catching, or throwing into a basket. In this thesis, the results of goal-driven models are demonstrated on the dart throwing problem.

1.1 Past Work

Past work in this field relates to our thesis in two main ways: one is learning physical laws from observations, and the other is optimising a model to attain motor skills such as throwing and catching.

1.1.1 Learning Physics Laws

Earlier works on learning physical laws use a considerable amount of prior knowledge about the system. For example, Rajagopalan and Kuipers [1] built a system that can predict the behaviour of a conducting loop sliding down an incline into a magnetic field, given an understanding of the basic laws of electromagnetism. The input to the system was an image of the system with text describing the image and the rules of the magnetic field. Their work relies on a large amount of prior knowledge about electromagnetic systems, given in the form of explicit rules, which is what allows the system to process and understand the text supplied with the image.

Figure 1.4: Sample Input for the system built by Rajagopalan and Kuipers. Figure obtained from [1]

Another work on discovering the underlying physical law of a system is by Kindermann and Protzel [4], who fit a model to observational data to determine the physical laws for a free fall system. They train a neural network on experimental data containing the velocities of a falling object at the two ends of a tube. They demonstrate how a neural network can be used to learn a function that closely resembles the actual laws behind the free fall experiment. Their work is also problem specific and will not work for any other problem without priors.

Schmidt and Lipson use a different approach [5] to discover the laws governing a set of observations captured using motion sensing techniques for systems such as the double pendulum and harmonic oscillators. They used genetic programming to determine the equations for these systems and were able to converge on quite accurate equations that mirror the actual laws behind these systems.

Sparkes et al. give a new perspective on scientific discovery by using the robot scientists Adam and Eve to fully automate the process [6]. These robots perform the experiments themselves and formulate laws from the experimental data recorded. This is the first work where every task required in the discovery of a law is performed by the robot scientists and no manual intervention is required. Here too, however, the range of experiments is limited to the instruments present in the robot, and hence the process cannot be generalised.

Langley et al. [7] give various techniques for deriving equations of laws that fit the given experimental values for a particular system. They provide four programs that are able to derive patterns in given physical measurements. The major advantage of their work is that the programs used to discover the variability in the data are completely free from presuppositions about the systems. This is close to our work, since we also do not use any prior domain-specific knowledge about the systems. However, theirs is a strongly data-driven approach that tries to simulate human behaviour using experimental data values rather than visual input.

All the above works learn from experimental values, either specified as input or recorded by the system itself. They give an idea of the methods that can be used to generalise the variability noted in a system in terms of data values. Our work also aims at capturing that variability, but the observations are in terms of perception. This includes inputs in the form of images, videos or other visual sensory data gathered when a person observes a system at work or operates it. Another difference is that our system does not make use of any priors about the system. Our model is a generalised one that can be applied to any system to derive information about it. We demonstrate this by using the same model to learn the laws behind four different physics problems.

We would also like to relate the visual input to motor behaviour in our work. The same model can be used to relate various motor actions to the visual data. Related work in which visual senses are used to build specific skills such as throwing and catching is described briefly in the following section.

1.1.2 Learning Motor Behaviour

Shadmehr and Krakauer show how perception is used to make corrections to the internal model in the brain and, eventually, how the brain learns to perform an action [8]. For every motor action we perform, the brain formulates an internal model and predicts the outcome of that model. The predicted outcome is then compared with the feedback obtained from perception of the action. The error is used to modify the internal model.

It has also been shown in [9] that action and perception are closely related. This work shows that even when one observes a motor action taking place, the brain triggers the same neurons that are triggered when one performs that action oneself. This is why people can predict the outcome when shown a video clip of someone throwing a dart at a dart-board. The accuracies were high for videos of the observers themselves, because their brain was able to match the perception with the exact action it had performed. However, it is argued in [10] that this is not always the case. The authors give the example of rhesus monkeys, which are incapable of performing a task like throwing but are still able to understand the motion of an object thrown at them by a person. This contradicts the theory that the brain understands a perceived action by mapping it onto its own motor system, because the rhesus monkeys are able to predict the motion even though their motor system has no action like throwing. So, we can say that the matching mechanism works only for humans.

In humans, models for motor actions start developing in the brain from a young age [11]. Babies slowly learn to perform actions such as walking and throwing by building models from observations and personal experience. These models keep improving with exposure to more data. We also demonstrate this for the action of throwing in this thesis. As experience increases, the general models for these motor skills can be fine-tuned to perform finer skills that have specific requirements. For instance, the model of throwing can be modulated to learn finer skills such as throwing a baseball, a frisbee or a dart, or catching and kicking.

In baseball, deciding on the fly which direction to move in order to catch the ball seems very difficult to learn. However, it has been shown in [12] that baseball players follow a path that maintains a linear optical trajectory (LOT) of the ball. The work states that the players run in such a manner that the rate of change of the optical angle of the ball remains constant. Dogs follow the same technique to catch a frisbee thrown at them. This has been shown in [13], where cameras were mounted on the heads of two dogs to record what the dogs perceived, and it was found that they also maintain an LOT of the frisbee.

Another specific throwing skill, dart throwing, has also been examined in several works [14, 15]. The author of [14] builds a decision model based on the likelihood and prior observed for dart throwing, using Bayesian decision theory. This model shows how the nervous system should take optimal decisions to get the best score in throwing a dart. We also look at throwing a dart optimally in this thesis, but ours is a very basic model that considers only a one-dimensional dart and uses a different approach.

Our approach differs from all of the above methods in two ways: (a) we are not trying to learn the scientific equations of the laws underlying the system, and (b) we are not learning a specific skill or even a specific skeleto-muscular system. Instead, we try to learn a generalised map of the actions that can be used for all kinds of activities that involve a visual input and a reaction to that input using any motor action. We attempt to find the basic properties which, if known about the system, can help us control the system in the most optimised way.

In our work, the observations are perceptions of various physical systems in the form of images. These images are clustered on a lower-dimensional manifold to discover the variations and patterns in the system. Like most of the works described, we also build a model to describe the patterns, but our model is built on the lower-dimensional embedding space (manifold) rather than the image space. Analysis of the learnt manifold reveals many important properties of the system. The motor skill of throwing is associated with the physical system of projectile motion and hence is learnt through the same model. We describe a generalised algorithm that can be applied to the images of any physical phenomenon and discovers an understanding of the underlying law without any prior domain-specific knowledge about the system.

1.2 Manifolds and Dimensionality Reduction

Learning a few visuo-motor tasks is among our agent’s very first achievements. Let us consider the act of throwing a ball. Our learner knows the motor parameters of the throw as it is being thrown. Here we focus not on the sequence of motor torques, but just on the angle and velocity at the point of release.

Figure 1.5: Set of trajectory images. Each trajectory is associated with the motor parameters of the throw.

Each trajectory gives us an image (samples in Figure 1.5). We are given a large set of images (say, N = 1080), each with 100×100 pixels. Each image can be thought of as a point in a $10^4$-dimensional space. The set of possible images is enormous, but we note that if we assign pixels randomly, the probability that the resulting image will be a [throw] trajectory is practically zero. Thus, the subspace of [throw] images is very small.

Next, we ask what types of changes we can make to an image while keeping it within this subspace. Since each throw varies only in the parameters (θ, v), there are only two ways in which we can modify the images while remaining locally within the subspace of throw images. This is the dimensionality of the local tangent space at any point, and by stitching together these tangent spaces we can model the entire subspace as a non-linear manifold of the same intrinsic dimensionality. The structure of this image manifold exactly mimics the structure of the motor parameters (the motor manifold). They can be mapped to a single joint manifold, which can be discovered using standard non-linear dimensionality reduction algorithms such as ISOMAP [16].
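To make the idea of a throw image as a point in a $10^4$-dimensional space concrete, the following minimal sketch renders a projectile path for given (θ, v) into a 100×100 binary image and flattens it into a vector. The rendering scheme, the world extent `EXTENT`, the value of `G` and the function names are illustrative assumptions, not the procedure used to generate the thesis dataset.

```python
import math

SIZE = 100       # image is SIZE x SIZE pixels, as in the text
EXTENT = 11.0    # assumed width/height (in metres) of the scene shown in the image
G = 9.81         # gravitational acceleration in m/s^2

def render_throw(theta_deg, v, steps=400):
    """Return a flat list of SIZE*SIZE binary pixels showing one projectile path."""
    theta = math.radians(theta_deg)
    vx, vy = v * math.cos(theta), v * math.sin(theta)
    flight_time = 2.0 * vy / G
    image = [0] * (SIZE * SIZE)
    for s in range(steps + 1):
        t = flight_time * s / steps
        x = vx * t
        y = vy * t - 0.5 * G * t * t
        col = int(x / EXTENT * (SIZE - 1))
        row = (SIZE - 1) - int(y / EXTENT * (SIZE - 1))   # row 0 is the top of the image
        if 0 <= row < SIZE and 0 <= col < SIZE:
            image[row * SIZE + col] = 1
    return image

def image_distance(p, q):
    """Euclidean distance between two images treated as points in pixel space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

if __name__ == "__main__":
    a = render_throw(45.0, 8.0)
    b = render_throw(50.0, 8.0)
    print(len(a))                 # number of coordinates of the image vector
    print(image_distance(a, b))   # distance between two throws in image space
```

Only two numbers, θ and v, were needed to produce each of these vectors, which is the sense in which the set of throw images has two intrinsic degrees of freedom.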

ISOMAP makes use of classical MDS (Multidimensional Scaling) to find the low-dimensional embedding of the data. It tries to preserve the geodesic distance between every pair of points. The steps include constructing a neighbourhood graph based on the distance metric used and computing the shortest path between every pair of points. The geodesic distance matrix is generated by summing the weights of the edges on the shortest path in the neighbourhood graph. The lower-dimensional coordinates are found by applying MDS to the geodesic distance matrix. The geodesics are computed not in Euclidean space but along the curved manifold on which the data lie in image space, which helps in preserving the geometry of the high-dimensional data. This is why ISOMAP performs well even with highly non-linear data. The computations performed in ISOMAP are not very complex, because they are carried out on the neighbourhood graph, which is often small, depending upon the value of k. A major share of the computation time of ISOMAP is spent in computing the pairwise distance matrix used to find the k nearest neighbours of every point. The other major contribution is Dijkstra’s algorithm, which is run from every data point to compute the shortest distance between every pair of points ($O(m + n \log n)$ per run, for m edges and n data points). For our problems, however, the time is low since the number of data points is not very large. Because of the high dimension of the images used in our systems (of the order of $10^4$), discovering the hidden manifold seems difficult, but ISOMAP discovers the intrinsic dimensionality easily and accurately. Our data is densely sampled and free from discrepancies; therefore, ISOMAP performs well with the images in our problems.
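The three ISOMAP steps described above (neighbourhood graph, shortest paths, classical MDS) can be sketched in a few lines of NumPy. This is an illustrative sketch and not the implementation used in this thesis: it assumes Euclidean distances, and for brevity it uses the Floyd-Warshall recurrence instead of Dijkstra's algorithm for the all-pairs shortest paths. The function name `isomap` and the toy arc data are assumptions made for the example.

```python
import numpy as np

def isomap(X, n_neighbors=6, n_components=2):
    """Embed the rows of X using a k-NN graph, graph geodesics and classical MDS."""
    n = X.shape[0]
    # Step 1: pairwise Euclidean distances and a symmetric k-nearest-neighbour graph.
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0))
    np.fill_diagonal(D, 0.0)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]   # column 0 is the point itself
    for i in range(n):
        graph[i, nearest[i]] = D[i, nearest[i]]
        graph[nearest[i], i] = D[i, nearest[i]]
    # Step 2: geodesic distances as shortest paths in the graph (Floyd-Warshall).
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    if np.isinf(graph).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Step 3: classical MDS on the geodesic distance matrix.
    H = np.eye(n) - np.ones((n, n)) / n                     # centring matrix
    B = -0.5 * H @ (graph ** 2) @ H
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

if __name__ == "__main__":
    # Toy data: 40 points on a curved arc in 3-D, i.e. one intrinsic degree of freedom.
    t = np.linspace(0.0, 1.5 * np.pi, 40)
    X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])
    Y = isomap(X, n_neighbors=4, n_components=1)
    print(Y.shape)
```

On the toy arc, the single embedding coordinate follows the position along the curve, because distances are measured along the graph rather than straight across the arc.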

1.3 Thesis Organisation

The next chapter contains the main algorithm for discovering regularities in the images of various physical systems that resemble the underlying law in some manner. It also contains the results of the algorithm on four mechanics problems. Chapter 3 shows how the model learnt in Chapter 2 can be modified to learn specific goal-oriented skills such as dart throwing. We conclude with a discussion in Chapter 4.

Chapter 2

Inferring laws of mechanics from Image manifolds

This chapter presents the main contribution of this thesis: how to use images of physical systems to make assertions about the working of these systems and how to control them.

We look at simple mechanics problems: ball on incline, projectile motion, box and pulley, and crank and piston. We try to gather as much information as we can from the images of these problems. Then, we build a model that can extract the relevant information from any query image for the system. The systems considered for analysis are constrained so that all results lie in a particular range corresponding to the given problem domain. The systems are assumed to be free from any external force or noise that could cause discrepancies in their functioning. The four mechanics problems considered are shown in Figure 2.1.


(a) Ball on Incline (b) Crank and Piston

(c) Box and Pulley (d) Ball Projectile

Figure 2.1: Sample Images for the Mechanics Systems Used

2.1 Problem Statement

We want to find the basic properties which, if known about the system, can help us control the system in the most optimised way. We try to learn the number of degrees of freedom in a system by learning a manifold from the set of images. We learn the controlling parameters of the system solely from the image data.

The goal is to gain the implicit knowledge behind the physical system. This

knowledge should be such that we are able to perform the task optimally and accu-

rately without knowing the exact equations of the laws underlying the system.

The purpose of this thesis is to show how the images of a physical phenomenon

can be used to discover latent characterisations leading to laws of the system that

mirror discoveries in physics. These images can be clustered on a manifold to dis-

cover the controlling parameters of that system as well as the relationships between

them.


2.2 Algorithm and Approach

The process of obtaining the embedding and training the network is outlined in Algorithm 1. The results of this algorithm are demonstrated on four mechanics problems. The steps followed in all four problems are more or less the same. They differ only in how the results are evaluated and how the relations in them are depicted. Different systems are evaluated differently and yield separate results. The main steps in building the model are computing the low-dimensional embedding and training the neural networks on the obtained embedding.

Algorithm 1 Mechanics Law Discovery Algorithm

1: Input: Set of high-dimensional images $\{I_1, I_2, I_3, \ldots, I_N\}$ and corresponding control parameters

2: Output: Values of the control parameters for the set of query images

3: Step 1: Obtain a low-dimensional embedding of the high-dimensional image data using the ISOMAP dimensionality reduction technique.

4: Step 2: Train a regression model to acquire a mapping from the low-dimensional (curved) manifold to the respective control parameters.

5: Step 3: For executing a new throw, use a (query) image with the desired path. Find a linear interpolation for this query image $J_i$ from its $k$ nearest neighbours: $J_i = \sum_{j=1}^{k} w_j I_j$

6: Step 4: Calculate the embedding point for the query image using the weights learnt in Step 3: $Q_i = \sum_{j=1}^{k} w_j q_j$, where $q_j$ is the embedding point of $I_j$

7: Step 5: Use the mapping learnt in Step 2 to obtain the corresponding parameters for the query image $J_i$.

8: Step 6: Analyse the manifold to depict interesting relations about the physical system.
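Steps 3 and 4 of Algorithm 1 can be illustrated with a short NumPy sketch. The algorithm does not state how the interpolation weights are obtained, so this sketch assumes a regularised least-squares fit with the weights constrained to sum to one (the same construction used in locally linear embedding); the function names, the regularisation constant `reg` and the toy data are likewise assumptions.

```python
import numpy as np

def interpolation_weights(query, neighbours, reg=1e-3):
    """Weights w (summing to 1) such that sum_j w_j * neighbours[j] approximates query."""
    Z = neighbours - query                   # neighbours expressed relative to the query
    C = Z @ Z.T                              # k x k local Gram matrix
    C = C + np.eye(len(C)) * reg * max(np.trace(C), 1e-12)   # regularise for stability
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()

def embed_query(query, images, embedding, k=4):
    """Step 3: weights from the k nearest images. Step 4: same weights on their embeddings."""
    dists = np.linalg.norm(images - query, axis=1)   # Euclidean metric assumed here
    idx = np.argsort(dists)[:k]
    w = interpolation_weights(query, images[idx])
    return w @ embedding[idx], idx, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for a learnt manifold: 2-D embedding points on a grid, mapped
    # linearly into a 50-dimensional "image" space.
    embedding = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
    A = rng.normal(size=(2, 50))
    images = embedding @ A
    query = np.array([1.3, 2.6]) @ A
    Q, idx, w = embed_query(query, images, embedding)
    print(Q, w.sum())
```

The returned point `Q` plays the role of $Q_i$ in Step 4 and would then be passed to the regression model of Step 2 to read off the control parameters.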

The images of a physical system may have a high dimensionality, but the number of degrees of freedom is very low in comparison. Thus, the images vary in only a few parameters. Our aim here is to determine the relationship between the variability in the images and the system parameters. Since the underlying dimension of the system is much lower than the original dimension, the images can be assumed to lie on a lower dimensional manifold in the image space. Any non-linear dimensionality reduction algorithm can be used to estimate the underlying dimensionality of the system and discover the manifold to which the images belong. In our work, we use the ISOMAP technique, which attempts to preserve geodesic distances, so that points close to each other in the image space are close in the low dimensional space and points far apart remain distant from each other [16].
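The embedding step can be illustrated with a minimal NumPy sketch of the standard Isomap pipeline (k-nearest-neighbour graph, shortest paths as geodesic estimates, classical MDS). This is not the implementation used in the thesis: the function name `isomap`, the half-circle toy data and the neighbourhood size are assumptions chosen for illustration, and the sketch assumes the neighbourhood graph is connected.

```python
import numpy as np

def isomap(X, n_neighbors=5, n_components=2):
    """Minimal Isomap: kNN graph, graph shortest paths, then classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between the rows of X.
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(D, 0.0)

    # Neighbourhood graph: keep edges to the k nearest neighbours only.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
    G = np.minimum(G, G.T)  # symmetrise the graph

    # Floyd-Warshall: approximate geodesics by shortest paths in the graph.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])

    # Classical MDS on the squared geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigval, eigvec = np.linalg.eigh(B)
    top = np.argsort(eigval)[::-1][:n_components]
    return eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0.0))

# Toy data (hypothetical): points on a half circle, a 1-D manifold curved in 2-D.
t = np.linspace(0.0, np.pi, 40)
X = np.column_stack([np.cos(t), np.sin(t)])
Y = isomap(X, n_neighbors=3, n_components=1)
```

On this toy arc the single embedding coordinate varies almost linearly with the arc parameter `t`, which is the sense in which geodesic distances are preserved. For image data, each row of `X` would be a flattened image.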

The embedding points are not directly mappable to the control parameters of the system, so we train a back-propagation network to compute that mapping. The network is trained on the points from which the manifold is built. It can then be used to determine the values of the parameters for other points on the manifold. The size of the training set affects the accuracy of the network, and the prediction improves as more images are added to the training set. This is discussed in more detail and demonstrated via results for the throwing problem in Chapter 3. For each mechanics problem, we use a training set large enough to capture the variability in the images and control parameters required for our computation.
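As a rough illustration of this regression step, the sketch below trains a one-hidden-layer network by back-propagation on synthetic data. Everything specific here is an assumption made for the example: the stand-in embedding points `Q`, the target function, the 16 tanh hidden units and the learning rate are hypothetical, since the network architecture is not specified in this section.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 2-D embedding points Q and one control parameter y.
Q = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (np.sin(2.0 * Q[:, 0]) + 0.5 * Q[:, 1] ** 2).reshape(-1, 1)

# One hidden layer with tanh units and a linear output (assumed architecture).
W1 = rng.normal(0.0, 0.5, size=(2, 16))
b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, size=(16, 1))
b2 = np.zeros(1)

def forward(inputs):
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

initial_error = mse(forward(Q)[1], y)

lr = 0.1
for _ in range(2000):
    H, pred = forward(Q)
    # Back-propagate the mean squared error through both layers.
    d_out = 2.0 * (pred - y) / len(Q)
    grad_W2, grad_b2 = H.T @ d_out, d_out.sum(axis=0)
    d_hid = (d_out @ W2.T) * (1.0 - H ** 2)
    grad_W1, grad_b1 = Q.T @ d_hid, d_hid.sum(axis=0)
    # In-place gradient descent updates.
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

final_error = mse(forward(Q)[1], y)
```

Comparing `final_error` with `initial_error` shows the fit improving on the training points; in the setting of the thesis, `Q` would hold the Isomap coordinates and `y` the known control parameters.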

For any test image, the nearest neighbours are first computed using the same distance metric that was used for executing ISOMAP (e.g. Euclidean, Hausdorff). This is necessary because the neighbourhood is defined on the basis of that distance metric. Using linear interpolation on the nearest neighbours, the weights are computed for the query image. These weights are then used to get the embedding coordinates, which are fed into the trained neural network to obtain the values of the parameters for the query image.
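Steps 3 and 4 of Algorithm 1 can be sketched as below. The thesis only states that linear interpolation over the nearest neighbours is used; the particular choice here (least-squares weights constrained to sum to one, with a small regulariser, as in locally linear embedding) is an assumption, and the function name and toy vectors are hypothetical.

```python
import numpy as np

def interpolation_weights(query, neighbours, reg=1e-6):
    """Weights w with sum(w) = 1 minimising ||query - sum_j w_j * neighbours[j]||."""
    Z = neighbours - query                 # shift so the query is the origin
    C = Z @ Z.T                            # local Gram matrix (k x k)
    C = C + reg * np.trace(C) * np.eye(len(C))   # regularise: C may be singular
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()

# Toy example: three neighbour "images" I_j and their embedding points q_j.
I = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
q = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
J = np.array([0.5, 0.5, 0.0])              # query image

w = interpolation_weights(J, I)            # Step 3: J ~ sum_j w_j I_j
Q_query = w @ q                            # Step 4: Q = sum_j w_j q_j
```

For this toy query the weights come out close to (0.5, 0.25, 0.25), so the query is placed at about (0.25, 0.25) in the embedding.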

The manifold obtained can further be used to make other assertions about the systems by colouring the neighbourhood graph on the basis of different parameters related to the system. The colouring can be used to detect associations among those parameters and the neighbourhood. Thus, it can be utilised to make assertions about the system as a whole. Some interesting relations have been discovered in the four mechanics problems we have considered for analysis. These relations reflect the laws underlying the physical system, which is notable since the model is not trained on actual experimental data values but on images.

2.3 Result Analysis

The results of the approach described in Algorithm 1 are demonstrated on four problems in mechanics (Ball on Incline, Crank and Piston, Box and Pulley and Ball Projectile). We have selected systems that are relatively simple to observe and perceive from images and are in some way related to real life. Real life examples for each of the four systems are listed in Table 2.1. The images of all four problems are shown in Figure 2.1. The laws behind these mechanics problems are based on Newton’s laws of motion, and the systems are assumed to be in an ideal environment.

Table 2.1: Real Life Examples of Mechanics Problems considered

Mechanics problem    Real life analogy
Ball on Incline      slides in parks, inclined ramps
Crank and Piston     cars and locomotive engines
Box and Pulley       wells, elevators, flag hoisting
Ball Projectile      throwing basketball, dart etc.

Coloring Scheme followed in Results: The lowest parameter value is represented by yellow, increasing through magenta, red and green, with the highest value in blue. This is consistent throughout the thesis.

2.3.1 Ball on Incline

The Ball on Incline system has an inclined plane and a ball which slides down the incline with zero initial velocity. This problem is relatable to many real life systems like a child’s slide. A child does not know the laws of physics, but can understand that a slide with a steep slope might be dangerous and is scared of going down it, while being comfortable on a slide with a small slope. Similarly, while pushing an object down a ramp, workers know how much force to apply to impart a velocity that is sufficient for the object to reach the end of the ramp.

Figure 2.2: Sample Images from Dataset for Ball on Incline problem.

In both the examples above, it is evident that the brain formulates a model for these physical systems even though it may not be aware of the exact mechanics laws. To imitate this behaviour, we generated 2000 images of the Ball on Incline system with 60 different slopes and 30 different ball positions on the slope (see sample images in Figure 2.2). The images have a dimensionality of about 40,000, while the underlying dimension is only 2: the slope of the incline and the ball position on the incline. For testing purposes, 160 more images were generated with slopes not present in the training set. The images were fed to Algorithm 1 and the results of the Isomap algorithm are shown in Figure 2.3. Isomap indicates the dimensionality of the system to be 2, and the neighbourhood graph shows the association between various points: a particular image is close both to images of similar slope and to images of similar ball position.

(a) Residual Variance Plot (b) Neighbourhood Graph

Figure 2.3: Result of Isomap on Ball on Incline Images

The neighbourhood graph obtained shows that the manifold is very densely sampled. The sampling density refers to the number of images per unit distance. In terms of manifolds, sampling density relates to the number of points in the neighbourhood of an image. With a smaller sampling density, the behaviour of the manifold changes: it has been observed that images with the same ball position have more pixels in common than images with the same slope but different ball positions. So, in terms of Euclidean distance, the former get closer and form separate bands in the neighbourhood graph, as shown in Figure 2.4. By increasing the sampling density, i.e. increasing the overlap between the balls on the same slope, we were able to get an embedding which is free from any bands. On observing the manifold coloured by slope (Figure 2.5a) and by ball position (Figure 2.5b), we can clearly see that these two are the cause of the variability in the images. Hence, these might be the two dimensions that are capable of defining the whole system.

Figure 2.4: Neighbourhood graph with low Sampling Density showing the bands formed based upon the ball position. Colours show the change in ball position.

(a) Coloring by slope of incline

(b) Coloring by ball position on incline

(c) Coloring by time to reach the end on incline

Figure 2.5: Result of Isomap on Ball on Incline Images

As we can observe from the coloured neighbourhood graph, the time remaining for the ball to reach the end of the incline from the position in the image depends upon both the slope of the incline and the current ball position. Figure 2.5c shows that, for a particular position on the incline, the time is minimum for the maximum slope. Similarly, a trivial observation is that, for a particular slope, the time is minimum for the bottom-most position. This knowledge can be utilised to choose the slope angle or inclination height so as to optimise the effort needed to perform a task on the incline.
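The two observations read off the coloured manifold agree with the closed-form result for an idealised incline. The sketch below assumes frictionless sliding from rest at the top of the incline; the incline length, positions and the value of g are hypothetical values chosen for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed)

def time_remaining(slope_deg, s, length):
    """Time to reach the end of a frictionless incline for a ball released
    from rest at the top and currently a distance s along it."""
    a = G * math.sin(math.radians(slope_deg))   # acceleration along the incline
    return math.sqrt(2.0 * length / a) - math.sqrt(2.0 * s / a)

# Same position, steeper slope: less time remaining.
steep = time_remaining(60.0, 1.0, 4.0)
gentle = time_remaining(30.0, 1.0, 4.0)

# Same slope, position nearer the bottom: less time remaining.
near_bottom = time_remaining(30.0, 3.0, 4.0)
```

Here `steep < gentle` and `near_bottom < gentle`, matching the trends seen in Figure 2.5c.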

2.3.2 Crank and Piston

Crank and piston mechanisms have been used in locomotive engines and are utilised in other related systems. The reason for selecting this system is that it is quite deceptive to the eye, in the sense that it can easily be misunderstood as a system with two degrees of freedom instead of one.

Figure 2.6: Sample Images from Dataset for Crank and Piston problem for single radius crank.

The motion of the piston induces the rotation of the crank, which is not obvious from the images. It can be said that the pressure applied at the piston and its duration determine the angle by which the crank will rotate.

(a) Residual Variance Plot (b) Neighbourhood Graph

Figure 2.7: Result of Isomap on Crank and Piston Images

About 360 images for one full rotation of the crank were generated, with each image depicting a change of one degree in the angle. These images were captured from a simulation of the system generated in MATLAB. Sample images are shown in Figure 2.6. Parameters such as the crank angle and the location of the piston are known for the training images. The results of the Isomap algorithm are depicted in Figure 2.7a and in Figure 2.7b, which shows the k-nearest neighbours of each point in the manifold.

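Since the residual variance is minimum for dimension 2, the dimension predicted for this system is 2. However, the neighbourhood graph clearly shows the manifold to be a single line connected at the ends, i.e. a closed curve. A closed curve cannot be laid out on a line without being cut, which is why two embedding dimensions are needed to draw it. Hence, the intrinsic dimension of the system should be one.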

The manifold coloured on the basis of the piston location and the crank angle is shown in Figure 2.8. The black mark indicates the starting point (the point with zero crank angle and minimum piston location from the top).

(a) Colouring by Piston Location (b) Colouring by Crank Angle

Figure 2.8: Two-dimensional manifold of Crank and Piston system coloured by (a) piston location from the top and (b) crank angle, depicting the relationship between the two.

The coloured graph in Figure 2.8a is symmetric about the horizontal axis passing through the centre of the graph. This means that the piston location is the same for angles having the same deflection from the origin point in either direction. So, if the crank angle is recorded in terms of deflection from the origin point, dropping the information about the direction of deflection, we can conclude that there is a one-to-one correspondence between the piston location and the crank angle. Hence, it would not be incorrect to say that motion in either one of the two (piston or crank) induces motion in the other, which we know by experience is the correct assertion about the system. Other parameters that affect the motion are the radius of the crank and the length of the shaft, but these parameters are constant for a particular crank and piston system. If these parameters need to be varied, we would require a combination of such systems with varied radii and lengths.
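The symmetry seen in the coloured manifold matches the textbook slider-crank relation, sketched below. The formula measures the piston position from the crank centre along the line of travel, which differs from the "location from the top" used in the figures only by an offset and sign. The radius and shaft length values are hypothetical.

```python
import math

def piston_position(theta_deg, r=10.0, l=30.0):
    """Distance of the piston pin from the crank centre for crank radius r
    and connecting-shaft length l (standard slider-crank kinematics)."""
    t = math.radians(theta_deg)
    return r * math.cos(t) + math.sqrt(l * l - (r * math.sin(t)) ** 2)

# Equal deflections on either side of the start give the same piston position.
left = piston_position(-40.0)
right = piston_position(40.0)

# The total travel of the piston (stroke) is twice the crank radius.
stroke = piston_position(0.0) - piston_position(180.0)
```

Because the position depends on the angle only through its cosine and squared sine, `left` equals `right`, and the stroke grows with the crank radius.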

Figure 2.9: Sample Images from Dataset for Crank and Piston problem for multiple crank radii.

(a) Neighbourhood Graph showing connections

(b) Colouring by Piston Location

(c) Colouring by Crank Radius

Figure 2.10: Neighbourhood Graph of Crank and Piston system showing that getting a larger radius increases the mobility of the piston.

For our problem, we looked at 4 different crank systems with different radii (8, 9, 10 and 11 points) and tried to find the dependency among them. There are 1440 images in the dataset, with 360 images for each crank radius (Figure 2.9). A crank with a smaller radius allows the piston a smaller range of movement, so the piston does not reach some of the highest locations, as shown in Figure 2.10b. The coloured graph also shows the increase in the number of piston locations as the radius is increased, depicting a close dependency between the piston location and the radius of the crank. With four different radii, we get four sub-manifolds of identical topology wrapped around each other. Hence, the images of the system hold sufficient variability of the system, and this variability is clearly discoverable from the manifold.

For testing, we generated 120 test images with a crank radius of 7 points. The neural network trained on the dataset of 1440 images was then used to predict the expected piston location in each of these 120 images. The error was calculated by comparing the predictions with the true piston locations of the query images. The Sum Absolute Error came out to be 0.62. To show their association with the training images, the query images were plotted on the original graph. The plot is shown in Figure 2.11.

Figure 2.11: The result of the interpolation for 130 query images with crank radius = 7 pt. Red dots mark the interpolated query images while blue dots are the original images in the manifold.

2.3.3 Box and Pulley

The pulley is a classic problem in mechanics with numerous versions. For our analysis, we have used a pulley mounted at the roof with two weights suspended from the two ends of the rope. The weights move up or down under the influence of gravity. The direction of the acceleration is decided by the heavier weight. For our analysis, the weight of the lighter box (blue) is kept constant while that of the heavier box (red) is varied. The pulley rotates according to the difference in their masses. Since the boxes are connected by a rope, the red box moves down with the same acceleration with which the blue box moves up. The dataset of images used to train the model is shown in Figure 2.12.

Figure 2.12: Sample Images from Dataset for Box and Pulley problem.

(a) Residual Variance Graph

(b) Neighbourhood Graph

Figure 2.13: Two-dimensional manifold of Box and Pulley system coloured by (a) size of the red box and (b) height attained by the red box from the top, depicting the relationship between the two.

The images were generated as if they were captured from a video of the whole simulation of the movement. Therefore, the dataset consists of multiple sets of images, with each set containing the trajectory of a box moving down under the influence of gravity. There were 600 images of 10 different weights, each going through 60 different heights (Figure 2.1c). Since the only changing control parameter is the size of the red box, the essential dimension of the system is 1. But because the height of the box also varies for each weight, the Isomap algorithm detects the dimensionality to be 2 (Figure 2.13a). The neighbourhood graph (Figure 2.13b) clearly displays 10 different trajectories of the weights, shown with different colours in Figure 2.14a. The manifold coloured by height (Figure 2.14b) shows how each weight goes through each of the heights. The motion is constrained because the length of the rope is limited, so some sizes of the box may not attain some heights. This is depicted by the smaller number of images in the blue band (Figure 2.14b), which corresponds to the maximum heights attained from the top. Also, Figure 2.14c shows that the acceleration is linear in terms of the weight of the box. So, we can assert that it remains constant throughout the motion of a particular mass, whereas the velocity depends upon both the weight and the distance travelled. The velocity varies with distance, with its lowest value at the points with the lowest height value. As the velocity increases, the effect of the weight also becomes noticeable, as shown in the graph, which becomes curved towards the weight direction.
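For comparison, the ideal two-mass pulley (Atwood machine) can be written down directly. The sketch below assumes a massless rope, a frictionless and massless pulley, and a start from rest; the masses and distances are hypothetical and are not the values used to generate the dataset.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed)

def acceleration(m_heavy, m_light):
    """Common acceleration of the two boxes in an ideal Atwood machine."""
    return (m_heavy - m_light) * G / (m_heavy + m_light)

def velocity(m_heavy, m_light, distance):
    """Speed after the heavier box has descended `distance` from rest."""
    return math.sqrt(2.0 * acceleration(m_heavy, m_light) * distance)

a_15 = acceleration(15.0, 10.0)      # depends on the masses only
v_near = velocity(15.0, 10.0, 1.0)   # depends on masses and distance
v_far = velocity(15.0, 10.0, 2.0)
v_heavier = velocity(19.0, 10.0, 1.0)
```

Under these assumptions the acceleration does not change along a trajectory and increases with the heavier mass, while the velocity grows with both the mass and the distance travelled, which is the qualitative pattern described for Figures 2.14c and 2.14d.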

The variability held by the system is low, since the images in this system were very restricted in nature, with only 10 weights. To test this model, we removed all images of a particular trajectory from the training set, so that we can test on a whole trajectory of a mass which is not present in the manifold.

A total of 110 test images were separated, consisting of the trajectories of 2 different weights, one at the extreme of the range of weights (weight = 19) and one within the range (weight = 15). Testing is done on the 55 images of each of the two query trajectories, with the remaining images used as the training set. The network was trained separately for the two cases to predict the values for these sets by interpolation. The errors obtained in the values of acceleration and velocity for the query images are listed in Table 2.2. The results of the prediction by the neural network are shown in Figure 2.15. The interpolated points are shown in red, while the points in the manifold are depicted in blue. Figure 2.15a shows how the trajectory of the test images is fitted into the manifold. The errors in the extrapolated trajectory of weight 19 are a bit higher than for weight 15 because it lies on the boundary of the manifold. This is why the extrapolated values for this weight are not as accurate as those for weight 15, as can be seen from Figure 2.15a and Figure 2.15b.

(a) Colouring by size of the box (b) Colouring by heights attained from the top

(c) Colouring by acceleration (d) Colouring by velocity of the red box

Figure 2.14: Two-dimensional manifold of Box and Pulley system showing that the acceleration of the box depends only on the weights of the boxes, whereas the velocity at any point also takes note of the height travelled.

Table 2.2: Sum Absolute Errors for different sets of query images

Query Set     Acceleration (in m/s^2)    Velocity (in m/s)
Weight=15     0.24 (2.1%)                0.63 (3%)
Weight=19     0.39 (4%)                  0.75 (4.7%)

(a) Query Weight=15 (b) Query Weight=19

Figure 2.15: The predicted fit for the 2 sets of query images displayed in red with the original manifold points in blue.

2.3.4 Ball Projectile

The simplest application of projectile motion is the action of throwing, be it a ball, a frisbee, a dart or any other object. It is among the most common physical phenomena observed in daily life. Throwing is learnt by babies as part of their gross motor skills, which are skills like walking, running and climbing stairs that are learned in the early growth years. In this section, we only learn the model for projectile motion, which can then be used to establish knowledge about other systems such as the act of throwing a dart.

The determining parameters for projectile motion are the velocity and the angle of projection. A total of 1080 trajectories were generated with varied ranges and heights (Figure 2.16). The simulation internally controls the values of the velocity and angle for each trajectory. The images were again fed to Isomap; however, the results for this system were not favourable. Clusters of points formed in some portions of the manifold, making it unsuitable for drawing any assertions about the system (Figure 2.17b). For this approach, we had used the Euclidean distance metric, which led to these results. It was incapable of capturing the exact variability of the data. Many images were so close to each other in terms of Euclidean distance that their neighbourhoods overlapped, causing clusters in the manifold. Because of this, the Hausdorff metric [17] was considered, which calculates the maximum, over all points in one set, of the distance to the closest point in the other set. It is computed by the equation $h(A,B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$.

Figure 2.16: Sample Images from Dataset for Ball Projectile problem.

The results of the algorithm with the Hausdorff metric are displayed in Figure 2.17. It is evident that the Hausdorff metric forms a much better neighbourhood than the Euclidean metric for the projectile images. The dimensionality detected using this metric is 2. The manifold coloured by various parameters such as velocity, angle, maximum height and range is shown in Figure 2.18.

(a) Residual Variance with Euclidean Distance (b) Neighbourhood Graph showing clusters using Euclidean Distance

(c) Residual Variance with Hausdorff Distance (d) Neighbourhood Graph with Hausdorff Distance

Figure 2.17: Comparison of results of Isomap for Euclidean distance (top) and Hausdorff distance (bottom), showing how the cluster formed in the Euclidean neighbourhood is resolved with Hausdorff.

Various assertions can be made by looking at the coloured graphs. For instance, for a constant velocity, the optimal angle at which a ball should be thrown to attain the maximum possible height and range is in the middle band of angles, i.e. somewhere around 45 degrees, as shown by the green band (Figure 2.18d). Exact values are computed from the neural network trained on these images. The model can also be used to predict the velocity and angle needed to reach a given height or travel a particular range. The error in predicting the parameter values for a set of 130 query images is given in Table 2.3.

Table 2.3: Errors for different sets of query images

Query Set    Velocity (in m/s)    Theta (in degrees)    Max Height (in m)    Range (in m)
SAE          0.59                 0.67                  1.93                 2.7
% Error      1.8                  1.02                  4.1                  4

Optimisations can also be made, just as in the human brain, as to which angle and velocity will make the ball reach a particular target with minimum energy. All these decisions are taken by the brain automatically once it learns a model based upon practice throws, which serve as the training data for the model. Similarly, we try to answer such questions by building a model based upon a given set of throws, which can be treated as the practice trajectories. Using this manifold and its relationship with the basic parameters, the parameter values that optimise a particular quantity related to the system can be found. This is very useful for taking decisions in many real life problems such as hitting a target or throwing a dart. A more detailed discussion of the applications of the throwing model can be found in Chapter 3.
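The relations that the manifold recovers from images can be compared against the closed-form expressions for an ideal projectile launched from ground level. The sketch below is only such a reference calculation, not part of the learned model; the launch speed, the angle grid and the value of g are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed)

def projectile_range(v, theta_deg):
    """Horizontal range on level ground: R = v^2 sin(2 theta) / g."""
    return v * v * math.sin(2.0 * math.radians(theta_deg)) / G

def max_height(v, theta_deg):
    """Peak height: H = v^2 sin^2(theta) / (2 g)."""
    return (v * math.sin(math.radians(theta_deg))) ** 2 / (2.0 * G)

# For a fixed speed, search a grid of whole-degree angles for the longest range.
best_angle = max(range(1, 90), key=lambda a: projectile_range(10.0, a))
```

In this idealised setting the range is largest at 45 degrees, whereas the peak height taken on its own keeps increasing with the launch angle.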

(a) Velocity of Projection (b) Maximum Height Attained

(c) Range of Projectile (d) Angle of Projection

Figure 2.18: Variations in the manifold according to different parameters shown by colouring the neighbourhood graph

Moreover, the parameter space (v, θ) is homeomorphic to the image space. This means that the neighbourhood in the parameter space will be similar to the neighbourhood in the image space (which is measured under the Hausdorff metric, since the neighbourhood in the image space is computed from Hausdorff distances). The homeomorphism can be argued from the fact that the function mapping the parameters to the respective images is a continuous function of v and θ (according to the equations of the parabola). This mapping is shown in Figure 2.19, where the images are placed on the manifold, showing various ranges of θ with respect to the manifold embedding points. The figure depicts the similarity between the two neighbourhoods (image space and parameter space).

Figure 2.19: Manifold depicting relationship between the neighbourhoods in the image space and the parameter space (θ).

Chapter 3

Throwing: Acquire motor skills from perception

Motor skill development is a part of a child’s overall growth. Motor skills are broadly classified into two types: gross and fine motor skills. The early growth years witness the development of the gross motor skills, which include walking, running, throwing etc. These skills involve the movement of large muscles and body parts like arms and legs, and are not lost even after a period of non-use. The fine motor skills, however, are skills that can be forgotten if not performed for some days, like playing the piano or eating with a fork. These skills are specific and require small muscle movements.

In this chapter, we look at the development of one of the gross motor skills, throwing, in babies. Then, we analyse its application in a higher-order goal-oriented action, dart throwing. We know that babies learn most of their activities by observing others performing an action or by performing it themselves repeatedly. Therefore, perception plays an important role in the process of a child’s motor development. In Chapter 2 we already learnt a model of how to throw a ball for specified parameters like maximum height, range, energy etc. We will fine-tune that model to learn the basic action of throwing like a baby and the expert technique of dart throwing.

As specified in the preceding chapter, the projectile images do not perform well with the Euclidean distance metric, because of which we had to look at another metric, the Hausdorff distance. The two metrics are compared in Section 3.1, along with a brief description of the Hausdorff distance. The manifold learnt is then used to demonstrate the way in which babies learn the model of throwing. In Section 3.2, it is shown how the reach of the baby increases with the number of throws. The knowledge gained in Section 3.2 and Chapter 2 is then applied to a more goal-oriented action, dart throwing, in Section 3.3.

3.1 Comparison between Hausdorff and Euclidean Distance

Hausdorff distance is defined as the greatest of all the distances from a point in one set to the closest point in the other set. For two sets of points $A = \{a_1, a_2, \ldots\}$ and $B = \{b_1, b_2, \ldots\}$, it is computed by the equation $h(A,B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$.

Hausdorff distance is a metric since it satisfies all the properties required of a metric (non-negativity, identity of indiscernibles, symmetry and triangle inequality). Since it is built from L2 distances between points, non-negativity ($d(x, y) \ge 0$) is ensured. Also, since every point of each set is compared against the other set, even a slight difference in the pixels of the two images makes the overall distance non-zero. Hence, identity of indiscernibles is ensured ($d(x, y) = 0$ iff $x = y$). It follows the triangle inequality [18], and in order to make it symmetric we choose $H(A,B) = \max(h(A,B), h(B,A))$. So, it can be called a metric [17].

Hausdorff distance and Euclidean distance are very different from each other. Euclidean is the most common metric, while Hausdorff has not been widely used with manifold learning algorithms. The choice between the two is domain specific. For instance, we used Euclidean distance for all our other systems, but for the projectile system Hausdorff was used since it performs better. The difference in the manifolds learnt for the projectile images with the two metrics has already been shown in Figure 2.17. The problem of clustering observed with Euclidean distance is resolved using Hausdorff distance.

Hausdorff distance is favourable in the parabola problem because it computes the overall distance from the distances between the ON pixels of the two images. So even if only one ON pixel of an image is very far from every ON pixel of the other image, the two images will be distant in terms of Hausdorff distance. The Euclidean distance, however, just takes the difference between the respective pixel values in the two images. So, images having a large area of overlap are close in terms of Euclidean distance even though some parts of them may be significantly different. This is the reason why we have used Hausdorff distance for training our model on the projectile images, since the number of ON pixels in each image is very small and there are high chances of overlap between the images.
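The contrast between the two metrics can be reproduced on small binary images. The sketch below is illustrative only: the 20 × 20 images and the helper names are hypothetical, and the Hausdorff distance is computed by brute force over the ON pixel coordinates using the symmetric form $H(A,B) = \max(h(A,B), h(B,A))$.

```python
import numpy as np

def on_pixels(img):
    """Coordinates (row, col) of the ON pixels of a binary image."""
    return np.argwhere(img > 0).astype(float)

def directed_hausdorff(A, B):
    """h(A, B): largest distance from a point of A to its closest point in B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.min(axis=1).max()

def hausdorff(img1, img2):
    A, B = on_pixels(img1), on_pixels(img2)
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def euclidean(img1, img2):
    """Pixel-wise L2 distance between two images of the same size."""
    return float(np.linalg.norm(img1 - img2))

base = np.zeros((20, 20))
base[5, :] = 1                       # a thin horizontal curve
outlier = base.copy()
outlier[19, 0] = 1                   # same curve plus one far-away ON pixel
shifted = np.zeros((20, 20))
shifted[6, :] = 1                    # same curve moved down by one pixel

d_euc = euclidean(base, outlier), euclidean(base, shifted)
d_haus = hausdorff(base, outlier), hausdorff(base, shifted)
```

The Euclidean distance treats the image with the single stray pixel as almost identical to `base` and the one-pixel shift as far away, while the Hausdorff distance ranks them the other way round. This is the behaviour that matters for thin trajectories with few ON pixels.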

3.2 Learning to Throw: Incremental Approach

As mentioned in [8], when motor commands are generated, the error in achieving the purpose is estimated by matching the feedback with the predictions generated by the internal model. The feedback is based upon perception, and the error is minimised by modifying the model accordingly. In this work, we are trying to build the internal model for a motor command like throwing.

We first look at the problem of throwing a ball or any other object and try to build the model in a manner similar to how a baby learns. This model can be called the prediction model or forward model ($P: h_{throw} \rightarrow q$), since there is no pre-defined target. We are just predicting the motion from the knowledge of v and Θ. We learn a mapping from the motor parameters, namely the velocity and the angle of the throw, to the perceptual data. There are 1080 images of size 100 × 100, which are embedded on a 2-dimensional manifold using the Isomap algorithm; the manifold is then coloured based upon the motor parameters (v, Θ) [Figure 2.18a and Figure 2.18d]. Any regression model can be used to find the mapping between the embedding points and (v, Θ), and we use a neural network to do the same.

Figure 3.1: Variation in SAE in velocity of throw for different sets of images

The robot throws the ball a few times and gradually learns the model better by throwing more balls. Increasing the number of throws in a particular region improves the accuracy in that region, and as the variability of the throws increases the overall model also improves.

We learn the manifold on 100 images of projectile throws and then train a network to predict the corresponding parameters for the images, such as the velocity of throw and the angle of projection. The error is then calculated for a set of 130 test images.

Gradually, the number of images is increased, and the accuracy increases since the model improves when trained with more images (Figure 3.1). For the same number of images, the error is lower if only a few velocities are to be learnt, because throws with unrestricted velocities cover a larger area and attain a larger number of heights, whereas the range and height of the throws are limited in the case of few velocities (see Figure 3.2). This is similar to the incremental manner in which a baby learns to throw. In the early growth years, the baby is able to throw the ball only up to a certain distance because the model is under-trained owing to the small number of practice throws observed. A similar observation is made with our model.

Figure 3.2: Throws with unrestricted velocities (Left) cover larger area and attain more number of heights than the ones having restriction on the velocity range (Right)

Since we have now established the association of these motor parameters with the trajectory of the throw, we can use this knowledge to throw at a target, such as throwing a dart accurately at the board. The decision of which trajectory to choose so that the dart hits the bull’s eye can be made optimally using the inverse decision model ($D: x_{target} \rightarrow q \rightarrow h_{throw}$). This model is explained in the next section.

3.3 Goal Oriented Skills: Dart Throwing

Once we have learnt the model for the holistic throw, we can apply the knowledge gained to perform specific skills like throwing a dart at the board accurately and optimally. From Chapter 2, we have a model which maps projectile trajectories to their respective motor parameters. This model can also interpolate among the trajectories already in the model to determine the parameters for an out-of-sample query trajectory.

The problem now consists of first finding the set of trajectories that are likely to hit the dart at some location, and then performing interpolation on the set of images hitting the dart to find the ones that are accurate to our level of satisfaction. In the previous model, interpolation was simple to perform because of the presence of a single desired output, i.e. the query image. One can easily perform linear interpolation on the nearest neighbours of the query trajectory and use those weights to find the respective parameters. This was possible since the model operates on the space of images and the desired output is also in the form of an image. Things are different, however, if the expected output is not a single image but a series of images, as is the case in the dart throwing problem.

The problem is to find the set of trajectories that will hit a given one-dimensional dart at the bull’s eye, i.e. at the centre (Figure 3.3). Out of these trajectories, the optimal one, i.e. the one whose motor parameters make it the easiest to throw, can be selected.

Figure 3.3: Sample Image showing the desired trajectories hitting the dart at the centre.

The errors are calculated in terms of the height at which the projectile hits the dart and its distance from the bull’s eye. Only the projectiles having error less than a threshold ε are used for further analysis, since they form the ε-neighbourhood of the target trajectories. This neighbourhood holds a large amount of information which can be utilised to give the desired results. It is evident from Figure 3.4 that an increase in the value of ε makes the neighbourhood grow in a radial manner, such that each neighbourhood forms a superset of the previous one, i.e. $N_{\epsilon_1} \subset N_{\epsilon_2} \subset N_{\epsilon_3}$ and so on. This shows that the error is less for trajectories lying near the medial axis of the ε-neighbourhood of the dart image.

Algorithm 2 Dart Throwing Algorithm

1: Input: Image containing the dart.

2: Output: Series of trajectories $I_1, I_2, \ldots, I_p$ that will hit the dart in the most optimal position.

3: Step 1: Calculate the errors e for all sets of trajectories and select the ones with error < ε.

4: Step 2: Fit a quadratic through the embedding points of the selected trajectories and their corresponding errors such that $q^T S q + b^T q = e$.

5: Step 3: Use the values of S and b computed in Step 2 to compute the q-coordinates for which the error is minimised, i.e. $q^T S q + b^T q = 0$.

6: Step 4: Plot the embedding points on the manifold to find the region of accurate trajectories (Figure 3.5).

7: Step 5: Use the mapping learnt in Chapter 2 to generate more throws in the accurate region. Re-calculate errors.

8: Step 6: Analyse the accurate region on the manifold to select the trajectories with motor parameters that optimise one’s effort in throwing.

Figure 3.4: Two dimensional manifold for projectile images showing variation in epsilon.
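The selection of an ε-neighbourhood can be sketched in a few lines. This is only an illustration under stated assumptions: in the thesis the error is measured from the trajectory images, whereas here a closed-form, drag-free projectile launched from the origin stands in for the images, and the names `height_at`, `hit_error`, `epsilon_neighbourhood`, the board distance and the bull's-eye height are all hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed units)


def height_at(speed, angle_deg, distance):
    """Height of an ideal, drag-free projectile launched from the origin
    when it reaches the horizontal position x = distance."""
    theta = math.radians(angle_deg)
    return (distance * math.tan(theta)
            - G * distance ** 2 / (2 * speed ** 2 * math.cos(theta) ** 2))


def hit_error(throw, board_distance, bull_height):
    """Vertical distance between the hit point on the board and the bull's eye."""
    speed, angle_deg = throw
    return abs(height_at(speed, angle_deg, board_distance) - bull_height)


def epsilon_neighbourhood(throws, eps, board_distance, bull_height):
    """Keep only the throws whose error is below the threshold eps."""
    return [t for t in throws
            if hit_error(t, board_distance, bull_height) < eps]


# Hypothetical grid of (speed, launch angle) pairs; values are illustrative only.
throws = [(v, a) for v in (5.0, 6.0, 7.0, 8.0) for a in range(10, 80, 10)]
n_small = epsilon_neighbourhood(throws, 0.1, board_distance=2.0, bull_height=0.5)
n_large = epsilon_neighbourhood(throws, 0.3, board_distance=2.0, bull_height=0.5)
```

Because the same error is compared against a larger threshold, every throw in `n_small` is also in `n_large`, which is the nesting of neighbourhoods described above.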

The embedding points of the neighbourhood images can be found from the model learnt in the previous sections. Each point $q_i$ has an error $e_i$ associated with it. These pairs are used to fit a quadratic through the points such that, for each $i$, $q_i^T S q_i + b^T q_i = e_i$.

This system is solved for the matrix $S$ and the vector $b$, which are then used to calculate the values of $q$ for zero error. This is equivalent to finding the set of points satisfying the quadratic equation $A q_1^2 + B q_1 q_2 + C q_2^2 + D q_1 + E q_2 = 0$, where the constants $A$, $B$, $C$, $D$ and $E$ are computed from $S$ and $b$.
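The relation between the constants and the fitted quantities can be written out explicitly for the two-dimensional case. The entry names $s_{11}, s_{12}, s_{21}, s_{22}$ and $b_1, b_2$ are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
q^{T} S q + b^{T} q
  &= \begin{pmatrix} q_1 & q_2 \end{pmatrix}
     \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}
     \begin{pmatrix} q_1 \\ q_2 \end{pmatrix}
   + \begin{pmatrix} b_1 & b_2 \end{pmatrix}
     \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} \\
  &= s_{11}\, q_1^2 + (s_{12} + s_{21})\, q_1 q_2 + s_{22}\, q_2^2
     + b_1 q_1 + b_2 q_2 .
\end{aligned}
```

Comparing terms gives $A = s_{11}$, $B = s_{12} + s_{21}$, $C = s_{22}$, $D = b_1$ and $E = b_2$. If $S$ is taken to be symmetric, $B = 2 s_{12}$. A non-degenerate zero-error curve of this form is an ellipse when $B^2 - 4AC < 0$.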

The coordinates obtained in this step are plotted on the manifold to find the interpolated curve of highest accuracy (Figure 3.5). The curve obtained is an ellipse for this dart. Only the part of it that lies in the neighbourhood region for the given dart is shown. For other dart locations it can be a different curve, depending on the values of $S$ and $b$ learnt in Step 2 of Algorithm 2. More trajectories are generated by taking potential points from this curve on the manifold and generating throw images for those parameters.

Figure 3.5: Two dimensional manifold showing interpolated curve.
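Steps 2 and 3 of Algorithm 2 can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: it assumes a two-dimensional embedding, fits the five constants $A$ to $E$ by ordinary least squares through the normal equations, and then solves the zero-error equation for $q_2$ at a chosen $q_1$. The function names and the synthetic data are hypothetical.

```python
import math


def design_row(q):
    """Monomials of the quadratic model for a 2-D embedding point q = (q1, q2)."""
    q1, q2 = q
    return [q1 * q1, q1 * q2, q2 * q2, q1, q2]


def solve_linear(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting.
    Assumes M is square and non-singular."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_quadratic(points, errors):
    """Least-squares estimate of (A, B, C, D, E) from embedding points and errors."""
    rows = [design_row(q) for q in points]
    k = 5
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * e for r, e in zip(rows, errors)) for i in range(k)]
    return solve_linear(normal, rhs)


def zero_error_q2(coeffs, q1):
    """Real q2 values on the zero-error curve for a given q1 (may be empty)."""
    A, B, C, D, E = coeffs
    a, b, c = C, B * q1 + E, A * q1 * q1 + D * q1
    if abs(a) < 1e-12:  # the equation is linear in q2
        return [] if abs(b) < 1e-12 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return [(-b - root) / (2 * a), (-b + root) / (2 * a)]


# Synthetic check: sample a known quadratic on a grid and recover its constants.
true_coeffs = [1.0, 0.5, 2.0, -1.0, 0.3]
grid = [(x, y) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)
        for y in (-2.0, -1.0, 0.0, 1.0, 2.0)]
errs = [sum(c * m for c, m in zip(true_coeffs, design_row(q))) for q in grid]
fitted = fit_quadratic(grid, errs)
```

Sweeping `q1` over the neighbourhood region and collecting the values returned by `zero_error_q2` gives candidate points on the interpolated curve, which can then be mapped back to control parameters.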

Since we already have a model that maps the embedding points to the respective

control parameters, we can use those parameters to generate images for the points

in the locality of the curve obtained in Algorithm 2. Around 80 more images were

generated near the curve and then errors were re-calculated for all the images in the

neighbourhood. The overall accuracy of the system improves but the errors are still

high because of the errors accumulated in (a) fitting the quadratic, (b) computing

the embedding coordinates and (c) mapping the coordinates to control parameters

to generate the images. The errors will reduce once these models are improved.

Since we now have trajectories with better accuracy, we can optimise our model by imposing different restrictions on these trajectories and choosing the ones that satisfy them. For example, if we need to throw the dart accurately with minimum energy, the trajectories with the lowest velocity can be chosen from the accurate set. Similarly, requiring the projectile to hit the dart at 90 degrees leads to discarding the projectiles that hit the dart at an angle not close to 90 degrees.
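The two restrictions mentioned above can be expressed as simple filters over the accurate set. The sketch below is only an illustration: the `Throw` record, its field names, the 10-degree tolerance and all numerical values are hypothetical and are not taken from the experiments.

```python
from dataclasses import dataclass


@dataclass
class Throw:
    speed: float             # launch speed
    launch_angle_deg: float  # launch angle above the horizontal
    impact_angle_deg: float  # angle between the path and the dart at impact
    error: float             # distance of the hit from the bull's eye


def accurate(throws, eps):
    """Throws whose error is below the threshold eps."""
    return [t for t in throws if t.error < eps]


def min_energy(throws):
    """Lowest-speed throw; kinetic energy grows with the square of the speed."""
    return min(throws, key=lambda t: t.speed)


def near_perpendicular(throws, tol_deg=10.0):
    """Throws that hit the dart within tol_deg of 90 degrees."""
    return [t for t in throws if abs(t.impact_angle_deg - 90.0) <= tol_deg]


# Hypothetical candidates; the numbers are illustrative only.
candidates = [
    Throw(6.0, 30.0, 85.0, 0.02),
    Throw(5.2, 45.0, 70.0, 0.04),
    Throw(7.5, 20.0, 92.0, 0.01),
    Throw(4.8, 50.0, 60.0, 0.30),
]
good = accurate(candidates, eps=0.05)
best = min_energy(good)
square_hits = near_perpendicular(good)
```

Applying the accuracy filter first and the restriction second mirrors the order described in the text: the restriction only ranks or prunes throws that already reach the target.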

This model can also be applied to other goal-oriented throwing tasks. Examples include basketball, baseball and even football, each of which requires the player to throw, shoot or kick the ball so that it reaches a pre-defined target. In basketball, the target is the basket, which is located at a height; therefore, the throws that can reach the basket from a specified location at some specified angles are required. This is similar to the dart throwing problem, except that the range of the throws is much larger and hence larger velocities will be required.

Baseball also relies on the laws of projectile motion. Both throwing and catching require good skills or, in other words, a well-trained model. Some of the past work on baseball has been described in Chapter 1. In football, the skill used is kicking rather than throwing; however, the ball follows the same projectile motion, so the learnt model remains the same. Here, the goal covers a larger area in both height and width, so a large number of trajectories can fulfil the task.

Additional requirements include avoiding an obstacle (e.g. a goalkeeper or an opposing player) on the way to the goal. This can be achieved by dropping the trajectories that hit the given obstacle from those reaching the goal. Skilled players perform this type of computation on the fly, as a result of many years of practice and experience. A model that imitates this behaviour therefore requires good training and an incremental learning process so that it can improve with experience. In this chapter, we have tried to build such a model, one that captures this basic behaviour of the brain. Additional modifications specific to the problem at hand can be made to the model, but the basic solution scheme remains the same.

Chapter 4

Conclusions and Future Work

This work attempts to build a generalised model that discovers visuo-motor patterns from sets of images of physical systems, or of systems involving motor activities such as throwing. The model we have built requires no prior knowledge of the system variables and hence can be used to learn the patterns of any system whose images capture its inherent variability. We have demonstrated the results of our system on four mechanics problems, and it can be extended to other systems as well.

The model learnt from the images is then used as a starting point to learn the basics of any goal-driven activity, such as dart throwing or basketball. This model tries to imitate the behaviour of the human brain by formulating an overall generalised map for various motor activities.

For building a good model in the brain, two things are required: the ability to build a model from the data collected, and perception, so that the model can be improved with experience. In our work, we have considered only visual input as the perception for our model, and how that visual input can be used to learn motor skills such as throwing and kicking. Vision is thus the most important sense for learning a system in our work. The models built for all four mechanics problems, namely ball on incline, ball projectile, pulley, and crank and piston, are tested on newly generated points and their accuracies are discussed. The learnt manifold is also analysed to reveal relationships between the control parameters of the respective system, which are listed in Chapter 2.

Moreover, the Hausdorff distance metric was used with the Isomap algorithm in place of the conventional Euclidean distance, and was shown to give better results for the Ball Projectile system. Although Hausdorff performs better for Projectile images, it does not perform as well for systems like Ball on Incline, where the overlap between images is high. The choice of metric therefore depends on the system considered: Hausdorff performs better for images with low overlap.
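For reference, the symmetric Hausdorff distance between two point sets can be sketched as below. This is an illustration only, assuming each image is represented by the coordinates of its foreground (object) pixels; the function names are hypothetical, and the brute-force search is quadratic in the number of points, so it is meant for small sets.

```python
import math


def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


# Two hypothetical trajectories given as pixel coordinates.
traj_a = [(0, 0), (1, 1), (2, 1)]
traj_b = [(0, 0), (1, 2), (2, 3)]
d_ab = hausdorff(traj_a, traj_b)
```

Unlike a pixel-wise Euclidean distance, this value depends on how far apart the two shapes lie in the image, which is one plausible reason for its better behaviour on images with little overlap.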

Future work on this thesis involves its integration with language. We have shown that a system starting with very few priors can combine motor, visual (and possibly other) modalities into an integrated model. We can further use this rudimentary concept knowledge to bootstrap language. Such a system can then learn further refinements to this concept space using language alone.

Bibliography

[1] Raman Rajagopalan and Benjamin Kuipers. Qualitative spatial reasoning about objects in motion: Application to physics problem solving. In Artificial Intelligence for Applications, 1994., Proceedings of the Tenth Conference on, pages 238–245. IEEE, 1994.

[2] Linda Smith and Michael Gasser. The development of embodied cognition: Six lessons from babies. Artificial Life, 11(1-2):13–29, 2005.

[3] Catherine Thinus-Blanc and Florence Gaunet. Representation of space in blind persons: vision as a spatial sense? Psychological Bulletin, 121(1):20, 1997.

[4] Lars Kindermann and Peter Protzel. Physics without laws-making exact predictions with data based methods. In Neural Networks, 2002. IJCNN'02. Proceedings of the 2002 International Joint Conference on, volume 2, pages 1673–1677. IEEE, 2002.

[5] Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data. Science, 324(5923):81–85, 2009.

[6] Andrew Sparkes, Wayne Aubrey, Emma Byrne, Amanda Clare, Muhammed N Khan, Maria Liakata, Magdalena Markham, Jem Rowland, Larisa N Soldatova, Kenneth E Whelan, et al. Review towards robot scientists for autonomous scientific discovery. Autom Exp, 2, 2010.

[7] Pat Langley. Scientific discovery: Computational explorations of the creative processes. MIT Press, 1987.

[8] Reza Shadmehr and John W Krakauer. A computational neuroanatomy for motor control. Experimental Brain Research, 185(3):359–381, 2008.

[9] Gunther Knoblich and Rudiger Flach. Predicting the effects of actions: Interactions of perception and action. Psychological Science, 12(6):467–472, 2001.

[10] Justin N Wood and Marc D Hauser. Action comprehension in non-human primates: motor simulation or inferential reasoning? Trends in Cognitive Sciences, 12(12):461–465, 2008.

[11] Gyorgy Gergely, Zoltan Nadasdy, Gergely Csibra, and Szilvia Bíró. Taking the intentional stance at 12 months of age. Cognition, 56(2):165–193, 1995.

[12] Michael K McBeath, Dennis M Shaffer, and Mary K Kaiser. How baseball outfielders determine where to run to catch fly balls. Science, pages 569–569, 1995.

[13] Dennis M Shaffer, Scott M Krauchunas, Marianna Eddy, and Michael K McBeath. How dogs navigate to catch frisbees. Psychological Science, 15(7):437–441, 2004.

[14] Konrad Kording. Decision theory: what "should" the nervous system do? Science, 318(5850):606–610, 2007.

[15] Hermann Muller and Dagmar Sternad. Motor learning: changes in the structure of variability in a redundant task. In Progress in Motor Control, pages 439–456. Springer, 2009.

[16] Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

[17] Daniel P. Huttenlocher, Gregory A. Klanderman, and William J Rucklidge. Comparing images using the Hausdorff distance. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 15(9):850–863, 1993.

[18] Jeff Henrikson. Completeness and total boundedness of the Hausdorff metric. MIT Undergraduate Journal of Mathematics, 1:69–80, 1999.
