Lecture 23: MNS Model 2
Michael Arbib and Erhan Oztop
The Mirror Neuron System for Grasping: Visual Processing for the MNS Model, The Virtual Arm, The Core Mirror Neuron Circuit

TRANSCRIPT

Page 1:

Lecture 23: MNS Model 2

Michael Arbib and

Erhan Oztop

The Mirror Neuron System for Grasping: Visual Processing for the MNS model

The Virtual Arm

The Core Mirror Neuron Circuit

Michael Arbib: CS564 - Brain Theory and Artificial Intelligence

University of Southern California, Fall 2001

Page 2:

Visual Input Processing

[Figure: schematic of the MNS model, with the visual input processing stages highlighted. The Visual Cortex feeds STS (hand shape recognition & hand motion detection) and IT (object recognition); cIPS supplies object features to AIP (object affordance extraction, FARS-AIP); MIP/LIP/VIP supply object location (FARS-VIP) to F4 (motor program, FARS-F4). STS and 7a (hand-object spatial relation analysis) project to 7b (object affordance-hand state association; integrate temporal association), which drives F5mirror (action recognition, mirror neurons). AIP drives F5canonical (motor program, FARS-F5), which drives M1 (motor execution); F5mirror provides mirror feedback.]

Page 3:

Reminder: Hand State components

For most components we need to know the (3D) configuration of the hand.

Page 4:

What does it take for (monkey) MNS to work?

Visual Input: Recognition of hand (h) and object (o) and their temporal and spatial relation (r).

The set of all valid (h, o, r) combinations that make up a grasp is very large (actually infinite). The system cannot simply memorize all such combinations and raise a mirror-response flag when a match occurs.

The visual information must be mapped to a lower dimensional space which is simpler to handle.

This space has to capture the grasp type discrimination and temporal and spatial relation characteristics of the hand and object:

The aperture(s) between the fingers and the position of the thumb with respect to the palm are key to defining the hand configuration relevant for grasp recognition.

The disparity between the aperture axes and the object grasp axis, and the distance of the hand to the object, are key to defining the relation of the hand to the object.

Page 5:

Key task

To determine whether the motion and preshape of a moving hand may be extrapolated to culminate in a grasp appropriate to one of the affordances of the observed object.

Hand State

We capture the hand, and its relation to the target object, in the Hand State, a 7-dimensional trajectory H(t) with the following components: H(t) = (d(t), v(t), a(t), o1(t), o2(t), o3(t), o4(t)), where

d(t): distance to target at time t
v(t): tangential velocity of the wrist
a(t): aperture of the virtual fingers involved in grasping at time t
o1(t): angle between the object axis and the (index fingertip – thumb tip) vector
o2(t): angle between the object axis and the (index finger knuckle – thumb tip) vector
o3(t), o4(t): the two angles defining how close the thumb is to the hand, measured relative to the side of the hand and to the inner surface of the palm

Note that the whole history of H(t) during a grasp is required to represent the grasp.
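As a concrete illustration, here is a minimal numpy sketch of how such a trajectory could be assembled from per-frame measurements. The input arrays and their layout are assumptions for the example, not the model's actual data structures:

```python
import numpy as np

def hand_state_trajectory(wrist, obj, apertures, angles, dt):
    """Minimal sketch: assemble the rows H(t) from per-frame measurements.

    wrist:     (T, 3) wrist positions over T video frames
    obj:       (3,)   target object position
    apertures: (T,)   virtual-finger aperture a(t)
    angles:    (T, 4) the four orientation/thumb angles o1..o4
    dt:        frame interval in seconds
    """
    d = np.linalg.norm(wrist - obj, axis=1)            # d(t): distance to target
    v = np.gradient(wrist, dt, axis=0)                 # wrist velocity vectors
    v = np.linalg.norm(v, axis=1)                      # v(t): tangential speed
    return np.column_stack([d, v, apertures, angles])  # (T, 7) trajectory H(t)
```

The (T, 7) matrix keeps the whole history of H(t), matching the note above that the full trajectory, not a single sample, represents the grasp.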

Page 6:

Visual Processing for the MNS model

How much should we attempt to solve? Even though computers are getting more powerful, the vision problem in its general form remains unsolved in engineering. There exist gesture recognition systems for human-computer interaction and sign language interpretation. Our vision system must at least recognize:

1) The Hand and its Configuration

2) Object features

Page 7:

Simplifying the problem

We simplify the problem of recognizing the hand and its configuration by using colored patches on the articulation points of the hand.

If we can extract the patch positions reliably, then we can extract some of the features that make up the hand state by estimating the 3D pose of the hand from the 2D patch positions.

Thus we have 2 steps:

1. Extract the color marker positions

2. Estimate 3D pose

Page 8:

The Color-Coded Hand

• The Vision task is simplified using colored tapes on the joints and articulation points

• The first step of hand configuration analysis is to locate the color patches unambiguously (not easy!).

We use color segmentation, but we have to compensate for lighting, reflection, shading, and wrinkling problems: robust color detection.


Page 9:

Robust Detection of the Colors – RGB space

A color image in a computer is composed of a matrix of pixel triplets (Red, Green, Blue) that define the color of each pixel.

We want to label a given pixel color as belonging to one of the color patches we used to mark the hand, or as not belonging to any class.

A straightforward way to detect whether a given target color (R', G', B') matches the pixel color (R, G, B) is to look at the squared distance

(R - R')^2 + (G - G')^2 + (B - B')^2

with a threshold to do the classification. This does not work well, because shading and different lighting conditions affect the R, G, B values a lot, and our simple nearest-neighbor method fails. For example, an orange patch under shadow is very close to red in RGB space.
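A minimal sketch of this thresholded nearest-neighbor rule; the reference colors and the threshold are made-up values for illustration:

```python
# Hypothetical reference colors for the markers (label -> RGB)
TARGETS = {1: (255, 0, 0), 2: (255, 165, 0), 3: (0, 0, 255)}  # red, orange, blue
THRESHOLD = 60.0 ** 2  # squared-distance threshold (arbitrary)

def classify_pixel(rgb):
    """Label a pixel by nearest reference color in RGB space; 0 = no marker."""
    r, g, b = rgb
    best_label, best_d2 = 0, THRESHOLD
    for label, (rt, gt, bt) in TARGETS.items():
        d2 = (r - rt) ** 2 + (g - gt) ** 2 + (b - bt) ** 2
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label

# A shadowed orange patch drifts toward dark red: it is now closer to the
# red reference than to the orange one, and beyond the threshold, so it
# gets label 0 rather than the orange marker's label.
print(classify_pixel((120, 60, 10)))
```

The failure mode is exactly the one described above: a shadowed marker moves far from its reference color in RGB space.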

But we can do better: train a neural network that can do the labeling for us.

Page 10:

Robust Detection of Colors – the Color Expert

Create a training set using a test image by manually picking colors from the image and specifying their labels.

Create a neural network – in our case a one-hidden-layer feed-forward network – that accepts the R, G, B values as input and outputs the marker label, or 0 for a non-marker color.

Make sure that the network is not too "powerful", so that it does not memorize the training set instead of generalizing.

Train it, then use it: when given a pixel to classify, apply the RGB values of the pixel to the trained network and take its output as the marker label for that pixel.

One then needs a segmentation system to aggregate the pixels into a patch with a single color label.
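A minimal numpy sketch of such a one-hidden-layer color expert trained with plain gradient descent. The training data here is a random placeholder standing in for the manually labeled pixels, and all sizes and learning parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder training set: RGB triplets scaled to [0, 1], with labels.
# Label 0 = non-marker; labels 1..3 = the marker colors picked by hand.
X = rng.random((500, 3))
y = rng.integers(0, 4, size=500)
K = 4                                     # number of output classes
T = np.eye(K)[y]                          # one-hot targets

# One-hidden-layer feed-forward network, kept deliberately small
H = 8
W1 = rng.normal(0, 0.5, (3, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, K)); b2 = np.zeros(K)

def forward(X):
    h = np.tanh(X @ W1 + b1)                          # hidden activations
    z = h @ W2 + b2
    p = np.exp(z - z.max(axis=1, keepdims=True))
    return h, p / p.sum(axis=1, keepdims=True)        # softmax probabilities

lr = 0.5
for epoch in range(2000):                 # plain batch gradient descent
    h, p = forward(X)
    dz = (p - T) / len(X)                 # softmax + cross-entropy gradient
    dW2 = h.T @ dz; db2 = dz.sum(0)
    dh = dz @ W2.T * (1 - h ** 2)         # backprop through tanh
    dW1 = X.T @ dh; db1 = dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

def color_expert(rgb):
    """Label one pixel (RGB scaled to [0, 1]); 0 means non-marker."""
    _, p = forward(np.asarray(rgb, float).reshape(1, 3))
    return int(p.argmax())
```

Keeping the hidden layer small is the simplest way to limit the network's capacity, per the memorization caveat above.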

Page 11:

Color Segmentation and Feature Extraction

[Figure: pipeline from the hand image through preprocessing and an NN-augmented segmentation system, guided by the color expert (network weights), to the extracted features.]

Training phase: A color expert is generated by training a feed-forward network to approximate human perception of color.

Actual processing: The hand image is fed to an augmented segmentation system. The color decision during segmentation is made by consulting the color expert.

Page 12:

STS hand shape recognition

[Figure: the color-coded hand and the features extracted from it.]

Step 1 of hand shape recognition: process the color-coded hand image and generate a set of features: the positions of the markers relative to the wrist.

Page 13:

Hand Model

A realistic drawing of hand bones. The hand is modelled with 14 degrees of freedom, as illustrated.

Page 14:

STS hand shape recognition, step 2: 3D Hand Model Matching

[Figure: result of feature extraction → feature vector → error minimization against the 3D hand model → classification → grasp type.]

The model matching algorithm minimizes the error between the extracted features and the model hand.

Step 2: The feature vector is used to fit a 3D-kinematics model of the hand by the model matching module. The resulting hand configuration is sent to the classification module.
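The matching step is, at heart, an optimization over the hand model's 14 joint angles. A minimal sketch under stated assumptions: model_features is a hypothetical stand-in for the hand model's forward kinematics, and a general-purpose optimizer replaces whatever error-minimization scheme the module actually uses:

```python
import numpy as np
from scipy.optimize import minimize

def model_features(joint_angles):
    """Hypothetical forward model: map the 14 joint angles of the hand
    model to predicted feature values (same layout as the extracted
    feature vector). A real implementation uses the hand's kinematics."""
    return np.tanh(joint_angles)          # placeholder, length 14

def match_hand(extracted_features, q0=None):
    """Fit the 3D hand model to the extracted features by minimizing
    the squared error between predicted and observed features."""
    q0 = np.zeros(14) if q0 is None else q0
    err = lambda q: np.sum((model_features(q) - extracted_features) ** 2)
    return minimize(err, q0, method="Nelder-Mead").x  # fitted configuration

q_fit = match_hand(np.zeros(14))          # hypothetical feature vector
```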

Page 15:

Reach & Grasp generation

[Figure: the same MNS model schematic as on Page 2, now highlighting the reach and grasp generation pathway: AIP (object affordance extraction) → F5canonical (motor program) → M1 (motor execution), and MIP/LIP/VIP (object location) → F4 (motor program).]

Page 16:

Virtual Hand/Arm and Reach/Grasp Simulator

A power grasp and a side grasp

A precision pinch

Page 17:

Kinematics model of arm and hand

19 degrees of freedom (DOF): Shoulder (3), Elbow (1), Wrist (3), Fingers (4×2), Thumb (3)

Implementation Requirements

Rendering: Given the 3D positions of each link's start and end points, generate a 2D representation of the arm/hand (easy)

Forward Kinematics: Given the 19 joint angles, compute the position of each link (easy)

Inverse Kinematics: Given a desired position in space for a particular link, find the joint angles that achieve it (semi-hard)

Reach & Grasp execution: Harder than simple inverse kinematics, since there are more constraints to be satisfied (e.g. multiple target positions to be achieved at the same time)

Page 18:

A 2D, 3DOF arm example

Forward kinematics: given joint angles A,B,C compute the end effector position P:

X = a*cos(A) + b*cos(B) + c*cos(C)

Y = a*sin(A) + b*sin(B) + c*sin(C)
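The formulas translate directly to code; a minimal sketch, with the angles taken (as in the formulas) to be absolute angles measured from the x-axis:

```python
import numpy as np

def forward_kinematics(angles, lengths):
    """End-effector position P(x, y) of the planar 3-link arm.

    angles:  (A, B, C) joint angles in radians, measured from the x-axis
    lengths: (a, b, c) link lengths
    """
    A, B, C = angles
    a, b, c = lengths
    x = a * np.cos(A) + b * np.cos(B) + c * np.cos(C)
    y = a * np.sin(A) + b * np.sin(B) + c * np.sin(C)
    return np.array([x, y])

print(forward_kinematics((0.0, np.pi / 4, np.pi / 2), (1.0, 1.0, 0.5)))
```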

[Figure: a planar 3-link arm with link lengths a, b, c, joint angles A, B, C measured from the x-axis, and end effector at P(x, y).]

Inverse kinematics: given an end-effector position P, there are infinitely many joint-angle triplets that achieve it.

[Figure: the redundancy illustrated. A circle of radius a about the base and a circle of radius c about P(x, y); the segments connecting the two circles all have the same length b, and each such segment corresponds to a different joint-angle solution reaching P.]

Page 19:

A Simple Inverse Kinematics Solution

Consider just the arm. The forward kinematics of the arm can be represented as a vector function that maps the joint angles of the arm to the wrist position:

(x, y, z) = F(s1, s2, s3, e), where s1, s2, s3 are the shoulder angles and e is the elbow angle.

We can formulate the inverse kinematics problem as an optimization problem: given the desired position P' = (x', y', z') to be achieved, we introduce the error function

J = || P' - F(s1, s2, s3, e) ||

Then we can compute the gradient of J with respect to s1, s2, s3, e and follow the negative gradient to reach the minimum of J.

This method is called the Jacobian Transpose method, as the partial derivatives of F encountered in the above process can be arranged into the transpose of a special derivative matrix called the Jacobian (of F).
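A minimal sketch of this gradient-following idea, using the planar 3-link arm from the previous page in place of the 4-DOF arm F(s1, s2, s3, e), a squared error (so the gradient is smooth), and a finite-difference gradient instead of the analytic Jacobian transpose:

```python
import numpy as np

# Planar 3-link arm (link lengths 1, 1, 0.5) standing in for F(s1, s2, s3, e)
def arm(q):
    return np.array([np.cos(q[0]) + np.cos(q[1]) + 0.5 * np.cos(q[2]),
                     np.sin(q[0]) + np.sin(q[1]) + 0.5 * np.sin(q[2])])

def solve_ik(F, target, q0, lr=0.05, steps=5000, eps=1e-5):
    """Minimize J(q) = ||target - F(q)||^2 by following the negative gradient."""
    q = np.asarray(q0, dtype=float)
    for _ in range(steps):
        J = np.sum((target - F(q)) ** 2)       # current squared error
        if J < 1e-10:
            break
        grad = np.empty_like(q)
        for i in range(len(q)):                # finite-difference dJ/dq_i
            dq = q.copy(); dq[i] += eps
            grad[i] = (np.sum((target - F(dq)) ** 2) - J) / eps
        q -= lr * grad                         # follow the negative gradient
    return q

q = solve_ik(arm, np.array([1.2, 0.8]), q0=[0.1, 0.2, 0.3])
print(arm(q))   # should land close to the target (1.2, 0.8)
```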

Page 20:

Power grasp time series data

[Figure legend: +: aperture; *: angle 1; x: angle 2; the remaining curves show axis disparity 1, axis disparity 2, speed, and distance.]

Page 21:

Curve recognition

The general problem: associate N-dimensional space curves with object affordances

A special case: The recognition of two (or three) dimensional trajectories in physical space

Simplest solution: Map temporal information into spatial domain. Then apply known pattern recognition techniques.

Problem with the simplest solution: the spatial representation may change drastically with the speed of the moving point!

Scaling can overcome the problem. However, the scaling must be such that it preserves the generalization ability of the pattern recognition engine.

Solution: Fit a cubic spline to the sampled values. Then normalize and re-sample from the spline curve.

Result: Very good generalization. Better performance than using the Fourier coefficients to recognize curves.
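A minimal sketch of the spline-and-resample step using scipy; the input here is a hypothetical 1-D component of the hand state, and the output size of 30 follows the spatial resolution on the next page:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def normalize_curve(samples, n_out=30):
    """Fit a cubic spline to the sampled values, then re-sample n_out
    points uniformly in normalized time. This removes the effect of
    movement speed on the spatial representation."""
    samples = np.asarray(samples, dtype=float)
    t = np.linspace(0.0, 1.0, len(samples))    # normalized time per sample
    spline = CubicSpline(t, samples)
    return spline(np.linspace(0.0, 1.0, n_out))

# Hypothetical aperture trace sampled at 12 frames, resampled to 30 points:
aperture = [0.9, 0.85, 0.8, 0.75, 0.72, 0.7, 0.6, 0.45, 0.3, 0.2, 0.15, 0.1]
print(normalize_curve(aperture).shape)         # (30,)
```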

Page 22:

Curve recognition

Spatial resolution: 30

Network input size: 30

Hidden layer size: 15

Output size: 5

Training: back-propagation with momentum and an adaptive learning rate.

[Figure legend: sampled points; points used for spline interpolation; fitted spline.]

Curve recognition system demonstrated on hand-drawn numeral recognition (successful recognition examples for 2, 8 and 3).

Page 23:

Core Mirror Circuit

[Figure: the same MNS model schematic, now highlighting the core mirror circuit: 7b (object affordance-hand state association), F5mirror (action recognition, mirror neurons), and the mirror feedback path.]

Page 24:

Core Mirror Circuit

[Figure: hand state and object affordance feed the association neurons; the association neurons drive the mirror neurons (F5mirror), producing the mirror neuron output; the motor program and mirror feedback complete the circuit.]

What is to be learned?

Connections from the hand state and object affordance are adapted so that the association neurons respond when the hand state is congruent with the object

Connections from association neurons are adapted so that their integrated activity will activate mirror neurons for the appropriate grasp
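To make the data flow concrete, here is a minimal sketch of how the trained circuit might be queried: each hand-state component is resampled to a fixed length (as in the curve-recognition slides), the components are concatenated with an object-affordance code, and the result is classified by a feed-forward net. The sizes, names, and the net_forward argument are illustrative assumptions, not the authors' exact architecture:

```python
import numpy as np

def resample(curve, n_out=30):
    """Uniformly re-sample one hand-state component in normalized time
    (a cruder stand-in for the cubic-spline normalization above)."""
    t = np.linspace(0.0, 1.0, len(curve))
    return np.interp(np.linspace(0.0, 1.0, n_out), t, curve)

def mirror_response(hand_state_traj, affordance, net_forward, n_out=30):
    """hand_state_traj: (T, 7) trajectory H(t); affordance: fixed-length
    code; net_forward: the trained feed-forward classifier. The output
    activities play the role of grasp-specific mirror neurons
    (power / precision / side)."""
    traj = np.asarray(hand_state_traj, dtype=float)
    spatial = np.concatenate(
        [resample(traj[:, k], n_out) for k in range(7)])
    return net_forward(np.concatenate([spatial, affordance]))
```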

Page 25:

Connectivity pattern

[Figure: connectivity among 7a, STS, 7b, object affordance (AIP), F5mirror, the motor program (F5canonical), and the mirror feedback path.]

Page 26:

A single grasp trajectory viewed from three different angles

The wrist trajectory during the grasp is shown by square traces; the distance between any two consecutive trace marks is traveled in an equal time interval.

How the network classifies the action as a power grasp. Empty squares: power grasp output; filled squares: precision grasp output; crosses: side grasp output.

Page 27:

Power and precision grasp resolution


Page 28:

“Spatial Perturbation” Experiment with trained core mirror circuit

Figure A. A regular precision grasp (the hand spatially coincides with the target).

Figure B. The response of the network: the action is classified as a precision grasp.

Figure C. The target object is displaced to create a 'fake' grasp.

Figure D. The response of the network to the action in Figure C: the activity of the precision mirror neuron is reduced. In the graphs, the x axes represent normalized time (0 at the start of the grasp, 1 at contact with the object) and the y axes represent cell firing rate.


Page 29:

“Kinematics Alteration” Experiment with the trained core mirror circuit

Figure A. A regular precision grasp (the wrist has a bell-shaped velocity profile).

Figure B. The velocity profile is (almost) linear.

Figure C. Classification of the action in Figure A as a precision grasp.

Figure D. The activity vanishes during observation of the action in Figure B.

Note that the scales of graphs C and D are different.

[Figure: panels A and B show the velocity profiles (normalized speed vs. normalized time); panels C and D show the corresponding mirror responses (firing rate vs. normalized time).]

Page 30:

Correspondence of MNS Model with Monkey Mirror Neuron System

Monkey: Monkey A or B executes a grasp.
Model: Grasp execution by the grasp simulator (strategy: inverse kinematics).

Monkey: Monkey A's visual system processes the visual stimuli generated.
Model: Preprocessing by temporal-to-spatial conversion (strategy: curve fitting, time and space scaling).

Monkey: Monkey A's F5 mirror neurons respond to their preferred stimuli.
Model: Action recognition/learning by the core mirror circuit (engine: back-prop net).

Page 31:

Experimental Challenges

What are “poor” mirror neurons coding?

- temporal recognition codes

- transient response to actions which are not exactly the preferred stimuli

How can we relate different cells’ responses to each other?

- Fix the condition and record from as many cells as possible under exactly the same condition.

Is it possible to record from mirror cells in monkeys of different age groups (i.e. infant to adult)?

Page 32:

Modeling Challenges

How can the MNS be plugged into a learning-by-imitation system while staying faithful to biological constraints (BG, cerebellum, SMA, PFC, etc.)?

How does the brain handle temporal data? Transform the learning network into one that can work directly on temporal data, eliminating the preprocessing required before the input can be applied to the MNS core circuit.

Extend the actions to be recognized beyond simple grasps.

Model the complementary circuit: learning to grasp by trial and error.

And a lot more!