![Page 1: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/1.jpg)
Perception and Perspective in Robotics
Paul Fitzpatrick
MIT CSAIL, USA
![Page 2: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/2.jpg)
experimentation helps perception
Rachel: We have got to find out if [ugly naked guy]'s alive.
Monica: How are we going to do that? There's no way.
Joey: Well there is one way. His window's open – I say, we poke him.
(brandishes the Giant Poking Device)
![Page 3: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/3.jpg)
robots can experiment
Robot: We have got to find out where this object’s boundary is.
Camera: How are we going to do that? There's no way.
Robot: Well there is one way. Looks reachable – I say, let’s poke it.
(brandishes the Giant Poking Limb)
![Page 4: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/4.jpg)
the root of all vision
poking
object segmentation
edge catalog
object detection (recognition, localization, contact-free segmentation)
affordance exploitation (rolling)
manipulator detection (robot, human)
![Page 5: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/5.jpg)
theoretical goal: a virtuous circle
familiar activities
familiar entities (objects, actors, properties, …)
use constraint of familiar activity to
discover unfamiliar entity used within it
reveal the structure of unfamiliar activities by
tracking familiar entities into and through them
![Page 6: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/6.jpg)
practical goal: adaptive robots
Motivated by fallibility
- Complex action and perception will fail
- Need simpler fall-back methods that resolve ambiguity, learn from errors
Motivated by transience
- Task for robot may change from day to day
- Ambient conditions change
- Best to build in adaptivity from the very beginning
Motivated by infants
- Perceptual development outpaces motor development
- Able to explore despite sloppy control
![Page 7: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/7.jpg)
giant poking device: Cog
Head (7 DOFs)
Torso (3 DOFs)
Left arm (6 DOFs)
Right arm (6 DOFs)
Stand (0 DOFs)
![Page 8: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/8.jpg)
giant poking device: Cog
![Page 9: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/9.jpg)
talk overview
Learning from an activity
- Poking: to learn to recognize objects, manipulators, etc.
- Chatting: to learn the names of objects
Learning a new activity
- Searching for an object
- Then back to learning from the activity…
![Page 10: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/10.jpg)
virtuous circle
poking
objects, …
![Page 11: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/11.jpg)
virtuous circle
poking, chatting
objects, words, names, …
ball!
![Page 12: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/12.jpg)
virtuous circle
poking, chatting, search
objects, words, names, …
search
![Page 14: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/14.jpg)
talk overview
Learning from an activity
- Poking: to learn to recognize objects, manipulators, etc.
- Chatting: to learn the names of objects
Learning a new activity
- Searching for an object
- Then back to learning from the activity…
![Page 16: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/16.jpg)
poking
object segmentation
edge catalog
object detection (recognition, localization, contact-free segmentation)
affordance exploitation (rolling)
manipulator detection (robot, human)
![Page 17: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/17.jpg)
poking
object segmentation
![Page 18: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/18.jpg)
“Active Segmentation”
segmenting objects through action
![Page 19: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/19.jpg)
“Active Segmentation”
segmenting objects by coming into contact with them
![Page 20: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/20.jpg)
a simple scene?
Edges of table and cube overlap
Colors of cube and table are poorly separated
Cube has misleading surface pattern
Maybe some cruel grad student faked the cube with paper, or glued it to the table
![Page 21: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/21.jpg)
active segmentation
![Page 22: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/22.jpg)
active segmentation
![Page 23: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/23.jpg)
Sandini et al., 1993
![Page 24: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/24.jpg)
where to poke?
Visual attention system
- Robot selects a region to fixate based on salience (bright colors, motion, etc.)
- Region won’t generally correspond to extent of object
Poking activation
- Region is stationary
- Region reachable (right distance, not too high up)
- Distance measured through binocular disparity
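The poking activation conditions above can be sketched as a simple predicate. This is only an illustration: the `Region` fields, the thresholds, and the camera parameters (`focal_px`, `baseline_m`) are hypothetical values, not taken from the robot. The only non-assumed piece is the standard stereo relation Z = f · B / d for turning disparity into distance.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A fixated image region (all fields are illustrative assumptions)."""
    image_speed: float   # apparent motion of the region, pixels per frame
    disparity_px: float  # binocular disparity of the region, pixels
    height_m: float      # height of the region above the table, metres


def distance_from_disparity(disparity_px, focal_px=500.0, baseline_m=0.1):
    """Standard stereo relation Z = f * B / d; f and B are placeholder values."""
    return focal_px * baseline_m / disparity_px


def should_poke(region, max_speed=0.5, reach_m=(0.3, 0.7), max_height_m=0.4):
    """True when the region is stationary and inside the assumed reachable zone."""
    if region.disparity_px <= 0:
        return False  # no usable depth estimate
    distance = distance_from_disparity(region.disparity_px)
    stationary = region.image_speed <= max_speed
    reachable = (reach_m[0] <= distance <= reach_m[1]
                 and region.height_m <= max_height_m)
    return stationary and reachable
```

For example, `should_poke(Region(image_speed=0.1, disparity_px=100.0, height_m=0.2))` passes all three checks under these placeholder settings.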
![Page 25: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/25.jpg)
visual attention system
attracted to skin color
attracted to bright color, movement
smooth pursuit
attracted to skin color
smooth pursuit
person approaches
shakes object
moves object
hides object
stands up
(Collaboration with Brian Scassellati, Giorgio Metta, Cynthia Breazeal)
![Page 26: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/26.jpg)
tracking
![Page 27: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/27.jpg)
poking activation
![Page 28: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/28.jpg)
evidence for segmentation
Areas where motion is observed upon contact
- classify as ‘foreground’
Areas where motion is observed immediately before contact
- classify as ‘background’
Textured areas where no motion was observed
- classify as ‘background’
Textureless areas where no motion was observed
- no information
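The four evidence rules above can be written as a small per-pixel lookup. This is a minimal sketch: the boolean inputs are assumed to come from earlier motion and texture detectors, and the precedence given to motion before contact (for a pixel that moved both before and upon contact) is an assumption the slide does not state.

```python
FOREGROUND, BACKGROUND, NO_INFORMATION = "foreground", "background", "no information"


def classify_evidence(moved_on_contact, moved_before_contact, textured):
    """Map per-pixel observations to the evidence classes listed on the slide."""
    if moved_before_contact:
        # Assumption: motion seen before contact (arm, shadow) wins over later motion.
        return BACKGROUND
    if moved_on_contact:
        return FOREGROUND
    if textured:
        # Texture would have revealed motion had there been any.
        return BACKGROUND
    return NO_INFORMATION
```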
![Page 29: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/29.jpg)
minimum cut
“allegiance” = cost of assigning two nodes to different layers (foreground versus background)
foreground node
background node
pixel nodes
allegiance to foreground
allegiance to background
pixel-to-pixel allegiance
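A minimum cut over such a graph can be sketched in pure Python. The version below uses the Edmonds-Karp max-flow algorithm as a stand-in, since the slides do not say which solver was used; the function name and the dictionary-based inputs are illustrative assumptions. Pixels still connected to the foreground node after the flow saturates are labelled foreground.

```python
from collections import defaultdict, deque


def min_cut_labels(fg_allegiance, bg_allegiance, pixel_allegiance):
    """fg/bg_allegiance: {pixel: cost}; pixel_allegiance: {(p, q): cost}."""
    SRC, SNK = "foreground-node", "background-node"
    cap = defaultdict(lambda: defaultdict(float))
    for p, cost in fg_allegiance.items():
        cap[SRC][p] += cost
    for p, cost in bg_allegiance.items():
        cap[p][SNK] += cost
    for (p, q), cost in pixel_allegiance.items():
        cap[p][q] += cost  # pixel-to-pixel links are symmetric
        cap[q][p] += cost

    def reachable_from_source():
        # Breadth-first search over edges with remaining capacity.
        parent = {SRC: None}
        queue = deque([SRC])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if SNK not in parent:
            break  # no augmenting path left: the cut is found
        path, v = [], SNK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck

    pixels = set(fg_allegiance) | set(bg_allegiance)
    for p, q in pixel_allegiance:
        pixels.update((p, q))
    return {p: ("foreground" if p in parent else "background") for p in pixels}
```

On a row of five pixels where pixel 0 leans strongly to foreground, pixel 4 strongly to background, and the link between pixels 2 and 3 is the weakest, the cut falls between 2 and 3.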
![Page 31: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/31.jpg)
grouping (on synthetic data)
“figure” points (known motion)
proposed segmentation
“ground” points (stationary, or gripper)
![Page 32: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/32.jpg)
point of contact
(image sequence, frames 1 to 10)
Motion spreads suddenly, faster than the arm itself → contact
Motion spreads continuously (arm or its shadow)
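One way to turn this cue into a detector is to watch how fast the moving area grows from frame to frame. The sketch below is an assumption-laden illustration: it takes a precomputed list of moving-pixel counts per frame and flags the first frame whose growth exceeds a hypothetical `jump_factor` times the largest growth seen so far.

```python
def find_contact_frame(moving_area, jump_factor=3.0):
    """Index of the first frame where motion spreads suddenly, else None.

    moving_area: moving-pixel count per frame (assumed precomputed).
    jump_factor: hypothetical threshold on growth relative to earlier frames.
    """
    largest_growth = None
    for i in range(1, len(moving_area)):
        growth = moving_area[i] - moving_area[i - 1]
        if (largest_growth is not None and largest_growth > 0
                and growth > jump_factor * largest_growth):
            return i  # spread faster than the arm alone could explain
        if largest_growth is None or growth > largest_growth:
            largest_growth = growth
    return None
```

With `moving_area = [10, 14, 18, 23, 27, 90, 95]`, the steady growth of about 4 to 5 pixels per frame is treated as the arm, and the jump at index 5 is reported as contact.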
![Page 33: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/33.jpg)
point of contact
![Page 34: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/34.jpg)
segmentation examples
Impact event
Motion caused (red = novel, purple/blue = discounted)
Segmentation (green/yellow)
Side tap
Back slap
![Page 35: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/35.jpg)
segmentation examples
![Page 36: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/36.jpg)
segmentation examples
table
car
![Page 37: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/37.jpg)
boundary fidelity
[Figure: scatter plot of segmented objects (cube, car, bottle, ball). Horizontal axis: measure of size (area at poking distance), 0 to 0.3. Vertical axis: measure of anisotropy (square root of second Hu moment), 0 to 0.2.]
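The anisotropy measure on the vertical axis can be computed from a segmentation mask using the standard definition of Hu's second moment invariant, φ₂ = (η₂₀ − η₀₂)² + 4η₁₁², where η are the normalized central moments. The sketch below assumes the mask is a list of rows of 0/1 values; the function name is illustrative.

```python
import math


def anisotropy(mask):
    """Square root of Hu's second moment invariant for a binary mask."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("mask contains no foreground pixels")
    m00 = len(points)
    cx = sum(x for x, _ in points) / m00
    cy = sum(y for _, y in points) / m00
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    # Second-order normalization: eta_pq = mu_pq / m00 ** 2.
    eta20, eta02, eta11 = mu20 / m00 ** 2, mu02 / m00 ** 2, mu11 / m00 ** 2
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return math.sqrt(phi2)
```

A square mask gives a value near zero, while an elongated bar gives a clearly larger one, which matches the intent of separating compact objects from elongated ones.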
![Page 38: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/38.jpg)
signal to noise
![Page 39: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/39.jpg)
poking
object segmentation
edge catalog
![Page 40: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/40.jpg)
“Appearance Catalog”
exhaustively characterizing the appearance of a low-level feature
![Page 41: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/41.jpg)
sampling oriented regions
![Page 42: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/42.jpg)
sample samples
![Page 43: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/43.jpg)
most frequent samples
[Figure: samples shown by frequency rank: 1st, 21st, 41st, 61st, 81st, 101st, 121st, 141st, 161st, 181st, 201st, 221st]
![Page 44: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/44.jpg)
selected samples
![Page 45: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/45.jpg)
some tests
Red = horizontal, green = vertical
![Page 46: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/46.jpg)
natural images
0° ± 22.5°, 90° ± 22.5°, 45° ± 22.5°, −45° ± 22.5°
![Page 47: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/47.jpg)
poking
object segmentation
edge catalog
object detection (recognition, localization, contact-free segmentation)
![Page 48: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/48.jpg)
“Open Object Recognition”
detecting and recognizing familiar objects,
enrolling unfamiliar objects
![Page 49: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/49.jpg)
object recognition
Geometry-based
- Objects and images modeled as set of point/surface/volume elements
- Example real-time method: store geometric relationships in hash table
Appearance-based
- Objects and images modeled as set of features closer to raw image
- Example real-time method: use histograms of simple features (e.g. color)
![Page 50: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/50.jpg)
geometry+appearance
Advantages: more selective; fast
Disadvantages: edges can be occluded; 2D method
Property: no need for offline training
angles + colors + relative sizes
invariant to scale, translation, in-plane rotation
![Page 51: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/51.jpg)
details of features
Distinguishing elements:
- Angle between regions (edges)
- Position of regions relative to their projected intersection point (normalized for scale, orientation)
- Color at three sample points along line between region centroids
Output of feature match:
- Predicts approximate center and scale of object if match exists
Weighting for combining features:
- Summed at each possible position of center; consistency check for scale
- Weighted by frequency of occurrence of feature in object examples, and edge length
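The combination step above resembles a weighted vote over candidate object centers. The sketch below is one possible reading, not the method from the talk: each match is a hypothetical `(cx, cy, scale, weight)` tuple, with the weight assumed to already combine feature frequency and edge length. Votes are binned on a coarse grid, and the scale consistency check is approximated by keeping only votes near the median scale of their bin.

```python
import statistics
from collections import defaultdict


def vote_for_center(matches, cell=8, scale_tolerance=0.25):
    """Return (center_x, center_y, scale, score) of the best-supported cell.

    matches: list of (cx, cy, scale, weight) predictions from feature matches.
    cell, scale_tolerance: hypothetical binning and consistency settings.
    """
    bins = defaultdict(list)
    for cx, cy, scale, weight in matches:
        bins[(int(cx // cell), int(cy // cell))].append((scale, weight))

    best = None
    for (bx, by), votes in bins.items():
        reference = statistics.median(s for s, _ in votes)
        # Consistency check: drop votes whose predicted scale disagrees.
        score = sum(w for s, w in votes
                    if abs(s - reference) <= scale_tolerance * reference)
        if best is None or score > best[3]:
            best = ((bx + 0.5) * cell, (by + 0.5) * cell, reference, score)
    return best
```

A match with a large weight but an inconsistent scale is discounted, so it cannot pull the estimate toward its own cell on its own.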
![Page 52: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/52.jpg)
localization example
look for this… …in this
![Page 53: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/53.jpg)
localization example
![Page 54: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/54.jpg)
localization example
![Page 55: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/55.jpg)
localization example
just using geometry
geometry + appearance
![Page 56: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/56.jpg)
other examples
![Page 57: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/57.jpg)
other examples
![Page 58: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/58.jpg)
other examples
![Page 59: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/59.jpg)
just for fun
look for this… …in this
result
![Page 60: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/60.jpg)
real object in real images
![Page 61: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/61.jpg)
yellow on yellow
![Page 62: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/62.jpg)
multiple objects
camera image
response for each object
implicated edges found and grouped
![Page 63: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/63.jpg)
extending the attention system
tracker
motor control (eyes, head, neck)
egocentric map
low-level salience filters
object recognition/localization (wide)
object recognition/localization (foveal)
motor control (arm)
poking sequencer
![Page 64: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/64.jpg)
attention
![Page 65: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/65.jpg)
open object recognition
robot’s current view
recognized object (as seen during poking)
sees ball, “thinks” it is cube
pokes, segments ball
correctly differentiates ball and cube
![Page 66: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/66.jpg)
open object recognition
![Page 67: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/67.jpg)
poking
object segmentation
edge catalog
object detection (recognition, localization, contact-free segmentation)
manipulator detection (robot, human)
![Page 68: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/68.jpg)
finding manipulators
Analogous to finding objects
Object
- Definition: physically coherent structure
- How to find one: poke around and see what moves together
Actor
- Definition: something that acts on objects
- How to find one: see what pokes objects
![Page 69: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/69.jpg)
similar human and robot actions
Object connects robot and human action
![Page 70: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/70.jpg)
catching manipulators in the act
manipulator approaches object
contact!
![Page 71: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/71.jpg)
modeling manipulators
![Page 72: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/72.jpg)
manipulator recognition
![Page 73: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/73.jpg)
poking
object segmentation
edge catalog
object detection (recognition, localization, contact-free segmentation)
affordance exploitation (rolling)
manipulator detection (robot, human)
![Page 74: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/74.jpg)
“Affordance Recognition”
switching from object-centric perception
to recognizing action opportunities
(collaboration with Giorgio Metta)
![Page 75: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/75.jpg)
what is an affordance?
A leaf affords rest/walking to an ant …
… but not to an elephant
![Page 76: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/76.jpg)
exploring affordances
![Page 77: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/77.jpg)
objects roll in different ways
a toy car: it rolls forward
a bottle: it rolls along its side
a toy cube: it doesn’t roll easily
a ball: it rolls in any direction
![Page 78: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/78.jpg)
preferred direction of motion
[Figure: four histograms, one per object: Bottle (pointedness=0.13), Car (pointedness=0.07), Ball (pointedness=0.02), Cube (pointedness=0.03)]
x-axis: difference between angle of motion and principal axis of object (degrees), 0 to 90
y-axis: frequency of occurrence, 0 to 0.5
Annotations: “Rolls at right angles to principal axis”; “Rolls along principal axis”
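A minimal sketch of how a histogram like the ones on this slide could be computed, assuming each poke yields a motion direction and a principal-axis orientation in degrees. The function names, the 10-degree bin width, and the sample data are hypothetical. The "pointedness" measure is not defined on the slide, so it is not reproduced here.

```python
import numpy as np

def axis_motion_difference(motion_deg, axis_deg):
    """Angle between a motion direction and an undirected principal axis,
    folded into the range [0, 90] degrees."""
    diff = np.abs(np.asarray(motion_deg, dtype=float)
                  - np.asarray(axis_deg, dtype=float)) % 180.0
    return np.minimum(diff, 180.0 - diff)

def difference_histogram(motion_deg, axis_deg, bin_width=10.0):
    """Relative frequency of each angle difference (input must be non-empty)."""
    diffs = axis_motion_difference(motion_deg, axis_deg)
    edges = np.arange(0.0, 90.0 + bin_width, bin_width)
    counts, _ = np.histogram(diffs, bins=edges)
    return counts / counts.sum(), edges

# Hypothetical pokes of an object that rolls at right angles to its axis.
motion = [85.0, 95.0, 270.0, 100.0]
axis = [0.0, 0.0, 0.0, 10.0]
freqs, edges = difference_histogram(motion, axis)
```

The principal axis has no head or tail, so the difference is taken modulo 180 degrees and then folded; that is why the horizontal axis only spans 0 to 90.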
![Page 79: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/79.jpg)
affordance exploitation
Caveat: this work uses an early version of object detection (not the one presented today)
![Page 80: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/80.jpg)
mimicry test
Demonstration by human
Mimicry in similar situation
Mimicry when object is rotated
Invoking the object’s natural rolling affordance
Going against the object’s natural rolling affordance
![Page 81: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/81.jpg)
mimicry test
![Page 82: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/82.jpg)
poking
object segmentation
edge catalog
object detection (recognition, localization, contact-free segmentation)
affordance exploitation (rolling)
manipulator detection (robot, human)
![Page 83: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/83.jpg)
talk overview
Learning from an activity
- Poking: to learn to recognize objects, manipulators, etc.
- Chatting: to learn the names of objects
Learning a new activity
- Searching for an object
- Then back to learning from the activity…
![Page 84: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/84.jpg)
open speech recognition
Vocabulary can be extended at any time
Assumes active vocabulary is small
Isolated words only
![Page 85: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/85.jpg)
keeping track of objects
EgoMap
short term memory of objects and their locations
so “out of sight” is not “out of mind”
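A minimal sketch of a short-term object memory in the spirit of the EgoMap. The slide only says it remembers objects and their locations; the head-centered angles, the forgetting horizon `max_age`, and all method names here are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field

@dataclass
class EgoMap:
    """Hypothetical short-term memory: object name -> last seen direction."""
    max_age: float = 60.0  # assumed forgetting horizon, in seconds
    _entries: dict = field(default_factory=dict)

    def observe(self, name, azimuth, elevation, now=None):
        """Record where an object was last seen."""
        now = time.monotonic() if now is None else now
        self._entries[name] = (azimuth, elevation, now)

    def recall(self, name, now=None):
        """Return the remembered direction, or None if unknown or forgotten."""
        now = time.monotonic() if now is None else now
        entry = self._entries.get(name)
        if entry is None:
            return None
        azimuth, elevation, seen_at = entry
        if now - seen_at > self.max_age:
            del self._entries[name]
            return None
        return azimuth, elevation

memory = EgoMap(max_age=10.0)
memory.observe("ball", azimuth=30.0, elevation=-5.0, now=0.0)
```

With this, an object that has left the field of view can still be looked up by name until its entry expires.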
![Page 86: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/86.jpg)
keeping track of objects
![Page 87: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/87.jpg)
speech and space: chatting
![Page 88: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/88.jpg)
talk overview
Learning from an activity
- Poking: to learn to recognize objects, manipulators, etc.
- Chatting: to learn the names of objects
Learning a new activity
- Searching for an object
- Then back to learning from the activity…
![Page 89: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/89.jpg)
Tomasello’s experiments
Designed experiments to challenge constraint-based theory of language acquisition in infants
Wants to show infants learn words through real understanding of activity (‘flow of interaction’), not hacks
Great test cases! Get beyond direct association
(But where does knowledge of activity come from?)
![Page 90: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/90.jpg)
“let’s go find the toma!”
Infant plays with set of objects
Then adult says “let’s go find the toma!” (nonce word)
Acts out a search, going to several objects first before finally finding the ‘toma’
Later, infant tested to see which object it thinks is the ‘toma’
Several variants (e.g. ‘toma’ placed in inaccessible location with the infant watching – adult is upset when trying to get it)
![Page 91: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/91.jpg)
“let’s go find the toma!”
![Page 92: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/92.jpg)
goal
Have robot learn about search activity from examples of looking for known objects
Then apply that to a “find the toma”-like scenario
![Page 93: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/93.jpg)
virtuous circle
poking, chatting
car, ball, cube, and their names
discover car, ball, and cube through poking; discover their names through chatting
![Page 94: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/94.jpg)
virtuous circle
poking, chatting, search
car, ball, cube, and their names
follow named objects into search activity, and observe the structure of search
![Page 95: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/95.jpg)
virtuous circle
poking, chatting, searching
car, ball, cube, toma, and their names
discover object through poking, learn its name (‘toma’) indirectly during search
![Page 96: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/96.jpg)
learning about search
![Page 97: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/97.jpg)
what the robot learns
‘Find’ is followed by mention of an absent object
‘Yes’ is said when a previously absent object is in view
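A minimal sketch of how the two regularities above could be combined to bind a new word to an unnamed object. It is an illustration under stated assumptions, not the robot's actual mechanism: the event encoding, the function name `name_from_search`, and the object identifiers are hypothetical.

```python
def name_from_search(events, known_names):
    """events: time-ordered ("word", w) or ("object", obj_id) tuples.
    known_names: dict mapping already-named object ids to their names.
    Returns (obj_id, new_name), or None if no binding can be made."""
    target_word = None    # word heard right after 'find'
    in_view = None        # most recently seen object id
    previous_word = None
    for kind, value in events:
        if kind == "object":
            in_view = value
        elif kind == "word":
            if previous_word == "find":
                # Regularity 1: 'find' is followed by the absent object's name.
                target_word = value
            elif value == "yes" and target_word is not None and in_view is not None:
                # Regularity 2: 'yes' marks the sought object coming into view.
                if in_view not in known_names:
                    return in_view, target_word
            previous_word = value
    return None

# Hypothetical search: the car is seen first, then an unnamed object.
events = [("word", "find"), ("word", "toma"),
          ("object", "car"), ("object", "obj7"), ("word", "yes")]
binding = name_from_search(events, {"car": "car", "ball": "ball"})
```

The new word is never paired directly with the object when it is first spoken; the link is made only through the structure of the search.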
![Page 98: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/98.jpg)
how it learns this
Look for reliable event/state combinations, sequences
Events are:
- hearing a word
- seeing an object
States are:
- recent events
- situation evaluations (object corresponding to word not present, mismatch between word and object, etc.)
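A minimal sketch of one way to look for reliable event/state combinations: count how often each event occurs in each state and keep the pairs that almost always go together. The thresholds, the function name `reliable_pairs`, and the string labels are assumptions; the slides do not specify the statistic used.

```python
from collections import Counter

def reliable_pairs(observations, min_count=3, min_reliability=0.9):
    """observations: iterable of (state, event) tuples.
    Returns {(state, event): reliability}, where reliability is the
    fraction of times the state was accompanied by that event."""
    state_counts = Counter()
    pair_counts = Counter()
    for state, event in observations:
        state_counts[state] += 1
        pair_counts[(state, event)] += 1
    return {
        pair: n / state_counts[pair[0]]
        for pair, n in pair_counts.items()
        if n >= min_count and n / state_counts[pair[0]] >= min_reliability
    }

# Hypothetical log: state = recent event, event = what was observed next.
log = ([("heard 'find'", "word names absent object")] * 4
       + [("heard 'yes'", "previously absent object in view")] * 3
       + [("heard 'yes'", "no object in view")])
rules = reliable_pairs(log)
```

In this made-up log only the ‘find’ pairing passes both thresholds; the ‘yes’ pairing holds three times out of four and is filtered out.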
![Page 99: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/99.jpg)
finding the toma
![Page 100: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/100.jpg)
caveats
Much much less sophisticated than infants!
Cues the robot is sensitive to are very impoverished
Slightly different from Tomasello’s experiment
Saved state between stages – wasn’t one complete continuous run
![Page 101: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/101.jpg)
conclusions: why do this?
Uses all the ‘alternative essences of intelligence’
- Development
- Social interaction
- Embodiment
- Integration
Points the way to really flexible robots
- today the robot should sort widgets from wombats (neither of which it has seen before)
- who knows what it will have to do tomorrow
![Page 102: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/102.jpg)
conclusions: contributions
active segmentation: through contact
open object recognition: for correction, enrollment
appearance catalog: for oriented features
affordance recognition: for rolling
open speech recognition: for isolated words
virtuous circle of development
learning about and through activity
![Page 103: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/103.jpg)
conclusions: the future
Dexterous manipulation
Object perception (visual, tactile, acoustic)
- During dexterous manipulation
- During failed manipulation
Integration with useful platform
- Socially enabled
- Mobile
![Page 104: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/104.jpg)
![Page 105: and - People | MIT CSAILpeople.csail.mit.edu/.../present/liralab03perception.pdfreveal the structure of unfamiliar activities by tracking familiar entities into and through them practical](https://reader036.vdocuments.us/reader036/viewer/2022062510/611d24a5f830475784337aad/html5/thumbnails/105.jpg)
Tom Murphy (www.cs.cmu.edu/~tom7)