The problem of defining objects reflects the problem of segmenting the visual scene. Criteria such as contrast, binocular disparity, and motion can be applied, but any of them can fail in a given situation. In the present system, optic flow was used to detect and segment an object after impact with the arm end point, combined with segmentation based on shape, color, and behavior data. Depending on the available motor repertoire, the robot could explore a range of possible object behaviors (affordances) and form an object model combining both the sensorial and motor properties of the objects that the robot had the chance to interact with.

In a typical experiment, the human operator waves an object in front of the robot, which reacts by looking at it. If the object is dropped on the table in front of the robot, a reaching action is initiated, and the robot possibly makes contact with the object.
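As a rough illustration of how such impact-triggered, motion-based segmentation might be implemented (this is a sketch, not the system's actual code; it uses OpenCV's dense Farneback optic flow, and the threshold value is a placeholder):

```python
import cv2
import numpy as np

def segment_on_impact(prev_gray, curr_gray, mag_thresh=1.0):
    """Segment whatever starts moving when the arm end point hits it.

    prev_gray, curr_gray: consecutive grayscale frames bracketing the
    moment of impact. Returns a binary mask of the largest coherently
    moving region, or None if nothing moved enough.
    """
    # Dense optic flow between the two frames (Farneback's method).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)

    # Pixels that moved form the candidate object (plus the flipper,
    # which could be discounted using its known commanded position).
    moving = (magnitude > mag_thresh).astype(np.uint8)
    n_labels, labels = cv2.connectedComponents(moving)
    if n_labels <= 1:
        return None

    # Keep the largest moving blob as the object segmentation.
    sizes = [(labels == i).sum() for i in range(1, n_labels)]
    return (labels == 1 + int(np.argmax(sizes))).astype(np.uint8)
```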
Vision is used during the reaching and touching movement to guide the robot's flipper toward the object, to segment the hand from the object upon contact, and to collect information about the behavior of the object caused by the application of a certain action (see Fitzpatrick [62.127]). Unfortunately, the interaction of the robot's flipper with objects does not give rise to a wide class of different affordances, and so this study focused on the rolling affordances of a toy car, an orange juice bottle, a ball, and a colored toy cube. Besides reaching, the robot's motor repertoire consisted of four different stereotyped approach movements covering a range of directions of about 180° around the object.

For each successful trial, the robot stored the result of the segmentation, the object's principal axis (selected as a representative shape parameter), the action – initially chosen at random from the set of four approach directions – and the movement of the center of mass of the object for some hundreds of milliseconds after impact with the flipper was detected. This gives information about the rolling properties (affordances) of the different objects; e.g., the car tends to roll along its principal axis, the bottle at a right angle to the axis. Figure 62.9 shows the result of collecting about 700 samples of generic poking actions and estimating the average direction of displacement of the object. Note, for example, that the action labeled backslap (moving the object with the flipper outward from the robot) consistently gives a visual object motion upward in the image plane (corresponding to the peak at −100°, where 0° is the direction parallel to the image x-axis and the y-axis points downward). A similar consideration applies to the other actions. Although crude, this implementation shows that with little pre-existing structure the robot could acquire the crucial elements for building knowledge of objects in terms of their affordances. Given a sufficient level of abstraction, this implementation is close to the response of canonical neurons in F5 and their interaction with neurons observed in AIP that respond to object orientation (see Sakata et al. [62.128]).
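In code, the affordance learning just described amounts to little more than accumulating, per object and per action, the direction in which the center of mass was displaced, and then averaging. A minimal sketch, with hypothetical action labels and the angle convention of Fig. 62.9 (0° along the image x-axis, y pointing downward):

```python
import numpy as np
from collections import defaultdict

# Hypothetical labels for the four stereotyped approach movements.
ACTIONS = ("pull_in", "push_away", "backslap", "side_tap")

# (object, action) -> list of observed displacement directions (degrees)
trials = defaultdict(list)

def record_trial(obj, action, com_before, com_after):
    """Store the direction the object's center of mass moved after impact."""
    dx, dy = np.subtract(com_after, com_before)
    # atan2 in image coordinates: 0 deg along x, y pointing downward.
    trials[(obj, action)].append(np.degrees(np.arctan2(dy, dx)))

def affordance(obj, action):
    """Average displacement direction for this object/action pair, using
    the circular mean so angles near +/-180 deg do not cancel out."""
    angles = np.radians(trials[(obj, action)])
    return np.degrees(np.arctan2(np.sin(angles).mean(),
                                 np.cos(angles).mean()))
```

Over a few hundred trials such a table would converge to the regularities visible in Fig. 62.9, e.g., a peak near −100° for the backslap.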
62.4.3 Mirror Neurons and Imitation

Fitzpatrick and Metta [62.126] also addressed the question of what is further required for interpreting observed actions. Whereas in the previous section the robot identified the motion of the object because of a specific action applied to it, here it could backtrack and derive the type of action from the observed motion of the object. It can further explore what is causing motion and learn about the concept of a manipulator in a more general setting. In fact, the same segmentation procedure mentioned earlier could visually interpret poking actions generated by a human as well as those generated by the robot.

One might argue that observation alone could be exploited for learning about object affordances. This is possibly true to the extent that passive vision is reliable and action is not required. The advantage of the active approach, at least for the robot, is that it allows controlling the amount of information impinging on the visual sensors by, for instance, controlling the speed and type of action. This strategy might be especially useful given the limitations of artificial perceptual systems. Thus, observations can be converted into interpreted actions. The action whose effects are closest to the observed consequences on the object (which we might translate into the goal of the action) is selected as the most plausible interpretation given the observation. Most importantly, interpretation reduces to recovering the simple kinematics of the goal and consequences of the action, rather than to understanding the complex kinematics of the human manipulator. The robot understands only to the extent that it has learned to act. One might note that a more refined model should probably include visual cues from the appearance of the manipulator in the interpretation process. Indeed, the hand state that was central to the Oztop–Arbib model was based on an object-centered view of the hand's trajectory in a coordinate frame based on the object's affordances.

The last question to address is whether the robot can imitate the goal of a poking action. This step is indeed small, since most of the work is actually in interpreting observations. Imitation was then generated by replicating the latest observed human movement with respect to the object and irrespective of its orientation; for example, if the experimenter poked the toy car sideways, the robot would poke it sideways as well, regardless of how the car happened to be oriented at that moment.
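Given the affordance table above, interpretation of an observed poke reduces to a nearest-neighbor lookup over the learned effects, and imitation to executing the selected action oneself. A sketch continuing the previous fragment (robot.execute is a hypothetical motor interface, not part of the described system):

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def interpret(obj, observed_angle):
    """Attribute to the demonstrator the action whose learned effect on
    `obj` best matches the observed object displacement (the 'goal')."""
    return min(ACTIONS,
               key=lambda act: angular_distance(affordance(obj, act),
                                                observed_angle))

def imitate(robot, obj, observed_angle, principal_axis_angle):
    """Reproduce the goal relative to the object's current orientation."""
    action = interpret(obj, observed_angle)
    robot.execute(action, align_to=principal_axis_angle)  # hypothetical API
    return action
```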