This assignment focuses on the coupling between perception and manipulation using eye-to-hand camera coordination. Towards this goal, we have developed a simulation environment in PyBullet, where a Universal Robots UR5e arm with a two-fingered Robotiq 2F-140 gripper perceives the environment through an RGB-D camera. The experimental setup for this assignment is shown in Fig. 1. This setup makes it possible to extensively evaluate different object grasping approaches.
Figure 1: Our experimental setup consists of a table, a basket, a UR5e robotic arm, and objects from the YCB dataset. The green rectangle shows the robot's workspace, and the camera icon indicates the pose of the camera in the environment. Synthesized RGB and depth images, together with a segmentation mask, are shown on the left side of the figure.
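As a rough illustration of how such an eye-to-hand camera can be simulated, the sketch below renders RGB, depth, and segmentation images from a fixed camera in PyBullet. The camera pose, intrinsics, and URDF paths are illustrative assumptions and do not correspond to the values used in the provided code.

```python
import numpy as np
import pybullet as p
import pybullet_data

# Illustrative values only; the provided environment defines its own camera pose and intrinsics.
IMG_W, IMG_H = 480, 480
NEAR, FAR = 0.01, 2.0
CAM_EYE, CAM_TARGET = [0.0, -0.6, 0.9], [0.0, -0.6, 0.0]   # fixed camera above the workspace

p.connect(p.DIRECT)                                        # use p.GUI to visualize
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.loadURDF("plane.urdf")
p.loadURDF("table/table.urdf", basePosition=[0.0, -0.6, 0.0])

view = p.computeViewMatrix(cameraEyePosition=CAM_EYE,
                           cameraTargetPosition=CAM_TARGET,
                           cameraUpVector=[0, 1, 0])
proj = p.computeProjectionMatrixFOV(fov=60, aspect=IMG_W / IMG_H,
                                    nearVal=NEAR, farVal=FAR)

# Render RGB, depth buffer, and segmentation mask from the fixed camera.
_, _, rgb, depth_buf, seg = p.getCameraImage(IMG_W, IMG_H, view, proj)
rgb = np.reshape(rgb, (IMG_H, IMG_W, 4))[:, :, :3]

# Convert the non-linear depth buffer to metric depth (in meters).
depth_buf = np.reshape(depth_buf, (IMG_H, IMG_W))
depth = FAR * NEAR / (FAR - (FAR - NEAR) * depth_buf)
```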
Service robots typically use a perception system to perceive the world. In particular, the perception system provides valuable information that the robot has to consider when interacting with users and environments. To assist humans in various daily tasks, a robot needs to know how to grasp and manipulate objects in different situations. For instance, consider a clear-table task, where a robot needs to remove all objects from a table and put them into a basket. Such tasks consist of two phases: the first is dedicated to perceiving the object, and the second to planning and executing the manipulation task. In this assignment, we mainly focus on deep visual object grasping and manipulation.
Traditional object grasping approaches explicitly model how to grasp different objects by considering prior knowledge about object shape and pose. Obtaining such prior information for never-seen-before objects in human-centric environments has proven difficult. More recent approaches try to overcome this limitation by formulating object grasping as an object-agnostic problem, in which grasp configurations are predicted from learned visual features without considering prior object-specific information. In this vein, much attention has been given to object grasping approaches based on Convolutional Neural Networks (CNNs). Among deep visual grasping approaches, GR-ConvNet [1] has shown state-of-the-art results. In particular, GR-ConvNet receives an RGB image and a depth image and generates a pixel-wise grasp configuration. As an example of how to use a CNN in visual grasping experiments, we have integrated GR-ConvNet into our setup.
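To make the pixel-wise output concrete, the sketch below shows one way to decode a single best grasp from the quality, angle, and width maps of the kind GR-ConvNet predicts. The array names and the returned dictionary are assumptions for illustration; the provided code ships its own post-processing utilities.

```python
import numpy as np

def best_grasp_from_maps(quality, cos2theta, sin2theta, width):
    """Pick the highest-quality pixel-wise grasp.

    All inputs are assumed to be (H, W) arrays, as produced by a
    GR-ConvNet-style network: a grasp quality map, the grasp angle
    encoded as cos(2*theta) and sin(2*theta), and a gripper width map.
    """
    # Pixel with the highest predicted grasp quality.
    v, u = np.unravel_index(np.argmax(quality), quality.shape)

    # Decode the grasp angle; the 2*theta encoding handles the
    # 180-degree symmetry of a parallel-jaw grasp.
    theta = 0.5 * np.arctan2(sin2theta[v, u], cos2theta[v, u])

    return {"pixel": (u, v),        # image coordinates of the grasp center
            "angle": theta,         # gripper rotation about the camera axis (rad)
            "width": width[v, u],   # predicted opening width (network units)
            "quality": quality[v, u]}
```

The selected pixel is then back-projected to a 3D grasp pose using the depth image together with the camera intrinsic and extrinsic parameters.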
Your tasks
This assignment consists of two parts. For the first part, you need to understand and describe how this system works by reading the GR-ConvNet paper, checking the provided code, and examining its performance in isolated, packed, and pile scenarios. We explain the evaluation scenarios and metrics below.
For the second part of this assignment, you need to select another deep visual grasping approach and integrate it into the system. Similar to the previous part, you need to evaluate the model in isolated, packed, and pile scenarios. Finally, you need to analyze the obtained results and compare them with the GR-ConvNet model.
It should be noted that you may select a 3D-based deep learning approach (e.g., [3, 4, 5, 6]), a depth-only approach (e.g., [7]), or even a data-driven approach (e.g., [8]) instead of an RGB-D-based approach. To convert the RGB-D image to a point cloud, you can use the Open3D library. In the case of deep learning approaches, we strongly recommend using pre-trained models.
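A minimal sketch of the RGB-D-to-point-cloud conversion with Open3D is shown below, assuming a metric depth image and known pinhole intrinsics (the parameter values are placeholders, not those of the provided camera):

```python
import numpy as np
import open3d as o3d

def rgbd_to_pointcloud(rgb, depth, fx, fy, cx, cy):
    """Convert an RGB image (H, W, 3, uint8) and a metric depth image
    (H, W, float32, in meters) into an Open3D point cloud."""
    h, w = depth.shape
    color_o3d = o3d.geometry.Image(np.ascontiguousarray(rgb).astype(np.uint8))
    depth_o3d = o3d.geometry.Image(depth.astype(np.float32))

    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color_o3d, depth_o3d,
        depth_scale=1.0,             # depth is already in meters
        depth_trunc=2.0,             # ignore points farther than 2 m
        convert_rgb_to_intensity=False)

    intrinsic = o3d.camera.PinholeCameraIntrinsic(w, h, fx, fy, cx, cy)
    return o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
```

The resulting cloud is expressed in the camera frame; transform it with the camera extrinsics (e.g., `cloud.transform(T_world_camera)`) before predicting grasps in the robot's workspace.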
To evaluate a grasping approach, you need to perform 10 rounds of experiments per scenario and analyze the obtained results. In the case of pile and packed scenarios, for each experiment, we randomly generate a new scene consisting of five objects (see Fig. 3). For the isolated-object scenario, we place a randomly selected object in an arbitrary pose inside the robot's workspace. In all experiments, the robot knows the pose of the basket (the placing area) in advance, while it needs to predict grasp configurations for the given scene, select the best graspable pose, grasp the target object, pick it up, and put it in the basket. Note that, at the beginning of each experiment, we reset the robot to a predefined configuration and randomly place objects on the table.
A particular grasp is recorded as a success if the object is inside the basket at the end of the experiment. You need to report the performance of an approach by measuring
$$\text{success\_rate} = \frac{\text{number of successful grasps}}{\text{number of attempts}}.$$
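For instance, assuming grasp outcomes are logged as a list of booleans (an assumed format, not part of the provided code), the metric reduces to:

```python
def success_rate(attempts):
    """attempts: one boolean per grasp attempt (True = object ended up in the basket)."""
    return sum(attempts) / len(attempts) if attempts else 0.0

# Example: 7 successful grasps out of 10 attempts -> 0.7
print(success_rate([True] * 7 + [False] * 3))
```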
In the case of the pile scenario, to generate a simulated scene containing five objects, we randomly spawn objects into a box placed on top of the table. We wait for a couple of seconds until all objects become stable, and then remove the box. To generate a packed scenario, we iteratively place a set of objects next to each other in the workspace. These procedures are shown in Fig. 3.
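A minimal sketch of the pile-generation procedure under these assumptions (hypothetical URDF paths and drop region, not the provided implementation) could look as follows:

```python
import random
import pybullet as p

def generate_pile(object_urdfs, box_id, n_objects=5):
    """Spawn n_objects random objects above a box, let them settle, then
    remove the box so a pile of objects remains on the table.

    object_urdfs: list of URDF paths (e.g., YCB object models); the paths
    and drop region used here are illustrative assumptions.
    """
    body_ids = []
    for urdf in random.sample(object_urdfs, n_objects):
        # Drop each object from a random pose above the box.
        pos = [random.uniform(-0.1, 0.1), -0.6 + random.uniform(-0.1, 0.1), 0.6]
        orn = p.getQuaternionFromEuler([random.uniform(0, 3.14) for _ in range(3)])
        body_ids.append(p.loadURDF(urdf, basePosition=pos, baseOrientation=orn))
        # Let the object fall before spawning the next one.
        for _ in range(120):
            p.stepSimulation()

    # Wait a couple of seconds of simulated time until the objects are stable.
    for _ in range(480):
        p.stepSimulation()

    # Remove the box so only the pile remains on the table.
    p.removeBody(box_id)
    return body_ids
```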
For the pile and packed scenarios, in addition to the success_rate, you need to report the average percentage of objects removed from the workspace. An experiment continues until either all objects are removed from the workspace or four consecutive failures occur. Note that the system automatically reports a sum-