Source: https://www.sciencedirect.com/science/article/abs/pii/S0736584523001199
To grasp randomly moving objects in unstructured environments, a novel robotic grasping method based on multi-agent TD3 with high-quality memory (MA-TD3H) is proposed. During grasping, the MA-TD3H algorithm obtains the object's motion state from the vision detection module and outputs the velocity of the gripper. The quality of the sampled memory plays a crucial role in reinforcement learning models. In MA-TD3H, transitions are saved in a memory buffer and a high-quality memory (H-memory) buffer respectively. When updating the actor network, transitions are adaptively sampled from the two buffers at a ratio set according to the algorithm's current grasping success rate. In addition, the multi-agent mechanism enables MA-TD3H to control multiple agents for simultaneous training and experience sharing. In simulation, MA-TD3H improves the success rate of grasping a moving object by around 25 percent compared with TD3, DDPG and SAC, and in most cases it needs only about 80 percent of the time required by the other algorithms. In real-world experiments on grasping objects of different shapes moving along different trajectories, the average grasping prediction success rate (GPSR) and grasping reaching success rate (GRSR) of MA-TD3H are above 90 percent and 80 percent respectively, and the average GRSR is improved by 20–30 percent compared with the other algorithms. In summary, simulated and real-world experiments validate that MA-TD3H outperforms the other algorithms in robotic grasping of moving objects.
Introduction
Nowadays, service-oriented robots are widely used in various fields such as medical care, transportation and manufacturing. In smart manufacturing, robotic arms have been applied to workpiece picking [1], placement [2] and perception [3,4], relieving workers of repetitive tasks and improving productivity. Robotic grasping [5] is a fundamental problem in intelligent assembly: robotic arms can assist workers by grasping parts and tools, which greatly improves assembly efficiency. However, most existing research focuses on grasping static objects. In real manufacturing environments, the spatial positions of objects may change dynamically at unpredictable velocities, and there is little research on robotic grasping of arbitrarily moving objects.
Robotic grasping methods are divided into analytic methods and data-driven methods [6,7]. Early researchers used analytic methods to perform robotic grasping tasks. In offline analytic methods, on which tremendous work has been done [8], [9], [10], precise geometric models of the objects are computed and the grasping problem is converted into a constrained optimization problem, in which the optimal grasping movement is found using kinematics, geometric relations and dynamic performance criteria. Thanks to improved sensor performance and expanding market demand, online analytic methods have gradually been applied to grasping moving objects [11,12]; they use information returned from tactile [13], visual [14] and force sensors [15] to compute the dynamic model in real time and thereby adjust the posture of the robotic arm. Among them, vision-based online analytic methods for grasping moving objects are well developed thanks to their detailed environmental information, wide field of view and high accuracy. In [16,17], the authors used a vision system to compute the 3D position of each pixel in the image and obtain the position of the moving object in real time, then applied a motion planner to update the arm's joint-level servos so that the end effector approached the moving object. In [18,19], the authors used different candidate search regions to train multiple tracking experts with the proposed filter, achieving good performance on fast-moving objects; a manipulator equipped with an eye-in-hand depth camera [20] could then approach the moving targets and accomplish the grasp. In short, when the geometric models of the moving objects are known a priori and their motion state is visually detected, online analytic methods can complete the grasping task for moving objects.
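As an aside, the per-pixel 3D localization used in vision-based methods such as [16,17] typically amounts to back-projecting a pixel with its depth value through a pinhole camera model. The following minimal sketch illustrates that idea only; the function name and the intrinsic parameters (fx, fy, cx, cy) are illustrative assumptions, not values from the cited works.

```python
import numpy as np

def pixel_to_camera_point(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into a 3D point
    in the camera frame, assuming a pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Example: pixel (320, 240) observed at 0.8 m with hypothetical intrinsics.
point = pixel_to_camera_point(320, 240, 0.8, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
print(point)  # approximately [0.0, 0.0, 0.8]
```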
Because prior knowledge of the moving objects and the environment is often unavailable, more and more data-driven methods have been explored in recent years [21], [22], [23], [24], [25], [26], [27]. In [22], the authors divided the robotic grasping pose into grasping angle, grasping position and grasping width, and trained a lightweight convolutional neural network (CNN) called GGCNN to compute them separately. GGCNN can capture the position change of a moving object almost in real time and perform the grasp. In [6], the authors trained two networks simultaneously to grasp the moving object, one predicting the grasp matrix from the CGD dataset and the other performing visual servoing control to keep the moving object within the camera's field of view. In [23], the authors proposed a deep grasp-detection network to detect the grasping rectangle from the visual image, and then used haptic sensors to enhance the stability of the robotic arm during grasping.
To avoid the shortcomings of deep learning (DL) approaches, such as susceptibility to the environment, the long time needed to build a labeled database, and the dependence of grasping performance on the supervisor's ability to interpret the grasping posture, deep reinforcement learning (DRL) has been applied to robotic grasping for moving objects [28]. Typical RL algorithms, such as DDPG [29], SAC [30] and TD3 [31], are trial-and-error approaches that maximize a cumulative reward, which allows an object-agnostic grasping procedure without environment-specific modeling. RL-based grasping methods for moving objects search for actions that obtain a higher reward during training, which leads to successful grasps of the moving object [32]. In [33], a DRL grasping method for moving objects that does not require a large amount of training data was proposed; it took human grasping examples as prior knowledge and simulated various possible actions with "action view" based rendering. Experiments validated that the method could grasp the object effectively even if the object's position was dynamically changed after each grasp attempt. In [34], the DDPG algorithm was applied to grasping a moving workpiece in automatic stamping; given DDPG's low success rate, Hindsight Experience Replay (HER) was used to improve it. However, that work reported only simulations and no real-world experiments. In [35], the SAC algorithm was deployed on a Baxter robot equipped with an RGB-D camera to grasp moving objects, and good results were achieved in both simulation and real-world experiments; however, the effect of replay-buffer quality on convergence was not taken into account, leading to long training times and low training efficiency. In [36], the YOLO algorithm was used to recognize the moving object, and a moving-object detection and prediction network was designed by combining a CNN with a Long Short-Term Memory (LSTM) network. The CNN predicted the position and grasping angle of the moving object, while the LSTM took the object's three latest positions as input and output five future grasping positions. Since the predicted positions depend closely on these inputs, the LSTM fails to predict future positions when the object moves along an irregular path. In summary, RL-based robotic grasping of moving objects still faces many challenges, such as poor real-time performance, long training times and difficulty in deploying simulation-trained models to real-world experiments.
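To make the "three positions in, five positions out" prediction scheme of [36] concrete, the following is a toy sketch of such an LSTM regressor. The class name, hidden size and output head are assumptions for illustration only and are not the architecture used in [36].

```python
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    """Toy predictor in the spirit of [36]: takes the 3 most recent
    2D object positions and regresses 5 future positions."""
    def __init__(self, hidden_size=64, future_steps=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, future_steps * 2)
        self.future_steps = future_steps

    def forward(self, past_positions):            # (batch, 3, 2)
        _, (h, _) = self.lstm(past_positions)     # h: (1, batch, hidden)
        out = self.head(h[-1])                    # (batch, future_steps * 2)
        return out.view(-1, self.future_steps, 2)

# Usage: one trajectory snippet with 3 observed (x, y) positions.
model = TrajectoryLSTM()
past = torch.tensor([[[0.10, 0.20], [0.12, 0.21], [0.14, 0.22]]])
print(model(past).shape)  # torch.Size([1, 5, 2])
```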
In DRL, asynchronous updates [37,38] and episodic control [37,39] have proved effective in reducing training time and improving reward. Asynchronous updates enable DRL to train several agents simultaneously and share their experiences; the main problem is how the diverse experiences of multiple robotic platforms can be appropriately integrated into a single policy, and properly handling the diversity of the sampled experiences can benefit exploration. Episodic control records highly rewarded transitions and follows a policy that replays sequences of transitions that previously yielded high rewards. However, previous works cannot adaptively adjust the episodic-memory criterion, which may reduce training efficiency and the trained model's capability. Another problem is that the real-world environment may differ from the simulation; sim-to-real transfer [40] must be considered so that a control algorithm trained in simulation can be migrated to the real-world robotic system without fine-tuning.
To solve the above problems, this paper proposes a DRL robotic grasping method for moving objects based on multi-agent TD3 with H-memory (MA-TD3H). To improve the grasping success rate of the trained model, multiple training environments are set up, with an asynchronous agent built in each environment. The robotic grasping system consists of a vision detection module and the MA-TD3H control algorithm. First, the vision detection module obtains RGB and depth images from the camera mounted on the workbench. The RGB images are fed to the YOLOv3 [41] algorithm in the vision detection module to recognize and locate the moving object, and the gripper's grasping pose is obtained by image-processing techniques based on the Hough transform and the Canny operator [42]. Finally, the control algorithm outputs the gripper velocity in real time and guides the gripper to grasp the object. To improve training efficiency, an adaptive H-memory mechanism is proposed. Memory replay allows the agent to learn from past experiences: a memory buffer and an H-memory buffer are established separately, and transitions are stored in one or the other depending on their quality. During training, transitions are sampled from both buffers at a ratio that changes adaptively as training progresses. The principle of H-memory is that when the model's grasping success rate is relatively low, the algorithm should draw more high-quality samples from H-memory to raise the success rate, whereas when the success rate is relatively high, the algorithm should prioritize exploring the environment over exploiting high-quality samples. The trained MA-TD3H algorithm can effectively grasp moving objects in both simulation and the real-world environment.
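To illustrate the dual-buffer idea described above, here is a minimal sketch of an adaptive two-buffer replay store. The class name, the linear ratio schedule (more H-memory sampling when the success rate is low) and the capacity are assumptions for illustration; the paper's actual threshold and ratio rules may differ.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Sketch of the dual-buffer idea: ordinary transitions go to `memory`,
    high-quality ones (e.g. from successful grasps) also go to `h_memory`.
    The sampling ratio adapts to the current grasping success rate."""
    def __init__(self, capacity=100_000):
        self.memory = deque(maxlen=capacity)
        self.h_memory = deque(maxlen=capacity)

    def store(self, transition, high_quality):
        self.memory.append(transition)
        if high_quality:
            self.h_memory.append(transition)

    def sample(self, batch_size, success_rate):
        # Low success rate -> rely more on high-quality memory;
        # high success rate -> rely more on ordinary memory (exploration).
        h_ratio = max(0.0, min(1.0, 1.0 - success_rate))
        n_h = min(int(batch_size * h_ratio), len(self.h_memory))
        n_m = min(batch_size - n_h, len(self.memory))
        return random.sample(self.h_memory, n_h) + random.sample(self.memory, n_m)

# Usage: draw a training batch given the agent's recent grasp success rate.
buf = DualReplayBuffer()
for i in range(200):
    buf.store(("s", "a", "r", "s_next"), high_quality=(i % 5 == 0))
batch = buf.sample(batch_size=32, success_rate=0.4)
print(len(batch))
```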
The contributions of this paper can be summarized as follows:
1)
A robotic grasping method for moving objects based on MA-TD3H is proposed. The H-memory and multi-agent mechanisms improve the convergence speed and grasping capability of MA-TD3H compared with the MA-TD3 (without H-memory) and TD3H (without the multi-agent mechanism) algorithms.
2)
An adaptive effective-sample method called H-memory is proposed. Its parameters, the H-memory threshold and the sampling ratio, are adaptively adjusted according to the grasping success rate during training.
3)
With eye-to-hand calibration and coordinate transformation (a minimal transform sketch follows this list), the grasping model trained in simulation can be deployed to a real-world robotic arm without fine-tuning. Both simulated and real-world experiments validate that MA-TD3H outperforms the TD3, DDPG and SAC algorithms in the same robotic grasping tasks for moving objects.
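Regarding contribution 3, the core of the coordinate transformation is simply applying the camera-to-base extrinsics obtained from eye-to-hand calibration to points detected in the camera frame. The sketch below shows that step only; the function name and the numeric transform are placeholders, not calibration results from the paper.

```python
import numpy as np

def camera_to_base(point_cam, T_base_cam):
    """Transform a 3D point from the camera frame to the robot-base frame
    using a 4x4 homogeneous transform from eye-to-hand calibration."""
    p = np.append(point_cam, 1.0)        # homogeneous coordinates
    return (T_base_cam @ p)[:3]

# Hypothetical extrinsics: camera 1 m above the base, axes aligned.
T_base_cam = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
print(camera_to_base(np.array([0.1, 0.2, 0.5]), T_base_cam))  # [0.1 0.2 1.5]
```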
The remainder of this paper is structured as follows. Section 2 introduces the framework of the robotic grasping model for moving objects. Section 3 presents the detailed robotic grasping method for moving objects based on MA-TD3H. Simulations are reported in Section 4 and real-world experiments in Section 5, followed by the conclusion in Section 6.