For autonomous robots, it is desirable to learn to coordinate primitive skills so that they can effectively solve long-horizon tasks and perform novel ones. Recent advances in hierarchical policy learning have shown that decomposing complex tasks into sequences of primitive skills, called sketches, can enable robots to perform directed exploration in challenging manipulation tasks. However, these methods usually fall short in sequencing skills for a new task without retraining, because the task sketches are mostly hard-coded or learned by deep reinforcement learning. To improve exploration efficiency for long-horizon tasks, we propose Sketch RL, a hierarchical framework that combines supervised learning with reinforcement learning, interactively generates the task sketch, and uses it as a curriculum to guide low-level skill learning. Furthermore, to allow for multitask decomposition and few-shot generalization to new tasks, our method exploits a Vision-based Skill Predictor (VSP) to capture shared subtask structure. Extensive experiments on challenging manipulation tasks demonstrate that Sketch RL substantially outperforms prior baseline methods and can adapt to new tasks with different sketches and to real-world settings.
We develop Sketch RL, an interactive hierarchical policy learning system that embeds the skill transition dynamics of tasks into low-level skill learning. This improves exploration efficiency on complex manipulation tasks and enables generalization to new tasks unseen during training. The first phase trains the Vision-based Skill Predictor (VSP) module to predict the next primitive skill from visual inputs by comparing the structural similarity between the current frame and the key frames of a task. By capturing the shared subtask structure, this vision-based supervised learning paradigm can in principle enable the high-level task planner to handle multitask decomposition and few-shot generalization to new tasks.
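The key-frame comparison described above can be illustrated with a minimal sketch. It is not the VSP itself, which is a learned module; it is a non-learned stand-in that scores the current frame against each key frame with a single-window (global) structural similarity index and returns the skill attached to the best match. The function names `global_ssim` and `predict_next_skill`, the skill labels, and the convention that `key_frames[i]` shows the scene in which `sketch[i]` should be executed are all assumptions made for illustration.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two grayscale images with values in [0, data_range]."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Standard SSIM stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def predict_next_skill(frame, key_frames, sketch):
    """Return the skill whose key frame is most similar to the current frame.

    Assumption: key_frames[i] depicts the scene in which sketch[i] should run.
    """
    scores = [global_ssim(frame, key) for key in key_frames]
    return sketch[int(np.argmax(scores))]


# Toy usage with synthetic 16x16 "frames" and hypothetical skill labels.
rng = np.random.default_rng(0)
sketch = ["reach", "grasp", "place"]
key_frames = [rng.random((16, 16)) for _ in sketch]
current = key_frames[1] + 0.01 * rng.standard_normal((16, 16))
next_skill = predict_next_skill(current, key_frames, sketch)
```

A learned predictor would replace the hand-written similarity score with features trained on labeled frames, which is what would let it share subtask structure across tasks.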
We evaluate our Sketch RL method on four challenging manipulation tasks from the robosuite framework. NutAssembly (N): the robot must pick up the round nut and place it onto a cylindrical peg. Stack (S): a red cube needs to be stacked on top of a green cube. PickPlaceMilk (M): the robot needs to pick up a milk box and move it to the target bin. StackThree (ST): a red cube needs to be stacked on top of a green cube, then a blue cube needs to be stacked on top of the red cube. To train Sketch RL, the simulation environment provides two kinds of observations: high-level visual observations, which include the end-effector and the manipulated objects, at fixed intervals; and low-dimensional states, which consist of proprioceptive states and task-related information, at each decision-making step.
[Figure: the four simulated tasks, NutAssembly (N), Stack (S), PickPlaceMilk (M), and StackThree (ST).]
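The two observation rates described above suggest a two-rate control loop, sketched below. The sketch assumes the visual observations feed the high-level skill predictor and the low-dimensional states feed the low-level policy; the source does not spell out this routing. `run_episode`, `vision_interval`, and the toy stand-in functions are hypothetical names, and no robosuite calls are used.

```python
def run_episode(num_steps, vision_interval, predict_skill, low_level_policy):
    """Query the skill predictor every `vision_interval` steps and the policy every step."""
    log = []
    skill = None
    for step in range(num_steps):
        if step % vision_interval == 0:
            # High-level update: would consume a camera image in a real setup.
            skill = predict_skill(step)
        # Low-level update: would consume proprioceptive and task-related states.
        action = low_level_policy(skill, step)
        log.append((step, skill, action))
    return log


# Toy stand-ins (hypothetical): the active skill switches once step 4 is reached.
def toy_predict_skill(step):
    return "skill_a" if step < 4 else "skill_b"


def toy_policy(skill, step):
    return f"{skill}:action@{step}"


log = run_episode(num_steps=6, vision_interval=2,
                  predict_skill=toy_predict_skill,
                  low_level_policy=toy_policy)
```

Between visual updates the low-level policy keeps acting under the most recently predicted skill, so the interval trades image-processing cost against how quickly a skill switch is noticed.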
The primary experimental question in this paper is whether the learned VSP module is capable of multitask decomposition and of improving the exploration efficiency of policy learning for complex manipulation tasks. We compare our method with relevant prior work that either applies reinforcement learning from scratch or uses hierarchical policy learning. Note that all four manipulation tasks share one VSP module, trained on the datasets {N, S, M}, for generating task sketches.
To evaluate the few-shot generalization of the VSP module to new tasks, we conduct cross-task experiments, selecting one or two tasks for training and a new task for testing. The results show that the attention mechanism plays an important role in capturing the object-agent interaction features of a task, which may be shared across tasks and are predictive of the corresponding skills.
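The cross-task protocol (one or two training tasks, one held-out test task) can be enumerated with a short sketch. It assumes the candidate pool is the four simulated tasks and lists every possible split; the source does not state which of these splits were actually evaluated. `cross_task_splits` is a hypothetical helper name.

```python
from itertools import combinations

TASKS = ["N", "S", "M", "ST"]  # task abbreviations used in the text


def cross_task_splits(tasks, max_train=2):
    """List (train_tasks, test_task) pairs with 1..max_train training tasks."""
    splits = []
    for test in tasks:
        remaining = [t for t in tasks if t != test]
        for k in range(1, max_train + 1):
            for train in combinations(remaining, k):
                splits.append((train, test))
    return splits


splits = cross_task_splits(TASKS)
```

Keeping the test task out of every training set is what makes each split a check on generalization rather than on memorization.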
To validate the practicality of Sketch RL for multi-stage manipulation tasks, we also perform real-world cross-robot evaluations on the Lift and PickPlaceMilk tasks. Quantitatively, we run 20 trials per task and achieve average success rates of 100% on Lift and 90% on PickPlaceMilk.
[Figure: real-world evaluations, Real-Lift and Real-PickPlaceMilk.]