20224701 Jisu Han, 20235333 Haewon Jung, 20218120 YoonYoung Cho

Task and Motion Planning (TAMP) is an important problem in robotics in which an agent must plan over a hybrid action space and a long horizon. As a result, neither discrete search nor the continuous-space optimization of classical motion planning is directly applicable on its own [13]. Traditionally, this problem has been tackled with bilevel planning, where a high-level discrete task plan is refined through sampling-based refinement of its continuous parameters [3, 15]. However, the large branching factor in TAMP domains can result in prohibitively high computational cost for practical applications unless task-informed priors are available to guide the planning process.

To address this, recent works [13, 18] propose using a learned model to guide planning. These approaches plan efficiently at a greatly reduced computational cost by prioritizing predicate evaluations and generating continuous-space parameters from learned priors. Even when armed with such priors, however, conventional autoregressive planning approaches still suffer from (1) the expense of building a full domain specification for the downstream task, and (2) the high cost of planning at each step in long-horizon domains, especially due to backtracking whenever downward refinement fails (see Figure 1(a)).

Figure 1. (a) The baselines [13, 18] generate actions in two stages: the high-level discrete skill parameters and the low-level motion variables are not determined at once, but in a staged manner in which action primitives are produced first and action parameters afterward. Moreover, actions are generated one step at a time. (b) Our method outputs all actions at once, jointly reasoning across time steps and over both high- and low-level action information.


An alternative paradigm has recently been proposed in motion planning, where the planning problem is cast as inference [1]. By sampling from a learned generative model, works such as Diffuser [1] address the issues arising from autoregressive branching by inferring the full trajectory at once. Moreover, since the planning algorithm is learned directly from data, we can forgo the expense of constructing a full domain specification and learn solely from demonstrations. While this paradigm has been shown to be effective in motion planning, its efficacy has yet to be demonstrated in TAMP domains, mainly due to the difficulty of handling hybrid action spaces.

To resolve this, we leverage recent advances in diffusion models such as D3PM [6], which handle discrete variables, to solve TAMP problems. By adopting hybrid-space diffusion, we expect our approach to offer two main benefits:

  1. Instead of bilevel planning, simultaneously optimizing the high-level discrete variables and the low-level continuous parameters via diffusion allows both levels to influence each other, improving plan consistency and ultimately boosting performance.
  2. By predicting actions at the trajectory level, our approach sidesteps the exponential growth of planning time in long-horizon domains with hybrid action spaces, which would otherwise demand an enormous search over continuous variables.

An overview of our approach, compared to the baselines, is illustrated in Figure 1.
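
To make the contrast in Figure 1 concrete, the following sketch shows one possible tensor encoding of a hybrid plan that a trajectory-level model would denoise jointly; the skill vocabulary, horizon, and parameter dimension are illustrative assumptions, not fixed by our method.

```python
import torch

# Hypothetical TAMP domain sizes; all values below are illustrative assumptions.
NUM_SKILLS = 4   # e.g., PICK, PLACE, PUSH, MOVE
PARAM_DIM = 6    # continuous arguments per step (e.g., a grasp pose)
HORIZON = 12     # fixed plan length for this sketch

# A hybrid plan is a pair of aligned tensors spanning the full horizon:
#   skills: (HORIZON,)            discrete skill indices
#   params: (HORIZON, PARAM_DIM)  continuous skill arguments
skills = torch.randint(0, NUM_SKILLS, (HORIZON,))
params = torch.randn(HORIZON, PARAM_DIM)

# An autoregressive planner commits to skills[t], refines params[t], and
# backtracks on failure. A trajectory-level diffusion planner instead treats
# (skills, params) as a single sample and refines both tensors jointly over
# all timesteps in every denoising iteration.
print(skills.shape, params.shape)
```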


Overview

Distilling demonstrations into a diffusion-based planner

We acquire demonstration data in order to learn a prior over feasible plans. While many robotics demonstration datasets have recently become publicly available, they often lack ground-truth labels for the high-level discrete skill parameters and the low-level motion variables. Such existing data are therefore incompatible with our proposed setup, which requires learning discrete and continuous actions at different temporal abstractions.

To resolve this, we distill expert data into our diffusion-based policy, a paradigm recently introduced to TAMP by works such as Optimus [14]. Specifically, we first generate data using heuristics and then use the generated data to train a diffusion policy. This process is illustrated in Figure 2.

Figure 2. Conceptual illustration of the diffusion-based policy. We distill the demonstration dataset into our diffusion-based policy, which is trained on state-action pairs.
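
As a rough sketch of what the distilled dataset might look like, the snippet below pairs each state with a full expert plan; the scripted expert, tensor sizes, and field layout are placeholder assumptions standing in for the heuristic data generator.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder sizes; all are illustrative assumptions.
N, STATE_DIM = 1000, 32
HORIZON, NUM_SKILLS, PARAM_DIM = 12, 4, 6

def scripted_expert(n):
    """Stand-in for the heuristic data generator; returns random plans here.
    In practice each plan would come from an expert TAMP solver."""
    states = torch.randn(n, STATE_DIM)
    skills = torch.randint(0, NUM_SKILLS, (n, HORIZON))
    params = torch.randn(n, HORIZON, PARAM_DIM)
    return states, skills, params

states, skills, params = scripted_expert(N)
loader = DataLoader(TensorDataset(states, skills, params),
                    batch_size=64, shuffle=True)

# Each batch carries aligned (state, skill, parameter) tensors: exactly the
# state-action supervision the diffusion policy is distilled from.
states_b, skills_b, params_b = next(iter(loader))
print(states_b.shape, skills_b.shape, params_b.shape)
```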


Hybrid Model Architecture for Discrete and Continuous Actions

We seek to develop and train a diffusion-based policy architecture that can successfully distill demonstration data with hybrid action spaces. The primary challenge is incorporating discrete decision variables. While several works consider diffusion-based policy architectures for short-horizon, continuous-action domains [1, 11], to our knowledge no prior work has shown that they extend to the discrete, long-horizon domains that are prevalent in TAMP settings.

Recently, D3PM [6] was proposed to address diffusion in discrete spaces. In particular, it defines the forward process over categorical variables as a Markov chain that injects a small amount of noise at each step, and then learns the reverse process as in standard diffusion. By adopting this approach for the discrete variables and the DDIM approach for the continuous variables, we obtain a hybrid action space.
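
For the discrete side, a minimal sketch of one D3PM-style forward step under uniform transition matrices is shown below; the noise schedule and category count are placeholder assumptions.

```python
import torch

NUM_SKILLS, HORIZON = 4, 12  # illustrative sizes

def d3pm_uniform_forward(x_prev, beta_t, num_classes=NUM_SKILLS):
    """One forward step of uniform-transition discrete diffusion:
    Q_t = (1 - beta_t) * I + (beta_t / K) * 1 1^T, i.e., each token is
    resampled uniformly over the K classes with probability beta_t."""
    resample = torch.rand(x_prev.shape) < beta_t
    uniform = torch.randint_like(x_prev, num_classes)
    return torch.where(resample, uniform, x_prev)

x0 = torch.randint(0, NUM_SKILLS, (HORIZON,))
xt = x0.clone()
for beta_t in torch.linspace(0.02, 0.2, steps=10):  # toy noise schedule
    xt = d3pm_uniform_forward(xt, beta_t.item())
print(x0.tolist())
print(xt.tolist())  # progressively scrambled skill indices
```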

A trajectory in our model is a sequence of hybrid actions, so it cannot be corrupted directly by the noising technique of either D3PM or DDIM alone. To address this, we separate the two types of variables in the trajectory and apply the appropriate noise to each type.
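
A sketch of this split noising is given below, assuming a uniform-transition D3PM marginal for the discrete skills and the standard closed-form Gaussian forward process for the continuous parameters; the shared schedule and all sizes are placeholder assumptions.

```python
import torch

NUM_SKILLS, HORIZON, PARAM_DIM = 4, 12, 6  # illustrative sizes

def noise_hybrid_plan(skills0, params0, t, beta, alpha_bar):
    """Corrupt one hybrid plan to diffusion step t, noising each variable
    type separately."""
    # Discrete part: under uniform transitions, the t-step marginal keeps a
    # token with probability prod_{s<=t}(1 - beta_s) and otherwise resamples
    # it uniformly over the skill vocabulary.
    keep_prob = torch.prod(1.0 - beta[: t + 1])
    resample = torch.rand(skills0.shape) > keep_prob
    skills_t = torch.where(
        resample, torch.randint_like(skills0, NUM_SKILLS), skills0)
    # Continuous part: closed-form Gaussian forward process,
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    eps = torch.randn_like(params0)
    params_t = alpha_bar[t].sqrt() * params0 + (1.0 - alpha_bar[t]).sqrt() * eps
    return skills_t, params_t, eps

beta = torch.linspace(1e-4, 0.05, 100)       # toy shared noise schedule
alpha_bar = torch.cumprod(1.0 - beta, dim=0)
skills0 = torch.randint(0, NUM_SKILLS, (HORIZON,))
params0 = torch.randn(HORIZON, PARAM_DIM)
skills_t, params_t, eps = noise_hybrid_plan(skills0, params0, 50, beta, alpha_bar)
print(skills_t.tolist(), params_t.shape)
```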

We describe the details of the forward and reverse processes and the loss computation below.
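
As one plausible instantiation of the combined objective, the sketch below applies a cross-entropy term on the predicted clean skills (in the spirit of D3PM's auxiliary loss) and an epsilon-prediction MSE on the continuous parameters (as in DDPM/DDIM training); the trade-off weight and all tensor sizes are assumptions.

```python
import torch
import torch.nn.functional as F

NUM_SKILLS, HORIZON, PARAM_DIM, BATCH = 4, 12, 6, 8  # illustrative sizes

def hybrid_diffusion_loss(skill_logits, skills0, eps_pred, eps, w_disc=1.0):
    """Combined denoising loss: cross-entropy on the clean discrete skills
    plus MSE on the continuous noise prediction; w_disc is an assumed
    trade-off weight."""
    loss_disc = F.cross_entropy(
        skill_logits.reshape(-1, NUM_SKILLS), skills0.reshape(-1))
    loss_cont = F.mse_loss(eps_pred, eps)
    return w_disc * loss_disc + loss_cont

# Dummy tensors standing in for the denoiser's two output heads and targets.
skill_logits = torch.randn(BATCH, HORIZON, NUM_SKILLS, requires_grad=True)
skills0 = torch.randint(0, NUM_SKILLS, (BATCH, HORIZON))
eps_pred = torch.randn(BATCH, HORIZON, PARAM_DIM, requires_grad=True)
eps = torch.randn(BATCH, HORIZON, PARAM_DIM)

loss = hybrid_diffusion_loss(skill_logits, skills0, eps_pred, eps)
loss.backward()
print(float(loss))
```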