The Unreasonable Effectiveness of Discrete-Time Gaussian Process Mixtures for Robot Policy Learning

Abstract

We present Mixture of Discrete-time Gaussian Processes (MiDiGaP), a novel approach for flexible policy representation and imitation learning in robot manipulation. MiDiGaP enables learning from as few as five demonstrations using only camera observations and generalizes across a wide range of challenging tasks. It excels at long-horizon behaviors such as making coffee, highly constrained motions such as opening doors, dynamic actions such as scooping with a spatula, and multimodal tasks such as hanging a mug. MiDiGaP learns these tasks on a CPU in less than a minute and scales linearly to large datasets. We also develop a rich suite of tools for inference-time steering using evidence such as collision signals and robot kinematic constraints. This steering enables novel generalization capabilities, including obstacle avoidance and cross-embodiment policy transfer. MiDiGaP achieves state-of-the-art performance on diverse few-shot manipulation benchmarks. On constrained RLBench tasks, it improves policy success by 76 percentage points and reduces trajectory cost by 67%. On multimodal tasks, it improves policy success by 48 percentage points and increases sample efficiency by a factor of 20. In cross-embodiment transfer, it more than doubles policy success.

Overview

Mixtures of Discrete-Time Gaussian Processes (MiDiGaP) is:

Sample-efficient: only 5 demos needed.
Fast: fitting a policy in less than one minute.
Riemannian: models orientation properly.
Multimodal: learns multimodal behaviors from few samples.
Interpretable: easy to interpret, debug, and extend.
Generalizable: across task environments, object instances, clutter, and more.
Probabilistic: enables effective inference-time steering, such as collision avoidance and embodiment transfer.

Dynamic: high-fidelity and safe execution enables dynamic tasks like scooping and pouring.

Technical Approach

Object-Centric Policy Learning: We leverage object-centric multi-stream learning to generalize efficiently across the task space. For example, for opening a microwave, we model the trajectory distribution from the perspective of both the end-effector and the microwave.
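To make the idea of modeling a trajectory "from the perspective of" an object concrete, the following is a minimal sketch of expressing end-effector positions in an object's coordinate frame. The function name `to_object_frame`, the pose values, and the restriction to positions (orientations are ignored) are assumptions made for illustration and are not taken from the MiDiGaP implementation.

```python
import numpy as np

def to_object_frame(points_world, obj_rotation, obj_position):
    """Express world-frame points (N, 3) in an object frame.

    The object pose is given by a rotation matrix R (3, 3) and a
    position p (3,), both in the world frame (hypothetical inputs).
    """
    points_world = np.asarray(points_world, dtype=float)
    R = np.asarray(obj_rotation, dtype=float)
    p = np.asarray(obj_position, dtype=float)
    # x_obj = R^T (x_world - p); for row vectors this is (x - p) @ R
    return (points_world - p) @ R

# Illustrative object pose: rotated 90 degrees about z, shifted along x
R_obj = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
p_obj = np.array([1.0, 0.0, 0.0])

ee_world = np.array([[1.0, 1.0, 0.0]])
ee_in_obj = to_object_frame(ee_world, R_obj, p_obj)
```

Applying the same transform with each relevant object's pose would yield one trajectory stream per frame, each of which could then be modeled separately.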

Discrete-Time Gaussian Processes (DiGaP) model the distribution with higher density than Gaussian Mixture Models, thus achieving higher expressivity while remaining computationally efficient. The dense modeling enables solving highly constrained tasks, like opening doors or pouring in a spiral. In contrast to continuous Gaussian Processes, they are not restricted by a kernel function.
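As a rough illustration of dense, kernel-free modeling, the sketch below fits one Gaussian per time step to a handful of time-aligned demonstrations. It is a simplification under stated assumptions: demonstrations are already aligned to a common length, the data is Euclidean (no Riemannian treatment of orientation), and time steps are treated independently. The function name, array layout, and regularizer are hypothetical and do not describe the authors' implementation.

```python
import numpy as np

def fit_discrete_time_gaussians(demos, reg=1e-6):
    """Fit one Gaussian per time step to time-aligned demonstrations.

    demos: array of shape (n_demos, n_steps, dim) (assumed layout).
    Returns means (n_steps, dim) and covariances (n_steps, dim, dim).
    """
    demos = np.asarray(demos, dtype=float)
    n_demos, n_steps, dim = demos.shape
    means = demos.mean(axis=0)
    centered = demos - means
    # Per-step sample covariance, computed for all steps at once
    covs = np.einsum("nti,ntj->tij", centered, centered) / max(n_demos - 1, 1)
    # Small diagonal regularizer keeps each covariance invertible
    covs = covs + reg * np.eye(dim)
    return means, covs

# Five noisy synthetic demonstrations of a 2D arc, 50 steps each
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
base = np.stack([t, np.sin(np.pi * t)], axis=-1)
demos = base + 0.01 * rng.standard_normal((5, 50, 2))

means, covs = fit_discrete_time_gaussians(demos)
```

Because every time step gets its own mean and covariance, no kernel has to be chosen, and the cost of fitting grows linearly with the number of demonstrations in this sketch.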

Mixtures of Discrete-Time Gaussian Processes (MiDiGaP) model multimodal trajectory distributions from few samples.

Inference-Time Updating

Inference-time updating: MiDiGaP can be efficiently updated during inference with new evidence, such as collision information or kinematic constraints. For arbitrary evidence (including kinematic feasibility, non-convex collisions, etc.), we update the distribution over the modes of the policy to avoid infeasible modes. If the evidence is convex (like reachability and convex collisions), we adapt the predicted trajectories of each mode via constrained Gaussian updating, thus providing compliance with the new constraints while still ensuring task success. Updating is fast, and constraints can be combined and applied continuously during inference.
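The update over modes for arbitrary evidence can be sketched as a Bayesian reweighting: each mode's prior weight is multiplied by the likelihood that the mode is feasible under the evidence, and the result is renormalized. The function name, the weights, and the binary feasibility values below are illustrative assumptions. The constrained Gaussian update for convex evidence is not shown.

```python
import numpy as np

def reweight_modes(prior_weights, feasibility):
    """Posterior mode weights given per-mode feasibility likelihoods.

    prior_weights: mixture weights of the modes, summing to one.
    feasibility: likelihood in [0, 1] that each mode satisfies the
    evidence, e.g. 0.0 for a mode that collides (hypothetical input).
    """
    prior = np.asarray(prior_weights, dtype=float)
    like = np.asarray(feasibility, dtype=float)
    unnormalized = prior * like
    total = unnormalized.sum()
    if total <= 0.0:
        raise ValueError("no mode is feasible under the given evidence")
    return unnormalized / total

# Three modes; the first is ruled out, e.g. by a collision check
prior = [0.5, 0.3, 0.2]
feasible = [0.0, 1.0, 1.0]
posterior = reweight_modes(prior, feasible)
```

In this example the infeasible mode receives zero weight and the remaining two keep their relative proportions.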

Embodiment Transfer

Variance-Aware Path Optimization (VAPOR): MiDiGaP predicts a distribution over trajectories, which can be used to optimize the path of the robot. By maximizing the probability of the robot's trajectory under this distribution while ensuring kinematic feasibility, we can adapt a learned policy to new robot embodiments at inference-time.
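The quantity being maximized can be illustrated with a small sketch that scores a candidate trajectory under per-time-step Gaussians. The function name and the toy numbers are assumptions, and kinematic feasibility is not modeled here; an optimizer would maximize this score over the trajectories a given robot can actually execute.

```python
import numpy as np

def trajectory_log_prob(traj, means, covs):
    """Log-probability of a trajectory under independent per-step Gaussians.

    traj, means: arrays of shape (n_steps, dim).
    covs: array of shape (n_steps, dim, dim).
    """
    traj = np.asarray(traj, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    diff = traj - means
    # Solve covs[t] @ x = diff[t] for every step without explicit inverses
    solved = np.linalg.solve(covs, diff[..., None])[..., 0]
    mahalanobis = np.einsum("ti,ti->t", diff, solved)
    _, logdet = np.linalg.slogdet(covs)
    dim = traj.shape[1]
    return float(-0.5 * np.sum(mahalanobis + logdet + dim * np.log(2.0 * np.pi)))

# Toy distribution: tight in the first dimension, loose in the second
means = np.zeros((3, 2))
covs = np.tile(np.diag([0.01, 1.0]), (3, 1, 1))

# Same deviation size, placed in the tight or in the loose dimension
dev_tight = means + np.array([0.2, 0.0])
dev_loose = means + np.array([0.0, 0.2])

lp_tight = trajectory_log_prob(dev_tight, means, covs)
lp_loose = trajectory_log_prob(dev_loose, means, covs)
```

The deviation in the high-variance dimension is penalized far less than the same deviation in the low-variance dimension, which is the sense in which such an objective is variance-aware: a new embodiment can deviate where the demonstrations varied, and should stay close where they did not.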

Video

Code

For academic use, a PyTorch-based software implementation of this project will soon be released in our GitHub repository under the GPLv3 license. For any commercial purpose, please contact the authors.

Model downloads will soon be available below.

Publications

If you find our work useful, please consider citing our paper:

Jan Ole von Hartz, Adrian Röfer, Joschka Boedecker, Abhinav Valada,

The Unreasonable Effectiveness of Discrete-Time Gaussian Process Mixtures for Robot Policy Learning
Under review for publication, 2025.


Authors