Fine-Tuning Generative Models as an Inference Method for Robotic Tasks

Orr Krupnik, Elisei Shafer, Tom Jurgenson, Aviv Tamar

ECE Technion

Accepted to the 2023 Conference on Robot Learning

Adaptable models could greatly benefit robotic agents operating in the real world, allowing them to deal with novel and varying conditions. While approaches such as Bayesian inference are well-studied frameworks for adapting models to evidence, we build on recent advances in deep generative models which have greatly affected many areas of robotics. Harnessing modern GPU acceleration, we investigate how to quickly adapt the sample generation of neural network models to observations in robotic tasks. We propose a simple and general method that is applicable to various deep generative models and robotic environments. The key idea is to quickly fine-tune the model by fitting it to generated samples matching the observed evidence, using the cross-entropy method. We show that our method can be applied to both autoregressive models and variational autoencoders, and demonstrate its usability in object shape inference from grasping, inverse kinematics calculation, and point cloud completion.

Model Adaptation with the Cross-Entropy Method

MACE aims to recover posteriors over robotic task descriptions by fine-tuning a pre-trained generative model representing the prior. MACE requires access to a pre-trained generative model and a simulator generating observations from task parameters. The MACE update rule is given below: 
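The update rule itself appears as a figure on the original page. A plausible form, consistent with the cross-entropy-method description that follows, is sketched below; the notation ($g$ for the simulator, $S$ for the score function, $o'$ for the given observation, $E_t$ for the elite set) is ours and may differ from the paper's:

```latex
\theta_{t+1} = \arg\max_{\theta} \sum_{x \in E_t} \log p_{\theta}(x),
\qquad
E_t = \text{top-}k \text{ of } \{x_i \sim p_{\theta_t}\} \text{ ranked by } S\bigl(g(x_i),\, o'\bigr)
```

That is, the model parameters are fitted (via SGD) to the samples whose simulated observations best match the evidence.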

We optimize this objective iteratively using SGD and the cross-entropy method (CEM): at each iteration, we sample task parameters (x) from the current model, produce observations for them using the simulator, and score these against a given observation matching the ground-truth task parameters. We then take the samples with the top-scoring observations (as in CEM) and use them to optimize the model parameters with SGD.
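The iteration above can be sketched in a few lines. The snippet below is a toy, self-contained illustration, not the paper's implementation: the "generative model" is a fixed-variance Gaussian over task parameters, and `simulate`, `score`, and the learning-rate/elite-fraction values are placeholders for the task-specific components described in the paper.

```python
import numpy as np

def mace_step(theta, simulate, score, o_obs, rng,
              n_samples=256, elite_frac=0.1, lr=0.5):
    """One MACE-style iteration: sample task parameters from the current
    model, score their simulated observations against o_obs, and take a
    gradient step fitting the model to the top-scoring (elite) samples."""
    mu, sigma = theta                                   # toy Gaussian "model"
    x = rng.normal(mu, sigma, size=(n_samples, mu.shape[0]))
    s = np.array([score(simulate(xi), o_obs) for xi in x])
    elites = x[np.argsort(s)[-int(elite_frac * n_samples):]]
    # Gradient of the Gaussian log-likelihood of the elites w.r.t. mu;
    # this SGD step pulls mu toward the elite mean.
    grad_mu = (elites - mu).mean(axis=0) / sigma**2
    return (mu + lr * grad_mu, sigma)

# Toy usage: recover a hidden 2-D task parameter through an identity "simulator".
true_x = np.array([2.0, -1.0])
simulate = lambda x: x
score = lambda obs, o: -np.linalg.norm(obs - o)  # higher = closer match
rng = np.random.default_rng(0)
theta = (np.zeros(2), np.ones(2))
for _ in range(30):
    theta = mace_step(theta, simulate, score, simulate(true_x), rng)
```

After a few dozen iterations the model mean concentrates near the ground-truth task parameters; in the real method the Gaussian is replaced by a deep generative model and the gradient step by SGD on its log-likelihood.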

The full MACE algorithm is given on the right, with the full objective on line 6.  See Section 3.2 in the paper for additional details.

Inferring Object Shapes by Grasping

We use MACE to infer object shapes given contact points of robot fingers with the object surface. Using MACE-VAE, we tune a model pre-trained on the ShapeNet Airplane class, aiming to produce samples which better match a given set of contact points. Below are samples from the prior, the posterior, and the CVAE baseline model. The model tuned by MACE produces objects matching the given contact points, with higher diversity than the CVAE baseline. For more details, see Sec. 4.1 of the paper.
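One simple way to score a candidate shape against the observed contacts is the (negative) mean nearest-neighbor distance from each contact point to the sampled point cloud. This is an illustrative sketch only; the paper's exact score function may differ.

```python
import numpy as np

def contact_score(shape_points, contact_points):
    """Score a candidate shape (N x 3 point cloud) against observed finger
    contact points (M x 3). Higher is better: the score is the negative
    mean distance from each contact to its nearest shape point, so it is
    maximal (zero) when every contact lies exactly on the sampled surface."""
    d = np.linalg.norm(
        shape_points[None, :, :] - contact_points[:, None, :], axis=-1
    ).min(axis=1)                  # nearest-neighbor distance per contact
    return -d.mean()

# Toy check: contacts on the shape score higher than contacts far from it.
shape = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
on_surface = np.array([[1.0, 0.0, 0.0]])
far_away = np.array([[5.0, 5.0, 5.0]])
```

With a score like this, MACE's elite selection favors shape samples whose surfaces pass through the observed contact points.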

Samples from the prior (pre-trained VAE)

Samples from the VAE tuned by MACE

Samples from the CVAE baseline

Samples from the various models. Object point clouds are shown in white; orange points show the simulated robot "fingers" and the contact points with the ground-truth object, given as the observation o' for the score function calculations (and as the condition for the CVAE baseline).

Out of Distribution Experiment

To showcase the advantage of MACE over the CVAE baseline, we use a different observation (the fifth finger at the back of the airplane instead of the front), which is OOD w.r.t. the CVAE training set. The model tuned by MACE still produces good samples matching the new observation, while the CVAE fails completely.

Samples from the VAE tuned by MACE

Samples from the CVAE baseline

Inverse Kinematics with Obstacles

Using 1M pairs of valid robot configurations and matching end-effector positions collected in PyBullet with no obstacles present, we train an autoregressive inverse kinematics prediction model, which outputs a set of robot joint positions given an end-effector position. We tune this model using MACE to avoid collision with novel obstacles: given an observation o' describing an obstacle position, the score function is 0 whenever the robot is in collision, and proportional to the distance from the desired position otherwise (see details in Sec. 4.2 of the paper). 
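The obstacle-aware score described above can be sketched as follows. The function names `fk` (forward kinematics) and `in_collision` are placeholders for the simulator's kinematics and collision checks, and the exact distance shaping is an assumption: the score is zero in collision and, for collision-free configurations, we make it decrease with distance from the target so that higher scores mark better samples.

```python
import numpy as np

def ik_score(q, target_pos, fk, in_collision):
    """Score a sampled joint configuration q for MACE-style tuning near
    obstacles: zero whenever q collides, otherwise higher the closer the
    end effector fk(q) is to target_pos (an illustrative shaping)."""
    if in_collision(q):
        return 0.0
    return 1.0 / (1.0 + np.linalg.norm(fk(q) - target_pos))

# Toy usage with a trivial "arm": fk is the identity and the half-space
# q[0] < 0 is in collision.
fk = lambda q: q
in_collision = lambda q: q[0] < 0.0
target = np.array([1.0, 0.0])
```

Under such a score, elite selection keeps only collision-free configurations, weighted toward those that reach the target pose.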

On the left below are samples of robot configurations from the prior (pre-trained, untuned) IK model for two obstacle configurations in PyBullet: Window and Wall. Most of them collide with the obstacles, and are unsuitable for reaching the target end-effector pose. On the right, matching samples are shown from the model tuned with MACE, which do not collide with the obstacles but still provide a diversity of solutions. See Sec. 4.2 and Appendix B of the paper for more results.   

Window with prior samples

Wall with prior samples

Window with tuned model samples

Wall with tuned model samples

Real Robot IK Demonstration

We demonstrate the usability of MACE on a real-world Franka Panda robot by planning to a joint configuration obtained by fine-tuning a generative model with MACE. 

The generative model used is the same one described above and in the paper (Sec. 4.2). Note that it was trained on a dataset of joint-configuration and end-effector-position pairs, collected without obstacles in a different simulator (PyBullet) than the one used here (IsaacGym).

Below are samples of robot configurations from the prior (pre-trained, untuned) IK model for two separate target poses (for matching box poses). Most of them collide with the box, and are unsuitable for reaching the target end-effector pose.

Prior samples: box pose 1

Prior samples: box pose 2

Below are robot configurations from the posterior model tuned by MACE. These are both valid configurations in simulation (they do not collide with the box, and reach the target end-effector position). In addition, as described in the paper, MACE can find these configurations faster than the MoveIt inverse kinematics calculation. 

Configuration found by MACE: box pose 1

Configuration found by MACE: box pose 2

Below are the real-robot results, applying MoveIt to plan a trajectory to the IK solution produced by MACE. The box position was measured manually. (The GIFs may take up to 20 seconds to load.)

Real-world reaching of target pose: box pose 1

Real-world reaching of target pose: box pose 2

Citation:

@inproceedings{krupnik2023fine,
  title={Fine-Tuning Generative Models as an Inference Method for Robotic Tasks},
  author={Krupnik, Orr and Shafer, Elisei and Jurgenson, Tom and Tamar, Aviv},
  booktitle={7th Annual Conference on Robot Learning},
  year={2023}
}