
RAD Collaboratory SURF
Object-Object Physical Interaction Prediction with Transformers
Project Summary
The Deep Interaction Prediction Network (DIPN, https://arxiv.org/pdf/2011.04692) predicts multi-object dynamics during robotic pushing, using modular MLPs to model both the direct transformation of each object and the interactive transformations among objects. However, its MLP-based interaction modules treat object-object relations in a pairwise, static manner, which limits scalability and expressiveness when a scene contains complex, long-range dependencies. This project proposes replacing DIPN's MLP interaction blocks with transformer architectures, using self-attention to dynamically weight and propagate features across all objects in a scene. By encoding object embeddings and push actions as tokens within a transformer encoder-decoder, we expect to model collective motion more richly and to generalize more robustly to diverse object configurations. The research will involve adapting DIPN's pipeline, training both in simulation and on real hardware, and evaluating improvements in predictive accuracy (intersection over union, IoU) and in downstream clutter removal efficiency.
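To make the idea of attending across all objects concrete, here is a minimal sketch of single-head scaled dot-product self-attention over a set of object tokens plus one push-action token. It is an illustration only, not DIPN's code or the project's planned architecture: the token layout (x, y, heading, size), the example values, and the function names are all hypothetical, and the learned query/key/value projections of a real transformer are replaced by the identity to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a trained model would learn these projections.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token in the scene.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        row = softmax(scores)
        # Each output is a weighted average of all token features.
        mixed = [sum(row[j] * tokens[j][c] for j in range(len(tokens)))
                 for c in range(dim)]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Hypothetical scene: three objects encoded as (x, y, heading, size),
# plus one push-action token of the same width.
object_tokens = [
    [0.20, 0.10, 0.0, 0.5],
    [0.25, 0.12, 1.0, 0.4],
    [0.90, 0.80, 0.3, 0.6],
]
push_token = [0.20, 0.00, 0.0, 0.1]

tokens = object_tokens + [push_token]
outputs, weights = self_attention(tokens)

for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 3) for w in row]}")
```

Unlike a fixed pairwise MLP, the weights here are recomputed from the tokens for every scene, and the same function handles any number of objects without changing its parameters.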

