Language-Driven Robotics: Dynamic Closed-Loop Control for Complex Manipulation
Author(s)
Barroso, Pierre
Advisor(s)
Pradalier, Cedric
Abstract
This thesis presents a novel framework that integrates language-driven control with dynamic closed-loop manipulation, enabling robotic systems to adapt to complex and changing environments. By combining 3D scene representations with vision-language models, the research addresses the challenges of grasping and manipulating novel objects in real time.
The proposed approach combines Gaussian Splatting, a fast and efficient 3D representation technique, with CLIP embeddings to provide semantic understanding of the scene. Using Grounded-SAM2 for segmentation and Gaussian-based clustering for dynamic object detection, the system achieves high segmentation accuracy. Object tracking is handled by CoTracker3, which keeps object positions and transformations up to date in dynamic scenes. These capabilities feed a grasp pose generation mechanism that allows reliable execution of grasps on unseen objects.
Experimental results demonstrate the framework's effectiveness in rapid scene reconstruction, accurate segmentation, robust tracking, and high success rates in grasp execution. Despite these successes, challenges such as training instabilities, hardware dependencies, and tracking drift remain and point to opportunities for further improvement.
This work advances language-driven robotics by integrating semantic understanding with dynamic manipulation. It lays the foundation for adaptive and intelligent robotic systems, with future directions including enhanced encoder/decoder models, improved dynamic scene representations, and expanded applications in fast-paced and complex tasks. The contributions open new avenues for real-world robotic applications requiring adaptability and precision.
Date
2024-12-16
Resource Type
Text
Resource Subtype
Thesis