RAGG: Retrieval-Augmented Grasp Generation Model

Zhenhua Tang; Bin Zhu; Yanbin Hao; Chong-Wah Ngo; Richang Hong

doi:10.1609/aaai.v39i7.32786

Authors

Zhenhua Tang, Hefei University of Technology
Bin Zhu, Singapore Management University
Yanbin Hao, University of Science and Technology of China
Chong-Wah Ngo, Singapore Management University
Richang Hong, Hefei University of Technology

DOI:

https://doi.org/10.1609/aaai.v39i7.32786

Abstract

Intent-based grasp generation inherently involves challenges such as manipulation ambiguity and modality gaps. To address these, we propose a novel Retrieval-Augmented Grasp Generation model (RAGG). Our key insight is that when humans manipulate new objects, they initially mimic the interaction patterns observed in similar objects, then progressively adjust hand-object contact. Consequently, we develop RAGG as a two-stage approach, encompassing retrieval-guided generation and structurally stable grasp refinement. In the first stage, we propose a Retrieval-Augmented Diffusion Model (ReDim), which identifies the most relevant interaction instance from a knowledge base to explicitly guide grasp generation, thereby mitigating ambiguity and bridging modality gaps to ensure semantically correct manipulation. In the second stage, we introduce a Progressive Refinement Network (PRN) with Kolmogorov-Arnold Network (KAN) layers to refine the generated coarse grasp, employing a Structural Similarity Index loss to constrain the spatial relationship between the hand and the object, thus ensuring the stability of the grasp. Extensive experiments on the OakInk and GRAB benchmarks demonstrate that RAGG achieves superior results compared to state-of-the-art approaches, indicating not only better physical feasibility and controllability but also strong generalization and interpretability for unseen objects.
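
The retrieval step in the first stage (picking the most relevant interaction instance from a knowledge base) can be illustrated with a minimal sketch. The abstract does not say how relevance is scored, so everything here is an assumption made for illustration: the cosine-similarity metric, the flat list of embeddings, and the names `cosine_similarity`, `retrieve_most_relevant`, and the toy instance labels are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_most_relevant(
    query: Sequence[float],
    knowledge_base: List[Tuple[str, Sequence[float]]],
) -> Tuple[str, float]:
    """Return the (instance_id, score) of the entry most similar to the query.

    Assumption: each knowledge-base entry is an (id, embedding) pair, and the
    query is an embedding of the intent and object in the same space.
    """
    if not knowledge_base:
        raise ValueError("knowledge base is empty")
    best_id, best_score = knowledge_base[0][0], -math.inf
    for instance_id, embedding in knowledge_base:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = instance_id, score
    return best_id, best_score


# Toy knowledge base with made-up 3-D embeddings (hypothetical data).
kb = [
    ("mug_hold", [1.0, 0.0, 0.0]),
    ("knife_cut", [0.0, 1.0, 0.0]),
    ("bottle_pour", [0.7, 0.0, 0.7]),
]
query = [0.9, 0.1, 0.0]
best_id, score = retrieve_most_relevant(query, kb)
print(best_id)
```

In a retrieval-guided pipeline of this kind, the retrieved instance would then serve as a conditioning signal for the generator; how ReDim actually conditions the diffusion model on it is not described in the abstract.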
