SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation
DOI: https://doi.org/10.1609/aaai.v40i22.38904
Abstract
Vision-Language-Action (VLA) models have advanced robotic manipulation, yet practical deployment remains hindered by two key limitations: **1) perceptual redundancy**, where irrelevant visual inputs are processed inefficiently, and **2) superficial instruction-vision alignment**, which hampers semantic grounding of actions. In this paper, we propose **SemanticVLA**, a novel VLA framework that performs Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation. Specifically: **1)** To sparsify redundant perception while preserving semantic alignment, the **Semantic-guided Dual Visual Pruner (SD-Pruner)** combines two components: the Instruction-driven Pruner (ID-Pruner), which extracts global action cues and local semantic anchors in SigLIP, and the Spatial-aggregation Pruner (SA-Pruner), which compacts geometry-rich features into task-adaptive tokens in DINOv2. **2)** To exploit sparsified features and integrate semantics with spatial geometry, the **Semantic-complementary Hierarchical Fuser (SH-Fuser)** fuses dense patches and sparse tokens across SigLIP and DINOv2 into a coherent representation. **3)** To enhance the transformation from perception to action, the **Semantic-conditioned Action Coupler (SA-Coupler)** replaces the conventional observation-to-DoF approach, yielding more efficient and interpretable behavior modeling for manipulation tasks. Extensive experiments on simulation and real-world tasks show that SemanticVLA sets a new SOTA in both performance and efficiency. SemanticVLA surpasses OpenVLA on the LIBERO benchmark by **21.1%** in success rate, while reducing training cost and inference latency by **3.0×** and **2.7×**, respectively.
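To make the idea of instruction-driven visual sparsification concrete, the following is a minimal sketch of generic top-k token pruning by cosine similarity to an instruction embedding. It is not the paper's ID-Pruner: the function names (`cosine`, `prune_tokens`), the toy 2-D vectors, and the scoring rule are all assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_tokens(
    patch_tokens: Sequence[Sequence[float]],
    instruction_embedding: Sequence[float],
    keep: int,
) -> List[int]:
    """Return indices of the `keep` patches most similar to the instruction.

    Hypothetical helper: a stand-in for instruction-guided pruning, not the
    authors' method. Indices are returned in their original spatial order.
    """
    scores = [cosine(token, instruction_embedding) for token in patch_tokens]
    ranked = sorted(range(len(patch_tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Toy example: four 2-D "patch" features and one instruction vector.
patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
instruction = [1.0, 0.1]
kept = prune_tokens(patches, instruction, keep=2)
print(kept)  # indices of the retained patches
```

In this sketch, only the retained patch indices would be passed to later layers, which is where the compute savings of a pruning scheme come from; how SemanticVLA actually scores and aggregates tokens is described in the paper itself.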
Published
2026-03-14
How to Cite
Li, W., Zhang, R., Shao, R., Fang, Z., Zhou, K., Tian, Z., & Nie, L. (2026). SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18397–18405. https://doi.org/10.1609/aaai.v40i22.38904
Section
AAAI Technical Track on Intelligent Robotics