TuckA: Hierarchical Compact Tensor Experts for Efficient Fine-Tuning

Qifeng Lei; Zhiyong Yang; Qianqian Xu; Cong Hua; Peisong Wen; Qingming Huang

doi:10.1609/aaai.v40i27.39444

Authors

Qifeng Lei School of Computer Science and Technology, University of Chinese Academy of Sciences
Zhiyong Yang School of Computer Science and Technology, University of Chinese Academy of Sciences
Qianqian Xu State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences
Cong Hua State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences
Peisong Wen School of Computer Science and Technology, University of Chinese Academy of Sciences
Qingming Huang School of Computer Science and Technology, University of Chinese Academy of Sciences; State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences

Abstract

Efficiently fine-tuning pre-trained models for downstream tasks is a key challenge in the era of foundation models. Parameter-efficient fine-tuning (PEFT) presents a promising solution, achieving performance comparable to full fine-tuning by updating only a small number of adaptation weights per layer. Traditional PEFT methods typically rely on a single expert, where the adaptation weight is a low-rank matrix. For complex tasks, however, the inherent diversity of the data poses a significant challenge for such models, because a single adaptation weight cannot adequately capture the features of all samples. To address this limitation, we explore how to integrate multiple small adaptation experts into a compact structure that outperforms a single large adapter. Specifically, we propose Tucker Adaptation (TuckA), a method with four key properties: (i) We use Tucker decomposition to create a compact 3D tensor in which each slice naturally serves as an expert. The low-rank nature of this decomposition ensures that the number of parameters scales efficiently as more experts are added. (ii) We introduce a hierarchical strategy that organizes these experts into groups at different granularities, allowing the model to capture both local and global data patterns. (iii) We develop an efficient batch-level routing mechanism, which reduces the router's parameter size by a factor of L compared to routing at every adapted layer, where L is the number of adapted layers. (iv) We propose data-aware initialization to achieve loss-free expert load balancing based on theoretical analysis. Extensive experiments on benchmarks in natural language understanding, image classification, and mathematical reasoning demonstrate the efficacy of TuckA, offering a new and effective solution to the PEFT problem.
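
Property (i) can be illustrated with a minimal NumPy sketch of a generic Tucker-factorized expert tensor. This is not the authors' implementation: the names `core`, `U`, `V`, `W`, the ranks `r1`, `r2`, `r3`, and the helper functions are assumptions made for illustration, and the paper's exact parameterization may differ. The sketch shows that each slice of the reconstructed 3D tensor acts as one expert's adaptation weight, and that, in this standard Tucker form, adding an expert adds only `r3` parameters.

```python
import numpy as np

def tucker_experts(core, U, V, W):
    """Reconstruct an (n_experts, d_out, d_in) tensor from Tucker factors.

    Slice n of the result is the adaptation weight of (hypothetical) expert n:
    T[n, i, j] = sum_{a,b,c} core[a, b, c] * U[i, a] * V[j, b] * W[n, c]
    """
    return np.einsum("abc,ia,jb,nc->nij", core, U, V, W)

def tucker_param_count(d_out, d_in, n_experts, r1, r2, r3):
    """Parameters in the core plus the three factor matrices."""
    return r1 * r2 * r3 + d_out * r1 + d_in * r2 + n_experts * r3

# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
d_out, d_in, n_experts = 16, 12, 4
r1, r2, r3 = 3, 3, 2

core = rng.standard_normal((r1, r2, r3))   # shared core tensor
U = rng.standard_normal((d_out, r1))       # output-dimension factor
V = rng.standard_normal((d_in, r2))        # input-dimension factor
W = rng.standard_normal((n_experts, r3))   # one row of mixing weights per expert

experts = tucker_experts(core, U, V, W)

# Equivalent view of a single expert: mix the core along its third mode
# with that expert's row of W, then project with the shared U and V.
n = 1
mixed_core = np.tensordot(core, W[n], axes=([2], [0]))  # shape (r1, r2)
expert_n = U @ mixed_core @ V.T                         # shape (d_out, d_in)

# Cost of one extra expert in this parameterization: a single new row of W.
extra = (tucker_param_count(d_out, d_in, n_experts + 1, r1, r2, r3)
         - tucker_param_count(d_out, d_in, n_experts, r1, r2, r3))
```

Because `U`, `V`, and `core` are shared across all slices, the per-expert cost in this sketch is independent of `d_out` and `d_in`, which is one way to read the claim that the parameter count scales efficiently as experts are added.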
