MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity

Kanghyun Choi; Hyeyoon Lee; Dain Kwon; SunJong Park; Kyuyeun Kim; Noseong Park; Jonghyun Choi; Jinho Lee

doi:10.1609/aaai.v39i15.33761

Authors

Kanghyun Choi, Seoul National University
Hyeyoon Lee, Seoul National University
Dain Kwon, Seoul National University
SunJong Park, Seoul National University
Kyuyeun Kim, Google
Noseong Park, Korea Advanced Institute of Science & Technology
Jonghyun Choi, Seoul National University
Jinho Lee, Seoul National University

DOI:

https://doi.org/10.1609/aaai.v39i15.33761

Abstract

Data-free quantization (DFQ) is a technique that creates a lightweight network from its full-precision counterpart without the original training data, often through a synthetic dataset. Although several DFQ methods have been proposed for vision transformer (ViT) architectures, they fail to achieve efficacy in low-bit settings. Examining the existing methods, we observe that their synthetic data produce misaligned attention maps, while those of the real samples are highly aligned. From this observation, we find that aligning attention maps of synthetic data helps improve the overall performance of quantized ViTs. Motivated by this finding, we devise MimiQ, a novel DFQ method designed for ViTs that enhances inter-head attention similarity. First, we generate synthetic data by aligning head-wise attention outputs from each spatial query patch. Then, we align the attention maps of the quantized network to those of the full-precision teacher by applying head-wise structural attention distillation. The experimental results show that the proposed method significantly outperforms baselines, setting a new state-of-the-art for ViT-DFQ.
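
To make the notion of inter-head attention similarity concrete, the following is a minimal sketch of how alignment between the attention maps of different heads could be scored for a single query patch. It is not the paper's loss: the abstract does not specify the similarity measure, so cosine similarity is used here as a stand-in, and the names `cosine`, `inter_head_similarity`, and `alignment_loss`, as well as the toy attention values, are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def inter_head_similarity(head_maps):
    """Mean pairwise cosine similarity across heads.

    head_maps: list of H flattened attention maps, one per head,
    all computed for the same query patch (assumed layout).
    """
    pairs = list(combinations(head_maps, 2))
    if not pairs:
        raise ValueError("need at least two heads")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def alignment_loss(head_maps):
    """Illustrative loss: lower when the heads attend to similar locations."""
    return 1.0 - inter_head_similarity(head_maps)


# Toy example: three heads attending over three key patches.
aligned = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
misaligned = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

print(inter_head_similarity(aligned) > inter_head_similarity(misaligned))
```

Under this stand-in measure, heads that focus on the same key patches score close to 1, while heads that each focus on a different patch score close to 0. A synthetic-data generator in the spirit of the abstract would minimize something like `alignment_loss` with respect to the input image, using a differentiable framework rather than plain Python lists.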
