LLMC+: Benchmarking Vision-Language Model Compression with a plug-and-play Toolkit

Chengtao Lv; Bilang Zhang; Yang Yong; Ruihao Gong; Yushi Huang; Shiqiao Gu; Jiajun Wu; Yumeng Shi; Jinyang Guo; Wenya Wang

doi:10.1609/aaai.v40i29.39598

Authors

Chengtao Lv (Nanyang Technological University; SenseTime Research)
Bilang Zhang (Beihang University; SenseTime Research)
Yang Yong (SenseTime Research)
Ruihao Gong (Beihang University; SenseTime Research)
Yushi Huang (SenseTime Research; Hong Kong University of Science and Technology)
Shiqiao Gu (SenseTime Research)
Jiajun Wu (Beihang University)
Yumeng Shi (Nanyang Technological University)
Jinyang Guo (Beihang University)
Wenya Wang (Nanyang Technological University)

DOI: https://doi.org/10.1609/aaai.v40i29.39598

Abstract

Large Vision-Language Models (VLMs) exhibit impressive multi-modal capabilities but suffer from prohibitive computational and memory demands due to their long visual token sequences and massive parameter counts. To address these issues, recent works have proposed training-free compression methods. However, existing efforts suffer from three major limitations: (1) current approaches do not decompose techniques into comparable modules, which hinders fair evaluation across spatial and temporal redundancy; (2) evaluation is confined to simple single-turn tasks and fails to reflect performance in realistic scenarios; and (3) individual compression techniques are used in isolation, without exploring their joint potential. To overcome these gaps, we introduce LLMC+, a comprehensive VLM compression benchmark with a versatile, plug-and-play toolkit. LLMC+ supports over 20 algorithms across five representative VLM families and enables systematic study of token-level and model-level compression. Our benchmark reveals that: (1) spatial and temporal redundancies demand distinct technical strategies; (2) token reduction methods degrade significantly in multi-turn dialogue and detail-sensitive tasks; and (3) combining token and model compression achieves extreme compression with minimal performance loss. We believe LLMC+ will facilitate fair evaluation and inspire future research on efficient VLMs.
