FVNet: Harnessing Liquid Neural Dynamics for Lightweight Visual Representation

Authors

  • Zhenzhe Hou School of Information and Electronics, Beijing Institute of Technology, Beijing, China
  • Xiaohui Chu School of Information and Electronics, Beijing Institute of Technology, Beijing, China
  • Runze Hu School of Information and Electronics, Beijing Institute of Technology, Beijing, China
  • Yang Li Department of Electronic Engineering, Tsinghua University, Beijing, China
  • Yutao Liu School of Computer Science and Technology, Ocean University of China, Qingdao, China

DOI:

https://doi.org/10.1609/aaai.v40i6.42481

Abstract

Efficient visual backbone design remains crucial for resource-constrained computer vision applications. Inspired by the adaptive continuous-time dynamics observed in biological neurons, we propose FVNet, a novel lightweight architecture that integrates liquid neural dynamics for efficient and dynamic visual feature extraction. Central to FVNet is the Fluid Temporal Flow Unit (FTFU), which employs continuous-time equations with learnable time constants to capture spatio-temporal dependencies adaptively. By further stacking these units in a Multi-Phase Fluid Block (MPFB), our model processes features across parallel temporal scales, enabling context-aware feature encoding without incurring excessive computational overhead. Through a discrete closed-form solution, FVNet achieves the representational power of continuous-time models while avoiding the instability and overhead of iterative numerical solvers. Extensive experiments on various vision tasks demonstrate that FVNet achieves superior performance and efficiency over existing state-of-the-art lightweight networks.
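The abstract's key idea, replacing iterative ODE solvers with a closed-form discrete update of continuous-time neural dynamics with learnable time constants, can be illustrated with a minimal sketch. The FTFU's actual equations are not given on this page, so everything below (class name, the specific dynamics `dh/dt = -h/tau + f(Wx + Uh)`, and its exponential-relaxation discretization) is an illustrative assumption in the spirit of liquid/closed-form continuous-time networks, not the paper's implementation.

```python
import numpy as np

class LiquidUnitSketch:
    """Hypothetical sketch of a liquid (continuous-time) layer with the
    assumed dynamics dh/dt = -h / tau + f(W x + U h), discretized in
    closed form rather than with an iterative numerical solver."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (hidden_dim, in_dim))      # input weights
        self.U = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))  # recurrent weights
        self.log_tau = np.zeros(hidden_dim)  # learnable log time constants

    def step(self, x, h, dt=1.0):
        tau = np.exp(self.log_tau)                  # positive time constants
        drive = np.tanh(self.W @ x + self.U @ h)    # instantaneous target state
        decay = np.exp(-dt / tau)                   # exact decay over one step
        # Closed-form update: each unit relaxes exponentially toward its
        # drive at its own learned rate, so dt can vary without a solver loop.
        return decay * h + (1.0 - decay) * drive
```

Because `decay` depends only on `dt / tau`, stacking such units with different time constants (as the MPFB's parallel temporal scales suggest) lets fast units track local detail while slow units accumulate context, at the cost of one matrix multiply pair per step.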

Published

2026-03-14

How to Cite

Hou, Z., Chu, X., Hu, R., Li, Y., & Liu, Y. (2026). FVNet: Harnessing Liquid Neural Dynamics for Lightweight Visual Representation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4789–4797. https://doi.org/10.1609/aaai.v40i6.42481

Section

AAAI Technical Track on Computer Vision III