Lightweight Transformer for Multi-Modal Object Detection (Student Abstract)

Authors

  • Yue Cao The University of British Columbia
  • Yanshuo Fan The University of British Columbia
  • Junchi Bin The University of British Columbia
  • Zheng Liu The University of British Columbia

DOI:

https://doi.org/10.1609/aaai.v37i13.26946

Keywords:

Transformer, Computer Vision, Multi-modal Fusion, Object Detection

Abstract

Many perceptual systems now integrate information from multiple sensors to improve the accuracy of object detection. For example, autonomous vehicles combine visible-light and infrared (IR) information so that the car can cope with complex weather conditions. However, higher accuracy usually comes at the cost of greater computational complexity and memory consumption. In this study, we evaluate the performance and complexity of different fusion operators in multi-modal object detection tasks. We also propose a PoolFormer-based fusion operator (PoolFuser) that improves detection accuracy without compromising the efficiency of the detection framework.
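The abstract does not describe PoolFuser's internals, so the following is only a minimal sketch of the general idea behind a pooling-based fusion operator, not the authors' implementation. It assumes two aligned single-channel feature maps (visible and IR), fuses them by element-wise addition, and then applies a PoolFormer-style token mixer, in which average pooling replaces attention. The names `avg_pool_same`, `standardize`, and `pool_fuse`, the choice of additive fusion, and the 3x3 window are all illustrative assumptions.

```python
import math
from typing import List

Grid = List[List[float]]  # one single-channel feature map, indexed [row][col]


def avg_pool_same(x: Grid, k: int = 3) -> Grid:
    """Stride-1 average pooling that keeps the input size.

    Border windows average only the cells that exist (no zero padding).
    """
    h, w = len(x), len(x[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [x[a][b]
                    for a in range(max(0, i - r), min(h, i + r + 1))
                    for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def standardize(x: Grid, eps: float = 1e-5) -> Grid:
    """Zero-mean, unit-variance normalization over the whole map (no affine)."""
    flat = [v for row in x for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    scale = math.sqrt(var + eps)
    return [[(v - mean) / scale for v in row] for row in x]


def pool_fuse(rgb: Grid, ir: Grid, k: int = 3) -> Grid:
    """Hypothetical fusion: add the two maps, then mix tokens by pooling.

    Computes y = x + (pool(norm(x)) - norm(x)), i.e. a residual connection
    around a pooling token mixer. The channel MLP and any learned scaling
    of a full transformer block are omitted for brevity.
    """
    h, w = len(rgb), len(rgb[0])
    if len(ir) != h or any(len(row) != w for row in rgb + ir):
        raise ValueError("rgb and ir feature maps must have the same shape")
    fused = [[rgb[i][j] + ir[i][j] for j in range(w)] for i in range(h)]
    normed = standardize(fused)
    pooled = avg_pool_same(normed, k)
    return [[fused[i][j] + (pooled[i][j] - normed[i][j]) for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    rgb_map = [[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]]
    ir_map = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 1.0]]
    for row in pool_fuse(rgb_map, ir_map):
        print([round(v, 3) for v in row])
```

The mixer in this sketch has no learnable parameters and costs time linear in the number of spatial positions, which is the usual reason pooling is considered a lightweight alternative to attention. A practical version would operate on multi-channel tensors inside a detection backbone.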

Published

2024-07-15

How to Cite

Cao, Y., Fan, Y., Bin, J., & Liu, Z. (2024). Lightweight Transformer for Multi-Modal Object Detection (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 37(13), 16172-16173. https://doi.org/10.1609/aaai.v37i13.26946