Lightweight Transformer for Multi-Modal Object Detection (Student Abstract)

Yue Cao; Yanshuo Fan; Junchi Bin; Zheng Liu

doi:10.1609/aaai.v37i13.26946

Authors

Yue Cao, The University of British Columbia
Yanshuo Fan, The University of British Columbia
Junchi Bin, The University of British Columbia
Zheng Liu, The University of British Columbia

DOI:

https://doi.org/10.1609/aaai.v37i13.26946

Keywords:

Transformer, Computer Vision, Multi-modal Fusion, Object Detection

Abstract

It has become common practice for perceptual systems to integrate information from multiple sensors to improve object detection accuracy. For example, autonomous vehicles combine visible-light and infrared (IR) information so that the car can cope with complex weather conditions. However, accuracy usually has to be traded off against computational complexity and memory consumption. In this study, we evaluate the performance and complexity of different fusion operators in multi-modal object detection tasks. In addition, we propose a PoolFormer-based fusion operator (PoolFuser) that improves detection accuracy without compromising the efficiency of the detection framework.
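The abstract does not describe how PoolFuser is built, so the following is only an illustrative sketch of the general idea behind a pooling-based fusion operator, not the authors' implementation. It applies the parameter-free token mixer known from PoolFormer (average pooling minus the input, inside a residual connection) to concatenated visible-light and IR feature maps. The function names, the feature shapes, the unlearned normalization, and the `proj` matrix standing in for a learned 1x1 convolution are all assumptions made for this example.

```python
import numpy as np


def avg_pool_3x3(x):
    """3x3 average pooling (stride 1) over H and W of a (C, H, W) array.

    Border cells average only their valid neighbours, so shape is preserved.
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W only
    valid = np.pad(np.ones((h, w)), 1)            # 1 where a real cell exists
    total = np.zeros(x.shape, dtype=float)
    count = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            total += padded[:, dy:dy + h, dx:dx + w]
            count += valid[dy:dy + h, dx:dx + w]
    return total / count


def normalize(x, eps=1e-5):
    """Normalize over all channels and positions (no learned scale or shift)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def pooling_block(x):
    """Residual block with a pooling token mixer: x + (pool(n) - n)."""
    n = normalize(x)
    return x + (avg_pool_3x3(n) - n)


def fuse(rgb_feat, ir_feat, proj):
    """Hypothetical fusion: concatenate channels, mix by pooling, project.

    proj has shape (C_out, C_rgb + C_ir) and stands in for a learned 1x1 conv.
    """
    stacked = np.concatenate([rgb_feat, ir_feat], axis=0)
    mixed = pooling_block(stacked)
    return np.einsum("oc,chw->ohw", proj, mixed)


# Random stand-ins for backbone features of the two modalities (assumed shapes).
rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 6, 6))
ir = rng.normal(size=(4, 6, 6))
proj = rng.normal(size=(4, 8)) * 0.1
fused = fuse(rgb, ir, proj)
print(fused.shape)  # (4, 6, 6)
```

The pooling mixer itself has no learned weights, which is why this kind of operator is cheaper than attention-based fusion; in this sketch the only parameters are those of the channel projection.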

Published

2024-07-15

How to Cite

Cao, Y., Fan, Y., Bin, J., & Liu, Z. (2024). Lightweight Transformer for Multi-Modal Object Detection (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 37(13), 16172-16173. https://doi.org/10.1609/aaai.v37i13.26946

Issue

Vol. 37 No. 13: AAAI-23 Special Programs, IAAI-23, EAAI-23, Student Papers and Demonstrations

Section

AAAI Student Abstract and Poster Program
