Lightweight Transformer for Multi-Modal Object Detection (Student Abstract)
Keywords: Transformer, Computer Vision, Multi-modal Fusion, Object Detection
Abstract
It has become common practice for perceptual systems to integrate information from multiple sensors to improve the accuracy of object detection. For example, autonomous vehicles use visible-light and infrared (IR) information to ensure that the car can cope with complex weather conditions. However, the accuracy of the algorithm usually comes at the cost of computational complexity and memory consumption. In this study, we evaluate the performance and complexity of different fusion operators in multi-modal object detection tasks. Building on this evaluation, we propose a Poolformer-based fusion operator (PoolFuser) that enhances detection accuracy without compromising the efficiency of the detection framework.
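The abstract does not specify how PoolFuser is built, so the following is only an illustrative sketch of what a pooling-based fusion operator might look like, not the authors' implementation. It assumes the Poolformer-style token mixer (average pooling minus identity, which has no learnable parameters) applied to channel-stacked visible and IR feature maps, followed by a plain average of the two modality halves. The function names `avg_pool3x3`, `pool_token_mixer`, and `pool_fuse`, the 3x3 window, and the averaging step are all assumptions made for illustration.

```python
import numpy as np


def avg_pool3x3(x):
    """3x3 average pooling, stride 1, on a (C, H, W) array.

    Border cells are averaged over in-bounds neighbours only.
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    valid = np.pad(np.ones((h, w)), 1)  # 1 inside the map, 0 in the padding
    total = np.zeros(x.shape, dtype=float)
    count = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            total += padded[:, dy:dy + h, dx:dx + w]
            count += valid[dy:dy + h, dx:dx + w]
    return total / count


def pool_token_mixer(x):
    """Poolformer-style token mixer: pooled features minus the input."""
    return avg_pool3x3(x) - x


def pool_fuse(rgb, ir):
    """Hypothetical fusion of two (C, H, W) feature maps (assumed design)."""
    c = rgb.shape[0]
    stacked = np.concatenate([rgb, ir], axis=0)    # (2C, H, W)
    # Residual connection around the mixer. Normalisation layers are
    # omitted here, so this step reduces to the pooled features.
    mixed = stacked + pool_token_mixer(stacked)
    return 0.5 * (mixed[:c] + mixed[c:])           # back to (C, H, W)


# Toy inputs standing in for backbone features from each sensor.
rgb_feat = np.ones((2, 4, 4))
ir_feat = 3.0 * np.ones((2, 4, 4))
fused = pool_fuse(rgb_feat, ir_feat)
print(fused.shape)
```

Because the mixer uses only pooling and subtraction, it adds no weights to the detector, which is the kind of efficiency argument the abstract makes. The accuracy of any such operator would still have to be measured on real multi-modal data.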
How to Cite
Cao, Y., Fan, Y., Bin, J., & Liu, Z. (2023). Lightweight Transformer for Multi-Modal Object Detection (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 37(13), 16172-16173. https://doi.org/10.1609/aaai.v37i13.26946
AAAI Student Abstract and Poster Program