MFINet: Multi-view Fusion and 2D–3D Interaction Enhancement for Real-Time LiDAR Semantic Segmentation
DOI:
https://doi.org/10.1609/aaai.v40i10.37723
Abstract
LiDAR semantic segmentation is a key task in advanced autonomous driving systems. Projection-based methods show real-time potential thanks to their efficiency, but they suffer from inevitable 3D information loss and rely on time-consuming post-processing, which limits overall performance. To address this, we propose MFINet, a real-time semantic segmentation network based on multi-view fusion and 2D-3D interaction enhancement. It adopts a three-branch architecture that integrates a 3D Point View (3D-PV), a 2D Bird's Eye View (2D-BEV), and a 2D Range View (2D-RV) to make full use of both 2D and 3D representations. From 3D to 2D, we design a 3D Point Feature Projector (3DPFP) that injects 3D features into the 2D BEV and RV pseudo-images to retain effective 3D information. From 2D to 3D, a Feature Enhancement (FE) module leverages the strengths of 2D representations in extracting geometric and semantic features. We also introduce a 2D-3D Fusion Head (FH) to aggregate point features from multiple views. In addition, we incorporate a Multi-Scale Dilated Attention (MSDA) module with a sliding-window strategy to enhance feature discrimination. Extensive experiments on the SemanticKITTI and NuScenes benchmarks demonstrate that MFINet outperforms existing methods on the SemanticKITTI and NuScenes validation sets and achieves competitive results on the NuScenes test set.
Published
2026-03-14
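The abstract describes projecting per-point 3D features into 2D BEV pseudo-images (the 3DPFP step). The following is a minimal sketch of that kind of 3D-to-2D scattering, not the authors' implementation: the grid extent, resolution, and per-cell max-pooling aggregation are all assumptions for illustration.

```python
# Hedged sketch (not MFINet's code): scatter per-point features onto a
# 2D BEV grid, the kind of 3D-to-2D projection the 3DPFP performs.
# Grid size, metric extent, and max-pooling per cell are assumptions.
import numpy as np

def points_to_bev(xyz, feats, grid=(64, 64), extent=51.2):
    """Project N points with C-dim features onto an H x W BEV pseudo-image.

    xyz:   (N, 3) point coordinates in metres, sensor at the origin.
    feats: (N, C) per-point features.
    Each cell keeps the feature-wise maximum over the points inside it.
    """
    h, w = grid
    bev = np.zeros((h, w, feats.shape[1]), dtype=feats.dtype)
    # Map x/y in [-extent, extent) to integer cell indices.
    ij = ((xyz[:, :2] + extent) / (2 * extent) * [h, w]).astype(int)
    keep = (ij >= 0).all(axis=1) & (ij[:, 0] < h) & (ij[:, 1] < w)
    for (i, j), f in zip(ij[keep], feats[keep]):
        bev[i, j] = np.maximum(bev[i, j], f)  # max-pool points per cell
    return bev

# Two nearby points fall into the same cell; the far point is clipped.
pts = np.array([[1.0, 2.0, 0.1], [1.1, 2.1, 0.2], [-60.0, 0.0, 0.0]])
ft = np.array([[0.5], [0.9], [1.0]])
img = points_to_bev(pts, ft)  # shape (64, 64, 1)
```

A range-view (RV) projection would follow the same pattern with spherical instead of Cartesian binning; MFINet's learned fusion of the resulting pseudo-images is beyond this sketch.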
How to Cite
Ma, N., Liu, Z., & Han, Y. (2026). MFINet: Multi-view Fusion and 2D–3D Interaction Enhancement for Real-Time LiDAR Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(10), 7800-7808. https://doi.org/10.1609/aaai.v40i10.37723
Issue
Section
AAAI Technical Track on Computer Vision VII