FloorPlanFormer: Multi-Task Transformer Network for Floor Plan Recognition with Outer-to-Inner Feature Refinement

Authors

  • Yun Liang South China Agricultural University
  • ZiHao Wu South China Agricultural University
  • Run Zheng South China Agricultural University
  • Shuai Xie South China Agricultural University
  • Bo Hong South China Agricultural University
  • Yishen Lin South China Agricultural University

DOI:

https://doi.org/10.1609/aaai.v40i9.37625

Abstract

Floor plan recognition requires accurate segmentation and classification of entrance doors, outer contours (walls and windows), and inner contours (various room types), despite strong spatial dependencies and large stylistic differences across datasets. To overcome these challenges, we propose FloorPlanFormer, a multi-task learning network divided into three phases: the first phase introduces a Swin Transformer backbone with a pixel decoder to extract fine-grained pixel-level semantics; the second phase employs a prompt encoder and a mask decoder, together with a novel Global Contextual Attention Module (GCAM) designed to generate clear, high-quality outer contour masks; the third phase uses a mask transformer decoder to recognize targets and introduces a Masked Feature Refinement Module (MFRM) that accurately delineates the inner contours by modeling the relationship between the local inner and outer contours. Finally, we construct FloorPlan8K, a dataset containing 8,200 images and 77,434 instances, on which our model is trained and evaluated; its results greatly outperform both state-of-the-art general segmentation methods and specialized floor plan recognition methods.
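
The outer-to-inner dependency described in the abstract can be illustrated with a small classical sketch. This is not the authors' method: the learned GCAM and MFRM modules are replaced here by a plain flood fill, and the names `enclosed_regions` and `TOY_OUTER_MASK` are hypothetical, introduced only for this illustration. The sketch assumes a binary outer contour mask is already available (1 = wall or window pixel, 0 = free pixel) and shows how candidate inner regions follow from it: a free region counts as a room candidate only if the outer contour fully encloses it.

```python
from collections import deque

# Hypothetical toy outer-contour mask: 1 = wall/window pixel, 0 = free pixel.
TOY_OUTER_MASK = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def enclosed_regions(outer_mask):
    """Return 4-connected regions of free pixels that never touch the image border.

    Classical stand-in for inner-contour delineation: regions connected to the
    border are treated as exterior background and discarded.
    """
    height, width = len(outer_mask), len(outer_mask[0])
    visited = [[False] * width for _ in range(height)]
    regions = []
    for row in range(height):
        for col in range(width):
            if outer_mask[row][col] == 1 or visited[row][col]:
                continue
            # Breadth-first flood fill from this unvisited free pixel.
            queue = deque([(row, col)])
            visited[row][col] = True
            pixels = []
            touches_border = False
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                if y in (0, height - 1) or x in (0, width - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and outer_mask[ny][nx] == 0 and not visited[ny][nx]:
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                regions.append(sorted(pixels))
    return regions


rooms = enclosed_regions(TOY_OUTER_MASK)
```

On the toy mask, `rooms` holds two four-pixel regions, one on each side of the interior wall. A learned model would additionally assign a room type to each region and cope with gaps such as doors, which this sketch does not attempt.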

Published

2026-03-14

How to Cite

Liang, Y., Wu, Z., Zheng, R., Xie, S., Hong, B., & Lin, Y. (2026). FloorPlanFormer: Multi-Task Transformer Network for Floor Plan Recognition with Outer-to-Inner Feature Refinement. Proceedings of the AAAI Conference on Artificial Intelligence, 40(9), 6916–6924. https://doi.org/10.1609/aaai.v40i9.37625

Section

AAAI Technical Track on Computer Vision VI