FloorPlanFormer: Multi-Task Transformer Network for Floor Plan Recognition with Outer-to-Inner Feature Refinement

Yun Liang; ZiHao Wu; Run Zheng; Shuai Xie; Bo Hong; Yishen Lin

doi:10.1609/aaai.v40i9.37625

Authors

Yun Liang South China Agricultural University
ZiHao Wu South China Agricultural University
Run Zheng South China Agricultural University
Shuai Xie South China Agricultural University
Bo Hong South China Agricultural University
Yishen Lin South China Agricultural University

Abstract

Floor plan recognition requires accurate segmentation and classification of entrance doors, outer contours (walls and windows), and inner contours (various room types), despite strong spatial dependencies among these elements and large stylistic differences between datasets. To address these challenges, we propose FloorPlanFormer, a multi-task learning network organized into three phases. The first phase uses a Swin Transformer backbone with a pixel decoder to extract fine-grained pixel-level semantics. The second phase employs a prompt encoder and a mask decoder, together with a novel Global Contextual Attention Module (GCAM), to generate clear, high-quality outer contour masks. The third phase uses a mask transformer decoder to recognize targets and introduces a Masked Feature Refinement Module (MFRM), which accurately delineates inner contours by modeling the relationship between local inner and outer contours. Finally, we construct FloorPlan8K, a dataset of 8,200 images and 77,434 instances, on which we train and evaluate our model; the results substantially outperform state-of-the-art general segmentation methods and specialized floor plan recognition methods.
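The abstract does not specify how MFRM relates inner contours to outer contours. One plausible reading of the outer-to-inner idea is masked cross-attention, as used in mask transformer decoders such as Mask2Former: inner-contour (room) queries attend only to pixel features that lie inside the predicted outer contour mask. The following is a minimal NumPy sketch under that assumption. The function name `outer_masked_cross_attention`, the array shapes, and the empty-mask fallback are hypothetical illustration choices, not the authors' MFRM.

```python
import numpy as np


def outer_masked_cross_attention(queries, features, outer_mask):
    """Hypothetical sketch: cross-attention restricted to an outer-contour mask.

    queries:    (Q, D) array of inner-contour (room) query embeddings
    features:   (N, D) array of per-pixel features (N = H * W, flattened)
    outer_mask: (N,) boolean array, True where the pixel lies inside the outer contour
    Returns (attended, weights) with shapes (Q, D) and (Q, N).
    """
    queries = np.asarray(queries, dtype=float)
    features = np.asarray(features, dtype=float)
    outer_mask = np.asarray(outer_mask, dtype=bool)

    d = queries.shape[-1]
    scores = queries @ features.T / np.sqrt(d)  # (Q, N) scaled dot products

    # Assumed fallback: if the outer mask is empty, attend to every pixel so the
    # softmax is never taken over a row that is entirely -inf.
    if not outer_mask.any():
        outer_mask = np.ones_like(outer_mask)

    # Pixels outside the outer contour get -inf, i.e. zero attention weight.
    scores = np.where(outer_mask[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ features, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(3, 8))  # three hypothetical room queries
    f = rng.normal(size=(4, 8))  # a 2x2 feature map flattened to four pixels
    mask = np.array([True, False, True, False])
    refined, w = outer_masked_cross_attention(q, f, mask)
    print(refined.shape, w.round(3))
```

With `outer_mask = [True, False, True, False]`, every query's attention weights on pixels 1 and 3 are exactly zero, so the refined room features are built only from pixels inside the outer contour. Masking the attention, rather than cropping the feature map, keeps tensor shapes fixed across images with differently shaped outer contours.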
