An Embedding-Unleashing Video Polyp Segmentation Framework via Region Linking and Scale Alignment
DOI:
https://doi.org/10.1609/aaai.v38i2.27942
Keywords:
CV: Medical and Biological Imaging, CV: Segmentation
Abstract
Automatic polyp segmentation from colonoscopy videos is a critical task for the development of computer-aided screening and diagnosis systems. However, accurate, real-time video polyp segmentation (VPS) is very challenging because of the low contrast between polyps and background and the dramatic frame-to-frame variations in colonoscopy videos. We propose a novel embedding-unleashing framework consisting of a proposal-generative network (PGN) and an appearance-embedding network (AEN) to comprehensively address these challenges. Our framework, for the first time, models VPS as an appearance-level semantic embedding process, which helps generate more global information to counteract background disturbances and dramatic variations. Specifically, PGN is a video segmentation network that produces segmentation mask proposals, while AEN is a network we specially designed to produce appearance-level embedding semantics for PGN, thereby unleashing the capability of PGN in VPS. Our AEN consists of a cross-scale region linking (CRL) module and a cross-wise scale alignment (CSA) module. The former screens reliable background information against background disturbances by constructing links among region semantics, while the latter performs scale alignment to resist dramatic variations by modeling the center-perceived motion dependence in a cross-wise manner. We further introduce a parameter-free semantic interaction that embeds the semantics of AEN into PGN to obtain the segmentation results. Extensive experiments on CVC-612 and SUN-SEG demonstrate that our approach achieves better performance than other state-of-the-art methods. Code is available at https://github.com/zhixue-fang/EUVPS.
Published
2024-03-24
How to Cite
Fang, Z., Guo, X., Lin, J., Wu, H., & Qin, J. (2024). An Embedding-Unleashing Video Polyp Segmentation Framework via Region Linking and Scale Alignment. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1744–1752. https://doi.org/10.1609/aaai.v38i2.27942
Issue
Section
AAAI Technical Track on Computer Vision I