Unsupervised Photometric-Consistent Depth Estimation from Endoscopic Monocular Video

Authors

  • Shijie Li, College of Computer Science, Sichuan University, Chengdu, China
  • Weijun Lin, College of Computer Science, Sichuan University, Chengdu, China
  • Qingyuan Xiang, College of Computer Science, Sichuan University, Chengdu, China
  • Yunbin Tu, School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
  • Shitan Asu, College of Computer Science, Sichuan University, Chengdu, China
  • Zheng Li, College of Computer Science, Sichuan University, Chengdu, China

DOI:

https://doi.org/10.1609/aaai.v39i5.32521

Abstract

Recent advances in unsupervised monocular depth estimation typically rely on the assumption that image photometry remains consistent across consecutive frames. However, this assumption often fails in endoscopic scenes due to: 1) local photometric inconsistency, caused by specular reflections that create highlights; and 2) global photometric inconsistency, resulting from the simultaneous movement of the light source and the camera. Since unsupervised depth estimation methods use appearance discrepancies between frames as the supervisory signal, these photometric inconsistencies inevitably corrupt the loss computation. In this paper, our goal is to obtain a strong and reliable supervisory signal for photometric-consistent depth estimation. To this end, for local photometric inconsistency, we use the specular reflection model to introduce a Highlight Loss that handles depth estimation in highlight regions. For global photometric inconsistency, we design a Photometric Match module, which uses the spotlight illumination model to derive an analytical expression that aligns photometry across frames. Unlike previous works that introduce additional optical flow or auxiliary networks, our method is simpler and more efficient. Extensive experiments demonstrate that our method achieves state-of-the-art results on the C3VD, SCARED and SERV-CT datasets.
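
To make the two failure modes concrete, the toy sketch below compares a target frame with a perfectly warped source frame and shows how a specular highlight and a global brightness change inflate an L1 photometric error even though the geometry is exact. It is only an illustration under stated assumptions: the saturation threshold, the scalar least-squares gain, and all function names are hypothetical stand-ins, and they are not the paper's Highlight Loss or Photometric Match module, whose analytical forms are not given in the abstract.

```python
from typing import List, Optional, Sequence


def l1_photometric_error(target: Sequence[float],
                         warped: Sequence[float],
                         mask: Optional[Sequence[bool]] = None) -> float:
    """Mean absolute intensity difference over pixels where mask is True."""
    if mask is None:
        mask = [True] * len(target)
    diffs = [abs(t - w) for t, w, m in zip(target, warped, mask) if m]
    return sum(diffs) / len(diffs) if diffs else 0.0


def non_highlight_mask(frame: Sequence[float],
                       threshold: float = 0.95) -> List[bool]:
    """True where a pixel is below the (assumed) saturation threshold."""
    return [p < threshold for p in frame]


def fit_global_gain(target: Sequence[float],
                    warped: Sequence[float],
                    mask: Sequence[bool]) -> float:
    """Least-squares scalar g minimizing sum((t - g * w)^2) over valid pixels."""
    num = sum(t * w for t, w, m in zip(target, warped, mask) if m)
    den = sum(w * w for w, m in zip(warped, mask) if m)
    return num / den if den > 0 else 1.0


# Toy 1-D "frames" of the same surface with perfect geometric alignment.
target = [0.20, 0.30, 0.40, 0.50, 0.60]
# The warped source is globally darker (x0.8) and has one saturated
# specular highlight at index 2.
warped = [0.16, 0.24, 1.00, 0.40, 0.48]

# Naive loss: penalizes brightness change and highlight as if they were
# depth or pose errors.
raw_error = l1_photometric_error(target, warped)

# Stand-in for local handling: ignore pixels saturated in either frame.
mask = [a and b for a, b in zip(non_highlight_mask(target),
                                non_highlight_mask(warped))]

# Stand-in for global handling: align brightness with one scalar gain.
gain = fit_global_gain(target, warped, mask)
aligned = [gain * w for w in warped]
corrected_error = l1_photometric_error(target, aligned, mask)

print(f"raw error: {raw_error:.3f}")
print(f"fitted gain: {gain:.3f}")
print(f"corrected error: {corrected_error:.6f}")
```

In this toy case the residual after masking and gain alignment is essentially zero, so any remaining error in a real pipeline could be attributed to depth or pose rather than to illumination. A single scalar gain is a deliberately crude choice; the abstract indicates the authors instead derive their alignment from a spotlight illumination model.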

Published

2025-04-11

How to Cite

Li, S., Lin, W., Xiang, Q., Tu, Y., Asu, S., & Li, Z. (2025). Unsupervised Photometric-Consistent Depth Estimation from Endoscopic Monocular Video. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 4923–4931. https://doi.org/10.1609/aaai.v39i5.32521

Section

AAAI Technical Track on Computer Vision IV