TWiST: Temporal Weakly-Supervised Triplets Recognition in Surgical Videos (Student Abstract)

Authors

  • Pranshu Danani (Indian Institute of Technology, Roorkee)
  • Yash Bansal (Indian Institute of Technology, Roorkee)
  • Parshiv Kapoor (Indian Institute of Technology, Roorkee)

DOI:

https://doi.org/10.1609/aaai.v40i48.42204

Abstract

Deep learning is increasingly applied to intraoperative and surgical video analysis to enable real-time workflow recognition and decision support for improved surgical precision. A key direction is modeling surgical activity as triplets of instrument, action, and target, which provide a richer representation of procedures. However, existing approaches often depend on bounding-box annotations or lack temporal context. We propose TWiST (Temporal Weakly Supervised Triplet detection), a framework that combines weakly supervised instrument localization, temporal attention for triplet prediction, and grounding of triplets with detected instruments. Our experiments show that TWiST outperforms prior weakly supervised baselines.
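
The abstract does not describe how the temporal attention component is built. As a rough illustration of the general idea only, the sketch below pools per-frame feature vectors from a short clip using scaled dot-product attention, with the most recent frame as the query. The function name `temporal_attention`, the feature values, and the choice of query are all assumptions made for this example and are not taken from TWiST.

```python
import math


def temporal_attention(frames, query):
    """Attention-pool per-frame feature vectors against a query vector.

    Hypothetical helper for illustration; not the authors' implementation.
    Returns the pooled feature vector and the per-frame attention weights.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product score between each frame and the query.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) / scale
              for f in frames]
    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of frame features, one output value per feature dimension.
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


# Made-up 2-D features for a three-frame clip; the last frame is the query.
clip = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = temporal_attention(clip, clip[-1])
```

In a full model, the pooled vector would typically feed a classifier over instrument, action, and target labels. That step is omitted here because the abstract gives no detail about it.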

Published

2026-03-14

How to Cite

Danani, P., Bansal, Y., & Kapoor, P. (2026). TWiST: Temporal Weakly-Supervised Triplets Recognition in Surgical Videos (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41174–41176. https://doi.org/10.1609/aaai.v40i48.42204