TWiST: Temporal Weakly-Supervised Triplets Recognition in Surgical Videos (Student Abstract)
DOI: https://doi.org/10.1609/aaai.v40i48.42204
Abstract
Deep learning is increasingly applied to intraoperative and surgical video analysis to enable real-time workflow recognition and decision support for improved surgical precision. A key direction is modeling surgical activity as triplets of instrument, action, and target, which provide a richer representation of procedures. However, existing approaches often depend on bounding-box annotations or lack temporal context. We propose TWiST (Temporal Weakly Supervised Triplet detection), a framework that combines weakly supervised instrument localization, temporal attention for triplet prediction, and grounding of triplets with detected instruments. Our experiments show that TWiST outperforms prior weakly supervised baselines.
Published
2026-03-14
How to Cite
Danani, P., Bansal, Y., & Kapoor, P. (2026). TWiST: Temporal Weakly-Supervised Triplets Recognition in Surgical Videos (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41174–41176. https://doi.org/10.1609/aaai.v40i48.42204
Section
AAAI Student Abstract and Poster Program