MemoVision: A Digital Catalog for Everyday Interactions
DOI: https://doi.org/10.1609/aaai.v40i48.42368
Abstract
We present MemoVision, a digital catalog system that captures semantic, spatial, temporal and interaction information as users move around physical environments with client devices such as smart glasses. The system uses open-vocabulary semantic segmentation and 3D scans to store objects of interest with comprehensive semantic, spatial, temporal and interaction labels. Our demonstration shows multimodal information query and retrieval capabilities. It supports specific queries about object locations, temporal events and user interactions, including eye gaze and hand poses, and enables more contextualized responses than current multimodal large language models.
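The abstract does not say how catalog entries are stored or queried. As a minimal sketch, assume each object of interest is reduced to one record holding a semantic label, a 3D position, a timestamp and a set of interaction tags. Under that assumption, the labels named above and the kinds of queries they enable might look as follows. `CatalogEntry`, `Catalog`, `last_seen` and `interacted` are hypothetical names for illustration only, not MemoVision's actual API. The units (metres, seconds) are also assumed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: one sighting of an object of interest."""
    label: str                            # semantic label (e.g. from open-vocabulary segmentation)
    position: tuple[float, float, float]  # spatial label: 3D position in the scanned scene (metres, assumed)
    timestamp: float                      # temporal label: seconds since session start (assumed)
    interactions: set[str] = field(default_factory=set)  # interaction labels, e.g. {"gaze", "hand"}


class Catalog:
    """Hypothetical in-memory catalog; a real system would likely use a database."""

    def __init__(self) -> None:
        self.entries: list[CatalogEntry] = []

    def add(self, entry: CatalogEntry) -> None:
        self.entries.append(entry)

    def last_seen(self, label: str) -> Optional[CatalogEntry]:
        """Location query: the most recent sighting of an object with this label, or None."""
        matches = [e for e in self.entries if e.label == label]
        return max(matches, key=lambda e: e.timestamp, default=None)

    def interacted(self, kind: str, start: float, end: float) -> list[str]:
        """Interaction + temporal query: sorted labels with a given interaction in [start, end]."""
        return sorted({e.label for e in self.entries
                       if kind in e.interactions and start <= e.timestamp <= end})


# Usage with made-up sightings (illustrative data, not results from the paper).
catalog = Catalog()
catalog.add(CatalogEntry("keys", (1.0, 0.2, 0.9), 10.0, {"hand"}))
catalog.add(CatalogEntry("mug", (2.5, 0.0, 1.1), 12.0, {"gaze"}))
catalog.add(CatalogEntry("keys", (3.1, 0.4, 0.8), 30.0, {"gaze", "hand"}))

print(catalog.last_seen("keys").position)     # (3.1, 0.4, 0.8)
print(catalog.interacted("hand", 0.0, 20.0))  # ['keys']
```

Keeping every label on a single record lets one filter combine spatial, temporal and interaction constraints. An LLM-based front end could then translate a natural-language question such as "where did I last put my keys?" into such a filter. That pairing is also an assumption, not a detail given in the abstract.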
Published: 2026-03-14
How to Cite
Ng, L. X., Tang, K. T. W., & Tan, J. J. W. (2026). MemoVision: A Digital Catalog for Everyday Interactions. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41646–41648. https://doi.org/10.1609/aaai.v40i48.42368