Fashion Microscope: Pixel-Level Attribute Perception via Optimal Transport and Neural Semantic Aggregation

Authors

  • Shuili Zhang (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences)
  • Hongzhang Mu (Institute of Information Engineering, Chinese Academy of Sciences)
  • Jiawei Sheng (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences)
  • Qianqian Tong (Department of Strategic and Advanced Interdisciplinary Research, Peng Cheng Laboratory)
  • Wenyuan Zhang (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences)
  • Quangang Li (Institute of Information Engineering, Chinese Academy of Sciences)
  • Tingwen Liu (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences)

DOI: https://doi.org/10.1609/aaai.v40i19.38672

Abstract

Attribute-specific fashion retrieval aims to enhance fine-grained image retrieval by emphasizing the similarity of specific attributes. Current methods primarily rely on attention mechanisms to extract attribute-related visual features, but they face two key challenges: coarse-grained localization, which limits fine-grained accuracy, and an imbalance between global and local perception, in which excessive focus on local features can undermine overall performance. To address these issues, we propose ProFashion, a fashion microscope that achieves pixel-level attribute awareness through optimal transport and neural semantic aggregation. The framework first employs optimal transport to align semantic attributes with visual patterns from a global perspective, generating an attribute-visual value map that highlights distinctive regions while reducing interference. It then simulates the human brain's perception of attribute feature patterns through superpixel generation and aggregation, capturing attribute-related features at the pixel semantic level and forming key semantic clusters that preserve microstructures. Building on this, an attribute graph is constructed to facilitate feature clustering, significantly enhancing the framework's capability to handle overlapping features and cross-scale relationships. Comprehensive experiments on the FashionAI, DeepFashion, and DARN datasets demonstrate the framework's effectiveness, with overall MAP improvements of 3.11%, 3.70%, and 3.49%, respectively. The framework also delivers relative average throughput gains of 26.94%, 22.22%, and 24.78% on the FashionAI, DeepFashion, and DARN datasets, respectively.
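The abstract does not give the exact optimal transport formulation, so the following is only a minimal sketch of the general idea of aligning attribute embeddings with image patches to obtain an attribute-visual value map. It assumes entropic-regularized OT solved with the standard Sinkhorn iterations, a cosine-distance cost, uniform marginals, and NumPy arrays; the function names `sinkhorn` and `attribute_value_map` and the parameters `eps` and `n_iters` are hypothetical and are not taken from ProFashion.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.05, n_iters=200):
    """Entropic-regularized OT plan between histograms a (m,) and b (n,).

    Standard Sinkhorn scaling (an assumed solver, not confirmed by the paper).
    """
    K = np.exp(-cost / eps)          # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale to match row marginals
        v = b / (K.T @ u)            # rescale to match column marginals
    return u[:, None] * K * v[None, :]  # plan P = diag(u) K diag(v)


def attribute_value_map(attr_emb, patch_feats, eps=0.05, n_iters=200):
    """Hypothetical attribute-visual value map: one weight per (attribute value, patch).

    attr_emb:    (m, d) embeddings of attribute values (assumed input)
    patch_feats: (n, d) visual features of image patches (assumed input)
    """
    # Cosine distance between every attribute embedding and every patch feature.
    A = attr_emb / np.linalg.norm(attr_emb, axis=1, keepdims=True)
    X = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    cost = 1.0 - A @ X.T             # shape (m, n), values in [0, 2]

    m, n = cost.shape
    a = np.full(m, 1.0 / m)          # uniform mass over attribute values (assumption)
    b = np.full(n, 1.0 / n)          # uniform mass over patches (assumption)
    P = sinkhorn(cost, a, b, eps, n_iters)

    # Normalize each row so every attribute value yields a distribution over patches.
    return P / P.sum(axis=1, keepdims=True)
```

As a usage example, with 16 patch features taken from a 4×4 grid, `attribute_value_map(attrs, patches).reshape(-1, 4, 4)` gives one spatial weight map per attribute value. Because the plan is computed jointly over all attributes and all patches, each patch's mass is shared among competing attributes; this is one plausible reading of aligning "from a global perspective" rather than scoring each attribute in isolation.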

Published

2026-03-14

How to Cite

Zhang, S., Mu, H., Sheng, J., Tong, Q., Zhang, W., Li, Q., & Liu, T. (2026). Fashion Microscope: Pixel-Level Attribute Perception via Optimal Transport and Neural Semantic Aggregation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16343–16351. https://doi.org/10.1609/aaai.v40i19.38672

Issue

Vol. 40 No. 19

Section

AAAI Technical Track on Data Mining & Knowledge Management III