Composed Image Retrieval with Text Feedback via Multi-Grained Uncertainty Regularization
1Sea-NExT Joint Lab, National University of Singapore
2Tsinghua University
3Faculty of Science and Technology, and Institute of Collaborative Innovation, University of Macau
Code [GitHub]
Paper [arXiv]
Cite [BibTeX]

Abstract

We investigate composed image retrieval with text feedback. Users gradually look for the target of interest by moving from coarse to fine-grained feedback. However, existing methods merely focus on the latter, i.e., fine-grained search, by harnessing positive and negative pairs during training. This pair-based paradigm only considers the one-to-one distance between a pair of specific points, which is not aligned with the one-to-many coarse-grained retrieval process and compromises the recall rate.

To fill this gap, we introduce a unified learning approach that simultaneously models coarse- and fine-grained retrieval by considering multi-grained uncertainty. The key idea underpinning the proposed method is to treat fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively. Specifically, our method contains two modules: uncertainty modeling and uncertainty regularization.
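The idea of "small and large fluctuations" can be illustrated with a minimal sketch. It assumes the fluctuation is Gaussian noise added to a target feature vector; the function names (`jitter_feature`, `augment_target`) and the noise scales are hypothetical choices for illustration and are not taken from the released code.

```python
import random

def jitter_feature(feature, sigma, rng):
    """Return a noisy copy of `feature`; each dimension gets Gaussian noise of scale `sigma`."""
    return [x + rng.gauss(0.0, sigma) for x in feature]

def augment_target(feature, rng, fine_sigma=0.01, coarse_sigma=0.5):
    """Build two views of one target feature (the sigma values are illustrative assumptions).

    A small sigma keeps the view close to the original point (fine-grained matching);
    a large sigma spreads it over a wider region (coarse-grained matching).
    """
    fine_view = jitter_feature(feature, fine_sigma, rng)
    coarse_view = jitter_feature(feature, coarse_sigma, rng)
    return fine_view, coarse_view

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Toy usage: a random 64-dimensional "target feature".
rng = random.Random(0)
target = [rng.uniform(-1.0, 1.0) for _ in range(64)]
fine_view, coarse_view = augment_target(target, rng)
fine_shift = l2_distance(target, fine_view)      # small displacement
coarse_shift = l2_distance(target, coarse_view)  # large displacement
```

In this toy setting the coarse view lands much farther from the original feature than the fine view, which is the sense in which a large fluctuation stands in for a one-to-many coarse query.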


Architecture

Our main contributions are uncertainty modeling via an augmenter and uncertainty regularization for coarse matching. The model is trained with both the fine-grained matching objective and the proposed coarse-grained uncertainty regularization.
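One way to picture how the two objectives might be combined is sketched below. It is written under stated assumptions, not as the authors' loss: the fine-grained term is a standard InfoNCE-style contrastive loss, and the coarse term uses a common uncertainty-weighting form (matching loss divided by 2σ², plus a log-variance penalty). The names `info_nce`, `uncertainty_regularizer`, `total_loss`, and the `weight` parameter are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def info_nce(queries, targets, temperature=0.1):
    """Fine-grained one-to-one matching: query i should score highest against target i."""
    loss = 0.0
    for i, query in enumerate(queries):
        logits = [dot(query, target) / temperature for target in targets]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(queries)

def uncertainty_regularizer(matching_loss, sigma):
    """Assumed uncertainty weighting: a larger fluctuation sigma relaxes the matching
    loss, while the log term discourages an arbitrarily large sigma."""
    return matching_loss / (2.0 * sigma ** 2) + 0.5 * math.log(sigma ** 2)

def total_loss(queries, targets, jittered_targets, sigma, weight=1.0):
    """Fine-grained loss on clean targets plus a weighted coarse term on jittered targets."""
    fine = info_nce(queries, targets)
    coarse = uncertainty_regularizer(info_nce(queries, jittered_targets), sigma)
    return fine + weight * coarse

# Toy usage: three perfectly aligned query/target pairs.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fine_only = info_nce(basis, basis)
combined = total_loss(basis, basis, basis, sigma=1.0)
```

Here `jittered_targets` stands for target features that have already been perturbed by some augmenter, and `sigma` for the scale of that perturbation; how both are produced and scheduled in the actual method is described in the paper.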

The overview of our network.


Results

Without loss of generality, we verify the effectiveness of the proposed method on fashion datasets, where customer feedback is easy to collect: FashionIQ, Fashion200k and Shoes. Each image in these datasets is tagged with descriptive text serving as a product description, such as 'similar style t-shirt but white logo print'.

Results on FashionIQ.

Results on Fashion200k and Shoes.


Paper

Y. Chen, Z. Zheng, W. Ji, L. Qu, T. Chua.
Composed Image Retrieval with Text Feedback via Multi-Grained Uncertainty Regularization.
ICLR, 2024 [arXiv].