WAB'21: Proceedings of the 1st Workshop on Multimodal Product Identification in Livestreaming and WAB Challenge

Full Citation in the ACM Digital Library

SESSION: Paper Presentations

Domain Balanced Sampling and Iterative Search for Product Identification

  • Litong Gong
  • Sheng Tang
  • Juan Cao

This paper introduces our solution to the 1st Workshop on Multimodal Product Identification in Livestreaming and Watch and Buy Challenge, a real-world task set in live-stream scenes that is challenging due to factors such as lighting, occlusion, and cross-domain variation. We model this task as a combination of object detection and image retrieval. Across the whole pipeline, we focus mainly on cross-domain retrieval and propose a domain-balanced sampling method, which improves the robustness of multi-domain retrieval. In addition, to eliminate the influence of irrelevant clothing, we propose an iterative cross-search strategy, which greatly improves matching accuracy. We also experiment with exploiting text information, including multimodal product classification and multimodal intent recognition. With these methods, we achieved an F1 score of 69.2% and took first place in the competition.
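The abstract does not specify how domain-balanced sampling is implemented. As a minimal sketch, assuming each training image carries a domain label (for example, hypothetical labels `"live"` for livestream frames and `"shop"` for shop images), a sampler can draw an equal number of items from every domain for each batch, oversampling smaller domains with replacement. The function name `domain_balanced_batches` is hypothetical, and this is not the authors' code.

```python
import random
from collections import defaultdict


def domain_balanced_batches(samples, batch_size, seed=0):
    """Yield batches with an equal number of items from each domain.

    `samples` is a list of (item_id, domain) pairs. Every domain contributes
    batch_size // n_domains items per batch; items are drawn with
    replacement, so smaller domains are oversampled.
    """
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for item_id, domain in samples:
        by_domain[domain].append(item_id)
    domains = sorted(by_domain)
    per_domain = batch_size // len(domains)
    if per_domain == 0:
        raise ValueError("batch_size must be >= number of domains")
    # Choose the batch count so the draws per domain cover the largest domain's size.
    largest = max(len(items) for items in by_domain.values())
    n_batches = -(-largest // per_domain)  # ceiling division
    for _ in range(n_batches):
        batch = []
        for d in domains:
            batch.extend(rng.choices(by_domain[d], k=per_domain))
        rng.shuffle(batch)  # avoid a fixed domain order inside the batch
        yield batch
```

Balancing within each batch means every gradient step sees all domains; the trade-off is that items from small domains are repeated more often, which can increase overfitting on those domains.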

Multimodal Product Identification: Submission to Watch and Buy 2021 Challenge

  • Jun Peng
  • Su Feng
  • Ya Wang
  • Haowen Hou
  • Fengzong Lian
  • Zhanhui Kang

This technical report gives an overview of our approach to the "Watch and Buy: Multimodal Product Identification Challenge". Specifically, we tackle the problem with a three-stage framework consisting of product detection, retrieval, and classification. For product detection, we improve performance with Cascade R-CNN and deformable convolution, which alleviate the impact of image distortion. For product retrieval, we enhance the Multiple Granularity Network (MGN) with global and local context through IBN, SE, and Non-local blocks. Product classification suffers from fashion variation; to address this, we propose fusing the global feature of the whole image with the local feature of the product. Experiments demonstrate that our methods achieve performance competitive with state-of-the-art methods, and our overall approach achieves an F1 score of 0.648, ranking second in the final challenge.
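The abstract states that the global feature of the whole image is fused with the local feature of the product but does not say how. A minimal sketch, assuming fusion by concatenating L2-normalized vectors with an optional weight on the local part, is shown below; `fuse_features` and `local_weight` are hypothetical names, and the fused vector would then be passed to a product classifier.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(global_feat, local_feat, local_weight=1.0):
    """Concatenate a normalized whole-image feature with a normalized,
    weighted product-crop feature (assumed fusion scheme)."""
    g = l2_normalize(global_feat)
    local = [local_weight * x for x in l2_normalize(local_feat)]
    return g + local
```

Normalizing each part before concatenation keeps one feature from dominating merely because of its scale; `local_weight` then makes the balance between context and product appearance an explicit, tunable choice.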

3rd Solution to WAB Challenge: A Two Stage Model Using only Pixel-level Features for Video-Image Retrieval

  • Runqi Wang
  • Wei Zeng
  • Yuehan Yao

In this article, we propose an effective product retrieval pipeline that is robust to large-scale product data from e-commerce platforms. Our method consists of three parts: detection, retrieval, and post-processing. We first detect the target product precisely and then use a metric-learning-based model to retrieve the corresponding item. To improve search robustness against noise and misleading bounding boxes, we apply post-processing methods such as weighted box fusion and feature connection. Using the proposed method, we won third place in the Watch and Buy: Multimodal Product Identification Challenge.
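Weighted box fusion (WBF) merges overlapping detections by averaging their coordinates, weighted by confidence, instead of discarding lower-scoring boxes as non-maximum suppression does. The sketch below is a simplified single-class version written for illustration, not the team's implementation: it matches each box to the fused cluster with the highest IoU above `iou_thr` (a hypothetical default of 0.55), and it omits the score rescaling by the number of contributing models used in the original WBF formulation.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def weighted_box_fusion(boxes, scores, iou_thr=0.55):
    """Simplified single-class WBF.

    Returns a list of (fused_box, fused_score) pairs. Each fused box is the
    score-weighted mean of its member boxes; its score is their mean score.
    """
    clusters = []  # member (box, score) pairs for each cluster
    fused = []     # current (fused_box, fused_score) for each cluster
    # Visit boxes from highest to lowest confidence.
    for i in sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True):
        box, score = boxes[i], scores[i]
        # Find the fused box with the highest IoU above the threshold.
        best_k, best_iou = None, iou_thr
        for k, (fbox, _) in enumerate(fused):
            overlap = iou(fbox, box)
            if overlap > best_iou:
                best_k, best_iou = k, overlap
        if best_k is None:  # no match: start a new cluster
            clusters.append([])
            fused.append(None)
            best_k = len(clusters) - 1
        members = clusters[best_k]
        members.append((box, score))
        # Recompute the fused box as a confidence-weighted average.
        total = sum(s for _, s in members)
        fbox = tuple(sum(b[j] * s for b, s in members) / total for j in range(4))
        fused[best_k] = (fbox, total / len(members))
    return fused
```

For example, boxes `(0, 0, 10, 10)` with score 0.9 and `(1, 1, 11, 11)` with score 0.6 overlap with IoU of about 0.68 and merge into one box near `(0.4, 0.4, 10.4, 10.4)` with score 0.75, while a distant box stays separate.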

Multimodal Region-level Clothing Re-identification in E-commerce Livestreaming

  • Hongwei Han
  • Xiu Li

This is a technical report for the 1st Workshop on Multimodal Product Identification in Livestreaming and WAB Challenge. The technical pipeline includes: 1) object detection, 2) product re-identification, 3) multimodal classification under multi-criteria supervision, and 4) smart sorting. The technical details and code documentation are presented in this report. Our team, "THU-TAOBAO MAN", ranked 4th in the final season.

Watch and Buy: A Practical Solution for Real-time Fashion Product Identification in Live Stream

  • Jun Rao
  • Yue Cao
  • Shuhan Qi
  • Zeyu Dong
  • Tao Qian
  • Xuan Wang

"Watch and Buy: Multimodal Product Identification (WAB)" is a new challenge in the field of cross-modal retrieval, which aims to retrieve relevant products while users watch live streamers selling fashion products. In practice, it is very hard to identify product items accurately and quickly because of large deformations, occlusions, and motion blur in a real-world live-streaming environment. In this paper, we present our solution to the WAB challenge, which includes the models and training methods for fashion product localization and identification, as well as detailed strategies for optimization, model assembly, and post-process ranking. Experiments show that our strategies for data augmentation, model fusion, and result ranking lead to better results. Our final model is small and efficient with competitive results: it attains 0.4915 on test B in the final season, ranking 5th, and 0.5604 on test A, ranking 1st in the late-submission phase.
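The abstract mentions model fusion and post-process ranking without giving details. One common approach, shown here as an assumption rather than the team's method, is score-level fusion: average the query-to-gallery similarity scores produced by several retrieval models and rank gallery items by the fused score. `fuse_and_rank` and its parameters are hypothetical.

```python
def fuse_and_rank(similarity_lists, weights=None, top_k=5):
    """Score-level ensemble for one query.

    `similarity_lists[m][j]` is model m's similarity between the query and
    gallery item j. Scores are combined by a weighted average (uniform by
    default); returns the top_k gallery indices by fused score, highest
    first, together with the full list of fused scores.
    """
    n_models = len(similarity_lists)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_items = len(similarity_lists[0])
    fused = [sum(w * sims[j] for w, sims in zip(weights, similarity_lists))
             for j in range(n_items)]
    ranked = sorted(range(n_items), key=lambda j: fused[j], reverse=True)
    return ranked[:top_k], fused
```

Averaging scores requires the models' similarities to be on comparable scales (for example, cosine similarities of normalized embeddings); otherwise the scores should be normalized per model before fusion.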