MuCAI'21: Proceedings of the 2nd ACM Multimedia Workshop on Multimodal Conversational AI

Full Citation in the ACM Digital Library

SESSION: Keynote Talk

Conversational AI Efforts within Facebook AI Applied Research

  • Alborz Geramifard

The goal of the conversational AI team within Facebook AI Applied Research is to create AI-driven dialog capabilities with a focus on augmented/virtual reality products. This talk provides an overview of our recent efforts on data collection, multimodal dialog, pipelined model-based policies, and end-to-end architectures.

SESSION: Paper Presentations

Towards Enriching Responses with Crowd-sourced Knowledge for Task-oriented Dialogue

  • Yingxu He
  • Lizi Liao
  • Zheng Zhang
  • Tat-Seng Chua

Task-oriented dialogue agents are built to assist users in completing various tasks, and generating appropriate responses for satisfactory task completion is the ultimate goal. Hence, metrics such as success rate and inform rate have been widely used as convenient and straightforward ways to evaluate the generated responses. However, beyond task completion, several other factors largely affect user satisfaction, and these remain under-explored. In this work, we focus on analyzing the agent behavior patterns that lead to higher user satisfaction scores. Based on the findings, we design a neural response generation model, EnRG. It naturally combines the power of pre-trained GPT-2 in response semantic modeling with the merit of dual attention in exploiting external crowd-sourced knowledge. Equipped with two gates via explicit dialogue act modeling, it effectively controls the usage of external knowledge sources in the form of both text and images. We conduct extensive experiments. Both automatic and human evaluation results demonstrate that, beyond comparable task completion, our proposed method generates responses that gain higher user satisfaction.

Towards a Real-time Measure of the Perception of Anthropomorphism in Human-robot Interaction

  • Maria Tsfasman
  • Avinash Saravanan
  • Dekel Viner
  • Daan Goslinga
  • Sarah de Wolf
  • Chirag Raman
  • Catholijn M. Jonker
  • Catharine Oertel

How human-like do conversational robots need to look to enable long-term human-robot conversation? One essential aspect of long-term interaction is a human's ability to adapt to the varying degrees of a conversational partner's engagement and emotions. Prosodically, this can be achieved through (dis)entrainment. While speech synthesis has been a limiting factor for many years, such restrictions are increasingly being mitigated. These advancements now emphasise the importance of studying the effect of robot embodiment on human entrainment. In this study, we conducted a between-subjects online human-robot interaction experiment in an educational use-case scenario where a tutor was embodied through either a human or a robot face. Forty-three English-speaking participants took part in the study, and we analysed their degree of acoustic-prosodic entrainment to the human or robot face, respectively. We found that the degree of subjective and objective perception of anthropomorphism positively correlates with acoustic-prosodic entrainment.

The Design of a Trust-based Game as a Conversational Component of Interactive Environment for a Human-agent Negotiation

  • Andrey V. Vlasov
  • Oksana O. Zinchenko
  • Zhenjie Zhao
  • Mansur Bakaev
  • Arsenjy Karavaev

This research sheds light on how humans interact with virtual partners (Figure 1) in an interactive environment based on economic games, and how this environment can be applied to the training process with immersive technologies. The designed system could be integrated as a tool and serve as a component of an e-learning platform with conversational AI and human-agent interactions that allows human users to play and learn. Scientifically, we have considered the trust problem from a different point of view, learning by doing (i.e., gaming), and proposed that individuals can wear "trust care" lenses on trained "golden eyes" while communicating with others. We explore how contextual trust can be used to promote human-agent collaboration even in the domain of a competitive negotiation scenario. We present small-scale online testing via instant messaging in Telegram [@trudicbot] and prepare VR testing to demonstrate the potential of the trust-based game approach.

SESSION: Poster Presentation

iFetch: Multimodal Conversational Agents for the Online Fashion Marketplace

  • Ricardo Gamelas Sousa
  • Pedro Miguel Ferreira
  • Pedro Moreira Costa
  • Pedro Azevedo
  • Joao Paulo Costeira
  • Carlos Santiago
  • Joao Magalhaes
  • David Semedo
  • Rafael Ferreira
  • Alexander I. Rudnicky
  • Alexander Georg Hauptmann

In the near future, most of the interaction between large organizations and their users will be mediated by AI agents. This prospect is becoming undisputed as online shopping dominates entire market segments and the new "digitally-native" generations become consumers. iFetch is a new generation of task-oriented conversational agents that interact with users seamlessly using verbal and visual information. Through the conversation, iFetch provides targeted advice and a "physical store-like" experience while maintaining user engagement. This context entails the following vital components: 1) highly complex memory models that keep track of the conversation, 2) extraction of key semantic features from language and images that reveal user intent, 3) generation of multimodal responses that keep users engaged in the conversation, and 4) an interrelated knowledge base of products from which to extract relevant product lists.