MMArt-ACM '22: Proceedings of the 2022 International Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia

SESSION: Oral Session

SiSP: Japanese Situation-dependent Sentiment Polarity Dictionary

  • Atsushi Takada
  • Yoshinobu Kano
  • Toshihiko Yamasaki

To deal with the variety of meanings and contexts of words, we created a Japanese Situation-dependent Sentiment Polarity Dictionary (SiSP) of sentiment values labeled for 20 different situations. Crowdworkers annotated 25,520 Japanese words, providing 10 responses per word for each situation. Using SiSP, we predicted the polarity of each word in the dictionary and, taking context into account, the polarity of dictionary words appearing in sentences. In both experiments, situation-dependent prediction showed superior results in determining emotional polarity.
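As a rough illustration of why per-situation labels matter, the sketch below aggregates crowdworker responses into a situation-dependent polarity and compares it with a situation-agnostic average. The data layout, the -1/0/+1 response scale, the English placeholder word, and the function names are all assumptions made for this example; they are not taken from the SiSP dictionary itself.

```python
from statistics import mean

# Hypothetical toy data: word -> situation -> crowdworker responses.
# The abstract states 20 situations and 10 responses per (word, situation);
# the -1/0/+1 scale and the entries below are assumptions for illustration.
RESPONSES = {
    "loud": {
        "concert": [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
        "library": [-1, -1, -1, -1, 0, -1, -1, -1, -1, -1],
    },
}


def situation_polarity(word, situation):
    """Mean response for one word in one situation."""
    return mean(RESPONSES[word][situation])


def overall_polarity(word):
    """Situation-agnostic baseline: mean over all responses for the word."""
    all_responses = [r for rs in RESPONSES[word].values() for r in rs]
    return mean(all_responses)


if __name__ == "__main__":
    for situation in RESPONSES["loud"]:
        print(situation, situation_polarity("loud", situation))
    print("overall", overall_polarity("loud"))
```

In this toy case the overall average is close to neutral even though the word is clearly positive in one situation and clearly negative in the other, which is the kind of information a single context-free polarity value would lose.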

Modeling Kansei Index for Images and Impression Estimation Using Fine Tuning

  • Yukiya Taki
  • Kunihito Kato
  • Kazunori Terada
  • Kensuke Tobitani

In this study, we propose an effective method for estimating image impressions that is based on a Kansei (affective) index and then show how a deep learning model can be used to evaluate the suitability of images based on that Kansei index. To accomplish this, we exhaustively collected words and phrases that express image impressions using the evaluation grid method and then conducted an evaluation experiment on the collected words using Yahoo! Crowdsourcing. Next, factor analysis was performed on the collected evaluation data, and the obtained factor scores were used as the Kansei index. Finally, we fine-tuned a ResNet18 pretrained on ImageNet, using each image's factor scores as supervised data, and confirmed that our deep learning model could infer an effective Kansei index that correlates strongly with the target images.

EMVGAN: Emotion-Aware Music-Video Common Representation Learning via Generative Adversarial Networks

  • Yu-Chih Tsai
  • Tse-Yu Pan
  • Ting-Yang Kao
  • Yi-Hsuan Yang
  • Min-Chun Hu

Music can enhance our emotional reactions to videos and images, while videos and images can enrich our emotional response to music. Cross-modality retrieval technology can be used to recommend appropriate music for a given video and vice versa. However, the heterogeneity gap caused by the inconsistent distribution between different data modalities complicates learning the common representation space from different modalities. Accordingly, we propose an emotion-aware music-video cross-modal generative adversarial network (EMVGAN) model to build an affective common embedding space to bridge the heterogeneity gap among different data modalities. The evaluation results revealed that the proposed EMVGAN model can learn affective common representations with convincing performance while outperforming other existing models. Furthermore, the satisfactory performance of the proposed network encouraged us to undertake the music-video bidirectional retrieval task.

Unattractive Face Amplifies Late Frontal Slow Wave during Visual Perspective Taking

  • Hirokazu Doi

The ability to imagine what a visual scene looks like from another viewer's perspective is termed "second-order visual perspective taking" and is considered to be linked with functions such as empathy and mentalizing. A few recent studies have indicated that empathic responses towards others may be modulated by their physical attractiveness. On this basis, it was conjectured that visual perspective taking is also influenced by the viewer's attractiveness.

The present study investigated this hypothesis by examining the effects of the viewer's facial attractiveness on neural activation during visual perspective taking. The main focus of interest was the effect of facial attractiveness on the late frontal slow wave (LFSW), an event-related potential component that has been shown to reflect the process of representing others' mental states.

The results revealed a larger LFSW when taking the visual perspective of unattractive viewers than of attractive viewers. Together with some of the existing literature, the present finding indicates that people have stronger motivation to see things from the perspective of unattractive others than from that of attractive others.