WCRML '19: Proceedings of the ACM Workshop on Crossmodal Learning and Application


SESSION: Workshop Presentations

Some Shades of Grey!: Interpretability and Explainability of Deep Neural Networks

  • Andreas Dengel

Thanks to the availability of data and the corresponding computing capacity, more and more cognitive tasks can be transferred to computers, which independently learn to improve our understanding, increase our problem-solving capacity, or simply help us remember connections. Deep neural networks in particular clearly outperform traditional AI methods and are therefore finding more and more areas of application in which they support decision-making or even make decisions independently. For many of these areas, such as autonomous driving or credit allocation, the use of such networks is highly critical and risky because of their "black box" character, since it is difficult to interpret how or why the models arrive at certain results. The paper discusses and presents various approaches that attempt to understand and explain decision-making in deep neural networks.
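The abstract does not name specific techniques, but gradient-based saliency is one widely used way to attribute a network's decision back to its input. The following is a minimal, purely illustrative PyTorch sketch (placeholder model and input, not material from the talk):

```python
# Illustrative sketch only: gradient-based saliency, one common way to probe
# why a "black box" network predicts a given class. The model and input are
# placeholders, not artifacts from the talk.
import torch
import torchvision.models as models

model = models.resnet18(weights=None)   # stands in for any trained classifier
model.eval()

image = torch.randn(1, 3, 224, 224, requires_grad=True)   # placeholder input
logits = model(image)
target = logits.argmax(dim=1).item()

# The gradient of the winning logit w.r.t. the input marks influential pixels.
logits[0, target].backward()
saliency = image.grad.abs().max(dim=1).values   # (1, 224, 224) saliency map
```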

Multimodal Multitask Emotion Recognition using Images, Texts and Tags

  • Mathieu Pagé Fortin
  • Brahim Chaib-draa

Recently, multimodal emotion recognition has received increasing interest due to its potential to improve performance by leveraging complementary sources of information. In this work, we explore the use of images, texts, and tags for emotion recognition. However, using several modalities also comes with an additional challenge that is often ignored, namely the problem of "missing modality": social media users do not always publish content containing an image, text, and tags, so one or two modalities are often missing at test time. Similarly, labeled training data that contain all modalities can be limited. Taking this into consideration, we propose a multimodal model that leverages a multitask framework to enable training on data composed of an arbitrary number of modalities, while also performing predictions with missing modalities. We show that our approach is robust to one or two missing modalities at test time. Moreover, with this framework it becomes easy to fine-tune parts of our model with unimodal and bimodal training data, which can further improve overall performance. Finally, our experiments indicate that this multitask learning also acts as a regularization mechanism that improves generalization.
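The abstract does not include the authors' architecture; a rough sketch of the general idea it describes, per-modality encoders feeding a shared classifier that uses whichever embeddings are present, could look as follows (all module names, feature sizes, and the averaging fusion are assumptions, not the paper's design):

```python
# Minimal sketch (assumed names and sizes, not the authors' code): one encoder
# per modality, and a shared classifier that averages whichever embeddings are
# actually present, so missing modalities are simply skipped.
import torch
import torch.nn as nn

class MultimodalEmotionNet(nn.Module):
    def __init__(self, feature_dims, hidden=256, num_classes=8):
        super().__init__()
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
            for name, dim in feature_dims.items()
        })
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, inputs):
        # `inputs` maps modality name -> feature batch; absent modalities are omitted.
        embeddings = [self.encoders[name](x) for name, x in inputs.items()]
        fused = torch.stack(embeddings).mean(dim=0)
        return self.classifier(fused)

model = MultimodalEmotionNet({"image": 2048, "text": 768, "tags": 300})
# Prediction with the "tags" modality missing at test time:
logits = model({"image": torch.randn(4, 2048), "text": torch.randn(4, 768)})
```

Because the fusion simply averages the available embeddings, the same network accepts unimodal, bimodal, or trimodal inputs at both training and test time, matching the missing-modality setting described above.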

Fusing Deep Quick Response Code Representations Improves Malware Text Classification

  • Manikandan Ravikiran
  • Krishna Madgula

Recently, multimodal processing for various text classification tasks such as emotion recognition, sentiment analysis, and author profiling has gained traction due to its potential to improve performance by leveraging complementary sources of information such as texts, images, and speech. In this line, we focus on multimodal malware text classification. However, unlike traditional tasks such as emotion recognition and sentiment analysis, generating complementary domain information is difficult in cyber security, so such multimodal ideas have received little attention in the context of malware classification. In this work, we address this gap by improving malware text classification using Quick Response (QR) codes generated from the text itself as complementary information. Exploiting the superior capacity of Convolutional Neural Networks (CNNs) to process images, we fuse the CNN representations of the text and image data in multiple ways, and we show that the complementary information from QR codes improves the performance of malware text classification, thereby achieving a new state of the art and creating the first multimodal benchmark for malware text classification.
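The abstract does not detail the pipeline; the sketch below only illustrates the general idea of rendering the classified text as a QR-code image and fusing CNN image features with a precomputed text representation. The `qrcode` package, the toy CNN branch, and all feature sizes are assumptions, not the authors' setup:

```python
# Rough sketch under stated assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import qrcode                              # pip install qrcode[pil]
from torchvision import transforms

def text_to_qr_tensor(text, size=224):
    """Render text as a QR-code image and convert it to a (1, 3, size, size) tensor."""
    img = qrcode.make(text).convert("RGB")             # PIL image of the QR code
    to_tensor = transforms.Compose([transforms.Resize((size, size)),
                                    transforms.ToTensor()])
    return to_tensor(img).unsqueeze(0)

class FusionClassifier(nn.Module):
    """Concatenate text features with CNN features of the QR image (illustrative sizes)."""
    def __init__(self, text_dim=300, img_dim=64, num_classes=10):
        super().__init__()
        self.image_cnn = nn.Sequential(                 # toy CNN branch for QR images
            nn.Conv2d(3, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, img_dim),
        )
        self.classifier = nn.Linear(text_dim + img_dim, num_classes)

    def forward(self, text_features, qr_image):
        fused = torch.cat([text_features, self.image_cnn(qr_image)], dim=1)
        return self.classifier(fused)

# Example: classify one (hypothetical) malware report given its text embedding.
qr = text_to_qr_tensor("example malware report text")
logits = FusionClassifier()(torch.randn(1, 300), qr)
```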