ICDAR '22: Proceedings of the 3rd ACM Workshop on Intelligent Cross-Data Analysis and Retrieval

Full Citation in the ACM Digital Library

SESSION: Invited-Plenary Talk

Session details: Invited-Plenary Talk

  • Hugo Hammer

Explainable Artificial Intelligence for Human Embryo Cell Cleavage Stages Analysis

  • Akriti Sharma
  • Mette H. Stensen
  • Erwan Delbarre
  • Trine B. Haugen
  • Hugo L. Hammer

Many couples struggle with infertility and opt for assisted reproductive technology (ART). Selecting the embryo with the highest chance of resulting in pregnancy is one of the most critical steps of the ART procedure. The implantation potential of an embryo is associated with its morphology, the assessment of which includes cell cleavage stages. Today, the assessment is mainly done manually through visual examination by embryologists and is therefore often subjective. Deep learning (DL) models can be used to automatically classify cell cleavage stages and make the embryo assessment process efficient and objective. However, it is important that embryologists understand and trust the DL predictions, and in this paper we evaluate the potential of three explainable artificial intelligence (XAI) techniques. First, we compare the two XAI techniques Grad-CAM and LIME to identify important characteristics in embryo images associated with specific embryo cleavage stages. Both approaches identified biologically relevant morphological characteristics, but Grad-CAM was generally more consistent than LIME. Second, we suggest an approach for using the XAI technique SHAP to identify image characteristics that caused some images to be misclassified. The identified image characteristics had meaningful biological interpretations, such as fragmentation. Overall, the study demonstrates that DL in combination with XAI can be useful in embryo assessment and selection.

SESSION: Session 1: Multimodal Data Analysis

Session details: Session 1: Multimodal Data Analysis

  • Michael Riegler

Multimodal Cheapfakes Detection by Utilizing Image Captioning for Global Context

  • Tuan-Vinh La
  • Quang-Tien Tran
  • Thanh-Phuc Tran
  • Anh-Duy Tran
  • Duc-Tien Dang-Nguyen
  • Minh-Son Dao

The rapid development of technology in social media platforms has led to abundant misinformation and fake news spreading in the community. One of the most prevalent ways to spread misleading information on social media is through cheapfakes, which are more accessible and affordable than deepfakes. Most existing approaches either extract features from text or concatenate visual and textual features and train on the combined multimodal representation to classify news. This paper proposes several strategies to leverage object, textual, and image captioning features. These strategies focus on utilizing image captioning to extract the correlation between images and captions. We also propose some boosting techniques to enhance the result. Our methods are evaluated on the "MMSys'21 Grand Challenge" dataset and achieve 86.75% accuracy.

Tone Classification for Political Advertising Video using Multimodal Cues

  • Anh-Khoa Vo
  • Yuta Nakashima

Politics has always attracted considerable attention throughout history, and video advertisement has become one of the most essential tools for political communication. Analysis of such political advertising videos can provide more insight into a political campaign by evaluating the messages in them, such as a candidate's attitude toward a certain political issue. In this paper, we propose to classify the tone of political advertising videos as promotive, contrastive, or a mixture of the two using a deep neural network, to benefit automatic analysis of such videos. We especially explore how different modalities of videos, i.e., visuals, audio, and text, contribute to improving the classification accuracy.

A Hybrid Transformer Network for Detection of Risk Situations on Multimodal Life-Log Health Data

  • Rupayan Mallick
  • Jenny Benois-Pineau
  • Akka Zemmari
  • Marion Pech
  • Thinhinane Yebda
  • Hélène Amieva
  • Laura Middleton

The paper focuses on the development of hybrid transformer architectures for the detection of risk events in multimodal data recorded on a person with visual and signal sensors. The proposed two-stream architecture consists of a visual transformer and a linear transformer for time series. The linear transformer is benchmarked on the publicly available dataset UCI-HAR. The experiments with our architecture have been conducted on the in-the-wild dataset BIRDS. The hybrid transformer architecture has better empirical performance than the 3D CNNs and RNNs in previous work. The accuracy of detection of risk situations shows an improvement of 10% over the single-stream transformers.

Predicting High-risk Congestion Areas During Heavy Rain Using Multi Prediction Model and Maximum Periodic Frequent Pattern Algorithms

  • Minh-Dang Tran
  • Nazmudeen Mohamed Saleem

The growth of big data makes it possible to study traffic congestion problems through data from sensors in various locations. Traffic congestion and rainfall data can be used to point out traffic congestion and provide corresponding solutions to traffic problems. This paper proposes a novel system, namely the Rainfall and Traffic Congestion Prediction System (RTC-PS), to address these problems. The proposed system uses predicted future data together with a portion of recent data to find roads that are frequently congested when heavy rains occur. We applied the Stacked Generalization method to combine four prediction algorithms: Random Forest (RF), CatBoost, eXtreme Gradient Boosting (XGBoost), and Least Absolute Shrinkage and Selection Operator (Lasso). We employ the Maximal Periodic Frequent Pattern Growth (maxPFP-Growth) algorithm to mine both recent data and predicted data and discover high traffic demand areas when it rains heavily. The experiments were performed using a database of sensor data collected in Kobe City, Japan, from 2014 to 2015. The results demonstrate the accuracy of the predictions from the RTC-PS system when using a large amount of sensor data from many areas of the city. The proposed RTC-PS system can provide a thorough understanding of the areas that are prone to traffic congestion when heavy rain occurs in an urban area, which can be used by authorities and citizens to minimize the impact of traffic congestion on a citywide scale.
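
As a rough illustration of the Stacked Generalization idea named in the abstract above, the following is a minimal sketch and not the authors' RTC-PS implementation. It assumes two toy base learners (a least-squares line and a 1-nearest-neighbour regressor) in place of RF, CatBoost, XGBoost, and Lasso, and a hypothetical meta-learner that is only a single blending weight fitted on out-of-fold predictions. All function names and the synthetic data are illustrative assumptions.

```python
from statistics import mean

def fit_line(xs, ys):
    """Toy base learner 1: least-squares line; returns a predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    """Toy base learner 2: 1-nearest-neighbour regressor; returns a predictor."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def out_of_fold(fit, xs, ys, k):
    """Predict every point with a model trained on the other k-1 folds."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds

def fit_stack(xs, ys, k=3):
    """Fit the meta-learner (one blending weight) on out-of-fold predictions."""
    a = out_of_fold(fit_line, xs, ys, k)
    b = out_of_fold(fit_nearest, xs, ys, k)
    denom = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if denom == 0:
        w = 0.5  # base learners agree everywhere; any weight is equivalent
    else:
        # Closed-form least-squares weight for w*a + (1-w)*b against y.
        w = sum((ai - bi) * (yi - bi) for ai, bi, yi in zip(a, b, ys)) / denom
    # Refit the base learners on all data for the final stacked predictor.
    line, nearest = fit_line(xs, ys), fit_nearest(xs, ys)
    return w, (lambda x: w * line(x) + (1 - w) * nearest(x))

# Synthetic, noise-free data (an assumption made only for this example).
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]
weight, predict = fit_stack(xs, ys)
```

The key design point is that the meta-learner only sees predictions made on data the base learners were not trained on, so it learns how much to trust each base learner rather than rewarding the one that overfits.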

SESSION: Session 2: IoT, Education, and Privacy Preserving

Session details: Session 2: IoT, Education, and Privacy Preserving

  • Duc-Tien Dang-Nguyen

IoT-based Multimodal Analysis for Smart Education: Current Status, Challenges and Opportunities

  • Wenbin Gan
  • Minh Son Dao
  • Koji Zettsu
  • Yuan Sun

IoT-based multimodal learning analytics promises to obtain an in-depth understanding of the learning process. It provides insights into not only the explicit learning indicators but also the implicit attributes of learners, based on which further learning support can be provided in a timely manner in both the physical and cyber worlds. In this paper, we present a systematic review of existing studies to examine the empirical evidence on the usage of IoT data in education and the capabilities of multimodal analysis to provide useful insights for smarter education. In particular, we classify the multimodal data into four categories based on the data sources (data from digital, physical, physiological and environmental spaces). Moreover, we propose a conceptual framework for better understanding the current state of the field and summarize the insights into six main themes (learner behavior understanding, learner affection computing, smart learning environment, learning performance prediction, group collaboration modeling and intelligent feedback) based on the objectives for intelligent learning. The associations between different combinations of data modalities and various learning indicators are comprehensively discussed. Finally, the challenges and future directions are presented from three aspects.

Is More Realistic Better? A Comparison of Game Engine and GAN-based Avatars for Investigative Interviews of Children

  • Pegah Salehi
  • Syed Zohaib Hassan
  • Saeed Shafiee Sabet
  • Gunn Astrid Baugerud
  • Miriam Sinkerud Johnson
  • Pål Halvorsen
  • Michael A. Riegler

The success of investigative interviews with maltreated children is often defined by the interviewer's ability to elicit a reliable and coherent account of the alleged incident from the child. Research shows that a child avatar mimicking a maltreated child can improve interviewers' performance in conducting these interviews. The realism of such a child avatar is considered one of the most critical factors. Based on this, the current study aims to generate realistic child avatars in real time that utilize multimodal data and different components from artificial intelligence. This paper discusses the subjective findings of a study of two types of child avatar videos: animated avatars created using the Unity game engine and photorealistic talking-head avatars generated using generative adversarial networks (GANs). The results show that although the state-of-the-art GAN-generated avatars are significantly more realistic, they do not necessarily create a better experience, as most of the participants prefer talking to animated avatars.

Towards Intellectual Property Rights Protection in Big Data

  • Rafik Hamza
  • Minh-Son Dao
  • Sadanori Ito
  • Zettsu Koji

Big Data applications can revolutionize any platform by facilitating the analysis of large amounts of information. However, the biggest challenge associated with Big Data is overcoming the intellectual property barriers associated with the use of this data, especially in cross-database applications. Although intellectual property provisions have been formulated to limit inappropriate use and manage access to Big Data, it is difficult to make this trade-off and overcome the challenges of Big Data. This paper explores the limits of intellectual property rights in Big Data applications. The advent of Big Data requires an alternative conceptual framework along with security policies and regulations. The profound issues of copyright on cross-database platforms are highlighted in this paper, as well as the paradigm shift from ownership to control of access to and use of Big Data, especially on cross-database platforms. We also present a real-world case study of the underlying technologies of cross-data analytics (xdata.nict.jp). The xData platform aims to coordinate competing social, personal, and industrial interests in data to ensure fair access while minimizing legal and ethical threats. Finally, we discuss the idea of using blockchain-enabled smart contracts to protect intellectual property rights on cross-data platforms and highlight important aspects of copyright, including key issues and current open challenges.

DeDigi: A Privacy-by-Design Platform for Image Forensics

  • Chi-Hao Tran
  • Quoc-Thang Tran
  • Quynh-Chau Long-Vu
  • Hai-Son Nguyen
  • Anh-Duy Tran
  • Duc-Tien Dang-Nguyen

With the explosion of multimedia has come the rise of advanced image processing and free editing tools, allowing individuals to readily change how an image or video appears to others. This act is malicious and may cause disruption in the community; therefore, to resolve these issues, many tools have been developed that use Digital Image Forensics (DIF) techniques. Even so, based on our MUUP evaluation system, these tools can lack guidance, be difficult to use, and raise privacy issues. We then investigate existing algorithms from various DIF techniques and incorporate this knowledge into our DIF web tool, DeDigi. For DeDigi's design, based on our Design Science system, each design iteration takes into account user feedback on how the tool's design is perceived from their perspective. The final outcome of our "privacy-by-design" tool is five approaches from various categories of the focused field, a score of 3.92/5 for DeDigi (beta)'s user experience, and new user interfaces for training purposes.

SESSION: Session 3: Federated Learning

Session details: Session 3: Federated Learning

  • Minh-Son Dao

FedMCRNN: Federated Learning using Multiple Convolutional Recurrent Neural Networks for Sleep Quality Prediction

  • Tran Anh Khoa
  • Do-Van Nguyen
  • Phuoc Van Nguyen Thi
  • Koji Zettsu

"Good night" is the most common saying everyone uses every day. This shows that sleep plays a vital role in human life, and about a third of a lifetime is spent sleeping. Having a good sleep means having good health, spirit, and intellect to work. Many studies have analyzed and predicted sleep quality using machine learning (ML). However, no studies have used federated learning (FL) to analyze and predict sleep quality. Our study applies federated multiple convolutional recurrent neural networks (FedMCRNN) to multi-modal data collected from wearables for sleep quality prediction. We measure the performance of FedMCRNN in many-to-one and many-to-many cases using a variety of metrics and compare it with traditional machine learning models. The results show that FedMCRNN predicts sleep quality reliably, with 96.774% and 68.721% accuracy for the many-to-one and many-to-many cases, respectively. The other metrics are also better than those of the compared methods. The results also show that FedMCRNN performs better than previous state-of-the-art methods for predicting sleep quality and clearly shows which features influence sleep quality. Our findings have implications for the development of Artificial Intelligence (AI) doctors.

Efficient Resource Allocation using Federated Learning in Cellular Networks

  • Son Cao Nguyen
  • Minh Hoang
  • Tinh Phuc Vo
  • Duc Ngoc Minh Dang

Federated learning (FL) has been introduced as a promising solution to address data privacy issues. FL is receiving widespread attention in various applications, including wireless communication technology. With FL, each device performs local model training and uploads its model parameters for global synthesis at the server. In wireless communication technology, the most concerning issue is resource allocation when tens or hundreds of devices are involved in the operation. Therefore, this paper proposes three algorithms, random scheduling (RS), round-robin (RR), and proportional fair (PF), to allocate resources using FL with devices distributed according to a Poisson Point Process (PPP) in the cellular network. The efficiency of resource allocation is evaluated using the MNIST Fashion multi-modal dataset, considering each user equipment's (UE) convergence speed and time.
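
The three scheduling policies named in the abstract above can be sketched in their commonly used textbook form. This is a minimal illustration under stated assumptions, not the paper's algorithms: the function names, the per-round budget `k` of devices that can upload, and the use of instantaneous and average rates as the proportional-fair metric are all assumptions made here.

```python
import random

def random_scheduling(num_users, k, rng):
    """RS: pick k distinct users uniformly at random each round."""
    return sorted(rng.sample(range(num_users), k))

def round_robin(num_users, k, round_idx):
    """RR: serve users in fixed cyclic order, k per round."""
    start = (round_idx * k) % num_users
    return [(start + j) % num_users for j in range(k)]

def proportional_fair(inst_rates, avg_rates, k):
    """PF: pick the k users with the highest instantaneous-to-average rate ratio."""
    ranked = sorted(
        range(len(inst_rates)),
        key=lambda u: inst_rates[u] / avg_rates[u],
        reverse=True,
    )
    return sorted(ranked[:k])

rng = random.Random(0)  # seeded only to make the example repeatable
rs_pick = random_scheduling(6, 2, rng)
rr_pick = round_robin(5, 2, 2)
# Ratios are 0.5, 1.0, 1.5, so users 1 and 2 are scheduled.
pf_pick = proportional_fair([4.0, 1.0, 3.0], [8.0, 1.0, 2.0], 2)
```

In an FL setting, only the scheduled users would upload their local model parameters in that round, so the policy determines both fairness across devices and how quickly the global model sees updates from devices with good channels.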