MADiMa '22: Proceedings of the 7th International Workshop on Multimedia Assisted Dietary Management

SESSION: Invited Talk 1

The Quest towards Automated Dietary Monitoring & Intervention in Free-living

Oliver Amft

In the first part of this talk, I will review the hunt for sensors that started off the field of automated dietary monitoring (ADM) and continues to play a role in shaping current research. Moreover, I will describe the eyeglasses-based sensors that we currently develop and their perspectives for free-living monitoring. Moving on, in part two, I will discuss digital twin-based co-simulation as a novel system design approach for wearable devices and its relevance for supporting machine learning algorithm development. Finally, in part three, I will extend the scope into technology-based dietary intervention, i.e., how ADM can support users in their daily life when targeting a diet change or body weight reduction. I will show examples from our work to create digital twins that model individual behavior and identify behavior changes, as well as interaction strategies to integrate in everyday life.

SESSION: Oral Paper 1 Session

Real Scale Hungry Networks: Real Scale 3D Reconstruction of a Dish and a Plate using Implicit Function and a Single RGB-D Image

Shu Naritomi
Keiji Yanai

The management of dietary calorie content using information technology has become an essential topic in multimedia research in recent years, and many researchers and companies are conducting research and developing applications. Many methods for estimating the calorie content of food use image recognition. However, these methods have a problem: they cannot account for the 3D height and depth of the food because they treat it as a 2D object, even though the actual meal is 3D. To solve this problem, we would like to utilize the deep learning-based 3D reconstruction techniques developed in recent years, but most of these methods reconstruct normalized objects. Being normalized means that the actual size is unknown, which makes it difficult to use them for estimating calories and nutritional value. In this paper, we propose a method using an implicit function representation that reconstructs the 3D shapes of a dish and plate in real scale, using an RGB-D image and camera parameters.

Chewing Detection from Commercial Smart-glasses

Vasileios Papapanagiotou
Anastasia Liapi
Anastasios Delopoulos

Automatic dietary monitoring has progressed significantly in recent years, offering a variety of solutions, both in terms of sensors and algorithms and in terms of which aspects or parameters of eating behavior are measured and monitored. Automatic detection of eating based on chewing sounds has been studied extensively; however, it requires a microphone to be mounted on the subject's head to capture the relevant sounds. In this work, we evaluate the feasibility of using an off-the-shelf commercial device, the Razer Anzu smart-glasses, for automatic chewing detection. The smart-glasses are equipped with stereo speakers and microphones that communicate with smart-phones via Bluetooth. The microphone placement is not optimal for capturing chewing sounds; however, we find that it does not significantly affect the detection effectiveness. We apply an algorithm from the literature, with some adjustments, to a challenging dataset that we have collected in house. Leave-one-subject-out experiments yield promising results, with an F1-score of 0.96 for the best case of duration-based evaluation of eating time.

Learning Multi-Subset of Classes for Fine-Grained Food Recognition

Javier Ródenas
Bhalaji Nagarajan
Marc Bolaños
Petia Radeva

Food image recognition is a complex computer vision task, because of the large number of fine-grained food classes. Fine-grained recognition tasks focus on learning subtle discriminative details to distinguish similar classes. In this paper, we introduce a new method to improve the classification of classes that are more difficult to discriminate based on Multi-Subsets learning. Using a pre-trained network, we organize classes in multiple subsets using a clustering technique. Later, we embed these subsets in a multi-head model structure. This structure has three distinguishable parts. First, we use several shared blocks to learn the generalized representation of the data. Second, we use multiple specialized blocks focusing on specific subsets that are difficult to distinguish. Lastly, we use a fully connected layer to weight the different subsets in an end-to-end manner by combining the neuron outputs. We validated our proposed method using two recent state-of-the-art vision transformers on three public food recognition datasets. Our method was successful in learning the confused classes better and we outperformed the state-of-the-art on the three datasets.

SESSION: Invited Talk 3

mediPiatto: Using AI to Assess and Improve Mediterranean Diet Adherence

Arindam Ghosh

Numerous studies have demonstrated the benefits of Mediterranean Diet Adherence (MDA), including improved long-term weight loss outcomes, positive effects on cardiovascular health, and a decrease in complications among diabetic patients. However, manual assessment of MDA on a regular basis is challenging, and a convenient method of such evaluation is needed for mass adoption. The goal of the mediPiatto research project was to develop an AI-based end-to-end automatic system for evaluation and improvement of MDA from meal log images. The developed system was embedded into a smartphone application for meal tracking. A 4-week feasibility study was conducted with 24 participants, each of whom was sent a weekly report with a score quantifying their adherence. A comparison of the system-generated MDA score of four users with that calculated by an expert dietitian showed a mean difference of 3.5%. A self-reported food frequency questionnaire (FFQ), used as a self-measurement of a person's compliance with the MD, showed that 19 out of 24 participants had an overall increase in the score over the period of the study. An end-of-study survey yielded overall positive feedback from the participants, with 20 out of 24 reporting that they would be interested in incorporating the system in their daily lives.

SESSION: Oral Paper 2 Session

Text-based Image Editing for Food Images with CLIP

Kohei Yamamoto
Keiji Yanai

Recently, large-scale language-image pre-trained models, such as CLIP, have drawn much attention due to their remarkable ability in various tasks, including classification and image synthesis. The combination of CLIP and GAN can be used for text-based image manipulation and text-based image synthesis. Several models combining CLIP and GAN have been proposed so far. However, their effectiveness in the food image domain has not yet been examined comprehensively. In this paper, we report the results of experiments on text-based food image manipulation using VQGAN-CLIP and discuss the possibility of food image manipulation by text.

World Food Atlas for Food Navigation

Ali Rostami
Nitish Nagesh
Amir Rahmani
Ramesh Jain

Food plays a central role in agriculture, public wellness, public health, culinary art, and culture. Food-related data is available in varying formats and with different access levels ranging from private datasets to publicly downloadable data. Every food-related query, in principle, is a food recommendation problem. We analyze the components of a food recommendation and its requirements. We demonstrate the effectiveness of having access to worldwide food data from divergent aspects for answering food- and health-related queries that would otherwise be expensive and require specialized domain expertise. We present the World Food Atlas (WFA): An open-source platform for different stakeholders in the food ecosystem to share their data on a global data hub with a singular point of access. The world food atlas contains the availability and inter-connectivity of food and its effects in various forms. We gather real-world questions by partnering with nutritionists, dietitians, and doctors. We categorize the practical food queries to construct requirement tables and develop a novel schema to satisfy the requirement table to model the world food atlas. Finally, we demonstrate how food and lifestyle navigation systems can use the world food atlas to enable personalized and context-driven solutions to person-entity-context queries.

SESSION: Poster Session

SetMealAsYouLike: Sketch-based Set Meal Image Synthesis with Plate Annotations

Yuma Honbu
Keiji Yanai

By using a semantic segmentation dataset with pixel-wise annotations for training GANs, image generation from a mask image drawn by a user is possible. However, regarding mask-based food image synthesis, the existing food segmentation datasets have only food region masks and no plate region masks. When we train a mask-based image synthesis network with datasets without plate mask annotations, the plate regions in the generated food images cannot be controlled by a user and tend to be distorted. To solve this problem, we use a few-shot segmentation method to estimate the plate regions of the images in an existing food segmentation dataset using a limited number of plate region annotations, and add dish region masks to it. By using the added plate masks as training data, we enable the generation of food images in which the shape of the plates is under user control. We have implemented an interactive food image drawing system in which users draw food masks as well as plate masks. In the demo, we demonstrate that natural set meal images that include multiple dishes can be generated easily through the sketch interface.

DepthGrillCam: A Mobile Application for Real-time Eating Action Recording Using RGB-D Images

Kento Adachi
Keiji Yanai

Automatic meal recording is one of the typical applications of image recognition technology. In fact, some mobile apps for meal recording have been released so far. Most of the apps assume that a user takes a meal photo before starting to eat. However, this approach is not appropriate for meals in which food is served during the meal, such as buffets, shared large plates, and hot pots. In this study, we propose a mobile meal recording system that estimates food calories in real time during eating by eating action recognition with RGB-D images obtained by a front-mounted depth sensor on a smartphone. In the experiments with the mobile app implemented for an iPhone, in the situation of eating grilled meat, the proposed system improved the accuracy of calorie estimation by up to 28% and recognized the correct meal category with 6.67 times higher accuracy in eating action recognition compared to the baseline system.

Simulating Personal Food Consumption Patterns using a Modified Markov Chain

Xinyue Pan
Jiangpeng He
Andrew Peng
Fengqing Zhu

Food image classification serves as the foundation of image-based dietary assessment to predict food categories. Since there are many different food classes in real life, conventional models cannot achieve sufficiently high accuracy. Personalized classifiers aim to largely improve the accuracy of food image classification for each individual. However, a lack of public personal food consumption data proves to be a challenge for training such models. To address this issue, we propose a novel framework to simulate personal food consumption data patterns, leveraging a modified Markov chain model and self-supervised learning. Our method is capable of creating an accurate future data pattern from a limited amount of initial data, and our simulated data patterns can be closely correlated with the initial data pattern. Furthermore, we use Dynamic Time Warping distance and Kullback-Leibler divergence as metrics to evaluate the effectiveness of our method on the public Food-101 dataset. Our experimental results demonstrate promising performance compared with random simulation and the original Markov chain method.
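As a rough illustration of the ingredients named in this abstract, the sketch below fits a plain first-order Markov chain to a sequence of food categories, samples a simulated sequence from it, and compares category distributions with Kullback-Leibler divergence. It is not the authors' modified Markov chain or their evaluation protocol: the function names, the restart rule for states without observed successors, and the smoothing constant `eps` are all assumptions made for this example.

```python
import math
import random
from collections import Counter, defaultdict


def fit_transitions(sequence):
    """Estimate first-order transition probabilities from an observed sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


def simulate(transitions, start, length, rng):
    """Sample a sequence of the given length by walking the chain from `start`."""
    out = [start]
    while len(out) < length:
        followers = transitions.get(out[-1])
        if not followers:
            # Assumption: a state with no observed successor restarts
            # from a uniformly chosen known state.
            state = rng.choice(sorted(transitions))
        else:
            states = sorted(followers)
            weights = [followers[s] for s in states]
            state = rng.choices(states, weights=weights, k=1)[0]
        out.append(state)
    return out


def kl_divergence(p_seq, q_seq, eps=1e-9):
    """KL(P || Q) between the smoothed category distributions of two sequences."""
    labels = sorted(set(p_seq) | set(q_seq))
    p_counts, q_counts = Counter(p_seq), Counter(q_seq)
    total = 0.0
    for label in labels:
        # Additive smoothing keeps both probabilities strictly positive.
        p = (p_counts[label] + eps) / (len(p_seq) + eps * len(labels))
        q = (q_counts[label] + eps) / (len(q_seq) + eps * len(labels))
        total += p * math.log(p / q)
    return total


if __name__ == "__main__":
    observed = ["rice", "soup", "rice", "salad", "rice", "soup"]  # toy data
    chain = fit_transitions(observed)
    simulated = simulate(chain, observed[0], 30, random.Random(0))
    print(simulated)
    print(kl_divergence(observed, simulated))
```

A lower divergence indicates that the simulated sequence uses categories in proportions closer to the observed data; it says nothing about temporal ordering, which is what a sequence metric such as Dynamic Time Warping would address.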

Automatic Classification of High vs. Low Individual Nutrition Literacy Levels from Loyalty Card Data in Switzerland

Marc Blöchlinger
Jing Wu
Simon Mayer
Klaus L. Fuchs
Melanie Stoll
Lia Bally

The increasingly prevalent diet-related non-communicable diseases (NCDs) constitute a modern health pandemic. Higher nutrition literacy (NL) correlates with healthier diets, which in turn has favorable effects on NCDs. Assessing and classifying people's NL is helpful in tailoring the level of education required for disease self-management/empowerment and adequate treatment strategy selection. With recently introduced regulation in the European Union and beyond, it has become easier to leverage loyalty card data and enrich it with nutrition information about bought products. We present a novel system that utilizes such data to classify individuals into high- and low-NL classes, using well-known machine learning (ML) models, thereby permitting, for instance, better targeting of educational measures to support the population-level management of NCDs. An online survey (n = 779) was conducted to assess individual NL levels and divide participants into high- and low-NL groups. Our results show that there are significant differences in NL between male and female individuals, as well as between overweight and non-overweight individuals. No significant differences were found for other demographic parameters that were investigated. Next, the loyalty card data of participants (n = 11) was collected from two leading Swiss retailers with the consent of participants, and an ML system was trained to predict high or low NL for these individuals. Our best ML model, which utilizes the XGBoost algorithm and monthly aggregated baskets, achieved a Macro-F1-score of .89 at classifying NL. We hence show the feasibility of identifying individual NL levels based on household loyalty card data leveraging ML models; however, due to the small sample size, the results need to be verified with a larger sample.
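For readers unfamiliar with the reported metric, the following minimal sketch shows how a macro-averaged F1 score is commonly computed for a two-class problem such as high vs. low NL. The function name and the toy labels are assumptions for illustration only; they are not taken from the paper and do not reproduce its data or its XGBoost model.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    truth = ["high", "high", "low", "low"]
    predicted = ["high", "low", "low", "low"]
    print(macro_f1(truth, predicted))
```

Because every class contributes equally to the average regardless of how many samples it has, macro-F1 is less forgiving of poor performance on a minority class than plain accuracy.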

AI-Assisted Food Intake Activity Recognition Using 3D mmWave Radars

Yi-Hung Wu
Yuanjie Chen
Shervin Shirmohammadi
Cheng-Hsin Hsu

The automatic recognition of when and for how long a person is eating a certain food or drinking has applications in telecare, smarthome data monetization, and diet control. Existing food recognition systems either recognize the type of the food, but not when and for how long the person was eating and drinking, or use invasive sensors or privacy-intruding cameras which users are hesitant to install in their homes. In this paper, we propose a non-invasive system, using Artificial Intelligence to process 3D point cloud data collected from a 3D mmWave radar, that can distinguish a person's eating and drinking activities from other daily activities. This is challenging because eating and drinking activities are much more fine-grained than activities that existing systems can detect, such as sitting, running, walking, etc. Performance evaluations show that our proposed system significantly outperforms a representative state-of-the-art activity recognition system, RadHAR, by at least 27% and can reach 96.56% and 96.73% accuracy for two different training/testing split setups.
