CEA++ '22: Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications


SESSION: Paper Presentations

Multimodal Dish Pairing: Predicting Side Dishes to Serve with a Main Dish

Taichi Nishimura
Katsuhiko Ishiguro
Keita Higuchi
Masaaki Kotera

Planning a food menu is an essential task in our daily lives, and a menu must be planned from various perspectives. To reduce the burden of menu planning, this study introduces and tackles a novel problem, multimodal dish pairing (MDP): retrieving suitable side dishes for a given query main dish. The key challenge of MDP is learning human subjectivity, i.e., the one-to-many relationships between main and side dishes. However, web resources generally include only one-to-one, manually created pairs of main and side dishes. To address this problem, this study assumes that if a side dish is similar to the manually paired side dish, it is also acceptable for the query main dish. We then imitate one-to-many relationships by computing the similarity between side dishes as side dish scores and assigning these scores to unknown main and side dish pairs. Based on these scores, we train a neural network to learn the suitability of side dishes through learning-to-rank techniques that fully leverage the multimodal representations of the dishes. In the experiments, we created a dataset by crawling recipes from an online menu site and evaluated the proposed method on five criteria: retrieval performance, overlapping ingredients, overlapping cooking methods, consistency of dish styles, and human evaluation. The results show that the proposed method outperforms the baseline on all five criteria. A qualitative analysis further demonstrates that the proposed method can retrieve side dishes suitable for the main dish.
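The scoring idea can be sketched as follows, assuming each side dish already has a fixed embedding vector and that cosine similarity is the similarity measure; the function names, the pairwise hinge loss, and the margin value are illustrative choices, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def side_dish_scores(paired_side_emb, candidate_embs):
    """Pseudo-labels for unknown (main, side) pairs: each candidate's similarity
    to the side dish that was manually paired with the query main dish."""
    return {name: cosine(paired_side_emb, emb) for name, emb in candidate_embs.items()}


def pairwise_hinge_loss(pred, target, margin=0.1):
    """Pairwise learning-to-rank loss: whenever the target ranks i above j,
    penalize predictions where pred[i] does not beat pred[j] by `margin`."""
    total, pairs = 0.0, 0
    for i in pred:
        for j in pred:
            if target[i] > target[j]:
                total += max(0.0, margin - (pred[i] - pred[j]))
                pairs += 1
    return total / pairs if pairs else 0.0
```

With hypothetical embeddings, a candidate close to the manually paired side dish receives a high score, and a model whose predicted ranking contradicts these scores incurs a positive loss.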

Recipe Recommendation for Balancing Ingredient Preference and Daily Nutrients

Sara Ozeki
Masaaki Kotera
Katsuhiko Ishiguro
Taichi Nishimura
Keita Higuchi

In this work, we propose a recipe recommendation system for daily eating habits that is based on both user preference and nutrient balance. The system prompts for user input and allows ingredients to be substituted or added to reflect the user's preferences. It also considers the daily nutrient balance needed to meet dietary reference intakes for nutrients such as carbohydrates, protein, and fat. As users select a day's worth of preferred recipes, the system updates its recommendations based on their selections and on predefined criteria for nutrient excess and deficiency. We ran a simulation study to evaluate the performance of the proposed algorithm. Using our recipe planning application, we also conducted a user study in which participants chose a day's worth of recipes with preferred ingredients. The results show that the proposed system helps users select recipes with better nutrient balance than a traditional ingredient-based search. In addition, participants liked the recommendations from the proposed system, which improved their satisfaction with recipe selection.
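A minimal sketch of the update loop, assuming a squared, target-normalized nutrient gap as the balance criterion and a simple bonus for preferred ingredients; the target values, the `weight` parameter, and all names here are hypothetical and not taken from the paper.

```python
# Hypothetical daily reference intakes in grams (illustrative values only).
DAILY_TARGET = {"carbohydrate": 300.0, "protein": 60.0, "fat": 60.0}


def remaining_needs(selected, target=DAILY_TARGET):
    """Nutrients still needed after the recipes selected so far (negative = excess)."""
    remaining = dict(target)
    for recipe in selected:
        for nutrient, amount in recipe["nutrients"].items():
            if nutrient in remaining:
                remaining[nutrient] -= amount
    return remaining


def balance_score(recipe, remaining, target=DAILY_TARGET):
    """Lower is better: normalized squared gap (deficit or excess) left after adding this recipe."""
    return sum(((remaining[k] - recipe["nutrients"].get(k, 0.0)) / target[k]) ** 2 for k in target)


def recommend(candidates, selected, liked_ingredients, k=3, weight=0.5):
    """Rank candidates by nutrient balance, minus a bonus per preferred ingredient."""
    remaining = remaining_needs(selected)

    def score(recipe):
        overlap = len(set(recipe["ingredients"]) & liked_ingredients)
        return balance_score(recipe, remaining) - weight * overlap

    return sorted(candidates, key=score)[:k]
```

Re-running `recommend` after each user selection mirrors the described behavior of updating recommendations as the day's plan fills up; raising `weight` trades nutrient balance for ingredient preference.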

Prediction of Mental State from Food Images

Kei Nakamoto
Sosuke Amano
Hiroaki Karasawa
Yoko Yamakata
Kiyoharu Aizawa

Diet is a very important factor in personal health management. Many people use applications that record photos of their meals every day to help manage their diets. In many cases, such applications use the images only to estimate meal contents and calories. We propose a further use of these diet images: reading changes in mental health, such as stress and well-being, from the meal images recorded over a period of time. If applications can recognize the signs of mental health changes contained in dietary records, they will be more useful for health management. This paper makes two contributions. First, we created a dataset consisting of dietary records of 24,644 meal items and mental health records for the same period (over three months). Second, we showed that changes in mental health are correlated with changes in diet and that the correlation is especially strong in groups with greater changes. The potential and goal of this study is to extract features related to mental health from meal images.
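The kind of analysis described, correlating changes in diet with changes in mental health, can be sketched as below. It assumes both records have already been aggregated into per-period numeric series (for example, a hypothetical weekly count of vegetable dishes and a weekly well-being score); the paper's actual diet features and statistics are not specified here.

```python
import math


def changes(series):
    """Differences between consecutive periods (e.g. weekly values)."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Correlating `changes(diet)` with `changes(mood)`, rather than the raw series, focuses the measure on co-occurring shifts, which matches the abstract's emphasis on changes over time.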

Learning Sequential Transformation Information of Ingredients for Fine-Grained Cooking Activity Recognition

Atsushi Okamoto
Katsufumi Inoue
Michifumi Yoshioka

The goal of our research is to recognize fine-grained cooking activities (e.g., dicing or mincing, which are types of cutting) in egocentric videos from the sequential transformation of the ingredients processed by the camera wearer. These activities are classified according to the state of the ingredients after processing, yet they often involve the same cooking utensils and similar motions. For these reasons, recognizing such activities is a challenging task in computer vision and multimedia analysis. To tackle this problem, we need to precisely perceive the sequential state transformation of ingredients. To achieve this, we propose a new GAN-based network with the following characteristics: 1) images are cropped around the ingredient as preprocessing to remove environmental information; 2) the generator network generates intermediate images from past and future images to capture sequential information; 3) an adversarial network is employed as a discriminator to classify whether an input image is generated; and 4) a temporally coherent network checks the temporal smoothness of the input images and predicts cooking activities by comparing the original image sequences with the generated ones. As a first step toward evaluating the proposed method, we focus on cutting activities. In this paper, we report the effectiveness of the proposed method based on experimental results with our own dataset.
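Steps 1, 2, and 4 imply a particular data layout, which can be sketched as follows under stated assumptions: frames are plain 2-D lists of pixel values, a bounding box for the ingredient is already available per frame, and the `gap` between past/future and target frames is a hypothetical parameter. The GAN itself is not shown; this only prepares (past, target, future) inputs and computes a simple smoothness measure.

```python
def crop(frame, box):
    """Crop a frame (a list of pixel rows) to the ingredient box (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]


def make_triplets(frames, boxes, gap=1):
    """(past, target, future) crops: a generator would receive past and future
    and be trained to synthesize the intermediate target crop."""
    crops = [crop(f, b) for f, b in zip(frames, boxes)]
    return [(crops[t - gap], crops[t], crops[t + gap])
            for t in range(gap, len(crops) - gap)]


def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two same-sized crops."""
    diffs = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def temporal_smoothness(crops):
    """Average frame-to-frame change in a sequence of same-sized crops (lower means smoother)."""
    return sum(mean_abs_diff(a, b) for a, b in zip(crops, crops[1:])) / (len(crops) - 1)
```

Cropping before building triplets keeps the generator's input focused on the ingredient's state rather than the kitchen background, which is the stated purpose of step 1.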

SESSION: Short Paper Presentations

Recipe Recording by Duplicating and Editing Standard Recipe

Akihisa Ishino
Yoko Yamakata
Kiyoharu Aizawa

The best way to ascertain the exact nutritional value of a user's food intake is to have users record the recipes for their food themselves. However, writing a recipe from scratch is tedious and impractical. We therefore propose a method that allows users to write their own recipes in a short time by duplicating and editing a standard recipe. We developed a smartphone application and conducted an experiment in which 19 participants were asked to write their own recipes for about 10 food items each. With the duplication method, writing a recipe took 74% of the time required to write it from scratch, and the number of editing operations was reduced to 45%. Future work is to construct a dataset of standard recipes that can be rewritten into anyone's recipe at a small editing cost.
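One way to quantify the "editing cost" mentioned above is a step-level edit distance between a standard recipe and a user's recipe. This is a hypothetical sketch: it treats each recipe as a list of step strings and counts step insertions, deletions, and replacements, whereas the application's actual editing operations may be finer-grained.

```python
def edit_operations(standard, personal):
    """Minimum number of step insertions, deletions, and replacements needed
    to turn the standard recipe (a list of steps) into the user's recipe."""
    m, n = len(standard), len(personal)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all remaining standard steps
    for j in range(n + 1):
        dp[0][j] = j  # insert all remaining personal steps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if standard[i - 1] == personal[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a step
                           dp[i][j - 1] + 1,         # insert a step
                           dp[i - 1][j - 1] + cost)  # keep or replace a step
    return dp[m][n]
```

Writing from scratch corresponds to `edit_operations([], personal)`, i.e., one insertion per step, so the ratio of the two values gives a rough, illustrative analogue of the reduction in editing operations.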

"Comparable Recipes": A Construction and Analysis of a Dataset of Recipes Described by Different People for the Same Dish

Rina Kagawa
Rei Miyata
Yoko Yamakata

Recording high-quality textual recipes is an effective way to document food culture. However, comparing the quality of different recipes is difficult because recipe quality may depend on both the description style and the dish. We therefore constructed the "Comparable Recipes" dataset as follows. First, each of 64 writers described five recipes after watching five home cooking videos, yielding a total of 318 recipes. Each dish (video) has 15.9 recipes on average, each written by a different writer. Next, 335 recipe readers evaluated the quality (i.e., reproducibility and completeness) of each recipe. A morphological analysis of this dataset revealed that the amount of description per cooking step affects recipe quality. Furthermore, the effect on recipe quality of integrating cooking procedures into cooking steps tended to depend on the reader's skill. These results suggest a need for writing support that integrates cooking procedures into cooking steps appropriately for the skills and preferences of the reader.
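A simplified version of the "amount of description per cooking step" comparison is sketched below. It assumes whitespace tokenization as a stand-in for the morphological analyzer (a Japanese analyzer would be substituted in practice) and compares mean reader-rated quality above and at or below the median amount; the field names and this grouping are illustrative, not the paper's analysis.

```python
from statistics import mean, median


def description_per_step(steps, tokenize=str.split):
    """Average number of tokens per cooking step (whitespace tokens by default)."""
    return sum(len(tokenize(s)) for s in steps) / len(steps)


def compare_by_detail(recipes):
    """Mean quality of recipes with above-median vs at-or-below-median detail per step."""
    amounts = [description_per_step(r["steps"]) for r in recipes]
    cut = median(amounts)
    high = [r["quality"] for r, a in zip(recipes, amounts) if a > cut]
    low = [r["quality"] for r, a in zip(recipes, amounts) if a <= cut]
    return mean(high), mean(low)
```

Because every dish in the dataset is described by many writers, such comparisons can be made per dish, separating the effect of description style from the effect of the dish itself.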

Few-shot Food Recognition with Pre-trained Model

Yanqi Wu
Xue Song
Jingjing Chen

Food recognition is challenging because of the diversity of food. Moreover, conventionally training food recognition networks demands large amounts of labeled images, which is laborious and expensive. In this work, we tackle the challenging few-shot food recognition problem by leveraging the knowledge learned by pre-trained models such as CLIP. Although CLIP has shown remarkable zero-shot capability on a wide range of vision tasks, it performs poorly on the domain-specific task of food recognition. To transfer CLIP's rich prior knowledge, we explore an adapter-based approach that fine-tunes CLIP with only a few samples, thereby effectively combining CLIP's prior knowledge with new knowledge extracted from the few-shot training set. We also design appropriate prompts to identify foods from different cuisines more accurately. Experiments demonstrate that our approach achieves promising performance on two public food datasets, VIREO Food-172 and UECFood-256.
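The general shape of an adapter-based classifier can be sketched in plain Python. This assumes a residual bottleneck adapter in the style of CLIP-Adapter (a learned MLP output blended with the frozen image feature by a ratio `alpha`) and a hypothetical cuisine-aware prompt template; the paper's exact adapter design and prompts may differ, and real features would come from CLIP's encoders rather than toy vectors.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def adapt(image_feat, W_down, W_up, alpha=0.2):
    """Residual adapter: blend a small bottleneck MLP's output (learned from the
    few-shot samples) with the frozen CLIP image feature."""
    adapted = matvec(W_up, relu(matvec(W_down, image_feat)))
    return [alpha * a + (1 - alpha) * f for a, f in zip(adapted, image_feat)]


def classify(image_feat, text_feats, W_down, W_up, alpha=0.2):
    """Pick the class whose prompt embedding has the highest cosine similarity."""
    img = normalize(adapt(image_feat, W_down, W_up, alpha))
    scores = {c: sum(a * b for a, b in zip(img, normalize(t))) for c, t in text_feats.items()}
    return max(scores, key=scores.get)


def prompt(food, cuisine):
    """Hypothetical prompt template that mentions the cuisine."""
    return f"a photo of {food}, a type of {cuisine} food."
```

A small `alpha` keeps predictions close to CLIP's zero-shot behavior, while a larger `alpha` lets the few-shot adapter override it; tuning this ratio is how such approaches balance prior and new knowledge.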

MIAIS: A Multimedia Recipe Dataset with Ingredient Annotation at Each Instructional Step

Yixin Zhang
Yoko Yamakata
Keishi Tajima

In this paper, we introduce MIAIS (Multimedia recipe dataset with Ingredient Annotation at every Instructional Step), a multimedia recipe dataset in which ingredients are annotated at every instructional step. One unique feature of recipe data is that it is usually presented in a sequential, multimedia form. However, few publicly available recipe datasets contain paired text and image data for every cooking step. Our goal is to construct a recipe dataset that contains sufficient multimedia data, with annotations, for every cooking step; such data are important for many research topics, such as cooking flow graph generation, recipe text generation, and cooking action recognition. MIAIS contains 12,000 recipes, each with 9.13 cooking instruction steps on average, and each step is a tuple of a text description and an image. The text descriptions and images are collected from the NII Cookpad Dataset and the Cookpad Image Dataset, respectively. We have released our annotation data and related information.
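The per-step structure described above (text, image, annotated ingredients) can be modeled with a small data structure. The class and field names below are hypothetical and do not reflect the released annotation format; the `first_use` helper illustrates how step-level ingredient annotations could feed tasks such as flow graph generation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    text: str   # instruction text (sourced from the NII Cookpad Dataset)
    image: str  # ID or path of the paired step image (from the Cookpad Image Dataset)
    ingredients: List[str] = field(default_factory=list)  # ingredients annotated for this step


@dataclass
class Recipe:
    recipe_id: str
    steps: List[Step]

    def first_use(self) -> Dict[str, int]:
        """1-based index of the step where each ingredient first appears."""
        first: Dict[str, int] = {}
        for i, step in enumerate(self.steps, 1):
            for ing in step.ingredients:
                first.setdefault(ing, i)
        return first


def mean_steps(recipes: List[Recipe]) -> float:
    """Average number of steps per recipe (9.13 for the full MIAIS dataset)."""
    return sum(len(r.steps) for r in recipes) / len(recipes)
```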

ABLE: Aesthetic Box Lunch Editing

Yutong Zhou
Nobutaka Shimada

This paper presents exploratory research comprising a pre-trained ordering recovery model that obtains correct placement sequences from box lunch images and a generative adversarial network that composites novel box lunch presentations from single food items and generated layouts. Furthermore, we present Bento800, the first cleanly annotated, high-quality, and standardized dataset for aesthetic box lunch presentation generation and other downstream tasks. The Bento800 dataset is available at https://github.com/Yutong-Zhou-cv/Bento800_Dataset.
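The abstract does not say how recovered placement sequences are scored; one standard way to compare a predicted ordering with a ground-truth ordering is Kendall's tau, sketched here as an assumption with hypothetical food item names.

```python
from itertools import combinations


def kendall_tau(true_order, pred_order):
    """Kendall's tau between two orderings of the same items
    (1.0 = identical order, -1.0 = fully reversed)."""
    pos = {item: i for i, item in enumerate(pred_order)}
    concordant = discordant = 0
    for a, b in combinations(true_order, 2):
        # a is placed before b in the true order; check the predicted order
        if pos[a] < pos[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)
```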

SESSION: Keynote Talk

Creating a World Food Atlas

Ramesh Jain

I face a problem multiple times every day: What am I going to eat, how much, and where? Where can I get enjoyable healthy food?

We live in a world where the latest geospatial information of interest around us is available in the palm of our hand on our smartphones, with navigational guidance if needed. However, the most vital information in our lives, information about food, remains inaccessible.

Food is vital to the health and enjoyment of people, society, and the planet. However, data, information, and knowledge related to food suffer from inaccessibility, disinformation, and ignorance. A dependable, trusted, accessible, and dynamic source of geo-indexed food data providing culinary, nutritional, and environmental characteristics is essential for guiding holistic food decisions. A good amount of food-related data and knowledge is already available in different silos. All of these silos may be assimilated into a World Food Atlas (WFA) and made available for people to design food-centered applications, including food recommendation. The WFA contains information about the locations of sources of food ingredients, dishes, recipes, and consumption patterns. All this information may become available through ubiquitous maps. The WFA will help people make better decisions for personal, societal, and planetary health. We believe that there is an urgent need for it and that the technology is ready to make it happen. Because food varies significantly even across short distances and food preparation depends on local culture and socio-economic conditions, it is important that local people be involved in creating such an atlas. We have started an open-data World Food Atlas project and invite all interested people to contribute. We need people from different areas to help populate the WFA and use it. The project is in its infancy, and we are building a global community to make it happen. We invite you to participate in this exciting project.
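To make "geo-indexed food data" concrete, here is a minimal sketch of a radius query over location-tagged food entries. The `FoodEntry` fields, the `kind` labels, and the linear scan are illustrative assumptions, not the WFA's actual schema or infrastructure; a real atlas would use a spatial index.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class FoodEntry:
    name: str
    kind: str  # hypothetical label, e.g. "dish" or "ingredient source"
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(entries, lat, lon, radius_km, kind=None):
    """Entries within radius_km of (lat, lon), nearest first, optionally filtered by kind."""
    hits = [(haversine_km(lat, lon, e.lat, e.lon), e)
            for e in entries if kind is None or e.kind == kind]
    return [e for d, e in sorted(hits, key=lambda h: h[0]) if d <= radius_km]
```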

SESSION: Panel

CEA++2022 Panel - Toward Building a Global Food Network

Yoko Yamakata
Stavroula Mougiakakou
Ramesh Jain

How can we create a global food network?

Attempts are being made worldwide to lead people toward healthier eating habits. These efforts are not always academic; they are often small-scale and private, taking place in hospitals, nursing homes, schools, and various other organizations. These groups may collect and manually analyze data such as recipes and food records. Such data are valuable, but in many cases they are never made public. In academia, too, dictionaries, corpora, and knowledge graphs are constructed manually at high cost, but such knowledge is not shared, so other groups keep generating new knowledge from scratch. How can we reduce this wasteful work and allow data and knowledge to be shared and leveraged?

Sharing food data involves several issues. First, food cultures differ from country to country and from region to region, so food data produced in one area rarely work as well in another. Second, food data, especially when linked to medical care, are likely to contain private information and must be anonymized before being shared. Anonymizing data without losing their intrinsic value is complex, and in many cases knowledge sharing must be abandoned. Third, food logging is burdensome. Eating takes place every day, multiple times a day, so recording every meal requires a tremendous amount of effort. Yet the impact of a single meal on a person's body is minimal; it is the long-term record that matters in guiding a person toward good health.

In this panel discussion, we invite Prof. Stavroula Mougiakakou, General Chair of MADiMa22, a workshop co-located with CEA++22, and Prof. Ramesh Jain, the keynote speaker of CEA++22, to discuss the issues raised above. The moderator, Prof. Yoko Yamakata, will open the panel discussion to all, and participants from both MADiMa22 and CEA++22 are welcome to join.