CEA '20: Proceedings of the 12th Workshop on Multimedia for Cooking and Eating Activities

SESSION: Workshop Presentations

Cooking Activity Recognition in Egocentric Videos with a Hand Mask Image Branch in the Multi-stream CNN

  • Shinya Michibata
  • Katsufumi Inoue
  • Michifumi Yoshioka
  • Atsushi Hashimoto

Fine-grained activity recognition in egocentric videos, especially of cooking activities,
is an active and challenging topic in computer vision. To tackle this problem, many
researchers have tried to leverage information about cooking tools, such as knives and
peelers, or about equipment in the background. Although this information helps improve
recognition performance on general cooking activity categories, it does not provide
sufficient evidence to recognize fine-grained cooking activities such as slicing and
mincing, because these belong to the same general category and are often performed
with the same type of tool. In addition, since the types of tools and equipment differ
from kitchen to kitchen, a recognition model can over-fit to the specific environments
in the training data by relying too heavily on such information. Therefore, a method
with both high discriminative power for object classification and robustness to
environmental differences is required. As a first step toward such a method, this
research focuses on a characteristic of egocentric video: it captures the camera
wearer's hands without occlusion. Hand shape is useful for recognizing the objects
manipulated by the camera wearer, and sequential hand positions are effective for
analyzing hand movement. To exploit these advantages, this paper proposes a new
multi-stream CNN that has a mask image branch, which leverages hand shape and position
information, in addition to the RGB and optical flow branches. In experiments on
fine-grained cooking activity recognition in three types of kitchens, the proposed
method outperformed conventional methods and showed higher robustness to environmental
differences than those methods.
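The abstract does not say how the RGB, optical flow, and hand mask branches are combined. As a hedged illustration only, the sketch below assumes late fusion: each stream produces raw class scores, these are turned into probabilities with a softmax, and the probabilities are averaged (optionally weighted) before taking the most likely class. The function names, stream keys, weights, and example scores are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one stream's raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_logits: Dict[str, List[float]],
                weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of per-stream class probabilities (assumed fusion scheme).

    stream_logits maps a hypothetical stream name such as "rgb", "flow",
    or "hand_mask" to that stream's scores; all streams must score the same classes.
    """
    names = list(stream_logits)
    n_classes = len(stream_logits[names[0]])
    if any(len(stream_logits[n]) != n_classes for n in names):
        raise ValueError("all streams must score the same classes")
    if weights is None:
        weights = {n: 1.0 for n in names}  # equal weighting by default
    total_w = sum(weights[n] for n in names)
    fused = [0.0] * n_classes
    for n in names:
        probs = softmax(stream_logits[n])
        for c in range(n_classes):
            fused[c] += weights[n] * probs[c] / total_w
    return fused


def predict(stream_logits: Dict[str, List[float]],
            weights: Optional[Dict[str, float]] = None) -> int:
    """Index of the class with the highest fused probability."""
    fused = late_fusion(stream_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for the classes ["slice", "mince", "peel"].
scores = {
    "rgb": [2.0, 1.9, 0.1],        # appearance alone is ambiguous
    "flow": [1.0, 1.2, 0.0],
    "hand_mask": [0.5, 2.5, 0.0],  # hand shape/position favours "mince"
}
print(predict(scores))
```

Averaging probabilities keeps each stream on the same scale, so a confident hand mask stream can break a tie between visually similar classes; learned fusion layers are an equally plausible alternative that the abstract does not rule out.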

Interactive Cake Decoration with Whipped Cream

  • Mako Miyatake
  • Aoi Watanabe
  • Yoshihiro Kawahara

We present a three-dimensional whipped cream printing technique for decorating cakes.
Conventional cake decorating patterns are produced by specially trained professionals
because substantial practice is required. In this paper, we introduce a printer that
can stack whipped-cream voxels on a cake sponge using a robot arm. An interactive
graphical user interface enables users to draw their favorite patterns on demand.
By optimizing the motion of the nozzle, the robot can dispense and stack the cream
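The abstract does not describe how the nozzle motion is optimized. As a hedged sketch under stated assumptions, a drawn pattern can be treated as a set of integer voxel coordinates; the voxels are printed bottom layer first, so each layer rests on the previous one, and within a layer a greedy nearest-neighbour ordering reduces nozzle travel. Greedy nearest-neighbour is a swapped-in heuristic, not necessarily the authors' method, and all names here are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical voxel format: (x, y, z) grid indices, where z is the layer height.
Voxel = Tuple[int, int, int]


def travel_length(path: List[Voxel], start: Voxel = (0, 0, 0)) -> float:
    """Total Euclidean distance the nozzle moves when visiting path in order."""
    total, cur = 0.0, start
    for v in path:
        total += math.dist(cur, v)
        cur = v
    return total


def plan_nozzle_path(voxels: List[Voxel], start: Voxel = (0, 0, 0)) -> List[Voxel]:
    """Order voxels layer by layer (lowest z first) and, within each layer,
    always move to the nearest unvisited voxel (greedy heuristic, assumed)."""
    path: List[Voxel] = []
    cur = start
    for z in sorted({v[2] for v in voxels}):
        remaining = {v for v in voxels if v[2] == z}
        while remaining:
            # Tie-break on the voxel itself so the result is deterministic.
            nxt = min(remaining, key=lambda v: (math.dist(cur, v), v))
            path.append(nxt)
            remaining.remove(nxt)
            cur = nxt
    return path


pattern = [(2, 0, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)]
plan = plan_nozzle_path(pattern)
print(plan, travel_length(plan), travel_length(pattern))
```

Printing strictly bottom-up matters more than travel length here, because soft cream cannot be deposited above an empty cell; the nearest-neighbour step only shortens moves within a layer and can be replaced by any tour-improvement method.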

Cooking Recipe Analysis based on Sequences of Distributed Representation on Procedure Texts and Associated Images

  • Akari Ninomiya
  • Tomonobu Ozaki

Nowadays, online community sites for cooking and eating activities are recognized as
an indispensable infrastructure of daily life. To respond accurately to increasingly
sophisticated user requirements, it is necessary to extract the characteristics of
each cooking recipe and to clarify the relationships among recipes. Since mapping each
recipe into a vector space through representation learning is one of the most promising
approaches to cooking recipe analysis, a wide variety of distributed representations
of recipes have been proposed. In this paper, to provide a precise representation of
cooking recipes from a different perspective, we propose representing each recipe
with two sequences of distributed representations. One sequence is obtained from the
cooking steps written in the recipe text using BERT, and the other is derived from the
sequence of images associated with the cooking process using the VGG16 convolutional
neural network. To assess the effectiveness of the proposal, we perform a cluster
analysis of recipes for four dishes based on the standard dynamic time warping (DTW)
distance between sequences of distributed representations.
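The standard DTW distance mentioned above aligns two sequences of different lengths by allowing steps to be repeated, then sums the local costs along the cheapest alignment. A minimal sketch follows, assuming each recipe is already a list of embedding vectors (one per cooking step or image) and assuming Euclidean distance as the local cost; the abstract does not state the local cost, how the text and image sequences are combined, or which clustering algorithm is used, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Local cost between two step embeddings (assumed: Euclidean distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: List[Vector], t: List[Vector]) -> float:
    """Standard DTW: D[i][j] = cost(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])."""
    n, m = len(s), len(t)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(s[i - 1], t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def distance_matrix(recipes: List[List[Vector]]) -> List[List[float]]:
    """Symmetric matrix of pairwise DTW distances between recipe sequences."""
    k = len(recipes)
    M = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            M[a][b] = M[b][a] = dtw_distance(recipes[a], recipes[b])
    return M


# Toy 1-D "embeddings": recipe_b repeats a step of recipe_a, so DTW aligns them at zero cost.
recipe_a = [[0.0], [1.0], [2.0]]
recipe_b = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(recipe_a, recipe_b))
```

The resulting matrix can be passed to any clustering method that accepts precomputed distances, such as hierarchical clustering; that choice, like the Euclidean local cost, is an assumption of this sketch.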