SUMAC '19: Proceedings of the 1st Workshop on Structuring and Understanding of Multimedia heritAge Contents

Full Citation in the ACM Digital Library

SESSION: Keynote Talks

Visualizing Orientations of Large Numbers of Photographs

Florian Niebling

Digitized historical photographs are invaluable sources and key items for scholars in Cultural Heritage (CH) research. In addition to browsing online image collections using metadata, alternative ways of finding photographs are possible, by embedding the documents into spatial and temporal contexts to provide interactive access to these vast resources. Towards this goal, spatial properties of photographic items, i.e. position and orientation of the camera, can be automatically estimated using Structure from Motion (SfM) algorithms, even on historical photographs. In the talk, we will introduce a web-based 4D browser environment that enables architectural historians to answer spatial research questions, such as "Which buildings were photographed more (or less) often than others?", "From which directions has a point of interest been photographed?", and "Is there one main preferred direction?". Existing methods to visualize spatial properties of images statistically are limited. Although traditional visualization methods, such as heat maps, can be used to show the positional distribution of large numbers of images, how to visualize the distribution of their orientations is still not widely explored. We will discuss visualization techniques for interactively browsing spatialized photographs to gain knowledge from a combination of historical depictions of buildings and corresponding 3D models of historic cityscapes. We present first adaptations of visualization methods from other application domains that also address orientation, towards offering supporting tools to historians with corresponding spatial research questions. The introduced differentiation of occurring imaging phenomena can be used to evaluate the effectiveness of different visualization methods in empirical user studies.
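The research question "Is there one main preferred direction?" can be made concrete with basic circular statistics. The following is a minimal sketch, not the visualization method presented in the talk: it assumes that camera orientations are available as compass bearings in degrees (a hypothetical input format), and the function name `circular_summary` is illustrative only.

```python
import math

def circular_summary(bearings_deg):
    """Return (mean direction in degrees, mean resultant length in [0, 1]).

    Assumption: each bearing is the viewing direction of one photograph,
    given in degrees. A resultant length near 1 suggests one dominant
    direction; a value near 0 suggests no single preferred direction.
    """
    if not bearings_deg:
        raise ValueError("need at least one bearing")
    # Sum unit vectors so that 350 and 10 degrees average near 0, not 180.
    x = sum(math.cos(math.radians(b)) for b in bearings_deg)
    y = sum(math.sin(math.radians(b)) for b in bearings_deg)
    resultant = math.hypot(x, y) / len(bearings_deg)
    mean_deg = math.degrees(math.atan2(y, x)) % 360.0
    return mean_deg, resultant

# Hypothetical bearings for photographs of one point of interest.
mean_direction, concentration = circular_summary([350, 10, 5, 355])
```

Averaging unit vectors rather than raw angles avoids the wrap-around problem at 0/360 degrees, which is why a plain arithmetic mean of bearings would be misleading here.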

Beyond Three Dimensions: Managing Space, Time and Subjectivity in your Data

Fabio Vitali

Why are we digitizing multimedia content of our heritage? Why are we spending important effort and money to create exact digital copies of our pictures, vases, manuscripts, historical archives, early movies and photographs, archeological artefacts, etc.? Why is it important to capture not just the gist and the content of the artefacts, but every minute detail, the texture, the colors, every imperfection, every stain caused by time and handlers that have held and used them across centuries? One possible reason is that the gist and the content of the artefacts are not the only part of them that we are fascinated with. One possible reason is that the stories we want to tell and to be told about these artefacts are about their textures, their colors, their imperfections, their stains. It is the stories, not the artefacts, that are important. Digitized multimedia content and descriptive metadata are important as long as they allow and integrate derivative works: scholarly treatises, popularizing works, school textbooks, tourist guides, travel magazines, art magazines, cultural magazines, etc., are the means through which people access and enjoy our heritage. Without them, we would be left with boring lists of numbers and names and images. Stories not only represent how specific humans see and think about our artefacts: they are points of view that enrich data with context and interpretation, transforming the boring lists of numbers and names and images into interesting narratives that teach and entertain. As such, they are subjective, and they provide context. Getting ready to allow, support, expect and even encourage subjectivity and contextualization in our derivative works, in our stories, requires that we also organize the data itself, and the descriptive metadata, to be subjective and contextualized and, more importantly, multiple. Technologies exist and are plentiful. Yet, few if any datasets and collections of digitized content and descriptive metadata over heritage artefacts are designed and structured from the beginning so as to support, accommodate, or handle multiple independent, often inconsistent or contrasting, contextualized points of view over the same asset. We need to learn to embrace complexity and enjoy the scholarly disagreements and conflicts: they represent our understanding of our past much better than monolithic and synthetic points of view and make for much better and more memorable stories, which is what counts.

SESSION: Workshop Presentations

Pseudo-Cyclic Network for Unsupervised Colorization with Handcrafted Translation and Output Spatial Pyramids

Rémi Ratajczak
Carlos Crispim-Junior
Béatrice Fervers
Elodie Faure
Laure Tougne

We present a novel pseudo-cyclic adversarial learning approach for unsupervised colorization of grayscale images. We investigate the use of a non-trainable, lightweight and well-defined Handcrafted Translation to enforce the generation of realistic images and replace one of the two deep convolutional generative adversarial neural networks classically used in cyclic models. Additionally, we propose to use Output Spatial Pyramids to jointly constrain the deep latent spaces of an encoder-decoder generator to preserve spatial structures and improve the quality of the generated images. We demonstrate the interest of our approach compared with the state of the art on standard datasets (paintings, landscapes, aerial, thumbnails) that we modified for the purpose of colorization. We evaluate colorization quality of the generated images along the training with deterministic and reproducible criteria. In complement, we demonstrate the ability of our method to generate representations that are prone to make a classification network generalize well to slightly different color spaces. We believe our approach has potential applications in arts and cultural heritage to produce alternative representations without requiring paired data.

Recognizing Characters in Art History Using Deep Learning

Prathmesh Madhu
Ronak Kosti
Lara Mührenberg
Peter Bell
Andreas Maier
Vincent Christlein

In the field of Art History, images of artworks and their contexts are core to understanding the underlying semantic information. However, the highly complex and sophisticated representation of these artworks makes it difficult, even for the experts, to analyze the scene. From the computer vision perspective, the task of analyzing such artworks can be divided into sub-problems by taking a bottom-up approach. In this paper, we focus on the problem of recognizing the characters in Art History. From the iconography of Annunciation of the Lord (Figure 1), we consider the representation of the main protagonists, Mary and Gabriel, across different artworks and styles. We investigate and present the findings of training a character classifier on features extracted from their face images. The limitations of this method, and the inherent ambiguity in the representation of Gabriel, motivated us to analyze their bodies (a bigger context) in order to recognize the characters. Convolutional Neural Networks (CNN) trained on the bodies of Mary and Gabriel are able to learn person-related features and ultimately improve the performance of character recognition. We introduce a new technique that generates more data with similar styles, effectively creating data in a similar domain. We present experiments and analysis on three different models and show that the model trained on domain-related data gives the best performance for recognizing characters. Additionally, we analyze the localized image regions for the network predictions.

Historical and Modern Features for Buddha Statue Classification

Benjamin Renoust
Matheus Oliveira Franca
Jacob Chan
Noa Garcia
Van Le
Ayaka Uesaka
Yuta Nakashima
Hajime Nagahara
Jueren Wang
Yutaka Fujioka

While Buddhism has spread along the Silk Roads, many pieces of art have been displaced. Only a few experts may identify these works, subject to their own experience. The construction of Buddha statues was taught through the definition of canon rules, but the application of those rules varies greatly across time and space. Automatic art analysis aims at addressing these challenges. We propose to automatically recover the proportions induced by the construction guidelines, in order to use them and compare between different deep learning features for several classification tasks, in a medium-size but rich dataset of Buddha statues, collected with experts of Buddhist art history.

Challenging Deep Image Descriptors for Retrieval in Heterogeneous Iconographic Collections

Dimitri Gominski
Martyna Poreba
Valérie Gouet-Brunet
Liming Chen

This article proposes to study the behavior of recent and efficient state-of-the-art deep-learning based image descriptors for content-based image retrieval, facing a panel of complex variations appearing in heterogeneous image datasets, in particular in cultural collections that may involve multi-source, multi-date and multi-view contents. For this purpose, we introduce a novel dataset, the Alegoria dataset, consisting of 12,952 iconographic contents representing landscapes of the French territory, and encapsulating a large range of intra-class variations of appearance which were finely labelled. Six deep features (DELF, NetVLAD, GeM, MAC, RMAC, SPoC) and a hand-crafted local descriptor (ORB) are evaluated against these variations. Their performance is discussed, with the objective of providing the reader with research directions for improving image description techniques dedicated to complex heterogeneous datasets that are now increasingly present in topical applications targeting heritage valorization.

Processing Historical Film Footage with Photogrammetry and Machine Learning for Cultural Heritage Documentation

Francesca Condorelli
Fulvio Rinaudo

Historical film footage in many cases represents the only remaining trace of Cultural Heritage that has been lost or changed over time. Photogrammetry is a powerful technique to document heritage transformations, but its implementation is technically challenging due to the difficulty of finding historical data suitable to be processed. This paper aims to examine the possibility of extracting metric information about historic buildings from historical film footage for their 3D virtual reconstruction. In order to automate the search for a specific monument to document, in the first part of the study an algorithm for the detection of architectural heritage in historical film footage was developed using Machine Learning. This algorithm allowed the identification of the frames in which the monument appeared and their processing with photogrammetry. In the second part, with the implementation of open source Structure-from-Motion algorithms, the 3D virtual reconstruction of the monument and its metric information were obtained. The results were compared with a benchmark to evaluate the metric quality of the model, according to specific camera motion. This research, analysing the metric potential of historical film footage, provides fundamental support to the documentation of Cultural Heritage, creating tools useful for both geomatics specialists and historians.

An Interactive Web Application for the Creation, Organization, and Visualization of Repeat Photographs

Axel Schaffland
Oliver Vornberger
Gunther Heidemann

Repeat photography describes the process of photographing the same scene from the same camera position at different points in time. Goals are, for example, the detection and measurement of changes in the scene. In this paper we will present re.photos (www.re.photos), a working and freely accessible system for repeat photography which offers online methods for the interactive and user-friendly creation as well as the interactive browsing, searching, locating, rating, sharing, and discussing of repeat photographs. Users can add historic images, called templates, for which other users or they themselves can record a new image and create a repeat photograph. Creating repeat photographs involves registering the two images, i.e., aligning the images such that the same objects are at the same pixel locations in both images. Our system offers a new interactive registration method, allowing control over the registration process, since automatic methods yield poor results. Previous works addressed only subproblems and did not result in usable and freely accessible software as our web platform.

SESSION: Workshop Posters

Organizing Cultural Heritage with Deep Features

Abraham Montoya Obeso
Jenny Benois-Pineau
Mireya Saraí García Vázquez
Alejandro Álvaro Ramírez Acosta

In recent years, the preservation and diffusion of culture in digital form has been a priority for the governments of different countries, such as Mexico, with the objective of preserving and spreading culture through information technologies. Nowadays, a large amount of multimedia content is produced. Therefore, more efficient and accurate systems are required to organize it. In this work, we analyze the ability of a pre-trained residual network (ResNet) to describe information through the extracted deep features, and we analyze its behavior by grouping new data into clusters with the K-means method at different levels of compression with the PCA algorithm. The results show that the structuring of new input data can be done with the proposed method.

Deep Learning as a Tool for Early Cinema Analysis

Samarth Bhargav
Nanne van Noord
Jaap Kamps

Visual Cultural Heritage has extensively been explored using multimedia methods, but this exploration has so far been limited to still images. In particular, Early Cinema has hardly been explored. We analyze the Desmet collection, a recently digitized collection of early cinema (1907-1916), in the context of intertitles. Intertitles played an important role in silent movies in order to convey the main narratives, and split the film into semantically meaningful segments. We first build several classifiers to detect these intertitles, and evaluate them on a gold standard collection annotated by an expert. We illustrate the usefulness of using Deep Learning methods to extract semantic features to analyze the role of intertitles in early cinema. Furthermore, we attempt to structure and map the narrative progression of a film with respect to the locations at which shots were filmed.

ISHIGAKI Retrieval through Combinatorial Optimization

Sakino Ando
Gou Koutaki
Keiichiro Shirai

Two strong earthquakes that hit Kumamoto in April 2016 damaged the infrastructure and many buildings in the area, significantly impacting the people and traditional cultural assets of Kumamoto. In this project, using information technology, we aim to reconstruct the stone walls of Kumamoto castle, which is an important nationally designated special historical site in Japan. A stone wall is called "Ishigaki" in Japanese. During the earthquake, many of the stones of the Kumamoto castle wall fell to the ground. Therefore, to reconstruct the stone wall, we must identify the original locations of the collapsed stones, similar to a jigsaw puzzle. In this paper, we formulate the problem by defining the cost function of the similarity between before and after collapse of the stones and propose the application of a combinatorial optimization algorithm. We tested the proposed solver by identifying 269 actual collapsed stones of the castle wall, and the results demonstrated that the proposed method can identify 253 stones correctly (94% accuracy).
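The formulation in this abstract, matching each collapsed stone to an original wall position so that a total cost is minimized, can be illustrated as an assignment problem. The sketch below is not the authors' solver: the cost matrix is made up, the function name `best_assignment` is hypothetical, and the exhaustive search is only feasible for a handful of stones. It also reproduces the accuracy figure reported above.

```python
from itertools import permutations

def best_assignment(cost):
    """Exhaustively find the stone-to-slot assignment with minimal total cost.

    Assumption: cost[i][j] is a dissimilarity between collapsed stone i and
    wall slot j (lower means a better match). The search is factorial in the
    number of stones, so this is a toy illustration only.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

# Made-up dissimilarities for three stones and three slots.
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.6, 0.3],
]
assignment, total = best_assignment(cost)  # assignment[i] is the slot of stone i

# Accuracy reported in the abstract: 253 of 269 stones identified correctly.
accuracy_percent = round(100 * 253 / 269)
```

For a problem with hundreds of stones, a polynomial-time assignment method or a dedicated combinatorial solver would be needed instead of enumeration; the abstract does not specify which algorithm the authors applied.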

An Ontology Web Application-based Annotation Tool for Intangible Culture Heritage Dance Videos

Sylvain Lagrue
Nathalie Chetcuti-Sperandio
Fabien Delorme
Chau Ma Thi
Duyen Ngo Thi
Karim Tabia
Salem Benferhat

Collecting dance videos, then preserving and promoting them after enriching the collected data, have been significant actions in preserving intangible cultural heritage in South-East Asia. Although techniques for the conceptual modeling of the expressive semantics of dance videos are very complex, they are crucial to exploit the video semantics effectively. This paper proposes an ontology web-based dance video annotation system for representing the semantics of dance videos at different granularity levels. In particular, the system incorporates both syntactic and semantic features of a pre-built dance ontology system in order not only to use the available semantic web system but also to create unity for users when annotating videos, so as to minimize conflicts.
