SUMAC '23: Proceedings of the 5th Workshop on analySis, Understanding and proMotion of heritAge Contents


Full Citation in the ACM Digital Library

SESSION: Keynote Talks

Opportunities and Challenges in Digitally Transforming World Heritage at 50+ Years

  • Mario Santana Quintero

International legal instruments and institutions have now existed for over half a century; the UNESCO World Heritage Convention, for example, marked 50 years of existence in 2022. Furthermore, "The Second Congress of Architects and Specialists of Historic Buildings, in Venice in 1964, the first one being the International Restoration Charter, better known as the Venice Charter, and the second one, put forward by UNESCO, provided for the creation of the International Council on Monuments and Sites (ICOMOS)" [1]. ICOMOS is the most important not-for-profit heritage network and serves as an advisory body to UNESCO's World Heritage Committee.

The ICOMOS Charter of Venice of 1964, in its article 16, already acknowledges the importance of "precise documentation in the form of analytical and critical reports, illustrated with drawings and photographs" [2]. Around the time these important organizations and charters were established, the Abu Simbel temple was documented and moved to a safe location to protect the integrity of this significant historic site in Egypt. The temple was documented with photogrammetry, the most advanced technology of the time. Nowadays, by contrast, anyone with decent equipment and an adequate skill set can document a historic site to a standard comparable to the work at Abu Simbel. The products of this work can be displayed in sophisticated virtual reality headsets and disseminated to broader audiences.

Moreover, technology allows us to identify buried sites in the densest jungles and monitor the impacts of climate change on world heritage sites worldwide; it has substantially enhanced the understanding of cultural heritage resources at large. Frenetic technological development offers new opportunities but also poses challenges.

One can argue that digital technologies provide opportunities to document and conserve world heritage sites in more detail. Advanced imaging techniques, 3D simulations, and enhanced visualization approaches, such as virtual reality, offer new platforms for disseminating and exchanging knowledge.

However, many actors have not considered the ethical implications of digital assets for respecting the communities where those sites are located. Furthermore, a digital divide separates those who can apply these technologies from those who lack the financial means, and several examples of digital colonialism or appropriation are shameful for world heritage protection.

For this reason, FAIR [3] and CARE [4], among other principles of digital data management, urgently need to be adopted to improve the effectiveness and correctness of digital technologies used for world heritage. In brief, digital assets should be respectful, sustainable, transferable, and ethically correct; they should benefit and contribute to the protection of those sites and provide opportunities for the rightsholders to benefit from their use. Also, digital assets should endure over long periods, so that future generations can enjoy them and reuse them in better applications.

Knowledge Graphs for Cultural Heritage and Digital Humanities

  • Victor de Boer

Galleries, Libraries, Archives and Museums (so-called GLAMs) as well as digital humanities researchers are increasingly publishing and sharing digital data, information and knowledge online. This opens new opportunities for cross-institute, cross-researcher and cross-project collaborations and analyses of such (multi-modal) data. Semantic Web principles and technologies, and more specifically Knowledge Graphs, are excellent models for representing and integrating heterogeneous data while allowing for standardized access and querying. However, several challenges remain in modelling, enriching and linking such cultural knowledge graphs, as well as in making them usable for a variety of real-world user tasks.

In this talk, I discuss the promises and challenges of designing, constructing and enriching knowledge graphs for cultural heritage and digital humanities. I will talk about semantic interoperability and how connecting previously unconnected data and knowledge presents new opportunities for historians, media scholars and other researchers. User-centric challenges include how such integrated and multimodal data can be browsed, queried or analysed using state-of-the-art machine learning.

I will also address the issue of polyvocality, where multiple perspectives on (historical) information are to be represented. Especially in contexts such as that of (post-)colonial heritage, representing multiple voices is crucial. I will show ongoing research on how knowledge graphs can provide excellent vehicles for this.

SESSION: Workshop Presentations

Latent Wander: an Alternative Interface for Interactive and Serendipitous Discovery of Large AV Archives

  • Yuchen Yang
  • Linyida Zhang

Audiovisual (AV) archives are invaluable for holistically preserving the past. Unlike other forms, AV archives can be difficult to explore. This is not only because of their complex modality and sheer volume but also because of the lack of appropriate interfaces beyond keyword search. The recent rise of text-to-video retrieval tasks in computer science opens the gate to accessing AV content more naturally and semantically, since such models can map descriptive natural language sentences to matching videos. However, applications of these models are rarely seen. The contribution of this work is threefold. First, working with RTS (Télévision Suisse Romande), we identified the key blockers to implementing such models in a real archive and built a functioning pipeline for encoding raw archive videos into text-to-video feature vectors. Second, we designed and verified a method to encode and retrieve videos using emotionally abundant descriptions not supported in the original model. Third, we proposed an initial prototype for immersive and interactive exploration of AV archives in a latent space based on the previously mentioned encoding of videos.
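
As an illustration of the retrieval step described above, the following minimal sketch ranks videos against a text query by cosine similarity in a shared embedding space. It is not the authors' pipeline: the function names, the three-dimensional toy vectors and the clip identifiers are all hypothetical, and in practice the vectors would come from a trained text-to-video model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_videos(query_vector, video_vectors, top_k=3):
    """Return the top_k (video_id, score) pairs, best match first."""
    scored = [
        (video_id, cosine_similarity(query_vector, vector))
        for video_id, vector in video_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for encoded archive videos.
archive = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.1, 0.9, 0.1],
    "clip_c": [0.7, 0.3, 0.2],
}
# Hypothetical embedding of a descriptive query sentence.
query = [1.0, 0.0, 0.0]

results = rank_videos(query, archive, top_k=2)
for video_id, score in results:
    print(f"{video_id}: {score:.3f}")
```

Because the query and the videos live in the same vector space, the same ranking function could in principle serve both search and the latent-space exploration mentioned in the abstract.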

Spatially Localised Immersive Contemporary and Historic Photo Presentation on Mobile Devices in Augmented Reality

  • Loris Sauter
  • Tim Bachmann
  • Luca Rossetto
  • Heiko Schuldt

These days, taking a photo is the most common way of capturing a moment. Some of these photos captured in the moment are never to be seen again. Others are almost immediately shared with the world. Yet, the context of the captured moment can only be shared to a limited extent. The continuous improvement of mobile devices has not only led to higher resolution cameras and, thus, visually more appealing pictures but also to a broader and more precise range of accompanying sensor metadata. Positional and bearing information can provide context for photos and is thus an integral aspect of the captured moment. However, it is commonly only used to sort photos by time and possibly group them by place. This more precise sensor metadata, combined with the increased computing power of mobile devices, can enable increasingly powerful Augmented Reality (AR) capabilities, especially for communicating the context of a captured photo. Users can thereby witness the captured moment in its real location and also experience its spatial contextualization. With the help of suitable data augmentation, such context-preserving presentation can be extended even to non-digitally born content, including historical images. This offers new immersive ways to experience the cultural history of one's current location. In this paper, we present an approach for location-based image presentation in AR on mobile devices. With this approach, users can experience captured moments in their physical context. We demonstrate the power of this approach based on a prototype implementation and evaluate it in a user study.
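
To make the role of positional and bearing metadata concrete, here is a small sketch, under stated assumptions, of how a client might decide whether a geolocated photo lies within the device's current field of view. The function names and the 60-degree default field of view are hypothetical and are not taken from the paper; the bearing formula is the standard great-circle initial bearing.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Great-circle initial bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lon = math.radians(lon2 - lon1)
    y = math.sin(delta_lon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(delta_lon))
    return math.degrees(math.atan2(y, x)) % 360.0


def angular_difference_deg(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def photo_in_view(device_lat, device_lon, device_heading_deg,
                  photo_lat, photo_lon, field_of_view_deg=60.0):
    """True if the photo's location falls inside the device's horizontal view cone."""
    bearing = initial_bearing_deg(device_lat, device_lon, photo_lat, photo_lon)
    return angular_difference_deg(bearing, device_heading_deg) <= field_of_view_deg / 2


# A photo located due north of the device is visible when facing north,
# but not when facing south.
facing_north = photo_in_view(0.0, 0.0, 0.0, 1.0, 0.0)
facing_south = photo_in_view(0.0, 0.0, 180.0, 1.0, 0.0)
print(facing_north, facing_south)
```

A real AR presentation would additionally use the photo's own recorded bearing to orient the image in space; that step is omitted here.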

"Do touch!" - 3D Scanning and Printing Technologies for the Haptic Representation of Cultural Assets: A Study with Blind Target Users

  • Arne Bruns
  • Anika A. Spiesberger
  • Andreas Triantafyllopoulos
  • Patric Müller
  • Björn W. Schuller

Visiting museums can be challenging for visually impaired people, as many objects are hidden behind glass walls and information is limited to descriptions. One of the best ways to increase accessibility and inclusion in museums and other cultural heritage institutions is through the use of 3D-printed replicas. However, there are several different scanning and printing processes that not only differ in terms of effort and cost but can also produce very different results. This paper evaluates two different scanning techniques and four different printing processes in terms of these aspects and includes feedback from a group of blind and partially sighted users on the aesthetic quality and fidelity of the printed objects. We found differences between the scanning methods mainly regarding their ease of use. Of the printing methods tested, stereolithography was preferred by the majority of participants for use in the museum. Additionally, we include user comments which touch on the general aspects of presenting museum artefacts using haptic devices. Our study thus provides valuable insights into the preferences of the target users, which can be used to inform decisions about more inclusive museum experiences.

Why Don't You Speak?: A Smartphone Application to Engage Museum Visitors Through Deepfakes Creation

  • Matteo Zaramella
  • Irene Amerini
  • Paolo Russo

In this paper, we present a gamification-based application for the cultural heritage sector that aims to enhance the learning and enjoyment of museum artworks. The application encourages users to experience history and culture in the first person, based on the idea that the artworks in a museum can tell their own story, thus improving visitor engagement with museums and providing information on the artwork itself.

Specifically, we propose an application that allows museum visitors to create a deepfake video of a sculpture directly through their smartphone. In more detail, starting from a few live frames of a statue, the application quickly generates a deepfake video in which the statue talks, moving its lips in sync with a text or audio file. The application exploits an underlying generative adversarial network technology and has been specialized on a custom statues dataset collected for the purpose. Experiments show that the generated videos exhibit great realism in the vast majority of cases, demonstrating the importance of a reliable statue face detection algorithm. The final aim of our application is to make the museum experience different, with a more immersive interaction and an engaging user experience, which could potentially attract more people to deepen their knowledge of classical history and culture.

Clustering for the Analysis and Enrichment of Corpus of Images for the Spatio-temporal Monitoring of Restoration Sites

  • Laura Willot
  • Dan Vodislav
  • Valerie Gouet-Brunet
  • Livio De Luca
  • Adeline Manuel

After the fire that took place in the cathedral Notre-Dame de Paris, a scientific research group composed of 9 Working Groups was set up to study the building. With the aim of creating a digital ecosystem for spatio-temporal monitoring of a restoration site, efforts have been made to develop methodologies for data management and enrichment. In line with this approach, this work focuses on the exploration, analysis and enrichment of a corpus of images and annotations of the cathedral, with regard to their temporal, spatial and semantic aspects, in addition to their visual content. Using prevailing clustering methods focusing on different criteria, we obtained several clusters of images from which new information can be inferred. By exploring these complementary aspects, we identified similarity links between the images. Further work will focus on multimodal analyses of the corpus.

SniffyArt: The Dataset of Smelling Persons

  • Mathias Zinnen
  • Azhar Hussian
  • Hang Tran
  • Prathmesh Madhu
  • Andreas Maier
  • Vincent Christlein

Smell gestures play a crucial role in the investigation of past smells in the visual arts, yet their automated recognition poses significant challenges. This paper introduces the SniffyArt dataset, consisting of 1941 individuals represented in 441 historical artworks. Each person is annotated with a tightly fitting bounding box, 17 pose keypoints, and a gesture label. By integrating these annotations, the dataset enables the development of hybrid classification approaches for smell gesture recognition. The dataset's high-quality human pose estimation keypoints are achieved through the merging of five separate sets of keypoint annotations per person. The paper also presents a baseline analysis, evaluating the performance of representative algorithms for detection, keypoint estimation, and classification tasks, showcasing the potential of combining keypoint estimation with smell gesture classification. The SniffyArt dataset lays a solid foundation for future research and the exploration of multi-task approaches leveraging pose keypoints and person boxes to advance human gesture and olfactory dimension analysis in historical artworks.
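
The abstract does not state how the five keypoint annotation sets are merged. One plausible approach, shown here purely as an assumption, is to take the per-coordinate median, which limits the influence of a single outlying annotation. The function name and the sample coordinates are hypothetical.

```python
from statistics import median


def merge_keypoints(annotation_sets):
    """Merge several annotations of the same person into one keypoint list.

    Each annotation set is a list of (x, y) tuples in the same keypoint order.
    ASSUMPTION: the merge rule is a per-coordinate median; the source does
    not specify the actual rule used for the dataset.
    """
    if not annotation_sets:
        raise ValueError("at least one annotation set is required")
    keypoint_count = len(annotation_sets[0])
    if any(len(points) != keypoint_count for points in annotation_sets):
        raise ValueError("all annotation sets must have the same number of keypoints")

    merged = []
    for index in range(keypoint_count):
        xs = [points[index][0] for points in annotation_sets]
        ys = [points[index][1] for points in annotation_sets]
        merged.append((median(xs), median(ys)))
    return merged


# Five hypothetical annotations of two keypoints; the fourth annotator
# misplaced the first keypoint (x = 50), which the median discounts.
annotations = [
    [(10, 20), (30, 40)],
    [(11, 21), (31, 41)],
    [(12, 19), (29, 42)],
    [(50, 20), (30, 40)],
    [(10, 22), (30, 39)],
]
merged = merge_keypoints(annotations)
print(merged)
```

With 17 keypoints per person, as in the dataset, the same function applies unchanged; only the length of each annotation list differs.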

Beyond Built Year Prediction: The Bag of Time Model and a Case Study of Buddha Images

  • Li Weng

In this work, we propose a novel approach to the dating of cultural heritage objects using a bag-of-time (BoT) model, which combines visual and temporal information. The BoT model extends the traditional bag-of-words (BoW) representation, allowing multiple cultural elements in an image to be associated with different time points. We introduce the concept of cultural elements as stable and distinct data patterns, enabling the BoT model to encode time information efficiently. Furthermore, we present an aggregated bag-of-time (ABoT) model that captures the temporal distribution of image contents. Our experiments on a Buddha image dataset demonstrate the effectiveness of the proposed approach in predicting built years and enabling fine-grained dating. Additionally, we explore the applications of multi-modal image retrieval and time-based visualization, showcasing the versatility of our models in cultural heritage research and analysis.