PIES-ME '22: Proceedings of the 1st Workshop on Photorealistic Image and Environment Synthesis for Multimedia Experiments

SESSION: Keynote Speech

Hyper-Realistic and Immersive Imaging for Enhanced Quality of Experience

  • Frederic Dufaux

Producing truly realistic and immersive digital imaging is widely seen as the ultimate goal towards further improving Quality of Experience (QoE) for end users of multimedia services. The human visual system is able to perceive a wide range of colors, luminous intensities, and depth, as present in a real scene. However, current traditional imaging technologies cannot capture or reproduce such rich visual information. Recent research innovations have made it possible to address these bottlenecks in multimedia systems. As a result, new multimedia signal processing areas have emerged, such as ultra-high definition, high frame rate, high dynamic range imaging, light fields, and point clouds. These technologies have the potential to bring a leap forward for upcoming multimedia systems. However, the effective deployment of hyper-realistic video technologies entails many technical and scientific challenges. In this talk, I will discuss recent research activities covering several aspects of hyper-realistic imaging, including point cloud compression, light field compression, and semantic-aware tone mapping for high dynamic range imaging.

SESSION: Session 1: Indoor Scenes Creation and Datasets

Delving into Light-Dark Semantic Segmentation for Indoor Scenes Understanding

  • Xiaowen Ying
  • Bo Lang
  • Zhihao Zheng
  • Mooi Choo Chuah

State-of-the-art segmentation models are mostly trained on large-scale datasets collected under favorable lighting conditions, and hence directly applying such trained models to dark scenes results in unsatisfactory performance. In this paper, we present the first benchmark dataset and evaluation methodology for studying the problem of semantic segmentation under different lighting conditions in indoor scenes. Our dataset, namely LDIS, consists of samples collected from 87 different indoor scenes under both well-illuminated and low-light conditions. Unlike existing work, our benchmark provides a new task setting, namely Light-Dark Semantic Segmentation (LDSS), which adopts four different evaluation metrics that assess the performance of a model from multiple aspects. We perform extensive experiments and ablation studies to compare the effectiveness of different existing techniques under our standardized evaluation protocol. In addition, we propose a new technique, namely DepthAux, that utilizes the consistency of depth images under different lighting conditions to help a model learn a unified and illumination-invariant representation. Our experimental results show that the proposed DepthAux provides consistent and significant improvements when applied to a variety of different models. Our dataset and other resources are publicly available on our project page: http://mercy.cse.lehigh.edu/LDIS/
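The core observation behind DepthAux is that scene depth is unchanged by illumination, so it can anchor an illumination-invariant representation. The abstract does not give the exact loss, but one plausible sketch is a feature-consistency term between the light/dark pair plus an auxiliary depth-regression term for both exposures (the function name, the specific terms, and the weight `alpha` are all illustrative assumptions, not the authors' formulation):

```python
import numpy as np

def depthaux_loss(feat_light, feat_dark,
                  pred_depth_light, pred_depth_dark,
                  gt_depth, alpha=1.0):
    """Hypothetical DepthAux-style objective.

    - consistency: pulls features of the well-lit and low-light views of
      the same scene towards each other (illumination invariance);
    - depth_term: auxiliary depth regression for both exposures, using
      the fact that depth does not depend on lighting.
    """
    consistency = np.mean((feat_light - feat_dark) ** 2)
    depth_term = (np.mean((pred_depth_light - gt_depth) ** 2)
                  + np.mean((pred_depth_dark - gt_depth) ** 2))
    return consistency + alpha * depth_term
```

In a real model the features would come from the segmentation backbone and a small depth head would produce the predictions; here plain arrays stand in for both.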

Language-guided Semantic Style Transfer of 3D Indoor Scenes

  • Bu Jin
  • Beiwen Tian
  • Hao Zhao
  • Guyue Zhou

We address the new problem of language-guided semantic style transfer of 3D indoor scenes. The input is a 3D indoor scene mesh and several phrases that describe the target scene. Firstly, 3D vertex coordinates are mapped to RGB residues by a multi-layer perceptron. Secondly, colored 3D meshes are differentiably rendered into 2D images, via a viewpoint sampling strategy tailored for indoor scenes. Thirdly, rendered 2D images are compared to the phrases via pre-trained vision-language models. Lastly, errors are back-propagated to the multi-layer perceptron to update vertex colors corresponding to certain semantic categories. We conducted large-scale qualitative analyses and A/B user tests with the public ScanNet and SceneNN datasets. We demonstrate that: (1) the results are visually pleasing and potentially useful for multimedia applications; (2) rendering 3D indoor scenes from viewpoints consistent with human priors is important; (3) incorporating semantics significantly improves style transfer quality; (4) an HSV regularization term leads to results that are more consistent with the inputs and generally rated better. Code and the user study toolbox are available at https://github.com/AIR-DISCOVER/LASST.
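The four steps above form a render-score-backpropagate loop over vertex color residues, restricted to one semantic class. The toy sketch below mimics that loop structure only: the "renderer" is a trivial mean over vertex colors, the vision-language score is replaced by distance to a target color, and the gradient is computed analytically for this toy loss. None of these stand-ins are the paper's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)
base_colors = rng.uniform(0.0, 1.0, size=(100, 3))  # input mesh vertex colors
residues = np.zeros_like(base_colors)               # stand-in for MLP output
sem_mask = np.zeros(100, dtype=bool)
sem_mask[:40] = True                                # e.g. "wall" vertices only
target = np.array([0.8, 0.2, 0.2])                  # phrase -> target color stand-in

def render(colors):
    # Trivial differentiable "renderer": mean color of all vertices.
    return colors.mean(axis=0)

def loss(res):
    # Stand-in for the vision-language score: distance to the target color.
    return float(np.sum((render(base_colors + res) - target) ** 2))

lr = 5.0
for _ in range(200):
    err = render(base_colors + residues) - target
    grad = 2.0 * err / len(base_colors)   # chain rule through the mean
    residues[sem_mask] -= lr * grad       # update only the masked semantic class
```

The key structural point the sketch preserves is the last line: gradients flow to all residues, but only vertices of the selected semantic category are updated, which is what makes the style transfer "semantic".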

Towards a Calibrated 360° Stereoscopic HDR Image Dataset for Architectural Lighting Studies

  • Michèle Atié
  • Toinon Vigier
  • François Eymond
  • Céline Drozd
  • Raphaël Labayrade
  • Daniel Siret
  • Yannick Sutter

High-fidelity 360° images enhance user experience and offer realistic representations for architectural design studies. Specifically, VR and hyper-realistic imaging technologies can be helpful tools for studying daylight in architectural spaces, thanks to their high level of immersion and their ability to create perceptually accurate and faithful scene representations.

In this paper, we present a novel method for collecting and processing physically calibrated 360° stereoscopic high-dynamic-range images of daylit indoor places. The future dataset aims to provide a high degree of realism across a wide range of luminous interior spaces, supplemented with information on the physical characterization of each space. This paper presents the first applications of this method in different places and discusses the challenges of assessing visual perception in these images. In the near future, this dataset will be made publicly available for architectural as well as multimedia studies.

SESSION: Session 2: Quality Assessment

Subjective Study of the Impact of Compression, Framerate, and Navigation Trajectories on the Quality of Free-Viewpoint Video

  • Jesús Gutiérrez
  • Adriana Galán
  • Pablo Pérez
  • Daniel Corregidor
  • Teresa Hernando
  • Javier Usón
  • Daniel Berjón
  • Julián Cabrera
  • Narciso García

This paper presents an exploratory study of the perceptual quality of different coding configurations of a real-time Free-viewpoint Video (FVV) system through a subjective experiment. In addition, different pre-defined camera trajectories were considered to analyze their impact on visual quality and their relationship with the trajectories followed by observers when freely exploring the content. For this experiment, a novel test methodology was used, based on the participation of a few observers who repeat the test at different moments. The results provide useful insights into options for reducing, if necessary, the amount of data needed to deliver FVV at the highest possible quality to end users. They can also help in defining trajectories that are appealing to users who cannot freely navigate through the content, or that can be used to perform valid subjective tests with pre-defined trajectories. Finally, the FVV dataset created and used for this experiment will be made publicly available to the research community once it is completed with more videos and results from future subjective tests.

Comparative Evaluation of Temporal Pooling Methods for No-Reference Quality Assessment of Dynamic Point Clouds

  • Pedro G. Freitas
  • Giovani D. Lucafo
  • Mateus Gonçalves
  • Johann Homonnai
  • Rafael Diniz
  • Mylène C.Q. Farias

Point Cloud Quality Assessment (PCQA) has become an important task in immersive multimedia, since it is fundamental for improving computer graphics applications and ensuring the best Quality of Experience (QoE) for the end user. In recent years, the field of PCQA has made remarkable progress, with state-of-the-art methods achieving better predictive performance at lower computational complexity. However, most of this progress was made using Full-Reference (FR) metrics. Since, in many cases, the reference point cloud is not available, the design of No-Reference (NR) methods has become increasingly important. In this paper, we investigate the suitability of geometry-aware texture descriptors to blindly assess the quality of colored Dynamic Point Clouds (DPCs). The proposed metric first uses a descriptor to extract features from the assessed Point Cloud (PC) frames. Then, the descriptor statistics are used to extract quality-aware features. Finally, a machine learning algorithm is employed to regress the quality-aware features into visual quality scores, and these scores are aggregated using a temporal pooling function. We then study the effects of different temporal pooling strategies on the performance of DPC quality assessment methods. Our experimental tests were carried out using the latest publicly available database and demonstrate the efficiency of the evaluated temporal pooling models. This work aims to provide direction on how to apply a temporal pooling function to combine per-frame quality predictions generated with descriptor-based PC quality assessment methods to estimate the quality of dynamic PCs. An implementation of the metric described in this paper can be found at https://gitlab.com/gpds-unb/no-referencedpcqa-temporal-pooling.
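The final stage of the pipeline above collapses a sequence of per-frame quality scores into one score for the whole dynamic point cloud. The abstract does not list the specific pooling functions compared, but a few commonly studied strategies can be sketched as follows (the function name and the particular set of methods are illustrative, not the paper's exact choices):

```python
import numpy as np

def temporal_pool(frame_scores, method="mean", p=2.0):
    """Collapse per-frame quality scores into a single sequence-level score.

    Common strategies: plain mean, median (robust to outlier frames),
    Minkowski pooling (p > 1 emphasizes extreme frames), and worst-decile
    pooling (quality judgments are often driven by the worst moments).
    """
    s = np.asarray(frame_scores, dtype=float)
    if method == "mean":
        return float(s.mean())
    if method == "median":
        return float(np.median(s))
    if method == "minkowski":
        return float(np.mean(np.abs(s) ** p) ** (1.0 / p))
    if method == "worst_decile":
        k = max(1, len(s) // 10)           # lowest 10% of frames
        return float(np.mean(np.sort(s)[:k]))
    raise ValueError(f"unknown pooling method: {method}")
```

In the pipeline described above, `frame_scores` would be the regressor's per-frame outputs; the pooled value is what gets correlated against subjective scores when benchmarking each strategy.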