IXR '22: Proceedings of the 1st Workshop on Interactive eXtended Reality

Full Citation in the ACM Digital Library

SESSION: Keynote Talk 1

Deep Learning-based Extended Reality: Making Humans and Machines Speak the Same Visual Language

  • Fernando Pereira

The key goal of Extended Reality (XR) is to offer human users immersive and interactive experiences, notably the sense of being in a virtual or augmented environment and interacting with virtual beings or objects. A fundamental element in this goal is the visual content: its realism, level of interactivity and immersion. Recent advances in visual data acquisition and consumption have led to the emergence of so-called plenoptic visual models, in which light fields and point clouds play an increasingly important role, offering 6DoF experiences in addition to the more common and limited 2D image- and video-based experiences. This increased immersion is critical for emerging applications and services, notably virtual and augmented reality, personal communications and meetings, education and medical applications, and virtual museum tours. To have effective remote experiences across the globe, it is critical that all types of visual information are compressed efficiently enough to fit the available bandwidth. In this context, deep learning (DL)-based technologies have recently come to play a central role, already surpassing the compression performance of the best previous hand-crafted coding solutions. However, this breakthrough goes well beyond coding, since DL-based tools are nowadays also the most effective for computer vision tasks such as classification, recognition, detection, and segmentation. This double win opens the door, for the first time, to a common visual representation language, associated with the novel DL-based latents/coefficients, which may simultaneously serve human and machine consumption. While humans will use the DL-based coded streams to decode immersive visual content, machines will use exactly the same streams for computer vision tasks, thus 'speaking' a common visual language. This is not possible with conventional visual representations, where the machine vision processors deal with decoded content, thus suffering from compression artifacts, and even at the cost of additional complexity. This visual representation approach will offer a more powerful and immersive augmented Extended Reality in which humans and machines may participate more seamlessly and at lower complexity. In this context, the main objective of this keynote talk is to discuss this DL-based dual-consumption paradigm, how it is being fulfilled and what its impacts are. Special attention will be dedicated to the ongoing standardization projects in this domain, notably in JPEG and MPEG.

SESSION: Session 1: Human Behaviour Analysis and Modelling for IXR

Behavioural Analysis in a 6-DoF VR System: Influence of Content, Quality and User Disposition

  • Silvia Rossi
  • Irene Viola
  • Pablo Cesar

This work presents an explorative behavioural analysis of users navigating in an immersive space, aimed at enabling next-generation multimedia systems. Our main goal is to understand how the user experience of immersive content with 6 Degrees of Freedom (6-DoF) is affected not only by the visual content and its quality but also by the disposition of the user. We based our investigations on traditional statistical metrics, on techniques that have already been used for 6-DoF, and on 3-DoF tools adapted to this new context. We show the limitations of each metric in giving a complete interpretation of user behaviour, and we draw insights on important factors to be considered when analysing and predicting navigation trajectories. Specifically, our behavioural investigations show that user disposition plays an important role in how users interact with the immersive content. This opens the gate to user profiles (i.e., a collection of key information that describes the behavioural features of a single user or a group of users), which would be beneficial for different purposes in future immersive applications, such as enabling new modalities for live streaming services optimised per user profile, as well as user-based quality assessment methods.

Effects of Haptic Feedback on User Perception and Performance in Interactive Projected Augmented Reality

  • Sam Van Damme
  • Nicolas Legrand
  • Joris Heyse
  • Femke De Backere
  • Filip De Turck
  • Maria Torres Vega

By means of vibrotactile and force feedback, i.e., haptics, users are given the sensation of touching and manipulating virtual objects in interactive Extended Reality (XR) environments. However, research into the influence of this feedback on users' perception and performance in interactive XR is still scarce. In this work, we present an experimental evaluation of the effects of haptic feedback in interactive immersive applications. Using a Projected Augmented Reality (PAR) setup, users were asked to interact with a projected environment by completing three different tasks based on finger tracking and in the presence of visual latency. Evaluations were performed both subjectively (questionnaire) and objectively (duration and accuracy). We found that while haptic feedback does not enhance performance for simple tasks, it substantially improves it for more complex ones. This effect is more evident in the presence of network degradation, such as latency. However, the subjective questionnaires showed a general skepticism about the potential of incorporating haptic information into immersive applications. As such, we believe that this paper provides an important contribution toward the understanding and assessment of the influence of haptic technology in interactive immersive systems.

Generating Realistic Synthetic Head Rotation Data for Extended Reality using Deep Learning

  • Jakob Struye
  • Filip Lemic
  • Jeroen Famaey

Extended Reality is a revolutionary method of delivering multimedia content to users. A large contributor to its popularity is the sense of immersion and interactivity enabled by having real-world motion reflected in the virtual experience accurately and immediately. This user motion, mainly caused by head rotations, induces several technical challenges. For instance, which content is generated and transmitted depends heavily on where the user is looking. Seamless systems, taking user motion into account proactively, will therefore require accurate predictions of upcoming rotations. Training and evaluating such predictors requires vast amounts of orientational input data, which is expensive to gather, as it requires human test subjects. A more feasible approach is to gather a modest dataset through test subjects, and then extend it to a more sizeable set using synthetic data generation methods. In this work, we present a head rotation time series generator based on TimeGAN, an extension of the well-known Generative Adversarial Network, designed specifically for generating time series. This approach is able to extend a dataset of head rotations with new samples closely matching the distribution of the measured time series.

SESSION: Keynote Talk 2

Adventures in the Realverse: Experiences and Challenges in XR Communications

  • Pablo Perez

Extended Reality is called to change human communications, in the same way that the telephone or the video call did. We are still at the beginning of the road, but the technology is already advanced and affordable enough to give us a glimpse of what XR communications will look like. What technological modules are needed to build an XR system that allows you to "teleport" to another location? Is it available to anyone, is it inclusive? What networks and telecommunication systems are needed to make it work? How can we model the Quality of Experience? In this talk we will raise these and other questions and try to outline some answers.

SESSION: Session 2: Enabling IXR Infrastructures

Optimal Camera Placement for 6 Degree-of-Freedom Immersive Video Streaming Without Accessing 3D Scenes

  • Sheng-Ming Tang
  • Yuan-Chun Sun
  • Jia-Wei Fang
  • Kuan-Yu Lee
  • Ching-Ting Wang
  • Cheng-Hsin Hsu

We consider a 6 Degree-of-Freedom (6DoF) immersive video streaming ecosystem, which is composed of content creators, cloud service providers, and Head-Mounted Display (HMD) users. To avoid intellectual property leakage, content providers only offer rendering and query access, which significantly complicates the problem of placing source cameras to maximize the synthesized quality for multiple 6DoF HMD viewers. We address three key challenges: an extremely large search space, nonlinear synthesized quality models, and computationally intensive numerical algorithms. We propose three algorithm variants: C1G, C2G, and C2I. The first employs a first-order coverage model, while the last two employ a second-order coverage model; moreover, the first two adopt a greedy method, while the last adopts an Integer Programming (IP) solver. We implement a prototype system and carry out evaluations based on a real dataset consisting of 16 subjects collected by us. Our evaluation results reveal that, with 16 source cameras, our algorithm: (i) outperforms the baseline by at least 2.37 dB in PSNR, 0.05 in SSIM, and 10.84 in VMAF and (ii) achieves an optimality gap as small as 1.00 dB in PSNR, 0.01 in SSIM, and 2.30 in VMAF.
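
To make the greedy idea concrete, the following is a minimal sketch of greedy selection under a simple coverage model. It is not the authors' C1G algorithm: the abstract does not define the coverage model, so the sketch assumes that each candidate camera position "covers" a set of viewer pose IDs, and the function name `greedy_camera_placement`, the `coverage` mapping, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_camera_placement(coverage: Dict[str, Set[int]], budget: int) -> List[str]:
    """Pick up to `budget` cameras, each time taking the candidate that
    covers the most viewer poses not covered so far (assumed model)."""
    chosen: List[str] = []
    covered: Set[int] = set()
    remaining = dict(coverage)  # shallow copy so the caller's dict is untouched
    for _ in range(budget):
        if not remaining:
            break
        # Candidate with the largest marginal gain; ties go to the first one seen.
        best = max(remaining, key=lambda cam: len(remaining[cam] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds coverage, so stop early
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical candidates: camera name -> IDs of viewer poses it serves well.
candidates = {"A": {0, 1, 2}, "B": {2, 3}, "C": {3, 4, 5, 6}, "D": {0, 6}}
selection = greedy_camera_placement(candidates, budget=2)
print(selection)
```

A greedy pass like this evaluates each remaining candidate once per selected camera, which is what makes it attractive when the search space is very large; an exact solver, such as the IP-based variant mentioned in the abstract, trades that speed for optimality.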

Partially Reliable Transport Layer for QUICker Interactive Immersive Media Delivery

  • Hemanth Kumar Ravuri
  • Maria Torres Vega
  • Jeroen van der Hooft
  • Tim Wauters
  • Filip De Turck

Interactive immersive media delivery comes with demanding requirements such as high bandwidth (i.e., several Gbps) and low latency (i.e., five milliseconds). Today, most video-streaming applications leverage the transmission control protocol (TCP) for reliable end-to-end transmission. However, the reliability of TCP comes at the cost of additional delay due to factors such as connection establishment, head-of-line (HOL) blocking, and retransmissions under sub-optimal network conditions. Such behavior can lead to stalling events or freezes, which are highly detrimental to the user's Quality of Experience (QoE). Recently, QUIC has gained traction in the research community, as it promises to overcome the shortcomings of TCP without compromising on reliability. However, while QUIC vastly reduces the connection establishment time and HOL blocking, thus increasing interactivity, it still underperforms when delivering multimedia due to retransmissions under lossy conditions. To cope with these, QUIC offers the possibility to support unreliable delivery, like that of the user datagram protocol (UDP). While live-video streaming applications usually opt for completely unreliable protocols, such an approach is not optimal for immersive media delivery, since losing certain data would degrade the end user's QoE. In this paper, we propose a partially reliable QUIC-based data delivery mechanism that supports both reliable (streams) and unreliable (datagrams) delivery. To evaluate its performance, we have considered two immersive-video delivery use cases, namely tiled 360-degree video and volumetric point clouds. Our approach outperforms state-of-the-art protocols, especially in the presence of network losses and delay. Even at a packet loss ratio as high as 5%, the number of freezing events for a 120-second video is almost zero, compared with 120 for TCP.
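
The split between reliable streams and unreliable datagrams can be illustrated with a small sketch. It does not use any QUIC library and is not the paper's mechanism: the abstract does not say which data is sent reliably, so the `essential` flag, the `MediaUnit` type, and the rule "viewport tiles are essential" are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Channel(Enum):
    STREAM = "reliable stream"        # lost data is retransmitted
    DATAGRAM = "unreliable datagram"  # lost data is simply dropped


@dataclass(frozen=True)
class MediaUnit:
    name: str
    essential: bool  # hypothetical flag: losing this unit would visibly hurt QoE


def choose_channel(unit: MediaUnit) -> Channel:
    """Send essential units reliably and everything else unreliably."""
    return Channel.STREAM if unit.essential else Channel.DATAGRAM


def partition(units: Iterable[MediaUnit]) -> Dict[Channel, List[str]]:
    """Group unit names by the channel they would be sent on."""
    plan: Dict[Channel, List[str]] = {Channel.STREAM: [], Channel.DATAGRAM: []}
    for unit in units:
        plan[choose_channel(unit)].append(unit.name)
    return plan


# Assumed example: tiles inside the viewport are treated as essential.
tiles = [
    MediaUnit("tile-viewport-0", essential=True),
    MediaUnit("tile-viewport-1", essential=True),
    MediaUnit("tile-periphery-0", essential=False),
]
plan = partition(tiles)
print(plan[Channel.STREAM], plan[Channel.DATAGRAM])
```

The point of such a split is that only the data marked essential pays the retransmission delay, while the rest can be lost without stalling playback.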

DHR: Distributed Hybrid Rendering for Metaverse Experiences

  • Yu Wei Tan
  • Alden Tan
  • Nicholas Nge
  • Anand Bhojan

Classically, rasterization techniques are used for real-time rendering to meet the constraint of interactive frame rates. However, such techniques do not produce results as realistic as those of ray tracing approaches. Hence, hybrid rendering has emerged to improve the graphics fidelity of rasterization with ray tracing in real time. We explore distributed rendering as a way to incorporate real-time hybrid rendering into metaverse experiences for immersive graphics. In standalone extended reality (XR) devices, such ray tracing-enabled graphics is only feasible through pure cloud-based remote rendering systems that rely on low-latency networks to transmit real-time ray-traced data in response to interactive user input. Under high network latency conditions, remote rendering might not be able to maintain interactive frame rates for the client, adversely affecting the user experience. We adopt hybrid rendering via a distributed rendering approach by integrating ray tracing on powerful remote hardware with raster-based rendering on user access devices. With this hybrid approach, our technique can help standalone XR devices achieve ray tracing-incorporated graphics and maintain interactive frame rates even under high-latency conditions.

SESSION: Session 3: IXR Use Cases and Proof of Concepts

Tele-Robotics VR with Holographic Vision in Immersive Video

  • Gauthier Lafruit
  • Laurie Van Bogaert
  • Jaime Sancho Aragon
  • Michael Panzirsch
  • Gregoire Hirt
  • Klaus H. Strobl
  • Eduardo Juarez Martinez

We present a first-of-its-kind end-to-end tele-robotic VR system where the user operates a robot arm remotely, while being virtually immersed in the scene through force feedback and holographic vision. In contrast to stereoscopic head-mounted displays, which only provide depth perception to the user, the holographic vision device projects a light field, additionally allowing the user to correctly accommodate his/her eyes to the perceived depth of the scene's objects. The highly improved immersive user experience results in less fatigue in the tele-operator's daily work, creating safer and/or longer working conditions. The core technology relies on recent advances in immersive video coding for audio-visual transmission developed within the MPEG standardization committee. Virtual viewpoints are synthesized for the tele-operator's viewing direction from a couple of fixed colour and depth video feeds. Besides the display hardware and its GPU-enabled view synthesis driver, the biggest challenge lies in obtaining high-quality and reliable depth images from low-cost depth sensing devices. Specialized depth refinement tools have been developed to run in real time at zero delay within the end-to-end tele-robotic immersive video pipeline, which must by nature remain interactive. Various modules work asynchronously and efficiently at their own pace, with the acquisition devices typically being limited to 30 frames per second (fps), while the holographic headset updates its projected light field at up to 240 fps. Such a modular approach ensures high genericity over a wide range of free navigation VR/XR applications, also beyond the tele-robotic one presented in this paper.

Virtual Visits: UX Evaluation of a Photorealistic AR-based Video Communication Tool

  • Marina Alvarez
  • Alexander Toet
  • Sylvie Dijkstra-Soudarissanane

Social communication is increasingly performed through video communication (VC) tools such as Teams or Zoom. Although VC is an effective way to maintain relationships remotely, it is still limited in creating a sense of social connectedness since the interaction does not feel as natural as a physical encounter. We evaluated the user experience of a new photorealistic VC tool based on augmented reality (AR) compared with a popular VC tool (MS Teams), focusing on the feeling of social presence they provide. Participants experienced one of the systems and answered a questionnaire about their experience. In the AR condition, participants perceived a 3D image of the remote person projected in their local background (Fig.1). In the "regular video call" condition, using MS Teams, participants perceived a 2D image of the remote person in the remote background. The variables measured were social presence, interpersonal closeness, global user experience, duration of calls and perception of time, behaviour, and attitude toward the AR tool. No significant differences were found in subjective measures of social presence, interpersonal closeness, and global UX. There were also no significant differences in the duration of calls or time perception per condition. However, behavioural analysis showed that with the AR tool participants used significantly more hand gestures (non-verbal communication), showed a more relaxed attitude and laughed more frequently. Although we did not find any evidence that it provides more social presence, the AR communication tool appears to induce more natural behaviour. Both conditions received high scores on social presence and global UX. Qualitative analysis of open questions and comments showed that the human-sized image of the communication partner was the main factor driving these scores and suggested improvements for future system design.

Engagement and Quality of Experience in Remote Business Meetings: A Social VR Study

  • Simardeep Singh
  • Sylvie Dijkstra-Soudarissanane
  • Simon Gunkel

Currently, most videoconferencing technologies do not keep employees sufficiently engaged during business meetings. Recent studies have shown how extended reality (XR) technologies can help in executing remote meetings in new and possibly better ways. One important factor for meetings in, e.g., Virtual Reality (VR) is avatar realism, with the assumption that photorealistic representations of users increase engagement during meetings compared to a model-based graphical representation. However, so far only limited studies have been conducted in a real-world setting with a social virtual reality communication system in which users are represented as photorealistic avatars. Therefore, in this paper, we present a pilot study using a social VR communication system that allows employees of an organisation to meet each other from different remote office locations in The Netherlands. The users are captured with a depth camera, after which the capture is rendered in the HMDs of the users. Furthermore, the research provides a novel way to subjectively investigate engagement and quality of experience (QoE) in social VR in real-world settings and long-term use. Our correlation analysis shows that there are strong linear relationships between the quality of communication, embodiment, immersion, social presence, and meeting engagement. Furthermore, there are strong linear relationships between usability, quality of interaction, and quality of experience.
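
As a reminder of what a "strong linear relationship" means numerically, the sketch below computes a Pearson correlation coefficient between two sets of questionnaire scores. The abstract does not state which coefficient the authors used, so Pearson's r is an assumption here, and the scores are invented for illustration; they are not data from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)


# Invented per-participant scores (NOT from the study).
immersion = [1.0, 2.0, 3.0, 4.0, 5.0]
engagement = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson_r(immersion, engagement)
print(round(r, 3))
```

Values of r close to +1 or -1 indicate a strong linear relationship, while values near 0 indicate little or no linear relationship; the coefficient says nothing about causation.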