NOSSDAV '23: Proceedings of the 33rd Workshop on Network and Operating System Support for Digital Audio and Video

RepCaM: Re-parameterization Content-aware Modulation for Neural Video Delivery

  • Rongyu Zhang
  • Lixuan Du
  • Jiaming Liu
  • Congcong Song
  • Fangxin Wang
  • Xiaoqi Li
  • Ming Lu
  • Yandong Guo
  • Shanghang Zhang

Recently, content-aware methods have been utilized to reduce the bandwidth and improve the quality of Internet video delivery. Existing methods train a content-aware super-resolution (SR) model for each video chunk on the server and stream low-resolution (LR) video chunks along with the SR models to the client. Previous works introduce additional partial parameters to privatize the models of different video chunks. However, these parameters still accumulate as the video length increases, and modulation can even fail, bringing extra delivery costs and performance degradation. In this paper, we introduce a novel Re-parameterization Content-aware Modulation (RepCaM) method that modulates all video chunks with an end-to-end training strategy. Our method adopts extra parallel-cascade parameters during training to fit multiple chunks, then removes these additional parameters through re-parameterization for inference. RepCaM therefore adds no extra model size compared with the original SR model. Moreover, to improve training efficiency on servers, we propose an online Video Patch Sampling (VPS) method to speed up training convergence. We conduct extensive experiments on VSD4K and a newly collected dataset (VSD4K-2022), achieving state-of-the-art results in video restoration quality and delivery bandwidth compression. Code is available at: https://github.com/Neural-video-delivery/RepCaM-Pytorch-NOSSDAV2023.
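
The abstract does not spell out RepCaM's exact parallel-cascade design, but the underlying structural re-parameterization idea (in the spirit of RepVGG) can be sketched: parallel convolution branches used during training are algebraically folded into a single convolution for inference, so the deployed model is no larger than the plain SR backbone. The block below is a minimal, hypothetical PyTorch illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepBlock(nn.Module):
    """Training-time block with parallel branches (hypothetical sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        self.conv1x1 = nn.Conv2d(channels, channels, 1, bias=True)

    def forward(self, x):
        # Parallel branches plus identity, used only during training.
        return self.conv3x3(x) + self.conv1x1(x) + x

    def reparameterize(self):
        """Fold all branches into one 3x3 conv for inference."""
        k = self.conv3x3.weight.clone()
        b = self.conv3x3.bias.clone()
        # 1x1 branch: zero-pad its kernel to 3x3 and add.
        k += F.pad(self.conv1x1.weight, [1, 1, 1, 1])
        b += self.conv1x1.bias
        # Identity branch: a 3x3 kernel with 1 at the center of channel i->i.
        c = k.shape[0]
        ident = torch.zeros_like(k)
        for i in range(c):
            ident[i, i, 1, 1] = 1.0
        k += ident
        fused = nn.Conv2d(c, c, 3, padding=1, bias=True)
        fused.weight.data, fused.bias.data = k, b
        return fused  # numerically equivalent, zero extra inference parameters
```

The fused convolution produces exactly the same outputs as the three-branch training block, which is why the delivered model carries no extra parameters.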

Latency-Aware 360-Degree Video Analytics Framework for First Responders Situational Awareness

  • Jiaxi Li
  • Jingwei Liao
  • Bo Chen
  • Anh Nguyen
  • Aditi Tiwari
  • Qian Zhou
  • Zhisheng Yan
  • Klara Nahrstedt

First responders operate in hazardous working conditions with unpredictable risks. To better prepare for the demands of the job, first responder trainees conduct training exercises that are recorded and reviewed by instructors, who check for objects indicating risks within the video recordings (e.g., a firefighter with an unfastened gas mask). However, the traditional reviewing process is inefficient due to unanalyzed video recordings and limited situational awareness. For a better reviewing experience, a latency-aware Viewing and Query Service (VQS) should be provided. The VQS should support object searching, which can be achieved using video object detection algorithms. Meanwhile, 360-degree cameras offer an unrestricted field of view of the training environment. Yet this medium poses a major challenge: low-latency, high-accuracy 360-degree object detection is difficult due to the higher resolution and geometric distortion. In this paper, we present the Responders-360 system architecture designed for 360-degree object detection. We propose a Dynamic Selection algorithm that optimizes computation resources while yielding accurate 360-degree object inference. The results, using a unique dataset collected from a firefighting training institute, show that the Responders-360 framework achieves a 4x speedup and a 25% reduction in memory usage compared with state-of-the-art methods.
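
The abstract does not detail the Dynamic Selection algorithm; the following is a plausible minimal sketch of the general idea, assuming the system chooses per frame between a cheap detector and an expensive, distortion-aware one under a latency budget. All names and the selection rule are hypothetical.

```python
import time

def dynamic_select(frame, fast_detector, accurate_detector,
                   latency_budget_ms, history):
    """Hypothetical sketch: pick the detector expected to meet the budget.
    `history` maps detector names to lists of recent runtimes in ms."""
    est_fast = sum(history["fast"]) / max(len(history["fast"]), 1)
    est_acc = sum(history["accurate"]) / max(len(history["accurate"]), 1)
    # Prefer the accurate detector whenever it is expected to fit the budget.
    name, det = (("accurate", accurate_detector)
                 if est_acc <= latency_budget_ms else ("fast", fast_detector))
    t0 = time.perf_counter()
    boxes = det(frame)
    history[name].append((time.perf_counter() - t0) * 1000.0)  # update estimate
    return boxes
```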

DMGC: Deep Triangle Mesh Geometry Compression via Connectivity Prediction

  • Xudong Zhao
  • Xinyao Zeng
  • Linyao Gao
  • Yiling Xu
  • Yanfeng Wang

We propose a novel deep lossless geometry compression algorithm for triangle meshes. Typical traditional triangle mesh compression algorithms are connectivity-driven: they first code connectivity and then code vertices according to the encoded connectivity. However, vertex compression is inefficient because these algorithms do not exploit the intra-component redundancy among vertices, which point cloud compression (PCC) excels at. Therefore, our approach first compresses the vertices with a lossless PCC algorithm, since the vertices can be treated as a point cloud. Moreover, based on the encoded vertices, the bitrate of connectivity can be further reduced by exploiting the cross-component redundancy between vertices and connectivity. Specifically, we divide the connectivity into KNN connectivity and isolated connectivity. We design a deep entropy model for KNN connectivity compression. This model first extracts the spatial features of the encoded vertices, then uses them to predict the connection probabilities of vertex pairs. An auto-regressive strategy is also employed during prediction to exploit intra-component redundancy. The predicted probabilities are finally fed into a binary arithmetic coder to code the KNN connectivity into a compact bitstream. The isolated connectivity is encoded in direct coding mode (DCM). To our knowledge, this paper is the first work to apply deep neural networks to mesh compression. We validate the effectiveness of our method on the simplified MPEG V-DMC dataset. Experimental results demonstrate that the proposed method achieves average bpv gains of 7.33% on connectivity and 40.75% on vertices, resulting in an average 30.26% bpv gain on total mesh compression compared with MPEG SC3DMC.
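
The coding loop described above can be made concrete with a minimal sketch, assuming a learned `model` that returns P(connected) for each candidate KNN vertex pair given the encoded vertex features and the previously coded flags (all names are hypothetical). The sketch estimates the ideal arithmetic-coding cost; a real codec would drive a binary arithmetic coder with the same probabilities.

```python
import math

def code_knn_connectivity(pairs, model, vertex_feat):
    """Estimate the bitstream cost of coding KNN connectivity flags
    auto-regressively (hypothetical sketch)."""
    coded_flags, total_bits = [], 0.0
    for pair in pairs:
        # Entropy model predicts P(pair is connected) from vertex features
        # and the flags already coded (auto-regressive context).
        p = model(vertex_feat, pair, coded_flags)
        flag = pair.connected                     # ground-truth bit to encode
        p_flag = p if flag else (1.0 - p)
        total_bits += -math.log2(max(p_flag, 1e-9))  # ideal coding cost
        coded_flags.append((pair, flag))          # grow the context
    return total_bits
```

The better the predicted probabilities match the true connectivity, the fewer bits the arithmetic coder spends, which is the source of the reported connectivity gains.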

A Low Cost Cross-Platform Video/Image Process Framework Empowers Heterogeneous Edge Application

  • Danyang Song
  • Cong Zhang
  • Yifei Zhu
  • Jiangchuan Liu

Recently, video/image intelligent analytics has been widely used in industrial Artificial Intelligence (AI) applications, such as defect detection, face recognition, and security monitoring. To provide better applicability and compatibility in such applications, the embedded AI models must be developed, compiled, and deployed under different development frameworks, such as cuDNN and RKNN. Unfortunately, these frameworks are tied to different Graphics Processing Unit (GPU) hardware vendors, resulting in divergent model parameter structures and increased development costs. To address these issues, we propose LiGo, a low-cost, cross-platform video/image processing framework that simplifies and accelerates intelligent video processing on practical heterogeneous hardware systems. LiGo provides a video processing pipeline, cross-platform development environments, and unified model serving structures. We demonstrate LiGo's efficiency and flexibility in model generation and deployment through its use in multiple real-world commercial industrial systems.
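
The abstract does not expose LiGo's API; the following is a minimal sketch of what a unified serving abstraction over heterogeneous vendor runtimes might look like. All class and function names are hypothetical, and the vendor calls are stubbed out.

```python
from abc import ABC, abstractmethod

class InferenceBackend(ABC):
    """Hypothetical unified serving interface; each subclass wraps one
    vendor runtime so application code stays platform-agnostic."""
    @abstractmethod
    def load(self, model_path: str) -> None: ...
    @abstractmethod
    def infer(self, frame) -> list: ...

class CudaBackend(InferenceBackend):
    def load(self, model_path):
        self.model_path = model_path  # would hand off to the NVIDIA runtime
    def infer(self, frame):
        return []                     # stub: vendor-specific inference call

class RKNNBackend(InferenceBackend):
    def load(self, model_path):
        self.model_path = model_path  # would hand off to the Rockchip runtime
    def infer(self, frame):
        return []                     # stub: vendor-specific inference call

def make_backend(platform: str) -> InferenceBackend:
    """Application code selects a backend by platform name only."""
    return {"cuda": CudaBackend, "rknn": RKNNBackend}[platform]()
```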

Will Dynamic Foveation Boost Cloud VR Gaming Experience?

  • Jia-Wei Fang
  • Kuan-Yu Lee
  • Teemu Kämäräinen
  • Matti Siekkinen
  • Cheng-Hsin Hsu

Cloud Virtual Reality (VR) gaming offloads the computationally intensive rendering tasks from resource-limited Head-Mounted Displays (HMDs) to cloud servers, but consumes a staggering amount of bandwidth to deliver high-quality gaming experiences. One way to cope with such high bandwidth demands is to capitalize on the human visual system by allocating a higher bitrate to the foveal region of the HMD viewport, which is known as foveation in the literature. Although foveation has been employed in remote VR gaming, existing open-source projects all adopt static foveation, in which the gamer's gaze position is assumed to be fixed at the viewport center. In this paper, we construct the very first cloud VR gaming system that supports dynamic foveation. That is, the real-time gaze positions of gamers are streamed from eye-trackers on HMDs to cloud servers, which in turn adjust the foveation parameters, such as foveal region size/location and peripheral region quality degradation, accordingly. Using our cloud VR gaming system, we design and carry out a user study with the game Fruit Ninja VR 2 to find the static and dynamic foveation parameters that maximize the gaming Quality of Experience (QoE) in Mean Opinion Score (MOS). With the chosen foveation parameters, we found that, compared to cloud VR gaming without foveation, static foveation leads to a MOS increase of 0.60 and a bitrate reduction of 8.71%. Furthermore, dynamic foveation yields an additional 0.60 MOS increase while saving 9.81% bitrate compared to static foveation. Our findings demonstrate the potential of dynamic foveation in cloud VR gaming, which demands both high visual quality and short response time. The optimization techniques developed in this and follow-up work could benefit other cloud-rendered applications, which typically have less strict requirements than cloud VR gaming.
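
The paper's exact parameterization is not given beyond foveal region size/location and peripheral quality degradation, but the mechanism can be illustrated with a minimal sketch, assuming a normalized gaze position and a per-macroblock quantization offset (function name, radius, and offset values are hypothetical).

```python
import math

def qp_offset(mb_x, mb_y, gaze_x, gaze_y, foveal_radius=0.15, max_offset=10):
    """Extra quantization (quality degradation) for a macroblock, growing
    with distance from the current gaze point. Coordinates are normalized
    to [0, 1]; static foveation would fix the gaze at (0.5, 0.5)."""
    d = math.hypot(mb_x - gaze_x, mb_y - gaze_y)
    if d <= foveal_radius:
        return 0                        # full quality inside the foveal region
    # Linearly degrade quality in the periphery, capped at max_offset.
    t = min((d - foveal_radius) / (1.0 - foveal_radius), 1.0)
    return round(max_offset * t)
```

Under dynamic foveation the server re-evaluates this mapping every frame with the latest eye-tracker sample, which is why gaze-streaming latency matters.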

Implementing Partial Atlas Selector for Viewport-dependent MPEG Immersive Video Streaming

  • Soonbin Lee
  • Jong-Beom Jeong
  • Eun-Seok Ryu

The ISO/IEC 23090-12 MPEG Immersive Video (MIV) standard, which provides immersive volumetric scenes with six degrees of freedom (6DoF), has recently been the subject of research and development efforts. The key concept of MIV is to generate an atlas, a minimal representation of the multiple source views, with a low pixel rate that limits the number of video decoder instantiations. However, this atlas generation process introduces dependencies between views in the reconstruction. The resulting inability of conventional MIV to independently transmit and decode portions of the source views is a major challenge for 6DoF viewport-dependent streaming. This paper proposes a framework that independently selects and transmits only the atlas regions required for rendering the immersive content. It also presents a visibility calculation method to determine the importance of each atlas for viewport rendering. Experiments under limited pixel-rate conditions show that a highly efficient 6DoF viewport-dependent streaming system is achievable. The proposed method is implemented with high-level syntax conformance in the MIV test model software, so the framework can be deployed with various adaptive streaming systems alongside MIV bitstreams in the future.
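
The visibility calculation itself is not specified in the abstract; a minimal sketch of the general idea follows, assuming each atlas carries a representative source-view direction and that importance is scored by angular proximity to the viewport's gaze direction (all names, fields, and the budget rule are hypothetical).

```python
def visibility_score(viewport_dir, atlas_dir):
    """Cosine similarity between the viewport gaze direction and an atlas's
    representative source-view direction; both are unit 3-vectors."""
    return sum(v * a for v, a in zip(viewport_dir, atlas_dir))

def select_atlases(viewport_dir, atlases, pixel_budget):
    """Rank atlases by visibility and keep the top ones under a pixel-rate
    budget, so only the regions needed for the viewport are transmitted."""
    ranked = sorted(atlases,
                    key=lambda a: visibility_score(viewport_dir, a["dir"]),
                    reverse=True)
    chosen, used = [], 0
    for a in ranked:
        if used + a["pixels"] <= pixel_budget:
            chosen.append(a)
            used += a["pixels"]
    return chosen
```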

Realtime Multimedia Services over Starlink: A Reality Check

  • Haoyuan Zhao
  • Hao Fang
  • Feng Wang
  • Jiangchuan Liu

Recently, Low Earth Orbit Satellite Networking (LSN) has been suggested as a critical and promising component toward high-bandwidth, low-latency global coverage in the upcoming 6G communication infrastructure. SpaceX's Starlink is arguably the largest and most operational LSN to date. Starlink has seen practical use with diverse networked applications, including multimedia applications with stringent demands. Given the mixed and inconsistent feedback from end users, it remains unclear whether today's LSNs, in particular Starlink, are ready for realtime multimedia. In this paper, we present a systematic measurement study of realtime multimedia services over Starlink, seeking insights into their operation and performance in this new generation of networking. Our findings demonstrate that Starlink can effectively handle most video-on-demand (VoD) and live-streaming services with properly configured buffers, but suffers from video pauses or audio cut-offs during interactive video conferencing, especially in extreme weather. We also examine the impact of satellite switching and the evolution of satellite routing strategies, offering hints toward future enhancements for multimedia services and for LSNs.

Cross that boundary: Investigating the feasibility of cross-layer information sharing for enhancing ABR decision logic over QUIC

  • Joris Herbots
  • Arno Verstraete
  • Maarten Wijnants
  • Peter Quax
  • Wim Lamotte

With HTTP Adaptive Streaming (HAS), client-side Adaptive Bitrate (ABR) algorithms drive the (quality-variant) scheduling and downloading of media segments. These ABR algorithms are implemented in the application layer and can therefore base their logic only on relatively coarse and/or inaccurate application-layer metrics. The recently standardized QUIC transport protocol has many userspace implementations, which paves the way for cross-layer optimizations by exposing transport-layer metrics to application-layer algorithms. In this paper, we investigate whether the availability of fine-grained transport-level throughput metrics can positively impact the operation of ABR algorithms and hence the Quality of Experience (QoE) of HAS users in Video on Demand (VoD) settings. Our results show that QUIC-level throughput data can indeed aid ABR algorithms to more accurately predict playout buffer under-runs, which in turn allows the ABR logic to take reactive measures in a timely fashion such that playback stalls can be avoided under challenging network conditions. Overall, our work presents a step towards improving ABR operation via cross-layer data exchange.
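
A minimal sketch of why transport-level data helps: with fine-grained QUIC throughput samples, the client can estimate whether the segment currently in flight will finish downloading before the playout buffer drains, rather than waiting for a coarse per-segment average. Function and parameter names below are hypothetical, not part of any QUIC stack's API.

```python
def will_stall(buffer_s, bytes_remaining, quic_samples_bps, safety=1.2):
    """Predict a playout-buffer under-run for the segment being downloaded.
    `quic_samples_bps` holds recent fine-grained transport-layer throughput
    samples; application-layer ABR typically sees only per-segment averages."""
    if not quic_samples_bps:
        return False
    est_bps = sum(quic_samples_bps) / len(quic_samples_bps)
    eta_s = bytes_remaining * 8.0 / max(est_bps, 1.0)  # time to finish download
    return eta_s * safety > buffer_s  # download won't beat the buffer drain

# If this returns True, the ABR logic can react early, e.g. abandon the
# in-flight segment and re-request it at a lower bitrate before playback stalls.
```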

Improving ABR Performance for Short Video Streaming Using Multi-Agent Reinforcement Learning with Expert Guidance

  • Yueheng Li
  • Qianyuan Zheng
  • Zicheng Zhang
  • Hao Chen
  • Zhan Ma

In the realm of short video streaming, popular adaptive bitrate (ABR) algorithms developed for classical long video applications suffer catastrophic failures because they are tuned solely to adapt bitrates. Instead, short video adaptive bitrate (SABR) algorithms must jointly determine which video to prefetch and at which bitrate level, without sacrificing users' quality of experience (QoE) or incurring noticeable bandwidth wastage. Unfortunately, existing SABR methods suffer from slow convergence and poor generalization. In this paper, we propose Incendio, a novel SABR framework that applies Multi-Agent Reinforcement Learning (MARL) with Expert Guidance, separating the decisions of video ID and video bitrate into respective buffer management and bitrate adaptation agents so as to maximize a system-level utility score modeled as a compound function of QoE and bandwidth-wastage metrics. Incendio is first initialized by imitating hand-crafted expert rules and then fine-tuned with MARL. Results from extensive experiments indicate that Incendio outperforms the current state-of-the-art SABR algorithm with a 53.2% improvement in utility score while maintaining low training complexity and inference time.
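
The abstract models the objective as a compound of QoE and bandwidth wastage and splits the decision between two agents; a minimal sketch of such a utility and the agent split follows. The weights, names, and agent interfaces are hypothetical, not Incendio's actual formulation.

```python
def utility(qoe, wasted_bytes, alpha=1.0, beta=0.5):
    """System-level utility: reward QoE, penalize prefetched-but-unwatched
    data. alpha and beta are hypothetical trade-off weights."""
    return alpha * qoe - beta * wasted_bytes

def step(buffer_agent, bitrate_agent, state):
    """Two cooperating agents: one picks WHICH video to prefetch next,
    the other picks AT WHAT bitrate, conditioned on that choice."""
    video_id = buffer_agent.act(state)            # buffer-management agent
    bitrate = bitrate_agent.act(state, video_id)  # bitrate-adaptation agent
    return video_id, bitrate
```

Decoupling the two decisions shrinks each agent's action space, which is consistent with the paper's claim of faster convergence than monolithic SABR policies.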

iStream Player: A Versatile Video Player Framework

  • Akram Ansari
  • Mea Wang

The increasing demand for video streaming in all forms draws significant research and development attention, especially on the client side for adaptive streaming services like DASH and HLS. However, the implementation challenges of developing and validating new client-side solutions within a full-stack video player pose a major obstacle. State-of-the-art open-source video players, such as DASH.js, VLC, and GPAC, were designed for specific purposes and are difficult to extend and modify for video streaming research. To address this issue, we propose iStream Player, a versatile video player framework featuring fully extensible, independent micro-modules akin to Lego blocks. Constructing a video player in iStream Player is as simple as assembling Lego pieces. Our case studies demonstrate that a diverse range of players can be created with only minor changes, such as extending or replacing one or two micro-modules. As a result, iStream Player significantly reduces the time and effort required to develop and validate new solutions, providing researchers and developers in the video streaming field with a shared platform to explore and share their innovative ideas.
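
The abstract likens iStream Player's micro-modules to Lego blocks; a minimal sketch of how such a composable player might be assembled is shown below. The module names and interfaces are hypothetical illustrations of the design pattern, not iStream Player's actual API.

```python
class Module:
    """Base micro-module: each piece carries one narrow responsibility."""
    def attach(self, player):
        self.player = player

class ThroughputABR(Module):
    """Example swappable micro-module implementing a simple ABR rule."""
    def choose_bitrate(self, throughput_bps, levels):
        # Pick the highest level sustainable at the measured throughput.
        return max((l for l in levels if l <= throughput_bps),
                   default=min(levels))

class Player:
    """The player is just an assembly of attached micro-modules."""
    def __init__(self, **modules):
        self.modules = modules
        for m in modules.values():
            m.attach(self)

# Swapping the ABR logic means replacing a single micro-module:
player = Player(abr=ThroughputABR())
```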