NOSSDAV '21: Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video

Vibra: neural adaptive streaming of VBR-encoded videos

  • Gangqiang Zhou
  • Run Wu
  • Miao Hu
  • Yipeng Zhou
  • Tom Z. J. Fu
  • Di Wu

Variable Bitrate (VBR) video encoding can provide a much higher quality-to-bits ratio
than the widely adopted Constant Bitrate (CBR) encoding, and has thus received
significant attention from content providers in recent years. However, it is challenging
to design efficient adaptive bitrate algorithms for VBR-encoded videos due to the
sharply fluctuating chunk size and the resulting bitrate burstiness. In this paper,
we propose a neural adaptive streaming framework called Vibra for VBR-encoded videos, which can well accommodate the high fluctuation of video
chunk sizes and improve the quality-of-experience (QoE) of end users significantly.
Our framework takes the characteristics of VBR-encoded videos into account, and adopts
the technique of deep reinforcement learning to train a model for bitrate adaptation.
We also conduct extensive trace-driven experiments, and the results show that Vibra outperforms state-of-the-art ABR algorithms with an improvement of 8.17% to 29.21%
in terms of the average QoE.

Data diet pills: in-network video quality control system for traffic usage reduction

  • Anan Sawabe
  • Takanori Iwai
  • Akihiro Nakao

Traffic reduction for bandwidth-hungry video streaming services, such as YouTube,
benefits not only subscribers struggling to avoid going over their contracted data
limit, but also service providers when the number of people who use video streaming
services increases. Because not all stakeholders who want to reduce traffic usage are
willing to conduct cumbersome operations, e.g., manually setting lower resolution,
we argue here that network operators should introduce a traffic pacer for providing traffic reduction services as an optional plan for subscribers. This
paper proposes NetPacer, an in-network traffic pacing system for reducing traffic usage by degrading the
video quality. NetPacer has two features. The first is relative pacing, which degrades the video quality relative to the initial quality by traffic shaping,
thus enabling flexible quality control. The second is in-network timely video quality identification via encrypted traffic analysis by using machine learning. Through experiments, we
demonstrate that NetPacer successfully reduces traffic by 30.8% by degrading the resolution
by one level while keeping the QoE (i.e., Mean Opinion Score (MOS)) degradation below
0.268 points on average for 50 YouTube videos.

Uncertainty-aware robust adaptive video streaming with bayesian neural network and
model predictive control

  • Nuowen Kan
  • Chenglin Li
  • Caiyi Yang
  • Wenrui Dai
  • Junni Zou
  • Hongkai Xiong

In this paper, we propose BayesMPC, an uncertainty-aware robust adaptive bitrate (ABR) algorithm based on a Bayesian
neural network (BNN) and model predictive control (MPC). Specifically, to improve
the capacity to learn the transition probability of the network throughput, we adopt
a BNN-based predictor that is able to predict the statistical distribution of future
throughput from the past throughput by not only considering the aleatoric uncertainty
(e.g., noise), but also capturing the epistemic uncertainty incurred by lack of adequate
training samples. We further show that by using the negative log-likelihood loss function
to train this BNN-based throughput predictor, the generalization error can be minimized
with a guarantee from the PAC-Bayesian theorem. Rather than a point estimate, the learnt
uncertainty can contribute to a confidence region for the future throughput, the lower
bound of which then leads to an uncertainty-aware robust MPC strategy to maximize
the worst-case user quality-of-experience (QoE) w.r.t. this confidence region. Finally,
experimental results on three real-world network trace datasets validate the efficiency
of both the proposed BNN-based predictor and uncertainty-aware robust MPC strategy,
and demonstrate the superior performance compared to other baselines, in terms of
both the overall QoE performance and generalization across all ranges of heterogeneous
network and user conditions.
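
The idea of planning against the lower bound of a throughput confidence region can be illustrated with a minimal sketch. It is not the authors' BayesMPC implementation: it assumes a Gaussian predictive distribution, looks ahead only one chunk rather than over a full MPC horizon, and uses a hypothetical linear QoE with illustrative penalty weights. All function and parameter names are assumptions.

```python
def throughput_lower_bound(mean_kbps, std_kbps, z=1.64):
    """Lower edge of a one-sided confidence region (Gaussian assumption)."""
    return max(mean_kbps - z * std_kbps, 1e-6)


def robust_bitrate(bitrates_kbps, chunk_s, buffer_s, last_kbps,
                   mean_kbps, std_kbps,
                   rebuf_penalty=4.3, smooth_penalty=0.5):
    """Pick the bitrate that maximizes QoE under the pessimistic throughput."""
    tput = throughput_lower_bound(mean_kbps, std_kbps)
    best_rate, best_qoe = None, float("-inf")
    for rate in bitrates_kbps:
        download_s = rate * chunk_s / tput          # time to fetch one chunk
        rebuf_s = max(download_s - buffer_s, 0.0)   # stall if buffer runs dry
        qoe = (rate / 1000.0
               - rebuf_penalty * rebuf_s
               - smooth_penalty * abs(rate - last_kbps) / 1000.0)
        if qoe > best_qoe:
            best_rate, best_qoe = rate, qoe
    return best_rate


ladder = [300, 750, 1200, 2850]
# Same mean prediction, with and without predictive uncertainty.
confident = robust_bitrate(ladder, 4.0, 8.0, 750, mean_kbps=2000, std_kbps=0)
uncertain = robust_bitrate(ladder, 4.0, 8.0, 750, mean_kbps=2000, std_kbps=500)
```

With the same mean prediction, the uncertain case plans against a lower throughput and therefore settles on a more conservative bitrate than the confident case.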

Common media client data (CMCD): initial findings

  • Abdelhak Bentaleb
  • May Lim
  • Mehmet N. Akcay
  • Ali C. Begen
  • Roger Zimmermann

In September 2020, the Consumer Technology Association (CTA) published the CTA-5004:
Common Media Client Data (CMCD) specification. Using this specification, a media client
can convey certain information to the content delivery network servers with object
requests. This information is useful in log association/analysis, quality of service/experience
monitoring and delivery enhancements. This paper is the first step toward investigating
the feasibility of CMCD in addressing one of the most common problems in the streaming
domain: efficient use of shared bandwidth by multiple clients. To that effect, we
implemented CMCD functions on an HTTP server and built a proof-of-concept system with
CMCD-aware dash.js clients. We show that even a basic bandwidth allocation scheme
enabled by CMCD reduces rebuffering rate and duration without noticeably sacrificing
the video quality.
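
As a rough illustration of how a client might attach CMCD data to an object request, the sketch below serializes a few keys into a `CMCD` query argument. It is a simplified reading of CTA-5004, not a complete implementation: the helper name `cmcd_query` is hypothetical, token-typed keys are not handled, and dropping `False` booleans is an assumption of this sketch. The specification remains the authority on key names and encoding rules.

```python
from urllib.parse import quote


def cmcd_query(params):
    """Serialize CMCD key-value pairs into a URL query argument (sketch)."""
    parts = []
    for key in sorted(params):                 # keys in alphabetical order
        value = params[key]
        if value is True:
            parts.append(key)                  # true booleans: key only
        elif value is False:
            continue                           # assumption: omit false values
        elif isinstance(value, str):
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{key}="{escaped}"')  # strings are double-quoted
        else:
            parts.append(f"{key}={value}")     # integers as-is
    return "CMCD=" + quote(",".join(parts), safe="")


# br: encoded bitrate (kbps), bl: buffer length (ms),
# mtp: measured throughput (kbps), sid: session ID, bs: buffer starvation.
query = cmcd_query({"br": 3200, "bl": 21300, "mtp": 25400,
                    "sid": "abc", "bs": True})
url = "https://cdn.example.com/video/seg_42.m4s?" + query
```

A CMCD-aware server can parse these values from the request log or at request time, for example to compare the buffer levels of competing clients before allocating bandwidth. The URL above is a placeholder.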

PAAS: a preference-aware deep reinforcement learning approach for 360° video streaming

  • Chenglei Wu
  • Zhi Wang
  • Lifeng Sun

Conventional tile-based 360° video streaming methods, including those based on deep
reinforcement learning (DRL), ignore the interactive nature of 360° video streaming and download
tiles in fixed sequential orders, thus failing to respond to changes in the user's head
motion. We show that these existing solutions suffer from a drop in either prefetch accuracy or playback stability. Furthermore, these methods are constrained to serve only one fixed streaming
preference, causing extra training overhead and poor generalization to unseen preferences. In this paper, we propose a dual-queue streaming framework, with one queue for accuracy and one for stability,
to enable the DRL agent to determine and change the tile download
order without incurring overhead. We also design a preference-aware DRL algorithm
to incentivize the agent to learn preference-dependent ABR decisions efficiently.
Compared with state-of-the-art DRL baselines, our method not only significantly improves
the streaming quality, e.g., increasing the average streaming quality by 13.6% on
a public dataset, but also demonstrates better performance and generalization under
dynamic preferences, e.g., an average quality improvement of 19.9% on unseen preferences.

Multi-resolution quality-based video coding system for DASH scenarios

  • Wilmer Moina-Rivera
  • Juan Gutiérrez-Aguado
  • Miguel Garcia-Pineda

Today, more than 85% of Internet traffic has a multimedia component. Video streaming
occupies a large part of this percentage mainly because this type of content is provided
by the most used applications on the Internet (e.g. Twitch, TikTok, Disney+, YouTube,
Netflix, etc.). Most of these platforms use HTTP Adaptive Streaming (HAS) to send
this media content to end users in order to ensure a good quality of experience (QoE).
However, this QoE must already be ensured in the video to be transmitted, i.e., the video
should have adequate quality at a minimal bitrate before transmission. To
address this issue, we present a system capable of encoding a video in several
resolutions given the desired value of an objective metric. Our system includes the
objective metric in the encoding loop in order to maintain the quality in all segments.
This system has been tested with three videos and five resolutions for each video.
Our proposal provides improvements of more than 10% in terms of video size, with
similar coding times, when compared with a fixed Constant Rate Factor (CRF) encoding.
A visual comparison between our proposal and a fixed CRF encoding can be seen at:

ES-HAS: an edge- and SDN-assisted framework for HTTP adaptive video streaming

  • Reza Farahani
  • Farzad Tashtarian
  • Alireza Erfanian
  • Christian Timmerer
  • Mohammad Ghanbari
  • Hermann Hellwagner

Recently, HTTP Adaptive Streaming (HAS) has become the dominant video delivery technology over the Internet. In HAS,
clients have full control over the media streaming and adaptation processes. Lack
of coordination among the clients and lack of awareness of the network conditions
may lead to sub-optimal user experience and resource utilization in a pure client-based
HAS adaptation scheme. Software Defined Networking (SDN) has recently been considered to enhance the video streaming process. In this
paper, we leverage the capability of SDN and Network Function Virtualization (NFV) to introduce an edge- and SDN-assisted video streaming framework called ES-HAS.
We employ virtualized edge components to collect HAS clients' requests and retrieve
networking information in a time-slotted manner. These components then solve an
optimization model in each time slot to efficiently serve clients' requests
by selecting an optimal cache server (with the shortest fetch time). In case of a
cache miss, a client's request is served (i) by an optimal replacement quality (only better quality levels with minimum deviation)
from a cache server, or (ii) by the original requested quality level from the origin server. This approach is
validated through experiments on a large-scale testbed, and the performance of our
framework is compared to pure client-based strategies and the SABR system [12]. Although
SABR and ES-HAS show (almost) identical performance in the number of quality switches,
ES-HAS outperforms SABR in terms of playback bitrate and the number of stalls by at
least 70% and 40%, respectively.
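
The cache-miss rule described above can be sketched as follows. This is a deliberately simplified illustration, not the ES-HAS optimization model: it ignores fetch times and server selection, and it assumes a cached replacement is always preferred over the origin server when one exists. The function name and the integer quality-level encoding are hypothetical.

```python
def serve_on_cache_miss(requested_level, cached_levels):
    """Decide how to serve a segment whose requested quality is not cached.

    Quality levels are integers, with higher meaning better quality.
    Returns a (source, level) tuple.
    """
    # Only better quality levels are acceptable replacements.
    better = [q for q in cached_levels if q > requested_level]
    if better:
        # Minimum deviation: the closest level above the requested one.
        return ("cache", min(better))
    # No acceptable replacement: fetch the original quality from the origin.
    return ("origin", requested_level)


replacement = serve_on_cache_miss(2, {0, 1, 4, 5})
fallback = serve_on_cache_miss(3, {0, 1})
```

In the first call, levels 4 and 5 are both better than the requested level 2, and level 4 deviates least. In the second call, no cached level exceeds the request, so the origin server is used.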

360NorVic: 360-degree video classification from mobile encrypted video traffic

  • Chamara Kattadige
  • Aravindh Raman
  • Kanchana Thilakarathna
  • Andra Lutu
  • Diego Perino

Streaming 360° video demands high bandwidth and low latency, and poses significant
challenges to Internet Service Providers (ISPs) and Mobile Network Operators (MNOs).
The identification of 360° video traffic can therefore help fixed and mobile carriers
optimize their network and provide better Quality of Experience (QoE) to the user.
However, end-to-end encryption of network traffic has made it difficult to distinguish
360° videos from regular videos. As a solution, this paper presents 360NorVic, a near-realtime
and offline Machine Learning (ML) classification engine to distinguish 360° videos
from regular videos when streamed from mobile devices. We collect packet and flow
level data for over 800 video traces from YouTube & Facebook accounting for 200 unique
videos under varying streaming conditions. Our results show that for near-realtime
and offline classification at packet level, average accuracy exceeds 95%, and that
for flow level, 360NorVic achieves more than 92% average accuracy. Finally, we pilot
our solution in the commercial network of a large MNO showing the feasibility and
effectiveness of 360NorVic in production settings.

Deep reinforced bitrate ladders for adaptive video streaming

  • Tianchi Huang
  • Rui-Xiao Zhang
  • Lifeng Sun

In the typical transcoding pipeline for adaptive video streaming, raw videos are pre-chunked
and pre-encoded according to a set of resolution-bitrate or resolution-quality pairs
on the server side, where the set of pairs is often called the bitrate ladder. Different from existing heuristics, we argue that a good bitrate ladder should be
optimized by considering video content features, network capacity, and storage costs
on the cloud. We propose DeepLadder, a per-chunk optimization scheme which adopts a state-of-the-art deep reinforcement
learning (DRL) method to optimize the bitrate ladder w.r.t. the above concerns. Technically,
DeepLadder selects the proper setting for each video resolution autoregressively.
We use over 8,000 video chunks, measure over 1,000,000 perceptual video qualities,
collect real-world network traces for more than 50 hours, and invent faithful virtual
environments to help train DeepLadder efficiently. Across a series of comprehensive
experiments on both Constant Bitrate (CBR) and Variable Bitrate (VBR)-encoded videos,
we demonstrate significant improvements in average video quality, bandwidth utilization,
and storage overhead in comparison to prior work as well as the ability to be deployed
in the real-world transcoding framework.

Higher quality live streaming under lower uplink bandwidth: an approach of super-resolution based video coding

  • Ying Chen
  • Qing Li
  • Aoyang Zhang
  • Longhao Zou
  • Yong Jiang
  • Zhimin Xu
  • Junlin Li
  • Zhenhui Yuan

With the growing popularity of live streaming, high video quality and low latency
with limited uplink bandwidth have become a significant challenge. In this study,
we propose Live Super-Resolution Based Video Coding (LiveSRVC), a novel video uploading framework that improves the quality of live
streaming with low latency under limited uplink bandwidth. We design a new super-resolution-based
key frame coding module to improve the coding compression efficiency. LiveSRVC dynamically
selects the bitrate and the compression ratio of key frames, mitigating the influence
of uplink bandwidth capacity on live streaming quality. Trace-driven emulations verify
that LiveSRVC can provide the same quality while reducing the required
bandwidth by up to 50% compared to the original encoding method (H.264). LiveSRVC consumes at least
10X less GPU occupation time compared to the method of reconstructing all frames with

Understanding quality of experience of heuristic-based HTTP adaptive bitrate algorithms

  • Babak Taraghi
  • Abdelhak Bentaleb
  • Christian Timmerer
  • Roger Zimmermann
  • Hermann Hellwagner

Adaptive bitrate (ABR) algorithms play a crucial role in delivering the highest possible
viewer Quality of Experience (QoE) in HTTP Adaptive Streaming (HAS). Online video
streaming service providers use HAS - the dominant video streaming technique on the
Internet - to deliver the best QoE for their users. A viewer's delight relies heavily
on how the ABR of a media player can adapt the stream's quality to the current network
conditions. QoE for video streaming sessions has been assessed in many research projects
to give better insight into the significant quality metrics such as startup delay
and stall events. The ITU Telecommunication Standardization Sector (ITU-T) P.1203
quality evaluation model allows a subjective Mean Opinion
Score (MOS) to be predicted algorithmically from various quality metrics. Subjective evaluation is the best
assessment method for examining the end-user opinion over a video streaming session's
experienced quality. We have conducted subjective evaluations with crowdsourced participants
and evaluated the MOS of the sessions using the ITU-T P.1203 quality model. This paper's
main contribution is to investigate the correspondence of subjective and objective
evaluations for well-known heuristic-based ABRs.

CrowdSR: enabling high-quality video ingest in crowdsourced livecast via super-resolution

  • Zhenxiao Luo
  • Zelong Wang
  • Jinyu Chen
  • Miao Hu
  • Yipeng Zhou
  • Tom Z. J. Fu
  • Di Wu

The prevalence of personal devices motivates the rapid development of crowdsourced
livecast in recent years. However, upstream bandwidth varies widely
among amateur broadcasters. Moreover, the highest video quality that can be streamed
is limited by the hardware configuration of broadcaster devices (e.g., 540p for low-end
mobile devices). The above factors pose significant challenges to the ingestion of
high-resolution live video streams, and result in poor quality-of-experience (QoE)
for viewers. In this paper, we propose a novel live video ingest approach called CrowdSR for crowdsourced livecast. CrowdSR can transform a low-resolution video stream uploaded
by weak devices into a high-resolution video stream via super-resolution, and then
deliver the stream to viewers. CrowdSR can exploit crowdsourced high-resolution video
patches from similar broadcasters to speed up model training. Different from previous
work, our approach does not require any modification at the client side, and thus
is more practical and easier to implement. Finally, we implement and evaluate CrowdSR
by conducting a series of real-world experiments. The results show that CrowdSR significantly
outperforms the baseline approaches by 0.42-1.09 dB in terms of PSNR and 0.006-0.014
in terms of SSIM.

Dynamic 3D point cloud streaming: distortion and concealment

  • Cheng-Hao Wu
  • Xiner Li
  • Rahul Rajesh
  • Wei Tsang Ooi
  • Cheng-Hsin Hsu

We present a study on the impact of packet loss on dynamic 3D point cloud streaming,
encoded with the MPEG Video-based Point Cloud Compression (V-PCC) standard. We show the
distortion when different channels of the V-PCC bitstream are lost, with the loss of occupancy
and geometry data impacting the quality most significantly. Our results point to the
need for better error concealment techniques. We end the paper by presenting preliminary
thoughts and experimental results of two naive error concealment techniques in the
point cloud domain, for attributes and geometry data, respectively, and highlight
the limitations of each.

WiFi-VLC dual connectivity streaming system for 6DOF multi-user virtual reality

  • Jacob Chakareski
  • Mahmudur Khan

We investigate a future WiFi-VLC dual connectivity streaming system for 6DOF multi-user
virtual reality that enables reliable high-fidelity remote scene immersion. The system
integrates an edge server that uses scalable 360° tiling to adaptively split the present
360° view of a VR user into a panoramic baseline content layer and a viewport-specific
enhancement content layer. The user is then served the two content layers over complementary
WiFi and VLC wireless links such that the delivered viewport quality is maximized
for the given WiFi and VLC transmission resources. We formally characterize the actions
of the server using rate-distortion optimization that we solve at low complexity.
To account for the users' mobility as they explore different 360° viewpoints of the
6DOF remote scene content and maintain reliable high-quality VLC connectivity, we
explore dynamic VLC transmitter steering and assignment in the system as graph bottleneck
matching that aims to maximize the received VLC SNR across all users. We formulate
an effective low-complexity solution to this discrete combinatorial optimization problem
of high complexity. The paper also contributes a first actual 6DOF body and head movement
VR navigation dataset that we collected and use to assess the performance of
our system via simulation experiments. These demonstrate enhanced VLC transmission
performance and an up to 7 dB gain in viewport quality over a state-of-the-art VLC
cellular system (LiFi), and an up to 10 dB gain in viewport quality over a state-of-the-art
traditional wireless streaming method, for 12K-120fps 360° 6DOF VR content. Moreover,
the synergistic WiFi-VLC dual connectivity of the proposed system augments its reliability
over the reference method LiFi that comprises only VLC links. These outcomes motivate
further exploration and prototype implementation of our system.
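
To make the bottleneck-matching formulation concrete, the sketch below assigns VLC transmitters to users so that the worst per-user SNR is as high as possible. It reads "bottleneck matching" in the classical max-min sense, which is an assumption about the paper's formulation, and it uses exhaustive search rather than the authors' low-complexity solution, so it is only practical for a handful of users. The SNR values are made up for illustration.

```python
from itertools import permutations


def bottleneck_assignment(snr_db):
    """Assign one distinct transmitter to each user, maximizing the minimum SNR.

    snr_db[u][t] is the SNR (dB) user u would receive from transmitter t.
    Returns (assignment, worst_snr), where assignment[u] is the transmitter
    index chosen for user u.
    """
    n_users = len(snr_db)
    n_tx = len(snr_db[0])
    best_assignment, best_worst = None, float("-inf")
    # Enumerate every way to give each user a different transmitter.
    for assignment in permutations(range(n_tx), n_users):
        worst = min(snr_db[u][assignment[u]] for u in range(n_users))
        if worst > best_worst:
            best_assignment, best_worst = assignment, worst
    return best_assignment, best_worst


# Two users, three steerable transmitters (hypothetical SNR values in dB).
snr = [
    [20, 12, 5],
    [18, 15, 7],
]
assignment, worst_snr = bottleneck_assignment(snr)
```

Here both users would prefer transmitter 0, but giving it to the first user and transmitter 1 to the second yields the highest worst-case SNR.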

Viewport-aware dynamic 360° video segment categorization

  • Amaya Dharmasiri
  • Chamara Kattadige
  • Vincent Zhang
  • Kanchana Thilakarathna

Unlike conventional videos, 360° videos give freedom to users to turn their heads,
watch and interact with the content owing to its immersive spherical environment.
Although these movements are arbitrary, similarities can be observed between viewport
patterns of different users and different videos. Identifying such patterns can assist
both content and network providers to enhance the 360° video streaming process, eventually
increasing the end-user Quality of Experience (QoE). However, how viewport patterns display similarities across different video content, and the potential of such similarities,
has not yet been studied. In this paper, we present a comprehensive analysis of a dataset
of 88 360° videos and propose a novel video categorization algorithm that is based
on similarities of viewports. First, we propose a novel viewport clustering algorithm
that outperforms the existing algorithms in terms of clustering viewports with similar
positioning and speed. Next, we develop a novel and unique dynamic video segment categorization
algorithm that shows notable improvement in similarity for viewport distributions
within the clusters when compared to that of existing static video categorizations.