MMSys '18: Proceedings of the 9th ACM Multimedia Systems Conference


SESSION: Research track

DASHing towards Hollywood

  •      Saba Ahsan
  • Stephen McQuistin
  • Colin Perkins
  • Jörg Ott

Adaptive streaming over HTTP has become the de facto standard for video streaming over the Internet, partly due to its ease of deployment in a heavily ossified Internet. Though performant in most on-demand scenarios, it is bound by the semantics of TCP, with reliability prioritised over timeliness, even for live video where the reverse may be desired. In this paper, we present an implementation of MPEG-DASH over TCP Hollywood, a widely deployable TCP variant for latency-sensitive applications. Out-of-order delivery in TCP Hollywood allows the client to measure, adapt, and request the next video chunk even when the current one is only partially downloaded. Furthermore, the ability to skip frames, enabled by multi-streaming and out-of-order delivery, adds resilience against stalling for any delayed messages. We observed that in high-latency, high-loss networks, TCP Hollywood significantly lowers the likelihood of stall events and also supports better-quality downloads in comparison to standard TCP, with minimal changes to current adaptation algorithms.
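The abstract's key point is that out-of-order delivery lets the client act on a chunk that is only partially downloaded. A minimal sketch of that idea, assuming hypothetical names (Representation, pick_next_bitrate) and a simple throughput rule rather than the paper's actual adaptation logic:

```python
# Minimal sketch (not the authors' code): rate selection that can run while the
# current chunk is only partially downloaded, as out-of-order delivery permits.
from dataclasses import dataclass

@dataclass
class Representation:
    bitrate_kbps: int  # encoded bitrate of this DASH representation

def estimate_throughput_kbps(bytes_received: int, elapsed_s: float) -> float:
    """Throughput estimate from the bytes of the chunk received so far."""
    return (bytes_received * 8 / 1000) / max(elapsed_s, 1e-6)

def pick_next_bitrate(reps, bytes_received, elapsed_s, safety=0.8):
    """Choose the highest representation whose bitrate fits under a safety
    margin of the measured throughput; callable before the chunk completes."""
    throughput = estimate_throughput_kbps(bytes_received, elapsed_s)
    feasible = [r for r in sorted(reps, key=lambda r: r.bitrate_kbps)
                if r.bitrate_kbps <= safety * throughput]
    return feasible[-1] if feasible else min(reps, key=lambda r: r.bitrate_kbps)

reps = [Representation(b) for b in (235, 750, 1750, 4300)]
print(pick_next_bitrate(reps, bytes_received=600_000, elapsed_s=2.0).bitrate_kbps)
```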

Want to play DASH?: a game theoretic approach for adaptive streaming over HTTP

  •      Abdelhak Bentaleb
  • Ali C. Begen
  • Saad Harous
  • Roger Zimmermann

In streaming media, it is imperative to deliver a good viewer experience to preserve customer loyalty. Prior research has shown that this is rather difficult when shared Internet resources struggle to meet the demand from streaming clients that are largely designed to behave in their own self-interest. To date, several schemes for adaptive streaming have been proposed to address this challenge with varying success. In this paper, we take a different approach and cast adaptive streaming as a game-theoretic problem. We present a practical implementation integrated into the dash.js reference player and provide substantial comparisons against state-of-the-art methods using trace-driven and real-world experiments. Our approach outperforms its competitors in average viewer experience by 38.5% and in video stability by 62%.

Film editing: new levers to improve VR streaming

  •      Savino Dambra
  • Giuseppe Samela
  • Lucile Sassatelli
  • Romaric Pighetti
  • Ramon Aparicio-Pardo
  • Anne-Marie Pinna-Déry

Streaming Virtual Reality (VR), even in the simple form of 360° videos, is much more complex than streaming regular videos because, to lower the required rates, transmission decisions must take the user's head position into account. The way the user exploits her/his freedom is therefore crucial for the network load. In turn, the way the user moves depends on the video content itself. VR is, however, a whole new medium for which the film-making language does not exist yet; its "grammar" is only now being invented. We present a strongly inter-disciplinary approach to improve the streaming of 360° videos: designing high-level content manipulations (film editing) to limit and even control the user's motion in order to consume less bandwidth while maintaining the user's experience. We build an MPEG DASH-SRD player for Android and the Samsung Gear VR, featuring FoV-based quality decisions and a replacement strategy that allows the tiles' buffers to build up while keeping their state up-to-date with the current FoV as much as bandwidth allows. The editing strategies we design have been integrated within the player, and the streaming module has been extended to benefit from the editing. Two sets of user experiments show that editing indeed impacts head velocity (reduction of up to 30%), consumed bandwidth (reduction of up to 25%), and subjective assessment. Attention-driving tools from other communities can hence be designed to improve streaming. We believe this innovative work opens up the path to a whole new field of possibilities in defining degrees of freedom to be wielded for VR streaming optimization.

Combining skeletal poses for 3D human model generation using multiple Kinects

  •      Kevin Desai
  • Balakrishnan Prabhakaran
  • Suraj Raghuraman

RGB-D cameras, such as the Microsoft Kinect, provide the 3D information, color and depth, associated with a scene. Interactive 3D Tele-Immersion (i3DTI) systems use such RGB-D cameras to capture the person present in the scene in order to collaborate with other remote users and interact with the virtual objects present in the environment. Using a single camera, it is difficult to estimate an accurate skeletal pose and complete 3D model of the person, especially when the person is not completely in the camera's view. With multiple cameras, even with partial views, it is possible to get a more accurate estimate of the person's skeleton, leading to a better and complete 3D model. In this paper, we present a real-time skeletal pose identification approach that leverages the inaccurate skeletons of the individual Kinects and produces a combined, optimized skeleton. We estimate the Probability of an Accurate Joint (PAJ) for each joint from all of the Kinect skeletons. We determine the correct direction of the person and assign the correct joint sides for each skeleton. We then use a greedy consensus approach to combine the highly probable and accurate joints into the combined skeleton. Using the individual skeletons, we segment the point clouds from all the cameras. We use the already computed PAJ values to obtain the Probability of an Accurate Bone (PAB). The individual point clouds are then combined one segment after another using the calculated PAB values. The resulting combined point cloud is a complete and accurate 3D representation of the person present in the scene. We validate our estimated skeleton against two well-known methods by computing the error distance between the best-view Kinect skeleton and the estimated skeleton. An exhaustive analysis is performed using around 500,000 skeletal frames in total, captured from 7 users with 7 cameras. Visual analysis is performed by checking whether the estimated skeleton lies completely within the human model. We also develop a 3D Holo-Bubble game to showcase the real-time performance of the combined skeleton and point cloud. Our results show that our method outperforms state-of-the-art approaches that use multiple Kinects, in terms of objective error, visual quality, and real-time user performance.
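As an illustration of the fusion idea (not the paper's exact greedy consensus), the following sketch combines per-camera joints weighted by a per-joint accuracy score in the spirit of the PAJ values; the scoring and weighting rule here are simplified assumptions:

```python
import numpy as np

def combine_skeletons(skeletons, paj):
    """skeletons: list of (J, 3) joint arrays, one per Kinect (NaN = untracked).
    paj: list of (J,) arrays with the Probability of an Accurate Joint.
    Returns a (J, 3) combined skeleton: for each joint, a PAJ-weighted average
    of the cameras' estimates (a simple stand-in for the consensus step)."""
    skeletons = np.asarray(skeletons, dtype=float)   # (K, J, 3)
    paj = np.asarray(paj, dtype=float)               # (K, J)
    paj = np.where(np.isnan(skeletons).any(axis=2), 0.0, paj)  # ignore lost joints
    weights = paj / np.clip(paj.sum(axis=0, keepdims=True), 1e-9, None)
    return np.nansum(np.nan_to_num(skeletons) * weights[..., None], axis=0)

# Two cameras, three joints; camera 1 lost joint 2.
s1 = np.array([[0, 1, 2], [1, 1, 2], [np.nan] * 3])
s2 = np.array([[0, 1.1, 2], [1, 0.9, 2], [2, 1, 2]])
print(combine_skeletons([s1, s2], [[0.9, 0.8, 0.0], [0.7, 0.6, 0.9]]))
```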

Blind image quality assessment based on multiscale salient local binary patterns

  •      Pedro Garcia Freitas
  • Sana Alamgeer
  • Welington Y. L. Akamine
  • Mylène C. Q. Farias

Due to the rapid development of multimedia technologies, image quality assessment (IQA) has become an important topic over the last decades. As a consequence, a great research effort has been made to develop computational models that estimate image quality. Among the possible IQA approaches, blind IQA (BIQA) is of fundamental interest as it can be used in most multimedia applications. BIQA techniques measure the perceptual quality of an image without using the reference (or pristine) image. This paper proposes a new BIQA method that uses a combination of texture features and saliency maps of an image. Texture features are extracted from the images using the local binary pattern (LBP) operator at multiple scales. To extract the salient regions of an image, i.e. the areas of the image that are the main attractors of the viewers' attention, we use computational visual attention models that output saliency maps. These saliency maps can be used as weighting functions for the LBP maps at multiple scales. We propose an operator that produces a combination of multiscale LBP maps and saliency maps, which we call the multiscale salient local binary pattern (MSLBP) operator. To determine the best saliency model to use in the proposed operator, we investigate the performance of several saliency models. Experimental results demonstrate that the proposed method is able to estimate the quality of impaired images with a wide variety of distortions. The proposed metric has better prediction accuracy than state-of-the-art IQA methods.
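The MSLBP operator is described as weighting multiscale LBP maps by saliency. A rough sketch of that combination, assuming a basic 8-neighbour LBP and simple decimation for the scales (the paper's exact operator may differ):

```python
import numpy as np

def lbp_map(gray):
    """Basic 8-neighbour LBP computed on the interior of a grayscale image."""
    c = gray[1:-1, 1:-1]
    neighbours = [gray[0:-2, 0:-2], gray[0:-2, 1:-1], gray[0:-2, 2:],
                  gray[1:-1, 2:],   gray[2:, 2:],     gray[2:, 1:-1],
                  gray[2:, 0:-2],   gray[1:-1, 0:-2]]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        code |= ((n >= c).astype(np.uint8) << bit)
    return code

def mslbp_features(gray, saliency, scales=(1, 2, 4), bins=256):
    """Concatenate saliency-weighted LBP histograms over several downscalings."""
    feats = []
    for s in scales:
        g = gray[::s, ::s]
        w = saliency[::s, ::s][1:-1, 1:-1].ravel()   # saliency as per-pixel weight
        hist, _ = np.histogram(lbp_map(g).ravel(), bins=bins,
                               range=(0, bins), weights=w, density=True)
        feats.append(hist)
    return np.concatenate(feats)

rng = np.random.default_rng(0)
img, sal = rng.random((64, 64)), rng.random((64, 64))
print(mslbp_features(img, sal).shape)  # (3 * 256,)
```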

Favor: fine-grained video rate adaptation

  •      Jian He
  • Mubashir Adnan Qureshi
  • Lili Qiu
  • Jin Li
  • Feng Li
  • Lei Han

Video rate adaptation has a large impact on quality of experience (QoE). However, existing video rate adaptation is rather limited due to a small number of rate choices, which results in (i) under-selection, (ii) rate fluctuation, and (iii) frequent rebuffering. Moreover, selecting a single video rate for a 360° video can be even more limiting, since not all portions of a video frame are equally important. To address these limitations, we identify new dimensions along which to adapt user QoE - dropping video frames, slowing down the video play rate, and adapting different portions of 360° videos. These new dimensions, along with rate adaptation, give us more fine-grained adaptation and significantly improve user QoE. We further develop a simple yet effective learning strategy to automatically adapt the buffer reservation to avoid performance degradation beyond the optimization horizon. We implement our approach, Favor, in VLC, a well-known open-source media player, and demonstrate that Favor on average outperforms Model Predictive Control (MPC), rate-based, and buffer-based adaptation for regular videos by 24%, 36%, and 41%, respectively, and by 2x for 360° videos.

Watermarked video delivery: traffic reduction and CDN management

  •      Kun He
  • Patrick Maillé
  • Gwendal Simon

In order to track users who illegally re-stream live video streams, one solution is to embed identifying watermark sequences in the video segments to distinguish the users. However, since multiple watermarked versions of each segment must be prepared, existing solutions require extra delivery bandwidth (at least doubling the required bandwidth). In this paper, we study how to reduce the internal delivery (traffic) cost of a Content Delivery Network (CDN). We propose a mechanism that reduces the number of watermarked segments that need to be encoded and delivered. We calculate the best- and worst-case traffic for two different cases: multicast and unicast. The results illustrate that, even in the worst cases, the traffic with our approach is much lower than without the reduction. Moreover, the watermarked sequences still remain unique for each user. Experiments based on a real database are carried out and illustrate that our mechanism significantly reduces traffic with respect to current CDN practice.

Category-aware hierarchical caching for video-on-demand content on YouTube

  •      Christian Koch
  • Johannes Pfannmüller
  • Amr Rizk
  • David Hausheer
  • Ralf Steinmetz

Content delivery networks (CDNs) carry more than half of the video content in today's Internet. By placing content in caches close to the users, CDNs help increase the Quality of Experience, e.g., by decreasing the delay until video playback starts. Existing works on CDN cache performance focus mostly on distinct caching metrics, such as hit rate, given an abstract workload model. Moreover, the nature of the geographical distribution and connection of caches is often oversimplified. In this work, we investigate the performance of cache hierarchies while taking into account a mixed content workload comprising multiple categories, e.g., news, comedy, and music. We consider the performance of existing caching strategies in terms of cache hit rate and deterioration costs in terms of write operations. Further, we contribute the design and evaluation of a content-category-aware caching strategy, which has the benefit of being sensitive to changing category-specific content popularity. We evaluate our caching strategy, denoted ACDC (Adaptive Content-Aware Designed Cache), using multiple caching hierarchy models, different cache sizes, and a real-world trace covering one week of YouTube requests observed in a large European mobile ISP network. We demonstrate that ACDC increases the cache hit rate for certain hierarchies by up to 18.39% and decreases transmission latency by up to 12%. Additionally, a decrease in disk write operations of up to 55% is observed.
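ACDC's exact policy is not given in the abstract; the sketch below only illustrates the general idea of keeping per-category popularity counters and letting them influence eviction, using hypothetical names and a deliberately simple rule:

```python
from collections import OrderedDict, defaultdict

class CategoryAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()            # video_id -> category (LRU order)
        self.category_hits = defaultdict(int) # recent popularity per category

    def request(self, video_id, category):
        self.category_hits[category] += 1
        if video_id in self.store:            # hit: refresh recency
            self.store.move_to_end(video_id)
            return True
        if len(self.store) >= self.capacity:  # miss: evict from the currently
            victim = min(self.store,          # least popular category, oldest first
                         key=lambda v: (self.category_hits[self.store[v]],
                                        list(self.store).index(v)))
            del self.store[victim]
        self.store[video_id] = category
        return False

cache = CategoryAwareCache(capacity=2)
for vid, cat in [("a", "news"), ("b", "music"), ("c", "music"), ("a", "news")]:
    print(vid, cache.request(vid, cat))
```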

VideoNOC: assessing video QoE for network operators using passive measurements

  •      Tarun Mangla
  • Ellen Zegura
  • Mostafa Ammar
  • Emir Halepovic
  • Kyung-Wook Hwang
  • Rittwik Jana
  • Marco Platania

Video streaming traffic is rapidly growing in mobile networks. Mobile Network Operators (MNOs) are expected to keep up with this growing demand, while maintaining a high video Quality of Experience (QoE). This makes it critical for MNOs to have a solid understanding of users' video QoE in order to help with network planning, provisioning, and traffic management. However, designing a system to measure video QoE has several challenges: i) the large scale of video traffic data and the diversity of video streaming services, ii) cross-layer constraints due to the complex cellular network architecture, and iii) extracting QoE metrics from network traffic. In this paper, we present VideoNOC, a prototype of a flexible and scalable platform to infer objective video QoE metrics (e.g., bitrate, rebuffering) for MNOs. We describe the design and architecture of VideoNOC, and outline the methodology to generate a novel data source for fine-grained video QoE monitoring. We then demonstrate some of the use cases of such a monitoring system. VideoNOC reveals video demand across the entire network, provides valuable insights on a number of design choices by content providers (e.g., OS-dependent performance, video player parameters like buffer size, range of encoding bitrates, etc.) and helps analyze the impact of network conditions on video QoE (e.g., mobility and high demand).

Dynamic input anomaly detection in interactive multimedia services

  •      Mohammed Shatnawi
  • Mohamed Hefeeda

Multimedia services like Skype, WhatsApp, and Google Hangouts have strict Service Level Agreements (SLAs). These services attempt to address the root causes of SLA violations through techniques such as detecting anomalies in the inputs of the services. The key problem with current anomaly detection and handling techniques is that they cannot adapt to service changes in real time. In current techniques, historical data from prior runs of the service are used to identify anomalies in service inputs, such as the number of concurrent users, and system states, such as CPU utilization. These techniques do not evaluate the current impact of anomalies on the service. Thus, they may raise alerts and take corrective measures even if the detected anomalies do not cause SLA violations. Alerts are expensive to handle from system and engineering-support perspectives, and should be raised only if necessary. We propose a dynamic approach for handling service input and system state anomalies in multimedia services in real time, by evaluating the impact of anomalies, independently and associatively, on the service outputs. Our proposed approach alerts and takes corrective measures, such as capacity allocations, if the detected anomalies result in SLA violations. We implement our approach in a large-scale operational multimedia service, and show that it increases anomaly detection accuracy by 31%, reduces anomaly alerting false positives by 71% and false negatives by 69%, and enhances media sharing quality by 14%.

From theory to practice: improving bitrate adaptation in the DASH reference player

  •      Kevin Spiteri
  • Ramesh Sitaraman
  • Daniel Sparacio

Modern video streaming uses adaptive bitrate (ABR) algorithms that run inside video players and continually adjust the quality (i.e., bitrate) of the video segments that are downloaded and rendered to the user. To maximize the quality of experience of the user, ABR algorithms must stream at a high bitrate with low rebuffering and low bitrate oscillations. Further, a good ABR algorithm is responsive to user and network events and can be used in demanding scenarios such as low-latency live streaming. Recent research papers provide an abundance of ABR algorithms, but fall short on many of the above real-world requirements.

We develop Sabre, an open-source, publicly available simulation tool that enables fast and accurate simulation of adaptive streaming environments. We used Sabre to design and evaluate BOLA-E and DYNAMIC, two novel ABR algorithms. We also developed a FAST SWITCHING algorithm that can replace segments that have already been downloaded with higher-bitrate (thus higher-quality) segments. The new algorithms provide higher QoE to the user in terms of higher bitrate, fewer rebuffers, and fewer bitrate oscillations. In addition, these algorithms react faster to user events such as startup and seek, and respond more quickly to network events such as improvements in throughput. Further, they perform very well for live streams that require low latency, a challenging scenario for ABR algorithms. Overall, our algorithms offer superior video QoE and responsiveness for real-life adaptive video streaming, in comparison to the state of the art. Importantly, all three algorithms presented in this paper are now part of the official DASH reference player dash.js and are being used by video providers in production environments. While our evaluation and implementation are focused on the DASH environment, our algorithms are equally applicable to other adaptive streaming formats such as Apple HLS.
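A hedged sketch of a FAST SWITCHING style decision (not the dash.js implementation): upgrade an already-buffered segment only when the higher-bitrate download can safely finish before that segment's playback deadline. All parameter names are illustrative:

```python
def should_replace(buffered_bitrate_kbps, candidate_bitrate_kbps,
                   segment_duration_s, time_to_playback_s,
                   throughput_kbps, safety=0.7):
    """Return True if upgrading the buffered segment is both an improvement and
    likely to complete in time under a conservative throughput estimate."""
    if candidate_bitrate_kbps <= buffered_bitrate_kbps:
        return False                      # not an upgrade
    download_time_s = (candidate_bitrate_kbps * segment_duration_s) / (
        safety * throughput_kbps)
    return download_time_s < time_to_playback_s

# A 4 s segment buffered at 750 kbps, playing in 12 s, link measured at 5 Mbps:
print(should_replace(750, 3000, 4, 12, 5000))   # True
print(should_replace(750, 3000, 4, 2, 5000))    # False: too close to deadline
```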

Classifying flows and buffer state for YouTube's HTTP adaptive streaming service in mobile networks

  •      Dimitrios Tsilimantos
  • Theodoros Karagkioules
  • Stefan Valentin

Accurate cross-layer information is very useful for optimizing mobile networks for specific applications. However, providing application-layer information to lower protocol layers has become very difficult due to the wide adoption of end-to-end encryption and the absence of cross-layer signaling standards. As an alternative, this paper presents a traffic profiling solution to passively estimate parameters of HTTP Adaptive Streaming (HAS) applications at the lower layers. By observing IP packet arrivals, our machine learning system identifies video flows and detects the state of an HAS client's playback buffer in real time. Our experiments with YouTube's mobile client show that Random Forests achieve very high accuracy even with strong variation in link quality. Since this high performance is achieved at the IP level with a small, generic feature set, our approach requires no Deep Packet Inspection (DPI), comes at low complexity, and does not interfere with end-to-end encryption. Traffic profiling is thus a powerful new tool for monitoring and managing even encrypted HAS traffic in mobile networks.
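To make the described pipeline concrete, here is a minimal sketch of classifying flow windows from packet-level statistics with a Random Forest (scikit-learn); the features, labels, and synthetic data are assumptions for illustration, not the paper's feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(packet_times, packet_sizes):
    """Features for one observation window of a flow (no payload inspection)."""
    gaps = np.diff(packet_times) if len(packet_times) > 1 else np.array([0.0])
    return [len(packet_sizes), np.sum(packet_sizes), np.mean(packet_sizes),
            np.std(packet_sizes), np.mean(gaps), np.std(gaps)]

rng = np.random.default_rng(1)
# Toy training data: label 1 = "buffer filling" (bursty), 0 = "steady state".
X, y = [], []
for label in (0, 1):
    for _ in range(200):
        n = rng.integers(20, 60)
        sizes = rng.normal(1400 if label else 600, 100, n)
        times = np.cumsum(rng.exponential(0.005 if label else 0.05, n))
        X.append(window_features(times, sizes))
        y.append(label)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([window_features(np.cumsum([0.004] * 40),
                                   [1350.0] * 40)]))  # likely [1]
```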

SESSION: Multimedia in 5G network architectures

Automated profiling of virtualized media processing functions using telemetry and machine learning

  •      Rufael Mekuria
  • Michael J. McGrath
  • Vincenzo Riccobene
  • Victor Bayon-Molino
  • Christos Tselios
  • John Thomson
  • Artem Dobrodub

Most media streaming services are composed of different virtualized processing functions such as encoding, packaging, encryption, content stitching, etc. Deployment of these functions in the cloud is attractive as it enables flexibility in deployment options and resource allocation for the different functions. Yet, most of the time, overprovisioning of cloud resources is necessary in order to meet demand variability. This can be costly, especially for large-scale deployments. Prior art proposes resource allocation based on analytical models that minimize the costs of cloud deployments under a quality of service (QoS) constraint. However, these models do not sufficiently capture the underlying complexity of services composed of multiple processing functions. Instead, we introduce a novel methodology based on full-stack telemetry and machine learning to profile virtualized or cloud-native media processing functions individually. The basis of the approach consists of investigating four categories of performance metrics: throughput, anomaly, latency and entropy (TALE) in offline (stress tests) and online setups using cloud telemetry. Machine learning is then used to profile the media processing function in the targeted cloud/NFV environment and to extract the most relevant cloud-level Key Performance Indicators (KPIs) that relate to the final perceived quality and known client-side performance indicators. The results enable more efficient monitoring, as only KPI-related metrics need to be collected, stored and analyzed, reducing the storage and communication footprints by over 85%. In addition, a detailed overview of the functions' behavior was obtained, enabling optimized initial configuration and deployment, and more fine-grained dynamic online resource allocation, reducing overprovisioning and avoiding function collapse. We further highlight the next steps towards cloud-native, carrier-grade virtualized processing functions relevant for future network architectures such as the emerging 5G architectures.

Multi-path multi-tier 360-degree video streaming in 5G networks

  •      Liyang Sun
  • Fanyi Duanmu
  • Yong Liu
  • Yao Wang
  • Yinghua Ye
  • Hang Shi
  • David Dai

360° video streaming is a key component of the emerging Virtual Reality (VR) and Augmented Reality (AR) applications. In 360° video streaming, a user may freely navigate through the captured 360° video scene by changing her desired Field-of-View. High-throughput and low-delay data transfers enabled by 5G wireless networks can potentially facilitate an untethered 360° video streaming experience. Meanwhile, the high volatility of 5G wireless links presents unprecedented challenges for smooth 360° video streaming. In this paper, novel multi-path multi-tier 360° video streaming solutions are developed to simultaneously address the dynamics in both network bandwidth and user viewing direction. We systematically investigate various design trade-offs in streaming quality and robustness. Through simulations driven by real 5G network bandwidth traces and user viewing direction traces, we demonstrate that the proposed 360° video streaming solutions can achieve a high level of Quality-of-Experience (QoE) in the challenging 5G wireless network environment.

Mobile data offloading system for video streaming services over SDN-enabled wireless networks

  •      Donghyeok Ho
  • Gi Seok Park
  • Hwangjun Song

This work presents a mobile data offloading system for video streaming services over software-defined networking (SDN)-enabled wireless networks. The goal of the proposed system is to alleviate cellular network congestion by offloading part of the video traffic to a WiFi network while improving the video quality of all users by efficiently and fairly sharing the limited long term evolution (LTE) resources. In the proposed system, the SDN architecture is applied to the wireless network environment to quickly react to time-varying network conditions and finely control the amount of traffic transmitted through the LTE and WiFi networks. In the SDN-enabled wireless environment, we frame the mobile data offloading problem for video streaming services as an asymmetric Nash bargaining game to address conflict among competing mobile users. Furthermore, we propose a resource allocation algorithm that pursues an effective trade-off between global system utility and quality-of-service fairness among users. The system is fully implemented using the ONOS SDN controller and Raspberry Pi 3-based mobile devices, and its performance is evaluated over real wireless networks.

SESSION: Integrative computer vision and multimedia systems

Scalable distributed visual computing for line-rate video streams

  •      Chen Song
  • Jiacheng Chen
  • Ryan Shea
  • Andy Sun
  • Arrvindh Shriraman
  • Jiangchuan Liu

The past decade has witnessed significant breakthroughs in the world of computer vision. Recent deep learning-based computer vision algorithms exhibit strong performance on recognition, detection, and segmentation. While the development of vision algorithms elicits promising applications, it also presents an immense computational challenge to the underlying hardware due to its complex nature, especially when attempting to process the data at line rate.

To this end, we develop a highly scalable computer vision processing framework, which leverages advanced technologies such as Spark Streaming and OpenCV to achieve line-rate video data processing. To ensure the greatest flexibility, our framework is agnostic to the computer vision model and can utilize environments with heterogeneous processing devices. To evaluate this framework, we deploy it in a production cloud computing environment and perform a thorough analysis of the system's performance. We utilize existing real-world live video streams from Simon Fraser University to measure the number of cars entering our university campus. Further, the data collected from our experiments is being used for real-time predictions of traffic conditions on campus.

ISIFT: extracting incremental results from SIFT

  •      Ben Hamlin
  • Ryan Feng
  • Wu-chi Feng

In computer vision, scale-invariant feature transform (SIFT) remains one of the most commonly used algorithms for feature extraction, but its high computational cost makes it hard to deploy in real-time applications. In this paper, we introduce a novel technique to restructure the inter-octave and intra-octave dependencies of SIFT's keypoint detection and description processes, allowing it to be stopped early and produce approximate results in proportion to the time for which it was allowed to run. If our algorithm is run to completion (about 0.7% longer than traditional SIFT), its results and SIFT's converge. Unlike previous approaches to real-time SIFT, we require no special hardware and make no compromises in keypoint quality, making our technique ideal for real-time and near-real-time applications on resource-constrained systems. We use standard data sets and metrics to analyze the performance of our algorithm and the quality of the generated keypoints.

Latency and throughput characterization of convolutional neural networks for mobile computer vision

  •      Jussi Hanhirova
  • Teemu Kämäräinen
  • Sipi Seppälä
  • Matti Siekkinen
  • Vesa Hirvisalo
  • Antti Ylä-Jääski

We study performance characteristics of convolutional neural networks (CNN) for mobile computer vision systems. CNNs have proven to be a powerful and efficient approach to implement such systems. However, the system performance depends largely on the utilization of hardware accelerators, which are able to speed up the execution of the underlying mathematical operations tremendously through massive parallelism. Our contribution is the performance characterization of multiple CNN-based models for object recognition and detection with several different hardware platforms and software frameworks, using both local (on-device) and remote (network-side server) computation. The measurements are conducted using real workloads and real processing platforms. On the platform side, we concentrate especially on TensorFlow and TensorRT. Our measurements include embedded processors found on mobile devices and high-performance processors that can be used on the network side of mobile systems. We show that there exist significant latency-throughput trade-offs, but the behavior is very complex. We demonstrate and discuss several factors that affect the performance and yield this complex behavior.

SESSION: Immersive multimedia experiences

Improving response time interval in networked event-based mulsemedia systems

  •      Estêvão Bissoli Saleme
  • Celso A. S. Santos
  • Gheorghita Ghinea

Human perception is inherently multisensory, involving sight, hearing, smell, touch, and taste. Mulsemedia systems combine traditional media (text, image, video, and audio) with non-traditional ones that stimulate other senses beyond sight and hearing. Whilst work has been done on some user-centred aspects that the distribution of mulsemedia data raises, such as synchronisation and jitter, this paper tackles complementary issues that temporality constraints pose on the distribution of mulsemedia effects. It aims at improving the response time interval in networked event-based mulsemedia systems based upon prior findings in this context. Thus, we reshaped the communication strategy of an open distributed mulsemedia platform called PlaySEM to work more efficiently with other event-based applications, such as games, VR/AR software, and interactive applications, wishing to stimulate other senses to increase the immersion of users. Moreover, we added lightweight communication protocols to its interface to analyse whether they reduce network overhead. To carry out the experiment, we developed mock applications for different protocols to simulate an interactive application working with PlaySEM, measuring the delay between them. The results showed that, by pre-processing sensory effects metadata before real-time communication and selecting the appropriate protocol, the response time interval in networked event-based mulsemedia systems can decrease remarkably.

Modeling sensory effects as first-class entities in multimedia applications

  •      Marina Josué
  • Raphael Abreu
  • Fábio Barreto
  • Douglas Mattos
  • Glauco Amorim
  • Joel dos Santos
  • Débora Muchaluat-Saade

Multimedia applications are usually composed of audiovisual content. Traditional multimedia conceptual models, and consequently declarative multimedia authoring languages, do not support the definition of multiple sensory effects. Multiple sensorial media (mulsemedia) applications consider the use of sensory effects that can stimulate touch, smell and taste, in addition to hearing and sight. Therefore, mulsemedia applications have usually been developed using general-purpose programming languages. In order to fill this gap, this paper proposes an approach for modeling sensory effects as first-class entities, enabling multimedia applications to synchronize sensorial media with interactive audiovisual content in a high-level specification. Thus, complete descriptions of mulsemedia applications become possible with multimedia models and languages. In order to validate our ideas, an interactive mulsemedia application example is presented and specified with NCL (Nested Context Language) and Lua. Lua components are used for translating sensory-effect high-level attributes to MPEG-V SEM (Sensory Effect Metadata) files. A sensory effect simulator was developed to receive SEM files and simulate mulsemedia application rendering.

Dynamic adaptive streaming for multi-viewpoint omnidirectional videos

  •      Xavier Corbillon
  • Francesca De Simone
  • Gwendal Simon
  • Pascal Frossard

Full immersion in a Virtual Reality (VR) scene requires six Degrees of Freedom (6DoF) applications where the user is allowed to perform translational and rotational movements within the virtual space. The implementation of 6DoF applications is, however, still an open question. In this paper we study a multi-viewpoint (MVP) 360-degree video streaming system, where a scene is simultaneously captured by multiple omnidirectional video cameras. The user can only switch positions to predefined viewpoints (VPs). We focus on the new challenges that are introduced by adaptive MVP 360-degree video streaming. We introduce several options for video encoding with existing technologies, such as High Efficiency Video Coding (HEVC), and for the implementation of VP switching. We model three video-segment download strategies for an adaptive streaming client as Mixed Integer Linear Programming (MILP) problems: an omniscient download scheduler; one where the client proactively downloads all VPs to guarantee fast VP switching; and one where the client reacts to the user's navigation pattern. We recorded an MVP 360-degree video with three VPs, implemented a mobile MVP 360-degree video player, and recorded the viewing patterns of multiple users navigating the content. We solved the adaptive streaming optimization problems on this video considering the collected navigation traces. The results emphasize the gains obtained by using tiles in terms of objective quality of the delivered content. They also emphasize the importance of further study on VP switching prediction to reduce bandwidth consumption and to measure the impact of VP switching delay on the subjective Quality of Experience (QoE).

Skeleton-based continuous extrinsic calibration of multiple RGB-D Kinect cameras

  •      Kevin Desai
  • Balakrishnan Prabhakaran
  • Suraj Raghuraman

Applications involving 3D scanning and reconstruction and 3D tele-immersion provide an immersive experience by capturing a scene using multiple RGB-D cameras, such as the Kinect. Prior knowledge of the intrinsic calibration of each camera, and the extrinsic calibration between cameras, is essential to reconstruct the captured data. The intrinsic calibration of a given camera rarely changes, so it only needs to be estimated once. However, the extrinsic calibration between cameras can change, even with a small nudge to a camera. Calibration accuracy depends on sensor noise, the features used, the sampling method, etc., so iterative calibration is needed to achieve good results.

In this paper, we introduce a skeleton-based approach to calibrate multiple RGB-D Kinect cameras in a closed setup, automatically and without any intervention, within a few seconds. The method uses only the person present in the scene to calibrate, removing the need for manually inserting, detecting and extracting other objects like planes, checkerboards, or spheres. The 3D joints of the extracted skeletons are used as correspondence points between cameras, after undergoing accuracy and orientation checks. Temporal, spatial, and motion constraints are applied during the point selection strategy. Our calibration error checking is inexpensive in terms of computational cost and time and hence runs continuously in the background. Automatic re-calibration of the cameras can be performed when the calibration error goes beyond a threshold due to any possible camera movement. Evaluations show that the method provides fast, accurate and continuous calibration, as long as a human is moving around in the captured scene.
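The geometric core implied by using 3D joints as correspondence points is estimating a rigid transform between cameras. A minimal sketch using the standard Kabsch/Procrustes solution, omitting the paper's accuracy checks, point selection, and re-calibration logic:

```python
import numpy as np

def rigid_transform(src, dst):
    """Find R, t minimising ||R @ src_i + t - dst_i|| over corresponding joints."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: joints seen by camera A, and the same joints seen by camera B.
rng = np.random.default_rng(0)
joints_a = rng.random((15, 3))
true_R = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
joints_b = joints_a @ true_R.T + np.array([0.5, -0.2, 1.0])
R, t = rigid_transform(joints_a, joints_b)
print(np.allclose(joints_a @ R.T + t, joints_b))  # True
```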

The prefetch aggressiveness tradeoff in 360° video streaming

  •      Mathias Almquist
  • Viktor Almquist
  • Vengatanathan Krishnamoorthi
  • Niklas Carlsson
  • Derek Eager

With 360° video, only a limited fraction of the full view is displayed at each point in time. This has prompted the design of streaming delivery techniques that allow alternative playback qualities to be delivered for each candidate viewing direction. However, while prefetching based on the user's expected viewing direction is best done close to playback deadlines, large buffers are needed to protect against shortfalls in future available bandwidth. This results in conflicting goals and an important prefetch aggressiveness tradeoff problem regarding how far ahead in time from the current play-point prefetching should be done. This paper presents the first characterization of this tradeoff. The main contributions include an empirical characterization of head movement behavior based on data from viewing sessions of four different categories of 360° video, an optimization-based comparison of the prefetch aggressiveness tradeoffs seen for these video categories, and a data-driven discussion of further optimizations, which include a novel system design that allows both tradeoff objectives to be targeted simultaneously. By qualitatively and quantitatively analyzing the above tradeoffs, we provide insights into how to best design tomorrow's delivery systems for 360° videos, allowing content providers to reduce bandwidth costs and improve users' playback experiences.

Predicting the performance of virtual reality video streaming in mobile networks

  •      Roberto Irajá Tavares da Costa Filho
  • Marcelo Caggiani Luizelli
  • Maria Torres Vega
  • Jeroen van der Hooft
  • Stefano Petrangeli
  • Tim Wauters
  • Filip De Turck
  • Luciano Paschoal Gaspary

The demand for Virtual Reality (VR) video streaming to mobile devices is booming, as VR becomes accessible to the general public. However, the variability of mobile network conditions affects the perception of this type of high-bandwidth service in unexpected ways. In this situation, there is a need for novel performance assessment models suited to the new VR applications. In this paper, we present PERCEIVE, a two-stage method for predicting the perceived quality of adaptive VR videos when streamed through mobile networks. By means of machine learning techniques, our approach first predicts adaptive VR video playout performance, using network Quality of Service (QoS) indicators as predictors. In a second stage, it employs the predicted VR video playout performance metrics to model and estimate end-user perceived quality. The evaluation of PERCEIVE has been performed in a real-world environment, in which VR videos are streamed while subjected to LTE/4G network conditions. The accuracy of PERCEIVE has been assessed by means of the residual error between predicted and measured values. Our approach predicts the different performance metrics of the VR playout with an average prediction error lower than 3.7% and estimates the perceived quality with a prediction error lower than 4% for over 90% of all the tested cases. Moreover, it allows us to pinpoint the QoS conditions that affect adaptive VR streaming services the most.
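The two-stage structure (QoS indicators to playout metrics, then playout metrics to perceived quality) can be sketched as two chained regressors; the data and models below are synthetic stand-ins, not PERCEIVE's actual models:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
qos = rng.random((n, 3))                       # e.g., throughput, RTT, loss (scaled)
# Synthetic "ground truth" playout metrics: average bitrate and stall ratio.
playout = np.column_stack([qos[:, 0] - 0.3 * qos[:, 1],
                           0.5 * qos[:, 2] + 0.2 * qos[:, 1]])
quality = 4.5 * playout[:, 0] - 3.0 * playout[:, 1] + rng.normal(0, 0.05, n)

stage1 = RandomForestRegressor(n_estimators=50, random_state=0).fit(qos, playout)
stage2 = RandomForestRegressor(n_estimators=50, random_state=0).fit(playout, quality)

def predict_quality(qos_sample):
    """Chain the two stages: QoS -> predicted playout metrics -> quality."""
    predicted_playout = stage1.predict(np.atleast_2d(qos_sample))
    return stage2.predict(predicted_playout)[0]

print(predict_quality([0.8, 0.1, 0.05]))       # high throughput, low RTT/loss
```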

SESSION: Human-centric internet and multimedia systems

Enhancing the experience of multiplayer shooter games via advanced lag compensation

  •      Steven W. K. Lee
  • Rocky K. C. Chang

In multiplayer shooter games, lag compensation is used to mitigate the effects of network latency, or lag. Traditional lag compensation (TLC), however, introduces an inconsistency known as "shot behind covers" (SBC), especially for less lagged players. A few recent games ameliorate this problem by compensating only players with lag below a certain limit. This forces sufficiently lagged players to aim ahead of their targets, which is difficult and unrealistic. In this paper, we present a novel advanced lag compensation (ALC) algorithm. Based on TLC, this new algorithm retains the benefits of lag compensation without compromising less lagged players or compensating only certain players. To evaluate ALC, we invited players to play an FPS game we built from scratch and answer questions after each match. Compared with TLC, ALC reduces the number of SBC events by 94.1%, with a significant drop both in the number of SBC events reported by players during matches (p < .05) and in the perceived SBC frequency collected at the end of each match (p < .05). ALC and TLC also share a similar hit registration accuracy (p = .158 and p = .18) and responsiveness (p = .317).

Valid.IoT: a framework for sensor data quality analysis and interpolation

  •      Daniel Kuemper
  • Thorben Iggena
  • Ralf Toenjes
  • Elke Pulvermueller

Heterogeneous sensor device networks with diverse maintainers, and information collected via social media as well as crowdsourcing, tend to be elements of uncertainty in IoT and Smart City networks. Often, there is no ground truth available that can be used to check the plausibility and concordance of the new information. This paper proposes the Valid.IoT Framework as an attachable IoT framework component that can be linked to a variety of platforms to generate QoI vectors and interpolated sensory data with plausibility and quality estimations. The framework utilises extended infrastructure knowledge and infrastructure-aware interpolation algorithms to validate crowdsourced and device-generated sensor information through sensor fusion.

Cardea: context-aware visual privacy protection for photo taking and sharing

  •      Jiayu Shu
  • Rui Zheng
  • Pan Hui

The growing popularity of mobile and wearable devices with built-in cameras, together with social media sites, is now threatening people's visual privacy. Motivated by recent user studies showing that people's visual privacy concerns are closely related to context, we propose Cardea, a context-aware visual privacy protection mechanism that protects people's visual privacy in photos according to their privacy preferences. We define four context elements in a photo: location, scene, others' presence, and hand gestures. Users can specify their context-dependent privacy preferences based on these four elements. Cardea offers fine-grained visual privacy protection service to those who request protection using their identifiable information. We present how Cardea can be integrated into: a) privacy-protecting camera apps, where captured photos are processed before being saved locally; and b) online social media and networking sites, where uploaded photos are first examined to protect individuals' visual privacy before they become visible to others. Our evaluation results on an implemented prototype demonstrate that Cardea is effective with 86% overall accuracy and is welcomed by users, showing a promising future for context-aware visual privacy protection for photo taking and sharing.

SESSION: IoT and smart cities

PEAT, how much am i burning?

  •      Akshay Uttama Nambi S. N.
  • Venkatesha Prasad R
  • Antonio Reyes Lua
  • Luis Gonzalez

Depletion of fossil fuels and the ever-increasing need for energy in residential and commercial buildings have triggered in-depth research on many energy saving and energy monitoring mechanisms. Currently, users are only aware of their overall energy consumption and its cost in a shared space. Due to the lack of information on individual energy consumption, users are not able to fine-tune their energy usage. Further, evenly splitting the energy cost in shared spaces does not help in creating awareness. With the advent of the Internet of Things (IoT) and wearable devices, the total energy consumption of a household can be apportioned to individual occupants to create awareness and consequently promote sustainable energy usage. However, providing personalized energy consumption information in real time is a challenging task due to the need to collect fine-grained information at various levels. In particular, identifying the user(s) utilizing an appliance in a shared space is a hard problem, because there are no comprehensive means of collecting accurate personalized energy consumption information. In this paper, we present the Personalized Energy Apportioning Toolkit (PEAT) to accurately apportion total energy consumption to individual occupants in shared spaces. Apart from performing energy disaggregation, PEAT combines data from IoT devices such as the occupants' smartphones and smartwatches to obtain fine-grained information, such as their location and activities. PEAT estimates the energy footprint of individuals by modeling the association between appliances and occupants in the household. We propose several accuracy metrics to study the performance of our toolkit. PEAT was exhaustively evaluated and validated in two multi-occupant households. It achieves 90% energy apportioning accuracy using only the location information of the occupants. Furthermore, the energy apportioning accuracy is around 95% when both location and activity information are available.
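A much-simplified sketch of location-based apportioning in the spirit of the abstract (PEAT's association model is richer and also uses activities): split each appliance's consumption among the occupants co-located with it while it ran. The event and location formats are assumptions:

```python
from collections import defaultdict

def apportion(appliance_events, occupant_location):
    """appliance_events: list of (room, kwh, time_slot).
    occupant_location: dict (occupant, time_slot) -> room.
    Returns kWh attributed to each occupant (unattributed energy -> 'shared')."""
    footprint = defaultdict(float)
    occupants = sorted({o for (o, _) in occupant_location})
    for room, kwh, slot in appliance_events:
        present = [o for o in occupants if occupant_location.get((o, slot)) == room]
        if present:
            for o in present:
                footprint[o] += kwh / len(present)
        else:
            footprint["shared"] += kwh
    return dict(footprint)

events = [("kitchen", 0.5, 1), ("living", 0.25, 1), ("kitchen", 0.25, 2)]
locations = {("alice", 1): "kitchen", ("bob", 1): "kitchen", ("bob", 2): "living"}
print(apportion(events, locations))
# {'alice': 0.25, 'bob': 0.25, 'shared': 0.5}
```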

Sensorclone: a framework for harnessing smart devices with virtual sensors

  •      Huber Flores
  • Pan Hui
  • Sasu Tarkoma
  • Yong Li
  • Theodoros Anagnostopoulos
  • Vassilis Kostakos
  • Chu Luo
  • Xiang Su

IoT services hosted by low-power devices rely on the cloud infrastructure to propagate their ubiquitous presence over the Internet. A critical challenge for IoT systems is to ensure continuous provisioning of IoT services by overcoming network breakdowns, hardware failures, and energy constraints. To overcome these issues, we propose a cloud-based framework named SensorClone, which relies on virtual devices to improve IoT resilience. A virtual device is the digital counterpart of a physical device that has learned to emulate its operations from sample data collected from the physical one. SensorClone exploits the collected data of low-power devices to create virtual devices in the cloud. SensorClone can then opportunistically migrate virtual devices from the cloud into other, potentially underutilized, devices with higher capabilities that are closer to the edge of the network, e.g., smart devices. Through a real deployment of SensorClone in the wild, we identify that virtual devices can be used for two purposes: 1) to reduce the energy consumption of physical devices by duty cycling their service provisioning between the physical device and the virtual representation hosted in the cloud, and 2) to scale IoT services at the edge of the network by harnessing temporal periods of underutilization of smart devices. To evaluate our framework, we present a use case of a virtual sensor created from an IoT temperature service. From our results, we verify that it is possible to achieve availability of up to 90% and substantial power efficiency under acceptable levels of quality of service. Our work contributes towards improving IoT scalability and resilience by using virtual devices.

Rethinking ranging of unmodified BLE peripherals in smart city infrastructure

  •      Bashima Islam
  • Mostafa Uddin
  • Sarit Mukherjee
  • Shahriar Nirjon

Mobility tracking of IoT devices in smart city infrastructures such as smart buildings, hospitals, shopping centers, warehouses, smart streets, and outdoor spaces has many applications. Since Bluetooth Low Energy (BLE) is available in almost every IoT device on the market nowadays, a key to localizing and tracking IoT devices is to develop an accurate ranging technique for BLE-enabled IoT devices. This is, however, a challenging feat as billions of these devices are already in use, and for pragmatic reasons, we cannot propose to modify the IoT device (a BLE peripheral) itself. Furthermore, unlike WiFi ranging, where the channel state information (CSI) is readily available and the bandwidth can be increased by stitching the 2.4GHz and 5GHz bands together to achieve high-precision ranging, an unmodified BLE peripheral provides us with only the RSSI information over a very limited bandwidth. Accurately ranging a BLE device is therefore far more challenging than for other wireless standards. In this paper, we exploit characteristics of the BLE protocol (e.g., frequency hopping and empty control packet transmissions) and propose a technique to directly estimate the range of a BLE peripheral from a BLE access point by multipath profiling. We discuss the theoretical foundation and conduct experiments to show that the technique achieves a 2.44m absolute range estimation error on average.
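For background only: the paper's multipath-profiling technique is more sophisticated than this, but the ingredients it builds on are RSSI samples spread across BLE's hopped channels and a path-loss range model. A toy sketch with assumed constants (tx_power_dbm, path_loss_exponent are illustrative, not from the paper):

```python
import math
from collections import defaultdict

def range_from_rssi(samples, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """samples: list of (channel, rssi_dbm). Median-aggregate per channel to
    damp per-channel multipath fading, then invert a log-distance model."""
    per_channel = defaultdict(list)
    for channel, rssi in samples:
        per_channel[channel].append(rssi)
    medians = []
    for values in per_channel.values():
        values.sort()
        medians.append(values[len(values) // 2])
    rssi = sum(medians) / len(medians)
    # log-distance model: RSSI = tx_power - 10 * n * log10(d), solved for d
    return 10 ** ((tx_power_dbm - rssi) / (10 * path_loss_exponent))

samples = [(ch, -70 + (ch % 3) - 1) for ch in range(0, 37, 5)]  # toy readings
print(f"estimated range: {range_from_rssi(samples):.2f} m")
```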

SESSION: Open datasets & software

SGF: a crowdsourced large-scale event dataset

  •      Jens Heuschkel
  • Alexander Frömmgen

This paper presents a crowdsourced dataset of a large-scale event with more than 1000 measuring participants. The detailed dataset consists of various location data and network measurements for all national carriers, collected during a four-day event. The concentrated samples for this short time period enable detailed analysis, e.g., by correlating movement patterns and experienced network conditions.

LapGyn4: a dataset for 4 automatic content analysis problems in the domain of laparoscopic gynecology

  •      Andreas Leibetseder
  • Stefan Petscharnig
  • Manfred Jürgen Primus
  • Sabrina Kietz
  • Bernd Münzer
  • Klaus Schoeffmann
  • Jörg Keckstein

Modern imaging technology enables medical practitioners to perform minimally invasive surgery (MIS), i.e. a variety of medical interventions inflicting minimal trauma upon patients, hence greatly improving their recovery. Not only patients but also surgeons can benefit from this technology, as recorded media can be utilized for speeding up tedious and time-consuming tasks such as treatment planning or case documentation. In order to improve the predominantly manually conducted process of analyzing such media, with this work we publish four datasets extracted from gynecologic laparoscopic interventions, with the intent of encouraging research in the field of post-surgical automatic media analysis. These datasets are designed with the following use cases in mind: medical image retrieval based on a query image, detection of instrument counts, surgical actions and anatomical structures, as well as distinguishing on which anatomical structure a certain action is performed. Furthermore, we provide suggestions for evaluation metrics and first baseline experiments.

OpenSea: open search-based classification tool

  •      Konstantin Pogorelov
  • Zeno Albisser
  • Olga Ostroukhova
  • Mathias Lux
  • Dag Johansen
  • Pål Halvorsen
  • Michael Riegler

This paper presents an open-source classification tool for image and video frame classification. The classification takes a search-based approach and relies on global and local image features. It has been shown to work with images as well as videos, and is able to perform the classification of video frames in real time so that the output can be used while the video is recorded, played, or streamed. OpenSea has been shown to perform comparably to state-of-the-art methods such as deep learning, while performing much faster in terms of processing speed, and can therefore be seen as an easy-to-get and hard-to-beat baseline. We present a detailed description of the software, its installation and use. As a use case, we demonstrate the classification of polyps in colonoscopy videos based on a publicly available dataset. We conduct leave-one-out cross-validation to show the potential of the software in terms of classification time and accuracy.

Mimir: an automatic reporting and reasoning system for deep learning based analysis in the medical domain

  •      Steven Alexander Hicks
  • Sigrun Eskeland
  • Mathias Lux
  • Thomas de Lange
  • Kristin Ranheim Randel
  • Mattis Jeppsson
  • Konstantin Pogorelov
  • Pål Halvorsen
  • Michael Riegler

Automatic detection of diseases is a growing field of interest, and machine learning in the form of deep neural networks is frequently explored as a potential tool for medical video analysis. To both improve the understanding of the "black box" and assist in the administrative duty of writing examination reports, we release automated multimedia reporting software that dissects the neural network to learn the intermediate analysis steps, i.e., we add a new level of understanding and explainability by looking into the deep learning algorithm's decision processes. The presented open-source software can be used for easy retrieval and reuse of data for automatic report generation, comparisons, teaching and research. As an example use case, we use live colonoscopy, the gold-standard examination of the large bowel, commonly performed for clinical and screening purposes. The added information has potentially large value, and reusing the data for automatic reporting may save doctors large amounts of time.

Multi-profile ultra high definition (UHD) AVC and HEVC 4K DASH datasets

  •      Jason J. Quinlan
  • Cormac J. Sreenan

In this paper we present a Multi-Profile Ultra High Definition (UHD) DASH dataset composed of both AVC (H.264) and HEVC (H.265) video content, generated from three well-known open-source 4K video clips. The representation rates and resolutions of our dataset range from 40Mbps in 4K down to 235kbps in 320x240, and are comparable to rates utilised by on-demand services such as Netflix, YouTube and Amazon Prime. We provide our dataset for both real-time testbed evaluation and trace-based simulation. The real-time testbed content provides a means of evaluating DASH adaptation techniques on physical hardware, while our trace-based content offers simulation over frameworks such as ns-2 and ns-3. We also provide the original pre-DASH MP4 files and our associated DASH generation scripts, so as to provide researchers with a mechanism to create their own DASH profile content locally. This improves the reproducibility of results and removes re-buffering issues caused by delay/jitter/losses in the Internet.

The primary goal of our dataset is to provide the wide range of video content required for validating DASH Quality of Experience (QoE) delivery over networks, ranging from constrained cellular and satellite systems to future high-speed architectures such as the proposed 5G mmWave technology.

HINDSIGHT: an R-based framework towards long short term memory (LSTM) optimization

  •      Konstantinos Kousias
  • Michael Riegler
  • Özgü Alay
  • Antonios Argyriou

Hyperparameter optimization is an important but often ignored part of successfully training Neural Networks (NNs), since it is time-consuming and rather complex. In this paper, we present HINDSIGHT, an open-source framework for designing and implementing NNs that supports hyperparameter optimization. HINDSIGHT is built entirely in R, and the current version focuses on Long Short Term Memory (LSTM) networks, a special kind of Recurrent Neural Network (RNN). HINDSIGHT is designed in a way that it can easily be expanded to other types of Deep Learning (DL) algorithms such as Convolutional Neural Networks (CNNs) or feed-forward Deep Neural Networks (DNNs). The main goal of HINDSIGHT is to provide a simple and quick interface to get started with LSTM networks and hyperparameter optimization.
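HINDSIGHT itself is an R framework; the Python sketch below only illustrates the kind of hyperparameter search such a tool automates, with a stand-in evaluation function in place of actually building and training an LSTM (the search space and scoring are assumptions):

```python
import itertools

search_space = {
    "units": [32, 64, 128],          # LSTM hidden size
    "dropout": [0.0, 0.2],
    "learning_rate": [1e-2, 1e-3],
    "lookback": [10, 30],            # input sequence length
}

def evaluate(config):
    """Stand-in for training an LSTM and returning a validation loss; a real
    framework would build, fit, and score the network for `config` here."""
    return (abs(config["units"] - 64) / 64 + config["dropout"]
            + config["learning_rate"] * 10 + config["lookback"] / 100)

def grid_search(space, score_fn):
    keys = list(space)
    best_cfg, best_score = None, float("inf")
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = score_fn(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best, score = grid_search(search_space, evaluate)
print(best, round(score, 3))
```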

4G/LTE channel quality reference signal trace data set

  •      Britta Meixner
  • Jan Willem Kleinrouweler
  • Pablo Cesar

Mobile networks, especially LTE networks, are used more and more for high-bandwidth services like multimedia or video streams. The quality of the data connection plays a major role in the perceived quality of a service. Videos may be presented in low quality or experience many stalling events when the connection is too slow to buffer the next frames for playback. So far, no publicly available dataset exists that has a large number of LTE network traces and can be used for deeper analysis. In this dataset, we provide 546 traces of 5 minutes each with a sample rate of 100 ms. Of these, 377 traces are pure LTE data. We furthermore provide an Android app to gather further traces as well as R scripts to clean, sort, and analyze the data.

Toulouse campus surveillance dataset: scenarios, soundtracks, synchronized videos with overlapping and disjoint views

  •      Thierry Malon
  • Geoffrey Roman-Jimenez
  • Patrice Guyot
  • Sylvie Chambon
  • Vincent Charvillat
  • Alain Crouzil
  • André Péninou
  • Julien Pinquier
  • Florence Sèdes
  • Christine Sénac

In surveillance applications, humans and vehicles are the most important common elements studied. Consequently, detecting and matching a person or a car that appears in several videos is a key problem. Many algorithms have been introduced, and a major related problem nowadays is to precisely evaluate and compare these algorithms with reference to a common ground truth. In this paper, our goal is to introduce a new dataset for evaluating multi-view based methods. This dataset aims at paving the way for multidisciplinary approaches and applications such as 4D-scene reconstruction, object identification/tracking, audio event detection and multi-source meta-data modeling and querying. Consequently, we provide two sets of 25 synchronized videos with audio tracks, all depicting the same scene from multiple viewpoints, with each set of videos following a detailed scenario consisting of comings and goings of people and cars. Every video was annotated by regularly drawing bounding boxes on every moving object with a flag indicating whether the object is fully visible or occluded, specifying its category (human or vehicle), providing visual details (for example clothes types or colors), and the timestamps of its appearances and disappearances. Audio events are also annotated with a category and timestamps.

A Canadian French emotional speech dataset

  •      Philippe Gournay
  • Olivier Lahaie
  • Roch Lefebvre

Until recently, there was no emotional speech dataset available in Canadian French. This was a limiting factor for research activities not only in Canada, but also elsewhere. This paper introduces the newly released Canadian French Emotional (CaFE) speech dataset and gives details about its design and content. This dataset contains six different sentences, pronounced by six male and six female actors, in six basic emotions plus one neutral emotion. The six basic emotions are acted at two different intensities. The audio is digitally recorded at high resolution (192 kHz sampling rate, 24 bits per sample). This new dataset is freely available under a Creative Commons license (CC BY-NC-SA 4.0).

AVtrack360: an open dataset and software recording people's head rotations watching 360° videos on an HMD

  •      Stephan Fremerey
  • Ashutosh Singla
  • Kay Meseberg
  • Alexander Raake

In this paper, we present a viewing test in which 48 subjects watched 20 different entertaining omnidirectional videos on an HTC Vive Head Mounted Display (HMD) in a task-free scenario. While the subjects were watching the contents, we recorded their head movements. The resulting dataset is publicly available, together with links to and timestamps of the source contents used, as well as the scripts for evaluating the head rotation data. Subjects were also asked to fill in the Simulator Sickness Questionnaire (SSQ) after every viewing session. The paper first presents the SSQ results and then presents and discusses several methods for evaluating head rotation data, including the general angular ranges of the subjects' exploration behavior and an analysis of the areas where most of the time was spent. The collected information can also be presented as head-saliency maps. For videos, head-saliency data can be used for training saliency models, as information for evaluating decisions during content creation, or as part of streaming solutions for region-of-interest-specific coding, as with the latest tile-based streaming solutions discussed in standardization bodies such as MPEG.
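
A minimal sketch of one possible way to turn recorded head rotations into a coarse dwell-time (head-saliency) map is given below; the binning, angle conventions and random toy data are assumptions, not the evaluation scripts released with the dataset.

```python
import numpy as np

def head_saliency_histogram(yaw_deg, pitch_deg, bins=(36, 18)):
    """Accumulate head-rotation samples into a coarse yaw/pitch histogram.

    Assumes yaw in [-180, 180) and pitch in [-90, 90); the returned map
    is normalised so that bins sum to one (share of dwell time).
    """
    hist, _, _ = np.histogram2d(
        yaw_deg, pitch_deg, bins=bins,
        range=[[-180, 180], [-90, 90]],
    )
    return hist / hist.sum()

# Toy usage: random samples standing in for one recorded session,
# concentrated near the front of the sphere.
rng = np.random.default_rng(0)
yaw = rng.uniform(-60, 60, size=3000)
pitch = rng.uniform(-20, 20, size=3000)
saliency = head_saliency_histogram(yaw, pitch)
print("most-viewed bin share:", saliency.max())
```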

dashc: a highly scalable client emulator for DASH video

  •      Aleksandr Reviakin
  • Ahmed H. Zahran
  • Cormac J. Sreenan

In this paper we introduce a client emulator for experimenting with DASH video. dashc is a standalone, compact, easy-to-build and easy-to-use command-line software tool. The design and implementation of dashc were motivated by the pressing need to conduct network experiments with large numbers of video clients. The highly scalable dashc has low CPU and memory usage. dashc collects the necessary statistics about video delivery performance in a convenient format, facilitating thorough post hoc analysis. The code of dashc is modular, and new video adaptation algorithms can easily be added. We compare dashc to a state-of-the-art client and demonstrate its efficacy for large-scale experiments using the Mininet virtual network.

PopSift: a faithful SIFT implementation for real-time applications

  •      Carsten Griwodz
  • Lilian Calvet
  • Pål Halvorsen

The keypoint detector and descriptor Scale-Invariant Feature Transform (SIFT) [8] is famous for its ability to extract and describe keypoints in 2D images of natural scenes. It is used in applications ranging from object recognition to 3D reconstruction. However, SIFT is considered compute-heavy. This has led to the development of many keypoint extraction and description methods that sacrifice the wide applicability of SIFT for higher speed. We present our CUDA implementation, named PopSift, which does not sacrifice any detail of the SIFT algorithm, achieves keypoint extraction and description that is as accurate as the best existing implementations, and runs at least 100x faster on a high-end consumer GPU than existing CPU implementations on a desktop CPU. Without any algorithmic trade-offs or short-cuts that sacrifice quality for speed, we extract at >25 fps from 1080p images upscaled to 3840x2160 pixels on a high-end consumer GPU.

Cataract-101: video dataset of 101 cataract surgeries

  •      Klaus Schoeffmann
  • Mario Taschwer
  • Stephanie Sarny
  • Bernd Münzer
  • Manfred Jürgen Primus
  • Doris Putzgruber

Cataract surgery is one of the most frequently performed microscopic surgeries in the field of ophthalmology. The goal of this kind of surgery is to replace the human eye lens with an artificial one, an intervention that is often required due to aging. The entire surgery is performed under a microscope, and co-mounted cameras allow the procedure to be recorded and archived. Currently, the recorded videos are used postoperatively for documentation and training. An additional benefit of recording cataract videos is that they enable video analytics (i.e., manual and/or automatic video content analysis) to investigate medically relevant research questions (e.g., the cause of complications). This, however, necessitates a medical multimedia information system trained and evaluated on existing data, which is currently not publicly available. In this work we provide a public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months. These surgeons are grouped into moderately experienced and highly experienced surgeons (assistant vs. senior physicians), providing the basis for experience-based video analytics. All videos have been annotated with quasi-standardized operation phases by a senior ophthalmic surgeon.

Open video datasets over operational mobile networks with MONROE

  •      Cise Midoglu
  • Mohamed Moulay
  • Vincenzo Mancuso
  • Özgü Alay
  • Andra Lutu
  • Carsten Griwodz

Video streaming is a very popular service among the end-users of Mobile Broadband (MBB) networks. DASH and WebRTC are two key technologies in the delivery of mobile video. In this work, we empirically assess the performance of video streaming with DASH and WebRTC in operational MBB networks, by using a large number of programmable network probes spread over several countries in the context of the MONROE project. We collect a large dataset from more than 300 video streaming experiments. Our dataset consists of network traces, performance indicators captured during the streaming sessions, and experiment metadata. The dataset captures the wide variability in video streaming performance, and unveils how mobile broadband is still not offering consistent quality guarantees across different countries and networks, especially for users on the move. We open source our complete software toolset and provide the video dataset as open data.

A dataset of head and eye movements for 360° videos

  •      Erwan J. David
  • Jesús Gutiérrez
  • Antoine Coutrot
  • Matthieu Perreira Da Silva
  • Patrick Le Callet

Research on visual attention in 360° content is crucial to understand how people perceive and interact with this immersive type of content, to develop efficient techniques for processing, encoding, delivering and rendering it, and to offer a high quality of experience to end users. The availability of public datasets is essential to support and facilitate the research activities of the community. Recently, some studies have analyzed the exploration behavior of people watching 360° videos, and a few datasets have been published. However, the majority of these works only consider head movements as a proxy for gaze data, despite the importance of eye movements in the exploration of omnidirectional content. Thus, this paper presents a novel dataset of 360° videos with associated eye and head movement data, which is a follow-up to our previous dataset for still images [14]. Head and eye tracking data was obtained from 57 participants during a free-viewing experiment with 19 videos. In addition, guidelines on how to obtain saliency maps and scanpaths from the raw data are provided, and some statistics related to exploration behavior are presented, such as the impact of the longitudinal starting position when watching omnidirectional videos, which was investigated in this test. This dataset and its associated code are made publicly available to support research on visual attention for 360° content.
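
The sketch below illustrates, under assumed angle conventions, how raw gaze directions could be accumulated into a fixation map on an equirectangular frame; it is not the processing pipeline shipped with the dataset.

```python
import numpy as np

def gaze_to_equirect(yaw_rad, pitch_rad, width=3840, height=1920):
    """Map gaze directions (yaw/pitch in radians) to equirectangular pixels.

    Convention assumed here: yaw in [-pi, pi) increases to the right,
    pitch in [-pi/2, pi/2] increases upwards; the dataset's own scripts
    define the authoritative convention.
    """
    u = (yaw_rad + np.pi) / (2 * np.pi)     # [0, 1) across the width
    v = (np.pi / 2 - pitch_rad) / np.pi     # [0, 1] down the height
    x = np.clip((u * width).astype(int), 0, width - 1)
    y = np.clip((v * height).astype(int), 0, height - 1)
    return x, y

def fixation_map(yaw_rad, pitch_rad, width=3840, height=1920):
    """Accumulate gaze samples into a per-pixel fixation count map."""
    x, y = gaze_to_equirect(np.asarray(yaw_rad), np.asarray(pitch_rad),
                            width, height)
    fmap = np.zeros((height, width))
    np.add.at(fmap, (y, x), 1.0)            # count samples per pixel
    return fmap
```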

Multi-codec DASH dataset

  •      Anatoliy Zabrovskiy
  • Christian Feldmann
  • Christian Timmerer

The number of bandwidth-hungry applications and services is constantly growing. HTTP adaptive streaming of audio-visual content accounts for the majority of today's internet traffic. Although internet bandwidth also increases constantly, audio-visual compression technology remains indispensable, and we currently face the challenge of dealing with multiple video codecs.

This paper proposes a multi-codec DASH dataset comprising AVC, HEVC, VP9, and AV1 in order to enable interoperability testing and streaming experiments for the efficient usage of these codecs under various conditions. We adopt state-of-the-art encoding and packaging options and also provide basic quality metrics along with the DASH segments. Additionally, we briefly introduce a multi-codec DASH scheme and possible usage scenarios. Finally, we provide a preliminary evaluation of the encoding efficiency in the context of HTTP adaptive streaming services and applications.

Subdiv17: a dataset for investigating subjectivity in the visual diversification of image search results

  •      Maia Rohm
  • Bogdan Ionescu
  • Alexandru Lucian Gînscă
  • Rodrygo L. T. Santos
  • Henning Müller

In this paper, we present a new dataset that facilitates the comparison of approaches aiming at the diversification of image search results. The dataset was explicitly designed for general-purpose, multi-topic queries and provides multiple ground-truth annotations to allow for the exploration of the subjectivity aspect in the general diversification task. The dataset provides images and their metadata retrieved from Flickr for around 200 complex queries. Additionally, to encourage experimentation (and cooperation) across different communities such as information and multimedia retrieval, a broad range of pre-computed descriptors is provided. The proposed dataset was successfully validated during the MediaEval 2017 Retrieving Diverse Social Images task using 29 submitted runs.

MMTF-14K: a multifaceted movie trailer feature dataset for recommendation and retrieval

  •      Yashar Deldjoo
  • Mihai Gabriel Constantin
  • Bogdan Ionescu
  • Markus Schedl
  • Paolo Cremonesi

In this paper we propose a new dataset, the MMTF-14K multi-faceted dataset. It is primarily designed for the evaluation of video-based recommender systems, but it also supports the exploration of other multimedia tasks such as popularity prediction, genre classification and auto-tagging (aka tag prediction). The data consists of 13,623 Hollywood-type movie trailers, rated by 138,492 users and generating a total of almost 12.5 million ratings. To address a broader community, metadata, audio and visual descriptors are also pre-computed and provided, along with several baseline benchmarking results for uni-modal and multi-modal recommendation systems. This creates a rich collection of data for benchmarking and supports future development of the field.

SWAPUGC: software for adaptive playback of geotagged UGC

  •      Emmanouil Potetsianakis
  • Jean Le Feuvre

Currently on the market there is a plethora of affordable dedicated cameras and smartphones able to record video and timed geospatial data (device location and orientation). This timed metadata can be used to identify relevant (in time and space) recordings. However, there has not been a platform that allows this information to be exploited in order to utilize the relevant recordings in an interactive consumption scenario. In this paper we present SWAPUGC, a browser-based platform for building applications that use the accompanying geospatial data to dynamically select the streams for watching an event (or any spatiotemporal reference point). The view selection can be performed either manually or automatically, by a predefined algorithm that switches to the most suitable stream according to the recording characteristics. SWAPUGC is a research tool for testing such adaptation algorithms and is provided as an open-source project, accompanied by an example demo application and references to a compatible dataset and recorder. In this paper, we explain and then demonstrate the capabilities of the platform through an example implementation, and examine future prospects and extensions.
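
SWAPUGC itself is a browser-based (JavaScript) platform; purely for illustration, and keeping with the Python used in the other sketches in this section, the following shows a toy view-selection policy that picks the recording closest to a spatiotemporal reference point. The haversine distance and the single-criterion policy are assumptions, not the platform's actual algorithm.

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pick_stream(recordings, target_lat, target_lon):
    """Pick the recording closest to the reference point.

    `recordings` is a list of dicts with 'id', 'lat', 'lon'; a richer
    policy could also weigh device orientation and video quality.
    """
    return min(
        recordings,
        key=lambda r: distance_m(r["lat"], r["lon"], target_lat, target_lon),
    )["id"]

streams = [
    {"id": "cam-A", "lat": 48.8566, "lon": 2.3522},
    {"id": "cam-B", "lat": 48.8570, "lon": 2.3601},
]
print(pick_stream(streams, 48.8584, 2.2945))
```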

Beyond throughput: a 4G LTE dataset with channel and context metrics

  •      Darijo Raca
  • Jason J. Quinlan
  • Ahmed H. Zahran
  • Cormac J. Sreenan

In this paper, we present a 4G trace dataset composed of client-side cellular key performance indicators (KPIs) collected from two major Irish mobile operators, across different mobility patterns (static, pedestrian, car, bus and train). The 4G trace dataset contains 135 traces, with an average duration of fifteen minutes per trace and viewable throughput ranging from 0 to 173 Mbit/s at a granularity of one sample per second. Our traces are generated with a well-known non-rooted Android network monitoring application, G-NetTrack Pro. This tool enables capturing various channel-related KPIs, context-related metrics, downlink and uplink throughput, and cell-related information. To the best of our knowledge, this is the first publicly available dataset that contains throughput, channel and context information for 4G networks.

To supplement our real-time 4G production network dataset, we also provide a synthetic dataset generated from a large-scale 4G ns-3 simulation that includes one hundred users randomly scattered across a seven-cell cluster. The purpose of this dataset is to provide additional information (such as competing metrics for users connected to the same cell), thus giving the end user otherwise unavailable information about the eNodeB environment and scheduling principles. In addition to this dataset, we also provide the code and context information to allow other researchers to generate their own synthetic datasets.
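
As a hedged example of working with throughput traces of this kind, the sketch below aggregates downlink bitrate per mobility pattern; the column names and file paths are placeholders rather than the real G-NetTrack Pro export schema.

```python
import pandas as pd

def throughput_by_pattern(csv_paths):
    """Summarise downlink throughput per mobility pattern across traces.

    Column names ('mobility_pattern', 'dl_bitrate_kbps') are assumptions
    used for illustration; the published traces define the real schema.
    """
    frames = []
    for path in csv_paths:
        df = pd.read_csv(path)
        frames.append(df[["mobility_pattern", "dl_bitrate_kbps"]])
    data = pd.concat(frames, ignore_index=True)
    stats = data.groupby("mobility_pattern")["dl_bitrate_kbps"].agg(
        ["mean", "median", "std"]
    )
    return stats / 1000.0  # kbit/s -> Mbit/s

if __name__ == "__main__":
    # Placeholder file names standing in for downloaded traces.
    print(throughput_by_pattern(["trace_car_01.csv", "trace_train_02.csv"]))
```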

HTTP adaptive streaming QoE estimation with ITU-T Rec. P.1203: open databases and software

  •      Werner Robitza
  • Steve Göring
  • Alexander Raake
  • David Lindegren
  • Gunnar Heikkilä
  • Jörgen Gustafsson
  • Peter List
  • Bernhard Feiten
  • Ulf Wüstenhagen
  • Marie-Neige Garcia
  • Kazuhisa Yamagishi
  • Simon Broom

This paper describes an open dataset and software for ITU-T Rec. P.1203. As the first standardized Quality of Experience model for audiovisual HTTP Adaptive Streaming (HAS), it has been extensively trained and validated on over a thousand audiovisual sequences containing HAS-typical effects (such as stalling, coding artifacts, and quality switches). Our dataset comprises four of the 30 official subjective databases at a bitstream feature level. The paper also includes the subjective results and the model performance. Our software for the standard has also been made available to the public and is used for all the analyses presented. Among other previously unpublished details, we show the significant performance improvements of bitstream-based models over metadata-based ones for video quality analysis, and the robustness of combining classical models with machine-learning-based approaches for estimating user QoE.

Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients

  •      Enrique Garcia-Ceja
  • Michael Riegler
  • Petter Jakobsen
  • Jim Tørresen
  • Tine Nordgreen
  • Ketil J. Oedegaard
  • Ole Bernt Fasmer

Wearable sensors measuring different aspects of people's activity are a common technology nowadays. In research, data collected using these devices is also drawing attention. Nevertheless, datasets containing sensor data in the field of medicine are rare. Often, the data is non-public and only results are published, which makes it hard for other researchers to reproduce and compare results, or even to collaborate. In this paper we present a unique dataset containing sensor data collected from patients suffering from depression. The dataset contains motor activity recordings of 23 unipolar and bipolar depressed patients and 32 healthy controls. For each patient we provide sensor data over several days of continuous measurement, as well as some demographic data. The severity of the patients' depressive state was labeled using ratings by medical experts on the Montgomery-Asberg Depression Rating Scale (MADRS). In this respect, the dataset presented here can be useful for exploring and better understanding the association between depression and motor activity. By making this dataset available, we invite and enable interested researchers to tackle this challenging and important societal problem.

OpenCV.js: computer vision processing for the open web platform

  •      Sajjad Taheri
  • Alexander Veidenbaum
  • Alexandru Nicolau
  • Ningxin Hu
  • Mohammad R. Haghighat

The Web is the world's most ubiquitous compute platform and the foundation of the digital economy. Ever since its birth in the early 1990s, web capabilities have been increasing in both quantity and quality. However, in spite of all this progress, computer vision is not yet mainstream on the web. The reasons are historical and include the lack of sufficient JavaScript performance, the lack of camera support in the standard web APIs, and the lack of comprehensive computer-vision libraries. These problems are about to be solved, resulting in the potential of an immersive and perceptual web with transformational effects on online shopping, education, and entertainment, among others. This work aims to enable computer vision on the web by bringing hundreds of OpenCV functions to the open web platform. OpenCV is the most popular computer-vision library, with a comprehensive set of vision functions and a large developer community. OpenCV is implemented in C++ and, up until now, was not available in web browsers without the help of unpopular native plugins. This work leverages OpenCV's efficiency, completeness, API maturity, and its community's collective knowledge. It is provided in a format that is easy for JavaScript engines to optimize and has an API that is easy for web programmers to adopt and use to develop applications. In addition, OpenCV parallel implementations that target SIMD units and multiprocessors can be ported to equivalent web primitives, providing better performance for real-time and interactive use cases.
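
For illustration only, the snippet below shows the kind of OpenCV pipeline (color conversion plus Canny edge detection) that OpenCV.js exposes to JavaScript; it is written with the Python bindings to stay consistent with the other sketches in this section, not with the OpenCV.js API itself, and the file names are placeholders.

```python
import cv2

def edge_map(path, low=100, high=200):
    """Read an image, convert it to grayscale and run Canny edge detection."""
    image = cv2.imread(path)                      # BGR image from disk
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)            # binary edge map
    return edges

if __name__ == "__main__":
    cv2.imwrite("edges.png", edge_map("frame.png"))  # placeholder paths
```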

DEMONSTRATION SESSION: Demos

Eye tracking based foveated rendering for 360 VR tiled video

  •      HyunWook Kim
  • JinWook Yang
  • MinSu Choi
  • JunSuk Lee
  • SangPil Yoon
  • YoungHwa Kim
  • WooChool Park

To increase the sense of immersion of 360° VR images, we have proposed and implemented foveated rendering through precise region-of-interest detection, using eye-tracking-based head-mounted display equipment together with an HEVC tiled-video-based image decoding and rendering method. Our method can provide a high rendering speed and high-quality textures.

Fast and easy live video service setup using lightweight virtualization

  •      Antti Heikkinen
  • Pekka Pääkkönen
  • Marko Viitanen
  • Jarno Vanne
  • Tommi Riikonen
  • Kagan Bakanoglu

The service broker provides service providers with virtualized services that can be initialized rapidly and scaled up or down on demand. This demonstration paper describes how a service provider can set up a new video distribution service for end users with minimal effort. Our proposal makes use of Docker lightweight virtualization technologies that pack services into containers. This makes it possible to implement video coding and content delivery networks that are scalable and consume resources only when needed. The demonstration showcases a scenario in which a video service provider sets up a new live video distribution service for end users. After the setup, a live 720p30 camera feed is encoded in real time, streamed in HEVC MPEG-DASH format over a CDN, and accessed with an HbbTV-compatible set-top box. This end-to-end system illustrates that virtualization causes no significant resource or performance overhead and is well suited to online video services.

Comprehensible reasoning and automated reporting of medical examinations based on deep learning analysis

  •      Steven Alexander Hicks
  • Konstantin Pogorelov
  • Thomas de Lange
  • Mathias Lux
  • Mattis Jeppsson
  • Kristin Ranheim Randel
  • Sigrun Eskeland
  • Pål Halvorsen
  • Michael Riegler

In the future, medical doctors will to an increasing degree be assisted by deep learning neural networks for disease detection during examinations of patients. In order to make qualified decisions, the black box of deep learning must be opened to increase the understanding of the reasoning behind the decisions of the machine learning system. Furthermore, preparing reports after examinations is a significant part of a doctor's workday, and if we already have a system dissecting the neural network for understanding, the same tool can be used for automatic report generation. In this demo, we describe a system that analyses medical videos from the gastrointestinal tract. Our system dissects the Tensorflow-based neural network to provide insights into the analysis and uses the resulting classification, and the rationale behind it, to automatically generate an examination report for the patient's medical journal.

Foveated streaming of virtual reality videos

  •      Miguel Fabian Romero-Rondón
  • Lucile Sassatelli
  • Frédéric Precioso
  • Ramon Aparicio-Pardo

While Virtual Reality (VR) represents a revolution in the user experience, current VR systems are flawed in different respects. The difficulty of focusing naturally in current headsets incurs visual discomfort and cognitive overload, while high-end headsets require tethered, powerful hardware for scene synthesis. One of the major solutions envisioned to address these problems is foveated rendering. We consider the problem of streaming stored 360° videos to a VR headset equipped with eye-tracking and foveated rendering capabilities. Our long-term research goal is to build high-performing foveated streaming systems that allow the playback buffer to build up and absorb network variations, which none of the current proposals permit. We present our foveated streaming prototype based on the FOVE, one of the first commercially available headsets with an integrated eye-tracker. We build on the FOVE's Unity API to design a gaze-adaptive streaming system using one low-resolution and one high-resolution segment, from which the foveal region is cropped with per-frame filters. The low- and high-resolution frames are then merged at the client to approach the natural focusing process.
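
A minimal sketch of the client-side idea of merging a high-resolution foveal region with a low-resolution base frame is shown below, assuming both frames are already at the same size; the circular mask and radius are illustrative choices, not the prototype's per-frame filters.

```python
import numpy as np

def composite_foveal(low_res, high_res, gaze_xy, radius=200):
    """Overlay a high-resolution foveal crop onto an upscaled low-res frame.

    `low_res` and `high_res` must have the same shape; `gaze_xy` is the
    gaze point in pixels. Pixels inside a circle around the gaze point
    keep the high-resolution content, the rest stays low resolution.
    """
    h, w = high_res.shape[:2]
    ys, xs = np.ogrid[:h, :w]
    mask = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2 <= radius ** 2
    out = low_res.copy()
    out[mask] = high_res[mask]
    return out

# Toy frames: grey low-resolution base, white high-resolution content.
low = np.full((1080, 1920, 3), 128, dtype=np.uint8)
high = np.full((1080, 1920, 3), 255, dtype=np.uint8)
frame = composite_foveal(low, high, gaze_xy=(960, 540))
```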

Virtual reality conferencing: multi-user immersive VR experiences on the web

  •      Simon N. B. Gunkel
  • Hans M. Stokking
  • Martin J. Prins
  • Nanda van der Stap
  • Frank B. ter Haar
  • Omar A. Niamut

Virtual Reality (VR) and 360-degree video are set to become part of the future social environment, enriching and enhancing the way we share experiences and collaborate remotely. While Social VR applications are gaining momentum, most Social VR services focus on animated avatars. In this demo, we present our efforts towards Social VR services based on photo-realistic video recordings. We focus on two parts: communication between multiple people (up to three) and the integration of new media formats to represent users as 3D point clouds. We enhance a green-screen (chroma key) cut-out of the person with depth data, allowing point cloud based rendering in the client. Furthermore, the paper presents a user study with 54 people evaluating a three-person communication use case, and a technical analysis of moving towards 3D representations of users. The demo consists of two shared virtual environments in which to communicate and interact with others: i) a 360-degree virtual space with users represented as 2D video streams (with the background removed), and ii) a 3D space with users represented as point clouds (based on color and depth video data).
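
To illustrate the point-cloud representation, the following sketch back-projects a color-plus-depth frame into a colored point cloud using assumed pinhole intrinsics; it is a generic construction, not the processing used in the demo.

```python
import numpy as np

def depth_to_point_cloud(depth_m, color, fx, fy, cx, cy):
    """Back-project a depth map into a coloured 3D point cloud.

    `depth_m` is an (H, W) array of metric depth, `color` an (H, W, 3)
    image; fx/fy/cx/cy are assumed pinhole intrinsics. Pixels with zero
    depth (e.g. background removed by chroma keying) are dropped.
    """
    h, w = depth_m.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth_m > 0
    z = depth_m[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)                  # (N, 3) metres
    colours = color[valid].astype(np.float32) / 255.0     # (N, 3) RGB in [0, 1]
    return points, colours
```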

Tip-on-a-chip: automatic dotting with glitter ink pen for individual identification of tiny parts

  •      Yuta Kudo
  • Hugo Zwaan
  • Toru Takahashi
  • Rui Ishiyama
  • Pieter Jonker

This paper presents a new identification system for tiny parts that have no space for applying conventional ID marking or tagging. The system marks the parts with a single dot using ink containing shiny particles. The particles in a single dot naturally form a unique pattern. The parts are then identified by matching microscopic images of this pattern with a database containing images of these dots. In this paper, we develop an automated system to conduct dotting and image capturing for mass-produced parts. Experimental results show that our "Tip-on-a-chip" system can uniquely identify more than ten thousand chip capacitors.

ImmersiaTV: enabling customizable and immersive multi-screen TV experiences

  •      David Gómez
  • Juan A. Núñez
  • Mario Montagud
  • Sergi Fernández

ImmersiaTV is a H2020 European project that targets the creation of novel forms of TV content production, delivery and consumption to enable customizable and immersive multi-screen TV experiences. The goal is not only to provide an efficient support for multi-screen scenarios, but also to achieve a seamless integration between the traditional TV content formats and consumption devices with the emerging omnidirectional ones, thus opening the door to new fascinating scenarios. This paper initially provides an overview of the end-to-end platform that is being developed in the project. Then, the created contents and considered pilot scenarios are briefly described. Finally, the paper provides details about the consumption part of the ImmersiaTV platform to be showcased. In particular, it enables a customizable, interactive and synchronized consumption of traditional and omnidirectional contents from an opera performance, in multiscreen scenarios, composed of main TVs, tablets and Head Mounted Displays (HMDs).

Realizing the real-time gaze redirection system with convolutional neural network

  •      Chih-Fan Hsu
  • Yu-Cheng Chen
  • Yu-Shuen Wang
  • Chin-Laung Lei
  • Kuan-Ta Chen

Retaining eye contact between remote users is a critical issue in video conferencing systems because of the parallax caused by the physical distance between the screen and the camera. To achieve this objective, we present a real-time gaze redirection system called Flx-gaze, which post-processes each video frame before sending it to the remote end. Specifically, we relocate and relight the pixels representing the eyes using a convolutional neural network (CNN). To prevent visual artifacts during manipulation, we minimize not only the L2 loss function but also four novel loss functions when training the network. Two of them retain the rigidity of eyeballs and eyelids; the other two prevent color discontinuity at the eye peripheries. By leveraging CPU and GPU resources, our implementation achieves real-time performance (i.e., 31 frames per second). Experimental results show that the gazes redirected by our system are of high quality under this strict time constraint. We also conducted an objective evaluation of our system by measuring the peak signal-to-noise ratio (PSNR) between the real and the synthesized images.

Visual object tracking in a parking garage using compressed domain analysis

  •      Daniel Becker
  • Matthias Schmidt
  • Fernando Bombardelli da Silva
  • Serhan Gül
  • Cornelius Hellge
  • Oliver Sawade
  • Ilja Radusch

Modern driver assistance systems enable a variety of use cases which rely on accurate localization information for all traffic participants. Due to the unavailability of satellite-based localization, the use of infrastructure cameras is a promising alternative in indoor spaces such as parking garages. This paper presents a parking management system which extends the previous work on the eValet system with a low-complexity tracking functionality operating on compressed video bitstreams (compressed-domain tracking). The advantages of this approach include improved robustness to partial occlusions as well as resource-efficient processing of compressed video bitstreams. We have separated the tasks into different modules which are integrated into a comprehensive architecture. The demonstrator setup includes a 2D visualizer illustrating the operation of the algorithms on a single camera stream and a 3D visualizer displaying the abstract object detections in a global reference frame.

A QoE assessment method based on EDA, heart rate and EEG of a virtual reality assistive technology system

  •      Débora Pereira Salgado
  • Felipe Roque Martins
  • Thiago Braga Rodrigues
  • Conor Keighrey
  • Ronan Flynn
  • Eduardo Lázaro Martins Naves
  • Niall Murray

The key aim of various assistive technology (AT) systems is to augment an individual's functioning whilst supporting an enhanced quality of life (QoL). In recent times, we have seen the emergence of Virtual Reality (VR) based assistive technology systems, made possible by the availability of commercial Head Mounted Displays (HMDs). The use of VR for AT aims to support levels of interaction and immersion not previously possible with more traditional AT solutions. Crucial to the success of these technologies is understanding, from the user perspective, the influencing factors that affect the user Quality of Experience (QoE). In addition to the typical QoE metrics, other factors to consider include human behavior, such as mental and emotional state, posture and gestures. In terms of objectively quantifying such factors, there is a wide range of wearable sensors able to monitor physiological signals and provide reliable data. In this demo, we capture and present the user's EEG, heart rate, EDA and head motion during the use of an AT VR application. The prototype is composed of a sensor system, comprising wearable sensors for the acquisition of biological signals, and a presentation system, namely a virtual wheelchair simulator displayed on a typical LCD screen.

Implementing 360 video tiled streaming system

  •      Jangwoo Son
  • Dongmin Jang
  • Eun-Seok Ryu

The computing power and bandwidth available to current VR systems are limited compared to what high-quality VR requires. To overcome these limits, this study proposes a new viewport-dependent streaming method that transmits 360-degree videos using High Efficiency Video Coding (HEVC) and the scalability extension of HEVC (SHVC). The proposed SHVC and HEVC encoders generate bitstreams whose tiles can be transmitted independently; the bitstream generated by the proposed encoder can therefore be extracted in units of tiles. In accordance with what is discussed in the standard, the proposed extractor extracts the bitstream of the tiles corresponding to the viewport. The SHVC video bitstream extracted by the proposed method consists of (i) an SHVC base layer (BL) which represents the entire 360-degree area and (ii) an SHVC enhancement layer (EL) for selective streaming of viewport (region of interest (ROI)) tiles. When the proposed HEVC encoder is used, low- and high-resolution sequences are separately encoded as the BL and EL of SHVC. By streaming the BL (low resolution) and selected EL (high resolution) tiles covering the ROI instead of streaming the whole high-quality 360-degree video, the proposed method reduces the network bandwidth as well as the computational complexity on the decoder side. Experimental results show more than 47% bandwidth reduction.
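
The following toy sketch illustrates ROI tile selection on an equirectangular tile grid (picking the tiles that overlap the current viewport); the grid size, field of view and overlap test are assumptions and do not correspond to the extractor described in the paper.

```python
def viewport_tiles(yaw_deg, pitch_deg, fov_deg=(100, 70), grid=(6, 4)):
    """Return (column, row) indices of tiles overlapping the viewport.

    The equirectangular frame is split into grid[0] x grid[1] tiles.
    The overlap test compares tile centres with the viewport extent and
    ignores projection distortion near the poles; it is illustrative only.
    """
    cols, rows = grid
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    half_w, half_h = fov_deg[0] / 2.0, fov_deg[1] / 2.0
    selected = set()
    for col in range(cols):
        for row in range(rows):
            tile_yaw = -180.0 + (col + 0.5) * tile_w       # tile centre yaw
            tile_pitch = 90.0 - (row + 0.5) * tile_h       # tile centre pitch
            dyaw = (tile_yaw - yaw_deg + 180.0) % 360.0 - 180.0  # wrap around
            if (abs(dyaw) <= half_w + tile_w / 2
                    and abs(tile_pitch - pitch_deg) <= half_h + tile_h / 2):
                selected.add((col, row))
    return sorted(selected)

print(viewport_tiles(yaw_deg=30.0, pitch_deg=0.0))
```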

A DASH video streaming system for immersive contents

  •      Giuseppe Ribezzo
  • Giuseppe Samela
  • Vittorio Palmisano
  • Luca De Cicco
  • Saverio Mascolo

Virtual Reality/Augmented Reality applications require streaming 360° videos to implement new services in a diverse set of fields such as entertainment, art, e-health, e-learning, and smart factories. Providing a high Quality of Experience when streaming 360° videos is particularly challenging due to the very high required network bandwidth. In this paper, we showcase a proof-of-concept implementation of a complete DASH-compliant delivery system for 360° videos that: 1) allows reducing the required bitrate, 2) is independent of the employed encoder, 3) leverages technologies that are already available in the vast majority of mobile platforms and devices. The demo platform allows the user to directly experiment with various parameters, such as the duration of segments, the compression scheme, and the adaptive streaming algorithm parameters.

MUSLIN demo: high QoE fair multi-source live streaming

  •      Simon Da Silva
  • Joachim Bruneau-Queyreix
  • Mathias Lacaud
  • Daniel Négru
  • Laurent Réveillère

Delivering video content with a high and fairly shared quality of experience is a challenging task in view of forecasts of drastic increases in video traffic. Currently, content delivery networks provide numerous servers hosting replicas of the video content, and consuming clients are redirected to the closest server. The video content is then streamed using adaptive streaming solutions. However, some servers become overloaded, and clients may experience a poor or unfairly distributed quality of experience.

In this demonstration, we showcase Muslin, a streaming solution supporting a high and fairly shared quality of experience for end users in live streaming. Muslin builds on MS-Stream, a content delivery solution in which a client can simultaneously use several servers. Muslin dynamically provisions servers, replicates content onto them, and advertises servers to clients based on real-time delivery conditions. Our demonstration shows that our approach outperforms traditional content delivery schemes, increasing fairness and quality of experience on the user side without requiring a larger underlying content delivery platform.
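
As a purely illustrative sketch of advertising servers based on delivery conditions, the snippet below ranks candidate servers with a toy load/latency score; the weighting and the inputs are assumptions, not Muslin's provisioning logic.

```python
def advertise_servers(servers, k=3):
    """Rank candidate servers by a simple load/latency score and keep k.

    `servers` is a list of dicts with 'id', 'load' (0..1 utilisation) and
    'rtt_ms'; the 0.7/0.3 weighting is an arbitrary illustrative choice.
    """
    def score(s):
        return 0.7 * s["load"] + 0.3 * (s["rtt_ms"] / 200.0)
    return [s["id"] for s in sorted(servers, key=score)[:k]]

candidates = [
    {"id": "srv-eu-1", "load": 0.85, "rtt_ms": 20},
    {"id": "srv-eu-2", "load": 0.30, "rtt_ms": 45},
    {"id": "srv-us-1", "load": 0.10, "rtt_ms": 120},
]
print(advertise_servers(candidates, k=2))
```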

Improving quality and scalability of webRTC video collaboration applications

  •      Stefano Petrangeli
  • Dries Pauwels
  • Jeroen van der Hooft
  • Tim Wauters
  • Filip De Turck
  • Jürgen Slowack

Remote collaboration is common nowadays in conferencing, tele-health and remote teaching applications. To support these interactive use cases, Real-Time Communication (RTC) solutions such as the open-source WebRTC framework are generally used. WebRTC is peer-to-peer by design, which entails that each sending peer needs to encode a separate, independent stream for each receiving peer in the remote session. This approach is therefore expensive in terms of the number of encoders and does not scale well to large numbers of users. To overcome this issue, a WebRTC-compliant framework is proposed in this paper, where only a limited number of encoders are used at the sender side. Consequently, each encoder can transmit to a multitude of receivers at the same time. The conference controller, a centralized Selective Forwarding Unit (SFU), dynamically forwards the most suitable stream to each of the receivers, based on their bandwidth conditions. Moreover, the controller dynamically recomputes the encoding bitrates of the sender to follow the long-term bandwidth variations of the receivers and increase the delivered video quality. The benefits of this framework are showcased using a demo implemented with the Jitsi Videobridge software, a WebRTC SFU, for the controller and the Chrome browser for the peers. In particular, we demonstrate how our framework can improve the received video quality by up to 15% compared to an approach where the encoding bitrates are static and do not change over time.
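
The sketch below illustrates, in simplified form, the idea of forwarding the most suitable of a few encoded streams to each receiver based on its estimated bandwidth; the bitrate ladder, safety margin and selection rule are assumptions, not the controller's actual policy.

```python
def forward_streams(encoded_bitrates_kbps, receiver_bandwidths_kbps):
    """Map each receiver to the highest encoded stream it can sustain.

    `encoded_bitrates_kbps` are the few bitrates produced at the sender;
    each receiver gets the largest one not exceeding 80% of its estimated
    bandwidth (toy headroom for audio and overhead), else the lowest one.
    """
    ladder = sorted(encoded_bitrates_kbps)
    mapping = {}
    for receiver, bw in receiver_bandwidths_kbps.items():
        budget = 0.8 * bw
        eligible = [b for b in ladder if b <= budget]
        mapping[receiver] = eligible[-1] if eligible else ladder[0]
    return mapping

print(forward_streams([500, 1500, 3000],
                      {"alice": 4000, "bob": 1200, "carol": 600}))
```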

Low-latency delivery of news-based video content

  •      Jeroen van der Hooft
  • Dries Pauwels
  • Cedric De Boom
  • Stefano Petrangeli
  • Tim Wauters
  • Filip De Turck

Nowadays, news-based websites and portals provide significant amounts of multimedia content to accompany news stories and articles. Within this context, HTTP Adaptive Streaming is generally used to deliver video over the best-effort Internet, allowing smooth video playback and a good Quality of Experience (QoE). To stimulate user engagement with the provided content, such as browsing and switching between videos, reducing the video's startup time has become more and more important: while the current median load time is on the order of seconds, research has shown that user waiting times must remain below two seconds to achieve an acceptable QoE. We developed a framework for low-latency delivery of news-related video content, integrating four optimizations located either at the server side, the client side, or the application layer. Using these optimizations, the video's startup time can be reduced significantly, allowing user interaction and fast switching between available content. In this paper, we describe a proof of concept of this framework, using a large dataset from a major Belgian news provider. A dashboard is provided which allows the user to interact with the available video content and assess the gains of the proposed optimizations. In particular, we demonstrate how the proposed optimizations consistently reduce the video's startup time in different mobile network scenarios, allowing the news provider to improve the user's QoE and bringing startup times to values well below two seconds.

Immersive 360° VR tiled streaming system for esports service

  •      HyunWook Kim
  • JinWook Yang
  • MinSu Choi
  • JunSuk Lee
  • SangPil Yoon
  • YongHwa Kim
  • WooChool Park

This paper shows how a system was designed and implemented for live tiled streaming of 360° Virtual Reality (VR) esports based on HEVC. A virtual camera technique was designed and implemented on top of the Unity and Unreal engines as a means of capturing 360° videos in a virtual space, and an MPEG-DASH SRD (Spatial Relation Description) [1-2] based packaging and streaming server was configured to build a live HEVC encoding system for real-time internet streaming. Finally, an integrated 360° VR client was configured for use across multiple platforms. Each technological element was designed to serve the client through a 360° VR 3D stereo method, and MPEG-DASH SRD was extended so that high-quality game videos can be played at lower bandwidths.