SIGMM Records PhD thesis summaries
Edge Indexing in a Grid for Highly Dynamic Virtual Environments
Newly emerging game-based application systems provide three-dimensional virtual environments where multiple users interact with each other in real time. Such virtual worlds are filled with autonomous, mutable virtual content that is continuously augmented by the users. To make these systems highly scalable and dynamically extensible, they are usually built on a client-server architecture with a grid-based subspace division that partitions the virtual world into manageable sub-worlds. In each sub-world, the user continuously receives relevant geometry updates of moving objects via a streaming process from remotely connected servers, rather than retrieving them from a local storage medium, and renders them according to her viewpoint.
In such systems, determining the set of objects visible from a user's viewpoint is one of the primary factors affecting server throughput and scalability. Performing real-time visibility tests in extremely dynamic virtual environments is particularly challenging when millions of objects and hundreds of thousands of active users are moving and interacting. We recognize that these challenges are closely related to a spatial database problem, and hence we map the moving geometry objects in the virtual space to a set of multi-dimensional objects in a spatial database, modeling each avatar both as a spatial object and as a moving query. Unfortunately, existing spatial indexing methods are unsuitable for this new kind of environment. The main contribution of this research is an efficient spatial index structure that minimizes unexpected object popping and supports highly scalable real-time visibility determination. We uncovered many useful properties of this structure and compared it with various spatial indexing methods in terms of query quality, system throughput, and resource utilization. We expect our approach to lay the groundwork for next-generation metaverses and virtual world frameworks where geometry data is continuously streamed to each user.
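The idea of mapping each avatar to a moving range query over a grid-partitioned world can be sketched as follows. This is a minimal, generic illustration in Python, not the thesis's edge-indexing structure; all class and method names are invented for the example:

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid over a 2-D world; each object is indexed by the cell
    containing its position (a simplification of the thesis's scheme)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (cx, cy) -> set of object ids
        self.positions = {}             # object id -> (x, y)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, obj_id, x, y):
        self.cells[self._cell(x, y)].add(obj_id)
        self.positions[obj_id] = (x, y)

    def move(self, obj_id, x, y):
        ox, oy = self.positions[obj_id]
        old, new = self._cell(ox, oy), self._cell(x, y)
        if old != new:                  # cheap update on cell crossing
            self.cells[old].discard(obj_id)
            self.cells[new].add(obj_id)
        self.positions[obj_id] = (x, y)

    def visible(self, x, y, radius):
        """Moving range query: all objects within `radius` of (x, y)."""
        r = int(math.ceil(radius / self.cell_size))
        cx, cy = self._cell(x, y)
        result = set()
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for obj in self.cells.get((cx + dx, cy + dy), ()):
                    ox, oy = self.positions[obj]
                    if (ox - x) ** 2 + (oy - y) ** 2 <= radius ** 2:
                        result.add(obj)
        return result
```

As avatars move, only cell-crossing updates touch the index, which is what keeps the visibility test cheap under heavy object churn.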
A Semantic-Based Framework for Multimedia Management and Interoperability
Interoperable, semantic-based audiovisual content services are necessary in the open Internet environment, where the volume of the available audiovisual information is growing rapidly. Such services can be built on top of structured, semantic-based audiovisual content descriptions.
This thesis focuses on the representation and management of audiovisual content semantics, based on the dominant standards for audiovisual content description and ontology representation: MPEG-7 and OWL, respectively. In particular, the following components have been developed for the representation and management of the audiovisual content semantics: an ontological infrastructure, a model for the representation of the audiovisual content semantics, and a mapping model.
The MP7QL (MPEG-7 Query Language) and the MP7QL user preference model have also been developed in order to allow semantic-based retrieval and filtering of audiovisual content. They allow transparent access to the audiovisual material and the expression of conditions on all components of MPEG-7 descriptions. In addition, they allow the explicit specification of boolean operators and preference values for combining these conditions according to the user's intentions.
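The combination of explicit boolean operators with per-condition preference values can be illustrated with a small condition-tree evaluator. This is only a generic sketch in Python: real MP7QL queries operate on MPEG-7 descriptions, and the dictionary fields and min/max scoring rules below are assumptions made for the example:

```python
# Illustrative only: the field names and the min/max weighting scheme
# are invented for this sketch, not taken from the MP7QL definition.

def evaluate(query, description):
    """Recursively evaluate a condition tree against a content
    description (a plain dict standing in for an MPEG-7 description).
    Leaves are ("cond", field, value, preference) tuples; inner nodes
    combine child scores with an explicit boolean operator."""
    op = query[0]
    if op == "cond":
        _, field, value, preference = query
        return preference if description.get(field) == value else 0
    scores = [evaluate(q, description) for q in query[1:]]
    if op == "AND":
        return min(scores)   # all conditions must hold
    if op == "OR":
        return max(scores)   # the best-matching alternative wins
    raise ValueError("unknown operator: %s" % op)

# a query preferring soccer content featuring Zidane over Ronaldo
QUERY = ("AND",
         ("cond", "genre", "soccer", 10),
         ("OR",
          ("cond", "player", "Zidane", 8),
          ("cond", "player", "Ronaldo", 5)))
```

The point of the sketch is that operators and preference values are explicit in the query tree, so the same conditions can be recombined to match different user intentions.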
These components (the ontological infrastructure, the model for the representation of the audiovisual content semantics, the mapping model, the query language and the user preference model) comprise the theoretical basis of the DS-MIRF (Domain-Specific Multimedia Information and Filtering Framework). DS-MIRF allows the development of domain-knowledge-based applications and services for audiovisual content that utilize and extend the MPEG-7 standard.
The DS-MIRF framework comprises the above components.
The audiovisual content semantics representation model developed in this thesis has been applied in the domains of sports and cultural heritage. In the sports domain, the proposed model was evaluated against existing approaches and shown to be more effective in supporting semantic-based retrieval and filtering of audiovisual content.
Group Communication Techniques in Overlay Networks
Internet services that enable people around the world to communicate in real time have recently gained much attention. Such real-time interaction is offered by applications most commonly referred to as distributed interactive applications; concrete examples are multiplayer online games, audio/video conferencing, and many virtual-reality applications used in education, entertainment, the military, etc. All distributed interactive applications that aim to support real-time interaction share a time-dependent requirement, usually on the order of a few hundred milliseconds. These latency requirements manifest themselves in event distribution, group membership management, group dynamics, etc., and far exceed the requirements of many other applications.
One general focal point of this thesis is to enable scalable group communication for managing dynamic groups of clients that interact in real time, so that people around the world can dynamically join networks of participants and interact with them. The main contributions of the thesis are a number of investigations of a wide variety of group communication techniques. The results from these investigations form a foundation for identifying the techniques that are particularly suitable for distributed interactive applications.
This thesis investigates membership management techniques, evaluating both centralized and distributed approaches through empirical and experimental studies on PlanetLab. It proposes three membership management techniques and finds that a centralized membership management approach is particularly fast and consistent when there are multiple dynamic groups.
The thesis aims to identify well-placed nodes in the application network that yield low pair-wise latencies to groups of clients; such nodes may, for example, be used for membership management tasks. It evaluates five core-node selection algorithms through group communication simulations and experiments on PlanetLab. These evaluations show that core-node selection algorithms exist that find sufficiently well-placed nodes.
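One plausible core-node selection criterion is to pick the candidate with the lowest worst-case latency to the client group (a 1-center heuristic). The thesis evaluates five such algorithms; the sketch below shows only this single generic variant, with an invented latency-matrix representation:

```python
def select_core_node(latency, candidates, clients):
    """Pick the candidate node with the smallest maximum latency to any
    client. `latency` maps node -> {client: delay in ms}. Returns the
    chosen node and its worst-case latency."""
    best, best_cost = None, float("inf")
    for node in candidates:
        cost = max(latency[node][c] for c in clients)
        if cost < best_cost:
            best, best_cost = node, cost
    return best, best_cost

# toy example (delays invented): "b" has the lowest worst-case latency
EXAMPLE_LATENCY = {"a": {"x": 30, "y": 80},
                   "b": {"x": 50, "y": 40}}
```

Other natural variants minimize the average rather than the maximum latency; which criterion is better depends on the group communication workload.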
The thesis considers overlay network multicast the better option for distributing time-dependent events in groups, and finds that centralized graph algorithms are suited to meet the latency requirements placed on overlay construction and reconfiguration. It evaluates a variety of centralized overlay construction algorithms that aim to build low-latency overlay networks.
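A classical centralized construction for low-latency event distribution is a shortest-path tree rooted at the sender, computed from the full latency matrix with Dijkstra's algorithm. The thesis compares several construction algorithms, often with extra constraints such as node degree; this sketch shows only the unconstrained baseline:

```python
import heapq

def shortest_path_tree(latency, source):
    """Compute, for every node, its parent in a multicast tree that
    minimizes latency from `source` (plain Dijkstra over the latency
    graph; `latency` maps node -> {neighbor: delay})."""
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in latency[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return parent, dist

# toy triangle (delays invented): relaying via "a" beats the direct s-b link
EXAMPLE = {"s": {"a": 10, "b": 50},
           "a": {"s": 10, "b": 20},
           "b": {"s": 50, "a": 20}}
```

Because the algorithm is centralized, it needs the all-to-all latencies discussed next, which is exactly why the accuracy of latency estimation matters.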
Finally, the thesis investigates whether accurate all-to-all path latencies can be obtained for use by the centralized graph algorithms. Two latency estimation techniques are evaluated, and their accuracy is measured by comparing the estimates to all-to-all ping measurements. A real-world system was implemented to perform group communication experiments on PlanetLab. It was found that latency estimates used by core-node selection and overlay construction algorithms are sufficiently accurate that the graph algorithms still find solutions close to those obtained with real measurements.
Modularization and Multi-Granularity Reuse of Learning Resources
This thesis investigates modular reuse of learning resources. In particular, it considers a scenario of reuse in which existing learning resources serve as preliminary products for the creation of new learning resources for Web-based training. Authors are interested in reusing learning resources created by other authors, who are assumed to belong to different organizations. Furthermore, these authors do not use a common authoring tool, because they are obliged to use the tools specified by their respective organizations. Content models specify how learning resources may be constructed hierarchically, and authoring paradigms such as authoring by aggregation allow, in principle, a new learning resource to be created as the aggregation of smaller learning resources. However, the learning resources to be combined must be stored as individual resources. This approach works well if an organization systematically creates fine-grained, modular learning resources using a suitable authoring environment. Many authoring tools, however, use arbitrary content formats that are incompatible with other authoring tools or learning management systems. Thus, learning resources are not exchanged in their source format; instead, the Shareable Content Object Reference Model (SCORM) specifies a common exchange format. One disadvantage of this format is that the modular components of a learning resource can no longer be distinguished as individual learning resources.
This thesis enables the reuse of modular learning resources which, due to an export process, have ceased to exist as individual learning resources. Five contributions in the thesis address the challenge of modular, multi-granularity reuse.
The first contribution is an extension to the SCORM specification which enables the modular reuse of parts of a SCORM package and allows these learning resources to be modularized and aggregated. Furthermore, several existing approaches to modularization have been reviewed and distilled into a generic process model for the modularization of learning resources. This process model is the second contribution of this thesis.
The third contribution is an extension of an authoring by aggregation process. Authoring by aggregation within existing implementations is restricted to pure content development only; this thesis extends one of these processes with a design phase that integrates the light-weight authoring approach of authoring by aggregation. After learning resources from different origins have been obtained and aggregated, the result often looks like a patchwork, and it is necessary to adapt the aggregated learning resources towards a unified appearance. As the fourth contribution, this thesis proposes a framework for learning resource content representation and adaptation. This framework enables the development of adaptation tools that work independently of particular document formats and operate on a learning resource in its entirety instead of on individual documents.
Finally, the fifth contribution of this thesis is a new approach for the topical classification of learning resources. For cases in which no suitable training corpus is available, Wikipedia, the online encyclopaedia, is used as a substitute corpus for training machine-learning classifiers. An evaluation of the Wikipedia-based classifier has shown that it performs significantly better than traditional approaches.
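The idea of substituting Wikipedia for a missing training corpus can be sketched with a tiny nearest-centroid text classifier: one bag-of-words centroid per topic, built from stand-in article texts. The thesis's actual classifiers and features are more sophisticated; everything below, including the toy corpus, is illustrative:

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term frequencies."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(corpus):
    """corpus: topic -> list of article texts (e.g. Wikipedia articles
    drawn from that topic's category). Returns one centroid per topic."""
    return {topic: sum((vectorize(t) for t in texts), Counter())
            for topic, texts in corpus.items()}

def classify(centroids, text):
    """Assign the topic whose centroid is most similar to the text."""
    v = vectorize(text)
    return max(centroids, key=lambda topic: cosine(centroids[topic], v))

# stand-in "Wikipedia" snippets; real training data would be full articles
CORPUS = {"mathematics": ["algebra geometry theorem proof",
                          "calculus integral derivative limit"],
          "music": ["guitar melody rhythm chord",
                    "orchestra symphony violin concert"]}
```

The key property exploited here is that Wikipedia's category structure supplies labeled training text for topics where no hand-built corpus exists.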
Codec-Agnostic Dynamic and Distributed Adaptation of Scalable Multimedia Content
Today's Internet is accessed by diverse end devices over a wide variety of network types. Regardless of this diversity of usage contexts, content consumers want to retrieve content at the best quality their environment supports. The designers of new media codecs react to this diversity by building adaptation support into the codec design. Scalable media codecs, such as the MPEG-4 Scalable Video Codec, make it possible to retrieve different qualities of the media content simply by disregarding certain media segments. All these variables (different end devices, network types, user preferences, media codec types, and scalability options) lead to a multitude of necessary and possible adaptation operations.
To counter this complexity, the MPEG-21 Digital Item Adaptation (DIA) standard specifies a set of descriptions (and related processes) that describe the media content, the adaptation possibilities, and the usage context in the XML domain. The relevant descriptions are: 1) the generic Bitstream Syntax Description (gBSD), which uses a generic language to describe, for instance, the parts of a media content that may be removed for scalability purposes; 2) the Adaptation Quality of Service description (AQoS), which describes how (segments of) a media content need to be adapted in order to correspond to the various usage contexts, e.g., how many quality layers need to be dropped to match the currently available network bandwidth; and 3) the Usage Environment Descriptions (UEDs), which describe the usage context, e.g., the available network bandwidth. Since all of these descriptions, i.e., all codec-specific information, are provided together with the media content, adaptation nodes can be codec-agnostic, supporting any type of scalable media that is properly described by those DIA descriptions.
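The gBSD-driven principle, dropping described bitstream segments without ever inspecting the media bits, can be sketched as follows. The XML below is a simplified stand-in, not the actual MPEG-21 gBSD schema, and the `marker` convention is invented for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified description: real gBSD uses the MPEG-21 schema
# with byte ranges and marker attributes; this is a toy stand-in.
GBSD = """
<bitstream>
  <segment marker="base"    start="0"   length="100"/>
  <segment marker="layer:1" start="100" length="60"/>
  <segment marker="layer:2" start="160" length="40"/>
</bitstream>
"""

def adapt(gbsd_xml, max_layer):
    """Keep only segments whose enhancement layer does not exceed
    `max_layer`. The adaptation node works purely on the description,
    so it never needs codec-specific knowledge of the media bits."""
    root = ET.fromstring(gbsd_xml)
    kept = []
    for seg in root.findall("segment"):
        marker = seg.get("marker")
        layer = int(marker.split(":")[1]) if marker.startswith("layer:") else 0
        if layer <= max_layer:
            kept.append((int(seg.get("start")), int(seg.get("length"))))
    return kept
```

In a full DIA chain, the AQoS description would supply the `max_layer` value matching the current UED (e.g., the available bandwidth); here it is passed in directly.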
This thesis extends the static, server-based, gBSD-driven adaptation mechanism towards dynamic and distributed environments. To achieve this, novel mechanisms for the fragmentation, storage and transport of content-related XML metadata are introduced. One particular contribution is the introduction of the concept of samples for metadata by employing Streaming Instructions, which steer the fragmentation of, and provide timing for, XML-based metadata. This enables the synchronized processing of such a metadata stream with the described media samples. Furthermore, investigations of the ISO Base Media File Format show how such metadata streams can be stored for later processing. Finally, the applicability of the Real-Time Transport Protocol (RTP) is analyzed for the transport of such metadata streams. A codec-agnostic adaptation node based on these novel mechanisms is implemented and evaluated with regard to its adaptation performance for different types of scalable media. Extensive measurements with these scalable media contents show which parts of the gBSD-based adaptation process could benefit most from optimization.
Additionally, a mechanism based on a novel binary header is specified to enable codec-agnostic adaptation of media content. This Generic Scalability Header (GSH) prefixes each media packet payload and is based on the concepts of the gBSD-based adaptation mechanism. It provides information on both the bitstream syntax and the adaptation options, and therefore combines (some of) the information provided by the MPEG-21 DIA gBSD and AQoS descriptions, while enabling codec-agnostic adaptation at a considerably lower performance cost. As above, the adaptation performance of this mechanism is evaluated for several types of scalable media. Finally, both mechanisms are implemented in the same adaptation architecture and compared with each other, and additionally with a codec-specific adaptation approach, using several types of scalable media.
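The GSH idea, a small binary header prefixed to every packet payload so that an adaptation node can drop packets without parsing XML or the codec bitstream, can be illustrated like this. The 2-byte layout and field names are assumptions for the sketch; the actual GSH format is defined in the thesis:

```python
import struct

# Hypothetical 2-byte header: one byte each for the scalability layer id
# and the quality layer id (the real GSH carries more information).
HEADER = struct.Struct("!BB")

def packetize(payload, layer, quality):
    """Prefix the media payload with the scalability header."""
    return HEADER.pack(layer, quality) + payload

def adapt_packet(packet, max_layer, max_quality):
    """Forward the packet unchanged if its layer ids fit the current
    usage context, otherwise drop it (return None); the media payload
    itself is never inspected."""
    layer, quality = HEADER.unpack_from(packet)
    if layer <= max_layer and quality <= max_quality:
        return packet
    return None
```

A per-packet fixed-size header check like this is far cheaper than XML processing, which is consistent with the throughput gap between the GSH-based and gBSD-based mechanisms reported below.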
A concluding discussion analyzes the results of the quantitative and qualitative evaluation of both mechanisms. Most notably, the measurements show that for MPEG-4 Scalable Video Codec and MPEG-4 Visual elementary streams, the GSH-based mechanism's throughput is only about 1.25 times lower than that of the codec-specific mechanism, with a metadata overhead of less than 1 percent. The gBSD-based mechanism comes at a higher cost for these codecs (about 10 times lower throughput and a maximum of 10 percent metadata overhead with compression). We conclude that, depending on the application scenario, both mechanisms can be viable alternatives to existing codec-specific adaptation approaches. In particular, in scenarios where contents encoded with diverse (and potentially changing) scalable media codecs need to be adapted, the flexibility of codec-agnostic approaches can outweigh their reduced performance.
Collaborative Media Streaming
At present, multimedia services using IP technology are a hot topic for network and service providers. Examples are IPTV, i.e., television broadcast over a (mostly closed) network infrastructure by means of the IP suite, and video on-demand, which allows selected movies to be watched over the Internet on TV devices or computers at home.
Technically, these services can be classified under the notion of streaming: a server sends media data in a continuous fashion to one or several clients, which consume, and usually display, data portions as soon as they arrive. Using a feedback channel, customers may influence the play-back; for instance, they may watch programs time-shifted or pause them.
An enhancement of such streaming services is to watch movies together with a group of people on several devices in parallel, independent of the location of the other group members. Similar approaches have been developed using IP multicast, for example for distributing lectures or conference talks to a group of listeners. However, users cannot control the presentation: pausing or skipping unimportant parts is impossible. Moreover, the streaming presentation is announced by means outside the application, instead of others being added to the session directly within the application.
The costream architecture developed in this work offers a collaborative streaming service without these limitations: people may retrieve movies, join others watching a movie, or invite others to such a collaborative streaming session. Participants of a collaborative streaming session can control the movie presentation as they would on a DVD player. Depending on the desired course of the session, a control operation is executed for all users, or the group is split into subgroups to let viewers follow their own timelines. For this, a group management component controls access to session control operations by means of user roles. Separate from the group management, the so-called association service provides streaming session control and synchronization among participants.
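The two control semantics, executing an operation for everyone versus forking a subgroup onto its own timeline, can be modeled in a few lines. This is a toy model with invented names, not the costream API:

```python
class Session:
    """Toy model of collaborative play-back control: members either
    follow the shared time-line or a subgroup time-line of their own."""

    def __init__(self, members):
        # everyone starts on the shared time-line at position 0 seconds
        self.timeline = {m: ("shared", 0) for m in members}

    def seek_shared(self, position):
        """Control operation executed for all users who are still on
        the shared time-line (split subgroups are unaffected)."""
        for m, (group, _) in list(self.timeline.items()):
            if group == "shared":
                self.timeline[m] = ("shared", position)

    def split(self, members, position):
        """Fork `members` onto their own time-line starting at `position`."""
        for m in members:
            self.timeline[m] = ("sub", position)
```

In costream, deciding which of these operations a participant may issue is the job of the role-based group management described above.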
This separation of duties is advantageous in that standard components can be used: for group management, SIP conferencing servers are suitable, whereas session control is best handled by RTSP proxies, which are already used for caching media data.
Finally, the evaluation of this architecture shows that such a service offers both low latency for clients and acceptable synchronization of media streams across different client devices. Moreover, the communication overhead compared to usual conferencing or streaming systems is very low.