SIGMM Records PhD thesis summaries

June 2009

PhD thesis abstracts

Beomjoo Seo

Edge Indexing in a Grid for Highly Dynamic Virtual Environments

Newly emerging game-based application systems provide three-dimensional virtual environments where multiple users interact with each other in real-time. Such virtual worlds are filled with autonomous, mutable virtual content which is continuously augmented by the users. To make the systems highly scalable and dynamically extensible, they are usually built on a client-server based grid subspace division where the virtual worlds are partitioned into manageable sub-worlds. In each sub-world, the user continuously receives relevant geometry updates of moving objects via a streaming process from remotely connected servers and renders them according to her viewpoint, rather than retrieving them from a local storage medium.

In such systems, the determination of the set of objects that are visible from a user's viewpoint is one of the primary factors that affect server throughput and scalability. Specifically, performing real-time visibility tests in extremely dynamic virtual environments is a very challenging task, as millions of objects and hundreds of thousands of active users are moving and interacting. We recognize that the described challenges are closely related to a spatial database problem, and hence we map the moving geometry objects in the virtual space to a set of multi-dimensional objects in a spatial database while modeling each avatar both as a spatial object and a moving query. Unfortunately, existing spatial indexing methods are unsuitable for this new kind of environment. The main contribution of this research is an efficient spatial index structure that minimizes unexpected object popping and supports highly scalable real-time visibility determination. We uncovered many useful properties of this structure and compared it with various spatial indexing methods in terms of query quality, system throughput, and resource utilization. We expect our approach to lay the groundwork for next-generation metaverses and virtual world frameworks where geometry data is continuously streamed to each user.
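As a rough illustration of the spatial-database view described above, the following Python sketch stores moving objects in a uniform grid and answers an avatar's visibility query as a range lookup over nearby cells. It is a plain object-to-cell baseline, not the edge indexing structure proposed in the thesis; the class name `GridIndex`, the cell size, and the circular visible range are assumptions made purely for illustration.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Uniform 2D grid mapping each cell to the ids of objects inside it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (cx, cy) -> object ids
        self.positions = {}             # object id -> (x, y)

    def _cell(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def move(self, obj_id, x, y):
        """Insert an object, or update its cell after it has moved."""
        old = self.positions.get(obj_id)
        if old is not None:
            self.cells[self._cell(*old)].discard(obj_id)
        self.positions[obj_id] = (x, y)
        self.cells[self._cell(x, y)].add(obj_id)

    def visible_from(self, x, y, radius):
        """Return ids of objects within `radius` of the viewpoint (x, y)."""
        cx0, cy0 = self._cell(x - radius, y - radius)
        cx1, cy1 = self._cell(x + radius, y + radius)
        found = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj_id in self.cells.get((cx, cy), ()):
                    ox, oy = self.positions[obj_id]
                    if (ox - x) ** 2 + (oy - y) ** 2 <= radius ** 2:
                        found.add(obj_id)
        return found
```

Every object move touches the index in this baseline, which hints at why frequent updates from many moving objects and moving queries make conventional indexing costly in such environments.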

Advisor(s): Roger Zimmermann

SIG MM member(s): Roger Zimmermann


Data Management Research Laboratory

The research activities at the Data Management Research Lab (DMRL), formerly at the University of Southern California, Los Angeles, and now at the School of Computing, National University of Singapore, focus on peer-to-peer systems, collaborative environments, streaming media architectures, geospatial data management, and mobile location-based services.

Chrisa Tsinaraki

A Semantic-Based Framework for Multimedia Management and Interoperability

Interoperable, semantic-based audiovisual content services are necessary in the open Internet environment, where the volume of the available audiovisual information is growing rapidly. Such services can be built on top of structured, semantic-based audiovisual content descriptions.

This thesis focuses on the representation and management of audiovisual content semantics, based on the dominant standards for audiovisual content description and ontology representation, MPEG-7 and OWL respectively. In particular, the following components have been developed:

  • An MPEG-7 based model that allows: (a) The representation of the audiovisual content semantics; and (b) The representation of domain ontologies that extend the general-purpose MPEG-7 semantics with domain knowledge and may be utilized in the audiovisual content description.
  • An ontological infrastructure, which allows the representation and management of the audiovisual content semantics. This ontological infrastructure represents the MPEG-7 semantics and allows their extension with application-specific and domain-specific knowledge.
  • A mapping model that allows interoperability support between MPEG-7 and OWL. This model maps the MPEG-7 constructs to OWL constructs and allows the transformation of OWL domain ontologies and OWL/RDF audiovisual content descriptions into MPEG-7 descriptions.

The MP7QL (MPeg-7 Query Language) query language and the MP7QL user preference model have also been developed, in order to allow semantic-based retrieval and filtering of the audiovisual content. The MP7QL query language and the MP7QL user preference model allow for the transparent access to the audiovisual material and the expression of conditions for all the components of the MPEG-7 descriptions. In addition, they allow the explicit specification of boolean operators and preference values for the combination of the conditions according to the user intentions.

The components described above (ontological infrastructure, model for the representation of the audiovisual content semantics, mapping model, query language and user preference model) form the theoretical basis of the DS-MIRF (Domain-Specific Multimedia Information and Filtering Framework). DS-MIRF allows the development of domain knowledge based applications and services for audiovisual content that utilize and extend the MPEG-7 standard.

The DS-MIRF framework comprises the following components:

  • The DS-MIRF ontological infrastructure, which includes: (a) An OWL-DL Upper Ontology, which captures the semantics of the MPEG-7 standard. Since MPEG-7 is expressed in XML Schema syntax, the development of the upper ontology was based on mapping XML Schema constructs to OWL constructs. The generalization of this methodology led to the development of the XS2OWL mapping model, which maps XML Schema constructs to OWL constructs, thus allowing the use of Semantic Web tools and methodologies by XML Schema based applications; (b) A set of application ontologies, which extend the upper ontology with application knowledge, so that advanced application support can be provided that utilizes the domain-specific semantics of the different application domains. A semantic user preference ontology, which captures the semantics of the MP7QL user preference model, and a typed relationship ontology have been integrated in the DS-MIRF ontological infrastructure; and (c) A methodology that allows integrating domain ontologies in the DS-MIRF ontological infrastructure. The domain ontologies extend the semantics of the upper ontology and the application ontologies with domain knowledge. The methodology has been tested through the integration of a soccer ontology and a Formula 1 ontology.
  • Functionality that allows managing MPEG-7 audiovisual content descriptions, user preference descriptions and domain ontologies.
  • Audiovisual content browsing, retrieval and filtering functionality, as well as audiovisual content service personalization functionality. The retrieval functionality is based on the MP7QL query language, while the personalization functionality and the filtering functionality are based on the MP7QL user preference model.
  • The GraphOnto software component, which was used for the development of the DS-MIRF ontological infrastructure. In addition, the OWL/MPEG-7 mapping model developed in the context of the thesis was implemented in GraphOnto, thus allowing the automatic transformation of OWL domain ontologies and OWL/RDF descriptions into MPEG-7 descriptions.

The audiovisual content semantics representation model developed in the context of this thesis has been applied in the domains of sports and cultural heritage. In the sports domain, the proposed model was evaluated in comparison with the existing approaches, and was shown to be more effective than them in semantic-based retrieval and filtering support for audiovisual content.

Advisor(s): Stavros Christodoulakis (supervisor)

SIG MM member(s): Stavros Christodoulakis



The TUC/MUSIC Laboratory was established in 1990 in the Department of Electronics and Computer Engineering of the Technical University of Crete, which is located in Chania, Crete, Greece. The TUC/MUSIC Laboratory is a center of research, development and teaching in the areas of distributed information systems, application engineering, computer graphics, and simulation engineering.

In the general area of systems development, the TUC/MUSIC Laboratory has performed research in the areas of high performance distributed multimedia architectures, information systems offering advanced functionalities, data base systems, information retrieval systems, digital libraries, service oriented architectures, and graphics systems.

In the area of application engineering, the TUC/MUSIC Laboratory has performed research in the topics of large distributed multimedia delivery networks for intelligent TV applications, semantic interoperability infrastructures, web and mobile based application development methodologies, natural language processing, as well as standard-based software infrastructures for multimedia applications in areas such as e-learning, culture and tourism, business applications, TV Applications and medicine.

In the area of simulation engineering, the TUC/MUSIC Laboratory is conducting research in the areas of real-time perceptually-based selective rendering algorithms, fidelity metrics for immersive simulations, uncertainty modeling and visualization and human factors engineering. The TUC/MUSIC Lab has participated in over 40 EU projects and Excellence Networks as Partner, Coordinator and Technical Leader.

Knut-Helge Vik

Group Communication Techniques in Overlay Networks

One type of Internet service that has recently gained much attention enables people around the world to communicate in real-time. Such real-time interaction is offered by applications most commonly referred to as distributed interactive applications. Concrete examples of distributed interactive applications are multiplayer online games, audio/video conferencing, and many virtual-reality applications linked to education, entertainment, military, etc. A latency requirement generally applies to all distributed interactive applications that aim to support real-time interaction, and it is usually on the order of a few hundred milliseconds. The latency requirements are manifested in terms of event-distribution, group membership management, group dynamics, etc., far exceeding the requirements of many other applications.

One general focal point in this thesis is to enable scalable group communication for managing dynamic groups of clients that interact in real-time. This is meant to enable people around the world to dynamically join networks of participants and interact with them in real-time. The main contributions of the thesis are a number of investigations of a wide variety of group communication techniques. The results from the investigations form a foundation to identify the techniques that are particularly suitable for distributed interactive applications.

This thesis investigates membership management techniques, and evaluates both centralized and distributed approaches through empirical and experimental studies on PlanetLab. It proposes 3 membership management techniques and finds that a centralized membership management approach is particularly fast and consistent when there are multiple dynamic groups.

The thesis aims to identify well-placed nodes in the application network that yield low pair-wise latencies to groups of clients. These may, for example, be used for membership managing tasks. It evaluates 5 core-node selection algorithms through group communication simulations and experiments on PlanetLab. From these evaluations it is found that core-node selection algorithms exist that are able to find sufficiently well-placed nodes.
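One simple way to make "well-placed" concrete is to pick the candidate whose worst-case latency to the group members is smallest. The Python sketch below does exactly that over a latency table. It is only one plausible selection rule, not necessarily one of the five algorithms evaluated in the thesis, and the function name `select_core_node` and the nested-dictionary input layout are assumptions.

```python
def select_core_node(latency, candidates, group):
    """Pick the candidate with the smallest worst-case latency to the group.

    latency[a][b] is the measured or estimated latency from a to b in
    milliseconds (hypothetical input layout, assumed for this sketch).
    """
    def worst_case(node):
        # Largest latency from this candidate to any other group member.
        return max(
            (latency[node][member] for member in group if member != node),
            default=0.0,
        )

    return min(candidates, key=worst_case)
```

Replacing `max` with an average would favour nodes that are good for most members rather than for the farthest one; which criterion fits better depends on the application.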

The thesis considers overlay network multicast as the better option to distribute time-dependent events in groups, and finds that centralized graph algorithms are suited to meet the latency requirements put on the overlay constructions and reconfigurations. It evaluates a variety of centralized overlay construction algorithms that aim to build low-latency overlay networks.
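As one example of a centralized graph algorithm, a minimum spanning tree over the pairwise latencies can be built with Prim's algorithm. The abstract does not name the construction algorithms that were evaluated, so the choice of Prim's algorithm, the function name `build_overlay_tree`, and the input layout are assumptions for this sketch.

```python
import heapq


def build_overlay_tree(latency, root):
    """Build a minimum spanning tree over a complete latency graph (Prim).

    latency[a][b] is the symmetric latency between nodes a and b.
    Returns the tree as a list of (parent, child, latency) edges.
    """
    in_tree = {root}
    edges = []
    frontier = [(latency[root][n], root, n) for n in latency if n != root]
    heapq.heapify(frontier)
    while frontier and len(in_tree) < len(latency):
        cost, parent, child = heapq.heappop(frontier)
        if child in in_tree:
            continue  # stale entry: child was reached via a cheaper edge
        in_tree.add(child)
        edges.append((parent, child, cost))
        for n in latency:
            if n not in in_tree:
                heapq.heappush(frontier, (latency[child][n], child, n))
    return edges
```

A spanning tree of this kind minimizes the total edge latency, not the worst-case path between two members, so it should be read as a starting point rather than as a latency-bounded overlay.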

Finally, the thesis investigates whether it is possible to obtain accurate all-to-all path latencies to be used by the centralized graph algorithms. For this, 2 latency estimation techniques are evaluated and their accuracy measured by comparing the estimates to all-to-all ping measurements. A real-world system was implemented to perform group communication experiments on PlanetLab. It was found that when latency estimates are used by core-node selection algorithms and overlay construction algorithms, they are sufficiently accurate that the graph algorithms still find solutions close to those obtained with real-world measurements.
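The accuracy of such estimates can be summarized, for instance, as the relative error of each estimated path latency against the corresponding ping measurement. The metric and the names below are assumptions; the abstract does not state which error measure was used.

```python
from statistics import median


def relative_errors(estimated, measured):
    """Relative error |estimate - real| / real for every measured node pair."""
    return {
        pair: abs(estimated[pair] - real) / real
        for pair, real in measured.items()
        if real > 0 and pair in estimated
    }


def median_relative_error(estimated, measured):
    """Median of the per-pair relative errors."""
    return median(relative_errors(estimated, measured).values())
```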

Advisor(s): Paal Halvorsen (supervisor), Carsten Griwodz (supervisor), Roger Zimmermann (opponent), Utz Roedig (opponent)

SIG MM member(s): Paal Halvorsen, Carsten Griwodz, Roger Zimmermann



Increasing heterogeneity of end-systems and the increasing speed of consumer access networks make large scale distributed multimedia applications such as streaming, gaming and Internet telephony increasingly popular. Higher bandwidth, interactivity and more symmetrical traffic patterns change the challenges that lie in developing and operating an affordable infrastructure. Users' demands for quality of service remain, as does the lack of end-to-end resource reservation. RELAY investigates system-level approaches that provide quantifiably better support for a class of distributed systems in spite of the lack of control over the infrastructure, at a predictable cost.

RELAY designs, implements and evaluates mechanisms and tools to improve resource utilization, increase throughput, reduce/hide latency and support soft QoS. One of RELAY's main goals is to integrate and combine mechanisms to get more scalable, less resource demanding, high performance systems for time-dependent large-scale distributed multimedia systems.

RELAY considers architectural, kernel and protocol support for reduced resource consumption in servers and intermediate systems, algorithms for the allocation of data and functions to servers and intermediate systems, and investigates combinations of performance-enhancing mechanisms such that they do not counteract each other.

Marek Meyer

Modularization and Multi-Granularity Reuse of Learning Resources

This thesis investigates modular reuse of learning resources. In particular, it considers a scenario of reuse in which existing learning resources serve as preliminary products for the creation of new learning resources for Web based training. Authors are interested in reusing the learning resources created by other authors. It is assumed that these authors belong to different organizations. Furthermore, these authors do not use a common authoring tool because they are obliged to use the tools specified by their respective organizations. There are content models which specify how learning resources may be constructed hierarchically. Authoring paradigms, such as authoring by aggregation, allow in principle a new learning resource to be created as the aggregation of different smaller learning resources. However, it is necessary that the learning resources to be combined are stored as individual resources. This approach works well if an organization systematically creates fine-grained, modular learning resources by using a suitable authoring environment. Many authoring tools use arbitrary content formats that are incompatible with other authoring tools or learning management systems. Thus, learning resources are not exchanged in their source format; instead, the Shareable Content Object Reference Model (SCORM) specifies a common exchange format for the learning resources. One disadvantage of this format is that the modular components of a learning resource are no longer able to be distinguished as individual learning resources.

This thesis enables the reuse of modular learning resources which have, due to an export process, ceased to exist as individual learning resources. There are five contributions in the thesis that address the challenge of modular, multi-granularity reuse.

In the first contribution, an extension to the SCORM specification has been defined which enables the modular reuse of parts of a SCORM package and allows these learning resources to be modularized and aggregated. Furthermore, several approaches for modularization have been reviewed, and a generic process model for the modularization of learning resources has been derived from them. This process model is the second contribution of this thesis.

The third contribution is an extension of an authoring by aggregation process. The authoring by aggregation within existing implementations is restricted to pure content development only. This thesis has extended one of these processes by a design phase which integrates the light-weight authoring approach of authoring by aggregation. After learning resources from different origins have been obtained and aggregated, the aggregation often looks like a patchwork. It is necessary to adapt the aggregated learning resources towards a unified appearance. As the fourth contribution, this thesis proposes a framework for learning resource content representation and adaptation. This framework enables the development of adaptation tools which are able to work independently of different document formats and focus on a learning resource in its entirety instead of on individual documents.

Finally, the fifth contribution in this thesis is a new approach for the topical classification of learning resources. For cases in which no suitable training corpus is available, the online encyclopaedia Wikipedia is used as a substitute corpus for training machine learning classifiers. An evaluation of the Wikipedia based classifier has shown that it performs significantly better than traditional approaches.

Advisor(s): Ralf Steinmetz (first examiner), Abdulmotaleb El Saddik (second examiner)

SIG MM member(s): Ralf Steinmetz and Abdulmotaleb El Saddik


KOM - Multimedia Communications Lab

The Multimedia Communications Lab at the Department of Electrical Engineering and Information Technology at TUD is headed by Prof. Dr.-Ing. Ralf Steinmetz (Adjunct Professor of the Department of Computer Science). The Multimedia Communications Lab pursues the vision of seamless multimedia communication. Seamless multimedia communication has the potential to create a future where people from all over the world live, collaborate, and communicate independent of geographical constraints. The communication systems that support this collaboration have to be performant, dependable, secure, and adaptable to user requirements.

The lab works on different Research Areas towards this vision:

  • Communication Services
  • IT Architectures
  • Knowledge Media
  • Mobile Networking
  • Network Mechanisms & QoS
  • Peer-to-Peer Networking
  • Ubiquitous Computing
  • Networked Gaming
  • IT for Mobility and Logistics

Michael Ransburg

Codec-Agnostic Dynamic and Distributed Adaptation of Scalable Multimedia Content

Today's Internet is accessible to diverse end devices through a wide variety of network types. Regardless of this large number of usage contexts, content consumers want to retrieve content at the best quality that can be supported. The designers of new media codecs react to this diversity of usage contexts by including adaptation support in the codec design. Scalable media codecs, such as the new MPEG-4 Scalable Video Codec, make it easy to retrieve different qualities of the media content by simply disregarding certain media segments. All these variables (different end devices, network types, user preferences, media codec types, scalability options) lead to a multitude of needed and possible adaptation operations.

In order to counter this complexity, the MPEG-21 Digital Item Adaptation (DIA) standard specifies a set of descriptions (and related processes) in order to describe the media content, the adaptation possibilities and the usage context in the XML domain. The relevant descriptions are: 1) The generic Bitstream Syntax Description (gBSD), which uses a generic language to describe, for instance, the parts of a media content which may be removed for scalability purposes. 2) The Adaptation Quality of Service Description (AQoS), which describes how (segments of) a media content need(s) to be adapted in order to correspond to the various usage contexts, e.g., how many quality layers need to be dropped to correspond to the currently available network bandwidth. 3) The Usage Environment Descriptions (UEDs) which describe the usage context, e.g., the available network bandwidth. Since all of these descriptions, i.e., all codec-specific information, are provided together with the media content, this helps to enable codec-agnostic adaptation nodes, which support any type of scalable media which is properly described by those DIA descriptions.
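The AQoS example above, deciding how many quality layers to drop for the currently available bandwidth, can be illustrated with a small Python sketch. It does not use the MPEG-21 DIA XML format or processing model; the per-layer bitrates, the rule that the base layer is always kept, and the function name `layers_to_drop` are hypothetical.

```python
def layers_to_drop(layer_bitrates_kbps, available_kbps):
    """Return how many top enhancement layers must be dropped.

    layer_bitrates_kbps lists the bitrate added by each layer, base layer
    first. Layers are dropped from the end until the stream fits into the
    available bandwidth. The base layer is never dropped (an assumption).
    """
    kept = len(layer_bitrates_kbps)
    total = sum(layer_bitrates_kbps)
    while kept > 1 and total > available_kbps:
        kept -= 1
        total -= layer_bitrates_kbps[kept]
    return len(layer_bitrates_kbps) - kept
```

In the codec-agnostic setting described above, a decision of this kind would be driven by the AQoS and UED descriptions, while the gBSD would tell the adaptation node which parts of the bitstream belong to the dropped layers.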

This thesis extends the static, server-based, gBSD-driven adaptation mechanism towards dynamic and distributed environments. To achieve this, novel mechanisms for fragmentation, storage and transport of content-related XML metadata are introduced. One particular contribution is the introduction of the concept of samples for metadata by employing Streaming Instructions which steer the fragmentation of and provide timing for XML-based metadata. This enables the synchronized processing of such a metadata stream with the described media samples. Furthermore, investigations of the ISO Base Media File Format show how such metadata streams can be stored for later processing. Finally, the applicability of the Real-Time Transport Protocol (RTP) is analyzed for the transport of such metadata streams. A codec-agnostic adaptation node based on these novel mechanisms is implemented and evaluated with regard to its adaptation performance for different types of scalable media. Extensive measurements with these scalable media contents show which parts of the gBSD-based adaptation process (could) benefit most from optimization.

Additionally, a mechanism based on a novel binary header to enable codec-agnostic adaptation of media content is specified. This Generic Scalability Header (GSH) prefixes each media packet payload and is based on the concepts of the gBSD-based adaptation mechanism. It provides information on both the bitstream syntax and the adaptation options and therefore combines (some of) the information provided by the MPEG-21 DIA gBSD and AQoS descriptions. However, it enables codec-agnostic adaptation at a considerably lower performance cost. As above, the adaptation performance of this mechanism is evaluated for several types of scalable media. Finally, both mechanisms are implemented in the same adaptation architecture and compared to each other and additionally to a codec-specific adaptation approach using several types of scalable media.

A concluding discussion analyzes the results of the quantitative and qualitative evaluation of both mechanisms. Most notably, the measurements show that for MPEG-4 Scalable Video Codec and MPEG-4 Visual Elementary Streams the GSH-based mechanism's throughput is only about 1.25 times lower than for the codec-specific mechanism and the metadata overhead is less than 1 percent. The gBSD-based mechanism comes at a higher cost for these codecs (about 10 times lower throughput and a maximum of 10 percent metadata overhead with compression). We conclude that, depending on the application scenario, both mechanisms can be viable alternatives to existing codec-specific adaptation approaches. In particular, in scenarios where contents encoded with diverse (and potentially changing) scalable media codecs need to be adapted, the flexibility of codec-agnostic approaches can outweigh their reduced performance.

Advisor(s): Hermann Hellwagner, Rik Van de Walle

SIG MM member(s): Hermann Hellwagner


Multimedia Communication (MMC)

The research group "Multimedia Communication (MMC)" was founded and is being led by Prof. Hermann Hellwagner. In addition, the group currently has three research assistants, seven project staff members, and three administrative and technical staff members.

The research activities of the group are in the areas of: Multimedia communication and quality of service (QoS) provisioning; Adaptation of multimedia content with respect to network, device and usage contexts; Standardization within the ISO/IEC MPEG group (MPEG-21 - Multimedia Framework); Mobile, adaptive multimedia applications; Multimedia in disaster management.

The focus of the MMC group is clearly on adaptive delivery of audio-visual contents, taking into account, for instance, fluctuating network and environmental conditions that can occur when users are on the move. In particular, we are currently investigating the use of Scalable Video Coding (SVC) technology in such networks.

The group actively participates in several international and national research projects on all levels, ranging from basic research to application-oriented projects and direct cooperation with industry.

In teaching, the MMC group covers the technical courses of the Informatics study programme such as Computer Organization, Operating Systems, Computer Networks, Servers and Clusters, Internet QoS, and Multimedia Coding.

Verena Kahmann

Collaborative Media Streaming

Currently, multimedia services using IP technology are a hot topic for network and service providers. Examples are IPTV, which stands for television broadcast over a (mostly closed) network infrastructure by means of the IP suite, or video on-demand, which allows for watching selected movies via the Internet on TV devices or computers in the home.

Technically, these services can be classified under the notion of streaming. A server sends media data in a continuous fashion to one or several clients, which consume data portions as soon as they arrive, usually displaying them as well. By using a feedback channel, customers may influence the play-back, since they may watch programs time-shifted or pause the program.

An enhancement of such streaming services is to watch those movies together with a group of people on several devices in parallel, independent of the location of the other group members. Similar approaches have been developed using IP multicast, for example for distributing lectures or conference talks to a group of listeners. However, users cannot control the presentation: pausing or skipping less important parts is impossible. Moreover, the streaming presentation is announced by means outside the application instead of adding others to the session directly within the application.

The costream architecture developed in this work offers a collaborative streaming service without these limitations: People may retrieve movies, join others watching a movie or invite others to such a collaborative streaming session. Participants of a collaborative streaming session can control the movie presentation as they would on a DVD player. Depending on the desired course of the session, the control operation is executed for all users, or the group is split into subgroups to let watchers follow their own time-lines. For this, a group management controls access to session control operations by means of user roles. Separate from the group management, the so-called association service provides for streaming session control and synchronization among participants.
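The two session behaviours described above, applying a control operation to everyone or splitting off a subgroup with its own time-line, can be sketched in a few lines of Python. This is a toy model and not the costream implementation: the class `StreamingSession`, the single "controller" role, and the `split` flag are assumptions, and no SIP or RTSP signalling is modelled.

```python
class StreamingSession:
    """Toy model of a shared playback session with role-based control."""

    def __init__(self, controllers):
        self.controllers = set(controllers)  # users allowed to control playback
        self.groups = [{"members": set(), "position": 0.0}]

    def join(self, user):
        """New participants start in the main group."""
        self.groups[0]["members"].add(user)

    def _group_of(self, user):
        for group in self.groups:
            if user in group["members"]:
                return group
        raise KeyError(f"{user} has not joined the session")

    def seek(self, user, position, split=False):
        """Seek for the user's whole group, or split off a subgroup."""
        if user not in self.controllers:
            raise PermissionError(f"{user} may not control playback")
        group = self._group_of(user)
        if split and len(group["members"]) > 1:
            # The user leaves the group and follows a separate time-line.
            group["members"].remove(user)
            group = {"members": {user}, "position": position}
            self.groups.append(group)
        else:
            group["position"] = position
        return group
```

Keeping the role check separate from the playback state mirrors the separation of group management and session control described in the abstract.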

This separation of duties is advantageous in the sense that standard components can be used: For group management, SIP conferencing servers are suitable, whereas session control can best be handled using RTSP proxies as already used for caching of media data.

Finally, the evaluation of this architecture shows that such a service offers both low latency for clients and an acceptable synchronization of media streams to different client devices. Moreover, the communication overhead compared to usual conferencing or streaming systems is very low.

Advisor(s): Prof. Dr.-Ing. Lars Wolf, TU Braunschweig (Supervisor), Prof. Dr.-Ing. Jörg Ott, Helsinki University (Second Reviewer)

SIG MM member(s): Lars Wolf, Jörg Ott


Communication and Multimedia Systems

The research in the Communication and Multimedia Systems (CM) group of Prof. Dr.-Ing. Lars Wolf is on architectures of communication and networking systems considering application requirements especially for, but not limited to, the Internet in a broad sense. Our research areas include

  • Multimedia networking and infrastructures, also for mobile and wireless systems
  • Wireless, especially ad-hoc and sensor networks, including also vehicular and delay-tolerant networks
  • Future network architectures and autonomic communication

These areas are not separate but overlap, leading to interesting influences on each other. Further, there are sometimes infrastructure and support oriented projects, e.g. for e-learning purposes.

Please feel free to visit our project pages for any further information on ongoing and finished research.
