December 2010

PhD thesis abstracts

Chunxi Liu

A unified user preference based framework for video content personalization

Nowadays, users can access video resources in many ways, and the number of videos is growing rapidly. At the same time, users' needs are becoming more diversified and personalized. However, people's capacity to use and manage video data has not kept pace with this growth. The conflict between users' requirements and the available technologies results in an "intention gap" between users and video data. To meet users' diverse needs and overcome this gap, video content personalization technologies are required. Compared with traditional video services, a personalization system can better meet the needs of users, improve service quality, and enhance the user experience. Video content personalization technologies have broad applications and strong market demand, which makes this research very important.

Traditional personalized recommendation systems, such as online book recommenders, mostly employ collaborative filtering. However, this algorithm only considers the similarity between users and items and ignores the content of the data itself, so it is not well suited to video content personalization. Some work has addressed video content personalization, but due to the diversity and complexity of video data, these approaches are limited to specific application environments.

This thesis proposes a unified video content personalization model. In the model, the structure of the videos is analyzed first. Then the contents of the videos are analyzed in light of the user's requirements. Finally, video content personalization is achieved by ranking the video contents according to the user's preferences. To verify the validity and generality of the model, the thesis tests it on three different types of video: news video, online video, and sports video. The experimental results show that the model is valid and generalizes well. The results of this thesis have strong practical value and provide a guideline for the video content personalization domain.

Advisor(s): Qingming Huang

SIG MM member(s): Qingming Huang

ISBN number: unpublished


The Joint Research & Development Laboratory for Advanced Computer and Communication Technologies (JDL) is a research unit specializing in multimedia, communication and intelligent human-computer interaction, aiming at core research on intelligent wide-band network multimedia systems as well as the development of key applications in these fields. JDL was founded in March 1996, originally as a joint laboratory co-sponsored by Motorola (US) and the National Center on Intelligent Computers (NCIC) at the Institute of Computing Technology, Chinese Academy of Sciences. Since July 2000, it has been jointly operated by the Institute of Computing Technology and the Graduate School of the Chinese Academy of Sciences. The researchers in the laboratory come from several units: the Research Center for Digital Media of the Graduate School of CAS, the Institute of Computing Technology, the School of Computer Science and Technology of Harbin Institute of Technology, and the College of Computer Science of Beijing University of Technology. There are also visiting researchers from other domestic and foreign units and industrial companies.

The main research fields of JDL include audio-video coding technologies, content-based retrieval from massive multimedia data, biometrics, intelligent human-computer interaction, and applied algorithms. Several projects from the National "973" Program, the National Natural Science Foundation, the National Hi-Tech R&D Program of China (863 Program), the Key Technologies R&D Program, and the Knowledge Innovation Program of the Chinese Academy of Sciences are currently under way in the lab.

After years of effort, the lab has achieved many original research results. More than 200 academic papers have been published in domestic and international journals and conferences. The lab is especially strong in audio-video coding technologies, face detection and recognition, content-based multimedia retrieval, and multi-perception technologies, in which many innovative contributions have been made.

JDL also hosts four further research units: the working group for the standardization of Chinese audio-video coding/decoding technologies, operated jointly with the MPEG-China National Body and the China Ministry of Information Industry; the Technical Center of the China-America Digital Academic Library at the Graduate School of the Chinese Academy of Sciences; the UNU/NUL Chinese Language Center; and the ICT-YCNC Joint Research & Development Lab for face recognition. JDL maintains close cooperation with domestic and international universities, research units and IT companies. Cooperation projects concerning face recognition, digital libraries, distance education, digital broadcasting, etc. are warmly welcome.

Frank Hopfgartner

Personalised Video Retrieval: Application of Implicit Feedback and Semantic User Profiles

A challenging problem in the user profiling domain is creating profiles of the users of retrieval systems. This problem is even more pronounced in the multimedia domain. Due to the Semantic Gap, the difference between the low-level data representation of videos and the higher-level concepts users associate with them, it is not trivial to understand the content of multimedia documents and to find other documents the users might be interested in. A promising approach to easing this problem is to set multimedia documents into their semantic contexts. The semantic context can lead to a better understanding of a user's personal interests. Knowing the context of a video is useful for recommending videos that match a user's information need. By exploiting these contexts, videos can also be linked to other, contextually related videos. From a user profiling point of view, these links are valuable for recommending semantically related videos, hence creating a semantic-based user profile. This thesis introduces a semantic user profiling approach for news video retrieval that exploits a generic ontology to put news stories into their context.

Major challenges that inhibit the creation of such semantic user profiles are the identification of a user's long-term interests and the adaptation of retrieval results based on these personal interests. Most personalisation services rely on users explicitly specifying their preferences, a common approach in the text retrieval domain. With explicit feedback, users are forced to state their need, which can be problematic when their information need is vague. Furthermore, users tend not to provide enough feedback on which to base an adaptive retrieval algorithm. Deviating from the method of explicitly asking users to rate the relevance of retrieval results, implicit feedback techniques learn user interests unobtrusively. The main advantage is that users are relieved of providing feedback; a disadvantage is that information gathered implicitly is less accurate than information based on explicit feedback.

This thesis focuses on three main research questions. First, implicit relevance feedback, which is provided while interacting with a video retrieval system, is studied as an information source to bridge the Semantic Gap. To this end, implicit indicators of relevance are identified by analysing representative video retrieval interfaces. To study whether these indicators can be exploited as implicit feedback within short retrieval sessions, video documents are recommended based on implicit actions performed by a community of users. Secondly, implicit relevance feedback is studied as a potential source for building user profiles and hence identifying users' long-term interests in specific topics. This includes identifying different aspects of interest and storing these interests in dynamic user profiles. Finally, this feedback is exploited to adapt retrieval results or to recommend related videos that match the users' interests. The research questions are analysed through both simulation-based and user-centred evaluation studies. The results suggest that implicit relevance feedback can be employed in the video domain and that semantic-based user profiles have the potential to improve video exploration.
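The idea of turning implicit actions into a long-term interest profile can be sketched as follows. The action weights, topic labels, and scoring rule are illustrative assumptions for this sketch, not the thesis's actual model:

```python
# Sketch: accumulate implicit feedback into a topic-interest profile,
# then rank candidate videos by that profile. Weights are hypothetical.

ACTION_WEIGHTS = {"play": 1.0, "bookmark": 2.0, "skip": -0.5, "browse": 0.2}

def update_profile(profile, video_topics, action):
    """Add the action's weight to each topic of the interacted-with video."""
    w = ACTION_WEIGHTS.get(action, 0.0)
    for topic in video_topics:
        profile[topic] = profile.get(topic, 0.0) + w
    return profile

def rank_videos(profile, catalogue):
    """Order candidate (id, topics) pairs by summed topic interest."""
    score = lambda topics: sum(profile.get(t, 0.0) for t in topics)
    return sorted(catalogue, key=lambda v: score(v[1]), reverse=True)

profile = {}
update_profile(profile, ["politics", "europe"], "play")
update_profile(profile, ["sports"], "skip")
ranking = rank_videos(profile, [("v1", ["sports"]), ("v2", ["politics"])])
```

The unobtrusiveness comes from the fact that every call to `update_profile` is triggered by normal interface use, never by an explicit rating dialogue.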

Advisor(s): Joemon M. Jose (Supervisor), Keith van Rijsbergen (Supervisor), Alan F. Smeaton (Supervisor)

SIG MM member(s): Joemon M. Jose, Alan Smeaton (?), Stefan Rueger (?)


The Information Retrieval Group

The Glasgow Information Retrieval Group has a vigorous programme of research, based on both theory and experiment, aimed at developing novel, effective, and efficient retrieval approaches for all types of information. The group plays a leading role in the international information retrieval community and has set trends in many aspects of IR research. It is one of the oldest and most prominent information retrieval research centres in the world.

The group, part of the School of Computing Science at the University of Glasgow, has a long and strong research history across a wide area of information retrieval, from theoretical modelling of the retrieval process to building large-scale text retrieval systems and the interactive evaluation of multimedia information retrieval systems. The group's interests also include areas such as large-scale and high-performance text retrieval, Web information retrieval, distributed and peer-to-peer retrieval, intranet/enterprise and blog search, multilingual retrieval, and the development of novel adaptive interaction techniques. Its research maintains a strong emphasis on theoretically driven yet practical solutions for large-scale document collections. The group maintains strong links with researchers in Machine Learning and Human-Computer Interaction, as well as with industry through knowledge and technology transfer. Members of the group have also been extensively involved in organising major conferences, workshops and summer schools in the area of information retrieval.

Ingo Kofler

In-Network Adaptation of Scalable Video Content

This thesis investigates mechanisms and applications for in-network adaptation of scalable video bit streams based on the recent H.264/Scalable Video Coding (SVC) standard. In-network adaptation refers to the adaptation of a video stream by a network element during the stream's transport through the network. The advantages of performing adaptation directly in the network are the availability of local monitoring data and higher responsiveness to current network conditions. In contrast to previous work in this field, this thesis focuses on the feasibility and realization of in-network adaptation on existing home router platforms. In this context the thesis addresses the following six research objectives. Initially, the relevant transport mechanisms for H.264/SVC and their implications for in-network adaptation (1) were analysed. Three different Linux-based router platforms, covering a representative range of residential router devices, were used as the basis for further studies and evaluations. In general these platforms are characterized by rather modest processing capabilities and networking performance. The hardware limitations were identified and quantified in evaluations (2) using both different benchmarks and real network traffic. Their processing power and memory throughput are roughly 10 to 100 times lower than those of a modern desktop PC. Although their application-layer networking performance is not that low, all platforms fail to fully utilize their nominal link capacities of 100 and 1000 Mbps, respectively. Based on these limitations, the thesis proposes a stateful, packet-based adaptation mechanism for scalable video bit streams (3). The approach utilizes the RTP payload format for H.264/SVC and represents a lightweight approach to in-network adaptation on the application layer.
It further meets the important requirements for a media-aware network element (MANE): being signaling-aware and operating statefully. The mechanism was integrated into a proxy service that was deployed on all three platforms to prove its feasibility. Experimental evaluations with different video bit streams in standard-definition quality demonstrate the scalability of the approach (4). The results indicate that the proxy service can adapt up to 16 concurrent video streams, depending on the platform and video bit stream. On two of the three evaluated platforms the proposed approach even makes it possible to handle and adapt high-definition video streams at bit rates around 15 Mbps. In addition to the proposed H.264/SVC-specific adaptation mechanism, the applicability of generic metadata-driven adaptation on home router platforms was also investigated. In particular, a proof-of-concept study of an XML-metadata-driven approach based on the MPEG-21 generic Bitstream Syntax Description (gBSD) was conducted on the platforms (5). In contrast to earlier evaluations on PC-based platforms, the results indicate that this generic adaptation cannot be recommended on such resource-limited network devices. The benefits of in-network adaptation on home router platforms are finally demonstrated in the context of high-definition streaming over IEEE 802.11 wireless networks (6). Monitoring information about queueing delay, which is available exclusively on the router, is used to control the adaptation of the video according to the varying throughput of the wireless link. This allows a timely reaction to changing conditions, particularly with mobile clients.
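The packet-based adaptation step can be illustrated with a small sketch that inspects the SVC layer identifiers (dependency, quality and temporal id) carried in the 3-byte NAL unit header extension defined in H.264 Annex G and used by the RTP payload format of RFC 6190. The thresholding rule below is a simplified stand-in for the proxy's actual decision logic:

```python
# Sketch: thin an SVC stream by dropping packets above a target operation
# point. Assumes each RTP payload carries a single NAL unit.

def svc_layer_ids(nal_unit: bytes):
    """Extract (dependency_id, quality_id, temporal_id) from an SVC NAL unit,
    or None for NAL unit types without the SVC header extension."""
    nal_type = nal_unit[0] & 0x1F
    if nal_type not in (14, 20):          # only types 14/20 carry the extension
        return None
    did = (nal_unit[1] >> 4) & 0x07       # 3-bit dependency_id
    qid = nal_unit[1] & 0x0F              # 4-bit quality_id
    tid = (nal_unit[2] >> 5) & 0x07       # 3-bit temporal_id
    return did, qid, tid

def keep_packet(nal_unit, max_did, max_qid, max_tid):
    """Forward base-layer/non-SVC packets; drop enhancement layers above
    the target (did, qid, tid) operation point."""
    ids = svc_layer_ids(nal_unit)
    if ids is None:
        return True
    did, qid, tid = ids
    return did <= max_did and qid <= max_qid and tid <= max_tid
```

Because the decision only reads three header bytes per packet, such a filter stays cheap enough for the modest CPUs of the router platforms discussed above.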


Advisor(s): Hermann Hellwagner (1. rapporteur), Carsten Griwodz (2. rapporteur)

SIG MM member(s): Hermann Hellwagner, Carsten Griwodz


Multimedia Communication (MMC)

The research group "Multimedia Communication (MMC)" was founded and is being led by Prof. Hermann Hellwagner. In addition, the group currently has three research assistants, five project staff members, and three administrative and technical staff members.

The research activities of the group are in the areas of

  • Multimedia communication and QoS provisioning
  • Adaptation of multimedia content w.r.t. network, device, and usage contexts
  • Use of Scalable Video Coding (SVC) technology in networks and P2P systems
  • Adaptive multimedia applications, e.g., IPTV
  • Standardization within ISO/IEC MPEG
  • Multimedia in disaster management

The focus of the MMC group is clearly on the adaptive delivery of audio-visual content, taking into account, for instance, the fluctuating network and environmental conditions that can occur when users are on the move. In particular, the group is currently investigating the use of Scalable Video Coding (SVC) technology in such networks. The group actively participates in several international and national research projects at all levels, ranging from basic research to application-oriented projects and direct cooperation with industry. In teaching, the MMC group covers the technical courses of the Informatics study programme, such as Computer Organization, Operating Systems, Computer Networks, Servers and Clusters, Internet QoS, and Multimedia Coding.

Jia Li

Learning-based Visual Saliency Computation

With the rapid development of the Internet, the amount of images and videos is growing explosively, leading to many new challenges in image/video processing. On the one hand, the processing capability of computers is limited, and computational resources should be allocated with high priority to important visual information. On the other hand, the analysis results given by computers should be consistent with human cognition. To solve these two problems, this thesis focuses on learning-based visual saliency computation; its main objective can be described as predicting, locating and mining the important visual information that is consistent with human cognition. The main contributions of this thesis can be summarized as follows:

Firstly, this thesis presents a probabilistic multi-task learning approach to computing visual saliency by simultaneously integrating bottom-up and top-down factors. To the best of our knowledge, it is the first approach to explore visual saliency computation with a multi-task learning algorithm. In our approach, the bottom-up and top-down factors are considered simultaneously in a probabilistic framework: a bottom-up component simulates the low-level processes of the human visual system using multi-scale wavelet decomposition, while a top-down component simulates the high-level processes that bias the competition of the input visual stimuli. Moreover, we propose a multi-task learning algorithm to optimize the models and model fusion strategies for various scenes. Extensive experiments on several datasets show that this approach is highly robust and effective in computing visual saliency.
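As a rough illustration of the bottom-up component, a multi-scale Haar decomposition can accumulate detail energy across scales into a saliency map. This NumPy sketch is a simplified stand-in for the thesis's probabilistic model, not its actual implementation:

```python
import numpy as np

def haar_level(img):
    """One level of 2-D Haar decomposition: approximation plus the combined
    energy of the horizontal, vertical and diagonal detail bands."""
    a = (img[0::2, 0::2] + img[0::2, 1::2] + img[1::2, 0::2] + img[1::2, 1::2]) / 4
    h = (img[0::2, 0::2] + img[0::2, 1::2] - img[1::2, 0::2] - img[1::2, 1::2]) / 4
    v = (img[0::2, 0::2] - img[0::2, 1::2] + img[1::2, 0::2] - img[1::2, 1::2]) / 4
    d = (img[0::2, 0::2] - img[0::2, 1::2] - img[1::2, 0::2] + img[1::2, 1::2]) / 4
    return a, np.sqrt(h**2 + v**2 + d**2)

def bottom_up_saliency(img, levels=3):
    """Sum normalized detail energy over several scales as a saliency map.
    Assumes image dimensions divisible by 2**levels."""
    saliency = np.zeros_like(img, dtype=float)
    approx = img.astype(float)
    for _ in range(levels):
        approx, energy = haar_level(approx)
        up = np.kron(energy, np.ones((img.shape[0] // energy.shape[0],
                                      img.shape[1] // energy.shape[1])))
        saliency += up / (up.max() + 1e-9)
    return saliency / levels

img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0                      # a bright square on a dark background
smap = bottom_up_saliency(img, levels=2)
```

Regions with strong multi-scale contrast, such as the square's boundary, receive high energy; uniform regions receive none.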

Secondly, this thesis proposes a cost-sensitive rank learning approach for visual saliency computation. To the best of our knowledge, it is the first approach to formulate visual saliency computation in a rank learning framework. For video datasets with sparse eye fixations, this approach avoids the explicit selection of reliable positive and negative samples; instead, all positive and unlabeled data are directly integrated into a cost-sensitive rank learning framework. Experimental results show that the framework can simultaneously take into account the influences of local visual attributes and pair-wise "target-distractor" correlations, resulting in better performance on video datasets with sparse eye fixations.
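A pairwise rank learning step of this kind can be sketched as a cost-weighted logistic ranking objective over positive (fixated) and unlabeled samples. The learning rule below is an illustrative stand-in, not the thesis's exact formulation:

```python
import numpy as np

def cost_sensitive_rank_train(X_pos, X_unl, costs, epochs=100, lr=0.1):
    """Learn a weight vector w so that fixated (positive) samples outrank
    unlabeled ones, weighting each pair by a misranking cost."""
    w = np.zeros(X_pos.shape[1])
    for _ in range(epochs):
        for i, xp in enumerate(X_pos):
            for xu in X_unl:
                margin = w @ (xp - xu)
                # gradient of the cost-weighted logistic pairwise loss
                g = -costs[i] * (xp - xu) / (1.0 + np.exp(margin))
                w -= lr * g
    return w

# Toy data: the positive sample has feature 0, the unlabeled one feature 1.
X_pos = np.array([[1.0, 0.0]])
X_unl = np.array([[0.0, 1.0]])
w = cost_sensitive_rank_train(X_pos, X_unl, costs=[1.0])
```

Because every positive/unlabeled pair enters the loss, no hard negative labels need to be chosen, which mirrors the motivation stated above for sparse-fixation video data.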

Thirdly, this thesis presents a multi-task rank learning approach for visual saliency computation, in which the problem is formulated in a multi-task rank learning framework to infer multiple saliency models that apply to different scene clusters. In the training process, this approach infers multiple visual saliency models simultaneously; with an appropriate sharing of information across models, the generalization ability of each model is greatly improved. Extensive experiments on eye-fixation datasets show that our approach is highly effective in computing visual saliency in various scenes.

Fourthly, this thesis proposes a novel approach to salient object extraction using complementary saliency maps. A video advertising system is then developed to demonstrate its feasibility. The system consists of two main modules, a pull advertising module and a push advertising module, in which interesting/salient objects are extracted through simple user interactions or complementary saliency maps, respectively. These objects, along with user preferences, are used to provide content-related and user-targeted ads in a minimally intrusive way. In the future, this system will be integrated by Huawei, a well-known telecommunications company, into its intelligent streaming media service products.

In summary, this thesis investigates three important issues in learning-based visual saliency computation, and tentative studies have been carried out on salient object extraction and its application in saliency-based video advertising. To the best of our knowledge, this thesis presents the first systematic study of how to apply machine learning to visual saliency computation, and it demonstrates the feasibility and effectiveness of learning-based visual saliency computation. We expect this work to spark considerable research interest in the related communities in the years to come.

Advisor(s): Wen Gao

SIG MM member(s): Wen Gao



Kalman Graffi

Monitoring and Management of Peer-to-Peer Systems

The peer-to-peer paradigm has seen great success in content distribution and multimedia communication applications on the Internet. In a peer-to-peer network, the participating nodes create an infrastructure that provides a desired functionality and offer their resources to host an application in a distributed manner.

Besides the functional requirements of an application, the non-functional requirements for achieving high service quality are also an important part of successful peer-to-peer networks, and a major challenge is to meet these requirements in networks with unreliable nodes. In contrast to traditional centralized approaches, where quality can be measured and controlled, in a distributed environment it is challenging both to capture the status and performance of the whole distributed system at one point in time and to control its general behavior. In this dissertation, we focus on the monitoring and management of peer-to-peer systems.

We systematically engineer SkyEye.KOM, a fully decentralized monitoring mechanism that provides a precise status snapshot of the peer-to-peer system and enables queries for peer capacities, such as bandwidth or storage, in a large-scale peer-to-peer system. It considers the individual load limits of the peers and ensures that no peer is overloaded. The tree topology at the core of SkyEye.KOM is established and maintained solely with protocol-relevant messages; it is based on local peer identifier calculations and the underlying peer-to-peer overlay. As a second step, we focus on the management of peer-to-peer systems and introduce P3R3O.KOM and SkyNet.KOM, two solutions for managing both the reservation of available capacities in the peer-to-peer system and the system behavior in a fully decentralized and efficient manner. P3R3O.KOM is a peer-to-peer protocol for reliable long-term resource reservation that overcomes the limitations of traditional peer-to-peer services, which are typically hosted only by single peers and cease once the service-providing peer fails. Resource reservations are fulfilled with adjustable guarantees (even 100%) in the presence of strong churn through automated and fully decentralized management of resource provision redundancy. With SkyNet.KOM, we present a fully decentralized approach to automated management of peer-to-peer systems following the principles of autonomic computing. It allows the user or system provider to set service quality goals for the peer-to-peer system, which are automatically verified by the monitoring solution SkyEye.KOM and then analyzed, aligned and enforced by the other components of SkyNet.KOM. Preset quality goals are reached and held through automated, systematic re-configuration of the individual components of the peer-to-peer system.
Finally, we present LifeSocial.KOM, a peer-to-peer-based platform for online social networks that incorporates the proposed monitoring mechanism to show the feasibility and application scope of the monitoring and management solutions.
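One simple way to derive a monitoring tree purely from local identifier calculations, in the spirit of SkyEye.KOM, is to subdivide the identifier space into nested domains and route each peer's metrics towards the coordinator of the parent domain. The branching factor, identifier-space size, and midpoint rule below are illustrative assumptions, not the exact SkyEye.KOM scheme:

```python
# Sketch: locate the coordinator of the level-`level` domain containing a
# peer, in a k-ary subdivision of the overlay's identifier space. A peer at
# level L sends its aggregated metrics to the coordinator at level L - 1,
# which is resolved via a normal overlay lookup (no extra topology messages).

def tree_coordinator(peer_id, level, branching=4, id_space=2**32):
    """Midpoint identifier of the domain at depth `level` that contains
    `peer_id`; the peer responsible for it acts as domain coordinator."""
    width = id_space
    lo = 0
    for _ in range(level):
        width //= branching
        lo += (peer_id - lo) // width * width   # descend into the sub-domain
    return lo + width // 2
```

Because every peer computes its coordinator locally from its own identifier, the tree needs no dedicated maintenance traffic, matching the "solely protocol-relevant messages" property described above.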

The impact of the thesis lies in extending the applicability of the peer-to-peer paradigm to quality-critical applications and scenarios. Through the monitoring approach, a system provider can observe and judge the quality of the peer-to-peer system. With capacity-based peer search, the capacities in a peer-to-peer system can be addressed and used to their full extent, allowing the creation of applications with rich functionality using a wide set of capacities. Through the proposed management mechanisms, these capacities can also be used reliably in the presence of churn to host services, establishing the peer-to-peer paradigm as a serious and reliable alternative to traditional IT architectures. Additionally, through the automated quality control proposed with SkyNet.KOM, quality-controlled peer-to-peer applications can be created and operated despite being hosted on a large-scale network of unreliable nodes. Lastly, peer-to-peer-based online social networks have the potential to become the next large application area for the peer-to-peer paradigm. LifeSocial.KOM is one of the first in this category and presents a viable approach to quality-aware peer-to-peer applications that satisfies the needs of both users and system providers.

Advisor(s): Prof. Dr.-Ing. Ralf Steinmetz (Supervisor), Prof. Carmen Guerrero Lopez, Ph.D. (Referee)

SIG MM member(s): Prof. Dr.-Ing. Ralf Steinmetz


Multimedia Communications Lab (KOM), TU Darmstadt, Germany

The Multimedia Communications Lab (KOM) is led by Prof. Dr.-Ing. Ralf Steinmetz and works towards the vision of seamless communications, whereby people worldwide, independent of their location and of the end systems and devices they use, are able to communicate and work with each other efficiently and effectively.

To reach this goal, KOM works on mechanisms for the realization of QoS, security, adaptivity and context-awareness in systems and networks. We address in particular networks (e.g. P2P and mobile networks), communication services (e.g. IP-based communication services), IT architectures (e.g. service-oriented architectures), and media content (media for information and knowledge sources, and community applications).

Application areas for the research at KOM are, for example, E-Business, E-Finance, and E-Learning.

Razib Iqbal

An Architecture for Federated Video Processing and Online Streaming

Today, video is accessible via numerous multimedia-enabled devices over a wide variety of network types. What is required is a mechanism to ensure that users can receive video at a quality proportional to their device capabilities and network conditions. In this thesis, we propose an online adaptive video streaming approach that uses the Peer-to-Peer (P2P) paradigm not only to distribute the content using peers' bandwidth, but also to adapt the video using peers' processing power, while taking into account receiver heterogeneity, watermarking, and perceptual encryption.

The proposed adaptive video streaming architecture aims at online video adaptation with streaming in P2P overlays to serve heterogeneous devices, including small handhelds. Participating peers therefore contribute both bandwidth and CPU power. We used the MPEG-21 generic Bitstream Syntax Description (gBSD) as the content metadata format and implemented a 3-in-1 adaptation-watermarking-encryption system for compressed-domain adaptation of video in a P2P fashion. Simulation is used to demonstrate that the design is robust, reliable, and suitable for multi-participant real-time collaboration and real-life deployment. System performance is validated against an analytical model also developed in the thesis.

The specific contributions made in this thesis are:

  • A P2P adaptive streaming architecture supporting simultaneous adaptation and streaming of video contents:
    • The adaptive video streaming architecture utilizes content metadata and compressed-domain video processing techniques to meet real-time video adaptation needs.
  • A mathematical model for the adaptive video streaming design to find the optimum solution at a given time:
    • The model is formulated as a Linear Programming problem; it considers the relationships among all parameters that affect the efficiency of the streaming and computes the trade-offs between service fairness and system efficiency.
  • A taxation-based minimum contribution scheme:
    • Minimum contribution requirement ensures that resources allocated to serve a peer are commensurate with that peer's contribution rate.
    • A fairness constraint is also introduced to ensure maximum service responsiveness by distributing service equally among all participating peers.
  • A compressed-domain spatial and temporal video adaptation scheme:
    • Joint spatiotemporal adaptations are evaluated to observe the real-time performance of the proposed compressed-domain adaptation mechanism.
  • A digital watermarking based authentication scheme.
  • A perceptual encryption scheme:
    • Both the authentication and encryption schemes can be operated in an intermediary node along with the adaptation operations. The encryption scheme is also spatiotemporal adaptation resilient.
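The taxation-based contribution idea in the list above can be sketched as splitting the system's service capacity into an equally shared part (the "tax") and a part allocated proportionally to each peer's contribution. The tax rate and allocation rule here are hypothetical illustrations, not the thesis's actual model:

```python
# Sketch: taxation-style bandwidth allocation. A fraction `tax_rate` of the
# capacity is shared equally among all peers (so even free riders get a
# minimum), the remainder is split proportionally to contribution.

def allocate_bandwidth(contributions, capacity, tax_rate=0.3):
    """Return each peer's service allocation given its contribution rate."""
    n = len(contributions)
    equal_share = tax_rate * capacity / n
    total = sum(contributions) or 1        # avoid division by zero
    return [equal_share + (1 - tax_rate) * capacity * c / total
            for c in contributions]

shares = allocate_bandwidth([0.0, 100.0], capacity=100.0, tax_rate=0.3)
```

Raising `tax_rate` pushes the system towards equal service distribution (the fairness constraint), while lowering it rewards contribution more strongly (system efficiency), which is exactly the trade-off the thesis's Linear Programming model computes.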

Advisor(s): Shervin Shirmohammadi (supervisor), Mohamed Hefeeda (external examiner), Abdulmotaleb El Saddik (internal examiner)

SIG MM member(s): Shervin Shirmohammadi


Distributed and Collaborative Virtual Environment Research Lab (DISCOVER Lab), University of Ottawa, Canada

Research at the DISCOVER Lab is directed towards the enhancement of next generation human-human communication through advanced multimedia technology and virtual environments. Through our many projects, we are developing new ideas and technology that will make easy-to-use virtual environments a reality. Research projects at the DISCOVER lab typically fall into the following categories:


  • Networked Games and Collaborative Virtual Environments
  • Multimedia Systems and Applications
  • 3D Physical Modelling and Animation
  • Intelligent Sensor Networks and Ubiquitous Computing
  • Haptics and Teleoperation
  • Multimedia-Assisted Biomedical Engineering
