SIGMM Records

March 2009 PhD thesis abstracts


Birgit Zimmermann

Pattern-based Process Description and Process Support: A Tool to Support Processes for the Adaptation of E-Learning Material

Creating high-quality E-Learning material is a time-consuming and costly task. Reusing existing material could reduce these costs. Often, however, one-to-one reuse of the existing material is not possible, because the new usage scenario differs from the original one. If, for example, a learning resource created for company A has to be reused in company B, the layout and the terminology are likely to differ. In this case the learning resource has to be adapted to the new usage scenario.

Learning resources are predominantly created by domain experts, who have expertise in the domain the learning resource deals with. These persons are often not experts in adapting learning resources to new usage scenarios, so it is difficult for them to perform the adaptation processes. There are two main reasons for this: first, many file formats are used within learning resources; second, there are many different kinds of adaptations (layout, language, didactics, etc.). Special knowledge is therefore needed to perform adaptation processes. Consequently, when developing software to support adaptation processes, it is important to integrate the knowledge of adaptation experts into the development process, as they know how the processes are performed and which functionalities are required to support users in performing them.

In reality, however, experts in performing adaptation processes are rarely involved in the design of new software. Usually, process experts are consulted about the processes they perform, but the design and development of the software are done by software designers and developers. As a result, the software often reflects how these persons understand the processes, which in many cases differs from how a process expert understands and performs them.

In this thesis, a concept is proposed that allows experts in performing adaptation processes to be involved more directly in the development of software for supporting adaptation processes. In this way, a tool can be developed that contains the knowledge of process experts and supports other persons in performing the processes. The developed concept consists of three steps. The first step offers process experts the option to describe the adaptation processes. Relying on this description, a software prototype can be created in the second step. In the third step, this prototype serves as a basis for developing a functional tool for supporting adaptation processes.

In the first step, a pattern-based notation formalism is applied to describe the adaptation processes. This formalism has the advantage of using natural language. It is therefore easy to understand and easy to learn, even for process experts who have no knowledge of process modelling. Moreover, patterns have proven suitable for documenting expert knowledge on how to perform certain tasks. Additionally, an input tool has been developed that supports process experts in creating the pattern-like process descriptions.

The input tool creates a machine-readable XML representation of the descriptions provided by the process experts. In the second step, this representation serves as input for a second tool that allows process experts to generate a software prototype based on their process descriptions. With this prototype it is possible to check whether the adaptation processes are supported in a correct and desired way. If a prototype does not match the experts' understanding of the processes, the process descriptions can be adapted and a new prototype can be generated. This procedure can be repeated until the prototype meets the expectations of the process experts.

The prototype developed with this procedure reflects the experts' understanding of the adaptation processes. It thus serves as a valuable basis for further development. In the third step, it can be passed on to a developer, who can add automated functionalities to the prototype. In this way, a fully functional tool to support adaptation processes is developed. This tool guides other persons through the adaptation processes and supports them based on the knowledge of experts. Whenever possible, automated functionalities are offered for the convenience of the user. As a result, even persons who are not experts in performing adaptation processes are supported in adapting existing learning resources to new usage scenarios.

To evaluate the concepts and tools developed in this thesis, a user test was conducted. It showed that the concepts presented here enable process experts to describe the processes they perform and to create a prototype that represents these processes. In addition, a second user test showed that the tool for adaptation processes, which was created based on the discussed concepts, improves the support for adaptation processes: users make fewer errors, the adaptation takes less time, and user satisfaction is higher than among users working with a conventional tool.

Advisor(s): Ralf Steinmetz (first examiner), Jörg M. Haake (second examiner)

SIG MM member(s): Ralf Steinmetz


KOM - Multimedia Communications Lab

The Multimedia Communications Lab at the Department of Electrical Engineering and Information Technology at TUD is headed by Prof. Dr.-Ing. Ralf Steinmetz (Adjunct Professor of the Department of Computer Science). The Multimedia Communications Lab pursues the vision of seamless multimedia communication. Seamless multimedia communication has the potential to create a future where people from all over the world live, collaborate, and communicate independent of geographical constraints. The communication systems that support this collaboration have to be performant, dependable, secure, and adaptable to user requirements.

The lab works on different Research Areas towards this vision:

Communication Services

IT Architectures

Knowledge Media

Mobile Networking

Network Mechanisms & QoS

Peer-to-Peer Networking

Ubiquitous Computing

Networked Gaming

IT for Mobility and Logistics

Effrosyni Kokiopoulou

Geometry-aware analysis of high-dimensional visual information sets

Over the past few decades we have been experiencing a data explosion; massive amounts of data are increasingly collected, and multimedia databases such as YouTube and Flickr are rapidly expanding. At the same time, rapid technological advancements in mobile devices and vision sensors have led to the emergence of novel multimedia mining architectures. These produce even more multimedia data, which are possibly captured under geometric transformations and need to be efficiently stored and analyzed. It is also common in such systems that data are collected in a distributed manner. This poses great challenges for the design of effective methods for analysis and knowledge discovery from multimedia data. In this thesis, we study various instances of the problem of classifying visual data from the viewpoint of these modern challenges. Roughly speaking, classification corresponds to the problem of assigning an observed object to a particular class (or category), based on previously seen examples. We address important issues related to classification, namely flexible data representation for joint coding and classification, robust classification in the case of large geometric transformations, and classification with multiple object observations in both centralized and distributed settings.

We start by identifying the need for flexible data representation methods that are efficient in both storage and classification of multimedia data. Such flexible schemes offer the potential to significantly impact the efficiency of current multimedia mining systems, as they permit the classification of multimedia patterns directly in their compressed form, without the need for decompression. We propose a framework, called semantic approximation, which is based on sparse data representations. It formulates dimensionality reduction as a matrix factorization problem, under hybrid criteria that are posed as a trade-off between approximation for efficient compression and discrimination for effective classification. We demonstrate that the proposed methodology competes with state-of-the-art solutions in image classification and face recognition, implying that compression and classification can be performed jointly without performance penalties with respect to expensive disjoint solutions.

Next, we allow the multimedia patterns to be geometrically transformed and we focus on transformation invariance issues in pattern classification. When a pattern is transformed, it spans a manifold in a high-dimensional space. We focus on the problem of computing the distance of a certain test pattern from the manifold, which is also closely related to the image alignment problem. This is a hard non-convex problem that has only been sub-optimally addressed before. We represent transformation manifolds based on sparse geometric expansions, which results in a closed-form representation of the manifold equation with respect to the transformation parameters. When the transformation consists of a synthesis of translations, rotations and scalings, we prove that the objective function of this problem can be decomposed as a difference of convex functions (DC). This property allows us to solve our optimization problem optimally with a cutting plane algorithm, which is well known to successfully find the global minimizer in practice. We showcase applications in robust face recognition and image alignment.
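For readers unfamiliar with the term, the DC property mentioned above can be written out as follows. The notation is illustrative and is not taken from the thesis: \eta stands for the transformation parameters, p for the test pattern, and s(\eta) for the transformed reference pattern, and the squared Euclidean distance is only one natural way to state the manifold distance objective.

```latex
% Illustrative notation (an assumption, not the thesis's own symbols).
% Manifold distance: find the transformation parameters \eta that bring
% the transformed pattern s(\eta) closest to the test pattern p.
\[
  \min_{\eta} \; f(\eta), \qquad f(\eta) = \lVert p - s(\eta) \rVert_2^2
\]
% f is a DC function if it can be written as a difference of two
% convex functions g and h:
\[
  f(\eta) = g(\eta) - h(\eta), \qquad g,\ h \ \text{convex}
\]
```

Such a decomposition is what makes global optimization methods for DC programs, such as cutting plane algorithms, applicable even though f itself is non-convex.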

The classification problem with multiple observations is addressed next. Multiple observations are typically produced in practice when an object is observed over successive time instants or under different viewpoints. In particular, we focus on the problem of classifying an object when multiple geometrically transformed observations of it are available. These multiple observations typically belong to a manifold, and the classification problem consists in determining which manifold the observations belong to. We show that this problem can be viewed as a special case of semi-supervised learning, where all unlabelled examples belong to the same class. We design a graph-based algorithm, which exploits the structure of the manifold. Estimating the unknown object class then results in a discrete optimization problem that can be solved efficiently. We show the performance of our algorithm in classification of multiple handwritten digit images and in video-based face recognition.

Next, we study the problem of classification of multiple observations in distributed scenarios, such as camera networks. In this case the classification is performed iteratively and in a distributed fashion, without a central coordinator node. The main goal is to reach a global classification decision using only local computation and communication, while ensuring robustness to changes in network topology. We propose to use consensus algorithms in order to design a distributed version of the aforementioned graph-based algorithm. We show that the distributed classification algorithm performs similarly to its centralized counterpart, provided that the training set is sufficiently large. Finally, we delve further into the convergence properties of consensus-based distributed algorithms and we propose an acceleration methodology for fast convergence that uses the memory of the sensors. Our simulations show that the convergence is indeed accelerated in both static and dynamic networks, and that distributed classification in sensor networks can significantly benefit from this acceleration.
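To give a feel for what a consensus algorithm does, the following is a minimal sketch of standard average consensus on a small static network. It is not the thesis's distributed classifier or its memory-based acceleration scheme; the ring topology, the Metropolis weighting rule, the function names, and the node values are all assumptions chosen for illustration. Each node repeatedly replaces its value with a weighted average of its own value and its neighbours' values, and all nodes converge to the global mean without any central coordinator.

```python
import numpy as np


def metropolis_weights(adjacency):
    """Build a symmetric, doubly stochastic weight matrix from a 0/1 adjacency matrix.

    Metropolis weights are one common choice for average consensus;
    they are an assumption here, not the weighting used in the thesis.
    """
    n = adjacency.shape[0]
    degrees = adjacency.sum(axis=1)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j]:
                weights[i, j] = 1.0 / (1.0 + max(degrees[i], degrees[j]))
        # The self-weight makes each row sum to one.
        weights[i, i] = 1.0 - weights[i].sum()
    return weights


def average_consensus(values, weights, iterations):
    """Run x <- W x for a fixed number of iterations using only neighbour exchanges."""
    x = np.asarray(values, dtype=float).copy()
    for _ in range(iterations):
        x = weights @ x
    return x


# Hypothetical example: five sensors connected in a ring.
n_nodes = 5
adjacency = np.zeros((n_nodes, n_nodes), dtype=int)
for i in range(n_nodes):
    adjacency[i, (i + 1) % n_nodes] = 1
    adjacency[i, (i - 1) % n_nodes] = 1

local_values = [1.0, 2.0, 3.0, 4.0, 10.0]
weights = metropolis_weights(adjacency)
result = average_consensus(local_values, weights, iterations=100)
print(result)  # every entry approaches the mean of local_values
```

The number of iterations needed depends on the network topology through the second-largest eigenvalue magnitude of the weight matrix, which is why acceleration schemes such as the one proposed in the thesis matter in practice.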

Overall, the present thesis addresses a few important issues related to pattern analysis and classification in modern multimedia systems. Our solutions for semantic approximation and transformation invariance can impact the efficiency and robustness of classification in multimedia systems. Furthermore, our graph-based framework for multiple observations is able to perform effective classification in both centralized and distributed environments. Finally, our fast consensus algorithms can significantly contribute to the accelerated convergence of distributed classification algorithms in sensor networks.

Advisor(s): Pascal Frossard

SIG MM member(s): Pascal Frossard


Signal Processing Laboratory (LTS4) - EPFL

Eunyee Koh

Representing Information Collections for Visual Cognition

The importance of digital information collections is growing. Collections are typically represented as text only, in a linear list format, which turns out to be a weak representation for cognition. We learned this from empirical research in cognitive psychology, and by conducting a study to develop an understanding of current practices and resulting breakdowns in human experiences of building and utilizing collections. Because of limited human attention and memory, participants had trouble finding specific elements in their collections, resulting in low levels of collection utilization. To address these issues, this research develops new collection representations for visual cognition. First, we present the image+text surrogate, a concise representation for a document, or portion thereof, which is easy to understand and think about. An information extraction algorithm is developed to automatically transform a document into a small set of image+text surrogates. After refinement, the average accuracy of the algorithm was 90%. Then, we introduce the composition space to represent collections, which helps people connect elements visually in a spatial format. To ensure that diverse information from multiple sources is presented evenly in the composition space, we developed a new control structure, the ResultDistributor. A user study has demonstrated that the participants were able to browse more diverse information using the ResultDistributor-enhanced composition space. Participants also found it easier and more entertaining to browse information in this representation. This research is applicable to representing information resources in contexts such as search engines or digital libraries. The better representation will enhance the cognitive efficacy and enjoyment of people's everyday tasks of information searching, browsing, collecting, and discovering.

Advisor(s): Andruid Kerne (Chair of Committee)

SIG MM member(s): Eunyee Koh, Andruid Kerne


Interface Ecology Lab

The Interface Ecology Lab invents new media and information forms by exploring new relationships between human beings and technology. We instantiate interface ecosystems by connecting sensation, computing, media, semantics, and cultures. We develop digital systems, environments, components, and compositions to elevate the role of human creativity and expression.

Kristopher West

Novel Techniques for Audio Music Classification and Search

This thesis presents a number of modified or novel techniques for the analysis of music audio for the purposes of classifying it according to genre or implementing so-called 'search-by-example' systems, which recommend music to users and generate playlists and personalised radio stations. Novel procedures for the parameterisation of music audio are introduced, including an audio event-based segmentation of the audio feature streams and methods of encoding rhythmic information in the audio signal. A large number of experiments are performed to estimate the performance of different classification algorithms when applied to the classification of various sets of music audio features. The experiments show differing trends regarding the best performing type of classification procedure to use for different feature sets and segmentations of feature streams.

A novel machine learning algorithm (MVCART), based on the classic Decision Tree algorithm (CART), is introduced to more effectively deal with multi-variate audio features and the additional challenges introduced by event-based segmentation of audio feature streams. This algorithm achieves the best results on the classification of event-based music audio features and approaches the performance of state-of-the-art techniques based on summaries of the whole audio stream.

Finally, a number of methods of extending music classifiers, including those based on event-based segmentations and the MVCART algorithm, to build music similarity estimation and search procedures are introduced. Conventional methods of audio music search are based solely on music audio profiles, whereas the methods introduced allow audio music search and recommendation indices to utilise cultural information (in the form of music genres) to enhance or scale their recommendations, without requiring this information to be present for every track. These methods are shown to yield very significant reductions in computational complexity over existing techniques (such as those based on the KL-Divergence) whilst providing a comparable or greater level of performance. Not only does the significantly reduced complexity of these techniques allow them to be applied to much larger collections than the KL-Divergence, but they also produce metric similarity spaces, allowing the use of standard techniques for the scaling of metric search spaces.
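The remark about metric similarity spaces rests on the fact that the KL divergence is not a metric: among other things, it is not symmetric. The short sketch below illustrates this with two made-up discrete distributions. It is only an illustration under that assumption; the distributions are invented, and existing KL-based music similarity systems typically compare continuous audio feature models rather than small discrete histograms.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical "profiles" of two tracks over three categories.
p = [0.8, 0.1, 0.1]
q = [0.4, 0.3, 0.3]

forward = kl_divergence(p, q)
backward = kl_divergence(q, p)

# The two directions disagree, so KL divergence cannot serve as a distance metric.
print(forward, backward)
```

Because index structures for metric spaces rely on symmetry and the triangle inequality, a similarity measure that does satisfy the metric axioms can use those standard scaling techniques, which is the advantage the abstract points to.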

Advisor(s): Prof. Stephen Cox (Supervisor), Dr. Ben Milner (Internal Examiner), Dr. Josh Reiss (External Examiner)

SIG MM member(s): Kris West

