We've assembled the finest group of tutorials for this year's ACM Multimedia. We have something for everybody, from new students just starting work in multimedia to the most seasoned researcher. Our presenters have worked hard to distill the essence of their topics into a half-day tutorial. Spend the day learning everything you need to think and work in these areas.

The tutorials are grouped into four areas:

We hope you will enjoy these tutorials. We think there is something for everyone!

Machine Learning

How do we reason about multimedia data? The most common tools for such reasoning are statistical: Bayesian methods provide the basic approach and are widely used. When statistics alone are not enough, people are added to the loop so that their judgments can actively improve the system's performance.

Tutorial: Bayesian Methods for Multimedia Signal Processing
Presenter: A. Taylan Cemgil

In recent years, there has been significant growth in multimedia information processing applications that employ ideas from statistical machine learning and probabilistic modeling. In this paradigm, multimedia data (music, audio, video, images, text, ...) are viewed as realizations of highly structured stochastic processes. Once a model is constructed, many interesting problems, such as transcription, coding, classification, restoration, tracking, source separation, and resynthesis, can be formulated as Bayesian inference problems. In this context, graphical models provide a "language" for constructing models that quantify prior knowledge. Unknown parameters in this specification are estimated by probabilistic inference. Often, however, the problem size poses an important challenge, and to render the approach feasible, specialized inference methods must be tailored to improve computational speed and efficiency.

The scope of the proposed tutorial is as follows: First, we will review the fundamentals of probabilistic models, with some focus on music, video and text data. Then, we will discuss the numerical techniques for inference in these models. In particular, we will review exact inference, approximate stochastic inference techniques such as Markov Chain Monte Carlo, Sequential Monte Carlo and deterministic (variational) inference techniques. Our ultimate aim is to provide a basic understanding of probabilistic modeling for multimedia processing, associated computational techniques and a roadmap such that information retrieval researchers new to the Bayesian approach can orient themselves in the relevant literature and understand the current state of the art.
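To make the flavor of these stochastic inference techniques concrete, here is a toy sketch (our own illustration, not material from the tutorial): a Metropolis sampler, the simplest Markov Chain Monte Carlo method, estimating the posterior mean of a Gaussian with unknown mean under a broad Gaussian prior. Only the Python standard library is used.

```python
import math
import random

def log_post(mu, data, prior_var=100.0):
    # log p(mu | data) up to a constant: N(0, prior_var) prior, N(mu, 1) likelihood
    lp = -mu * mu / (2.0 * prior_var)
    lp += -sum((x - mu) ** 2 for x in data) / 2.0
    return lp

def metropolis(data, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis: propose mu' ~ N(mu, step), accept with
    probability min(1, p(mu' | data) / p(mu | data))."""
    rng = random.Random(seed)
    mu = 0.0
    samples = []
    for _ in range(n_iter):
        prop = mu + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_post(prop, data) - log_post(mu, data):
            mu = prop
        samples.append(mu)
    burn = n_iter // 5                      # discard burn-in
    return sum(samples[burn:]) / len(samples[burn:])

if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(2.0, 1.0) for _ in range(50)]
    print(round(metropolis(data), 2))       # posterior mean, near the sample mean
```

Exact inference is trivial for this conjugate toy model; the point is only the mechanics of the sampler, which carry over to the structured models where exact inference is intractable.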

Presenter’s biography: A. Taylan Cemgil received his B.Sc. and M.Sc. in Computer Engineering from Bogazici University, Turkey, and his Ph.D. (2004) from Radboud University Nijmegen, the Netherlands, with a thesis entitled Bayesian music transcription. Between 2003 and 2005, he worked as a postdoctoral researcher at the University of Amsterdam on vision-based multi-object tracking. He is currently a research associate at the Signal Processing and Communications Lab, University of Cambridge, UK, where he cultivates his interests in machine learning methods, stochastic processes and statistical signal processing. His research is focused on developing computational techniques for audio, music and multimedia processing.


Tutorial: Active Learning for Multimedia Indexing and Retrieval
Presenter: Dr. Georges Quénot

Active learning improves the performance of classification or search systems by adding humans to the loop. It is useful for optimizing the annotation of large corpora, whose labeled data can then drive efficient supervised learning. This tutorial responds to a strong need for integrating the technique into multimedia indexing and retrieval systems. It will present the basics of active learning and give the information necessary for quickly and efficiently integrating it within a project.

Supervised learning consists of training a system from sets of positive and negative examples. The learning system is composed of feature extractors, classifiers and fusion modules. Its performance depends not only on implementation choices but also, strongly, on the quantity and quality of the training examples. While it is quite easy and cheap to gather large amounts of raw data, it is usually very costly to have them annotated, because this requires human assessment of the “ground truth”. Since the volume of data that can be manually annotated is limited by the cost of that intervention, it pays to select the samples to be annotated so that their annotations are as useful as possible. Deciding which samples will be the most useful is not trivial. Active learning is an approach in which the system predicts how useful labeling a given sample would be. The whole process operates as a virtuous circle: the system gets better and better as new samples are annotated. This approach is a particular case of incremental learning, in which a system is trained several times with a growing set of samples.
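The annotate-retrain loop described above can be sketched in a few lines. The following is a self-contained toy (not code from the tutorial): a one-dimensional threshold classifier repeatedly queries the unlabeled sample closest to its current decision boundary, i.e. uncertainty sampling, with the human annotator simulated by an oracle function.

```python
import random

def train(labeled):
    # fit a 1-D threshold classifier: midpoint between the two classes
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (max(neg) + min(pos)) / 2.0

def most_uncertain(pool, threshold):
    # uncertainty sampling: the unlabeled point closest to the boundary
    return min(pool, key=lambda x: abs(x - threshold))

def active_learn(pool, oracle, seed_labeled, n_queries=10):
    labeled = list(seed_labeled)
    pool = list(pool)
    for _ in range(n_queries):
        t = train(labeled)
        x = most_uncertain(pool, t)
        pool.remove(x)
        labeled.append((x, oracle(x)))   # costly human annotation, simulated
    return train(labeled)

if __name__ == "__main__":
    rng = random.Random(1)
    oracle = lambda x: int(x > 3.0)      # hidden ground truth
    pool = [rng.uniform(0.0, 6.0) for _ in range(200)]
    t = active_learn(pool, oracle, [(0.5, 0), (5.5, 1)])
    print(round(t, 2))                   # boundary estimate, close to 3.0
```

With only ten queries the learner homes in on the true boundary, whereas labeling ten random points would typically leave it far less precise; this is exactly the annotation-cost argument made above.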

The tutorial will include two parts. The first part is an introduction describing the principles, the history and the main applications of active learning. Active learning is often used offline, typically in the case of classification based on supervised learning. It can also be used online when a system is dynamically modeling a user’s need via relevance feedback.
The second part of the tutorial is a detailed analysis of active learning applied to semantic video indexing, the high-level feature (concept) extraction task of the NIST TRECVID evaluation campaign. I will analyze the effect of the following parameters: the active learning strategy, the fraction of the training set that is annotated, the active learning step size, the difficulty or frequency of the targeted concepts, the total size and variety of the training set, the precision-versus-recall compromise, and the combination of strategies. A particular focus will be the optimal creation of annotations for a training corpus, illustrated by the collaborative annotation of the TRECVID 2007 development set, which will be carried out within an active learning process. A global conclusion will summarize the important points and options, synthesize the experience acquired in the field, and indicate the questions that remain open.

Presenter’s biography: Dr. Georges Quénot is a researcher at CNRS (French National Centre for Scientific Research). He holds an engineering diploma from the French Polytechnic School (1983) and a PhD in computer science (1988) from the University of Orsay. He is currently with the Multimedia Information Indexing and Retrieval group (MRIM) of the Laboratoire d'informatique de Grenoble (LIG), where he is responsible for its activities on video indexing and retrieval. His current research concerns semantic indexing of image and video documents using supervised learning, networks of classifiers and multimodal fusion. Since 2001, he has participated in the NIST TRECVID evaluations on shot segmentation, story segmentation, concept indexing and search tasks. He is organizing the collaborative annotation of the TRECVID 2007 development data using an active learning approach.

Emerging Technologies

Tutorial: Digital Inpainting
Presenter: Timothy K. Shih

Digital inpainting has been an active research topic in multimedia computing and image processing since 2000. This tutorial will cover the most recent contributions in image inpainting / image completion, video inpainting, and 3-D surface completion. In the literature, the first aim of image inpainting was to remove damaged portions of aged photos by completing the area with surrounding or global information. The techniques used include the analysis and use of pixel properties in the spatial and frequency domains. Image inpainting techniques were subsequently applied to object removal (or image completion) in photos, and several strategic algorithms were developed based on confidence values and priorities for selecting patches. The still-image techniques were then extended to video inpainting, which must also consider temporal properties such as motion vectors. With a reasonable combination of object tracking and image completion, objects in video can be removed and possibly replaced. Aged films, on the other hand, contain two types of defects, spikes and long vertical lines, which must be precisely detected and removed to restore the original film. In addition, image completion techniques can repair incomplete 3-D scans caused by improper scanner placement or other factors. This tutorial will discuss the above techniques as found in more than 30 top papers. Beyond an overview, the tutorial will be divided into four sections: image inpainting, video inpainting, 3-D surface inpainting, and a discussion of inpainting projects developed by the presenter. Demonstrations will be included, and the presenter will distribute a DVD-ROM containing several interesting computer programs for testing inpainting results.
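As a crude illustration of the idea (a simplified sketch of our own, not one of the algorithms surveyed in the tutorial), the snippet below fills masked pixels by repeatedly averaging their known neighbours, diffusing the surrounding information inward; the published PDE- and patch-based schemes are far more sophisticated.

```python
def inpaint(img, mask, n_iter=200):
    """Fill masked pixels by iteratively averaging their 4-neighbours.

    img:  2-D list of floats; mask: 2-D list, True where the pixel is missing.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for _ in range(n_iter):
        nxt = [row[:] for row in out]
        for i in range(h):
            for j in range(w):
                if mask[i][j]:
                    nbrs = [out[x][y]
                            for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                            if 0 <= x < h and 0 <= y < w]
                    nxt[i][j] = sum(nbrs) / len(nbrs)
        out = nxt
    return out

if __name__ == "__main__":
    # a flat grey image with one damaged pixel (the "scratch")
    img = [[0.5] * 5 for _ in range(5)]
    img[2][2] = 0.0
    mask = [[False] * 5 for _ in range(5)]
    mask[2][2] = True
    restored = inpaint(img, mask)
    print(round(restored[2][2], 3))  # → 0.5, recovered from the neighbourhood
```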
In the past, the tutorial speaker has delivered more than 20 keynote/plenary talks and 16 tutorials (including those at ACM Multimedia, IEEE ICME, IEEE PCM, MMM, and DMS).

Presenter’s biography: Dr. Shih is a professor in the Department of Computer Science and Information Engineering at Tamkang University, Taiwan, and an adjunct professor at National Tsing Hua University, Taiwan. He is a member of ACM. As a senior member of IEEE, Dr. Shih joined the Educational Activities Board of the Computer Society. His current research interests include multimedia computing and distance learning. He joined the faculty of the Computer Engineering Department at Tamkang University in 1986. In 1993 and 1994, he was a part-time faculty member in the Computer Engineering Department at Santa Clara University. He was also a visiting professor at the University of Aizu, Japan, in summer 1999, a visiting researcher at the Academia Sinica, Taiwan, in summer 2001, and is a visiting professor at the City University of Hong Kong in summer 2007. Dr. Shih is adjunct faculty at Xidian University, Huazhong University of Science and Technology, and Beijing Jiaotong University, China. He has edited many books, published over 380 papers and book chapters, and participated in many international academic activities, including the organization of more than 50 international conferences and several special issues of international journals. He is the founder and co-editor-in-chief of the International Journal of Distance Education Technologies, published by Idea Group Publishing, USA. Dr. Shih is an associate editor of ACM Transactions on Internet Technology and was an associate editor of IEEE Transactions on Multimedia. He has received many research awards, including research awards from the National Science Council of Taiwan, the IIAS research award (Germany), the HSSS award (Greece), the Brandon Hall award (USA), and several best paper awards from international conferences. He has also received many funded research grants from both domestic and international agencies. Dr. Shih has been invited to give more than 20 keynote speeches and plenary talks at international conferences, tutorials at IEEE ICME 2001/2006 and ACM Multimedia 2002/2007, and talks at international conferences and overseas research organizations. Publications, demonstrations and contact information for Dr. Shih can be found at http://www.mine.tku.edu.tw/chinese/teacher/tshih.htm.


Tutorial: Large Data Methods for Multimedia
Presenters: Michael Casey and Frank Kurth

This tutorial will describe techniques for tackling the large databases that are so common on the Internet today: there are more than 2 million songs in the commercial music catalog and over 300 million images in a photo service like Flickr. How can we find the music, videos or images we want? How can we organize such large collections: find duplicates, create links between similar documents, and extract the semantic structure of complex audiovisual documents? Conventional methods for handling large data sets, such as hashing, get us part of the way, but they do not extend or scale to similarity-based matching and retrieval in large audiovisual document collections. Conversely, elaborate methods from multimedia retrieval are available for semantic document analysis, but these do not scale to data sets with millions of items. Instead, new classes of algorithms, combining the best of large-data methods and semantic analysis, are needed to handle large multimedia databases. Innovative methods such as locality-sensitive hashing, based on randomized probes, are the new workhorses.
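A minimal sketch of the locality-sensitive hashing idea mentioned above (our own illustration, not code from the tutorial): random-hyperplane hashing for cosine similarity, where each hash bit records the sign of a random projection. Vectors with a small angle between them agree on most bits and so collide in the same bucket with high probability, while dissimilar vectors do not.

```python
import random

def make_hash(dim, n_bits, seed=0):
    """Random-hyperplane LSH: each bit is the sign of one random projection."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def h(v):
        # one bit per hyperplane: which side of it does v fall on?
        return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0)
                     for p in planes)
    return h

if __name__ == "__main__":
    h = make_hash(dim=8, n_bits=16)
    v = [1, 0, 2, 0, 1, 3, 0, 1]
    near = [x + 0.01 for x in v]             # near-duplicate of v
    far = [-x for x in v]                    # points the opposite way
    same = sum(a == b for a, b in zip(h(v), h(near)))
    diff = sum(a == b for a, b in zip(h(v), h(far)))
    print(same, diff)  # the near-duplicate agrees on far more bits
```

In a real system the bit-tuples serve as bucket keys in an ordinary hash table, so a similarity query touches only the handful of items sharing a bucket instead of all of the millions in the collection.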

This tutorial will cover methods for multimedia retrieval on large document collections. Starting with image and audio retrieval, it will describe both the theory (i.e., randomized algorithms for hashing) and the implementation details (how do you store hash values for 2 million songs?). A special focus will be on how to combine large data methods with semantically meaningful descriptors in order to facilitate efficient similarity-based retrieval. Besides image and audio, the tutorial will cover 3d motion and video retrieval.

Presenters' biographies: Michael Casey (PhD MIT 1998) is a professor at Goldsmiths College, University of London. He conducted his doctoral research at the MIT Media Lab's Music-Mind-Machine group. His research explores new approaches to computing as a creative medium and advanced computational methods for organising large multimedia collections to support digital humanities research. He is an editor of the MPEG-7 International Standard for Multimedia Content Description (ISO 15938-4 Audio, 2002), a standard for automatic organisation of multimedia databases. Michael is also a composer and artist and has received a number of international awards for his works in digital media.

Frank Kurth studied computer science and mathematics at Bonn University, Germany, where he received a Master's degree in computer science in 1997 and the degree of Doctor of Natural Sciences (Dr. rer. nat.) in 1999. Since his Habilitation (postdoctoral lecture qualification) in computer science in 2004, he has held the title of Privatdozent and teaches at Bonn University. He is currently with the Communication Systems group at FGAN-FKIE in Wachtberg-Werthhoven, Germany. His research interests include audio signal processing, fast algorithms, multimedia information retrieval, and digital libraries for multimedia documents. Particular fields of interest are music information retrieval, fast content-based retrieval, and bioacoustical pattern matching.


Tutorial: MPEG Multimedia Standards: Evolution and Future Developments
Presenter: Fernando Pereira

Multimedia communications play a growing role in the everyday life of modern societies. Until recently, and except for broadcast television and radio, voice was the sole communication mechanism. However, the diffusion of digital processing algorithms and hardware has brought images, music, and video into everyday life. The availability of open standards (such as JPEG, MPEG-X Audio and Video, and H.26X) has had a major impact on this progression, notably because of the easy interoperability they provide. Such standards have made the creation and communication of (digital) data aimed at our most important senses, sight and hearing, simple, inexpensive and commonplace. Over time, multimedia standards have addressed a growing set of fields, from coding and metadata to rights management and content adaptation, following the increasing (functional and technical) complexity of multimedia applications.

Since MPEG standards have played a key role in the progress of the multimedia landscape, this tutorial will provide an evolutionary overview of MPEG standards, discussing and explaining why certain choices were made and thus why a certain vision of the multimedia world was followed. Moreover, this tutorial will specifically address the most recent MPEG standards, notably MPEG-21, MPEG-4 AVC, SVC and MVC, and finally MPEG-A.

Presenter’s biography: Fernando Pereira is currently a professor in the Electrical and Computer Engineering Department of Instituto Superior Técnico. He is responsible for IST's participation in many national and international research projects, and he often acts as a project evaluator and auditor for various organizations. He is an area editor of the Signal Processing: Image Communication journal and is or has been an associate editor of IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Image Processing, IEEE Transactions on Multimedia, and IEEE Signal Processing Magazine. He is a member of the IEEE Signal Processing Society's Image and Multiple Dimensional Signal Processing Technical Committee and its Multimedia Signal Processing Technical Committee. He was an IEEE Distinguished Lecturer in 2005 and has served on the scientific and program committees of many international conferences. He has contributed more than 180 papers. He has participated in the work of ISO/MPEG for many years, notably as head of the Portuguese delegation, chairman of the MPEG Requirements Group, and chair of many ad hoc groups related to the MPEG-4 and MPEG-7 standards. His areas of interest are video analysis, processing, coding and description, and interactive multimedia services.


Tutorial: Multimedia Content Protection
Presenters: Dulce Ponceleon and Nelly Fazio

This tutorial has been cancelled. We apologize for the inconvenience.

Tools you can use

Tutorial: Mobile Phone Programming for Multimedia
Presenter: Jürgen Scheibl

If you are an enthusiastic mobile phone user with many ideas for new ways of using your phone, this practical hands-on tutorial will show you how to realize your own novel concepts without spending too much time and effort. It aims to equip you with practical skills for programming mobile devices for your projects and to bring inspiration for innovation.

Whether you are a novice programmer with some basic programming or scripting knowledge (Flash, PHP, ...) or an experienced programmer, you will get a quick overview and understanding of programmable phone features, especially for multimedia. Above all, you will gain practical experience in writing mobile applications with Python for S60 (Nokia), even within this short tutorial time.

Topics to be covered:

    1. Introduction to Python for S60 – a trigger for innovation and a toolkit for rapid prototyping
    2. Python for S60 feature overview and demo examples (capabilities and limitations)

    Hands-on session:
    3. GUI programming, SMS sending/receiving
    4. Sound recording/playing, MIDI, text-to-speech
    5. Camera, 2-D graphics, 3-D OpenGL ES, keyboard key programming, video playing
    6. Bluetooth: phone to phone, phone to computer (e.g. controlling Max/MSP), phone to microcontroller (Arduino)
    7. Networking: file upload/download, Wi-Fi, client-server applications
    8. Location, contacts and calendar
    (Phones will be provided, but bring your own laptop: Mac, Windows, or Linux)

Online tutorial of Python for S60: http://www.mobilenin.com/pys60/menu.htm

The mobile space and the Internet are rapidly converging and are turning into a rich source of opportunities. Modern mobile phones offer a large set of features, including camera, sound, video, messaging, telephony, location, Bluetooth, Wi-Fi, GPS, and Internet access. These features can easily be combined to create new types of applications that bring engaging experiences to users.

The problem: developing applications on the mobile platform has historically been time-consuming and required a steep learning curve. As a result, people often gave up early or never started turning their innovative ideas into working solutions. In research projects we often lack time and resources, and although we frequently need a rapid, iterative design process for building our applications, we may lack suitable tools for doing so.

Python for S60, introduced in this tutorial, offers a crucial turning point here. It allows even novice programmers, artists and people from the creative communities to develop mobile applications, enabling them to contribute applications and concepts to the mobile space.

  • Python for S60 is easy to learn
  • It can drastically reduce development time
  • It makes rapid prototyping easy and efficient by wrapping complex low-level technical details behind simple interfaces
  • Above all, it makes programming on the mobile platform fun.

Presenter’s biography: Jürgen Scheible is a telecommunications engineer and a music and media artist. He is a doctoral student at the Media Lab, University of Art and Design Helsinki, where he runs the Mobile Hub, a prototype development environment for mobile client/server applications with a strong focus on artistic approaches and creative design. He spent several months in 2006 as a visiting scientist at MIT CSAIL and previously worked for Nokia for 8 years. In 2006 and 2007 Jürgen was recognized as a Forum Nokia Champion for his driving vision of being a bridge builder between art, engineering and research. He is internationally active in teaching innovation workshops on rapid mobile application prototyping in academic and professional settings, e.g. at Stanford University, MIT, NTU Taiwan, Yahoo Research Berkeley, and Nokia. In the second half of 2007 his book “Mobile Python” will be published by Symbian Press/Wiley, bringing ‘easy programming’ of mobile phones to the creative communities. He was one of the ACM Computers in Entertainment Scholarship Award winners in 2006 and won the Best Arts Paper Award at the ACM Multimedia 2005 conference. His research focuses on designing multimodal user interfaces for creating and sharing interactive artistic experiences.


Tutorial: Human-Centered Multimedia Systems
Presenters: Alejandro (Alex) Jaimes, IDIAP Research Institute (Switzerland)
            Nicu Sebe, University of Amsterdam (The Netherlands)

This tutorial will take a holistic view on the research issues and applications of Human-Centered Multimedia Systems focusing on three main areas: (1) multimodal interaction: visual (body, gaze, gesture) and audio (emotion) analysis; (2) image databases, indexing, and retrieval: context modeling, cultural issues, and machine learning for user-centric approaches; (3) multimedia data: conceptual analysis at different levels (feature, cognitive, and affective).

Multimedia plays a fundamental role in many new types of interfaces and application areas (multimodal and attentive interfaces; applications such as surveillance, medicine, and art) in which humans play a central role. Building multimedia systems (e.g., for human-computer interaction) therefore lies at the crossroads of many research areas (psychology, artificial intelligence, pattern recognition, computer vision, etc.). Although many existing multimedia systems were designed with human uses in mind, many of them are far from being user-friendly or rooted in real-world human needs; few multimedia systems can be considered "Human-Centered". What are the current trends in computing, and what can the scientific and engineering community do to effect a change for the better? On one hand, the fact that computers are quickly becoming integrated into everyday objects (ubiquitous and pervasive computing) means that effective, natural human-computer interaction is becoming critical: in many applications, users need to be able to interact with computers as naturally as in face-to-face human-human interaction. On the other hand, the wide range of applications that use multimedia, and the amount of multimedia content currently available, mean that building successful computer vision and multimedia applications requires a deep understanding of multimedia content. The success of human-centered multimedia systems therefore depends highly on two joint aspects: (1) the way humans interact naturally with such systems (using speech and body language) to express emotion, mood, attitude, and attention, and (2) the human factors that pertain to multimedia data (human subjectivity, levels of interpretation).

In this tutorial, we take a holistic approach to developing human-centered multimodal systems. We aim to identify the important research issues, and to ascertain potentially fruitful future research directions in relation to the two aspects above. In particular, we introduce key concepts and discuss technical approaches and open issues in three areas: (1) multimodal interaction: visual (body, gaze, gesture) and audio (emotion) analysis; (2) image databases, indexing, and retrieval: context modeling, cultural issues, and machine learning for user-centric approaches; (3) multimedia data: conceptual analysis at different levels (feature, cognitive, and affective). The focus of the tutorial, therefore, is on technical analysis and interaction techniques formulated from the perspective of key human factors in a user-centered approach to developing Human-Centered Multimedia Systems.

Presenter’s biography: Alejandro Jaimes is Scientific Manager and Senior Researcher at IDIAP Research Institute, where he is responsible for managing the research efforts of 12 partners within the EU-funded AMIDA integrated project (Augmented Multiparty Interaction), as well as leading the Human-Machine Interaction unit of IM2 (Interactive Multimodal Information Management), the Swiss National Centre of Competence in Research (NCCR) headed by IDIAP. His research focuses on creating new technical approaches for computer understanding of multimedia content and for human interaction with computers in creative environments, mainly developing computer vision techniques that use machine learning, involve humans directly, and are rooted in principles, theories, or techniques from cognitive psychology, the arts, and the information sciences, among others. He is a founding member of the IEEE Computer Society Taskforce on Human-Centered Computing and a co-editor of the IEEE Computer special issue on Human-Centered Computing (May 2007). Dr. Jaimes received a Ph.D. in Electrical Engineering (2003) and an M.S. in Computer Science (1997) from Columbia University in New York City. His work has led to over 50 technical publications in international conferences and journals and to numerous contributions to the MPEG-7 standard. He has been granted several patents and serves on the program committees of several international conferences (Creativity and Cognition, ACM Multimedia, ICME, ICIP, CIVR, and the ICCV and ECCV Workshops on HCI, among others).

Nicu Sebe is a professor in the Department of Computer Science at the University of Amsterdam, The Netherlands. His research interests include human-computer interaction from a computer vision perspective and multimedia information retrieval. He received a PhD in computer science from Leiden University, The Netherlands.






For any questions regarding tutorials please email the tutorial co-chairs: