ACM International Conference on
Multimedia Retrieval,
Hong Kong, Jun. 5 - 8, 2012



ICMR 2012 will include three tutorials:

1. Foundations of Large-Scale Multimedia Information Management & Retrieval [full day]
Edward Y. Chang (Google Research)
Chih-Jen Lin (National Taiwan University and eBay Research)
Venue: M4051, CMC Building, CityU

2. Music Information Retrieval [half day]
Markus Schedl (Johannes Kepler University, Austria)
Masataka Goto (National Institute of Advanced Industrial Science and Technology, Japan)
Venue: M4053, CMC Building, CityU

3. 3D Video Segmentation, Recognition, and Retrieval [half day]
B. Prabhakaran (University of Texas at Dallas, USA)
Venue: M4053, CMC Building, CityU


1. Foundations of Large-Scale Multimedia Information Management and Retrieval

Presenters: Edward Y. Chang (Google Research), Chih-Jen Lin (National Taiwan University and eBay Research)


This tutorial presents an interdisciplinary approach to first establish scientific foundations for multimedia information management and retrieval, then address scalable issues in terms of both data dimensionality and volume.

This tutorial is organized into two parts. The first part depicts a multimedia system's key components, which together aim to comprehend the semantics of multimedia data instances. The second part presents methods for scaling up these components for high-dimensional data and very large datasets.

Part one starts with an overview of the research and engineering challenges. It then presents feature extraction, which obtains useful signals from multimedia data instances. We discuss both model-based and data-driven approaches, and then a hybrid approach. The tutorial then deals with the problem of formulating users' query concepts, which can be complex and subjective. We show how active learning and kernel methods can be used to work effectively with both keywords and perceptual features to understand a user's query concept with minimal user feedback. We argue that matching objects can be retrieved only after a user's query concept has been thoroughly comprehended. To ensure that concept learning can be performed effectively, we address the problem of distance-function formulation, a core subroutine of information retrieval for measuring similarity between data instances. We also address the problem of multimodal fusion, an increasingly important subject for integrating metadata representing both context and content. We close part one by introducing the new challenges and opportunities that mobile devices bring to multimedia computing. (Lecture notes of "Foundations of Large-Scale Multimedia Information Management and Retrieval", Edward Chang, Springer, September 2011).

Part two of this tutorial presents various scalability issues and solutions. In particular, we discuss machine learning and data mining algorithms in distributed environments. Most traditional learning algorithms were designed to run on a single computer, but data larger than a single machine's capacity has become very common in multimedia information retrieval. We address scalability in three aspects. First, we discuss when and when not to apply distributed learning and mining methods. Traditional machine learning algorithms focus on computation, but we argue that in a distributed environment many other issues, such as data locality, must be taken into consideration. Second, we discuss distributed classification algorithms, including linear and kernel Support Vector Machines (SVM), trees, and others. Third, we present approaches for distributed data clustering. Methods such as k-means, spectral clustering, and Latent Dirichlet Allocation (LDA) will be covered. Through the discussion of classification and clustering methods, we also see the advantages and disadvantages of different distributed programming frameworks such as MapReduce and MPI. Finally, we briefly discuss future challenges in tackling large-scale data classification and clustering.

Short Bio of Presenters

Edward Chang has headed Google Research in China since March 2006. Before that, he was a full professor of Electrical Engineering at the University of California, Santa Barbara. His recent research activities are in the areas of distributed data mining and its applications to rich-media data management and social-network collaborative filtering. His big-data research group recently parallelized SVMs (NIPS 07), PLSA (KDD 08), Association Mining (ACM RS 08), Spectral Clustering (PAMI 10), and LDA (WWW 09) (see MMDS/CIVR/EMMDS/MM/AAIM/ADMA/CIKM keynotes and tutorials for details) to run on parallel machines for mining large-scale datasets. His team at Google has developed and launched Google Confucius (a Q&A system, VLDB 10) in 68 countries, including China, Russia, Thailand, Vietnam, Indonesia, 17 Arabic-speaking nations, and 40 African nations. Ed also directs the Google Mobile 2014 focused research program, which develops novel mobile technologies (see SIGIR/ICADL keynotes). Ed is a recipient of the NSF Career Award.

Chih-Jen Lin is currently a distinguished professor at the Department of Computer Science, National Taiwan University, and a visiting principal research scientist at eBay Research. He obtained his B.S. degree from National Taiwan University in 1993 and his Ph.D. degree from the University of Michigan in 1998. His major research areas include machine learning, data mining, and numerical optimization. He is best known for his work on support vector machines (SVM) for data classification. His software LIBSVM is one of the most widely used and cited SVM packages. Nearly all major companies apply his software for classification and regression applications. He has received many awards for his research work, most recently the ACM KDD 2010 best paper award. He is an IEEE fellow and an ACM distinguished scientist for his contributions to machine learning algorithms and software design. More information about him can be found at

2. Music Information Retrieval

Presenters: Markus Schedl (Johannes Kepler University, Austria), Masataka Goto (National Institute of Advanced Industrial Science and Technology, Japan)


Music is an omnipresent topic, and everyone enjoys listening to his or her favorite tunes. Music information retrieval (MIR) is a research field that aims, among other things, at automatically extracting semantically meaningful information from various multimodal representations of music items, such as a digital audio file, a band's Web page, an image of an album cover, a song's lyrics, or a tweet reflecting current listening activity.

In this tutorial, we first give an introduction to MIR, summarize the ideas behind different categories of computational features, and discuss the advantages and disadvantages of each. We then present state-of-the-art content-based feature extraction techniques and contextual feature extractors. For the latter, we give an introduction to the field of Web-based MIR. Subsequently, we present approaches that exploit Web-based sources to construct similarity measures for music artists and songs based on collaborative and cultural knowledge, and to estimate the geospatial popularity of a music item. After that, we discuss typical applications of MIR research, focusing in particular on user interfaces to music collections. Finally, we outline approaches to music recommendation and personalization and give an outlook on combining the music context with music content and user context to overcome current limitations.

Short Bio of Presenters

Markus Schedl graduated in Computer Science from the Vienna University of Technology. He earned his Ph.D. in Computational Perception from the Johannes Kepler University Linz, where he is employed as an assistant professor at the Department of Computational Perception. He further holds a Master’s degree in International Business Administration from the Vienna University of Economics and Business Administration. Markus has (co-)authored more than 60 refereed conference papers and journal articles. Furthermore, he serves on various program committees and has reviewed submissions to many conferences and journals. His main research interests include music information retrieval, social media mining, multimedia, machine learning, and information visualization. He is co-founder of the International Workshop on Advances in Music Information Research and of the International Workshop on Learning from User-generated Content.

Masataka Goto received the Doctor of Engineering degree from Waseda University in 1998. He is currently a Prime Senior Researcher and the Leader of the Media Interaction Group at the National Institute of Advanced Industrial Science and Technology (AIST), Japan. In 1992 he was one of the first to start work on automatic music understanding, and has since been at the forefront of research in music technologies and music interfaces based on those technologies. Since 1998 he has also worked on speech recognition interfaces. He has published more than 180 papers in refereed journals and international conferences and has received 29 awards. He has served as a committee member of over 80 scientific societies and conferences and was the Chair of the IPSJ Special Interest Group on Music and Computer (SIGMUS) in 2007 and 2008 and the General Chair of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009).

3. 3D Video Segmentation, Recognition, and Retrieval

Presenter: B. Prabhakaran (University of Texas at Dallas, USA)


The use of 3D stereo cameras has been increasing recently, especially with their introduction in gaming environments (e.g., Kinect cameras for Xbox gaming systems). With excellent miniaturization technologies, even several mobile devices (e.g., from HTC and Samsung) now come with 3D capture and glassless rendering technology. The key challenges are to create new features with 3D images and videos. For recognition in real time, the key challenges for our system are to identify the relevant low-level features that need to be extracted and to ensure that the extraction algorithm is efficient enough to extract the features in real time. For carrying out 3D video processing on mobile devices, we need to worry about power consumption as well as lower processing and memory availability. In this tutorial, we discuss the state of the art in the technologies used for 3D video processing on traditional as well as mobile platforms. We discuss the format of the video generated by popular 3D video cameras. We then analyze the types of features that can be extracted from these video data and the algorithms that can be employed for segmentation, recognition, and retrieval. We also focus primarily on the real-time nature of the algorithms that can be employed for this purpose.

Short Bio of Presenter

Dr. B. Prabhakaran is a Professor in the Computer Science Department, University of Texas at Dallas. Dr. Prabhakaran received the prestigious NSF CAREER Award FY 2003 for his proposal on Animation Databases. Dr. Prabhakaran is General Co-Chair of the ACM International Conference on Multimedia Retrieval (ICMR) 2013 and was the General Co-Chair of ACM Multimedia 2011. He is a Member of the Executive Council of the ACM Special Interest Group on Multimedia (SIGMM) and is the Co-Chair of the IEEE Technical Committee on Multimedia Computing (TCMC) Special Interest Group on Video Analytics (SIGVA). Dr. Prabhakaran is the Editor-in-Chief of the ACM SIGMM (Special Interest Group on Multimedia) web magazine. He is a Member of the Editorial Board of Multimedia Systems Journal (Springer) and Multimedia Tools and Applications journal (Springer). Prof. Prabhakaran has contributed to nearly $10 million in research funding in the last several years. Prof. Prabhakaran is an ACM Distinguished Scientist.

