Nowadays, increasingly rich and massive social media data (such as text, images, audio, video, blogs, and so on) are being posted to the web, including social networking websites (e.g., MySpace, Facebook), photo and video sharing websites (e.g., Flickr, YouTube), and photo forums (e.g., Photosig.com and Photo.net). Recently, researchers from multidisciplinary areas have proposed data-driven approaches for multimedia content understanding that leverage this virtually unlimited supply of web images and videos as well as their associated rich contextual information (e.g., tags, comments, categories, titles, and metadata). In our three-hour tutorial, we plan to introduce the important general concepts and themes of this timely topic. We will also review and summarize recent multimedia content analysis methods that use web-scale social media data, and present insights into the challenges and future directions in this area. Moreover, we will show extensive demos of image annotation and retrieval using rich social media data.
- Introduction to the rapid advance of Web 2.0
- Data-driven approaches
  - Image annotation by mining image search results
  - Gigantic image collections and their applications in computer vision and multimedia
  - Scene completion, geotagging, and photo tourism using millions of photographs
- Machine learning methods for improving performance
  - Semi-supervised learning in gigantic image collections
  - Distance metric learning to reduce the semantic gap for image retrieval
  - Transfer learning for consumer image and video understanding
  - Tag refinement and Flickr distance
  - Other machine learning methods (e.g., active learning, online learning, and multi-label learning)
- Indexing techniques for accelerating the search process
  - Global-feature-based approaches: Small Codes and Spectral Hashing
  - Local-feature-based approaches: the bag-of-words model, Hamming Embedding, the bundled-features approach, and the image decomposition model
- Challenges and future directions
  - Indexing methods for speed
  - Machine learning methods that handle the semantic gap and noise
  - Region- and object-based techniques
This tutorial is at an introductory/intermediate level and is open to all audiences.
Dong Xu is currently an assistant professor at Nanyang Technological University, Singapore. He received his B.Eng. and PhD degrees from the University of Science and Technology of China in 2001 and 2005, respectively. During his PhD studies, he worked at Microsoft Research Asia and The Chinese University of Hong Kong for more than two years. He also worked at Columbia University for one year as a postdoctoral research scientist. His research focuses on new theories, algorithms, and systems for the intelligent processing and understanding of visual data such as images and videos. He has published more than 35 papers in top venues, including T-PAMI, T-IP, T-CSVT, CVPR, ACM MM, ICML, and IJCAI. He co-authored (with his PhD student Lixin Duan) a paper that won the Best Student Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2010. He is an associate editor of Neurocomputing (Elsevier) and an editorial board member of the Journal of Multimedia (Academy Publisher). He has served as a guest editor of three special issues on video and event analysis in T-CSVT, CVIU, and PRL. He has also served as a workshop co-chair of the ACM SIGMM Workshop on Social Media and the IEEE ICME Workshop on Visual Content Identification and Search, a Track Chair of ICME 2009, and a Theme Chair of PSIVT 2009.
Lei Zhang is a lead researcher in the Web Search & Mining Group at Microsoft Research Asia in Beijing and an adjunct professor at Tianjin University. He currently directs a team pursuing new research directions in social media search. Team projects include multimedia content analysis, web-scale image annotation, and information mining for travel search, with results targeted at new advanced services that deliver intelligence and insight to Web users. He is a member of the IEEE and the ACM, has served as program co-chair of MMM 2010, and has served on international conference program committees, including those of ACM Multimedia, WWW, SIGIR, WSDM, ICME, and MMM. He is the author or co-author of more than 80 published papers in fields such as content-based image retrieval, computer vision, web search, and information retrieval. He also holds 11 U.S. patents for his innovations in face detection, red-eye reduction, and image retrieval technologies. He earned a B.S. and an M.S. in Computer Science from Tsinghua University in 1993 and 1995, respectively. After two years working in industry, he returned to Tsinghua University and received his PhD in Computer Science in 2001.
Jiebo Luo is a Senior Principal Scientist with Kodak Research Laboratories, Rochester, New York. He received his Bachelor's degree from the University of Science and Technology of China in 1989 and his PhD degree in electrical engineering from the University of Rochester in 1995. His research interests include image processing, pattern recognition, computer vision, computational photography, multimedia data mining, and ubiquitous computing. He has authored more than 150 technical papers and holds 50 issued US patents. Dr. Luo is the Editor-in-Chief of the Journal of Multimedia. He has also served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), the IEEE Transactions on Multimedia (TMM), the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Pattern Recognition (PR), Machine Vision and Applications (MVA), and the Journal of Electronic Imaging. He has been involved in organizing numerous leading technical conferences sponsored by the IEEE, ACM, and SPIE. Dr. Luo is a Kodak Distinguished Inventor, a Fellow of SPIE, and a Fellow of the IEEE.