ACM International Conference on Multimedia Retrieval June 11-14, 2018, Yokohama, Japan


ACMMM 2018 TPC Workshop

We are happy to host the ACM Multimedia 2018 TPC workshop at ICMR2018. The workshop will feature stimulating talks by top-level researchers in the field who are attending the ACM Multimedia 2018 Technical Program Committee meeting, which is planned to be held right after the conference in Yokohama. The workshop is hosted as part of the ICMR 2018 conference program, so all registered participants are welcome to join the event.


June 14th, 14:30-16:30 and 17:00-19:00, Hall, Chair: Nicu Sebe





Brain-inspired Deep Models for Visual Recognition

Abstract: In this talk, I will introduce two recent approaches developed in our lab. In the first work, a novel multi-scale deep learning model is proposed for person re-identification across multiple surveillance cameras. The model is able to learn deep discriminative feature representations at different scales and automatically determine the most suitable scales for matching. In the second work, inspired by recent neuroscience studies on the left-right asymmetry of the human brain in processing low and high spatial frequency information, we propose a dual skipping network with application to coarse-to-fine object categorization. This network has two branches that deal with coarse and fine-grained classification simultaneously. Some promising preliminary results will be introduced.

Yu-Gang Jiang (Fudan University)
Yu-Gang Jiang is Professor of Computer Science and Director of Shanghai Engineering Research Center for Video Technology and System at Fudan University, China. His lab conducts research on all aspects of extracting high-level information from big video data, such as video event recognition, object/scene recognition and large-scale visual search. He is the lead architect of a few best-performing video analytic systems in worldwide competitions such as the annual U.S. NIST TRECVID evaluation. His visual concept detector library (VIREO-374) and video datasets (e.g., CCV, FCVID and THUMOS) are widely used resources in the research community. His work has led to many awards, including the inaugural ACM China Rising Star Award, the 2015 ACM SIGMM Rising Star Award, and the research award for outstanding young researchers from NSF China. He was also selected into China's National Ten-Thousand Talents Program and the Chang Jiang Scholars Program of China Ministry of Education. He is currently an associate editor of ACM TOMM, Machine Vision and Applications (MVA) and Neurocomputing. He holds a PhD in Computer Science from City University of Hong Kong and spent three years working at Columbia University before joining Fudan in 2011.

Frontiers of Music Technologies

Abstract: Music technologies will open the future up to new ways of enjoying music both in terms of music creation and music appreciation. In this talk, I will introduce the frontiers of music technologies by showing some practical research examples, which have already been made open to the public as web services or made into commercial products, to demonstrate how end users can benefit from music understanding technologies and music interfaces.

Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST))
Masataka Goto received the Doctor of Engineering degree from Waseda University in 1998. He is currently a Prime Senior Researcher at the National Institute of Advanced Industrial Science and Technology (AIST). In 1992 he was one of the first to start working on automatic music understanding and has since been at the forefront of research in music technologies and music interfaces based on those technologies. Over the past 26 years he has published more than 250 papers in refereed journals and international conferences and has received 46 awards, including several best paper awards, best presentation awards, the Tenth Japan Academy Medal, and the Tenth JSPS PRIZE. He has served as a committee member of over 110 scientific societies and conferences, including the General Chair of the 10th and 15th International Society for Music Information Retrieval Conferences (ISMIR 2009 and 2014). In 2016, as the Research Director he began a 5-year research project (OngaACCEL Project) on music technologies, a project funded by the Japan Science and Technology Agency (ACCEL, JST).

Mental Health Computing via Harvesting Social Media Data

Abstract: Psychological stress and depression are threatening people’s health. It is non-trivial to detect stress or depression in a timely manner for proactive care. With the popularity of social media, people are used to sharing their daily activities and interacting with friends on social media platforms, making it feasible to leverage online social media data for stress and depression detection. In this talk, we will systematically introduce our work on stress and depression detection employing large-scale benchmark datasets from real-world social media platforms, including 1) stress-related and depression-related textual, visual and social attributes from various aspects, 2) novel hybrid models for binary stress detection, stress event and subject detection, and cross-domain depression detection, and finally 3) several intriguing phenomena indicating the special online behaviors of stressed as well as depressed people. We would also like to demonstrate the mental health care applications we have developed at the end of this talk.

Jia Jia (Tsinghua University)
Dr. Jia Jia is an associate professor in the Department of Computer Science and Technology, Tsinghua University. Her main research interests are social affective computing and human-computer speech interaction. She has been awarded the ACM Multimedia Grand Challenge Prize (2012) and Scientific Progress Prizes from the National Ministry of Education twice (2009, 2016). She has authored about 70 papers in leading conferences and journals including T-KDE, T-MM, T-MC, T-ASLP, T-AC, ACM Multimedia, AAAI, IJCAI, WWW, etc. She also has wide research collaborations with Tencent, SOGOU, Huawei, Siemens, MSRA, Bosch, etc.

Person Re-Identification: Recent Advances and Challenges

Abstract: As a research topic attracting more and more interest in both academia and industry, person Re-Identification (ReID) aims to identify re-appearing persons in a large set of videos. It has the potential to open great opportunities to address challenging data storage problems, offering an unprecedented possibility for intelligent video processing and analysis, as well as enabling promising public security applications such as cross-camera pedestrian searching, tracking, and event detection.

This talk aims at reviewing the latest research advances, discussing the remaining challenges in person ReID, and providing a communication platform for researchers working on or interested in this topic. This talk includes several parts on person ReID:

  • Wide deep models for fine-grained pattern recognition
  • Local and global representation learning for person ReID
  • The application of Generative Adversarial Networks in person ReID
  • Open issues and promising research topics of person ReID

This talk also covers our latest work on person ReID, as well as our viewpoints about the unsolved challenging issues in person ReID. We believe this talk would be helpful for researchers working on person ReID and other related topics.

Qi Tian (University of Texas at San Antonio (UTSA))
Qi Tian is currently a Full Professor in the Department of Computer Science, the University of Texas at San Antonio (UTSA). He was a tenured Associate Professor from 2008-2012 and a tenure-track Assistant Professor from 2002-2008. During 2008-2009, he took one-year Faculty Leave at Microsoft Research Asia (MSRA) as Lead Researcher in the Media Computing Group.

Dr. Tian received his Ph.D. in ECE from the University of Illinois at Urbana-Champaign (UIUC) in 2002, his M.S. in ECE from Drexel University in 1996, and his B.E. in Electronic Engineering from Tsinghua University in 1992. Dr. Tian’s research interests include multimedia information retrieval, computer vision, machine learning and pattern recognition, and he has published over 420 refereed journal and conference papers (including 106 IEEE/ACM Transactions papers and 80 CCF Category A conference papers). His Google Scholar citation count is 10500+ with an h-index of 55. He was a co-author of a Best Paper in ACM ICMR 2015, a Best Paper in PCM 2013, a Best Paper in MMM 2013, a Best Paper in ACM ICIMCS 2012, a Top 10% Paper Award in MMSP 2011, a Best Student Paper in ICASSP 2006, a Best Student Paper Candidate in ICME 2015, and a Best Paper Candidate in PCM 2007.

Dr. Tian’s research projects are funded by ARO, NSF, DHS, Google, FXPAL, NEC, SALSI, CIAS, Akiira Media Systems, HP, Blippar and UTSA. He received the 2017 UTSA President’s Distinguished Award for Research Achievement, the 2016 UTSA Innovation Award, the 2014 Research Achievement Award from the College of Science, UTSA, a 2010 Google Faculty Award, and a 2010 ACM Service Award. He is an associate editor of IEEE Transactions on Multimedia (TMM), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), and Multimedia System Journal (MMSJ), and serves on the Editorial Boards of Journal of Multimedia (JMM) and Journal of Machine Vision and Applications (MVA). Dr. Tian has served as Area Chair for a number of conferences including CVPR, ICCV, ECCV, and ACM MM. Dr. Tian is a Fellow of IEEE.


Multi-level Multi-aspect Multimedia Analysis

Abstract: With the development of smart devices and networks, massive amounts of multimedia data including texts, images, and videos are generated daily. Automatic analysis techniques which can extract and understand important information from such multimedia data are desired. Rich information can be mined from multimedia data such as videos from different aspects, for example, the objective aspect, which can be perceived visually or aurally, and the subjective aspect, such as emotional state. In this talk I will present our recent work on concept detection, video captioning and emotion detection.

Qin Jin (Renmin University of China)
Qin Jin is an associate professor at the School of Information at Renmin University of China, where she leads the multimedia content analysis research group. Qin's research interests are in multimedia content analysis (including audio, image, video, text, etc.), human-computer interaction and machine learning in general. Her recent work on video captioning and emotion detection has won awards in several challenge evaluations. Qin received her Ph.D. degree in Language and Information Technologies from Carnegie Mellon University in 2007. She received her B.Sc. and M.S. degrees in computer science and technologies from Tsinghua University, Beijing, China in 1996 and 1999, respectively. Qin worked as a research faculty member in the Language Technologies Institute at Carnegie Mellon University from February 2007 to March 2012. She was a Research Scientist at IBM China Research Laboratory from April 2012 to December 2012 before joining Renmin University of China in January 2013.

Mining Automatically Estimated Poses from Video Recordings of Top Athletes

Abstract: Human pose detection systems based on state-of-the-art DNNs are in the process of being extended, adapted and re-trained to fit the application domain of specific sports. Therefore, plenty of noisy pose data will soon be available from videos recorded on a regular and frequent basis. This work is among the first to develop mining algorithms that can mine the expected abundance of noisy and annotation-free pose data from video recordings in individual sports. Using swimming as an example of a sport with dominant cyclic motion, we show how to determine unsupervised time-continuous cycle speeds and temporally striking poses as well as measure unsupervised cycle stability over time. Additionally, we use long jump as an example of a sport with a rigid phase-based motion to present a technique to automatically partition the temporally estimated pose sequences into their respective phases. This enables the extraction of performance-relevant, pose-based metrics currently used by national professional sports associations. Experimental results demonstrate the effectiveness of our mining algorithms, which can also be applied to other cycle-based or phase-based types of sport.

Rainer Lienhart (University of Augsburg)
Rainer Lienhart is a full professor in the computer science department of the University of Augsburg and chair of the Multimedia Computing and Computer Vision Lab (MMC Lab). His group focuses on all aspects of (1) large-scale image, video, human pose, sensor and data mining algorithms, (2) object, human pose and action detection/recognition as well as (3) image, video, human pose and action retrieval.

Since April 2010, he has also been the executive director of the Institute for Computer Science at the University of Augsburg. From August 2017 to March 2018 he spent his sabbatical at FXPAL, while in 2009, he spent his sabbatical at Willow Garage, a robotics company that is working on laying the groundwork for the industry that will be needed to enable personal robotics applications by investing in open source such as OpenCV and open platform adoption models. Rainer Lienhart has always been an active contributor to OpenCV. From August 1998 to July 2004 he was a Staff Researcher at Intel's Microprocessor Research Lab in Santa Clara, California, where he worked on transforming a network of heterogeneous, distributed computing platforms into an array of audio/video sensors and actuators capable of performing complex DSP tasks such as distributed beamforming, audio rendering, audio/visual tracking, and camera array processing. In particular, this requires putting distributed heterogeneous computing platforms with audio-visual sensors into a common time and space coordinate system. At the same time, he continued his work on media mining, where he is well known for his work in video content analysis with contributions in text detection/recognition, commercial detection, face detection, shot and scene detection, and automatic video abstraction. He received his Ph.D. in Computer Science from the University of Mannheim, Germany, in 1998, where he was a member of the Movie Content Analysis Project (MoCA).

The scientific work of Rainer Lienhart covers more than 80 refereed publications and more than 20 patents. He was a general co-chair of ACM Multimedia 2017, ACM Multimedia 2007 and SPIE Storage and Retrieval of Media Databases 2004 & 2005. He serves on the editorial boards of 3 international journals. For more than a decade he has been a committee member of ACM Multimedia, IEEE ICME, SPIE Storage and Retrieval of Media Databases, and many more conferences. From July 2009 till June 2017 he was the vice chair of SIGMM.

Affective Multimodal Analysis for the Media Industry

Abstract: The media industry is constantly making use of affective signals, whether in text, sound, image or moving images, with the aim of attracting our attention and conveying a message or a story. In this presentation we will look at the analysis of audio-visual content for understanding its affective properties. We will start by proposing a semi-supervised approach for identifying the genre of a piece of media (action, drama, horror, etc.). We will then show how the genre of video segments can be used to determine their interestingness. There are many usage scenarios where such information about the content has value for media editors and archivists. Beyond genre and interestingness, emotion recognition in videos is another important cue when understanding the content of audio-visual documents. For this, a deep model combining three key components for recognizing human expression of emotions has been devised. It includes static as well as dynamic facial features and audio information. The approach was shown to perform well on the Emotion Recognition in the Wild 2017 challenge. Applications to past and ongoing research and industry projects will be used throughout to illustrate the presentation.

Benoit Huet (Eurecom)
Dr. Benoit Huet is Assistant Professor in the Data Science department of Eurecom (France). He received his BSc degree in computer science and engineering from the Ecole Superieure de Technologie Electrique (Groupe ESIEE, France) in 1992. In 1993, he was awarded the MSc degree in Artificial Intelligence from the University of Westminster (UK) with distinction, where he then spent two years working as a research and teaching assistant. He received his DPhil degree in Computer Science from the University of York (UK) for his research on the topic of object recognition from large databases. He was awarded the HDR (Habilitation to Direct Research) from the University of Nice Sophia Antipolis, France, in October 2012 on the topic of Multimedia Content Understanding: Bringing Context to Content. He is associate editor for IEEE Multimedia, IEEE Transactions on Multimedia, Multimedia Tools and Applications (Springer) and Multimedia Systems (Springer) and has been guest editor for a number of special issues (EURASIP Journal on Image and Video Processing, IEEE Multimedia). He regularly serves on the technical program committees of the top conferences in the field (ACM MM/ICMR, IEEE ICME/ICIP). He served as Technical Program Co-Chair of ACM Multimedia 2016 and will be General Co-Chair for MultiMedia Modeling 2019 in Thessaloniki, Greece and ACM Multimedia 2019 in Nice, France. He has co-authored over 150 papers in books, journals and international conferences. His current research interests include Large Scale Multimedia Content Analysis, Mining and Indexing; Video Understanding and Hyperlinking; Multimodal Fusion; and Socially-Aware Multimedia.

Deep Neural Networks for Automated Prostate Cancer Detection and Diagnosis in Multi-parametric MRI

Abstract: Multi-parametric magnetic resonance imaging (mp-MRI) is increasingly popular for prostate cancer (PCa) detection and diagnosis. However, interpreting mp-MRI data, which typically contains multiple unregistered 3D sequences, e.g. apparent diffusion coefficient (ADC) and T2-weighted (T2w) images, is time-consuming and demands special expertise, limiting its usage for large-scale PCa screening. Therefore, solutions for computer-aided detection and diagnosis of PCa in mp-MRI images are highly desirable. Most recent advances in automated methods for PCa detection employ several separate steps, including multimodal image registration, prostate segmentation, voxel-level classification for candidate generation and a region-level classification for verification. Features used in each classification stage are handcrafted. In addition, each step is optimized individually without considering the error tolerance of other steps. As a result, these methods could either involve unnecessary computational cost or suffer from errors accumulated over steps. In this talk we will introduce a series of our recent works on utilizing deep convolutional neural networks (CNN) for automated PCa detection and diagnosis. We will introduce our co-trained weakly-supervised CNNs which can concurrently identify the presence of PCa in an image and localize lesions. Our weakly-supervised CNNs can learn representative lesion features from entire prostate images with only image-level labels indicating the presence or absence of PCa, significantly alleviating the manual annotation efforts in clinical usage. Multimodal information from ADC and T2w is fused implicitly in the CNNs so that the feature learning processes of the two modalities can guide each other to capture highly representative PCa-relevant features. We will also introduce our Tissue Deformation Network (TDN) for automated prostate detection and multimodal registration. The TDN can be directly concatenated with our weakly-supervised PCa detection CNNs so that all parameters of the entire network can be jointly optimized in an end-to-end manner. Comprehensive evaluation on data from 360 patients demonstrates that our system achieves high accuracy for CS PCa detection and outperforms the state-of-the-art CNN-based methods and 6-core systematic prostate biopsies.

Xin Yang (Huazhong University of Science and Technology)
Xin Yang received her PhD degree from the University of California, Santa Barbara in 2013. She worked as a post-doc in the Learning-based Multimedia Lab at UCSB (2013-2014). She joined Huazhong University of Science and Technology in August 2014 and is currently an Associate Professor in the School of Electronic Information and Communications. Her research interests include medical image analysis, monocular simultaneous localization and mapping, and augmented reality. She has published over 40 technical papers in venues including TPAMI, TMI, TVCG, MIA, ACM MM, MICCAI, ISMAR, etc., co-authored two books and holds 10+ U.S. and Chinese patents and software copyrights. Prof. Yang is a member of IEEE and a member of ACM.

Cross-media retrieval: state of the art

Abstract: It has been shown that heterogeneous multimedia data gathered from different sources in different media types can often be correlated and linked to the same knowledge space. Consequently, cross-media retrieval has attracted a huge amount of attention due to its significance in both the research and industrial communities. In this talk, we will introduce the state of the art on this topic and discuss its future trends.

Heng Tao Shen (University of Electronic Science and Technology of China (UESTC))
Heng Tao Shen is currently a Professor of National "Thousand Talents Plan", Dean of School of Computer Science and Engineering, and Director of Center for Future Media at the University of Electronic Science and Technology of China (UESTC). He is also an Honorary Professor at the University of Queensland. He obtained his BSc with 1st class Honours and PhD from the Department of Computer Science, National University of Singapore in 2000 and 2004 respectively. He then joined the University of Queensland as a Lecturer, Senior Lecturer, and Reader, and became a Professor in late 2011. His research interests mainly include Multimedia Search, Computer Vision, Artificial Intelligence, and Big Data Management. He has published 200+ peer-reviewed papers, among which 120+ appeared in China Computer Federation (CCF) A ranked publication venues, such as ACM Multimedia, CVPR, ICCV, AAAI, IJCAI, SIGMOD, VLDB, ICDE, TOIS, TIP, TPAMI, TKDE, VLDB Journal, etc. He has received 7 Best Paper (or Honorable Mention) Awards from international conferences, including the Best Paper Award from ACM Multimedia 2017 and Best Paper Award - Honorable Mention from ACM SIGIR 2017. He received the Chris Wallace Award in 2010 conferred by Computing Research and Education Association, Australasia, and the Future Fellowship from the Australian Research Council in 2012. He is an Associate Editor of IEEE Transactions on Knowledge and Data Engineering, and has served as Local Organization Co-Chair of ICDE 2013 and Program Committee Co-Chair of ACM Multimedia 2015.

Towards Compact Visual Analysis Systems

Abstract: In this talk, I will review several recent works in our group towards designing and implementing compact computer vision systems for retrieval, categorization, analysis, and inference of visual scenes. In particular, I will first review several works on compact binary code learning. Our main innovation lies in embedding the ordinal information into hash functions, a methodology that sits between supervised and unsupervised learning and can take on the merits of both. Second, I will review several recent works of our group on designing very compact and fast convolutional neural networks for visual analysis tasks. I will introduce our schemes for reducing the time cost of convolutional filters, as well as for reducing the memory usage in fully-connected layers. I will also show our progress in FPGA-based implementation of convolutional neural networks in various practical applications.

Rongrong Ji (Xiamen University)
Rongrong Ji is currently a Mingjiang Chair Professor at the Department of Cognitive Science, School of Information Science and Engineering, Xiamen University. He is the founding director of the Media Analytics and Computing Lab. Ji's research falls in the fields of computer vision, multimedia, and machine learning. His scholarly work mainly focuses on leveraging big data to build computer systems to understand visual scenes and human behaviors, inferring the semantics and retrieving instances for various emerging applications. His recent interests include compact visual descriptors, social media sentiment analysis, and holistic scene understanding. He has published 100+ papers in tier-1 journals and conferences like PAMI, IJCV, TIP, CVPR, ICCV, ECCV, IJCAI, AAAI, ACM Multimedia, etc., with over 5000 citations in the past 5 years. He is associate editor of Neurocomputing, Multimedia Tools and Applications, The Visual Computer, PLOS ONE, Frontiers of Computer Science, etc., guest editor of ACM Transactions on Intelligent Systems and Technology, IEEE Multimedia Magazine, Signal Processing, Neurocomputing, etc., General Chair of VALSE (Vision And Learning SEminar) 2017, Local/Session/Area Chair for IEEE MMSP 2015, ACM ICMR 2014, IEEE VCIP 2014, ACM MMM 2015, IEEE ISM 2015, etc., and TPC Member for AAAI 2015, CVPR 2013, ICCV 2013, ACM Multimedia 2010-2015, etc. He has been a Senior Member of IEEE (2014-now), Senior Member of ACM (2015-now), Chair of VAIG Group for IEEE Multimedia Communication Technical Committee (MMTC) (2014-2016), Member of ACM, Chair of CCF YOCSEF Xiamen (2016-2017), and Executive Member of Fujian Association of Artificial Intelligence.

In the past decade, Ji and his collaborators have developed some of the state-of-the-art mobile visual search systems and social multimedia analytics tools, with top performances in the MPEG Compact Descriptor for Visual Search (CDVS) standard evaluations. His work has also been recognized by the ACM Multimedia 2011 Best Paper Award, a Microsoft Fellowship in 2007, and the Best Thesis Award of Harbin Institute of Technology. His research has been supported by government agencies such as the National Science Foundation of China. He is the recipient of the National Science Foundation for Excellent Young Scholars (2014).

Multimedia Research: There's life in the old dog yet

Abstract: Is Multimedia research dying out ... or is it picking up pace? What upcoming challenges are there to meet? The talk will start by casting a critical eye on the past impact of multimedia research. Afterwards, a simplified multimedia processing pipeline will be used as a basis for reviewing promising fields of research. Selected contributions in these fields will be cited, intended to serve as cues for the proposed research directions.

Max Mühlhäuser (Technische Universität Darmstadt)
Prof. Dr. Max Mühlhäuser is Full Professor and head of Telecooperation Lab at Technische Universität Darmstadt. His lab conducts research on cyberphysical smart spaces of all scales (e.g., office, city, ...) along three research axes: (a) big networks (networks & middleware for big data/media, smart & critical infrastructures), (b) novel HCI (devices & concepts for new interaction technologies), and (c) ubiquitous cybersecurity (privacy, trust & resilience). Max is head of a doctoral school on privacy & trust for mobile users, deputy speaker of a collaborative research center on Future Internet, and lead PI of Cysec, one of Europe's largest cybersecurity research alliances. Max has (co-)authored over 500 publications. He has been and remains active in service to the community: example TPCs/OCs & editorial/review boards include Ubicomp / ACM IMWUT, ACM Multimedia, Percom, IEEE ICME, IEEE ISM, ACM TOIT, Jrnl. PMC, and Jrnl. Web Engineering.

Information on ICMR2018

Venue Information

Yokohama Media and Communication Center
11 Nihon-Odori, Naka-ku, Yokohama, 231-0023, Japan