MM '16: Proceedings of the 2016 ACM on Multimedia Conference


SESSION: Keynote Address

A Digital World to Thrive In: How the Internet of Things Can Make the "Invisible Hand" Work

  • Dirk Helbing

SESSION: Best Paper

Multi-modal Multi-view Topic-opinion Mining for Social Event Analysis

  • Shengsheng Qian
  • Tianzhu Zhang
  • Changsheng Xu

Patterns of Free-form Curation: Visual Thinking with Web Content

  • Nic Lupfer
  • Andruid Kerne
  • Andrew M. Webb
  • Rhema Linder

DASH2M: Exploring HTTP/2 for Internet Streaming to Mobile Devices

  • Mengbai Xiao
  • Viswanathan Swaminathan
  • Sheng Wei
  • Songqing Chen

Deep-based Ingredient Recognition for Cooking Recipe Retrieval

  • Jingjing Chen
  • Chong-wah Ngo

POSTER SESSION: Posters

GeoTracks: Adaptive Music for Everyday Journeys

  • Chris Greenhalgh
  • Adrian Hazzard
  • Sean McGrath
  • Steve Benford

Abnormal Event Discovery in User Generated Photos

  • Xiaoshan Yang
  • Tianzhu Zhang
  • Changsheng Xu

Deep Bi-directional Cross-triplet Embedding for Cross-Domain Clothing Retrieval

  • Shuhui Jiang
  • Yue Wu
  • Yun Fu

A Discriminative and Compact Audio Representation for Event Detection

  • Liping Jing
  • Bo Liu
  • Jaeyoung Choi
  • Adam Janin
  • Julia Bernd
  • Michael W. Mahoney
  • Gerald Friedland

Jockey Time: Making Video Playback to Enhance Emotional Effect

  • Kyeong Ah Jeong
  • Hyeon-Jeong Suk

Discriminative Paired Dictionary Learning for Visual Recognition

  • Hui-Hung Wang
  • Yi-Ling Chen
  • Chen-Kuo Chiang

From Seed Discovery to Deep Reconstruction: Predicting Saliency in Crowd via Deep Networks

  • Yanhao Zhang
  • Lei Qin
  • Qingming Huang
  • Kuiyuan Yang
  • Jun Zhang
  • Hongxun Yao

Facial Age Estimation Using Robust Label Distribution

  • Ke Chen
  • Joni-Kristian Kämäräinen
  • Zhaoxiang Zhang

What Makes a Good Movie Trailer?: Interpretation from Simultaneous EEG and Eyetracker Recording

  • Sidi Liu
  • Jinglei Lv
  • Yimin Hou
  • Ting Shoemaker
  • Qinglin Dong
  • Kaiming Li
  • Tianming Liu

LIME: A Method for Low-light IMage Enhancement

  • Xiaojie Guo

Multi-Protocol Video Delivery with Late Trans-Muxing

  • Rufael Mekuria
  • Jelte Fennema
  • Dirk Griffioen

Analyzing Structural Characteristics of Object Category Representations From Their Semantic-part Distributions

  • Ravi Kiran Sarvadevabhatla
  • Venkatesh Babu R

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

  • Pichao Wang
  • Zhaoyang Li
  • Yonghong Hou
  • Wanqing Li

Efficient Digital Holographic Image Reconstruction on Mobile Devices

  • Chung-Hua Chu

Scene Image Synthesis from Natural Sentences Using Hierarchical Syntactic Analysis

  • Tetsuaki Mano
  • Hiroaki Yamane
  • Tatsuya Harada

A Fast 3D Retrieval Algorithm via Class-Statistic and Pair-Constraint Model

  • Zan Gao
  • Deyu Wang
  • Hua Zhang
  • Yanbing Xue
  • Guangping Xu

Analyzing and Predicting GIF Interestingness

  • Michael Gygli
  • Mohammad Soleymani

Emotion in Context: Deep Semantic Feature Fusion for Video Emotion Recognition

  • Chen Chen
  • Zuxuan Wu
  • Yu-Gang Jiang

Exploiting Hierarchical Activations of Neural Network for Image Retrieval

  • Ying Li
  • Xiangwei Kong
  • Liang Zheng
  • Qi Tian

A Deeply-Supervised Deconvolutional Network for Horizon Line Detection

  • Lorenzo Porzi
  • Samuel Rota Bulò
  • Elisa Ricci

Exploiting Objects with LSTMs for Video Categorization

  • Yongqing Sun
  • Zuxuan Wu
  • Xi Wang
  • Hiroyuki Arai
  • Tetsuya Kinebuchi
  • Yu-Gang Jiang

Assessing 3D Scan Quality Through Paired-comparisons Psychophysics

  • Jacob Thorn
  • Rodrigo Pizarro
  • Bernhard Spanlang
  • Pablo Bermell-Garcia
  • Mar Gonzalez-Franco

Partial Multi-Modal Sparse Coding via Adaptive Similarity Structure Regularization

  • Zhou Zhao
  • Hanqing Lu
  • Cai Deng
  • Xiaofei He
  • Yueting Zhuang

Improving Speaker Diarization of TV Series using Talking-Face Detection and Clustering

  • Hervé Bredin
  • Grégory Gelly

Location-Independent WiFi Action Recognition via Vision-based Methods

  • Jen-Yin Chang
  • Kuan-Ying Lee
  • Yu-Lin Wei
  • Kate Ching-Ju Lin
  • Winston Hsu

INRS Audiovisual Quality Dataset

  • Edip Demirbilek
  • Jean-Charles Grégoire

Learning to Make Better Mistakes: Semantics-aware Visual Food Recognition

  • Hui Wu
  • Michele Merler
  • Rosario Uceda-Sosa
  • John R. Smith

Dictionary Learning Based Hashing for Cross-Modal Retrieval

  • Xin-Shun Xu

SocialFX: Studying a Crowdsourced Folksonomy of Audio Effects Terms

  • Taylor Zheng
  • Prem Seetharaman
  • Bryan Pardo

SwiDeN: Convolutional Neural Networks For Depiction Invariant Object Recognition

  • Ravi Kiran Sarvadevabhatla
  • Shiv Surya
  • Srinivas S S Kruthiventi
  • Venkatesh Babu R.

Multi-Scale Triplet CNN for Person Re-Identification

  • Jiawei Liu
  • Zheng-Jun Zha
  • Qi Tian
  • Dong Liu
  • Ting Yao
  • Qiang Ling
  • Tao Mei

Multimodal Popularity Prediction of Brand-related Social Media Posts

  • Masoud Mazloom
  • Robert Rietveld
  • Stevan Rudinac
  • Marcel Worring
  • Willemijn van Dolen

Learning Multimodal Temporal Representation for Dubbing Detection in Broadcast Media

  • Nam Le
  • Jean-Marc Odobez

Joint Image-Text Representation by Gaussian Visual-Semantic Embedding

  • Zhou Ren
  • Hailin Jin
  • Zhe Lin
  • Chen Fang
  • Alan Yuille

A Domain Robust Approach For Image Dataset Construction

  • Yazhou Yao
  • Xian-Sheng Hua
  • Fumin Shen
  • Jian Zhang
  • Zhenmin Tang

A Supervised Approach for Text Illustration

  • Harsh Jhamtani
  • Shubham Varma
  • Midhun Gundapuneni
  • Siddhartha Kumar Dutta

Learning Music Emotion Primitives via Supervised Dynamic Clustering

  • Yang Liu
  • Yan Liu
  • Xiang Zhang
  • Gong Chen
  • Kejun Zhang

Cross-modal Retrieval by Real Label Partial Least Squares

  • Jianfeng He
  • Bingpeng Ma
  • Shuhui Wang
  • Yugui Liu
  • Qingming Huang

LSOD: Local Sparse Orthogonal Descriptor for Image Matching

  • Yiru Zhao
  • Yaoyi Li
  • Zhiwen Shao
  • Hongtao Lu

Frustratingly Easy Cross-Modal Hashing

  • Dekui Ma
  • Jian Liang
  • Xiangwei Kong
  • Ran He

Families in the Wild (FIW): Large-Scale Kinship Image Database and Benchmarks

  • Joseph P. Robinson
  • Ming Shao
  • Yue Wu
  • Yun Fu

Enabling My Robot To Play Pictionary: Recurrent Neural Networks For Sketch Recognition

  • Ravi Kiran Sarvadevabhatla
  • Jogendra Kundu
  • Venkatesh Babu R

Experience Individualization on Online TV Platforms through Persona-based Account Decomposition

  • Payal Bajaj
  • Sumit Shekhar

Improved Dense Trajectory with Cross Streams

  • Katsunori Ohnishi
  • Masatoshi Hidaka
  • Tatsuya Harada

Joint Image and Text Representation for Aesthetics Analysis

  • Ye Zhou
  • Xin Lu
  • Junping Zhang
  • James Z. Wang

Who is where?: Matching People in Video to Wearable Acceleration During Crowded Mingling Events

  • Laura Cabrera-Quiros
  • Hayley Hung

Supervised Recurrent Hashing for Large Scale Video Retrieval

  • Yun Gu
  • Chao Ma
  • Jie Yang

Adaptation of Word Vectors using Tree Structure for Visual Semantics

  • Nakamasa Inoue
  • Koichi Shinoda

Adaptive Bitrate Selection for Video Encoding with Reduced Block Artifacts

  • Min-Kook Choi
  • Hyun-Gyu Lee
  • Minseok Song
  • Sang-Chul Lee

What Makes Photo Cultures Different?

  • Miriam Redi
  • Damon Crockett
  • Lev Manovich
  • Simon Osindero

Synchronization among Groups of Spectators for Highlight Detection in Movies

  • Michal Muszynski
  • Theodoros Kostoulas
  • Patrizia Lombardo
  • Thierry Pun
  • Guillaume Chanel

On Estimating Air Pollution from Photos Using Convolutional Neural Network

  • Chao Zhang
  • Junchi Yan
  • Changsheng Li
  • Xiaoguang Rui
  • Liang Liu
  • Rongfang Bie

Cross-modal Retrieval with Label Completion

  • Xing Xu
  • Fumin Shen
  • Yang Yang
  • Heng Tao Shen
  • Li He
  • Jingkuan Song

Objectness-aware Semantic Segmentation

  • Yuhang Wang
  • Jing Liu
  • Yong Li
  • Junjie Yan
  • Hanqing Lu

ReadMe: A Real-Time Recommendation System for Mobile Augmented Reality Ecosystems

  • Dimitris Chatzopoulos
  • Pan Hui

Action Recognition Using Local Consistent Group Sparse Coding with Spatio-Temporal Structure

  • Yi Tian
  • Qiuqi Ruan
  • Gaoyun An
  • Yun Fu

Super Resolution of the Partial Pixelated Images With Deep Convolutional Neural Network

  • Haiyi Mao
  • Yue Wu
  • Jun Li
  • Yun Fu

Adaptive Visual Feedback Generation for Facial Expression Improvement with Multi-task Deep Neural Networks

  • Takuhiro Kaneko
  • Kaoru Hiramatsu
  • Kunio Kashino

Fast Supervised LDA for Discovering Micro-Events in Large-Scale Video Datasets

  • Angelos Katharopoulos
  • Despoina Paschalidou
  • Christos Diou
  • Anastasios Delopoulos

Semantic Description of Timbral Transformations in Music Production

  • Ryan Stables
  • Brecht De Man
  • Sean Enderby
  • Joshua D. Reiss
  • György Fazekas
  • Thomas Wilmering

Multimodal Learning via Exploring Deep Semantic Similarity

  • Di Hu
  • Xiaoqiang Lu
  • Xuelong Li

Multi-pose Facial Expression Recognition Using Transformed Dirichlet Process

  • Feifei Zhang
  • Qirong Mao
  • Ming Dong
  • Yongzhao Zhan

Neighborhood-Preserving Hashing for Large-Scale Cross-Modal Search

  • Botong Wu
  • Yizhou Wang

Attention-based LSTM with Semantic Consistency for Videos Captioning

  • Zhao Guo
  • Lianli Gao
  • Jingkuan Song
  • Xing Xu
  • Jie Shao
  • Heng Tao Shen

Efficient Mobile Implementation of A CNN-based Object Recognition System

  • Keiji Yanai
  • Ryosuke Tanno
  • Koichi Okamoto

Context-aware Geometric Object Reconstruction for Mobile Education

  • Jinxin Zheng
  • Yongtao Wang
  • Zhi Tang

Automatic Music Video Generation Based on Emotion-Oriented Pseudo Song Prediction and Matching

  • Jen-Chun Lin
  • Wen-Li Wei
  • Hsin-Min Wang

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization

  • Kuan-Yu Chen
  • Shih-Hung Liu
  • Berlin Chen
  • Hsin-Min Wang
  • Hsin-Hsi Chen

Micro-Expression Recognition with Expression-State Constrained Spatio-Temporal Feature Representations

  • Dae Hoe Kim
  • Wissam J. Baddar
  • Yong Man Ro

Multimodal Interest Level Estimation via Variational Bayesian Mixture of Robust CCA

  • Yuma Sasaka
  • Takahiro Ogawa
  • Miki Haseyama

Transportation Mode Detection on Mobile Devices Using Recurrent Nets

  • Toan H. Vu
  • Le Dung
  • Jia-Ching Wang

Deeply-Supervised Recurrent Convolutional Neural Network for Saliency Detection

  • Youbao Tang
  • Xiangqian Wu
  • Wei Bu

Deep Correlation Features for Image Style Classification

  • Wei-Ta Chu
  • Yi-Ling Wu

CNN vs. SIFT for Image Retrieval: Alternative or Complementary?

  • Ke Yan
  • Yaowei Wang
  • Dawei Liang
  • Tiejun Huang
  • Yonghong Tian

Looking Good With Flickr Faves: Gaussian Processes for Finding Difference Makers in Personality Impressions

  • Xiaoyu Xiong
  • Maurizio Filippone
  • Alessandro Vinciarelli

Ad Recommendation for Sponsored Search Engine via Composite Long-Short Term Memory

  • Dejiang Kong
  • Fei Wu
  • Siliang Tang
  • Yueting Zhuang

Learning a Multi-class Discriminative Dictionary with Nonredundancy Constraints for Visual Classification

  • Zhao Liu
  • Yuwei Wu
  • Junsong Yuan
  • Yap-peng Tan

A Compact Binary Aggregated Descriptor via Dual Selection for Visual Search

  • Yuwei Wu
  • Zhe Wang
  • Junsong Yuan
  • Lingyu Duan

Capped Lp-Norm Graph Embedding for Photo Clustering

  • Mengfan Tang
  • Feiping Nie
  • Ramesh Jain

Bidirectional Long-Short Term Memory for Video Description

  • Yi Bin
  • Yang Yang
  • Fumin Shen
  • Xing Xu
  • Heng Tao Shen

A Robust Distance with Correlated Metric Learning for Multi-Instance Multi-Label Data

  • Yashaswi Verma
  • C.V. Jawahar

Multiview Video Super-Resolution via Information Extraction and Merging

  • Yawei Li
  • Xiaofeng Li
  • Zhizhong Fu
  • Wenli Zhong

InnerView: Learning Place Ambiance from Social Media Images

  • Darshan Santani
  • Rui Hu
  • Daniel Gatica-Perez

Quartet-net Learning for Visual Instance Retrieval

  • Jiewei Cao
  • Zi Huang
  • Peng Wang
  • Chao Li
  • Xiaoshuai Sun
  • Heng Tao Shen

AKSDA-MSVM: A GPU-accelerated Multiclass Learning Framework for Multimedia

  • Stavros Arestis-Chartampilas
  • Nikolaos Gkalelis
  • Vasileios Mezaris

Automatic Reflection Removal using Gradient Intensity and Motion Cues

  • Chao Sun
  • Shuaicheng Liu
  • Taotao Yang
  • Bing Zeng
  • Zhengning Wang
  • Guanghui Liu

Personal Multi-view Viewpoint Recommendation based on Trajectory Distribution of the Viewing Target

  • Xueting Wang
  • Kensho Hara
  • Yu Enokibori
  • Takatsugu Hirayama
  • Kenji Mase

Motion Segmentation using Visual and Bio-mechanical Features

  • Stefano Alletto
  • Giuseppe Serra
  • Rita Cucchiara

Locality-preserving K-SVD Based Joint Dictionary and Classifier Learning for Object Recognition

  • Yuan-Shan Lee
  • Chien-Yao Wang
  • Seksan Mathulaprangsan
  • Jia-Hao Zhao
  • Jia-Ching Wang

Label Tree Embeddings for Acoustic Scene Classification

  • Huy Phan
  • Lars Hertel
  • Marco Maass
  • Philipp Koch
  • Alfred Mertins

Deep Learning for Image Memorability Prediction: the Emotional Bias

  • Yoann Baveye
  • Romain Cohendet
  • Matthieu Perreira Da Silva
  • Patrick Le Callet

Demand-adaptive Clothing Image Retrieval Using Hybrid Topic Model

  • Zhengzhong Zhou
  • Jingjin Zhou
  • Liqing Zhang

Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection

  • Foteini Markatopoulou
  • Vasileios Mezaris
  • Ioannis Patras

Application-Layer Rate-Adaptive Multicast Video Streaming over 802.11 for Mobile Devices

  • Raheeb Muzaffar
  • Evsen Yanmaz
  • Christian Bettstetter
  • Andrea Cavallaro

Scalable Compression of Deep Neural Networks

  • Xing Wang
  • Jie Liang

UnitBox: An Advanced Object Detection Network

  • Jiahui Yu
  • Yuning Jiang
  • Zhangyang Wang
  • Zhimin Cao
  • Thomas Huang

Alone versus In-a-group: A Comparative Analysis of Facial Affect Recognition

  • Wenxuan Mou
  • Hatice Gunes
  • Ioannis Patras

Local Diffusion Map Signature for Symmetry-aware Non-rigid Shape Correspondence

  • Meng Wang
  • Yi Fang

How Cosmopolitan Are Emojis?: Exploring Emojis Usage and Meaning over Different Languages with Distributional Semantics

  • Francesco Barbieri
  • German Kruszewski
  • Francesco Ronzano
  • Horacio Saggion

Online Weighted Clustering for Real-time Abnormal Event Detection in Video Surveillance

  • Hanhe Lin
  • Jeremiah D. Deng
  • Brendon J. Woodford
  • Ahmad Shahi

Accelerating Convolutional Neural Networks for Mobile Applications

  • Peisong Wang
  • Jian Cheng

News Program Detection in TV Broadcast Videos

  • Raghvendra Kannao
  • Durgaprasad Dandi
  • Swamy Yellapu
  • Prithwijit Guha

Detecting Arbitrary Oriented Text in the Wild with a Visual Attention Model

  • Wenyi Huang
  • Dafang He
  • Xiao Yang
  • Zihan Zhou
  • Daniel Kifer
  • C. Lee Giles

Global Consistent Shape Correspondence for Efficient and Effective Active Shape Models

  • Meng Wang
  • Yi Fang

Towards Ultra-Low-Bitrate Video Conferencing Using Facial Landmarks

  • Pin-Chun Wang
  • Ching-Ling Fan
  • Chun-Ying Huang
  • Kuan-Ta Chen
  • Cheng-Hsin Hsu

Generating Diverse Image Datasets with Limited Labeling

  • Niluthpol Chowdhury Mithun
  • Rameswar Panda
  • Amit K. Roy-Chowdhury

Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction

  • Shizhe Chen
  • Qin Jin

Video Generation Using 3D Convolutional Neural Network

  • Shohei Yamamoto
  • Tatsuya Harada

Processing-Aware Privacy-Preserving Photo Sharing over Online Social Networks

  • Weiwei Sun
  • Jiantao Zhou
  • Ran Lyu
  • Shuyuan Zhu

Detecting Violence in Video using Subclasses

  • Xirong Li
  • Yujia Huo
  • Qin Jin
  • Jieping Xu

Deep Representation for Abnormal Event Detection in Crowded Scenes

  • Yachuang Feng
  • Yuan Yuan
  • Xiaoqiang Lu

Exploration of Large Image Corpuses in Virtual Reality

  • Sanket Khanwalkar
  • Shonali Balakrishna
  • Ramesh Jain

HEVC-compliant Tile-based Streaming of Panoramic Video for Virtual Reality Applications

  • Alireza Zare
  • Alireza Aminlou
  • Miska M. Hannuksela
  • Moncef Gabbouj

MatchDR: Image Correspondence by Leveraging Distance Ratio Constraint

  • Rui Wang
  • Dong Liang
  • Wei Zhang
  • Xiaochun Cao

A Novel Shadow-Free Feature Extractor for Real-Time Road Detection

  • Zhenqiang Ying
  • Ge Li
  • Xianghao Zang
  • Ronggang Wang
  • Wenmin Wang

Facial Expression Recognition with Deep two-view Support Vector Machine

  • Chongliang Wu
  • Shangfei Wang
  • Bowen Pan
  • Huaping Chen

Mental Visual Indexing: Towards Fast Video Browsing

  • Richang Hong
  • Jun He
  • Hanwang Zhang
  • Tat-Seng Chua

One Sensor is not Enough: Adapting and Fusing Sensors for the Quality Assessment of User Generated Video

  • Stefan Wilk
  • Manisha Luthra
  • Wolfgang Effelsberg

Boosting Video Description Generation by Explicitly Translating from Frame-Level Captions

  • Yuan Liu
  • Zhongchao Shi

Artist-based Classification via Deep Learning with Multi-scale Weighted Pooling

  • Kevin Alfianto Jangtjik
  • Mei-Chen Yeh
  • Kai-Lung Hua

CrowdNet: A Deep Convolutional Network for Dense Crowd Counting

  • Lokesh Boominathan
  • Srinivas S S Kruthiventi
  • R. Venkatesh Babu

Do Textual Descriptions Help Action Recognition?

  • Matteo Bruni
  • Tiberio Uricchio
  • Lorenzo Seidenari
  • Alberto Del Bimbo

Frame Untangling for Unobtrusive Display-Camera Visible Light Communication

  • Xiao Shu
  • Xiaolin Wu

Performance Measurements of Virtual Reality Systems: Quantifying the Timing and Positioning Accuracy

  • Chun-Ming Chang
  • Cheng-Hsin Hsu
  • Chih-Fan Hsu
  • Kuan-Ta Chen

Synthesizing Emerging Images from Photographs

  • Cheng-Han Yang
  • Ying-Miao Kuo
  • Hung-Kuo Chu

Predicting and Optimizing Image Compression

  • Oleksandr Murashko
  • John Thomson
  • Hugh Leather

Spectral and Cepstral Audio Noise Reduction Techniques in Speech Emotion Recognition

  • Jouni Pohjalainen
  • Fabien Ringeval
  • Zixing Zhang
  • Björn Schuller

SESSION: Video Program

AntiLoiter: A Loitering Discovery System for Longtime Videos across Multiple Surveillance Cameras

  • Jianquan Liu
  • Shoji Nishimura
  • Takuya Araki

Magic Mirror: A Virtual Fashion Consultant

  • Yejun Liu
  • Jia Jia
  • Jingtian Fu
  • Yihui Ma
  • Jie Huang
  • Zijian Tong

Placing Broadcast News Videos in their Social Media Context Using Hashtags

  • Joseph G. Ellis
  • Svebor Karaman
  • Hongzhi Li
  • Hong Bin Shim
  • Shih-Fu Chang

DEMONSTRATION SESSION: Demonstrations

MARIM: Mobile Augmented Reality for Interactive Manuals

  • Tam V. Nguyen
  • Dorothy Tan
  • Bilal Mirza
  • Jose Sepulveda

A Live Face Swapper

  • Shengtao Xiao
  • Luoqi Liu
  • Xuecheng Nie
  • Jiashi Feng
  • Ashraf A. Kassim
  • Shuicheng Yan

WorkCache: Salvaging siloed knowledge

  • Scott Carter
  • Laurent Denoue
  • Matthew Cooper

Hypervideo Production Using Crowdsourced Youtube Videos

  • Stefan John
  • Christian Handschigl
  • Britta Meixner
  • Michael Granitzer

SceneTextReg: A Real-Time Video OCR System

  • Haojin Yang
  • Cheng Wang
  • Christian Bartz
  • Christoph Meinel

Beauty eMakeup: A Deep Makeup Transfer System

  • Xinyu Ou
  • Si Liu
  • Xiaochun Cao
  • Hefei Ling

Real-time Wearable Computer Vision System for Improved Museum Experience

  • Giovanni Taverriti
  • Stefano Lombini
  • Lorenzo Seidenari
  • Marco Bertini
  • Alberto Del Bimbo

An Intention-Aware Interactive System for Mobile Video Browsing

  • Jun He
  • Hanwang Zhang
  • Ling Shen
  • Richang Hong
  • Tat-Seng Chua

A Multimodal Gamified Platform for Real-Time User Feedback in Sports Performance

  • David Monaghan
  • Freddie Honohan
  • Amin Ahmadi
  • Troy McDaniel
  • Ramin Tadayon
  • Ajay Karpur
  • Kieran Morran
  • Noel E. O'Connor
  • Sethuraman Panchanathan

PlaylistCreator: An Assisted Approach for Playlist Creation

  • Ricardo Dias
  • Daniel Gonçalves
  • Manuel J. Fonseca

WIMBY: What's in My Backyard?

  • Michael Dorkhom
  • Alan Woodley
  • Shlomo Geva
  • Richi Nayak

SuperSelect: An Interactive Superpixel-Based Segmentation Method for Touch Displays

  • Christoph Korinke
  • Tim Claudius Stratmann
  • Tim Laue
  • Susanne Boll

ThePlantGame: Actively Training Human Annotators for Domain-specific Crowdsourcing

  • Maximilien Servajean
  • Alexis Joly
  • Dennis Shasha
  • Julien Champ
  • Esther Pacitti

A Multi-Video Browser for Endoscopic Videos on Tablets

  • Marco A. Hudelist
  • Sabrina Kletz
  • Klaus Schoeffmann

A Tablet Annotation Tool for Endoscopic Videos

  • Marco A. Hudelist
  • Sabrina Kletz
  • Klaus Schoeffmann

News Archive Exploration Combining Face Detection and Tracking with Network Visual Analytics

  • Benjamin Renoust
  • Thanh Duc Ngo
  • Duy-Dinh Le
  • Shin'ichi Satoh

A New Tool for Collaborative Video Search via Content-based Retrieval and Visual Inspection

  • Wolfgang Hürst
  • Algernon Ip Vai Ching
  • Marco A. Hudelist
  • Manfred J. Primus
  • Klaus Schoeffmann
  • Christian Beecks

A Browsing and Retrieval System for Broadcast Videos using Scene Detection and Automatic Annotation

  • Lorenzo Baraldi
  • Costantino Grana
  • Alberto Messina
  • Rita Cucchiara

First-Person Shooter Game for Virtual Reality Headset with Advanced Multi-Agent Intelligent System

  • Ilya Makarov
  • Mikhail Tokmakov
  • Pavel Polyakov
  • Peter Zyuzin
  • Maxim Martynov
  • Oleg Konoplya
  • George Kuznetsov
  • Ivan Guschenko-Cheverda
  • Maxim Uriev
  • Ivan Mokeev
  • Olga Gerasimova
  • Lada Tokmakova
  • Alexey Kosmachev

SuperStreamer: Enabling Progressive Content Streaming in a Game Engine

  • Yong Xue Eu
  • Jermyn Tanu
  • Justin Jieting Law
  • Muhammad Hanif B Ghazali
  • Shuan Siang Tay
  • Wei Tsang Ooi
  • Anand Bhojan

DeepSketch2Image: Deep Convolutional Neural Networks for Partial Sketch Recognition and Image Retrieval

  • Omar Seddati
  • Stéphane Dupont
  • Saïd Mahmoudi

A Fast Cattle Recognition System using Smart devices

  • Santosh Kumar
  • Sanjay Kumar Singh
  • Tanima Datta
  • Hari Prabhat Gupta

Vibrotactile Experiences for Augmented Reality

  • Wolfgang Hürst
  • Nina Rosa
  • Jean-Paul van Bommel

Image2Text: A Multimodal Image Captioner

  • Chang Liu
  • Changhu Wang
  • Fuchun Sun
  • Yong Rui

History Rhyme: Searching Historic Events by Multimedia Knowledge

  • Yifan Xiong
  • Jia Chen
  • Qin Jin
  • Chao Zhang

Intelli-Wrench: Smart Navigation Tool for Mechanical Assembly and Maintenance

  • Toru Takahashi
  • Yuta Kudo
  • Rui Ishiyama

Interactive Image Search for Clothing Recommendation

  • Zhengzhong Zhou
  • Yifei Xu
  • Jingjin Zhou
  • Liqing Zhang

Video ChatBot: Triggering Live Social Interactions by Automatic Video Commenting

  • Yehao Li
  • Ting Yao
  • Rui Hu
  • Tao Mei
  • Yong Rui

bBridge: A Big Data Platform for Social Multimedia Analytics

  • Aleksandr Farseev
  • Ivan Samborskii
  • Tat-Seng Chua

Scalable Multimedia Streaming in Wireless Networks with Device-to-Device Cooperation

  • Karim Jahed
  • Sanaa Sharafeddine
  • Abdallah Moussawi
  • Abbas Abou Daya
  • Hassan Dbouk
  • Saadallah Kassir
  • Zaher Dawy
  • Preethi Valsalan
  • Wael Cherif
  • Fethi Filali

Leveraging ICN for Secure Content Distribution in IP Networks

  • Syed Obaid Amin
  • Qingji Zheng
  • Ravishankar Ravindran
  • GQ Wang

SESSION: Art Exhibition

Data Aesthetics: The Ethics and Aesthetics of Big Data Gathering seen from the Artists Eye

  • Lucas Evers
  • Frank Nack

SESSION: Topics in Multimedia I

Play and Rewind: Optimizing Binary Representations of Videos by Self-Supervised Temporal Hashing

  • Hanwang Zhang
  • Meng Wang
  • Richang Hong
  • Tat-Seng Chua

Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification

  • Zuxuan Wu
  • Yu-Gang Jiang
  • Xi Wang
  • Hao Ye
  • Xiangyang Xue

QoE Prediction for Enriched Assessment of Individual Video Viewing Experience

  • Yi Zhu
  • Alan Hanjalic
  • Judith A. Redi

Deep CTR Prediction in Display Advertising

  • Junxuan Chen
  • Baigui Sun
  • Hao Li
  • Hongtao Lu
  • Xian-Sheng Hua

SESSION: Analysis & Search

Event Specific Multimodal Pattern Mining for Knowledge Base Construction

  • Hongzhi Li
  • Joseph G. Ellis
  • Heng Ji
  • Shih-Fu Chang

Joint Graph Learning and Video Segmentation via Multiple Cues and Topology Calibration

  • Jingkuan Song
  • Lianli Gao
  • Mihai Marian Puscas
  • Feiping Nie
  • Fumin Shen
  • Nicu Sebe

Parsimonious Mixed-Effects HodgeRank for Crowdsourced Preference Aggregation

  • Qianqian Xu
  • Jiechao Xiong
  • Xiaochun Cao
  • Yuan Yao

Weighted Linear Fusion of Multimodal Data: A Reasonable Baseline?

  • Ognjen Arandjelovic

SESSION: Video Analysis & Streaming

DRIVING: Distributed Scheduling for Video Streaming in Vehicular Wi-Fi Systems

  • Xi Chen
  • Lei Rao
  • Qiao Xiang
  • Xue Liu
  • Fan Bai

Dynamic Resource Provisioning with QoS Guarantee for Video Transcoding in Online Video Sharing Service

  • Guanyu Gao
  • Yonggang Wen
  • Cedric Westphal

High-speed Depth Stream Generation from a Hybrid Camera

  • Xinxin Zuo
  • Sen Wang
  • Jiangbin Zheng
  • Ruigang Yang

Spatio-Temporal Analysis of Bandwidth Maps for Geo-Predictive Video Streaming in Mobile Environments

  • Bayan Taani
  • Roger Zimmermann

SESSION: Topics in Multimedia II

Micro Tells Macro: Predicting the Popularity of Micro-Videos via a Transductive Model

  • Jingyuan Chen
  • Xuemeng Song
  • Liqiang Nie
  • Xiang Wang
  • Hanwang Zhang
  • Tat-Seng Chua

Leveraging Contextual Cues for Generating Basketball Highlights

  • Vinay Bettadapura
  • Caroline Pantofaru
  • Irfan Essa

Server Allocation for Multiplayer Cloud Gaming

  • Yunhua Deng
  • Yusen Li
  • Xueyan Tang
  • Wentong Cai

Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding

  • Yehao Li
  • Ting Yao
  • Tao Mei
  • Hongyang Chao
  • Yong Rui

SESSION: Brave New Topics

Research Challenges in Developing Multimedia Systems for Managing Emergency Situations

  • Mengfan Tang
  • Siripen Pongpaichet
  • Ramesh Jain

Multimedia on the Mountaintop: Using Public Snow Images to Improve Water Systems Operation

  • Andrea Castelletti
  • Roman Fedorov
  • Piero Fraternali
  • Matteo Giuliani

Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet

  • Alexis Joly
  • Hervé Goëau
  • Julien Champ
  • Samuel Dufour-Kowalski
  • Henning Müller
  • Pierre Bonnet

Multimedia and Medicine: Teammates for Better Disease Detection and Survival

  • Michael Riegler
  • Mathias Lux
  • Carsten Griwodz
  • Concetto Spampinato
  • Thomas de Lange
  • Sigrun L. Eskeland
  • Konstantin Pogorelov
  • Wallapak Tavanapong
  • Peter T. Schmidt
  • Cathal Gurrin
  • Dag Johansen
  • Håvard Johansen
  • Pål Halvorsen

SESSION: Deep Learning

Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification

  • Xiaodong Yang
  • Pavlo Molchanov
  • Jan Kautz

Image Captioning with Deep Bidirectional LSTMs

  • Cheng Wang
  • Haojin Yang
  • Christian Bartz
  • Christoph Meinel

Deep Cross Residual Learning for Multitask Visual Recognition

  • Brendan Jou
  • Shih-Fu Chang

Robust Visual-Textual Sentiment Analysis: When Attention meets Tree-structured Recursive Neural Networks

  • Quanzeng You
  • Liangliang Cao
  • Hailin Jin
  • Jiebo Luo

SESSION: Events and Context

Context-aware Image Tweet Modelling and Recommendation

  • Tao Chen
  • Xiangnan He
  • Min-Yen Kan

Semantic Image Profiling for Historic Events: Linking Images to Phrases

  • Jia Chen
  • Qin Jin
  • Yifan Xiong

Audio Event Detection using Weakly Labeled Data

  • Anurag Kumar
  • Bhiksha Raj

Event Localization in Music Auto-tagging

  • Jen-Yu Liu
  • Yi-Hsuan Yang

SESSION: Multimedia Grand Challenge

Face Recognition via Active Annotation and Learning

  • Hao Ye
  • Weiyuan Shao
  • Hong Wang
  • Jianqi Ma
  • Li Wang
  • Yingbin Zheng
  • Xiangyang Xue

Deep Convolutional Neural Network with Independent Softmax for Large Scale Face Recognition

  • Yue Wu
  • Jun Li
  • Yu Kong
  • Yun Fu

Robust Face Recognition with Deep Multi-View Representation Learning

  • Jianshu Li
  • Jian Zhao
  • Fang Zhao
  • Hao Liu
  • Jing Li
  • Shengmei Shen
  • Jiashi Feng
  • Terence Sim

Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation

  • Rakshith Shetty
  • Jorma Laaksonen

Contextual Enrichment of Remote-Sensed Events with Social Media Streams

  • Benjamin Bischke
  • Damian Borth
  • Christian Schulze
  • Andreas Dengel

Early Embedding and Late Reranking for Video Captioning

  • Jianfeng Dong
  • Xirong Li
  • Weiyu Lan
  • Yujia Huo
  • Cees G.M. Snoek

Describing Videos using Multi-modal Fusion

  • Qin Jin
  • Jia Chen
  • Shizhe Chen
  • Yifan Xiong
  • Alexander Hauptmann

Multimodal Video Description

  • Vasili Ramanishka
  • Abir Das
  • Dong Huk Park
  • Subhashini Venugopalan
  • Lisa Anne Hendricks
  • Marcus Rohrbach
  • Kate Saenko

Tracking Natural Events through Social Media and Computer Vision

  • Jingya Wang
  • Mohammed Korayem
  • Saul Blanco
  • David J. Crandall

ConTagNet: Exploiting User Context for Image Tag Recommendation

  • Yogesh Singh Rawat
  • Mohan S. Kankanhalli

Image Captioning with both Object and Scene Information

  • Xiangyang Li
  • Xinhang Song
  • Luis Herranz
  • Yaohui Zhu
  • Shuqiang Jiang

Generating Affective Captions using Concept And Syntax Transition Networks

  • Tushar Karayil
  • Philipp Blandfort
  • Damian Borth
  • Andreas Dengel

SESSION: Keynote 2

Visual Analytics for Multimedia: Challenges and Opportunities

  • Jarke J. van Wijk

SESSION: Topics in Multimedia III

V3I-STAL: Visual Vehicle-to-Vehicle Interaction via Simultaneous Tracking and Localization

  • Xiaobai Liu

Are Safer Looking Neighborhoods More Lively?: A Multimodal Investigation into Urban Life

  • Marco De Nadai
  • Radu Laurentiu Vieriu
  • Gloria Zen
  • Stefan Dragicevic
  • Nikhil Naik
  • Michele Caraviello
  • Cesar Augusto Hidalgo
  • Nicu Sebe
  • Bruno Lepri

Detecting Sarcasm in Multimodal Social Platforms

  • Rossano Schifanella
  • Paloma de Juan
  • Joel Tetreault
  • Liangliang Cao

User Redirection and Direct Haptics in Virtual Environments

  • Cristiano Carvalheiro
  • Rui Nóbrega
  • Hugo da Silva
  • Rui Rodrigues

SESSION: Open Source Software Competition

LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning

  • Chengxi Ye
  • Chen Zhao
  • Yezhou Yang
  • Cornelia Fermüller
  • Yiannis Aloimonos

Morph: A Fast and Scalable Cloud Transcoding System

  • Guanyu Gao
  • Yonggang Wen

Smart Beholder: An Extensible Smart Lens Platform

  • Chun-Ying Huang
  • Ching-Ling Fan
  • Chih-Fan Hsu
  • Hsin-Yu Chang
  • Tsung-Han Tsai
  • Kuan-Ta Chen
  • Cheng-Hsin Hsu

A Platform for Building New Human-Computer Interface Systems that Support Online Automatic Recognition of Audio-Gestural Commands

  • Nikolaos Kardaris
  • Isidoros Rodomagoulakis
  • Vassilis Pitsikalis
  • Antonis Arvanitakis
  • Petros Maragos

madmom: A New Python Audio and Music Signal Processing Library

  • Sebastian Böck
  • Filip Korzeniowski
  • Jan Schlüter
  • Florian Krebs
  • Gerhard Widmer

Kvazaar: Open-Source HEVC/H.265 Encoder

  • Marko Viitanen
  • Ari Koivula
  • Ari Lemmetti
  • Arttu Ylä-Outinen
  • Jarno Vanne
  • Timo D. Hämäläinen

vitrivr: A Flexible Retrieval Stack Supporting Multiple Query Modes for Searching in Multimedia Collections

  • Luca Rossetto
  • Ivan Giangreco
  • Claudiu Tanase
  • Heiko Schuldt

Kurento: The WebRTC Modular Media Server

  • Luis López
  • Miguel París
  • Santiago Carot
  • Boni García
  • Micael Gallego
  • Francisco Gortázar
  • Raul Benítez
  • Jose A. Santos
  • David Fernández
  • Radu Tom Vlad
  • Iván Gracia
  • Francisco Javier López

Modular Parallelization Framework for Multi-Stream Video Processing

  • Tim Lenertz
  • Gauthier Lafruit

OpenVQ: A Video Quality Assessment Toolkit

  • Kristian Skarseth
  • Henrik Bjørlo
  • Pål Halvorsen
  • Michael Riegler
  • Carsten Griwodz

CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android

  • Seyyed Salar Latifi Oskouei
  • Hossein Golestani
  • Matin Hashemi
  • Soheil Ghiasi

Tamp: A Library for Compact Deep Neural Networks with Structured Matrices

  • Bingchen Gong
  • Brendan Jou
  • Felix Yu
  • Shih-Fu Chang

Barrista: Caffe Well-Served

  • Christoph Lassner
  • Daniel Kappler
  • Martin Kiefel
  • Peter Gehler

Pyo, the Python DSP toolbox

  • Olivier Belanger

SenseCap: Synchronized Data Collection with Microsoft Kinect2 and LeapMotion

  • Julian F.P. Kooij

MP3DG-PCC, Open Source Software Framework for Implementation and Evaluation of Point Cloud Compression

  • Rufael Mekuria
  • Pablo Cesar

SESSION: Learning & Hashing

Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning

  • Keze Wang
  • Shengfu Zhai
  • Hui Cheng
  • Xiaodan Liang
  • Liang Lin

Cross-batch Reference Learning for Deep Classification and Retrieval

  • Huei-Fang Yang
  • Kevin Lin
  • Chu-Song Chen

Binary Optimized Hashing

  • Qi Dai
  • Jianguo Li
  • Jingdong Wang
  • Yu-Gang Jiang

Linear Distance Preserving Pseudo-Supervised and Unsupervised Hashing

  • Min Wang
  • Wengang Zhou
  • Qi Tian
  • Zhengjun Zha
  • Houqiang Li

SESSION: Transport & Experience

A Pragmatically Designed Adaptive and Web-compliant Object-based Video Streaming Methodology: Implementation and Subjective Evaluation

  • Maarten Wijnants
  • Gustavo Rovelo
  • Peter Quax
  • Wim Lamotte

A Perceptual Quality Metric for Videos Distorted by Spatially Correlated Noise

  • Chao Chen
  • Mohammad Izadi
  • Anil Kokaram

Zero-Shot Hashing via Transferring Supervised Knowledge

  • Yang Yang
  • Yadan Luo
  • Weilun Chen
  • Fumin Shen
  • Jie Shao
  • Heng Tao Shen

SDNDASH: Improving QoE of HTTP Adaptive Streaming Using Software Defined Networking

  • Abdelhak Bentaleb
  • Ali C. Begen
  • Roger Zimmermann

SESSION: Topics in Multimedia IV

Query Adaptive Instance Search using Object Sketches

  • Sreyasee Das Bhattacharjee
  • Junsong Yuan
  • Weixiang Hong
  • Xiang Ruan

Key Color Generation for Affective Multimedia Production: An Initial Method and Its Application

  • EunJin Kim
  • Hyeon-Jeong Suk

Academic Coupled Dictionary Learning for Sketch-based Image Retrieval

  • Dan Xu
  • Xavier Alameda-Pineda
  • Jingkuan Song
  • Elisa Ricci
  • Nicu Sebe

Time Matters: Multi-scale Temporalization of Social Media Popularity

  • Bo Wu
  • Wen-Huang Cheng
  • Yongdong Zhang
  • Tao Mei

SESSION: Analysis & Middleware

Transform-Invariant Convolutional Neural Networks for Image Classification and Search

  • Xu Shen
  • Xinmei Tian
  • Anfeng He
  • Shaoyan Sun
  • Dacheng Tao

PL-ranking: A Novel Ranking Method for Cross-Modal Retrieval

  • Liang Zhang
  • Bingpeng Ma
  • Guorong Li
  • Qingming Huang
  • Qi Tian

Video eCommerce: Towards Online Video Advertising

  • Zhi-Qi Cheng
  • Yang Liu
  • Xiao Wu
  • Xian-Sheng Hua

Affective Contextual Mobile Recommender System

  • Chao Wu
  • Jia Jia
  • Wenwu Zhu
  • Xu Chen
  • Bowen Yang
  • Yaoxue Zhang

SESSION: Emotions, People and Faces

Predicting Personalized Emotion Perceptions of Social Images

  • Sicheng Zhao
  • Hongxun Yao
  • Yue Gao
  • Rongrong Ji
  • Wenlong Xie
  • Xiaolei Jiang
  • Tat-Seng Chua

StressClick: Sensing Stress from Gaze-Click Patterns

  • Michael Xuelin Huang
  • Jiajia Li
  • Grace Ngai
  • Hong Va Leong

Ensemble of Sparse Cross-Modal Metrics for Heterogeneous Face Recognition

  • Jing Huo
  • Yang Gao
  • Yinghuan Shi
  • Wanqi Yang
  • Hujun Yin

Shorter-is-Better: Venue Category Estimation from Micro-Video

  • Jianglong Zhang
  • Liqiang Nie
  • Xiang Wang
  • Xiangnan He
  • Xianglin Huang
  • Tat-Seng Chua

SESSION: Doctoral Symposium

Multimodal-based Multimedia Analysis, Retrieval, and Services in Support of Social Media Applications

  • Rajiv Ratn Shah

Geospatial Multimedia Data for Situation Recognition

  • Mengfan Tang

Image Emotion Computing

  • Sicheng Zhao

First Person View Video Summarization Subject to the User Needs

  • Ana Garcia del Molino

Sentiment and Emotion Analysis for Social Multimedia: Methodologies and Applications

  • Quanzeng You

n-Dimensional Display Interface

  • Charles D. Estes

Multi-Modal Learning: Study on A Large-Scale Micro-Video Data Collection

  • Jingyuan Chen

Weakly-Supervised Recognition, Localization, and Explanation of Visual Entities

  • Pascal Mettes

Zero-Example Multimedia Event Detection and Recounting with Unsupervised Evidence Localization

  • Yi-Jie Lu

TUTORIAL SESSION: Tutorials

Emerging Topics in Learning from Noisy and Missing Data

  • Xavier Alameda-Pineda
  • Timothy M. Hospedales
  • Elisa Ricci
  • Nicu Sebe
  • Xiaogang Wang

The Lifecycle of Geotagged Multimedia Data

  • Rossano Schifanella
  • Bart Thomée

Technology & Art in Stimulating Creative Placemaking in Public-Use Spaces

  • Wendy Ann Mansilla
  • Andrew Perkis

Situation Recognition from Multimodal Data

  • Vivek K. Singh
  • Siripen Pongpaichet
  • Ramesh Jain

Social and Affective Robotics Tutorial

  • Maja Pantic
  • Vanessa Evers
  • Marc Deisenroth
  • Luis Merino
  • Björn Schuller

Multimedia Privacy

  • Gerald Friedland
  • Symeon Papadopoulos
  • Julia Bernd
  • Yiannis Kompatsiaris

WORKSHOP SESSION: Workshops

AltMM 2016: 1st International Workshop on Multimedia Alternate Realities

  • Teresa Chambel
  • Rene Kaiser
  • Omar Niamut
  • Wei Tsang Ooi
  • Judith A. Redi

Summary for AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge

  • Michel Valstar
  • Jonathan Gratch
  • Björn Schuller
  • Fabien Ringeval
  • Roddy Cowie
  • Maja Pantic

Multimedia COMMONS Workshop 2016 (MMCommons'16): Datasets, Evaluation, and Reproducibility

  • Bart Thomée
  • Damian Borth
  • Julia Bernd

LTA 2016: The First Workshop on Lifelogging Tools and Applications

  • Cathal Gurrin
  • Xavier Giro-i-Nieto
  • Petia Radeva
  • Mariella Dimiccoli
  • Håvard Johansen
  • Hideo Joho
  • Vivek K. Singh

Overview of the ACM MultiMedia 2016 International Workshop on Multimedia Assisted Dietary Management

  • Stavroula Mougiakakou
  • Giovanni Maria Farinella
  • Keiji Yanai

Multimedia for Personal Health and Health Care

  • Susanne Boll
  • Kiyo Aizawa
  • Alexia Briasouli
  • Cathal Gurrin
  • Laleh Jalali
  • Jochen Meyer

Vision and Language Integration Meets Multimedia Fusion: Proceedings of ACM Multimedia 2016 Workshop

  • Marie-Francine Moens
  • Katerina Pastra
  • Kate Saenko
  • Tinne Tuytelaars

Seventh International Workshop on Human Behavior Understanding (HBU 2016)

  • Mohamed Chetouani
  • Jeffrey Cohn
  • Albert Ali Salah