MediaEval 2010: Multimedia Retrieval Benchmark Evaluation

June 2010


By Martha Larson and Gareth J.F. Jones

MediaEval is a benchmarking initiative that offers tasks promoting research and innovation on multimodal approaches to multimedia access and retrieval. The focus is on speech, language and social features and how they can be combined with visual features in a variety of innovative tasks. MediaEval is a continuation and expansion of VideoCLEF, which ran 2008-2009 in the CLEF Campaign. Groups participate in MediaEval by signing up, carrying out tasks, and attending the MediaEval workshop.

In 2010, MediaEval is running five tasks, described briefly here and in greater detail on the website.

The Affect Task involves the prediction of viewer-reported boredom for Internet video. Participants are provided with a set of episodes from a video blog and asked to develop systems that automatically rank the episodes in terms of predicted viewer-reported boredom. The Affect Task corpus is small but unique - boredom scores were generated using a crowdsourcing technique for collecting affective labels. The corpus includes speech recognition transcripts (courtesy of ICSI & SRI International) and extracted keyframes (courtesy of Technische Universität Berlin). The task was designed with the idea that if boredom scores can be predicted with sufficient reliability, they could be combined with topic-specific information in order to improve user satisfaction with video search engines.

The Linking Task involves generating useful links between points within a video and articles in Wikipedia. Participants are supplied with a set of videos (Dutch language) in which segments, so-called multimedia anchors, have been marked. They are asked to link each anchor to a relevant article from the English language Wikipedia. The corpus includes speech recognition transcripts (courtesy of University of Twente) and extracted keyframes (courtesy of Dublin City University). The task also ran in VideoCLEF 2009. Linking is useful in an application in which viewers would like background information about the subject content of video.

The Placing Task is a geo-tagging task - it involves answering the question, "Where in the world?" Participants automatically predict the location of Flickr videos using one or more of: video metadata (tags, titles), visual content, audio content, and social information. Any use of open resources, such as gazetteers or geo-tagged articles in Wikipedia, is encouraged. The goal of the task is to come as close as possible to the geo-coordinates of the videos as provided by users or their GPS devices. This task was born of the observation that geo-tags are highly valuable for multimedia browsing and retrieval systems, but are lacking for many Internet videos.
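To make "coming close" to a video's geo-coordinates concrete, one way to quantify it is the great-circle distance between a predicted and a reference location. The sketch below is only an illustration under stated assumptions: the haversine formula, the mean Earth radius, the function name and the example coordinates are all choices made here, not the task's official evaluation procedure.

```python
import math

# Assumption of this sketch: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical example: a prediction near Pisa against a reference near Firenze.
predicted = (43.7167, 10.4000)
reference = (43.7696, 11.2558)
error_km = haversine_km(*predicted, *reference)
print(f"placement error: {error_km:.1f} km")
```

A smaller distance means a better placement; a system's predictions could then be summarized, for instance, by counting how many videos fall within a chosen radius of their reference location.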

The Tagging Task (Professional Version) tackles the challenge of predicting the semantic theme of television content. Participants are required to automatically assign subject labels to Dutch-language television content using features derived from speech, audio, visual content or associated archival metadata. This task re-uses the TRECVid 2007-2009 data from the archive of the Netherlands Institute for Sound and Vision for a new task - the Tagging Task goes beyond what is depicted in the visual channel to focus on "aboutness" of the video as a whole. The subject labels for the task are drawn from the thesaurus used by Sound and Vision archivists to annotate content in the archive.

The Tagging Task (Wild Wild Web Version) is not for the fainthearted. This task requires participants to develop automatic methods that are capable of reproducing tags assigned by the creators of Internet videos. Participants are provided with a corpus of Creative Commons licensed Internet video that includes speech recognition transcripts (courtesy of LIMSI and VECSYS Research) and extracted keyframes. The ultimate goal of this task is to improve the quality and completeness (and thereby usefulness) of user-contributed tags in social video collections.

Currently, MediaEval 2010 is well underway and we are actively planning for MediaEval 2011. In particular, we are recruiting collaborators - groups or projects who are interested in getting involved in organizing multimedia benchmarking activities. Such groups are invited to submit a summary and/or make a presentation at the MediaEval 2010 workshop. The workshop will be held in Pisa, Italy on Oct 24th, 2010, a convenient location for those attending ACM Multimedia, which starts the next day in nearby Firenze.

MediaEval 2010 is sponsored by PetaMedia, an FP7 EU Network of Excellence dedicated to research and development in the area of multimedia access and retrieval.
