NTT Docomo Challenge: Event Understanding through Social Media and its Text-Visual Summarization

This challenge seeks innovative techniques for mining social media data to retrieve, summarize, and visualize events related to a selected topic. Twitter has become a common communication tool for both business and private purposes. It is an online social networking service that allows users to share their interests and activities with other users, including their friends. Users' posts may contain photos they have taken, restaurant reviews, information about discounts at local stores, or comments on the TV program they are currently watching.

The challenge for researchers is to explore Twitter data, extract the data related to a topic of their choice, and summarize the extracted data in the format of a magazine in which each article represents an event. For example, one may extract the data about “local events” in New York City and summarize them as a magazine such as “New York of the Day.” Alternatively, one may pick “the Olympics” or “the World Series” as the topic and create a topic-specific magazine. There is no restriction on the topic that may be selected.

Taking “New York of the Day” as an example, this challenge may require the following processes once enough images and texts have been collected from social media. (Researchers working on this challenge are not limited to the processes below and are free to take any novel approach to the problem.)

  • Extract the local events from the Twitter data
  • Assign location information to each image
  • Create a text summary of each local event from tweets and other third-party content
  • Assign the most relevant images to each local event
  • Lay out the articles and design the magazine
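
As a rough illustration of the first and fourth steps, the sketch below groups posts into candidate events by shared hashtag and attaches the images found in those posts. It is only one simple baseline, not a prescribed method: the `Post` and `Article` types, the `build_articles` function, and the `min_posts` threshold are hypothetical names introduced for this example, and the sample URL is a placeholder.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional

HASHTAG = re.compile(r"#(\w+)")


@dataclass
class Post:
    """A collected post (hypothetical structure for this sketch)."""
    text: str
    image_url: Optional[str] = None


@dataclass
class Article:
    """One magazine article, i.e. one candidate event."""
    topic: str
    texts: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)


def build_articles(posts, min_posts=2):
    """Group posts by hashtag; a hashtag with enough posts becomes an article."""
    groups = defaultdict(list)
    for post in posts:
        # Lowercase so that #NYCParade and #nycparade count as the same tag.
        for tag in set(HASHTAG.findall(post.text.lower())):
            groups[tag].append(post)

    articles = []
    for tag, members in groups.items():
        if len(members) < min_posts:
            continue  # Assumption: too few posts to treat the tag as an event.
        article = Article(topic=tag)
        for post in members:
            article.texts.append(post.text)
            if post.image_url:
                article.images.append(post.image_url)
        articles.append(article)

    # Put the most discussed events first; break ties alphabetically.
    articles.sort(key=lambda a: (-len(a.texts), a.topic))
    return articles


sample = [
    Post("Parade on 5th Ave #nycparade", "http://example.com/a.jpg"),
    Post("Loving the floats #NYCParade"),
    Post("Lunch #pizza"),
]
articles = build_articles(sample)
```

Hashtag grouping is used here only because it is easy to follow; location assignment, text summarization, and layout are left out of the sketch entirely.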

Extracting “local events” and assigning the appropriate location to each image from social data could be challenging. The article layout and magazine design could also be critical to producing an impressive, attractive magazine.

Input
Researchers working on this challenge should collect the necessary data from Twitter or Flickr. At least three types of data are required for this challenge.

  • Images: Twitter or Flickr, or both
  • Text: Tweets from Twitter
  • Third-party content: News websites such as the New York Times, blogs, and others

Output
As a rough guide, the output could take the form of a magazine in which each article represents an event and is accompanied by related images, text, or both. These images and texts should make the article self-explanatory. The magazine could be issued on a daily basis, an hourly basis, or at even shorter intervals. For “New York of the Day”, however, we expect that there would not be enough data to create an hourly edition.
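
One way to judge which issue frequency is feasible for a chosen topic is to count the collected posts per time bucket. The sketch below is a minimal illustration under assumptions: the function names, the `min_posts` threshold, and the sample timestamps are all hypothetical and are not part of the challenge definition.

```python
from collections import Counter
from datetime import datetime

# Bucket labels for each supported granularity.
FORMATS = {"daily": "%Y-%m-%d", "hourly": "%Y-%m-%d %H:00"}


def posts_per_issue(timestamps, granularity="daily"):
    """Count how many posts fall into each daily or hourly bucket."""
    fmt = FORMATS[granularity]
    return Counter(ts.strftime(fmt) for ts in timestamps)


def viable_issues(timestamps, granularity, min_posts):
    """Return the buckets that hold at least `min_posts` posts, in order."""
    counts = posts_per_issue(timestamps, granularity)
    return sorted(bucket for bucket, n in counts.items() if n >= min_posts)


# Made-up timestamps, for illustration only.
stamps = [
    datetime(2013, 5, 1, 9, 5),
    datetime(2013, 5, 1, 9, 40),
    datetime(2013, 5, 1, 14, 0),
    datetime(2013, 5, 2, 8, 0),
]
daily = viable_issues(stamps, "daily", min_posts=3)
hourly = viable_issues(stamps, "hourly", min_posts=3)
```

With these sample values, one day has enough posts for an issue while no single hour does, which mirrors the concern about hourly editions for a local topic.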

Evaluation
Submissions will be evaluated on the following three criteria.

  • Relevance of the summary/article to the actual topic
  • Relevance of the selected images to the summary text, and vice versa
  • Quality of magazine design