ATQAM/MAST'20: Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends

SESSION: ATQAM'20 Keynote & Invited Talks I

Speeding it Up: Perception of High-Frame Rate Videos

  • Al Bovik

Modern streaming video providers continuously seek to improve consumer experiences
by delivering higher-quality, denser content. An important direction that bears study
is high-frame rate (HFR) videos, which present unique problems involving balances
between frame rate, video quality, and compression. I will describe new large-scale
perceptual studies that we have conducted that are focused on these issues. I will
also describe new computational video quality models that address highly practical
questions, such as frame rate selection versus compression, and how to combine space-time
sampling with compression. My hope is that these contributions will help further
advance the global delivery of HFR video content.

Going Big or Going Precise: Considerations in building the next-gen VQA Database

  • Franz Götz-Hahn

Annotated data is a requirement for any kind of modeling of subjective attributes,
and data collection is usually constrained by a fixed budget available for paying annotators.
Distributing this budget is non-trivial when the pool of data available for annotation is large.

In the case of video quality assessment (VQA) datasets, it has commonly been deemed
more important to evaluate at higher precision, i.e. to collect more annotations per
item, than to annotate more data less precisely. Considering the highly complex ways
in which technical quality impairments caused by different stages of multiple video
processing pipelines interact, the few hundred items comprising existing VQA datasets
are unlikely to cover the vast degradation space required to generalize well.
An open question, then, is whether some annotation precision can be sacrificed for
additional data without loss of generalization power. How does shifting the vote budget
from, say, 1,000 items with 100 annotations each to 100,000 items with a single annotation
each affect the predictive performance of state-of-the-art models?
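The trade-off posed above can be illustrated with a toy simulation (an illustrative model, not the study's actual protocol): under a simple additive rater-noise assumption, concentrating a fixed vote budget on few items yields precise per-item MOS estimates, while spreading it thinly covers far more content at the cost of noisier labels.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_items, votes_per_item, noise_sd=0.8):
    """Simulate MOS estimation under a fixed vote budget.

    Each item has a latent true quality score in [1, 5]; each vote is
    the true score plus Gaussian rater noise. Returns the mean absolute
    error of the per-item MOS estimates. (Illustrative model only.)
    """
    true_scores = rng.uniform(1.0, 5.0, size=n_items)
    votes = true_scores[:, None] + rng.normal(
        0.0, noise_sd, size=(n_items, votes_per_item))
    mos = votes.mean(axis=1)
    return np.abs(mos - true_scores).mean()

# Two ways to spend the same budget of 100,000 votes:
precise = simulate(n_items=1_000, votes_per_item=100)   # few items, many votes
broad   = simulate(n_items=100_000, votes_per_item=1)   # many items, one vote
print(f"per-item MOS error: precise={precise:.3f}, broad={broad:.3f}")
```

The per-item labels in the "broad" split are an order of magnitude noisier, which is why the dataset's finding that model generalization is invariant to the split is surprising: the extra content diversity evidently compensates for the label noise.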

This talk addresses this question using a new large-scale, two-part VQA dataset
[1] comprising, on the one hand, over 1,500 items annotated with a minimum of 89 votes
each and, on the other, over 150,000 items annotated with 5 votes each. Based on this dataset,
different VQA approaches were compared at different distributions of a fixed vote
budget and, surprisingly, their generalization performance was found to be invariant
to this distribution of the budget. This held true for the typical within-dataset
testing as well as cross-dataset testing.

SESSION: ATQAM'20 Session 1

EVA: An Explainable Visual Aesthetics Dataset

  • Chen Kang
  • Giuseppe Valenzise
  • Frédéric Dufaux

Assessing visual aesthetics has important applications in several domains, from image
retrieval and recommendation to enhancement. Modern aesthetic quality predictors are
data driven, and leverage the availability of large annotated datasets to train accurate
models. However, labels in existing datasets are often noisy, incomplete or they do
not allow more sophisticated tasks such as understanding why an image looks beautiful
or not to a human observer. In this paper, we propose an Explainable Visual Aesthetics
(EVA) dataset, which contains 4070 images with at least 30 votes per image. Compared
to previous datasets, EVA has been crowdsourced using a more disciplined approach
inspired by quality assessment best practices. It also offers additional features,
such as the degree of difficulty in assessing the aesthetic score, rating for 4 complementary
aesthetic attributes, as well as the relative importance of each attribute to form
aesthetic opinions. A statistical analysis on EVA demonstrates that the collected
attributes and their relative importance can be linearly combined to effectively explain
the overall aesthetic mean opinion scores. The dataset, made publicly available, is
expected to contribute to future research on understanding and predicting visual quality.
SESSION: ATQAM'20 Keynote & Invited Talks II

Modeling Aesthetics and Emotions in Visual Content: From Vincent van Gogh to Robotics and Vision

  • James Z. Wang

As inborn characteristics, humans possess the ability to judge visual aesthetics,
feel the emotions from the environment and comprehend others' emotional expressions.
Many exciting applications become possible if robots or computers can be empowered
with similar capabilities. Modeling aesthetics, evoked emotions, and emotional expressions
automatically in unconstrained situations, however, is daunting due to the lack of
a full understanding of the relationship between low-level visual content and high-level
aesthetics or emotional expressions. With the growing availability of data, it is
possible to tackle these problems using machine learning and statistical modeling
approaches. In the talk, I provide an overview of our research in the last two decades
on data-driven analyses of visual artworks and digital visual content for modeling
aesthetics and emotions.

First, I discuss our analyses of styles in visual artworks. Art historians have long
observed the highly characteristic brushstroke styles of Vincent van Gogh and have
relied on discerning these styles for authenticating and dating his works. In our
work, we compared van Gogh with his contemporaries by statistically analyzing a massive
set of automatically extracted brushstrokes. A novel extraction method is developed
by exploiting an integration of edge detection and clustering-based segmentation.
Evidence substantiates that van Gogh's brushstrokes are strongly rhythmic.

Next, I describe an effort to model the aesthetic and emotional characteristics in
visual content such as photographs. By taking a data-driven approach, using the Internet
as the data source, we show that computers can be trained to recognize various characteristics
that are highly relevant to aesthetics and emotions. Future computer systems equipped
with such capabilities are expected to help millions of users in unimagined ways.

Finally, I highlight our research on automated recognition of bodily expression of
emotion. We propose a scalable and reliable crowdsourcing approach for collecting
in-the-wild perceived emotion data for computers to learn to recognize the body language
of humans. Comprehensive statistical analysis revealed many interesting insights from
the dataset. A system to model the emotional expressions based on bodily movements,
named ARBEE (Automated Recognition of Bodily Expression of Emotion), has also been
developed and evaluated.

Rating Distribution and Personality Prediction for Image Aesthetics Assessment

  • Weisi Lin

Aesthetics has long been an area of intense interest and continuing exploration in
multiple disciplines, such as philosophy, psychology, the arts, photography, computer
graphics, media, and industrial design. Objective image aesthetic assessment
(IAA) is related to three major considerations. First of all, technical quality assessment
(TQA) of images still plays an important role in general, because basic visual features
(e.g., contrast, brightness, colorfulness and semantic information) definitely influence
humans' perception and experience. TQA has already been relatively well developed
over the past two decades, so to be successful, IAA needs to focus more on the other
two considerations that are special to it. The first is generic IAA (GIAA), which deals
with aesthetic factors common to a typical human being or user, with examples including
the rule of thirds, symmetry, depth of field, object saliency, color harmony, etc. The
last special consideration is personalized IAA (PIAA), which is crucial in enabling
many practical tasks, because "beauty is in the eye of the beholder."

Compared with mere TQA, IAA is expected to be much more individualized, and machine
learning provides an effective means for tackling the related tasks. This talk
will therefore introduce and discuss two issues particularly significant to PIAA:
rating distribution prediction and personality-assisted aesthetic assessment. For
the former, objective prediction will be demonstrated to be able to predict the subjective
rating distribution (rather than just the mean opinion score (MOS) used in most existing
TQA methods), since PIAA may face a higher diversity of opinions across subjects (even
with a twin-peak distribution), especially for abstract art images. In such situations,
a simple MOS estimate alone is far from capturing the real opinions on aesthetics. For
the latter, viewers/users of different personalities (determined by the oft-used Big
Five scheme) have different preferences toward various categories of images. Personality
prediction and its use in PIAA will be explored in hopes of creating more awareness
and triggering further work in the related field.
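The twin-peak point can be made concrete with a small synthetic example (illustrative numbers, not results from the talk): when opinions split into two camps, the MOS lands at a score almost nobody actually gave.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ratings for an abstract art image on a 1-5 scale:
# opinions split into two camps ("love it" vs. "don't get it").
ratings = np.concatenate([
    rng.choice([1, 2], size=50),   # one peak at the low end
    rng.choice([4, 5], size=50),   # another peak at the high end
])

mos = ratings.mean()
hist = np.bincount(ratings, minlength=6)[1:]  # counts for scores 1..5

print("rating histogram (1..5):", hist)
print("MOS:", mos)
# The MOS lands near 3, a score no rater in this sample actually gave,
# so predicting the full distribution is far more informative than
# predicting the MOS alone.
```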

From Technical to Aesthetics Quality Assessment and Beyond: Challenges and Potential

  • Vlad Hosu
  • Dietmar Saupe
  • Bastian Goldluecke
  • Weisi Lin
  • Wen-Huang Cheng
  • John See
  • Lai-Kuan Wong

Every day 1.8+ billion images are being uploaded to Facebook, Instagram, Flickr, Snapchat,
and WhatsApp [6]. The exponential growth of visual media has made quality assessment
become increasingly important for various applications, from image acquisition, synthesis,
restoration, and enhancement, to image search and retrieval, storage, and recognition.

There have been two related but different classes of visual quality assessment techniques:
image quality assessment (IQA) and image aesthetics assessment (IAA). As perceptual
assessment tasks, subjective IQA and IAA share some common underlying factors that
affect user judgments. Moreover, they are similar in methodology (especially NR-IQA
in-the-wild and IAA). However, the emphasis for each is different: IQA focuses on
low-level defects e.g. processing artefacts, noise, and blur, while IAA puts more
emphasis on abstract and higher-level concepts that capture the subjective aesthetics
experience, e.g. established photographic rules encompassing lighting, composition,
and colors, and personalized factors such as personality, cultural background, age,
and emotion.

IQA has been studied extensively over the last decades [3, 14, 22]. There are three
main types of IQA methods: full-reference (FR), reduced-reference (RR), and no-reference
(NR). Among these, NR-IQA is the most challenging as it does not depend on reference
images or impose strict assumptions on the distortion types and levels. NR-IQA techniques
can be further divided into those that predict the global image score [1, 2, 10, 17,
26] and patch-based IQA [23, 25], to name a few of the more recent approaches.

SESSION: MAST'20 Keynote & Invited Talks I

Understanding Gender Stereotypes and Electoral Success from Visual Self-presentations
of Politicians in Social Media

  • Danni Chen
  • Kunwoo Park
  • Jungseock Joo

Social media have been widely used as a platform for political communication, promoting
firsthand dialogue between politicians and the public. This paper studies the role
of visual self-presentation in social media in political campaigns with a primary
focus on gender stereotypical cues exhibited in Facebook timeline posts of 562 candidates
in the 2018 U.S. general elections. We train a convolutional neural network (CNN)
that infers gender stereotypes from the photographs based on crowdsourced annotations.
Using regression analysis, we find that masculine traits are predictive of winning
elections across genders and parties. In contrast, feminine traits are not
correlated with electoral success. Prediction experiments show that the visual traits
on gender stereotypes can predict the election outcomes with an accuracy of 0.739,
which was better than the performance (0.724) of making a direct prediction from the
raw photographs. Our study demonstrates that the automated visual content analysis
can reliably measure subtle, emotional, and subjective personal trait dimensions from
political images, thereby enabling systematic investigations on multi-modal political
communication via social media.

SESSION: MAST'20 Keynote & Invited Talks II

Exploring Speech Cues in Web-mined COVID-19 Conversational Vlogs

  • Kexin Feng
  • Preeti Zanwar
  • Amir H. Behzadan
  • Theodora Chaspari

The COVID-19 pandemic caused by the novel SARS-Coronavirus-2 (n-SARS-CoV-2) has impacted
people's lives in unprecedented ways. During the time of the pandemic, social vloggers
have used social media to actively share their opinions or experiences in quarantine.
This paper collected videos from YouTube to track emotional responses in conversational
vlogs and their potential associations with events related to the pandemic. In particular,
vlogs uploaded from locations in New York City were analyzed given that this was one
of the first epicenters of the pandemic in the United States. We observed common
patterns in vloggers' acoustic and linguistic features across the time span of the
quarantine, which are indicative of changes in emotional reactivity. Additionally,
we investigated fluctuations of acoustic and linguistic patterns in relation to COVID-19
events in the New York area (e.g. the number of daily new cases, number of deaths,
and extension of stay-at-home order and state of emergency). Our results indicate
that acoustic features, such as zero-crossing-rate, jitter, and shimmer, can be valuable
for analyzing emotional reactivity in social media videos. Our findings further indicate
that some of the peaks of the acoustic and linguistic indices align with COVID-19
events, such as the peak in the number of deaths and emergency declaration.
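As a concrete illustration of the first of these features, zero-crossing rate (ZCR) is straightforward to compute from a raw audio frame (a minimal sketch; the test signal and sampling rate are illustrative, not from the study):

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ.

    Higher values roughly indicate noisier or higher-frequency content;
    ZCR is one of the standard low-level acoustic features used for
    analyzing speech in recordings such as vlogs.
    """
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

# A 100 Hz sine sampled at 8 kHz crosses zero about 200 times per
# second, i.e. roughly 2.5% of sample pairs in this frame straddle zero.
sr = 8000
t = np.arange(sr) / sr          # one second of samples
tone = np.sin(2 * np.pi * 100 * t)
zcr = zero_crossing_rate(tone)
print(f"ZCR of 100 Hz tone: {zcr:.4f}")
```

In practice such features are extracted per short frame (tens of milliseconds) and tracked over time, which is how fluctuations can be aligned with external events.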