IH&MMSec '19 - Proceedings of the ACM Workshop on Information Hiding and Multimedia Security

SESSION: Keynote Presentation #1

Session details: Keynote Presentation #1

Rémi Cogranne

Reverse Engineering: What Can We Learn From a Digital Image About Its Own History?

Jean-Michel Morel

This keynote presentation reviews algorithms that analyse a digital image and retrieve part of its processing history. This problem is relevant because more and more images have lost their native EXIF metadata. The presentation describes several tools that gather information about an image's compression, resampling, cropping, gamma correction, and demosaicing process. This information may be used to detect image manipulations and, in some cases, tampering. A common denominator of all detection tools is that they need false alarm control. I'll illustrate how a false alarm rate can be rigorously associated with each detection.
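One common way to attach a false alarm figure to a detection is to multiply the number of tests performed by the probability that the null model alone produces an observation at least as extreme. The sketch below assumes a binomial null model; the function names and the choice of null model are illustrative assumptions and are not taken from the presentation.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def expected_false_alarms(num_tests, n, k, p):
    """Expected number of detections at least this strong under the null model.

    num_tests: how many candidate detections were examined (assumed known)
    n, k:      k of n independent observations agree with the hypothesis
    p:         probability that one observation agrees by chance (assumed)
    """
    return num_tests * binomial_tail(n, k, p)

# A detection is typically kept only when this value is small (for example below 1).
print(expected_false_alarms(num_tests=100, n=10, k=10, p=0.5))
```

With these assumed inputs, a smaller value means the observation is harder to explain by chance, even after accounting for the number of tests.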

Joint work with Quentin Bammey, Miguel Colom, Thibaud Ehret, Rafael Grompone, Tina Nikoukhah

SESSION: Session: Video & Audio Steganography

Session details: Session: Video & Audio Steganography

Siwei Lyu

Recurrent Convolutional Neural Networks for AMR Steganalysis Based on Pulse Position

Chen Gong
Xiaowei Yi
Xianfeng Zhao
Yi Ma

With the rapid development of streaming multimedia, adaptive multi-rate (AMR) audio steganography has emerged in recent years. However, traditional steganalysis methods face great challenges in detecting short speech samples at low embedding rates. To address this problem, we propose SRCNet, a steganalytic scheme that combines a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN). AMR fixed codebook (FCB) steganography embeds messages by modifying the pulse positions, which destroys the FCB correlation. We first analyze the FCB correlations at different distances and summarize these correlations into four categories. We then use the RNN to extract higher-level contextual representations of FCBs and the CNN to fuse spatial-temporal features for steganalysis. The proposed approach was evaluated on a public dataset. The experimental results validate that the proposed framework greatly outperforms the existing state-of-the-art methods. The correct detection rate of SRCNet improves by at least 10% when the sample is as short as 100 ms at a 20% embedding rate. In particular, the network achieves significant improvements in detecting STC-based adaptive AMR steganography.

Defining Joint Embedding Distortion for Adaptive MP3 Steganography

Yunzhao Yang
Yuntao Wang
Xiaowei Yi
Xianfeng Zhao
Yi Ma

In this paper, a universal joint embedding distortion function (JED) is proposed to improve the undetectability and imperceptibility of MP3 steganography. It can be applied to Huffman codeword mapping (HCM) and sign bit flipping (SBF). Content-aware and statistical distortions are jointly modeled to formulate the atom modification of the quantized modified discrete cosine transform (QMDCT) coefficients. On the one hand, to retain auditory imperceptibility, the absolute threshold of hearing is employed to measure the auditory sensitivity of each QMDCT coefficient. On the other hand, considering that most existing universal MP3 steganalysis features are designed based on correlations, the forward and backward transition probabilities are used to characterize the correlations between adjacent QMDCT coefficients. In addition, we present an implementation of JED in the sign bit domain. Experimental results demonstrate that our method achieves higher embedding capacity and better imperceptibility. The detection accuracy against the proposed scheme is about 75% at a bitrate of 320 kbps and an embedding rate of 11 kbit/s, which is 9.54% to 16.94% lower than for existing MP3 steganographic methods.

Adaptive VP8 Steganography Based on Deblocking Filtering

Pei Xie
Hong Zhang
Weike You
Xianfeng Zhao
Jianchang Yu
Yi Ma

In this paper, a novel deblocking filtering-based VP8 steganographic scheme is proposed. The unique aspect of this work and one that distinguishes it from the prior art is that we effectively exploit the characteristics of deblocking filtering. We propose to embed the secret messages by comparing the quantized discrete cosine transform coefficients before and after the in-loop filtering. In the process of encoding, given one frame, first, we encode it to obtain the quantized discrete cosine transform coefficients. Second, a new set of coefficients is obtained by re-encoding the filtered frame. Third, the distortion function is defined by comparing the difference between the two sets of coefficients. Finally, adaptive embedding is realized by using the syndrome-trellis codes. Experimental results show that satisfactory levels of visual quality and steganographic security could be achieved with adequate payloads.

Vibrational Covert Channels using Low-Frequency Acoustic Signals

Nikolay Matyunin
Yujue Wang
Stefan Katzenbeisser

In this paper, we examine how acoustic signals in sub-bass and infrasonic range can be used to establish a vibrational covert channel between speaker-equipped computers and mobile devices. We show that typical consumer speakers are capable of producing low-frequency sounds, which are not perceivable by humans. At the same time, we show that producing such sounds by the speaker's woofer inevitably generates slight vibrations of the speaker and the surface where it is located. Being unnoticeable to people, such vibrations can be captured by the accelerometer sensor of a mobile device located on the same surface. Therefore, information can be encoded into low-frequency sounds played by a speaker and received on a mobile device by analyzing the produced vibrations. Note that access to the accelerometer on modern mobile devices does not require any user permissions, making the transmission completely unnoticeable. We evaluate the presented covert channel for different speakers, apply it to several application scenarios, and give an overview of possible countermeasures.

SESSION: Session: Image Steganography

Session details: Session: Image Steganography

Patrick Bas

Towards Automatic Embedding Cost Learning for JPEG Steganography

Jianhua Yang
Danyang Ruan
Xiangui Kang
Yun-Qing Shi

Current mainstream methods for digital image steganography are content adaptive. That is, the secret messages are embedded in complex regions of the cover image while minimizing the embedding distortion so as to suppress statistical detectability. Since there is already a practical encoding scheme for data embedding near the payload-distortion bound, the design of the embedding cost function becomes the decisive part of steganography. Unlike traditional heuristic hand-crafted methods, this paper proposes a novel generative adversarial network based framework to automatically learn the embedding cost function for JPEG steganography. The proposed framework consists of a generator, a gradient-descent friendly inverse discrete cosine transformation module, an embedding simulator and a discriminator for steganalysis. By training the generator and discriminator in alternation, the embedding cost function is finally obtained from the trained generator. Experimental results demonstrate that our method can automatically learn a reasonable embedding cost function and achieve satisfactory performance.
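To make the role of an embedding cost function concrete, the sketch below shows the conventional payload-limited simulation for ternary (+1/-1) embedding: per-element costs are turned into change probabilities, and a scalar parameter is tuned by bisection until the total entropy matches the target payload. This is the textbook simulation, not necessarily the simulator used inside the proposed framework, and all function names are illustrative assumptions.

```python
import math

def change_probs(costs, lam):
    """Probability of a +1 (equally, a -1) change for each element, given its cost."""
    probs = []
    for rho in costs:
        e = math.exp(-lam * rho)
        probs.append(e / (1.0 + 2.0 * e))
    return probs

def ternary_entropy(p):
    """Entropy in bits of the distribution (p, p, 1 - 2p)."""
    if p <= 0.0:
        return 0.0
    q = 1.0 - 2.0 * p
    tail = -q * math.log2(q) if q > 0.0 else 0.0
    return -2.0 * p * math.log2(p) + tail

def payload_bits(costs, lam):
    """Total number of bits that can be carried at parameter lam."""
    return sum(ternary_entropy(p) for p in change_probs(costs, lam))

def solve_lambda(costs, target_bits, iters=60):
    """Bisection on lam; assumes strictly positive costs and an achievable target."""
    lo, hi = 0.0, 1.0
    while payload_bits(costs, hi) > target_bits:   # payload shrinks as lam grows
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if payload_bits(costs, mid) > target_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

costs = [1.0, 2.0, 3.0, 4.0]          # hypothetical per-coefficient costs
lam = solve_lambda(costs, target_bits=2.0)
print(change_probs(costs, lam))       # cheaper elements are changed more often
```

Elements with low cost receive high change probabilities, which is why the choice of cost function largely determines where the message ends up in the image.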

Effect of JPEG Quality on Steganographic Security

Jan Butora
Jessica Fridrich

This work investigates both theoretically and experimentally the security of JPEG steganography as a function of the quality factor. For a fixed relative payload, modern embedding schemes, such as J-UNIWARD and UED-JC, exhibit surprising non-monotone trends due to rounding and clipping of quantization steps. Their security generally increases with increasing quality factor but starts decreasing for qualities above 95. In contrast, old-fashioned steganography, such as Jsteg, OutGuess, and model-based steganography, exhibits complementary trends. The results of empirical detectors closely match the trends exhibited by the KL divergence computed between models of cover and stego DCT modes. In particular, our analysis shows that the main reason for the complementary trends is the way modern schemes attenuate embedding change rates with increasing spatial frequency. Our model also provides guidance on how to adjust the embedding algorithm J-UNIWARD to substantially improve its security for high quality factors.

Computing Dependencies between DCT Coefficients for Natural Steganography in JPEG Domain

Théo Taburet
Patrick Bas
Jessica Fridrich
Wadih Sawaya

This short paper is an extension of a family of embedding schemes called Natural Steganography, which embeds a message by mimicking heteroscedastic sensor noise in the JPEG domain. Under the assumption that the development from RAW uses linear demosaicking, we derive a closed-form expression for the covariance matrix of DCT coefficients from 3 × 3 JPEG blocks. This computation relies on a matrix formulation of all steps involved in the development pipeline, which includes demosaicking, conversion to luminance, DCT transform, and reordering. This matrix is then used for pseudo-embedding in the JPEG domain on four lattices of 8 × 8 DCT blocks. The results obtained with the computed covariance matrix are contrasted with the results previously obtained with the covariance matrix estimated using Monte Carlo sampling and scaling. The empirical security using DCTR features at JPEG quality 100 increased from PE = 14% using covariance estimation and scaling to PE = 43% using the newly derived analytic form.

Enhancing Steganography via Stego Post-processing by Reducing Image Residual Difference

Bolin Chen
Weiqi Luo
Peijia Zheng

Most modern steganography methods focus on designing an effective cost function. To the best of our knowledge, no prior work has considered modifying the stego image to enhance steganographic security. In this paper, therefore, we propose a novel post-processing method for stego images in the spatial domain. To ensure correct extraction of the hidden message, our method restricts the modification amplitude of each pixel according to the characteristics of STCs (Syndrome-Trellis Codes). To enhance steganographic security, our method traverses the stego image pixel by pixel and modifies those pixels that can reduce the image residual difference between cover and stego under some criterion. Experimental results show that the proposed method can improve the security of current steganography, especially for large payloads, e.g. larger than 0.3 bpp. In addition, the post-modification rate is rather low: for instance, fewer than 8 in 10,000 pixels are changed in the enhanced stego image for the five existing steganography methods at payloads as large as 0.5 bpp.

SESSION: Session: Forensics & Anti-forensics

Session details: Session: Forensics & Anti-forensics

Rainer Böhme

Camera Identification from HDR Images

Morteza Darvish Morshedi Hosseini
Miroslav Goljan

The performance of camera identification methods based on PRNU is very sensitive to geometric operations applied to images during acquisition and processing. Handling images that have been geometrically transformed, for example rotated, downsampled, and/or cropped, requires overcoming the pixel desynchronization problem. This work extends the applicability of PRNU-based camera identification methods to the class of HDR images. The geometric transformations in HDR images revealed in this work are reversed in a series of steps involving block-wise PRNU matching. The efficiency of this method is then tested on HDR images from the publicly available UNIFI dataset, spanning 26 mobile-device cameras.

Estimation of Copy-sensitive Codes Using a Neural Approach

Rohit Yadav
Iuliia Tkachenko
Alain Trémeau
Thierry Fournel

Copy-sensitive graphical codes are used as an anti-counterfeiting solution in packaging and document protection. Their security is founded on a design that is hard to predict after printing and scanning. In practice, different designs exist. Here, random codes printed at the printer resolution are considered. We suggest estimating such codes with neural networks, a popular approach that has not yet been studied in the present context. In this paper, we test a state-of-the-art architecture that is efficient in the binarization of handwritten characters. The results show that such an approach can be successfully used by an attacker to produce a valid counterfeited code and thus fool an authentication system.

De-identification Without Losing Faces

Yuezun Li
Siwei Lyu

Training of deep learning models for computer vision requires large image or video datasets from the real world. Often, in collecting such datasets, we also need to protect the privacy of the people captured in the images or videos, while still preserving useful attributes such as facial expressions. In this work, we describe a new face de-identification method to achieve this, which is based on a face attribute transfer model (FATM). FATM is a deep neural network model trained to map non-identity related facial attributes to the face of donors, who are a small number of consented subjects. Using the donors' faces ensures the natural appearance of the synthesized faces, and FATM blends the donors' facial attributes with those of the original faces to diversify the appearance of the synthesized faces. Experimental results on several sets of images and videos demonstrate the effectiveness of our face de-ID algorithm.

Impact of Spatial Constraints when Signing in Uncontrolled Mobile Conditions

Majd Abazid
Nesma Houmani
Sonia Garcia-Salicetti

In this paper, we study the impact of uncontrolled mobile conditions on signatures when signing on two touch-screen devices of different sizes. 74 persons captured their signatures on both an iPad and an iPhone with different signing areas. The study exploited several quantitative indicators: an intra-personal variability measure, a statistical quality measure, and verification performance with two classifiers. We show that for 69% of writers, their signatures are more complex when made on the large surface of the iPad. This result indicates that the majority of writers are less comfortable with signing on a small handheld device with a very constrained signing surface; in addition, the rubber-tipped stylus used does not offer optimal writing precision on the constrained signing surface. For the remaining writers, results show that when signing on the iPad, writers tend to spontaneously fill the whole available space; this leads to information loss reflected by reduced signature complexity and stability. Performance assessment shows a significant degradation when test and reference signatures are not captured on the same platform. When test and reference signatures are captured on the same platform, performance is better when the writer signs on the small surface available on the iPhone.

SESSION: Special Session 1: Media Forensics -- Fake or Reality?

Session details: Special Session 1: Media Forensics -- Fake or Reality?

Luisa Verdoliva
Jana Dittmann

A Face Morphing Detection Concept with a Frequency and a Spatial Domain Feature Space for Images on eMRTD

Tom Neubert
Christian Kraetzer
Jana Dittmann

Since the face morphing attack was introduced by Ferrara et al. in 2014, the detection of face morphings has become a widespread topic in image forensics. By now, the community is very active and has reported diverse detection approaches. So far, the evaluations are mostly performed on images without post-processing. Face images stored within electronic machine readable documents (eMRTD) are ICAO-passport-scaled to a resolution of 413x531 and a JPG or JP2 file size of 15 kilobytes. This paper introduces a face morphing detection concept with 3 modules (ICAO-aligned pre-processing module, feature extraction module and classification module), tailored for such images on eMRTD. In this work we design and evaluate, as examples, two feature spaces for the feature extraction module: a frequency domain and a spatial domain feature space. Our evaluation compares both feature spaces and is carried out with 66,229 passport-scaled images (64,363 morphed face images and 1,866 authentic face images) which are completely independent from training and include all images provided for the IHMMSEC'19 special session: "Media Forensics - Fake or Real?". Furthermore, we investigate the influence of different morph generation pipelines on the detection accuracies of the concept, and we analyse the impact of neutral and smiling genuine faces on the morph detector performance. The evaluation determines a detection rate of 86.0% for passport-scaled morphed images with a false alarm rate of 4.4% for genuine images for the spatial domain feature space.

Image Forensics from Chroma Subsampling of High-Quality JPEG Images

Benedikt Lorch
Christian Riess

The JPEG compression format provides a rich source of forensic traces that include quantization artifacts, fingerprints of the container format, and numerical particularities of JPEG compressors. Such a diverse set of cues serves as the basis for a forensic examiner to determine origin and authenticity of an image. In this work, we present a novel artifact that can be used to fingerprint the JPEG compression library. The artifact arises from chroma subsampling in one of the most popular JPEG implementations. Due to integer rounding, every second column of the compressed chroma channel appears on average slightly brighter than its neighboring columns, which is why we call the artifact a "chroma wrinkle". We theoretically derive the chroma wrinkle footprint in DCT domain, and use this footprint for detecting chroma wrinkles. The artifact is detected with more than 90% accuracy on images of JPEG quality 75 and above. Our experiments indicate that the artifact can also be used for manipulation localization, and that it is robust to several global postprocessing operations.
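The effect of integer rounding during chroma subsampling can be illustrated with a toy example. The sketch below assumes a hypothetical 2:1 horizontal downsampler that averages pixel pairs with integer arithmetic and alternates its rounding bias between output columns; the abstract does not specify the exact rule or the library, so both the rule and the function name are assumptions made for illustration.

```python
def downsample_row(row):
    """Average horizontal pixel pairs using integers and an alternating rounding bias."""
    out = []
    bias = 0
    for i in range(0, len(row) - 1, 2):
        out.append((row[i] + row[i + 1] + bias) >> 1)
        bias ^= 1                      # 0, 1, 0, 1, ... across output columns
    return out

# Every input pair sums to 21, so the exact average is 10.5 everywhere.
row = [10, 11] * 4
print(downsample_row(row))             # [10, 11, 10, 11]
```

Although the exact average is identical in every column, the output alternates between rounding down and rounding up, so every second column ends up slightly brighter on average.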

A Simple and Effective Initialization of CNN for Forensics of Image Processing Operations

Ivan Castillo Camacho
Kai Wang

In this paper we present a simple yet effective initialization method for convolutional neural networks (CNNs). The proposed method extends the well-known Xavier initialization and can cope well with CNNs used for forensic detection of image processing operations. Our initialization inherits the simplicity and advantages of the Xavier initialization, and the difference is that our method generates a set of high-pass filters for the initialization of CNN's first layer. This allows us to better identify forensic traces which usually lie towards the high-frequency part of the image. We test the proposed method with two CNNs for two forensic problems, i.e., a multiclass classification problem of a group of image processing operations and a median filtering forensic problem with JPEG post-processing. Experimental results show the utility of our initialization.
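One simple way to obtain random high-pass kernels is to draw weights as in Xavier (uniform) initialization and then remove the mean of each kernel, so that the coefficients sum to zero and a constant image region produces no response. The sketch below follows that idea; the abstract does not describe the exact construction, so this is an assumed variant and the function name is hypothetical.

```python
import math
import random

def xavier_highpass_kernel(size, fan_in, fan_out, rng):
    """Draw a size x size kernel with Xavier-uniform weights, then make it zero-sum."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))     # Xavier uniform bound
    w = [[rng.uniform(-limit, limit) for _ in range(size)] for _ in range(size)]
    mean = sum(sum(r) for r in w) / (size * size)
    return [[v - mean for v in r] for r in w]       # zero DC gain: high-pass

rng = random.Random(0)
kernel = xavier_highpass_kernel(size=3, fan_in=9, fan_out=9 * 16, rng=rng)

# Response to a constant 3x3 patch is (numerically) zero.
flat_response = sum(v * 128 for r in kernel for v in r)
print(abs(flat_response) < 1e-9)
```

Suppressing the constant component in the first layer pushes the network towards the high-frequency residual, where traces of processing operations usually lie.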

Exposing GAN-synthesized Faces Using Landmark Locations

Xin Yang
Yuezun Li
Honggang Qi
Siwei Lyu

Generative adversarial networks (GANs) have recently led to highly realistic image synthesis results. In this work, we describe a new method to expose GAN-synthesized images using the locations of the facial landmark points. Our method is based on the observation that the facial part configurations generated by GAN models differ from those of real faces, due to the lack of global constraints. We perform experiments demonstrating this phenomenon, and show that an SVM classifier trained using the locations of facial landmark points is sufficient to achieve good classification performance for GAN-synthesized faces.

CNN-based Rescaling Factor Estimation

Chang Liu
Matthias Kirchner

We demonstrate the estimation of image resampling parameters in a deep learning framework by regressing the rescaling factor of 64x64-sized patches on features learned directly from grayscale intensities by a convolutional neural network (CNN). Our end-to-end network design comprises a simple concatenation of 25 convolutional layers with small 3x3 receptive fields and largely abstains from the use of pooling. We report experimental results on a large set of rescaled patches, for which the proposed CNN outperforms state-of-the-art frequency-domain estimators particularly in the case of downscaling. A critical discussion of sensitivities to mismatch between training and testing data points to failure cases.

SESSION: Special Session 2: ALASKA steganalysis challenge 'Into the wild'

Session details: Special Session 2: ALASKA steganalysis challenge 'Into the wild'

Dirk Borghys

The ALASKA Steganalysis Challenge: A First Step Towards Steganalysis

Rémi Cogranne
Quentin Giboulot
Patrick Bas

This paper presents the ins and outs of the ALASKA challenge, a steganalysis challenge built to reflect the constraints of a forensic steganalyst. We motivate and explain the main differences w.r.t. the BOSS challenge (2010), specifically the use of a ranking metric prescribing high false positive rates, the analysis of a large diversity of image sources and the use of a collection of steganographic schemes adapted to handle color JPEGs. The core of the challenge is also described, including the RAW image dataset, the implementations used to generate cover images and the specificities of the embedding schemes. The very first outcomes of the challenge are then presented, and the impacts of different parameters such as demosaicking, filtering, image size, JPEG quality factors and cover-source mismatch are analyzed. Finally, conclusions are presented, highlighting positive and negative points together with future directions for the next challenges in practical steganalysis.

Breaking ALASKA: Color Separation for Steganalysis in JPEG Domain

Yassine Yousfi
Jan Butora
Jessica Fridrich
Quentin Giboulot

This paper describes the architecture and training of detectors developed for the ALASKA steganalysis challenge. For each quality factor in the range 60-98, several multi-class tile detectors implemented as SRNets were trained on various combinations of three input channels: luminance and two chrominance channels. To accept images of arbitrary size, the detector for each quality factor was a multi-class multi-layered perceptron trained on features extracted by the tile detectors. For quality 99 and 100, a new "reverse JPEG compatibility attack" was developed and also implemented using the SRNet via the tile detector. Throughout the paper, we explain various improvements we discovered during the course of the competition and discuss the challenges we encountered and the trade-offs that had to be adopted in order to build a detector capable of detecting steganographic content in a stego source of great diversity.

SESSION: Keynote Presentation #2

Session details: Keynote Presentation #2

Juan Troncoso-Pastoriza

Trusting Machine Learning: Privacy, Robustness, and Transparency Challenges

Reza Shokri

SESSION: Session: Watermarking and Applied Crypto

Session details: Session: Watermarking and Applied Crypto

Andreas Westfeld

Proving Multimedia Integrity using Sanitizable Signatures Recorded on Blockchain

Karthik Nandakumar
Nalini Ratha
Sharathchandra Pankanti

While significant advancements have been made in the field of multimedia forensics to detect altered content, existing techniques mostly focus on enabling the content recipient to verify the content integrity without any inputs from the content creator. In many application scenarios, the creator has a strong incentive to establish the provenance and integrity of the multimedia data that they created and released. Hence, there is a strong need for mechanisms that allow the content creator to prove the authenticity of the released content. Since blockchain technology provides an immutable distributed database, it is an ideal solution for reliably time-stamping content with its creation time and storing an irrefutable signature of the content at the time of its creation. However, a simple digital signature scheme does not allow modification of the content after the initial commitment. Authorized multimedia content alteration by its creator is often necessary (e.g., redaction of faces to protect the privacy of individuals in a video, redaction of sensitive fields in a text document) before the content is distributed. The main contributions of this paper are: (i) a novel sanitizable signature scheme that enables the content creator to prove the integrity of the redacted content, while preventing the recipients from reconstructing the redacted segments based on the published commitment, and (ii) a blockchain-based solution for securely managing the sanitizable signature. The proposed solution employs a robust hashing scheme using a chameleon hash function and a Merkle tree to generate the initial signature, which is stored on the blockchain. The auxiliary data required for the integrity verification step is retained by the content creator and only a signature of this auxiliary data is stored on the blockchain. Any modification to the multimedia content requires only updating the signature of the auxiliary data, which is securely recorded on the blockchain. We demonstrate that the proposed approach enables verification of the integrity of redacted multimedia content without compromising the content privacy requirements.
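The general idea of verifying redacted content against an earlier commitment can be sketched with a plain Merkle tree: a redacted segment is replaced by its leaf hash, and the verifier can still recompute the committed root. The code below uses ordinary SHA-256 only and does not implement the chameleon hash construction or the blockchain storage described in the abstract; segment names and helper functions are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()

def merkle_root(leaf_hashes):
    """Fold a non-empty list of leaf hashes into a single root hash."""
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:             # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify(view, committed_root):
    """view holds one ("data", bytes) or ("hash", bytes) entry per segment."""
    leaves = [h(value) if kind == "data" else value for kind, value in view]
    return merkle_root(leaves) == committed_root

segments = [b"frame-0", b"frame-1 (face)", b"frame-2", b"frame-3"]
root = merkle_root([h(s) for s in segments])    # value the creator would commit

# Segment 1 is redacted: only its hash is released.
redacted = [("data", segments[0]), ("hash", h(segments[1])),
            ("data", segments[2]), ("data", segments[3])]
tampered = [("data", b"frame-X")] + redacted[1:]
print(verify(redacted, root), verify(tampered, root))
```

In this toy version an unsalted leaf hash of low-entropy content could be guessed by brute force, so it does not by itself provide the privacy property that the abstract claims for the proposed scheme.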

Revisiting Multivariate Lattices for Encrypted Signal Processing

Alberto Pedrouzo-Ulloa
Juan R. Troncoso-Pastoriza
Fernando Pérez-González

Multimedia contents are inherently sensitive signals that must be protected when processed in untrusted environments. The field of Secure Signal Processing addresses this challenge by developing methods that enable operating on sensitive signals in a privacy-conscious way. Recently, we introduced a hard lattice problem called m-RLWE (multivariate Ring Learning with Errors), which supports efficient encrypted processing of multidimensional signals. Afterwards, Bootland et al. presented an attack on m-RLWE that reduces the security of the underlying scheme from a lattice with dimension $\prod_i n_i$ to $\max\{n_i\}_i$. Our work introduces a new pre-/post-coding block that addresses this attack and achieves the efficient results of our initial approach while basing its security directly on RLWE with dimension $\prod_i n_i$, hence preserving the security and efficiency originally claimed. Additionally, this work provides a detailed comparison between a conventional use of RLWE, m-RLWE and our new pre-/post-coding procedure, which we denote "packed"-RLWE. Finally, we discuss a set of encrypted signal processing applications which clearly benefit from the proposed framework, either alone or in a combination of baseline RLWE, m-RLWE and "packed"-RLWE.

Watermarking Error Exponents in the Presence of Noise: The Case of the Dual Hypercone Detector

Teddy Furon

The study of the error exponents of zero-bit watermarking is addressed in the article by Comesaña, Merhav, and Barni, under the assumption that the detector relies solely on second order joint empirical statistics of the received signal and the watermark. This restriction leads to the well-known dual hypercone detector, whose score function is the absolute value of the normalized correlation. They derive the false negative error exponent and the optimum embedding rule. However, they only focus on the high-SNR regime, i.e. the noiseless scenario.

This paper extends this theoretical study to the noisy scenario. It introduces a new definition of watermarking robustness based on the false negative error exponent, derives this quantity for the dual hypercone detector, and shows that its performance is almost equal to Costa's lower bound.

Nearest Neighbor Decoding for Tardos Fingerprinting Codes

Thijs Laarhoven

Over the past decade, various improvements have been made to Tardos' collusion-resistant fingerprinting scheme [Tardos, STOC 2003], ultimately resulting in a good understanding of what is the minimum code length required to achieve collusion-resistance. In contrast, decreasing the cost of the actual decoding algorithm for identifying the potential colluders has received less attention, even though previous results have shown that using joint decoding strategies, deemed too expensive for decoding, may lead to better code lengths. Moreover, in dynamic settings a fast decoder may be required to provide answers in real-time, further raising the question whether the decoding costs of score-based fingerprinting schemes can be decreased with a smarter decoding algorithm. In this paper we show how to model the decoding step of score-based fingerprinting as a nearest neighbor search problem, and how this relation allows us to apply techniques from the field of (approximate) nearest neighbor searching to obtain decoding times which are sublinear in the total number of users. As this does not affect the encoding and embedding steps, this decoding mechanism can easily be deployed within existing fingerprinting schemes, and this may bring a truly efficient joint decoder closer to reality. Besides the application to fingerprinting, similar techniques can potentially be used to decrease the decoding costs of group testing methods, which may be of independent interest.
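The link between score-based decoding and nearest neighbor search can be seen by writing each user's accusation score as an inner product. The sketch below uses the widely used symmetric score function; the abstract does not state which score function the paper adopts, so that choice, the function names and the toy data are assumptions. The linear scan at the end is the baseline that an (approximate) nearest neighbor index would replace.

```python
import math

def user_vector(codeword, biases):
    """Map each code bit to its weight: 1 -> +sqrt((1-p)/p), 0 -> -sqrt(p/(1-p))."""
    return [math.sqrt((1 - p) / p) if x == 1 else -math.sqrt(p / (1 - p))
            for x, p in zip(codeword, biases)]

def query_vector(pirate_copy):
    """Map the pirate output to +1 / -1 so that scores become inner products."""
    return [1.0 if y == 1 else -1.0 for y in pirate_copy]

def score(codeword, biases, pirate_copy):
    """Symmetric accusation score of one user, computed as a dot product."""
    u = user_vector(codeword, biases)
    q = query_vector(pirate_copy)
    return sum(a * b for a, b in zip(u, q))

def most_suspicious(codewords, biases, pirate_copy):
    """Brute-force maximum inner product search over all users (linear time)."""
    scores = [score(c, biases, pirate_copy) for c in codewords]
    return max(range(len(codewords)), key=scores.__getitem__)

codewords = [[1, 0, 1], [0, 1, 0]]     # toy code, one row per user
biases = [0.5, 0.5, 0.5]               # toy per-position bias values
print(most_suspicious(codewords, biases, pirate_copy=[1, 0, 1]))
```

Because the user vectors depend only on the code and the biases, they can be indexed once in advance, and each pirate copy then becomes a single query against that index.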

SESSION: Session: Deep Learning & Audio Steganalysis

Session details: Session: Deep Learning & Audio Steganalysis

Marc Chaumont

Reference Channels for Steganalysis of Images with Convolutional Neural Networks

Mo Chen
Mehdi Boroumand
Jessica Fridrich

When available, reference signals may dramatically improve the accuracy of steganalysis. Particularly powerful reference signals are embedding invariants that exist when the steganographic algorithm swaps values from small disjoint subsets of the cover elements' dynamic range, such as, but not limited to, embedding schemes utilizing least significant bit replacement. This paper describes a general method for preparing such reference signals for a certain type of embedding operation and incorporating them in detectors built as convolutional networks to improve their detection accuracy. The beneficial effect of reference signals is shown experimentally in both the spatial and especially JPEG domain, on model-based steganography and a generic LSB flipper with and without stochastic restoration of the histogram (OutGuess).
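The simplest embedding invariant of this kind can be shown for least significant bit replacement, which only swaps values within the pairs {2k, 2k+1}. The sketch below demonstrates that the pair index is identical for cover and stego; it illustrates the invariant only and is not the paper's procedure for building reference channels. Function names and sample values are hypothetical.

```python
def lsb_replace(cover, bits):
    """Overwrite the least significant bit of each non-negative value with a message bit."""
    return [(c & ~1) | b for c, b in zip(cover, bits)]

def pair_index(values):
    """Index k of the pair {2k, 2k+1} each value belongs to; unaffected by LSB replacement."""
    return [v >> 1 for v in values]

cover = [12, 13, 200, 255, 0, 77]
bits = [1, 0, 1, 0, 1, 0]
stego = lsb_replace(cover, bits)

print(stego)                                    # values move only inside their pair
print(pair_index(cover) == pair_index(stego))   # the pair index is preserved
```

A detector that is given such an invariant alongside the image has access to information about the cover that the embedding cannot disturb.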

A Customized Convolutional Neural Network with Low Model Complexity for JPEG Steganalysis

Junwen Huang
Jiangqun Ni
Linhong Wan
Jingwen Yan

Nowadays, convolutional neural networks (CNNs) are applied to different types of image classification tasks and outperform almost all traditional methods. However, it is difficult to apply CNNs to JPEG steganalysis because of the extremely low SNR (embedded message relative to image content) in the task. In this paper, a selection-channel-aware CNN for JPEG steganalysis is proposed by incorporating domain knowledge. Specifically, instead of using a random strategy, the kernels of the first convolutional layer are initialized with hand-crafted filters to suppress the image content. Then, the truncated linear unit (TLU), a heuristically designed activation function, is adopted in the first layer to better adapt to the distribution of feature maps. Finally, we use a generalized residual learning block to incorporate the knowledge of the selection channel in the proposed CNN to further boost its performance. J-UNIWARD, a state-of-the-art JPEG steganographic scheme, is used to evaluate the performance of the proposed CNN and other competing JPEG steganalysis methods. Experimental results show that, at different payloads, the proposed CNN steganalyzer outperforms other feature-based methods and rivals the state-of-the-art CNN-based methods with much reduced model complexity.

Steganalysis of VoIP Streams with CNN-LSTM Network

Hao Yang
Zhongliang Yang
Yongfeng Huang

This research addresses steganalysis of Quantization Index Modulation (QIM) steganography in VoIP (Voice over IP) streams. VoIP is a popular media streaming and communication service on the Internet. QIM steganography makes it possible to hide secret information in VoIP streams. Detecting QIM steganography in short samples at low embedding rates remains an unsolved challenge. Recently, neural network models have achieved remarkable performance and have been successfully applied to many different tasks. The mainstream neural network architectures include Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), which process signals in very different ways. In this paper, we first present a suitable way to combine the strengths of these two architectures and then construct a novel, unified model called the CNN-LSTM network to detect QIM-based steganography. In our model, a Bidirectional Long Short-Term Memory Recurrent Neural Network (Bi-LSTM) is used to capture long-term contextual information in carriers, and a CNN is then used to capture both local and global features as well as temporal carrier features. Experiments show that our model achieves state-of-the-art results in detecting QIM-based steganography in VoIP streams.
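For readers unfamiliar with QIM, the sketch below shows the idea in its simplest scalar form: a sample is quantized to the nearest lattice point whose index parity encodes the hidden bit. This is an assumption-laden illustration only; speech codecs embed through partitioned vector codebooks rather than a scalar quantizer, and the function names and step size `delta` are hypothetical.

```python
import math

def qim_embed(x, bit, delta=1.0):
    """Quantize x to the nearest multiple of delta whose index parity equals bit."""
    q = math.floor(x / delta + 0.5)      # nearest quantizer index
    if q % 2 != bit:
        # Step to the neighbouring index on the side where x actually lies.
        q += 1 if x / delta >= q else -1
    return q * delta

def qim_extract(y, delta=1.0):
    """Recover the bit from the parity of the nearest quantizer index."""
    return math.floor(y / delta + 0.5) % 2

stego_sample = qim_embed(2.3, 1)         # index 2 is even, so move up to index 3
recovered = qim_extract(stego_sample)
```

The embedding error never exceeds one quantization step, which is why the change is hard to hear and hard to detect in short segments.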

Audio Steganalysis with Improved Convolutional Neural Network

Yuzhen Lin
Rangding Wang
Diqun Yan
Li Dong
Xueyuan Zhang

Deep learning, especially the convolutional neural network (CNN), has enjoyed significant success in many fields, e.g., image recognition. Recently, CNNs have been successfully applied to multimedia steganalysis. However, the detection performance is still unsatisfactory. In this work, we propose an improved CNN-based method for audio steganalysis. Specifically, a special convolutional layer is first carefully designed to capture the minor steganographic noise. Then, a truncated linear unit is adopted to activate the output of the shallow convolutional layer. In addition, we employ average pooling to minimize the risk of over-fitting. Finally, a parameter transfer strategy is adopted, aiming to boost the detection performance for low embedding rates. Experimental results on 30,000 audio clips verify the effectiveness of our method for a variety of embedding rates. Compared with existing CNN-based steganalysis methods, our proposed method achieves superior performance. To facilitate reproducible research, the source code will be released on GitHub.

SESSION: Session: Methods for Steganography and Steganalysis

Session details: Session: Methods for Steganography and Steganalysis

Miroslav Goljan

Exploiting Adversarial Embeddings for Better Steganography

Solène Bernard
Tomás Pevný
Patrick Bas
John Klein

This work proposes a protocol to iteratively build a distortion function for adaptive steganography while increasing its practical security after each iteration. It relies on prior art on targeted attacks and iterative design of steganalysis schemes. It combines targeted attacks on a given detector with a min-max strategy, which dynamically selects the most difficult stego content associated with the best classifier at each iteration. We theoretically prove the convergence, which is confirmed by the practical results. Applied to J-UNIWARD, this new protocol increases the error probability P_err from 7% to 20% as estimated by Xu-Net, and from 10% to 23% for non-targeted steganalysis by a linear classifier with GFR features.
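One way to read the min-max selection is sketched below: for a given cover, each candidate stego version is scored by every available detector, and the candidate whose worst-case (highest) score is lowest is kept. This is a simplified interpretation of the abstract, not the authors' protocol; the candidate names and probabilities are invented for illustration.

```python
def min_max_select(scores):
    """Return the candidate whose worst-case (maximum) detection score is smallest.

    scores maps a candidate name to a list of stego probabilities,
    one per detector; higher means easier to detect.
    """
    return min(scores, key=lambda name: max(scores[name]))

# Hypothetical stego probabilities from three detectors for one cover image.
scores = {
    "baseline":      [0.90, 0.85, 0.80],
    "adversarial_1": [0.20, 0.95, 0.70],
    "adversarial_2": [0.55, 0.60, 0.50],
}
chosen = min_max_select(scores)
```

Here `adversarial_1` fools the first detector best but is caught by the second, so the selection prefers `adversarial_2`, whose strongest detector is the weakest of the three worst cases.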

Detection of Classifier Inconsistencies in Image Steganalysis

Daniel Lerch-Hostalot
David Megías

In this paper, a methodology to detect inconsistencies in classification-based image steganalysis is presented. The proposed approach uses two classifiers: the usual one, trained with a set formed by cover and stego images, and a second classifier trained with the set obtained after embedding additional random messages into the original training set. When the decisions of these two classifiers are not consistent, we know that the prediction is not reliable. The number of inconsistencies in the predictions of a testing set may indicate that the classifier is not performing correctly in the testing scenario. This occurs, for example, in case of cover source mismatch, or when we are trying to detect a steganographic method that the classifier is not capable of modelling accurately. We also show how the number of inconsistencies can be used to predict the reliability of the classifier (classification errors).
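The counting step can be sketched as follows. The abstract does not spell out the consistency rule, so the code assumes a simple encoding in which each classifier emits 0 or 1 per test image and a pair of decisions is consistent when the two values match; the function name and the sample decisions are hypothetical.

```python
def inconsistency_rate(decisions_a, decisions_b):
    """Fraction of test images on which the two classifiers disagree."""
    if not decisions_a or len(decisions_a) != len(decisions_b):
        raise ValueError("both classifiers must score the same non-empty test set")
    disagreements = sum(a != b for a, b in zip(decisions_a, decisions_b))
    return disagreements / len(decisions_a)

# Assumed encoding: classifier A outputs 0 for cover and 1 for stego;
# classifier B outputs the label that would be consistent with A's decision.
decisions_a = [0, 0, 1, 1, 0, 1, 1, 0]
decisions_b = [0, 1, 1, 1, 0, 0, 1, 0]
rate = inconsistency_rate(decisions_a, decisions_b)
```

A high rate on an unlabeled testing set would then serve as a warning sign, for example of cover source mismatch, without needing ground truth.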

Fast and Effective Global Covariance Pooling Network for Image Steganalysis

Xiaoqing Deng
Bolin Chen
Weiqi Luo
Da Luo

Recently, deep learning based methods have achieved superior performance compared to conventional methods based on hand-crafted features in image steganalysis. However, most modern methods are quite time consuming. For instance, it takes over 3 days to train a state-of-the-art neural network, i.e., SRNet [3], in our experiments. In this paper, therefore, we propose a fast yet very effective convolutional neural network (CNN) for image steganalysis in the spatial domain. To achieve a good tradeoff between training time and performance, we carefully design the architecture of the proposed network according to our extensive experiments. In addition, we introduce global covariance pooling into steganalysis for the first time to exploit the second-order statistics of high-level features and further improve the performance. Experimental results show that the proposed network can outperform the current best one, while its training time is significantly reduced.
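Global covariance pooling replaces the usual per-channel average with the covariance matrix of the channels taken over all spatial positions. The sketch below computes that matrix for a tiny, made-up feature tensor; it is a bare illustration of the second-order statistic, and any normalization a real pooling layer applies afterwards is omitted.

```python
def global_covariance_pooling(features):
    """Pool C channels, each a flat list of N spatial activations,
    into a C x C covariance matrix (a second-order statistic)."""
    n = len(features[0])
    means = [sum(ch) / n for ch in features]
    centered = [[v - m for v in ch] for ch, m in zip(features, means)]
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in centered]
            for ci in centered]

# Hypothetical feature maps: 3 channels over 4 spatial positions.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # twice the first channel
    [4.0, 3.0, 2.0, 1.0],   # the first channel reversed
]
cov = global_covariance_pooling(features)
```

The off-diagonal entries record how channels vary together, information that average pooling discards.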

Linear Hash Functions as a Means of Distortion-Rate Optimization in Data Embedding

Boris Ryabko
Andrey Fionov

Embedding hidden data is usually performed by introducing some distortions (errors) in cover objects. If the distortions exceed a certain bound, steganalysis can detect the presence of hidden data. The problem is therefore to embed as much data as possible without exceeding a permissible distortion level, so as to ensure undetectability. We describe a general class of stegosystems that solves this problem by employing linear hash functions. The suggested stegosystems make it possible to transmit an amount of hidden information asymptotically close to the maximum possible under the given distortion.
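A well-known small instance of embedding through a linear hash is matrix embedding with the parity-check matrix of the [7,4] Hamming code: 3 message bits are carried by 7 cover bits while changing at most one of them. The sketch below shows this classic case as an illustration of the general idea; it is not claimed to be the construction proposed in the paper, and the function names are hypothetical.

```python
# Parity-check matrix of the [7,4] Hamming code: column j (1-based) is j in binary.
H = [[(j >> k) & 1 for j in range(1, 8)] for k in range(3)]

def linear_hash(x):
    """Syndrome H*x over GF(2): a linear hash of 7 cover bits down to 3 bits."""
    return [sum(h * b for h, b in zip(row, x)) % 2 for row in H]

def embed(cover_bits, message):
    """Change at most one of 7 cover bits so that their hash equals the message."""
    diff = [s ^ m for s, m in zip(linear_hash(cover_bits), message)]
    position = diff[0] + 2 * diff[1] + 4 * diff[2]   # 0 means nothing to change
    stego = list(cover_bits)
    if position:
        stego[position - 1] ^= 1
    return stego

cover_bits = [1, 0, 1, 1, 0, 0, 1]
message = [1, 0, 1]
stego_bits = embed(cover_bits, message)
extracted = linear_hash(stego_bits)      # the receiver only needs the hash
```

The receiver recovers the message by hashing the stego bits, so no side information about which bit was changed has to be transmitted.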