Adaptive Clustering of Robust Semantic Representations for Adversarial Image Purification on Social Networks

Samuel Henrique Silva; Arun Das; Adel Aladdini; Peyman Najafirad

doi:10.1609/icwsm.v16i1.19350

Authors

Samuel Henrique Silva The University of Texas at San Antonio
Arun Das The University of Texas at San Antonio
Adel Aladdini The University of Texas at San Antonio
Peyman Najafirad The University of Texas at San Antonio

DOI:

https://doi.org/10.1609/icwsm.v16i1.19350

Keywords:

Credibility of online content, Qualitative and quantitative studies of social media, Trust; reputation; recommendation systems

Abstract

Advances in Artificial Intelligence (AI) have made it possible to automate human-level visual search and perception tasks on the massive sets of image data shared on social media on a daily basis. However, AI-based automated filters are highly susceptible to deliberate image attacks that can lead to misclassification of content such as cyberbullying, child sexual abuse material (CSAM), adult content, and deepfakes. One of the most effective methods to defend against such disturbances is adversarial training, but this comes at the cost of generalization to unseen attacks and transferability across models. In this article, we propose a robust defense against adversarial image attacks that is model agnostic and generalizable to unseen adversaries. We begin with a baseline model, extracting the latent representations for each class and adaptively clustering the latent representations that share a semantic similarity. Next, we obtain the distributions for these clustered latent representations along with their originating images. We then learn semantic reconstruction dictionaries (SRD). We adversarially train a new model, constraining the latent space representation to minimize the distance between the adversarial latent representation and the true cluster distribution. To purify the image, we decompose the input into low- and high-frequency components. The high-frequency component is reconstructed based on the best SRD from the clean dataset. To evaluate the best SRD, we rely on the distance between the robust latent representations and semantic cluster distributions. The output is a purified image with no perturbations. Evaluations using comprehensive datasets, including image benchmarks and social media images, demonstrate that our proposed purification approach considerably guards and enhances the accuracy of AI-based image filters for unlawful and harmful perturbed images.
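
Two steps named in the abstract, the split into low- and high-frequency components and the choice of the closest semantic cluster, can be illustrated with a minimal sketch. This is not the authors' implementation: the box-blur low-pass filter, the per-dimension (diagonal) variance model of each cluster, and the names `box_blur`, `split_frequencies`, and `nearest_cluster` are all assumptions made here for illustration. The abstract does not specify the filter or the distance measure.

```python
def box_blur(image, radius=1):
    """Assumed low-pass filter: mean over a square window, truncated at borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def split_frequencies(image, radius=1):
    """Return (low, high) so that low + high equals the input image."""
    low = box_blur(image, radius)
    high = [[p - l for p, l in zip(row, low_row)]
            for row, low_row in zip(image, low)]
    return low, high


def nearest_cluster(latent, clusters):
    """Pick the cluster whose distribution is closest to a latent vector.

    `clusters` is a list of (mean, variance) pairs, each a per-dimension list.
    The variance-normalized squared distance is an assumed stand-in for the
    paper's distance to the cluster distribution.
    """
    scores = [
        sum((z - m) ** 2 / v for z, m, v in zip(latent, mean, var))
        for mean, var in clusters
    ]
    return scores.index(min(scores))


# Hypothetical usage: a tiny grayscale image and two semantic clusters.
image = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
low, high = split_frequencies(image)
best = nearest_cluster([4.0, 4.5], [([0.0, 0.0], [1.0, 1.0]),
                                    ([5.0, 5.0], [1.0, 1.0])])
```

In the pipeline the abstract describes, the index returned by a step like `nearest_cluster` would select which SRD reconstructs the high-frequency component. Dictionary learning and the adversarially trained encoder are outside the scope of this sketch.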
