Your diffusion model is an implicit synthetic image detector

LIX, CNRS, École Polytechnique, IP Paris
ECCV 2024 Workshop TWYN

Abstract

Recent developments in diffusion models, particularly latent diffusion and classifier-free guidance, have produced highly realistic images that can deceive humans. In the detection domain, the need for generalization across diverse generative models has led many approaches to rely on frequency fingerprints or traces to identify synthetic images, often at the cost of robustness against complex image degradations. In this paper, we propose a novel approach that relies on neither frequency nor direct image-based features. Instead, we leverage pre-trained diffusion models and a sampling technique to detect fake images. Our methodology is based on two key insights: (i) pre-trained diffusion models already contain rich information about the real data distribution, enabling the differentiation between real and fake images through strategic sampling; (ii) the dependency of text-conditional diffusion models on classifier-free guidance, coupled with higher guidance weights, enforces the discernibility between real and diffusion-generated fake images. We evaluate our method on the GenImage dataset, covering eight distinct image generators and various image degradations. Our method demonstrates its efficacy and robustness in detecting multiple types of AI-generated synthetic images, setting a new state of the art.
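The two insights above can be illustrated with a minimal sketch. The `cfg_noise` function is the standard classifier-free guidance combination used by text-conditional diffusion models; `is_synthetic` is a hypothetical thresholded reconstruction-error detector, where `reconstruct` stands in for a partial noising/denoising cycle with a pre-trained diffusion model (not the paper's exact pipeline, just the general idea that diffusion-generated images are reconstructed more faithfully than real ones):

```python
import numpy as np

def cfg_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: eps = eps_u + w * (eps_c - eps_u).

    Higher guidance weight w pushes the prediction further toward the
    text-conditional direction, which (per the paper's second insight)
    widens the gap between real and diffusion-generated images.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

def reconstruction_error(image: np.ndarray, reconstruct) -> float:
    """Mean squared error between an image and its diffusion reconstruction.

    `reconstruct` is a hypothetical callable wrapping a pre-trained
    diffusion model's noise-then-denoise cycle.
    """
    return float(np.mean((image - reconstruct(image)) ** 2))

def is_synthetic(image: np.ndarray, reconstruct, threshold: float) -> bool:
    """Flag an image as synthetic if the diffusion model reconstructs it
    with lower error than the chosen threshold (toy decision rule)."""
    return reconstruction_error(image, reconstruct) < threshold
```

Note that with `w = 0` the guided prediction reduces to the unconditional one, and with `w = 1` to the conditional one; detection methods of this family typically probe weights `w > 1`.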

BibTeX

Coming soon ...