% !TeX root = ../main.tex
\section{Related Work}
\label{sec:related_work}

\paragraph{Data Augmentation for Image Classification}
Data augmentation is a crucial technique for improving the performance and generalization of image classification models. Traditional augmentation strategies rely on simple geometric or color-space transformations such as cropping, flipping, rotation, blurring, color jittering, or random erasing \cite{Zhong2017} to increase the diversity of the training data without changing its semantic meaning. With the advent of Transformers, new data augmentation operations like PatchDropout \cite{Liu2022d} have been proposed. Other transformations like Mixup \cite{Zhang2018a}, CutMix \cite{Yun2019}, or random cropping and patching \cite{Takahashi2018} combine multiple input images. These simple transformations are usually bundled into more complex augmentation policies such as AutoAugment \cite{Cubuk2018} and RandAugment \cite{Cubuk2019}, which automatically search for optimal augmentation policies, or 3-Augment \cite{Touvron2022}, which is optimized for training ViTs. For a general overview of data augmentation techniques for image classification, we refer to \cite{Shorten2019, Xu2023d}. We build upon these general augmentations by introducing a novel approach that explicitly separates objects and backgrounds for image classification, allowing us -- unlike these basic transforms -- to move beyond the image compositions present in the dataset. Our approach is used in tandem with traditional techniques to improve performance and reduce biases.

\paragraph{Copy-Paste Augmentation}
Copy-paste augmentation \cite{Ghiasi2020}, which is used for object detection \cite{Shermaine2025,Ghiasi2020} and instance segmentation \cite{Werman2021,Ling2022}, involves copying segmented objects from one image and pasting them onto another.
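The basic operation can be sketched as follows; this is a minimal illustration assuming NumPy image arrays, a binary foreground mask, and a fixed paste location, not the exact pipeline of any of the cited methods:

```python
import numpy as np

def copy_paste(fg_img, fg_mask, bg_img, top, left):
    """Paste the masked region of fg_img onto bg_img at (top, left).

    fg_img, bg_img: H x W x 3 uint8 arrays; fg_mask: boolean H x W array.
    Illustrative sketch only (no blending, scaling, or inpainting).
    """
    out = bg_img.copy()
    h, w = fg_mask.shape
    # Clip the paste region so it stays inside the background image.
    h = min(h, out.shape[0] - top)
    w = min(w, out.shape[1] - left)
    region = out[top:top + h, left:left + w]  # view into `out`
    m = fg_mask[:h, :w]
    # Copy only foreground pixels; background pixels are left untouched.
    region[m] = fg_img[:h, :w][m]
    return out
```

Real copy-paste pipelines additionally rescale and jitter the object, blend mask edges, and update the corresponding labels or instance masks.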
While human-annotated segmentation masks are typically used to extract the foreground objects, other foreground sources have been explored, such as 3D models \cite{Hinterstoisser2019}, pretrained object-detection models applied to objects on white backgrounds \cite{Dwibedi2017}, or synthetic images \cite{Ge2023}.
% DeePaste \cite{Werman2021} focuses on using inpainting for a more seamless integration of the pasted object.
\cite{Kang2022} apply copy-paste as an alternative to CutMix in image classification, but they do not alter the size or position of the foregrounds and use regular dataset images as backgrounds.
% Unlike these methods, \schemename focuses on image classification.
While these methods paste objects onto another image (with a different foreground) or onto available or rendered background images of the target scene, we extract foreground objects and fill the resulting holes in the background in a semantically neutral way. This preserves label integrity while making diverse, neutral backgrounds available for recombination, enabling controlled and diverse manipulation of image composition.

\paragraph{Model Robustness Evaluation}
Evaluating model robustness to various image variations is critical for understanding and improving model generalization. Datasets like ImageNet-C and ImageNet-P \cite{Hendrycks2019} introduce common corruptions and perturbations. ImageNet-E \cite{Li2023e} evaluates model robustness against a collection of distribution shifts. Other datasets, such as ImageNet-D \cite{Zhang2024f}, focus on varying background, texture, and material, but rely on synthetic data. Stylized ImageNet \cite{Geirhos2018} investigates the impact of texture changes. ImageNet-9 \cite{Xiao2020} explores background variations using segmented images, but the backgrounds are often artificial.
In contrast to these existing datasets, which are used only for evaluation, \name provides fine-grained control over foreground object placement, size, and background selection, enabling a precise and comprehensive analysis of specific model biases within the context of a large-scale, real-world image distribution. As \name also supports controllable training-set generation, it goes beyond merely measuring robustness to actively improving it through training.