

\section{Related Work}
\label{sec:related_work}

\paragraph{Data Augmentation for Image Classification}
Data augmentation is a crucial technique for improving the performance and generalization of image classification models.
Traditional augmentation strategies rely on simple geometric or color-space transformations like cropping, flipping, rotation, blurring, color jittering, or random erasing \cite{Zhong2017} to increase the diversity of the training data without changing their semantic meaning.
With the advent of Vision Transformers, new data augmentation operations like PatchDropout \cite{Liu2022d} have been proposed.
Other transformations like Mixup \cite{Zhang2018a}, CutMix \cite{Yun2019}, or random cropping and patching \cite{Takahashi2018} combine multiple input images.
These simple transformations are usually bundled to form more complex augmentation policies like AutoAugment \cite{Cubuk2018}, RandAugment \cite{Cubuk2019},
or 3-augment \cite{Touvron2022}, which is optimized for training ViTs.
For a general overview of data augmentation techniques for image classification, we refer to \citet{Shorten2019, Xu2023d}.

We build upon these general augmentations by introducing a novel approach to explicitly separate objects and backgrounds for image classification, allowing us to -- unlike these basic transformations -- move beyond dataset image compositions.
Our approach is used in addition to strong traditional techniques to improve performance and reduce biases.

\paragraph{Copy-Paste Augmentation}
Copy-paste augmentation \cite{Ghiasi2020}, which is used primarily for object detection \cite{Shermaine2025,Ghiasi2020} and instance segmentation \cite{Werman2021,Ling2022}, involves copying segmented objects from one image and pasting them onto another.
While human-annotated segmentation masks are typically used to extract the foreground objects, other foreground sources have been explored, like 3D models \cite{Hinterstoisser2019} and pretrained object-detection models for use on objects on white background \cite{Dwibedi2017} or synthetic images \cite{Ge2023}.
\citet{Kang2022} apply copy-paste as an alternative to CutMix in image classification, but they do not vary the size or position of the foregrounds, and they use normal dataset images as backgrounds.

Unlike prior copy-paste methods that overlay objects, \schemename extracts foregrounds and replaces their backgrounds with semantically neutral fills, thereby preserving label integrity while enabling controlled and diverse recombination.

\begin{figure*}[ht!]
    \centering
    \includegraphics[width=.9\textwidth]{img/fig-2.pdf}
    \caption{Overview of \schemename. Data creation consists of two stages: segmentation (offline, \Cref{sec:segmentation}), where we segment the foreground objects from the background and fill in the background, and recombination (online, \Cref{sec:recombination}), where we combine the foreground objects with different backgrounds to create new samples. After recombination, we apply strong, commonly used augmentation policies.}
    \label{fig:method}
\end{figure*}

\paragraph{Model Robustness Evaluation}
Evaluating model robustness to various image variations is critical for understanding and improving model generalization.
Datasets like ImageNet-C and ImageNet-P \cite{Hendrycks2019} introduce common corruptions and perturbations.
ImageNet-E \cite{Li2023e} evaluates model robustness against a collection of distribution shifts.
Other datasets, such as ImageNet-D \cite{Zhang2024f}, focus on varying background, texture, and material, but rely on synthetic data.
Stylized ImageNet \cite{Geirhos2018} investigates the impact of texture changes.
ImageNet-9 \cite{Xiao2020} explores background variations using segmented images, but backgrounds are often artificial.

In contrast to these existing datasets, which are used only for evaluation, \schemename provides fine-grained control over foreground object placement, size, and background selection, enabling a precise and comprehensive analysis of specific model biases within the context of a large-scale, real-world image distribution.
As \schemename also provides controllable training set generation, it goes beyond simply measuring robustness to actively improving it through training.