% !TeX root = ../main.tex
\section{Related Work}
\label{sec:related_work}

\textbf{Data Augmentation for Image Classification.}
Data augmentation is a crucial technique for improving model performance and generalization.
Traditional augmentation strategies rely on simple geometric or color-space transformations, such as cropping, flipping, rotation, blurring, color jittering, or random erasing~\cite{Zhong2020}, to increase training-data diversity without changing the semantic meaning.
With the advent of ViTs~\cite{Dosovitskiy2021}, new data augmentation operations like PatchDropout~\cite{Liu2022d} have been proposed.
Other transformations, such as MixUp~\cite{Zhang2018a}, CutMix~\cite{Yun2019}, or random cropping and patching~\cite{Takahashi2018}, combine multiple input images.
These simple transformations are usually bundled into more complex augmentation policies like AutoAugment~\cite{Cubuk2019}, RandAugment~\cite{Cubuk2020}, or 3-Augment~\cite{Touvron2022}. %, which is optimized to train a ViT.
For a general overview of data augmentation for image classification, we refer to Shorten et al.~\cite{Shorten2019} and Xu et al.~\cite{Xu2023d}.

We advance these general augmentations by introducing \schemename, which explicitly separates objects from backgrounds for image classification, allowing us to move beyond the image compositions present in the dataset.
Thus, \schemename unlocks performance improvements and bias reductions that are not possible with traditional data augmentation.
% \schemename is used in addition to traditional augmentation techniques to improve performance and reduce biases.

\textbf{Copy-Paste Augmentation.}
Copy-paste augmentation~\cite{Ghiasi2021}, which has so far been used only for object detection~\cite{Shermaine2025,Ghiasi2021} and instance segmentation~\cite{Werman2022,Ling2022}, copies segmented objects from one image and pastes them onto another.
While human-annotated segmentation masks are typically used to extract the foreground objects, other foreground sources have been explored, such as 3D models~\cite{Hinterstoisser2019} and pretrained object-detection models applied to objects on white backgrounds~\cite{Dwibedi2017} or to synthetic images~\cite{Ge2023}.
Kang et al.~\cite{Kang2022} apply copy-paste as an alternative to CutMix in image classification, but they shift neither the size nor the position of the foregrounds and use dataset images (with objects) as backgrounds.

Unlike prior copy-paste methods that overlay objects, \schemename extracts foregrounds and replaces their backgrounds with semantically neutral fills, thereby preserving label integrity while enabling controlled and diverse recombination.

\textbf{Generative Data Augmentation.}
Recent work uses generative models to synthesize additional training images, e.g., via GANs or diffusion models driven by text prompts or attribute labels~\cite{Lu2022,Trabucco2024,Islam2024}.
Concurrently to our work, AGA~\cite{Rahat2025} combines LLMs, diffusion models, and segmentation to generate fully synthetic backgrounds from text prompts, onto which real foregrounds are pasted.
These synthetic images are appended to the original training set.

While AGA focuses on increasing diversity via prompt-driven background synthesis, \schemename uses generative models differently:
we apply inpainting only to locally neutralize the original object region, yielding semi-synthetic backgrounds that preserve the global layout, style, and characteristics of real dataset images.
% AGA's focus on synthetic backgrounds is likely to produce a shifted, or even collapsed, background image distribution~\cite{Zverev2025,Shumailov2024,Adamkiewicz2026}.
Fully synthetic, prompt-generated backgrounds are likely to change the effective background distribution, especially when prompts or generators are biased~\cite{Zverev2025,Shumailov2024,Adamkiewicz2026}.
We then recombine real foregrounds online with these neutralized, dataset-consistent backgrounds under explicit control of object position and scale.
Thus, \schemename acts as a dynamic, large-scale augmentation method, while AGA statically expands small-scale training sets with synthetic data.

\textbf{Model Robustness Evaluation.}
Evaluating model robustness to various image variations is critical for understanding and improving model generalization.
Datasets like ImageNet-A~\cite{Hendrycks2021}, ImageNet-C~\cite{Hendrycks2019}, and ImageNet-P~\cite{Hendrycks2019} introduce common corruptions and perturbations.
ImageNet-E~\cite{Li2023e} evaluates model robustness against a collection of distribution shifts.
Other datasets, such as ImageNet-D~\cite{Zhang2024f} and ImageNet-R~\cite{Hendrycks2021a}, focus on varying background, texture, and material, but rely on synthetic data.
Stylized ImageNet~\cite{Geirhos2019} investigates the impact of texture changes.
ImageNet-9~\cite{Xiao2020} explores background variations using segmented images and artificial backgrounds for a 9-class subset of ImageNet.

In contrast to these existing datasets, which are used only for evaluation, \schemename provides fine-grained control over foreground-object placement, size, and background selection, enabling a precise and comprehensive analysis of specific model biases within a large-scale, real-world image distribution.
As \schemename also provides controllable training-data generation, it goes beyond merely measuring robustness to actively improving it through training.