% !TeX root = ../main.tex
\section{Related Work}
\label{sec:related_work}

\textbf{Data Augmentation for Image Classification.}
Data augmentation is a crucial technique for improving model performance and generalization.
Traditional augmentation strategies rely on simple geometric or color-space transformations, such as cropping, flipping, rotation, blurring, color jittering, or random erasing~\cite{Zhong2020}, to increase training-data diversity without changing the semantic meaning.
With the advent of ViTs~\cite{Dosovitskiy2021}, new data augmentation operations like PatchDropout~\cite{Liu2022d} have been proposed.
Other transformations, such as MixUp~\cite{Zhang2018a}, CutMix~\cite{Yun2019}, or random cropping and patching~\cite{Takahashi2018}, combine multiple input images.
These simple transformations are usually bundled into more complex augmentation policies like AutoAugment~\cite{Cubuk2019}, RandAugment~\cite{Cubuk2020}, or 3-Augment~\cite{Touvron2022}. %, which is optimized to train a ViT.
For a general overview of data augmentation for image classification, we refer to Shorten et al.~\cite{Shorten2019} and Xu et al.~\cite{Xu2023d}.

We advance these general augmentations by introducing \schemename, which explicitly separates objects from backgrounds for image classification, allowing us to move beyond the image compositions present in the dataset.
Thus, \schemename unlocks performance improvements and bias reductions that are not possible with traditional data augmentation.
% \schemename is used in addition to traditional augmentation techniques to improve performance and reduce biases.

\textbf{Copy-Paste Augmentation.}
Copy-paste augmentation~\cite{Ghiasi2021}, which has so far been used only for object detection~\cite{Shermaine2025,Ghiasi2021} and instance segmentation~\cite{Werman2022,Ling2022}, copies segmented objects from one image and pastes them onto another.
While human-annotated segmentation masks are typically used to extract the foreground objects, other foreground sources have been explored, such as 3D models~\cite{Hinterstoisser2019} and pretrained object-detection models applied to objects on white backgrounds~\cite{Dwibedi2017} or to synthetic images~\cite{Ge2023}.
Kang et al.~\cite{Kang2022} apply copy-paste as an alternative to CutMix in image classification, but they shift neither the size nor the position of the foregrounds and use dataset images (with objects) as backgrounds.

Unlike prior copy-paste methods that overlay objects, \schemename extracts foregrounds and replaces their backgrounds with semantically neutral fills, thereby preserving label integrity while enabling controlled and diverse recombination.

\textbf{Generative Data Augmentation.}
Recent work uses generative models to synthesize additional training images, e.g., via GANs or diffusion models driven by text prompts or attribute labels~\cite{Lu2022,Trabucco2024,Islam2024}.
Concurrently to our work, AGA~\cite{Rahat2025} combines LLMs, diffusion models, and segmentation to generate fully synthetic backgrounds from text prompts, onto which real foregrounds are pasted.
These synthetic images are appended to the original training set.

While AGA focuses on increasing diversity via prompt-driven background synthesis, \schemename uses generative models differently:
we apply inpainting only to locally neutralize the original object region, yielding semi-synthetic backgrounds that preserve the global layout, style, and characteristics of real dataset images.
% AGA's focus on synthetic backgrounds is likely to produce a shifted, or even collapsed, background image distribution~\cite{Zverev2025,Shumailov2024,Adamkiewicz2026}.
Fully synthetic, prompt-generated backgrounds are likely to change the effective background distribution, especially when prompts or generators are biased~\cite{Zverev2025,Shumailov2024,Adamkiewicz2026}.
We then recombine real foregrounds online with these neutralized, dataset-consistent backgrounds under explicit control of object position and scale.
Thus, \schemename acts as a dynamic, large-scale augmentation method, while AGA statically expands small-scale training sets with synthetic data.

\textbf{Model Robustness Evaluation.}
Evaluating model robustness to various image variations is critical for understanding and improving model generalization.
Datasets like ImageNet-A~\cite{Hendrycks2021}, ImageNet-C~\cite{Hendrycks2019}, and ImageNet-P~\cite{Hendrycks2019} introduce common corruptions and perturbations.
ImageNet-E~\cite{Li2023e} evaluates model robustness against a collection of distribution shifts.
Other datasets, such as ImageNet-D~\cite{Zhang2024f} and ImageNet-R~\cite{Hendrycks2021a}, focus on varying background, texture, and material, but rely on synthetic data.
Stylized ImageNet~\cite{Geirhos2019} investigates the impact of texture changes.
ImageNet-9~\cite{Xiao2020} explores background variations using segmented images and artificial backgrounds for a 9-class subset of ImageNet.

In contrast to these existing datasets, which are used only for evaluation, \schemename provides fine-grained control over foreground-object placement, size, and background selection, enabling a precise and comprehensive analysis of specific model biases within a large-scale, real-world image distribution.
As \schemename also provides controllable training-data generation, it goes beyond merely measuring robustness to actively improving it through training.