\section{Related Work}
\label{sec:related_work}
\paragraph{Data Augmentation for Image Classification}
Data augmentation is a crucial technique for improving the performance and generalization of image classification models.
Traditional augmentation strategies rely on simple geometric or color-space transformations like cropping, flipping, rotation, blurring, color jittering, or random erasing \cite{Zhong2017} to increase the diversity of the training data without changing their semantic meaning.
With the advent of Vision Transformers, new data augmentation operations like PatchDropout \cite{Liu2022d} have been proposed.
Other transformations like Mixup \cite{Zhang2018a}, CutMix \cite{Yun2019}, or random cropping and patching \cite{Takahashi2018} combine multiple input images.
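As an illustration, in notation introduced here only for this example rather than taken from the cited works, Mixup can be sketched as a convex combination of two training samples $(x_i, y_i)$ and $(x_j, y_j)$ with one-hot labels:
\[
  \tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad
  \tilde{y} = \lambda y_i + (1 - \lambda) y_j, \qquad
  \lambda \sim \mathrm{Beta}(\alpha, \alpha),
\]
where $\alpha > 0$ is a hyperparameter that controls the mixing strength.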
These simple transformations are usually bundled to form more complex augmentation policies like AutoAugment \cite{Cubuk2018}, RandAugment \cite{Cubuk2019},
or 3-augment \cite{Touvron2022}, which is optimized for training Vision Transformers.
For a general overview of data augmentation techniques for image classification, we refer the reader to \citet{Shorten2019, Xu2023d}.

We build upon these general augmentations by introducing a novel approach that explicitly separates objects and backgrounds for image classification, which, unlike these basic transformations, allows us to move beyond the image compositions present in the dataset.
Our approach is used in addition to strong traditional techniques to improve performance and reduce biases.
\paragraph{Copy-Paste Augmentation}
The copy-paste augmentation \cite{Ghiasi2020}, which is mainly used for object detection \cite{Shermaine2025,Ghiasi2020} and instance segmentation \cite{Werman2021,Ling2022}, involves copying segmented objects from one image and pasting them onto another.
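In its simplest form, and in notation we assume here purely for illustration, pasting a foreground image $x_{\mathrm{fg}}$ with binary mask $m \in \{0, 1\}^{H \times W}$ onto a background image $x_{\mathrm{bg}}$ of the same size can be written as
\[
  \tilde{x} = m \odot x_{\mathrm{fg}} + (1 - m) \odot x_{\mathrm{bg}},
\]
where $\odot$ denotes element-wise multiplication, broadcast over the color channels.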
While human-annotated segmentation masks are typically used to extract the foreground objects, other foreground sources have been explored, such as 3D models \cite{Hinterstoisser2019} and pretrained object-detection models applied to objects on white backgrounds \cite{Dwibedi2017} or to synthetic images \cite{Ge2023}.
\citet{Kang2022} apply copy-paste as an alternative to CutMix in image classification, but they do not vary the size or position of the foregrounds and use normal dataset images as backgrounds.

Unlike prior copy-paste methods that overlay objects, \schemename extracts foregrounds and replaces their backgrounds with semantically neutral fills, thereby preserving label integrity while enabling controlled and diverse recombination.
\begin{figure*}[ht!]
\centering
\includegraphics[width=.9\textwidth]{img/fig-2.pdf}
\caption{Overview of \schemename. The data creation consists of two stages: (1) segmentation (offline, \Cref{sec:segmentation}), where we segment the foreground objects from the background and fill in the background, and (2) recombination (online, \Cref{sec:recombination}), where we combine the foreground objects with different backgrounds to create new samples. After recombination, we apply strong, commonly used augmentation policies.}
\label{fig:method}
\end{figure*}
\paragraph{Model Robustness Evaluation}
Evaluating model robustness to various image variations is critical for understanding and improving model generalization.
Datasets like ImageNet-C and ImageNet-P \cite{Hendrycks2019} introduce common corruptions and perturbations.
ImageNet-E \cite{Li2023e} evaluates model robustness against a collection of distribution shifts.
Other datasets, such as ImageNet-D \cite{Zhang2024f}, focus on varying background, texture, and material, but rely on synthetic data.
Stylized ImageNet \cite{Geirhos2018} investigates the impact of texture changes.
ImageNet-9 \cite{Xiao2020} explores background variations using segmented images, but its backgrounds are often artificial.

In contrast to these existing datasets, which are used only for evaluation, \schemename provides fine-grained control over foreground object placement, size, and background selection, enabling a precise and comprehensive analysis of specific model biases within the context of a large-scale, real-world image distribution.
As \schemename also provides controllable training set generation, it goes beyond simply measuring robustness to actively improving it through training.