% !TeX root = ../main.tex
\section{Related Work}
\label{sec:related_work}
\paragraph{Data Augmentation for Image Classification}
Data augmentation is a crucial technique for improving the performance and generalization of image classification models.
Traditional augmentation strategies rely on simple geometric or color-space transformations such as cropping, flipping, rotation, blurring, color jittering, or random erasing \cite{Zhong2017} to increase the diversity of the training data without changing its semantic meaning.
With the advent of Transformers, new data augmentation operations like PatchDropout \cite{Liu2022d} have been proposed.
Other transformations like Mixup \cite{Zhang2018a}, CutMix \cite{Yun2019}, or random cropping and patching \cite{Takahashi2018} combine multiple input images.
These simple transformations are usually bundled into more complex augmentation policies such as AutoAugment \cite{Cubuk2018} and RandAugment \cite{Cubuk2019}, which automatically search for effective augmentation strategies, or 3-Augment \cite{Touvron2022}, which is optimized for training ViTs.
For a general overview of data augmentation techniques for image classification, we refer to \cite{Shorten2019, Xu2023d}.

We build upon these general augmentation techniques by introducing a novel approach to explicitly separate and recombine foregrounds and backgrounds for image classification.
Our approach is used in tandem with traditional data augmentation techniques to improve model performance and reduce biases.
\paragraph{Copy-Paste Augmentation}
Copy-paste augmentation \cite{Ghiasi2020}, used for object detection \cite{Shermaine2025,Ghiasi2020} and instance segmentation \cite{Werman2021,Ling2022}, copies segmented objects from one image and pastes them onto another.
While human-annotated segmentation masks are typically used to extract the foreground objects, other foreground sources have been explored, such as 3D models \cite{Hinterstoisser2019} and pretrained object-detection models applied to objects on white backgrounds \cite{Dwibedi2017} or to synthetic images \cite{Ge2023}.
DeePaste \cite{Werman2021} focuses on using inpainting for a more seamless integration of the pasted object.

Unlike these methods, \name focuses on image classification.
While detection and segmentation methods paste objects onto another image (with a different foreground) or onto available or rendered background images of the target scene, we extract foreground objects and fill the resulting holes in the background in a semantically neutral way.
This way, we can recombine any foreground object with a large variety of neutral backgrounds from natural images, enabling a controlled and diverse manipulation of image composition.
\paragraph{Model Robustness Evaluation}
Evaluating model robustness to various image variations is critical for understanding and improving model generalization.
Datasets like ImageNet-C and ImageNet-P \cite{Hendrycks2019} introduce common corruptions and perturbations, respectively.
ImageNet-E \cite{Li2023e} evaluates model robustness against a collection of distribution shifts.
Other datasets, such as ImageNet-D \cite{Zhang2024f}, focus on varying background, texture, and material, but rely on synthetic data.
Stylized ImageNet \cite{Geirhos2018} investigates the impact of texture changes.
ImageNet-9 \cite{Xiao2020} explores background variations using segmented images, but the backgrounds are often artificial.
In contrast to these existing datasets, which are used only for evaluation, \name provides fine-grained control over foreground object placement, size, and background selection, enabling a precise and comprehensive analysis of specific model biases within the context of a large-scale, real-world image distribution.
As \name also provides controllable training set generation, it goes beyond simply measuring robustness to actively improving it through training.