first commit of eccv data

2026-02-24 11:13:52 +01:00
commit 0e528233a4
22 changed files with 5743 additions and 0 deletions
--- a/sec/related_work.tex
+++ b/sec/related_work.tex
@@ -0,0 +1,46 @@
+% !TeX root = ../main.tex
+
+\section{Related Work}
+\label{sec:related_work}
+
+\textbf{Data Augmentation for Image Classification.}
+Data augmentation is a crucial technique for improving model performance and generalization.
+Traditional augmentation strategies rely on simple geometric or color-space transformations like cropping, flipping, rotation, blurring, color jittering, or random erasing~\cite{Zhong2020} to increase training data diversity without changing the semantic meaning.
+With the advent of ViTs~\cite{Dosovitskiy2021}, new data augmentation operations like PatchDropout~\cite{Liu2022d} have been proposed.
+Other transformations like MixUp~\cite{Zhang2018a}, CutMix~\cite{Yun2019}, or random cropping and patching~\cite{Takahashi2018} combine multiple input images.
+These simple transformations are usually bundled to form more complex augmentation policies like AutoAugment~\cite{Cubuk2019}, RandAugment~\cite{Cubuk2020}, or 3-Augment~\cite{Touvron2022}. %, which is optimized to train a ViT.
+For a general overview of data augmentation for image classification, we refer to Shorten et al.~\cite{Shorten2019} and Xu et al.~\cite{Xu2023d}.
+
+We advance these general augmentations by introducing \schemename to explicitly separate objects and backgrounds for image classification, allowing us to move beyond image compositions from the dataset.
+Thus, \schemename unlocks performance improvements and bias reduction not possible with traditional data augmentation.
+% \schemename is used additionally to traditional augmentation techniques to improve performance and reduce biases.
+
+\textbf{Copy-Paste Augmentation.}
+Copy-paste augmentation~\cite{Ghiasi2021}, which is primarily used for object detection~\cite{Shermaine2025,Ghiasi2021} and instance segmentation~\cite{Werman2022,Ling2022}, copies segmented objects from one image and pastes them onto another.
+While human-annotated segmentation masks are typically used to extract the foreground objects, other foreground sources have been explored, like 3D models~\cite{Hinterstoisser2019} and pretrained object-detection models for use on objects on white background~\cite{Dwibedi2017} or synthetic images~\cite{Ge2023}.
+Kang et al.~\cite{Kang2022} apply copy-paste as an alternative to CutMix in image classification, but they do not vary the size or position of the foregrounds, and they use dataset images (which themselves contain objects) as backgrounds.
+
+Unlike prior copy-paste methods that overlay objects, \schemename extracts foregrounds and replaces their backgrounds with semantically neutral fills, thereby preserving label integrity while enabling controlled and diverse recombination.
+
+\textbf{Generative Data Augmentation.}
+Recent work uses generative models to synthesize additional training images, e.g., via GANs or diffusion models driven by text prompts or attribute labels~\cite{Lu2022,Trabucco2024,Islam2024}.
+Concurrent with our work, AGA~\cite{Rahat2025} combines LLMs, diffusion models, and segmentation to generate fully synthetic backgrounds from text prompts, onto which real foregrounds are pasted.
+These synthetic images are appended to the original training set.
+
+While AGA focuses on increasing diversity via prompt-driven background synthesis, \schemename uses generative models differently:
+We apply inpainting only to locally neutralize the original object region, yielding semi-synthetic backgrounds that preserve the global layout, style, and characteristics of real dataset images.
+% AGA's focus on synthetic background is likely to produce a shifted, or even collapsed background image distribution~\cite{Zverev2025,Shumailov2024,Adamkiewicz2026}.
+Fully synthetic, prompt-generated backgrounds are likely to change the effective background distribution, especially when prompts or generators are biased~\cite{Zverev2025,Shumailov2024,Adamkiewicz2026}.
+We then recombine real foregrounds with these neutralized, dataset-consistent backgrounds online, under explicit control of object position and scale.
+Thus, \schemename acts as a dynamic large-scale augmentation method, whereas AGA statically expands small-scale training sets with synthetic data.
+
+\textbf{Model Robustness Evaluation.}
+Evaluating model robustness to image variations is critical for understanding and improving model generalization.
+ImageNet-C~\cite{Hendrycks2019} and ImageNet-P~\cite{Hendrycks2019} introduce common corruptions and perturbations, while ImageNet-A~\cite{Hendrycks2021} collects natural adversarial examples.
+ImageNet-E~\cite{Li2023e} evaluates model robustness against a collection of distribution shifts.
+Other datasets, such as ImageNet-D~\cite{Zhang2024f} and ImageNet-R~\cite{Hendrycks2021a}, focus on varying background, texture, and material, but rely on synthetic data.
+Stylized ImageNet~\cite{Geirhos2019} investigates the impact of texture changes.
+ImageNet-9~\cite{Xiao2020} explores background variations using segmented images for a 9-class subset of ImageNet with artificial backgrounds.
+
+In contrast to these existing datasets, which are used only for evaluation, \schemename provides fine-grained control over foreground object placement, size, and background selection, enabling a precise and comprehensive analysis of specific model biases within the context of a large-scale, real-world image distribution.
+As \schemename also provides controllable training data generation, it goes beyond merely measuring robustness and actively improves it through training.