cvpr submission

2026-02-24 12:01:26 +01:00
parent 5c08f9d31a
commit e7c0b531d6
59 changed files with 7238 additions and 4939 deletions
--- a/sec/abstract.tex
+++ b/sec/abstract.tex
@@ -1,27 +1,19 @@
 % !TeX root = ../main.tex

 \begin{abstract}
-    % Transformers, particularly Vision Transformers (ViTs), have achieved state-of-the-art performance in large-scale image classification.
-    % However, they often require large amounts of data and can exhibit biases, such as center or size bias, that limit their robustness and generalizability.
-    % This paper introduces \schemename, a novel data augmentation operation that addresses these challenges by explicitly imposing invariances into the training data, which are otherwise part of the neural network architecture.
-    % \schemename is constructed by using pretrained foundation models to separate and recombine foreground objects with different backgrounds.
-    % This recombination step enables us to take fine-grained control over object position and size, as well as background selection.
-    % We demonstrate that using \schemename significantly improves the accuracy of ViTs and other architectures by up to 4.5 percentage points (p.p.) on ImageNet, which translates to 7.3 p.p. on downstream tasks.
-    % Importantly, \schemename not only improves accuracy but also opens new ways to analyze model behavior and quantify biases.
-    % Namely, we introduce metrics for background robustness, foreground focus, center bias, and size bias and show that using \schemename during training substantially reduces these biases.
-    % In summary, \schemename provides a valuable tool for analyzing and mitigating biases, enabling the development of more robust and reliable computer vision models.
-    % Our code and dataset are publicly available at \code{<url>}.
-
-    Large-scale image classification datasets exhibit strong compositional biases: objects tend to be centered, appear at characteristic scales, and co-occur with class-specific context.
-    % Models can exploit these biases to achieve high in-distribution accuracy, yet remain brittle under distribution shifts.
-    By exploiting such biases, models attain high in-distribution accuracy but remain fragile under distribution shifts.
-    To address this issue, we introduce \schemename, a controlled composition augmentation scheme that factorizes each training image into a \emph{foreground object} and a \emph{background} and recombines them to explicitly manipulate object position, object scale, and background identity.
-    \schemename uses off-the-shelf segmentation and inpainting models to (i) extract the foreground and synthesize a neutral background, and (ii) paste the foreground onto diverse neutral backgrounds before applying standard strong augmentation policies.
-    Compared to conventional augmentations and content-mixing methods, our factorization provides direct control knobs that break foreground-background correlations. % while preserving the label.
-    Across 10 architectures, \schemename improves ImageNet top-1 accuracy by up to 6 percentage points (p.p.) and yields gains of up to 7.3 p.p. on fine-grained downstream datasets.
-    Moreover, the same control knobs enable targeted diagnostic tests: we quantify background reliance, foreground focus, center bias, and size bias via controlled background swaps and position/scale sweeps, and show that training with \schemename substantially reduces these shortcut behaviors and significantly increases accuracy on standard distribution-shift benchmarks by up to $19$ p.p.
-    % Moreover, the same control knobs enable targeted diagnostic tests: we quantify background reliance, foreground focus, center bias, and size bias via controlled background swaps and position/scale sweeps, and show that training with \schemename substantially reduces these shortcut behaviors and significantly increases accuracy on standard distribution-shift benchmarks like ImageNet-A/-C/-R by up to $19$ p.p.
+    Transformers, particularly Vision Transformers (ViTs), have achieved state-of-the-art performance in large-scale image classification.
+    However, they often require large amounts of data and can exhibit biases, such as center or size bias, that limit their robustness and generalizability.
+    This paper introduces \schemename, a novel data augmentation operation that addresses these challenges by explicitly encoding in the training data invariances that would otherwise have to be built into the neural network architecture.
+    % This paper introduces \name, a novel dataset derived from ImageNet that addresses these challenges.
+    \schemename is constructed by using pretrained foundation models to separate and recombine foreground objects with different backgrounds.
+    % enabling fine-grained control over image composition during training.
+    % Missing sentence here on how it is used to generate data, in what way, and for what purpose with respect to bias
+    This recombination step gives us fine-grained control over object position and size, as well as background selection.
+    % It thus increases the data diversity and effective number of training samples.
+    We demonstrate that using \schemename improves the accuracy of ViTs and other architectures by up to 4.5 percentage points (p.p.) on ImageNet and by up to 7.3 p.p. on downstream tasks.
+    % Importantly, \schemename enables novel ways of analyzing model behavior and quantifying biases.
+    Importantly, \schemename not only improves accuracy but also opens new ways to analyze model behavior and quantify biases.
+    Specifically, we introduce metrics for background robustness, foreground focus, center bias, and size bias, and show that using \schemename during training substantially reduces these biases.
+    In summary, \schemename provides a valuable tool for analyzing and mitigating biases, enabling the development of more robust and reliable computer vision models.
    Our code and dataset are publicly available at \code{<url>}.
-
-    \keywords{Data Augmentation \and Vision Transformer \and Robustness}
-\end{abstract}
+\end{abstract}
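
The recombination step described in the abstract (pasting a segmented foreground onto a different background with explicit control over position and size) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `resize_nearest` and `recombine` are hypothetical, images are plain nested lists, and the sketch assumes the foreground mask and the inpainted background have already been produced by the pretrained models.

```python
def resize_nearest(grid, new_h, new_w):
    """Nearest-neighbour resize of a 2D grid (list of rows)."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[r * h // new_h][c * w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def recombine(foreground, mask, background, scale, center):
    """Paste the masked foreground onto a copy of the background.

    Hypothetical helper for illustration only.
    foreground: 2D grid of pixel values
    mask:       2D grid of booleans, True where the object is
    background: 2D grid of pixel values (assumed already inpainted)
    scale:      resize factor applied to the foreground (size control)
    center:     (row, col) of the object centre as fractions of the
                background height and width (position control)
    """
    bh, bw = len(background), len(background[0])
    new_h = max(1, round(len(foreground) * scale))
    new_w = max(1, round(len(foreground[0]) * scale))

    fg = resize_nearest(foreground, new_h, new_w)
    m = resize_nearest(mask, new_h, new_w)

    # Top-left corner such that the object is centred at `center`.
    top = round(center[0] * bh - new_h / 2)
    left = round(center[1] * bw - new_w / 2)

    out = [row[:] for row in background]  # leave the input untouched
    for r in range(new_h):
        for c in range(new_w):
            y, x = top + r, left + c
            # Copy only object pixels that fall inside the background.
            if m[r][c] and 0 <= y < bh and 0 <= x < bw:
                out[y][x] = fg[r][c]
    return out
```

Sweeping `scale` and `center` for a fixed foreground, or swapping `background` while keeping the other arguments fixed, corresponds to the kind of controlled variation the abstract uses to probe size bias, center bias, and background robustness.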