\begin{abstract}
Transformers, particularly Vision Transformers (ViTs), have achieved state-of-the-art performance in large-scale image classification.
However, they often require large amounts of data and can exhibit biases, such as center or size bias, that limit their robustness and generalizability.
This paper introduces \schemename, a novel data augmentation operation that addresses these challenges by explicitly imposing on the training data invariances that are otherwise part of the neural network architecture.
\schemename uses pretrained foundation models to separate foreground objects from their backgrounds and recombine them with different backgrounds.
This recombination step gives us fine-grained control over object position and size, as well as over background selection.
We demonstrate that using \schemename significantly improves the accuracy of ViTs and other architectures by up to 4.5 percentage points (p.p.) on ImageNet, which translates to 7.3 p.p. on downstream tasks.
Importantly, \schemename not only improves accuracy but also opens new ways to analyze model behavior and quantify biases.
Namely, we introduce metrics for background robustness, foreground focus, center bias, and size bias, and show that using \schemename during training substantially reduces these biases.
In summary, \schemename provides a valuable tool for analyzing and mitigating biases, enabling the development of more robust and reliable computer vision models.
Our code and dataset are publicly available at \code{https://github.com/tobna/ForAug}.
\end{abstract}