\begin{abstract}
Transformers, particularly Vision Transformers (ViTs), have achieved state-of-the-art performance in large-scale image classification.
However, they often require large amounts of data and can exhibit biases, such as center or size bias, that limit their robustness and generalizability.
This paper introduces \schemename, a novel data augmentation operation that addresses these challenges by explicitly imposing on the training data invariances that are otherwise part of the neural network architecture.
\schemename uses pretrained foundation models to separate foreground objects from their backgrounds and recombine them with different backgrounds.
This recombination step gives us fine-grained control over object position and size, as well as over background selection.
We demonstrate that using \schemename significantly improves the accuracy of ViTs and other architectures by up to 4.5 percentage points (p.p.) on ImageNet, which translates to 7.3 p.p. on downstream tasks.
Importantly, \schemename not only improves accuracy but also opens new ways to analyze model behavior and quantify biases.
Namely, we introduce metrics for background robustness, foreground focus, center bias, and size bias, and show that using \schemename during training substantially reduces these biases.
In summary, \schemename provides a valuable tool for analyzing and mitigating biases, enabling the development of more robust and reliable computer vision models.
Our code and dataset are publicly available at \code{https://github.com/tobna/ForAug}.
\end{abstract}