cvpr submission
@@ -1,27 +1,19 @@
% !TeX root = ../main.tex
\begin{abstract}
% Transformers, particularly Vision Transformers (ViTs), have achieved state-of-the-art performance in large-scale image classification.
% However, they often require large amounts of data and can exhibit biases, such as center or size bias, that limit their robustness and generalizability.
% This paper introduces \schemename, a novel data augmentation operation that addresses these challenges by explicitly imposing invariances into the training data, which are otherwise part of the neural network architecture.
% \schemename is constructed by using pretrained foundation models to separate and recombine foreground objects with different backgrounds.
% This recombination step enables us to take fine-grained control over object position and size, as well as background selection.
% We demonstrate that using \schemename significantly improves the accuracy of ViTs and other architectures by up to 4.5 percentage points (p.p.) on ImageNet, which translates to 7.3 p.p. on downstream tasks.
% Importantly, \schemename not only improves accuracy but also opens new ways to analyze model behavior and quantify biases.
% Namely, we introduce metrics for background robustness, foreground focus, center bias, and size bias and show that using \schemename during training substantially reduces these biases.
% In summary, \schemename provides a valuable tool for analyzing and mitigating biases, enabling the development of more robust and reliable computer vision models.
% Our code and dataset are publicly available at \code{<url>}.
Large-scale image classification datasets exhibit strong compositional biases: objects tend to be centered, appear at characteristic scales, and co-occur with class-specific context.
% Models can exploit these biases to achieve high in-distribution accuracy, yet remain brittle under distribution shifts.
By exploiting such biases, models attain high in-distribution accuracy but remain fragile under distribution shifts.
To address this issue, we introduce \schemename, a controlled composition augmentation scheme that factorizes each training image into a \emph{foreground object} and a \emph{background} and recombines them to explicitly manipulate object position, object scale, and background identity.
\schemename uses off-the-shelf segmentation and inpainting models to (i) extract the foreground and synthesize a neutral background, and (ii) paste the foreground onto diverse neutral backgrounds before applying standard strong augmentation policies.
Compared to conventional augmentations and content-mixing methods, our factorization provides direct control knobs that break foreground-background correlations. % while preserving the label.
Across 10 architectures, \schemename improves ImageNet top-1 accuracy by up to 6 percentage points (p.p.) and yields gains of up to 7.3 p.p. on fine-grained downstream datasets.
Moreover, the same control knobs enable targeted diagnostic tests: we quantify background reliance, foreground focus, center bias, and size bias via controlled background swaps and position/scale sweeps.
Training with \schemename substantially reduces these shortcut behaviors and increases accuracy on standard distribution-shift benchmarks by up to 19 p.p.
% Moreover, the same control knobs enable targeted diagnostic tests: we quantify background reliance, foreground focus, center bias, and size bias via controlled background swaps and position/scale sweeps, and show that training with \schemename substantially reduces these shortcut behaviors and significantly increases accuracy on standard distribution-shift benchmarks like ImageNet-A/-C/-R by up to $19$ p.p.
Transformers, particularly Vision Transformers (ViTs), have achieved state-of-the-art performance in large-scale image classification.
However, they often require large amounts of data and can exhibit biases, such as center or size bias, that limit their robustness and generalizability.
This paper introduces \schemename, a novel data augmentation operation that addresses these challenges by explicitly imposing invariances into the training data, which are otherwise part of the neural network architecture.
% This paper introduces \name, a novel dataset derived from ImageNet that addresses these challenges.
\schemename is constructed by using pretrained foundation models to separate and recombine foreground objects with different backgrounds.
% enabling fine-grained control over image composition during training.
% Missing sentence here of how you use it to generate data in what way and with what purpose wrt to bias
This recombination step enables us to take fine-grained control over object position and size, as well as background selection.
% It thus increases the data diversity and effective number of training samples.
We demonstrate that using \schemename significantly improves the accuracy of ViTs and other architectures by up to 4.5 percentage points (p.p.) on ImageNet, which translates to 7.3 p.p. on downstream tasks.
% Importantly, \schemename enables novel ways of analyzing model behavior and quantifying biases.
Importantly, \schemename not only improves accuracy but also opens new ways to analyze model behavior and quantify biases.
Namely, we introduce metrics for background robustness, foreground focus, center bias, and size bias and show that using \schemename during training substantially reduces these biases.
In summary, \schemename provides a valuable tool for analyzing and mitigating biases, enabling the development of more robust and reliable computer vision models.
Our code and dataset are publicly available at \code{<url>}.
\keywords{Data Augmentation \and Vision Transformer \and Robustness}
\end{abstract}
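
% Illustrative sketch only: the notation below is assumed for exposition and is not taken from the submission.
The recombination step described in the abstract can be sketched as a masked composite.
This is only an illustration under assumed notation (none of the symbols below appear in the source): let $x$ be a training image, $m \in [0,1]^{H \times W}$ a foreground mask obtained from the segmentation model and broadcast over color channels, $b$ a neutral background produced by inpainting (possibly originating from a different image), and $T_{p,s}$ a hypothetical operator that rescales its input by a factor $s$ and translates it to position $p$, zero-padding outside the object.
\begin{equation}
  \tilde{x} = T_{p,s}(m) \odot T_{p,s}(x) + \bigl(1 - T_{p,s}(m)\bigr) \odot b
\end{equation}
Here $\odot$ denotes the element-wise product.
Varying $p$, $s$, and $b$ independently corresponds to the three quantities the abstract says are manipulated: object position, object scale, and background identity.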