cvpr submission

This commit is contained in:
Tobias Christian Nauen
2026-02-24 12:01:26 +01:00
parent 5c08f9d31a
commit e7c0b531d6
59 changed files with 7238 additions and 4939 deletions


% !TeX root = ../main.tex
\section{Discussion \& Conclusion}
\label{sec:conclusion}
% We introduce \schemename, a novel data augmentation scheme that facilitates improved Transformer training for image classification.
% By explicitly separating and recombining foreground objects and backgrounds, \schemename enables controlled data augmentation beyond existing image compositions, leading to significant performance gains on ImageNet and downstream fine-grained classification tasks.
% Furthermore, \schemename provides a powerful framework for analyzing model behavior and quantifying biases, including background robustness, foreground focus, center bias, and size bias.
% Our experiments demonstrate that training using \schemename not only boosts accuracy but also significantly reduces these biases, resulting in more robust and generalizable models.
% In the future, we see \schemename be also applied to other datasets and tasks, like video recognition or segmentation.
% \schemename's ability to both improve performance and provide insights into model behavior makes it a valuable tool for advancing CV research and developing more reliable AI systems.
We introduced \schemename, a controlled composition augmentation scheme that factorizes images into foreground objects and backgrounds and recombines them with explicit control over background identity, object position, and object scale.
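The recombination step can be sketched roughly as follows. This is an illustrative, self-contained approximation only: the function name, the nearest-neighbour resize, and the hard mask compositing are assumptions, not the actual \schemename pipeline (which is not specified in this section).

```python
import numpy as np

def compose(background, foreground, mask, position, scale):
    """Paste a scaled foreground object onto a chosen background.

    background: (H, W, 3) uint8 array, the target background image.
    foreground: (h, w, 3) uint8 array, the extracted object crop.
    mask:       (h, w) bool array, True where the object is.
    position:   (row, col) top-left paste coordinate in the background.
    scale:      relative object size (1.0 keeps the crop size).
    """
    h, w = mask.shape
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    # Nearest-neighbour resize via index sampling (no external deps).
    ri = np.arange(nh) * h // nh
    ci = np.arange(nw) * w // nw
    fg = foreground[ri][:, ci]
    m = mask[ri][:, ci]
    out = background.copy()
    r, c = position
    # Clip the paste region at the background boundary.
    nh = min(nh, out.shape[0] - r)
    nw = min(nw, out.shape[1] - c)
    region = out[r:r + nh, c:c + nw]
    region[m[:nh, :nw]] = fg[:nh, :nw][m[:nh, :nw]]
    return out
```

Each of the three controls in the text maps to one argument here: background identity is the `background` array, object position is `position`, and object scale is `scale`.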
% Empirically, \schemename consistently improves clean accuracy and robustness across architectures and scales.
Across diverse architectures, training with \schemename on top of standard strong augmentations yields substantial gains on ImageNet (up to $+6$ p.p.) and fine-grained downstream tasks (up to $+7.3$ p.p.), and consistently improves robustness on well-recognized benchmarks (up to $+19$ p.p.).
\schemename's compositional controls additionally provide a framework for analyzing model behavior and quantifying biases, including background robustness, foreground focus, center bias, and size bias.
This dual role of \schemename as both a training mechanism and an evaluation tool highlights the value of explicit compositional factorization in understanding and improving image classifiers.
In future work, we aim to extend controlled composition beyond classification to multi-object and dense prediction settings, including detection, segmentation, and video recognition.
% By coupling performance gains with interpretable, controllable evaluations, \schemename offers a practical data-centric tool for advancing robust and reliable computer vision systems.
More generally, we believe that designing augmentations around explicitly controllable and interpretable generative setups is a promising direction for building robust and reliable vision systems.