AAAI Version

This commit is contained in:
Tobias Christian Nauen
2026-02-24 12:22:44 +01:00
parent 5c08f9d31a
commit ff34712155
378 changed files with 19844 additions and 4780 deletions

# Creating the ForNet Dataset
We can't provide the ForNet dataset here directly, as it's too large to be part of the appendix, and linking to it would go against double-blind review rules.
After acceptance, the dataset will be downloadable online.
For now, we provide the scripts and steps to recreate the dataset.
In general, if you are unsure what arguments each script allows, run it using the `--help` flag.
## 1. Setup paths
Fill in the paths in `experiments/general_srun` in the `Model Training Code` folder, as well as in `srun-general.sh`, `slurm-segment-imnet.sh`, and all the `sbatch-segment-...` files.
In particular, set the `--container-image`, `--container-mounts`, and `--output` paths, as well as the `NLTK_DATA` and `HF_HOME` paths in `--export`.
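As an illustration, the relevant header lines in one of the `sbatch-segment-...` files might look like the following; all paths are placeholders for your own setup:

```commandline
#SBATCH --container-image=/path/to/your/container.sqsh
#SBATCH --container-mounts=/path/to/data:/data,/path/to/code:/code
#SBATCH --output=/path/to/logs/%j.out
#SBATCH --export=ALL,NLTK_DATA=/path/to/nltk_data,HF_HOME=/path/to/hf_cache
```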
## 2. Pretrain Filtering Models
Use the `Model Training Code` to pretrain an ensemble of models to use for filtering in a later step.
Train those models on either `TinyImageNet` or `ImageNet`, depending on whether you want to create `TinyForNet` or `ForNet`.
Then fill in the relevant paths to the pretrained weights in `experiments/filter_segmentation_versions.py` (lines 96/98).
## 3. Create the dataset
### Automatically: using slurm
You may just run the `create_dataset.py` file (on a slurm head node). That file will automatically run all the necessary steps one after another.
### Manually and step-by-step
If you want to run each step of the pipeline manually, follow these steps.
For default arguments and settings, see the `create_dataset.py` script; even if you do not want to run it directly, it shows how to invoke all the other scripts.
#### 3.1 Segment Objects and Backgrounds
Use the segmentation script (`segment_imagenet.py`) to segment each of the dataset images.
Note that this script uses `datadings` for image loading, so you need to provide a `datadings` variant of your dataset.
You need to provide the root folder of the dataset.
Choose your segmentation model using the `-model` argument (LaMa or AttErase).
If you want to use the "general" prompting strategy, set the `--parent_in_promt` flag.
Use `--output`/`-o` to set the output directory.
Use `--processes` and `-id` for splitting the task up into multiple parallelizable processes.
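Putting the flags above together, a single invocation might look like this (the paths, the way the dataset root is passed, and the process split are illustrative; check `--help` for the exact argument names):

```commandline
python segment_imagenet.py /path/to/imagenet-datadings \
    -model LaMa --parent_in_promt \
    --output /path/to/segmented \
    --processes 8 -id 0
```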
#### 3.2 Filter the segmented images
In this step, you use the pretrained ensemble of models (from step 2) for filtering the segmented images.
As this step is based on the training and model code, it's in the `Model Training Code` directory.
After setting the relevant paths to the pretrained weights (see step 2), you may run the `experiments/filter_segmentation_versions.py` script using that directory as the PWD.
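For example, assuming a standard Python environment, this amounts to:

```commandline
cd "Model Training Code"
python experiments/filter_segmentation_versions.py
```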
#### 3.3 Zip the dataset
In distributed storage settings it can be useful to read from one large (uncompressed) zip file instead of millions of small individual files.
To do this, run
```commandline
zip -r -0 backgrounds_train.zip train/backgrounds > /dev/null 2>&1
```
for each of the train and val backgrounds and foregrounds.
#### 3.4 Compute the foreground size ratios
For the resizing step during recombination, the relative size of each object in each image is needed.
To compute it, run the `foreground_size_ratio.py` script on your filtered dataset.
It expects the zip files to be in the folder you provide via `-ds`.
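An invocation might look like this (the dataset path is a placeholder; check `--help` for further options):

```commandline
python foreground_size_ratio.py -ds /path/to/fornet
```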