AAAI Version
# Creating the ForNet Dataset
We cannot provide the ForNet dataset itself here, as it is too large to be part of the appendix, and using a link would violate the double-blind review rules.
After acceptance, the dataset will be downloadable online.
For now, we provide the scripts and steps to recreate the dataset.
In general, if you are unsure what arguments each script allows, run it using the `--help` flag.
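For example, to see the options of the segmentation script used later in step 3.1:

```commandline
python segment_imagenet.py --help
```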
## 1. Set up paths
Fill in the paths in `experiments/general_srun` in the `Model Training Code` folder, as well as in `srun-general.sh`, `slurm-segment-imnet.sh` and all the `sbatch-segment-...` files.
In particular, set the `--container-image`, `--container-mounts`, and `--output` paths, as well as the `NLTK_DATA` and `HF_HOME` paths in `--export`.
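As a rough sketch, the relevant settings inside one of the `srun`/`sbatch` files might look like the following (all paths are placeholders for your own setup, and `<command>` stands for the script being launched):

```commandline
srun --container-image=/path/to/image.sqsh \
     --container-mounts=/path/to/data:/data,/path/to/code:/code \
     --output=/path/to/logs/slurm-%j.out \
     --export=ALL,NLTK_DATA=/path/to/nltk_data,HF_HOME=/path/to/huggingface \
     <command>
```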
## 2. Pretrain Filtering Models
Use the `Model Training Code` to pretrain an ensemble of models to use for filtering in a later step.
Train those models on either `TinyImageNet` or `ImageNet`, depending on whether you want to create `TinyForNet` or `ForNet`.
Then fill in the relevant paths to the pretrained weights in `experiments/filter_segmentation_versions.py` (lines 96/98).
## 3. Create the dataset
### Automatically: using slurm
You can simply run the `create_dataset.py` file (on a Slurm head node). It will automatically run all the necessary steps one after another.
### Manually and step-by-step
If you want to run each step of the pipeline manually, follow these steps.
For default arguments and settings, see the `create_dataset.py` script; even if you do not want to run it directly, it shows how to run all the other scripts.
#### 3.1 Segment Objects and Backgrounds
Use the segmentation script (`segment_imagenet.py`) to segment each of the dataset images.
Note that this script uses `datadings` for image loading, so you need to provide a `datadings` variant of your dataset.
You need to provide the root folder of the dataset.
Choose your segmentation model using the `-model` argument (LaMa or AttErase).
If you want to use the *general* prompting strategy, set the `--parent_in_promt` flag.
Use `--output`/`-o` to set the output directory.
Use `--processes` and `-id` for splitting the task up into multiple parallelizable processes.
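Putting these options together, a run split across four parallel processes might look like the following sketch (the dataset and output paths are placeholders; check `--help` for the exact argument names):

```commandline
for ID in 0 1 2 3; do
    python segment_imagenet.py /path/to/imagenet-datadings -model LaMa \
        --parent_in_promt -o /path/to/segmented --processes 4 -id $ID &
done
wait
```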
#### 3.2 Filter the segmented images
In this step, you use the pretrained ensemble of models (from step 2) for filtering the segmented images.
As this step is based on the training and model code, it's in the `Model Training Code` directory.
After setting the relevant paths to the pretrained weights (see step 2), you can run the `experiments/filter_segmentation_versions.py` script with that directory as your working directory.
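A minimal sketch of this step, assuming the script needs no further arguments (check `--help`):

```commandline
cd "Model Training Code"
python experiments/filter_segmentation_versions.py
```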
#### 3.3 Zip the dataset
In distributed storage settings, it can be useful to read from one large (uncompressed) zip file instead of millions of small individual files.
To do this, run
```commandline
zip -r -0 backgrounds_train.zip train/backgrounds > /dev/null 2>&1
```
for each of the train and val backgrounds and foregrounds.
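All four archives can be created in one go with a small loop, assuming the `train/` and `val/` directories each contain `backgrounds/` and `foregrounds/` subfolders:

```commandline
for SPLIT in train val; do
    for PART in backgrounds foregrounds; do
        zip -r -0 "${PART}_${SPLIT}.zip" "$SPLIT/$PART" > /dev/null 2>&1
    done
done
```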
#### 3.4 Compute the foreground size ratios
For the resizing step during recombination, the relative size of each object in each image is needed.
To compute it, run the `foreground_size_ratio.py` script on your filtered dataset.
It expects the zip files in the folder you provide via `-ds`.
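An example invocation might look like this (the dataset path is a placeholder for the folder containing the zip files):

```commandline
python foreground_size_ratio.py -ds /path/to/filtered_fornet
```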