Visa Datamodule

Visual Anomaly (VisA) Data Module.

This module provides a PyTorch Lightning DataModule for the Visual Anomaly (VisA) dataset. If the dataset is not available locally, it will be downloaded and extracted automatically.

Example

Create a VisA datamodule:

>>> from anomalib.data import Visa
>>> datamodule = Visa(
...     root="./datasets/visa",
...     category="capsules"
... )

Notes

The dataset will be automatically downloaded and converted to the required format when first used. The directory structure after preparation will be:

datasets/
└── visa/
    ├── visa_pytorch/
    │   ├── candle/
    │   ├── capsules/
    │   └── ...
    └── VisA_20220922.tar

License:

The VisA dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). https://creativecommons.org/licenses/by-nc-sa/4.0/

Reference:

Zou, Y., Jeong, J., Pemula, L., Zhang, D., & Dabeer, O. (2022). SPot-the-Difference Self-supervised Pre-training for Anomaly Detection and Segmentation. In European Conference on Computer Vision (pp. 392-408). Springer, Cham.

class anomalib.data.datamodules.image.visa.Visa(root='./datasets/visa', category='capsules', train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.SAME_AS_TEST, val_split_ratio=0.5, seed=None)

Bases: AnomalibDataModule

VisA Datamodule.

Parameters:
  • root (Path | str) – Path to the root of the dataset. Defaults to "./datasets/visa".

  • category (str) – Category of the VisA dataset (e.g. "candle"). Defaults to "capsules".

  • train_batch_size (int, optional) – Training batch size. Defaults to 32.

  • eval_batch_size (int, optional) – Batch size for validation and testing. Defaults to 32.

  • num_workers (int, optional) – Number of workers for data loading. Defaults to 8.

  • train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.

  • val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.

  • test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.

  • augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided. Defaults to None.

  • test_split_mode (TestSplitMode | str) – Method to create test set. Defaults to TestSplitMode.FROM_DIR.

  • test_split_ratio (float) – Fraction of data to use for testing. Defaults to 0.2.

  • val_split_mode (ValSplitMode | str) – Method to create validation set. Defaults to ValSplitMode.SAME_AS_TEST.

  • val_split_ratio (float) – Fraction of data to use for validation. Defaults to 0.5.

  • seed (int | None, optional) – Random seed for reproducibility. Defaults to None.
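As an illustration of the parameters above, the following sketch collects a few overrides in a plain dictionary and passes them to Visa. The helper name build_visa_datamodule and the chosen values are assumptions made for this example; only the keyword names come from the signature above. The import is deferred so the dictionary can be inspected without anomalib installed.

```python
from typing import Any

# Illustrative overrides; the keys are parameter names from the signature above,
# the values are arbitrary choices for this example.
VISA_OVERRIDES: dict[str, Any] = {
    "root": "./datasets/visa",
    "category": "candle",
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "num_workers": 4,
    "seed": 42,
}


def build_visa_datamodule(**extra: Any):
    """Hypothetical helper: create a Visa datamodule from the overrides."""
    # Imported lazily so this module can be loaded without anomalib installed.
    from anomalib.data import Visa

    # Later keys win, so keyword arguments passed by the caller take precedence.
    return Visa(**{**VISA_OVERRIDES, **extra})
```

Calling build_visa_datamodule(category="capsules") would override the category; any parameter not listed keeps the default documented above.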

apply_cls1_split()

Apply the 1-class subset splitting using the fixed split in the CSV file.

Adapted from amazon-science/spot-diff.

Return type:

None
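To make the idea of a fixed, CSV-driven split concrete, here is a minimal sketch that turns CSV rows into (source, destination) path pairs without touching the file system. It is not the library's implementation: the column names (object, split, label, image, mask), the label values, the sample paths, and the helper name plan_cls1_split are all assumptions for illustration. The destination folders follow the train/test/ground_truth layout documented for prepare_data().

```python
import csv
import io
from pathlib import PurePosixPath

# Assumed CSV layout (not confirmed by this page): object,split,label,image,mask.
# The rows and paths below are made up for the example.
SAMPLE_CSV = """object,split,label,image,mask
candle,train,normal,candle/Data/Images/Normal/0000.JPG,
candle,test,normal,candle/Data/Images/Normal/0001.JPG,
candle,test,anomaly,candle/Data/Images/Anomaly/000.JPG,candle/Data/Masks/Anomaly/000.png
"""


def plan_cls1_split(csv_text: str, split_root: str = "visa_pytorch") -> list[tuple[str, str]]:
    """Return (source, destination) pairs implied by a fixed split CSV."""
    root = PurePosixPath(split_root)
    plan: list[tuple[str, str]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Map the assumed label values onto the good/bad folder names.
        folder = "good" if row["label"] == "normal" else "bad"
        image = PurePosixPath(row["image"])
        destination = root / row["object"] / row["split"] / folder / image.name
        plan.append((str(image), str(destination)))
        # Only anomalous test images are given a ground-truth mask entry.
        if row["split"] == "test" and folder == "bad" and row["mask"]:
            mask = PurePosixPath(row["mask"])
            mask_destination = root / row["object"] / "ground_truth" / folder / mask.name
            plan.append((str(mask), str(mask_destination)))
    return plan


plan = plan_cls1_split(SAMPLE_CSV)
```

Separating the planning step from the actual copy makes the split easy to inspect before any files are moved.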

prepare_data()

Download and prepare the dataset if not available.

This method checks whether the dataset exists and is properly formatted. If not, it downloads and prepares the data. The method returns None and handles three cases:

  1. If the processed dataset exists (visa_pytorch/{category}), do nothing

  2. If the raw dataset exists but isn’t processed, apply the train/test split

  3. If the dataset doesn’t exist, download, extract, and process it
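The three cases above can be sketched as a small decision function. This is a simplified illustration, not the library's code: the helper name choose_prepare_action is hypothetical, and the assumption that raw data lives at root/<category> is not confirmed by this page. Only the visa_pytorch/{category} location comes from the list above.

```python
import tempfile
from pathlib import Path


def choose_prepare_action(root: Path, category: str) -> str:
    """Return which of the three documented cases applies (hypothetical helper)."""
    # Case 1: the processed dataset is already in place.
    if (root / "visa_pytorch" / category).is_dir():
        return "nothing"
    # Case 2: raw data is present; assumed here to live at root/<category>.
    if (root / category).is_dir():
        return "split"
    # Case 3: nothing usable on disk yet.
    return "download"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = choose_prepare_action(root, "candle")    # empty root directory
    (root / "candle").mkdir()
    second = choose_prepare_action(root, "candle")   # raw data only
    (root / "visa_pytorch" / "candle").mkdir(parents=True)
    third = choose_prepare_action(root, "candle")    # processed data present
```

Checking the processed location first keeps repeated calls cheap, since a fully prepared dataset short-circuits the other two cases.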

The final directory structure will be:

datasets/
└── visa/
    ├── visa_pytorch/
    │   ├── candle/
    │   │   ├── train/
    │   │   │   └── good/
    │   │   ├── test/
    │   │   │   ├── good/
    │   │   │   └── bad/
    │   │   └── ground_truth/
    │   │       └── bad/
    │   └── ...
    └── VisA_20220922.tar
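
A quick way to confirm that a category was prepared as shown in the tree above is to check for the expected subdirectories. The sketch below is a hypothetical helper (missing_subdirs is not part of anomalib) that reports which of the documented folders are absent; it builds a throwaway directory purely to demonstrate the check.

```python
import tempfile
from pathlib import Path

# Subdirectories shown in the tree above, relative to visa_pytorch/<category>.
EXPECTED_SUBDIRS = ("train/good", "test/good", "test/bad", "ground_truth/bad")


def missing_subdirs(root: Path, category: str) -> list[str]:
    """List the documented subdirectories that do not exist for a category."""
    base = root / "visa_pytorch" / category
    return [sub for sub in EXPECTED_SUBDIRS if not (base / sub).is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create every expected folder except the ground-truth masks.
    for sub in EXPECTED_SUBDIRS[:3]:
        (root / "visa_pytorch" / "candle" / sub).mkdir(parents=True)
    missing = missing_subdirs(root, "candle")
```

An empty list means the layout matches the documented structure; here the list names the one folder that was deliberately left out.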

See also

../../datasets/image/visa - VisA Dataset