VAD Datamodule

VAD Data Module.

This module provides a PyTorch Lightning DataModule for the VAD dataset. If the dataset is not available locally, it will be downloaded and extracted automatically.

Example

Create a VAD datamodule:

>>> from anomalib.data import VAD
>>> datamodule = VAD(
...     root="./datasets/VAD",
...     category="vad"
... )

Notes

The dataset will be automatically downloaded and converted to the required format when first used. The directory structure after preparation will be:

datasets/
└── VAD/
    └── vad/

License: The VAD dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). https://creativecommons.org/licenses/by-nc-sa/4.0/
Reference: Aimira Baitieva, David Hurych, Victor Besnier, Olivier Bernard: Supervised Anomaly Detection for Complex Industrial Images. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 17754-17762. DOI: 10.1109/CVPR52733.2024.01681.

class anomalib.data.datamodules.image.vad.VAD(root='./datasets/VAD', category='vad', train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.SAME_AS_TEST, val_split_ratio=0.5, seed=None)

Bases: AnomalibDataModule

VAD Datamodule.

Parameters:

root (Path | str | None) – Path to the root of the dataset. Defaults to "./datasets/VAD".
category (str) – Category of the VAD dataset. Defaults to "vad".
train_batch_size (int) – Training batch size. Defaults to 32.
eval_batch_size (int) – Batch size used for evaluation (validation and test). Defaults to 32.
num_workers (int) – Number of workers. Defaults to 8.
train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.
val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.
test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.
augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided. Defaults to None.
test_split_mode (TestSplitMode | str) – Method to create test set. Defaults to TestSplitMode.FROM_DIR.
test_split_ratio (float) – Fraction of data to use for testing. Defaults to 0.2.
val_split_mode (ValSplitMode | str) – Method to create validation set. Defaults to ValSplitMode.SAME_AS_TEST.
val_split_ratio (float) – Fraction of data to use for validation. Defaults to 0.5.
seed (int | None) – Seed for reproducibility. Defaults to None.
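
The relationship between augmentations and the stage-specific arguments can be pictured as a simple fallback. The sketch below is not anomalib's implementation: resolve_augmentations is a hypothetical helper, and plain strings stand in for Transform objects so the example has no dependencies. It assumes that a stage-specific value, when given, takes precedence over the general one.

```python
from typing import Dict, Optional


def resolve_augmentations(
    train: Optional[str] = None,
    val: Optional[str] = None,
    test: Optional[str] = None,
    general: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: pick the augmentation used for each stage.

    A stage-specific value wins; otherwise the general one is used.
    Strings stand in for Transform objects (an assumption of this sketch).
    """
    return {
        "train": train if train is not None else general,
        "val": val if val is not None else general,
        "test": test if test is not None else general,
    }


# Only the training stage has its own augmentation; the rest fall back.
resolved = resolve_augmentations(train="random_flip", general="color_jitter")
print(resolved)
```

With every argument left at None, each stage resolves to None, which matches the documented defaults.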

Example

Create VAD datamodule with default settings:

>>> datamodule = VAD()
>>> datamodule.setup()
>>> i, data = next(enumerate(datamodule.train_dataloader()))
>>> data.keys()
dict_keys(['image_path', 'label', 'image', 'mask_path', 'mask'])

>>> data["image"].shape
torch.Size([32, 3, 256, 256])

Create validation set from test data:

>>> from anomalib.data.utils import ValSplitMode
>>> datamodule = VAD(
...     val_split_mode=ValSplitMode.FROM_TEST,
...     val_split_ratio=0.1
... )

Create synthetic validation set:

>>> datamodule = VAD(
...     val_split_mode=ValSplitMode.SYNTHETIC,
...     val_split_ratio=0.2
... )
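
To make val_split_ratio concrete, here is a minimal sketch of a seeded, ratio-based split in plain Python. It is an illustration only, not anomalib's splitting code: split_by_ratio is a hypothetical helper, the rounding rule is an assumption, and the real datamodule may handle details differently (for example, splitting normal and anomalous samples separately).

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_by_ratio(
    items: Sequence[str], ratio: float, seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: move `ratio` of the items into a held-out subset.

    Returns (remaining, held_out). A fixed seed makes the split repeatable.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_held_out = int(round(len(shuffled) * ratio))  # assumed rounding rule
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Carve a validation subset out of a made-up list of test files.
test_files = [f"img_{i:03d}.png" for i in range(100)]
test_rest, val_files = split_by_ratio(test_files, ratio=0.1, seed=42)
print(len(test_rest), len(val_files))
```

Passing the same seed twice yields the same partition, which is the role the seed parameter plays for reproducibility.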

prepare_data()

Download the dataset if not available.

This method checks if the specified dataset is available in the file system. If not, it downloads and extracts the dataset into the appropriate directory.

Return type: None

Example

Assume the dataset is not available on the file system:

>>> datamodule = VAD(
...     root="./datasets/VAD",
...     category="vad"
... )
>>> datamodule.prepare_data()

Directory structure after download:

datasets/
└── VAD/
    └── vad/
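
The check-then-download behaviour described for prepare_data() follows a common pattern, sketched here with the standard library only. ensure_dataset and the fetch callback are hypothetical names; the download URL, archive format, and any integrity checks used by anomalib are not given on this page and are not reproduced here.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def ensure_dataset(root: Path, category: str, fetch: Callable[[Path], None]) -> Path:
    """Hypothetical helper: return the category directory, fetching it if absent."""
    category_dir = root / category
    if category_dir.is_dir():
        # Dataset already present: nothing to download.
        return category_dir
    root.mkdir(parents=True, exist_ok=True)
    fetch(root)  # expected to create root/category
    if not category_dir.is_dir():
        raise FileNotFoundError(f"{category_dir} missing after fetch")
    return category_dir


calls: List[Path] = []


def fake_fetch(root: Path) -> None:
    """Stand-in for download and extraction; it only creates the directory."""
    calls.append(root)
    (root / "vad").mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "datasets" / "VAD"
    first = ensure_dataset(root, "vad", fake_fetch)
    second = ensure_dataset(root, "vad", fake_fetch)  # already present, no fetch
    fetch_count = len(calls)
    same_dir = first == second

print(fetch_count, same_dir)
```

Because the existence check comes first, calling the helper repeatedly is cheap, which is why a method like prepare_data() can safely run on every invocation.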
