MVTecAD Datamodule
MVTec AD Data Module.
This module provides a PyTorch Lightning DataModule for the MVTec AD dataset. If the dataset is not available locally, it will be downloaded and extracted automatically.
Example
Create a MVTec AD datamodule:
>>> from anomalib.data import MVTecAD
>>> datamodule = MVTecAD(
...     root="./datasets/MVTecAD",
...     category="bottle"
... )
Notes
The dataset will be automatically downloaded and converted to the required format when first used. The directory structure after preparation will be:
datasets/
└── MVTecAD/
    ├── bottle/
    ├── cable/
    └── ...
- License:
MVTec AD dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). https://creativecommons.org/licenses/by-nc-sa/4.0/
- Reference:
Paul Bergmann, Kilian Batzner, Michael Fauser, David Sattlegger, Carsten Steger: The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: International Journal of Computer Vision 129(4):1038-1059, 2021, DOI: 10.1007/s11263-020-01400-4.
Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger: MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9584-9592, 2019, DOI: 10.1109/CVPR.2019.00982.
- class anomalib.data.datamodules.image.mvtecad.MVTec(*args, **kwargs)
Bases: MVTecAD
MVTec datamodule class (Deprecated).
This class is deprecated and will be removed in a future version. Please use MVTecAD instead.
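A deprecated alias of this kind is commonly written as a thin subclass that emits a warning and then defers to the new class. The sketch below shows that general pattern only. The names NewName and OldName are hypothetical stand-ins, and this is not necessarily how anomalib implements MVTec.

```python
import warnings


class NewName:
    """Hypothetical stand-in for the current class."""

    def __init__(self, category="bottle"):
        self.category = category


class OldName(NewName):
    """Hypothetical deprecated alias kept for backward compatibility."""

    def __init__(self, *args, **kwargs):
        # Warn the caller, then behave exactly like the new class.
        warnings.warn(
            "OldName is deprecated; use NewName instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Capture the warning so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = OldName(category="cable")
```

Existing code keeps working with the old name, while the warning points users to the replacement before the alias is removed.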
- class anomalib.data.datamodules.image.mvtecad.MVTecAD(root='./datasets/MVTecAD', category='bottle', train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.SAME_AS_TEST, val_split_ratio=0.5, seed=None)
Bases: AnomalibDataModule
MVTec AD Datamodule.
- Parameters:
root (Path | str) – Path to the root of the dataset. Defaults to "./datasets/MVTecAD".
category (str) – Category of the MVTec AD dataset (e.g. "bottle" or "cable"). Defaults to "bottle".
train_batch_size (int, optional) – Training batch size. Defaults to 32.
eval_batch_size (int, optional) – Test batch size. Defaults to 32.
num_workers (int, optional) – Number of workers. Defaults to 8.
train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.
val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.
test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.
augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided. Defaults to None.
test_split_mode (TestSplitMode) – Method to create test set. Defaults to TestSplitMode.FROM_DIR.
test_split_ratio (float) – Fraction of data to use for testing. Defaults to 0.2.
val_split_mode (ValSplitMode) – Method to create validation set. Defaults to ValSplitMode.SAME_AS_TEST.
val_split_ratio (float) – Fraction of data to use for validation. Defaults to 0.5.
seed (int | None, optional) – Seed for reproducibility. Defaults to None.
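The relationship between augmentations and the stage-specific arguments can be pictured as a per-stage fallback. The following is a minimal sketch of that idea, assuming a simple "use the stage-specific value if given, otherwise the general one" rule. The helper resolve_augmentations and the toy transforms are hypothetical and not part of anomalib.

```python
from typing import Callable, Optional

# Stand-in type for this sketch only; real transforms would be torchvision-style objects.
Transform = Callable[[object], object]


def resolve_augmentations(
    train: Optional[Transform] = None,
    val: Optional[Transform] = None,
    test: Optional[Transform] = None,
    general: Optional[Transform] = None,
) -> dict:
    """Pick the stage-specific transform if given, else fall back to the general one."""
    return {
        "train": train if train is not None else general,
        "val": val if val is not None else general,
        "test": test if test is not None else general,
    }


def flip(x):
    # Toy "augmentation": reverse a sequence.
    return x[::-1]


def identity(x):
    # Toy "augmentation": leave the input unchanged.
    return x


# Only the training stage gets its own transform; val and test use the general one.
resolved = resolve_augmentations(train=flip, general=identity)
```

Under this assumption, passing only augmentations applies the same transform to every stage, while a stage-specific argument overrides it for that stage alone.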
Example
Create MVTec AD datamodule with default settings:
>>> datamodule = MVTecAD()
>>> datamodule.setup()
>>> i, data = next(enumerate(datamodule.train_dataloader()))
>>> data.keys()
dict_keys(['image_path', 'label', 'image', 'mask_path', 'mask'])
>>> data["image"].shape
torch.Size([32, 3, 256, 256])
Change the category:
>>> datamodule = MVTecAD(category="cable")
Create validation set from test data:
>>> from anomalib.data.utils import ValSplitMode
>>> datamodule = MVTecAD(
...     val_split_mode=ValSplitMode.FROM_TEST,
...     val_split_ratio=0.1
... )
Create synthetic validation set:
>>> datamodule = MVTecAD(
...     val_split_mode=ValSplitMode.SYNTHETIC,
...     val_split_ratio=0.2
... )
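To make the effect of a split ratio and seed concrete, here is a minimal sketch of a seeded random split over a list of file names. It assumes a plain shuffle-and-slice strategy. The helper split_by_ratio and the file names are hypothetical, and anomalib's own splitting logic may differ (for example, by taking labels into account).

```python
import random


def split_by_ratio(items, ratio, seed=None):
    """Return (held_out, remaining), with roughly `ratio` of the items held out."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_held_out = int(round(len(shuffled) * ratio))
    return shuffled[:n_held_out], shuffled[n_held_out:]


# Hypothetical test set of 100 images; hold out 10% of it for validation.
test_samples = [f"img_{i:03d}.png" for i in range(100)]
val_samples, remaining_test = split_by_ratio(test_samples, ratio=0.1, seed=42)
```

With a fixed seed the same samples land in the held-out set on every run, which is the purpose of the seed parameter.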
- prepare_data()
Download the dataset if not available.
This method checks if the specified dataset is available in the file system. If not, it downloads and extracts the dataset into the appropriate directory.
- Return type:
None
Example
Assume the dataset is not available on the file system:
>>> datamodule = MVTecAD(
...     root="./datasets/MVTecAD",
...     category="bottle"
... )
>>> datamodule.prepare_data()
Directory structure after download:
datasets/
└── MVTecAD/
    ├── bottle/
    ├── cable/
    └── ...
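The "check, then download only if missing" behavior described for prepare_data() can be sketched as follows. This is an illustration of the pattern under stated assumptions, not anomalib's implementation: prepare_dataset and fake_download are hypothetical names, and the fake download only creates empty category directories in a temporary location.

```python
import tempfile
from pathlib import Path


def prepare_dataset(root, category, download_fn):
    """Call download_fn(root) only when root/category does not exist yet.

    Returns True if a download was triggered, False otherwise.
    """
    root = Path(root)
    if (root / category).is_dir():
        return False  # already available, nothing to do
    download_fn(root)
    return True


def fake_download(root):
    # Stand-in for a real download-and-extract step.
    for name in ("bottle", "cable"):
        (root / name).mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "datasets" / "MVTecAD"
    first_call = prepare_dataset(root, "bottle", fake_download)   # triggers the download
    second_call = prepare_dataset(root, "bottle", fake_download)  # finds the data, skips it
```

Because the check runs before any download, calling the method repeatedly is safe and does no extra work once the data is in place.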
See also
../../datasets/image/mvtecad - MVTec AD Dataset