MVTecAD2 Datamodule

MVTec AD 2 Lightning Data Module.

This module implements a PyTorch Lightning DataModule for the MVTec AD 2 dataset. The module handles downloading, loading, and preprocessing of the dataset for training and evaluation.

The dataset provides three different test sets:
  • Public test set (test_public/): Contains both normal and anomalous samples, with ground-truth masks, for local testing and initial performance estimation

  • Private test set (test_private/): Official unseen test set without ground truth for entering the leaderboard

  • Private mixed test set (test_private_mixed/): Contains unseen test samples captured under seen and unseen lighting conditions (mixed randomly) without ground truth

The public test set is meant for local evaluation, while the private test sets are the official test sets for entering the leaderboard on the evaluation server (https://benchmark.mvtec.com/).
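As an illustration only, the mapping from a test_type value to its directory can be sketched in plain Python. The on-disk layout `<root>/<category>/test_public` and the helper names `split_dir`, `can_score_locally`, and `SPLIT_DIRS` are assumptions made for this sketch; they are not part of the anomalib API.

```python
from __future__ import annotations

from pathlib import Path

# Assumed mapping: test_type values -> test-set directories listed above.
SPLIT_DIRS = {
    "public": "test_public",
    "private": "test_private",
    "private_mixed": "test_private_mixed",
}

# Only the public test set ships ground-truth masks.
SPLITS_WITH_GROUND_TRUTH = {"public"}


def split_dir(root: str | Path, category: str, split: str) -> Path:
    """Return the assumed location of one category's test set (hypothetical helper)."""
    if split not in SPLIT_DIRS:
        raise ValueError(f"unknown test split {split!r}; expected one of {sorted(SPLIT_DIRS)}")
    return Path(root) / category / SPLIT_DIRS[split]


def can_score_locally(split: str) -> bool:
    """True when ground truth is available, so metrics can be computed without the server."""
    return split in SPLITS_WITH_GROUND_TRUTH
```

Scores on the two private sets can only be obtained by submitting predictions to the evaluation server, which is why the sketch keeps ground-truth availability as a separate check.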

License:

The MVTec AD 2 dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/

Reference:

Lars Heckler-Kram, Jan-Hendrik Neudeck, Ulla Scheler, Rebecca König, Carsten Steger: The MVTec AD 2 Dataset: Advanced Scenarios for Unsupervised Anomaly Detection. arXiv preprint, 2024 (to appear).

class anomalib.data.datamodules.image.mvtecad2.MVTecAD2(root='./datasets/MVTec_AD_2', category='sheet_metal', train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_type=TestType.PUBLIC, seed=None)

Bases: AnomalibDataModule

MVTec AD 2 Lightning Data Module.

Parameters:
  • root (str | Path) – Path to the dataset root directory. Defaults to "./datasets/MVTec_AD_2".

  • category (str) – Name of the MVTec AD 2 category to load. Defaults to "sheet_metal".

  • train_batch_size (int, optional) – Training batch size. Defaults to 32.

  • eval_batch_size (int, optional) – Validation and test batch size. Defaults to 32.

  • num_workers (int, optional) – Number of workers for data loading. Defaults to 8.

  • train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.

  • val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.

  • test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.

  • augmentations (Transform | None) – General augmentations to apply when stage-specific augmentations are not provided. Defaults to None.

  • test_type (str | TestType) – Type of test set to use:

    • "public": Test set with ground truth for local evaluation and initial performance estimation

    • "private": Official test set without ground truth for leaderboard submission

    • "private_mixed": Official test set with mixed lighting conditions (seen and unseen, randomly mixed) for leaderboard submission

    Defaults to TestType.PUBLIC.

  • seed (int | None, optional) – Random seed for reproducibility. Defaults to None.
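The fallback between stage-specific and general augmentations described above can be sketched as follows. `resolve_augmentations` and `resolve_all` are hypothetical helpers, and any callable stands in for a Transform; anomalib's internal resolution logic may differ in detail.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical alias: any callable transform stands in for a real Transform here.
Transform = Callable[[object], object]


def resolve_augmentations(
    stage_specific: Optional[Transform],
    general: Optional[Transform],
) -> Optional[Transform]:
    """Prefer the stage-specific transform; otherwise fall back to `augmentations`."""
    return stage_specific if stage_specific is not None else general


def resolve_all(
    train: Optional[Transform] = None,
    val: Optional[Transform] = None,
    test: Optional[Transform] = None,
    augmentations: Optional[Transform] = None,
) -> dict[str, Optional[Transform]]:
    """Return the transform each stage would use under the assumed fallback rule."""
    return {
        "train": resolve_augmentations(train, augmentations),
        "val": resolve_augmentations(val, augmentations),
        "test": resolve_augmentations(test, augmentations),
    }
```

Under this rule, passing only `augmentations` applies it to every stage, while a stage-specific argument overrides it for that stage alone.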

Example

>>> from anomalib.data import MVTecAD2
>>> datamodule = MVTecAD2(
...     root="./datasets/MVTec_AD_2",
...     category="sheet_metal",
...     train_batch_size=32,
...     eval_batch_size=32,
...     num_workers=8,
... )

To use the private test set:

>>> datamodule = MVTecAD2(
...     root="./datasets/MVTec_AD_2",
...     category="sheet_metal",
...     test_type="private",
... )

Access different test sets:

>>> datamodule.setup()
>>> public_loader = datamodule.test_dataloader()  # returns loader based on test_type
>>> private_loader = datamodule.test_dataloader(test_type="private")
>>> mixed_loader = datamodule.test_dataloader(test_type="private_mixed")

prepare_data()

Download the dataset if not available.

This method checks if the specified dataset is available in the file system. If not, it downloads and extracts the dataset into the appropriate directory.

Return type:

None

Example

Assume the dataset is not available on the file system:

>>> datamodule = MVTecAD2(
...     root="./datasets/MVTecAD2",
...     category="can"
... )
>>> datamodule.prepare_data()

Directory structure after download:

datasets/
└── MVTecAD2/
    ├── can/
    ├── fabric/
    └── ...
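The check-then-download behaviour of prepare_data() can be sketched as below. `prepare_category` and its `download_and_extract` callback are hypothetical; the real download URL, archive format, and extraction code are not documented here and are deliberately left out.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def prepare_category(
    root: str | Path,
    category: str,
    download_and_extract: Callable[[Path], None],
) -> bool:
    """Download only when `root/category` is missing; return True if a download ran.

    `download_and_extract` is a hypothetical callback standing in for the
    real download and extraction step, which is not reproduced here.
    """
    root = Path(root)
    if (root / category).is_dir():
        # Dataset already present on disk: nothing to do.
        return False
    root.mkdir(parents=True, exist_ok=True)
    download_and_extract(root)
    return True
```

Because the check is a simple directory test, calling it repeatedly is cheap and downloads at most once, which suits Lightning's convention of calling prepare_data() before setup().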

test_dataloader(test_type=None)

Get test dataloader for the specified test type.

Parameters:

test_type (str | TestType | None, optional) – Type of test set to use:

  • "public": Test set with ground truth for local evaluation

  • "private": Official test set without ground truth for leaderboard submission

  • "private_mixed": Official test set with mixed lighting conditions

If None, uses the test_type specified in __init__. Defaults to None.

Example

>>> datamodule.setup()
>>> public_loader = datamodule.test_dataloader()  # returns loader based on test_type
>>> private_loader = datamodule.test_dataloader(test_type="private")
>>> mixed_loader = datamodule.test_dataloader(test_type="private_mixed")

Returns:

Test dataloader for the specified test type.

Return type:

EVAL_DATALOADERS
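How the test_type argument might be resolved, with None falling back to the constructor value and strings normalised to an enum, can be sketched as follows. `EvalSplit` is a hypothetical stand-in for TestType whose member values are assumed from the strings documented above; `resolve_test_type` is likewise illustrative and not anomalib's implementation.

```python
from __future__ import annotations

from enum import Enum


class EvalSplit(str, Enum):
    """Hypothetical mirror of TestType; member values assumed from the docs."""

    PUBLIC = "public"
    PRIVATE = "private"
    PRIVATE_MIXED = "private_mixed"


def resolve_test_type(
    requested: str | EvalSplit | None,
    default: str | EvalSplit = EvalSplit.PUBLIC,
) -> EvalSplit:
    """Pick the split a call like `test_dataloader(test_type=...)` would use.

    None falls back to the value given at construction; strings are looked up
    by value, and unknown names raise ValueError.
    """
    chosen = default if requested is None else requested
    return EvalSplit(chosen)
```

Mixing `str` into the enum lets plain strings such as "private" and enum members be accepted interchangeably, matching the `str | TestType` parameter type.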

See also

../../datasets/image/mvtecad2 - MVTec AD 2 Dataset