MPDD

MPDD Data Module.

This module provides a PyTorch Lightning DataModule for the MPDD dataset.

MPDD is a dataset aimed at benchmarking visual defect detection methods in industrial metal parts manufacturing. It contains 6 categories of industrial objects with both normal and anomalous samples. Each category includes RGB images and pixel-level ground truth masks for anomaly segmentation.

Example

Create an MPDD datamodule:

>>> from anomalib.data import MPDD
>>> datamodule = MPDD(
...     root="./datasets/MPDD",
...     category="bracket_black"
... )

Notes

The dataset should be downloaded manually from OneDrive and placed in the appropriate directory. See DOWNLOAD_INSTRUCTIONS for detailed steps.

License:

The MPDD dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/

Reference:

S. Jezek, M. Jonak, R. Burget, P. Dvorak and M. Skotak (2021). Deep learning-based defect detection of metal parts: evaluating current methods in complex conditions. 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pp. 66-71. DOI: 10.1109/ICUMT54235.2021.9631567.

class anomalib.data.datamodules.image.mpdd.MPDD(root='./datasets/MPDD', category='bracket_black', train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.SAME_AS_TEST, val_split_ratio=0.5, seed=None)

Bases: AnomalibDataModule

MPDD Datamodule.

Parameters:
  • root (Path | str | None) – Path to the root of the dataset. Defaults to "./datasets/MPDD".

  • category (str) – Category of the MPDD dataset (e.g. "bracket_black" or "bracket_brown"). Defaults to "bracket_black".

  • train_batch_size (int) – Training batch size. Defaults to 32.

  • eval_batch_size (int) – Batch size used for evaluation (validation and test). Defaults to 32.

  • num_workers (int) – Number of workers. Defaults to 8.

  • train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.

  • val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.

  • test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.

  • augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided. Defaults to None.

  • test_split_mode (TestSplitMode | str) – Method to create test set. Defaults to TestSplitMode.FROM_DIR.

  • test_split_ratio (float) – Fraction of data to use for testing. Defaults to 0.2.

  • val_split_mode (ValSplitMode | str) – Method to create validation set. Defaults to ValSplitMode.SAME_AS_TEST.

  • val_split_ratio (float) – Fraction of data to use for validation. Defaults to 0.5.

  • seed (int | None) – Seed for reproducibility. Defaults to None.
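The fallback described for augmentations can be pictured with a small sketch. This is not anomalib's implementation: resolve_augmentations, flip and identity are hypothetical names introduced only for illustration, and the Transform alias stands in for the real transform type. It assumes the rule is simply "use the stage-specific transform if one was given, otherwise the general one".

```python
from typing import Callable, Optional

# Stand-in type for illustration: any callable that maps an image to an image.
Transform = Callable[[object], object]


def resolve_augmentations(
    stage_specific: Optional[Transform],
    general: Optional[Transform],
) -> Optional[Transform]:
    """Hypothetical helper: prefer the stage-specific transform, else the general one."""
    return stage_specific if stage_specific is not None else general


def flip(image):
    """Placeholder training-only augmentation (reverses a sequence)."""
    return image[::-1]


def identity(image):
    """Placeholder general augmentation (returns the input unchanged)."""
    return image


# Training has its own transform, so it takes precedence over the general one.
train_aug = resolve_augmentations(stage_specific=flip, general=identity)
# Validation has none, so it falls back to the general transform.
val_aug = resolve_augmentations(stage_specific=None, general=identity)
# With neither provided, no augmentation is applied.
no_aug = resolve_augmentations(stage_specific=None, general=None)
```

Under this reading, passing only augmentations applies the same transform to every stage, while a stage-specific argument overrides it for that stage alone.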

Example

Create an MPDD datamodule with default settings:

>>> from anomalib.data import MPDD
>>> datamodule = MPDD()
>>> datamodule.setup()
>>> i, data = next(enumerate(datamodule.train_dataloader()))
>>> data.keys()
dict_keys(['image_path', 'label', 'image', 'mask_path', 'mask'])

>>> data["image"].shape
torch.Size([32, 3, 256, 256])

Change the category:

>>> datamodule = MPDD(category="bracket_brown")

Create validation set from test data:

>>> from anomalib.data.utils import ValSplitMode
>>> datamodule = MPDD(
...     val_split_mode=ValSplitMode.FROM_TEST,
...     val_split_ratio=0.1
... )
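To make the meaning of a split ratio and a seed concrete, here is a minimal sketch in plain Python. It is an illustration only, not how anomalib performs the split (the library may, for example, treat normal and anomalous samples separately). split_by_ratio and the file names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_by_ratio(
    samples: Sequence[str],
    ratio: float,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: hold out a `ratio` fraction of `samples`.

    Returns (remaining, held_out). A fixed seed yields the same split every time.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)  # local generator, does not touch global state
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_held_out = round(len(shuffled) * ratio)
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Made-up test set of 50 images; with ratio 0.1, five of them become validation data.
test_files = [f"img_{i:03d}.png" for i in range(50)]
remaining_test, validation = split_by_ratio(test_files, ratio=0.1, seed=42)
print(len(remaining_test), len(validation))
```

Passing the same seed again reproduces the same partition, which is the purpose of the seed parameter; leaving it as None gives a different split on each run.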

Create synthetic validation set:

>>> datamodule = MPDD(
...     val_split_mode=ValSplitMode.SYNTHETIC,
...     val_split_ratio=0.2
... )

prepare_data()

Verify that the dataset is available and provide download instructions.

This method checks if the dataset exists in the root directory. If not, it provides instructions for downloading from OneDrive.

The MPDD dataset is available at: https://vutbr-my.sharepoint.com/:f:/g/personal/xjezek16_vutbr_cz/EhHS_ufVigxDo3MC6Lweau0BVMuoCmhMZj6ddamiQ7-FnA?e=oHKCxI

Return type:

None
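The check-then-instruct behaviour described for prepare_data() could look roughly like the sketch below. It is not the library's code: missing_dataset_message is a hypothetical helper, and the assumption that each category lives in its own subfolder of root is mine. Only the URL is taken from the text above.

```python
from pathlib import Path
from typing import Optional

# Download location quoted in the documentation above.
MPDD_URL = (
    "https://vutbr-my.sharepoint.com/:f:/g/personal/xjezek16_vutbr_cz/"
    "EhHS_ufVigxDo3MC6Lweau0BVMuoCmhMZj6ddamiQ7-FnA?e=oHKCxI"
)


def missing_dataset_message(root: Path, category: str) -> Optional[str]:
    """Hypothetical helper: return instructions if the data is absent, else None.

    Assumes a layout of <root>/<category>/, which is an assumption of this sketch.
    """
    category_dir = root / category
    if category_dir.is_dir():
        return None  # dataset found, nothing to report
    return (
        f"MPDD category '{category}' was not found under {root}.\n"
        f"Download the dataset manually from:\n  {MPDD_URL}\n"
        f"and extract it so that {category_dir} exists."
    )


# Print instructions only when the expected folder is missing.
message = missing_dataset_message(Path("./datasets/MPDD"), "bracket_black")
if message is not None:
    print(message)
```

Because the dataset is hosted on OneDrive and must be fetched by hand, a check of this kind can only report what is missing; it cannot download anything itself.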

anomalib.data.datamodules.image.mpdd.get_download_instructions(root_path)

Get download instructions for the MPDD dataset.

Parameters:

root_path (Path) – Path where the dataset should be downloaded.

Returns:

Formatted download instructions.

Return type:

str