anomalib.data.mvtec

MVTec AD Dataset (CC BY-NC-SA 4.0).

Description:
This script contains the PyTorch Dataset, Dataloader and PyTorch Lightning DataModule for the MVTec AD dataset.

If the dataset is not on the file system, the script downloads and extracts the dataset and creates PyTorch data objects.

License:

MVTec AD dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/

Reference:
  • Paul Bergmann, Kilian Batzner, Michael Fauser, David Sattlegger, Carsten Steger: The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: International Journal of Computer Vision 129(4):1038-1059, 2021, DOI: 10.1007/s11263-020-01400-4.

  • Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger: MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9584-9592, 2019, DOI: 10.1109/CVPR.2019.00982.

Module Contents

Classes

MVTecDataset

MVTec AD PyTorch Dataset.

MVTec

MVTec AD Lightning Data Module.

Functions

make_mvtec_dataset(path: pathlib.Path, split: Optional[str] = None, split_ratio: float = 0.1, seed: int = 0, create_validation_set: bool = False) → pandas.core.frame.DataFrame

Create MVTec AD samples by parsing the MVTec AD data file structure.

Attributes

anomalib.data.mvtec.logger

anomalib.data.mvtec.make_mvtec_dataset(path: pathlib.Path, split: Optional[str] = None, split_ratio: float = 0.1, seed: int = 0, create_validation_set: bool = False) → pandas.core.frame.DataFrame

Create MVTec AD samples by parsing the MVTec AD data file structure.

The files are expected to follow the structure:

path/to/dataset/split/category/image_filename.png
path/to/dataset/ground_truth/category/mask_filename.png

This function creates a dataframe to store the parsed information based on the following format:

|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
|   | path          | split | label  | image_path   | mask_path                             | label_index |
|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
| 0 | datasets/name | test  | defect | filename.png | ground_truth/defect/filename_mask.png | 1           |
|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
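As a rough illustration of the parsing described above, the sketch below walks a directory laid out as path/split/category/image.png and builds one row per image. It is not anomalib's implementation: the helper name parse_mvtec_like_tree is hypothetical, rows are plain dicts rather than a pandas DataFrame, label_index is assumed to be 0 for the good category and 1 otherwise, and the mask path is composed from the naming convention without checking that the file exists.

```python
from pathlib import Path
from typing import Dict, List, Union


def parse_mvtec_like_tree(path: Path) -> List[Dict[str, Union[str, int]]]:
    """Collect one row per image found under path/<split>/<label>/<file>.png.

    Hypothetical helper for illustration only; not part of anomalib.
    """
    rows: List[Dict[str, Union[str, int]]] = []
    for image_path in sorted(path.glob("*/*/*.png")):
        split, label = image_path.parts[-3], image_path.parts[-2]
        if split == "ground_truth":
            # Mask files are referenced from image rows, not listed as samples.
            continue
        # Assumed naming convention: ground_truth/<label>/<stem>_mask.png
        mask_path = path / "ground_truth" / label / f"{image_path.stem}_mask.png"
        rows.append(
            {
                "path": str(path),
                "split": split,
                "label": label,
                "image_path": str(image_path),
                "mask_path": str(mask_path),
                "label_index": 0 if label == "good" else 1,  # assumption
            }
        )
    return rows
```

The resulting list of dicts could be passed to pandas.DataFrame(rows) to obtain a table with the columns shown above.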

Parameters
  • path (Path) – Path to dataset

  • split (str, optional) – Dataset split (i.e., either train or test). Defaults to None.

  • split_ratio (float, optional) – Ratio of normal training images to split off and add to the test set when the test set doesn’t contain any normal images. Defaults to 0.1.

  • seed (int, optional) – Random seed to ensure reproducibility when splitting. Defaults to 0.

  • create_validation_set (bool, optional) – Whether to create a validation set from the test set. The MVTec AD dataset does not contain a validation set; set this flag to True to create one. Defaults to False.
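One way to picture how split_ratio and seed interact is the sketch below, which moves a seeded random fraction of normal training samples into a second list. The function name split_normal_samples and the use of Python's random module are assumptions made for illustration; the library may select samples differently.

```python
import random
from typing import List, Tuple


def split_normal_samples(
    train_samples: List[str], split_ratio: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Move a seeded random fraction of samples out of the training list.

    Hypothetical helper for illustration only; not part of anomalib.
    Returns (kept, moved), both in their original relative order.
    """
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must be strictly between 0 and 1")
    # A local generator keeps the split reproducible without touching
    # the global random state.
    rng = random.Random(seed)
    n_moved = int(len(train_samples) * split_ratio)
    moved_indices = set(rng.sample(range(len(train_samples)), n_moved))
    kept = [s for i, s in enumerate(train_samples) if i not in moved_indices]
    moved = [s for i, s in enumerate(train_samples) if i in moved_indices]
    return kept, moved
```

Calling the function twice with the same seed yields the same partition, which is the property the seed parameter is meant to provide.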

Example

The following example shows how to get training samples from MVTec AD bottle category:

>>> from pathlib import Path
>>> from anomalib.data.mvtec import make_mvtec_dataset
>>> root = Path('./MVTec')
>>> category = 'bottle'
>>> path = root / category
>>> path
PosixPath('MVTec/bottle')
>>> samples = make_mvtec_dataset(path, split='train', split_ratio=0.1, seed=0)
>>> samples.head()
   path         split label image_path                      mask_path                                   label_index
0  MVTec/bottle train good  MVTec/bottle/train/good/105.png MVTec/bottle/ground_truth/good/105_mask.png 0
1  MVTec/bottle train good  MVTec/bottle/train/good/017.png MVTec/bottle/ground_truth/good/017_mask.png 0
2  MVTec/bottle train good  MVTec/bottle/train/good/137.png MVTec/bottle/ground_truth/good/137_mask.png 0
3  MVTec/bottle train good  MVTec/bottle/train/good/152.png MVTec/bottle/ground_truth/good/152_mask.png 0
4  MVTec/bottle train good  MVTec/bottle/train/good/109.png MVTec/bottle/ground_truth/good/109_mask.png 0

Returns

A dataframe containing the samples for the requested split (i.e., train or test).

Return type

DataFrame

class anomalib.data.mvtec.MVTecDataset(root: Union[pathlib.Path, str], category: str, pre_process: anomalib.pre_processing.PreProcessor, split: str, task: str = 'segmentation', seed: int = 0, create_validation_set: bool = False)

Bases: torchvision.datasets.folder.VisionDataset

MVTec AD PyTorch Dataset.

__len__(self) → int

Get length of the dataset.

__getitem__(self, index: int) → Dict[str, Union[str, torch.Tensor]]

Get the dataset item at the given index.

Parameters

index (int) – Index to get the item.

Returns

A dict containing the image tensor during training; otherwise, a dict containing the image path, target path, image tensor, label and transformed bounding box.

Return type

Union[Dict[str, Tensor], Dict[str, Union[str, Tensor]]]
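The split-dependent return value can be pictured with the minimal sketch below. It is a stand-in, not the MVTecDataset implementation: the class name SplitAwareDataset, the load_image callable, and the dictionary keys are all assumptions chosen for illustration, and the image loader is injected so the example runs without PyTorch.

```python
from typing import Any, Callable, Dict, List


class SplitAwareDataset:
    """Hypothetical dataset whose items depend on the split (illustration only)."""

    def __init__(
        self,
        samples: List[Dict[str, Any]],
        split: str,
        load_image: Callable[[str], Any],
    ) -> None:
        self.samples = samples
        self.split = split
        self.load_image = load_image  # stand-in for reading and transforming an image

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        sample = self.samples[index]
        # Training items carry only the image.
        item: Dict[str, Any] = {"image": self.load_image(sample["image_path"])}
        if self.split != "train":
            # Evaluation items also carry paths and the label (assumed key names).
            item["image_path"] = sample["image_path"]
            item["mask_path"] = sample["mask_path"]
            item["label"] = sample["label_index"]
        return item
```

Keeping training items minimal avoids loading ground-truth data that is not needed to fit a model on normal images only.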

class anomalib.data.mvtec.MVTec(root: str, category: str, image_size: Optional[Union[int, Tuple[int, int]]] = None, train_batch_size: int = 32, test_batch_size: int = 32, num_workers: int = 8, task: str = 'segmentation', transform_config_train: Optional[Union[str, albumentations.Compose]] = None, transform_config_val: Optional[Union[str, albumentations.Compose]] = None, seed: int = 0, create_validation_set: bool = False)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

MVTec AD Lightning Data Module.

prepare_data(self) → None

Download the dataset if not available.

setup(self, stage: Optional[str] = None) → None

Set up train, validation and test data.

Parameters

stage (Optional[str]) – Train/Val/Test stage. Defaults to None.

train_dataloader(self) → pytorch_lightning.utilities.types.TRAIN_DATALOADERS

Get train dataloader.

val_dataloader(self) → pytorch_lightning.utilities.types.EVAL_DATALOADERS

Get validation dataloader.

test_dataloader(self) → pytorch_lightning.utilities.types.EVAL_DATALOADERS

Get test dataloader.

predict_dataloader(self) → pytorch_lightning.utilities.types.EVAL_DATALOADERS

Get predict dataloader.