`anomalib.data.mvtec`

MVTec AD Dataset (CC BY-NC-SA 4.0).

Description:

This script contains the PyTorch Dataset, DataLoader, and PyTorch Lightning DataModule for the MVTec AD dataset.
If the dataset is not on the file system, the script downloads and extracts it, then creates the PyTorch data objects.

License:

The MVTec AD dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0, https://creativecommons.org/licenses/by-nc-sa/4.0/).

Reference:

Paul Bergmann, Kilian Batzner, Michael Fauser, David Sattlegger, Carsten Steger: The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: International Journal of Computer Vision 129(4):1038-1059, 2021, DOI: 10.1007/s11263-020-01400-4.
Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger: MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9584-9592, 2019, DOI: 10.1109/CVPR.2019.00982.

Module Contents

Classes

`MVTecDataset`	MVTec AD PyTorch Dataset.
`MVTec`	MVTec AD Lightning Data Module.

Functions

make_mvtec_dataset(path, split, split_ratio, seed, create_validation_set) → pandas.core.frame.DataFrame

Create MVTec AD samples by parsing the MVTec AD data file structure.

Attributes

logger

anomalib.data.mvtec.logger

anomalib.data.mvtec.make_mvtec_dataset(path: pathlib.Path, split: Optional[str] = None, split_ratio: float = 0.1, seed: Optional[int] = None, create_validation_set: bool = False) → pandas.core.frame.DataFrame

Create MVTec AD samples by parsing the MVTec AD data file structure.

The files are expected to follow this structure:

    path/to/dataset/split/category/image_filename.png
    path/to/dataset/ground_truth/category/mask_filename.png

This function creates a DataFrame to store the parsed information in the following format:

|   | path          | split | label  | image_path   | mask_path                             | label_index |
|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
| 0 | datasets/name | test  | defect | filename.png | ground_truth/defect/filename_mask.png | 1           |
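To make the parsing step concrete, here is a minimal, standard-library-only sketch that walks a directory laid out as above and produces one record per image with the same columns as the table. It is not anomalib's implementation: the function name `parse_mvtec_tree` is hypothetical, records are plain dicts rather than a DataFrame, and leaving `mask_path` empty for `good` images is an assumption (MVTec AD provides ground-truth masks only for defective images).

```python
from pathlib import Path
from typing import Dict, List, Union


def parse_mvtec_tree(path: Union[str, Path]) -> List[Dict[str, Union[str, int]]]:
    """Hypothetical sketch: one record per image under path/<split>/<label>/*.png."""
    root = Path(path)
    records: List[Dict[str, Union[str, int]]] = []
    for image in sorted(root.glob("*/*/*.png")):
        split, label = image.parts[-3], image.parts[-2]
        # ground_truth holds masks, not input images, so it is not a split.
        if split == "ground_truth":
            continue
        is_anomalous = label != "good"
        mask = root / "ground_truth" / label / f"{image.stem}_mask.png"
        records.append(
            {
                "path": str(root),
                "split": split,
                "label": label,
                "image_path": str(image),
                # Assumption: only anomalous images get a mask path.
                "mask_path": str(mask) if is_anomalous else "",
                "label_index": int(is_anomalous),
            }
        )
    return records
```

Passing the result to `pandas.DataFrame(records)` would give a frame with the columns shown in the table.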

Parameters

path (Path) – Path to the dataset.
split (str, optional) – Dataset split (i.e., either train or test). Defaults to None.
split_ratio (float, optional) – Fraction of the normal training images to move to the test set when the test set contains no normal images. Defaults to 0.1.
seed (int, optional) – Random seed for reproducible splitting. Defaults to None.
create_validation_set (bool, optional) – If True, create a validation set from the test set, since the MVTec AD dataset does not provide one. Defaults to False.
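The effect of `split_ratio`, `seed`, and `create_validation_set` can be illustrated with the sketch below. It operates on records represented as dicts with at least `split` and `label` keys. Both helper names are hypothetical, and splitting the normal and anomalous test samples in half each to form the validation set is an assumed strategy, not necessarily the one anomalib uses.

```python
import random
from typing import Dict, List, Optional

Record = Dict[str, object]


def move_normal_train_to_test(
    records: List[Record], split_ratio: float = 0.1, seed: Optional[int] = None
) -> List[Record]:
    """Hypothetical: if the test split has no normal images, move a fraction of normal train images there."""
    if any(r["split"] == "test" and r["label"] == "good" for r in records):
        return records  # Test set already has normal images: nothing to do.
    normal_train = [i for i, r in enumerate(records) if r["split"] == "train" and r["label"] == "good"]
    rng = random.Random(seed)  # A fixed seed makes the selection reproducible.
    chosen = set(rng.sample(normal_train, int(len(normal_train) * split_ratio)))
    return [{**r, "split": "test"} if i in chosen else r for i, r in enumerate(records)]


def split_test_into_val(records: List[Record], seed: Optional[int] = None) -> List[Record]:
    """Hypothetical: relabel half of the normal and half of the anomalous test samples as 'val'."""
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # Copy so the input is not mutated.
    for is_normal in (True, False):
        idx = [i for i, r in enumerate(out) if r["split"] == "test" and (r["label"] == "good") == is_normal]
        rng.shuffle(idx)
        for i in idx[: len(idx) // 2]:
            out[i]["split"] = "val"
    return out
```

Splitting normal and anomalous samples separately keeps both classes represented in the validation and test sets.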

Example

The following example shows how to get training samples from the MVTec AD bottle category:

>>> from pathlib import Path
>>> from anomalib.data.mvtec import make_mvtec_dataset
>>> root = Path('./MVTec')
>>> category = 'bottle'
>>> path = root / category
>>> path
PosixPath('MVTec/bottle')

>>> samples = make_mvtec_dataset(path, split='train', split_ratio=0.1, seed=0)
>>> samples.head()
   path         split label image_path                           mask_path                   label_index
0  MVTec/bottle train good MVTec/bottle/train/good/105.png MVTec/bottle/ground_truth/good/105_mask.png 0
1  MVTec/bottle train good MVTec/bottle/train/good/017.png MVTec/bottle/ground_truth/good/017_mask.png 0
2  MVTec/bottle train good MVTec/bottle/train/good/137.png MVTec/bottle/ground_truth/good/137_mask.png 0
3  MVTec/bottle train good MVTec/bottle/train/good/152.png MVTec/bottle/ground_truth/good/152_mask.png 0
4  MVTec/bottle train good MVTec/bottle/train/good/109.png MVTec/bottle/ground_truth/good/109_mask.png 0

Returns: A DataFrame containing the samples for the requested split (i.e., train or test).
Return type: DataFrame

class anomalib.data.mvtec.MVTecDataset(root: Union[pathlib.Path, str], category: str, pre_process: anomalib.pre_processing.PreProcessor, split: str, task: str = 'segmentation', seed: Optional[int] = None, create_validation_set: bool = False)

Bases: torchvision.datasets.folder.VisionDataset

MVTec AD PyTorch Dataset.

__len__() → int: Get the length of the dataset.

__getitem__(index: int) → Dict[str, Union[str, torch.Tensor]]

Get the dataset item at index `index`.

Parameters

index (int) – Index to get the item.

Returns

During training, a dict containing the image tensor. Otherwise, a dict containing the image path, target path, image tensor, label, and transformed bounding box.

Return type

Union[Dict[str, Tensor], Dict[str, Union[str, Tensor]]]
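A map-style dataset only needs `__len__` and `__getitem__`. The sketch below shows that protocol over pre-parsed records and mirrors the train/eval difference described above. The class name, the `load_image` callable (standing in for image reading plus `pre_process`), and the returned keys are assumptions, and the bounding box is omitted. A real implementation would subclass `torchvision.datasets.VisionDataset`, as listed under Bases, so that a `torch.utils.data.DataLoader` can consume it.

```python
from typing import Any, Callable, Dict, List


class MVTecLikeDataset:
    """Hypothetical map-style dataset over parsed records (not anomalib's MVTecDataset)."""

    def __init__(self, records: List[Dict[str, Any]], split: str, load_image: Callable[[str], Any]) -> None:
        # Keep only rows belonging to the requested split.
        self.records = [r for r in records if r["split"] == split]
        self.split = split
        self.load_image = load_image  # e.g. read the file and apply pre-processing

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        record = self.records[index]
        image = self.load_image(record["image_path"])
        if self.split == "train":
            # Training only needs the image.
            return {"image": image}
        # Evaluation also needs paths and the label for metrics and visualization.
        return {
            "image_path": record["image_path"],
            "mask_path": record["mask_path"],
            "image": image,
            "label": record["label_index"],
        }
```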

class anomalib.data.mvtec.MVTec(root: str, category: str, image_size: Optional[Union[int, Tuple[int, int]]] = None, train_batch_size: int = 32, test_batch_size: int = 32, num_workers: int = 8, task: str = 'segmentation', transform_config_train: Optional[Union[str, albumentations.Compose]] = None, transform_config_val: Optional[Union[str, albumentations.Compose]] = None, seed: Optional[int] = None, create_validation_set: bool = False)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

MVTec AD Lightning Data Module.

prepare_data() → None: Download the dataset if it is not available.
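The skip-if-present behavior of `prepare_data` can be sketched as follows, assuming the dataset archive has already been downloaded to a local tar file (the download step and its URL are omitted). The function name `ensure_extracted` and its arguments are hypothetical.

```python
import tarfile
from pathlib import Path


def ensure_extracted(root: Path, category: str, archive: Path) -> bool:
    """Hypothetical sketch: extract `archive` into `root` unless `root / category` exists.

    Returns True if extraction happened.
    """
    if (root / category).is_dir():
        return False  # Dataset already on disk: nothing to do.
    root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:  # Default mode "r" auto-detects compression.
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and links escaping `root`.
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    return True
```

Checking for the category directory first makes repeated calls cheap and idempotent.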

setup(stage: Optional[str] = None) → None

Set up the train, validation, and test data.

Parameters: stage (Optional[str]) – Train/Val/Test stage. Defaults to None.

train_dataloader() → pytorch_lightning.utilities.types.TRAIN_DATALOADERS: Get the train dataloader.

val_dataloader() → pytorch_lightning.utilities.types.EVAL_DATALOADERS: Get the validation dataloader.

test_dataloader() → pytorch_lightning.utilities.types.EVAL_DATALOADERS: Get the test dataloader.

predict_dataloader() → pytorch_lightning.utilities.types.EVAL_DATALOADERS: Get the predict dataloader.
