anomalib.data.mvtec
MVTec AD Dataset (CC BY-NC-SA 4.0).
- Description:
- This script contains the PyTorch Dataset, Dataloader and PyTorch Lightning DataModule for the MVTec AD dataset.
- If the dataset is not on the file system, the script downloads and extracts it, then creates the PyTorch data objects.
- License:
The MVTec AD dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/).
- Reference:
Paul Bergmann, Kilian Batzner, Michael Fauser, David Sattlegger, Carsten Steger: The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: International Journal of Computer Vision 129(4):1038-1059, 2021, DOI: 10.1007/s11263-020-01400-4.
Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger: MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9584-9592, 2019, DOI: 10.1109/CVPR.2019.00982.
Module Contents
Classes
| MVTecDataset | MVTec AD PyTorch Dataset. |
| MVTec | MVTec AD Lightning Data Module. |
Functions
| make_mvtec_dataset | Create MVTec AD samples by parsing the MVTec AD data file structure. |
Attributes
- anomalib.data.mvtec.make_mvtec_dataset(path: pathlib.Path, split: Optional[str] = None, split_ratio: float = 0.1, seed: Optional[int] = None, create_validation_set: bool = False) → pandas.core.frame.DataFrame
Create MVTec AD samples by parsing the MVTec AD data file structure.
- The files are expected to follow the structure:
path/to/dataset/split/category/image_filename.png
path/to/dataset/ground_truth/category/mask_filename.png
This function creates a dataframe to store the parsed information in the following format:
|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
|   | path          | split | label  | image_path   | mask_path                             | label_index |
|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
| 0 | datasets/name | test  | defect | filename.png | ground_truth/defect/filename_mask.png | 1           |
|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
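The directory layout and row format above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `parse_mvtec_layout` is hypothetical, rows are plain dictionaries rather than a pandas DataFrame so that the sketch runs with the standard library alone, and full paths are stored in `image_path` and `mask_path`. The mapping of `good` to `label_index` 0 and every other label to 1 is an assumption inferred from the table and the example output in this section.

```python
from pathlib import Path
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]


def parse_mvtec_layout(path: Path) -> List[Row]:
    """Collect one row per image found under path/<split>/<label>/<file>.png."""
    rows: List[Row] = []
    for image_file in sorted(path.glob("*/*/*.png")):
        split, label = image_file.parts[-3], image_file.parts[-2]
        if split == "ground_truth":
            # Masks are not samples themselves; they are referenced from image rows.
            continue
        mask_file = path / "ground_truth" / label / f"{image_file.stem}_mask.png"
        rows.append(
            {
                "path": str(path),
                "split": split,
                "label": label,
                "image_path": str(image_file),
                "mask_path": str(mask_file),
                # Assumption: "good" is the normal class (0), anything else is anomalous (1).
                "label_index": 0 if label == "good" else 1,
            }
        )
    return rows
```

A list of such rows could be passed to `pandas.DataFrame(rows)` to obtain a table with the columns shown above.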
- Parameters
path (Path) – Path to dataset
split (str, optional) – Dataset split (i.e., either train or test). Defaults to None.
split_ratio (float, optional) – Fraction of the normal training images to move to the test set when the test set does not contain any normal images. Defaults to 0.1.
seed (int, optional) – Random seed to ensure reproducibility when splitting. Defaults to None.
create_validation_set (bool, optional) – Whether to create a validation set from the test set. The MVTec AD dataset does not contain a validation set; set this flag to True to create one.
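The interaction of split_ratio and seed can be sketched as follows. This is an assumption about the general technique (a seeded shuffle followed by a cut), not the library's actual splitting code; the function name `move_normal_to_test` is hypothetical.

```python
import random
from typing import List, Optional, Tuple


def move_normal_to_test(
    train_normal: List[str],
    split_ratio: float = 0.1,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Return (remaining_train, moved_to_test) for a list of normal image paths."""
    rng = random.Random(seed)  # a fixed seed gives the same split on every call
    shuffled = list(train_normal)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * split_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

With 20 normal training images and split_ratio=0.1, two images would be moved to the test set and 18 would remain for training.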
Example
The following example shows how to get training samples from MVTec AD bottle category:
>>> from pathlib import Path
>>> from anomalib.data.mvtec import make_mvtec_dataset
>>> root = Path('./MVTec')
>>> category = 'bottle'
>>> path = root / category
>>> path
PosixPath('MVTec/bottle')
>>> samples = make_mvtec_dataset(path, split='train', split_ratio=0.1, seed=0)
>>> samples.head()
   path         split label image_path                      mask_path                                   label_index
0  MVTec/bottle train good  MVTec/bottle/train/good/105.png MVTec/bottle/ground_truth/good/105_mask.png 0
1  MVTec/bottle train good  MVTec/bottle/train/good/017.png MVTec/bottle/ground_truth/good/017_mask.png 0
2  MVTec/bottle train good  MVTec/bottle/train/good/137.png MVTec/bottle/ground_truth/good/137_mask.png 0
3  MVTec/bottle train good  MVTec/bottle/train/good/152.png MVTec/bottle/ground_truth/good/152_mask.png 0
4  MVTec/bottle train good  MVTec/bottle/train/good/109.png MVTec/bottle/ground_truth/good/109_mask.png 0
- Returns
A dataframe containing the samples for the requested split (i.e., train or test).
- Return type
DataFrame
- class anomalib.data.mvtec.MVTecDataset(root: Union[pathlib.Path, str], category: str, pre_process: anomalib.pre_processing.PreProcessor, split: str, task: str = 'segmentation', seed: Optional[int] = None, create_validation_set: bool = False)
Bases: torchvision.datasets.folder.VisionDataset
MVTec AD PyTorch Dataset.
- __getitem__(index: int) → Dict[str, Union[str, torch.Tensor]]
Get the dataset item at the given index.
- Parameters
index (int) – Index to get the item.
- Returns
During training, a Dict containing the image tensor. Otherwise, a Dict containing the image path, target path, image tensor, label and transformed bounding box.
- Return type
Union[Dict[str, Tensor], Dict[str, Union[str, Tensor]]]
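The split-dependent return value described above can be illustrated with a toy class. This is a hypothetical stand-in, not the MVTecDataset implementation: the class name, the dictionary keys and the placeholder loader are all assumptions made for illustration, and the target itself is left out.

```python
from typing import Any, Dict, List


class ToyAnomalyDataset:
    """Hypothetical stand-in that mimics a split-dependent __getitem__."""

    def __init__(self, samples: List[Dict[str, Any]], split: str) -> None:
        self.samples = samples
        self.split = split

    def __len__(self) -> int:
        return len(self.samples)

    @staticmethod
    def _load(image_path: str) -> List[float]:
        # Placeholder for reading and transforming an image into a tensor.
        return [0.0]

    def __getitem__(self, index: int) -> Dict[str, Any]:
        sample = self.samples[index]
        image = self._load(sample["image_path"])
        if self.split == "train":
            # Training only needs the image itself.
            return {"image": image}
        # Evaluation additionally carries paths and the label (key names are assumptions).
        return {
            "image_path": sample["image_path"],
            "mask_path": sample["mask_path"],
            "image": image,
            "label": sample["label_index"],
        }
```

Returning a smaller dictionary during training keeps batches light, while evaluation items keep the paths needed to trace a prediction back to its source file.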
- class anomalib.data.mvtec.MVTec(root: str, category: str, image_size: Optional[Union[int, Tuple[int, int]]] = None, train_batch_size: int = 32, test_batch_size: int = 32, num_workers: int = 8, task: str = 'segmentation', transform_config_train: Optional[Union[str, albumentations.Compose]] = None, transform_config_val: Optional[Union[str, albumentations.Compose]] = None, seed: Optional[int] = None, create_validation_set: bool = False)
Bases: pytorch_lightning.core.datamodule.LightningDataModule
MVTec AD Lightning Data Module.
- setup(stage: Optional[str] = None) → None
Set up the train, validation and test data.
- Parameters
stage (str, optional) – Train/Val/Test stage. Defaults to None.
- train_dataloader() → pytorch_lightning.utilities.types.TRAIN_DATALOADERS
Get train dataloader.
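A possible way to combine the constructor, setup() and train_dataloader() is sketched below. The helper names `mvtec_kwargs` and `build_train_dataloader` are hypothetical, the argument values are illustrative, and the sketch assumes that anomalib is installed and that the dataset is already available under root.

```python
from typing import Any, Dict


def mvtec_kwargs(root: str, category: str) -> Dict[str, Any]:
    """Keyword arguments taken from the MVTec signature documented above."""
    return {
        "root": root,
        "category": category,
        "image_size": (256, 256),  # illustrative value; the documented default is None
        "train_batch_size": 32,
        "test_batch_size": 32,
        "num_workers": 8,
        "task": "segmentation",
        "seed": 0,
        "create_validation_set": False,
    }


def build_train_dataloader(root: str, category: str):
    """Create the data module, set it up and return its train dataloader."""
    # Imported lazily so the helper above stays usable without anomalib installed.
    from anomalib.data.mvtec import MVTec

    datamodule = MVTec(**mvtec_kwargs(root, category))
    datamodule.setup()  # stage defaults to None
    return datamodule.train_dataloader()
```

Calling `build_train_dataloader("./MVTec", "bottle")` would then return the training dataloader for the bottle category.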