BTech Datamodule

BTech Data Module.

This module provides a PyTorch Lightning DataModule for the BTech dataset. If the dataset is not available locally, it will be downloaded and extracted automatically.

Example

Create a BTech datamodule:

>>> from anomalib.data import BTech
>>> datamodule = BTech(
...     root="./datasets/BTech",
...     category="01"
... )

Notes

The dataset will be automatically downloaded and converted to the required format when first used. The directory structure after preparation will be:

datasets/
└── BTech/
    ├── 01/
    ├── 02/
    └── 03/
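
As a quick sanity check on the layout above, the following sketch reports which category folders are missing under a dataset root. It uses only the standard library; the helper name `missing_categories` is hypothetical and not part of anomalib, and the folder names are taken from the tree shown above.

```python
from pathlib import Path
from typing import List, Union

# Category folder names taken from the directory tree above.
BTECH_CATEGORIES = ("01", "02", "03")


def missing_categories(root: Union[str, Path]) -> List[str]:
    """Return the category folders that are not present under ``root``."""
    root = Path(root)
    return [name for name in BTECH_CATEGORIES if not (root / name).is_dir()]


# Usage: an empty list means all three category folders exist.
missing = missing_categories("./datasets/BTech")
if missing:
    print("Missing category folders: " + ", ".join(missing))
else:
    print("All BTech category folders are present.")
```

A check like this only confirms that the folders exist; it says nothing about whether their contents are complete.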

License:

BTech dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). https://creativecommons.org/licenses/by-nc-sa/4.0/

Reference:

Mishra, Pankaj, et al. “BTAD—A Large Scale Dataset and Benchmark for Real-World Industrial Anomaly Detection.” Pattern Recognition 136 (2024): 109542.

class anomalib.data.datamodules.image.btech.BTech(root='./datasets/BTech', category='01', train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.SAME_AS_TEST, val_split_ratio=0.5, seed=None)

Bases: AnomalibDataModule

BTech Lightning Data Module.

Parameters:
  • root (Path | str) – Path to the root of the dataset. Defaults to "./datasets/BTech".

  • category (str) – Category of the BTech dataset (e.g. "01", "02", or "03"). Defaults to "01".

  • train_batch_size (int, optional) – Training batch size. Defaults to 32.

  • eval_batch_size (int, optional) – Batch size for validation and testing. Defaults to 32.

  • num_workers (int, optional) – Number of workers. Defaults to 8.

  • train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.

  • val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.

  • test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.

  • augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided. Defaults to None.

  • test_split_mode (TestSplitMode) – Setting that determines how the testing subset is obtained. Defaults to TestSplitMode.FROM_DIR.

  • test_split_ratio (float) – Fraction of images from the train set that will be reserved for testing. Defaults to 0.2.

  • val_split_mode (ValSplitMode) – Setting that determines how the validation subset is obtained. Defaults to ValSplitMode.SAME_AS_TEST.

  • val_split_ratio (float) – Fraction of train or test images that will be reserved for validation. Defaults to 0.5.

  • seed (int | None, optional) – Seed which may be set to a fixed value for reproducibility. Defaults to None.
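
To make the ratio and seed parameters above more concrete, here is a minimal plain-Python sketch of a seeded, ratio-based split. It is an illustration only and not anomalib's implementation: the function `ratio_split` and the file names are hypothetical, and whether a given ratio is consulted at all presumably depends on the selected split mode.

```python
import random


def ratio_split(items, ratio, seed=None):
    """Shuffle ``items`` and reserve a ``ratio`` fraction as the held-out part.

    Returns a ``(remaining, held_out)`` tuple of lists.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)  # seed=None gives a different shuffle on each run
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_held_out = int(len(shuffled) * ratio)
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Hypothetical file names standing in for training images.
train_images = [f"train_{i:03d}.png" for i in range(100)]

# Like test_split_ratio=0.2: a fifth of the training images is held out for testing.
train, test = ratio_split(train_images, ratio=0.2, seed=42)

# Like val_split_ratio=0.5: half of the held-out images is reserved for validation.
test, val = ratio_split(test, ratio=0.5, seed=42)

print(len(train), len(test), len(val))  # 80 10 10
```

Passing the same seed reproduces the same partition, which is the reason to fix `seed` when results need to be comparable across runs.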

Example

To create the BTech datamodule, instantiate the class and call setup:

>>> from anomalib.data import BTech
>>> datamodule = BTech(
...     root="./datasets/BTech",
...     category="01",
...     train_batch_size=32,
...     eval_batch_size=32,
...     num_workers=8,
... )
>>> datamodule.setup()

Get the train dataloader and first batch:

>>> i, data = next(enumerate(datamodule.train_dataloader()))
>>> data.keys()
dict_keys(['image'])
>>> data["image"].shape
torch.Size([32, 3, 256, 256])

Access the validation dataloader and first batch:

>>> i, data = next(enumerate(datamodule.val_dataloader()))
>>> data.keys()
dict_keys(['image_path', 'label', 'mask_path', 'image', 'mask'])
>>> data["image"].shape, data["mask"].shape
(torch.Size([32, 3, 256, 256]), torch.Size([32, 256, 256]))

Access the test dataloader and first batch:

>>> i, data = next(enumerate(datamodule.test_dataloader()))
>>> data.keys()
dict_keys(['image_path', 'label', 'mask_path', 'image', 'mask'])
>>> data["image"].shape, data["mask"].shape
(torch.Size([32, 3, 256, 256]), torch.Size([32, 256, 256]))

prepare_data()

Download the dataset if not available.

This method checks if the specified dataset is available in the file system. If not, it downloads and extracts the dataset into the appropriate directory.

Return type:

None

Example

Assume the dataset is not available on the file system. Here’s how the directory structure looks before and after calling prepare_data:

# Before
$ tree datasets
datasets
├── dataset1
└── dataset2

# Calling prepare_data
>>> datamodule = BTech(root="./datasets/BTech", category="01")
>>> datamodule.prepare_data()

# After
$ tree datasets
datasets
├── dataset1
├── dataset2
└── BTech
    ├── 01
    ├── 02
    └── 03
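
The check-then-download behaviour described for prepare_data can be sketched with the standard library alone. This is a hedged illustration of the pattern, not the library's code: `prepare_if_missing` and `fake_fetch` are hypothetical names, and `fake_fetch` only creates empty folders in place of a real download and extraction.

```python
import tempfile
from pathlib import Path
from typing import Callable


def prepare_if_missing(category_dir: Path, fetch: Callable[[Path], None]) -> bool:
    """Call ``fetch`` only when ``category_dir`` does not exist yet.

    Returns True if ``fetch`` ran, False if the data was already in place.
    """
    if category_dir.is_dir():
        return False
    fetch(category_dir.parent)
    return True


def fake_fetch(root: Path) -> None:
    """Hypothetical stand-in for download and extraction: creates empty folders."""
    for name in ("01", "02", "03"):
        (root / name).mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    category_dir = Path(tmp) / "BTech" / "01"
    first = prepare_if_missing(category_dir, fake_fetch)   # data missing: fetch runs
    second = prepare_if_missing(category_dir, fake_fetch)  # data present: no-op
    print(first, second)  # True False
```

Because the second call is a no-op, a method written this way can be called on every run without triggering a repeated download.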