Datumaro Datamodule

DataModule for Datumaro format.

This module provides a PyTorch Lightning DataModule for datasets in Datumaro format. It currently supports only annotations exported from Intel Geti™.

Example

Create a Datumaro datamodule:

>>> from pathlib import Path
>>> from anomalib.data import Datumaro
>>> datamodule = Datumaro(
...     root="./datasets/datumaro",
...     train_batch_size=32,
...     eval_batch_size=32,
...     num_workers=8,
... )
>>> datamodule.setup()
>>> i, data = next(enumerate(datamodule.train_dataloader()))
>>> data.keys()
dict_keys(['image_path', 'label', 'image'])

Notes

The directory structure should be organized as follows:

root/
├── annotations/
│   ├── train.json
│   └── test.json
└── images/
    ├── train/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── test/
        ├── image3.jpg
        └── image4.jpg
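Before constructing the datamodule, it can help to confirm that a dataset root matches the layout above. The following is a minimal sketch using only the standard library; the helper name `find_missing_entries` is hypothetical and not part of anomalib, and it only checks that the expected files and folders exist, not that the annotation files contain valid Datumaro data.

```python
from pathlib import Path
from typing import List, Union


def find_missing_entries(root: Union[str, Path]) -> List[str]:
    """Return the expected entries (relative to ``root``) that do not exist.

    Hypothetical helper: the expected names are taken from the directory
    tree shown in the notes above, not from anomalib's own validation logic.
    """
    root = Path(root)
    expected = [
        root / "annotations" / "train.json",
        root / "annotations" / "test.json",
        root / "images" / "train",
        root / "images" / "test",
    ]
    # Report paths relative to the root, with forward slashes on every platform.
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = find_missing_entries("./datasets/datumaro")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Layout matches the expected structure.")
```

An empty list means every expected entry was found; any other result names the entries to create or fix before calling `datamodule.setup()`.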
class anomalib.data.datamodules.image.datumaro.Datumaro(root, train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.5, val_split_mode=ValSplitMode.FROM_TEST, val_split_ratio=0.5, seed=None)

Bases: AnomalibDataModule

Datumaro datamodule.

Parameters:
  • root (Path | str) – Path to the dataset root directory.

  • train_batch_size (int, optional) – Training batch size. Defaults to 32.

  • eval_batch_size (int, optional) – Batch size for validation and testing. Defaults to 32.

  • num_workers (int, optional) – Number of worker processes used for data loading. Defaults to 8.

  • train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.

  • val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.

  • test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.

  • augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided. Defaults to None.

  • image_size (tuple[int, int], optional) – Size to which input images should be resized. Defaults to None.

  • test_split_mode (TestSplitMode) – Setting that determines how the testing subset is obtained. Defaults to TestSplitMode.FROM_DIR.

  • test_split_ratio (float) – Fraction of images from the train set that will be reserved for testing. Defaults to 0.5.

  • val_split_mode (ValSplitMode) – Setting that determines how the validation subset is obtained. Defaults to ValSplitMode.FROM_TEST.

  • val_split_ratio (float) – Fraction of train or test images that will be reserved for validation. Defaults to 0.5.

  • seed (int | None, optional) – Seed which may be set to a fixed value for reproducibility. Defaults to None.
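To make the meaning of the split ratios and the seed concrete, here is a small illustrative sketch of a ratio-based split. It is not anomalib's implementation: the function `split_by_ratio` is hypothetical, and it assumes a plain shuffle-then-slice strategy, which may differ from what the library does internally.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_by_ratio(
    items: Sequence[str],
    ratio: float,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Shuffle ``items`` and reserve a ``ratio`` fraction of them.

    Returns ``(remaining, reserved)``. Hypothetical helper for illustration.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # A dedicated generator keeps the split reproducible for a fixed seed
    # without touching the global random state.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_reserved = int(len(shuffled) * ratio)
    return shuffled[n_reserved:], shuffled[:n_reserved]


# With a ratio of 0.5, half of the test images are reserved for validation.
test_images = [f"image{i}.jpg" for i in range(10)]
remaining_test, validation = split_by_ratio(test_images, ratio=0.5, seed=42)
print(len(remaining_test), len(validation))
```

Passing the same seed twice yields the same partition, which is the reproducibility behavior the `seed` parameter describes.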

Example

>>> from anomalib.data import Datumaro
>>> datamodule = Datumaro(
...     root="./datasets/datumaro",
...     train_batch_size=32,
...     eval_batch_size=32,
...     num_workers=8,
... )
>>> datamodule.setup()
>>> i, data = next(enumerate(datamodule.train_dataloader()))
>>> data.keys()
dict_keys(['image_path', 'label', 'image'])