Datumaro Datamodule
DataModule for Datumaro format.
This module provides a PyTorch Lightning DataModule for datasets in Datumaro format. It currently supports only annotations exported from Intel Geti™.
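To make the format concrete, here is a minimal sketch that reads image-level labels from a Datumaro-style annotation document. The key names it relies on (`categories` / `label` / `labels`, `items`, `image` / `path`, `annotations` / `label_id`) and the label names `Normal` and `Anomalous` are assumptions about the layout of a Geti™ export, not something this page specifies. `EXAMPLE_ANNOTATIONS`, `load_annotation_file` and `parse_annotations` are hypothetical names and are not part of anomalib.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical, simplified Datumaro-style annotation document (assumed layout).
EXAMPLE_ANNOTATIONS = {
    "categories": {"label": {"labels": [{"name": "Normal"}, {"name": "Anomalous"}]}},
    "items": [
        {"id": "image3", "image": {"path": "image3.jpg"}, "annotations": [{"label_id": 0}]},
        {"id": "image4", "image": {"path": "image4.jpg"}, "annotations": [{"label_id": 1}]},
    ],
}


def load_annotation_file(path: Path) -> dict:
    """Read one annotation JSON file, e.g. annotations/test.json (hypothetical helper)."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def parse_annotations(annotations: dict, image_dir: Path) -> list[dict]:
    """Map each annotated item to an image path and a label name (hypothetical helper)."""
    # The label list is ordered, so a label_id is an index into it.
    label_names = [label["name"] for label in annotations["categories"]["label"]["labels"]]
    samples = []
    for item in annotations["items"]:
        # Assumption: one image-level label per item, stored in the first annotation.
        label_id = item["annotations"][0]["label_id"]
        samples.append(
            {
                "image_path": str(image_dir / item["image"]["path"]),
                "label": label_names[label_id],
            }
        )
    return samples


if __name__ == "__main__":
    for sample in parse_annotations(EXAMPLE_ANNOTATIONS, Path("images/test")):
        print(sample)
```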
Example
Create a Datumaro datamodule:
>>> from pathlib import Path
>>> from anomalib.data import Datumaro
>>> datamodule = Datumaro(
... root="./datasets/datumaro",
... train_batch_size=32,
... eval_batch_size=32,
... num_workers=8,
... )
>>> datamodule.setup()
>>> i, data = next(enumerate(datamodule.train_dataloader()))
>>> data.keys()
dict_keys(['image_path', 'label', 'image'])
Notes
The directory structure should be organized as follows:
root/
├── annotations/
│   ├── train.json
│   └── test.json
└── images/
    ├── train/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── test/
        ├── image3.jpg
        └── image4.jpg
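One way to catch layout mistakes before calling `setup()` is to check for the entries in the directory tree above. The sketch below is an assumption-labeled helper, not part of anomalib: `missing_layout_entries` and `scaffold_layout` are hypothetical names, image file names are not checked because they vary per dataset, and the scaffold writes empty `{}` placeholders that are not valid annotations and only serve to exercise the check.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_FILES = ("annotations/train.json", "annotations/test.json")
EXPECTED_DIRS = ("images/train", "images/test")


def missing_layout_entries(root: Path | str) -> list[str]:
    """Return the expected files and directories that are absent under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


def scaffold_layout(root: Path | str) -> None:
    """Create an empty skeleton of the expected layout (placeholder files only)."""
    root = Path(root)
    for directory in EXPECTED_DIRS:
        (root / directory).mkdir(parents=True, exist_ok=True)
    (root / "annotations").mkdir(parents=True, exist_ok=True)
    for file in EXPECTED_FILES:
        # "{}" is a placeholder, not a real Datumaro annotation document.
        (root / file).write_text("{}", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(missing_layout_entries(tmp))  # every expected entry is missing
        scaffold_layout(tmp)
        print(missing_layout_entries(tmp))  # []
```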
- class anomalib.data.datamodules.image.datumaro.Datumaro(root, train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.5, val_split_mode=ValSplitMode.FROM_TEST, val_split_ratio=0.5, seed=None)
Bases: AnomalibDataModule
Datumaro datamodule.
- Parameters:
root (Path | str) – Path to the dataset root directory.
train_batch_size (int, optional) – Training batch size. Defaults to 32.
eval_batch_size (int, optional) – Batch size used for validation and testing. Defaults to 32.
num_workers (int, optional) – Number of dataloader workers. Defaults to 8.
train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.
val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.
test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.
augmentations (Transform | None) – General augmentations to apply when no stage-specific augmentations are provided. Defaults to None.
test_split_mode (TestSplitMode) – Setting that determines how the test subset is obtained. Defaults to TestSplitMode.FROM_DIR.
test_split_ratio (float) – Fraction of images from the train set that will be reserved for testing. Defaults to 0.5.
val_split_mode (ValSplitMode) – Setting that determines how the validation subset is obtained. Defaults to ValSplitMode.FROM_TEST.
val_split_ratio (float) – Fraction of train or test images that will be reserved for validation. Defaults to 0.5.
seed (int | None, optional) – Seed that can be set to a fixed value for reproducibility. Defaults to None.
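Two of the parameter semantics above can be illustrated in plain Python: a stage-specific augmentation argument takes precedence over `augmentations`, and a `*_split_ratio` moves a fraction of one subset into another, reproducibly when `seed` is fixed. This is a simplified sketch under those assumptions; `resolve_augmentations` and `split_by_ratio` are hypothetical helpers, and anomalib's own splitting logic may differ (for example, by splitting per label).

```python
from __future__ import annotations

import random
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def resolve_augmentations(stage_specific: T | None, general: T | None) -> T | None:
    """Return the stage-specific augmentations if given, else the general ones."""
    return stage_specific if stage_specific is not None else general


def split_by_ratio(
    items: Sequence[T], ratio: float, seed: int | None = None
) -> tuple[list[T], list[T]]:
    """Move round(ratio * len(items)) randomly chosen items into a second subset.

    Returns (kept, moved). Python's round() rounds exact halves to even.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Same seed -> same shuffle -> same split; seed=None draws fresh entropy.
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    moved_idx = set(indices[: round(ratio * len(items))])
    kept = [x for i, x in enumerate(items) if i not in moved_idx]
    moved = [x for i, x in enumerate(items) if i in moved_idx]
    return kept, moved


if __name__ == "__main__":
    test_images = [f"image{i}.jpg" for i in range(10)]
    # Roughly what val_split_mode=FROM_TEST with val_split_ratio=0.5 describes.
    test_subset, val_subset = split_by_ratio(test_images, ratio=0.5, seed=42)
    print(len(test_subset), len(val_subset))  # 5 5
    print(resolve_augmentations(None, "general"))  # general
```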
Example
>>> from anomalib.data import Datumaro
>>> datamodule = Datumaro(
... root="./datasets/datumaro",
... train_batch_size=32,
... eval_batch_size=32,
... num_workers=8,
... )
>>> datamodule.setup()
>>> i, data = next(enumerate(datamodule.train_dataloader()))
>>> data.keys()
dict_keys(['image_path', 'label', 'image'])