Image Base Datamodule

Base Anomalib data module.

This module provides the base data module class used across Anomalib. It handles dataset splitting, validation set creation, and dataloader configuration.

The module contains:

  • AnomalibDataModule – Base class for all Anomalib data modules.

Example

Create a datamodule from a config file:

>>> from anomalib.data import AnomalibDataModule
>>> data_config = "examples/configs/data/mvtec.yaml"
>>> datamodule = AnomalibDataModule.from_config(config_path=data_config)

Override config with additional arguments:

>>> override_kwargs = {"data.train_batch_size": 8}
>>> datamodule = AnomalibDataModule.from_config(
...     config_path=data_config,
...     **override_kwargs
... )
class anomalib.data.datamodules.base.image.AnomalibDataModule(train_batch_size, eval_batch_size, num_workers, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, val_split_mode=None, val_split_ratio=None, test_split_mode=None, test_split_ratio=None, seed=None)

Bases: LightningDataModule, ABC

Base Anomalib data module.

This class extends PyTorch Lightning’s LightningDataModule to provide common functionality for anomaly detection datasets.

Parameters:
  • train_batch_size (int) – Batch size used by the train dataloader.

  • eval_batch_size (int) – Batch size used by the val and test dataloaders.

  • num_workers (int) – Number of workers used by the train, val and test dataloaders.

  • train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.

  • val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.

  • test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.

  • augmentations (Transform | None) – General augmentations, applied to any stage for which no stage-specific augmentations are provided. Defaults to None.

  • val_split_mode (ValSplitMode | str | None) –

    Method used to obtain the validation set. Options:

    • none: No validation set

    • same_as_test: Use the test set as the validation set

    • from_test: Sample the validation set from the test set

    • synthetic: Generate synthetic anomalies

    Defaults to None.

  • val_split_ratio (float | None) – Fraction of data to use for validation. Defaults to None.

  • test_split_mode (TestSplitMode | str | None) –

    Method to obtain test set. Options:

    • none: No test split

    • from_dir: Use separate test directory

    • synthetic: Generate synthetic anomalies

    Defaults to None.

  • test_split_ratio (float | None) – Fraction of data to use for testing. Defaults to None.

  • seed (int | None) – Random seed for reproducible splitting. Defaults to None.
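The augmentations parameter acts as a fallback for the three stage-specific parameters. The following is a minimal sketch of that fallback rule, not the library's code: transforms are modelled as plain callables, and resolve_augmentations, flip and identity are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a torchvision-style Transform: any callable.
Transform = Callable[[object], object]


def resolve_augmentations(
    stage_augmentations: Optional[Transform],
    general_augmentations: Optional[Transform],
) -> Optional[Transform]:
    """Return the stage-specific transform, falling back to the general one."""
    if stage_augmentations is not None:
        return stage_augmentations
    return general_augmentations


def flip(x):
    # Hypothetical "augmentation": reverse a sequence.
    return x[::-1]


def identity(x):
    return x


# Only the general `augmentations` is set, so the train stage falls back to it.
train_aug = resolve_augmentations(None, flip)
# A stage-specific transform takes precedence over the general one.
val_aug = resolve_augmentations(identity, flip)
# Neither is set, so no augmentation is applied for this stage.
test_aug = resolve_augmentations(None, None)
```

Resolving each stage independently means a single augmentations argument can cover every stage while still allowing, for example, the test stage to be left unaugmented.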

property category: str

Get dataset category name.

Returns:

Name of the current category

Return type:

str

classmethod from_config(config_path, **kwargs)

Create datamodule instance from config file.

Parameters:
  • config_path (str | Path) – Path to the config file.

  • **kwargs – Config values to override, keyed by dotted config path (for example "data.train_batch_size").

Returns:

Instantiated datamodule

Return type:

AnomalibDataModule

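Raises:

FileNotFoundError – If the config file does not exist.

ValueError – If the instantiated object is not an AnomalibDataModule.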

Example

Load from config file:

>>> config_path = "examples/configs/data/mvtec.yaml"
>>> datamodule = AnomalibDataModule.from_config(config_path)

Override config values:

>>> datamodule = AnomalibDataModule.from_config(
...     config_path,
...     **{"data.train_batch_size": 8}
... )
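Override keys such as "data.train_batch_size" contain a dot, so they are not valid Python identifiers and must be passed through dictionary unpacking rather than as plain keyword arguments. The sketch below illustrates one way such overrides could be forwarded, assuming from_config turns each key/value pair into a `--key value` argument for a jsonargparse-style command-line parser. build_override_args is a hypothetical helper, not part of the Anomalib API.

```python
def build_override_args(config_path, **overrides):
    """Turn keyword overrides into CLI-style arguments (hypothetical sketch)."""
    args = ["--data", str(config_path)]
    for key, value in overrides.items():
        # Dotted keys such as "data.train_batch_size" address nested config fields.
        args.extend([f"--{key}", str(value)])
    return args


# Dotted keys are not valid identifiers, so pass them via dict unpacking.
args = build_override_args(
    "examples/configs/data/mvtec.yaml",
    **{"data.train_batch_size": 8},
)
print(args)
```

Under this assumption, an underscore spelling such as data_train_batch_size=8 would produce a `--data_train_batch_size` argument that does not match any nested config field.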
property name: str

Name of the datamodule.

Returns:

Class name of the datamodule

Return type:

str
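Because name reports the class name, a concrete subclass reports its own name rather than that of the base class. The following minimal sketch shows this behaviour; BaseDataModuleSketch and MVTecSketch are hypothetical stand-ins, not Anomalib classes.

```python
class BaseDataModuleSketch:
    """Hypothetical stand-in for AnomalibDataModule, reduced to the `name` property."""

    @property
    def name(self) -> str:
        # Report the concrete subclass name, not the base class name.
        return self.__class__.__name__


class MVTecSketch(BaseDataModuleSketch):
    """Hypothetical concrete datamodule."""


print(MVTecSketch().name)  # MVTecSketch
```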

predict_dataloader()

Get prediction dataloader.

By default, this returns the test dataloader.

Returns:

Prediction dataloader

Return type:

DataLoader
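The default prediction behaviour amounts to simple delegation. The sketch below uses a plain iterator in place of a real DataLoader to keep it self-contained; DataModuleSketch and its test_items attribute are hypothetical.

```python
class DataModuleSketch:
    """Hypothetical datamodule showing predict_dataloader delegating to test_dataloader."""

    def __init__(self, test_items):
        self.test_items = list(test_items)

    def test_dataloader(self):
        # Stand-in for a real DataLoader: an iterator over the test items.
        return iter(self.test_items)

    def predict_dataloader(self):
        # Default behaviour: reuse the test dataloader for prediction.
        return self.test_dataloader()


dm = DataModuleSketch(["img_000.png", "img_001.png"])
print(list(dm.predict_dataloader()))  # ['img_000.png', 'img_001.png']
```

A subclass that needs a separate prediction set can override predict_dataloader while leaving test_dataloader unchanged.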

setup(stage=None)

Set up train, validation and test data.

This method applies the data splitting logic selected by test_split_mode and val_split_mode.

Parameters:

stage (str | None) – Current stage (fit/validate/test/predict). Defaults to None.

Return type:

None
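The validation split modes listed in the parameters can be read as a small dispatch over val_split_mode, with val_split_ratio and seed controlling the from_test case. The following is a simplified sketch under stated assumptions, not the library's implementation. ValSplitModeSketch and make_val_split are hypothetical names, items are plain values instead of dataset samples, from_test is a plain seeded random split (the real split may, for example, take image labels into account), and synthetic is not implemented.

```python
import random
from enum import Enum


class ValSplitModeSketch(str, Enum):
    """Hypothetical mirror of the documented val_split_mode options."""

    NONE = "none"
    SAME_AS_TEST = "same_as_test"
    FROM_TEST = "from_test"
    SYNTHETIC = "synthetic"


def make_val_split(test_items, mode, ratio=0.5, seed=None):
    """Return (remaining_test_items, val_items) for the given mode."""
    mode = ValSplitModeSketch(mode)  # accepts the enum member or its string value
    if mode is ValSplitModeSketch.NONE:
        return list(test_items), []
    if mode is ValSplitModeSketch.SAME_AS_TEST:
        return list(test_items), list(test_items)
    if mode is ValSplitModeSketch.FROM_TEST:
        items = list(test_items)
        random.Random(seed).shuffle(items)  # seeded RNG makes the split reproducible
        n_val = int(len(items) * ratio)
        return items[n_val:], items[:n_val]
    raise NotImplementedError("synthetic anomaly generation is out of scope for this sketch")


test, val = make_val_split(range(10), "from_test", ratio=0.3, seed=42)
print(len(test), len(val))  # 7 3
```

Passing the same seed yields the same partition on every call, which is the reproducibility that the seed parameter is meant to provide.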

property task: TaskType

Get the task type.

Returns:

Type of anomaly task (classification/segmentation)

Return type:

TaskType

Raises:

AttributeError – If no datasets have been set up yet
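One way to read this property is that it forwards the task of whichever dataset is already available and fails loudly otherwise. The sketch below makes that concrete under explicit assumptions: the train_data and test_data attribute names, DatasetSketch, DataModuleSketch and TaskTypeSketch are all hypothetical, and TaskTypeSketch covers only the two task types named above.

```python
from enum import Enum


class TaskTypeSketch(str, Enum):
    """Hypothetical mirror of TaskType with the two values named above."""

    CLASSIFICATION = "classification"
    SEGMENTATION = "segmentation"


class DatasetSketch:
    """Hypothetical dataset that only records its task type."""

    def __init__(self, task: TaskTypeSketch) -> None:
        self.task = task


class DataModuleSketch:
    """Hypothetical datamodule; `train_data`/`test_data` are assumed attribute names."""

    def __init__(self) -> None:
        self.train_data = None
        self.test_data = None

    @property
    def task(self) -> TaskTypeSketch:
        # Use the first dataset that has been set up.
        for dataset in (self.train_data, self.test_data):
            if dataset is not None:
                return dataset.task
        raise AttributeError("No datasets have been set up yet; call setup() first.")


dm = DataModuleSketch()
try:
    dm.task
except AttributeError as err:
    print(err)
dm.test_data = DatasetSketch(TaskTypeSketch.SEGMENTATION)
print(dm.task.value)  # segmentation
```

Raising AttributeError, rather than returning a default, also makes hasattr(dm, "task") return False before setup, which callers can use as a readiness check.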

test_dataloader()

Get test dataloader.

Returns:

Test dataloader

Return type:

DataLoader

train_dataloader()

Get training dataloader.

Returns:

Training dataloader

Return type:

DataLoader

val_dataloader()

Get validation dataloader.

Returns:

Validation dataloader

Return type:

DataLoader
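Per the constructor parameters, train_batch_size applies to the training dataloader, eval_batch_size to the validation and test dataloaders, and num_workers to all three. The sketch below expresses that mapping as keyword arguments accepted by torch.utils.data.DataLoader (batch_size, shuffle, num_workers). The helper dataloader_kwargs is hypothetical, and shuffling only the training data is an assumption of this sketch rather than documented behaviour.

```python
def dataloader_kwargs(stage, train_batch_size, eval_batch_size, num_workers):
    """Return keyword arguments a datamodule might pass to torch.utils.data.DataLoader."""
    if stage == "train":
        # Shuffling the training data is an assumption of this sketch.
        return {"batch_size": train_batch_size, "shuffle": True, "num_workers": num_workers}
    if stage in ("val", "test"):
        # Evaluation loaders share eval_batch_size and keep a fixed order.
        return {"batch_size": eval_batch_size, "shuffle": False, "num_workers": num_workers}
    raise ValueError(f"Unknown stage: {stage!r}")


for stage in ("train", "val", "test"):
    print(stage, dataloader_kwargs(stage, train_batch_size=32, eval_batch_size=8, num_workers=4))
```

Keeping a separate eval_batch_size allows evaluation to use a different batch size than training, since no gradients are stored during validation and testing.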