Image Base Datamodule
Base Anomalib data module.
This module provides the base data module class used across Anomalib. It handles dataset splitting, validation set creation, and dataloader configuration.
- The module contains:
AnomalibDataModule: Base class for all Anomalib data modules
Example
Create a datamodule from a config file:
>>> from anomalib.data import AnomalibDataModule
>>> data_config = "examples/configs/data/mvtec.yaml"
>>> datamodule = AnomalibDataModule.from_config(config_path=data_config)
Override config with additional arguments:
>>> override_kwargs = {"data.train_batch_size": 8}
>>> datamodule = AnomalibDataModule.from_config(
... config_path=data_config,
... **override_kwargs
... )
- class anomalib.data.datamodules.base.image.AnomalibDataModule(train_batch_size, eval_batch_size, num_workers, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, val_split_mode=None, val_split_ratio=None, test_split_mode=None, test_split_ratio=None, seed=None)
Bases: LightningDataModule, ABC
Base Anomalib data module.
This class extends PyTorch Lightning’s LightningDataModule to provide common functionality for anomaly detection datasets.
- Parameters:
train_batch_size (int) – Batch size used by the train dataloader.
eval_batch_size (int) – Batch size used by the val and test dataloaders.
num_workers (int) – Number of workers used by the train, val and test dataloaders.
train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.
val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.
test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.
augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided.
val_split_mode (ValSplitMode | str) – Method to obtain validation set. Options:
- none: No validation set
- same_as_test: Use test set as validation
- from_test: Sample from test set
- synthetic: Generate synthetic anomalies
val_split_ratio (float) – Fraction of data to use for validation.
test_split_mode (TestSplitMode | str | None) – Method to obtain test set. Defaults to None. Options:
- none: No test split
- from_dir: Use separate test directory
- synthetic: Generate synthetic anomalies
test_split_ratio (float | None) – Fraction of data to use for testing. Defaults to None.
seed (int | None) – Random seed for reproducible splitting. Defaults to None.
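To make the interplay of a split ratio and a seed concrete, the following is a minimal sketch of a seeded, ratio-based split. It is not Anomalib's implementation: the helper `split_by_ratio` and the sample file names are hypothetical and use only the Python standard library.

```python
import random


def split_by_ratio(samples, ratio, seed=None):
    """Return (kept, held_out), where held_out holds roughly `ratio` of the samples.

    Hypothetical helper for illustration only; not part of Anomalib.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_held_out = int(round(len(samples) * ratio))
    held_out = [samples[i] for i in indices[:n_held_out]]
    kept = [samples[i] for i in indices[n_held_out:]]
    return kept, held_out


# Hypothetical file names standing in for test images.
test_images = [f"img_{i:03d}.png" for i in range(10)]
test_set, val_set = split_by_ratio(test_images, ratio=0.5, seed=42)

# The same seed reproduces the same partition.
test_again, val_again = split_by_ratio(test_images, ratio=0.5, seed=42)
```

Fixing the seed is what makes the split reproducible across runs; with `seed=None` the partition would generally differ each time.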
- property category: str
Get dataset category name.
- Returns:
Name of the current category
- Return type:
str
- classmethod from_config(config_path, **kwargs)
Create datamodule instance from config file.
- Parameters:
config_path (str | Path) – Path to config file
**kwargs – Additional args to override config
- Returns:
Instantiated datamodule
- Return type:
AnomalibDataModule
- Raises:
FileNotFoundError – If config file not found
ValueError – If instantiated object is not AnomalibDataModule
Example
Load from config file:
>>> config_path = "examples/configs/data/mvtec.yaml"
>>> datamodule = AnomalibDataModule.from_config(config_path)
Override config values:
>>> datamodule = AnomalibDataModule.from_config(
... config_path,
... data_train_batch_size=8
... )
- predict_dataloader()
Get prediction dataloader.
By default uses the test dataloader.
- Returns:
Prediction dataloader
- Return type:
DataLoader
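The fallback described above can be pictured with a small sketch. The class `SketchDataModule` is a hypothetical stand-in, not Anomalib code, and it models a dataloader as a plain list of batches so the example runs without PyTorch.

```python
class SketchDataModule:
    """Hypothetical stand-in for a datamodule; not Anomalib code."""

    def __init__(self, test_samples, eval_batch_size):
        self.test_samples = list(test_samples)
        self.eval_batch_size = eval_batch_size

    def test_dataloader(self):
        # Return the test samples in fixed-size batches, in order (no shuffling).
        size = self.eval_batch_size
        return [
            self.test_samples[i:i + size]
            for i in range(0, len(self.test_samples), size)
        ]

    def predict_dataloader(self):
        # Fall back to the test dataloader unless a subclass overrides this method.
        return self.test_dataloader()


dm = SketchDataModule(test_samples=range(5), eval_batch_size=2)
batches = dm.predict_dataloader()
```

A subclass that needs a dedicated prediction set would override `predict_dataloader()` and leave `test_dataloader()` unchanged.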
- setup(stage=None)
Set up train, validation and test data.
This method handles the data splitting logic based on the configured modes.
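As a rough illustration of mode-based splitting, the sketch below dispatches on the validation split options listed for `val_split_mode`. It is an assumption about the general shape of such logic, not the library's actual `setup` code: the enum here is a local stand-in that only mirrors the documented option names, and `resolve_val_set` is a hypothetical helper.

```python
import random
from enum import Enum


class ValSplitMode(str, Enum):
    """Local stand-in mirroring the documented option names; not imported from Anomalib."""

    NONE = "none"
    SAME_AS_TEST = "same_as_test"
    FROM_TEST = "from_test"
    SYNTHETIC = "synthetic"


def resolve_val_set(test_samples, mode, ratio=0.5, seed=None):
    """Return (test, val) lists for the given mode. Hypothetical helper."""
    mode = ValSplitMode(mode)  # accepts either the enum member or its string value
    if mode is ValSplitMode.NONE:
        return list(test_samples), []
    if mode is ValSplitMode.SAME_AS_TEST:
        # Validation reuses the full test set.
        return list(test_samples), list(test_samples)
    if mode is ValSplitMode.FROM_TEST:
        # Move a seeded random fraction of the test set into validation.
        shuffled = list(test_samples)
        random.Random(seed).shuffle(shuffled)
        n_val = int(len(shuffled) * ratio)
        return shuffled[n_val:], shuffled[:n_val]
    # Synthetic anomaly generation is out of scope for this sketch.
    raise NotImplementedError(f"mode {mode.value!r} is not covered by this sketch")


samples = list(range(10))
test_a, val_a = resolve_val_set(samples, "from_test", ratio=0.3, seed=0)
test_b, val_b = resolve_val_set(samples, ValSplitMode.SAME_AS_TEST)
test_c, val_c = resolve_val_set(samples, "none")
```

Note that `from_test` shrinks the test set, whereas `same_as_test` leaves it intact and evaluates on the same samples twice.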
- property task: TaskType
Get the task type.
- Returns:
Type of anomaly task (classification/segmentation)
- Return type:
TaskType
- Raises:
AttributeError – If no datasets have been set up yet
- test_dataloader()
Get test dataloader.
- Returns:
Test dataloader
- Return type:
DataLoader
- train_dataloader()
Get training dataloader.
- Returns:
Training dataloader
- Return type:
DataLoader
- val_dataloader()
Get validation dataloader.
- Returns:
Validation dataloader
- Return type:
DataLoader