Tabular Datamodule

Custom Tabular Data Module.

This module provides a custom Lightning DataModule built from a table, or a tabular file, that lists image paths and labels.

Example

Create a Tabular datamodule:

>>> from anomalib.data import Tabular
>>> from anomalib.data.utils import LabelName, Split
>>> samples = {
...     "image_path": ["images/image1.png", "images/image2.png", "images/image3.png", ... ],
...     "label_index": [LabelName.NORMAL, LabelName.NORMAL, LabelName.ABNORMAL, ... ],
...     "split": [Split.TRAIN, Split.TRAIN, Split.TEST, ... ],
... }
>>> datamodule = Tabular(
...     name="custom",
...     samples=samples,
...     root="./datasets/custom",
... )
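The samples dict in the example is column-oriented: each key is a column and each list holds one value per image, so every list must have the same length (pandas refuses to build a DataFrame from unequal columns). The hypothetical check_samples helper below is a minimal sketch of that sanity check. It assumes the required columns are the three described for from_file, and it uses plain 0/1 and "train"/"test" values in place of LabelName and Split so that it runs without anomalib installed.

```python
from collections.abc import Mapping, Sequence

# Assumption: required column names taken from the from_file description.
REQUIRED_COLUMNS = ("image_path", "label_index", "split")


def check_samples(samples: Mapping[str, Sequence]) -> int:
    """Hypothetical helper: validate a column-oriented samples dict.

    Returns the number of samples (rows) when every required column is
    present and all columns have the same length; raises ValueError otherwise.
    """
    missing = [name for name in REQUIRED_COLUMNS if name not in samples]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    lengths = {name: len(values) for name, values in samples.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    return lengths["image_path"]


# Plain values stand in for LabelName (0 = normal, 1 = abnormal) and Split.
samples = {
    "image_path": ["images/image1.png", "images/image2.png", "images/image3.png"],
    "label_index": [0, 0, 1],
    "split": ["train", "train", "test"],
}
print(check_samples(samples))  # 3
```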
class anomalib.data.datamodules.image.tabular.Tabular(name, samples, root=None, normal_split_ratio=0.2, train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.FROM_TEST, val_split_ratio=0.5, seed=None)

Bases: AnomalibDataModule

Tabular DataModule.

Parameters:
  • name (str) – Name of the dataset. Used for logging/saving.

  • samples (dict | list | DataFrame) – pandas DataFrame, or a list or dict that pandas can convert to one, containing the dataset information.

  • root (str | Path | None) – Root folder against which relative image (and mask) paths in samples are resolved. Defaults to None.

  • normal_split_ratio (float) – Fraction of normal training images to move to the test set when no normal test images exist. Defaults to 0.2.

  • train_batch_size (int) – Training batch size. Defaults to 32.

  • eval_batch_size (int) – Validation/test batch size. Defaults to 32.

  • num_workers (int) – Number of workers for data loading. Defaults to 8.

  • train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.

  • val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.

  • test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.

  • augmentations (Transform | None) – General augmentations to apply when stage-specific augmentations are not provided. Defaults to None.

  • test_split_mode (TestSplitMode) – Method to obtain test subset. Defaults to TestSplitMode.FROM_DIR.

  • test_split_ratio (float) – Fraction of training images to use for testing. Defaults to 0.2.

  • val_split_mode (ValSplitMode) – Method to obtain validation subset. Defaults to ValSplitMode.FROM_TEST.

  • val_split_ratio (float) – Fraction of images, taken from the subset selected by val_split_mode, to use for validation. Defaults to 0.5.

  • seed (int | None) – Random seed for splitting. Defaults to None.
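The ratio parameters above are fractions of a subset of the data. Which subset each ratio applies to, and how the resulting count is rounded, is decided by the library; the hypothetical split_counts helper below only illustrates the arithmetic, assuming nearest-integer rounding. It uses the documented defaults of 0.2 and 0.5 on made-up image counts.

```python
def split_counts(total: int, ratio: float) -> tuple[int, int]:
    """Hypothetical helper: (remaining, split_off) counts for a split ratio.

    Nearest-integer rounding is an assumption; the library's random split
    may round differently.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    split_off = round(total * ratio)
    return total - split_off, split_off


# 100 normal training images and no normal test images: a ratio of 0.2
# (the documented default) moves about 20 of them to the test set.
train_left, moved_to_test = split_counts(100, 0.2)

# With ValSplitMode.FROM_TEST, a val_split_ratio of 0.5 sends about half
# of a 50-image test set to validation.
test_left, to_val = split_counts(50, 0.5)
print(train_left, moved_to_test, test_left, to_val)  # 80 20 25 25
```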

Example

Create and setup a tabular datamodule:

>>> from anomalib.data import Tabular
>>> from anomalib.data.utils import LabelName, Split
>>> samples = {
...     "image_path": ["images/image1.png", "images/image2.png", "images/image3.png", ... ],
...     "label_index": [LabelName.NORMAL, LabelName.NORMAL, LabelName.ABNORMAL, ... ],
...     "split": [Split.TRAIN, Split.TRAIN, Split.TEST, ... ],
... }
>>> datamodule = Tabular(
...     name="custom",
...     samples=samples,
...     root="./datasets/custom",
... )
>>> datamodule.setup()

Get a batch from train dataloader:

>>> batch = next(iter(datamodule.train_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])

Get a batch from test dataloader:

>>> batch = next(iter(datamodule.test_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
classmethod from_file(name, file_path, file_format=None, pd_kwargs=None, **kwargs)

Create Tabular Datamodule from file.

Parameters:
  • name (str) – Name of the dataset. This is used to name the datamodule, especially when logging/saving.

  • file_path (str | Path) – Path to the tabular file containing the dataset information.

  • file_format (str | None) – File format supported by a pd.read_* method, such as csv, parquet or json. Defaults to None (inferred from the file suffix).

  • pd_kwargs (dict | None) – Keyword argument dictionary for the pd.read_* method. Defaults to None.

  • kwargs (dict) – Additional keyword arguments passed to the Tabular constructor.
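When file_format is None, the format is inferred from the file suffix and mapped to a pd.read_* function. The sketch below illustrates that idea with two hypothetical helpers, infer_file_format and reader_name; it is not the library's own code, and it assumes the suffix, lower-cased and without its dot, is the format name.

```python
from __future__ import annotations

from pathlib import Path


def infer_file_format(file_path: str | Path) -> str:
    """Hypothetical helper: derive a format name from the file suffix."""
    suffix = Path(file_path).suffix.lower().lstrip(".")
    if not suffix:
        raise ValueError(f"cannot infer file format from {file_path!r}")
    return suffix


def reader_name(file_format: str) -> str:
    """Name of the matching pandas reader function, e.g. 'read_csv'."""
    return f"read_{file_format}"


print(reader_name(infer_file_format("./samples.csv")))         # read_csv
print(reader_name(infer_file_format("data/SAMPLES.Parquet")))  # read_parquet
```

With pandas installed, getattr(pd, reader_name(fmt))(file_path, **pd_kwargs) would resolve to pd.read_csv, pd.read_parquet or pd.read_json for the three formats named above. A suffix with no matching reader (for example .txt) would fail, which is one reason to pass file_format explicitly.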

Returns:

Tabular Datamodule

Return type:

Tabular

Example

Prepare a tabular file (such as samples.csv or samples.parquet) with the following columns:
  • image_path – absolute path, or path relative to root.
  • label_index – 0 for normal, 1 for anomalous samples.
  • split – train or test.
For segmentation tasks, also include a mask_path column.
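A file with these columns can be produced with the standard library alone. The minimal sketch below uses csv.DictWriter; the image file names and the samples_to_csv_text helper are hypothetical, and the paths are assumed to be relative to root.

```python
import csv
import io

# Hypothetical file names, relative to the datamodule's root folder.
rows = [
    {"image_path": "images/normal_000.png", "label_index": 0, "split": "train"},
    {"image_path": "images/normal_001.png", "label_index": 0, "split": "test"},
    {"image_path": "images/defect_000.png", "label_index": 1, "split": "test"},
]


def samples_to_csv_text(rows: list[dict]) -> str:
    """Serialize sample rows to CSV text with the columns described above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["image_path", "label_index", "split"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


text = samples_to_csv_text(rows)
print(text)
# To use it with from_file, save it to disk, e.g.:
# pathlib.Path("./samples.csv").write_text(text)
```

For segmentation, add "mask_path" to fieldnames and to each row; DictWriter raises ValueError for row keys that are not listed in fieldnames.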

From this file, create and setup a tabular datamodule:

>>> from anomalib.data import Tabular
>>> datamodule = Tabular.from_file(
...     name="custom",
...     file_path="./samples.csv",
...     root="./datasets/custom",
... )
>>> datamodule.setup()

Get a batch from train dataloader:

>>> batch = next(iter(datamodule.train_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])

Get a batch from test dataloader:

>>> batch = next(iter(datamodule.test_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
property name: str

Get name of the datamodule.

Returns:

Name of the datamodule.

See also

../../datasets/image/tabular - Tabular Dataset