Tabular Datamodule

Custom Tabular Data Module.

This script creates a custom Lightning DataModule from a table or tabular file containing image paths and labels.

Example

Create a Tabular datamodule:

>>> from anomalib.data import Tabular
>>> from anomalib.data.utils import LabelName, Split
>>> samples = {
...     "image_path": ["images/image1.png", "images/image2.png", "images/image3.png", ... ],
...     "label_index": [LabelName.NORMAL, LabelName.NORMAL, LabelName.ABNORMAL,  ... ],
...     "split": [Split.TRAIN, Split.TRAIN, Split.TEST, ... ],
... }
>>> datamodule = Tabular(
...     name="custom",
...     samples=samples,
...     root="./datasets/custom",
... )
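The samples table can also be assembled row by row and then turned into the column layout shown above. The following is a minimal sketch using only the standard library; `rows_to_columns` and `REQUIRED_COLUMNS` are hypothetical names that are not part of anomalib, and the plain `0`/`1` and `"train"`/`"test"` values follow the file-based column description rather than the enum values used in the example above. Whether `Tabular` accepts these plain values for in-memory samples is an assumption. Optional columns such as `mask_path` are left out for brevity.

```python
# Hypothetical helper, not part of anomalib: converts per-image records
# into the dict-of-columns layout used in the example above.
REQUIRED_COLUMNS = ("image_path", "label_index", "split")


def rows_to_columns(rows):
    """Turn a list of per-image records into a dict of equal-length columns."""
    columns = {name: [] for name in REQUIRED_COLUMNS}
    for index, row in enumerate(rows):
        missing = [name for name in REQUIRED_COLUMNS if name not in row]
        if missing:
            raise KeyError(f"row {index} is missing columns: {missing}")
        for name in REQUIRED_COLUMNS:
            columns[name].append(row[name])
    return columns


rows = [
    {"image_path": "images/image1.png", "label_index": 0, "split": "train"},
    {"image_path": "images/image2.png", "label_index": 0, "split": "train"},
    {"image_path": "images/image3.png", "label_index": 1, "split": "test"},
]
samples = rows_to_columns(rows)
print(samples["split"])  # ['train', 'train', 'test']
```

Checking for missing columns up front gives a clearer error than one raised later during setup.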

class anomalib.data.datamodules.image.tabular.Tabular(name, samples, root=None, normal_split_ratio=0.2, train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.FROM_TEST, val_split_ratio=0.5, seed=None)

Bases: AnomalibDataModule

Tabular DataModule.

Parameters:

name (str) – Name of the dataset. Used for logging/saving.
samples (dict | list | DataFrame) – Pandas DataFrame or compatible list or dict containing the dataset information.
root (str | Path | None) – Root folder against which relative image paths in samples are resolved. Defaults to None.
normal_split_ratio (float) – Ratio to split normal training images for test set when no normal test images exist. Defaults to 0.2.
train_batch_size (int) – Training batch size. Defaults to 32.
eval_batch_size (int) – Validation/test batch size. Defaults to 32.
num_workers (int) – Number of workers for data loading. Defaults to 8.
train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.
val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.
test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.
augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided. Defaults to None.
test_split_mode (TestSplitMode) – Method to obtain test subset. Defaults to TestSplitMode.FROM_DIR.
test_split_ratio (float) – Fraction of training images held out for testing. Defaults to 0.2.
val_split_mode (ValSplitMode) – Method to obtain validation subset. Defaults to ValSplitMode.FROM_TEST.
val_split_ratio (float) – Fraction of images for validation. Defaults to 0.5.
seed (int | None) – Random seed for splitting. Defaults to None.
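To make the ratio parameters concrete, the sketch below shows what a ratio of 0.5 means when a validation subset is carved out of a test set, and why a fixed seed makes the split reproducible. It is only an illustration: `split_off` is a hypothetical helper, and anomalib's own splitting logic may handle labels and rounding differently.

```python
import random


def split_off(items, ratio, seed=None):
    """Randomly move a fraction `ratio` of `items` into a second subset.

    Hypothetical helper for illustration; not anomalib's implementation.
    """
    rng = random.Random(seed)  # a fixed seed gives the same split every run
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_moved = int(len(shuffled) * ratio)
    return shuffled[n_moved:], shuffled[:n_moved]


test_images = [f"test_{i}.png" for i in range(10)]

# With a ratio of 0.5, half of the test images become validation images.
remaining_test, validation = split_off(test_images, ratio=0.5, seed=42)
print(len(remaining_test), len(validation))  # 5 5
```

With `seed=None`, repeated runs can produce different subsets, which matters when comparing results across experiments.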

Example

Create and setup a tabular datamodule:

>>> from anomalib.data import Tabular
>>> from anomalib.data.utils import LabelName, Split
>>> samples = {
...     "image_path": ["images/image1.png", "images/image2.png", "images/image3.png", ... ],
...     "label_index": [LabelName.NORMAL, LabelName.NORMAL, LabelName.ABNORMAL,  ... ],
...     "split": [Split.TRAIN, Split.TRAIN, Split.TEST, ... ],
... }
>>> datamodule = Tabular(
...     name="custom",
...     samples=samples,
...     root="./datasets/custom",
... )
>>> datamodule.setup()

Get a batch from train dataloader:

>>> batch = next(iter(datamodule.train_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])

Get a batch from test dataloader:

>>> batch = next(iter(datamodule.test_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])

classmethod from_file(name, file_path, file_format=None, pd_kwargs=None, **kwargs)

Create Tabular Datamodule from file.

Parameters:

name (str) – Name of the dataset. This is used to name the datamodule, especially when logging/saving.
file_path (str | Path) – Path to tabular file containing the dataset information.
file_format (str | None) – File format supported by a pd.read_* method, such as csv, parquet or json. Defaults to None (inferred from file suffix).
pd_kwargs (dict | None) – Keyword argument dictionary for the pd.read_* method. Defaults to None.
kwargs (dict) – Additional keyword arguments for the Tabular Datamodule class.
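The way a file format maps to a pd.read_* method can be sketched as follows. This is an assumption about how such inference might work, not anomalib's actual code; `infer_file_format` and `reader_name` are hypothetical helpers that only compute the name of the pandas reader and do not import pandas.

```python
from pathlib import Path


def infer_file_format(file_path, file_format=None):
    """Return the explicit format if given, else the lowercase file suffix."""
    if file_format is not None:
        return file_format.lower().lstrip(".")
    suffix = Path(file_path).suffix
    if not suffix:
        raise ValueError(f"cannot infer a file format from {file_path!r}")
    return suffix.lstrip(".").lower()


def reader_name(file_path, file_format=None):
    """Name of the pandas reader that would handle the file, e.g. read_csv."""
    return f"read_{infer_file_format(file_path, file_format)}"


print(reader_name("samples.csv"))              # read_csv
print(reader_name("data/samples.parquet"))     # read_parquet
print(reader_name("samples.txt", "csv"))       # read_csv
```

Passing file_format explicitly is useful when the suffix does not match the content, as in the last call.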

Returns:

Tabular Datamodule

Return type:

Tabular

Example

Prepare a tabular file (such as samples.csv or samples.parquet) with the following columns: image_path (absolute or relative to root), label_index (0 for normal, 1 for anomalous samples), and split (train or test). For segmentation tasks, also include a mask_path column.
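A file with these columns could be produced with Python's csv module, as in the minimal sketch below. `write_samples_csv` is a hypothetical helper, the image paths are placeholders, and the file is written to a temporary directory here; in practice the target would be a path such as ./samples.csv.

```python
import csv
import tempfile
from pathlib import Path

FIELDNAMES = ["image_path", "label_index", "split"]


def write_samples_csv(file_path, rows):
    """Write one CSV row per image, with a header naming the columns."""
    with Path(file_path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {"image_path": "images/image1.png", "label_index": 0, "split": "train"},
    {"image_path": "images/image2.png", "label_index": 0, "split": "train"},
    {"image_path": "images/image3.png", "label_index": 1, "split": "test"},
]

# Temporary location for the sketch; replace with the real target path.
csv_path = Path(tempfile.mkdtemp()) / "samples.csv"
write_samples_csv(csv_path, rows)
print(csv_path.read_text(encoding="utf-8"))
```

For segmentation tasks, a mask_path entry would be added to `FIELDNAMES` and to each row.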

From this file, create and setup a tabular datamodule:

>>> from anomalib.data import Tabular
>>> datamodule = Tabular.from_file(
...     name="custom",
...     file_path="./samples.csv",
...     root="./datasets/custom",
... )
>>> datamodule.setup()

Get a batch from train dataloader:

>>> batch = next(iter(datamodule.train_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])

Get a batch from test dataloader:

>>> batch = next(iter(datamodule.test_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])

property name: str

Get name of the datamodule.

Returns:

Name of the datamodule.