Tabular Datamodule
Custom Tabular Data Module.
This script creates a custom Lightning DataModule from a table or tabular file containing image paths and labels.
Example
Create a Tabular datamodule:
>>> from anomalib.data import Tabular
>>> from anomalib.data.utils import LabelName, Split
>>> samples = {
... "image_path": ["images/image1.png", "images/image2.png", "images/image3.png", ... ],
... "label_index": [LabelName.NORMAL, LabelName.NORMAL, LabelName.ABNORMAL, ... ],
... "split": [Split.TRAIN, Split.TRAIN, Split.TEST, ... ],
... }
>>> datamodule = Tabular(
... name="custom",
... samples=samples,
... root="./datasets/custom",
... )
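The `samples` argument is documented as accepting a dict, a list, or a DataFrame. The sketch below reshapes the column-oriented dict from the example into row-oriented records, one dict per image. It assumes that a "compatible list" means anything the pandas DataFrame constructor accepts, which includes a list of row dictionaries. Plain integers and strings stand in for the `LabelName` and `Split` members, and `to_records` is a hypothetical helper, not part of anomalib.

```python
# Column-oriented layout, mirroring the example above. Plain ints and strings
# stand in for the LabelName and Split enum members (an assumption).
columns = {
    "image_path": ["images/image1.png", "images/image2.png", "images/image3.png"],
    "label_index": [0, 0, 1],
    "split": ["train", "train", "test"],
}


def to_records(columns):
    """Turn {column: [values]} into [{column: value}, ...], one dict per image."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


records = to_records(columns)
```

Either layout describes the same table; the row-oriented form is often easier to build when samples are collected one image at a time.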
- class anomalib.data.datamodules.image.tabular.Tabular(name, samples, root=None, normal_split_ratio=0.2, train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.FROM_TEST, val_split_ratio=0.5, seed=None)
Bases: AnomalibDataModule
Tabular DataModule.
- Parameters:
name (str) – Name of the dataset. Used for logging/saving.
samples (dict | list | DataFrame) – Pandas DataFrame or compatible list or dict containing the dataset information.
root (str | Path | None) – Root folder containing normal and abnormal directories. Defaults to None.
normal_split_ratio (float) – Ratio to split normal training images for test set when no normal test images exist. Defaults to 0.2.
train_batch_size (int) – Training batch size. Defaults to 32.
eval_batch_size (int) – Validation/test batch size. Defaults to 32.
num_workers (int) – Number of workers for data loading. Defaults to 8.
train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.
val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.
test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.
augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided.
test_split_mode (TestSplitMode) – Method to obtain test subset. Defaults to TestSplitMode.FROM_DIR.
test_split_ratio (float) – Fraction of train images for testing. Defaults to 0.2.
val_split_mode (ValSplitMode) – Method to obtain validation subset. Defaults to ValSplitMode.FROM_TEST.
val_split_ratio (float) – Fraction of images for validation. Defaults to 0.5.
seed (int | None) – Random seed for splitting. Defaults to None.
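To make the ratio parameters concrete, the following is a minimal sketch of a seeded fractional split, such as carving a validation subset out of the test set with a ratio of 0.5. It is an illustration only and not anomalib's implementation, which may differ (for example by splitting each label separately). `split_fraction` and the file names are hypothetical.

```python
import random


def split_fraction(items, ratio, seed=None):
    """Shuffle a copy of items and hold out int(len(items) * ratio) of them.

    Returns (remaining, held_out). The same seed gives the same split.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_held_out = int(len(shuffled) * ratio)
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Hypothetical test-set file names, used only for illustration.
test_paths = [f"images/test_{i}.png" for i in range(10)]

# Hold out half of the test images for validation, reproducibly.
test_subset, val_subset = split_fraction(test_paths, ratio=0.5, seed=42)
```

Fixing the seed makes the split reproducible across runs; in this sketch, leaving it as `None` gives a different split each time.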
Example
Create and set up a tabular datamodule:
>>> from anomalib.data import Tabular
>>> from anomalib.data.utils import LabelName, Split
>>> samples = {
... "image_path": ["images/image1.png", "images/image2.png", "images/image3.png", ... ],
... "label_index": [LabelName.NORMAL, LabelName.NORMAL, LabelName.ABNORMAL, ... ],
... "split": [Split.TRAIN, Split.TRAIN, Split.TEST, ... ],
... }
>>> datamodule = Tabular(
... name="custom",
... samples=samples,
... root="./datasets/custom",
... )
>>> datamodule.setup()
Get a batch from train dataloader:
>>> batch = next(iter(datamodule.train_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
Get a batch from test dataloader:
>>> batch = next(iter(datamodule.test_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
- classmethod from_file(name, file_path, file_format=None, pd_kwargs=None, **kwargs)
Create Tabular Datamodule from file.
- Parameters:
name (str) – Name of the dataset. This is used to name the datamodule, especially when logging/saving.
file_path (str | Path) – Path to tabular file containing the dataset information.
file_format (str) – File format supported by a pd.read_* method, such as csv, parquet or json. Defaults to None (inferred from file suffix).
pd_kwargs (dict | None) – Keyword argument dictionary for the pd.read_* method. Defaults to None.
kwargs (dict) – Additional keyword arguments for the Tabular Datamodule class.
- Returns:
Tabular Datamodule
- Return type:
Tabular
Example
Prepare a tabular file (such as samples.csv or samples.parquet) with the following columns: image_path (absolute or relative to root), label_index (0 for normal, 1 for anomalous samples), and split (train or test). For segmentation tasks, also include a mask_path column.
From this file, create and set up a tabular datamodule:
>>> from anomalib.data import Tabular
>>> datamodule = Tabular.from_file(
... name="custom",
... file_path="./samples.csv",
... root="./datasets/custom",
... )
>>> datamodule.setup()
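The call above passes no `file_format`, so the format is inferred from the file suffix. A minimal sketch of what such inference could look like is shown here; it is not anomalib's actual code, and `infer_file_format` and `reader_name` are hypothetical helpers. The only grounded facts are that the format maps to a `pd.read_*` method and that the suffix is used when `file_format` is `None`.

```python
from pathlib import Path


def infer_file_format(file_path, file_format=None):
    """Return the explicit format if given, else the lower-cased suffix without the dot."""
    if file_format is not None:
        return file_format
    suffix = Path(file_path).suffix
    if not suffix:
        raise ValueError(f"cannot infer a file format from {file_path!r}")
    return suffix.lstrip(".").lower()


def reader_name(file_path, file_format=None):
    """Name of the pandas reader that would be looked up, e.g. 'read_csv'."""
    return f"read_{infer_file_format(file_path, file_format)}"


# The suffix decides the reader unless file_format overrides it.
csv_reader = reader_name("./samples.csv")
parquet_reader = reader_name("samples.parquet")
override_reader = reader_name("samples.txt", file_format="csv")
```

Passing `file_format` explicitly is useful when the suffix does not match a pandas reader name, as with the `.txt` file in the last line.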
Get a batch from train dataloader:
>>> batch = next(iter(datamodule.train_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
Get a batch from test dataloader:
>>> batch = next(iter(datamodule.test_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
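A tabular file with the columns described in this example could be produced with the standard library alone. The sketch below writes a three-row `samples.csv` into a temporary directory. The image paths are placeholders, `write_samples_csv` is a hypothetical helper, and the `mask_path` column for segmentation tasks is left out.

```python
import csv
import tempfile
from pathlib import Path

# One row per image: label_index is 0 for normal and 1 for anomalous samples,
# split is "train" or "test". The paths are placeholders.
rows = [
    {"image_path": "images/image1.png", "label_index": 0, "split": "train"},
    {"image_path": "images/image2.png", "label_index": 0, "split": "train"},
    {"image_path": "images/image3.png", "label_index": 1, "split": "test"},
]


def write_samples_csv(path, rows):
    """Write the rows to a CSV file with a header line."""
    fieldnames = ["image_path", "label_index", "split"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


csv_path = Path(tempfile.mkdtemp()) / "samples.csv"
write_samples_csv(csv_path, rows)
```

The resulting path can then be passed as `file_path` to `Tabular.from_file`, with `root` pointing at the folder that the relative image paths resolve against.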
See also
../../datasets/image/tabular - Tabular Dataset