Tabular Datamodule
Custom Tabular Data Module.
This script creates a custom Lightning DataModule from a table or tabular file containing image paths and labels.
Example
Create a Tabular datamodule:
>>> from anomalib.data import Tabular
>>> from anomalib.data.utils import LabelName, Split
>>> samples = {
... "image_path": ["images/image1.png", "images/image2.png", "images/image3.png", ... ],
... "label_index": [LabelName.NORMAL, LabelName.NORMAL, LabelName.ABNORMAL, ... ],
... "split": [Split.TRAIN, Split.TRAIN, Split.TEST, ... ],
... }
>>> datamodule = Tabular(
... name="custom",
... samples=samples,
... root="./datasets/custom",
... )
- class anomalib.data.datamodules.image.tabular.Tabular(name, samples, root=None, normal_split_ratio=0.2, train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.FROM_TEST, val_split_ratio=0.5, seed=None)
Bases: AnomalibDataModule
Tabular DataModule.
- Parameters:
name (str) – Name of the dataset. Used for logging/saving.
samples (dict | list | DataFrame) – Pandas DataFrame or compatible list or dict containing the dataset information.
root (str | Path | None) – Root folder containing normal and abnormal directories. Defaults to None.
normal_split_ratio (float) – Ratio to split normal training images for test set when no normal test images exist. Defaults to 0.2.
train_batch_size (int) – Training batch size. Defaults to 32.
eval_batch_size (int) – Validation/test batch size. Defaults to 32.
num_workers (int) – Number of workers for data loading. Defaults to 8.
train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.
val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.
test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.
augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided.
test_split_mode (TestSplitMode | str) – Method to obtain test subset. Defaults to TestSplitMode.FROM_DIR.
test_split_ratio (float) – Fraction of train images for testing. Defaults to 0.2.
val_split_mode (ValSplitMode | str) – Method to obtain validation subset. Defaults to ValSplitMode.FROM_TEST.
val_split_ratio (float) – Fraction of images for validation. Defaults to 0.5.
seed (int | None) – Random seed for splitting. Defaults to None.
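To make the interaction between a split ratio and seed more concrete, the following is a minimal sketch of a seeded ratio split in plain Python. It is not anomalib's implementation: the helper name split_by_ratio and the file names are hypothetical, chosen only to illustrate how a ratio of 0.5 with a fixed seed yields a reproducible half-and-half partition.

```python
import random


def split_by_ratio(items, ratio, seed=None):
    """Hypothetical helper: hold out a `ratio` fraction of `items`.

    Returns (kept, held_out). This only illustrates the idea of a seeded
    ratio split; it is not how anomalib performs its splits internally.
    """
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    shuffled = list(items)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_held_out = int(len(shuffled) * ratio)
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Made-up file names, used only for illustration.
test_images = [f"images/test_{i}.png" for i in range(10)]

# With ratio=0.5, half of the items are held out (e.g. for validation).
kept, held_out = split_by_ratio(test_images, ratio=0.5, seed=42)

# Repeating the call with the same seed gives the same partition.
kept_again, held_out_again = split_by_ratio(test_images, ratio=0.5, seed=42)
```

Leaving seed as None in this sketch produces a different partition on each run, which is why a fixed seed is useful when results need to be comparable across runs.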
Example
Create and set up a tabular datamodule:
>>> from anomalib.data import Tabular
>>> from anomalib.data.utils import LabelName, Split
>>> samples = {
... "image_path": ["images/image1.png", "images/image2.png", "images/image3.png", ... ],
... "label_index": [LabelName.NORMAL, LabelName.NORMAL, LabelName.ABNORMAL, ... ],
... "split": [Split.TRAIN, Split.TRAIN, Split.TEST, ... ],
... }
>>> datamodule = Tabular(
... name="custom",
... samples=samples,
... root="./datasets/custom",
... )
>>> datamodule.setup()
Get a batch from train dataloader:
>>> batch = next(iter(datamodule.train_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
Get a batch from test dataloader:
>>> batch = next(iter(datamodule.test_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
- classmethod from_file(name, file_path, file_format=None, pd_kwargs=None, **kwargs)
Create Tabular Datamodule from file.
- Parameters:
name (str) – Name of the dataset. This is used to name the datamodule, especially when logging/saving.
file_path (str | Path) – Path to tabular file containing the dataset information.
file_format (str | None) – File format supported by a pd.read_* method, such as csv, parquet or json. Defaults to None (inferred from file suffix).
pd_kwargs (dict | None) – Keyword argument dictionary for the pd.read_* method. Defaults to None.
kwargs (dict) – Additional keyword arguments for the Tabular Datamodule class.
- Returns:
Tabular Datamodule
- Return type:
Tabular
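The file_format parameter above is described as being inferred from the file suffix when left as None. A minimal sketch of that idea follows, assuming the suffix maps directly onto the name of a pd.read_* method. The helper infer_file_format is hypothetical and is not part of anomalib; the actual inference logic may differ.

```python
from pathlib import Path


def infer_file_format(file_path, file_format=None):
    """Hypothetical helper: use the file suffix when no format is given."""
    if file_format is not None:
        return file_format  # an explicit format always wins
    # "./samples.csv" -> ".csv" -> "csv"
    return Path(file_path).suffix.lstrip(".").lower()


# Name of the pandas reader that would handle this file, e.g. pd.read_csv.
reader_name = f"read_{infer_file_format('./samples.csv')}"  # "read_csv"
```

Passing file_format explicitly is the safer choice when a file's suffix does not match its contents.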
Example
Prepare a tabular file (such as samples.csv or samples.parquet) with the following columns: image_path (absolute or relative to root), label_index (0 for normal, 1 for anomalous samples), and split (train or test). For segmentation tasks, also include a mask_path column.
From this file, create and set up a tabular datamodule:
>>> from anomalib.data import Tabular
>>> datamodule = Tabular.from_file(
... name="custom",
... file_path="./samples.csv",
... root="./datasets/custom",
... )
>>> datamodule.setup()
Get a batch from train dataloader:
>>> batch = next(iter(datamodule.train_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
Get a batch from test dataloader:
>>> batch = next(iter(datamodule.test_dataloader()))
>>> batch.keys()
dict_keys(['image', 'label', 'mask', 'image_path', 'mask_path'])
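The tabular file described in this example can be prepared in many ways. The sketch below writes a small samples.csv with the three required columns using only the Python standard library. The image paths are the placeholder names from the examples above, the file is written to a temporary directory chosen for illustration, and the choice of the csv module (rather than pandas) is an assumption made to keep the sketch dependency-free.

```python
import csv
import tempfile
from pathlib import Path

# One row per image: path, label (0 = normal, 1 = anomalous), and split.
rows = [
    {"image_path": "images/image1.png", "label_index": 0, "split": "train"},
    {"image_path": "images/image2.png", "label_index": 0, "split": "train"},
    {"image_path": "images/image3.png", "label_index": 1, "split": "test"},
]

# Write to a temporary directory so the sketch leaves the working tree clean.
csv_path = Path(tempfile.mkdtemp()) / "samples.csv"
with csv_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["image_path", "label_index", "split"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm the header and rows round-trip.
with csv_path.open(newline="") as f:
    loaded = list(csv.DictReader(f))
```

The resulting path could then be passed as file_path to Tabular.from_file. For segmentation tasks, a mask_path column would be added to each row and to fieldnames in the same way.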
See also
../../datasets/image/tabular - Tabular Dataset