Image and Video Utils#

Path Utils#

Path utilities for handling file paths in anomalib.

This module provides utilities for:

  • Validating and resolving file paths

  • Checking path length and character restrictions

  • Converting between path types

  • Handling file extensions

  • Managing directory types for anomaly detection

Example

>>> from anomalib.data.utils.path import validate_path
>>> path = validate_path("./datasets/MVTec/bottle/train/good/000.png")
>>> print(path)
PosixPath('/abs/path/to/anomalib/datasets/MVTec/bottle/train/good/000.png')
>>> from anomalib.data.utils.path import DirType
>>> print(DirType.NORMAL)
normal
class anomalib.data.utils.path.DirType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#

Bases: str, Enum

Directory type names for organizing anomaly detection datasets.

NORMAL#

Directory containing normal/good samples for training

ABNORMAL#

Directory containing anomalous/defective samples

NORMAL_TEST#

Directory containing normal test samples

NORMAL_DEPTH#

Directory containing depth maps for normal samples

ABNORMAL_DEPTH#

Directory containing depth maps for abnormal samples

NORMAL_TEST_DEPTH#

Directory containing depth maps for normal test samples

MASK#

Directory containing ground truth segmentation masks
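
Example

Because DirType inherits from str, its members can be used directly when composing dataset paths. A minimal sketch (the dataset root below is a placeholder):

>>> from pathlib import Path
>>> from anomalib.data.utils.path import DirType
>>> root = Path("datasets/custom")  # hypothetical dataset root
>>> train_dir = root / "train" / DirType.NORMAL
>>> train_dir.as_posix()
'datasets/custom/train/normal'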

anomalib.data.utils.path.contains_non_printable_characters(path)#

Check if path contains non-printable characters.

Parameters:

path (str | Path) – Path to check

Return type:

bool

Returns:

True if path contains non-printable chars, False otherwise

Example

>>> contains_non_printable_characters("normal.txt")
False
>>> contains_non_printable_characters("test\x00.txt")
True
anomalib.data.utils.path.is_path_too_long(path, max_length=512)#

Check if path exceeds maximum allowed length.

Parameters:
  • path (str | Path) – Path to check

  • max_length (int) – Maximum allowed path length. Defaults to 512

Return type:

bool

Returns:

True if path is too long, False otherwise

Example

>>> is_path_too_long("short_path.txt")
False
>>> is_path_too_long("a" * 1000)
True
anomalib.data.utils.path.resolve_path(folder, root=None)#

Combine root and folder paths into an absolute path.

Parameters:
  • folder (str | Path) – Folder location containing image or mask data

  • root (str | Path | None) – Optional root directory for the dataset

Return type:

Path

Returns:

Absolute path combining root and folder

Example

>>> path = resolve_path("subdir", "/root")
>>> path.is_absolute()
True
anomalib.data.utils.path.validate_and_resolve_path(folder, root=None, base_dir=None)#

Validate a folder path and resolve it to an absolute path, combining the behaviour of validate_path and resolve_path.

Parameters:
  • folder (str | Path) – Folder location containing image or mask data

  • root (str | Path | None) – Root directory for the dataset

  • base_dir (str | Path | None) – Base directory to restrict file access

Return type:

Path

Returns:

Validated and resolved absolute Path

Example

>>> path = validate_and_resolve_path("subdir", "/root")
>>> path.is_absolute()
True
anomalib.data.utils.path.validate_path(path, base_dir=None, should_exist=True, extensions=None)#

Validate path for existence, permissions and extension.

Parameters:
  • path (str | Path) – Path to validate

  • base_dir (str | Path | None) – Base directory to restrict file access

  • should_exist (bool) – If True, verify path exists

  • extensions (tuple[str, ...] | None) – Allowed file extensions

Return type:

Path

Returns:

Validated Path object

Raises:
  • FileNotFoundError – If should_exist is True and the path does not exist

  • ValueError – If the path is outside base_dir or its extension is not in extensions

Example

>>> path = validate_path("./datasets/image.png", extensions=(".png",))
>>> path.suffix
'.png'

Download Utils#

Helper functions for downloading datasets with progress bars and hash verification.

This module provides utilities for:

  • Showing progress bars during downloads with urlretrieve

  • Verifying file hashes

  • Safely extracting compressed files

class anomalib.data.utils.download.DownloadInfo(name, url, hashsum, filename=None)#

Bases: object

Information needed to download a dataset from a URL.

Parameters:
  • name (str) – Name of the dataset

  • url (str) – URL to download the dataset from

  • hashsum (str) – Expected hash value of the downloaded file

  • filename (str | None) – Optional filename to save as. If not provided, extracts from URL
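
Example

A minimal instantiation sketch; the name, URL and hash below are placeholder values rather than a real dataset:

>>> from anomalib.data.utils.download import DownloadInfo
>>> info = DownloadInfo(
...     name="my_dataset",  # placeholder name
...     url="https://example.com/my_dataset.tar.gz",  # placeholder URL
...     hashsum="<expected-sha256-hex-digest>",  # placeholder hash
... )
>>> info.name
'my_dataset'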

class anomalib.data.utils.download.DownloadProgressBar(iterable=None, desc=None, total=None, leave=True, file=None, ncols=None, mininterval=0.1, maxinterval=10.0, miniters=None, use_ascii=None, disable=False, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, postfix=None, unit_divisor=1000, write_bytes=None, lock_args=None, nrows=None, colour=None, delay=0, gui=False, **kwargs)#

Bases: tqdm

Progress bar for urlretrieve downloads.

Subclasses tqdm to provide a progress bar during file downloads.

Example

>>> from urllib.request import urlretrieve
>>> url = "https://example.com/file.zip"
>>> output_path = "file.zip"
>>> with DownloadProgressBar(unit='B', unit_scale=True, miniters=1,
...         desc=url.split('/')[-1]) as p_bar:
...     urlretrieve(url, filename=output_path,
...                reporthook=p_bar.update_to)
Parameters:
  • iterable (Iterable | None) – Iterable to decorate with a progressbar

  • desc (str | None) – Prefix for the progressbar

  • total (int | float | None) – Expected number of iterations

  • leave (bool | None) – Whether to leave the progress bar after completion

  • file (TextIOWrapper | StringIO | None) – Output stream for progress messages

  • ncols (int | None) – Width of the progress bar

  • mininterval (float | None) – Minimum update interval in seconds

  • maxinterval (float | None) – Maximum update interval in seconds

  • miniters (int | float | None) – Minimum progress display update interval in iterations

  • use_ascii (bool | str | None) – Whether to use ASCII characters for the progress bar

  • disable (bool | None) – Whether to disable the progress bar

  • unit (str | None) – Unit of measurement

  • unit_scale (bool | int | float | None) – Whether to scale units automatically

  • dynamic_ncols (bool | None) – Whether to adapt to terminal resizes

  • smoothing (float | None) – Exponential moving average smoothing factor

  • bar_format (str | None) – Custom progress bar format string

  • initial (int | float | None) – Initial counter value

  • position (int | None) – Line offset for printing

  • postfix (dict | None) – Additional stats to display

  • unit_divisor (float | None) – Unit divisor for scaling

  • write_bytes (bool | None) – Whether to write bytes

  • lock_args (tuple | None) – Arguments passed to refresh

  • nrows (int | None) – Screen height

  • colour (str | None) – Bar color

  • delay (float | None) – Display delay in seconds

  • gui (bool | None) – Whether to use matplotlib animations

update_to(chunk_number=1, max_chunk_size=1, total_size=None)#

Update progress bar based on download progress.

This method is used as a callback for urlretrieve to update the progress bar during downloads.

Parameters:
  • chunk_number (int) – Current chunk being processed

  • max_chunk_size (int) – Maximum size of each chunk

  • total_size (int | None) – Total download size

Return type:

None

anomalib.data.utils.download.check_hash(file_path, expected_hash, algorithm='sha256')#

Verify that a file’s hash matches the expected value.

Parameters:
  • file_path (Path) – Path to file to check

  • expected_hash (str) – Expected hash value

  • algorithm (str) – Hashing algorithm to use

Raises:

ValueError – If the calculated hash does not match the expected hash

Return type:

None
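
Example

A sketch pairing check_hash with generate_hash (documented below); assumes file.zip exists locally:

>>> from pathlib import Path
>>> from anomalib.data.utils.download import check_hash, generate_hash
>>> file_path = Path("file.zip")  # placeholder local file
>>> expected = generate_hash(file_path, algorithm="sha256")
>>> check_hash(file_path, expected)  # silent on match, raises ValueError on mismatch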

anomalib.data.utils.download.download_and_extract(root, info)#

Download and extract a dataset.

Parameters:
  • root (Path) – Root directory where the dataset will be stored

  • info (DownloadInfo) – Download information for the dataset

Raises:

RuntimeError – If the URL scheme is not http(s)

Return type:

None
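
Example

A usage sketch; the DownloadInfo fields are placeholders (see the DownloadInfo example above):

>>> from pathlib import Path
>>> from anomalib.data.utils.download import DownloadInfo, download_and_extract
>>> info = DownloadInfo(
...     name="my_dataset",  # placeholder
...     url="https://example.com/my_dataset.tar.gz",  # placeholder
...     hashsum="<expected-sha256-hex-digest>",  # placeholder
... )
>>> download_and_extract(Path("./datasets"), info)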

anomalib.data.utils.download.extract(file_name, root)#

Extract a compressed dataset file.

Supports .zip, .tar, .gz, .xz and .tgz formats.

Parameters:
  • file_name (Path) – Path of the file to extract

  • root (Path) – Root directory for extraction

Raises:

ValueError – If the file format is not recognized

Return type:

None
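
Example

A sketch for extracting an already-downloaded archive; the archive name is a placeholder:

>>> from pathlib import Path
>>> from anomalib.data.utils.download import extract
>>> extract(Path("my_dataset.tar.gz"), Path("./datasets"))  # placeholder archive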

anomalib.data.utils.download.generate_hash(file_path, algorithm='sha256')#

Generate a hash of a file using the specified algorithm.

Parameters:
  • file_path (str | Path) – Path to the file to hash

  • algorithm (str) – Hashing algorithm to use (e.g. ‘sha256’, ‘sha3_512’)

Return type:

str

Returns:

Hexadecimal hash string of the file

Raises:

ValueError – If the specified hashing algorithm is not supported

anomalib.data.utils.download.is_file_potentially_dangerous(file_name)#

Check if a file path contains potentially dangerous patterns.

Parameters:

file_name (str) – Path to check

Return type:

bool

Returns:

True if the path matches unsafe patterns, False otherwise
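
Example

A sketch of the expected behaviour, assuming system locations such as /etc/ are among the unsafe patterns checked:

>>> from anomalib.data.utils.download import is_file_potentially_dangerous
>>> is_file_potentially_dangerous("images/001.png")
False
>>> is_file_potentially_dangerous("/etc/passwd")  # assumed unsafe pattern
True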

anomalib.data.utils.download.is_within_directory(directory, target)#

Check if a target path is located within a given directory.

Parameters:
  • directory (Path) – Path of the parent directory

  • target (Path) – Path to check

Return type:

bool

Returns:

True if target is within directory, False otherwise
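
Example

A sketch, assuming paths are resolved before comparison (the point of the check is to catch traversal such as ..):

>>> from pathlib import Path
>>> from anomalib.data.utils.download import is_within_directory
>>> is_within_directory(Path("/data"), Path("/data/images/001.png"))
True
>>> is_within_directory(Path("/data"), Path("/data/../etc/passwd"))
False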

anomalib.data.utils.download.safe_extract(tar_file, root, members)#

Safely extract members from a tar archive.

Parameters:
  • tar_file (TarFile) – TarFile object to extract from

  • root (Path) – Root directory for extraction

  • members (list[TarInfo]) – List of safe members to extract

Return type:

None
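
Example

safe_extract expects the caller to supply pre-filtered members; a sketch combining it with is_file_potentially_dangerous (the archive name is a placeholder):

>>> import tarfile
>>> from pathlib import Path
>>> from anomalib.data.utils.download import (
...     is_file_potentially_dangerous,
...     safe_extract,
... )
>>> with tarfile.open("my_dataset.tar") as tar:  # placeholder archive
...     members = [
...         m for m in tar.getmembers()
...         if not is_file_potentially_dangerous(m.name)
...     ]
...     safe_extract(tar, Path("./datasets"), members)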

Image Utils#

Image utilities for reading, writing and processing images.

This module provides various utility functions for handling images in Anomalib:

  • Reading images in various formats (RGB, grayscale, depth)

  • Writing images to disk

  • Converting between different image formats

  • Processing images (padding, resizing etc.)

  • Handling image filenames and paths

Example

>>> from anomalib.data.utils import read_image
>>> # Read image as numpy array
>>> image = read_image("image.jpg")
>>> print(type(image))
<class 'numpy.ndarray'>
>>> # Read image as tensor
>>> image = read_image("image.jpg", as_tensor=True)
>>> print(type(image))
<class 'torch.Tensor'>
anomalib.data.utils.image.duplicate_filename(path)#

Add numeric suffix to filename if it already exists.

Parameters:

path (str | Path) – Path to file

Returns:

Path with numeric suffix if original exists

Return type:

Path

Examples

>>> duplicate_filename("image.jpg")  # File doesn't exist
PosixPath('image.jpg')
>>> duplicate_filename("exists.jpg")  # File exists
PosixPath('exists_1.jpg')
>>> duplicate_filename("exists.jpg")  # Both exist
PosixPath('exists_2.jpg')
anomalib.data.utils.image.figure_to_array(fig)#

Convert matplotlib figure to numpy array.

Parameters:

fig (Figure) – Matplotlib figure to convert

Returns:

RGB image array

Return type:

np.ndarray

Examples

>>> import matplotlib.pyplot as plt
>>> fig = plt.figure()
>>> plt.plot([1, 2, 3])
>>> img = figure_to_array(fig)
>>> type(img)
<class 'numpy.ndarray'>
anomalib.data.utils.image.generate_output_image_filename(input_path, output_path)#

Generate output filename for inference image.

Parameters:
  • input_path (str | Path) – Path to input image

  • output_path (str | Path) – Path to save output (file or directory)

Returns:

Generated output filename

Return type:

Path

Raises:

ValueError – If input_path is not a file

Examples

>>> generate_output_image_filename("input.jpg", "output.jpg")
PosixPath('output.jpg')  # or output_1.jpg if exists
>>> generate_output_image_filename("dir/input.jpg", "outdir")
PosixPath('outdir/dir/input.jpg')
anomalib.data.utils.image.get_image_filename(filename)#

Get validated image filename.

Parameters:

filename (str | Path) – Path to image file

Returns:

Validated path to image file

Return type:

Path

Raises:
  • FileNotFoundError – If the file does not exist

  • ValueError – If the file is not an image file

Examples

>>> get_image_filename("image.jpg")
PosixPath('image.jpg')
>>> get_image_filename("missing.jpg")
Traceback (most recent call last):
    ...
FileNotFoundError: File not found: missing.jpg
>>> get_image_filename("text.txt")
Traceback (most recent call last):
    ...
ValueError: ``filename`` is not an image file: text.txt
anomalib.data.utils.image.get_image_filenames(path, base_dir=None)#

Get list of image filenames from path.

Parameters:
  • path (str | Path) – Path to image file or directory

  • base_dir (str | Path | None) – Base directory to restrict file access

Returns:

List of paths to image files

Return type:

list[Path]

Examples

>>> get_image_filenames("image.jpg")
[PosixPath('image.jpg')]
>>> get_image_filenames("images/")
[PosixPath('images/001.jpg'), PosixPath('images/002.png')]
>>> get_image_filenames("images/", base_dir="allowed/")
Traceback (most recent call last):
    ...
ValueError: Access denied: Path is outside the allowed directory
anomalib.data.utils.image.get_image_filenames_from_dir(path)#

Get list of image filenames from directory.

Parameters:

path (str | Path) – Path to directory containing images

Returns:

List of paths to image files

Return type:

list[Path]

Raises:

ValueError – If path is not a directory or no images found

Examples

>>> get_image_filenames_from_dir("images/")
[PosixPath('images/001.jpg'), PosixPath('images/002.png')]
>>> get_image_filenames_from_dir("empty/")
Traceback (most recent call last):
    ...
ValueError: Found 0 images in empty/
anomalib.data.utils.image.get_image_height_and_width(image_size)#

Get height and width from image size parameter.

Parameters:

image_size (int | Sequence[int]) – Single int for square, or (H,W) sequence

Returns:

Image height and width

Return type:

tuple[int, int]

Raises:

TypeError – If image_size is not int or sequence of ints

Examples

>>> get_image_height_and_width(256)
(256, 256)
>>> get_image_height_and_width((480, 640))
(480, 640)
>>> get_image_height_and_width(256.0)
Traceback (most recent call last):
    ...
TypeError: ``image_size`` could be either int or tuple[int, int]
anomalib.data.utils.image.is_image_file(filename)#

Check if the filename has a valid image extension.

Parameters:

filename (str | Path) – Path to file to check

Returns:

True if filename has valid image extension

Return type:

bool

Examples

>>> is_image_file("image.jpg")
True
>>> is_image_file("image.png")
True
>>> is_image_file("image.txt")
False
anomalib.data.utils.image.pad_nextpow2(batch)#

Pad images to next power of 2 size.

Finds the largest dimension and pads all images in the batch to a square whose side is the next power of two. Handles odd input sizes.

Parameters:

batch (torch.Tensor) – Batch of images to pad

Returns:

Padded image batch

Return type:

torch.Tensor

Examples

>>> x = torch.randn(1, 3, 127, 128)
>>> padded = pad_nextpow2(x)
>>> padded.shape
torch.Size([1, 3, 128, 128])
anomalib.data.utils.image.read_depth_image(path)#

Read depth image from TIFF file.

Parameters:

path (str | Path) – Path to TIFF depth image

Returns:

Depth image array

Return type:

np.ndarray

Examples

>>> depth = read_depth_image("depth.tiff")
>>> type(depth)
<class 'numpy.ndarray'>
anomalib.data.utils.image.read_image(path, as_tensor=False)#

Read RGB image from disk.

Parameters:
  • path (str | Path) – Path to image file

  • as_tensor (bool) – If True, return torch.Tensor. Defaults to False

Returns:

Image as tensor or array, normalized to [0,1]

Return type:

torch.Tensor | np.ndarray

Examples

>>> image = read_image("image.jpg")
>>> type(image)
<class 'numpy.ndarray'>
>>> image = read_image("image.jpg", as_tensor=True)
>>> type(image)
<class 'torch.Tensor'>
anomalib.data.utils.image.read_mask(path, as_tensor=False)#

Read grayscale mask from disk.

Parameters:
  • path (str | Path) – Path to mask file

  • as_tensor (bool) – If True, return torch.Tensor. Defaults to False

Returns:

Mask as tensor or array

Return type:

torch.Tensor | np.ndarray

Examples

>>> mask = read_mask("mask.png")
>>> type(mask)
<class 'numpy.ndarray'>
>>> mask = read_mask("mask.png", as_tensor=True)
>>> type(mask)
<class 'torch.Tensor'>
anomalib.data.utils.image.save_image(filename, image, root=None)#

Save image to disk.

Parameters:
  • filename (Path | str) – Output filename

  • image (np.ndarray | Figure) – Image or matplotlib figure to save

  • root (Path | None) – Optional root dir to save under. Defaults to None

Return type:

None

Examples

>>> img = read_image("input.jpg")
>>> save_image("output.jpg", img)
>>> save_image("subdir/output.jpg", img, root=Path("results"))
anomalib.data.utils.image.show_image(image, title='Image')#

Display image in window.

Parameters:
  • image (np.ndarray | Figure) – Image or matplotlib figure to display

  • title (str) – Window title. Defaults to “Image”

Return type:

None

Examples

>>> img = read_image("image.jpg")
>>> show_image(img, title="My Image")

Video Utils#

Video utilities for processing video data in anomaly detection.

This module provides utilities for:

  • Indexing video clips and their corresponding masks

  • Converting videos between different codecs

  • Handling video frames and clips in PyTorch format

Example

>>> from anomalib.data.utils.video import ClipsIndexer
>>> # Create an indexer for video files and masks (in practice, use a
>>> # concrete ClipsIndexer subclass that implements get_mask)
>>> indexer = ClipsIndexer(
...     video_paths=["video1.mp4", "video2.mp4"],
...     mask_paths=["mask1.mp4", "mask2.mp4"],
...     clip_length_in_frames=16
... )
>>> # Get video clip with metadata
>>> video_item = indexer.get_item(0)
>>> video_item.image.shape  # (16, 3, H, W)
torch.Size([16, 3, 256, 256])
class anomalib.data.utils.video.ClipsIndexer(video_paths, mask_paths, clip_length_in_frames=2, frames_between_clips=1)#

Bases: VideoClips, ABC

Extension of torchvision’s VideoClips class for video and mask indexing.

This class extends VideoClips to handle both video frames and their corresponding mask annotations. It provides functionality to:

  • Index and retrieve video clips

  • Access corresponding mask frames

  • Track frame indices and video metadata

Subclasses must implement the get_mask method. The default implementation assumes video_paths contains video files. For custom data formats (e.g., image sequences), subclasses should override get_clip and _compute_frame_pts. A minimal subclass sketch follows the parameter list below.

Parameters:
  • video_paths (list[str]) – List of paths to video files in the dataset

  • mask_paths (list[str]) – List of paths to mask files corresponding to each video

  • clip_length_in_frames (int) – Number of frames in each clip. Defaults to 2

  • frames_between_clips (int) – Stride between consecutive clips. Defaults to 1
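
Example

A minimal subclass sketch for a dataset without ground-truth masks (the class name is illustrative):

>>> import torch
>>> from anomalib.data.utils.video import ClipsIndexer
>>> class NoMaskClipsIndexer(ClipsIndexer):  # hypothetical subclass
...     """Indexer for videos without mask annotations."""
...     def get_mask(self, idx: int) -> torch.Tensor | None:
...         return None  # no masks available for this dataset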

get_item(idx)#

Get video clip and metadata at the given index.

Parameters:

idx (int) – Index of the clip to retrieve

Return type:

VideoItem

Returns:

VideoItem containing the clip frames, masks, path and metadata

abstract get_mask(idx)#

Get masks for the clip at the given index.

Parameters:

idx (int) – Index of the clip

Return type:

Tensor | None

Returns:

Tensor containing mask frames, or None if no masks exist

last_frame_idx(video_idx)#

Get index of the last frame in a video.

Parameters:

video_idx (int) – Index of the video in the dataset

Return type:

int

Returns:

Index of the last frame

anomalib.data.utils.video.convert_video(input_path, output_path, codec='MP4V')#

Convert a video file to use a different codec.

Creates the output directory if it doesn’t exist. Reads the input video frame by frame and writes to a new file using the specified codec.

Parameters:
  • input_path (Path) – Path to the input video file

  • output_path (Path) – Path where the converted video will be saved

  • codec (str) – FourCC code for the desired output codec. Defaults to "MP4V"

Return type:

None
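
Example

A usage sketch; the file names are placeholders:

>>> from pathlib import Path
>>> from anomalib.data.utils.video import convert_video
>>> convert_video(
...     input_path=Path("video1.avi"),   # placeholder input
...     output_path=Path("video1.mp4"),  # placeholder output
...     codec="MP4V",
... )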

Label Utils#

Label name enumeration class.

This module defines an enumeration class for labeling data in anomaly detection tasks. The labels are represented as integers, where:

  • NORMAL (0): Represents normal/good samples

  • ABNORMAL (1): Represents anomalous/defective samples

Example

>>> from anomalib.data.utils.label import LabelName
>>> label = LabelName.NORMAL
>>> label.value
0
>>> label = LabelName.ABNORMAL
>>> label.value
1
class anomalib.data.utils.label.LabelName(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#

Bases: int, Enum

Enumeration class for labeling data in anomaly detection.

This class inherits from both int and Enum to create an integer-based enumeration. This allows for easy comparison and conversion between label names and their corresponding integer values.

NORMAL#

Label value 0, representing normal/good samples

Type:

int

ABNORMAL#

Label value 1, representing anomalous/defective samples

Type:

int

Bounding Box Utils#

Helper functions for processing bounding box detections and annotations.

This module provides utility functions for converting between different bounding box formats and handling bounding box operations.

anomalib.data.utils.boxes.boxes_to_anomaly_maps(boxes, scores, image_size)#

Convert bounding boxes and scores to anomaly heatmaps.

Parameters:
  • boxes (Tensor) – List of length B where each element is tensor of shape (N, 4) containing bounding box coordinates in xyxy format

  • scores (Tensor) – List of length B where each element is 1D tensor of length N containing anomaly scores for each box

  • image_size (tuple[int, int]) – Output heatmap size as (H, W)

Return type:

Tensor

Returns:

Anomaly heatmaps of shape (B, H, W). Pixels within each box are set to that box’s anomaly score. For overlapping boxes, the highest score is used.

Examples

>>> boxes = [torch.tensor([[10, 15, 20, 25]])]  # One box
>>> scores = [torch.tensor([0.9])]  # Score for the box
>>> maps = boxes_to_anomaly_maps(boxes, scores, (32, 32))
>>> maps[0, 20, 15]  # Point inside box
tensor(0.9000)
anomalib.data.utils.boxes.boxes_to_masks(boxes, image_size)#

Convert bounding boxes to segmentation masks.

Parameters:
  • boxes (list[Tensor]) – List of length B where each element is tensor of shape (N, 4) containing bounding box coordinates in xyxy format

  • image_size (tuple[int, int]) – Output mask size as (H, W)

Return type:

Tensor

Returns:

Binary masks of shape (B, H, W) where pixels contained within boxes are set to 1

Examples

>>> boxes = [torch.tensor([[10, 15, 20, 25]])]  # One box in first image
>>> masks = boxes_to_masks(boxes, (32, 32))
>>> masks.shape
torch.Size([1, 32, 32])
anomalib.data.utils.boxes.masks_to_boxes(masks, anomaly_maps=None)#

Convert batch of segmentation masks to bounding box coordinates.

Parameters:
  • masks (Tensor) – Input tensor of masks of shape (B, 1, H, W), (B, H, W) or (H, W)

  • anomaly_maps (Tensor | None) – Optional anomaly maps of shape (B, 1, H, W), (B, H, W) or (H, W). Used to determine anomaly scores for the converted bounding boxes.

Returns:

Tuple containing:

  • List of length B where each element is tensor of shape (N, 4) containing bounding box coordinates in xyxy format

  • List of length B where each element is tensor of length N containing anomaly scores for each converted box

Return type:

tuple[list[Tensor], list[Tensor]]

Examples

>>> import torch
>>> masks = torch.zeros((2, 1, 32, 32))
>>> masks[0, 0, 10:20, 15:25] = 1  # Add box in first image
>>> boxes, scores = masks_to_boxes(masks)
>>> boxes[0]  # Coordinates for first image
tensor([[15., 10., 24., 19.]])
anomalib.data.utils.boxes.scale_boxes(boxes, image_size, new_size)#

Scale bounding box coordinates to a new image size.

Parameters:
  • boxes (Tensor) – Boxes of shape (N, 4) in (x1, y1, x2, y2) format

  • image_size (Size) – Original image size the boxes were computed for

  • new_size (Size) – Target image size to scale boxes to

Return type:

Tensor

Returns:

Scaled boxes of shape (N, 4) in (x1, y1, x2, y2) format

Examples

>>> boxes = torch.tensor([[10, 15, 20, 25]])
>>> scaled = scale_boxes(boxes, (32, 32), (64, 64))
>>> scaled
tensor([[20., 30., 40., 50.]])

Dataset Split Utils#

Dataset splitting utilities.

This module provides functions for splitting datasets in anomaly detection tasks:

  • Splitting normal images into training and validation sets

  • Creating validation sets from test sets

  • Label-aware splitting to maintain class distributions

  • Random splitting with optional seed for reproducibility

These utilities are particularly useful when:

  • The test set lacks normal images

  • The dataset needs a validation set

  • Class balance needs to be maintained during splits

Example

>>> from anomalib.data.utils.split import random_split
>>> # Split dataset with 80/20 ratio
>>> train_set, val_set = random_split(dataset, split_ratio=0.2)
>>> len(train_set), len(val_set)
(800, 200)
>>> # Label-aware split preserving class distributions
>>> splits = random_split(dataset, [0.7, 0.2, 0.1], label_aware=True)
>>> len(splits)
3
class anomalib.data.utils.split.Split(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#

Bases: str, Enum

Dataset split type.

TRAIN#

Training split

VAL#

Validation split

TEST#

Test split

class anomalib.data.utils.split.TestSplitMode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#

Bases: str, Enum

Mode used to obtain test split.

NONE#

No test split

FROM_DIR#

Test split from directory

SYNTHETIC#

Synthetic test split

class anomalib.data.utils.split.ValSplitMode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#

Bases: str, Enum

Mode used to obtain validation split.

NONE#

No validation split

SAME_AS_TEST#

Use same split as test

FROM_TRAIN#

Split from training set

FROM_TEST#

Split from test set

SYNTHETIC#

Synthetic validation split
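
Example

Because these are str-based enums, members can be looked up by name, and string comparison works directly; the lowercase string values are assumed to follow the same convention as the other enums in this module:

>>> from anomalib.data.utils.split import ValSplitMode
>>> ValSplitMode["FROM_TEST"] is ValSplitMode.FROM_TEST
True
>>> ValSplitMode.FROM_TEST == "from_test"  # assumed lowercase value
True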

anomalib.data.utils.split.concatenate_datasets(datasets)#

Concatenate multiple datasets into a single dataset.

Parameters:

datasets (Sequence[AnomalibDataset]) – Sequence of at least two datasets to concatenate

Return type:

AnomalibDataset

Returns:

Combined dataset containing samples from all input datasets

Example

>>> combined = concatenate_datasets([dataset1, dataset2])
>>> len(combined) == len(dataset1) + len(dataset2)
True
anomalib.data.utils.split.random_split(dataset, split_ratio, label_aware=False, seed=None)#

Randomly split a dataset into multiple subsets.

Parameters:
  • dataset (AnomalibDataset) – Source dataset to split

  • split_ratio (float | Sequence[float]) – Split ratios that must sum to 1. If a single float x is provided, splits into [1-x, x]

  • label_aware (bool) – If True, maintains class label distributions in splits

  • seed (int | None) – Random seed for reproducibility

Return type:

list[AnomalibDataset]

Returns:

List of dataset splits based on provided ratios

Example

>>> splits = random_split(dataset, [0.7, 0.3], seed=42)
>>> len(splits)
2
>>> # Label-aware splitting
>>> splits = random_split(dataset, 0.2, label_aware=True)
>>> len(splits)
2
anomalib.data.utils.split.split_by_label(dataset)#

Split dataset into normal and anomalous subsets.

Parameters:

dataset (AnomalibDataset) – Dataset to split by label

Returns:

Tuple containing:

  • Dataset with only normal samples (label 0)

  • Dataset with only anomalous samples (label 1)

Return type:

tuple[AnomalibDataset, AnomalibDataset]

Example

>>> normal, anomalous = split_by_label(dataset)
>>> len(normal) + len(anomalous) == len(dataset)
True