Synthetic Data Utils#
Utilities to generate synthetic data.
This module provides utilities for generating synthetic data for anomaly detection. The utilities include:
Perlin noise generation: Functions for creating Perlin noise patterns
Anomaly generation: Classes for generating synthetic anomalies
Example
>>> from anomalib.data.utils.generators import generate_perlin_noise
>>> # Generate 256x256 Perlin noise
>>> noise = generate_perlin_noise(256, 256)
>>> print(noise.shape)
torch.Size([256, 256])
>>> from anomalib.data.utils.generators import PerlinAnomalyGenerator
>>> # Create anomaly generator
>>> generator = PerlinAnomalyGenerator()
>>> # Apply to an image to obtain the augmented image and anomaly mask
>>> image = torch.randn(3, 256, 256)
>>> augmented_image, mask = generator(image)
- class anomalib.data.utils.generators.PerlinAnomalyGenerator(anomaly_source_path=None, probability=0.5, blend_factor=(0.2, 1.0), rotation_range=(-90, 90))#
Bases: Transform
Perlin noise-based synthetic anomaly generator.
This class provides functionality to generate synthetic anomalies using Perlin noise patterns. It can also use real anomaly source images for more realistic anomaly generation.
- Parameters:
anomaly_source_path (str | None) – Optional path to a directory containing anomaly source images. If provided, these images will be used instead of Perlin noise patterns.
probability (float) – Probability of applying the anomaly transformation to an image. Default: 0.5.
blend_factor (float | tuple[float, float]) – Factor determining how much of the anomaly to blend with the original image. Can be a float or a tuple of (min, max). Default: (0.2, 1.0).
rotation_range (tuple[float, float]) – Range of rotation angles in degrees for the Perlin noise pattern. Default: (-90, 90).
Example
>>> # Single image usage with default parameters
>>> transform = PerlinAnomalyGenerator()
>>> image = torch.randn(3, 256, 256)  # [C, H, W]
>>> augmented_image, anomaly_mask = transform(image)
>>> print(augmented_image.shape)  # [C, H, W]
>>> print(anomaly_mask.shape)  # [1, H, W]

>>> # Batch usage with custom parameters
>>> transform = PerlinAnomalyGenerator(
...     probability=0.8,
...     blend_factor=0.5
... )
>>> batch = torch.randn(4, 3, 256, 256)  # [B, C, H, W]
>>> augmented_batch, anomaly_masks = transform(batch)
>>> print(augmented_batch.shape)  # [B, C, H, W]
>>> print(anomaly_masks.shape)  # [B, 1, H, W]

>>> # Using anomaly source images
>>> transform = PerlinAnomalyGenerator(
...     anomaly_source_path='path/to/anomaly/images',
...     probability=0.7,
...     blend_factor=(0.3, 0.9),
...     rotation_range=(-45, 45)
... )
>>> augmented_image, anomaly_mask = transform(image)
- forward(img)#
Apply augmentation using the mask for a single image or a batch.
- Parameters:
img (Tensor) – Input image tensor of shape [C, H, W] or batch tensor of shape [B, C, H, W].
- Returns:
Tuple containing:
Augmented image tensor of the same shape as the input
Mask tensor of shape [1, H, W] or [B, 1, H, W]
- Return type:
tuple[Tensor, Tensor]
- generate_perturbation(height, width, device=None, anomaly_source_path=None)#
Generate perturbed image and mask.
- Parameters:
height (int) – Height of the generated perturbation.
width (int) – Width of the generated perturbation.
device (device | None) – Device on which to generate the perturbation. If None, uses the current default device.
anomaly_source_path (str | None) – Optional path to a directory of anomaly source images to use instead of Perlin noise patterns.
- Returns:
Tuple containing:
Perturbation tensor of shape [H, W, C]
Mask tensor of shape [H, W, 1]
- Return type:
tuple[Tensor, Tensor]
- anomalib.data.utils.generators.generate_perlin_noise(height, width, scale=None, device=None)#
Generate a Perlin noise pattern.
This function generates a Perlin noise pattern using a grid-based gradient noise approach. The noise is generated by interpolating between randomly generated gradient vectors at grid vertices. The interpolation uses a quintic curve for smooth transitions.
- Parameters:
height (int) – Desired height of the noise pattern.
width (int) – Desired width of the noise pattern.
scale (tuple[int, int] | None) – Tuple of (scale_x, scale_y) for noise granularity. If None, random scales will be used. Larger scales produce coarser noise patterns, while smaller scales produce finer patterns.
device (device | None) – Device to generate the noise on. If None, uses the current default device.
- Returns:
Tensor of shape [height, width] containing the noise pattern, with values roughly in the [-1, 1] range.
- Return type:
Tensor
Example
>>> # Generate 256x256 noise with default random scale
>>> noise = generate_perlin_noise(256, 256)
>>> print(noise.shape)
torch.Size([256, 256])

>>> # Generate 512x512 noise with fixed scale
>>> noise = generate_perlin_noise(512, 512, scale=(8, 8))
>>> print(noise.shape)
torch.Size([512, 512])

>>> # Generate noise on GPU if available
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> noise = generate_perlin_noise(128, 128, device=device)
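The grid-based gradient approach described above can be sketched in plain Python. This is a simplified 1D illustration of the same idea (random gradients at grid vertices, blended with the quintic fade curve); `perlin_1d` and its parameters are hypothetical helpers for exposition, not part of anomalib's API.

```python
import random


def fade(t):
    # Quintic curve 6t^5 - 15t^4 + 10t^3: zero first and second
    # derivatives at t = 0 and t = 1, giving smooth transitions.
    return t * t * t * (t * (t * 6 - 15) + 10)


def perlin_1d(n_points, scale, seed=0):
    """Toy 1D gradient noise: random gradients at integer grid vertices,
    interpolated with the quintic fade curve."""
    rng = random.Random(seed)
    n_cells = n_points // scale
    grads = [rng.uniform(-1.0, 1.0) for _ in range(n_cells + 1)]
    noise = []
    for i in range(n_points):
        x = i / scale              # position in grid coordinates
        x0 = int(x)                # left grid vertex of the current cell
        t = x - x0                 # fractional offset within the cell
        # "Dot products" with the (scalar) gradients at both vertices.
        d0 = grads[x0] * t
        d1 = grads[x0 + 1] * (t - 1.0)
        u = fade(t)
        noise.append(d0 + (d1 - d0) * u)  # interpolate with faded weight
    return noise


values = perlin_1d(256, scale=32)
```

The 2D version used by `generate_perlin_noise` applies the same fade-and-interpolate step along both axes with 2D gradient vectors; larger `scale` values give coarser patterns, just as the parameter description states.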
Dataset that generates synthetic anomalies.
This module provides functionality to generate synthetic anomalies when real anomalous data is scarce or unavailable. It includes:
A dataset class that generates synthetic anomalies from normal images
Functions to convert normal samples into synthetic anomalous samples
Perlin noise-based anomaly generation
Temporary file management for synthetic data
Example
>>> from anomalib.data.utils.synthetic import SyntheticAnomalyDataset
>>> # Create synthetic dataset from normal samples
>>> synthetic_dataset = SyntheticAnomalyDataset(
...     augmentations=transforms,
... source_samples=normal_samples
... )
>>> len(synthetic_dataset) # 50/50 normal/anomalous split
200
- class anomalib.data.utils.synthetic.SyntheticAnomalyDataset(augmentations, source_samples)#
Bases: AnomalibDataset
Dataset for generating and managing synthetic anomalies.
The dataset creates synthetic anomalous images by applying Perlin noise-based perturbations to normal images. The synthetic images are stored in a temporary directory that is cleaned up when the dataset object is deleted.
- Parameters:
augmentations (Transform | None) – Transform object describing the input data augmentations.
source_samples (DataFrame) – DataFrame containing normal samples used as source for synthetic anomalies.
Example
>>> augmentations = Compose([...])
>>> dataset = SyntheticAnomalyDataset(
...     augmentations=augmentations,
...     source_samples=normal_df
... )
>>> len(dataset)  # 50/50 normal/anomalous split
100
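The temporary-storage behaviour described above (synthetic images cleaned up when the dataset object is deleted) can be illustrated with a stdlib-only sketch. `SyntheticImageStore` is a hypothetical class shown only to demonstrate the cleanup-on-deletion pattern; it is not anomalib's actual implementation.

```python
import shutil
import tempfile
import weakref
from pathlib import Path


class SyntheticImageStore:
    """Hypothetical helper illustrating the pattern: generated files live
    in a temporary directory that is removed when the owner goes away."""

    def __init__(self):
        self.root = Path(tempfile.mkdtemp(prefix="synthetic_anomaly_"))
        # weakref.finalize fires when the object is deleted or garbage
        # collected, mirroring the dataset's cleanup behaviour.
        self._finalizer = weakref.finalize(
            self, shutil.rmtree, self.root, True  # True -> ignore_errors
        )

    def save(self, name, data):
        path = self.root / name
        path.write_bytes(data)
        return path


store = SyntheticImageStore()
image_path = store.save("image_000.png", b"\x89PNG...")
exists_before = image_path.exists()   # file present while the store is alive
store._finalizer()                    # simulate deletion of the store
exists_after = store.root.exists()    # temporary directory has been removed
```

In normal use the finalizer runs automatically at garbage collection; it is invoked explicitly here only to make the lifecycle visible.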
- classmethod from_dataset(dataset)#
Create synthetic dataset from existing dataset of normal images.
- Parameters:
dataset (AnomalibDataset) – Dataset containing only normal images to convert into a synthetic dataset with a 50/50 normal/anomalous split.
- Returns:
New synthetic anomaly dataset.
- Return type:
SyntheticAnomalyDataset
Example
>>> normal_dataset = Dataset(...)
>>> synthetic = SyntheticAnomalyDataset.from_dataset(normal_dataset)
- anomalib.data.utils.synthetic.make_synthetic_dataset(source_samples, image_dir, mask_dir, anomalous_ratio=0.5)#
Convert normal samples into a mixed set with synthetic anomalies.
The function generates synthetic anomalous images and their corresponding masks by applying Perlin noise-based perturbations to normal images.
- Parameters:
source_samples (DataFrame) – DataFrame containing normal images used as source for synthetic anomalies. Must contain columns: image_path, label, label_index, mask_path, and split.
image_dir (Path) – Directory where synthetic anomalous images will be saved.
mask_dir (Path) – Directory where ground truth anomaly masks will be saved.
anomalous_ratio (float) – Fraction of source samples to convert to anomalous samples. Defaults to 0.5.
- Return type:
DataFrame
- Returns:
DataFrame containing both normal and synthetic anomalous samples.
- Raises:
ValueError – If source samples contain any anomalous images.
NotADirectoryError – If image_dir or mask_dir is not a directory.
Example
>>> df = make_synthetic_dataset(
...     source_samples=normal_df,
...     image_dir=Path("./synthetic/images"),
...     mask_dir=Path("./synthetic/masks"),
...     anomalous_ratio=0.3
... )
>>> len(df[df.label == "abnormal"])  # 30% are anomalous
30
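The `anomalous_ratio` split can be sketched with plain lists of dicts standing in for the samples DataFrame. `split_by_ratio` is a hypothetical helper for illustration, not anomalib's code; it only shows how a fraction of normal samples gets relabelled as anomalous.

```python
import random


def split_by_ratio(samples, anomalous_ratio=0.5, seed=0):
    """Hypothetical sketch: mark a fraction of the normal samples for
    conversion into synthetic anomalous ones, keep the rest normal."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_anomalous = int(len(shuffled) * anomalous_ratio)
    # Relabel the selected subset; in the real function these samples would
    # also get a perturbed image and a ground-truth mask written to disk.
    converted = [{**s, "label": "abnormal", "label_index": 1}
                 for s in shuffled[:n_anomalous]]
    return shuffled[n_anomalous:] + converted


samples = [{"image_path": f"img_{i:03d}.png", "label": "normal", "label_index": 0}
           for i in range(100)]
mixed = split_by_ratio(samples, anomalous_ratio=0.3)
```

With 100 source samples and `anomalous_ratio=0.3`, 30 samples end up labelled abnormal, matching the doctest above.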