GLASS

Architecture

GLASS - Unsupervised anomaly detection via Gradient Ascent for Industrial Anomaly detection and localization.

This module implements the GLASS model for unsupervised anomaly detection and localization. GLASS synthesizes both global and local anomalies using Gaussian noise guided by gradient ascent to enhance weak defect detection in industrial settings.

The model consists of:

A feature extractor and feature adaptor to obtain robust normal representations
A Global Anomaly Synthesis (GAS) module that perturbs features using Gaussian noise and gradient ascent with truncated projection
A Local Anomaly Synthesis (LAS) module that overlays augmented textures onto images using Perlin noise masks
A shared discriminator trained with features from normal, global, and local synthetic samples

Paper: A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization <https://arxiv.org/pdf/2407.09359>
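
As a rough illustration of the GAS idea listed above, the sketch below perturbs one feature vector with Gaussian noise, takes a few gradient ascent steps against a toy linear discriminator, and clips the displacement to a maximum radius. The linear discriminator, the choice of loss, the step size `lr`, and the fixed `radius` are all assumptions made for illustration; this is not the anomalib implementation.

```python
import math
import random


def synthesize_global_anomaly(feat, weights, noise_std=0.015, steps=20,
                              lr=0.1, radius=0.5, seed=0):
    """Toy global anomaly synthesis for a single feature vector.

    `weights` defines a hypothetical linear discriminator whose logit
    w . x grows as x looks more anomalous.
    """
    rng = random.Random(seed)
    # 1. Start from the normal feature plus Gaussian noise.
    x = [f + rng.gauss(0.0, noise_std) for f in feat]
    for _ in range(steps):
        # 2. Gradient ascent on the loss -log(1 - sigmoid(w.x));
        #    its gradient with respect to x is sigmoid(w.x) * w.
        logit = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-logit))
        x = [xi + lr * p * w for xi, w in zip(x, weights)]
        # 3. Truncated projection: keep the displacement within `radius`.
        delta = [xi - f for xi, f in zip(x, feat)]
        norm = math.sqrt(sum(d * d for d in delta))
        if norm > radius:
            x = [f + d * radius / norm for f, d in zip(feat, delta)]
    return x


normal_feature = [0.0, 0.0, 0.0]
anomalous_feature = synthesize_global_anomaly(normal_feature, [1.0, -2.0, 0.5])
```

In this sketch the projection runs after every step, so the synthesized feature never drifts farther than `radius` from the normal feature it started from.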

class anomalib.models.image.glass.lightning_model.Glass(input_shape=(288, 288), anomaly_source_path=None, backbone='wide_resnet50_2', pretrain_embed_dim=1536, target_embed_dim=1536, patchsize=3, patchstride=1, pre_trained=True, layers=None, pre_projection=1, discriminator_layers=2, discriminator_hidden=1024, learning_rate=0.0001, step=20, svd=0, gaussian_noise_std=0.015, radius_quantile=0.75, focal_loss_quantile_threshold=0.5, mining=True, pre_processor=True, post_processor=True, evaluator=True, visualizer=True)

Bases: AnomalibModule

PyTorch Lightning Implementation of the GLASS Model.

The model uses a pre-trained feature extractor to extract features and a feature adaptor to mitigate latent domain bias.

Global anomaly features are synthesized from adapted normal features using gradient ascent. Local anomaly images are synthesized using texture overlay datasets such as DTD and are then processed by the feature extractor and feature adaptor.

All three feature types (normal, global anomaly, and local anomaly) are passed to a shared discriminator, which is trained with a combination of BCE and focal losses.
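
The texture overlay step can be pictured as a masked blend of an image with a texture. The sketch below is a minimal version under stated assumptions: inputs are small 2-D lists, the mask is hard-coded as a stand-in for a thresholded Perlin noise mask, and the blending factor `beta` and the helper name `overlay_texture` are hypothetical.

```python
def overlay_texture(image, texture, mask, beta=0.5):
    """Blend `texture` into `image` where `mask` is 1.

    All inputs are 2-D lists of equal size; `mask` holds 0/1 values.
    Outside the mask the image is unchanged; inside it each pixel is a
    convex mix of the original pixel and the texture pixel.
    """
    blended = []
    for img_row, tex_row, mask_row in zip(image, texture, mask):
        row = []
        for pixel, tex, m in zip(img_row, tex_row, mask_row):
            mixed = beta * pixel + (1.0 - beta) * tex
            row.append((1 - m) * pixel + m * mixed)
        blended.append(row)
    return blended


image = [[0.2, 0.2], [0.2, 0.2]]
texture = [[1.0, 1.0], [1.0, 1.0]]
mask = [[0, 1], [0, 0]]  # stand-in for a thresholded Perlin noise mask
augmented = overlay_texture(image, texture, mask)
```

Only the masked pixel changes, so the same mask can double as the pixel-level label for the synthetic local anomaly.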

Parameters:

input_shape (tuple[int, int]) – Input image dimensions as a tuple of (height, width). Required for shaping the input pipeline. Defaults to (288, 288).
anomaly_source_path (str | None) – Path to the dataset or source directory containing normal images and anomaly textures. Defaults to None.
backbone (str) – Name of the CNN backbone used for feature extraction. Defaults to “wide_resnet50_2”.
pretrain_embed_dim (int) – Dimensionality of features extracted by the pre-trained backbone before adaptation. Defaults to 1536.
target_embed_dim (int) – Dimensionality of the target adapted features after projection. Defaults to 1536.
patchsize (int) – Size of the local patch used in feature aggregation (e.g., for neighborhood pooling). Defaults to 3.
patchstride (int) – Stride used when extracting patches for local feature aggregation. Defaults to 1.
pre_trained (bool) – Whether to use ImageNet pre-trained weights for the backbone network. Defaults to True.
layers (list[str] | None) – List of backbone layers to extract features from. Defaults to None, in which case [“layer2”, “layer3”] is used.
pre_projection (int) – Number of projection layers used in the feature adaptor (e.g., MLP before discriminator). Defaults to 1.
discriminator_layers (int) – Number of layers in the discriminator network. Defaults to 2.
discriminator_hidden (int) – Number of hidden units in each discriminator layer. Defaults to 1024.
learning_rate (float) – Learning rate for training the feature adaptor and discriminator networks. Defaults to 0.0001.
step (int) – Number of gradient ascent steps for anomaly synthesis. Defaults to 20.
svd (int) – Flag to enable SVD-based feature projection. Defaults to 0.
gaussian_noise_std (float) – Standard deviation of Gaussian noise added to features for global anomaly synthesis. Defaults to 0.015.
radius_quantile (float) – Quantile used to compute the truncated projection radius during gradient ascent. Defaults to 0.75.
focal_loss_quantile_threshold (float) – Quantile threshold for hard example mining in focal loss computation. When 0, all samples are used. Defaults to 0.5.
mining (bool) – Whether to perform gradient ascent or skip it. Defaults to True.
pre_processor (PreProcessor | bool) – Preprocessing module or flag to enable default preprocessing. Set to True to apply default normalization and resizing. Defaults to True.
post_processor (PostProcessor | bool) – Postprocessing module or flag to enable default output smoothing or thresholding. Defaults to True.
evaluator (Evaluator | bool) – Evaluation module for calculating metrics such as AUROC and PRO. Defaults to True.
visualizer (Visualizer | bool) – Visualization module to generate heatmaps, segmentation overlays, and anomaly scores. Defaults to True.
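
To make the `focal_loss_quantile_threshold` parameter more concrete, the sketch below keeps only the samples whose squared prediction error is at or above a given quantile, and keeps everything when the threshold is 0. The squared-error criterion and both helper names are assumptions for illustration; the actual selection rule in anomalib may differ.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def select_hard_examples(scores, targets, threshold=0.5):
    """Return the indices of samples kept for the focal loss."""
    if threshold <= 0:
        # A threshold of 0 disables mining: every sample is used.
        return list(range(len(scores)))
    errors = [(s - t) ** 2 for s, t in zip(scores, targets)]
    cutoff = quantile(errors, threshold)
    return [i for i, e in enumerate(errors) if e >= cutoff]


scores = [0.1, 0.9, 0.4, 0.6]   # hypothetical discriminator outputs
targets = [0, 1, 1, 0]          # hypothetical labels
hard = select_hard_examples(scores, targets, threshold=0.5)
```

With these placeholder values, the two well-classified samples fall below the median error and are dropped, leaving only the harder half.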

static configure_evaluator()

Configure the evaluator with validation and test metrics.

Overrides the default evaluator to include both image_AUROC and pixel_AUROC as validation metrics. The official GLASS implementation selects the best checkpoint based on image_auroc + pixel_auroc, so both must be available during validation.

Returns:

Configured evaluator with both validation and test metrics.

Return type:

Evaluator

Example

>>> evaluator = Glass.configure_evaluator()
>>> len(evaluator.val_metrics) > 0
True
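
The checkpoint selection rule mentioned above (best image_auroc + pixel_auroc) can be sketched in plain Python. The metric values below are made-up placeholders rather than measured results, and `best_epoch` is a hypothetical helper, not part of anomalib.

```python
def best_epoch(history):
    """Return the index of the epoch with the highest image + pixel AUROC sum."""
    return max(
        range(len(history)),
        key=lambda i: history[i]["image_AUROC"] + history[i]["pixel_AUROC"],
    )


# Placeholder validation metrics, one dict per epoch.
history = [
    {"image_AUROC": 0.98, "pixel_AUROC": 0.95},
    {"image_AUROC": 0.97, "pixel_AUROC": 0.98},
    {"image_AUROC": 0.99, "pixel_AUROC": 0.93},
]
selected = best_epoch(history)
```

Here the second epoch wins even though it does not have the highest image-level score, which is why both metrics need to be logged during validation.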

configure_optimizers()

Configure optimizers for the discriminator, projection, and backbone.

Returns all active optimizers in a fixed order: discriminator first, then projection (if pre_projection > 0), then backbone (if not pre-trained). This ordering is critical for checkpoint resume.

Returns:

List of optimizers managed by Lightning.

Return type:

list[Optimizer]
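
The ordering rule described above can be expressed as a small function. This is a sketch that returns names instead of real optimizer objects; `active_optimizer_names` is a hypothetical helper used only to show which optimizers are active for a given configuration and in what order.

```python
def active_optimizer_names(pre_projection=1, pre_trained=True):
    """List the active optimizers in the fixed order described above."""
    names = ["discriminator"]          # always present, always first
    if pre_projection > 0:
        names.append("projection")     # only when a projection layer exists
    if not pre_trained:
        names.append("backbone")       # only when the backbone is trained
    return names


default_order = active_optimizer_names()
scratch_order = active_optimizer_names(pre_projection=0, pre_trained=False)
```

Because the list is positional, resuming from a checkpoint with a different `pre_projection` or `pre_trained` setting would shift which saved state lines up with which optimizer.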

classmethod configure_pre_processor(image_size=None, center_crop_size=None)

Configure the default pre-processor for GLASS.

If a valid center_crop_size is provided, the pre-processor also performs center cropping, as described in the paper.

Parameters:

image_size (tuple[int, int] | None) – Target size for resizing. Defaults to (288, 288).
center_crop_size (tuple[int, int] | None) – Size for center cropping. Defaults to None.

Returns:

Configured pre-processor instance.

Return type:

PreProcessor

Raises:

ValueError – If at least one dimension of center_crop_size is larger than the corresponding image_size dimension.

Example

>>> pre_processor = Glass.configure_pre_processor(
...     image_size=(288, 288)
... )
>>> transformed_image = pre_processor(image)
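
The ValueError condition documented above amounts to a per-dimension comparison. The following sketch shows one way to write that check; `validate_center_crop` is a hypothetical helper and the error message is illustrative, not the one anomalib emits.

```python
def validate_center_crop(image_size, center_crop_size):
    """Raise ValueError if any crop dimension exceeds the image dimension."""
    if center_crop_size is None:
        return  # no cropping requested, nothing to validate
    for crop_dim, image_dim in zip(center_crop_size, image_size):
        if crop_dim > image_dim:
            msg = f"center_crop_size {center_crop_size} exceeds image_size {image_size}"
            raise ValueError(msg)


validate_center_crop((288, 288), (256, 256))  # passes silently
validate_center_crop((288, 288), None)        # passes silently
```

A crop such as (300, 256) against a (288, 288) image would raise, since the first dimension is larger than the resized image.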

property learning_type: LearningType

Return the learning type of the model.

Returns:

Learning type (ONE_CLASS for GLASS).

Return type:

LearningType

on_train_epoch_start()

Initialize model by computing mean feature representation across training dataset.

This method is called at the start of training and computes a mean feature vector that serves as a reference point for the normal class distribution.

Return type:

None

property trainer_arguments: dict[str, Any]

Return GLASS trainer arguments.

Returns:

Dictionary containing trainer configuration.

Return type:

dict[str, Any]

training_step(batch, batch_idx)

Training step for GLASS model.

Parameters:

batch (Batch) – Input batch containing images and metadata
batch_idx (int) – Index of the current batch

Returns:

Dictionary containing loss values and metrics

Return type:

Union[Tensor, Mapping[str, Any], None]

validation_step(batch, batch_idx)

Performs a single validation step during model evaluation.

Parameters:

batch (Batch) – A batch of input data, typically containing images and ground truth labels.
batch_idx (int) – Index of the batch (unused in this function).

Returns:

Output of the validation step, usually containing predictions and any associated metrics.

Return type:

Union[Tensor, Mapping[str, Any], None]

class anomalib.models.image.glass.torch_model.GlassModel(input_shape=(288, 288), anomaly_source_path=None, pretrain_embed_dim=1536, target_embed_dim=1536, backbone='wide_resnet50_2', patchsize=3, patchstride=1, pre_trained=True, layers=None, pre_projection=1, discriminator_layers=2, discriminator_hidden=1024, step=20, svd=0, gaussian_noise_std=0.015, radius_quantile=0.75, focal_loss_quantile_threshold=0.5, mining=True, normalize_mean=None, normalize_std=None)

Bases: Module

PyTorch Implementation of the GLASS Model.

calculate_anomaly_scores(images)

Calculates anomaly scores and segmentation masks for input images.

Parameters:

images (Tensor) – Batch of input images of shape [B, C, H, W].

Returns:

image_scores: Anomaly scores per image, shape [B].
masks: Segmentation masks, shape [B, H, W].

Return type:

tuple[Tensor, Tensor]
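
One common way to turn patch-level scores into an image-level score is to take the maximum patch score per image. The sketch below assumes that convention and a flat list of B*N patch scores; whether GLASS aggregates in exactly this way is not stated here, and `image_scores_from_patches` is a hypothetical helper.

```python
def image_scores_from_patches(patch_scores, batch_size):
    """Split a flat list of B*N patch scores into B groups and take each maximum."""
    per_image = len(patch_scores) // batch_size
    return [
        max(patch_scores[i * per_image:(i + 1) * per_image])
        for i in range(batch_size)
    ]


# Two images with four patches each (placeholder scores).
patch_scores = [0.1, 0.2, 0.9, 0.3, 0.05, 0.1, 0.2, 0.1]
image_scores = image_scores_from_patches(patch_scores, batch_size=2)
```

Under this convention a single highly anomalous patch is enough to flag the whole image, which suits small, localized defects.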

calculate_center(dataloader, device)

Calculates and updates the center embedding from a dataset.

This method runs the model in evaluation mode and computes the mean feature representation (center) across the entire dataset. The center is used for further downstream tasks such as anomaly detection or feature normalization.

Parameters:

dataloader (DataLoader) – A PyTorch DataLoader providing batches of data, where each batch contains an image attribute.
device (device) – The device on which tensors should be processed (e.g., torch.device("cuda") or torch.device("cpu")).

Returns:

The method updates self.center in-place with the computed center tensor.

Return type:

None
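
Computing a mean feature vector across batches can be done with a running sum, so the full dataset never needs to be held in memory at once. The sketch below uses nested Python lists in place of tensors and a hypothetical `compute_center` helper; it illustrates the averaging only, not the feature extraction.

```python
def compute_center(batches):
    """Mean feature vector over all rows in all batches."""
    total, count = None, 0
    for batch in batches:
        for row in batch:
            if total is None:
                total = [0.0] * len(row)
            # Accumulate element-wise so only one vector is kept in memory.
            total = [t + v for t, v in zip(total, row)]
            count += 1
    if count == 0:
        raise ValueError("no features to average")
    return [t / count for t in total]


batches = [
    [[1.0, 2.0], [3.0, 4.0]],  # first batch: two feature vectors
    [[5.0, 6.0]],              # second batch: one feature vector
]
center = compute_center(batches)
```

Summing rows and dividing once at the end weights every sample equally, even when the last batch is smaller than the others.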

calculate_features(img, aug, evaluation=False)

Calculate and return feature embeddings for the input and augmented images.

Depending on whether a pre-projection module is used, this method optionally applies it to the embeddings before returning them.

Parameters:

img (Tensor) – The original input image tensor.
aug (Tensor) – The augmented image tensor.
evaluation (bool) – Whether the model is in evaluation mode. Defaults to False.

Returns:

A tuple containing the feature embeddings: for the original image (true_feats), the augmented image (fake_feats), and the patch grid shapes from the first feature level.

Return type:

tuple[Tensor, Tensor, list[tuple[int, int]]]

forward(img)

Forward pass for training and inference.

During training, synthesizes global and local anomalies and computes the combined loss (BCE + focal). During inference, skips augmentation entirely and directly computes anomaly scores and segmentation masks.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor] | InferenceBatch
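
For readers unfamiliar with the two loss terms named above, the sketch below gives per-sample binary cross-entropy and focal loss in plain Python. The focusing parameter `gamma=2.0`, the absence of class weighting, and the clamping constant `eps` are assumptions; how the two terms are weighted and reduced in GLASS is not shown.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def focal(p, y, gamma=2.0, eps=1e-7):
    """Focal loss: cross-entropy scaled down for well-classified samples."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = focal(0.9, 1)   # confident and correct: heavily down-weighted
hard = focal(0.1, 1)   # confident and wrong: close to the plain BCE value
```

With `gamma=0` the focal term reduces to plain cross-entropy, so the focusing parameter controls how strongly easy samples are suppressed.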

generate_embeddings(images, evaluation=False)

Generates patch-wise feature embeddings for a batch of input images.

Extracts multi-scale features, patchifies them, aligns spatial sizes via bilinear interpolation, then preprocesses and aggregates into a single embedding tensor.

Parameters:

images (Tensor) – Input images of shape (B, C, H, W).
evaluation (bool) – Whether to run in evaluation mode. Defaults to False.

Returns:

Patch-level embeddings of shape (B*N, D) where N is patches per image.
List of (height, width) patch counts per feature level.

Return type:

tuple[Tensor, list[tuple[int, int]]]
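
The patch counts returned per feature level follow the usual sliding-window arithmetic. The sketch below assumes symmetric padding of (patchsize - 1) // 2, which is an assumption about the patchify step, and uses the fact that ResNet-style layer2 and layer3 feature maps are 1/8 and 1/16 of the input resolution. `patch_grid` is a hypothetical helper.

```python
def patch_grid(height, width, patchsize=3, patchstride=1):
    """Number of patches along each axis of a feature map."""
    padding = (patchsize - 1) // 2  # assumed symmetric padding
    rows = (height + 2 * padding - patchsize) // patchstride + 1
    cols = (width + 2 * padding - patchsize) // patchstride + 1
    return rows, cols


# For a 288x288 input, layer2 is 36x36 and layer3 is 18x18.
layer2_grid = patch_grid(288 // 8, 288 // 8)
layer3_grid = patch_grid(288 // 16, 288 // 16)
patches_per_image = layer2_grid[0] * layer2_grid[1]
```

With the default `patchsize=3` and `patchstride=1`, this padding preserves the feature map size, so N equals the number of spatial positions at the first feature level.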
