AnomalyDINO


AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2.

This module implements AnomalyDINO, a memory-bank model for anomaly detection that uses DINOv2-Small as its default backbone. At inference time, it uses k-nearest-neighbor (kNN) search against the memory bank to score each patch. The image-level anomaly score is the mean of the top 1% of patch-level anomaly scores, i.e. the scores at or above the 99th percentile.

Optionally, the model can mask out noisy background patches and apply greedy coreset subsampling to reduce the size of the memory bank.

Example

>>> from anomalib.data import MVTecAD
>>> from anomalib.models.image.anomaly_dino.lightning_model import AnomalyDINO
>>> from anomalib.engine import Engine

>>> MVTEC_CATEGORIES = [
...     "hazelnut", "grid", "carpet", "bottle", "cable", "capsule", "leather",
...     "metal_nut", "pill", "screw", "tile", "toothbrush", "transistor", "wood", "zipper"
... ]
>>> MASKED_CATEGORIES = ["capsule", "hazelnut", "pill", "screw", "toothbrush"]

>>> for category in MVTEC_CATEGORIES:
...     mask = category in MASKED_CATEGORIES
...     print(f"--- Running category: {category} | masking={mask} ---")
...
...     # Initialize data module
...     datamodule = MVTecAD(category=category)
...
...     # Initialize model
...     model = AnomalyDINO(
...         num_neighbours=1,
...         encoder_name="vit_small_patch14_dinov2",
...         masking=mask,
...         coreset_subsampling=False,
...     )
...
...     # Train and test
...     engine = Engine()
...     engine.fit(model=model, datamodule=datamodule)
...     engine.test(datamodule=datamodule)
>>> print("All categories processed.")

class anomalib.models.image.anomaly_dino.lightning_model.AnomalyDINO(num_neighbours=1, encoder_name='vit_small_patch14_dinov2', masking=False, coreset_subsampling=False, sampling_ratio=0.1, precision=PrecisionType.FLOAT32, pre_processor=True, post_processor=True, evaluator=True, visualizer=True)

Bases: MemoryBankMixin, AnomalibModule

AnomalyDINO Lightning Module for anomaly detection.

This class implements the AnomalyDINO algorithm, which leverages self-supervised DINO (self-distillation with no labels) vision transformer (ViT) encoders for feature extraction in anomaly detection tasks. Similar to PatchCore, it uses a memory bank of patch embeddings and performs nearest neighbor search to identify anomalous regions in test images.

The model operates in two phases:

1. Training: Extracts and stores patch embeddings from normal training images.
2. Inference: Compares test image patch embeddings with the memory bank to identify anomalies based on distance metrics.
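The inference phase can be sketched in plain PyTorch. This is a minimal illustration rather than anomalib's implementation: it assumes cosine distance on L2-normalized embeddings, and the helper name `knn_patch_scores` is hypothetical.

```python
import torch
import torch.nn.functional as F


def knn_patch_scores(test_feats: torch.Tensor, memory_bank: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Mean cosine distance from each test patch to its k nearest memory-bank patches.

    test_feats:  (B, N, D) patch embeddings of the test images.
    memory_bank: (M, D) patch embeddings collected from normal images.
    Returns:     (B, N) patch-level anomaly scores.
    """
    test = F.normalize(test_feats, dim=-1)
    bank = F.normalize(memory_bank, dim=-1)
    # Cosine distance = 1 - cosine similarity, for every (test patch, bank patch) pair.
    distances = 1.0 - test @ bank.T  # (B, N, M)
    # Keep the k smallest distances per test patch and average them.
    nearest, _ = distances.topk(k, dim=-1, largest=False)
    return nearest.mean(dim=-1)


# Toy example: a memory bank of four orthogonal "normal" embeddings.
bank = torch.eye(4)
test = torch.tensor([[[1.0, 0.0, 0.0, 0.0],    # identical to a bank entry
                      [1.0, 1.0, 0.0, 0.0]]])  # halfway between two entries
scores = knn_patch_scores(test, bank)
print(scores)  # first score is 0, second is about 0.293
```

A patch that matches a stored normal patch scores 0, while a patch unlike anything in the bank scores higher; with num_neighbours=1 only the single closest bank entry matters.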

Parameters:

num_neighbours (int) – Number of nearest neighbors to use for anomaly scoring. Defaults to 1.
encoder_name (str) – Name of the pretrained DINO encoder to use. Defaults to "vit_small_patch14_dinov2".
masking (bool) – Whether to apply PCA-based masking to suppress background patches. Defaults to False.
coreset_subsampling (bool) – Whether to apply coreset subsampling to reduce the size of the memory bank. Defaults to False.
sampling_ratio (float) – Fraction of the memory bank retained when coreset_subsampling is enabled. Defaults to 0.1.
precision (str | PrecisionType) – Precision type for model computations. Can be either a string ("float32", "float16") or a PrecisionType enum value. Defaults to PrecisionType.FLOAT32.
pre_processor (Module | bool) – Pre-processor instance or bool flag to enable default preprocessing. Defaults to True.
post_processor (Module | bool) – Post-processor instance or bool flag to enable default postprocessing. Defaults to True.
evaluator (Evaluator | bool) – Evaluator instance or bool flag for performance computation. Defaults to True.
visualizer (Visualizer | bool) – Visualizer instance or bool flag to enable visualization. Defaults to True.

Example

>>> from anomalib.data import MVTecAD
>>> from anomalib.models.image.anomaly_dino.lightning_model import AnomalyDINO
>>> from anomalib.engine import Engine

>>> MVTEC_CATEGORIES = [
...     "hazelnut", "grid", "carpet", "bottle", "cable", "capsule", "leather",
...     "metal_nut", "pill", "screw", "tile", "toothbrush", "transistor", "wood", "zipper"
... ]
>>> MASKED_CATEGORIES = ["capsule", "hazelnut", "pill", "screw", "toothbrush"]

>>> for category in MVTEC_CATEGORIES:
...     mask = category in MASKED_CATEGORIES
...     print(f"--- Running category: {category} | masking={mask} ---")
...
...     # Initialize data module
...     datamodule = MVTecAD(category=category)
...
...     # Initialize model
...     model = AnomalyDINO(
...         num_neighbours=1,
...         encoder_name="vit_small_patch14_dinov2",
...         masking=mask,
...         coreset_subsampling=False,
...     )
...
...     # Train and test
...     engine = Engine()
...     engine.fit(model=model, datamodule=datamodule)
...     engine.test(datamodule=datamodule)
>>> print("All categories processed.")

Notes

The model does not require backpropagation or optimization, as it relies on pretrained transformer embeddings and similarity search.
Works best when trained exclusively on normal (non-anomalous) samples.

See also

anomalib.models.components.AnomalibModule:
Base class for all anomaly detection models
anomalib.models.components.MemoryBankMixin:
Mixin class for models using memory bank embeddings

static configure_optimizers()

Configure optimizers.

Returns: None, since AnomalyDINO does not require optimization or gradient updates.
Return type: None

static configure_post_processor()

Configure the default post-processor.

Returns:

Post-processor that converts raw model scores into interpretable anomaly predictions and maps.

Return type:

PostProcessor

classmethod configure_pre_processor(image_size=None)

Configure the default pre-processor for AnomalyDINO.

Parameters: image_size (tuple[int, int] | int | None) – Target size for resizing input images. If None, a size of (252, 252) is used. If an int is given, the shorter side is resized to that value and the aspect ratio is preserved.
Returns: Configured pre-processor instance.
Return type: PreProcessor

Example

>>> import torch
>>> image = torch.rand(1, 3, 256, 256)  # dummy RGB batch with values in [0, 1]
>>> pre_processor = AnomalyDINO.configure_pre_processor(
...     image_size=(252, 252)
... )
>>> transformed_image = pre_processor(image)
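The default size of (252, 252) is a multiple of the 14-pixel patch size of vit_small_patch14_dinov2 (252 = 18 x 14), so each image maps to an 18 x 18 patch grid. The hypothetical helper below sketches the resulting sizes; the int branch assumes torchvision `Resize` semantics, where an int is matched to the shorter edge. It is not part of anomalib's API.

```python
from typing import Optional, Tuple, Union


def resized_size_and_grid(
    orig_hw: Tuple[int, int],
    image_size: Optional[Union[int, Tuple[int, int]]] = None,
    patch_size: int = 14,
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Hypothetical helper: resized (H, W) and the resulting ViT patch grid."""
    if image_size is None:
        image_size = (252, 252)  # documented default
    h, w = orig_hw
    if isinstance(image_size, int):
        # Shorter side -> image_size; longer side scaled to keep the aspect ratio.
        short, long = min(h, w), max(h, w)
        new_long = int(image_size * long / short)
        new_h, new_w = (image_size, new_long) if h <= w else (new_long, image_size)
    else:
        new_h, new_w = image_size
    # Each side is cut into non-overlapping patch_size x patch_size patches.
    return (new_h, new_w), (new_h // patch_size, new_w // patch_size)


print(resized_size_and_grid((900, 900)))       # ((252, 252), (18, 18))
print(resized_size_and_grid((480, 640), 252))  # ((252, 336), (18, 24))
```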

fit()

Optional fitting step.

This method is a hook for post-training operations such as coreset subsampling or feature normalization. The actual fitting, if needed, is handled by the underlying model (see AnomalyDINOModel.fit()).

Return type: None

property learning_type: LearningType

Get the learning type for AnomalyDINO.

Returns: Always LearningType.ONE_CLASS, since the model is trained only on normal samples.
Return type: LearningType

on_load_checkpoint(checkpoint)

Make checkpoints trained before the timm-encoder migration loadable.

The frozen DINOv2 encoder was migrated from a custom Vision Transformer to a frozen TimmFeatureExtractor. The legacy encoder weights are dropped and replaced by the current timm encoder weights so the strict state-dict load still succeeds; the stored memory bank and other non-encoder state are left untouched. See restore_frozen_encoder_weights().

Parameters: checkpoint (dict[str, Any]) – The checkpoint dictionary being loaded; modified in place.
Return type: None
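The migration described above can be sketched as plain dictionary manipulation. This assumes a standard Lightning checkpoint, which stores weights under the "state_dict" key; the function name `replace_encoder_weights` and the "model.encoder." prefix are hypothetical and do not reflect anomalib's actual key layout or the signature of restore_frozen_encoder_weights().

```python
from typing import Any, Dict


def replace_encoder_weights(
    checkpoint: Dict[str, Any],
    current_state: Dict[str, Any],
    encoder_prefix: str,
) -> None:
    """Hypothetical sketch: swap legacy encoder weights for current ones, in place."""
    state = checkpoint["state_dict"]
    # Drop every legacy encoder entry ...
    for key in [k for k in state if k.startswith(encoder_prefix)]:
        del state[key]
    # ... then copy in the current encoder weights so a strict load finds every key.
    for key, value in current_state.items():
        if key.startswith(encoder_prefix):
            state[key] = value
    # Non-encoder entries (e.g. the memory bank) are left untouched.


ckpt = {"state_dict": {"model.encoder.old_block.weight": 1, "model.memory_bank": 2}}
current = {"model.encoder.blocks.0.weight": 3, "model.memory_bank": 99}
replace_encoder_weights(ckpt, current, encoder_prefix="model.encoder.")
print(ckpt["state_dict"])  # {'model.memory_bank': 2, 'model.encoder.blocks.0.weight': 3}
```

Note that the stored memory bank keeps its checkpointed value (2), not the freshly initialized one (99).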

property trainer_arguments: dict[str, Any]

Default PyTorch Lightning trainer arguments for AnomalyDINO.

Returns:

Trainer configuration with:

gradient_clip_val: 0 (no gradient clipping)
max_epochs: 1 (single pass over training data)
num_sanity_val_steps: 0 (skip validation sanity checks)
devices: 1 (runs on a single device)

Return type:

dict[str, Any]

training_step(batch, *args, **kwargs)

Extract feature embeddings from training images.

Parameters:

batch (Batch) – Input batch containing images and metadata.
*args – Additional arguments (unused).
**kwargs – Additional keyword arguments (unused).

Returns:

Dummy loss tensor for Lightning compatibility.

Return type:

None

Note

The extracted embeddings are stored in the model's memory bank for later use in coreset subsampling or inference.

validation_step(batch, *args, **kwargs)

Generate anomaly predictions for a validation batch.

Parameters:

batch (Batch) – Input batch containing images and metadata.
*args – Additional arguments (unused).
**kwargs – Additional keyword arguments (unused).

Returns:

Batch with added predictions, including anomaly maps and scores computed using nearest neighbor search.

Return type:

Union[Tensor, Mapping[str, Any], None]

PyTorch model implementation for AnomalyDINO.

This module defines the low-level PyTorch implementation of the AnomalyDINO model, which combines a DINOv2 Vision Transformer encoder with a memory-bank approach for few-shot anomaly detection. It performs patch-based feature extraction, optional background masking, and k-nearest neighbor search for anomaly scoring.

Example

>>> from anomalib.models.image.anomaly_dino.torch_model import AnomalyDINOModel
>>> model = AnomalyDINOModel(
...     num_neighbours=1,
...     encoder_name="vit_small_patch14_dinov2",
...     masking=False,
...     coreset_subsampling=False,
...     sampling_ratio=0.1,
... )

class anomalib.models.image.anomaly_dino.torch_model.AnomalyDINOModel(num_neighbours=1, encoder_name='vit_small_patch14_dinov2', masking=False, coreset_subsampling=False, sampling_ratio=0.1)

Bases: DynamicBufferMixin, Module

AnomalyDINO base PyTorch model for patch-based anomaly detection.

This model uses DINOv2 transformers as feature extractors and applies a memory-bank mechanism for few-shot anomaly detection, similar to PatchCore. It supports optional background masking and coreset subsampling.

Parameters:

num_neighbours (int) – Number of nearest neighbors used for anomaly scoring. Defaults to 1.
encoder_name (str) – DINO encoder architecture name (timm model name containing "dino"). Defaults to "vit_small_patch14_dinov2".
masking (bool) – Whether to apply PCA-based masking to suppress background features. Defaults to False.
coreset_subsampling (bool) – Whether to apply greedy coreset selection to reduce memory bank size. Defaults to False.
sampling_ratio (float) – Fraction of samples retained during coreset subsampling. Defaults to 0.1.

Example

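>>> import torch
>>> model = AnomalyDINOModel(masking=True, coreset_subsampling=True)
>>> x = torch.randn(1, 3, 224, 224)
>>> _ = model.train()  # training mode: forward() collects patch embeddings
>>> _ = model(x)
>>> model.fit()  # build (and subsample) the memory bank
>>> _ = model.eval()  # inference mode: forward() scores against the memory bank
>>> with torch.no_grad():
...     preds = model(x)
>>> preds.pred_score.shape
torch.Size([1, 1])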

static compute_background_masks(batch_features, grid_size, threshold=10.0, kernel_size=3, border=0.2)

Compute binary masks to identify foreground patches.

This method uses PCA on patch embeddings to estimate foreground regions, followed by morphological operations to clean up the mask.

Parameters:

batch_features (ndarray) – Patch embeddings of shape (B, N, D).
grid_size (tuple[int, int]) – Spatial grid dimensions (H, W).
threshold (float) – PCA threshold for foreground separation. Defaults to 10.0.
kernel_size (int) – Morphological kernel size. Defaults to 3.
border (float) – Fraction of image borders excluded from thresholding. Defaults to 0.2.

Returns:

Boolean masks of shape (B, N), where True indicates foreground patches.

Return type:

ndarray
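A simplified NumPy sketch of PCA-based foreground masking follows. It projects patch embeddings onto the first principal component, resolves the arbitrary sign of that component with a hypothetical "object near the centre" heuristic, thresholds the projection, and applies a binary dilation. It does not reproduce the border handling or the exact morphological operations of compute_background_masks, and the function name `pca_foreground_masks` is hypothetical.

```python
import numpy as np


def pca_foreground_masks(batch_features, grid_size, threshold=10.0, kernel_size=3):
    """Hypothetical sketch: (B, N, D) patch embeddings -> (B, N) boolean foreground masks."""
    h, w = grid_size
    masks = []
    for features in batch_features:
        centered = features - features.mean(axis=0, keepdims=True)
        # First principal direction via SVD of the centered embeddings.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        grid = (centered @ vt[0]).reshape(h, w)
        # A principal component's sign is arbitrary; assume the object sits near
        # the image centre and flip the projection if the centre scores low.
        ch, cw = h // 4, w // 4
        if grid[ch:h - ch, cw:w - cw].mean() < grid.mean():
            grid = -grid
        mask = grid > threshold
        # Binary dilation with a kernel_size x kernel_size square, in plain NumPy.
        pad = kernel_size // 2
        padded = np.pad(mask, pad)
        dilated = np.zeros_like(mask)
        for dy in range(kernel_size):
            for dx in range(kernel_size):
                dilated |= padded[dy:dy + h, dx:dx + w]
        masks.append(dilated.reshape(-1))
    return np.stack(masks)


# Toy 8 x 8 grid: a bright 4 x 4 "object" in the centre, empty background.
feats = np.zeros((1, 64, 4))
for r in range(2, 6):
    for c in range(2, 6):
        feats[0, r * 8 + c, 0] = 50.0
masks = pca_foreground_masks(feats, (8, 8))
print(masks.reshape(8, 8).astype(int))
```

The dilation grows the 4 x 4 object by one patch on each side, which is the kind of clean-up the morphological step is meant to provide.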

extract_features(image_tensor)

Extract patch-level feature embeddings from the last transformer layer.

Returns flattened patch tokens excluding CLS and register tokens.

Parameters: image_tensor (Tensor) – Input image tensor of shape (B, 3, H, W).
Returns: Patch feature embeddings of shape (B, N, D), where N is the number of patches and D the feature dimension.
Return type: Tensor
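Dropping the prefix tokens is a simple slice along the token axis. The sketch below uses random tokens instead of a real encoder; it assumes one CLS token and no register tokens (timm's VisionTransformer exposes this count as `num_prefix_tokens`), and 384 is the embedding width of ViT-S. The L2 normalization mirrors the "normalized patch features" mentioned under forward(); whether anomalib normalizes inside extract_features is an assumption. The helper name `patch_tokens` is hypothetical.

```python
import torch
import torch.nn.functional as F


def patch_tokens(tokens: torch.Tensor, num_prefix_tokens: int) -> torch.Tensor:
    """Drop CLS/register tokens and L2-normalize the remaining patch tokens.

    tokens:  (B, num_prefix_tokens + N, D) output of the last transformer block.
    Returns: (B, N, D) unit-length patch embeddings.
    """
    patches = tokens[:, num_prefix_tokens:, :]
    return F.normalize(patches, dim=-1)


# A 224 x 224 input with 14 x 14 patches gives a 16 x 16 grid, i.e. 256 patch tokens.
batch, dim, side = 2, 384, 224 // 14
tokens = torch.randn(batch, 1 + side * side, dim)  # 1 CLS token, no register tokens
feats = patch_tokens(tokens, num_prefix_tokens=1)
print(feats.shape)  # torch.Size([2, 256, 384])
```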

fit()

Finalize and optionally subsample the memory bank after training.

Once all embeddings from normal training images have been collected, this method consolidates them into the memory bank and optionally performs coreset-based subsampling.

Raises: ValueError – If called before any embeddings have been collected.
Return type: None
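Greedy coreset subsampling is commonly implemented as k-center greedy selection: start from a random point, then repeatedly add the point farthest from everything selected so far. The sketch below is a plain version of that technique with a hypothetical `greedy_coreset` helper; anomalib's own sampler may differ, for example by projecting features to a lower dimension first.

```python
import torch


def greedy_coreset(features: torch.Tensor, sampling_ratio: float = 0.1, seed: int = 0) -> torch.Tensor:
    """k-center greedy selection; returns indices of the retained rows of `features` (M, D)."""
    m = features.shape[0]
    n_select = max(1, int(m * sampling_ratio))
    generator = torch.Generator().manual_seed(seed)
    first = int(torch.randint(m, (1,), generator=generator))
    selected = [first]
    # Distance from every point to its nearest selected center so far.
    min_dist = torch.cdist(features, features[first:first + 1]).squeeze(1)
    for _ in range(n_select - 1):
        # Pick the point that is currently worst covered ...
        idx = int(torch.argmax(min_dist))
        selected.append(idx)
        # ... and update coverage with the new center.
        new_dist = torch.cdist(features, features[idx:idx + 1]).squeeze(1)
        min_dist = torch.minimum(min_dist, new_dist)
    return torch.tensor(selected)


# Two well-separated clusters of 50 points each; keep 2% (= 2 points).
features = torch.cat([torch.zeros(50, 2), torch.full((50, 2), 10.0)])
kept = greedy_coreset(features, sampling_ratio=0.02)
memory_bank = features[kept]
print(kept.tolist(), memory_bank.tolist())
```

Because each step picks the worst-covered point, the two retained embeddings come from different clusters, so the reduced memory bank still covers both modes of the normal data.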

forward(input_tensor)

Forward pass for both training and inference.

In training mode:

Extracts normalized patch features.
Collects embeddings into the memory bank.

In inference mode:

Computes distances between input features and the memory bank.
Performs kNN-based scoring and anomaly map generation.

Parameters:

input_tensor (Tensor) – Input batch of shape (B, 3, H, W).

Returns:

In training: dummy scalar tensor (no loss backprop).
In inference: anomalib.data.InferenceBatch containing:
- pred_score: Image-level anomaly score (B, 1)
- anomaly_map: Pixel-level anomaly heatmap (B, 1, H, W)

Return type:

Tensor | InferenceBatch
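Turning patch-level scores into the (B, 1, H, W) anomaly map amounts to reshaping them onto the patch grid and upsampling to the input resolution. The sketch below assumes bilinear interpolation; anomalib's actual upsampling mode and any additional smoothing may differ, and the helper name `patch_scores_to_map` is hypothetical.

```python
import torch
import torch.nn.functional as F


def patch_scores_to_map(scores: torch.Tensor, grid_size: tuple, image_size: tuple) -> torch.Tensor:
    """Reshape (B, N) patch scores to a (B, 1, H, W) pixel-level anomaly map."""
    h, w = grid_size
    patch_map = scores.reshape(scores.shape[0], 1, h, w)  # (B, 1, h, w) patch grid
    # Upsample the coarse patch grid to the input resolution.
    return F.interpolate(patch_map, size=image_size, mode="bilinear", align_corners=False)


scores = torch.rand(2, 18 * 18)  # e.g. an 18 x 18 grid from a 252 x 252 input
anomaly_map = patch_scores_to_map(scores, (18, 18), (252, 252))
print(anomaly_map.shape)  # torch.Size([2, 1, 252, 252])
```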

static mean_top1p(distances)

Compute the mean of the top 1% of distances per image.

Used as a robust aggregation of patch-level anomaly scores into a single image-level anomaly score.

Parameters: distances (Tensor) – Patch-level distances of shape (B, N).
Returns: Mean of the top 1% of distances per image, shape (B, 1).
Return type: Tensor
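The aggregation can be sketched with torch.topk. How the 1% is rounded is not stated above; this sketch assumes k = ceil(N / 100) with at least one patch, computed in integer arithmetic to avoid floating-point rounding surprises. It is an illustration, not anomalib's exact code.

```python
import torch


def mean_top1p(distances: torch.Tensor) -> torch.Tensor:
    """Mean of the largest 1% of patch distances per image: (B, N) -> (B, 1)."""
    n = distances.shape[-1]
    k = max(1, (n + 99) // 100)  # assumption: round up, keep at least one patch
    top, _ = distances.topk(k, dim=-1)
    return top.mean(dim=-1, keepdim=True)


# 200 patches -> the top 1% is the 2 largest distances (199 and 198).
d = torch.arange(200, dtype=torch.float32).unsqueeze(0)
print(mean_top1p(d))  # tensor([[198.5000]])
```

Averaging a small top fraction rather than taking the single maximum makes the image score less sensitive to one noisy patch, while still reacting to small localized defects.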
