AnomalyDINO
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2.
This module implements AnomalyDINO, a memory-bank model for anomaly detection that uses DINOv2-Small as its default backbone. At inference time it runs a k-nearest-neighbour (kNN) search against the memory bank to find anomalous patches. The image-level anomaly score is derived from the top 1% of pixel-wise anomaly scores, i.e. those above the 99th percentile.
The model can optionally mask out noisy background regions and can optionally apply greedy coreset subsampling to shrink the memory bank.
Example
>>> from anomalib.data import MVTecAD
>>> from anomalib.models.image.anomaly_dino.lightning_model import AnomalyDINO
>>> from anomalib.engine import Engine
>>> MVTEC_CATEGORIES = [
...     "hazelnut", "grid", "carpet", "bottle", "cable", "capsule", "leather",
...     "metal_nut", "pill", "screw", "tile", "toothbrush", "transistor", "wood", "zipper",
... ]
>>> MASKED_CATEGORIES = ["capsule", "hazelnut", "pill", "screw", "toothbrush"]
>>> for category in MVTEC_CATEGORIES:
...     mask = category in MASKED_CATEGORIES
...     print(f"--- Running category: {category} | masking={mask} ---")
...     # Initialize data module
...     datamodule = MVTecAD(category=category)
...     # Initialize model
...     model = AnomalyDINO(
...         num_neighbours=1,
...         encoder_name="dinov2_vit_small_14",
...         masking=mask,
...         coreset_subsampling=False,
...     )
...     # Train and test
...     engine = Engine()
...     engine.fit(model=model, datamodule=datamodule)
...     engine.test(datamodule=datamodule)
>>> print("All categories processed.")
- class anomalib.models.image.anomaly_dino.lightning_model.AnomalyDINO(num_neighbours=1, encoder_name='dinov2_vit_small_14', masking=False, coreset_subsampling=False, sampling_ratio=0.1, precision=PrecisionType.FLOAT32, pre_processor=True, post_processor=True, evaluator=True, visualizer=True)
Bases: MemoryBankMixin, AnomalibModule
AnomalyDINO Lightning Module for anomaly detection.
This class implements the AnomalyDINO algorithm, which leverages self-supervised DINO (self-distillation with no labels) vision transformer (ViT) encoders for feature extraction in anomaly detection tasks. Similar to PatchCore, it uses a memory bank of patch embeddings and performs nearest neighbor search to identify anomalous regions in test images.
The model operates in two phases:
1. Training: Extracts and stores patch embeddings from normal training images.
2. Inference: Compares test image patch embeddings with the memory bank to identify anomalies based on distance metrics.
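The two phases above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the library's implementation: the helper names (`build_memory_bank`, `knn_patch_scores`) are hypothetical, and cosine distance on L2-normalized features is an assumption, since the text above only says "distance metrics".

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def build_memory_bank(train_patches: list[np.ndarray]) -> np.ndarray:
    """Training phase: stack the (N_i, D) patch embeddings of all normal images."""
    return l2_normalize(np.concatenate(train_patches, axis=0))


def knn_patch_scores(test_patches: np.ndarray, bank: np.ndarray, k: int = 1) -> np.ndarray:
    """Inference phase: mean cosine distance of each test patch to its k nearest bank entries."""
    cosine_dist = 1.0 - l2_normalize(test_patches) @ bank.T  # (N_test, N_bank)
    nearest = np.sort(cosine_dist, axis=1)[:, :k]  # k smallest distances per row
    return nearest.mean(axis=1)


rng = np.random.default_rng(0)
bank = build_memory_bank([rng.normal(size=(16, 8)) for _ in range(4)])
normal_patch = bank[:1]  # a patch that is already stored in the bank
odd_patch = -bank[:1]    # a patch pointing the opposite way
scores = knn_patch_scores(np.concatenate([normal_patch, odd_patch]), bank, k=1)
```

A patch that matches a stored embedding scores close to zero, while a patch far from everything in the bank scores higher. With `k=1` this reduces to the distance to the single nearest neighbour.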
- Parameters:
num_neighbours (int) – Number of nearest neighbors to use for anomaly scoring. Defaults to 1.
encoder_name (str) – Name of the pretrained DINO encoder to use. Defaults to "dinov2_vit_small_14".
masking (bool) – Whether to apply background masking during feature extraction to suppress noisy background patches. Defaults to False.
coreset_subsampling (bool) – Whether to apply coreset subsampling to reduce the size of the memory bank. Defaults to False.
sampling_ratio (float) – Fraction of the memory bank to keep when coreset subsampling is enabled. Defaults to 0.1.
precision (str | PrecisionType) – Precision type for model computations. Can be either a string ("float32", "float16") or a PrecisionType enum value. Defaults to PrecisionType.FLOAT32.
pre_processor (Module | bool) – Pre-processor instance or bool flag to enable default preprocessing. Defaults to True.
post_processor (Module | bool) – Post-processor instance or bool flag to enable default postprocessing. Defaults to True.
evaluator (Evaluator | bool) – Evaluator instance or bool flag for performance computation. Defaults to True.
visualizer (Visualizer | bool) – Visualizer instance or bool flag to enable visualization. Defaults to True.
Example
>>> from anomalib.data import MVTecAD
>>> from anomalib.models.image.anomaly_dino.lightning_model import AnomalyDINO
>>> from anomalib.engine import Engine
>>> MVTEC_CATEGORIES = [
...     "hazelnut", "grid", "carpet", "bottle", "cable", "capsule", "leather",
...     "metal_nut", "pill", "screw", "tile", "toothbrush", "transistor", "wood", "zipper",
... ]
>>> MASKED_CATEGORIES = ["capsule", "hazelnut", "pill", "screw", "toothbrush"]
>>> for category in MVTEC_CATEGORIES:
...     mask = category in MASKED_CATEGORIES
...     print(f"--- Running category: {category} | masking={mask} ---")
...     # Initialize data module
...     datamodule = MVTecAD(category=category)
...     # Initialize model
...     model = AnomalyDINO(
...         num_neighbours=1,
...         encoder_name="dinov2_vit_small_14",
...         masking=mask,
...         coreset_subsampling=False,
...     )
...     # Train and test
...     engine = Engine()
...     engine.fit(model=model, datamodule=datamodule)
...     engine.test(datamodule=datamodule)
>>> print("All categories processed.")
Notes
The model does not require backpropagation or optimization, as it relies on pretrained transformer embeddings and similarity search.
Works best when trained exclusively on normal (non-anomalous) samples.
See also
anomalib.models.components.AnomalibModule: Base class for all anomaly detection models
anomalib.models.components.MemoryBankMixin: Mixin class for models using memory bank embeddings
- static configure_optimizers()
Configure optimizers.
- Returns:
AnomalyDINO does not require optimization or gradient updates.
- Return type:
- static configure_post_processor()
Configure the default post-processor.
- Returns:
Post-processor that converts raw model scores into interpretable anomaly predictions and maps.
- Return type:
- classmethod configure_pre_processor(image_size=None)
Configure the default pre-processor for AnomalyDINO.
- Parameters:
image_size (tuple[int, int] | int | None) – Target size for resizing input images. Defaults to (252, 252). If an int is given, the shortest side is resized to that value and the aspect ratio is preserved.
- Returns:
Configured pre-processor instance.
- Return type:
Example
>>> pre_processor = AnomalyDINO.configure_pre_processor(
...     image_size=(252, 252)
... )
>>> transformed_image = pre_processor(image)
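The default size of (252, 252) divides evenly by the 14-pixel patch size that the "14" in the encoder name refers to. The sketch below works through that arithmetic. `patch_grid` is a hypothetical helper, not part of anomalib, and raising on a non-divisible size is a choice made for this example only.

```python
def patch_grid(image_size: tuple[int, int], patch_size: int = 14) -> tuple[int, int]:
    """Return the (rows, cols) of patches a ViT produces for a given image size."""
    height, width = image_size
    if height % patch_size or width % patch_size:
        raise ValueError(f"{image_size} is not divisible by patch size {patch_size}")
    return height // patch_size, width // patch_size


rows, cols = patch_grid((252, 252))
num_patches = rows * cols  # number of patch embeddings per image
```

For a 252 x 252 input this gives an 18 x 18 grid, so each image contributes 324 patch embeddings.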
- fit()
Optional fitting step.
This method is a placeholder for potential post-training operations such as coreset subsampling or feature normalization. The underlying model handles fitting itself when it is needed.
- Return type:
- property learning_type: LearningType
Get the learning type for AnomalyDINO.
- Returns:
Always LearningType.ONE_CLASS, since the model is trained only on normal samples.
- Return type:
LearningType
- property trainer_arguments: dict[str, Any]
Default PyTorch Lightning trainer arguments for AnomalyDINO.
- training_step(batch, *args, **kwargs)
Extract feature embeddings from training images.
- Parameters:
batch (Batch) – Input batch containing images and metadata.
*args – Additional arguments (unused).
**kwargs – Additional keyword arguments (unused).
- Returns:
Dummy loss tensor for Lightning compatibility.
- Return type:
Note
The extracted embeddings are stored in the model's memory bank for later use during the coreset sampling or inference phase.
- validation_step(batch, *args, **kwargs)
Generate anomaly predictions for a validation batch.
- Parameters:
batch (Batch) – Input batch containing images and metadata.
*args – Additional arguments (unused).
**kwargs – Additional keyword arguments (unused).
- Returns:
Batch with added predictions, including anomaly maps and scores computed using nearest neighbor search.
- Return type:
PyTorch model implementation for AnomalyDINO.
This module defines the low-level PyTorch implementation of the AnomalyDINO model, which combines a DINOv2 Vision Transformer encoder with a memory-bank approach for few-shot anomaly detection. It performs patch-based feature extraction, optional background masking, and k-nearest neighbor search for anomaly scoring.
Example
>>> from anomalib.models.image.anomaly_dino.torch_model import AnomalyDINOModel
>>> model = AnomalyDINOModel(
...     num_neighbours=1,
...     encoder_name="dinov2_vit_small_14",
...     masking=False,
...     coreset_subsampling=False,
...     sampling_ratio=0.1,
... )
- class anomalib.models.image.anomaly_dino.torch_model.AnomalyDINOModel(num_neighbours=1, encoder_name='dinov2_vit_small_14', masking=False, coreset_subsampling=False, sampling_ratio=0.1)
Bases: DynamicBufferMixin, Module
AnomalyDINO base PyTorch model for patch-based anomaly detection.
This model uses DINOv2 transformers as feature extractors and applies a memory-bank mechanism for few-shot anomaly detection, similar to PatchCore. It supports optional background masking and coreset subsampling.
- Parameters:
num_neighbours (int) – Number of nearest neighbors used for anomaly scoring. Defaults to 1.
encoder_name (str) – DINOv2 encoder architecture name. Must start with "dinov2". Defaults to "dinov2_vit_small_14".
masking (bool) – Whether to apply PCA-based masking to suppress background features. Defaults to False.
coreset_subsampling (bool) – Whether to apply greedy coreset selection to reduce memory bank size. Defaults to False.
sampling_ratio (float) – Fraction of samples retained during coreset subsampling. Defaults to 0.1.
Example
>>> import torch
>>> model = AnomalyDINOModel(masking=True, coreset_subsampling=True)
>>> x = torch.randn(1, 3, 224, 224)
>>> preds = model(x)
>>> preds.pred_score.shape
torch.Size([1, 1])
- static compute_background_masks(batch_features, grid_size, threshold=10.0, kernel_size=3, border=0.2)
Compute binary masks to identify foreground patches.
This method uses PCA on patch embeddings to estimate foreground regions, followed by morphological operations to clean up the mask.
- Parameters:
batch_features (ndarray) – Patch embeddings of shape (B, N, D).
grid_size (tuple[int, int]) – Spatial grid dimensions (H, W).
threshold (float) – PCA threshold for foreground separation. Defaults to 10.0.
kernel_size (int) – Morphological kernel size. Defaults to 3.
border (float) – Fraction of image borders excluded from thresholding. Defaults to 0.2.
- Returns:
Boolean masks of shape (B, N), where True indicates foreground patches.
- Return type:
ndarray
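The PCA step of the masking can be illustrated for a single image as follows. This is a simplified sketch, not the method above: `foreground_mask` is a hypothetical name, the morphological clean-up and the `border` handling are omitted, the threshold of 0.0 is chosen for the toy data, and the sign of the component is fixed by assuming the object covers the central patch.

```python
import numpy as np


def foreground_mask(features: np.ndarray, grid_size: tuple[int, int], threshold: float = 0.0) -> np.ndarray:
    """Threshold the first principal component of (N, D) patch features."""
    height, width = grid_size
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = (centered @ vt[0]).reshape(height, width)
    # The sign of a principal component is arbitrary. Assumption: the object
    # covers the central patch, so orient the axis to make that patch positive.
    if projection[height // 2, width // 2] < 0:
        projection = -projection
    return (projection > threshold).reshape(-1)


# Toy image: a 6x6 patch grid with a 2x2 object in the middle.
rng = np.random.default_rng(0)
features = rng.normal(scale=0.1, size=(6, 6, 4))
features[2:4, 2:4] += 5.0  # object patches differ strongly from the background
mask = foreground_mask(features.reshape(36, 4), grid_size=(6, 6))
```

In the toy data the object and background differ along one dominant direction, so the first principal component separates them and only the four central patches are marked as foreground.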
- extract_features(image_tensor)
Extract patch-level feature embeddings from the last transformer layer.
Returns flattened patch tokens excluding CLS and register tokens.
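Dropping the CLS and register tokens amounts to slicing the token sequence. The sketch below assumes the layout `[CLS, registers..., patches]`, which to the best of my knowledge matches the DINOv2 reference implementation. `patch_tokens` is a hypothetical helper, and the shapes use the 384-dimensional ViT-S embedding with an 18 x 18 patch grid.

```python
import numpy as np


def patch_tokens(tokens: np.ndarray, num_register_tokens: int = 0) -> np.ndarray:
    """Drop the CLS token and any register tokens from a (B, 1 + R + N, D) array."""
    return tokens[:, 1 + num_register_tokens:, :]


# Batch of 2 images, 1 CLS token, 4 register tokens, 324 patch tokens, 384 features.
tokens = np.zeros((2, 1 + 4 + 324, 384))
patches = patch_tokens(tokens, num_register_tokens=4)
```

The result has shape `(2, 324, 384)`: one embedding per image patch, ready to be stored in or compared against the memory bank.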
- fit()
Finalize and optionally subsample the memory bank after training.
Once all embeddings from normal training images have been collected, this method consolidates them into the memory bank and optionally performs coreset-based subsampling.
- Raises:
ValueError – If called before collecting any embeddings.
- Return type:
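Greedy coreset selection is commonly implemented as farthest-point (k-center greedy) sampling, and the sketch below assumes that variant. `greedy_coreset` is a hypothetical name, and the library's own sampler may differ, for example by projecting embeddings to a lower dimension first.

```python
import numpy as np


def greedy_coreset(bank: np.ndarray, sampling_ratio: float = 0.1, seed: int = 0) -> np.ndarray:
    """Return indices chosen by farthest-point (k-center greedy) selection."""
    n_select = max(1, int(len(bank) * sampling_ratio))
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(bank)))]  # random starting point
    # Distance from every embedding to its closest selected embedding so far.
    min_dist = np.linalg.norm(bank - bank[selected[0]], axis=1)
    while len(selected) < n_select:
        idx = int(np.argmax(min_dist))  # the point farthest from the current coreset
        selected.append(idx)
        min_dist = np.minimum(min_dist, np.linalg.norm(bank - bank[idx], axis=1))
    return np.array(selected)


rng = np.random.default_rng(1)
bank = rng.normal(size=(200, 8))
keep = greedy_coreset(bank, sampling_ratio=0.1)
coreset = bank[keep]
```

With `sampling_ratio=0.1` the 200 embeddings shrink to 20. Each step adds the embedding that is currently worst covered, so the reduced bank still spans the original feature distribution.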
- forward(input_tensor)
Forward pass for both training and inference.
- In training mode:
Extracts normalized patch features.
Collects embeddings into the memory bank.
- In inference mode:
Computes distances between input features and the memory bank.
Performs kNN-based scoring and anomaly map generation.
- Parameters:
input_tensor (Tensor) – Input batch of shape (B, 3, H, W).
- Returns:
- In training: dummy scalar tensor (no loss backprop).
- In inference: anomalib.data.InferenceBatch containing:
pred_score: Image-level anomaly score of shape (B, 1).
anomaly_map: Pixel-level anomaly heatmap of shape (B, 1, H, W).
- Return type:
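To go from per-patch distances to a heatmap of shape (B, 1, H, W), the patch scores have to be laid out on the patch grid and upsampled. The sketch below uses nearest-neighbour repetition purely for simplicity. `patch_scores_to_map` is a hypothetical helper, and the actual model may interpolate and smooth the map differently.

```python
import numpy as np


def patch_scores_to_map(scores: np.ndarray, grid_size: tuple[int, int], patch_size: int = 14) -> np.ndarray:
    """Turn (B, N) patch scores into a (B, 1, H, W) map by repeating each patch score."""
    batch = scores.shape[0]
    grid = scores.reshape(batch, 1, *grid_size)  # (B, 1, rows, cols)
    return grid.repeat(patch_size, axis=2).repeat(patch_size, axis=3)


# Two images with an 18 x 18 patch grid; scores are just running indices here.
scores = np.arange(2 * 324, dtype=float).reshape(2, 324)
anomaly_map = patch_scores_to_map(scores, grid_size=(18, 18))
```

Each patch score fills a 14 x 14 block of pixels, so an 18 x 18 grid becomes a 252 x 252 map.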
- static mean_top1p(distances)
Compute the mean of the top 1% distances per image.
Used as a robust aggregation of patch-level anomaly scores into a single image-level anomaly score.
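A minimal NumPy sketch of this aggregation is shown below. The name `mean_top1p_sketch`, the (B, N) input layout, and the rounding rule (integer division by 100, with at least one value kept) are assumptions made for illustration. The library method may round differently.

```python
import numpy as np


def mean_top1p_sketch(distances: np.ndarray) -> np.ndarray:
    """Average the largest 1% of patch distances for each image in a (B, N) array."""
    num_top = max(1, distances.shape[1] // 100)  # keep at least one value
    top = np.sort(distances, axis=1)[:, -num_top:]  # largest values per row
    return top.mean(axis=1, keepdims=True)  # shape (B, 1)


# One image with 400 patches: only four of them have a non-zero distance.
distances = np.zeros((1, 400))
distances[0, :4] = [1.0, 2.0, 3.0, 6.0]
score = mean_top1p_sketch(distances)
```

Here the top 1% of 400 patches is 4 patches, so the score is the mean of 1, 2, 3 and 6. Averaging several of the largest distances makes the score less sensitive to a single outlier patch than taking the maximum would be.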