Metrics#

Custom metrics for evaluating anomaly detection models.

This module provides various metrics for evaluating anomaly detection performance:

  • Area Under Curve (AUC) metrics:
    • AUROC: Area Under Receiver Operating Characteristic curve

    • AUPR: Area Under Precision-Recall curve

    • AUPRO: Area Under Per-Region Overlap curve

    • AUPIMO: Area Under Per-Image Overlap curve

  • F1-score metrics:
    • F1Score: Standard F1 score

    • F1Max: Maximum F1 score across thresholds

  • Threshold metrics:
    • F1AdaptiveThreshold: Finds optimal threshold by maximizing F1 score

    • ManualThreshold: Uses manually specified threshold

  • Other metrics:
    • AnomalibMetric: Base class for custom metrics

    • AnomalyScoreDistribution: Analyzes score distributions

    • BinaryPrecisionRecallCurve: Computes precision-recall curves

    • Evaluator: Combines multiple metrics for evaluation

    • MinMax: Normalizes scores to [0,1] range

    • PRO: Per-Region Overlap score

    • PIMO: Per-Image Overlap score

Example

>>> import torch
>>> from anomalib.metrics import AUROC, F1Score
>>> auroc = AUROC()
>>> f1 = F1Score(threshold=0.5)
>>> labels = torch.tensor([0, 1, 0, 1])
>>> scores = torch.tensor([0.1, 0.9, 0.2, 0.8])
>>> auroc(scores, labels)
tensor(1.)
>>> f1(scores, labels)
tensor(1.)
class anomalib.metrics.AUPIMO(fields=None, prefix='', **kwargs)#

Bases: AnomalibMetric, _AUPIMO

Wrapper adding AnomalibMetric functionality to AUPIMO metric.

class anomalib.metrics.AUPR(fields=None, prefix='', **kwargs)#

Bases: AnomalibMetric, _AUPR

Wrapper to add AnomalibMetric functionality to AUPR metric.

class anomalib.metrics.AUPRO(fields=None, prefix='', **kwargs)#

Bases: AnomalibMetric, _AUPRO

Wrapper to add AnomalibMetric functionality to AUPRO metric.

class anomalib.metrics.AUROC(fields=None, prefix='', **kwargs)#

Bases: AnomalibMetric, _AUROC

Wrapper to add AnomalibMetric functionality to AUROC metric.
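
Like the other AUC wrappers, AUROC can be constructed with batch fields so that it consumes Batch objects directly (see AnomalibMetric below). A minimal sketch, using the same pred_score and gt_label fields as the Evaluator example further down:

>>> from anomalib.metrics import AUROC
>>> image_auroc = AUROC(fields=["pred_score", "gt_label"], prefix="image_")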

class anomalib.metrics.AnomalibMetric(fields=None, prefix='', **kwargs)#

Bases: object

Base class for metrics in Anomalib.

Makes any torchmetrics metric compatible with the Anomalib framework by adding batch processing capabilities. Subclasses must inherit from both this class and a torchmetrics metric.

The class enables updating metrics with Batch objects instead of individual tensors. It extracts the specified fields from the batch and passes them to the underlying metric’s update method.

Parameters:
  • fields (Sequence[str] | None) – Names of fields to extract from batch. If None, uses class’s default_fields. Required if no defaults.

  • prefix (str) – Prefix added to metric name. Defaults to “”.

  • **kwargs – Additional arguments passed to parent metric class.

Raises:

ValueError – If no fields are specified and class has no defaults.

Example

Create image and pixel-level F1 metrics:

>>> from torchmetrics.classification import BinaryF1Score
>>> class F1Score(AnomalibMetric, BinaryF1Score):
...     pass
...
>>> # Image-level metric using pred_label and gt_label
>>> image_f1 = F1Score(
...     fields=["pred_label", "gt_label"],
...     prefix="image_"
... )
>>> # Pixel-level metric using pred_mask and gt_mask
>>> pixel_f1 = F1Score(
...     fields=["pred_mask", "gt_mask"],
...     prefix="pixel_"
... )
update(batch, *args, **kwargs)#

Update metric with values from batch fields.

Parameters:
  • batch (Batch) – Batch object containing required fields.

  • *args – Additional positional arguments passed to parent update.

  • **kwargs – Additional keyword arguments passed to parent update.

Raises:

ValueError – If batch is missing any required fields.

Return type:

None
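
Continuing the example above, the wrapped metric is then updated with whole batches rather than individual tensors. A minimal sketch, assuming an ImageBatch that carries pred_label and gt_label fields:

>>> from anomalib.data import ImageBatch
>>> import torch
>>> batch = ImageBatch(
...     image=torch.rand(4, 3, 32, 32),
...     pred_label=torch.tensor([0, 0, 1, 1]),
...     gt_label=torch.tensor([0, 1, 1, 1])
... )
>>> image_f1.update(batch)  # extracts pred_label and gt_label from the batch
>>> result = image_f1.compute()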

class anomalib.metrics.AnomalyScoreDistribution(**kwargs)#

Bases: Metric

Compute distribution statistics of anomaly scores.

This class tracks and computes the mean and standard deviation of anomaly scores from the normal samples in the training set. Statistics are computed for both image-level scores and pixel-level anomaly maps.

The metric maintains internal state to accumulate scores and maps across batches before computing final statistics.

Example

>>> import torch
>>> dist = AnomalyScoreDistribution()
>>> # Update with batch of scores
>>> scores = torch.tensor([0.1, 0.2, 0.3])
>>> dist.update(anomaly_scores=scores)
>>> # Compute statistics
>>> img_mean, img_std, pix_mean, pix_std = dist.compute()
compute()#

Compute distribution statistics from accumulated scores and maps.

Returns:

  Tuple containing:

  • image_mean: Mean of log-transformed image anomaly scores

  • image_std: Standard deviation of log-transformed image scores

  • pixel_mean: Mean of log-transformed pixel anomaly maps

  • pixel_std: Standard deviation of log-transformed pixel maps

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

update(*args, anomaly_scores=None, anomaly_maps=None, **kwargs)#

Update the internal state with new scores and maps.

Parameters:
  • *args – Unused positional arguments.

  • anomaly_scores (Tensor | None) – Batch of image-level anomaly scores.

  • anomaly_maps (Tensor | None) – Batch of pixel-level anomaly maps.

  • **kwargs – Unused keyword arguments.

Return type:

None
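
The accumulated statistics are typically used downstream to standardize new scores. A minimal sketch of that use, continuing the example above and assuming, as documented, that the stored statistics are on the log scale:

>>> img_mean, img_std, _, _ = dist.compute()
>>> new_scores = torch.tensor([0.15, 0.25])
>>> z_scores = (torch.log(new_scores) - img_mean) / img_std  # standardized log-scores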

class anomalib.metrics.BinaryPrecisionRecallCurve(thresholds=None, ignore_index=None, validate_args=True, **kwargs)#

Bases: BinaryPrecisionRecallCurve

Binary precision-recall curve without threshold prediction normalization.

This class extends the torchmetrics BinaryPrecisionRecallCurve class but removes the sigmoid normalization step that would otherwise be applied to the predictions before the curve is computed.

Example

>>> import torch
>>> from anomalib.metrics import BinaryPrecisionRecallCurve
>>> metric = BinaryPrecisionRecallCurve()
>>> preds = torch.tensor([0.1, 0.4, 0.35, 0.8])
>>> target = torch.tensor([0, 0, 1, 1])
>>> metric.update(preds, target)
>>> precision, recall, thresholds = metric.compute()
update(preds, target)#

Update metric state with new predictions and targets.

Unlike the base class, this method accepts raw predictions without applying sigmoid normalization.

Parameters:
  • preds (Tensor) – Raw predicted scores or probabilities

  • target (Tensor) – Ground truth binary labels (0 or 1)

Return type:

None
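
For context, the standard torchmetrics curve passes scores outside [0, 1] through a sigmoid, whereas this variant uses them as-is, so the returned thresholds remain in the raw anomaly-score range. A short sketch with illustrative values:

>>> import torch
>>> from anomalib.metrics import BinaryPrecisionRecallCurve
>>> metric = BinaryPrecisionRecallCurve()
>>> raw_scores = torch.tensor([1.2, 3.5, 2.8, 5.0])  # unbounded anomaly scores
>>> target = torch.tensor([0, 0, 1, 1])
>>> metric.update(raw_scores, target)
>>> _, _, thresholds = metric.compute()  # thresholds stay in the raw score range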

class anomalib.metrics.Evaluator(val_metrics=None, test_metrics=None, compute_on_cpu=True)#

Bases: Module, Callback

Evaluator module for LightningModule.

The Evaluator module is a PyTorch module that computes and logs metrics during the validation and test steps. Each AnomalibModule should have an Evaluator as a submodule to compute and log metrics during validation and testing. An Evaluator can be passed to the AnomalibModule as a parameter during initialization. When no Evaluator is provided, the AnomalibModule uses a default Evaluator that logs a default set of metrics.

Parameters:
  • val_metrics (Sequence[AnomalibMetric], optional) – Validation metrics. Defaults to [].

  • test_metrics (Sequence[AnomalibMetric], optional) – Test metrics. Defaults to [].

  • compute_on_cpu (bool, optional) – Whether to compute metrics on CPU. Defaults to True.

Examples

>>> from anomalib.metrics import F1Score, AUROC
>>> from anomalib.data import ImageBatch
>>> import torch
>>>
>>> f1_score = F1Score(fields=["pred_label", "gt_label"])
>>> auroc = AUROC(fields=["pred_score", "gt_label"])
>>>
>>> evaluator = Evaluator(test_metrics=[f1_score, auroc])
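
The evaluator is then attached to a model at initialization, as described above. A minimal sketch, assuming a model such as Padim that accepts an evaluator argument:

>>> from anomalib.models import Padim
>>> model = Padim(evaluator=evaluator)  # metrics are computed and logged during validation/test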
metrics_to_cpu(metrics)#

Set the compute_on_cpu attribute of the metrics to True.

Return type:

None

on_test_batch_end(trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0)#

Update test metrics with the batch output.

Return type:

None

on_test_epoch_end(trainer, pl_module)#

Compute and log test metrics.

Return type:

None

on_validation_batch_end(trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0)#

Update validation metrics with the batch output.

Return type:

None

on_validation_epoch_end(trainer, pl_module)#

Compute and log validation metrics.

Return type:

None

setup(trainer, pl_module, stage)#

Move metrics to CPU if num_devices == 1 and compute_on_cpu is set to True.

Return type:

None

static validate_metrics(metrics)#

Validate metrics.

Return type:

Sequence[AnomalibMetric]

class anomalib.metrics.F1AdaptiveThreshold(default_value=0.5, **kwargs)#

Bases: BinaryPrecisionRecallCurve, Threshold

Adaptive threshold that maximizes F1 score.

This class computes and stores the optimal threshold for converting anomaly scores to binary predictions by maximizing the F1 score on validation data.

Parameters:
  • default_value (float) – Initial threshold value used before computation. Defaults to 0.5.

  • **kwargs – Additional arguments passed to parent classes.

value#

Current threshold value.

Type:

torch.Tensor

Example

>>> from anomalib.metrics import F1AdaptiveThreshold
>>> import torch
>>> # Create validation data
>>> labels = torch.tensor([0, 0, 1, 1])  # 2 normal, 2 anomalous
>>> scores = torch.tensor([0.1, 0.2, 0.8, 0.9])  # Anomaly scores
>>> # Initialize threshold
>>> threshold = F1AdaptiveThreshold()
>>> # Compute optimal threshold
>>> optimal_value = threshold(scores, labels)
>>> print(f"Optimal threshold: {optimal_value:.4f}")
Optimal threshold: 0.8000
compute()#

Compute optimal threshold by maximizing F1 score.

Calculates precision-recall curve and corresponding thresholds, then finds the threshold that maximizes the F1 score.

Returns:

Optimal threshold value.

Return type:

torch.Tensor

Warning

If validation set contains no anomalous samples, the threshold will default to the maximum anomaly score, which may lead to poor performance.
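
For illustration, the selection logic can be sketched with the functional precision-recall curve from torchmetrics; this is a simplified stand-alone version of the idea, not the library’s exact implementation:

>>> import torch
>>> from torchmetrics.functional.classification import binary_precision_recall_curve
>>> scores = torch.tensor([0.1, 0.2, 0.8, 0.9])
>>> labels = torch.tensor([0, 0, 1, 1])
>>> precision, recall, thresholds = binary_precision_recall_curve(scores, labels)
>>> f1 = 2 * precision * recall / (precision + recall + 1e-10)  # F1 at each curve point
>>> best = thresholds[f1[:-1].argmax()]  # last curve point has no threshold; 0.8 for this data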

class anomalib.metrics.F1Max(fields=None, prefix='', **kwargs)#

Bases: AnomalibMetric, _F1Max

Wrapper to add AnomalibMetric functionality to F1Max metric.

This class wraps the internal _F1Max metric to make it compatible with Anomalib’s batch processing capabilities.

Example

>>> from anomalib.metrics import F1Max
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Create metric with batch fields
>>> f1_max = F1Max(fields=["pred_score", "gt_label"])
>>> # Create sample batch
>>> batch = ImageBatch(
...     image=torch.rand(4, 3, 32, 32),
...     pred_score=torch.tensor([0.1, 0.4, 0.35, 0.8]),
...     gt_label=torch.tensor([0, 0, 1, 1])
... )
>>> # Update and compute
>>> f1_max.update(batch)
>>> f1_max.compute()
tensor(0.8000)
class anomalib.metrics.F1Score(fields=None, prefix='', **kwargs)#

Bases: AnomalibMetric, BinaryF1Score

Wrapper to add AnomalibMetric functionality to F1Score metric.

This class wraps the torchmetrics BinaryF1Score to make it compatible with Anomalib’s batch processing capabilities.

Example

>>> from anomalib.metrics import F1Score
>>> import torch
>>> # Create metric
>>> f1 = F1Score()
>>> # Create sample data
>>> preds = torch.tensor([0, 0, 1, 1])
>>> target = torch.tensor([0, 1, 1, 1])
>>> # Update and compute
>>> f1.update(preds, target)
>>> f1.compute()
tensor(0.8000)
class anomalib.metrics.ManualThreshold(default_value=0.5, **kwargs)#

Bases: Threshold

Threshold metric that uses a fixed, manually specified threshold value.

Parameters:
  • default_value (float, optional) – Default threshold value. Defaults to 0.5.

  • kwargs – Any keyword arguments.

Examples

>>> from anomalib.metrics import ManualThreshold
>>> import torch
>>> manual_threshold = ManualThreshold(default_value=0.5)
>>> labels = torch.randint(low=0, high=2, size=(5,))
>>> preds = torch.rand(5)
>>> threshold = manual_threshold(preds, labels)
>>> threshold
tensor(0.5000, dtype=torch.float64)

As the threshold is set manually, the computed value is always equal to default_value.

>>> labels = torch.randint(low=0, high=2, size=(5,))
>>> preds = torch.rand(5)
>>> threshold = manual_threshold(preds, labels)
>>> threshold
tensor(0.5000, dtype=torch.float64)

The threshold value remains the same even if the inputs change.

compute()#

Compute the threshold.

In case of manual thresholding, the threshold is already set and does not need to be computed.

Returns:

Value of the optimal threshold.

Return type:

torch.Tensor

static update(*args, **kwargs)#

Do nothing.

Parameters:
  • *args – Any positional arguments.

  • **kwargs – Any keyword arguments.

Return type:

None

class anomalib.metrics.MinMax(**kwargs)#

Bases: Metric

Track minimum and maximum values across batches.

This metric maintains running minimum and maximum values across all batches it processes. It is useful for tasks like normalization or monitoring the range of values during training.

Parameters:
  • full_state_update (bool, optional) – Whether to update the internal state with each new batch. Defaults to True.

  • kwargs – Additional keyword arguments passed to the parent class.

min#

Running minimum value seen across all batches

Type:

torch.Tensor

max#

Running maximum value seen across all batches

Type:

torch.Tensor

Example

>>> from anomalib.metrics import MinMax
>>> import torch
>>> # Create metric
>>> minmax = MinMax()
>>> # Update with batches
>>> batch1 = torch.tensor([0.1, 0.2, 0.3])
>>> batch2 = torch.tensor([0.2, 0.4, 0.5])
>>> minmax.update(batch1)
>>> minmax.update(batch2)
>>> # Get final min/max values
>>> min_val, max_val = minmax.compute()
>>> min_val, max_val
(tensor(0.1000), tensor(0.5000))
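
Continuing the example, the tracked values are typically used to rescale scores into the [0, 1] range mentioned in the module overview:

>>> new_scores = torch.tensor([0.15, 0.3, 0.45])
>>> (new_scores - min_val) / (max_val - min_val)
tensor([0.1250, 0.5000, 0.8750])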
compute()#

Compute final minimum and maximum values.

Returns:

Tuple containing the (min, max) values tracked across all batches.

Return type:

tuple[torch.Tensor, torch.Tensor]

update(predictions, *args, **kwargs)#

Update running min and max values with new predictions.

Parameters:
  • predictions (torch.Tensor) – New tensor of values to include in min/max tracking

  • *args – Additional positional arguments (unused)

  • **kwargs – Additional keyword arguments (unused)

Return type:

None

class anomalib.metrics.PIMO(fields=None, prefix='', **kwargs)#

Bases: AnomalibMetric, _PIMO

Wrapper adding AnomalibMetric functionality to PIMO metric.

class anomalib.metrics.PRO(fields=None, prefix='', **kwargs)#

Bases: AnomalibMetric, _PRO

Wrapper to add AnomalibMetric functionality to PRO metric.

This class inherits from both AnomalibMetric and _PRO to combine Anomalib’s metric functionality with the PRO score computation.
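
Like the other wrappers, PRO is typically constructed with batch fields and updated with batches that carry pixel-level predictions. A sketch, assuming anomaly_map and gt_mask are the relevant field names on ImageBatch:

>>> from anomalib.metrics import PRO
>>> from anomalib.data import ImageBatch
>>> import torch
>>> pro = PRO(fields=["anomaly_map", "gt_mask"], prefix="pixel_")
>>> batch = ImageBatch(
...     image=torch.rand(2, 3, 32, 32),
...     anomaly_map=torch.rand(2, 32, 32),
...     gt_mask=torch.randint(0, 2, (2, 32, 32)).bool()
... )
>>> pro.update(batch)
>>> score = pro.compute()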