Metrics

Anomalib provides a comprehensive set of metrics for evaluating anomaly detection model performance. All metrics extend TorchMetrics’ functionality with Anomalib-specific features.

Available Metrics

Area Under Curve Metrics

AUROC

Area Under the Receiver Operating Characteristic curve. Measures the model’s ability to distinguish between normal and anomalous samples.

anomalib.metrics.AUROC

AUPR

Area Under the Precision-Recall curve. Particularly useful for imbalanced datasets.

anomalib.metrics.AUPR

AUPRO

Area Under the Per-Region Overlap curve. Evaluates pixel-level anomaly localization performance.

anomalib.metrics.AUPRO

AUPIMO

Area Under the Per-Image Missed Overlap curve. Advanced metric for evaluating localization quality.

anomalib.metrics.AUPIMO

F1 Score Metrics

F1Score

Standard F1 score for binary classification. Harmonic mean of precision and recall.

anomalib.metrics.F1Score

F1Max

Maximum F1 score across all possible thresholds. Useful for finding optimal operating points.

anomalib.metrics.F1Max

Threshold Metrics

F1AdaptiveThreshold

Automatically determines the optimal threshold by maximizing F1 score.

anomalib.metrics.F1AdaptiveThreshold

ManualThreshold

Uses a manually specified threshold for classification.

anomalib.metrics.ManualThreshold

Other Metrics

PRO

Per-Region Overlap score for evaluating pixel-level localization.

anomalib.metrics.PRO

PIMO

Per-Image Missed Overlap for assessing localization errors.

anomalib.metrics.PIMO

PGn

Presorted Good with n% bad samples missed. Measures false negative rate at specific operating points.

anomalib.metrics.PGn

PBn

Presorted Bad with n% good samples misclassified. Measures false positive rate at specific operating points.

anomalib.metrics.PBn

MinMax

Normalizes anomaly scores to [0, 1] range using min-max scaling.

anomalib.metrics.MinMax

AnomalyScoreDistribution

Analyzes and tracks the distribution of anomaly scores for model diagnostics.

anomalib.metrics.AnomalyScoreDistribution

Utility Classes

AnomalibMetric

Base class for all Anomalib metrics. Extends TorchMetrics with field-based updates.

anomalib.metrics.AnomalibMetric

Evaluator

Orchestrates multiple metrics for comprehensive model evaluation.

anomalib.metrics.Evaluator

BinaryPrecisionRecallCurve

Computes precision-recall curves for binary classification tasks.

anomalib.metrics.BinaryPrecisionRecallCurve

API Reference

Custom metrics for evaluating anomaly detection models.

This module provides various metrics for evaluating anomaly detection performance:

  • Area Under Curve (AUC) metrics:
    • AUROC: Area Under Receiver Operating Characteristic curve

    • AUPR: Area Under Precision-Recall curve

    • AUPRO: Area Under Per-Region Overlap curve

    • AUPIMO: Area Under Per-Image Missed Overlap curve

  • F1-score metrics:
    • F1Score: Standard F1 score

    • F1Max: Maximum F1 score across thresholds

  • Threshold metrics:
    • F1AdaptiveThreshold: Finds optimal threshold by maximizing F1 score

    • ManualThreshold: Uses manually specified threshold

  • Other metrics:
    • AnomalibMetric: Base class for custom metrics

    • AnomalyScoreDistribution: Analyzes score distributions

    • BinaryPrecisionRecallCurve: Computes precision-recall curves

    • Evaluator: Combines multiple metrics for evaluation

    • MinMax: Normalizes scores to [0,1] range

    • PBn: Presorted bad with n% good samples misclassified

    • PGn: Presorted good with n% bad samples missed

    • PRO: Per-Region Overlap score

    • PIMO: Per-Image Missed Overlap score

Example

>>> from anomalib.metrics import AUROC, F1Score
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Initialize metrics with required fields
>>> auroc = AUROC(fields=["pred_score", "gt_label"])
>>> f1 = F1Score(fields=["pred_score", "gt_label"], threshold=0.5)
>>> # Create sample batch
>>> batch = ImageBatch(
...     image=torch.rand(4, 3, 32, 32),
...     pred_score=torch.tensor([0.1, 0.9, 0.2, 0.8]),
...     gt_label=torch.tensor([0, 1, 0, 1])
... )
>>> # Calculate metrics
>>> auroc(batch)
tensor(1.)
>>> f1(batch)
tensor(1.)
class anomalib.metrics.AUPIMO(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _AUPIMO

Wrapper adding AnomalibMetric functionality to AUPIMO metric.

default_fields: Sequence[str] = ('anomaly_map', 'gt_mask')#
class anomalib.metrics.AUPR(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _AUPR

Wrapper to add AnomalibMetric functionality to AUPR metric.
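
To make the quantity concrete, here is a minimal pure-Python sketch of one common estimator of the area under the precision-recall curve, the step-wise average precision. It is an illustration only: the function name average_precision is hypothetical, ties between scores are not handled specially, and Anomalib's internal _AUPR may integrate the curve differently.

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive sample."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive (anomalous) sample")
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, index in enumerate(order, start=1):
        if labels[index] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


ap = average_precision([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

With these inputs the two positives sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2.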

class anomalib.metrics.AUPRO(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _AUPRO

Wrapper to add AnomalibMetric functionality to AUPRO metric.

class anomalib.metrics.AUROC(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _AUROC

Wrapper to add AnomalibMetric functionality to AUROC metric.

This class wraps the internal _AUROC metric to make it compatible with Anomalib’s batch processing capabilities.

Example

>>> from anomalib.metrics import AUROC
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Create sample batch
>>> batch = ImageBatch(
...     image=torch.rand(4, 3, 32, 32),
...     pred_score=torch.tensor([0.1, 0.2, 0.8, 0.9]),
...     gt_label=torch.tensor([0, 0, 1, 1])
... )
>>> # Initialize and compute AUROC
>>> auroc = AUROC(fields=["pred_score", "gt_label"])
>>> auroc(batch)
tensor(1.0)
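
For intuition about the value returned above, AUROC equals the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one. The sketch below computes it that way in plain Python. The helper name pairwise_auroc is hypothetical, and this quadratic-time version is meant for illustration, not as a description of how _AUROC is implemented.

```python
def pairwise_auroc(scores, labels):
    """Fraction of (anomalous, normal) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


perfect = pairwise_auroc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
mixed = pairwise_auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The first call reproduces the perfect separation of the example batch; in the second, one of the four pairs is ranked incorrectly.
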
class anomalib.metrics.AnomalibMetric(fields=None, prefix='', strict=True, **kwargs)#

Bases: object

Base class for metrics in Anomalib.

Makes any torchmetrics metric compatible with the Anomalib framework by adding batch processing capabilities. Subclasses must inherit from both this class and a torchmetrics metric.

The class enables updating metrics with Batch objects instead of individual tensors. It extracts the specified fields from the batch and passes them to the underlying metric’s update method.

Parameters:
  • fields (Sequence[str] | None) – Names of fields to extract from batch. If None, uses class’s default_fields. Required if no defaults.

  • prefix (str) – Prefix added to metric name. Defaults to “”.

  • strict (bool) – Whether to raise an error if batch is missing fields.

  • **kwargs – Additional arguments passed to parent metric class.

Raises:

ValueError – If no fields are specified and class has no defaults.

Example

Create image and pixel-level F1 metrics:

>>> from anomalib.metrics import AnomalibMetric
>>> from torchmetrics.classification import BinaryF1Score
>>> class F1Score(AnomalibMetric, BinaryF1Score):
...     pass
...
>>> # Image-level metric using pred_label and gt_label
>>> image_f1 = F1Score(
...     fields=["pred_label", "gt_label"],
...     prefix="image_"
... )
>>> # Pixel-level metric using pred_mask and gt_mask
>>> pixel_f1 = F1Score(
...     fields=["pred_mask", "gt_mask"],
...     prefix="pixel_"
... )
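
The field-extraction pattern described above can be sketched without any dependencies. Everything in this block is hypothetical (FieldMixin, MeanAbsoluteError, ToyBatch); it only illustrates how a mixin placed before a metric class in the inheritance list can pull named fields from a batch and forward them to the parent update method.

```python
from dataclasses import dataclass


class MeanAbsoluteError:
    """Toy stand-in for a torchmetrics-style metric with update/compute."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, preds, target):
        for p, t in zip(preds, target):
            self.total += abs(p - t)
            self.count += 1

    def compute(self):
        return self.total / self.count


class FieldMixin:
    """Hypothetical mixin: extracts named fields from a batch object."""

    def __init__(self, fields, prefix="", **kwargs):
        super().__init__(**kwargs)  # initialise the wrapped metric
        self.fields = list(fields)
        self.name = prefix + type(self).__name__

    def update(self, batch):
        missing = [f for f in self.fields if getattr(batch, f, None) is None]
        if missing:
            raise ValueError(f"batch is missing fields: {missing}")
        # Forward the extracted values, in order, to the parent metric.
        super().update(*(getattr(batch, f) for f in self.fields))


class BatchMAE(FieldMixin, MeanAbsoluteError):
    pass


@dataclass
class ToyBatch:
    pred_score: list
    gt_label: list
    gt_mask: list = None


batch = ToyBatch(pred_score=[0.5, 1.0], gt_label=[0, 1])
metric = BatchMAE(fields=["pred_score", "gt_label"], prefix="image_")
metric.update(batch)
```

Listing the mixin first matters: its update method is found first in the method resolution order, and super() then reaches the wrapped metric.
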
compute()#

Compute the metric value.

If the metric has not been updated and is not in strict mode, returns None.

Returns:

Computed metric value or None.

Return type:

Tensor

default_fields: Sequence[str]#
update(batch, *args, **kwargs)#

Update metric with values from batch fields.

Parameters:
  • batch (Batch) – Batch object containing required fields.

  • *args – Additional positional arguments passed to parent update.

  • **kwargs – Additional keyword arguments passed to parent update.

Raises:

ValueError – If batch is missing any required fields.

Return type:

None

class anomalib.metrics.AnomalyScoreDistribution(**kwargs)#

Bases: Metric

Compute distribution statistics of anomaly scores.

This class tracks and computes the mean and standard deviation of anomaly scores. Statistics are computed for both image-level scores and pixel-level anomaly maps.

The metric maintains internal state to accumulate scores, anomaly maps, and labels across batches before computing final statistics.

Example

>>> import torch
>>> from anomalib.metrics import AnomalyScoreDistribution
>>> dist = AnomalyScoreDistribution()
>>> # Update with batch of scores
>>> scores = torch.tensor([0.1, 0.2, 0.3])
>>> dist.update(anomaly_scores=scores)
>>> # Compute statistics
>>> img_mean, img_std, pix_mean, pix_std = dist.compute()
compute()#

Compute distribution statistics from accumulated scores and maps.

Returns:

  • image_mean: Mean of log-transformed image anomaly scores

  • image_std: Standard deviation of log-transformed image scores

  • pixel_mean: Mean of log-transformed pixel anomaly maps

  • pixel_std: Standard deviation of log-transformed pixel maps

Return type:

tuple[Tensor, Tensor, Tensor, Tensor]
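
As a rough illustration of the log-transformed statistics listed above, the sketch below computes the mean and standard deviation of the logarithm of a list of scores. The helper log_stats is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, not something this page states.

```python
import math
import statistics


def log_stats(scores):
    """Mean and sample standard deviation of log-transformed scores."""
    logs = [math.log(s) for s in scores]  # scores must be strictly positive
    return statistics.fmean(logs), statistics.stdev(logs)


# Scores chosen so that their logarithms are 0, 1 and 2.
mean, std = log_stats([1.0, math.e, math.e ** 2])
```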

plot(bins=30, good_color='skyblue', bad_color='salmon', xlabel='Score', ylabel='Relative Count', title='Score Histogram', legend_labels=('Good', 'Bad'))#

Generate a histogram of scores.

Parameters:
  • bins (int) – Number of histogram bins. Defaults to 30.

  • good_color (str) – Color for good samples. Defaults to “skyblue”.

  • bad_color (str) – Color for bad samples. Defaults to “salmon”.

  • xlabel (str) – Label for the x-axis. Defaults to “Score”.

  • ylabel (str) – Label for the y-axis. Defaults to “Relative Count”.

  • title (str) – Title of the plot. Defaults to “Score Histogram”.

  • legend_labels (tuple[str, str]) – Legend labels for good and bad samples. Defaults to (“Good”, “Bad”).

Returns:

Tuple containing both the figure and the figure title to be used for logging.

Return type:

tuple[Figure, str]

Raises:

ValueError – If no anomaly scores or labels are available.

update(*args, anomaly_scores=None, anomaly_maps=None, labels=None, **kwargs)#

Update the internal state with new scores and maps.

Parameters:
  • *args – Unused positional arguments.

  • anomaly_scores (Tensor | None) – Batch of image-level anomaly scores.

  • anomaly_maps (Tensor | None) – Batch of pixel-level anomaly maps.

  • labels (Tensor | None) – Batch of binary labels.

  • **kwargs – Unused keyword arguments.

Return type:

None

class anomalib.metrics.BinaryPrecisionRecallCurve(thresholds=None, ignore_index=None, validate_args=True, normalization='sigmoid', **kwargs)#

Bases: BinaryPrecisionRecallCurve

Binary precision-recall curve without threshold prediction normalization.

This class extends the torchmetrics BinaryPrecisionRecallCurve class but removes the sigmoid normalization step applied to prediction thresholds.

Example

>>> import torch
>>> from anomalib.metrics import BinaryPrecisionRecallCurve
>>> metric = BinaryPrecisionRecallCurve()
>>> preds = torch.tensor([0.1, 0.4, 0.35, 0.8])
>>> target = torch.tensor([0, 0, 1, 1])
>>> metric.update(preds, target)
>>> precision, recall, thresholds = metric.compute()
update(preds, target)#

Update metric state with new predictions and targets.

Unlike the base class, this method accepts raw predictions without applying sigmoid normalization.

Parameters:
  • preds (Tensor) – Raw predicted scores or probabilities

  • target (Tensor) – Ground truth binary labels (0 or 1)

Return type:

None
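
To show what working on raw predictions means in practice, here is a small dependency-free sketch that builds precision and recall values directly from unnormalized scores, using every distinct score as a threshold. The function name raw_precision_recall is hypothetical, and the sketch omits details of the torchmetrics implementation such as the final (precision=1, recall=0) point.

```python
def raw_precision_recall(preds, target):
    """Precision and recall at each distinct score, with no sigmoid applied."""
    n_pos = sum(target)
    if n_pos == 0:
        raise ValueError("need at least one positive target")
    thresholds = sorted(set(preds))
    precision, recall = [], []
    for t in thresholds:
        tp = sum(1 for p, y in zip(preds, target) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(preds, target) if p >= t and y == 0)
        precision.append(tp / (tp + fp))  # tp + fp >= 1 because t is one of the scores
        recall.append(tp / n_pos)
    return precision, recall, thresholds


# Scores outside [0, 1] are used as they are.
precision, recall, thresholds = raw_precision_recall(
    [-2.0, 0.5, 3.0, 7.0], [0, 1, 0, 1]
)
```

The returned thresholds stay on the original score scale, which is the point of skipping the normalization step.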

class anomalib.metrics.Evaluator(val_metrics=None, test_metrics=None, compute_on_cpu=True)#

Bases: Module, Callback

Evaluator module for LightningModule.

The Evaluator is a PyTorch module that computes and logs metrics during validation and test steps. Each AnomalibModule should have an Evaluator as a submodule. An Evaluator can be passed to the AnomalibModule as a parameter during initialization. When no Evaluator is provided, the AnomalibModule uses a default Evaluator that logs a default set of metrics.

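Parameters:
  • val_metrics – Metrics to compute during validation. Defaults to None.

  • test_metrics – Metrics to compute during testing. Defaults to None.

  • compute_on_cpu (bool) – Whether to move the metrics to the CPU for computation. Defaults to True.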

Examples

>>> from anomalib.metrics import AUROC, Evaluator, F1Score
>>> from anomalib.data import ImageBatch
>>> import torch
>>>
>>> f1_score = F1Score(fields=["pred_label", "gt_label"])
>>> auroc = AUROC(fields=["pred_score", "gt_label"])
>>>
>>> evaluator = Evaluator(test_metrics=[f1_score])
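
The update-then-compute flow that the callbacks below follow can be sketched with toy classes. ToyEvaluator and MeanScore are hypothetical and do not depend on Lightning; resetting the metrics after each epoch is an assumption made for the sketch, and logging is replaced by returning a dictionary.

```python
class MeanScore:
    """Toy metric: mean of the 'pred_score' field across batches."""

    name = "mean_score"

    def __init__(self):
        self.reset()

    def reset(self):
        self.total, self.count = 0.0, 0

    def update(self, batch):
        self.total += sum(batch["pred_score"])
        self.count += len(batch["pred_score"])

    def compute(self):
        return self.total / self.count


class ToyEvaluator:
    """Hypothetical stand-in showing the batch-end and epoch-end hooks."""

    def __init__(self, val_metrics=None, test_metrics=None):
        self.val_metrics = list(val_metrics or [])
        self.test_metrics = list(test_metrics or [])

    def on_test_batch_end(self, batch):
        for metric in self.test_metrics:
            metric.update(batch)  # accumulate state batch by batch

    def on_test_epoch_end(self):
        results = {m.name: m.compute() for m in self.test_metrics}
        for metric in self.test_metrics:
            metric.reset()  # start the next epoch from a clean state
        return results


evaluator = ToyEvaluator(test_metrics=[MeanScore()])
for batch in [{"pred_score": [0.0, 1.0]}, {"pred_score": [0.5, 0.5]}]:
    evaluator.on_test_batch_end(batch)
results = evaluator.on_test_epoch_end()
```
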
metrics_to_cpu(metrics)#

Set the compute_on_cpu attribute of the metrics to True.

Return type:

None

on_test_batch_end(trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0)#

Update test metrics with the batch output.

Return type:

None

on_test_epoch_end(trainer, pl_module)#

Compute and log test metrics.

Return type:

None

on_validation_batch_end(trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0)#

Update validation metrics with the batch output.

Return type:

None

on_validation_epoch_end(trainer, pl_module)#

Compute and log validation metrics.

Return type:

None

setup(trainer, pl_module, stage)#

Move metrics to cpu if num_devices == 1 and compute_on_cpu is set to True.

Return type:

None

static validate_metrics(metrics)#

Validate metrics.

Return type:

Sequence[AnomalibMetric]

class anomalib.metrics.F1AdaptiveThreshold(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _F1AdaptiveThreshold

Wrapper to add AnomalibMetric functionality to F1AdaptiveThreshold metric.

class anomalib.metrics.F1Max(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _F1Max

Wrapper to add AnomalibMetric functionality to F1Max metric.

This class wraps the internal _F1Max metric to make it compatible with Anomalib’s batch processing capabilities.

Example

>>> from anomalib.metrics import F1Max
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Create metric with batch fields
>>> f1_max = F1Max(fields=["pred_score", "gt_label"])
>>> # Create sample batch
>>> batch = ImageBatch(
...     image=torch.rand(4, 3, 32, 32),
...     pred_score=torch.tensor([0.1, 0.4, 0.35, 0.8]),
...     gt_label=torch.tensor([0, 0, 1, 1])
... )
>>> # Update and compute
>>> f1_max.update(batch)
>>> f1_max.compute()
tensor(0.8000)
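
The value above can be checked by hand with a small threshold sweep. The sketch below is a plain-Python illustration, not the _F1Max implementation: it tries every distinct score as a threshold, treats scores greater than or equal to the threshold as anomalous (an assumption), and keeps the best F1. The helper names f1_at and f1_max_sweep are hypothetical.

```python
def f1_at(scores, labels, threshold):
    """F1 score when samples with score >= threshold are predicted anomalous."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_max_sweep(scores, labels):
    """Best F1 over all candidate thresholds, and the threshold that achieves it."""
    return max((f1_at(scores, labels, t), t) for t in set(scores))


best_f1, best_threshold = f1_max_sweep([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The same sweep also yields the threshold at which the maximum is reached, which is the idea behind choosing a threshold by maximizing F1.
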
class anomalib.metrics.F1Score(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, BinaryF1Score

Wrapper to add AnomalibMetric functionality to F1Score metric.

This class wraps the torchmetrics BinaryF1Score to make it compatible with Anomalib’s batch processing capabilities.

Example

>>> from anomalib.metrics import F1Score
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Create metric
>>> f1 = F1Score(fields=["pred_score", "gt_label"])
>>> # Create sample batch
>>> batch = ImageBatch(
...     image=torch.rand(4, 3, 32, 32),
...     pred_score=torch.tensor([0, 0, 1, 1]),
...     gt_label=torch.tensor([0, 1, 1, 1])
... )
>>> # Update and compute
>>> f1.update(batch)
>>> f1.compute()
tensor(0.8000)
class anomalib.metrics.ManualThreshold(default_value=0.5, **kwargs)#

Bases: Threshold

Initialize Manual Threshold.

Parameters:
  • default_value (float) – Default threshold value. Defaults to 0.5.

  • kwargs – Any keyword arguments.

Examples

>>> from anomalib.metrics import ManualThreshold
>>> import torch
...
>>> manual_threshold = ManualThreshold(default_value=0.5)
...
>>> labels = torch.randint(low=0, high=2, size=(5,))
>>> preds = torch.rand(5)
...
>>> threshold = manual_threshold(preds, labels)
>>> threshold
tensor(0.5000, dtype=torch.float64)

As the threshold is manually set, the threshold value is the same as the default_value.

>>> labels2 = torch.randint(low=0, high=2, size=(5,))
>>> preds2 = torch.rand(5)
>>> threshold = manual_threshold(preds2, labels2)
>>> threshold
tensor(0.5000, dtype=torch.float64)

The threshold value remains the same even if the inputs change.

compute()#

Compute the threshold.

In case of manual thresholding, the threshold is already set and does not need to be computed.

Returns:

The manually specified threshold value.

Return type:

Tensor

static update(*args, **kwargs)#

Do nothing.

Parameters:
  • *args – Any positional arguments.

  • **kwargs – Any keyword arguments.

Return type:

None

class anomalib.metrics.MinMax(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _MinMax

Wrapper to add AnomalibMetric functionality to MinMax metric.
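
As an illustration of min-max scaling with statistics accumulated over several batches, consider the sketch below. The class RunningMinMax is hypothetical and shows only the plain (x - min) / (max - min) mapping; how Anomalib applies the tracked values during normalization is not covered here.

```python
class RunningMinMax:
    """Tracks the global minimum and maximum seen across batches."""

    def __init__(self):
        self.min = float("inf")
        self.max = float("-inf")

    def update(self, values):
        self.min = min(self.min, min(values))
        self.max = max(self.max, max(values))

    def normalize(self, values):
        span = self.max - self.min
        if span == 0:
            return [0.0 for _ in values]  # all observed scores were identical
        return [(v - self.min) / span for v in values]


tracker = RunningMinMax()
tracker.update([2.0, 4.0])
tracker.update([6.0, 10.0])
scaled = tracker.normalize([2.0, 6.0, 10.0])
```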

class anomalib.metrics.PBn(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _PBn

Wrapper to add AnomalibMetric functionality to PBn metric.

This class wraps the internal _PBn metric to make it compatible with Anomalib’s batch processing capabilities.

default_fields: Sequence[str] = ('pred_score', 'gt_label')#
class anomalib.metrics.PGn(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _PGn

Wrapper to add AnomalibMetric functionality to PGn metric.

This class wraps the internal _PGn metric to make it compatible with Anomalib’s batch processing capabilities.

default_fields: Sequence[str] = ('pred_score', 'gt_label')#
class anomalib.metrics.PIMO(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _PIMO

Wrapper adding AnomalibMetric functionality to PIMO metric.

default_fields: Sequence[str] = ('anomaly_map', 'gt_mask')#
class anomalib.metrics.PRO(fields=None, prefix='', strict=True, **kwargs)#

Bases: AnomalibMetric, _PRO

Wrapper to add AnomalibMetric functionality to PRO metric.

This class inherits from both AnomalibMetric and _PRO to combine Anomalib’s metric functionality with the PRO score computation.
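
The idea behind a per-region overlap score can be sketched in plain Python: find each connected anomalous region in the ground-truth mask, measure what fraction of its pixels the prediction covers, and average over regions. The helper names and the use of 4-connectivity are assumptions for this illustration; the sketch does not reproduce the _PRO implementation.

```python
from collections import deque


def connected_regions(mask):
    """List of regions, each a list of (row, col) pixels, using 4-connectivity."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def pro_score(pred_mask, gt_mask):
    """Average, over ground-truth regions, of the fraction of pixels detected."""
    regions = connected_regions(gt_mask)
    if not regions:
        raise ValueError("ground-truth mask contains no anomalous region")
    overlaps = [sum(pred_mask[y][x] for y, x in reg) / len(reg) for reg in regions]
    return sum(overlaps) / len(overlaps)


gt = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
pred = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
score = pro_score(pred, gt)
```

Here the large region is fully covered and the small one is half covered, so both contribute equally to the average regardless of their size.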

anomalib.metrics.create_anomalib_metric(metric_cls)#

Create an Anomalib version of a torchmetrics metric.

Factory function that creates a new class inheriting from both AnomalibMetric and the input metric class. The resulting class has batch processing capabilities while maintaining the original metric’s functionality.

Parameters:

metric_cls (type) – torchmetrics metric class to wrap.

Returns:

New class inheriting from AnomalibMetric and input class.

Return type:

type

Raises:

AssertionError – If input class is not a torchmetrics.Metric subclass.

Example

Create F1 score metric:

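>>> from torchmetrics.classification import BinaryF1Score
>>> from anomalib.metrics import create_anomalib_metric
>>> F1Score = create_anomalib_metric(BinaryF1Score)
>>> f1_score = F1Score(fields=["pred_label", "gt_label"])
>>> # `batch` is assumed to be an existing Batch with pred_label and gt_label fields
>>> f1_score.update(batch)  # Can update with batch directly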
>>> f1_score.compute()
tensor(0.6667)
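
A factory of this kind can be sketched with Python's built-in type(), which creates a class from a name, a tuple of bases, and a namespace. The names below (BatchMixin, Accuracy, make_batch_metric) are hypothetical and the batch is a plain dictionary; the sketch shows the general technique and is not the body of create_anomalib_metric.

```python
class BatchMixin:
    """Hypothetical mixin that feeds named batch fields to the wrapped metric."""

    def __init__(self, fields, **kwargs):
        super().__init__(**kwargs)
        self.fields = list(fields)

    def update(self, batch):
        super().update(*(batch[f] for f in self.fields))


class Accuracy:
    """Toy metric with the usual update/compute interface."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, target):
        for p, t in zip(preds, target):
            self.correct += int(p == t)
            self.total += 1

    def compute(self):
        return self.correct / self.total


def make_batch_metric(metric_cls):
    """Create a new class inheriting from BatchMixin and metric_cls."""
    assert isinstance(metric_cls, type) and hasattr(metric_cls, "update")
    return type(metric_cls.__name__, (BatchMixin, metric_cls), {})


BatchAccuracy = make_batch_metric(Accuracy)
metric = BatchAccuracy(fields=["pred_label", "gt_label"])
metric.update({"pred_label": [1, 0, 1], "gt_label": [1, 1, 1]})
value = metric.compute()
```

The generated class keeps the original metric's name and behaviour; only the update signature changes, from individual values to a whole batch.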