Metrics
Anomalib provides a comprehensive set of metrics for evaluating anomaly detection model performance. All metrics extend TorchMetrics’ functionality with Anomalib-specific features.
Available Metrics
Area Under Curve Metrics
- AUROC: Area Under the Receiver Operating Characteristic curve. Measures the model’s ability to distinguish between normal and anomalous samples.
- AUPR: Area Under the Precision-Recall curve. Particularly useful for imbalanced datasets.
- AUPRO: Area Under the Per-Region Overlap curve. Evaluates pixel-level anomaly localization performance.
- AUPIMO: Area Under the Per-Image Missed Overlap curve. Advanced metric for evaluating localization quality.
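The AUROC entry above has a simple probabilistic reading: it is the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one. The following pure-Python sketch computes it that way, using the pairwise (Mann-Whitney) formulation with tied pairs counted as one half. The function name `auroc_pairwise` is hypothetical, and this is not Anomalib's torchmetrics-based implementation, which computes the area under the ROC curve directly.

```python
def auroc_pairwise(scores, labels):
    """Probability that a random anomalous sample (label 1) outscores a
    random normal sample (label 0); tied pairs count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Perfect separation: every anomalous score is above every normal score.
print(auroc_pairwise([0.1, 0.9, 0.2, 0.8], [0, 1, 0, 1]))  # 1.0
# One of the four (anomalous, normal) pairs is misordered (0.35 < 0.4).
print(auroc_pairwise([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The quadratic loop keeps the definition visible; for large datasets a rank-based computation or a library routine is the practical choice.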
F1 Score Metrics
- F1Score: Standard F1 score for binary classification, defined as the harmonic mean of precision and recall.
- F1Max: Maximum F1 score across all possible thresholds. Useful for finding optimal operating points.
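A minimal pure-Python sketch of the two F1 variants listed above. It assumes a sample is predicted anomalous when its score is at or above the threshold, and it treats every observed score as a candidate threshold for the maximum; both are illustrative choices, and the function names are hypothetical.

```python
def f1_at_threshold(scores, labels, threshold):
    """Harmonic mean of precision and recall after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # precision or recall is zero, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_max(scores, labels):
    """Best F1 over all candidate thresholds (here: every distinct score)."""
    return max(f1_at_threshold(scores, labels, t) for t in set(scores))


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(round(f1_at_threshold(scores, labels, 0.5), 4))  # 0.6667
print(round(f1_max(scores, labels), 4))  # 0.8 (reached at threshold 0.35)
```

With the same scores and labels as the F1Max example in the API reference, the maximum is 0.8, consistent with the `tensor(0.8000)` shown there.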
Threshold Metrics
- F1AdaptiveThreshold: Automatically determines the optimal threshold by maximizing the F1 score.
- ManualThreshold: Uses a manually specified threshold for classification.
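The difference between the two threshold metrics can be sketched in a few lines of Python: the adaptive variant searches the candidate thresholds for the one with the highest F1, while the manual variant returns its configured value regardless of the data. The names `f1`, `adaptive_threshold` and `FixedThreshold` are hypothetical; treating scores at or above the threshold as anomalous and breaking ties by taking the lowest threshold are assumptions, not Anomalib's documented rules.

```python
def f1(scores, labels, threshold):
    """F1 = 2TP / (2TP + FP + FN) after thresholding (score >= threshold)."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        tp += int(pred and y == 1)
        fp += int(pred and y == 0)
        fn += int(not pred and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def adaptive_threshold(scores, labels):
    """Candidate threshold (distinct score) with the highest F1; lowest wins ties."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1(scores, labels, t))


class FixedThreshold:
    """Manual thresholding: the inputs are ignored and the stored value is returned."""

    def __init__(self, default_value=0.5):
        self.value = default_value

    def __call__(self, scores, labels):
        return self.value


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(adaptive_threshold(scores, labels))  # 0.35
print(FixedThreshold(0.5)(scores, labels))  # 0.5
```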
Other Metrics
- PRO: Per-Region Overlap score for evaluating pixel-level localization.
- PIMO: Per-Image Missed Overlap for assessing localization errors.
- PGn: Presorted Good with n% bad samples missed. Measures the false negative rate at specific operating points.
- PBn: Presorted Bad with n% good samples misclassified. Measures the false positive rate at specific operating points.
- MinMax: Normalizes anomaly scores to the [0, 1] range using min-max scaling.
- AnomalyScoreDistribution: Analyzes and tracks the distribution of anomaly scores for model diagnostics.
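Min-max scaling maps a score s to (s - min) / (max - min), where min and max are collected from reference data. A minimal pure-Python sketch follows; the class name `RunningMinMax` is hypothetical, and both clamping unseen scores into [0, 1] and returning 0.0 when all reference scores are equal are assumptions made here, not documented Anomalib behavior.

```python
class RunningMinMax:
    """Track the min and max of scores across batches, then rescale to [0, 1]."""

    def __init__(self):
        self.min = float("inf")
        self.max = float("-inf")

    def update(self, scores):
        self.min = min(self.min, min(scores))
        self.max = max(self.max, max(scores))

    def normalize(self, scores):
        span = self.max - self.min
        if span == 0:
            return [0.0 for _ in scores]  # degenerate case: all reference scores equal
        # Clamp so scores outside the observed range still land in [0, 1].
        return [min(max((s - self.min) / span, 0.0), 1.0) for s in scores]


stats = RunningMinMax()
stats.update([2.0, 5.0])   # first batch
stats.update([3.0, 10.0])  # second batch widens the range to [2.0, 10.0]
print(stats.normalize([2.0, 6.0, 10.0, 12.0]))  # [0.0, 0.5, 1.0, 1.0]
```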
Utility Classes
- AnomalibMetric: Base class for all Anomalib metrics. Extends TorchMetrics with field-based updates.
- Evaluator: Orchestrates multiple metrics for comprehensive model evaluation.
- BinaryPrecisionRecallCurve: Computes precision-recall curves for binary classification tasks.
API Reference
Custom metrics for evaluating anomaly detection models.
This module provides various metrics for evaluating anomaly detection performance:
- Area Under Curve (AUC) metrics:
  - AUROC: Area Under Receiver Operating Characteristic curve
  - AUPR: Area Under Precision-Recall curve
  - AUPRO: Area Under Per-Region Overlap curve
  - AUPIMO: Area Under Per-Image Missed Overlap curve
- F1-score metrics:
  - F1Score: Standard F1 score
  - F1Max: Maximum F1 score across thresholds
- Threshold metrics:
  - F1AdaptiveThreshold: Finds optimal threshold by maximizing F1 score
  - ManualThreshold: Uses manually specified threshold
- Other metrics:
  - AnomalibMetric: Base class for custom metrics
  - AnomalyScoreDistribution: Analyzes score distributions
  - BinaryPrecisionRecallCurve: Computes precision-recall curves
  - Evaluator: Combines multiple metrics for evaluation
  - MinMax: Normalizes scores to [0,1] range
  - PBn: Presorted bad with n% good samples misclassified
  - PGn: Presorted good with n% bad samples missed
  - PRO: Per-Region Overlap score
  - PIMO: Per-Image Missed Overlap score
Example
>>> from anomalib.metrics import AUROC, F1Score
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Initialize metrics with required fields
>>> auroc = AUROC(fields=["pred_score", "gt_label"])
>>> f1 = F1Score(fields=["pred_score", "gt_label"], threshold=0.5)
>>> # Create sample batch
>>> batch = ImageBatch(
... image=torch.rand(4, 3, 32, 32),
... pred_score=torch.tensor([0.1, 0.9, 0.2, 0.8]),
... gt_label=torch.tensor([0, 1, 0, 1])
... )
>>> # Calculate metrics
>>> auroc(batch)
tensor(1.)
>>> f1(batch)
tensor(1.)
- class anomalib.metrics.AUPIMO(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _AUPIMO
Wrapper adding AnomalibMetric functionality to AUPIMO metric.
- class anomalib.metrics.AUPR(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _AUPR
Wrapper to add AnomalibMetric functionality to AUPR metric.
- class anomalib.metrics.AUPRO(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _AUPRO
Wrapper to add AnomalibMetric functionality to AUPRO metric.
- class anomalib.metrics.AUROC(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _AUROC
Wrapper to add AnomalibMetric functionality to AUROC metric.
This class wraps the internal _AUROC metric to make it compatible with Anomalib’s batch processing capabilities.
Example
>>> from anomalib.metrics import AUROC
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Create sample batch
>>> batch = ImageBatch(
... image=torch.rand(4, 3, 32, 32),
... pred_score=torch.tensor([0.1, 0.2, 0.8, 0.9]),
... gt_label=torch.tensor([0, 0, 1, 1])
... )
>>> # Initialize and compute AUROC
>>> auroc = AUROC(fields=["pred_score", "gt_label"])
>>> auroc(batch)
tensor(1.)
- class anomalib.metrics.AnomalibMetric(fields=None, prefix='', strict=True, **kwargs)
Bases: object
Base class for metrics in Anomalib.
Makes any torchmetrics metric compatible with the Anomalib framework by adding batch processing capabilities. Subclasses must inherit from both this class and a torchmetrics metric.
The class enables updating metrics with Batch objects instead of individual tensors. It extracts the specified fields from the batch and passes them to the underlying metric’s update method.
- Parameters:
fields (Sequence[str] | None) – Names of fields to extract from batch. If None, uses the class’s default_fields. Required if no defaults.
prefix (str) – Prefix added to metric name. Defaults to “”.
strict (bool) – Whether to raise an error if batch is missing fields. Defaults to True.
**kwargs – Additional arguments passed to parent metric class.
- Raises:
ValueError – If no fields are specified and class has no defaults.
Example
Create image and pixel-level F1 metrics:
>>> from torchmetrics.classification import BinaryF1Score
>>> from anomalib.metrics import AnomalibMetric
>>> class F1Score(AnomalibMetric, BinaryF1Score):
...     pass
...
>>> # Image-level metric using pred_label and gt_label
>>> image_f1 = F1Score(
... fields=["pred_label", "gt_label"],
... prefix="image_"
... )
>>> # Pixel-level metric using pred_mask and gt_mask
>>> pixel_f1 = F1Score(
... fields=["pred_mask", "gt_mask"],
... prefix="pixel_"
... )
- compute()
Compute the metric value.
If the metric has not been updated, and metric is not in strict mode, return None.
- Returns:
Computed metric value or None.
- Return type:
- update(batch, *args, **kwargs)
Update metric with values from batch fields.
- Parameters:
batch (Batch) – Batch object containing required fields.
*args – Additional positional arguments passed to parent update.
**kwargs – Additional keyword arguments passed to parent update.
- Raises:
ValueError – If batch is missing any required fields.
- Return type:
None
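The field-based update described above amounts to a mixin that reads named attributes from a batch and forwards them positionally to the wrapped metric's `update`. The sketch below shows that pattern with plain Python stand-ins instead of torchmetrics and Anomalib classes. `ToyBatch`, `ToyAccuracy`, `FieldMetricMixin` and `FieldAccuracy` are hypothetical, and silently skipping a batch with missing fields when `strict=False` is an assumption about non-strict mode.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyBatch:
    """Hypothetical stand-in for an Anomalib batch with optional fields."""
    pred_score: Optional[list] = None
    gt_label: Optional[list] = None


class ToyAccuracy:
    """Hypothetical tensor-style metric: update(preds, target), then compute()."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.correct = 0
        self.total = 0

    def update(self, preds, target):
        for p, t in zip(preds, target):
            self.correct += int((p >= self.threshold) == bool(t))
            self.total += 1

    def compute(self):
        return self.correct / self.total


class FieldMetricMixin:
    """Extract `fields` from a batch and pass them to the parent metric's update."""

    default_fields: Sequence[str] = ()

    def __init__(self, fields=None, prefix="", strict=True, **kwargs):
        fields = fields or self.default_fields
        if not fields:
            raise ValueError("No fields given and no default_fields defined")
        self.fields = list(fields)
        self.prefix = prefix
        self.strict = strict
        super().__init__(**kwargs)  # remaining kwargs go to the wrapped metric

    def update(self, batch, *args, **kwargs):
        values = [getattr(batch, name, None) for name in self.fields]
        missing = [n for n, v in zip(self.fields, values) if v is None]
        if missing:
            if self.strict:
                raise ValueError(f"Batch is missing required fields: {missing}")
            return  # non-strict (assumed behavior): skip this batch
        super().update(*values, *args, **kwargs)


class FieldAccuracy(FieldMetricMixin, ToyAccuracy):
    pass


metric = FieldAccuracy(fields=["pred_score", "gt_label"], threshold=0.5)
metric.update(ToyBatch(pred_score=[0.1, 0.9, 0.7, 0.8], gt_label=[0, 1, 0, 1]))
print(metric.compute())  # 0.75
```

Listing the mixin first in the bases is what lets its `update(batch)` intercept the call before delegating to the metric's own `update` through `super()`.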
- class anomalib.metrics.AnomalyScoreDistribution(**kwargs)
Bases: Metric
Compute distribution statistics of anomaly scores.
This class tracks and computes the mean and standard deviation of anomaly scores. Statistics are computed for both image-level scores and pixel-level anomaly maps.
The metric maintains internal state to accumulate scores, anomaly maps, and labels across batches before computing final statistics.
Example
>>> import torch
>>> from anomalib.metrics import AnomalyScoreDistribution
>>> dist = AnomalyScoreDistribution()
>>> # Update with batch of scores
>>> scores = torch.tensor([0.1, 0.2, 0.3])
>>> dist.update(anomaly_scores=scores)
>>> # Compute statistics
>>> img_mean, img_std, pix_mean, pix_std = dist.compute()
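Accumulating raw values and computing statistics once at the end, as described above, avoids the error of averaging per-batch standard deviations. A minimal pure-Python sketch of that accumulate-then-compute pattern for image-level scores only; the class name `ScoreDistribution` is hypothetical, it uses the sample standard deviation of the raw scores, and it is not claimed to reproduce Anomalib's exact statistics.

```python
import statistics


class ScoreDistribution:
    """Accumulate anomaly scores across batches; compute mean/std at the end."""

    def __init__(self):
        self.scores = []

    def update(self, anomaly_scores):
        self.scores.extend(anomaly_scores)

    def compute(self):
        if len(self.scores) < 2:
            raise ValueError("Need at least two scores for a standard deviation")
        return statistics.mean(self.scores), statistics.stdev(self.scores)


dist = ScoreDistribution()
dist.update([1.0, 2.0, 3.0])  # first batch
dist.update([4.0, 5.0])       # second batch (different size)
mean, std = dist.compute()
print(mean, round(std, 4))  # 3.0 1.5811

# Averaging the per-batch standard deviations gives a different (wrong) answer:
per_batch_std = (statistics.stdev([1.0, 2.0, 3.0]) + statistics.stdev([4.0, 5.0])) / 2
print(round(per_batch_std, 4))  # 0.8536
```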
- compute()
Compute distribution statistics from accumulated scores and maps.
- plot(bins=30, good_color='skyblue', bad_color='salmon', xlabel='Score', ylabel='Relative Count', title='Score Histogram', legend_labels=('Good', 'Bad'))
Generate a histogram of scores.
- Parameters:
bins (int) – Number of histogram bins. Defaults to 30.
good_color (str) – Color for good samples. Defaults to “skyblue”.
bad_color (str) – Color for bad samples. Defaults to “salmon”.
xlabel (str) – Label for the x-axis. Defaults to “Score”.
ylabel (str) – Label for the y-axis. Defaults to “Relative Count”.
title (str) – Title of the plot. Defaults to “Score Histogram”.
legend_labels (tuple[str, str]) – Legend labels for good and bad samples. Defaults to (“Good”, “Bad”).
- Returns:
Tuple containing both the figure and the figure title to be used for logging.
- Return type:
- Raises:
ValueError – If no anomaly scores or labels are available.
- update(*args, anomaly_scores=None, anomaly_maps=None, labels=None, **kwargs)
Update the internal state with new scores and maps.
- class anomalib.metrics.BinaryPrecisionRecallCurve(thresholds=None, ignore_index=None, validate_args=True, normalization='sigmoid', **kwargs)
Bases: torchmetrics.classification.BinaryPrecisionRecallCurve
Binary precision-recall curve without sigmoid normalization of the predictions.
This class extends the torchmetrics BinaryPrecisionRecallCurve class but removes the sigmoid normalization step that the base class applies to predictions, so the returned thresholds stay on the scale of the raw scores.
Example
>>> import torch
>>> from anomalib.metrics import BinaryPrecisionRecallCurve
>>> metric = BinaryPrecisionRecallCurve()
>>> preds = torch.tensor([0.1, 0.4, 0.35, 0.8])
>>> target = torch.tensor([0, 0, 1, 1])
>>> metric.update(preds, target)
>>> precision, recall, thresholds = metric.compute()
- update(preds, target)
Update metric state with new predictions and targets.
Unlike the base class, this method accepts raw predictions without applying sigmoid normalization.
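Skipping the sigmoid matters because anomaly scores are often unbounded: if scores fall outside [0, 1], the base torchmetrics class may treat them as logits and squash them, and the returned thresholds would then no longer be on the scale of the raw scores. The pure-Python sketch below computes precision and recall with each distinct raw score used directly as a threshold (score >= threshold counts as anomalous). The function name is hypothetical, and the output layout of the actual class (ordering, extra end points) may differ.

```python
def precision_recall_at_raw_thresholds(scores, labels):
    """Precision and recall at every distinct raw score, used as-is as a threshold."""
    thresholds = sorted(set(scores))
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("Recall is undefined without anomalous samples")
    precision, recall = [], []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        # tp + fp >= 1 because the sample whose score equals t is predicted positive.
        precision.append(tp / (tp + fp))
        recall.append(tp / n_pos)
    return precision, recall, thresholds


# Unnormalized scores well outside [0, 1] stay on their original scale.
scores = [-2.0, 0.5, 3.0, 7.5]
labels = [0, 0, 1, 1]
precision, recall, thresholds = precision_recall_at_raw_thresholds(scores, labels)
print(thresholds)  # [-2.0, 0.5, 3.0, 7.5]
print(precision)   # [0.5, 0.6666666666666666, 1.0, 1.0]
print(recall)      # [1.0, 1.0, 1.0, 0.5]
```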
- class anomalib.metrics.Evaluator(val_metrics=None, test_metrics=None, compute_on_cpu=True)
Evaluator module for LightningModule.
The Evaluator module is a PyTorch module that computes and logs metrics during validation and test steps. Each AnomalibModule should have an Evaluator module as a submodule to compute and log metrics during validation and test steps. An Evaluator module can be passed to the AnomalibModule as a parameter during initialization. When no Evaluator module is provided, the AnomalibModule uses a default Evaluator module that logs a default set of metrics.
- Parameters:
val_metrics (AnomalibMetric | Sequence[AnomalibMetric] | None) – Validation metrics. Defaults to [].
test_metrics (AnomalibMetric | Sequence[AnomalibMetric] | None) – Test metrics. Defaults to [].
compute_on_cpu (bool) – Whether to compute metrics on CPU. Defaults to True.
Examples
>>> from anomalib.metrics import AUROC, Evaluator, F1Score
>>> f1_score = F1Score(fields=["pred_label", "gt_label"])
>>> auroc = AUROC(fields=["pred_score", "gt_label"])
>>> evaluator = Evaluator(test_metrics=[f1_score, auroc])
- on_test_batch_end(trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0)
Update test metrics with the batch output.
- Return type:
None
- on_validation_batch_end(trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0)
Update validation metrics with the batch output.
- Return type:
None
- setup(trainer, pl_module, stage)
Move metrics to CPU if num_devices == 1 and compute_on_cpu is set to True.
- Return type:
None
- static validate_metrics(metrics)
Validate metrics.
- Return type:
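The split described above (separate validation and test metric collections, each updated once per batch and computed at the end) can be sketched without Lightning. `ToyEvaluator` and `MeanOfField` are hypothetical, the hook signatures are simplified to take only the batch, and the real Evaluator receives Lightning's trainer, module and outputs arguments and also handles logging and device placement.

```python
class MeanOfField:
    """Hypothetical metric: mean of one numeric batch field, named with a prefix."""

    def __init__(self, field, prefix=""):
        self.field = field
        self.name = f"{prefix}mean_{field}"
        self.values = []

    def update(self, batch):
        self.values.extend(batch[self.field])

    def compute(self):
        return sum(self.values) / len(self.values)


class ToyEvaluator:
    """Keep validation and test metrics apart and update them per batch."""

    def __init__(self, val_metrics=None, test_metrics=None):
        self.val_metrics = list(val_metrics or [])
        self.test_metrics = list(test_metrics or [])

    def on_validation_batch_end(self, batch):
        for metric in self.val_metrics:
            metric.update(batch)

    def on_test_batch_end(self, batch):
        for metric in self.test_metrics:
            metric.update(batch)

    def compute_test(self):
        return {metric.name: metric.compute() for metric in self.test_metrics}


evaluator = ToyEvaluator(
    val_metrics=[MeanOfField("pred_score", prefix="val_")],
    test_metrics=[MeanOfField("pred_score", prefix="test_")],
)
evaluator.on_test_batch_end({"pred_score": [0.25, 0.75]})
evaluator.on_test_batch_end({"pred_score": [0.5]})
print(evaluator.compute_test())  # {'test_mean_pred_score': 0.5}
```

Test batches only touch the test collection; the validation metric above is never updated.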
- class anomalib.metrics.F1AdaptiveThreshold(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _F1AdaptiveThreshold
Wrapper to add AnomalibMetric functionality to F1AdaptiveThreshold metric.
- class anomalib.metrics.F1Max(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _F1Max
Wrapper to add AnomalibMetric functionality to F1Max metric.
This class wraps the internal _F1Max metric to make it compatible with Anomalib’s batch processing capabilities.
Example
>>> from anomalib.metrics import F1Max
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Create metric with batch fields
>>> f1_max = F1Max(fields=["pred_score", "gt_label"])
>>> # Create sample batch
>>> batch = ImageBatch(
... image=torch.rand(4, 3, 32, 32),
... pred_score=torch.tensor([0.1, 0.4, 0.35, 0.8]),
... gt_label=torch.tensor([0, 0, 1, 1])
... )
>>> # Update and compute
>>> f1_max.update(batch)
>>> f1_max.compute()
tensor(0.8000)
- class anomalib.metrics.F1Score(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, BinaryF1Score
Wrapper to add AnomalibMetric functionality to F1Score metric.
This class wraps the torchmetrics BinaryF1Score to make it compatible with Anomalib’s batch processing capabilities.
Example
>>> from anomalib.metrics import F1Score
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Create metric
>>> f1 = F1Score(fields=["pred_score", "gt_label"])
>>> # Create sample batch
>>> batch = ImageBatch(
... image=torch.rand(4, 3, 32, 32),
... pred_score=torch.tensor([0, 0, 1, 1]),
... gt_label=torch.tensor([0, 1, 1, 1])
... )
>>> # Update and compute
>>> f1.update(batch)
>>> f1.compute()
tensor(0.8000)
- class anomalib.metrics.ManualThreshold(default_value=0.5, **kwargs)
Bases: Threshold
Initialize Manual Threshold.
- Parameters:
default_value (float) – Default threshold value. Defaults to 0.5.
kwargs – Any keyword arguments.
Examples
>>> from anomalib.metrics import ManualThreshold
>>> import torch
>>> manual_threshold = ManualThreshold(default_value=0.5)
>>> labels = torch.randint(low=0, high=2, size=(5,))
>>> preds = torch.rand(5)
>>> threshold = manual_threshold(preds, labels)
>>> threshold
tensor(0.5000, dtype=torch.float64)
As the threshold is manually set, the threshold value is the same as the default_value.
>>> labels = torch.randint(low=0, high=2, size=(5,))
>>> preds = torch.rand(5)
>>> threshold = manual_threshold(preds, labels)
>>> threshold
tensor(0.5000, dtype=torch.float64)
The threshold value remains the same even if the inputs change.
- compute()
Compute the threshold.
In case of manual thresholding, the threshold is already set and does not need to be computed.
- Returns:
Value of the optimal threshold.
- Return type:
- class anomalib.metrics.MinMax(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _MinMax
Wrapper to add AnomalibMetric functionality to MinMax metric.
- class anomalib.metrics.PBn(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _PBn
Wrapper to add AnomalibMetric functionality to PBn metric.
This class wraps the internal _PBn metric to make it compatible with Anomalib’s batch processing capabilities.
- class anomalib.metrics.PGn(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _PGn
Wrapper to add AnomalibMetric functionality to PGn metric.
This class wraps the internal _PGn metric to make it compatible with Anomalib’s batch processing capabilities.
- class anomalib.metrics.PIMO(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _PIMO
Wrapper adding AnomalibMetric functionality to PIMO metric.
- class anomalib.metrics.PRO(fields=None, prefix='', strict=True, **kwargs)
Bases: AnomalibMetric, _PRO
Wrapper to add AnomalibMetric functionality to PRO metric.
This class inherits from both AnomalibMetric and _PRO to combine Anomalib’s metric functionality with the PRO score computation.
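The PRO idea can be sketched as follows: split the ground-truth mask into connected regions, measure what fraction of each region the (already binarized) prediction covers, and average those fractions so that small defects weigh as much as large ones. This pure-Python version works on nested lists, assumes 4-connectivity and a single threshold, and uses hypothetical function names; Anomalib's implementation works on tensors and may differ in details such as connectivity.

```python
def connected_regions(mask):
    """Return the 4-connected foreground regions of a 2D 0/1 mask as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            stack, region = [(i, j)], []
            while stack:  # iterative flood fill
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(region)
    return regions


def pro_score(pred_mask, gt_mask):
    """Mean over ground-truth regions of the fraction of each region that is predicted."""
    regions = connected_regions(gt_mask)
    if not regions:
        raise ValueError("PRO is undefined when the ground truth has no anomalous region")
    overlaps = [sum(1 for y, x in r if pred_mask[y][x]) / len(r) for r in regions]
    return sum(overlaps) / len(overlaps)


gt_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
pred_mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Large region: 2 of 4 pixels found (0.5); small region: missed (0.0).
print(pro_score(pred_mask, gt_mask))  # 0.25
```

Plain pixel-level recall on the same masks would be 2 / 5 = 0.4; PRO is lower because the completely missed one-pixel defect counts as a whole region.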
- anomalib.metrics.create_anomalib_metric(metric_cls)
Create an Anomalib version of a torchmetrics metric.
Factory function that creates a new class inheriting from both AnomalibMetric and the input metric class. The resulting class has batch processing capabilities while maintaining the original metric’s functionality.
- Parameters:
metric_cls (type) – torchmetrics metric class to wrap.
- Returns:
New class inheriting from AnomalibMetric and input class.
- Return type:
- Raises:
AssertionError – If input class is not a torchmetrics.Metric subclass.
Example
Create F1 score metric:
>>> import torch
>>> from torchmetrics.classification import BinaryF1Score
>>> from anomalib.data import ImageBatch
>>> from anomalib.metrics import create_anomalib_metric
>>> F1Score = create_anomalib_metric(BinaryF1Score)
>>> f1_score = F1Score(fields=["pred_label", "gt_label"])
>>> batch = ImageBatch(image=torch.rand(4, 3, 32, 32), pred_label=torch.tensor([0, 0, 0, 1]), gt_label=torch.tensor([0, 0, 1, 1]))
>>> f1_score.update(batch)  # Can update with batch directly
>>> f1_score.compute()
tensor(0.6667)
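The factory described above can be reproduced with Python's built-in three-argument `type(name, bases, namespace)`, which builds a class whose method resolution order puts the field-extracting mixin before the wrapped metric. The sketch uses plain stand-ins: `BaseMetric`, `FieldMixin`, `Sum` and `make_field_metric` are hypothetical, batches are plain dicts, the sketch raises `TypeError` rather than relying on an assertion, and it makes no claim about how `create_anomalib_metric` is written internally.

```python
class BaseMetric:
    """Hypothetical stand-in for torchmetrics.Metric."""


class FieldMixin:
    """Hypothetical stand-in for AnomalibMetric: pulls named fields out of a batch."""

    def __init__(self, fields, **kwargs):
        self.fields = list(fields)
        super().__init__(**kwargs)

    def update(self, batch):
        super().update(*(batch[name] for name in self.fields))


class Sum(BaseMetric):
    """A plain metric whose update takes positional values."""

    def __init__(self):
        self.total = 0

    def update(self, values):
        self.total += sum(values)

    def compute(self):
        return self.total


def make_field_metric(metric_cls):
    """Return a new class inheriting from FieldMixin and metric_cls (in that order)."""
    if not issubclass(metric_cls, BaseMetric):
        raise TypeError(f"{metric_cls.__name__} is not a BaseMetric subclass")
    return type(f"Field{metric_cls.__name__}", (FieldMixin, metric_cls), {})


FieldSum = make_field_metric(Sum)
metric = FieldSum(fields=["pred_score"])
metric.update({"pred_score": [1, 2, 3]})
print(metric.compute())  # 6
print([cls.__name__ for cls in FieldSum.__mro__])
# ['FieldSum', 'FieldMixin', 'Sum', 'BaseMetric', 'object']
```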