Metrics#
Custom metrics for evaluating anomaly detection models.
This module provides various metrics for evaluating anomaly detection performance:
- Area Under Curve (AUC) metrics:
  - AUROC: Area Under Receiver Operating Characteristic curve
  - AUPR: Area Under Precision-Recall curve
  - AUPRO: Area Under Per-Region Overlap curve
  - AUPIMO: Area Under Per-Image Missed Overlap curve
- F1-score metrics:
  - F1Score: Standard F1 score
  - F1Max: Maximum F1 score across thresholds
- Threshold metrics:
  - F1AdaptiveThreshold: Finds optimal threshold by maximizing F1 score
  - ManualThreshold: Uses manually specified threshold
- Other metrics:
  - AnomalibMetric: Base class for custom metrics
  - AnomalyScoreDistribution: Analyzes score distributions
  - BinaryPrecisionRecallCurve: Computes precision-recall curves
  - Evaluator: Combines multiple metrics for evaluation
  - MinMax: Normalizes scores to [0,1] range
  - PRO: Per-Region Overlap score
  - PIMO: Per-Image Missed Overlap score
Example
>>> import torch
>>> from anomalib.metrics import AUROC, F1Score
>>> auroc = AUROC()
>>> f1 = F1Score()
>>> labels = torch.tensor([0, 1, 0, 1])
>>> scores = torch.tensor([0.1, 0.9, 0.2, 0.8])
>>> auroc(scores, labels)
tensor(1.)
>>> f1(scores, labels, threshold=0.5)
tensor(1.)
- class anomalib.metrics.AUPIMO(fields=None, prefix='', **kwargs)#
Bases: AnomalibMetric, _AUPIMO
Wrapper adding AnomalibMetric functionality to AUPIMO metric.
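A construction sketch (the field names are an assumption, mirroring the pixel-level fields used elsewhere in this module); batch-based updates then follow the same pattern as the other AnomalibMetric wrappers:
>>> from anomalib.metrics import AUPIMO
>>> # AUPIMO builds per-image overlap curves from anomaly maps and ground-truth
>>> # masks, so compute() is typically only meaningful once both normal and
>>> # anomalous images have been accumulated across batches.
>>> aupimo = AUPIMO(fields=["anomaly_map", "gt_mask"], prefix="pixel_")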
- class anomalib.metrics.AUPR(fields=None, prefix='', **kwargs)#
Bases: AnomalibMetric, _AUPR
Wrapper to add AnomalibMetric functionality to AUPR metric.
- class anomalib.metrics.AUPRO(fields=None, prefix='', **kwargs)#
Bases: AnomalibMetric, _AUPRO
Wrapper to add AnomalibMetric functionality to AUPRO metric.
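A usage sketch (the field names, tensor shapes, and values are illustrative assumptions, mirroring the segmentation fields used elsewhere in this module):
>>> import torch
>>> from anomalib.data import ImageBatch
>>> from anomalib.metrics import AUPRO
>>> aupro = AUPRO(fields=["anomaly_map", "gt_mask"], prefix="pixel_")  # field names assumed
>>> gt_mask = torch.zeros(2, 32, 32, dtype=torch.bool)
>>> gt_mask[:, 8:16, 8:16] = True  # one anomalous region per image
>>> batch = ImageBatch(
...     image=torch.rand(2, 3, 32, 32),
...     anomaly_map=torch.rand(2, 32, 32),
...     gt_mask=gt_mask,
... )
>>> aupro.update(batch)
>>> score = aupro.compute()  # area under the per-region overlap curve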
- class anomalib.metrics.AUROC(fields=None, prefix='', **kwargs)#
Bases: AnomalibMetric, _AUROC
Wrapper to add AnomalibMetric functionality to AUROC metric.
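A batch-based usage sketch, assuming the image-level field names pred_score and gt_label that appear in the Evaluator and F1Max examples below:
>>> import torch
>>> from anomalib.data import ImageBatch
>>> from anomalib.metrics import AUROC
>>> auroc = AUROC(fields=["pred_score", "gt_label"], prefix="image_")
>>> batch = ImageBatch(
...     image=torch.rand(4, 3, 32, 32),
...     pred_score=torch.tensor([0.1, 0.9, 0.2, 0.8]),
...     gt_label=torch.tensor([0, 1, 0, 1]),
... )
>>> auroc.update(batch)
>>> auroc.compute()  # scores perfectly separate normal from anomalous samples
tensor(1.)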
- class anomalib.metrics.AnomalibMetric(fields=None, prefix='', **kwargs)#
Bases: object
Base class for metrics in Anomalib.
Makes any torchmetrics metric compatible with the Anomalib framework by adding batch processing capabilities. Subclasses must inherit from both this class and a torchmetrics metric.
The class enables updating metrics with Batch objects instead of individual tensors. It extracts the specified fields from the batch and passes them to the underlying metric’s update method.
- Parameters:
fields (Sequence[str] | None) – Names of the fields in the Batch object to extract and pass to the underlying metric’s update method. Defaults to the class-level defaults if not specified.
prefix (str) – Prefix added to the metric name. Defaults to ''.
**kwargs – Additional keyword arguments passed to the underlying metric.
- Raises:
ValueError – If no fields are specified and class has no defaults.
Example
Create image and pixel-level F1 metrics:
>>> from torchmetrics.classification import BinaryF1Score
>>> class F1Score(AnomalibMetric, BinaryF1Score):
...     pass
...
>>> # Image-level metric using pred_label and gt_label
>>> image_f1 = F1Score(
...     fields=["pred_label", "gt_label"],
...     prefix="image_"
... )
>>> # Pixel-level metric using pred_mask and gt_mask
>>> pixel_f1 = F1Score(
...     fields=["pred_mask", "gt_mask"],
...     prefix="pixel_"
... )
- update(batch, *args, **kwargs)#
Update metric with values from batch fields.
- Parameters:
batch (Batch) – Batch object containing required fields.
*args – Additional positional arguments passed to parent update.
**kwargs – Additional keyword arguments passed to parent update.
- Raises:
ValueError – If batch is missing any required fields.
- Return type:
None
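A short continuation of the class example above (a sketch; the ImageBatch construction mirrors the other examples in this module), showing how update extracts the configured fields from a batch:
>>> import torch
>>> from anomalib.data import ImageBatch
>>> batch = ImageBatch(
...     image=torch.rand(4, 3, 32, 32),
...     pred_label=torch.tensor([0, 0, 1, 1]),
...     gt_label=torch.tensor([0, 1, 1, 1]),
... )
>>> # Equivalent to calling BinaryF1Score.update(pred_label, gt_label)
>>> image_f1.update(batch)
>>> image_f1.compute()
tensor(0.8000)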
- class anomalib.metrics.AnomalyScoreDistribution(**kwargs)#
Bases: Metric
Compute distribution statistics of anomaly scores.
This class tracks and computes the mean and standard deviation of anomaly scores from the normal samples in the training set. Statistics are computed for both image-level scores and pixel-level anomaly maps.
The metric maintains internal state to accumulate scores and maps across batches before computing final statistics.
Example
>>> import torch
>>> dist = AnomalyScoreDistribution()
>>> # Update with batch of scores
>>> scores = torch.tensor([0.1, 0.2, 0.3])
>>> dist.update(anomaly_scores=scores)
>>> # Compute statistics
>>> img_mean, img_std, pix_mean, pix_std = dist.compute()
- compute()#
Compute distribution statistics from accumulated scores and maps.
- Returns:
Tuple containing:
image_mean: Mean of log-transformed image anomaly scores
image_std: Standard deviation of log-transformed image scores
pixel_mean: Mean of log-transformed pixel anomaly maps
pixel_std: Standard deviation of log-transformed pixel maps
- Return type:
tuple
- update(*args, anomaly_scores=None, anomaly_maps=None, **kwargs)#
Update the internal state with new scores and maps.
- class anomalib.metrics.BinaryPrecisionRecallCurve(thresholds=None, ignore_index=None, validate_args=True, **kwargs)#
Bases: BinaryPrecisionRecallCurve
Binary precision-recall curve without threshold prediction normalization.
This class extends the torchmetrics BinaryPrecisionRecallCurve class but removes the sigmoid normalization step applied to the predictions.
Example
>>> import torch
>>> from anomalib.metrics import BinaryPrecisionRecallCurve
>>> metric = BinaryPrecisionRecallCurve()
>>> preds = torch.tensor([0.1, 0.4, 0.35, 0.8])
>>> target = torch.tensor([0, 0, 1, 1])
>>> metric.update(preds, target)
>>> precision, recall, thresholds = metric.compute()
- update(preds, target)#
Update metric state with new predictions and targets.
Unlike the base class, this method accepts raw predictions without applying sigmoid normalization.
- Parameters:
preds (Tensor) – Raw predicted scores or probabilities
target (Tensor) – Ground truth binary labels (0 or 1)
- Return type:
None
- class anomalib.metrics.Evaluator(val_metrics=None, test_metrics=None, compute_on_cpu=True)#
Bases: Module, Callback
Evaluator module for LightningModule.
The Evaluator is a PyTorch module that computes and logs metrics during validation and test steps. Each AnomalibModule should have an Evaluator as a submodule for this purpose, and an Evaluator can be passed to the AnomalibModule as a parameter during initialization. When no Evaluator is provided, the AnomalibModule uses a default Evaluator that logs a default set of metrics.
- Parameters:
val_metrics (Sequence[AnomalibMetric], optional) – Validation metrics. Defaults to [].
test_metrics (Sequence[AnomalibMetric], optional) – Test metrics. Defaults to [].
compute_on_cpu (bool, optional) – Whether to compute metrics on CPU. Defaults to True.
Examples
>>> from anomalib.metrics import F1Score, AUROC
>>> from anomalib.data import ImageBatch
>>> import torch
>>>
>>> f1_score = F1Score(fields=["pred_label", "gt_label"])
>>> auroc = AUROC(fields=["pred_score", "gt_label"])
>>>
>>> evaluator = Evaluator(test_metrics=[f1_score])
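As described above, the evaluator can then be attached to a model at initialization. A sketch, not part of the original example: the Padim model and the evaluator parameter name are assumptions.
>>> from anomalib.models import Padim  # model choice is illustrative
>>> model = Padim(evaluator=evaluator)  # ``evaluator`` parameter name assumed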
- on_test_batch_end(trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0)#
Update test metrics with the batch output.
- Return type:
None
- on_validation_batch_end(trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0)#
Update validation metrics with the batch output.
- Return type:
None
- setup(trainer, pl_module, stage)#
Move metrics to CPU if num_devices == 1 and compute_on_cpu is set to True.
- Return type:
None
- static validate_metrics(metrics)#
Validate metrics.
- Return type:
None
- class anomalib.metrics.F1AdaptiveThreshold(default_value=0.5, **kwargs)#
Bases: BinaryPrecisionRecallCurve, Threshold
Adaptive threshold that maximizes F1 score.
This class computes and stores the optimal threshold for converting anomaly scores to binary predictions by maximizing the F1 score on validation data.
- Parameters:
default_value (float) – Initial threshold value used before computation. Defaults to 0.5.
**kwargs – Additional arguments passed to parent classes.
- value#
Current threshold value.
- Type:
Example
>>> from anomalib.metrics import F1AdaptiveThreshold
>>> import torch
>>> # Create validation data
>>> labels = torch.tensor([0, 0, 1, 1])  # 2 normal, 2 anomalous
>>> scores = torch.tensor([0.1, 0.2, 0.8, 0.9])  # Anomaly scores
>>> # Initialize threshold
>>> threshold = F1AdaptiveThreshold()
>>> # Compute optimal threshold
>>> optimal_value = threshold(scores, labels)
>>> print(f"Optimal threshold: {optimal_value:.4f}")
Optimal threshold: 0.5000
- compute()#
Compute optimal threshold by maximizing F1 score.
Calculates precision-recall curve and corresponding thresholds, then finds the threshold that maximizes the F1 score.
- Returns:
Optimal threshold value.
- Return type:
Warning
If validation set contains no anomalous samples, the threshold will default to the maximum anomaly score, which may lead to poor performance.
- class anomalib.metrics.F1Max(fields=None, prefix='', **kwargs)#
Bases: AnomalibMetric, _F1Max
Wrapper to add AnomalibMetric functionality to F1Max metric.
This class wraps the internal _F1Max metric to make it compatible with Anomalib’s batch processing capabilities.
Example
>>> from anomalib.metrics import F1Max
>>> from anomalib.data import ImageBatch
>>> import torch
>>> # Create metric with batch fields
>>> f1_max = F1Max(fields=["pred_score", "gt_label"])
>>> # Create sample batch
>>> batch = ImageBatch(
...     image=torch.rand(4, 3, 32, 32),
...     pred_score=torch.tensor([0.1, 0.4, 0.35, 0.8]),
...     gt_label=torch.tensor([0, 0, 1, 1])
... )
>>> # Update and compute
>>> f1_max.update(batch)
>>> f1_max.compute()
tensor(1.0)
- class anomalib.metrics.F1Score(fields=None, prefix='', **kwargs)#
Bases: AnomalibMetric, BinaryF1Score
Wrapper to add AnomalibMetric functionality to F1Score metric.
This class wraps the torchmetrics BinaryF1Score to make it compatible with Anomalib’s batch processing capabilities.
Example
>>> from anomalib.metrics import F1Score
>>> import torch
>>> # Create metric
>>> f1 = F1Score()
>>> # Create sample data
>>> preds = torch.tensor([0, 0, 1, 1])
>>> target = torch.tensor([0, 1, 1, 1])
>>> # Update and compute
>>> f1.update(preds, target)
>>> f1.compute()
tensor(0.8571)
- class anomalib.metrics.ManualThreshold(default_value=0.5, **kwargs)#
Bases: Threshold
Initialize Manual Threshold.
- Parameters:
default_value (float, optional) – Default threshold value. Defaults to 0.5.
kwargs – Any keyword arguments.
Examples
>>> from anomalib.metrics import ManualThreshold
>>> import torch
>>> manual_threshold = ManualThreshold(default_value=0.5)
>>> labels = torch.randint(low=0, high=2, size=(5,))
>>> preds = torch.rand(5)
>>> threshold = manual_threshold(preds, labels)
>>> threshold
tensor(0.5000, dtype=torch.float64)
As the threshold is manually set, the threshold value is the same as the default_value.
>>> labels2 = torch.randint(low=0, high=2, size=(5,))
>>> preds2 = torch.rand(5)
>>> threshold = manual_threshold(preds2, labels2)
>>> threshold
tensor(0.5000, dtype=torch.float64)
The threshold value remains the same even if the inputs change.
- compute()#
Compute the threshold.
In case of manual thresholding, the threshold is already set and does not need to be computed.
- Returns:
Value of the optimal threshold.
- Return type:
- class anomalib.metrics.MinMax(**kwargs)#
Bases: Metric
Track minimum and maximum values across batches.
This metric maintains running minimum and maximum values across all batches it processes. It is useful for tasks like normalization or monitoring the range of values during training.
- Parameters:
full_state_update (bool, optional) – Whether to update the internal state with each new batch. Defaults to True.
kwargs – Additional keyword arguments passed to the parent class.
- min#
Running minimum value seen across all batches
- Type:
torch.Tensor
- max#
Running maximum value seen across all batches
- Type:
torch.Tensor
Example
>>> from anomalib.metrics import MinMax
>>> import torch
>>> # Create metric
>>> minmax = MinMax()
>>> # Update with batches
>>> batch1 = torch.tensor([0.1, 0.2, 0.3])
>>> batch2 = torch.tensor([0.2, 0.4, 0.5])
>>> minmax.update(batch1)
>>> minmax.update(batch2)
>>> # Get final min/max values
>>> min_val, max_val = minmax.compute()
>>> min_val, max_val
(tensor(0.1000), tensor(0.5000))
- compute()#
Compute final minimum and maximum values.
- Returns:
Tuple containing the (min, max) values tracked across all batches.
- Return type:
- update(predictions, *args, **kwargs)#
Update running min and max values with new predictions.
- Parameters:
predictions (torch.Tensor) – New tensor of values to include in min/max tracking
*args – Additional positional arguments (unused)
**kwargs – Additional keyword arguments (unused)
- Return type:
None
- class anomalib.metrics.PIMO(fields=None, prefix='', **kwargs)#
Bases: AnomalibMetric, _PIMO
Wrapper adding AnomalibMetric functionality to PIMO metric.
- class anomalib.metrics.PRO(fields=None, prefix='', **kwargs)#
Bases: AnomalibMetric, _PRO
Wrapper to add AnomalibMetric functionality to PRO metric.
This class inherits from both AnomalibMetric and _PRO to combine Anomalib’s metric functionality with the PRO score computation.
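A usage sketch (the field names are assumed to be the segmentation fields pred_mask and gt_mask; shapes and values are illustrative only):
>>> import torch
>>> from anomalib.data import ImageBatch
>>> from anomalib.metrics import PRO
>>> pro = PRO(fields=["pred_mask", "gt_mask"], prefix="pixel_")  # field names assumed
>>> gt_mask = torch.zeros(2, 32, 32, dtype=torch.bool)
>>> gt_mask[0, 8:16, 8:16] = True  # a single anomalous region in the first image
>>> batch = ImageBatch(
...     image=torch.rand(2, 3, 32, 32),
...     pred_mask=gt_mask.clone(),  # perfect prediction, for illustration only
...     gt_mask=gt_mask,
... )
>>> pro.update(batch)
>>> score = pro.compute()  # per-region overlap of predictions with ground truth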