Metrics#
Custom anomaly evaluation metrics.
- class anomalib.metrics.AUPIMO(num_thresholds=300000, fpr_bounds=(1e-05, 0.0001), return_average=True, force=False)#
Bases:
PIMO
Area Under the Per-Image Overlap (PIMO) curve.
This torchmetrics interface is a wrapper around the functional interface, which is a wrapper around the numpy code. The tensors are converted to numpy arrays, then passed to and validated by the numpy code. The results are converted back to tensors and wrapped in a dataclass object.
Scores are computed from the integration of the PIMO curves within the given FPR bounds, then normalized to [0, 1]. It can be thought of as the average TPR of the PIMO curves within the given FPR bounds.
Details: anomalib.metrics.per_image.pimo.
- Notation:
N: number of images
H: image height
W: image width
K: number of thresholds
- anomaly_maps#
floating point anomaly score maps of shape (N, H, W)
- masks#
binary (bool or int) ground truth masks of shape (N, H, W)
- Parameters:
num_thresholds (int) – number of thresholds to compute (K)
fpr_bounds (tuple[float, float]) – lower and upper bounds of the FPR integration range
force (bool) – whether to force the computation despite bad conditions
- Returns:
PIMO and AUPIMO results dataclass objects. See PIMOResult and AUPIMOResult.
- Return type:
tuple[PIMOResult, AUPIMOResult]
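A minimal usage sketch (illustrative, not from the upstream docstring): the synthetic maps and masks follow the (N, H, W) notation above, half of the images are left fully normal so the shared FPR can be measured, and force=True is passed only because the toy-sized inputs may otherwise trip the validity checks.
>>> import torch
>>> from anomalib.metrics import AUPIMO
...
>>> anomaly_maps = torch.rand(8, 128, 128)                # (N, H, W) float anomaly scores
>>> masks = torch.zeros(8, 128, 128, dtype=torch.int)     # (N, H, W) binary ground truth masks
>>> masks[4:, 32:64, 32:64] = 1                           # last 4 images contain a defect region
...
>>> aupimo = AUPIMO(num_thresholds=30_000, fpr_bounds=(1e-5, 1e-4), force=True)
>>> aupimo.update(anomaly_maps, masks)
>>> results = aupimo.compute()                            # PIMO / AUPIMO results (see Returns above)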
- compute(force=None)#
Compute the PIMO curves and their Area Under the curve (AUPIMO) scores.
Call the functional interface aupimo_scores(), which is a wrapper around the numpy code.
- Parameters:
force (bool | None) – if given (not None), override the force attribute.
- Returns:
- PIMO curves and AUPIMO scores dataclass objects.
See PIMOResult and AUPIMOResult for details.
- Return type:
tuple[PIMOResult, AUPIMOResult]
- static normalizing_factor(fpr_bounds)#
Constant that normalizes the AUPIMO integral to the [0, 1] range.
It is the maximum possible value of the integral in AUPIMO’s definition, and corresponds to assuming a constant function T_i: thresh -> 1.
- Parameters:
fpr_bounds (tuple[float, float]) – lower and upper bounds of the FPR integration range.
- Returns:
the normalization factor (>0).
- Return type:
float
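For illustration (not part of the original docstring), the factor can be queried directly with the default bounds; only its positivity, stated above, is asserted here:
>>> from anomalib.metrics import AUPIMO
>>> factor = AUPIMO.normalizing_factor((1e-5, 1e-4))
>>> factor > 0
True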
- class anomalib.metrics.AUPR(thresholds=None, ignore_index=None, validate_args=True, **kwargs)#
Bases:
BinaryPrecisionRecallCurve
Area under the PR curve.
This metric computes the area under the precision-recall curve.
- Parameters:
kwargs (Any) – Additional arguments to the TorchMetrics base class.
Examples
To compute the metric for a set of predictions and ground truth targets:
>>> true = torch.tensor([0, 1, 1, 1, 0, 0, 0, 0, 1, 1])
>>> pred = torch.tensor([0.59, 0.35, 0.72, 0.33, 0.73, 0.81, 0.30, 0.05, 0.04, 0.48])
>>> metric = AUPR()
>>> metric(pred, true)
tensor(0.4899)
It is also possible to update the metric state incrementally within batches:
>>> for batch in dataloader:
...     # Compute prediction and target tensors
...     metric.update(pred, true)
>>> metric.compute()
Once the metric has been computed, we can plot the PR curve:
>>> figure, title = metric.generate_figure()
- compute()#
First compute PR curve, then compute area under the curve.
- Return type:
Tensor
- Returns:
Value of the AUPR metric
- generate_figure()#
Generate a figure containing the PR curve as well as the random baseline and the AUC.
- Returns:
Tuple containing both the PR curve and the figure title to be used for logging
- Return type:
tuple[Figure, str]
- update(preds, target)#
Update state with new values.
Need to flatten new values as PrecisionRecallCurve expects them in this format for binary classification.
- Parameters:
preds (torch.Tensor) – predictions of the model
target (torch.Tensor) – ground truth targets
- Return type:
None
- class anomalib.metrics.AUPRO(dist_sync_on_step=False, process_group=None, dist_sync_fn=None, fpr_limit=0.3, num_thresholds=None)#
Bases:
Metric
Area under per region overlap (AUPRO) Metric.
- Parameters:
dist_sync_on_step (bool) – Synchronize metric state across processes at each forward() before returning the value at the step. Default: False
process_group (Optional[Any]) – Specify the process group on which synchronization is called. Default: None (which selects the entire world)
dist_sync_fn (Optional[Callable]) – Callback that performs the allgather operation on the metric state. When None, DDP will be used to perform the allgather. Default: None
fpr_limit (float) – Limit for the false positive rate. Defaults to 0.3.
num_thresholds (int) – Number of thresholds to use for computing the ROC curve. Defaults to None. If None, the ROC curve is computed with the thresholds returned by torchmetrics.functional.classification.thresholds.
Examples
>>> import torch
>>> from anomalib.metrics import AUPRO
...
>>> labels = torch.randint(low=0, high=2, size=(1, 10, 5), dtype=torch.float32)
>>> preds = torch.rand_like(labels)
...
>>> aupro = AUPRO(fpr_limit=0.3)
>>> aupro(preds, labels)
tensor(0.4321)
Increasing the fpr_limit will increase the AUPRO value:
>>> aupro = AUPRO(fpr_limit=0.7)
>>> aupro(preds, labels)
tensor(0.5271)
- compute()#
First compute the PRO curve, then compute and scale the area under the curve.
- Returns:
Value of the AUPRO metric
- Return type:
Tensor
- compute_pro(cca, target, preds)#
Compute the PRO/FPR value pairs up to the FPR specified by self.fpr_limit.
It leverages the fact that the overlap corresponds to the TPR, and thus computes the overall PRO curve by aggregating the per-region TPR/FPR values produced during ROC construction.
- Returns:
tuple containing final fpr and tpr values.
- Return type:
tuple[torch.Tensor, torch.Tensor]
- generate_figure()#
Generate a figure containing the PRO curve and the AUPRO.
- Returns:
Tuple containing both the figure and the figure title to be used for logging
- Return type:
tuple[Figure, str]
- static interp1d(old_x, old_y, new_x)#
Interpolate a 1D signal linearly to new sampling points.
- Parameters:
old_x (torch.Tensor) – original 1-D x values (same size as y)
old_y (torch.Tensor) – original 1-D y values (same size as x)
new_x (torch.Tensor) – x-values where y should be interpolated at
- Returns:
y-values at corresponding new_x values.
- Return type:
Tensor
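A small illustrative call (not from the original docs); with y = 2x, linear interpolation at the new sampling points gives:
>>> import torch
>>> from anomalib.metrics import AUPRO
>>> old_x = torch.tensor([0.0, 1.0, 2.0])
>>> old_y = torch.tensor([0.0, 2.0, 4.0])
>>> new_x = torch.tensor([0.5, 1.5])
>>> AUPRO.interp1d(old_x, old_y, new_x)
tensor([1., 3.])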
- perform_cca()#
Perform the Connected Component Analysis on the self.target tensor.
- Raises:
ValueError – ValueError is raised if self.target doesn’t conform with requirements imposed by kornia for connected component analysis.
- Returns:
Components labeled from 0 to N.
- Return type:
Tensor
- update(preds, target)#
Update state with new values.
- Parameters:
preds (torch.Tensor) – predictions of the model
target (torch.Tensor) – ground truth targets
- Return type:
None
- class anomalib.metrics.AUROC(thresholds=None, ignore_index=None, validate_args=True, **kwargs)#
Bases:
BinaryROC
Area under the ROC curve.
Examples
>>> import torch
>>> from anomalib.metrics import AUROC
...
>>> preds = torch.tensor([0.13, 0.26, 0.08, 0.92, 0.03])
>>> target = torch.tensor([0, 0, 1, 1, 0])
...
>>> auroc = AUROC()
>>> auroc(preds, target)
tensor(0.6667)
It is possible to update the metric state incrementally:
>>> auroc.update(preds[:2], target[:2])
>>> auroc.update(preds[2:], target[2:])
>>> auroc.compute()
tensor(0.6667)
To plot the ROC curve, use the generate_figure method:
>>> fig, title = auroc.generate_figure()
- compute()#
First compute ROC curve, then compute area under the curve.
- Returns:
Value of the AUROC metric
- Return type:
Tensor
- generate_figure()#
Generate a figure containing the ROC curve, the baseline and the AUROC.
- Returns:
Tuple containing both the figure and the figure title to be used for logging
- Return type:
tuple[Figure, str]
- update(preds, target)#
Update state with new values.
Need to flatten new values as ROC expects them in this format for binary classification.
- Parameters:
preds (torch.Tensor) – predictions of the model
target (torch.Tensor) – ground truth targets
- Return type:
None
- class anomalib.metrics.AnomalyScoreDistribution(**kwargs)#
Bases:
Metric
Mean and standard deviation of the anomaly scores of normal training data.
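A brief illustrative sketch (not from the original docstring); the ordering and meaning of the four returned tensors is an assumption based on the class description (image-level and pixel-level statistics):
>>> import torch
>>> from anomalib.metrics import AnomalyScoreDistribution
...
>>> dist = AnomalyScoreDistribution()
>>> dist.update(anomaly_scores=torch.rand(16), anomaly_maps=torch.rand(16, 32, 32))
>>> stats = dist.compute()   # tuple of four tensors (assumed: image mean/std, pixel mean/std)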
- compute()#
Compute stats.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor]
- update(*args, anomaly_scores=None, anomaly_maps=None, **kwargs)#
Update the internal state with new anomaly scores and/or anomaly maps.
- Return type:
None
- class anomalib.metrics.BinaryPrecisionRecallCurve(thresholds=None, ignore_index=None, validate_args=True, **kwargs)#
Bases:
BinaryPrecisionRecallCurve
Binary precision-recall curve without threshold prediction normalization.
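A brief illustrative sketch (not from the original docstring), passing raw, unnormalized anomaly scores directly to update():
>>> import torch
>>> from anomalib.metrics import BinaryPrecisionRecallCurve
...
>>> curve = BinaryPrecisionRecallCurve()
>>> scores = torch.tensor([2.3, 1.6, 2.6, 7.9, 3.3])   # raw scores, not probabilities
>>> labels = torch.tensor([0, 0, 0, 1, 1])
>>> curve.update(scores, labels)
>>> precision, recall, thresholds = curve.compute()    # triple inherited from the torchmetrics base class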
- update(preds, target)#
Update metric state with new predictions and targets.
Unlike the base class, this accepts raw predictions and targets.
- Parameters:
preds (Tensor) – Predicted probabilities
target (Tensor) – Ground truth labels
- Return type:
None
- class anomalib.metrics.F1AdaptiveThreshold(default_value=0.5, **kwargs)#
Bases:
BinaryPrecisionRecallCurve
,Threshold
Anomaly Score Threshold.
This class computes/stores the threshold that determines the anomalous label given anomaly scores. It initially computes the adaptive threshold to find the optimal f1_score and stores the computed adaptive threshold value.
- Parameters:
default_value (float) – Default value of the threshold. Defaults to 0.5.
Examples
To find the best threshold that maximizes the F1 score, we could run the following:
>>> from anomalib.metrics import F1AdaptiveThreshold
>>> import torch
...
>>> labels = torch.tensor([0, 0, 0, 1, 1])
>>> preds = torch.tensor([2.3, 1.6, 2.6, 7.9, 3.3])
...
>>> adaptive_threshold = F1AdaptiveThreshold(default_value=0.5)
>>> threshold = adaptive_threshold(preds, labels)
>>> threshold
tensor(3.3000)
- compute()#
Compute the threshold that yields the optimal F1 score.
Compute the F1 scores while varying the threshold, store the threshold that yields the maximum F1 score as an attribute, and return that threshold.
- Return type:
Tensor
- Returns:
Value of the threshold that maximizes the F1 score.
- class anomalib.metrics.F1Max(**kwargs)#
Bases:
Metric
F1Max Metric for Computing the Maximum F1 Score.
This class is designed to calculate the maximum F1 score from the precision-recall curve for binary classification tasks. The F1 score is a harmonic mean of precision and recall, offering a balance between these two metrics. The maximum F1 score (F1-Max) is particularly useful in scenarios where an optimal balance between precision and recall is desired, such as in imbalanced datasets or when both false positives and false negatives carry significant costs.
After computing the F1Max score, the class also identifies and stores the threshold that yields this maximum F1 score, providing insight into the optimal point for the classification decision.
- Parameters:
**kwargs – Variable keyword arguments that can be passed to the parent class.
- full_state_update#
Indicates whether the metric requires updating the entire state. Set to False for this metric as it calculates the F1 score based on the current state without needing historical data.
- Type:
bool
- precision_recall_curve#
Utility to compute precision and recall values across different thresholds.
- threshold#
Stores the threshold value that results in the maximum F1 score.
- Type:
torch.Tensor
Examples
>>> from anomalib.metrics import F1Max
>>> import torch
>>> preds = torch.tensor([0.1, 0.4, 0.35, 0.8])
>>> target = torch.tensor([0, 0, 1, 1])
>>> f1_max = F1Max()
>>> f1_max.update(preds, target)
>>> optimal_f1_score = f1_max.compute()
>>> print(f"Optimal F1 Score: {optimal_f1_score}")
>>> print(f"Optimal Threshold: {f1_max.threshold}")
Note
Use update method to input predictions and target labels.
Use compute method to calculate the maximum F1 score after all updates.
Use reset method to clear the current state and prepare for a new set of calculations.
- compute()#
Compute the value of the optimal F1 score.
Compute the F1 scores while varying the threshold. Store the optimal threshold as an attribute and return the maximum value of the F1 score.
- Return type:
Tensor
- Returns:
Value of the F1 score at the optimal threshold.
- reset()#
Reset the metric.
- Return type:
None
- update(preds, target, *args, **kwargs)#
Update the precision-recall curve metric.
- Return type:
None
- class anomalib.metrics.F1Score(threshold=0.5, multidim_average='global', ignore_index=None, validate_args=True, **kwargs)#
Bases:
BinaryF1Score
This is a wrapper around torchmetrics’ BinaryF1Score.
The idea behind this is to retain the current configuration; otherwise, the torchmetrics version requires task as a parameter.
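An illustrative usage sketch (not from the original docstring); it behaves like torchmetrics’ BinaryF1Score with a fixed binarization threshold:
>>> import torch
>>> from anomalib.metrics import F1Score
...
>>> f1 = F1Score(threshold=0.5)
>>> preds = torch.tensor([0.2, 0.8, 0.6, 0.4])
>>> target = torch.tensor([0, 1, 1, 0])
>>> f1(preds, target)
tensor(1.)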
- class anomalib.metrics.ManualThreshold(default_value=0.5, **kwargs)#
Bases:
Threshold
Initialize Manual Threshold.
- Parameters:
default_value (float, optional) – Default threshold value. Defaults to 0.5.
kwargs – Any keyword arguments.
Examples
>>> from anomalib.metrics import ManualThreshold
>>> import torch
...
>>> manual_threshold = ManualThreshold(default_value=0.5)
...
>>> labels = torch.randint(low=0, high=2, size=(5,))
>>> preds = torch.rand(5)
...
>>> threshold = manual_threshold(preds, labels)
>>> threshold
tensor(0.5000, dtype=torch.float64)
As the threshold is manually set, the threshold value is the same as the default_value.
>>> labels = torch.randint(low=0, high=2, size=(5,))
>>> preds = torch.rand(5)
>>> threshold = manual_threshold(preds, labels)
>>> threshold
tensor(0.5000, dtype=torch.float64)
The threshold value remains the same even if the inputs change.
- compute()#
Compute the threshold.
In case of manual thresholding, the threshold is already set and does not need to be computed.
- Returns:
Value of the manually set threshold.
- Return type:
torch.Tensor
- static update(*args, **kwargs)#
Do nothing.
- Parameters:
*args – Any positional arguments.
**kwargs – Any keyword arguments.
- Return type:
None
- class anomalib.metrics.MinMax(**kwargs)#
Bases:
Metric
Track the min and max values of the observations in each batch.
- Parameters:
full_state_update (bool, optional) – Whether to update the state with the new values. Defaults to True.
kwargs – Any keyword arguments.
Examples
>>> from anomalib.metrics import MinMax
>>> import torch
...
>>> predictions = torch.tensor([0.0807, 0.6329, 0.0559, 0.9860, 0.3595])
>>> minmax = MinMax()
>>> minmax(predictions)
(tensor(0.0559), tensor(0.9860))
It is possible to update the minmax values with a new tensor of predictions.
>>> new_predictions = torch.tensor([0.3251, 0.3169, 0.3072, 0.6247, 0.9999])
>>> minmax.update(new_predictions)
>>> minmax.compute()
(tensor(0.0559), tensor(0.9999))
- compute()#
Return min and max values.
- Return type:
tuple[Tensor, Tensor]
- update(predictions, *args, **kwargs)#
Update the min and max values.
- Return type:
None
- class anomalib.metrics.PIMO(num_thresholds)#
Bases:
Metric
Per-IMage Overlap (PIMO, pronounced pee-mo) curves.
This torchmetrics interface is a wrapper around the functional interface, which is a wrapper around the numpy code. The tensors are converted to numpy arrays, then passed to and validated by the numpy code. The results are converted back to tensors and wrapped in a dataclass object.
PIMO is a curve of True Positive Rate (TPR) values for each image across multiple anomaly score thresholds. The anomaly score thresholds are indexed by a shared (cross-image) False Positive Rate (FPR) measured on the normal images.
Details: anomalib.metrics.per_image.pimo.
- Notation:
N: number of images
H: image height
W: image width
K: number of thresholds
- anomaly_maps#
floating point anomaly score maps of shape (N, H, W)
- masks#
binary (bool or int) ground truth masks of shape (N, H, W)
- Parameters:
num_thresholds (int) – number of thresholds to compute (K)
binclf_algorithm – algorithm to compute the binary classifier curve (see binclf_curve_numpy.Algorithm)
- Returns:
PIMO curves dataclass object. See PIMOResult for details.
- Return type:
PIMOResult
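A minimal usage sketch (illustrative, not from the upstream docstring); half of the synthetic images are left fully normal because the shared FPR is measured on the normal images:
>>> import torch
>>> from anomalib.metrics import PIMO
...
>>> anomaly_maps = torch.rand(8, 64, 64)                # (N, H, W) float anomaly scores
>>> masks = torch.zeros(8, 64, 64, dtype=torch.int)     # (N, H, W) binary ground truth masks
>>> masks[4:, 16:32, 16:32] = 1                         # last 4 images contain a defect region
...
>>> pimo = PIMO(num_thresholds=1000)
>>> pimo.update(anomaly_maps, masks)
>>> result = pimo.compute()                             # PIMOResult dataclass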
- compute()#
Compute the PIMO curves.
Call the functional interface pimo_curves(), which is a wrapper around the numpy code.
- Returns:
PIMO curves dataclass object. See PIMOResult for details.
- Return type:
PIMOResult
- property image_classes: Tensor#
Image classes (0: normal, 1: anomalous).
- property num_images: int#
Number of images.
- update(anomaly_maps, masks)#
Update lists of anomaly maps and masks.
- Parameters:
anomaly_maps (torch.Tensor) – predictions of the model (ndim == 2, float)
masks (torch.Tensor) – ground truth masks (ndim == 2, binary)
- Return type:
None
- class anomalib.metrics.PRO(threshold=0.5, **kwargs)#
Bases:
Metric
Per-Region Overlap (PRO) Score.
This metric computes the macro average of the per-region overlap between the predicted anomaly masks and the ground truth masks.
- Parameters:
threshold (float) – Threshold used to binarize the predictions. Defaults to
0.5
.kwargs – Additional arguments to the TorchMetrics base class.
Example
Import the metric from the package:
>>> import torch
>>> from anomalib.metrics import PRO
Create random preds and labels tensors:
>>> labels = torch.randint(low=0, high=2, size=(1, 10, 5), dtype=torch.float32)
>>> preds = torch.rand_like(labels)
Compute the PRO score for labels and preds:
>>> pro = PRO(threshold=0.5)
>>> pro.update(preds, labels)
>>> pro.compute()
tensor(0.5433)
Note
Note that the example above shows random predictions and labels. Therefore, the PRO score above may not be reproducible.
- compute()#
Compute the macro average of the PRO score across all regions in all batches.
- Return type:
Tensor
Example
To compute the metric based on the state accumulated from multiple batches, use the compute method:
>>> pro.compute()
tensor(0.5433)
- update(predictions, targets)#
Compute the PRO score for the current batch.
- Parameters:
predictions (torch.Tensor) – Predicted anomaly masks (Bx1xHxW)
targets (torch.Tensor) – Ground truth anomaly masks (Bx1xHxW)
- Return type:
None
Example
To update the metric state for the current batch, use the update method:
>>> pro.update(preds, labels)