L2BT

Architecture

L2BT Architecture

Learning to Be a Transformer to Pinpoint Anomalies.

This module implements the L2BT model for anomaly detection as described in Costanzino et al. (2025).

The model consists of:

  • A pre-trained Vision Transformer teacher that extracts patch embeddings

  • Two shallow student MLPs (backward_net and forward_net) that learn cross-layer mappings between the teacher's patch embeddings at the two selected layers

  • Feature distillation between teacher and student representations

  • Anomaly detection based on how well the students reproduce the teacher features
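The components listed above can be illustrated with a toy sketch in plain PyTorch. It assumes that forward_net maps middle-layer patch embeddings to last-layer ones and backward_net maps the other way, that each student is a small Linear-ReLU-Linear MLP, and that the two per-patch squared errors are averaged into one anomaly signal. The class name ToyStudents, the hidden size, and the averaging are hypothetical; random tensors stand in for the teacher's patch embeddings.

```python
import torch
from torch import nn


class ToyStudents(nn.Module):
    """Hypothetical pair of shallow student MLPs mapping between two teacher layers."""

    def __init__(self, dim: int, hidden: int = 64) -> None:
        super().__init__()
        # Assumed direction: middle-layer patch embeddings -> last-layer patch embeddings
        self.forward_net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        # Assumed direction: last-layer patch embeddings -> middle-layer patch embeddings
        self.backward_net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, middle: torch.Tensor, last: torch.Tensor) -> torch.Tensor:
        predicted_last = self.forward_net(middle)
        predicted_middle = self.backward_net(last)
        # Per-patch squared error in each direction, shape (batch, num_patches)
        err_last = (predicted_last - last).pow(2).mean(dim=-1)
        err_middle = (predicted_middle - middle).pow(2).mean(dim=-1)
        # Assumption: combine both directions by a simple average
        return 0.5 * (err_last + err_middle)


batch, grid, dim = 2, 16, 32
middle = torch.randn(batch, grid * grid, dim)  # stand-in for teacher patches at the middle layer
last = torch.randn(batch, grid * grid, dim)    # stand-in for teacher patches at the last layer
patch_scores = ToyStudents(dim)(middle, last)
anomaly_map = patch_scores.reshape(batch, grid, grid)  # one score per patch on the patch grid
print(anomaly_map.shape)  # torch.Size([2, 16, 16])
```

On normal data the trained students would yield small errors; patches where the learned mapping breaks down stand out in the map.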

Example

>>> from anomalib.models.image import L2BT
>>> from anomalib.engine import Engine
>>> from anomalib.data import MVTecAD
>>> datamodule = MVTecAD()
>>> model = L2BT(
...     layers=(7, 11),
...     topk_ratio=0.001
... )
>>> engine = Engine(model=model, datamodule=datamodule)
>>> engine.fit()
>>> predictions = engine.predict()

See also

  • L2BT: Lightning implementation of the model

  • L2BTModel: PyTorch implementation of the model architecture

class anomalib.models.image.l2bt.lightning_model.L2BT(lr=0.0001, layers=(7, 11), blur_w_l=5, blur_w_u=7, blur_pad_l=2, blur_pad_u=3, blur_repeats_l=5, blur_repeats_u=3, topk_ratio=0.001, pre_processor=True, post_processor=True, evaluator=True, visualizer=True)

Bases: AnomalibModule

Learning to Be a Transformer algorithm.

The L2BT model consists of a frozen, pre-trained Vision Transformer teacher that extracts patch embeddings and two shallow student MLPs (backward_net and forward_net) that learn cross-layer mappings between the teacher's patch embeddings at the two selected layers. Because the students are trained only on normal images, they reproduce the teacher's embeddings well in normal regions; a larger discrepancy between the student predictions and the teacher embeddings indicates an anomaly.

Parameters:
  • lr (float) – Learning rate for student network optimization. Defaults to 1e-4.

  • layers (Sequence[int]) – Indices of Vision Transformer layers used for feature extraction. Must be a sequence of exactly two indices. Defaults to (7, 11).

  • blur_w_l (int) – Lower bound for blur kernel width in augmentation. Defaults to 5.

  • blur_w_u (int) – Upper bound for blur kernel width in augmentation. Defaults to 7.

  • blur_pad_l (int) – Lower bound for blur padding in augmentation. Defaults to 2.

  • blur_pad_u (int) – Upper bound for blur padding in augmentation. Defaults to 3.

  • blur_repeats_l (int) – Number of repetitions for the lower-bound blur kernel. Defaults to 5.

  • blur_repeats_u (int) – Number of repetitions for the upper-bound blur kernel. Defaults to 3.

  • topk_ratio (float) – Fraction of highest anomaly-map values to use for image-level anomaly scoring. Defaults to 0.001.

  • pre_processor (PreProcessor | bool) – Pre-processor to transform input data before passing to model. If True, uses default. Defaults to True.

  • post_processor (PostProcessor | bool) – Post-processor to generate predictions from model outputs. If True, uses default. Defaults to True.

  • evaluator (Evaluator | bool) – Evaluator to compute metrics. If True, uses default. Defaults to True.

  • visualizer (Visualizer | bool) – Visualizer to display results. If True, uses default. Defaults to True.
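Two parameter groups above can be sketched in plain PyTorch. Here topk_ratio is read as "average the highest topk_ratio fraction of anomaly-map values per image", and one blur setting (width, padding, repeats) is read as a width x width mean filter applied repeatedly, where padding = (width - 1) // 2 keeps the spatial size (the defaults pair 5/2 and 7/3). Both readings, the choice of a box filter, and the helper names topk_image_score and repeated_box_blur are assumptions for illustration, not the library's code.

```python
import torch
import torch.nn.functional as F


def topk_image_score(anomaly_map: torch.Tensor, topk_ratio: float = 0.001) -> torch.Tensor:
    """Hypothetical helper: mean of the highest `topk_ratio` fraction of values per image."""
    flat = anomaly_map.flatten(start_dim=1)  # (batch, H*W)
    k = max(1, int(topk_ratio * flat.shape[1]))  # keep at least one value
    return flat.topk(k, dim=1).values.mean(dim=1)


def repeated_box_blur(x: torch.Tensor, width: int, pad: int, repeats: int) -> torch.Tensor:
    """Hypothetical helper: apply a width x width mean filter `repeats` times."""
    for _ in range(repeats):
        # count_include_pad=False averages only over real pixels at the borders
        x = F.avg_pool2d(x, kernel_size=width, stride=1, padding=pad, count_include_pad=False)
    return x


images = torch.rand(2, 3, 224, 224)
# One (width, padding, repeats) setting applied to a batch of images
blurred = repeated_box_blur(images, width=5, pad=2, repeats=5)

anomaly_map = torch.rand(2, 224, 224)
# For 224 x 224 maps, k = int(0.001 * 50176) = 50 values per image
scores = topk_image_score(anomaly_map, topk_ratio=0.001)
```

Averaging a small top fraction rather than taking the single maximum makes the image score less sensitive to one noisy pixel while still reacting to small defects.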

Example

>>> from anomalib.models.image import L2BT
>>> from anomalib.data import MVTecAD
>>> from anomalib.engine import Engine
>>> datamodule = MVTecAD()
>>> model = L2BT(
...     layers=(7, 11),
...     topk_ratio=0.001
... )
>>> engine = Engine(model=model, datamodule=datamodule)
>>> engine.fit()
>>> predictions = engine.predict()


configure_optimizers()

Configure the optimizer for training.

Returns:

Adam optimizer with the following parameters:
  • Learning rate: as specified in the constructor (default 1e-4)

  • Optimizes parameters of both backward_net and forward_net

Return type:

Optimizer
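The optimizer setup described above can be sketched with torch.optim.Adam over the chained parameters of both students. The two nn.Sequential modules are hypothetical stand-ins for backward_net and forward_net; the teacher's parameters are deliberately not passed, so it stays frozen.

```python
import itertools

import torch
from torch import nn

# Hypothetical stand-ins for the two student MLPs
backward_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
forward_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))

# A single Adam optimizer over the parameters of both students only
optimizer = torch.optim.Adam(
    itertools.chain(backward_net.parameters(), forward_net.parameters()),
    lr=1e-4,
)
```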

static configure_pre_processor(image_size=None)

Configure the default pre-processor for L2BT.

The original L2BT pipeline applies: SquarePad (edge replication) → Resize (bicubic interpolation) → ImageNet normalization.

Parameters:

image_size (tuple[int, int] | None) – Target image size for resizing. Defaults to None, in which case (224, 224) is used.

Returns:

Configured pre-processor with the L2BT transform pipeline.

Return type:

PreProcessor
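The three-step pipeline above can be approximated in plain PyTorch, without anomalib's PreProcessor or its SquarePad transform. This sketch assumes a float (C, H, W) image in [0, 1], splits the square padding as evenly as possible between both sides, and uses the standard ImageNet mean and standard deviation; the function names are hypothetical.

```python
import torch
import torch.nn.functional as F

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def square_pad_edge(image: torch.Tensor) -> torch.Tensor:
    """Pad a (C, H, W) image to a square by replicating its border pixels."""
    _, h, w = image.shape
    side = max(h, w)
    pad_w, pad_h = side - w, side - h
    # F.pad order for the last two dims: (left, right, top, bottom)
    padding = (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)
    # Replicate mode is applied on a 4D batch of one image, then the batch dim is removed
    return F.pad(image.unsqueeze(0), padding, mode="replicate").squeeze(0)


def l2bt_style_preprocess(image: torch.Tensor, size: tuple[int, int] = (224, 224)) -> torch.Tensor:
    """SquarePad (edge replication) -> bicubic resize -> ImageNet normalization."""
    square = square_pad_edge(image)
    resized = F.interpolate(square.unsqueeze(0), size=size, mode="bicubic", align_corners=False).squeeze(0)
    return (resized - IMAGENET_MEAN) / IMAGENET_STD


image = torch.rand(3, 100, 150)
padded = square_pad_edge(image)
out = l2bt_style_preprocess(image)
print(out.shape)  # torch.Size([3, 224, 224])
```

Padding to a square before resizing preserves the aspect ratio of the object; replicating edge pixels avoids introducing artificial black borders that could look anomalous.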

property learning_type: LearningType

Get the learning type of the model.

Returns:

The model uses one-class learning.

Return type:

LearningType

property trainer_arguments: dict[str, Any]

Get required trainer arguments for the model.

Returns:

Dictionary of trainer arguments (empty for L2BT, as no special trainer configuration is required).

Return type:

dict[str, Any]

training_step(batch, *args, **kwargs)

Perform a training step of L2BT.

For each batch, teacher patch embeddings are extracted from the two selected Vision Transformer layers, and the student MLPs are trained to predict the embeddings of one layer from those of the other. Three loss values are computed: the main (total) loss, the middle-layer loss, and the last-layer loss.

Parameters:
  • batch (Batch) – Input batch containing images and labels.

  • args – Additional arguments (unused).

  • kwargs – Additional keyword arguments (unused).

Returns:

Dictionary containing the loss value.

Return type:

Union[Tensor, Mapping[str, Any], None]
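A plain-PyTorch analog of one training step is sketched below. In the Lightning module the backward pass and optimizer step are handled by the trainer; here they are written out explicitly. The mapping directions, the mean squared error loss, and the unweighted sum of the two directional losses are assumptions; random tensors stand in for frozen teacher features.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
dim, num_patches = 32, 196
forward_net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))   # middle -> last (assumed)
backward_net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))  # last -> middle (assumed)
optimizer = torch.optim.Adam([*forward_net.parameters(), *backward_net.parameters()], lr=1e-4)


def training_step(middle: torch.Tensor, last: torch.Tensor) -> dict[str, torch.Tensor]:
    """One optimization step of the students on frozen teacher features."""
    middle, last = middle.detach(), last.detach()  # teacher features never receive gradients
    loss_last = F.mse_loss(forward_net(middle), last)
    loss_middle = F.mse_loss(backward_net(last), middle)
    loss = loss_last + loss_middle  # total loss = sum of both directional losses (assumed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return {"loss": loss.detach(), "loss_middle": loss_middle.detach(), "loss_last": loss_last.detach()}


weights_before = forward_net[0].weight.clone()
out = training_step(torch.randn(4, num_patches, dim), torch.randn(4, num_patches, dim))
```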

validation_step(batch, *args, **kwargs)

Perform a validation step of L2BT.

As in training, teacher patch embeddings are extracted and the student reconstruction errors are computed; these errors are turned into anomaly maps for evaluation.

Parameters:
  • batch (Batch) – Input batch containing images and labels.

  • args – Additional arguments (unused).

  • kwargs – Additional keyword arguments (unused).

Returns:

Updated batch with images, anomaly maps, labels and masks for evaluation.

Return type:

Union[Tensor, Mapping[str, Any], None]
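Turning per-patch reconstruction errors into pixel-level anomaly maps can be sketched as reshaping them onto the square patch grid and upsampling to the input resolution. The helper name, the square-grid requirement, and bilinear interpolation are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def patch_errors_to_anomaly_map(patch_errors: torch.Tensor, image_size: tuple[int, int]) -> torch.Tensor:
    """Hypothetical helper: per-patch errors (B, N), N = g * g, to pixel maps (B, 1, H, W)."""
    batch, num_patches = patch_errors.shape
    grid = int(round(num_patches ** 0.5))
    if grid * grid != num_patches:
        raise ValueError("expected a square patch grid")
    patch_map = patch_errors.reshape(batch, 1, grid, grid)
    # Bilinear upsampling back to the input resolution (interpolation mode is an assumption)
    return F.interpolate(patch_map, size=image_size, mode="bilinear", align_corners=False)


errors = torch.rand(2, 14 * 14)  # e.g. a 224 x 224 input split into 16 x 16 patches
anomaly_maps = patch_errors_to_anomaly_map(errors, (224, 224))
print(anomaly_maps.shape)  # torch.Size([2, 1, 224, 224])
```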

PyTorch model implementation for L2BT.

class anomalib.models.image.l2bt.torch_model.L2BTModel(layers=(7, 11), blur_w_l=5, blur_w_u=7, blur_pad_l=2, blur_pad_u=3, blur_repeats_l=5, blur_repeats_u=3, topk_ratio=0.001)

Bases: Module

PyTorch implementation of L2BT (teacher + two students).

compute_losses(middle_patch, last_patch, predicted_middle_patch, predicted_last_patch)

Compute the total loss together with the two directional losses, following the original L2BT code.

Return type:

tuple[Tensor, Tensor, Tensor]

extract_teacher_features(images)

Extract frozen teacher features for the two selected ViT layers.

Return type:

tuple[Tensor, Tensor]
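Collecting frozen features from two transformer blocks can be sketched with a toy encoder built from nn.TransformerEncoderLayer. The toy encoder, the 0-based block indices, the CLS token at position 0, and the pre-embedded input tokens are all assumptions; the real teacher is a pre-trained Vision Transformer.

```python
import torch
from torch import nn


class ToyViTEncoder(nn.Module):
    """Hypothetical stand-in for a ViT: a stack of 12 transformer blocks."""

    def __init__(self, dim: int = 32, depth: int = 12) -> None:
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=4, dim_feedforward=64, batch_first=True) for _ in range(depth)]
        )


@torch.no_grad()  # the teacher is frozen: no gradients are tracked
def extract_teacher_features(encoder: ToyViTEncoder, tokens: torch.Tensor, layers=(7, 11)):
    """Return patch tokens (CLS dropped) after the two selected blocks (0-based indices assumed)."""
    outputs = {}
    x = tokens
    for index, block in enumerate(encoder.blocks):
        x = block(x)
        if index in layers:
            outputs[index] = x[:, 1:, :]  # drop the CLS token at position 0
    return outputs[layers[0]], outputs[layers[1]]


teacher = ToyViTEncoder().eval()
tokens = torch.randn(2, 1 + 196, 32)  # CLS token + 14 x 14 patch tokens, already embedded
middle, last = extract_teacher_features(teacher, tokens)
print(middle.shape, last.shape)  # torch.Size([2, 196, 32]) torch.Size([2, 196, 32])
```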

forward(images)

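Run the training or the inference path depending on whether the module is in training mode: a dictionary of tensors is returned during training, and an InferenceBatch otherwise.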

Return type:

dict[str, Tensor] | InferenceBatch
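The mode-dependent return type can be sketched with nn.Module's built-in training flag. ToyInferenceOutput is a hypothetical two-field stand-in for anomalib's InferenceBatch, the single linear student is a placeholder, and the image score uses the maximum for brevity rather than a top-k mean.

```python
from typing import NamedTuple

import torch
from torch import nn


class ToyInferenceOutput(NamedTuple):
    """Hypothetical stand-in for InferenceBatch with only two fields."""

    pred_score: torch.Tensor
    anomaly_map: torch.Tensor


class ModeSwitchingModel(nn.Module):
    def __init__(self, dim: int = 8) -> None:
        super().__init__()
        self.student = nn.Linear(dim, dim)  # placeholder student

    def forward(self, features: torch.Tensor):
        prediction = self.student(features)
        if self.training:
            # Training mode: return the raw tensors needed to compute the loss
            return {"target": features, "prediction": prediction}
        # Inference mode: per-patch error as the map, its maximum as the image score
        anomaly_map = (prediction - features).pow(2).mean(dim=-1)
        return ToyInferenceOutput(pred_score=anomaly_map.amax(dim=-1), anomaly_map=anomaly_map)


model = ModeSwitchingModel()
x = torch.randn(2, 10, 8)
train_out = model.train()(x)  # dict
eval_out = model.eval()(x)    # ToyInferenceOutput
```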

predict_student_features(middle_patch, last_patch)

Predict the cross-layer mappings learned by the two student MLPs.

Return type:

tuple[Tensor, Tensor]