L2BT
Architecture
Learning to Be a Transformer to Pinpoint Anomalies.
This module implements the L2BT model for anomaly detection as described in Costanzino et al. (2025).
The model consists of:
A pre-trained Vision Transformer teacher that extracts patch embeddings
Two shallow student MLPs (backward_net and forward_net) that learn to match teacher patch embeddings
Feature distillation between teacher and student representations
Anomaly detection based on student ability to reconstruct teacher features
Example
>>> from anomalib.models.image import L2BT
>>> from anomalib.engine import Engine
>>> from anomalib.data import MVTecAD
>>> datamodule = MVTecAD()
>>> model = L2BT(
...     layers=(7, 11),
...     topk_ratio=0.001
... )
>>> engine = Engine(model=model, datamodule=datamodule)
>>> engine.fit()
>>> predictions = engine.predict()
See also
L2BT: Lightning implementation of the model
L2BTModel: PyTorch implementation of the model architecture
- class anomalib.models.image.l2bt.lightning_model.L2BT(lr=0.0001, layers=(7, 11), blur_w_l=5, blur_w_u=7, blur_pad_l=2, blur_pad_u=3, blur_repeats_l=5, blur_repeats_u=3, topk_ratio=0.001, pre_processor=True, post_processor=True, evaluator=True, visualizer=True)
Bases: AnomalibModule
Learning to Be a Transformer algorithm.
The L2BT model consists of a pre-trained Vision Transformer teacher that extracts patch embeddings and two shallow student MLPs (backward_net and forward_net) that learn to match the teacher’s patch embeddings. The students are trained on normal images only, so they reconstruct teacher embeddings well for normal content. Regions where the reconstruction degrades are flagged as anomalous.
- Parameters:
lr (float) – Learning rate for student network optimization. Defaults to 1e-4.
layers (Sequence[int]) – Indices of Vision Transformer layers used for feature extraction. Must be a sequence of exactly two indices. Defaults to (7, 11).
blur_w_l (int) – Lower bound for blur kernel width in augmentation. Defaults to 5.
blur_w_u (int) – Upper bound for blur kernel width in augmentation. Defaults to 7.
blur_pad_l (int) – Lower bound for blur padding in augmentation. Defaults to 2.
blur_pad_u (int) – Upper bound for blur padding in augmentation. Defaults to 3.
blur_repeats_l (int) – Number of repetitions for lower blur kernel. Defaults to 5.
blur_repeats_u (int) – Number of repetitions for upper blur kernel. Defaults to 3.
topk_ratio (float) – Fraction of highest anomaly-map values to use for image-level anomaly scoring. Defaults to 0.001.
pre_processor (PreProcessor | bool) – Pre-processor to transform input data before passing to model. If True, uses default. Defaults to True.
post_processor (PostProcessor | bool) – Post-processor to generate predictions from model outputs. If True, uses default. Defaults to True.
evaluator (Evaluator | bool) – Evaluator to compute metrics. If True, uses default. Defaults to True.
visualizer (Visualizer | bool) – Visualizer to display results. If True, uses default. Defaults to True.
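The effect of topk_ratio can be illustrated with a small, dependency-free sketch. It assumes the image-level score is the mean of the k largest anomaly-map values, with k rounded down and never below 1. The exact reduction used inside anomalib is not stated here and may differ. The name topk_image_score is a hypothetical helper, not part of the library.

```python
def topk_image_score(anomaly_map, topk_ratio=0.001):
    """Average the largest `topk_ratio` fraction of anomaly-map values.

    Assumption: k is rounded down with a floor of 1, and the top-k
    values are averaged. Hypothetical helper, not part of anomalib.
    """
    # Flatten the 2D map (list of rows) into one list of pixel scores.
    values = [v for row in anomaly_map for v in row]
    k = max(1, int(len(values) * topk_ratio))
    # Keep the k highest scores and average them.
    top = sorted(values, reverse=True)[:k]
    return sum(top) / k


# A 4x4 map with one strongly anomalous pixel.
amap = [
    [0.1, 0.1, 0.2, 0.1],
    [0.1, 0.9, 0.3, 0.1],
    [0.1, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
score_small = topk_image_score(amap, topk_ratio=0.001)  # k = 1
score_large = topk_image_score(amap, topk_ratio=0.25)   # k = 4
```

Under these assumptions, a small ratio makes the score track the single worst region, while a larger ratio averages in more of the map and dilutes small defects.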
Example
>>> from anomalib.models.image import L2BT
>>> from anomalib.data import MVTecAD
>>> from anomalib.engine import Engine
>>> datamodule = MVTecAD()
>>> model = L2BT(
...     layers=(7, 11),
...     topk_ratio=0.001
... )
>>> engine = Engine(model=model, datamodule=datamodule)
>>> engine.fit()
>>> predictions = engine.predict()
See also
anomalib.models.image.l2bt.torch_model.L2BTModel: PyTorch implementation of the model architecture
- configure_optimizers()
Configure the optimizer for training.
- Returns:
Adam optimizer over the parameters of both backward_net and forward_net, using the learning rate given to the constructor (default 1e-4).
- Return type:
- static configure_pre_processor(image_size=None)
Configure the default pre-processor for L2BT.
The original L2BT pipeline applies: SquarePad (edge replication) → Resize (bicubic interpolation) → ImageNet normalization.
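The first and last stages of this pipeline can be sketched in plain Python. This is an illustration only: the bicubic resize is omitted, the even split of padding (with any odd pixel going to the bottom/right) is an assumption, and square_pad_edge and normalize_pixel are hypothetical helper names rather than anomalib functions. The mean and standard deviation are the standard ImageNet statistics.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def square_pad_edge(channel):
    """Pad a 2D channel (list of rows) to a square by replicating edges.

    Assumption: padding is split evenly, with any odd pixel placed on
    the bottom/right side.
    """
    h, w = len(channel), len(channel[0])
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    bottom, right = side - h - top, side - w - left
    # Replicate the first/last column of every row.
    rows = [[r[0]] * left + list(r) + [r[-1]] * right for r in channel]
    # Replicate the first/last row.
    above = [list(rows[0]) for _ in range(top)]
    below = [list(rows[-1]) for _ in range(bottom)]
    return above + rows + below


def normalize_pixel(rgb):
    """Apply ImageNet mean/std normalization to one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


padded = square_pad_edge([[1, 2, 3, 4], [5, 6, 7, 8]])  # 2x4 -> 4x4
centered = normalize_pixel(IMAGENET_MEAN)  # a mean-valued pixel maps to zeros
```

Edge replication avoids introducing artificial black borders, which a reconstruction-based detector could otherwise mistake for anomalous content.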
- property learning_type: LearningType
Get the learning type of the model.
- Returns:
The model uses one-class learning.
- Return type:
LearningType
- training_step(batch, *args, **kwargs)
Perform a training step of L2BT.
For each batch, teacher patch embeddings are extracted from the Vision Transformer, and the student MLPs are trained to reconstruct these embeddings. Three loss terms are computed: a main loss, a middle-layer loss, and a final-layer loss.
- validation_step(batch, *args, **kwargs)
Perform a validation step of L2BT.
As in training, this step extracts teacher patch embeddings and computes the students’ reconstruction errors, then generates anomaly maps for evaluation.
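How per-patch reconstruction errors might become an anomaly map can be sketched as follows. The combination rule is an assumption: the two directional errors are multiplied element-wise, so a patch scores high only when both students fail on it. The source does not state how the errors are actually fused, and anomaly_map_sketch is a hypothetical name.

```python
def anomaly_map_sketch(err_middle, err_last, grid_w):
    """Combine two per-patch error lists into a 2D patch-level map.

    Assumption: the two directional errors are multiplied element-wise.
    Hypothetical helper, not part of anomalib.
    """
    combined = [a * b for a, b in zip(err_middle, err_last)]
    # Reshape the flat patch sequence into rows of `grid_w` patches.
    return [combined[i:i + grid_w] for i in range(0, len(combined), grid_w)]


# Four patches on a 2x2 grid; the third patch is poorly reconstructed
# by both students.
patch_map = anomaly_map_sketch(
    err_middle=[0.1, 0.1, 0.8, 0.1],
    err_last=[0.2, 0.1, 0.9, 0.1],
    grid_w=2,
)
```

In this toy case the largest value lands at row 1, column 0, which corresponds to the third patch in the flat sequence.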
PyTorch model implementation for L2BT.
- class anomalib.models.image.l2bt.torch_model.L2BTModel(layers=(7, 11), blur_w_l=5, blur_w_u=7, blur_pad_l=2, blur_pad_u=3, blur_repeats_l=5, blur_repeats_u=3, topk_ratio=0.001)
Bases: Module
PyTorch implementation of L2BT (teacher + two students).
- compute_losses(middle_patch, last_patch, predicted_middle_patch, predicted_last_patch)
Return total loss plus the two directional losses used in the original code.
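The shape of this return value can be illustrated with a minimal sketch. The loss formulas are assumptions: each directional loss is taken to be the mean cosine distance between predicted and teacher patch vectors, and the total is their sum. The source does not specify the actual formulas, and compute_losses_sketch is a hypothetical stand-in that works on plain lists of non-zero vectors.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def compute_losses_sketch(middle, last, pred_middle, pred_last):
    """Return (total, middle_loss, last_loss) over lists of patch vectors.

    Assumption: each directional loss is a mean cosine distance and the
    total is their sum. The real formulas are not given in the source.
    """
    middle_loss = sum(map(cosine_distance, pred_middle, middle)) / len(middle)
    last_loss = sum(map(cosine_distance, pred_last, last)) / len(last)
    return middle_loss + last_loss, middle_loss, last_loss


# Middle-layer patches are predicted perfectly; the single last-layer
# patch is predicted as an orthogonal vector.
total, mid_loss, last_loss = compute_losses_sketch(
    middle=[[1.0, 0.0], [0.0, 1.0]],
    last=[[1.0, 0.0]],
    pred_middle=[[1.0, 0.0], [0.0, 1.0]],
    pred_last=[[0.0, 1.0]],
)
```

Returning the directional terms alongside the total makes it possible to log each student's progress separately during training.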
- extract_teacher_features(images)
Extract frozen teacher features for the two selected ViT layers.
- forward(images)
Run training or inference depending on module mode.
- Return type:
dict[str,Tensor] |InferenceBatch