VLM-AD

Vision Language Model (VLM) based Anomaly Detection.

This module implements anomaly detection using Vision Language Models (VLMs) such as GPT-4V and LLaVA. The models are prompted in natural language to decide whether an image is anomalous, optionally comparing it with reference normal images (few-shot mode).

Example

>>> from anomalib.models.image import VlmAd
>>> from anomalib.data import MVTecAD
>>> from anomalib.engine import Engine
>>> # Initialize model and data
>>> datamodule = MVTecAD()
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY",
... )
>>> # Predict using the Engine
>>> engine = Engine()
>>> engine.predict(model=model, datamodule=datamodule)

See also

  • VlmAd: Main model class for VLM-based anomaly detection

  • backends: Different VLM backend implementations

  • utils: Utility functions for prompting and responses

class anomalib.models.image.vlm_ad.VlmAd(model=ModelName.LLAMA_OLLAMA, api_key=None, k_shot=0)

Bases: AnomalibModule

Vision Language Model (VLM) based anomaly detection model.

This model uses VLMs such as GPT-4V and LLaVA to detect anomalies in images through natural language prompting. When k_shot is greater than 0, the query image is also compared with reference normal images.

Parameters:
  • model (ModelName | str) – Name of the VLM to use. Defaults to ModelName.LLAMA_OLLAMA. Can be one of:

      - ModelName.LLAMA_OLLAMA
      - ModelName.GPT_4O_MINI
      - ModelName.VICUNA_7B_HF
      - ModelName.VICUNA_13B_HF
      - ModelName.MISTRAL_7B_HF

  • api_key (str | None, optional) – API key for models that require authentication. Defaults to None.

  • k_shot (int, optional) – Number of reference normal images to use for few-shot learning. If 0, uses zero-shot approach. Defaults to 0.

Example

>>> from anomalib.models.image import VlmAd
>>> # Zero-shot approach
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY",
... )
>>> # Few-shot approach with 3 reference images
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY",
...     k_shot=3,
... )

Raises:

ValueError – If an unsupported VLM model is specified.
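
The ValueError case can be illustrated with a small, self-contained sketch. `DemoModelName` and `resolve_model_name` are hypothetical stand-ins written for this example, not anomalib code, and the string values in the enum are assumed. The sketch only shows one way a string can be validated against an enumeration of supported names.

```python
from enum import Enum


class DemoModelName(Enum):
    """Hypothetical stand-in for ModelName; the string values are assumed."""

    LLAMA_OLLAMA = "llava"
    GPT_4O_MINI = "gpt-4o-mini"


def resolve_model_name(model):
    """Return a DemoModelName, raising ValueError for unsupported names."""
    if isinstance(model, DemoModelName):
        return model
    try:
        # Enum lookup by value raises ValueError when nothing matches.
        return DemoModelName(model)
    except ValueError:
        supported = ", ".join(member.value for member in DemoModelName)
        msg = f"Unsupported VLM model {model!r}. Supported: {supported}"
        raise ValueError(msg) from None
```

Accepting both enum members and plain strings mirrors the `ModelName | str` type documented for the `model` parameter.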

collect_reference_images(dataloader)

Collect reference images for few-shot inference.

Parameters:

dataloader (DataLoader) – DataLoader containing normal images for reference.

Return type:

None
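
As a rough illustration of gathering few-shot references, the sketch below takes the first `k_shot` items from an iterable of batches. This page does not say how anomalib selects reference images, so taking them in dataloader order is an assumption, and `collect_first_k` is a hypothetical helper that works on plain Python lists rather than a real DataLoader.

```python
from itertools import islice


def collect_first_k(batches, k_shot):
    """Gather the first k_shot items from an iterable of batches."""
    # Flatten lazily so iteration stops as soon as enough items are found.
    flattened = (image for batch in batches for image in batch)
    return list(islice(flattened, k_shot))


# Two batches of (stand-in) image paths; k_shot=3 spans both batches.
batches = [["good_0.png", "good_1.png"], ["good_2.png", "good_3.png"]]
references = collect_first_k(batches, k_shot=3)
```

With `k_shot=0` the helper returns an empty list, which matches the zero-shot case where no references are needed.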

static configure_evaluator()

Configure default evaluator.

Returns:

Evaluator configured with F1Score metric.

Return type:

Evaluator
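
For reference, the F1 score on binary labels is 2·TP / (2·TP + FP + FN). The following plain-Python sketch computes it for image-level predictions; it is not the metric implementation the Evaluator uses, and `f1_score` is a hypothetical helper for illustration only.

```python
def f1_score(predictions, labels):
    """Compute F1 for binary labels, where 1 means anomalous."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    denominator = 2 * tp + fp + fn
    # F1 is undefined when there are no positives at all; return 0.0 here.
    return 2 * tp / denominator if denominator else 0.0
```

F1 suits this model because a VLM reply yields a hard normal/anomalous label rather than a continuous anomaly score.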

classmethod configure_post_processor()

Configure post-processor.

Returns:

None, as post-processing is not required.

Return type:

PostProcessor | None

static configure_transforms(image_size=None)

Configure image transforms.

Parameters:

image_size (tuple[int, int] | None, optional) – Ignored, as each backend applies its own transforms. Defaults to None.

Return type:

None

property learning_type: LearningType

Get the learning type of the model.

Returns:

ZERO_SHOT if k_shot=0, else FEW_SHOT.

Return type:

LearningType
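
The rule above can be written out in a few lines. `DemoLearningType` is a hypothetical stand-in for anomalib's LearningType (its string values are assumed), and `learning_type_for` is an illustrative helper rather than the property's actual code.

```python
from enum import Enum


class DemoLearningType(str, Enum):
    """Hypothetical stand-in for LearningType; the string values are assumed."""

    ZERO_SHOT = "zero_shot"
    FEW_SHOT = "few_shot"


def learning_type_for(k_shot):
    """Map the number of reference images to a learning type."""
    return DemoLearningType.ZERO_SHOT if k_shot == 0 else DemoLearningType.FEW_SHOT
```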

property prompt: Prompt

Get the prompt for VLM interaction.

Returns:

Object containing prompts for prediction and few-shot learning.

Return type:

Prompt

to_onnx(*_, **__)

Skip export to ONNX.

Return type:

None

to_openvino(*_, **__)

Skip export to OpenVINO.

Return type:

None

to_torch(*_, **__)

Skip export to PyTorch.

Return type:

None
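
The `(*_, **__)` signature shared by the three export methods above is a common Python idiom for an override that accepts any arguments and ignores them. The sketch below shows the pattern with hypothetical classes; the base-class signature and return value are assumptions made for the example, not anomalib's export API.

```python
class DemoExportableModel:
    """Hypothetical base class whose export returns the written file path."""

    def to_onnx(self, export_root, input_size=None):
        return f"{export_root}/model.onnx"


class DemoApiModel(DemoExportableModel):
    """Hypothetical model served by an external VLM, so nothing is exported."""

    def to_onnx(self, *_, **__):
        # Accept and discard any positional or keyword arguments, then do nothing.
        return None


# Callers can keep passing the usual arguments; the override ignores them.
result = DemoApiModel().to_onnx("exports", input_size=(256, 256))
```

Keeping the method in place, rather than removing it, lets generic code that calls the export methods on any model run without special-casing this one.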

property trainer_arguments: dict[str, int | float]

Get trainer arguments.

Returns:

Empty dict, as no training is needed.

Return type:

dict[str, int | float]

validation_step(batch, *args, **kwargs)

Perform validation step.

Parameters:
  • batch (ImageBatch) – Batch of images to validate.

  • *args – Variable length argument list.

  • **kwargs – Arbitrary keyword arguments.

Returns:

Batch with predictions and explanations added.

Return type:

ImageBatch
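
To make "predictions and explanations added" concrete, here is a minimal sketch of such a step. Everything in it is hypothetical: `DemoBatch` stands in for ImageBatch, `fake_vlm` replaces a real backend call, and the convention that a reply starting with "YES" means anomalous is an assumption this page does not state.

```python
from dataclasses import dataclass, field


@dataclass
class DemoBatch:
    """Hypothetical stand-in for ImageBatch with only the fields used here."""

    image_path: list
    pred_label: list = field(default_factory=list)
    explanation: list = field(default_factory=list)


def demo_validation_step(batch, ask_vlm):
    """Query the VLM once per image and attach labels and explanations."""
    responses = [ask_vlm(path) for path in batch.image_path]
    # Keep the raw reply as the human-readable explanation.
    batch.explanation = responses
    # Assumed convention: a reply starting with "Y" flags an anomaly.
    batch.pred_label = [
        1.0 if reply.strip().upper().startswith("Y") else 0.0 for reply in responses
    ]
    return batch


def fake_vlm(path):
    """Canned replies standing in for a real VLM call."""
    return "YES: scratch on the surface" if "defect" in path else "NO"


batch = demo_validation_step(DemoBatch(["good.png", "defect.png"]), fake_vlm)
```

Passing the VLM call in as a function keeps the sketch testable without network access or an API key.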