VLM-AD


Vision Language Model (VLM) based Anomaly Detection.

This module implements anomaly detection using Vision Language Models (VLMs) such as GPT-4V and LLaVA. The models are prompted in natural language to decide whether an image is anomalous, either on its own (zero-shot) or by comparison with reference normal images (few-shot).

Example

>>> from anomalib.models.image import VlmAd
>>> from anomalib.data import MVTecAD
>>> from anomalib.engine import Engine

>>> # Initialize model and data
>>> datamodule = MVTecAD()
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY"
... )

>>> # Predict using the Engine
>>> engine = Engine()
>>> engine.predict(model=model, datamodule=datamodule)  

See also

VlmAd: Main model class for VLM-based anomaly detection
backends: Different VLM backend implementations
utils: Utility functions for prompting and responses

class anomalib.models.image.vlm_ad.VlmAd(model=ModelName.LLAMA_OLLAMA, api_key=None, k_shot=0, hf_model_revision=None)

Bases: AnomalibModule

Vision Language Model (VLM) based anomaly detection model.

This model uses VLMs such as GPT-4V and LLaVA to detect anomalies in images through natural language prompting. With k_shot=0 the image is judged on its own; with k_shot > 0 it is compared with reference normal images.

Parameters:

model (ModelName | str) – Name of the VLM model to use. Can be one of:
    - ModelName.LLAMA_OLLAMA
    - ModelName.GPT_4O_MINI
    - ModelName.VICUNA_7B_HF
    - ModelName.VICUNA_13B_HF
    - ModelName.MISTRAL_7B_HF
  Defaults to ModelName.LLAMA_OLLAMA.
api_key (str | None, optional) – API key for models that require authentication. Defaults to None.
k_shot (int, optional) – Number of reference normal images to use for few-shot learning. If 0, uses zero-shot approach. Defaults to 0.
hf_model_revision (str, optional) – Model revision/branch/tag to use when using HuggingFace models. Defaults to “main”.

Example

>>> from anomalib.models.image import VlmAd
>>> # Zero-shot approach
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY"
... )
>>> # Few-shot approach with 3 reference images
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY",
...     k_shot=3
... )
>>> # Using a HuggingFace model with specific revision
>>> model = VlmAd(  
...     model="llava-hf/llava-v1.6-vicuna-7b-hf",
...     k_shot=5,
...     hf_model_revision="c916e6cdcd760b4cecd1dd4907f84ac649f93b23"
... )

Raises: ValueError – If an unsupported VLM model is specified.
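The validation behind this ValueError can be pictured with a small stand-alone sketch. It is not anomalib's code: `SketchModelName` and `resolve_model_name` are hypothetical names, and the enum holds only the two string values that appear in the examples on this page.

```python
from enum import Enum


class SketchModelName(Enum):
    """Hypothetical stand-in for ModelName (assumption: values are the model strings)."""

    GPT_4O_MINI = "gpt-4o-mini"
    VICUNA_7B_HF = "llava-hf/llava-v1.6-vicuna-7b-hf"


def resolve_model_name(model):
    """Accept an enum member or its string value; reject anything else."""
    if isinstance(model, SketchModelName):
        return model
    try:
        # Enum lookup by value raises ValueError for unknown strings.
        return SketchModelName(model)
    except ValueError as err:
        supported = ", ".join(member.value for member in SketchModelName)
        msg = f"Unsupported VLM model: {model!r}. Supported: {supported}"
        raise ValueError(msg) from err
```

Resolving the name once, at construction time, means a typo in the model string fails immediately rather than at the first prediction.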

collect_reference_images(dataloader)

Collect reference images for few-shot inference.

Parameters: dataloader (DataLoader) – DataLoader containing normal images for reference.
Return type: None
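As a rough illustration of what collecting k reference images involves, the sketch below walks an iterable of batches and stops once `k_shot` items are gathered. It assumes each batch is a plain list (here, of file names); `collect_first_k` is a hypothetical helper, not the method's actual implementation.

```python
def collect_first_k(batches, k_shot):
    """Gather up to k_shot items from an iterable of list-like batches."""
    references = []
    for batch in batches:
        remaining = k_shot - len(references)
        if remaining <= 0:
            break  # enough references collected; skip the remaining batches
        references.extend(batch[:remaining])
    return references
```

With `k_shot=0` the loop exits on the first batch and the result is an empty list, which matches the zero-shot case where no references are needed.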

static configure_evaluator()

Configure default evaluator.

Returns: Evaluator configured with F1Score metric.
Return type: Evaluator
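For readers unfamiliar with the metric, here is a minimal pure-Python sketch of binary F1 computed from image-level predictions. It only illustrates the formula; the `f1_score` helper is hypothetical and is not the F1Score class used by the Evaluator.

```python
def f1_score(preds, targets):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the score is undefined."""
    pairs = list(zip(preds, targets))
    tp = sum(1 for p, t in pairs if p and t)      # anomalous, predicted anomalous
    fp = sum(1 for p, t in pairs if p and not t)  # normal, predicted anomalous
    fn = sum(1 for p, t in pairs if not p and t)  # anomalous, predicted normal
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

F1 suits this model because the output per image is a yes/no decision rather than a continuous anomaly score.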

classmethod configure_post_processor()

Configure post processor.

Returns: None, as post-processing is not required.
Return type: PostProcessor | None

static configure_transforms(image_size=None)

Configure image transforms.

Parameters: image_size (tuple[int, int] | None, optional) – Ignored as each backend has its own transforms. Defaults to None.
Return type: None

property learning_type: LearningType

Get the learning type of the model.

Returns: ZERO_SHOT if k_shot=0, else FEW_SHOT.
Return type: LearningType
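The rule stated above is small enough to sketch directly. `SketchLearningType` is a hypothetical stand-in for LearningType (its string values are assumptions), used only so the snippet runs without anomalib installed.

```python
from enum import Enum


class SketchLearningType(Enum):
    """Hypothetical stand-in for LearningType; string values are assumptions."""

    ZERO_SHOT = "zero_shot"
    FEW_SHOT = "few_shot"


def learning_type_for(k_shot):
    """Map the number of reference images to a learning type."""
    if k_shot == 0:
        return SketchLearningType.ZERO_SHOT
    return SketchLearningType.FEW_SHOT
```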

predict_step(batch, *args, **kwargs)

Redirect to validation step.

Return type: ImageBatch

property prompt: Prompt

Get the prompt for VLM interaction.

Returns: Object containing prompts for prediction and few-shot learning.
Return type: Prompt
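One way to picture an object holding "prompts for prediction and few-shot learning" is the sketch below. The field names `few_shot` and `predict`, the `SketchPrompt` class, and the `build_messages` helper are all assumptions made for illustration; the real Prompt type may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class SketchPrompt:
    """Hypothetical container with one prompt per phase."""

    few_shot: str  # text sent together with the reference normal images
    predict: str   # text sent together with the image under test


def build_messages(prompt, reference_images, test_image):
    """Assemble ordered (text, images) turns for a single VLM request."""
    messages = []
    if reference_images:
        # Few-shot only: show the normal examples before asking the question.
        messages.append((prompt.few_shot, list(reference_images)))
    messages.append((prompt.predict, [test_image]))
    return messages
```

In the zero-shot case the few-shot turn is simply omitted, so the same prompt object serves both modes.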

test_step(batch, *args, **kwargs)

Redirect to validation step.

Return type: ImageBatch

to_onnx(*_, **__)

Skip export to ONNX.

Return type: None

to_openvino(*_, **__)

Skip export to OpenVINO.

Return type: None

to_torch(*_, **__)

Skip export to torch.

Return type: None
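The `(*_, **__)` signatures on the three export methods indicate that any arguments are accepted and discarded. A minimal sketch of that pattern, using a hypothetical `NoExportSketch` class rather than the real model, looks like this:

```python
class NoExportSketch:
    """Hypothetical class whose export methods ignore every argument."""

    def to_torch(self, *_, **__):
        return None  # nothing is written

    def to_onnx(self, *_, **__):
        return None  # nothing is written

    def to_openvino(self, *_, **__):
        return None  # nothing is written
```

Callers that pass the usual export arguments therefore do not fail; they simply get None back and no file is produced.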

property trainer_arguments: dict[str, int | float]

Get trainer arguments.

Returns: Empty dict, as no training is needed.
Return type: dict[str, int | float]

validation_step(batch, *args, **kwargs)

Perform validation step.

Parameters:

batch (ImageBatch) – Batch of images to validate.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.

Returns:

Batch with predictions and explanations added.

Return type:

ImageBatch
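To make "predictions and explanations" concrete, the sketch below turns a free-text VLM reply into a boolean label plus the reply text. The convention that the reply begins with "yes" for an anomalous image is an assumption, and `SketchPrediction` and `parse_response` are hypothetical names, not part of the documented API.

```python
from dataclasses import dataclass


@dataclass
class SketchPrediction:
    """Hypothetical per-image result."""

    pred_label: bool   # True if the reply indicates an anomaly
    explanation: str   # the reply text, kept for inspection


def parse_response(response):
    """Interpret a reply that is assumed to start with 'yes' or 'no'."""
    text = response.strip()
    is_anomalous = text.lower().startswith("yes")
    return SketchPrediction(pred_label=is_anomalous, explanation=text)
```

Keeping the full reply alongside the label lets a reviewer see why the model flagged an image.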
