VLM-AD
Vision Language Model (VLM) based Anomaly Detection.
This module implements anomaly detection using Vision Language Models (VLMs) such as GPT-4V and LLaVA. The models are prompted in natural language to decide whether an image contains an anomaly, optionally comparing it with reference normal images (few-shot mode).
Example
>>> from anomalib.models.image import VlmAd
>>> from anomalib.data import MVTecAD
>>> from anomalib.engine import Engine
>>> # Initialize model and data
>>> datamodule = MVTecAD()
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY"
... )
>>> # Predict using the Engine
>>> engine = Engine()
>>> engine.predict(model=model, datamodule=datamodule)
See also
- VlmAd: Main model class for VLM-based anomaly detection
- backends: Different VLM backend implementations
- utils: Utility functions for prompting and responses
- class anomalib.models.image.vlm_ad.VlmAd(model=ModelName.LLAMA_OLLAMA, api_key=None, k_shot=0, hf_model_revision=None)
Bases: AnomalibModule
Vision Language Model (VLM) based anomaly detection model.
This model uses VLMs such as GPT-4V and LLaVA to detect anomalies in images through natural language prompting. In few-shot mode (k_shot > 0), the query image is compared with reference normal images.
- Parameters:
model (ModelName | str) – Name of the VLM model to use. Can be one of:
  - ModelName.LLAMA_OLLAMA
  - ModelName.GPT_4O_MINI
  - ModelName.VICUNA_7B_HF
  - ModelName.VICUNA_13B_HF
  - ModelName.MISTRAL_7B_HF
Defaults to ModelName.LLAMA_OLLAMA.
api_key (str | None, optional) – API key for models that require authentication. Defaults to None.
k_shot (int, optional) – Number of reference normal images to use for few-shot learning. If 0, uses zero-shot approach. Defaults to 0.
hf_model_revision (str | None, optional) – Model revision (branch, tag, or commit hash) to use with HuggingFace models. Defaults to None, i.e. the repository's “main” branch.
Example
>>> from anomalib.models.image import VlmAd
>>> # Zero-shot approach
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY"
... )
>>> # Few-shot approach with 3 reference images
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY",
...     k_shot=3
... )
>>> # Using a HuggingFace model with a specific revision
>>> model = VlmAd(
...     model="llava-hf/llava-v1.6-vicuna-7b-hf",
...     k_shot=5,
...     hf_model_revision="c916e6cdcd760b4cecd1dd4907f84ac649f93b23"
... )
- Raises:
ValueError – If an unsupported VLM model is specified.
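The model-name check implied by the parameter list and this Raises entry can be sketched as below. This is a minimal illustration, not anomalib's implementation: `select_backend` and the returned backend labels are hypothetical, and the string values for `LLAMA_OLLAMA`, `VICUNA_13B_HF` and `MISTRAL_7B_HF` are assumptions (only the GPT-4o-mini and Vicuna-7B strings appear in the examples on this page).

```python
from enum import Enum


class ModelName(str, Enum):
    """Hypothetical mirror of the documented ModelName members."""

    LLAMA_OLLAMA = "llava"  # assumed value
    GPT_4O_MINI = "gpt-4o-mini"
    VICUNA_7B_HF = "llava-hf/llava-v1.6-vicuna-7b-hf"
    VICUNA_13B_HF = "llava-hf/llava-v1.6-vicuna-13b-hf"  # assumed value
    MISTRAL_7B_HF = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed value


def select_backend(model: "ModelName | str") -> str:
    """Return a hypothetical backend label for the requested model."""
    try:
        # Accepts either an enum member or its string value.
        name = ModelName(model)
    except ValueError as err:
        msg = f"Unsupported VLM model: {model}"
        raise ValueError(msg) from err
    if name is ModelName.LLAMA_OLLAMA:
        return "ollama"
    if name is ModelName.GPT_4O_MINI:
        return "openai"
    # The remaining three members are LLaVA checkpoints on HuggingFace.
    return "huggingface"
```

Because the enum mixes in `str`, plain strings such as "gpt-4o-mini" and enum members are accepted interchangeably, while any other string fails the lookup and surfaces as a ValueError.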
- collect_reference_images(dataloader)
Collect reference images for few-shot inference.
- Parameters:
dataloader (DataLoader) – DataLoader containing normal images for reference.
- Return type:
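Collecting references amounts to scanning normal images until k_shot of them have been gathered. The sketch below replaces the DataLoader with a hypothetical `Batch` type holding parallel image and label lists (0 = normal, an assumed convention) and filters on the label defensively; it illustrates the idea rather than reproducing anomalib's code.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical stand-in for an image batch."""

    images: list[str]  # placeholders, e.g. image paths
    labels: list[int]  # 0 = normal, 1 = anomalous (assumed convention)


def collect_reference_images(batches: Iterable[Batch], k_shot: int) -> list[str]:
    """Return the first ``k_shot`` normal images found in ``batches``."""
    if k_shot <= 0:
        return []  # zero-shot: no references needed
    references: list[str] = []
    for batch in batches:
        for image, label in zip(batch.images, batch.labels):
            if label == 0:  # keep normal samples only
                references.append(image)
                if len(references) == k_shot:
                    return references
    # Fewer than k_shot normal images were available.
    return references
```

Stopping as soon as k_shot images are found avoids iterating the whole dataset when only a handful of references are needed.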
- static configure_evaluator()
Configure default evaluator.
- Returns:
Evaluator configured with F1Score metric.
- Return type:
Evaluator
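F1 works on hard labels, which presumably suits a model whose VLM answer is mapped directly to anomalous/normal without a score threshold. The sketch below computes the standard binary F1 score from scratch with label 1 (anomalous) as the positive class; it is for illustration only, and the evaluator's own F1Score metric may handle edge cases (for example, no positives at all) differently.

```python
def f1_score(preds: list[int], targets: list[int]) -> float:
    """Binary F1 score with label 1 (anomalous) as the positive class."""
    tp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, targets) if p == 0 and t == 1)
    if tp == 0:
        # Precision or recall is zero (or undefined), so report 0.0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

For example, f1_score([1, 1, 0, 0], [1, 0, 1, 0]) has one true positive, one false positive and one false negative, giving precision = recall = 0.5 and F1 = 0.5.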
- classmethod configure_post_processor()
Configure post processor.
- Returns:
None, as post-processing is not required.
- Return type:
PostProcessor | None
- static configure_transforms(image_size=None)
Configure image transforms.
- property learning_type: LearningType
Get the learning type of the model.
- Returns:
ZERO_SHOT if k_shot=0, else FEW_SHOT.
- Return type:
LearningType
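The rule stated above reduces to a single comparison on k_shot. In this sketch, `LearningType` is a hypothetical two-member stand-in whose string values are assumptions; anomalib's enum has its own definition.

```python
from enum import Enum


class LearningType(str, Enum):
    """Hypothetical subset of learning types relevant to VlmAd."""

    ZERO_SHOT = "zero_shot"  # assumed value
    FEW_SHOT = "few_shot"  # assumed value


def learning_type(k_shot: int) -> LearningType:
    """Zero-shot when no reference images are used, otherwise few-shot."""
    return LearningType.ZERO_SHOT if k_shot == 0 else LearningType.FEW_SHOT
```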
- predict_step(batch, *args, **kwargs)
Redirect to validation step.
- Return type:
- property prompt: Prompt
Get the prompt for VLM interaction.
- Returns:
Object containing prompts for prediction and few-shot learning.
- Return type:
Prompt
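A prompt object of this kind can be modelled as a small immutable container with one field per role. The sketch below is hypothetical throughout: the field names `few_shot` and `predict`, the `build_prompt` helper, its `category` parameter, and the prompt wording are illustrative assumptions, not anomalib's actual prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    """Hypothetical prompt container with the two roles described above."""

    few_shot: str  # sent alongside the reference (normal) images
    predict: str  # sent alongside the image under test


def build_prompt(category: str) -> Prompt:
    """Build illustrative prompts; the wording is an assumption."""
    few_shot = (
        f"These are images of a normal {category}. "
        "Use them as a reference for what a defect-free item looks like."
    )
    predict = (
        f"Is there an anomaly in this image of a {category}? "
        "Answer 'Yes' or 'No' first, then briefly explain."
    )
    return Prompt(few_shot=few_shot, predict=predict)
```

Asking the model to open with "Yes" or "No" keeps the answer machine-parseable while still leaving room for a free-text explanation.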
- test_step(batch, *args, **kwargs)
Redirect to validation step.
- Return type:
- validation_step(batch, *args, **kwargs)
Perform validation step.
- Parameters:
batch (ImageBatch) – Batch of images to validate.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- Returns:
Batch with predictions and explanations added.
- Return type:
ImageBatch
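The step logic can be pictured as: query the VLM once per image, turn each free-text answer into a label, keep the text as the explanation, and have test_step and predict_step delegate to validation_step. The sketch below is a plain-Python illustration, not the Lightning module itself; `ImageBatch` here is a hypothetical stand-in, `query` stands in for the VLM backend, and treating an answer that starts with "yes" as anomalous is an assumption about the response format.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class ImageBatch:
    """Hypothetical batch: image placeholders plus fields filled by the step."""

    images: list[str]
    pred_label: list[bool] = field(default_factory=list)
    explanation: list[str] = field(default_factory=list)


def parse_answer(answer: str) -> bool:
    """True (anomalous) if the answer starts with 'yes', ignoring case."""
    return answer.strip().lower().startswith("yes")


class VlmStepSketch:
    """Hypothetical stand-in for the step methods; ``query`` plays the VLM."""

    def __init__(self, query: Callable[[str], str]) -> None:
        self.query = query

    def validation_step(self, batch: ImageBatch) -> ImageBatch:
        answers = [self.query(image) for image in batch.images]
        batch.pred_label = [parse_answer(a) for a in answers]
        batch.explanation = answers  # keep the full text as the explanation
        return batch

    # test_step and predict_step simply redirect to validation_step.
    def test_step(self, batch: ImageBatch) -> ImageBatch:
        return self.validation_step(batch)

    def predict_step(self, batch: ImageBatch) -> ImageBatch:
        return self.validation_step(batch)
```

Routing all three steps through one method keeps validation, testing and prediction behaviour identical, which is natural for a model with no training phase.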