VLM-AD
Vision Language Model (VLM) based Anomaly Detection.
This module implements anomaly detection using Vision Language Models (VLMs) such as GPT-4V and LLaVA. The models detect anomalies in images through natural language prompting, optionally comparing each image with reference normal images in the few-shot setting.
Example
>>> from anomalib.models.image import VlmAd
>>> from anomalib.data import MVTecAD
>>> from anomalib.engine import Engine
>>> # Initialize model and data
>>> datamodule = MVTecAD()
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY",
... )
>>> # Predict using the Engine
>>> engine = Engine()
>>> engine.predict(model=model, datamodule=datamodule)
See also
VlmAd: Main model class for VLM-based anomaly detection
backends: Different VLM backend implementations
utils: Utility functions for prompting and responses
- class anomalib.models.image.vlm_ad.VlmAd(model=ModelName.LLAMA_OLLAMA, api_key=None, k_shot=0)
Bases: AnomalibModule
Vision Language Model (VLM) based anomaly detection model.
This model uses VLMs such as GPT-4V and LLaVA to detect anomalies in images through natural language prompting, optionally comparing them with reference normal images when k_shot is greater than 0.
- Parameters:
model (ModelName | str) – Name of the VLM model to use. Can be one of:
- ModelName.LLAMA_OLLAMA
- ModelName.GPT_4O_MINI
- ModelName.VICUNA_7B_HF
- ModelName.VICUNA_13B_HF
- ModelName.MISTRAL_7B_HF
Defaults to ModelName.LLAMA_OLLAMA.
api_key (str | None, optional) – API key for models that require authentication. Defaults to None.
k_shot (int, optional) – Number of reference normal images to use for few-shot learning. If 0, uses zero-shot approach. Defaults to 0.
Example
>>> from anomalib.models.image import VlmAd
>>> # Zero-shot approach
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY",
... )
>>> # Few-shot approach with 3 reference images
>>> model = VlmAd(
...     model="gpt-4o-mini",
...     api_key="YOUR_API_KEY",
...     k_shot=3,
... )
- Raises:
ValueError – If an unsupported VLM model is specified.
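The validation described above can be pictured with a small stand-alone sketch. It does not reproduce the library's code: `SupportedModel` and `resolve_model` are hypothetical names, the member names mirror the documented list, and the string values are placeholders rather than the library's real values.

```python
from enum import Enum


class SupportedModel(str, Enum):
    """Hypothetical stand-in for the documented ModelName enum."""

    # Member names follow the documented list; the values are placeholders
    # invented for this sketch, not the strings the library uses.
    LLAMA_OLLAMA = "llama_ollama"
    GPT_4O_MINI = "gpt_4o_mini"
    VICUNA_7B_HF = "vicuna_7b_hf"
    VICUNA_13B_HF = "vicuna_13b_hf"
    MISTRAL_7B_HF = "mistral_7b_hf"


def resolve_model(model):
    """Accept an enum member or a string and reject anything unsupported."""
    if isinstance(model, SupportedModel):
        return model
    try:
        # Enum lookup by value raises ValueError for unknown inputs.
        return SupportedModel(model)
    except ValueError as err:
        valid = ", ".join(member.value for member in SupportedModel)
        message = f"Unsupported VLM model: {model!r}. Choose one of: {valid}"
        raise ValueError(message) from err
```

Accepting both an enum member and a plain string matches the documented `ModelName | str` parameter type, and failing early gives the caller a readable list of valid choices.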
- collect_reference_images(dataloader)
Collect reference images for few-shot inference.
- Parameters:
dataloader (DataLoader) – DataLoader containing normal images for reference.
- Return type:
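One plausible way to gather references is to walk the batches and stop once `k_shot` images have been collected. The sketch below assumes that "first k images" selection rule, which the documentation does not state, and uses a hypothetical helper name (`take_reference_images`) with plain Python sequences in place of a real DataLoader.

```python
from typing import Any, Iterable, List, Sequence


def take_reference_images(batches: Iterable[Sequence[Any]], k_shot: int) -> List[Any]:
    """Collect up to k_shot items from an iterable of batches (assumed rule)."""
    references: List[Any] = []
    if k_shot <= 0:
        # Zero-shot: no reference images are needed.
        return references
    for batch in batches:
        for image in batch:
            references.append(image)
            if len(references) == k_shot:
                # Stop early so the rest of the data is never read.
                return references
    # Fewer than k_shot images were available; return what was found.
    return references
```

For example, `take_reference_images([["a", "b"], ["c", "d"]], 3)` yields the first three items across the two batches.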
- static configure_evaluator()
Configure default evaluator.
- Returns:
Evaluator configured with F1Score metric.
- Return type:
Evaluator
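For readers unfamiliar with the metric, the F1 score on binary anomaly labels can be computed as below. This is a plain-Python illustration of the formula only; the function name `f1_score` is chosen for the sketch and is not the metric class the evaluator uses.

```python
from typing import Sequence


def f1_score(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = anomalous)."""
    pairs = list(zip(predictions, targets))
    true_pos = sum(1 for pred, target in pairs if pred and target)
    false_pos = sum(1 for pred, target in pairs if pred and not target)
    false_neg = sum(1 for pred, target in pairs if not pred and target)
    denominator = 2 * true_pos + false_pos + false_neg
    # With no positive predictions and no positive targets, report 0.0.
    return 0.0 if denominator == 0 else 2 * true_pos / denominator
```

Because a VLM returns a yes/no decision per image rather than a continuous anomaly score, a threshold-free metric such as F1 is a natural default here.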
- classmethod configure_post_processor()
Configure post processor.
- Returns:
None as post processing is not required.
- Return type:
PostProcessor | None
- static configure_transforms(image_size=None)
Configure image transforms.
- property learning_type: LearningType
Get the learning type of the model.
- Returns:
ZERO_SHOT if k_shot=0, else FEW_SHOT.
- Return type:
LearningType
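The rule stated above is simple enough to express directly. In this sketch, `LearningTypeSketch` is a hypothetical stand-in for the library's `LearningType` enum and its string values are assumptions; only the `k_shot` rule comes from the documentation.

```python
from enum import Enum


class LearningTypeSketch(str, Enum):
    """Hypothetical stand-in for LearningType; values are assumed."""

    ZERO_SHOT = "zero_shot"
    FEW_SHOT = "few_shot"


def learning_type_for(k_shot: int) -> LearningTypeSketch:
    """Map the number of reference images to a learning type."""
    if k_shot == 0:
        return LearningTypeSketch.ZERO_SHOT
    return LearningTypeSketch.FEW_SHOT
```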
- property prompt: Prompt
Get the prompt for VLM interaction.
- Returns:
Object containing prompts for prediction and few-shot learning.
- Return type:
Prompt
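To illustrate how an object holding separate prediction and few-shot prompts might be consumed, here is a minimal sketch. The dataclass `PromptSketch`, its field names, the prompt wording, and the `build_messages` helper are all assumptions made for illustration, not the library's definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptSketch:
    """Hypothetical container for the two prompts described above."""

    predict: str
    few_shot: str


def build_messages(prompt: PromptSketch, num_references: int) -> List[str]:
    """Order the text prompts for one request (assumed ordering)."""
    messages: List[str] = []
    if num_references > 0:
        # Few-shot: introduce the reference images before asking the question.
        messages.append(prompt.few_shot)
    messages.append(prompt.predict)
    return messages


example = PromptSketch(
    predict="Is there an anomaly in this image? Answer yes or no and explain.",
    few_shot="The following images show normal examples.",
)
```

In the zero-shot case only the prediction prompt is sent; with references, the few-shot prompt comes first so the model sees the normal examples before the question.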
- validation_step(batch, *args, **kwargs)
Perform validation step.
- Parameters:
batch (ImageBatch) – Batch of images to validate.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- Returns:
Batch with predictions and explanations added.
- Return type: