Sampling Components
Sampling methods for anomaly detection models.
This module provides sampling techniques used in anomaly detection models to select representative samples from datasets.
- Classes:
- KCenterGreedy: K-center greedy sampling algorithm that selects diverse and representative samples.
Example
>>> import torch
>>> from anomalib.models.components.sampling import KCenterGreedy
>>> # Create a sampler over the feature embeddings
>>> features = torch.randn(100, 512)  # 100 samples with 512 dimensions
>>> sampler = KCenterGreedy(embedding=features, sampling_ratio=0.1)
>>> # Select the indices of the coreset samples
>>> selected_idx = sampler.select_coreset_idxs()
- class anomalib.models.components.sampling.KCenterGreedy(embedding, sampling_ratio)
Bases: object
k-center-greedy method for coreset selection.
This class implements the k-center-greedy method to select a coreset from an embedding space. The method aims to minimize the maximum distance between any point and its nearest center.
- Parameters:
embedding (torch.Tensor) – Embedding tensor extracted from a CNN.
sampling_ratio (float) – Ratio to determine coreset size from embedding size.
- embedding
Input embedding tensor.
- Type:
torch.Tensor
- model
Dimensionality reduction model.
- Type:
- features
Transformed features after dimensionality reduction.
- Type:
- min_distances
Minimum distances to cluster centers.
- Type:
Example
>>> import torch
>>> from anomalib.models.components.sampling import KCenterGreedy
>>> embedding = torch.randn(219520, 1536)
>>> sampler = KCenterGreedy(embedding=embedding, sampling_ratio=0.001)
>>> sampled_idxs = sampler.select_coreset_idxs()
>>> coreset = embedding[sampled_idxs]
>>> coreset.shape
torch.Size([219, 1536])
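The objective stated in the class description, the maximum distance between any point and its nearest center, can be computed directly for a candidate set of centers. The following is a minimal sketch in plain Python, not anomalib code; `covering_radius` and the toy data are hypothetical names introduced for illustration.

```python
import math


def covering_radius(points, center_idxs):
    """Largest distance from any point to its nearest chosen center."""
    centers = [points[i] for i in center_idxs]
    return max(min(math.dist(p, c) for c in centers) for p in points)


# Four points on a line (illustrative data only).
points = [(0.0,), (1.0,), (5.0,), (6.0,)]

bunched = covering_radius(points, [0, 1])  # centers at 0 and 1
spread = covering_radius(points, [0, 3])   # centers at 0 and 6
print(bunched, spread)  # 5.0 1.0
```

Spreading the two centers across the data lowers the radius from 5.0 to 1.0, which is the kind of coverage the k-center-greedy method is trying to achieve.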
- get_new_idx()
Get the index of the next sample to add: the sample whose minimum distance to the already selected cluster centers is largest.
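The selection rule for get_new_idx() amounts to an argmax over the per-sample minimum distances. A minimal sketch under that assumption follows; `farthest_point_index` is a hypothetical helper operating on a plain list, whereas the library presumably works on a tensor.

```python
def farthest_point_index(min_distances):
    """Return the index with the largest min-distance (first one on ties)."""
    return max(range(len(min_distances)), key=min_distances.__getitem__)


# Distance from each sample to its nearest selected center (toy values).
# Already selected samples have distance 0.0 and are never picked again
# as long as some other sample has a positive distance.
min_distances = [0.0, 2.5, 0.7, 4.1, 0.0]
next_idx = farthest_point_index(min_distances)
print(next_idx)  # 3
```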
- sample_coreset(selected_idxs=None)
Select coreset from the embedding.
- Parameters:
selected_idxs (list[int] | None, optional) – Indices of pre-selected samples. Defaults to None.
- Returns:
Selected coreset.
- Return type:
torch.Tensor
Example
>>> import torch
>>> from anomalib.models.components.sampling import KCenterGreedy
>>> embedding = torch.randn(219520, 1536)
>>> sampler = KCenterGreedy(embedding=embedding, sampling_ratio=0.001)
>>> coreset = sampler.sample_coreset()
>>> coreset.shape
torch.Size([219, 1536])
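The coreset shape in the example above is consistent with a sizing rule that multiplies the number of embedding rows by sampling_ratio and truncates to an integer. That rule is an assumption inferred from the example output, not something this page states; `coreset_size` below is a hypothetical helper.

```python
def coreset_size(n_samples: int, sampling_ratio: float) -> int:
    """Assumed sizing rule: truncate n_samples * sampling_ratio."""
    return int(n_samples * sampling_ratio)


size = coreset_size(219520, 0.001)
print(size)  # 219, the first dimension of the coreset in the example
```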
- select_coreset_idxs(selected_idxs=None)
Greedily select coreset indices so that the maximum distance from any sample to its nearest cluster center is minimized.
- Parameters:
selected_idxs (list[int] | None, optional) – Indices of pre-selected samples. Defaults to None.
- Returns:
- Indices of samples selected to minimize distance to cluster centers.
- Return type:
- Raises:
ValueError – If a newly selected index is already in selected_idxs.
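The greedy loop, including pre-selected indices and the duplicate check, can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the anomalib implementation: `greedy_k_center_idxs` and its `seed` argument are hypothetical, the random starting point used when nothing is pre-selected is an assumption, and the dimensionality reduction step is omitted.

```python
import math
import random


def greedy_k_center_idxs(points, coreset_size, selected_idxs=None, seed=0):
    """Greedy k-center selection over a list of points (illustrative sketch)."""
    selected_idxs = list(selected_idxs or [])
    # Distance from every point to its nearest center chosen so far.
    min_distances = [math.inf] * len(points)

    def add_center(idx):
        for i, point in enumerate(points):
            min_distances[i] = min(min_distances[i], math.dist(point, points[idx]))

    # Pre-selected samples act as existing centers.
    for idx in selected_idxs:
        add_center(idx)

    new_idxs = []
    # Assumption: with no pre-selected centers, start from a random point.
    if not selected_idxs and coreset_size > 0:
        idx = random.Random(seed).randrange(len(points))
        add_center(idx)
        new_idxs.append(idx)

    while len(new_idxs) < coreset_size:
        # Pick the point farthest from all current centers.
        idx = max(range(len(points)), key=min_distances.__getitem__)
        if idx in selected_idxs or idx in new_idxs:
            raise ValueError(f"Index {idx} is already selected.")
        add_center(idx)
        new_idxs.append(idx)
    return new_idxs


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0)]
idxs = greedy_k_center_idxs(points, coreset_size=2, selected_idxs=[0])
print(idxs)  # [4, 3]
```

In this sketch the ValueError only triggers when every remaining point coincides with an existing center, for example when more samples are requested than there are distinct points.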