Sampling Components
Sampling methods for anomaly detection models.
This module provides sampling techniques used in anomaly detection models to select representative samples from datasets.
- Classes:
- KCenterGreedy: K-center greedy sampling algorithm that selects diverse and representative samples.
Example
>>> import torch
>>> from anomalib.models.components.sampling import KCenterGreedy
>>> # Create a sampler over 100 samples with 512 dimensions, keeping 10% of them
>>> features = torch.randn(100, 512)
>>> sampler = KCenterGreedy(embedding=features, sampling_ratio=0.1)
>>> # Select the indices of 10 representative samples
>>> selected_idx = sampler.select_coreset_idxs()
- class anomalib.models.components.sampling.KCenterGreedy(embedding, sampling_ratio)
Bases: object
k-center-greedy method for coreset selection.
This class implements the k-center-greedy method to select a coreset from an embedding space. The method aims to minimize the maximum distance between any point and its nearest center.
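To make the objective concrete, here is a small sketch of the quantity the method tries to keep small: the largest distance from any point to its nearest selected center (often called the covering radius). The helper name `covering_radius` is hypothetical and not part of anomalib, and plain Euclidean distance via `torch.cdist` is an assumption about the metric.

```python
from __future__ import annotations

import torch


def covering_radius(points: torch.Tensor, center_idxs: list[int]) -> float:
    """Largest distance from any point to its nearest selected center.

    Hypothetical helper: this is the quantity the k-center objective
    tries to minimize. Euclidean distance is assumed.
    """
    centers = points[center_idxs]          # (k, d) selected centers
    dists = torch.cdist(points, centers)   # (n, k) pairwise Euclidean distances
    nearest = dists.min(dim=1).values      # (n,) distance to the closest center
    return nearest.max().item()            # worst-covered point


points = torch.tensor([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
print(covering_radius(points, [0]))     # the far point at x=10 dominates
print(covering_radius(points, [0, 2]))  # adding it as a center shrinks the radius
```

Adding a center never increases this value, which is why greedily adding the currently worst-covered point is a natural strategy.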
- Parameters:
embedding (torch.Tensor) – Embedding tensor extracted from a CNN.
sampling_ratio (float) – Ratio to determine coreset size from embedding size.
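A worked sketch of how `sampling_ratio` maps to a coreset size follows. It assumes the size is `n_samples * sampling_ratio` truncated toward zero with `int()`. That assumption is consistent with the 219520 × 0.001 → 219 rows shown in this class's examples, but the exact rounding rule is not stated here. `coreset_size` is a hypothetical helper.

```python
def coreset_size(n_samples: int, sampling_ratio: float) -> int:
    """Hypothetical helper: number of rows kept for a given sampling_ratio.

    Assumes truncation toward zero, matching 219520 x 0.001 -> 219.
    """
    return int(n_samples * sampling_ratio)


print(coreset_size(219520, 0.001))  # 219.52 truncated
print(coreset_size(100, 0.1))
print(coreset_size(100, 0.001))     # 0.1 truncated: an empty coreset
```

Under this assumption, a very small ratio on a small embedding can produce an empty coreset, so the ratio should be chosen with the embedding size in mind.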
- embedding
Input embedding tensor.
- Type:
torch.Tensor
- model
Dimensionality reduction model.
- Type:
- features
Transformed features after dimensionality reduction.
- Type:
torch.Tensor
- min_distances
Minimum distances to cluster centers.
- Type:
torch.Tensor
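The attributes above mention a dimensionality reduction model without naming it. Below is a minimal sketch of one such reduction, a dense Gaussian random projection, chosen purely for illustration. It is not necessarily the model `KCenterGreedy` uses, and `random_project` is a hypothetical name. Random projections approximately preserve pairwise distances (the Johnson-Lindenstrauss property), and fewer dimensions make the many distance computations in greedy selection cheaper.

```python
import torch


def random_project(embedding: torch.Tensor, out_dim: int, seed: int = 0) -> torch.Tensor:
    """Project the rows of `embedding` to `out_dim` dimensions.

    Hypothetical stand-in for the `model` attribute: a dense Gaussian
    random projection, used here only to illustrate dimensionality reduction.
    """
    generator = torch.Generator().manual_seed(seed)
    in_dim = embedding.shape[1]
    # Entries ~ N(0, 1/out_dim), so squared norms are preserved in expectation.
    projection = torch.randn(in_dim, out_dim, generator=generator) / out_dim**0.5
    return embedding @ projection


embedding = torch.randn(1000, 1536)
features = random_project(embedding, 128)
print(features.shape)  # torch.Size([1000, 128])
```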
Example
>>> import torch
>>> from anomalib.models.components.sampling import KCenterGreedy
>>> embedding = torch.randn(219520, 1536)
>>> sampler = KCenterGreedy(embedding=embedding, sampling_ratio=0.001)
>>> sampled_idxs = sampler.select_coreset_idxs()
>>> coreset = embedding[sampled_idxs]
>>> coreset.shape
torch.Size([219, 1536])
- get_new_idx()
Get index of the next sample based on maximum minimum distance.
- Returns:
Index of the selected sample (tensor, not converted to int).
- Return type:
torch.Tensor
- Raises:
TypeError – If self.min_distances is not a torch.Tensor.
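A standalone sketch of the selection step described above: take the argmax of the per-point minimum distances, return it as a 0-d tensor without converting to `int`, and raise `TypeError` when no distances have been computed yet. The free function `get_new_idx(min_distances)` is a hypothetical rewrite of the method, taking the distances as an argument instead of reading `self.min_distances`.

```python
from __future__ import annotations

import torch


def get_new_idx(min_distances: torch.Tensor | None) -> torch.Tensor:
    """Index of the point farthest from its nearest center, as a 0-d tensor.

    Hypothetical standalone version of the method documented above.
    """
    if not isinstance(min_distances, torch.Tensor):
        raise TypeError(f"min_distances must be a torch.Tensor, got {type(min_distances)}")
    # argmax over the flattened tensor returns a 0-d index tensor.
    return torch.argmax(min_distances)


idx = get_new_idx(torch.tensor([0.5, 3.0, 1.0]))
print(idx)  # tensor(1)
```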
- sample_coreset()
Select coreset from the embedding.
- Returns:
Selected coreset.
- Return type:
torch.Tensor
Example
>>> import torch
>>> from anomalib.models.components.sampling import KCenterGreedy
>>> embedding = torch.randn(219520, 1536)
>>> sampler = KCenterGreedy(embedding=embedding, sampling_ratio=0.001)
>>> coreset = sampler.sample_coreset()
>>> coreset.shape
torch.Size([219, 1536])
- select_coreset_idxs()
Greedily select coreset indices to minimize maximum distance to centers.
The algorithm iteratively selects points that are farthest from the already selected centers, starting from a random initial point.
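The greedy loop described above can be sketched in plain PyTorch as classic farthest-point selection. This is a minimal illustration under stated assumptions, not anomalib's implementation. It counts the random starting point as the first center, which this section does not confirm for the library, and it works directly on the given features with Euclidean distance, skipping any dimensionality reduction. `k_center_greedy` is a hypothetical name.

```python
from __future__ import annotations

import torch


def k_center_greedy(features: torch.Tensor, k: int, seed: int = 0) -> list[int]:
    """Select k row indices of `features` by greedy farthest-point selection.

    Hypothetical sketch: the random starting point counts as the first center.
    """
    n = features.shape[0]
    if not 0 < k <= n:
        raise ValueError(f"k must be in [1, {n}], got {k}")
    generator = torch.Generator().manual_seed(seed)
    # Start from a random initial point.
    idx = int(torch.randint(n, (1,), generator=generator).item())
    selected = [idx]
    # Distance from every point to its nearest selected center so far.
    min_distances = torch.linalg.vector_norm(features - features[idx], dim=1)
    while len(selected) < k:
        # The point farthest from all current centers becomes the next center.
        idx = int(torch.argmax(min_distances).item())
        selected.append(idx)
        # Fold the new center into the running minimum distances.
        dist = torch.linalg.vector_norm(features - features[idx], dim=1)
        min_distances = torch.minimum(min_distances, dist)
    return selected


# Three well-separated groups: {0, 1}, {2, 3}, {4}.
points = torch.tensor([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0], [0.0, 10.0]])
print(k_center_greedy(points, 3))  # one index from each group
```

Keeping a running minimum means each iteration costs one pass over the points instead of recomputing distances to every selected center.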
- update_distances(cluster_center)
Update minimum distances given a single cluster center.
- Parameters:
cluster_center (int | torch.Tensor | None) – Index of a single cluster center. Can be an int, a 0-d tensor, or a 1-d tensor with shape [1].
- Return type:
None
Note
This method is optimized for single-center updates. Passing multiple indices may result in incorrect behavior or runtime errors.
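A sketch of a single-center update that accepts the documented input forms: an `int`, a 0-d tensor, or a 1-d tensor of shape `[1]`, with `None` treated as "nothing to update". It is written as a hypothetical free function that returns the new distances rather than mutating state. Raising `ValueError` for multi-element input is a choice made here. The documented method only warns that multiple indices may misbehave.

```python
from __future__ import annotations

import torch


def update_distances(
    features: torch.Tensor,
    min_distances: torch.Tensor | None,
    cluster_center: int | torch.Tensor | None,
) -> torch.Tensor | None:
    """Fold one new center into the per-point minimum distances (hypothetical sketch)."""
    if cluster_center is None:
        return min_distances  # nothing to update
    if isinstance(cluster_center, torch.Tensor):
        # Accept a 0-d tensor or a 1-d tensor of shape [1]; reject anything larger.
        if cluster_center.numel() != 1:
            raise ValueError("expected a single cluster center")
        cluster_center = int(cluster_center.item())
    center = features[cluster_center]                          # (d,)
    dist = torch.linalg.vector_norm(features - center, dim=1)  # (n,)
    if min_distances is None:
        return dist  # first center: distances are just to this point
    return torch.minimum(min_distances, dist)


features = torch.tensor([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
d = update_distances(features, None, 0)               # distances to point 0
d = update_distances(features, d, torch.tensor([2]))  # shape-[1] tensor index
print(d)  # tensor([0., 5., 0.])
```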