Sampling Components

Sampling methods for anomaly detection models.

This module provides sampling techniques used in anomaly detection models to select representative samples from datasets.

Classes:
KCenterGreedy: K-center greedy sampling algorithm that selects diverse and representative samples.

Example

>>> import torch
>>> from anomalib.models.components.sampling import KCenterGreedy
>>> # Feature embeddings: 100 samples with 512 dimensions
>>> features = torch.randn(100, 512)
>>> # Create a sampler that keeps 10% of the samples
>>> sampler = KCenterGreedy(embedding=features, sampling_ratio=0.1)
>>> selected_idx = sampler.select_coreset_idxs()
>>> len(selected_idx)
10
class anomalib.models.components.sampling.KCenterGreedy(embedding, sampling_ratio)

Bases: object

k-center-greedy method for coreset selection.

This class implements the k-center-greedy method to select a coreset from an embedding space. The method aims to minimize the maximum distance between any point and its nearest center.
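To make the objective concrete, the following plain-Python sketch computes the covering radius of a set of centers: the largest distance from any point to its nearest center, which is the quantity k-center selection tries to minimize. It uses lists of floats instead of torch.Tensor and Euclidean distance via math.dist; the helper name covering_radius is hypothetical and not part of anomalib.

```python
import math
from typing import List


def covering_radius(points: List[List[float]], centers: List[int]) -> float:
    """Largest Euclidean distance from any point to its nearest center."""
    if not centers:
        raise ValueError("at least one center is required")
    return max(
        # distance from point p to the closest of the chosen centers
        min(math.dist(p, points[c]) for c in centers)
        for p in points
    )


# Four points on a line: picking both ends leaves the middle points 1.0 away.
pts = [[0.0], [1.0], [2.0], [3.0]]
print(covering_radius(pts, [0, 3]))  # 1.0
print(covering_radius(pts, [0]))     # 3.0
```

A greedy selector never computes this value directly; it is shown only to make explicit what "minimize the maximum distance" measures.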

Parameters:
  • embedding (torch.Tensor) – Embedding tensor extracted from a CNN.

  • sampling_ratio (float) – Ratio to determine coreset size from embedding size.
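A minimal sketch of how sampling_ratio could translate into a coreset size, assuming the product of the number of rows and the ratio is truncated with int(). That assumption matches the class example below, where 219520 rows at a ratio of 0.001 yield 219 samples. The function coreset_size is hypothetical.

```python
def coreset_size(n_observations: int, sampling_ratio: float) -> int:
    """Hypothetical helper: number of rows to keep for a given ratio."""
    # Assumption: the product is truncated toward zero with int().
    return int(n_observations * sampling_ratio)


print(coreset_size(219520, 0.001))  # 219, as in the class example
print(coreset_size(50, 0.01))       # 0: a tiny ratio can select nothing
```

Under this assumption a very small ratio on a small embedding rounds down to zero samples, so it may be worth checking the product before sampling.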

embedding

Input embedding tensor.

Type:

torch.Tensor

coreset_size

Size of the coreset to be selected.

Type:

int

model

Dimensionality reduction model.

Type:

SparseRandomProjection

features

Transformed features after dimensionality reduction.

Type:

torch.Tensor

min_distances

Minimum distances to cluster centers.

Type:

torch.Tensor

n_observations

Number of observations in the embedding.

Type:

int

Example

>>> import torch
>>> embedding = torch.randn(219520, 1536)
>>> sampler = KCenterGreedy(embedding=embedding, sampling_ratio=0.001)
>>> sampled_idxs = sampler.select_coreset_idxs()
>>> coreset = embedding[sampled_idxs]
>>> coreset.shape
torch.Size([219, 1536])
get_new_idx()

Get the index of the next sample: the point whose minimum distance to the current centers is largest.

Returns:

Index of the selected sample, returned as a tensor rather than a Python int.

Return type:

torch.Tensor

Raises:

TypeError – If self.min_distances is not a torch.Tensor (for example, after reset_distances() has set it to None).
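The selection rule can be sketched in plain Python as an argmax over per-point minimum distances. This is a hypothetical analogue that uses a list of floats in place of self.min_distances; unlike the documented method it returns a plain int, and it raises TypeError when no distances are available, mirroring the documented error.

```python
from typing import List, Optional


def get_new_idx(min_distances: Optional[List[float]]) -> int:
    """Hypothetical analogue of get_new_idx() over a plain list."""
    if not isinstance(min_distances, list):
        # Mirrors the documented TypeError when distances are missing.
        raise TypeError("min_distances must be computed before selecting a new index")
    if not min_distances:
        raise ValueError("min_distances is empty")
    # Index of the point farthest from its nearest center; max() keeps the first on ties.
    return max(range(len(min_distances)), key=min_distances.__getitem__)


print(get_new_idx([0.0, 2.5, 1.0, 2.5]))  # 1
```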

reset_distances()

Reset minimum distances to None.

Return type:

None

sample_coreset()

Select coreset from the embedding.

Returns:

Selected coreset.

Return type:

torch.Tensor

Example

>>> import torch
>>> embedding = torch.randn(219520, 1536)
>>> sampler = KCenterGreedy(embedding=embedding, sampling_ratio=0.001)
>>> coreset = sampler.sample_coreset()
>>> coreset.shape
torch.Size([219, 1536])
select_coreset_idxs()

Greedily select coreset indices to minimize maximum distance to centers.

The algorithm iteratively selects points that are farthest from the already selected centers, starting from a random initial point.

Returns:

Indices of samples selected to form the coreset.

Return type:

list[int]
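The greedy loop described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not anomalib's implementation: it works on raw lists of floats with no random projection step, uses Euclidean distance, and counts the random starting point as the first center (the textbook farthest-point variant). The documentation above does not say whether anomalib includes that starting point in the returned indices. The start and seed parameters are hypothetical additions for reproducible runs.

```python
import math
import random
from typing import List, Optional


def k_center_greedy(
    points: List[List[float]],
    k: int,
    start: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Farthest-point (greedy k-center) selection over a list of vectors."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    if start is None:
        start = random.Random(seed).randrange(n)  # random initial point
    selected = [start]
    # Distance from every point to its nearest selected center so far.
    min_distances = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Next center: the point that is currently worst covered.
        idx = max(range(n), key=min_distances.__getitem__)
        selected.append(idx)
        # Only the new center can shrink a point's minimum distance.
        center = points[idx]
        for i, p in enumerate(points):
            min_distances[i] = min(min_distances[i], math.dist(p, center))
    return selected


pts = [[0.0], [10.0], [1.0], [9.0], [5.0]]
print(k_center_greedy(pts, k=3, start=4))  # [4, 0, 1]
```

Each iteration costs one pass over all points, which is why projecting the embedding to fewer dimensions first can pay off on large inputs. With duplicate points and k larger than the number of distinct points, this sketch can return a repeated index.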

update_distances(cluster_center)

Update minimum distances given a single cluster center.

Parameters:

cluster_center (int | torch.Tensor | None) – Index of a single cluster center. Can be an int, a 0-d tensor, or a 1-d tensor with shape [1].

Return type:

None

Note

This method is optimized for single-center updates. Passing multiple indices may result in incorrect behavior or runtime errors.
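A plain-Python sketch of the single-center update: each point's minimum distance becomes the smaller of its previous value and its distance to the new center. The function is hypothetical and returns a new list rather than mutating state. A one-element list stands in for a 1-d tensor of shape [1]; treating None as "no update" and raising on multiple indices are assumptions of this sketch, the latter chosen to fail loudly where the Note above warns of incorrect behavior.

```python
import math
from typing import List, Optional, Sequence, Union


def update_distances(
    features: List[List[float]],
    min_distances: Optional[List[float]],
    cluster_center: Union[int, Sequence[int], None],
) -> Optional[List[float]]:
    """Hypothetical single-center update; returns new minima instead of mutating."""
    if cluster_center is None:
        return min_distances  # assumption: None means "no new center"
    if not isinstance(cluster_center, int):
        # Stand-in for a 1-d tensor of shape [1]; reject anything longer.
        if len(cluster_center) != 1:
            raise ValueError("expected exactly one cluster center")
        cluster_center = cluster_center[0]
    center = features[cluster_center]
    distances = [math.dist(p, center) for p in features]
    if min_distances is None:
        return distances  # first center: its distances are the minima
    return [min(old, new) for old, new in zip(min_distances, distances)]


feats = [[0.0], [4.0], [10.0]]
md = update_distances(feats, None, 0)  # [0.0, 4.0, 10.0]
md = update_distances(feats, md, [2])  # [0.0, 4.0, 0.0]
```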