Sampling Components

Sampling methods for anomaly detection models.

This module provides sampling techniques used in anomaly detection models to select representative samples from datasets.

Classes:
KCenterGreedy: K-center greedy sampling algorithm that selects diverse and representative samples.

Example

>>> import torch
>>> from anomalib.models.components.sampling import KCenterGreedy
>>> # Create a sampler over the feature embeddings
>>> features = torch.randn(100, 512)  # 100 samples with 512 dimensions
>>> sampler = KCenterGreedy(embedding=features, sampling_ratio=0.1)
>>> # Select the indices of the coreset samples (10% of the embedding)
>>> selected_idx = sampler.select_coreset_idxs()
class anomalib.models.components.sampling.KCenterGreedy(embedding, sampling_ratio)

Bases: object

k-center-greedy method for coreset selection.

This class implements the k-center-greedy method to select a coreset from an embedding space. The method aims to minimize the maximum distance between any point and its nearest center.
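
To make the objective concrete, the following minimal sketch computes the quantity that k-center selection tries to keep small: the largest distance from any point to its nearest selected center. It uses plain Python tuples and Euclidean distance rather than torch tensors, and the helper name `covering_radius` is hypothetical, not part of anomalib.

```python
import math


def covering_radius(points, center_idxs):
    """Largest distance from any point to its nearest selected center."""
    return max(
        min(math.dist(p, points[c]) for c in center_idxs)
        for p in points
    )


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]

# Two centers in the same cluster leave the far cluster uncovered.
print(covering_radius(points, [0, 1]))  # 10.0
# One center per cluster covers every point within distance 1.
print(covering_radius(points, [0, 2]))  # 1.0
```

A good coreset is one for which this value is small, which is why spreading the centers across the embedding space is preferred over picking neighbouring points.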

Parameters:
  • embedding (torch.Tensor) – Embedding tensor extracted from a CNN.

  • sampling_ratio (float) – Ratio to determine coreset size from embedding size.

embedding

Input embedding tensor.

Type:

torch.Tensor

coreset_size

Size of the coreset to be selected.

Type:

int

model

Dimensionality reduction model.

Type:

SparseRandomProjection

features

Transformed features after dimensionality reduction.

Type:

torch.Tensor

min_distances

Minimum distances to cluster centers.

Type:

torch.Tensor

n_observations

Number of observations in the embedding.

Type:

int

Example

>>> import torch
>>> embedding = torch.randn(219520, 1536)
>>> sampler = KCenterGreedy(embedding=embedding, sampling_ratio=0.001)
>>> sampled_idxs = sampler.select_coreset_idxs()
>>> coreset = embedding[sampled_idxs]
>>> coreset.shape
torch.Size([219, 1536])
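
The coreset size of 219 in this example is consistent with truncating the product of the number of observations and the sampling ratio. A minimal arithmetic check, assuming truncation toward zero (an assumption about how the size is derived, not something this page states):

```python
n_observations = 219520
sampling_ratio = 0.001

# 219520 * 0.001 = 219.52, truncated to an integer (assumed rounding rule)
coreset_size = int(n_observations * sampling_ratio)
print(coreset_size)  # 219
```
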
get_new_idx()

Get the index of the next sample: the point whose minimum distance to the already selected centers is largest.

Returns:

Index of the selected sample.

Return type:

int

Raises:

TypeError – If self.min_distances is not a torch.Tensor.
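
The selection rule can be sketched as an argmax over the current minimum distances. This is a hypothetical standalone function over a plain Python list, not the method itself; it guards against `None` where the documented method checks for a `torch.Tensor`.

```python
def get_new_idx(min_distances):
    """Return the index of the point farthest from its nearest center."""
    if min_distances is None:
        # Stand-in for the documented TypeError: distances must be
        # initialised before an index can be chosen.
        raise TypeError("min_distances must be set before selecting an index")
    return max(range(len(min_distances)), key=min_distances.__getitem__)


print(get_new_idx([0.0, 1.0, 4.5, 0.5]))  # 2
```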

reset_distances()

Reset minimum distances to None.

Return type:

None

sample_coreset(selected_idxs=None)

Select coreset from the embedding.

Parameters:

selected_idxs (list[int] | None, optional) – Indices of pre-selected samples. Defaults to None.

Returns:

Selected coreset.

Return type:

torch.Tensor

Example

>>> import torch
>>> embedding = torch.randn(219520, 1536)
>>> sampler = KCenterGreedy(embedding=embedding, sampling_ratio=0.001)
>>> coreset = sampler.sample_coreset()
>>> coreset.shape
torch.Size([219, 1536])
select_coreset_idxs(selected_idxs=None)

Greedily form a coreset to minimize maximum distance to cluster centers.

Parameters:

selected_idxs (list[int] | None, optional) – Indices of pre-selected samples. Defaults to None.

Returns:

Indices of samples selected to minimize distance to cluster centers.

Return type:

list[int]

Raises:

ValueError – If a newly selected index is already in selected_idxs.
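
The greedy procedure as a whole can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not anomalib's code: it works on coordinate tuples, skips dimensionality reduction and pre-selected indices, and uses a hypothetical `start_idx` parameter so the result is reproducible.

```python
import math


def select_coreset_idxs(points, coreset_size, start_idx=0):
    """Greedy farthest-point selection over a list of coordinate tuples."""
    selected = []
    # Distance from every point to its nearest selected center so far.
    min_distances = [math.inf] * len(points)
    idx = start_idx
    for _ in range(coreset_size):
        if idx in selected:
            raise ValueError("newly selected index is already in the coreset")
        selected.append(idx)
        # Fold the newly chosen center into the running minimum distances.
        min_distances = [
            min(d, math.dist(p, points[idx]))
            for d, p in zip(min_distances, points)
        ]
        # Next center: the point currently farthest from all chosen centers.
        idx = max(range(len(points)), key=min_distances.__getitem__)
    return selected


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (5.0, 0.0)]
print(select_coreset_idxs(points, coreset_size=3))  # [0, 3, 4]
```

Starting from index 0, the sketch first picks the farthest point (index 3), then the point that is farthest from both chosen centers (index 4), so the three centers spread across the whole range.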

update_distances(cluster_centers)

Update minimum distances given cluster centers.

Parameters:

cluster_centers (list[int]) – Indices of cluster centers.

Return type:

None
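
A minimal sketch of the update step, assuming Euclidean distance and a single new center per call. The function below is a hypothetical standalone version over plain Python lists, not the method's actual implementation: each point keeps the smaller of its previous minimum distance and its distance to the new center.

```python
import math


def update_distances(points, min_distances, center_idx):
    """Keep, per point, the smaller of its old and new center distance."""
    new = [math.dist(p, points[center_idx]) for p in points]
    if min_distances is None:
        # First center: nothing to compare against yet.
        return new
    return [min(old, d) for old, d in zip(min_distances, new)]


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d = update_distances(points, None, 0)
print(d)  # [0.0, 5.0, 10.0]
d = update_distances(points, d, 2)
print(d)  # [0.0, 5.0, 0.0]
```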