Cluster

Clustering algorithm implementations using PyTorch.

This module provides clustering algorithms implemented in PyTorch for anomaly detection tasks.

Classes:

  • GaussianMixture: Gaussian Mixture Model for density estimation and clustering.

  • KMeans: K-Means clustering algorithm.

Example

>>> import torch
>>> from anomalib.models.components.cluster import GaussianMixture, KMeans
>>> # Create and fit a GMM
>>> gmm = GaussianMixture(n_components=3)
>>> features = torch.randn(100, 10)  # Example features
>>> gmm.fit(features)
>>> # Create and fit KMeans
>>> kmeans = KMeans(n_clusters=5)
>>> labels, centers = kmeans.fit(features)

class anomalib.models.components.cluster.GaussianMixture(n_components, n_iter=100, tol=0.001)

Bases: DynamicBufferMixin

Gaussian Mixture Model that represents the data as a weighted mixture of Gaussian distributions, fitted with the EM algorithm.

Parameters:
  • n_components (int) – Number of Gaussian components to fit.

  • n_iter (int, optional) – Maximum number of EM iterations. Defaults to 100.

  • tol (float, optional) – Convergence threshold on the change in log-likelihood between EM iterations. Defaults to 1e-3.

means

Means of the Gaussian components. Shape: (n_components, n_features).

Type:

torch.Tensor

covariances

Covariance matrices of components. Shape: (n_components, n_features, n_features).

Type:

torch.Tensor

weights

Mixing weights of components. Shape: (n_components,).

Type:

torch.Tensor

Example

>>> import torch
>>> from anomalib.models.components.cluster import GaussianMixture
>>> # Create synthetic data with two clusters
>>> data = torch.tensor([
...     [2, 1], [2, 2], [2, 3],  # Cluster 1
...     [7, 5], [8, 5], [9, 5],  # Cluster 2
... ]).float()
>>> # Initialize and fit GMM
>>> model = GaussianMixture(n_components=2)
>>> model.fit(data)
>>> # Get cluster means
>>> model.means
tensor([[8., 5.],
        [2., 2.]])
>>> # Predict cluster assignments
>>> model.predict(data)
tensor([1, 1, 1, 0, 0, 0])
>>> # Get log-likelihood scores
>>> model.score_samples(data)
tensor([3.8295, 4.5795, 3.8295, 3.8295, 4.5795, 3.8295])

fit(data)

Fit the GMM to the input data using the expectation-maximization (EM) algorithm.

Parameters:

data (torch.Tensor) – Input data to fit the model to. Shape: (n_samples, n_features).

Return type:

None
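
The EM procedure that fit refers to can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not anomalib's implementation: the helper name em_fit, the eps regularizer, the initialization (evenly spaced samples as means, identity covariances), and the use of the mean log-likelihood for the tol check are all choices made for this example.

```python
import math

import torch
from torch.distributions import MultivariateNormal


def em_fit(data, n_components, n_iter=100, tol=1e-3, eps=1e-6):
    """Illustrative EM loop for a full-covariance GMM (hypothetical helper)."""
    n_samples, n_features = data.shape
    eye = torch.eye(n_features, dtype=data.dtype)

    # Assumed initialization: evenly spaced samples as means, identity covariances.
    idx = torch.linspace(0, n_samples - 1, n_components).long()
    means = data[idx].clone()
    covariances = eye.repeat(n_components, 1, 1)
    weights = torch.full((n_components,), 1.0 / n_components, dtype=data.dtype)

    previous = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities from the weighted component log-densities.
        dist = MultivariateNormal(means, covariance_matrix=covariances)
        weighted = dist.log_prob(data.unsqueeze(1)) + weights.log()  # (n_samples, n_components)
        log_likelihood = torch.logsumexp(weighted, dim=1)  # (n_samples,)
        resp = torch.exp(weighted - log_likelihood.unsqueeze(1))

        # M-step: re-estimate weights, means, and covariances.
        counts = resp.sum(dim=0) + eps  # (n_components,)
        weights = counts / counts.sum()
        means = (resp.T @ data) / counts.unsqueeze(1)
        diff = data.unsqueeze(1) - means  # (n_samples, n_components, n_features)
        covariances = torch.einsum("nk,nki,nkj->kij", resp, diff, diff)
        covariances = covariances / counts.view(-1, 1, 1) + eps * eye

        # Stop once the mean log-likelihood changes by less than tol.
        current = log_likelihood.mean().item()
        if abs(current - previous) < tol:
            break
        previous = current
    return means, covariances, weights
```

The returned tensors have the shapes documented for means, covariances, and weights. The eps term keeps each covariance matrix positive definite when a component collapses onto nearly identical points.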

predict(data)

Predict cluster assignments for the input data.

Parameters:

data (torch.Tensor) – Input samples. Shape: (n_samples, n_features).

Returns:

Predicted cluster labels.

Shape: (n_samples,).

Return type:

torch.Tensor
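
One common way to turn fitted mixture parameters into cluster labels is to pick, for each sample, the component with the highest weighted log-density. The sketch below shows that rule in plain PyTorch. The helper predict_labels and the hand-written parameters are assumptions for illustration and are not taken from anomalib's source.

```python
import torch
from torch.distributions import MultivariateNormal


def predict_labels(data, means, covariances, weights):
    """Assign each sample to its most probable component (hypothetical helper)."""
    dist = MultivariateNormal(means, covariance_matrix=covariances)
    # Weighted log-density of every sample under every component: (n_samples, n_components).
    weighted_log_prob = dist.log_prob(data.unsqueeze(1)) + weights.log()
    return weighted_log_prob.argmax(dim=1)


# Hand-written parameters: two equally weighted components with identity covariance.
means = torch.tensor([[8.0, 5.0], [2.0, 2.0]])
covariances = torch.eye(2).repeat(2, 1, 1)
weights = torch.tensor([0.5, 0.5])

data = torch.tensor([[2.0, 1.0], [2.0, 3.0], [7.0, 5.0], [9.0, 5.0]])
labels = predict_labels(data, means, covariances, weights)
print(labels)  # tensor([1, 1, 0, 0])
```

With equal weights and equal covariances, this rule reduces to choosing the nearest mean, which is why the first two samples land on component 1.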

score_samples(data)

Compute per-sample log-likelihood scores.

Parameters:

data (torch.Tensor) – Input samples to score. Shape: (n_samples, n_features).

Returns:

Log-likelihood scores.

Shape: (n_samples,).

Return type:

torch.Tensor
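
For a Gaussian mixture, the per-sample log-likelihood is the log of the weighted sum of component densities, usually evaluated in log space for numerical stability. The sketch below shows that computation and one way such scores might be used for anomaly detection. The helper log_likelihood_scores and the threshold value are hypothetical and are not part of anomalib's API.

```python
import math

import torch
from torch.distributions import MultivariateNormal


def log_likelihood_scores(data, means, covariances, weights):
    """Per-sample log-likelihood under a Gaussian mixture (hypothetical helper)."""
    dist = MultivariateNormal(means, covariance_matrix=covariances)
    # log(w_k) + log N(x | mu_k, Sigma_k) for each component: (n_samples, n_components).
    weighted_log_prob = dist.log_prob(data.unsqueeze(1)) + weights.log()
    # Sum over components in log space: (n_samples,).
    return torch.logsumexp(weighted_log_prob, dim=1)


# One standard-normal component in 2D, so the score at the origin is -log(2 * pi).
means = torch.zeros(1, 2)
covariances = torch.eye(2).unsqueeze(0)
weights = torch.ones(1)
data = torch.tensor([[0.0, 0.0], [6.0, 6.0]])
scores = log_likelihood_scores(data, means, covariances, weights)

# Hypothetical anomaly rule: flag samples whose score falls below a chosen threshold.
threshold = -10.0
is_anomalous = scores < threshold
```

Scores are log-densities, so they can be positive when a component is tightly concentrated. Lower scores indicate samples that the fitted mixture considers less likely.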

class anomalib.models.components.cluster.KMeans(n_clusters, max_iter=10)

Bases: object

K-means clustering algorithm implementation.

Parameters:
  • n_clusters (int) – Number of clusters to partition the data into.

  • max_iter (int, optional) – Maximum number of iterations for the clustering algorithm. Defaults to 10.

cluster_centers_

Coordinates of cluster centers after fitting. Shape: (n_clusters, n_features).

Type:

torch.Tensor

labels_

Cluster labels for the training data after fitting. Shape: (n_samples,).

Type:

torch.Tensor

Example

>>> import torch
>>> from anomalib.models.components.cluster import KMeans
>>> kmeans = KMeans(n_clusters=3)
>>> data = torch.randn(100, 5)  # 100 samples, 5 features
>>> labels, centers = kmeans.fit(data)
>>> print(f"Cluster assignments shape: {labels.shape}")
>>> print(f"Cluster centers shape: {centers.shape}")

fit(inputs)

Fit the K-means algorithm to the input data.

Parameters:

inputs (torch.Tensor) – Input data to cluster. Shape: (n_samples, n_features).

Returns:

Tuple containing:
  • labels: Cluster assignments for each input point. Shape: (n_samples,)

  • cluster_centers: Coordinates of the cluster centers. Shape: (n_clusters, n_features)

Return type:

tuple[torch.Tensor, torch.Tensor]

Raises:

ValueError – If n_clusters is less than or equal to 0.

Example

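>>> import torch
>>> from anomalib.models.components.cluster import KMeans
>>> kmeans = KMeans(n_clusters=2)
>>> data = torch.tensor([[1.0, 2.0], [4.0, 5.0], [1.2, 2.1]])
>>> labels, centers = kmeans.fit(data)
>>> # Count the points assigned to each cluster
>>> counts = [(labels == i).sum().item() for i in range(2)]
>>> print(f"Number of points in each cluster: {counts}")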
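
The fitting behavior described for fit (validate n_clusters, then iterate up to max_iter times and return labels with centers) matches the shape of the classic Lloyd's algorithm, sketched below in plain PyTorch. This is an illustration under assumptions and not anomalib's code: the helper name lloyd_kmeans, the random-sample initialization, and the empty-cluster handling are choices made for this example.

```python
import torch


def lloyd_kmeans(inputs, n_clusters, max_iter=10):
    """Illustrative Lloyd's-algorithm loop (hypothetical helper)."""
    if n_clusters <= 0:
        raise ValueError("n_clusters must be a positive integer.")

    # Assumed initialization: pick n_clusters random samples as starting centers.
    centers = inputs[torch.randperm(inputs.shape[0])[:n_clusters]].clone()

    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = torch.cdist(inputs, centers).argmin(dim=1)
        # Update step: move each center to the mean of its assigned points.
        for k in range(n_clusters):
            members = inputs[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[k] = members.mean(dim=0)

    # Final assignment against the updated centers.
    labels = torch.cdist(inputs, centers).argmin(dim=1)
    return labels, centers


data = torch.tensor([[1.0, 2.0], [4.0, 5.0], [1.2, 2.1]])
labels, centers = lloyd_kmeans(data, n_clusters=2)
counts = sorted((labels == i).sum().item() for i in range(2))
print(counts)  # [1, 2]
```

Which cluster index receives the two nearby points depends on the random initialization, so the example prints sorted counts rather than raw labels.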
predict(inputs)

Predict cluster labels for input data.

Parameters:

inputs (torch.Tensor) – Input data to assign to clusters. Shape: (n_samples, n_features).

Returns:

Predicted cluster labels.

Shape: (n_samples,).

Return type:

torch.Tensor

Raises:

AttributeError – If called before fitting the model.

Example

>>> import torch
>>> from anomalib.models.components.cluster import KMeans
>>> kmeans = KMeans(n_clusters=2)
>>> # First fit the model
>>> train_data = torch.tensor([[1.0, 2.0], [4.0, 5.0]])
>>> labels, centers = kmeans.fit(train_data)
>>> # Then predict on new data
>>> new_data = torch.tensor([[1.1, 2.1], [3.9, 4.8]])
>>> predictions = kmeans.predict(new_data)
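
Assigning new samples to fitted clusters typically means finding the nearest center for each sample. A minimal sketch of that rule follows, assuming Euclidean distance. The helper assign_to_centers is hypothetical, and the centers are written by hand in place of a fitted cluster_centers_ tensor.

```python
import torch


def assign_to_centers(inputs, cluster_centers):
    """Label each sample with the index of its nearest center (hypothetical helper)."""
    # Pairwise Euclidean distances: (n_samples, n_clusters).
    distances = torch.cdist(inputs, cluster_centers)
    return distances.argmin(dim=1)


# Hand-written centers standing in for a fitted model's cluster_centers_.
centers = torch.tensor([[1.0, 2.0], [4.0, 5.0]])
new_data = torch.tensor([[1.1, 2.1], [3.9, 4.8]])
predictions = assign_to_centers(new_data, centers)
print(predictions)  # tensor([0, 1])
```

Because this rule needs the centers, it only makes sense after fitting, which is consistent with the AttributeError documented for calling predict on an unfitted model.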