torchgmm.clustering.KMeans#

class torchgmm.clustering.KMeans(num_clusters=1, *, init_strategy='kmeans++', convergence_tolerance=0.0001, batch_size=None, trainer_params=None)#

Model for clustering data into a predefined number of clusters. More information on K-means clustering is available on Wikipedia.
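As a rough usage sketch, the estimator can be fitted and queried with the methods listed on this page. The snippet assumes that `torch` and `torchgmm` are installed and that a 2-D `torch.Tensor` is accepted as tabular input; the helper name `cluster_demo` and the random data are illustrative only.

```python
def cluster_demo():
    """Fit KMeans on random data and return (labels, inertia)."""
    # Imported lazily so the function can be defined without the packages installed.
    import torch
    from torchgmm.clustering import KMeans

    # Assumed input format: 500 datapoints with 2 features each.
    data = torch.randn(500, 2)

    estimator = KMeans(num_clusters=3)
    # fit() returns the fitted estimator, so predict() can be chained.
    labels = estimator.fit(data).predict(data)
    return labels, estimator.inertia_
```

Calling `cluster_demo()` should yield one cluster index per datapoint together with the fitted `inertia_` value.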

Attributes table#

persistent_attributes

Returns the list of fitted attributes that ought to be saved and loaded.

model_

The fitted PyTorch module with all estimated parameters.

converged_

A boolean indicating whether the model converged during training.

num_iter_

The number of iterations the model was fitted for, excluding initialization.

inertia_

The mean squared distance of all datapoints to their closest cluster centers.

Methods table#

clone()

Clones the estimator without copying any fitted attributes.

fit(data)

Fits the KMeans model on the provided data by running Lloyd's algorithm.

fit_predict(data)

Fits the estimator using the provided data and subsequently predicts the labels for the data using the fitted estimator.

fit_transform(data)

Fits the estimator using the provided data and subsequently transforms the data using the fitted estimator.

get_params([deep])

Returns the estimator's parameters as passed to the initializer.

load(path)

Loads the estimator and (if available) the fitted model.

load_attributes(path)

Loads the fitted attributes that are stored at the provided path.

load_parameters(path)

Initializes this estimator by loading its parameters.

predict(data)

Predicts the closest cluster for each item provided.

save(path)

Saves the estimator to the provided directory.

save_attributes(path)

Saves the fitted attributes of this estimator.

save_parameters(path)

Saves the parameters of this estimator.

score(data)

Computes the average inertia of all the provided datapoints.

score_samples(data)

Computes the inertia for each of the provided datapoints.

set_params(values)

Sets the provided values on the estimator.

trainer(**kwargs)

Returns the trainer as configured by the estimator.

transform(data)

Transforms the provided data into the cluster-distance space.

Attributes#

KMeans.persistent_attributes#

Returns the list of fitted attributes that ought to be saved and loaded.

By default, this encompasses all annotations.

KMeans.model_: KMeansModel#

The fitted PyTorch module with all estimated parameters.

KMeans.converged_: bool#

A boolean indicating whether the model converged during training.

KMeans.num_iter_: int#

The number of iterations the model was fitted for, excluding initialization.

KMeans.inertia_: float#

The mean squared distance of all datapoints to their closest cluster centers.

Methods#

KMeans.clone()#

Clones the estimator without copying any fitted attributes. All parameters of this estimator are copied via copy.deepcopy().

Return type:

TypeVar(E, bound= BaseEstimator)

Returns:

The cloned estimator with the same parameters.
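The cloning semantics can be illustrated with a toy class. `ToyEstimator` below is a hypothetical stand-in, not a TorchGMM class; it only mirrors the documented behavior that init parameters are deep-copied while fitted attributes are left behind.

```python
import copy


class ToyEstimator:
    """Hypothetical stand-in for an estimator; not part of TorchGMM."""

    def __init__(self, num_clusters=1, trainer_params=None):
        self.num_clusters = num_clusters
        self.trainer_params = trainer_params

    def get_params(self):
        return {
            "num_clusters": self.num_clusters,
            "trainer_params": self.trainer_params,
        }

    def clone(self):
        # Deep-copy the init parameters and build a fresh, unfitted instance.
        return type(self)(**copy.deepcopy(self.get_params()))


original = ToyEstimator(num_clusters=3, trainer_params={"max_epochs": 10})
original.inertia_ = 1.5  # pretend the estimator has been fitted
duplicate = original.clone()
```

Here `duplicate` carries the same parameters as `original` but has no `inertia_`, and its `trainer_params` dictionary is a separate object.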

KMeans.fit(data)#

Fits the KMeans model on the provided data by running Lloyd’s algorithm.

Args:

data: The tabular data to fit on. The dimensionality of the KMeans model is automatically inferred from this data.

Return type:

KMeans

Returns:

The fitted KMeans model.
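For readers unfamiliar with Lloyd's algorithm, the following pure-Python sketch shows its two alternating steps. It is not TorchGMM's implementation: the function name `lloyd`, the `max_iter` argument, and the stopping rule (largest centroid shift at most `tolerance`) are assumptions made for illustration, and the library may apply `convergence_tolerance` differently.

```python
import math


def lloyd(points, centroids, tolerance=1e-4, max_iter=100):
    """Run Lloyd's algorithm on a list of points (tuples of floats).

    Returns (centroids, labels, num_iter, converged).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for num_iter in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster where it is
        # Assumed convergence criterion: the largest centroid movement.
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tolerance:
            return centroids, labels, num_iter, True
    return centroids, labels, max_iter, False


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels, num_iter, converged = lloyd(points, [(0.0, 0.0), (10.0, 10.0)])
```

The returned values loosely correspond to what the estimator exposes after fitting: the centroids live in `model_`, while `num_iter` and `converged` play the roles of `num_iter_` and `converged_`.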

KMeans.fit_predict(data)#

Fits the estimator using the provided data and subsequently predicts the labels for the data using the fitted estimator. It simply chains calls to fit() and predict().

Args:

data: The data to use for fitting and to predict labels for. The data must have the same type as for the fit() method.

Return type:

TypeVar(R_co, covariant=True)

Returns:

The predicted labels. Consult the predict() documentation for more information on the return type.

KMeans.fit_transform(data)#

Fits the estimator using the provided data and subsequently transforms the data using the fitted estimator. It simply chains calls to fit() and transform().

Args:

data: The data to use for fitting and to transform. The data must have the same type as for the fit() method.

Return type:

TypeVar(R_co, covariant=True)

Returns:

The transformed data. Consult the transform() documentation for more information on the return type.

KMeans.get_params(deep=True)#

Returns the estimator’s parameters as passed to the initializer.

Args:

deep: Ignored. For Scikit-learn compatibility.

Return type:

dict[str, Any]

Returns:

The mapping from init parameters to values.

classmethod KMeans.load(path)#

Loads the estimator and (if available) the fitted model. This method is only expected to work for an estimator that has previously been saved via save().

Args:

path: The directory from which to load the estimator.

Return type:

TypeVar(E, bound= BaseEstimator)

Returns:

The loaded estimator, either fitted or not.

KMeans.load_attributes(path)#

Loads the fitted attributes that are stored at the provided path. If subclasses overwrite save_attributes(), this method should also be overwritten.

Typically, this method should not be called directly. It is called as part of load().

Return type:

None

Args:

path: The directory from which the fitted attributes should be loaded.

Raises:

FileNotFoundError – If no fitted attributes have been stored.

classmethod KMeans.load_parameters(path)#

Initializes this estimator by loading its parameters. If subclasses overwrite save_parameters(), this method should also be overwritten.

Typically, this method should not be called directly. It is called as part of load().

Return type:

TypeVar(E, bound= BaseEstimator)

Args:

path: The directory from which the parameters should be loaded.

KMeans.predict(data)#

Predicts the closest cluster for each item provided.

Args:

data: The datapoints for which to predict the clusters.

Return type:

Tensor

Returns:

Tensor of shape [num_datapoints] with the index of the closest cluster for each datapoint.

Attention:

When calling this function in a multi-process environment, each process receives only a subset of the predictions. If you want to aggregate predictions, make sure to gather the values returned from this method.

KMeans.save(path)#

Saves the estimator to the provided directory. It saves a file named estimator.pickle for the configuration of the estimator and additional files for the fitted model (if applicable). For more information on the files saved for the fitted model or for more customization, look at get_params() and torchgmm.base.nn.Configurable.save().

Return type:

None

Args:

path: The directory to which all files should be saved.

Note:

This method may be called regardless of whether the estimator has already been fitted.

Attention:

If the dictionary returned by get_params() is not JSON-serializable, this method uses pickle which is not necessarily backwards-compatible.
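The JSON-first, pickle-as-fallback behavior can be sketched as follows. This is an illustration of the general pattern only: the function `save_params` and the file names `params.json` and `params.pickle` are hypothetical and do not reflect the files TorchGMM actually writes.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_params(params, directory):
    """Write params as JSON if possible, otherwise as pickle.

    Returns the path of the file that was written.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    try:
        payload = json.dumps(params)
    except TypeError:
        # Not JSON-serializable: fall back to pickle.
        target = directory / "params.pickle"
        target.write_bytes(pickle.dumps(params))
    else:
        target = directory / "params.json"
        target.write_text(payload)
    return target


with tempfile.TemporaryDirectory() as tmp:
    json_name = save_params({"num_clusters": 3}, Path(tmp) / "a").name
    pickle_name = save_params({"dtype": complex}, Path(tmp) / "b").name
```

Primitive parameters end up in a human-readable JSON file, whereas a value such as a Python type object triggers the pickle branch, which is where the backwards-compatibility caveat applies.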

KMeans.save_attributes(path)#

Saves the fitted attributes of this estimator. By default, it uses JSON and falls back to pickle. Subclasses should overwrite this method if non-primitive attributes are fitted.

Typically, this method should not be called directly. It is called as part of save().

Return type:

None

Args:

path: The directory to which the fitted attributes should be saved.

Raises:

NotFittedError – If the estimator has not been fitted.

KMeans.save_parameters(path)#

Saves the parameters of this estimator. By default, it uses JSON and falls back to pickle. If subclasses use non-primitive types as parameters, they should overwrite this method.

Typically, this method should not be called directly. It is called as part of save().

Return type:

None

Args:

path: The directory to which the parameters should be saved.

KMeans.score(data)#

Computes the average inertia of all the provided datapoints. That is, it computes the mean squared distance to each datapoint’s closest centroid.

Args:

data: The data for which to compute the average inertia.

Return type:

float

Returns:

The average inertia.

Note:

See score_samples() to obtain the inertia for individual datapoints.

KMeans.score_samples(data)#

Computes the inertia for each of the provided datapoints. That is, it computes the squared distance of each datapoint to its closest centroid.

Args:

data: The data for which to compute the inertia values.

Return type:

Tensor

Returns:

A tensor of shape [num_datapoints] with the inertia of each datapoint.

Attention:

When calling this function in a multi-process environment, each process receives only a subset of the predictions. If you want to aggregate predictions, make sure to gather the values returned from this method.
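To make the relation between score_samples() and score() concrete, here is a small pure-Python sketch. It assumes squared Euclidean distance and uses a hypothetical helper `inertia_per_point`; it is not the library's code.

```python
import math


def inertia_per_point(points, centroids):
    """Squared Euclidean distance from each point to its nearest centroid."""
    return [min(math.dist(p, c) ** 2 for c in centroids) for p in points]


centroids = [(0.0, 0.0), (4.0, 0.0)]
points = [(1.0, 0.0), (4.0, 2.0), (0.0, 0.0)]

per_point = inertia_per_point(points, centroids)  # analogous to score_samples()
average = sum(per_point) / len(per_point)         # analogous to score()
```

Averaging the per-datapoint values gives the single number that score() reports.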

KMeans.set_params(values)#

Sets the provided values on the estimator. The estimator is returned as well, but the estimator on which this function is called is also modified.

Args:

values: The values to set.

Return type:

TypeVar(E, bound= BaseEstimator)

Returns:

The estimator where the values have been set.
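The in-place behavior described above can be shown with a toy class. `ToyEstimator` is hypothetical, and the sketch assumes that `values` is a mapping from parameter names to new values.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not part of TorchGMM."""

    def __init__(self, num_clusters=1, batch_size=None):
        self.num_clusters = num_clusters
        self.batch_size = batch_size

    def get_params(self, deep=True):
        # `deep` is accepted but ignored, as in the documented signature.
        return {"num_clusters": self.num_clusters, "batch_size": self.batch_size}

    def set_params(self, values):
        # Mutate this instance and return it so that calls can be chained.
        for name, value in values.items():
            setattr(self, name, value)
        return self


estimator = ToyEstimator()
returned = estimator.set_params({"num_clusters": 4})
```

Because `returned` is the same object as `estimator`, code that keeps a reference to the original estimator also sees the updated parameters.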

KMeans.trainer(**kwargs)#

Returns the trainer as configured by the estimator. Typically, this method is only called by functions in the estimator.

Args:

kwargs: Additional arguments that override the trainer arguments registered in the initializer of the estimator.

Return type:

Trainer

Returns:

A fully initialized PyTorch Lightning trainer.

Note:

This function should be preferred over initializing the trainer directly. It ensures that the returned trainer correctly deals with TorchGMM components that may be introduced in the future.

KMeans.transform(data)#

Transforms the provided data into the cluster-distance space. That is, it returns the distance of each datapoint to each cluster centroid.

Args:

data: The data to transform.

Return type:

Tensor

Returns:

A tensor of shape [num_datapoints, num_clusters] with the distances to the cluster centroids.

Attention:

When calling this function in a multi-process environment, each process receives only a subset of the predictions. If you want to aggregate predictions, make sure to gather the values returned from this method.
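The cluster-distance space can be pictured as a matrix with one row per datapoint and one column per centroid. The sketch below assumes plain Euclidean distance and uses a hypothetical helper `to_cluster_distance_space`; it is not the library's implementation.

```python
import math


def to_cluster_distance_space(points, centroids):
    """Return a [num_datapoints][num_clusters] nested list of distances."""
    return [[math.dist(p, c) for c in centroids] for p in points]


centroids = [(0.0, 0.0), (3.0, 4.0)]
points = [(0.0, 0.0), (3.0, 0.0)]

distances = to_cluster_distance_space(points, centroids)
# The column index of the smallest entry in each row is the closest cluster.
nearest = [row.index(min(row)) for row in distances]
```

Taking the row-wise argmin of this matrix recovers the kind of cluster index that predict() returns.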