torchgmm.clustering.KMeans#
- class torchgmm.clustering.KMeans(num_clusters=1, *, init_strategy='kmeans++', convergence_tolerance=0.0001, batch_size=None, trainer_params=None)#
Model for clustering data into a predefined number of clusters. More information on K-means clustering is available on Wikipedia.
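The default `init_strategy='kmeans++'` refers to the k-means++ seeding scheme. The following pure-Python sketch shows the general idea only and is not TorchGMM's implementation; the function name `kmeans_plus_plus` and the `seed` argument are hypothetical, and the points are assumed to contain at least `num_clusters` distinct values.

```python
import random


def kmeans_plus_plus(points, num_clusters, seed=0):
    """Pick initial centroids, favoring points far from those already chosen."""
    rng = random.Random(seed)
    # The first centroid is drawn uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < num_clusters:
        # Squared distance of every point to its nearest chosen centroid.
        weights = [
            min(sum((x - c) ** 2 for x, c in zip(p, centroid)) for centroid in centroids)
            for p in points
        ]
        # Draw the next centroid with probability proportional to that distance.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
initial = kmeans_plus_plus(points, num_clusters=2)
```

Points that are already centroids have weight zero, so they are never drawn a second time.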
Attributes table#
persistent_attributes | Returns the list of fitted attributes that ought to be saved and loaded.
model_ | The fitted PyTorch module with all estimated parameters.
converged_ | A boolean indicating whether the model converged during training.
num_iter_ | The number of iterations the model was fitted for, excluding initialization.
inertia_ | The mean squared distance of all datapoints to their closest cluster centers.
Methods table#
clone() | Clones the estimator without copying any fitted attributes.
fit(data) | Fits the KMeans model on the provided data by running Lloyd's algorithm.
fit_predict(data) | Fits the estimator using the provided data and subsequently predicts the labels for the data using the fitted estimator.
fit_transform(data) | Fits the estimator using the provided data and subsequently transforms the data using the fitted estimator.
get_params([deep]) | Returns the estimator's parameters as passed to the initializer.
load(path) | Loads the estimator and (if available) the fitted model.
load_attributes(path) | Loads the fitted attributes that are stored at the provided path.
load_parameters(path) | Initializes this estimator by loading its parameters.
predict(data) | Predicts the closest cluster for each item provided.
save(path) | Saves the estimator to the provided directory.
save_attributes(path) | Saves the fitted attributes of this estimator.
save_parameters(path) | Saves the parameters of this estimator.
score(data) | Computes the average inertia of all the provided datapoints.
score_samples(data) | Computes the inertia for each of the provided datapoints.
set_params(values) | Sets the provided values on the estimator.
trainer(**kwargs) | Returns the trainer as configured by the estimator.
transform(data) | Transforms the provided data into the cluster-distance space.
Attributes#
- KMeans.persistent_attributes#
Returns the list of fitted attributes that ought to be saved and loaded.
By default, this encompasses all annotations.
- KMeans.model_: KMeansModel#
The fitted PyTorch module with all estimated parameters.
Methods#
- KMeans.clone()#
Clones the estimator without copying any fitted attributes. All parameters of this estimator are copied via copy.deepcopy().
- Return type:
TypeVar(E, bound= BaseEstimator)
- Returns:
The cloned estimator with the same parameters.
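A minimal sketch of this cloning behavior, assuming an estimator whose initializer arguments can be read back as a dictionary. `SketchEstimator`, the `max_epochs` entry, and the `inertia_` value set on it are hypothetical and only serve the illustration; this is not TorchGMM's code.

```python
import copy


class SketchEstimator:
    """Hypothetical estimator used only to illustrate cloning."""

    def __init__(self, num_clusters=1, trainer_params=None):
        self.num_clusters = num_clusters
        self.trainer_params = trainer_params

    def get_params(self):
        return {"num_clusters": self.num_clusters, "trainer_params": self.trainer_params}

    def clone(self):
        # Deep-copy the initializer parameters; fitted attributes are not carried over.
        return type(self)(**copy.deepcopy(self.get_params()))


original = SketchEstimator(num_clusters=3, trainer_params={"max_epochs": 10})
original.inertia_ = 1.5  # stands in for an attribute set during fitting
duplicate = original.clone()
```

Because the parameters are deep-copied, mutating `duplicate.trainer_params` leaves the original estimator untouched.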
- KMeans.fit(data)#
Fits the KMeans model on the provided data by running Lloyd’s algorithm.
- Args:
data: The tabular data to fit on. The dimensionality of the KMeans model is automatically inferred from this data.
- Return type:
KMeans
- Returns:
The fitted KMeans model.
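As a rough illustration of Lloyd's algorithm, the following pure-Python sketch alternates assignment and centroid-update steps until the centroids stop moving. It is not TorchGMM's implementation: the function name `lloyd`, the stopping rule based on centroid movement, and the use of the first points as initial centroids are assumptions made for the example.

```python
import math


def lloyd(points, num_clusters, tolerance=1e-4, max_iter=100):
    """Illustrative Lloyd's algorithm on a list of equal-length tuples."""
    # Assumption: use the first `num_clusters` points as initial centroids.
    centroids = [tuple(p) for p in points[:num_clusters]]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tolerance:
            return centroids, iteration, True
    return centroids, max_iter, False


data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centroids, num_iter, converged = lloyd(data, num_clusters=2)
```

The three return values loosely mirror the fitted centroids, the iteration count, and the convergence flag described in the attributes table.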
- KMeans.fit_predict(data)#
Fits the estimator using the provided data and subsequently predicts the labels for the data using the fitted estimator. It simply chains calls to fit() and predict().
- Args:
data: The data to use for fitting and to predict labels for. The data must have the same type as for the fit() method.
- KMeans.fit_transform(data)#
Fits the estimator using the provided data and subsequently transforms the data using the fitted estimator. It simply chains calls to fit() and transform().
- Args:
data: The data to use for fitting and to transform. The data must have the same type as for the fit() method.
- Return type:
TypeVar(R_co, covariant=True)
- Returns:
The transformed data. Consult the transform() documentation for more information on the return type.
- KMeans.get_params(deep=True)#
Returns the estimator’s parameters as passed to the initializer.
- Args:
deep: Ignored. For Scikit-learn compatibility.
- classmethod KMeans.load(path)#
Loads the estimator and (if available) the fitted model. This method is only expected to work for an estimator that has previously been saved via save().
- Args:
path: The directory from which to load the estimator.
- Return type:
TypeVar(E, bound= BaseEstimator)
- Returns:
The loaded estimator, either fitted or not.
- KMeans.load_attributes(path)#
Loads the fitted attributes that are stored at the provided path. If subclasses override save_attributes(), this method should also be overridden.
Typically, this method should not be called directly. It is called as part of load().
- Return type:
None
- Args:
path: The directory from which the fitted attributes should be loaded.
- Raises:
FileNotFoundError – If no fitted attributes have been stored.
- classmethod KMeans.load_parameters(path)#
Initializes this estimator by loading its parameters. If subclasses override save_parameters(), this method should also be overridden.
Typically, this method should not be called directly. It is called as part of load().
- Return type:
TypeVar(E, bound= BaseEstimator)
- Args:
path: The directory from which the parameters should be loaded.
- KMeans.predict(data)#
Predicts the closest cluster for each item provided.
- Args:
data: The datapoints for which to predict the clusters.
- Return type:
Tensor
- Returns:
Tensor of shape [num_datapoints] with the index of the closest cluster for each datapoint.
- Attention:
When calling this function in a multi-process environment, each process receives only a subset of the predictions. If you want to aggregate predictions, make sure to gather the values returned from this method.
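The prediction step amounts to a nearest-centroid lookup. A minimal pure-Python sketch of that idea, using plain tuples instead of tensors and assuming Euclidean distance; the standalone `predict` function here is illustrative and not the library method:

```python
import math


def predict(points, centroids):
    """Return the index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


centroids = [(0.0, 0.5), (10.0, 10.5)]
labels = predict([(1.0, 1.0), (9.0, 9.0), (0.0, 0.0)], centroids)
```

The result has one entry per input point, matching the `[num_datapoints]` shape described above.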
- KMeans.save(path)#
Saves the estimator to the provided directory. It saves a file named estimator.pickle for the configuration of the estimator and additional files for the fitted model (if applicable). For more information on the files saved for the fitted model or for more customization, look at get_params() and torchgmm.base.nn.Configurable.save().
- Return type:
None
- Args:
path: The directory to which all files should be saved.
- Note:
This method may be called regardless of whether the estimator has already been fitted.
- Attention:
If the dictionary returned by get_params() is not JSON-serializable, this method uses pickle, which is not necessarily backwards-compatible.
- KMeans.save_attributes(path)#
Saves the fitted attributes of this estimator. By default, it uses JSON and falls back to pickle. Subclasses should override this method if non-primitive attributes are fitted.
Typically, this method should not be called directly. It is called as part of save().
- Return type:
None
- Args:
path: The directory to which the fitted attributes should be saved.
- Raises:
NotFittedError – If the estimator has not been fitted.
- KMeans.save_parameters(path)#
Saves the parameters of this estimator. By default, it uses JSON and falls back to pickle. If subclasses use non-primitive types as parameters, they should override this method.
Typically, this method should not be called directly. It is called as part of save().
- Return type:
None
- Args:
path: The directory to which the parameters should be saved.
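The "JSON first, pickle as fallback" behavior described for the save methods can be sketched with the standard library alone. The helper names `save_params` and `load_params` and the file names `params.json` and `params.pickle` are hypothetical; the files TorchGMM actually writes may differ.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_params(params, directory):
    """Write params as JSON if possible, otherwise as a pickle file."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    try:
        payload = json.dumps(params)
    except TypeError:
        # Not JSON-serializable: fall back to pickle.
        target = directory / "params.pickle"
        target.write_bytes(pickle.dumps(params))
    else:
        target = directory / "params.json"
        target.write_text(payload)
    return target


def load_params(directory):
    """Read back whichever of the two files exists."""
    directory = Path(directory)
    json_file = directory / "params.json"
    if json_file.exists():
        return json.loads(json_file.read_text())
    pickle_file = directory / "params.pickle"
    if pickle_file.exists():
        return pickle.loads(pickle_file.read_bytes())
    raise FileNotFoundError(f"no parameters stored in {directory}")


with tempfile.TemporaryDirectory() as tmp:
    json_path = save_params({"num_clusters": 3}, Path(tmp) / "a")
    pickle_path = save_params({"tags": {1, 2}}, Path(tmp) / "b")  # a set is not JSON-serializable
    restored_json = load_params(Path(tmp) / "a")
    restored_pickle = load_params(Path(tmp) / "b")
    suffixes = (json_path.suffix, pickle_path.suffix)
```

JSON files stay readable across Python versions, whereas pickled objects depend on the classes being importable when they are loaded.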
- KMeans.score(data)#
Computes the average inertia of all the provided datapoints. That is, it computes the mean squared distance to each datapoint’s closest centroid.
- Args:
data: The data for which to compute the average inertia.
- Return type:
float
- Returns:
The average inertia.
- Note:
See score_samples() to obtain the inertia for individual datapoints.
- KMeans.score_samples(data)#
Computes the inertia for each of the provided datapoints. That is, it computes the mean squared distance of each datapoint to its closest centroid.
- Args:
data: The data for which to compute the inertia values.
- Return type:
Tensor
- Returns:
A tensor of shape [num_datapoints] with the inertia of each datapoint.
- Attention:
When calling this function in a multi-process environment, each process receives only a subset of the predictions. If you want to aggregate predictions, make sure to gather the values returned from this method.
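The relationship between score() and score_samples() can be sketched in pure Python. The sketch assumes that the inertia of a datapoint is its squared Euclidean distance to the closest centroid and that the score is the plain average of those values; the standalone functions below are illustrative, not the library methods.

```python
def score_samples(points, centroids):
    """Squared Euclidean distance of each point to its closest centroid."""
    return [
        min(sum((x - c) ** 2 for x, c in zip(p, centroid)) for centroid in centroids)
        for p in points
    ]


def score(points, centroids):
    """Average of the per-point inertia values."""
    inertias = score_samples(points, centroids)
    return sum(inertias) / len(inertias)


centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.0, 1.0), (10.0, 12.0), (3.0, 4.0)]
inertias = score_samples(points, centroids)  # [1.0, 4.0, 25.0]
average = score(points, centroids)  # 10.0
```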
- KMeans.set_params(values)#
Sets the provided values on the estimator. The estimator is returned as well, but the estimator on which this function is called is also modified.
- Args:
values: The values to set.
- Return type:
TypeVar(E, bound= BaseEstimator)
- Returns:
The estimator where the values have been set.
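The in-place update with a returned reference can be illustrated with a small stand-in class. `ParamsDemo` is hypothetical, and rejecting unknown keys with a `ValueError` is an assumption of this sketch rather than documented TorchGMM behavior.

```python
class ParamsDemo:
    """Hypothetical estimator; not part of TorchGMM."""

    def __init__(self, num_clusters=1, batch_size=None):
        self.num_clusters = num_clusters
        self.batch_size = batch_size

    def get_params(self, deep=True):
        # `deep` is accepted but ignored, mirroring the documented signature.
        return {"num_clusters": self.num_clusters, "batch_size": self.batch_size}

    def set_params(self, values):
        # Assumption: unknown keys are rejected rather than silently added.
        known = self.get_params()
        for key, value in values.items():
            if key not in known:
                raise ValueError(f"unknown parameter: {key}")
            setattr(self, key, value)
        # Return the same (now modified) object to allow call chaining.
        return self


estimator = ParamsDemo()
returned = estimator.set_params({"num_clusters": 4})
```

Here `returned` and `estimator` are the same object, so both reflect the new `num_clusters` value.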
- KMeans.trainer(**kwargs)#
Returns the trainer as configured by the estimator. Typically, this method is only called by functions in the estimator.
- Args:
kwargs: Additional arguments that override the trainer arguments registered in the initializer of the estimator.
- Return type:
Trainer
- Returns:
A fully initialized PyTorch Lightning trainer.
- Note:
This function should be preferred over initializing the trainer directly. It ensures that the returned trainer correctly deals with TorchGMM components that may be introduced in the future.
- KMeans.transform(data)#
Transforms the provided data into the cluster-distance space. That is, it returns the distance of each datapoint to each cluster centroid.
- Args:
data: The data to transform.
- Return type:
Tensor
- Returns:
A tensor of shape [num_datapoints, num_clusters] with the distances to the cluster centroids.
- Attention:
When calling this function in a multi-process environment, each process receives only a subset of the predictions. If you want to aggregate predictions, make sure to gather the values returned from this method.
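The cluster-distance space can be pictured as a matrix with one row per datapoint and one column per centroid. A minimal pure-Python sketch, assuming plain (not squared) Euclidean distances and using nested lists in place of a tensor; the standalone `transform` function is illustrative only:

```python
import math


def transform(points, centroids):
    """Return a [num_datapoints][num_clusters] table of Euclidean distances."""
    return [[math.dist(p, c) for c in centroids] for p in points]


centroids = [(0.0, 0.0), (3.0, 4.0)]
distances = transform([(0.0, 0.0), (3.0, 0.0)], centroids)
```

Taking the index of the smallest entry in each row yields the same labels that a nearest-centroid prediction would return.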