>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
>>> kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)
>>> kmeans.predict([[0, 0], [12, 3]])
array([1, 0], dtype=int32)
>>> kmeans.cluster_centers_
array([[10.,  2.],
       [ 1.,  2.]])
Methods

fit(X[, y, sample_weight])
    Compute k-means clustering.
fit_predict(X[, y, sample_weight])
    Compute cluster centers and predict cluster index for each sample.
fit_transform(X[, y, sample_weight])
    Compute clustering and transform X to cluster-distance space.
get_feature_names_out([input_features])
    Get output feature names for transformation.
get_params([deep])
    Get parameters for this estimator.
predict(X[, sample_weight])
    Predict the closest cluster each sample in X belongs to.
score(X[, y, sample_weight])
    Opposite of the value of X on the K-means objective.
set_output(*[, transform])
    Set output container.
set_params(**params)
    Set the parameters of this estimator.
transform(X)
    Transform X to a cluster-distance space.
fit(X, y=None, sample_weight=None)[source]
Compute k-means clustering.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Training instances to cluster. Note that the data will be
    converted to C ordering, which will cause a memory copy if the
    given data is not C-contiguous. If a sparse matrix is passed, a
    copy will be made if it is not in CSR format.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations
    are assigned equal weight.
    New in version 0.20.
Returns:
self : object
    Fitted estimator.
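A minimal sketch of sample_weight in action (the weights below are made up for illustration): the centroid of each cluster becomes the weighted mean of its members, so heavily weighted points pull it toward them. Here the third point of each blob gets weight 10, dragging each centroid's y-coordinate to (2 + 4 + 10*0) / 12 = 0.5.

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> w = np.array([1, 1, 10, 1, 1, 10])  # hypothetical weights for illustration
>>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X, sample_weight=w)
>>> bool(np.allclose(kmeans.cluster_centers_[:, 1], 0.5))  # weighted mean, not 2.0
True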
fit_predict(X, y=None, sample_weight=None)[source]
Compute cluster centers and predict cluster index for each sample.
Convenience method; equivalent to calling fit(X) followed by
predict(X).
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to cluster.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations
    are assigned equal weight.
Returns:
labels : ndarray of shape (n_samples,)
    Index of the cluster each sample belongs to.
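As a quick illustrative check (not from the original docs): the labels returned by fit_predict are the same labels the fit stores in labels_.

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> km = KMeans(n_clusters=2, random_state=0, n_init="auto")
>>> labels = km.fit_predict(X)  # fits and predicts in one call
>>> bool((labels == km.labels_).all())
True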
fit_transform(X, y=None, sample_weight=None)[source]
Compute clustering and transform X to cluster-distance space.
Equivalent to fit(X).transform(X), but more efficiently implemented.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to transform.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations
    are assigned equal weight.
Returns:
X_new : ndarray of shape (n_samples, n_clusters)
    X transformed in the new space.
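A short sketch of the output shape and of the documented equivalence with fit followed by transform:

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> km = KMeans(n_clusters=2, random_state=0, n_init="auto")
>>> X_new = km.fit_transform(X)  # one distance column per cluster
>>> X_new.shape
(6, 2)
>>> bool(np.allclose(X_new, km.transform(X)))  # same as fit(X).transform(X)
True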
get_feature_names_out(input_features=None)[source]
Get output feature names for transformation.
The feature names out will be prefixed by the lowercased class name.
For example, if the transformer outputs 3 features, then the feature
names out are: ["class_name0", "class_name1", "class_name2"].
Parameters:
input_features : array-like of str or None, default=None
    Only used to validate feature names with the names seen in fit.
Returns:
feature_names_out : ndarray of str objects
    Transformed feature names.
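For KMeans the lowercased class name is "kmeans", so a fitted two-cluster model should produce names like the following (a sketch; call this only after fitting, since the number of output features is set during fit):

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
>>> list(kmeans.get_feature_names_out())  # "kmeans" prefix + output index
['kmeans0', 'kmeans1']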
get_params(deep=True)[source]
Get parameters for this estimator.
Parameters:
deep : bool, default=True
    If True, will return the parameters for this estimator and
    contained subobjects that are estimators.
Returns:
params : dict
    Parameter names mapped to their values.
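A minimal sketch: constructor arguments come back as a plain dict keyed by parameter name.

>>> from sklearn.cluster import KMeans
>>> params = KMeans(n_clusters=2, random_state=0).get_params()
>>> params["n_clusters"], params["random_state"]
(2, 0)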
predict(X, sample_weight=None)[source]
Predict the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the
code book and each value returned by predict is the index of the
closest code in the code book.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to predict.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations
    are assigned equal weight.
Returns:
labels : ndarray of shape (n_samples,)
    Index of the cluster each sample belongs to.
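To make the code-book analogy concrete, here is a hedged sketch checking that predict agrees with a manual nearest-centroid lookup (the Euclidean distances are computed by hand purely for illustration):

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
>>> new = np.array([[0.0, 0.0], [12.0, 3.0]])
>>> # Distance from each new sample to every "code" (centroid) in the code book:
>>> dists = np.linalg.norm(new[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
>>> bool((kmeans.predict(new) == dists.argmin(axis=1)).all())  # nearest code wins
True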
score(X, y=None, sample_weight=None)[source]
Opposite of the value of X on the K-means objective.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations
    are assigned equal weight.
Returns:
score : float
    Opposite of the value of X on the K-means objective (i.e., the
    negative of the sum of squared distances to the closest centroid).
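On the training data this is simply the negative of the fitted inertia_, as the following sketch suggests (higher, i.e. less negative, is better):

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
>>> bool(np.isclose(kmeans.score(X), -kmeans.inertia_))
True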
set_output(*, transform=None)[source]
Set output container.
See Introducing the set_output API for an example on how to use the API.
Parameters:
transform : {"default", "pandas"}, default=None
    Configure output of transform and fit_transform.
    "default" : Default output format of a transformer
    "pandas" : DataFrame output
    None : Transform configuration is unchanged
Returns:
self : estimator instance
    Estimator instance.
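A short sketch of pandas output (assumes pandas is installed; the column names come from get_feature_names_out):

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto")
>>> df = kmeans.set_output(transform="pandas").fit_transform(X)  # needs pandas
>>> list(df.columns)
['kmeans0', 'kmeans1']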
set_params(**params)[source]
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have parameters of the form
<component>__<parameter> so that it is possible to update each
component of a nested object.
Parameters:
**params : dict
    Estimator parameters.
Returns:
self : estimator instance
    Estimator instance.
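A sketch of both the flat and the nested forms (the pipeline step name "km" is arbitrary here):

>>> from sklearn.cluster import KMeans
>>> from sklearn.pipeline import Pipeline
>>> kmeans = KMeans(n_clusters=2).set_params(n_clusters=3)  # returns the estimator
>>> kmeans.n_clusters
3
>>> # Nested objects use "<component>__<parameter>" to reach a step's parameter:
>>> pipe = Pipeline([("km", KMeans(n_clusters=2))]).set_params(km__n_clusters=4)
>>> pipe.named_steps["km"].n_clusters
4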
transform(X)[source]
Transform X to a cluster-distance space.
In the new space, each dimension is the distance to the cluster
centers. Note that even if X is sparse, the array returned by
transform will typically be dense.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to transform.
Returns:
X_new : ndarray of shape (n_samples, n_clusters)
    X transformed in the new space.
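A hedged sketch confirming that each output column is the Euclidean distance to the corresponding center (the manual computation is for illustration only):

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
>>> X_new = kmeans.transform(X)  # shape (n_samples, n_clusters)
>>> manual = np.linalg.norm(X[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
>>> bool(np.allclose(X_new, manual))  # column j = distance to center j
True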
Examples using sklearn.cluster.KMeans

Release Highlights for scikit-learn 0.23
A demo of K-Means clustering on the handwritten digits data
Bisecting K-Means and Regular K-Means Performance Comparison
Color Quantization using K-Means
Comparison of the K-Means and MiniBatchKMeans clustering algorithms
Demonstration of k-means assumptions
Empirical evaluation of the impact of k-means initialization
K-means Clustering
Selecting the number of clusters with silhouette analysis on KMeans clustering
Clustering text documents using k-means