>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
>>> kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)
>>> kmeans.predict([[0, 0], [12, 3]])
array([1, 0], dtype=int32)
>>> kmeans.cluster_centers_
array([[10.,  2.],
       [ 1.,  2.]])
fit(X[, y, sample_weight])
    Compute k-means clustering.
fit_predict(X[, y, sample_weight])
    Compute cluster centers and predict cluster index for each sample.
fit_transform(X[, y, sample_weight])
    Compute clustering and transform X to cluster-distance space.
get_feature_names_out([input_features])
    Get output feature names for transformation.
get_params([deep])
    Get parameters for this estimator.
predict(X[, sample_weight])
    Predict the closest cluster each sample in X belongs to.
score(X[, y, sample_weight])
    Opposite of the value of X on the K-means objective.
set_output(*[, transform])
    Set output container.
set_params(**params)
    Set the parameters of this estimator.
transform(X)
    Transform X to a cluster-distance space.
fit(X, y=None, sample_weight=None)
Compute k-means clustering.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Training instances to cluster. The data will be converted to C ordering, which causes a memory copy if the given data is not C-contiguous. If a sparse matrix is passed, a copy will be made if it is not in CSR format.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.
    New in version 0.20.
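As a rough illustration of how sample_weight affects the fit, the sketch below clusters the toy data from the example at the top with and without weights. The weight values and the explicit init array are assumptions introduced here so that the result does not depend on random initialization; they are not part of the original example.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

# Hypothetical weights: the second sample counts three times as much as the others.
weights = np.array([1.0, 3.0, 1.0, 1.0, 1.0, 1.0])

# Assumption: fixed starting centers, so the outcome is deterministic.
init = np.array([[1.0, 2.0], [10.0, 2.0]])

unweighted = KMeans(n_clusters=2, init=init, n_init=1).fit(X)
weighted = KMeans(n_clusters=2, init=init, n_init=1).fit(X, sample_weight=weights)

# Each center is the (weighted) mean of the samples assigned to it.
print(unweighted.cluster_centers_)
print(weighted.cluster_centers_)
```

With these weights, the center of the left-hand group moves from the plain mean toward the heavier sample [1, 4], while the right-hand center is unchanged.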
fit_predict(X, y=None, sample_weight=None)
Compute cluster centers and predict cluster index for each sample.
Convenience method; equivalent to calling fit(X) followed by predict(X).
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to transform.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.
Returns:
labels : ndarray of shape (n_samples,)
    Index of the cluster each sample belongs to.
fit_transform(X, y=None, sample_weight=None)
Compute clustering and transform X to cluster-distance space.
Equivalent to fit(X).transform(X), but more efficiently implemented.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to transform.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.
Returns:
X_new : ndarray of shape (n_samples, n_clusters)
    X transformed in the new space.
get_feature_names_out(input_features=None)
Get output feature names for transformation.
The feature names out will be prefixed by the lowercased class name. For example, if the transformer outputs 3 features, then the feature names out are: ["class_name0", "class_name1", "class_name2"].
Parameters:
input_features : array-like of str or None, default=None
    Only used to validate feature names with the names seen in fit.
Returns:
feature_names_out : ndarray of str objects
    Transformed feature names.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : dict
    Parameter names mapped to their values.
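A brief sketch of reading the parameters back and using them to build a second, unfitted estimator with the same configuration. The variable names are illustrative only.

```python
from sklearn.cluster import KMeans

model = KMeans(n_clusters=2, random_state=0, n_init="auto")

# Dictionary of constructor parameters, including defaults that were not set explicitly.
params = model.get_params()
print(params["n_clusters"], params["random_state"], params["n_init"])

# The dictionary can be passed back to the constructor to get an equivalent estimator.
rebuilt = KMeans(**params)
```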
predict(X, sample_weight=None)
Predict the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to predict.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.
Returns:
labels : ndarray of shape (n_samples,)
    Index of the cluster each sample belongs to.
score(X, y=None, sample_weight=None)
Opposite of the value of X on the K-means objective.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.
Returns:
score : float
    Opposite of the value of X on the K-means objective.
set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
Parameters:
transform : {"default", "pandas"}, default=None
    Configure output of transform and fit_transform.
    "default": Default output format of a transformer
    "pandas": DataFrame output
    None: Transform configuration is unchanged
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters:
**params : dict
    Estimator parameters.
Returns:
self : estimator instance
    Estimator instance.
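The <component>__<parameter> form can be sketched with a two-step Pipeline. The step names "scale" and "kmeans" and the use of StandardScaler are illustrative choices made here, not something the section above prescribes.

```python
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are addressed by name, and the estimator itself is returned.
model = KMeans(n_clusters=2, random_state=0, n_init="auto")
returned = model.set_params(n_clusters=4)

# Nested object: "kmeans__n_clusters" targets n_clusters on the step named "kmeans".
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("kmeans", KMeans(n_clusters=2, random_state=0, n_init="auto")),
])
pipe.set_params(kmeans__n_clusters=3)

print(model.n_clusters, pipe.named_steps["kmeans"].n_clusters)
```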
transform(X)
Transform X to a cluster-distance space.
In the new space, each dimension is the distance to one of the cluster centers. Note that even if X is sparse, the array returned by transform will typically be dense.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to transform.
Returns:
X_new : ndarray of shape (n_samples, n_clusters)
    X transformed in the new space.
Release Highlights for scikit-learn 0.23
A demo of K-Means clustering on the handwritten digits data
Bisecting K-Means and Regular K-Means Performance Comparison
Color Quantization using K-Means
Comparison of the K-Means and MiniBatchKMeans clustering algorithms
Demonstration of k-means assumptions
Empirical evaluation of the impact of k-means initialization
K-means Clustering
Selecting the number of clusters with silhouette analysis on KMeans clustering
Clustering text documents using k-means