>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3)) # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)
>>> dist, ind = tree.query(X[:1], k=3)
>>> print(ind) # indices of 3 closest neighbors
[0 3 1]
>>> print(dist) # distances to 3 closest neighbors
[ 0. 0.19662693 0.29473397]
Pickle and Unpickle a tree. Note that the state of the tree is saved in the
pickle operation: the tree needs not be rebuilt upon unpickling.
>>> import numpy as np
>>> import pickle
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3)) # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)
>>> s = pickle.dumps(tree)
>>> tree_copy = pickle.loads(s)
>>> dist, ind = tree_copy.query(X[:1], k=3)
>>> print(ind) # indices of 3 closest neighbors
[0 3 1]
>>> print(dist) # distances to 3 closest neighbors
[ 0. 0.19662693 0.29473397]
Query for neighbors within a given radius
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3)) # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)
>>> print(tree.query_radius(X[:1], r=0.3, count_only=True))
>>> ind = tree.query_radius(X[:1], r=0.3)
>>> print(ind) # indices of neighbors within distance 0.3
[3 0 1]
Compute a gaussian kernel density estimate:
>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> X = rng.random_sample((100, 3))
>>> tree = KDTree(X)
>>> tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
array([ 6.94114649, 7.83281226, 7.2071716 ])
Compute a two-point auto-correlation function
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((30, 3))
>>> r = np.linspace(0, 1, 5)
>>> tree = KDTree(X)
>>> tree.two_point_correlation(X, r)
array([ 30, 62, 278, 580, 820])
kernel_density(X, h, kernel='gaussian', atol=0, rtol=1E-8, breadth_first=True, return_log=False)
Compute the kernel density estimate at points X with the given kernel,
using the distance metric specified at tree creation.
Parameters:
Xarray-like of shape (n_samples, n_features)An array of points to query. Last dimension should match dimension
of training data.
hfloatthe bandwidth of the kernel
kernelstr, default=”gaussian”specify the kernel to use. Options are
- ‘gaussian’
- ‘tophat’
- ‘epanechnikov’
- ‘exponential’
- ‘linear’
- ‘cosine’
Default is kernel = ‘gaussian’
atolfloat, default=0Specify the desired absolute tolerance of the result.
If the true result is K_true
, then the returned result K_ret
satisfies abs(K_true - K_ret) < atol + rtol * K_ret
The default is zero (i.e. machine precision).
rtolfloat, default=1e-8Specify the desired relative tolerance of the result.
If the true result is K_true
, then the returned result K_ret
satisfies abs(K_true - K_ret) < atol + rtol * K_ret
The default is 1e-8
(i.e. machine precision).
breadth_firstbool, default=FalseIf True, use a breadth-first search. If False (default) use a
depth-first search. Breadth-first is generally faster for
compact kernels and/or high tolerances.
return_logbool, default=FalseReturn the logarithm of the result. This can be more accurate
than returning the result itself for narrow kernels.
Returns:
densityndarray of shape X.shape[:-1]The array of (log)-density evaluations
query(X, k=1, return_distance=True, dualtree=False, breadth_first=False)
query the tree for the k nearest neighbors
Parameters:
Xarray-like of shape (n_samples, n_features)An array of points to query
kint, default=1The number of nearest neighbors to return
return_distancebool, default=Trueif True, return a tuple (d, i) of distances and indices
if False, return array i
dualtreebool, default=Falseif True, use the dual tree formalism for the query: a tree is
built for the query points, and the pair of trees is used to
efficiently search this space. This can lead to better
performance as the number of points grows large.
breadth_firstbool, default=Falseif True, then query the nodes in a breadth-first manner.
Otherwise, query the nodes in a depth-first manner.
sort_resultsbool, default=Trueif True, then distances and indices of each point are sorted
on return, so that the first column contains the closest points.
Otherwise, neighbors are returned in an arbitrary order.
Returns:
iif return_distance == False
(d,i)if return_distance == True
dndarray of shape X.shape[:-1] + (k,), dtype=doubleEach entry gives the list of distances to the neighbors of the
corresponding point.
indarray of shape X.shape[:-1] + (k,), dtype=intEach entry gives the list of indices of neighbors of the
corresponding point.
query_radius(X, r, return_distance=False, count_only=False, sort_results=False)
query the tree for neighbors within a radius r
Parameters:
Xarray-like of shape (n_samples, n_features)An array of points to query
rdistance within which neighbors are returnedr can be a single value, or an array of values of shape
x.shape[:-1] if different radii are desired for each point.
return_distancebool, default=Falseif True, return distances to neighbors of each point
if False, return only neighbors
Note that unlike the query() method, setting return_distance=True
here adds to the computation time. Not all distances need to be
calculated explicitly for return_distance=False. Results are
not sorted by default: see sort_results
keyword.
count_onlybool, default=Falseif True, return only the count of points within distance r
if False, return the indices of all points within distance r
If return_distance==True, setting count_only=True will
result in an error.
sort_resultsbool, default=Falseif True, the distances and indices will be sorted before being
returned. If False, the results will not be sorted. If
return_distance == False, setting sort_results = True will
result in an error.
Returns:
countif count_only == True
indif count_only == False and return_distance == False
(ind, dist)if count_only == False and return_distance == True
countndarray of shape X.shape[:-1], dtype=intEach entry gives the number of neighbors within a distance r of the
corresponding point.
indndarray of shape X.shape[:-1], dtype=objectEach element is a numpy integer array listing the indices of
neighbors of the corresponding point. Note that unlike
the results of a k-neighbors query, the returned neighbors
are not sorted by distance by default.
distndarray of shape X.shape[:-1], dtype=objectEach element is a numpy double array listing the distances
corresponding to indices in i.
two_point_correlation(X, r, dualtree=False)
Compute the two-point correlation function
Parameters:
Xarray-like of shape (n_samples, n_features)An array of points to query. Last dimension should match dimension
of training data.
rarray-likeA one-dimensional array of distances
dualtreebool, default=FalseIf True, use a dualtree algorithm. Otherwise, use a single-tree
algorithm. Dual tree algorithms can have better scaling for
large N.
Returns:
countsndarraycounts[i] contains the number of pairs of points with distance
less than or equal to r[i]