添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接
相关文章推荐
活泼的匕首  ·  地方动态·  1 月前    · 
踏实的移动电源  ·  $10 DIY Wifi IFTTT ...·  6 月前    · 
眉毛粗的电梯  ·  手办 | Hpoi手办维基·  6 月前    · 
酷酷的茴香  ·  Simulating Delay ...·  11 月前    · 
  • inplace_csr_row_normalize_l1
  • inplace_csr_row_normalize_l2
  • single_source_shortest_path_length
  • sample_without_replacement
  • min_pos
  • MetadataRequest
  • MetadataRouter
  • MethodMapping
  • get_routing_for_object
  • process_routing
  • all_displays
  • all_estimators
  • all_functions
  • check_estimator
  • parametrize_with_checks
  • Parallel
  • delayed
  • Recently Deprecated
    • parallel_backend
    • register_parallel_backend
    • class sklearn.neighbors. KDTree #

      KDTree for fast generalized N-point problems

      Read more in the User Guide .

      Parameters :
      X array-like of shape (n_samples, n_features)

      n_samples is the number of points in the data set, and n_features is the dimension of the parameter space. Note: if X is a C-contiguous array of doubles then data will not be copied. Otherwise, an internal copy will be made.

      leaf_size positive int, default=40

      Number of points at which to switch to brute-force. Changing leaf_size will not affect the results of a query, but can significantly impact the speed of a query and the memory required to store the constructed tree. The amount of memory needed to store the tree scales as approximately n_samples / leaf_size. For a specified leaf_size , a leaf node is guaranteed to satisfy leaf_size <= n_points <= 2 * leaf_size , except in the case that n_samples < leaf_size .

      metric str or DistanceMetric64 object, default=’minkowski’

      Metric to use for distance computation. Default is “minkowski”, which results in the standard Euclidean distance when p = 2. A list of valid metrics for KDTree is given by the attribute valid_metrics . See the documentation of scipy.spatial.distance and the metrics listed in distance_metrics for more information on any distance metric.

      Additional keywords are passed to the distance metric class.
      Note: Callable functions in the metric parameter are NOT supported for KDTree
      and Ball Tree. Function call overhead will result in very poor performance.
      Attributes :
      data memory view

      The training data

      valid_metrics: list of str

      List of valid distance metrics.

      Query for k-nearest neighbors

      >>> import numpy as np
      >>> from sklearn.neighbors import KDTree
      >>> rng = np.random.RandomState(0)
      >>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
      >>> tree = KDTree(X, leaf_size=2)              
      >>> dist, ind = tree.query(X[:1], k=3)                
      >>> print(ind)  # indices of 3 closest neighbors
      [0 3 1]
      >>> print(dist)  # distances to 3 closest neighbors
      [ 0.          0.19662693  0.29473397]
      

      Pickle and Unpickle a tree. Note that the state of the tree is saved in the pickle operation: the tree needs not be rebuilt upon unpickling.

      >>> import numpy as np
      >>> import pickle
      >>> rng = np.random.RandomState(0)
      >>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
      >>> tree = KDTree(X, leaf_size=2)        
      >>> s = pickle.dumps(tree)                     
      >>> tree_copy = pickle.loads(s)                
      >>> dist, ind = tree_copy.query(X[:1], k=3)     
      >>> print(ind)  # indices of 3 closest neighbors
      [0 3 1]
      >>> print(dist)  # distances to 3 closest neighbors
      [ 0.          0.19662693  0.29473397]
      

      Query for neighbors within a given radius

      >>> import numpy as np
      >>> rng = np.random.RandomState(0)
      >>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
      >>> tree = KDTree(X, leaf_size=2)     
      >>> print(tree.query_radius(X[:1], r=0.3, count_only=True))
      >>> ind = tree.query_radius(X[:1], r=0.3)  
      >>> print(ind)  # indices of neighbors within distance 0.3
      [3 0 1]
      

      Compute a gaussian kernel density estimate:

      >>> import numpy as np
      >>> rng = np.random.RandomState(42)
      >>> X = rng.random_sample((100, 3))
      >>> tree = KDTree(X)                
      >>> tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
      array([ 6.94114649,  7.83281226,  7.2071716 ])
      

      Compute a two-point auto-correlation function

      >>> import numpy as np
      >>> rng = np.random.RandomState(0)
      >>> X = rng.random_sample((30, 3))
      >>> r = np.linspace(0, 1, 5)
      >>> tree = KDTree(X)                
      >>> tree.two_point_correlation(X, r)
      array([ 30,  62, 278, 580, 820])
      kernel_density(X, h, kernel='gaussian', atol=0, rtol=1E-8, breadth_first=True, return_log=False)#
      

      Compute the kernel density estimate at points X with the given kernel, using the distance metric specified at tree creation.

      Parameters:
      Xarray-like of shape (n_samples, n_features)

      An array of points to query. Last dimension should match dimension of training data.

      hfloat

      the bandwidth of the kernel

      kernelstr, default=”gaussian”

      specify the kernel to use. Options are - ‘gaussian’ - ‘tophat’ - ‘epanechnikov’ - ‘exponential’ - ‘linear’ - ‘cosine’ Default is kernel = ‘gaussian’

      atolfloat, default=0

      Specify the desired absolute tolerance of the result. If the true result is K_true, then the returned result K_ret satisfies abs(K_true - K_ret) < atol + rtol * K_ret The default is zero (i.e. machine precision).

      rtolfloat, default=1e-8

      Specify the desired relative tolerance of the result. If the true result is K_true, then the returned result K_ret satisfies abs(K_true - K_ret) < atol + rtol * K_ret The default is 1e-8 (i.e. machine precision).

      breadth_firstbool, default=False

      If True, use a breadth-first search. If False (default) use a depth-first search. Breadth-first is generally faster for compact kernels and/or high tolerances.

      return_logbool, default=False

      Return the logarithm of the result. This can be more accurate than returning the result itself for narrow kernels.

      Returns:
      densityndarray of shape X.shape[:-1]

      The array of (log)-density evaluations

      query(X, k=1, return_distance=True, dualtree=False, breadth_first=False)#

      query the tree for the k nearest neighbors

      Parameters:
      Xarray-like of shape (n_samples, n_features)

      An array of points to query

      kint, default=1

      The number of nearest neighbors to return

      return_distancebool, default=True

      if True, return a tuple (d, i) of distances and indices if False, return array i

      dualtreebool, default=False

      if True, use the dual tree formalism for the query: a tree is built for the query points, and the pair of trees is used to efficiently search this space. This can lead to better performance as the number of points grows large.

      breadth_firstbool, default=False

      if True, then query the nodes in a breadth-first manner. Otherwise, query the nodes in a depth-first manner.

      sort_resultsbool, default=True

      if True, then distances and indices of each point are sorted on return, so that the first column contains the closest points. Otherwise, neighbors are returned in an arbitrary order.

      Returns:
      iif return_distance == False
      (d,i)if return_distance == True
      dndarray of shape X.shape[:-1] + (k,), dtype=double

      Each entry gives the list of distances to the neighbors of the corresponding point.

      indarray of shape X.shape[:-1] + (k,), dtype=int

      Each entry gives the list of indices of neighbors of the corresponding point.

      query_radius(X, r, return_distance=False, count_only=False, sort_results=False)#

      query the tree for neighbors within a radius r

      Parameters:
      Xarray-like of shape (n_samples, n_features)

      An array of points to query

      rdistance within which neighbors are returned

      r can be a single value, or an array of values of shape x.shape[:-1] if different radii are desired for each point.

      return_distancebool, default=False

      if True, return distances to neighbors of each point if False, return only neighbors Note that unlike the query() method, setting return_distance=True here adds to the computation time. Not all distances need to be calculated explicitly for return_distance=False. Results are not sorted by default: see sort_results keyword.

      count_onlybool, default=False

      if True, return only the count of points within distance r if False, return the indices of all points within distance r If return_distance==True, setting count_only=True will result in an error.

      sort_resultsbool, default=False

      if True, the distances and indices will be sorted before being returned. If False, the results will not be sorted. If return_distance == False, setting sort_results = True will result in an error.

      Returns:
      countif count_only == True
      indif count_only == False and return_distance == False
      (ind, dist)if count_only == False and return_distance == True
      countndarray of shape X.shape[:-1], dtype=int

      Each entry gives the number of neighbors within a distance r of the corresponding point.

      indndarray of shape X.shape[:-1], dtype=object

      Each element is a numpy integer array listing the indices of neighbors of the corresponding point. Note that unlike the results of a k-neighbors query, the returned neighbors are not sorted by distance by default.

      distndarray of shape X.shape[:-1], dtype=object

      Each element is a numpy double array listing the distances corresponding to indices in i.

      two_point_correlation(X, r, dualtree=False)#

      Compute the two-point correlation function

      Parameters:
      Xarray-like of shape (n_samples, n_features)

      An array of points to query. Last dimension should match dimension of training data.

      rarray-like

      A one-dimensional array of distances

      dualtreebool, default=False

      If True, use a dualtree algorithm. Otherwise, use a single-tree algorithm. Dual tree algorithms can have better scaling for large N.

      Returns:
      countsndarray

      counts[i] contains the number of pairs of points with distance less than or equal to r[i]

  •