    Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a clustering algorithm that can find clusters in noisy data. It works even on datasets where K-Means fails to find meaningful clusters. More information about it can be found here.
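The core idea (core points, eps-neighbourhoods, cluster expansion, noise) can be sketched in plain Python. This is a minimal brute-force O(n^2) version written for illustration, not scikit-learn's implementation; the names `dbscan` and `region_query` are hypothetical. It follows scikit-learn's conventions of counting the point itself toward `min_samples` and labelling noise as -1.

```python
import math
from collections import deque

NOISE = -1        # same noise label scikit-learn uses
UNVISITED = None  # marker for points not yet labelled


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Brute-force O(n^2) DBSCAN: returns one cluster label per point, -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            # Not a core point; it may still be claimed later as a border point
            labels[i] = NOISE
            continue
        # i is a core point: start a new cluster and grow it breadth-first
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels
```

For example, `dbscan([(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)], eps=0.5, min_samples=2)` returns `[0, 0, 0, 1, 1, 1, -1]`: two dense groups and one isolated noise point. On real data, scikit-learn's `DBSCAN` should give essentially the same partition far faster, since it uses spatial indexes instead of comparing every pair of points.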

    You can learn more about the DBSCAN algorithm in the video below.

    import matplotlib.pyplot as plt
    # Create sample data
    from sklearn.datasets import make_moons
    X, y = make_moons(n_samples=500, shuffle=True, noise=0.1, random_state=20)
    plt.scatter(x=X[:, 0], y=X[:, 1])
    plt.show()

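    To choose good values for DBSCAN's eps and min_samples parameters, the clusterings they produce can be compared using the Silhouette Coefficient. It is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the points of the nearest cluster that the sample is not a part of. The score of a whole clustering is the mean Silhouette Coefficient over all samples. Note that the Silhouette Coefficient is only defined if the number of labels satisfies 2 <= n_labels <= n_samples - 1.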

    The best value of the Silhouette Coefficient is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster.
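The definition above translates directly into code. The following is a plain-Python sketch for illustration only (the function names are hypothetical); scikit-learn's `silhouette_samples` and `silhouette_score` compute the same quantities for Euclidean distance far more efficiently. A sample that is alone in its cluster has no defined a, so it is given a score of 0 by convention.

```python
import math


def silhouette_samples_py(points, labels):
    """Per-sample silhouette s = (b - a) / max(a, b), computed from the definition."""
    n = len(points)
    if not 2 <= len(set(labels)) <= n - 1:
        raise ValueError("silhouette needs 2 <= n_labels <= n_samples - 1")
    scores = []
    for i in range(n):
        # Group the distances from sample i to every other sample by cluster label
        by_label = {}
        for j in range(n):
            if j != i:
                by_label.setdefault(labels[j], []).append(math.dist(points[i], points[j]))
        own = by_label.get(labels[i])
        if not own:
            # Sample is alone in its cluster: a is undefined, so s = 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        # b: mean distance to the nearest cluster the sample is not part of
        b = min(sum(d) / len(d) for lab, d in by_label.items() if lab != labels[i])
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return scores


def silhouette_score_py(points, labels):
    """Mean silhouette over all samples."""
    s = silhouette_samples_py(points, labels)
    return sum(s) / len(s)
```

For example, with `pts = [(0, 0), (1, 0), (4, 0), (5, 0)]`, the labels `[0, 0, 1, 1]` give a mean score of 47/63 (about 0.75). With the labels `[0, 1, 1, 1]`, the point (1, 0) gets a score of -5/7: its own cluster is on average farther away (a = 3.5) than the neighbouring cluster (b = 1), which is exactly the "wrong cluster" case described above.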

    ## Finding best values of eps and min_samples
    import numpy as np
    import pandas as pd
    from sklearn.metrics import silhouette_score
    from sklearn.cluster import DBSCAN
    # Defining the list of hyperparameters to try
    eps_list = np.arange(start=0.1, stop=0.9, step=0.01)
    min_sample_list = np.arange(start=2, stop=5, step=1)
    # Collecting the silhouette score of each trial in a list
    # (DataFrame.append was removed in pandas 2.0)
    score_rows = []
    for eps_trial in eps_list:
        for min_sample_trial in min_sample_list:
            # Generating DBSCAN clusters; fit once and reuse the labels
            db = DBSCAN(eps=eps_trial, min_samples=min_sample_trial)
            labels = db.fit_predict(X)
            # The silhouette score is only defined for 2 <= n_labels <= n_samples - 1
            n_labels = len(np.unique(labels))
            if not 2 <= n_labels <= len(X) - 1:
                continue
            sil_score = silhouette_score(X, labels)
            # round(2) keeps the 0.01 step of eps visible in the label
            trial_parameters = "eps:" + str(eps_trial.round(2)) + " min_sample :" + str(min_sample_trial)
            score_rows.append([sil_score, trial_parameters])
    silhouette_scores_data = pd.DataFrame(data=score_rows, columns=["score", "parameters"])
    # Finding out the best hyperparameters with highest Score
    print(silhouette_scores_data.sort_values(by='score', ascending=False).head(1))
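One caveat about this search: scikit-learn's `silhouette_score` has no special handling for DBSCAN's noise label -1, so all noise points are scored as if they formed one extra cluster. A possible variation (not the method used above) is to score only the clustered points and track separately how much data was discarded, because dropping noise on its own can reward settings that throw most points away. The helper names below are hypothetical, and the 20% noise cap in the commented usage is an arbitrary example value.

```python
def drop_noise(points, labels, noise_label=-1):
    """Hypothetical helper: remove DBSCAN noise before silhouette scoring.

    Returns (kept_points, kept_labels, noise_fraction)."""
    kept = [(p, lab) for p, lab in zip(points, labels) if lab != noise_label]
    noise_fraction = 1 - len(kept) / len(labels) if labels else 0.0
    return [p for p, _ in kept], [lab for _, lab in kept], noise_fraction


def silhouette_defined(labels):
    """True when 2 <= n_labels <= n_samples - 1, the range the silhouette needs."""
    return 2 <= len(set(labels)) <= len(labels) - 1


# Possible use inside the grid-search loop (assumes X, db and silhouette_score
# from the code above):
#   labels = db.fit_predict(X)
#   X_kept, labels_kept, noise = drop_noise(list(X), list(labels))
#   if silhouette_defined(labels_kept) and noise < 0.2:  # 0.2: arbitrary cap
#       sil_score = silhouette_score(X_kept, labels_kept)
```

Keeping the noise fraction next to the score makes the trade-off explicit: a slightly lower silhouette with almost no noise may be preferable to a high silhouette that labels a large share of the data as noise.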
    # DBSCAN Clustering
    from sklearn.cluster import DBSCAN
    db = DBSCAN(eps=0.18, min_samples=2)
    # Plotting the clusters
    plt.scatter(x=X[:, 0], y=X[:, 1], c=db.fit_predict(X))
    plt.show()

    Lead Data Scientist

    Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains such as telecom, insurance, and logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!

    Hi! Thanks for the code snippet. Just a heads-up: it appears there may be a rendering error in line 20:

    if(len(np.unique(db.fit_predict(X)))>1):
