This article walks through the iris dataset and two ways of drawing scatter plots with it: matplotlib.pyplot.scatter and matplotlib.axes.Axes.scatter.

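What you will learn

1. The iris dataset
Loading the dataset and inspecting its attributes
    DESCR
    feature_names
    target
    target_names
Converting the iris dataset to a DataFrame
2. Drawing scatter plots with matplotlib.pyplot.scatter (parameters explained)
3. Drawing scatter plots with matplotlib.axes.Axes.scatter (parameters explained)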


1. The iris dataset in detail

  • Loading the dataset and inspecting its attributes

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from pandas import Series,DataFrame
    from sklearn import datasets 
    iris=datasets.load_iris()
    dir(iris)
    

    ['DESCR', 'data', 'feature_names', 'target', 'target_names']

    DESCR

    # DESCR holds the description of the dataset; print it to take a look:

    print(iris.DESCR)
    
    Iris Plants Database
    ====================
    Notes
    -----
    Data Set Characteristics:
        :Number of Instances: 150 (50 in each of three classes)
        :Number of Attributes: 4 numeric, predictive attributes and the class
        :Attribute Information:  # the four features, one per data column
            - sepal length in cm
            - sepal width in cm
            - petal length in cm
            - petal width in cm
            - class:  # the three iris species described by the data
                    - Iris-Setosa
                    - Iris-Versicolour
                    - Iris-Virginica
        :Summary Statistics:  # basic statistics for the four columns
        ============== ==== ==== ======= ===== ====================
                        Min  Max   Mean    SD   Class Correlation
        ============== ==== ==== ======= ===== ====================
        sepal length:   4.3  7.9   5.84   0.83    0.7826
        sepal width:    2.0  4.4   3.05   0.43   -0.4194
        petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
        petal width:    0.1  2.5   1.20  0.76     0.9565  (high!)
        ============== ==== ==== ======= ===== ====================
        :Missing Attribute Values: None
        :Class Distribution: 33.3% for each of 3 classes.
        :Creator: R.A. Fisher
        :Donor: Michael Marshall (MARSHALL%[email protected])
        :Date: July, 1988
    This is a copy of UCI ML iris datasets.
    http://archive.ics.uci.edu/ml/datasets/Iris
    The famous Iris database, first used by Sir R.A Fisher
    This is perhaps the best known database to be found in the
    pattern recognition literature.  Fisher's paper is a classic in the field and
    is referenced frequently to this day.  (See Duda & Hart, for example.)  The
    data set contains 3 classes of 50 instances each, where each class refers to a
    type of iris plant.  One class is linearly separable from the other 2; the
    latter are NOT linearly separable from each other.
    References
    ----------
       - Fisher,R.A. "The use of multiple measurements in taxonomic problems"
         Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
         Mathematical Statistics" (John Wiley, NY, 1950).
       - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
         (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
       - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
         Structure and Classification Rule for Recognition in Partially Exposed
         Environments".  IEEE Transactions on Pattern Analysis and Machine
         Intelligence, Vol. PAMI-2, No. 1, 67-71.
       - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
         on Information Theory, May 1972, 431-433.
       - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
         conceptual clustering system finds 3 classes in the data.
       - Many, many more ...
    

    data

    The measurements of the four iris features.

    print(type(iris.data))
    print(iris.data.shape)
    iris.data[:10,:]
    

    <class 'numpy.ndarray'>  # the data is stored as a numpy.ndarray
    (150, 4)  # 150 rows by 4 columns
    array([[5.1, 3.5, 1.4, 0.2],  # first ten rows of the dataset
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2],
           [5. , 3.6, 1.4, 0.2],
           [5.4, 3.9, 1.7, 0.4],
           [4.6, 3.4, 1.4, 0.3],
           [5. , 3.4, 1.5, 0.2],
           [4.4, 2.9, 1.4, 0.2],
           [4.9, 3.1, 1.5, 0.1]])

    feature_names

    The names of the four columns above. From left to right: sepal length, sepal width, petal length, and petal width, all in cm.

    print(iris.feature_names)
    

    ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
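
    Because feature_names is in the same order as the columns of iris.data, a name can be turned into a column index. The sketch below uses only the list printed above and the first data row shown earlier; the variable names row, col, and record are made up for illustration.

```python
# Column names exactly as printed by print(iris.feature_names).
feature_names = ['sepal length (cm)', 'sepal width (cm)',
                 'petal length (cm)', 'petal width (cm)']

# First row of iris.data, as shown in the output earlier.
row = [5.1, 3.5, 1.4, 0.2]

# Look up the column position of a feature by its name.
col = feature_names.index('petal length (cm)')
print(col, row[col])

# Pair every name with its value for a more readable record.
record = dict(zip(feature_names, row))
print(record)
```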

    target

    The integers 0, 1, and 2 identify which iris species each row belongs to.

    print(iris.target)  # a NumPy array with 150 elements
    

    [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

    target_names

    The names of the iris species: Setosa, Versicolour (spelled 'versicolor' in target_names), and Virginica.

    print(iris.target_names)
    

    ['setosa' 'versicolor' 'virginica']
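
    The numeric labels in target index directly into target_names, so converting labels to species names is a single NumPy indexing step. A minimal sketch, assuming a short hand-written label array in place of the real iris.target (with the real data you would index iris.target_names with iris.target):

```python
import numpy as np

# Species names as printed by print(iris.target_names).
target_names = np.array(['setosa', 'versicolor', 'virginica'])

# Illustrative labels only (assumption); the real iris.target has 150 entries.
target = np.array([0, 0, 1, 2, 1])

# Integer-array indexing maps every label to its species name.
species = target_names[target]
print(species)

# Count how many rows fall into each class.
counts = np.bincount(target, minlength=len(target_names))
print(counts)
```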

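    Converting the iris dataset to a DataFrame

    x, y = iris.data, iris.target
    # Append the class labels to the four feature columns, then name all five columns
    pd_iris = pd.DataFrame(np.hstack((x, y.reshape(150, 1))),
                           columns=['sepal length(cm)', 'sepal width(cm)',
                                    'petal length(cm)', 'petal width(cm)', 'class'])
    # np.hstack() joins arrays side by side, like paste on Linux
    # np.vstack() stacks arrays on top of each other, like cat on Linux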
    pd_iris.head()
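
    To make the paste/cat comparison concrete, the sketch below stacks a few rows both ways and prints the resulting shapes. It uses the first three rows of iris.data shown earlier; the variable names side_by_side and stacked are illustrative. It also shows a side effect worth knowing: hstack converts the integer labels to floats, so the class column ends up holding 0.0, 1.0, and 2.0.

```python
import numpy as np

# First three rows of iris.data as printed earlier, with their class labels.
x = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [4.7, 3.2, 1.3, 0.2]])
y = np.array([0, 0, 0])

# hstack adds columns: the labels become a fifth column.
side_by_side = np.hstack((x, y.reshape(-1, 1)))

# vstack adds rows: the same block repeated underneath itself.
stacked = np.vstack((x, x))

print(side_by_side.shape, stacked.shape)
print(side_by_side.dtype)  # the integer labels are upcast to the float dtype of x
```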
    

    2. Drawing scatter plots with matplotlib.pyplot.scatter (parameters explained)

  • A simple scatter plot from the first two columns of the dataset

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from pandas import Series,DataFrame
    # Prepare the data
    from sklearn import datasets
    iris=datasets.load_iris()
    x, y = iris.data, iris.target
    pd_iris = pd.DataFrame(np.hstack((x, y.reshape(150, 1))),columns=['sepal length(cm)','sepal width(cm)','petal length(cm)','petal width(cm)','class'] )
    plt.figure(dpi=100)
    plt.scatter(pd_iris['sepal length(cm)'],pd_iris['sepal width(cm)'])
    # Each row's (sepal length, sepal width) pair is drawn on the figure as one point
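
    The plot above has no axis labels or title. A minimal sketch of adding them with the pyplot interface follows; it uses only the ten rows printed earlier instead of the full dataset, and the title text is an assumption. The Agg backend is selected so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

# Sepal length and sepal width from the first ten rows of iris.data shown earlier.
sepal = np.array([[5.1, 3.5], [4.9, 3.0], [4.7, 3.2], [4.6, 3.1], [5.0, 3.6],
                  [5.4, 3.9], [4.6, 3.4], [5.0, 3.4], [4.4, 2.9], [4.9, 3.1]])

plt.figure(dpi=100)
plt.scatter(sepal[:, 0], sepal[:, 1])

# Label the axes with the feature names and give the figure a title.
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.title('Sepal length vs. sepal width')  # illustrative title
```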
    
  • Showing the three iris species with different marker shapes and colors

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from pandas import Series,DataFrame
    # Prepare the data
    from sklearn import datasets
    iris=datasets.load_iris()
    x, y = iris.data, iris.target
    pd_iris = pd.DataFrame(np.hstack((x, y.reshape(150, 1))),columns=['sepal length(cm)','sepal width(cm)','petal length(cm)','petal width(cm)','class'] )
    plt.figure(dpi=150)  # figure resolution
    plt.style.use('Solarize_Light2')  # draw with the Solarize_Light2 style
    iris_type=pd_iris['class'].unique()  # the three class values found in the class column
    iris_name=iris.target_names  # the name of each class
    colors = ['#c72e29','#098154','#fb832d']  # one color per class
    markers = [r'$\clubsuit$','.','+']  # one marker shape per class (the first is a mathtext symbol)
    for i in range(len(iris_type)):
        plt.scatter(pd_iris.loc[pd_iris['class'] == iris_type[i], 'sepal length(cm)'],  # x data
                    pd_iris.loc[pd_iris['class'] == iris_type[i], 'sepal width(cm)'],  # y data
                    s = 50,  # marker size
                    c = colors[i],  # marker color
                    marker = markers[i],  # marker shape
                    #marker=matplotlib.markers.MarkerStyle(marker = markers[i],fillstyle='full'),  # sets the marker fill style
                    alpha=0.8,  # marker opacity, from 0 to 1
                    facecolors='r',  # marker fill color; c is set above, so c takes precedence
                    edgecolors='none',  # marker edge color
                    linewidths=1,  # marker edge width; not visible on filled markers while edgecolors is 'none'
                    label = iris_name[i])  # the legend entries below take their text from label
    plt.legend(loc = 'upper right')
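
    When only the color needs to differ between classes, the loop can be replaced by a single scatter call that receives the labels through c and maps them through a colormap. This is a sketch of that alternative, not the method used in this article; it assumes a six-point sample (two points per class, treat the values as illustrative) and the 'viridis' colormap, and it relies on PathCollection.legend_elements(), which requires matplotlib 3.1 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

# Illustrative sample (assumption): two points per class instead of all 150 rows.
sepal_length = np.array([5.1, 4.9, 7.0, 6.4, 6.3, 5.8])
sepal_width = np.array([3.5, 3.0, 3.2, 3.2, 3.3, 2.7])
target = np.array([0, 0, 1, 1, 2, 2])
target_names = ['setosa', 'versicolor', 'virginica']

fig, ax = plt.subplots(dpi=150)

# One call draws every point; the numeric labels choose colors from the colormap.
points = ax.scatter(sepal_length, sepal_width, c=target, cmap='viridis',
                    s=50, alpha=0.8)

# Build one legend handle per distinct label and attach the species names.
handles, _ = points.legend_elements()
ax.legend(handles, target_names, loc='upper right')
```

    The trade-off is that one scatter call accepts only one marker shape, so the loop is still the way to go when each class needs its own marker.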
    

    3. Drawing scatter plots with matplotlib.axes.Axes.scatter (parameters explained)

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from pandas import Series,DataFrame
    # Prepare the data
    from sklearn import datasets
    iris=datasets.load_iris()
    x, y = iris.data, iris.target
    pd_iris = pd.DataFrame(np.hstack((x, y.reshape(150, 1))),columns=['sepal length(cm)','sepal width(cm)','petal length(cm)','petal width(cm)','class'] )
    fig,ax = plt.subplots(dpi=150)
    iris_type=pd_iris['class'].unique()  # the three class values found in the class column
    iris_name=iris.target_names  # the name of each class
    colors = ['#c72e29','#098154','#fb832d']  # one color per class
    markers = [r'$\clubsuit$','.','+']  # one marker shape per class (the first is a mathtext symbol)
    for i in range(len(iris_type)):
        ax.scatter(pd_iris.loc[pd_iris['class'] == iris_type[i], 'sepal length(cm)'],  # x data
                   pd_iris.loc[pd_iris['class'] == iris_type[i], 'sepal width(cm)'],  # y data
                   s = 50,  # marker size
                   c = colors[i],  # marker color
                   marker = markers[i],  # marker shape
                   #marker=matplotlib.markers.MarkerStyle(marker = markers[i],fillstyle='full'),  # sets the marker fill style
                   alpha=0.8,  # marker opacity, from 0 to 1
                   facecolors='r',  # marker fill color; c is set above, so c takes precedence
                   edgecolors='none',  # marker edge color
                   linewidths=1,  # marker edge width; not visible on filled markers while edgecolors is 'none'
                   label = iris_name[i])  # the legend entries below take their text from label
    ax.legend(loc = 'upper right')
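
    One practical reason to prefer the Axes interface is that each subplot is an explicit object, which makes multi-panel figures straightforward. The sketch below draws sepal and petal measurements side by side; it uses only the ten rows printed earlier, and the names ax_sepal and ax_petal as well as the figure size are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

# First ten rows of iris.data, as printed earlier.
data = np.array([[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2],
                 [4.7, 3.2, 1.3, 0.2], [4.6, 3.1, 1.5, 0.2],
                 [5.0, 3.6, 1.4, 0.2], [5.4, 3.9, 1.7, 0.4],
                 [4.6, 3.4, 1.4, 0.3], [5.0, 3.4, 1.5, 0.2],
                 [4.4, 2.9, 1.4, 0.2], [4.9, 3.1, 1.5, 0.1]])

# One figure with two Axes; each Axes gets its own scatter call and labels.
fig, (ax_sepal, ax_petal) = plt.subplots(1, 2, figsize=(8, 3), dpi=100)

ax_sepal.scatter(data[:, 0], data[:, 1], c='#c72e29', s=50)
ax_sepal.set_xlabel('sepal length (cm)')
ax_sepal.set_ylabel('sepal width (cm)')

ax_petal.scatter(data[:, 2], data[:, 3], c='#098154', s=50)
ax_petal.set_xlabel('petal length (cm)')
ax_petal.set_ylabel('petal width (cm)')

fig.tight_layout()  # keep the axis labels of the two panels from overlapping
```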
    

    4. References

    scikit-learn.org/stable/data…
    matplotlib.org/api/_as_gen…
    matplotlib.org/api/_as_gen…
