Code Sample, a copy-pastable example if possible

from pandas import DataFrame

dicts = [
    {'filter_col': False, 'groupby_col': True, 'bool_col': True, 'float_col': 10.5},
    {'filter_col': True, 'groupby_col': True, 'bool_col': True, 'float_col': 20.5},
    {'filter_col': True, 'groupby_col': True, 'bool_col': True, 'float_col': 30.5},
]
df = DataFrame(dicts)
df_filter = df[df['filter_col'] == True]
dfgb = df_filter.groupby('groupby_col')
dfgb.std()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/site-packages/pandas/core/groupby.py", line 1055, in std
    return np.sqrt(self.var(ddof=ddof))
AttributeError: 'bool' object has no attribute 'sqrt'

Problem description

Required elements for the error to appear are:

  • groupby() is applied to a filtered DataFrame, not an original DataFrame
  • std(), not another aggregate function (e.g. mean()), is called on the DataFrameGroupBy object
  • the DataFrame contains a column of type bool
  • there are at least 2 rows w/ the same value of the .groupby() column (here, 'groupby_col')
In my more complicated real-world data, where I first ran into the error, I would also see an exception complaining about type float:

    AttributeError: 'float' object has no attribute 'sqrt'

However, even in that case, deleting the bool column would resolve the issue.

Presumably I'll be able to work around the issue by calling .std() on individual columns of the DataFrameGroupBy object, but it seems like pandas should be able to handle this case w/o choking.

Expected Output

                 bool_col  filter_col  float_col
    groupby_col                                 
    True              0.0         0.0    7.07107
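
The expected values above can be checked by hand. After the filter, two rows remain (float_col 20.5 and 30.5, both bool columns True), and pandas' std() uses the sample standard deviation (ddof=1) by default. A minimal sketch with the standard library only; the variable names are illustrative and not part of the report:

```python
import math
import statistics

# Rows that survive the filter_col == True filter in the report.
float_col = [20.5, 30.5]
bool_col = [1.0, 1.0]  # True, True written out as floats

# statistics.stdev is the sample standard deviation (n - 1 denominator),
# which corresponds to ddof=1.
float_std = statistics.stdev(float_col)
bool_std = statistics.stdev(bool_col)

print(round(float_std, 5))  # 7.07107
print(bool_std)             # 0.0
```

The squared deviations of float_col are 25 + 25 = 50, and dividing by n - 1 = 1 gives a variance of 50, so the standard deviation is sqrt(50).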
    

Output of pd.show_versions()

    INSTALLED VERSIONS

    commit: None
    python: 3.5.3.final.0
    python-bits: 64
    OS: Linux
    OS-release: 4.9.16-gentoo
    machine: x86_64
    processor: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
    byteorder: little
    LC_ALL: en_US.UTF-8
    LANG: en_US.UTF-8
    LOCALE: en_US.UTF-8

    pandas: 0.19.1
    nose: None
    pip: 7.1.2
    setuptools: 30.4.0
    Cython: 0.25.1
    numpy: 1.10.4
    scipy: 0.16.1
    statsmodels: 0.6.1
    xarray: None
    IPython: None
    sphinx: None
    patsy: 0.4.1
    dateutil: 2.4.2
    pytz: 2016.3
    blosc: None
    bottleneck: 1.0.0
    tables: None
    numexpr: 2.6.1
    matplotlib: 1.5.3
    openpyxl: None
    xlrd: None
    xlwt: None
    xlsxwriter: None
    lxml: None
    bs4: 4.5.3
    html5lib: 0.9999999
    httplib2: 0.9.2
    apiclient: None
    sqlalchemy: None
    pymysql: None
    psycopg2: 2.6.2 (dt dec pq3 ext lo64)
    jinja2: 2.9.5
    boto: None
    pandas_datareader: None

We exclude non-numeric columns in aggregations. However, bool is valid for some of them:

    In [8]: df.groupby('groupby_col').sum()
    Out[8]: 
                 bool_col  filter_col  float_col
    groupby_col                                 
    True              3.0         2.0       61.5
    In [9]: df.groupby('groupby_col').mean()
    Out[9]: 
                 bool_col  filter_col  float_col
    groupby_col                                 
    True              1.0    0.666667       20.5
    In [10]: df.dtypes
    Out[10]: 
    bool_col          bool
    filter_col        bool
    float_col      float64
    groupby_col       bool
    dtype: object
    

So we could fix this generally by simply casting bool columns with astype (we already cast certain columns for computation anyhow), or we could pull back and remove bool from numeric aggregations like sum/mean.
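
Until something like that lands inside pandas, the same idea can be applied on the user side. The following is only a sketch of that workaround, not the change made in pandas itself; the helper name groupby_std_casting_bools is hypothetical:

```python
import pandas as pd


def groupby_std_casting_bools(df, key):
    """Hypothetical helper: cast bool value columns to float64, then group and take std."""
    # Leave the grouping column alone; only the aggregated columns are cast.
    bool_cols = [c for c in df.columns if c != key and df[c].dtype == bool]
    casted = df.astype({c: "float64" for c in bool_cols})
    return casted.groupby(key).std()


df = pd.DataFrame(
    [
        {"filter_col": False, "groupby_col": True, "bool_col": True, "float_col": 10.5},
        {"filter_col": True, "groupby_col": True, "bool_col": True, "float_col": 20.5},
        {"filter_col": True, "groupby_col": True, "bool_col": True, "float_col": 30.5},
    ]
)
df_filter = df[df["filter_col"]]
result = groupby_std_casting_bools(df_filter, "groupby_col")
print(result)
```

Because every aggregated column is already float64 when std() runs, the variance step never has a bool result to hand to np.sqrt.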

@TomAugspurger

sqrt and var can also make sense for booleans, but we seem to fail when the column being aggregated has no variance.

    In [5]: pd.DataFrame({"A": [1, 1, 1], "B": [True, True, True], "C": [1, 1, 1]}).groupby("A").std()
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-5-375645985fa5> in <module>()
    ----> 1 pd.DataFrame({"A": [1, 1, 1], "B": [True, True, True], "C": [1, 1, 1]}).groupby("A").std()
    /Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/groupby.py in std(self, ddof, *args, **kwargs)
       1080         # TODO: implement at Cython level?
       1081         nv.validate_groupby_func('std', args, kwargs)
    -> 1082         return np.sqrt(self.var(ddof=ddof, **kwargs))
       1084     @Substitution(name='groupby')
    AttributeError: 'bool' object has no attribute 'sqrt'
    In [6]: pd.DataFrame({"A": [1, 1, 2], "B": [True, True, True], "C": [1, 1, 1]}).groupby("A").std()
    Out[6]:
         B    C
    A
    1  0.0  0.0
    2  NaN  NaN
    In [7]: pd.DataFrame({"A": [1, 1, 1], "B": [True, True, False], "C": [1, 1, 1]}).groupby("A").std()
    Out[7]:
             B    C
    A
    1  0.57735  0.0

Really, the underlying issue is probably unrelated to groupby.

    In [45]: pd.DataFrame({"A": [1, 1, 1, 1], "B": [True, True, True, True], "C": [1, 1, 1, 2]}).groupby("A").var()
    Out[45]:
           B     C
    A
    1  False  0.25

Should the B column there be 0, not False? That would be consistent with numpy:

    In [46]: np.var([1, 1, 1, 1])
    Out[46]: 0.0
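
To make the comparison with numpy concrete, here is a small sketch. The first part shows that numpy computes the variance of a bool array in floating point. The second part is an assumption about the failure mode, inferred only from the error message: np.sqrt applied to an object-dtype array holding a plain Python bool. It does not reproduce pandas internals.

```python
import numpy as np

bools = np.array([True, True, True, True])

# numpy computes the variance of a bool array in floating point.
variance = np.var(bools)
print(type(variance).__name__, variance)

# Assumption: the reported error corresponds to np.sqrt meeting an
# object-dtype array that holds a plain Python bool, as sketched here.
as_object = np.array([False], dtype=object)
try:
    np.sqrt(as_object)
    sqrt_failed = False
except (AttributeError, TypeError):
    # The exception type depends on the numpy version.
    sqrt_failed = True
print(sqrt_failed)

# After casting to float the square root is well defined.
print(np.sqrt(as_object.astype("float64")))
```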

    >>> dicts = [
    ...     {"filter_col": False, "groupby_col": True, "bool_col": True, "float_col": 10.5},
    ...     {"filter_col": True, "groupby_col": True, "bool_col": True, "float_col": 20.5},
    ...     {"filter_col": True, "groupby_col": True, "bool_col": True, "float_col": 30.5},
    ... ]
    >>> df = pd.DataFrame(dicts)
    >>> df_filter = df[df["filter_col"] == True]
    >>> dfgb = df_filter.groupby("groupby_col")
    >>> dfgb.std()
                 filter_col  bool_col  float_col
    groupby_col
    True                0.0       0.0   7.071068

Could use a test.
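
A regression test along these lines might look like the sketch below. It assumes a pandas version in which the behaviour shown in the session above holds (bool columns come back as floats from std()). The test names are hypothetical and this is not the test that was added to pandas.

```python
import pandas as pd
from pandas.api.types import is_float_dtype


def test_groupby_std_constant_bool_column():
    # Zero-variance bool column, taken from the example earlier in the thread.
    df = pd.DataFrame({"A": [1, 1, 1], "B": [True, True, True], "C": [1, 1, 1]})
    result = df.groupby("A").std()
    assert is_float_dtype(result["B"])
    assert result.loc[1, "B"] == 0.0
    assert result.loc[1, "C"] == 0.0


def test_groupby_std_filtered_frame():
    # The filtered frame from the original report.
    df = pd.DataFrame(
        {
            "filter_col": [False, True, True],
            "groupby_col": [True, True, True],
            "bool_col": [True, True, True],
            "float_col": [10.5, 20.5, 30.5],
        }
    )
    result = df[df["filter_col"]].groupby("groupby_col").std()
    assert result["bool_col"].iloc[0] == 0.0
    assert result["filter_col"].iloc[0] == 0.0
    assert abs(result["float_col"].iloc[0] - 50 ** 0.5) < 1e-9


# Run directly when no test runner is available.
test_groupby_std_constant_bool_column()
test_groupby_std_filtered_frame()
```

The dtype check matters because False == 0.0 is true in Python, so a value comparison alone would not catch a bool result.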