GroupBy.count() returns the grouping column as both index and column · Issue #5610


GroupBy.count() (with the default as_index=True) returns the grouping column both as the index and as a column, while other methods such as first and sum keep it only as the index (which I think is the most logical). This seems like a minor inconsistency to me:

In [41]: data = pd.DataFrame({'name' : ['a', 'a', 'b', 'd'], 'counts' : [3,4,3,2]})
In [42]: data
Out[42]:
   counts name
0       3    a
1       4    a
2       3    b
3       2    d
In [43]: g = data.groupby('name')
In [45]: g.count()
Out[45]:
      counts  name
a          2     2
b          1     1
d          1     1
In [46]: g.first()
Out[46]:
      counts
a          3
b          3
d          2
In [47]: g.sum()
Out[47]:
      counts
a          7
b          3
d          2
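For comparison, here is a minimal sketch assuming a recent pandas release (the thread does not say which version changed count). It checks that count, first and sum now agree on keeping `name` only in the index, and shows `as_index=False` as the way to get the key back as an ordinary column:

```python
import pandas as pd

# Same frame as in the session above
data = pd.DataFrame({'name': ['a', 'a', 'b', 'd'], 'counts': [3, 4, 3, 2]})
g = data.groupby('name')

counted = g.count()
firsts = g.first()
sums = g.sum()

# With the default as_index=True the key lives only in the index
for result in (counted, firsts, sums):
    print(list(result.columns), list(result.index))

# as_index=False moves the key out of the index into a regular column
flat = data.groupby('name', as_index=False).count()
print(flat)
```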
@hayd @TomAugspurger
Do these look right? Are first/last just different?
In [2]: df= pd.DataFrame({'name' : ['a', 'a', 'b', 'd'], 'counts' : [3,4,3,2]})
In [3]: g = df.groupby('name')
In [4]: g.count()
Out[4]: 
      counts  name
a          2     2
b          1     1
d          1     1
[3 rows x 2 columns]
In [5]: g.first()
Out[5]: 
      counts
a          3
b          3
d          2
[3 rows x 1 columns]
In [6]: g.head()
Out[6]: 
   counts name
0       3    a
1       4    a
2       3    b
3       2    d
[4 rows x 2 columns]
In [7]: g.tail()
Out[7]: 
   counts name
0       3    a
1       4    a
2       3    b
3       2    d
[4 rows x 2 columns]
In [8]: g.last()
Out[8]: 
      counts
a          4
b          3
d          2
[3 rows x 1 columns]
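The session above mixes two kinds of method. A small sketch, assuming a recent pandas release, of the distinction: head/tail act as filters that return rows of the original frame (original integer index, all columns including `name`), while first/last reduce each group to one row indexed by the key. `head(1)` is used instead of the default `head()` so the filtering is actually visible:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'b', 'd'], 'counts': [3, 4, 3, 2]})
g = df.groupby('name')

# Filter: keeps the first row of each group, with its original index
head = g.head(1)
print(head)

# Reductions: one row per group, indexed by the group key
first = g.first()
last = g.last()
print(first)
print(last)
```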
IMO, first should behave the same as g.nth(0) and last the same as g.nth(-1), since, as mentioned in the larger PR, they are not aggregations (I think this API break is on the roadmap for 0.14?). At the moment, first and last are implemented as aggregations.
The original issue is that count includes name; I also don't think it should. I'll have a look at this, it may be a simple fix. It is related to cumsum etc. including the grouped-by columns (so there may be a generic fix in agg).
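One concrete way first and nth(0) can differ is missing data. This sketch uses a hypothetical frame whose first `counts` value in group `a` is NaN, and assumes a recent pandas release: first() returns the first non-null value per group, while nth(0) returns the literal first row. Since pandas 2.0, nth behaves as a filter and keeps the original row index, so the sketch compares values positionally rather than by index:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of group 'a' is missing its count
df = pd.DataFrame({'name': ['a', 'a', 'b'], 'counts': [np.nan, 4.0, 3.0]})
g = df.groupby('name')

# first() skips NaN, so group 'a' reports 4.0
by_first = g['counts'].first()

# nth(0) takes the first row as-is, so group 'a' reports NaN
by_nth = g['counts'].nth(0)

print(by_first.tolist())
print(by_nth.tolist())
```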
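The cumsum case mentioned here is transform-like rather than a reduction. A minimal sketch, assuming a recent pandas release, showing that the result is aligned with the original rows and does not carry the grouped-by `name` column:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'b', 'd'], 'counts': [3, 4, 3, 2]})
g = df.groupby('name')

# Running total within each group, one output row per input row
running = g.cumsum()
print(running)

# Contrast with a reduction: one row per group
print(g.count())
```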
http://stackoverflow.com/questions/23352418/unexpected-behavior-in-pandas-mad-with-groupby/23352706#23352706
mad has the same problem (as, probably, do the non-cythonized calls).
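`mad` itself was removed in pandas 2.0, so it cannot be used to reproduce this today. As a sketch under that assumption, the same statistic (mean absolute deviation around the mean) can be computed per group with a plain function passed to `agg`. `mean_abs_dev` is a hypothetical helper name, and selecting the `counts` column explicitly keeps `name` out of the result:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'b', 'd'], 'counts': [3, 4, 3, 2]})

def mean_abs_dev(s: pd.Series) -> float:
    # Hypothetical helper: mean absolute deviation around the group mean
    return (s - s.mean()).abs().mean()

# Only the value column is aggregated; 'name' stays in the index
result = df.groupby('name')['counts'].agg(mean_abs_dev)
print(result)
```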
These seem completely wrong (I haven't changed anything yet to exclude the 'A' column when testing these):
In [1]: df = DataFrame([[1, 2, 'foo'], [1, nan, 'bar',], [3, nan, 'baz']], columns=['A', 'B','C'])
In [2]: df
Out[2]: 
   A   B    C
0  1   2  foo
1  1 NaN  bar
2  3 NaN  baz
[3 rows x 3 columns]
In [3]: df.groupby('A').shift(1)
Out[3]: 
    A   B    C
0 NaN NaN  NaN
1   1   2  foo
2 NaN NaN  NaN
[3 rows x 3 columns]
In [4]: df.groupby('A').fillna(-1)
Out[4]: 
   A  B    C
0  1  2  foo
1  1 -1  bar
2  3 -1  baz
[3 rows x 3 columns]
In [5]: df.groupby('A').apply(lambda x: x.fillna(-1))
Out[5]: 
   A  B    C
0  1  2  foo
1  1 -1  bar
2  3 -1  baz
[3 rows x 3 columns]
Why should they raise? E.g. df.groupby('A').shift(1) seems correct to me (it shifts within each group).
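To make "shift within each group" concrete, here is a sketch on the same frame, assuming a recent pandas release in which the grouping column `A` is no longer part of the output. Each value moves down one row inside its group, and the first row of every group becomes NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 'foo'], [1, np.nan, 'bar'], [3, np.nan, 'baz']],
                  columns=['A', 'B', 'C'])

# Rows 0 and 1 form group A=1; row 2 is alone in group A=3.
# Row 1 receives row 0's values; rows 0 and 2 have nothing to shift in.
shifted = df.groupby('A').shift(1)
print(shifted)
```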

They're a bit like the transform functions, aren't they?
The use of fillna is less clear, but it seems it was added explicitly to the whitelist.
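One reason fillna is "less clear": with a scalar fill value, the grouping has no effect on the result. This sketch, assuming a recent pandas release, fills per group through `transform` (so it does not rely on a groupby-level fillna) and compares it with a single fillna over the non-key columns; the two come out identical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 'foo'], [1, np.nan, 'bar'], [3, np.nan, 'baz']],
                  columns=['A', 'B', 'C'])

# Fill missing values group by group...
per_group = df.groupby('A')[['B', 'C']].transform(lambda x: x.fillna(-1))

# ...versus filling the non-key columns in one go
whole = df[['B', 'C']].fillna(-1)

print(per_group)
print(per_group.equals(whole))
```

A group-dependent fill (e.g. forward-filling within each group) is where grouping would actually matter.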
      ENH/BUG: add count to grouper / ensure that grouper keys are not included in the returned
      #7000
      jreback/pandas