In [41]: data = pd.DataFrame({'name' : ['a', 'a', 'b', 'd'], 'counts' : [3,4,3,2]})
In [42]: data
Out[42]:
   counts name
0       3    a
1       4    a
2       3    b
3       2    d
In [43]: g = data.groupby('name')
In [45]: g.count()
Out[45]:
   counts name
a       2    2
b       1    1
d       1    1
In [46]: g.first()
Out[46]:
   counts
a       3
b       3
d       2
In [47]: g.sum()
Out[47]:
   counts
a       7
b       3
d       2
@hayd @TomAugspurger
Do these look right? Are first/last just different?
In [2]: df = pd.DataFrame({'name' : ['a', 'a', 'b', 'd'], 'counts' : [3,4,3,2]})
In [3]: g = df.groupby('name')
In [4]: g.count()
Out[4]:
   counts name
a       2    2
b       1    1
d       1    1
[3 rows x 2 columns]
In [5]: g.first()
Out[5]:
   counts
a       3
b       3
d       2
[3 rows x 1 columns]
In [6]: g.head()
Out[6]:
   counts name
0       3    a
1       4    a
2       3    b
3       2    d
[4 rows x 2 columns]
In [7]: g.tail()
Out[7]:
   counts name
0       3    a
1       4    a
2       3    b
3       2    d
[4 rows x 2 columns]
In [8]: g.last()
Out[8]:
   counts
a       4
b       3
d       2
[3 rows x 1 columns]
IMO first should do the same as g.nth(0), and last the same as g.nth(-1), since, as mentioned in the larger PR, they are not aggregations (I think breaking these is on the roadmap for 0.14?). At the moment first and last are implemented as aggs.
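To make the first vs nth distinction concrete, here is a minimal pure-Python sketch (not pandas code). It assumes aggregation-style first returns, per column, the first non-missing value in each group, while nth(0) returns the group's actual first row as-is. The helpers is_missing, group_rows, first_agg and nth are hypothetical names used only for illustration. With the frame above (no missing values) the two agree; they only diverge once a group starts with a missing value.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (a loose stand-in for pandas NA)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def group_rows(rows, key):
    """Collect rows into groups keyed by row[key], in order of first appearance."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


def first_agg(rows, key):
    """Aggregation-style first: per column, the first non-missing value in each group."""
    result = {}
    for k, members in group_rows(rows, key).items():
        out = {}
        for col in members[0]:
            if col == key:
                continue  # the grouping key labels the result, it is not a column
            present = [r[col] for r in members if not is_missing(r[col])]
            out[col] = present[0] if present else math.nan
        result[k] = out
    return result


def nth(rows, key, n):
    """Row-selection-style nth: the n-th actual row of each group (negative n counts from the end)."""
    result = {}
    for k, members in group_rows(rows, key).items():
        if -len(members) <= n < len(members):
            result[k] = {c: v for c, v in members[n].items() if c != key}
    return result


rows = [
    {'name': 'a', 'counts': math.nan},
    {'name': 'a', 'counts': 4},
    {'name': 'b', 'counts': 3},
]
print(first_agg(rows, 'name'))  # {'a': {'counts': 4}, 'b': {'counts': 3}}
print(nth(rows, 'name', 0))     # {'a': {'counts': nan}, 'b': {'counts': 3}}
```

Groups that are too short for the requested n are simply absent from the nth result, which is another way a row selection differs from an aggregation.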
The original issue is that count includes the name column; I also don't think it should. I'll have a look at this; it may be a simple fix. It's related to cumsum etc. including the grouped-by columns (so there may be a generic fix in agg).
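A sketch of the count behavior argued for here, in plain Python rather than pandas: count the non-missing values per column within each group and leave the grouping key out of the result, since it already labels the rows. group_count and is_missing are hypothetical helpers, and the rows mirror the frame from In [41].

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def group_count(rows, key):
    """Count non-missing values per column within each group, excluding the key column."""
    result = {}
    for row in rows:
        counts = result.setdefault(row[key], {})
        for col, value in row.items():
            if col == key:
                continue  # do not count the grouping column itself
            counts[col] = counts.get(col, 0) + (0 if is_missing(value) else 1)
    return result


rows = [
    {'name': 'a', 'counts': 3},
    {'name': 'a', 'counts': 4},
    {'name': 'b', 'counts': 3},
    {'name': 'd', 'counts': 2},
]
print(group_count(rows, 'name'))
# {'a': {'counts': 2}, 'b': {'counts': 1}, 'd': {'counts': 1}}
```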
http://stackoverflow.com/questions/23352418/unexpected-behavior-in-pandas-mad-with-groupby/23352706#23352706
mad is the same problem (as are probably the non-cythonized calls).
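For reference, mad is the mean absolute deviation about the mean. The sketch below (plain Python, hypothetical helpers group_mad and mean_abs_dev, made-up sample values) computes it per group over the non-key columns only, assuming the fix is the same as for count (leave the key column out) and that missing values are skipped. It is an illustration, not how pandas implements mad.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def mean_abs_dev(values):
    """Mean absolute deviation about the mean, skipping missing values."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return math.nan
    mu = sum(present) / len(present)
    return sum(abs(v - mu) for v in present) / len(present)


def group_mad(rows, key):
    """Per-group mad for every column except the grouping key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return {
        k: {col: mean_abs_dev([r[col] for r in members])
            for col in members[0] if col != key}
        for k, members in groups.items()
    }


rows = [{'A': 1, 'B': 2.0}, {'A': 1, 'B': 4.0}, {'A': 3, 'B': 5.0}]
print(group_mad(rows, 'A'))  # {1: {'B': 1.0}, 3: {'B': 0.0}}
```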
These seem completely wrong (I haven't yet changed anything to exclude the 'A' column when testing these):
In [1]: df = DataFrame([[1, 2, 'foo'], [1, nan, 'bar',], [3, nan, 'baz']], columns=['A', 'B','C'])
In [2]: df
Out[2]:
   A   B    C
0  1   2  foo
1  1 NaN  bar
2  3 NaN  baz
[3 rows x 3 columns]
In [3]: df.groupby('A').shift(1)
Out[3]:
     A   B    C
0  NaN NaN  NaN
1    1   2  foo
2  NaN NaN  NaN
[3 rows x 3 columns]
In [4]: df.groupby('A').fillna(-1)
Out[4]:
   A   B    C
0  1   2  foo
1  1  -1  bar
2  3  -1  baz
[3 rows x 3 columns]
In [5]: df.groupby('A').apply(lambda x: x.fillna(-1))
Out[5]:
   A   B    C
0  1   2  foo
1  1  -1  bar
2  3  -1  baz
[3 rows x 3 columns]
Why should they raise? E.g. df.groupby('A').shift(1) seems correct to me (shift within each group). It's a bit like the transform functions?
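Shifting within each group behaves like a transform: the result has one row per input row, in the original order, and values never cross a group boundary. Below is a minimal plain-Python sketch, assuming the key column is dropped from the output as proposed for count. group_shift is a hypothetical helper, and the rows mirror the frame from In [1] with None standing in for NaN.

```python
def group_shift(rows, key, periods=1, fill=None):
    """Shift every non-key column by `periods` rows within each group.

    Transform-like: one output row per input row, in the original order.
    Positions shifted in from outside the group get `fill`.
    The key column is left out of the result.
    """
    # Original row positions belonging to each group, in order of appearance.
    positions = {}
    for i, row in enumerate(rows):
        positions.setdefault(row[key], []).append(i)

    out = [{} for _ in rows]
    for idxs in positions.values():
        for j, i in enumerate(idxs):
            src = j - periods  # position within the group to copy from
            for col in rows[i]:
                if col == key:
                    continue
                out[i][col] = rows[idxs[src]][col] if 0 <= src < len(idxs) else fill
    return out


rows = [
    {'A': 1, 'B': 2, 'C': 'foo'},
    {'A': 1, 'B': None, 'C': 'bar'},
    {'A': 3, 'B': None, 'C': 'baz'},
]
print(group_shift(rows, 'A'))
# [{'B': None, 'C': None}, {'B': 2, 'C': 'foo'}, {'B': None, 'C': None}]
```

Apart from the dropped 'A' column, this matches the shape of Out[3]: row 1 picks up row 0's values because both are in group 1, while group 3 has nothing to shift in.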
The use of fillna is less clear, but it seems it was explicitly added to the whitelist.
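One way to see why a grouped fillna is less clear: with a scalar fill value, filling each group separately gives exactly the same result as filling the whole frame, which is what Out[4] and Out[5] show. Grouping would only make a difference for fills that depend on neighbouring rows, such as forward-filling. A plain-Python check, using hypothetical fillna/group_fillna helpers and None in place of NaN:

```python
def fillna(rows, value):
    """Replace missing (None) entries with a scalar value in every column."""
    return [{col: (value if v is None else v) for col, v in row.items()} for row in rows]


def group_fillna(rows, key, value):
    """Apply fillna to each group separately, then restore the original row order."""
    positions = {}
    for i, row in enumerate(rows):
        positions.setdefault(row[key], []).append(i)
    out = [None] * len(rows)
    for idxs in positions.values():
        filled_group = fillna([rows[i] for i in idxs], value)
        for i, filled in zip(idxs, filled_group):
            out[i] = filled
    return out


rows = [
    {'A': 1, 'B': 2, 'C': 'foo'},
    {'A': 1, 'B': None, 'C': 'bar'},
    {'A': 3, 'B': None, 'C': 'baz'},
]
print(group_fillna(rows, 'A', -1) == fillna(rows, -1))  # True
```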
ENH/BUG: add count to grouper / ensure that grouper keys are not included in the returned
#7000
jreback/pandas