Python – How to get the first value in a groupby: TypeError: first() missing 1 required positional argument: 'offset'

dataframe group-by pandas python

I am trying to get the most frequently occurring Counts value for each state.

My code:

 df.groupby('States')['Counts'].value_counts().first()

raises:

TypeError: first() missing 1 required positional argument: 'offset'

Expected output:

  States Counts
  AK     one
  LO     three

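Use a lambda function that takes the first index entry of value_counts(), which sorts by frequency in descending order by default: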

df = df.groupby('States')['Counts'].apply(lambda x: x.value_counts().index[0]).reset_index(name='val')
print (df)
  States    val
0     AK    one
1     LO  three
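
For reference, here is a self-contained sketch of the same approach. The sample frame is hypothetical (the question does not show its input data), and the helper name most_frequent is introduced only for illustration. As far as I can tell, the original call fails because value_counts() on the grouped column returns an ordinary Series, so .first() resolves to the time-series method that expects an offset rather than a per-group "first".

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real frame.
df = pd.DataFrame({
    "States": ["AK", "AK", "AK", "LO", "LO", "LO"],
    "Counts": ["one", "one", "two", "three", "three", "one"],
})

def most_frequent(values: pd.Series):
    # value_counts() sorts by frequency, highest first,
    # so the first index label is the most common value.
    return values.value_counts().index[0]

result = (
    df.groupby("States")["Counts"]
      .apply(most_frequent)
      .reset_index(name="val")
)

# A possible alternative (not from the answer): Series.mode().
# On ties, mode() returns the tied values sorted, so .iloc[0] picks the smallest.
alt = (
    df.groupby("States")["Counts"]
      .agg(lambda s: s.mode().iloc[0])
      .reset_index(name="val")
)

print(result)
```

The sample data avoids ties on purpose; with tied counts the two variants may pick different values.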

The ord() function returns the integer code point of a character. To convert back after manipulating the number, use chr().

>>> ord('a')
97
>>> chr(97)
'a'
>>> chr(ord('a') + 3)
'd'

In Python 2, there was also the unichr function, returning the Unicode character whose ordinal is the unichr argument:

>>> unichr(97)
u'a'
>>> unichr(1234)
u'\u04d2'

In Python 3 you can use chr instead of unichr.
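
A minimal Python 3 sketch of the round trip described above. The helper shift_char is a hypothetical name used only for illustration.

```python
def shift_char(ch: str, offset: int) -> str:
    """Return the character `offset` code points after `ch`."""
    return chr(ord(ch) + offset)

code_point = ord("a")          # character -> integer code point
round_trip = chr(code_point)   # integer code point -> character
shifted = shift_char("a", 3)

# In Python 3, chr() covers the full Unicode range, so it replaces unichr().
cyrillic = chr(1234)

print(code_point, round_trip, shifted, ascii(cyrillic))
```

Note that shift_char does no wrap-around, so shifting past 'z' simply yields whatever code point comes next.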

ord() - Python 3.6.5rc1 documentation

ord() - Python 2.7.14 documentation

;WITH cte AS
(
   SELECT *,
         ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
   FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1

If you expect two entries per day, this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead of ROW_NUMBER.
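
The difference between the two ranking functions can be tried out with a small sketch. It assumes SQLite (via Python's sqlite3 module, which needs SQLite 3.25 or later for window functions) rather than the database engine the query was written for, and the table columns and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentStatusLogs ("
    "ID INTEGER PRIMARY KEY, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
)
# Hypothetical rows; document 1 has two entries on its latest day.
conn.executemany(
    "INSERT INTO DocumentStatusLogs (DocumentID, Status, DateCreated) VALUES (?, ?, ?)",
    [
        (1, "S1", "2011-07-29"),
        (1, "S2", "2011-07-30"),
        (1, "S3", "2011-07-30"),
        (2, "S1", "2011-08-01"),
    ],
)

QUERY = """
WITH cte AS (
    SELECT DocumentID, Status, DateCreated,
           {fn}() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
    FROM DocumentStatusLogs
)
SELECT DocumentID, Status, DateCreated
FROM cte
WHERE rn = 1
ORDER BY DocumentID, Status
"""

# ROW_NUMBER numbers tied rows 1, 2, ... so only one row per document survives.
row_number_rows = conn.execute(QUERY.format(fn="ROW_NUMBER")).fetchall()
# DENSE_RANK gives tied rows the same rank, so both latest-day rows survive.
dense_rank_rows = conn.execute(QUERY.format(fn="DENSE_RANK")).fetchall()

print(row_number_rows)
print(dense_rank_rows)
```

Which of the two tied rows ROW_NUMBER keeps is not defined by the query; adding a tie-breaker column to the ORDER BY would make it deterministic.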

As for normalised or not, it depends if you want to:

  • maintain status in 2 places
  • preserve status history

As it stands, you preserve status history. If you want the latest status in the parent table too (which is denormalisation), you'd need a trigger to maintain "status" in the parent, or you could drop this status history table.
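
A rough sketch of the trigger option, again assuming SQLite through Python's sqlite3 module. The parent table Documents, its CurrentStatus column, and the trigger name are hypothetical; none of them appear in the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical parent table holding the denormalised latest status.
CREATE TABLE Documents (
    DocumentID INTEGER PRIMARY KEY,
    CurrentStatus TEXT
);
CREATE TABLE DocumentStatusLogs (
    ID INTEGER PRIMARY KEY,
    DocumentID INTEGER REFERENCES Documents(DocumentID),
    Status TEXT,
    DateCreated TEXT
);
-- Copy each newly logged status into the parent row.
-- Assumes log rows are inserted in chronological order.
CREATE TRIGGER trg_sync_status AFTER INSERT ON DocumentStatusLogs
BEGIN
    UPDATE Documents
    SET CurrentStatus = NEW.Status
    WHERE DocumentID = NEW.DocumentID;
END;

INSERT INTO Documents (DocumentID) VALUES (1);
INSERT INTO DocumentStatusLogs (DocumentID, Status, DateCreated)
VALUES (1, 'S1', '2011-07-29');
INSERT INTO DocumentStatusLogs (DocumentID, Status, DateCreated)
VALUES (1, 'S2', '2011-07-30');
""")

current = conn.execute(
    "SELECT CurrentStatus FROM Documents WHERE DocumentID = 1"
).fetchone()[0]
history = conn.execute(
    "SELECT COUNT(*) FROM DocumentStatusLogs WHERE DocumentID = 1"
).fetchone()[0]
print(current, history)
```

The history table keeps every row while the parent carries only the latest status, which is the "status in 2 places" trade-off mentioned above.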
