Code Sample, a copy-pastable example if possible

resampled_bookings = bookings.resample('BMS').agg(
    unique_identity_ids=pd.NamedAgg(column="identity_id", aggfunc=pd.Series.nunique),
    booking_count=pd.NamedAgg(column="booking_id", aggfunc=pd.Series.count),
    booking_distance_m=pd.NamedAgg(column="distance_m", aggfunc=pd.Series.sum)
)

Problem description

This error prevents me from aggregating data and (re)naming columns at the same time. Instead, it is necessary to use the following approach, which isn't altogether bad:

# Aggregate first, using the original column names as keys...
resampled_bookings = bookings.resample('BMS').agg({
    "booking_id": "count",
    "distance_m": sum,
    "identity_id": pd.Series.nunique
})
# ...then rename the aggregated columns in a second step.
resampled_bookings.rename(columns={
    "booking_id": "booking_count",
    "distance_m": "booking_distance_m",
    "identity_id": "unique_identity_ids"
}, inplace=True)
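
The two-step workaround above can be tried on synthetic data. This is a minimal sketch: the `bookings` frame below is a hypothetical stand-in, since the real data is not part of the report, and the string aliases `"sum"` and `"nunique"` are used in place of the callables.

```python
import pandas as pd

# Hypothetical stand-in for the reporter's `bookings` DataFrame.
bookings = pd.DataFrame(
    {
        "identity_id": [1, 1, 2, 3],
        "booking_id": [10, 11, 12, 13],
        "distance_m": [100.0, 250.0, 75.0, 300.0],
    },
    index=pd.to_datetime(["2020-01-06", "2020-01-20", "2020-02-10", "2020-02-17"]),
)

# Aggregate with a plain dict, then rename in the same method chain.
resampled_bookings = (
    bookings.resample("BMS")
    .agg({"booking_id": "count", "distance_m": "sum", "identity_id": "nunique"})
    .rename(
        columns={
            "booking_id": "booking_count",
            "distance_m": "booking_distance_m",
            "identity_id": "unique_identity_ids",
        }
    )
)
print(resampled_bookings)
```

Chaining `.rename()` avoids `inplace=True` and keeps the aggregation and the naming in one expression.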

Expected Output

The aggregate function would produce a new DataFrame with the following columns:

  • booking_count
  • booking_distance_m
  • unique_identity_ids

Output of pd.show_versions()

    INSTALLED VERSIONS

    commit : None
    python : 3.7.5.final.0
    python-bits : 64
    OS : Linux
    OS-release : 5.3.0-19-generic
    machine : x86_64
    processor : x86_64
    byteorder : little
    LC_ALL : None
    LANG : en_US.UTF-8
    LOCALE : en_US.UTF-8

    pandas : 1.0.1
    numpy : 1.18.1
    pytz : 2019.3
    dateutil : 2.8.1
    pip : 18.1
    setuptools : 40.8.0
    Cython : None
    pytest : None
    hypothesis : None
    sphinx : None
    blosc : None
    feather : None
    xlsxwriter : None
    lxml.etree : None
    html5lib : None
    pymysql : None
    psycopg2 : None
    jinja2 : 2.11.1
    IPython : 7.12.0
    pandas_datareader: None
    bs4 : None
    bottleneck : None
    fastparquet : None
    gcsfs : None
    lxml.etree : None
    matplotlib : 3.2.0
    numexpr : None
    odfpy : None
    openpyxl : None
    pandas_gbq : None
    pyarrow : None
    pytables : None
    pytest : None
    pyxlsb : None
    s3fs : None
    scipy : 1.4.1
    sqlalchemy : None
    tables : None
    tabulate : None
    xarray : None
    xlrd : None
    xlwt : None
    xlsxwriter : None
    numba : None

    I got the same error when resampling and then aggregating. If you switch the order, aggregating first and then resampling, it runs, but the result is a datetime object that is not aggregated within dates.

    def new_dataset(rule, num_points=9):
        """
        Generate a new timeseries dataset using intervals in rule string
        and number of points num_points.
        Returns a series object.
        """
        ind = pd.date_range('1/1/2000', periods=num_points, freq=rule)
        return pd.Series(list(map(lambda x: x*2, range(num_points))), index=ind)

    minutes = new_dataset('min')
    minutes.resample('2min', axis=0).agg(column1=('mean', 'mean'), column2=('sum', 'sum'))

    The above results in

    TypeError: aggregate() missing 1 required positional argument: 'func'
    

    However, with

    minutes.resample('2min',axis=0).agg( ('mean', 'mean'), ('sum', 'sum'))
    Output: 
    	mean	mean
    2000-01-01 00:00:00	1	1
    2000-01-01 00:02:00	5	5
    2000-01-01 00:04:00	9	9
    2000-01-01 00:06:00	13	13
    2000-01-01 00:08:00	16	16
    

    Here only the first aggregation is applied, and it is applied twice: both columns hold the mean, the same values you get by calling .mean() directly.

    Credits: https://towardsdatascience.com/using-the-pandas-resample-function-a231144194c4
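
For the Series case above, one way to get both aggregations under custom names is to pass a list of function names and rename the resulting columns afterwards. This is a sketch of a workaround, not a fix for the keyword form; the names `column1` and `column2` are taken from the failing call.

```python
import pandas as pd

# Same values as the `minutes` series above: 0, 2, 4, ..., 16 at 1-minute spacing.
ind = pd.date_range("2000-01-01", periods=9, freq="min")
minutes = pd.Series([x * 2 for x in range(9)], index=ind)

# A list of function names yields one column per function, named after it.
result = (
    minutes.resample("2min")
    .agg(["mean", "sum"])
    .rename(columns={"mean": "column1", "sum": "column2"})
)
print(result)
```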

    Can you try to provide a small reproducible example with the error message you're seeing (others don't have access to your data)?

    @brylie ideally the code sample could double as the test, so a minimal reproducible example would be beneficial, see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

    Pretty sure this is the same as the issue I opened here, so I'll copy over the example I put there:

    import pandas as pd
    df = pd.DataFrame(
        {"group": ["a"], "col": [1.0]}, index=[pd.to_datetime("2019-11-04 10:32:09.737")]
    )
    df.groupby("group").resample("1D").agg(open=pd.NamedAgg("col", "first"))

    I'll close that one.
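
For the groupby case, a possible workaround is to group on the key and a `pd.Grouper(freq=...)` together, which goes through the ordinary groupby path where named aggregation is supported. This is a sketch under that assumption, not the thread's resolution. It may differ from `resample` in how empty time bins are handled, so treat it as an approximation rather than a drop-in replacement.

```python
import pandas as pd

df = pd.DataFrame(
    {"group": ["a"], "col": [1.0]},
    index=[pd.to_datetime("2019-11-04 10:32:09.737")],
)

# Group by the key column and by daily bins on the DatetimeIndex in one step.
result = df.groupby(["group", pd.Grouper(freq="1D")]).agg(
    open=pd.NamedAgg(column="col", aggfunc="first")
)
print(result)
```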

    See the comments in pandas.core.base.SelectionMixin._aggregate, lines 320 and 331, where this error is raised:
    pandas.core.base.SpecificationError: nested renamer is not supported
    You can use this code instead:

    resampled_bookings = bookings.resample('BMS').agg({
            "identity_id": pd.Series.nunique,
            "booking_id": pd.Series.count,
            "distance_m": pd.Series.sum
    })

    or pass a list of functions for each column:

    resampled_bookings = bookings.resample('BMS').agg({
            "identity_id": [pd.Series.nunique],
            "booking_id": [pd.Series.count],
            "distance_m": [pd.Series.sum]
    })
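
With the list form, the result has two-level columns of (column, function) pairs rather than flat names. The sketch below shows one way to flatten them. The `bookings` data is a hypothetical stand-in, the string aliases replace the `pd.Series` callables, and the `column_function` naming scheme is just an illustration.

```python
import pandas as pd

# Hypothetical stand-in for the reporter's `bookings` DataFrame.
bookings = pd.DataFrame(
    {
        "identity_id": [1, 1, 2, 3],
        "booking_id": [10, 11, 12, 13],
        "distance_m": [100.0, 250.0, 75.0, 300.0],
    },
    index=pd.to_datetime(["2020-01-06", "2020-01-20", "2020-02-10", "2020-02-17"]),
)

# A list of functions per column produces MultiIndex columns.
multi = bookings.resample("BMS").agg(
    {"identity_id": ["nunique"], "booking_id": ["count"], "distance_m": ["sum"]}
)

# Join each (column, function) pair into a single flat name.
flat = multi.copy()
flat.columns = ["_".join(pair) for pair in multi.columns]
print(flat)
```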

    or pass the func parameter explicitly:

    resampled_bookings = bookings.resample('BMS').agg(
       func={
            "identity_id": [pd.Series.nunique],
            "booking_id": [pd.Series.count],
            "distance_m": [pd.Series.sum]
       }
    )