Given the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

Running the following:

df.pivot_table(values="C", index=["A", "B"], aggfunc=np.median)

gives the required result: the median of C for each (A, B) pair. However, running the same call on a Dask DataFrame fails:

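import dask.dataframe as dd

ddf = dd.from_pandas(df, npartitions=3)
ddf.pivot_table(values="C", index=["A", "B"], aggfunc=np.median)

This raises:
ValueError: 'index' must be the name of an existing column

It seems the Dask implementation is rather limited: index must be a single column name (a scalar) rather than a list, and aggfunc only accepts a few built-in reductions such as "mean", "sum" and "count" (see dask.dataframe.reshape.pivot_table in the Dask documentation).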
Is there another way to achieve this?

Hi @jadeidev,

It's not exactly the same, since you'll get a Series instead of a DataFrame, but you can get the same results with:

res = ddf.groupby(["A", "B"]).C.median()
# Optional, depending on what you want to do: compute() materializes the result as a pandas Series
pd_series = res.compute()
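If you need the same DataFrame shape that pivot_table returns, the Series can be turned back into a one-column DataFrame with to_frame(). The sketch below checks this against plain pandas so it runs without Dask; with Dask you would apply the same to_frame() and sort_index() steps to res.compute(), which should be a pandas Series named "C". The sort_index() call is a precaution, assuming the Dask groupby result may not come back in the sorted order that pivot_table uses.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Reference result: pandas pivot_table with a list of index columns.
# The string "median" is used here instead of the np.median callable.
expected = df.pivot_table(values="C", index=["A", "B"], aggfunc="median")

# groupby + median yields a Series named "C", indexed by (A, B).
series = df.groupby(["A", "B"])["C"].median()

# to_frame() turns the Series into a one-column DataFrame; sort_index()
# restores the sorted (A, B) order in case the groups arrive unordered.
as_frame = series.to_frame().sort_index()

# Raises an AssertionError if the two results differ in values, dtypes or index.
pd.testing.assert_frame_equal(as_frame, expected)
print(as_frame)
```

The groupby route works for any reduction Dask's groupby supports, whereas Dask's pivot_table is restricted to a scalar index and a small set of aggfunc names.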

Does that help?