import dask.dataframe as dd
import pandas as pd
ddf1 = dd.from_pandas(pd.DataFrame([{'foo': float('nan')}]), npartitions=1)
ddf2 = dd.from_pandas(pd.DataFrame([{'foo': ['string']}]), npartitions=1)
ddf2['foo'] = ddf2['foo'].astype('category')
dd.concat([ddf1, ddf2])
Traceback (most recent call last)
<ipython-input-13-9c00824f1f78> in <module>()
----> 1 dd.concat([ddf1, ddf2])
5 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/concat.py in <genexpr>(.0)
272 if not all(
273 is_dtype_equal(other.categories.dtype, first.categories.dtype)
--> 274 for other in to_union[1:]
275 ):
276 raise TypeError("dtype of categories must be the same")
AttributeError: 'numpy.ndarray' object has no attribute 'categories'
Why is this painful?
The error gives no indication that there's a column type mismatch. In the actual situation where I encountered this error, the dataframes were much more complex, so I initially had no suspicion that the column types were the issue. I had to spend about 20 minutes poking around before I finally realized what was going on.
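One way to surface this kind of mismatch before concatenating is to compare the per-column dtypes of the inputs. The sketch below is only an illustration: `find_dtype_mismatches` is a hypothetical helper, not part of pandas or dask. It relies only on the `.columns` and `.dtypes` attributes, which both pandas and dask DataFrames expose, and it is demonstrated here on plain pandas frames.

```python
import pandas as pd


def find_dtype_mismatches(frames):
    """Hypothetical helper: map each column whose dtype differs across
    the given frames to the list of dtype names, one entry per frame
    (None where the frame lacks the column)."""
    # Collect column names in first-seen order.
    columns = []
    for frame in frames:
        for col in frame.columns:
            if col not in columns:
                columns.append(col)

    mismatches = {}
    for col in columns:
        dtypes = [
            str(frame.dtypes[col]) if col in frame.columns else None
            for frame in frames
        ]
        # Only compare frames that actually contain the column.
        if len({d for d in dtypes if d is not None}) > 1:
            mismatches[col] = dtypes
    return mismatches


df1 = pd.DataFrame({'foo': [float('nan')]})
df2 = pd.DataFrame({'foo': ['string']}).astype({'foo': 'category'})
print(find_dtype_mismatches([df1, df2]))  # {'foo': ['float64', 'category']}
```

Note that this compares dtype names only, so two categorical columns with different categories both show up as `category` and would not be flagged.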
@eric-yu-snorkel Thanks for reporting!
I wasn't able to execute your exact example because pd.DataFrame([{'foo': ['string']}]).astype({'foo': 'category'}) raises TypeError: unhashable type: 'list', even with only pandas -- I think that's because building the DataFrame from a dict within a list leaves the cell value as the list ['string'], which can't be hashed into a category.
It still looks like a bug, though. Here's a reproducer:
import dask.dataframe as dd
import pandas as pd
df1 = pd.DataFrame({'foo': [float('nan')]})
ddf1 = dd.from_pandas(df1, npartitions=1)
df2 = pd.DataFrame({'foo': ['string']})
ddf2 = dd.from_pandas(df2, npartitions=1)
df2['foo'] = df2['foo'].astype('category')
ddf2['foo'] = ddf2['foo'].astype('category')
pd.concat([df1, df2]) # Works
dd.concat([ddf1, ddf2]) # AttributeError: 'numpy.ndarray' object has no attribute 'categories'
needs attention
It's been a while since this was pushed on. Needs attention from the owner or a maintainer.
label
Jun 27, 2022
Jul 8, 2024